RE: [patch] optimize o_direct on block device - v3

2007-01-11 Thread Chen, Kenneth W
Randy Dunlap wrote on Thursday, January 11, 2007 1:45 PM > > +/* return a pge back to pvec array */ > > is pge just a typo or some other tla that i don't know? > (not portland general electric or pacific gas & electric) Typo with fat fingers. Thanks for catching it. Full patch with typo fixed.

RE: [patch] optimize o_direct on block device - v3

2007-01-11 Thread Chen, Kenneth W
Andrew Morton wrote on Thursday, January 11, 2007 11:29 AM > On Thu, 11 Jan 2007 13:21:57 -0600 > Michael Reed <[EMAIL PROTECTED]> wrote: > > Testing on my ia64 system reveals that this patch introduces a > > data integrity error for direct i/o to a block device. Device > > errors which result in

RE: [PATCH] 4/4 block: explicit plugging

2007-01-05 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, January 03, 2007 12:09 AM > Do you have any benchmarks which got faster with these changes? Jens Axboe wrote on Wednesday, January 03, 2007 12:22 AM > I've asked Ken to run this series on some of his big iron, I hope he'll > have some results for us soonish. I

RE: open(O_DIRECT) on a tmpfs?

2007-01-04 Thread Chen, Kenneth W
Hugh Dickins wrote on Thursday, January 04, 2007 11:14 AM > On Thu, 4 Jan 2007, Hua Zhong wrote: > > So I'd argue that it makes more sense to support O_DIRECT > > on tmpfs as the memory IS the backing store. > > A few more voices in favour and I'll be persuaded. Perhaps I'm > out of date: when

RE: [PATCH] 4/4 block: explicit plugging

2007-01-03 Thread Chen, Kenneth W
Jens Axboe wrote on Wednesday, January 03, 2007 2:30 PM > > We are having some trouble with the patch set that some of our fiber channel > > host controller doesn't initialize properly anymore and thus lost whole > > bunch of disks (somewhere around 200 disks out of 900) at boot time. > >

RE: [PATCH] 4/4 block: explicit plugging

2007-01-03 Thread Chen, Kenneth W
Jens Axboe wrote on Wednesday, January 03, 2007 12:22 AM > > Do you have any benchmarks which got faster with these changes? > > On the hardware I have immediately available, I see no regressions wrt > performance. With instrumentation it's simple to demonstrate that most > of the queueing

RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 6:06 PM > > In the example you > > gave earlier, task with min_nr of 2 will be woken up after 4 completed > > events. > > I only gave 2 ios/events in that example. > > Does that clear up the confusion? It occurs to me that people might not be aware

RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 6:06 PM > On Jan 2, 2007, at 5:50 PM, Chen, Kenneth W wrote: > > Zach Brown wrote on Tuesday, January 02, 2007 5:24 PM > >>> That is not possible because when multiple tasks waiting for > >>> events, they

RE: [patch] aio: streamline read events after woken up

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:06 PM > To: Chen, Kenneth W > > Given the previous patch "aio: add per task aio wait event condition" > > that we properly wake up event waiting process knowing that we have > > enough events to reap, it's just plain

RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:24 PM > > That is not possible because when multiple tasks waiting for > > events, they > > enter the wait queue in FIFO order, prepare_to_wait_exclusive() does > > __add_wait_queue_tail(). So first io_getevents() with min_nr of 2 > > will be

RE: [patch] aio: make aio_ring_info->nr_pages an unsigned int

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 5:14 PM > To: Chen, Kenneth W > > --- ./include/linux/aio.h.orig 2006-12-24 22:31:55.0 -0800 > > +++ ./include/linux/aio.h 2006-12-24 22:41:28.0 -0800 > > @@ -165,7 +165,7 @@ struct aio_ring_info {

RE: [patch] aio: add per task aio wait event condition

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 4:49 PM > On Dec 29, 2006, at 6:31 PM, Chen, Kenneth W wrote: > > This patch adds a wait condition to the wait queue and only wake-up > > process when that condition meets. And this condition is added on a > > per task base for ha

RE: [patch] remove redundant iov segment check

2007-01-02 Thread Chen, Kenneth W
Zach Brown wrote on Tuesday, January 02, 2007 10:22 AM > >> I wonder if it wouldn't be better to make this change as part of a > >> larger change that moves towards an explicit iovec container struct > >> rather than bare 'struct iov *' and 'nr_segs' arguments. > > > I suspect it should be rather

[patch] aio: make aio_ring_info->nr_pages an unsigned int

2006-12-29 Thread Chen, Kenneth W
The number of io_events currently allowed in the AIO event queue is no more than 2^32-1, because the syscall is defined as: asmlinkage long sys_io_setup(unsigned nr_events, aio_context_t __user *ctxp) We internally allocate a ring buffer for nr_events and keep track of page descriptors for each
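A minimal user-space sketch of why an unsigned int is wide enough for nr_pages: sys_io_setup() takes "unsigned nr_events", so even the largest allowed ring needs far fewer pages than UINT_MAX. The 32-byte io_event size and 4 KB page size below are assumptions made only for illustration.

#include <stdio.h>

#define PAGE_SIZE_BYTES 4096ULL   /* assumed page size */
#define IO_EVENT_BYTES  32ULL     /* assumed sizeof(struct io_event) on 64-bit */

int main(void)
{
    unsigned nr_events = 0xffffffffu;   /* largest value sys_io_setup() accepts */
    unsigned long long ring_bytes = (unsigned long long)nr_events * IO_EVENT_BYTES;
    unsigned long long ring_pages = (ring_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;

    /* roughly 2^25 pages in the worst case, comfortably below UINT_MAX,
       so an "unsigned int nr_pages" cannot overflow */
    printf("worst-case ring pages: %llu\n", ring_pages);
    return 0;
}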

[patch] aio: remove spurious ring head index modulo info->nr

2006-12-29 Thread Chen, Kenneth W
In aio_read_evt(), the ring->head will never exceed info->nr because we already do the wrap when updating the ring head index: if (head != ring->tail) { ... head = (head + 1) % info->nr; ring->head = head; } This makes the modulo of
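A small stand-alone sketch of the invariant described above (names are illustrative, not the kernel's): the head index is wrapped at the moment it is advanced, so a later "% nr" on the value read back is a no-op and can be dropped.

#include <stdio.h>

#define NR 8    /* stand-in for info->nr */

struct ring { unsigned head, tail; int ev[NR]; };

static int read_evt(struct ring *r, int *out)
{
    if (r->head == r->tail)
        return 0;                       /* ring empty */
    *out = r->ev[r->head];              /* no "% NR" needed on head here ... */
    r->head = (r->head + 1) % NR;       /* ... because the wrap happens here */
    return 1;
}

int main(void)
{
    struct ring r = { 0, 0, { 0 } };
    int v = -1;

    for (int i = 0; i < 20; i++) {      /* push and immediately pop 20 events */
        r.ev[r.tail] = i;
        r.tail = (r.tail + 1) % NR;
        read_evt(&r, &v);
    }
    printf("last event %d, head always stayed below %d (now %u)\n", v, NR, r.head);
    return 0;
}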

[patch] aio: streamline read events after woken up

2006-12-29 Thread Chen, Kenneth W
The read event loop in the blocking path is also inefficient. For every event it reaps (even when it does not need to block), it does the following in a loop: while (i < nr) { prepare_to_wait_exclusive aio_read_evt finish_wait ... } Given the previous patch
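A toy version of the streamlining being argued for, with stub functions standing in for aio_read_evt(), prepare_to_wait_exclusive() and finish_wait(): events that are already queued are reaped without ever touching the wait queue, and the wait-queue setup only happens when the ring is empty. This illustrates the idea only; it is not the actual fs/aio.c change.

#include <stdio.h>

static int queued = 5;    /* pretend five completed events are already in the ring */

static int  read_evt(int *ev)       { if (!queued) return 0; *ev = queued--; return 1; }
static void prepare_to_wait(void)   { puts("  prepare_to_wait_exclusive"); }
static void finish_wait(void)       { puts("  finish_wait"); }

int main(void)
{
    int i = 0, nr = 5, ev;

    while (i < nr) {
        if (read_evt(&ev)) {        /* fast path: reap without any wait-queue work */
            i++;
            continue;
        }
        prepare_to_wait();          /* slow path: only when nothing is left to reap */
        if (read_evt(&ev))
            i++;
        else
            break;                  /* a real implementation would sleep here */
        finish_wait();
    }
    printf("reaped %d events, no wait-queue round trip on the fast path\n", i);
    return 0;
}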

[patch] aio: add per task aio wait event condition

2006-12-29 Thread Chen, Kenneth W
The AIO wake-up notification from aio_complete is really inefficient in the current AIO implementation when a process is waiting in io_getevents(). For example, if an app calls io_getevents with min_nr > 1 and the AIO event queue doesn't have enough completed events, the process will block in
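A stand-alone sketch of the per-task condition this patch describes: each waiter records its own min_nr, and the completion side only marks a waiter runnable once that waiter's threshold is met, instead of waking every sleeper on every completed event. Names and structure are invented for illustration, not taken from the patch.

#include <stdio.h>

struct waiter {
    int min_nr;     /* events this task asked io_getevents() for */
    int woken;
};

static void on_events_completed(struct waiter *w, int nr_waiters, int events_ready)
{
    for (int i = 0; i < nr_waiters; i++)
        if (!w[i].woken && events_ready >= w[i].min_nr)
            w[i].woken = 1;     /* wake only when this task's own condition is met */
}

int main(void)
{
    struct waiter w[2] = { { 2, 0 }, { 64, 0 } };

    on_events_completed(w, 2, 3);   /* three events have completed so far */
    for (int i = 0; i < 2; i++)
        printf("waiter %d (min_nr=%d): %s\n", i, w[i].min_nr,
               w[i].woken ? "woken" : "keeps sleeping");
    return 0;
}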

[patch] aio: fix buggy put_ioctx call in aio_complete - v2

2006-12-29 Thread Chen, Kenneth W
An AIO bug was reported that sleeping function is being called in softirq context: BUG: warning at kernel/mutex.c:132/__mutex_lock_common() Call Trace: [a00100577b00] __mutex_lock_slowpath+0x640/0x6c0 [a00100577ba0] mutex_lock+0x20/0x40 [a001000a25b0] flush_workqueue+0xb0/0x1a0 [] __put_ioctx+0xc0/0x240 []

RE: [PATCH] mm: fix page_mkclean_one

2006-12-27 Thread Chen, Kenneth W
Chen, Kenneth wrote on Wednesday, December 27, 2006 9:55 PM > Linus Torvalds wrote on Wednesday, December 27, 2006 7:05 PM > > On Wed, 27 Dec 2006, David Miller wrote: > > > > > > > > I still don't see _why_, though. But maybe smarter people than me can > > > > see > > > > it.. > > > > > >

RE: [PATCH] mm: fix page_mkclean_one

2006-12-27 Thread Chen, Kenneth W
Linus Torvalds wrote on Wednesday, December 27, 2006 7:05 PM > On Wed, 27 Dec 2006, David Miller wrote: > > > > > > I still don't see _why_, though. But maybe smarter people than me can see > > > it.. > > > > FWIW this program definitely triggers the bug for me. > > Ok, now that I have

RE: [patch] aio: fix buggy put_ioctx call in aio_complete

2006-12-21 Thread Chen, Kenneth W
[EMAIL PROTECTED] wrote on Thursday, December 21, 2006 9:35 AM > kenneth.w.chen> Take ioctx_lock is one part, the other part is to move > kenneth.w.chen> spin_unlock_irqrestore(&ctx->ctx_lock, flags); > kenneth.w.chen> in aio_complete all the way down to the end of the > kenneth.w.chen> function,

RE: [patch] aio: fix buggy put_ioctx call in aio_complete

2006-12-21 Thread Chen, Kenneth W
[EMAIL PROTECTED] wrote on Thursday, December 21, 2006 8:56 AM > kenneth.w.chen> I think I'm going to abandon this whole synchronize thing > kenneth.w.chen> and going to put the wake up call inside ioctx_lock spin > kenneth.w.chen> lock along with the other patch you mentioned above in the >

RE: [patch] aio: fix buggy put_ioctx call in aio_complete

2006-12-21 Thread Chen, Kenneth W
Andrew Morton wrote on Thursday, December 21, 2006 12:18 AM > Alas, your above description doesn't really tell us what the bug is, so I'm > at a bit of a loss here. > > http://marc.theaimsgroup.com/?l=linux-aio&m=116616463009218&w=2 > > So that's a refcounting bug. But it's really a locking bug,

RE: [patch] aio: fix buggy put_ioctx call in aio_complete

2006-12-21 Thread Chen, Kenneth W
Andrew Morton wrote on Wednesday, December 20, 2006 8:06 PM > On Tue, 19 Dec 2006 13:49:18 -0800 > "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote: > > Regarding to a bug report on: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2

RE: [RFC PATCH 1/8] rqbased-dm: allow blk_get_request() to be called from interrupt context

2006-12-20 Thread Chen, Kenneth W
Kiyoshi Ueda wrote on Wednesday, December 20, 2006 9:50 AM > On Wed, 20 Dec 2006 14:48:49 +0100, Jens Axboe <[EMAIL PROTECTED]> wrote: > > Big NACK on this - it's not only really ugly, it's also buggy to pass > > interrupt flags as function arguments. As you also mention in the 0/1 > > mail, this

[patch] aio: fix buggy put_ioctx call in aio_complete

2006-12-19 Thread Chen, Kenneth W
Regarding a bug report on: http://marc.theaimsgroup.com/?l=linux-kernel&m=116599593200888&w=2 flush_workqueue() is not allowed to be called in the softirq context. However, aio_complete() called from I/O interrupt can potentially call put_ioctx with the last ref count on ioctx and trigger a bug

RE: [PATCH] incorrect direct io error handling

2006-12-18 Thread Chen, Kenneth W
Dmitriy Monakhov wrote on Monday, December 18, 2006 5:23 AM > This patch is result of discussion started week ago here: > http://lkml.org/lkml/2006/12/11/66 > changes from original patch: > - Update wrong comments about i_mutex locking. > - Add BUG_ON(!mutex_is_locked(..)) for non blkdev. > -

RE: [PATCH] IA64: alignment bug in ldscript

2006-12-18 Thread Chen, Kenneth W
Kirill Korotaev wrote on Monday, December 18, 2006 4:05 AM > [IA64] bug in ldscript (mainstream) > > Occasionally, in mainstream number of fsys entries is even. Is it a typo on "fsys entries is even"? If not, then this change log is misleading. It is the instruction patch list of FSYS_RETURN

RE: 2.6.18.4: flush_workqueue calls mutex_lock in interruptenvironment

2006-12-15 Thread Chen, Kenneth W
Trond Myklebust wrote on Friday, December 15, 2006 6:01 AM > Oops. Missed the fact that you are removed the put_ioctx from > aio_put_req, but the first sentence is still true. If you try to wake up > wait_for_all_aios before you've changed the condition it is waiting for, > then it may end up

RE: [PATCH] incorrect error handling inside generic_file_direct_write

2006-12-15 Thread Chen, Kenneth W
Christoph Hellwig wrote on Friday, December 15, 2006 2:44 AM > So we're doing the sync_page_range once in __generic_file_aio_write > with i_mutex held. > > > > mutex_lock(&inode->i_mutex); > > - ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs, > > - &iocb->ki_pos); > > +

RE: 2.6.18.4: flush_workqueue calls mutex_lock in interrupt environment

2006-12-14 Thread Chen, Kenneth W
Chen, Kenneth wrote on Thursday, December 14, 2006 5:59 PM > > It seems utterly insane to have aio_complete() flush a workqueue. That > > function has to be called from a number of different environments, > > including non-sleep tolerant environments. > > > > For instance it means that directIO

RE: 2.6.18.4: flush_workqueue calls mutex_lock in interrupt environment

2006-12-14 Thread Chen, Kenneth W
Andrew Morton wrote on Thursday, December 14, 2006 5:20 PM > it's hard to disagree. > > Begin forwarded message: > > On Wed, 2006-12-13 at 08:25 +0100, xb wrote: > > > Hi all, > > > > > > Running some IO stress tests on a 8*ways IA64 platform, we got: > > > BUG: warning at

RE: cfq performance gap

2006-12-13 Thread Chen, Kenneth W
Miquel van Smoorenburg wrote on Wednesday, December 13, 2006 1:57 AM > Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > >This rawio test plows through sequential I/O and modulo each small record > >over number of threads. So each thread appears to be non-contiguous within >

RE: cfq performance gap

2006-12-12 Thread Chen, Kenneth W
AVANTIKA R. MATHUR wrote on Tuesday, December 12, 2006 5:33 PM > >> rawio is actually performing sequential reads, but I don't believe it is > >> purely sequential with the multiple processes. > >> I am currently running the test with longer runtimes and will post > >> results once it is complete.

RE: [PATCH] incorrect error handling inside generic_file_direct_write

2006-12-12 Thread Chen, Kenneth W
Andrew Morton wrote on Tuesday, December 12, 2006 2:40 AM > On Tue, 12 Dec 2006 16:18:32 +0300 > Dmitriy Monakhov <[EMAIL PROTECTED]> wrote: > > > >> but according to filemaps locking rules: mm/filemap.c:77 > > >> .. > > >> * ->i_mutex(generic_file_buffered_write) > > >> *

RE: [PATCH] connector: Some fixes for ia64 unaligned access errors

2006-12-11 Thread Chen, Kenneth W
Pete Zaitcev wrote on Monday, December 11, 2006 5:29 PM > On Mon, 11 Dec 2006 15:52:47 -0800, Matt Helsley <[EMAIL PROTECTED]> wrote: > > > I'm shocked memcpy() introduces 8-byte stores that violate architecture > > alignment rules. Is there any chance this a bug in ia64's memcpy() > >

RE: [patch] speed up single bio_vec allocation

2006-12-08 Thread Chen, Kenneth W
> Chen, Kenneth wrote on Wednesday, December 06, 2006 10:20 AM > > Jens Axboe wrote on Wednesday, December 06, 2006 2:09 AM > > This is what I had in mind, in case it wasn't completely clear. Not > > tested, other than it compiles. Basically it eliminates the small > > bio_vec pool, and grows the

RE: [patch] speed up single bio_vec allocation

2006-12-07 Thread Chen, Kenneth W
Andi Kleen wrote on Thursday, December 07, 2006 6:28 PM > "Chen, Kenneth W" <[EMAIL PROTECTED]> writes: > > I tried to use cache_line_size() to find out the alignment of struct bio, > > but > > stumbled on that it is a runtime function for x86_64. >

RE: [patch] speed up single bio_vec allocation

2006-12-07 Thread Chen, Kenneth W
Nate Diller wrote on Thursday, December 07, 2006 1:46 PM > the current code is straightforward and obviously correct. you want > to make the alloc/dealloc paths more complex, by special-casing for an > arbitrary limit of "small" I/O, AFAICT. of *course* you can expect > less overhead when you're

RE: [patch] speed up single bio_vec allocation

2006-12-07 Thread Chen, Kenneth W
Nate Diller wrote on Thursday, December 07, 2006 11:22 AM > > > I still can't help but think we can do better than this, and that this > > > is nothing more than optimizing for a benchmark. For high performance > > > I/O, you will be doing > 1 page bio's anyway and this patch wont help > > > you

[patch] optimize o_direct on block device - v3

2006-12-06 Thread Chen, Kenneth W
This patch implements a block device specific .direct_IO method instead of going through the generic direct_io_worker for block devices. direct_io_worker is fairly complex because it needs to handle O_DIRECT on file systems, where it needs to perform block allocation, hole detection, extents file on

RE: [patch] speed up single bio_vec allocation

2006-12-06 Thread Chen, Kenneth W
Jens Axboe wrote on Wednesday, December 06, 2006 2:09 AM > > > I will try that too. I'm a bit touchy about sharing a cache line for > > > different bio. But given that there are 200,000 I/O per second we are > > > currently pushing the kernel, the chances of two cpu working on two > > > bio that

RE: [-mm patch] sched remove lb_stopbalance counter

2006-12-05 Thread Chen, Kenneth W
Ingo Molnar wrote on Tuesday, December 05, 2006 7:42 AM > * Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > > > but, please: > > > > > > > -#define SCHEDSTAT_VERSION 13 > > > > +#define SCHEDSTAT_VERSION 12 > > > > > > change t

RE: [-mm patch] sched remove lb_stopbalance counter

2006-12-05 Thread Chen, Kenneth W
Ingo Molnar wrote on Tuesday, December 05, 2006 7:32 AM > * Chen, Kenneth W <[EMAIL PROTECTED]> wrote: > > in -mm tree: I would like to revert the change on adding > > lb_stopbalance counter. This count can be calculated by: lb_balanced > > - lb_nobusyg - lb_nobusyq.

[-mm patch] sched remove lb_stopbalance counter

2006-12-05 Thread Chen, Kenneth W
Regarding sched-decrease-number-of-load-balances.patch currently in the -mm tree: I would like to revert the change that added the lb_stopbalance counter. This count can be calculated by: lb_balanced - lb_nobusyg - lb_nobusyq. There is no need to create a gazillion counters while we can derive the
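The derivation stated above, spelled out as a trivial helper. Only the arithmetic comes from the mail; the sample numbers are made up.

#include <stdio.h>

static unsigned long lb_stopbalance(unsigned long lb_balanced,
                                    unsigned long lb_nobusyg,
                                    unsigned long lb_nobusyq)
{
    /* the counter being removed is fully derivable from the ones kept */
    return lb_balanced - lb_nobusyg - lb_nobusyq;
}

int main(void)
{
    printf("lb_stopbalance = %lu\n", lb_stopbalance(1000, 600, 250));
    return 0;
}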

RE: [patch] add an iterator index in struct pagevec

2006-12-04 Thread Chen, Kenneth W
Andrew Morton wrote on Monday, December 04, 2006 9:45 PM > On Mon, 4 Dec 2006 21:21:31 -0800 > "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote: > > > pagevec is never expected to be more than PAGEVEC_SIZE, I think a > > unsigned char is enough to c

[patch] add an iterator index in struct pagevec

2006-12-04 Thread Chen, Kenneth W
A pagevec is never expected to hold more than PAGEVEC_SIZE pages; an unsigned char is enough to count them. This patch makes nr and cold unsigned char and also adds an iterator index. With that, the size can even be bumped up by 1 to 15. Signed-off-by: Ken Chen <[EMAIL PROTECTED]> diff -Nurp
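A stand-alone sketch of the resulting layout (field names follow the mail, but this is an illustration rather than the kernel header): with nr, cold and the new iterator index held in unsigned chars, the counters fit in the padding ahead of the pointer array, so the array can grow to 15 entries while the structure keeps the footprint it had with two unsigned longs and 14 pointers.

#include <stdio.h>

#define PAGEVEC_SIZE 15     /* bumped from 14, as suggested above */

struct pagevec_sketch {
    unsigned char nr;       /* number of pages held     */
    unsigned char cold;     /* cold-page hint           */
    unsigned char idx;      /* proposed iterator index  */
    void *pages[PAGEVEC_SIZE];
};

int main(void)
{
    /* on LP64: 3 bytes + 5 bytes padding + 15 * 8 = 128 bytes,
       the same size as the old 14-entry pagevec */
    printf("sizeof(struct pagevec_sketch) = %zu\n", sizeof(struct pagevec_sketch));
    return 0;
}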

[patch] optimize o_direct on block device - v2

2006-12-04 Thread Chen, Kenneth W
This patch implements block device specific .direct_IO method instead of going through generic direct_io_worker for block device. direct_io_worker is fairly complex because it needs to handle O_DIRECT on file system, where it needs to perform block allocation, hole detection, extents file on

RE: [patch] speed up single bio_vec allocation

2006-12-04 Thread Chen, Kenneth W
Jens Axboe wrote on Monday, December 04, 2006 12:07 PM > On Mon, Dec 04 2006, Chen, Kenneth W wrote: > > On 64-bit arch like x86_64, struct bio is 104 byte. Since bio slab is > > created with SLAB_HWCACHE_ALIGN flag, there are usually spare memory > > available at the end of

RE: [patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
Andrew Morton wrote on Monday, December 04, 2006 11:36 AM > On Mon, 4 Dec 2006 08:26:36 -0800 > "Chen, Kenneth W" <[EMAIL PROTECTED]> wrote: > > > So it's not possible to call down to generic_file_aio_read/write with > > invalid > > iov segment. Pa

RE: [patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
Zach Brown wrote on Monday, December 04, 2006 11:19 AM > On Dec 4, 2006, at 8:26 AM, Chen, Kenneth W wrote: > > > The access_ok() and negative length check on each iov segment in > > function > > generic_file_aio_read/write are redundant. They are all already > &

[patch] speed up single bio_vec allocation

2006-12-04 Thread Chen, Kenneth W
On a 64-bit arch like x86_64, struct bio is 104 bytes. Since the bio slab is created with the SLAB_HWCACHE_ALIGN flag, there is usually spare memory available at the end of a bio. I think we can utilize that memory for bio_vec allocation. The purpose is not so much to save memory consumption for bio_vec,
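A user-space illustration of the spare-tail argument. The 104-byte figure is from the mail; the 64-byte cache line and the 16-byte bio_vec layout are assumptions for a 64-bit build. Once the slab object is rounded up to a cache-line multiple, the padding behind the bio is already big enough for one bio_vec, so a single-page bio needs no second allocation.

#include <stdio.h>

#define CACHE_LINE 64   /* assumed; with 128-byte lines the sums work out the same */

struct fake_bio     { char payload[104]; };                 /* ~sizeof(struct bio) on x86_64 */
struct fake_bio_vec { void *page; unsigned len, offset; };  /* ~sizeof(struct bio_vec)       */

int main(void)
{
    size_t padded = (sizeof(struct fake_bio) + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    size_t spare  = padded - sizeof(struct fake_bio);

    printf("aligned slab object: %zu bytes, spare tail: %zu bytes\n", padded, spare);
    printf("spare tail holds one bio_vec (%zu bytes): %s\n",
           sizeof(struct fake_bio_vec),
           spare >= sizeof(struct fake_bio_vec) ? "yes" : "no");
    return 0;
}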

[patch] remove redundant iov segment check

2006-12-04 Thread Chen, Kenneth W
The access_ok() and negative length checks on each iov segment in generic_file_aio_read/write are redundant. They are all already done before calling down to these low-level generic functions. Vector I/O (both sync and async) is checked via rw_copy_check_uvector(). For single segment
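For reference, a plain-C sketch of the kind of per-segment validation being discussed, i.e. what already happens once at the syscall boundary and was being repeated in the generic read/write path. The types and the helper name are stand-ins, not kernel code.

#include <stdio.h>
#include <sys/types.h>

struct iovec_sketch { void *iov_base; size_t iov_len; };

static int segments_ok(const struct iovec_sketch *iov, unsigned long nr_segs)
{
    for (unsigned long i = 0; i < nr_segs; i++) {
        if ((ssize_t)iov[i].iov_len < 0)    /* "negative" length check */
            return 0;
        /* the kernel additionally runs access_ok() on iov_base/iov_len here */
    }
    return 1;
}

int main(void)
{
    struct iovec_sketch v[2] = { { (void *)0x1000, 4096 }, { (void *)0x2000, 512 } };
    printf("segments ok: %d\n", segments_ok(v, 2));
    return 0;
}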

[patch] kill pointless ki_nbytes assignment in aio_setup_single_vector

2006-12-04 Thread Chen, Kenneth W
io_submit_one assigns ki_left = ki_nbytes = iocb->aio_nbytes, then calls down to aio_setup_iocb and then to aio_setup_single_vector. In there, ki_nbytes is reassigned to the same value it was given two call levels above. There is no need to do so. Signed-off-by: Ken Chen <[EMAIL PROTECTED]> diff -Nurp
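The redundancy in miniature, using simplified stand-in types rather than the real kiocb: the value stored by aio_setup_single_vector is the one io_submit_one already put there, which is why the assignment can simply be deleted.

#include <stdio.h>

struct kiocb_sketch { long ki_left, ki_nbytes, aio_nbytes; };

static void io_submit_one_sketch(struct kiocb_sketch *req)
{
    req->ki_nbytes = req->ki_left = req->aio_nbytes;    /* set once here ... */
}

static void aio_setup_single_vector_sketch(struct kiocb_sketch *req)
{
    req->ki_nbytes = req->aio_nbytes;    /* ... so this store is a no-op */
}

int main(void)
{
    struct kiocb_sketch req = { 0, 0, 4096 };

    io_submit_one_sketch(&req);
    aio_setup_single_vector_sketch(&req);
    printf("ki_nbytes = %ld (unchanged by the second assignment)\n", req.ki_nbytes);
    return 0;
}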
