Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-02 Thread Zach Brown
Erez Zadok wrote: Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree. Kernel w/ SMP, preemption, and lockdep configured. This is a real lock ordering problem. Thanks for reporting it. The updating of atime inside sys_mmap() orders the mmap_sem in the vfs outside of the

Re: [PATCH] dio: falling through to buffered I/O when invalidation of a page fails

2007-12-14 Thread Zach Brown
If anyone has a testcase - I can take a look at the problem again. I can try and throw something together.. - z - To unsubscribe from this list: send the line unsubscribe linux-fsdevel in the body of a message to [EMAIL PROTECTED] More majordomo info at

Re: [PATCH] fix invalidate_inode_pages2_range not to clear ret

2007-12-13 Thread Zach Brown
in invalidate_inode_pages2_range(). Later do_launder_page() calls could overwrite errors generated by earlier calls. Fix this by storing do_launder_page in a temporary variable which is only promoted to the function's return code if it hadn't already generated an error. Signed-off-by: Zach Brown
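The error-preserving pattern described in this changelog can be sketched roughly as follows. This is a minimal userspace illustration, not the kernel patch itself: `launder_range()` and `fake_launder()` are hypothetical stand-ins for invalidate_inode_pages2_range() and do_launder_page(), and the page indices and error values are made up for the demonstration.

```c
#include <errno.h>

/* Hypothetical stand-in for do_launder_page(): 0 on success, negative errno on failure. */
int fake_launder(int page_index)
{
    if (page_index == 2)
        return -EIO;    /* pretend page 2 fails first */
    if (page_index == 4)
        return -EBUSY;  /* pretend page 4 fails later with a different error */
    return 0;
}

/*
 * Hypothetical stand-in for the page loop in invalidate_inode_pages2_range().
 * Each per-page result goes into a temporary (ret2) and is promoted to the
 * function's return code only if no earlier call has already recorded an
 * error, so a later call can never overwrite or clear an earlier failure.
 */
int launder_range(int (*launder)(int), int first, int last)
{
    int ret = 0;

    for (int i = first; i <= last; i++) {
        int ret2 = launder(i);

        if (ret2 < 0 && ret == 0)
            ret = ret2;
    }
    return ret;
}
```

Calling `launder_range(fake_launder, 0, 5)` reports the error from page 2 even though pages 3 and 5 succeed afterwards and page 4 fails with a different code.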

Re: [PATCH] dio: falling through to buffered I/O when invalidation of a page fails

2007-12-11 Thread Zach Brown
Hisashi Hifumi wrote: Hi. Current dio has some problems: 1, In ext3 ordered, dio write can return with EIO because of the race between invalidation of a page and jbd. jbd pins the bhs while committing journal so try_to_release_page fails when jbd is committing the transaction. Yeah. It

Re: [PATCH]loop cleanup in fs/namespace.c - repost

2007-11-21 Thread Zach Brown
The patch given below replaces the goto-loop by a while-based one. That certainly looks fine. I would also replace the 'return' with 'break', but I guess that's more of a question of personal preference. Besides, it removes the export for the same routine, because there are no users for it

Re: [PATCH]loop cleanup in fs/namespace.c - repost

2007-11-21 Thread Zach Brown
This doesn't look fine. Did you test this? Oops, my fault. Of course, I tested the patch, but kernel modules are disabled in my test setup, so I missed the error. :) Enclosed to this message is a new patch, which replaces the goto-loop by the while-based one, but leaves the

[PATCH 4/4] add dio interface for page/offset/len tuples

2007-11-06 Thread Zach Brown
This is what it might look like to feed pgol in to some part of the fs stack instead of iovecs. I imagine we'd want to do it at a much higher level, perhaps something like vfs_write_pages(). --- fs/direct-io.c | 21 + 1 files changed, 21 insertions(+), 0 deletions(-) diff

[RFC] fs io with struct page instead of iovecs

2007-11-06 Thread Zach Brown
At the FS meeting at LCE there was some talk of doing O_DIRECT writes from the kernel with pages instead of with iovecs. This patch series explores one direction we could head in to achieve this. We obviously can't just translate user iovecs (which might represent more memory than the machine

[PATCH 1/4] struct rwmem: an abstraction of the memory argument to read/write

2007-11-06 Thread Zach Brown
This adds a structure and interface to represent the segments of memory which are acting as the source or destination for a read or write operation. Callers would fill this structure and then pass it down the rw path. The intent is to let stages in the rw path make specific calls against this

[PATCH 2/4] dio: use rwmem to work with r/w memory arguments

2007-11-06 Thread Zach Brown
This switches dio to work with the rwmem api to get memory pages for the IO instead of working with iovecs directly. It can use direct rwm struct accesses for some static universal properties of a set of memory segments that make up the buffer argument. It uses helper functions to work with the

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Zach Brown
The second use case is to look at the physical layout of blocks on disk for a specific file, use Mark Lord's write_long patches to inject a disk error and then read that file to make sure that we are handling disk IO errors correctly. A bit obscure, but really quite useful. Hmm, yeah,

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown
But, we shouldn't inflict all of this on fibmap/fiemap... we'll get lost trying to make the one true interface for all operations. For grouping operations on files, I think a read_tree syscall with hints for what userland will do (read, stat, delete, list filenames), and a better cookie

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown
Can you clarify what you mean above with an example? I don't really follow. Sure, take 'tar' as an example. It'll read files in the order that their names are returned from directory listing. This can produce bad IO patterns because the order in which the file names are returned doesn't

Re: [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co. [try #5]

2007-10-25 Thread Zach Brown
+ * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type + * @ptr: The pointer to cast. + * + * Explicitly cast an error-valued pointer to another pointer type in such a + * way as to make it clear that's what's going on. + */ +static inline void *ERR_CAST(const void

Re: [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co. [try #5]

2007-10-25 Thread Zach Brown
Roland Dreier wrote: +static inline void *ERR_CAST(const void *ptr) +{ +return (void *) ptr; +} Just to nit, surely you don't need the cast inside the function. The casting happens at the call site between the argument and returned pointer. The way it's
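For readers unfamiliar with the idiom under discussion, here is a small userspace sketch of how ERR_CAST() sits next to ERR_PTR(), PTR_ERR() and IS_ERR(). The helper bodies are simplified re-implementations written for this illustration, and `struct demo_inode`, `struct demo_dentry`, `demo_iget()` and `demo_lookup()` are hypothetical names that do not come from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Same body as quoted in the thread: only the pointer type changes. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

/* Hypothetical types, for illustration only. */
struct demo_inode { int ino; };
struct demo_dentry { struct demo_inode *inode; };

/* Returns an inode, or an error-valued pointer for an invalid number. */
struct demo_inode *demo_iget(int ino)
{
    static struct demo_inode inode;

    if (ino <= 0)
        return ERR_PTR(-ENOENT);
    inode.ino = ino;
    return &inode;
}

/* Propagates an error-valued inode pointer as a dentry pointer. */
struct demo_dentry *demo_lookup(int ino)
{
    static struct demo_dentry dentry;
    struct demo_inode *inode = demo_iget(ino);

    if (IS_ERR(inode))
        return ERR_CAST(inode); /* instead of ERR_PTR(PTR_ERR(inode)) */
    dentry.inode = inode;
    return &dentry;
}
```

The point of the helper is readability at the call site: `ERR_CAST(inode)` says "this is an error pointer changing type", where `ERR_PTR(PTR_ERR(inode))` round-trips through an integer to get the same value.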

Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-01 Thread Zach Brown
If you're soliciting opinions, I think I tend to prefer the feel of the code paths after the changes. I don't know the benefits of the change are worth the risk in unmaintained file systems, though. + return ERR_PTR(PTR_ERR(inode)); This caught my eye. Surely we can do better :).

Re: [PATCH 07/30] IGET: Stop BEFS from using iget() and read_inode()

2007-10-01 Thread Zach Brown
return ERR_PTR(PTR_ERR(inode)); I tend to prefer the latter. It seems like a pretty noisy way to get a (void *) cast :/. Maybe a function that has the cast but makes sure it's only used for IS_ERR() pointers? /* haha, continuing the fine tradition of terrible names in this

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Zach Brown
I fear the consequences of this change :( I love it. In the past I've lost time by working with patches which didn't quite realize that ext3 holds a transaction open during ->direct_IO. Oh well, please keep it alive, maybe beat on it a bit, resend it later on? I can test the patch to make

Re: [PATCH] dio: remove bogus refcounting BUG_ON

2007-07-05 Thread Zach Brown
the BUG_ON(). But unfortunately, our perf. team is able to reproduce the problem. What are they doing to reproduce it? How much setup does it take? Debug indicated that, the ret2 == 1 :( That could be consistent with the theory that we're racing with the dio struct being freed and reused

Re: vm/fs meetup details

2007-07-05 Thread Zach Brown
- repair driven design, we know what it is (Val told us), but how does it apply to the things we are currently working on? should we do more of it? I'm sure Chris and I could talk about the design elements in btrfs that should aid repair if folks are interested in hearing about them.

[PATCH] dio: remove bogus refcounting BUG_ON

2007-07-03 Thread Zach Brown
the lock if the final reference was just dropped. Another CPU might free the dio in bio completion and reuse the memory after this path drops the dio lock but before the BUG_ON() is evaluated. This patch passed aio+dio regression unit tests and aio-stress on ext3. Signed-off-by: Zach Brown [EMAIL

Re: DIO panic on 2.6.21.5

2007-06-29 Thread Zach Brown
On Jun 27, 2007, at 8:01 PM, Badari Pulavarty wrote: Hi Zach, One of our perf. team ran into this while doing some runs. I didn't see anything obvious - it looks like we converted async IO to synchronous one. I didn't spend much time digging around. It looks pretty bad, a *shouldn't happen*

Re: [PATCH] eCryptfs: Delay writing 0's after llseek until write

2007-05-22 Thread Zach Brown
FWIW, I believe Andrew's point was that critical information for Joe Enduser (and Joe Patch-Ho) was lacking in the original changelog. and don't forget Joe eCryptfs-Maintainer-2-Years-In-The-Future. - z

Re: [PATCH 2 of 8] Change O_DIRECT to use placeholders instead of i_mutex/i_alloc_sem locking

2007-02-07 Thread Zach Brown
+static void dio_unlock_page_range(struct dio *dio) +{ + if (dio->lock_type != DIO_NO_LOCKING) { + remove_placeholder_pages(dio->inode->i_mapping, +dio->fspages_start_off, +dio->fspages_end_off); +

Re: [PATCH 2 of 8] Change O_DIRECT to use placeholders instead of i_mutex/i_alloc_sem locking

2007-02-07 Thread Zach Brown
The test case Linus sent me boils down to this: fd = open(file) buffer = mmap(fd, 128 pages); close(fd); fd = open(file, O_DIRECT); write(fd, buffer, 66 pages); Yeah, though I bet the inner close/open isn't needed. I think the deadlock is limited to cases where get_user_pages will get stuck

Re: [RFC] Heads up on a series of AIO patchsets

2007-01-02 Thread Zach Brown
). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with modifications from Jeff Moyer and me) addresses this problem for native linux aio in a simple manner. It's simple looking, sure. This current flipping didn't even occur to me while throwing the patch together

Re: [PATCH] configfs, a filesystem for userspace-driven kernel object configuration

2005-04-05 Thread Zach Brown
Arjan van de Ven wrote: On Sun, 2005-04-03 at 12:57 -0700, Joel Becker wrote: Folks, I humbly submit configfs. With configfs, a configfs config_item is created via an explicit userspace operation: mkdir(2). It is destroyed via rmdir(2). The attributes appear at mkdir(2) time, and can be

Re: Address space operations questions

2005-03-31 Thread Zach Brown
Bryan Henderson wrote: So, semantics of ->sync_page() are roughly kick underlying storage driver to actually perform all IO queued for this page, and, maybe, for other pages on this device too. I prefer to think of it in a more modular sense. To preserve modularity, the caller of

Re: Efficient handling of sparse files

2005-02-28 Thread Zach Brown
I was wondering if we could introduce a new system call (or ioctl?) that, given an fd would find the next block with data in it. We could use the ->bmap method ... except that has dire warnings about adding new callers and viro may soon be in testicle-gouging range. Hmm. What you're talking

Re: Efficient handling of sparse files

2005-02-28 Thread Zach Brown
Please keep one thing in mind and that is that there are file systems where ->bmap actually makes no sense whatsoever Of course, so return -ESORRY. This is one of the reasons why no one should be using ->bmap. It is a stupid interface that only fits very particular sets of file systems and