Erez Zadok wrote:
Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree.
Kernel w/ SMP, preemption, and lockdep configured.
This is a real lock ordering problem. Thanks for reporting it.
The updating of atime inside sys_mmap() orders the mmap_sem in the vfs
outside of the
If anyone has a testcase - I can take a look at the problem again.
I can try and throw something together..
- z
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at
in
invalidate_inode_pages2_range(). Later do_launder_page() calls could overwrite
errors generated by earlier calls. Fix this by storing do_launder_page in a
temporary variable which is only promoted to the function's return code if it
hadn't already generated an error.
Signed-off-by: Zach Brown
Hisashi Hifumi wrote:
Hi.
Current dio has some problems:
1, In ext3 ordered, dio write can return with EIO because of the race
between invalidation of
a page and jbd. jbd pins the bhs while committing journal so
try_to_release_page fails when jbd
is committing the transaction.
Yeah. It
The patch given below replaces the goto-loop by a while-based one.
That certainly looks fine. I would also replace the 'return' with
'break', but I guess that's more of a question of personal preference.
Besides, it removes the export for the same routine, because there are
no users for it
This doesn't look fine. Did you test this?
Oops, my fault. Of course, I tested the patch, but kernel modules are
disabled in my test setup, so I missed the error.
:)
Enclosed to this message is a new patch, which replaces the goto-loop by
the while-based one, but leaves the
This is what it might look like to feed pgol in to some part of the fs stack
instead of iovecs. I imagine we'd want to do it at a much higher level, perhaps
something like vfs_write_pages().
---
fs/direct-io.c | 21 +
1 files changed, 21 insertions(+), 0 deletions(-)
diff
At the FS meeting at LCE there was some talk of doing O_DIRECT writes from the
kernel with pages instead of with iovecs. This patch series explores one
direction we could head in to achieve this.
We obviously can't just translate user iovecs (which might represent more
memory than the machine
This adds a structure and interface to represent the segments of memory
which are acting as the source or destination for a read or write operation.
Callers would fill this structure and then pass it down the rw path.
The intent is to let stages in the rw path make specific calls against this
This switches dio to work with the rwmem api to get memory pages for the IO
instead of working with iovecs directly.
It can use direct rwm struct accesses for some static universal properties of a
set of memory segments that make up the buffer argument.
It uses helper functions to work with the
The second use case is to look at the physical layout of blocks on disk
for a specific file, use Mark Lord's write_long patches to inject a disk
error and then read that file to make sure that we are handling disk IO
errors correctly. A bit obscure, but really quite useful.
Hmm, yeah,
But, we shouldn't inflict all of this on fibmap/fiemapwe'll get
lost trying to make the one true interface for all operations.
For grouping operations on files, I think a read_tree syscall with
hints for what userland will do (read, stat, delete, list
filenames), and a better cookie
Can you clarify what you mean above with an example? I don't really
follow.
Sure, take 'tar' as an example. It'll read files in the order that
their names are returned from directory listing. This can produce bad
IO patterns because the order in which the file names are returned
doesn't
+ * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type
+ * @ptr: The pointer to cast.
+ *
+ * Explicitly cast an error-valued pointer to another pointer type in such a
+ * way as to make it clear that's what's going on.
+ */
+static inline void *ERR_CAST(const void
Roland Dreier wrote:
+static inline void *ERR_CAST(const void *ptr)
+{
+return (void *) ptr;
+}
Just to nit, surely you don't need the cast inside the function. The
casting happens at the call site between the argument and returned pointer.
The way it's
If you're soliciting opinions, I think I tend to prefer the feel of the
code paths after the changes. I don't know the benefits of the change
are worth the risk in unmaintained file systems, though.
+ return ERR_PTR(PTR_ERR(inode));
This caught my eye. Surely we can do better :).
return ERR_PTR(PTR_ERR(inode));
I tend to prefer the latter.
It seems like a pretty noisy way to get a (void *) cast :/. Maybe a
function that has the cast but makes sure it's only used for IS_ERR()
pointers?
/* haha, continuing the fine tradition of terrible names in this
I fear the consequences of this change :(
I love it. In the past I've lost time by working with patches which
didn't quite realize that ext3 holds a transaction open during
-direct_IO.
Oh well, please keep it alive, maybe beat on it a bit, resend it
later on?
I can test the patch to make
the BUG_ON(). But unfortunately, our perf. team is able reproduce the
problem.
What are they doing to reproduce it? How much setup does it take?
Debug indicated that, the ret2 == 1 :(
That could be consistent with the theory that we're racing with the
dio struct being freed and reused
- repair driven design, we know what it is (Val told us), but
how does it apply to the things we are currently working on?
should we do more of it?
I'm sure Chris and I could talk about the design elements in btrfs
that should aid repair if folks are interested in hearing about
them.
the
lock if the final reference was just dropped. Another CPU might free
the dio in bio completion and reuse the memory after this path drops the
dio lock but before the BUG_ON() is evaluated.
This patch passed aio+dio regression unit tests and aio-stress on ext3.
Signed-off-by: Zach Brown [EMAIL
On Jun 27, 2007, at 8:01 PM, Badari Pulavarty wrote:
Hi Zach,
One of our perf. team ran into this while doing some runs.
I didn't see anything obvious - it looks like we converted
async IO to synchronous one. I didn't spend much time digging
around.
It looks pretty bad, a *shouldn't happen*
FWIW, I believe Andrew's point was that critical information for Joe
Enduser (and Joe Patch-Ho) was lacking in the original changelog.
and don't forget Joe eCryptfs-Maintainer-2-Years-In-The-Future.
- z
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a
+static void dio_unlock_page_range(struct dio *dio)
+{
+ if (dio-lock_type != DIO_NO_LOCKING) {
+ remove_placeholder_pages(dio-inode-i_mapping,
+dio-fspages_start_off,
+dio-fspages_end_off);
+
The test case Linus sent me boils down to this:
fd = open(file)
buffer = mmap(fd, 128 pages);
close(fd);
fd = open(file, O_DIRECT);
write(fd, buffer, 66 pages);
Yeah, though I bet the inner close/open isn't needed.
I think the deadlock is limited to cases where get_user_pages will get
stuck
). The IO_CMD_EPOLL_WAIT patch (originally from Zach
Brown with
modifications from Jeff Moyer and me) addresses this problem
for native
linux aio in a simple manner.
It's simple looking, sure. This current flipping didn't even occur
to me while throwing the patch together
Arjan van de Ven wrote:
On Sun, 2005-04-03 at 12:57 -0700, Joel Becker wrote:
Folks,
I humbly submit configfs. With configfs, a configfs
config_item is created via an explicit userspace operation: mkdir(2).
It is destroyed via rmdir(2). The attributes appear at mkdir(2) time,
and can be
Bryan Henderson wrote:
So, semantics of -sync_page() are roughly kick underlying storage
driver to actually perform all IO queued for this page, and, maybe, for
other pages on this device too.
I prefer to think of it in a more modular sense. To preserve modularity,
the caller of
I was wondering if we could introduce a new system call (or ioctl?) that,
given an fd would find the next block with data in it. We could use the
-bmap method ... except that has dire warnings about adding new callers
and viro may soon be in testicle-gouging range.
Hmm. What you're talking
Please keep one thing in mind and that is that there are file systems
where -bmap actually makes no sense whatsoever
Of course, so return -ESORRY.
This is one of the reasons why noone should be using -bmap. It is a
stupid interface that only fits very particular sets of file systems and
30 matches
Mail list logo