Re: GFS

2005-08-08 Thread Zach Brown
Pekka J Enberg wrote: Sorry if this is an obvious question but what prevents another thread from doing mmap() before we do the second walk and messing up num_gh? Nothing, I suspect. OCFS2 has a problem like this, too. It wants a way for a file system to serialize mmap/munmap/mremap during

Re: GFS

2005-08-09 Thread Zach Brown
Pekka Enberg wrote: In addition, the vma walk will become an unmaintainable mess as soon as someone introduces another mmap() capable fs that needs similar locking. Yup, I suspect that if the core kernel ends up caring about this problem then the VFS will be involved in helping file systems

Re: Vectored AIO breakage for sockets and pipes ?

2007-01-18 Thread Zach Brown
I'm not sure what the best way to fix this is. One option is to always make a copy of the iovec and pass that down. Any other thoughts ? Can we use this as another motivation to introduce an iovec container struct instead of passing a raw iov/seg? The transition could turn hand-rolled
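The first option raised here is to always copy the iovec before passing it down. A minimal user-space sketch of why that helps follows. The lower layer advances its private copy after a partial transfer, and the caller's array stays intact. iov_dup() and iov_advance() are hypothetical helpers written for illustration; they are not kernel functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical: advance an iovec array in place by 'bytes', the way a
 * lower layer might after a partial transfer. Fully consumed segments
 * are dropped from the front; *nr_segs is updated to match. */
static void iov_advance(struct iovec *iov, size_t *nr_segs, size_t bytes)
{
    size_t i = 0;

    while (i < *nr_segs && bytes >= iov[i].iov_len) {
        bytes -= iov[i].iov_len;
        i++;
    }
    if (i < *nr_segs) {
        iov[i].iov_base = (char *)iov[i].iov_base + bytes;
        iov[i].iov_len -= bytes;
    }
    memmove(iov, iov + i, (*nr_segs - i) * sizeof(*iov));
    *nr_segs -= i;
}

/* Hypothetical: duplicate the caller's array so the callee can consume
 * its own copy without clobbering the caller's view of the request. */
static struct iovec *iov_dup(const struct iovec *iov, size_t nr_segs)
{
    struct iovec *copy = malloc(nr_segs * sizeof(*copy));

    if (copy)
        memcpy(copy, iov, nr_segs * sizeof(*copy));
    return copy;
}
```

The cost is an allocation and copy per request. The container-struct alternative suggested in the same message avoids that cost by tracking progress separately from the array.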

Re: [patch] optimize o_direct on block device - v3

2006-12-07 Thread Zach Brown
(my monkey test code is on http://kernel-perf.sourceforge.net/diotest). Nice. Do you have any interest in working with the autotest (http://test.kernel.org/autotest) guys to get your tests into their rotation? - z

Re: [PATCH -mm 1/5][AIO] - Rework compat_sys_io_submit

2006-11-29 Thread Zach Brown
On Nov 29, 2006, at 2:32 AM, Sébastien Dugué wrote: compat_sys_io_submit() cleanup Cleanup compat_sys_io_submit by duplicating some of the native syscall logic in the compat layer and directly calling io_submit_one() instead of fooling the syscall into thinking it is

Re: [PATCH -mm 1/5][AIO] - Rework compat_sys_io_submit

2006-11-30 Thread Zach Brown
sys_io_getevents() reads: uh! ^you must be meaning sys_io_submit()? Heh, yes, of course. Damn these fingers! - z

Re: [rfc patch] optimize o_direct on block device

2006-11-30 Thread Zach Brown
At that time, a patch was written for raw device to demonstrate that large performance head room is achievable (at ~20% speedup for micro-benchmark and ~2% for db transaction processing benchmark) with a tight I/O submission processing loop. Where exactly does the benefit come from? icache

Re: [rfc patch] optimize o_direct on block device

2006-12-01 Thread Zach Brown
On Nov 30, 2006, at 10:16 PM, Chen, Kenneth W wrote: Zach Brown wrote on Thursday, November 30, 2006 1:45 PM At that time, a patch was written for raw device to demonstrate that large performance head room is achievable (at ~20% speedup for micro-benchmark and ~2% for db transaction

Re: [patch] remove redundant iov segment check

2006-12-04 Thread Zach Brown
On Dec 4, 2006, at 8:26 AM, Chen, Kenneth W wrote: The access_ok() and negative length check on each iov segment in function generic_file_aio_read/write are redundant. They are all already checked before calling down to these low level generic functions. ... So it's not possible to

Re: [patch] remove redundant iov segment check

2006-12-04 Thread Zach Brown
Maybe we should create another internal generic_file_aio_read/write for in-core function? fs/read_write.c and fs/aio.c are not module-able and the check is already there. For external module, we can do the check and then calls down to the internal one. Maybe. I'd rather see fewer moving

Re: [patch] kill pointless ki_nbytes assignment in aio_setup_single_vector

2006-12-04 Thread Zach Brown
[EMAIL PROTECTED] That seems to be the case, indeed. Acked-by: Zach Brown [EMAIL PROTECTED] - z

Re: [RFC][PATCH] Make io_submit non-blocking

2012-07-26 Thread Zach Brown
On Fri, Jul 27, 2012 at 01:22:10AM +0530, Ankit Jain wrote: I should probably be doing better tests, any suggestions on what or how I can test? Well, is the test actually *doing* anything with these IOs? Calling io_submit() and then immediately waiting for completion is the best case for

Re: [RFC] VFS: File System Mount Wide O_DIRECT Support

2012-09-04 Thread Zach Brown
The idea is simple, leave the decision for the file system user to enable file system mount wide O_DIRECT support with a new mount option, for example, I believe a better approach to your problem is actually to enable loopback device driver to use direct IO. Someone was actually

Re: [rfc] direct IO submission and completion scalability issues

2008-02-04 Thread Zach Brown
[ ugh, still jet lagged. ] Hi Nick, When Matthew was describing this work at an LCA presentation (not sure whether you were at that presentation or not), Zach came up with the idea that allowing the submitting application to control the CPU that the io completion processing was occurring

Re: [PATCH 4 of 4] Introduce aio system call submission and completion system calls

2007-02-01 Thread Zach Brown
Do you have any userspace code that can be used to get started experimenting with your fibril based AIO stuff? I only have a goofy little test app so far: http://www.zabbo.net/~zab/aio-walk-tree.c It's not to be taken too seriously :) I want to try it on from a userspace

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-01 Thread Zach Brown
let me clarify this: i very much like your AIO patchset in general, in the sense that it 'completes' the AIO implementation: finally everything can be done via it, greatly increasing its utility and hopefully its penetration. This is the most important step, by far. We violently agree on

Re: [PATCH 4 of 4] Introduce aio system call submission and completion system calls

2007-02-01 Thread Zach Brown
Wooo ...hold on ... I think this is swinging out of perspective :) I'm sorry, but I don't. I think using the EIOCBRETRY method in complicated code paths requires too much maintenance cost to justify its benefits. We can agree to disagree on that judgement :). - z

Re: [patch] aio: add per task aio wait event condition

2007-02-01 Thread Zach Brown
That sounds like a programming error, don't you think? Maybe returning EINVAL is the right approach? Maybe. I think I'd prefer to be permissive and queue as much as possible, but it's not a strong preference. Returning EINVAL seems ok, too. - z

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-01 Thread Zach Brown
Priorities cannot be shared, as they have to adapt to the per-request priority when we get down to the nitty gritty of POSIX AIO, as otherwise realtime issues like keepalive transmits will be handled incorrectly. Well, maybe not *blind* sharing. But something more than the disconnect

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
Other questions really relate to the scheduling - Zach do you intend schedule_fibrils() to be a call code would make or just from schedule() ? I'd much rather keep the current sleeping API in as much as is possible. So, yeah, if we can get schedule() to notice and behave accordingly I'd

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
ok, i think i noticed another misunderstanding. The kernel thread based scheme i'm suggesting would /not/ 'switch' to another kernel thread in the cached case, by default. It would just execute in the original context (as if it were a synchronous syscall), and the switch to a kernel thread from

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
Since I still think that the many-thousands potential async operations coming from network sockets are better handled with a classical event mechanism [1], and since smooth integration of new async syscall into the standard POSIX infrastructure is IMO a huge win, I think we need to have a

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
But really, being a scheduler guy i was much more concerned about the duplication and problems caused by the fibril concept itself - which duplication and complexity makes up 80% of Zach's submitted patchset. For example this bit: [PATCH 3 of 4] Teach paths to wake a specific void * target

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
+ current->per_call = next->per_call; Pointer instead of structure copy? Sure, there are lots of trade-offs there, but the story changes if we keep the 1:1 relationship between task_struct and thread_info. - z

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
Or we need some sort of enter_context()/leave_context() (adopt mm, files, ...) to have a per-CPU kthread to be able to execute the syscall from the async() caller context. I believe that's what Ingo is hoping for, yes. - z

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
The result of one async operation is basically a cookie and a result code. Eight or sixteen bytes at most. s/basically/minimally/ Well, yeah. The patches I sent had: struct asys_completion { long return_code; unsigned long cookie; }; That's as stupid as it gets.
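The struct quoted above is two longs. That is why the size is "eight or sixteen bytes": 8 on 32-bit and 16 on 64-bit ABIs. Below is a minimal sketch of how such events might be posted to and reaped from a ring. The ring itself (completion_ring, ring_post(), ring_reap(), RING_SIZE) is hypothetical and is not the layout used by the posted patches.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as quoted in the post. */
struct asys_completion {
    long return_code;
    unsigned long cookie;   /* caller-chosen, identifies the request */
};

/* Hypothetical single-producer/single-consumer ring, for illustration. */
#define RING_SIZE 8u        /* power of two, so free-running indices wrap cleanly */

struct completion_ring {
    struct asys_completion ev[RING_SIZE];
    unsigned head;          /* next slot the consumer reads */
    unsigned tail;          /* next slot the producer writes */
};

static int ring_post(struct completion_ring *r, long ret, unsigned long cookie)
{
    if (r->tail - r->head == RING_SIZE)
        return -1;          /* ring full */
    r->ev[r->tail % RING_SIZE] = (struct asys_completion){ ret, cookie };
    r->tail++;
    return 0;
}

static int ring_reap(struct completion_ring *r, struct asys_completion *out)
{
    if (r->head == r->tail)
        return 0;           /* nothing completed */
    *out = r->ev[r->head % RING_SIZE];
    r->head++;
    return 1;
}
```

The cookie is the only link back to the original request, so userspace typically stores a pointer or table index in it.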

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
No, that's *really* it ;) For syscalls, sure. The kevent work incorporates Uli's desire to have more data per event. Have you read his OLS stuff? It's been a while since I did so I've lost the details of why he cares to have more. Let me say it again, maybe a little louder this time:

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
- we'd need to do it in the kernel (which is actually nasty, since different system calls have slightly different semantics - some don't return any error value at all, and negative numbers are real numbers) - we'd have to teach user space about the negative errno mechanism, in
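For context, Linux system calls report failure by returning a value in the range -4095..-1 (MAX_ERRNO is 4095 in include/linux/err.h). A minimal user-space sketch of decoding a raw return_code under that convention follows. decode_ret() and SKETCH_MAX_ERRNO are hypothetical names. The sketch also shows the ambiguity the quote points at: only the convention separates an error from a call whose genuine result happens to be negative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrors the kernel's MAX_ERRNO window; the name is hypothetical. */
#define SKETCH_MAX_ERRNO 4095L

/* Hypothetical decoder: returns true and sets *value on success, or
 * returns false and sets *err to a positive errno on failure. A genuine
 * result in -4095..-1 would be misread as an error. */
static bool decode_ret(long raw, long *value, int *err)
{
    if (raw < 0 && raw >= -SKETCH_MAX_ERRNO) {
        *err = (int)-raw;
        return false;
    }
    *value = raw;
    return true;
}
```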

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-05 Thread Zach Brown
It has me excited in any case. Once anything even remotely testable appears (Zach tells me not to try the current code), I'll work it into MTasker (http://ds9a.nl/mtasker) and make it power a nameserver that does async i/o, for use with very very large zones that aren't preloaded. I'll be

Re: [PATCH 2 of 4] Introduce i386 fibril scheduling

2007-02-06 Thread Zach Brown
That's not how the patches work right now, but yes, I at least personally think that it's something we should aim for (ie the interface shouldn't _require_ us to always wait for things even if perhaps an early implementation might make everything be delayed at first) I agree that we

Re: [PATCH] aio: fix kernel bug when page is temporally busy

2007-02-09 Thread Zach Brown
On Feb 9, 2007, at 6:05 AM, Suparna Bhattacharya wrote: On Fri, Feb 09, 2007 at 11:40:27AM +0100, Jiri Kosina wrote: On Fri, 9 Feb 2007, Andrew Morton wrote: @@ -1204,7 +1204,7 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,

Re: [patch 1/3] fs: add an iovec iterator

2007-02-09 Thread Zach Brown
What I have there is not actually a full-blown file io descriptor, because there is no file or offset. It is just an iovec iterator (so maybe I should rename it to iov_iter, rather than iodesc). I think it might be a nice idea to keep this iov_iter as a standalone structure, and it could be
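A standalone iovec iterator can leave the caller's array untouched and record progress as (current segment, offset within it, bytes remaining). Below is a minimal user-space sketch of that idea. The field names resemble those later used by the kernel's iov_iter, but struct sketch_iov_iter, iter_init() and iter_advance() are illustrative assumptions, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Illustrative iterator: the iovec array is never modified. */
struct sketch_iov_iter {
    const struct iovec *iov;  /* current segment */
    unsigned long nr_segs;    /* segments left, including the current one */
    size_t iov_offset;        /* bytes already consumed in the current segment */
    size_t count;             /* total bytes remaining */
};

static void iter_init(struct sketch_iov_iter *i, const struct iovec *iov,
                      unsigned long nr_segs)
{
    size_t total = 0;

    for (unsigned long s = 0; s < nr_segs; s++)
        total += iov[s].iov_len;
    *i = (struct sketch_iov_iter){ iov, nr_segs, 0, total };
}

/* Consume 'bytes', stepping across segment boundaries as needed. */
static void iter_advance(struct sketch_iov_iter *i, size_t bytes)
{
    if (bytes > i->count)
        bytes = i->count;
    i->count -= bytes;
    while (bytes) {
        size_t left = i->iov->iov_len - i->iov_offset;

        if (bytes < left) {
            i->iov_offset += bytes;
            return;
        }
        bytes -= left;        /* finish this segment, move to the next */
        i->iov++;
        i->nr_segs--;
        i->iov_offset = 0;
    }
}
```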

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
So, reiserfs and NFS are nesting i_mutex inside the mmap_sem. [b038c6e5] mutex_lock+0x1c/0x1f [b01b17e9] reiserfs_file_release+0x54/0x447 [b016afe7] __fput+0x53/0x101 [b016b0ee] fput+0x19/0x1c [b015bcd5] remove_vma+0x3b/0x4d [b015c659]

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
So reiser and NFS need to be fixed. No? Actually, it is rather mmap() needs to be fixed. Sure, I'm willing to have that demonstrated. My point was that DIO getting the mmap_sem inside i_mutex is currently correct. reiserfs, though, seems to be out on a more precarious limb ;). - z

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
won't pack. There are already a host of conditions under which it won't pack. Totally untested, but built. Signed-off-by: Zach Brown [EMAIL PROTECTED] diff --git a/fs/reiserfs/file.c b/fs/reiserfs/file.c index a804903..40085f1 100644 --- a/fs/reiserfs/file.c +++ b/fs/reiserfs/file.c @@ -46,7

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
Ugh, I thought the preallocation was getting freed elsewhere, but it looks like I was wrong. We can't just skip the i_mutex after all, sorry. Ah, so none of those tests at the top will stop tail packing if there's been pre-allocation? Like, uh, the inode reference count test? - z

[PATCH] fuse: verify all ioctl retry iov elements

2012-07-24 Thread Zach Brown
cc:ing stable because the initial commit did as well. Signed-off-by: Zach Brown z...@redhat.com CC: sta...@kernel.org [2.6.37+] --- fs/fuse/file.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index b321a68..514f12a 100644

Re: [RFC][PATCH] Make io_submit non-blocking

2012-07-24 Thread Zach Brown
On Tue, Jul 24, 2012 at 05:11:05PM +0530, Ankit Jain wrote: Currently, io_submit tries to execute the io requests on the same thread, which could block because of various reasons (eg. allocation of disk blocks). So, essentially, io_submit ends up being a blocking call. Yup, sadly that's how

Re: [RFC][PATCH] Make io_submit non-blocking

2012-07-24 Thread Zach Brown
And most importantly block devices, as they are one of the biggest use cases of AIO. With an almost no-op get_blocks callback I can't see how this change would provide any gain there. Historically we'd often see submission stuck waiting for requests. Tasks often try to submit way more aio

Re: [PATCH 00/25] AIO performance improvements/cleanups

2012-11-28 Thread Zach Brown
On Wed, Nov 28, 2012 at 08:43:24AM -0800, Kent Overstreet wrote: Bunch of performance improvements and cleanups Zach Brown and I have been working on. The code should be pretty solid at this point, though it could of course use more review and testing. Thanks for sending these out. I have

Re: [PATCH 07/25] aio: kiocb_cancel()

2012-11-28 Thread Zach Brown
On Wed, Nov 28, 2012 at 08:43:31AM -0800, Kent Overstreet wrote: Minor refactoring, to get rid of some duplicated code A minor nit: spin_lock_irq(&ctx->ctx_lock); - ret = -EAGAIN; + kiocb = lookup_kiocb(ctx, iocb, key); - if (kiocb && kiocb->ki_cancel) { -

Re: [PATCH 12/25] aio: Refcounting cleanup

2012-11-28 Thread Zach Brown
struct kioctx { atomic_t users; - int dead; + atomic_t dead; Do we want to be paranoid and atomic_set() that to 0 when the ioctx is allocated? + while (!list_empty(&ctx->active_reqs)) { + struct list_head *pos
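Here is a minimal user-space analogue of the question, using C11 atomics rather than the kernel's atomic_t. It rests on one assumption: that the dead flag exists so teardown runs only once. Under that assumption, initialise the flag explicitly at allocation, then let an atomic exchange elect the single caller that performs the teardown. struct sketch_ctx, ctx_init() and ctx_kill() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the two kioctx fields discussed; not the kernel struct. */
struct sketch_ctx {
    atomic_int users;
    atomic_int dead;
};

static void ctx_init(struct sketch_ctx *ctx)
{
    /* Explicit initialisation rather than relying on zeroed memory. */
    atomic_init(&ctx->users, 1);
    atomic_init(&ctx->dead, 0);
}

/* Returns true only for the first caller, which should do the teardown. */
static bool ctx_kill(struct sketch_ctx *ctx)
{
    return atomic_exchange(&ctx->dead, 1) == 0;
}
```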

Re: [PATCH 13/25] aio: Convert read_events() to hrtimers

2012-11-28 Thread Zach Brown
- int i = 0; + DEFINE_WAIT(wait); + struct hrtimer_sleeper t; + size_t i = 0; Changing i to size_t is kind of surprising. Is that on purpose? - set_task_state(tsk, TASK_RUNNING); - remove_wait_queue(&ctx->wait, &wait); -

Re: [PATCH 14/25] aio: Make aio_read_evt() more efficient

2012-11-28 Thread Zach Brown
We can't use cmpxchg() on the ring buffer's head pointer directly, since it's modded to nr_events and would be susceptible to ABA. So instead we maintain a shadow head that uses the full 32 bits, and cmpxchg() that and then updated the real head pointer. Time to update this comment to reflect
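The comment describes a classic ABA hazard. A head index kept modulo nr_events returns to the same value after exactly nr_events consumptions, so a delayed cmpxchg() can succeed against a state it never saw. Below is a minimal user-space sketch of the shadow-counter fix with C11 atomics. struct sketch_ring and ring_read_evt() are hypothetical. NR_EVENTS is a power of two here so that the free-running 32-bit counter stays consistent with the modulo index when it wraps. The sketch also ignores the ordering of the published head between racing consumers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_EVENTS 8u

struct sketch_ring {
    int events[NR_EVENTS];
    _Atomic uint32_t shadow_head; /* full 32-bit counter, never reduced */
    _Atomic uint32_t head;        /* "real" head, reduced mod NR_EVENTS */
    _Atomic uint32_t tail;        /* full 32-bit producer counter */
};

/* Claim one event by compare-and-swapping the shadow counter, then
 * publish the reduced index. A stale reader would need the counter to
 * wrap all 2^32 values before its cmpxchg could wrongly succeed. */
static bool ring_read_evt(struct sketch_ring *r, int *out)
{
    uint32_t old = atomic_load(&r->shadow_head);
    int ev;

    do {
        if (old == atomic_load(&r->tail))
            return false;                   /* ring empty */
        ev = r->events[old % NR_EVENTS];
    } while (!atomic_compare_exchange_weak(&r->shadow_head, &old, old + 1));

    atomic_store(&r->head, (old + 1) % NR_EVENTS);
    *out = ev;
    return true;
}
```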

Re: [RFC, PATCH] Extensible AIO interface

2012-10-01 Thread Zach Brown
On Mon, Oct 01, 2012 at 03:23:41PM -0700, Kent Overstreet wrote: So, I and other people keep running into things where we really need to add an interface to pass some auxiliary... stuff along with a pread() or pwrite(). Sure. Martin (cc:ed) will sympathize. A few examples: * IO scheduler

Re: [RFC, PATCH] Extensible AIO interface

2012-10-01 Thread Zach Brown
Not just per sector, Per hardware sector. For passing around checksums userspace would have to find out the hardware sector size and checksum type/size via a different interface, and then the attribute would contain a pointer to a buffer that can hold the appropriate number of checksums. All
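The buffer-sizing step described here is simple arithmetic: one checksum per hardware sector covered by the I/O. A small sketch follows, assuming the I/O length is a whole number of sectors and that userspace has already learned the sector size and checksum size elsewhere. csum_buf_size() is hypothetical. For example, a 1 MiB write with 4096-byte sectors and 4-byte checksums needs 256 checksums, which is 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: bytes needed for the checksum buffer an attribute would
 * point at. Returns 0 if the I/O is not a whole number of sectors. */
static size_t csum_buf_size(size_t io_len, size_t hw_sector_size,
                            size_t csum_size)
{
    if (hw_sector_size == 0 || io_len % hw_sector_size != 0)
        return 0;
    return (io_len / hw_sector_size) * csum_size;
}
```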

Re: [RFC, PATCH] Extensible AIO interface

2012-10-02 Thread Zach Brown
The generic code wouldn't know about any user pointers inside attributes, so it'd have to be downstream consumers. Hopefully there won't be many attributes with user pointers in them (I don't expect there to be), so we won't have too much of this messiness. I really don't like this. We

Re: [PATCH] mm/slab: add a leak decoder callback

2013-01-15 Thread Zach Brown
The merge processing occurs during kmem_cache_create and you are setting up the decoder field afterwards! Won't work. In the thread I suggested providing the callback at destruction: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg21130.html I liked that it limits accessibility of

Re: [PATCH] Syslets - Fix cachemiss_thread return value

2007-06-08 Thread Zach Brown
I don't like it :-) For a fundamental reason or because it happens to not work yet? :) - z

Re: [PATCH] Syslets - Fix cachemiss_thread return value

2007-06-08 Thread Zach Brown
The latter - it fails the test that I posted. OK, good. That's easy enough to fix :) I'll send out a tested version. - z

Re: TUX2 filesystem

2007-06-21 Thread Zach Brown
Second, Oracle is now working on Btrfs (if ever a FS needed a better name... is that pronounced ButterFS?). (In our silliest moments, yes. Absolutely.) - z

Re: [RFC 2/4] CONFIG_STABLE: Switch off kmalloc(0) tests in slab allocators

2007-05-31 Thread Zach Brown
+#ifndef CONFIG_STABLE /* * We should return 0 if size == 0 (which would result in the * kmalloc caller to get NULL) but we use the smallest object @@ -81,6 +82,7 @@ static inline int kmalloc_index(size_t s * we can discover locations where we do 0 sized

Re: [PATCH] Syslets - Fix cachemiss_thread return value

2007-05-31 Thread Zach Brown
cachemiss_thread should explicitly return 0 or error instead of task_ret_reg(current) (which is -ENOSYS anyway) because async_thread_helper is careful to put the return value in eax anyway. Can you explain what motivated you to send out this patch? It used to return 0. It was changed

Re: [PATCH 1/3] syslet demos - add more includes

2007-05-31 Thread Zach Brown
Add a bunch of includes to sys.h and syslet.h to kill off compilation warnings. This, and the patches which add tests, all look great to me. Ingo, are you patching up your tests or do you want me to take care of these? - z

Re: [PATCH] Syslets - Fix cachemiss_thread return value

2007-05-31 Thread Zach Brown
the demos I sent out. Dunno about the existing ones, but I bet they do the same. Hmm, they didn't when I ran them, but I'll give yours a try and take a closer look. Thanks for taking the time to bring it up. - z

Re: Syslets, signals, and security

2007-06-04 Thread Zach Brown
On Mon, Jun 04, 2007 at 12:31:45PM -0400, Jeff Dike wrote: Syslets seem like a fundamentally good idea to me, but the current implementation, using CLONE_THREAD threads, seems like a basic problem. It has remaining problems that need to be addressed, yes. First, there are signals. If the

Re: [PATCH] Syslets - Fix cachemiss_thread return value

2007-06-07 Thread Zach Brown
and then return it. __exec_atom() sets task_ret_reg() to NULL if there's a chance that it will block while executing the syscall in the atom. Signed-off-by: Zach Brown [EMAIL PROTECTED] diff -r f0d8ee165e2e kernel/async.c --- a/kernel/async.c Thu Jun 07 14:32:31 2007 -0700 +++ b/kernel/async.c Thu

Re: vm/fs meetup in september?

2007-06-25 Thread Zach Brown
I'd just like to take the chance also to ask about a VM/FS meetup some time around kernel summit (maybe take a bit of time during UKUUG or so). Yeah, I'd be interested. More issues: - chris mason's patches to normalize buffered and direct locking - z

[PATCH] dio: remove bogus refcounting BUG_ON

2007-07-03 Thread Zach Brown
the lock if the final reference was just dropped. Another CPU might free the dio in bio completion and reuse the memory after this path drops the dio lock but before the BUG_ON() is evaluated. This patch passed aio+dio regression unit tests and aio-stress on ext3. Signed-off-by: Zach Brown [EMAIL
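The race described here is general: once the lock is dropped, the object may already have been freed by whoever dropped the final reference. Any check must therefore use a value sampled while the lock was still held. Below is a minimal user-space sketch of the safe pattern with pthreads. struct sketch_dio and dio_put() are hypothetical stand-ins, not the fs/direct-io.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dio and its lock-protected refcount. */
struct sketch_dio {
    pthread_mutex_t lock;
    int refcount;
};

/* Drop one reference. 'remaining' is sampled under the lock; after the
 * unlock only the local copy is used, because another thread may free
 * 'd' the moment the lock is released. Reading d->refcount here instead
 * would be the use-after-free the patch describes. */
static int dio_put(struct sketch_dio *d)
{
    pthread_mutex_lock(&d->lock);
    int remaining = --d->refcount;
    pthread_mutex_unlock(&d->lock);

    if (remaining == 0) {           /* we dropped the last reference */
        pthread_mutex_destroy(&d->lock);
        free(d);
    }
    return remaining;
}
```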

Re: [PATCH] dio: remove bogus refcounting BUG_ON

2007-07-05 Thread Zach Brown
the BUG_ON(). But unfortunately, our perf. team is able to reproduce the problem. What are they doing to reproduce it? How much setup does it take? Debug indicated that, the ret2 == 1 :( That could be consistent with the theory that we're racing with the dio struct being freed and reused

Re: [PATCH] eCryptfs: Delay writing 0's after llseek until write

2007-05-22 Thread Zach Brown
FWIW, I believe Andrew's point was that critical information for Joe Enduser (and Joe Patch-Ho) was lacking in the original changelog. and don't forget Joe eCryptfs-Maintainer-2-Years-In-The-Future. - z

Syslets, Threadlets, generic AIO support, v6

2007-05-29 Thread Zach Brown
I'm pleased to announce the availability of version 6 of the syslet subsystem. Ingo and I agreed that I'll handle syslet releases while he's busy with CFS. I copied the cc: list from Ingo's v5 announcement. If you'd like to be dropped (or added), please let me know. The v6 patch series against

Re: Syslets, Threadlets, generic AIO support, v6

2007-05-29 Thread Zach Brown
.. so don't keep us in suspense. Do you have any numbers for anything (like Oracle, to pick a random thing out of thin air ;) that might actually indicate whether this actually works or not? I haven't gotten to running Oracle's database against it. It is going to be Very Cranky if O_DIRECT

Re: Syslets, Threadlets, generic AIO support, v6

2007-05-29 Thread Zach Brown
You should pick up the kevent work :) I haven't looked at it in a while but yes, it's on the radar :). Having async request and response rings would be quite useful, and most closely match what is going on under the hood in the kernel and hardware. Yeah, but I have lots of competing

Re: Syslets, Threadlets, generic AIO support, v6

2007-05-30 Thread Zach Brown
Yeah, it'll confuse CFQ a lot actually. The threads either need to share an io context (clean approach, however will introduce locking for things that were previously lockless), or CFQ needs to get better support for cooperating processes. Do let me know if I can be of any help in this. For

Re: Syslets, Threadlets, generic AIO support, v6

2007-05-30 Thread Zach Brown
due to the added syscall. (Maybe we can just get that reserved upstream now?) Maybe, but we'd have to agree on the bare syslet interface that is being supported :). Personally, I'd like that to be the simplest thing that works for people and I'm not convinced that the current syslet-specific

Re: [PATCH 0/6] lock contention tracking -v4

2007-05-30 Thread Zach Brown
On Wed, May 30, 2007 at 02:49:03PM +0200, Peter Zijlstra wrote: Use the lockdep infrastructure to track lock contention and other lock statistics. I really like the sound of this. Has anyone given you an indication of when it might be merged? - z

Re: [EXT4 set 5][PATCH 1/1] expand inode i_extra_isize to support features in larger inode

2007-07-13 Thread Zach Brown
I fear the consequences of this change :( I love it. In the past I've lost time by working with patches which didn't quite realize that ext3 holds a transaction open during ->direct_IO. Oh well, please keep it alive, maybe beat on it a bit, resend it later on? I can test the patch to make

Re: [RFC/PATCH 0/2] ext4: Transparent Decompression Support

2013-07-25 Thread Zach Brown
What about introducing a new flag, O_COMPR which tells the kernel, btw, we want this file to be decompressed if it can be. It can fallback to O_RDONLY or something like that? That gets rid of the chattr ugliness. How is that different from chattr ugliness, which also comes down to a

Re: [PATCH] truncate: drop 'oldsize' truncate_pagecache() parameter

2013-07-29 Thread Zach Brown
@@ -50,7 +50,7 @@ static void adfs_write_failed(struct address_space *mapping, loff_t to) struct inode *inode = mapping->host; if (to > inode->i_size) - truncate_pagecache(inode, to, inode->i_size); + truncate_pagecache(inode, inode->i_size); } All these

Re: [PATCH-v2 0/9] target: Add support for EXTENDED_COPY (VAAI) offload emulation

2013-08-26 Thread Zach Brown
On Mon, Aug 26, 2013 at 10:02:59PM +, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@daterainc.com Hi folks, This -v2 series adds support to target-core for generic EXTENDED_COPY offload emulation as defined by SPC-4 using virtual (IBLOCK, FILEIO, RAMDISK) backends. Cool,

[PATCH 2/3] splice: add f_op->splice_direct

2013-09-11 Thread Zach Brown
if the caller wants to avoid unaccelerated copying, perhaps by setting behavioural flags. The SPLICE_F_DIRECT flag is arguably misused here to indicate both file-to-file direct splicing *and* acceleration. Signed-off-by: Zach Brown z...@redhat.com --- fs/bad_inode.c | 8 fs/splice.c

[RFC] extending splice for copy offloading

2013-09-11 Thread Zach Brown
When I first started on this stuff I followed the lead of previous work and added a new syscall for the copy operation: https://lkml.org/lkml/2013/5/14/618 Towards the end of that thread Eric Wong asked why we didn't just extend splice. I immediately replied with some dumb dismissive answer.

[PATCH 1/3] splice: add DIRECT flag for splicing between files

2013-09-11 Thread Zach Brown
lets the file system lock both for the duration of the copy, should it need to. If the method refuses to accelerate the copy, for whatever reason, we can naturally fall back to the generic direct splice method that sendfile uses today. Signed-off-by: Zach Brown z...@redhat.com --- fs/splice.c

[PATCH 3/3] btrfs: implement .splice_direct extent copying

2013-09-11 Thread Zach Brown
() already does elsewhere) is moved to a new much smaller btrfs_ioctl_clone(). btrfs_splice_direct() thus inherits the conservative limitations of the btrfs clone ioctl: it only allows block-aligned copies between files on the same snapshot. Signed-off-by: Zach Brown z...@redhat.com --- fs/btrfs

Re: linux-next: manual merge of the block tree with the tree

2013-11-08 Thread Zach Brown
That make sense? I can show you more concretely what I'm working on if you want. Or if I'm full of crap and this is useless for what you guys want I'm sure you'll let me know :) It sounds interesting, but also a little confusing at this point, at least from the non-block side of

Re: [RFC] extending splice for copy offloading

2013-10-01 Thread Zach Brown
- app calls splice(from, 0, to, 0, SIZE_MAX) 1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX) 1.a) fs reflinks the whole file in a jiffy and returns the size of the file 1.b) fs does copy offload of, say, 64MB and returns 64M 2) VFS does page copy of, say, 1MB and returns
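The call sequence sketched in this message can be modelled in user space as a single VFS step. The step tries the file system's hook, falls back to a bounded page-cache copy if the hook declines, and returns a short count that the caller loops on, as with splice(). Every name below is hypothetical: direct_splice_fn, the two example hooks, and the 1MB/64MB chunk sizes taken from the message's "say" figures. Case 1.a, a whole-file reflink, would simply be a hook that returns len.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_MB ((size_t)1 << 20)

/* Hypothetical fs hook: bytes handled, or -EOPNOTSUPP to decline. */
typedef ssize_t (*direct_splice_fn)(size_t off, size_t len);

/* Example fs that offloads at most 64MB per call (case 1.b). */
static ssize_t sketch_offload_64m(size_t off, size_t len)
{
    (void)off;
    return (ssize_t)(len < 64 * SKETCH_MB ? len : 64 * SKETCH_MB);
}

/* Example fs that refuses to accelerate anything. */
static ssize_t sketch_refuse(size_t off, size_t len)
{
    (void)off;
    (void)len;
    return -EOPNOTSUPP;
}

/* Generic fallback: page-cache copy of at most 1MB per call (step 2). */
static ssize_t sketch_page_copy(size_t off, size_t len)
{
    (void)off;
    return (ssize_t)(len < SKETCH_MB ? len : SKETCH_MB);
}

/* One VFS step: prefer the fs hook, fall back if it declines. */
static ssize_t sketch_splice_step(direct_splice_fn fs_op, size_t off, size_t len)
{
    if (fs_op) {
        ssize_t ret = fs_op(off, len);
        if (ret != -EOPNOTSUPP)
            return ret;
    }
    return sketch_page_copy(off, len);
}
```

Returning short counts keeps each call bounded, so progress and interruption stay with the caller rather than with one long-running call.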

Re: linux-next: manual merge of the vfs tree with the aio-direct tree

2013-09-18 Thread Zach Brown
As for aio-direct... Two questions: * had anybody tried to measure the effect on branch predictor from introducing that method vector? Commit d6afd4c4 (iov_iter: hide iovec details behind ops function pointers) FWIW, I never did. I only went that route to begin with because the few

Re: [PATCH v3 next/akpm] aio: convert the ioctx list to radix tree

2013-06-12 Thread Zach Brown
I've got an alternate approach for fixing this wart in lookup_ioctx()... Instead of using an rbtree, just use the reserved id in the ring buffer header to index an array pointing to the ioctx. It's not finished yet, and it needs to be tidied up, but is most of the way there. Yeah, that
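The array-indexed lookup proposed here replaces a tree search with a direct index. However, the id lives in a ring header that userspace can write, so the sketch below bounds-checks it and confirms that the entry really belongs to the caller's handle before trusting it. struct sketch_ioctx, struct sketch_table, MAX_CTX and lookup_ioctx() are hypothetical illustrations, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: user_id is the handle userspace passes in. */
struct sketch_ioctx {
    unsigned long user_id;
};

#define MAX_CTX 16u

struct sketch_table {
    struct sketch_ioctx *slot[MAX_CTX];
};

/* O(1) lookup. ring_hdr_id is untrusted, so bound it and verify that
 * the slot's context matches the caller's handle. */
static struct sketch_ioctx *lookup_ioctx(struct sketch_table *t,
                                         unsigned long user_id,
                                         unsigned ring_hdr_id)
{
    if (ring_hdr_id >= MAX_CTX)
        return NULL;

    struct sketch_ioctx *ctx = t->slot[ring_hdr_id];
    if (ctx && ctx->user_id == user_id)
        return ctx;
    return NULL;
}
```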

Re: libata maintainership change

2013-05-03 Thread Zach Brown
Time for new open source pastures outside the kernel, for me. Thanks for all your hard work over the years. Here's to good luck in the future! - z

Re: [RFC PATCH] vfs: add permute operation

2013-05-29 Thread Zach Brown
+static void sort_parents3(struct dentry **p) +void sort_parents(struct dentry **p, unsigned *nump) Yikes, that's a bunch of fiddly code. Is it *really* worth all that to avoid calling the generic sort helpers? AFAICS, I cannot make the compare function transitive, e.g.: A is

Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())

2013-05-17 Thread Zach Brown
I ended up working on this a bit today, and managed to cobble together something that somewhat works -- please see the patch below. Just some quick observations: + ctx->ctx_file = anon_inode_getfile("[aio]", &aio_ctx_fops, ctx, O_RDWR); + if (IS_ERR(ctx->ctx_file)) { +

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-21 Thread Zach Brown
On Tue, May 21, 2013 at 07:47:19PM +, Eric Wong wrote: Zach Brown z...@redhat.com wrote: On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: Why introduce a new syscall instead of extending sys_splice? Personally, I think it's ugly to have different operations use the same

Re: [RFC PATCH] vfs: add permute operation

2013-05-28 Thread Zach Brown
Some quick thoughts: Permute the location of files. E.g. 'permute(A, B, C)' is equivalent to A->B, B->C and C->A. This is essentially a series of renames done as a single atomic operation. Hmm. Can we choose a more specific name than 'permute'? To me, ->permute() tells me just as much
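The semantics in the example, renaming A to B, B to C and C to A as one step, amount to rotating contents by one position along the argument list. The toy user-space model below covers only that result, not VFS locking or atomicity. permute_slots() is a hypothetical name, and slot[i] stands for "whatever is currently found at path i".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of permute(A, B, C): A->B, B->C, C->A applied at once, so
 * whatever was at slot[i] ends up at slot[(i + 1) % n]. */
static void permute_slots(const char **slot, size_t n)
{
    if (n < 2)
        return;

    const char *last = slot[n - 1];   /* C's contents move to A */
    for (size_t i = n - 1; i > 0; i--)
        slot[i] = slot[i - 1];        /* A's contents to B, B's to C */
    slot[0] = last;
}
```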

[RFC v0 2/4] x86: add sys_copy_range to syscall tables

2013-05-14 Thread Zach Brown
Add sys_copy_range to the x86 syscall tables. Happily, it doesn't require compat helpers. Signed-off-by: Zach Brown z...@redhat.com --- arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/syscalls/syscall_32.tbl

[RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-14 Thread Zach Brown
This crude patch illustrates the simplest plumbing involved in supporting sys_copy_range with the NFS COPY operation that's pending in the 4.2 draft spec. The patch is based on a previous prototype that used the COPY op to implement sys_copyfileat which created a new file (based on the ocfs2

[RFC v0 0/4] sys_copy_range() rough draft

2013-05-14 Thread Zach Brown
We've been talking about implementing some form of bulk data copy offloading for a while now. BTRFS and OCFS2 implement forms of copy offloading with ioctls, NFS 4.2 will include a byte-granular COPY operation, and the SCSI XCOPY command is being implemented now that Windows can issue it. In the

[RFC v0 3/4] btrfs: add .copy_range file operation

2013-05-14 Thread Zach Brown
the CLONE_RANGE ioctl and copy_range syscall. Signed-off-by: Zach Brown z...@redhat.com --- fs/btrfs/ctree.h | 3 ++ fs/btrfs/file.c | 1 + fs/btrfs/ioctl.c | 122 +-- 3 files changed, 77 insertions(+), 49 deletions(-) diff --git a/fs/btrfs/ctree.h

[RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-14 Thread Zach Brown
mpage.o ioprio.o diff --git a/fs/copy_range.c b/fs/copy_range.c new file mode 100644 index 000..3000b9f --- /dev/null +++ b/fs/copy_range.c @@ -0,0 +1,127 @@ +/* + * copy_range: offload data copying between existing files + * + * Copyright (C) 2013 Zach Brown z...@redhat.com + */ +#include <linux/fs.h>
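A VFS entry point of this kind usually validates its arguments and then calls an optional per-filesystem method, returning an error the caller can fall back on when no method exists. The user-space model below is hypothetical: model_file, model_file_operations, model_vfs_copy_range(), model_offload_fops and the -EXDEV rule for files on different filesystems are illustrative assumptions, not the contents of fs/copy_range.c.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct model_file;

struct model_file_operations {
    /* Optional: only filesystems that can offload copies fill this in. */
    ssize_t (*copy_range)(struct model_file *in, long long pos_in,
                          struct model_file *out, long long pos_out, size_t len);
};

struct model_file {
    const struct model_file_operations *f_op;
    int sb_id;                 /* which filesystem instance holds the file */
    int readable, writable;
};

/* Example method for a filesystem that claims every range was offloaded. */
static ssize_t model_fake_offload(struct model_file *in, long long pos_in,
                                  struct model_file *out, long long pos_out,
                                  size_t len)
{
    (void)in; (void)pos_in; (void)out; (void)pos_out;
    return (ssize_t)len;
}

const struct model_file_operations model_offload_fops = {
    .copy_range = model_fake_offload,
};

/* Generic checks first, then dispatch to the filesystem's method. */
ssize_t model_vfs_copy_range(struct model_file *in, long long pos_in,
                             struct model_file *out, long long pos_out,
                             size_t len)
{
    if (!in->readable || !out->writable)
        return -EBADF;
    if (pos_in < 0 || pos_out < 0)
        return -EINVAL;
    if (len == 0)
        return 0;
    if (in->sb_id != out->sb_id)
        return -EXDEV;         /* assumed policy: offload within one fs only */
    if (!out->f_op || !out->f_op->copy_range)
        return -EOPNOTSUPP;    /* caller falls back to copying the bytes */
    return out->f_op->copy_range(in, pos_in, out, pos_out, len);
}
```

Keeping the generic checks in one place means each filesystem method only has to implement the copy itself.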

Re: [RFC v0 0/4] sys_copy_range() rough draft

2013-05-14 Thread Zach Brown
On Wed, May 15, 2013 at 07:42:51AM +1000, Dave Chinner wrote: On Tue, May 14, 2013 at 02:15:22PM -0700, Zach Brown wrote: I'm going to keep hacking away at this. My next step is to get ext4 supporting .copy_range, probably with a quick hack to copy the contents of bios. Hopefully that'll

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-15 Thread Zach Brown
On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: Why introduce a new syscall instead of extending sys_splice? Personally, I think it's ugly to have different operations use the same syscall just because their arguments match. But that preference aside, sure, if the consensus is that
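For context on the splice question: with splice(2) as it exists, at least one of the two file descriptors must refer to a pipe, so a file-to-file copy has to bounce through a pipe in two hops. A minimal sketch, assuming Linux and glibc; splice_file_to_file() is a hypothetical helper name.

```c
#define _GNU_SOURCE            /* for splice() and loff_t */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * File-to-file copy with today's splice(): source file -> pipe -> destination
 * file. Returns bytes copied (short at source EOF) or -1 with errno set.
 */
ssize_t splice_file_to_file(int fd_in, loff_t off_in,
                            int fd_out, loff_t off_out, size_t len)
{
    int p[2];
    size_t done = 0;
    ssize_t ret = 0;

    if (pipe(p) < 0)
        return -1;

    while (done < len) {
        /* Hop 1: file -> pipe; splice() advances off_in for us. */
        ssize_t in = splice(fd_in, &off_in, p[1], NULL, len - done, 0);
        if (in < 0) {
            if (errno == EINTR)
                continue;
            ret = -1;
            break;
        }
        if (in == 0)
            break;                       /* source EOF */

        /* Hop 2: drain everything just queued into the destination file. */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, fd_out, &off_out, (size_t)left, 0);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                ret = -1;
                break;
            }
            left -= out;
        }
        if (ret < 0)
            break;
        done += (size_t)in;
    }

    int saved = errno;                   /* close() must not clobber errno */
    close(p[0]);
    close(p[1]);
    errno = saved;
    return ret < 0 ? -1 : (ssize_t)done;
}
```

Letting splice() accept two regular files directly, as the later "extending splice for copy offloading" threads discuss, would remove the intermediate pipe and give the filesystem a chance to offload the copy.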

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
Hrmph. I had composed a reply to you during Plumbers but.. something happened to it :). Here's another try now that I'm back. Some things to talk about: - I really don't care about the naming here. If you do, holler. - We might want different flags for file-to-file splicing and

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
On Wed, Sep 25, 2013 at 03:02:29PM -0400, Anna Schumaker wrote: On Wed, Sep 25, 2013 at 2:38 PM, Zach Brown z...@redhat.com wrote: Hrmph. I had composed a reply to you during Plumbers but.. something happened to it :). Here's another try now that I'm back. Some things to talk about

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
A client-side copy will be slower, but I guess it does have the advantage that the application can track progress to some degree, and abort it fairly quickly without leaving the file in a totally undefined state--and both might be useful if the copy's not a simple constant-time operation. I
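The progress-tracking and quick-abort properties of a client-side copy come from doing the copy in chunks and checking in with the caller between them. A sketch under assumptions: chunked_copy(), the copy_progress_fn type and abort_after_bytes() are hypothetical, and the copy streams from the descriptors' current positions with read(2)/write(2).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Called after each chunk; a nonzero return asks the copy to stop. */
typedef int (*copy_progress_fn)(size_t copied, size_t total, void *arg);

/* Example policy: stop once at least *(size_t *)arg bytes have been copied. */
int abort_after_bytes(size_t copied, size_t total, void *arg)
{
    (void)total;
    return copied >= *(const size_t *)arg;
}

/*
 * Copy up to total bytes from fd_in to fd_out in chunks of at most `chunk`
 * bytes. Because the callback only runs between chunks, an abort leaves the
 * destination holding a complete prefix rather than a half-written chunk.
 * Returns bytes copied (short at EOF or on abort) or -1 with errno set.
 */
ssize_t chunked_copy(int fd_in, int fd_out, size_t total, size_t chunk,
                     copy_progress_fn progress, void *arg)
{
    char buf[4096];
    size_t copied = 0;

    if (chunk == 0 || chunk > sizeof(buf))
        chunk = sizeof(buf);

    while (copied < total) {
        size_t want = total - copied < chunk ? total - copied : chunk;
        ssize_t r = read(fd_in, buf, want);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            break;                               /* source EOF */
        for (ssize_t w = 0; w < r; ) {
            ssize_t k = write(fd_out, buf + w, (size_t)(r - w));
            if (k < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            w += k;
        }
        copied += (size_t)r;
        if (progress && progress(copied, total, arg))
            break;                               /* caller asked to stop */
    }
    return (ssize_t)copied;
}
```

An offloaded copy trades this visibility away: a single call may run for a long time with no intermediate report and no natural point at which to cancel it.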

Re: [PATCH 1/6] block: Introduce bio_for_each_page()

2013-09-25 Thread Zach Brown
void zero_fill_bio(struct bio *bio) { - unsigned long flags; struct bio_vec bv; struct bvec_iter iter; - bio_for_each_segment(bv, bio, iter) { +#if defined(CONFIG_HIGHMEM) || defined(ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE) + bio_for_each_page(bv, bio, iter) { +
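The quoted hunk switches zero_fill_bio() to bio_for_each_page() when CONFIG_HIGHMEM or ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is set. Why a per-page walk matters can be modeled in user space: if a segment may span several pages, but a highmem page can only be mapped one page at a time, each segment has to be cut at page boundaries before it is mapped and zeroed. Everything here (the tiny page size, struct model_seg, model_map_page() standing in for kmap(), model_zero_fill()) is a hypothetical model, not block-layer code.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8      /* tiny page so a short test can cross pages */

/* One segment: may start mid-page and span several contiguous pages. */
struct model_seg {
    size_t page;               /* index of the first page */
    size_t offset;             /* byte offset within that first page */
    size_t len;                /* total bytes, possibly crossing pages */
};

/* Stand-in for kmap(): each page is "mapped" on its own. */
static unsigned char *model_map_page(unsigned char (*pages)[MODEL_PAGE_SIZE],
                                     size_t idx)
{
    return pages[idx];
}

/*
 * Zero every byte the segments cover, one page-sized piece at a time,
 * the way a per-page iterator hands pieces to map/memset/unmap.
 * Returns the number of per-page pieces processed.
 */
size_t model_zero_fill(unsigned char (*pages)[MODEL_PAGE_SIZE],
                       const struct model_seg *segs, size_t nsegs)
{
    size_t pieces = 0;

    for (size_t i = 0; i < nsegs; i++) {
        size_t page = segs[i].page + segs[i].offset / MODEL_PAGE_SIZE;
        size_t off = segs[i].offset % MODEL_PAGE_SIZE;
        size_t left = segs[i].len;

        while (left > 0) {
            size_t n = MODEL_PAGE_SIZE - off;    /* room left in this page */
            if (n > left)
                n = left;
            memset(model_map_page(pages, page) + off, 0, n);
            pieces++;
            left -= n;
            page++;
            off = 0;                             /* later pages start at 0 */
        }
    }
    return pieces;
}
```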

Re: [PATCH 1/6] block: Introduce bio_for_each_page()

2013-09-25 Thread Zach Brown
On Wed, Sep 25, 2013 at 02:49:10PM -0700, Kent Overstreet wrote: On Wed, Sep 25, 2013 at 02:17:02PM -0700, Zach Brown wrote: void zero_fill_bio(struct bio *bio) { - unsigned long flags; struct bio_vec bv; struct bvec_iter iter; - bio_for_each_segment(bv, bio, iter

Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()

2013-09-26 Thread Zach Brown
Sigh. A pox on whoever thought up huge pages. managing 1TB+ of memory in 4K chunks is just insane. The question of larger pages is not if, but only when. And how! Sprinkling a bunch of magical if (thp) {} else {} throughout the code looks like a stunningly bad idea to me. It'd take real
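The scale in the quoted complaint is easy to check. Assuming 1 TB means 2^40 bytes and taking 2 MiB as the huge page size (the x86-64 PMD size), 4 KiB pages give 2^40 / 2^12 = 2^28 = 268,435,456 pages to track, while 2 MiB pages give 2^40 / 2^21 = 2^19 = 524,288. A small helper for the same arithmetic (pages_needed() is a hypothetical name):

```c
#include <stdint.h>

/* Pages of 2^page_shift bytes needed to cover `bytes`, rounding up. */
uint64_t pages_needed(uint64_t bytes, unsigned page_shift)
{
    uint64_t page = UINT64_C(1) << page_shift;
    /* Whole pages, plus one more for any partial tail (no overflow). */
    return (bytes >> page_shift) + ((bytes & (page - 1)) != 0);
}
```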

Re: [RFC] extending splice for copy offloading

2013-09-26 Thread Zach Brown
On Thu, Sep 26, 2013 at 10:58:05AM +0200, Miklos Szeredi wrote: On Wed, Sep 25, 2013 at 11:07 PM, Zach Brown z...@redhat.com wrote: A client-side copy will be slower, but I guess it does have the advantage that the application can track progress to some degree, and abort it fairly quickly

Re: [RFC] extending splice for copy offloading

2013-09-26 Thread Zach Brown
On Thu, Sep 26, 2013 at 08:06:41PM +0200, Miklos Szeredi wrote: On Thu, Sep 26, 2013 at 5:34 PM, J. Bruce Fields bfie...@fieldses.org wrote: On Thu, Sep 26, 2013 at 10:58:05AM +0200, Miklos Szeredi wrote: On Wed, Sep 25, 2013 at 11:07 PM, Zach Brown z...@redhat.com wrote: A client-side

Re: [RFC] extending splice for copy offloading

2013-09-27 Thread Zach Brown
Sure. So we'd have: - no flag default that forbids knowingly copying with shared references so that it will be used by default by people who feel strongly about their assumptions about independent write durability. - a flag that allows shared references for people who would
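The two-mode policy above can be sketched as a flag check that decides how a filesystem may satisfy a copy. The flag name, its value, the enum and choose_copy_method() are all hypothetical (the thread had not settled on names or bits), and rejecting unknown flag bits is an assumed convention, not something stated in the excerpt.

```c
#include <stdbool.h>

/* Hypothetical flag: caller accepts shared extents (e.g. a reflink clone). */
#define COPY_HYP_ALLOW_SHARED 0x1u

enum copy_method {
    COPY_METHOD_BYTES,         /* write independent copies of the data */
    COPY_METHOD_SHARED,        /* share the source extents */
    COPY_METHOD_INVALID,       /* unknown flag bits were passed */
};

/*
 * Default (no flags): never knowingly create shared references, so writes
 * to the copy stay independent of the source. With the flag, sharing is
 * allowed when the filesystem supports it.
 */
enum copy_method choose_copy_method(unsigned int flags, bool fs_can_share)
{
    if (flags & ~COPY_HYP_ALLOW_SHARED)
        return COPY_METHOD_INVALID;      /* reject bits we don't understand */
    if ((flags & COPY_HYP_ALLOW_SHARED) && fs_can_share)
        return COPY_METHOD_SHARED;
    return COPY_METHOD_BYTES;
}
```

Making the conservative behavior the zero-flag default means existing callers never get shared extents unless they explicitly opt in.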
