Re: [PATCH]loop cleanup in fs/namespace.c - repost

2007-11-21 Thread Zach Brown
>> This doesn't look fine. Did you test this? > > Oops, my fault. Of course, I tested the patch, but kernel modules are > disabled in my test setup, so I missed the error. :) > Enclosed to this message is a new patch, which replaces the goto-loop by > the while-based one, but leaves the

Re: [PATCH 5/5] Make wait_on_retry_sync_kiocb killable

2007-10-25 Thread Zach Brown
Matthew Wilcox wrote: > Use TASK_KILLABLE to allow wait_on_retry_sync_kiocb to return -EINTR. > All callers then check the return value and break out of their loops. This won't work because "sync" kiocbs are a nasty hack that don't follow the (also nasty) refcounting patterns of the aio core.

Re: [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co. [try #5]

2007-10-25 Thread Zach Brown
> + * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type > + * @ptr: The pointer to cast. > + * > + * Explicitly cast an error-valued pointer to another pointer type in such a > + * way as to make it clear that's what's going on. > + */ > +static inline void

Re: [PATCH 01/31] Add an ERR_CAST() macro to complement ERR_PTR and co. [try #5]

2007-10-25 Thread Zach Brown
Roland Dreier wrote: > > > +static inline void *ERR_CAST(const void *ptr) > > > +{ > > > +return (void *) ptr; > > > +} > > > > Just to nit, surely you don't need the cast inside the function. The > > casting happens at the call site between the argument and returned pointer. > >
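
For readers unfamiliar with the helpers under discussion, here is a minimal userspace sketch of the error-pointer idiom. It only approximates the kernel's include/linux/err.h; the `inode_like` and `dentry_like` types and the two lookup functions are hypothetical, made up for illustration. One thing the sketch makes visible is that ERR_CAST() takes a `const void *`, so returning it as `void *` without a cast would discard the qualifier.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace approximation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    /* Only small negative errno values may be encoded in a pointer. */
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error values occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Pass an error-valued pointer through as another pointer type. */
static inline void *ERR_CAST(const void *ptr)
{
    return (void *)ptr; /* the cast drops const */
}

/* Hypothetical types and lookups, for illustration only. */
struct inode_like { int ino; };
struct dentry_like { int unused; };

static struct inode_like *lookup_inode(int fail)
{
    static struct inode_like inode = { 42 };
    return fail ? ERR_PTR(-ENOENT) : &inode;
}

static struct dentry_like *lookup_dentry(int fail)
{
    static struct dentry_like dentry;
    struct inode_like *inode = lookup_inode(fail);

    if (IS_ERR(inode))
        return ERR_CAST(inode); /* same error, different pointer type */
    return &dentry;
}
```

The caller of `lookup_dentry()` can test the result with IS_ERR() and recover the errno with PTR_ERR(), without the intermediate function having to decode and re-encode the error.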

Re: [PATCH] Fix bad data from non-direct-io read after direct-io write

2007-10-26 Thread Zach Brown
Linus Torvalds wrote: > Hmm. If I read this right, this bug seems to have been introduced by > commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate clean > pages before dio write") back in March. Agreed. And it's a really dumb bug. ->direct_io will almost always return

Re: [PATCH] Fix bad data from non-direct-io read after direct-io write

2007-10-26 Thread Zach Brown
Linus Torvalds wrote: > > On Fri, 26 Oct 2007, Zach Brown wrote: >> I think that test should be changed to > > How about not testing at all? Which was what the old code did. > > Just do the invalidate unconditionally for any writes, and screw the end > result of the

Re: [PATCH] Fix bad data from non-direct-io read after direct-io write

2007-10-26 Thread Zach Brown
Linus Torvalds wrote: > > On Fri, 26 Oct 2007, Zach Brown wrote: >> I can throw together a patch if you haven't already committed one by the >> time you read this ;). > > I'm not touching that code except to send out possible patches for others > to test and comme

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
So, reiserfs and NFS are nesting i_mutex inside the mmap_sem. >>[] mutex_lock+0x1c/0x1f >>[] reiserfs_file_release+0x54/0x447 >>[] __fput+0x53/0x101 >>[] fput+0x19/0x1c >>[] remove_vma+0x3b/0x4d >>[] do_munmap+0x17f/0x1cf >[]

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
>> So reiser and NFS need to be fixed. No? > > Actually, it is rather mmap() needs to be fixed. Sure, I'm willing to have that demonstrated. My point was that DIO getting the mmap_sem inside i_mutex is currently correct. reiserfs, though, seems to be out on a more precarious limb ;). - z

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
simply won't pack. There are already a host of conditions under which it won't pack. Totally untested, but built. Signed-off-by: Zach Brown <[EMAIL PROTECTED]> diff --git a/fs/reiserfs/file.c b/fs/reiserfs/file.c index a804903..40085f1 100644 --- a/fs/reiserfs/file.c +++ b/fs/reiserfs/fil

Re: dio_get_page() lockdep complaints

2007-11-09 Thread Zach Brown
> Ugh, I thought the preallocation was getting freed elsewhere, but it > looks like I was wrong. We can't just skip the i_mutex after all, > sorry. Ah, so none of those tests at the top will stop tail packing if there's been pre-allocation? Like, uh, the inode reference count test? - z

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown
>> And another of my pet peeves with ->bmap is that it uses 0 to mean >> "sparse" which causes a conflict on NTFS at least as block zero is >> part of the $Boot system file so it is a real, valid block... NTFS >> uses -1 to denote sparse blocks internally. > > Reiserfs and Btrfs also use

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown
> Can you clarify what you mean above with an example? I don't really > follow. Sure, take 'tar' as an example. It'll read files in the order that their names are returned from directory listing. This can produce bad IO patterns because the order in which the file names are returned doesn't
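
As a rough illustration of the access-pattern problem described above, the following sketch lists a directory and sorts the entries by inode number before any file is opened. Sorting by inode is a commonly used userspace heuristic and is an assumption here, not something the excerpt proposes; `struct entry`, `by_inode()` and `list_sorted_by_inode()` are hypothetical names.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical record for one directory entry. */
struct entry {
    ino_t ino;
    char name[256];
};

/* qsort comparator: ascending inode number. */
static int by_inode(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;

    return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Read up to max entries from path and sort them by inode number.
 * Returns the number of entries stored, or -1 if the directory
 * cannot be opened.
 */
static int list_sorted_by_inode(const char *path, struct entry *out, int max)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int n = 0;

    if (!dir)
        return -1;
    while (n < max && (de = readdir(dir)) != NULL) {
        out[n].ino = de->d_ino;
        snprintf(out[n].name, sizeof(out[n].name), "%s", de->d_name);
        n++;
    }
    closedir(dir);
    qsort(out, (size_t)n, sizeof(*out), by_inode);
    return n;
}
```

Whether inode order tracks on-disk order depends on the file system, which is part of why the thread discusses exposing block mappings instead.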

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-29 Thread Zach Brown
> But, we shouldn't inflict all of this on fibmap/fiemap; we'll get > lost trying to make the one true interface for all operations. > > For grouping operations on files, I think a read_tree syscall with > hints for what userland will do (read, stat, delete, list > filenames), and a better

Re: [PATCH] Fix bad data from non-direct-io read after direct-io write

2007-10-30 Thread Zach Brown
ire a bit more work. This gives up on the idea of returning EIO to indicate to userspace that stale data remains if the invalidation failed. Signed-off-by: Zach Brown <[EMAIL PROTECTED]> --- linux-2.6.23.1-base/mm/filemap.c 2007-10-12 12:43:44.0 -0400 +++ linux-2.6.23.1/mm/filemap.c 2007

Re: checkpatch bug: space between left parenthesis and asterisk

2007-10-30 Thread Zach Brown
Timur Tabi wrote: > I'm running checkpatch.pl (dated 10/17), and it complains about this line: > > crc = __be32_to_cpu(* ((__be32 *) ((void *) firmware + calc_size))); Well, that is a bit of a stinker. Maybe it could be reworked a little to make it easier for humans and checkpatch to

Re: [patch 0/6][RFC] Cleanup FIBMAP

2007-10-31 Thread Zach Brown
> The second use case is to look at the physical layout of blocks on disk > for a specific file, use Mark Lord's write_long patches to inject a disk > error and then read that file to make sure that we are handling disk IO > errors correctly. A bit obscure, but really quite useful. Hmm, yeah,

Re: [PATCH] truncate: drop 'oldsize' truncate_pagecache() parameter

2013-07-29 Thread Zach Brown
> @@ -50,7 +50,7 @@ static void adfs_write_failed(struct address_space > *mapping, loff_t to) > struct inode *inode = mapping->host; > > if (to > inode->i_size) > - truncate_pagecache(inode, to, inode->i_size); > + truncate_pagecache(inode, inode->i_size); >

Re: [PATCH 6/7] btrfs: cleanup: removed unused 'btrfs_reada_detach'

2013-08-08 Thread Zach Brown
> > even though the function is currently unused, I'm hesitating to remove it > > as it's part of the reada-API and might be handy for anyone going to use > > the API in the future. > > I agree. As replied here, > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24047.html > please keep

Re: btrfs zero divide

2013-08-09 Thread Zach Brown
On Fri, Aug 09, 2013 at 02:26:36PM +0200, Andreas Schwab wrote: > Josef Bacik writes: > > > So stripe_len shouldn't be 0, if it is you have bigger problems :). > > The bigger problem is that stripe_nr is u64, this is completely bogus. > The first operand of do_div must be u32. This goes

Re: [RFC/PATCH 0/2] ext4: Transparent Decompression Support

2013-07-25 Thread Zach Brown
> > What about introducing a new flag, O_COMPR which tells the > > kernel, btw, we want this file to be decompressed if it can be. It > > can fallback to O_RDONLY or something like that? That gets rid of > > the chattr ugliness. > > How is that different from chattr ugliness, which also comes

Re: linux-next: manual merge of the block tree with the tree

2013-11-08 Thread Zach Brown
> > > That make sense? I can show you more concretely what I'm working on if > > > you want. Or if I'm full of crap and this is useless for what you guys > > > want I'm sure you'll let me know :) > > > > It sounds interesting, but also a little confusing at this point, at > > least from the

adding missed signed-off-by for aio retry removal patch

2013-03-18 Thread Zach Brown
sob for the version of the patch that's in -next via Andrew's patches at this moment: commit ae5e0fe5ecafc6f8285367a20a8915b67a91066c Author: Zach Brown Date: Thu Jan 24 13:14:37 2013 +1100 aio: remove retry-based AIO Signed-off-by: Zach Brown Hopefully that's sufficient

[RFC v0 2/4] x86: add sys_copy_range to syscall tables

2013-05-14 Thread Zach Brown
Add sys_copy_range to the x86 syscall tables. Happily, it doesn't require compat helpers. Signed-off-by: Zach Brown --- arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86

[RFC v0 4/4] nfs, nfsd: rough sys_copy_range and COPY support

2013-05-14 Thread Zach Brown
This crude patch illustrates the simplest plumbing involved in supporting sys_copy_range with the NFS COPY operation that's pending in the 4.2 draft spec. The patch is based on a previous prototype that used the COPY op to implement sys_copyfileat which created a new file (based on the ocfs2

[RFC v0 0/4] sys_copy_range() rough draft

2013-05-14 Thread Zach Brown
We've been talking about implementing some form of bulk data copy offloading for a while now. BTRFS and OCFS2 implement forms of copy offloading with ioctls, NFS 4.2 will include a byte-granular COPY operation, and the SCSI XCOPY command is being implemented now that Windows can issue it. In the

[RFC v0 3/4] btrfs: add .copy_range file operation

2013-05-14 Thread Zach Brown
the CLONE_RANGE ioctl and copy_range syscall. Signed-off-by: Zach Brown --- fs/btrfs/ctree.h | 3 ++ fs/btrfs/file.c | 1 + fs/btrfs/ioctl.c | 122 +-- 3 files changed, 77 insertions(+), 49 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs

[RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-14 Thread Zach Brown
mpage.o ioprio.o diff --git a/fs/copy_range.c b/fs/copy_range.c new file mode 100644 index 000..3000b9f --- /dev/null +++ b/fs/copy_range.c @@ -0,0 +1,127 @@ +/* + * "copy_range": offload data copying between existing files + * + * Copyright (C) 2013 Zach Brown + */ +#include +#include

Re: [RFC v0 0/4] sys_copy_range() rough draft

2013-05-14 Thread Zach Brown
On Wed, May 15, 2013 at 07:42:51AM +1000, Dave Chinner wrote: > On Tue, May 14, 2013 at 02:15:22PM -0700, Zach Brown wrote: > > I'm going to keep hacking away at this. My next step is to get ext4 > > supporting .copy_range, probably with a quick hack to copy the > > content

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-15 Thread Zach Brown
On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: > Why introduce a new syscall instead of extending sys_splice? Personally, I think it's ugly to have different operations use the same syscall just because their arguments match. But that preference aside, sure, if the consensus is that

Re: [WiP]: aio support for migrating pages (Re: [PATCH V2 1/2] mm: hotplug: implement non-movable version of get_user_pages() called get_user_pages_non_movable())

2013-05-17 Thread Zach Brown
> I ended up working on this a bit today, and managed to cobble together > something that somewhat works -- please see the patch below. Just some quick observations: > + ctx->ctx_file = anon_inode_getfile("[aio]", _ctx_fops, ctx, O_RDWR); > + if (IS_ERR(ctx->ctx_file)) { > +

Re: [PATCH 2/2] fs/aio.c: use get_user_pages_non_movable() to pin ring pages when support memory hotremove

2013-02-04 Thread Zach Brown
> > index 71f613c..0e9b30a 100644 > > --- a/fs/aio.c > > +++ b/fs/aio.c > > @@ -138,9 +138,15 @@ static int aio_setup_ring(struct kioctx *ctx) > > } > > > > dprintk("mmap address: 0x%08lx\n", info->mmap_base); > > +#ifdef CONFIG_MEMORY_HOTREMOVE > > + info->nr_pages =

Re: Improving AIO cancellation

2013-02-08 Thread Zach Brown
> The draft implementation will look like this. struct bio should have > some way to get current status of kiocb that generated bio. So we add > a pointer to bool flag. > > struct bio { > bool *cancelled; > } > > in async DIO codepath this pointer will be initialized with bool at > "struct

Re: libata maintainership change

2013-05-03 Thread Zach Brown
> Time for new open source pastures outside the kernel, for me. Thanks for all your hard work over the years. Here's to good luck in the future! - z

Re: [PATCH] nfsd: fix bad offset use

2013-03-22 Thread Zach Brown
d Agreed, the original code does look fishy and this fix looks right to me. Reviewed-by: Zach Brown - z

Re: [PATCH] aio: convert the ioctx list to radix tree

2013-03-22 Thread Zach Brown
On Fri, Mar 22, 2013 at 08:33:19PM +0200, Octavian Purdila wrote: > When using a large number of threads performing AIO operations the > IOCTX list may get a significant number of entries which will cause > significant overhead. For example, when running this fio script: Indeed. But you also

Re: linux-next: manual merge of the vfs tree with the aio-direct tree

2013-09-18 Thread Zach Brown
> As for aio-direct... Two questions: > * had anybody tried to measure the effect on branch predictor from > introducing that method vector? Commit d6afd4c4 ("iov_iter: hide iovec > details behind ops function pointers") FWIW, I never did. I only went that route to begin with because the

[PATCH 2/3] splice: add f_op->splice_direct

2013-09-11 Thread Zach Brown
if the caller wants to avoid unaccelerated copying, perhaps by setting behavioural flags. The SPLICE_F_DIRECT flag is arguably misused here to indicate both file-to-file "direct" splicing *and* acceleration. Signed-off-by: Zach Brown --- fs/bad_inode.c | 8 fs/splice.c

[RFC] extending splice for copy offloading

2013-09-11 Thread Zach Brown
When I first started on this stuff I followed the lead of previous work and added a new syscall for the copy operation: https://lkml.org/lkml/2013/5/14/618 Towards the end of that thread Eric Wong asked why we didn't just extend splice. I immediately replied with some dumb dismissive answer.

[PATCH 1/3] splice: add DIRECT flag for splicing between files

2013-09-11 Thread Zach Brown
the method lets the file system lock both for the duration of the copy, should it need to. If the method refuses to accelerate the copy, for whatever reason, we can naturally fall back to the generic direct splice method that sendfile uses today. Signed-off-by: Zach Brown --- fs/splice.c

[PATCH 3/3] btrfs: implement .splice_direct extent copying

2013-09-11 Thread Zach Brown
() already does elsewhere) is moved to a new much smaller btrfs_ioctl_clone(). btrfs_splice_direct() thus inherits the conservative limitations of the btrfs clone ioctl: it only allows block-aligned copies between files on the same snapshot. Signed-off-by: Zach Brown --- fs/btrfs/ctree.h | 2

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
Hrmph. I had composed a reply to you during Plumbers but.. something happened to it :). Here's another try now that I'm back. > > Some things to talk about: > > - I really don't care about the naming here. If you do, holler. > > - We might want different flags for file-to-file splicing and

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
On Wed, Sep 25, 2013 at 03:02:29PM -0400, Anna Schumaker wrote: > On Wed, Sep 25, 2013 at 2:38 PM, Zach Brown wrote: > > > > Hrmph. I had composed a reply to you during Plumbers but.. something > > happened to it :). Here's another try now that I'm back. > > >

Re: [RFC] extending splice for copy offloading

2013-09-25 Thread Zach Brown
> A client-side copy will be slower, but I guess it does have the > advantage that the application can track progress to some degree, and > abort it fairly quickly without leaving the file in a totally undefined > state--and both might be useful if the copy's not a simple constant-time >

Re: [PATCH 1/6] block: Introduce bio_for_each_page()

2013-09-25 Thread Zach Brown
> void zero_fill_bio(struct bio *bio) > { > - unsigned long flags; > struct bio_vec bv; > struct bvec_iter iter; > > - bio_for_each_segment(bv, bio, iter) { > +#if defined(CONFIG_HIGHMEM) || defined(ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE) > + bio_for_each_page(bv, bio, iter)

Re: [PATCH 1/6] block: Introduce bio_for_each_page()

2013-09-25 Thread Zach Brown
On Wed, Sep 25, 2013 at 02:49:10PM -0700, Kent Overstreet wrote: > On Wed, Sep 25, 2013 at 02:17:02PM -0700, Zach Brown wrote: > > > void zero_fill_bio(struct bio *bio) > > > { > > > - unsigned long flags; > > > struct bi

Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap()

2013-09-26 Thread Zach Brown
> > Sigh. A pox on whoever thought up huge pages. > > managing 1TB+ of memory in 4K chunks is just insane. > The question of larger pages is not "if", but only "when". And "how"! Sprinkling a bunch of magical if (thp) {} else {} throughout the code looks like a stunningly bad idea to me.

Re: [RFC] extending splice for copy offloading

2013-09-26 Thread Zach Brown
On Thu, Sep 26, 2013 at 10:58:05AM +0200, Miklos Szeredi wrote: > On Wed, Sep 25, 2013 at 11:07 PM, Zach Brown wrote: > >> A client-side copy will be slower, but I guess it does have the > >> advantage that the application can track progress to some degree, and

Re: [RFC] extending splice for copy offloading

2013-09-26 Thread Zach Brown
On Thu, Sep 26, 2013 at 08:06:41PM +0200, Miklos Szeredi wrote: > On Thu, Sep 26, 2013 at 5:34 PM, J. Bruce Fields wrote: > > On Thu, Sep 26, 2013 at 10:58:05AM +0200, Miklos Szeredi wrote: > >> On Wed, Sep 25, 2013 at 11:07 PM, Zach Brown wrote: > >> >> A clien

Re: [RFC] extending splice for copy offloading

2013-09-27 Thread Zach Brown
> > >Sure. So we'd have: > > > > > >- no flag default that forbids knowingly copying with shared references > > > so that it will be used by default by people who feel strongly about > > > their assumptions about independent write durability. > > > > > >- a flag that allows shared references

Re: [RFC PATCH] vfs: add permute operation

2013-05-28 Thread Zach Brown
Some quick thoughts: > Permute the location of files. E.g. 'permute(A, B, C)' is equivalent to > A->B, > B->C and C->A. This is essentially a series of renames done as a single > atomic > operation. Hmm. Can we choose a more specific name than 'permute'? To me, ->permute() tells me just

Re: [RFC PATCH] vfs: add permute operation

2013-05-29 Thread Zach Brown
> >> +static void sort_parents3(struct dentry **p) > >> +void sort_parents(struct dentry **p, unsigned *nump) > > > > Yikes, that's a bunch of fiddly code. Is it *really* worth all that to > > avoid calling the generic sort helpers? > > AFAICS, I cannot make the compare function transitive,

Re: [RFC v0 1/4] vfs: add copy_range syscall and vfs entry point

2013-05-21 Thread Zach Brown
On Tue, May 21, 2013 at 07:47:19PM +, Eric Wong wrote: > Zach Brown wrote: > > On Wed, May 15, 2013 at 07:44:05PM +, Eric Wong wrote: > > > Why introduce a new syscall instead of extending sys_splice? > > > > Personally, I think it's ugly to have dif

Re: New copyfile system call - discuss before LSF?

2013-02-21 Thread Zach Brown
On Thu, Feb 21, 2013 at 08:50:27PM +, Myklebust, Trond wrote: > On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote: > > On 21/02/2013 15:57, Ric Wheeler wrote: > > >>> > > >> sendfile64() pretty much already has the right arguments for a > > >> "copyfile", however it would be nice to

Re: New copyfile system call - discuss before LSF?

2013-02-22 Thread Zach Brown
> This seems to be suspiciously close to a clear consensus on how to > move forward after many years of spinning our wheels. Anyone want to > promote an actual patch before we change our collective minds? It seems like we'd want to start with the existing (presumably bitrotten) prototypes that

Re: [RFC] f_pos in readdir() (was Re: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex)

2013-02-25 Thread Zach Brown
> As for ->readdir(), I'd like to resurrect an old proposal to change the ABI > of that sucker. Quoting the thread from 4 years ago: I'd love to see the readdir() interface cleaned up, yes please. > Comments? Hmm. Do we want to think about letting callers copy the name to userspace in

Re: New copyfile system call - discuss before LSF?

2013-02-25 Thread Zach Brown
> > I think it would be neat if it couldn't > > corrupt data. > > It would also be neat if the moon were made of cheese... And there we have the lsf2013 t-shirt slogan. I think we're done here! - z

Re: [RFC] extending splice for copy offloading

2013-12-18 Thread Zach Brown
On Wed, Dec 18, 2013 at 04:41:26AM -0800, Christoph Hellwig wrote: > On Wed, Sep 11, 2013 at 10:06:47AM -0700, Zach Brown wrote: > > When I first started on this stuff I followed the lead of previous > > work and added a new syscall for the copy operation: > > > > http

Re: [PATCH 1/1] Btrfs: fix sparse warning

2014-07-16 Thread Zach Brown
51:got char * > > We can safely use (const char __user *) with set_fs(KERNEL_DS) Yeah, that cast is correct. Reviewed-by: Zach Brown > @@ -515,7 +515,8 @@ static int write_buf(struct file *filp, const void *buf, > u32 len, loff_t *off) Though this probably wants to be rewritt

Re: [PATCH 1/1] Btrfs: fix sparse warning

2014-07-17 Thread Zach Brown
> > > @@ -515,7 +515,8 @@ static int write_buf(struct file *filp, const void > > > *buf, > > > u32 len, loff_t *off) > > > > Though this probably wants to be rewritten in terms of kernel_write(). > > That'd give an opportunity to get rid of the sctx->send_off and have it > > use f_pos in the

Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Zach Brown
> SYNOPSIS > #include > > int getrandom(void *buf, size_t buflen, unsigned int flags); I certainly like the idea of getting entropy without having to worry about fds. > If the GRND_RANDOM flags bit is not set, then the /dev/raundom (raundom typo) > RETURN VALUE >On
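
For context, here is a hedged sketch of how a caller might use the interface today. It assumes the glibc wrapper in `<sys/random.h>`, where getrandom() returns `ssize_t` rather than the `int` shown in the RFC synopsis above; `fill_random()` is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Fill buf with len random bytes without opening a file descriptor.
 * Returns 0 on success, or -1 with errno set by getrandom().
 */
static int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        /* flags == 0: use the urandom source. */
        ssize_t n = getrandom(p, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            return -1;
        }
        /* Short reads are possible, so advance and loop. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The loop is there because a single call may return fewer bytes than requested, much like read().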

Re: [PATCH, RFC] random: introduce getrandom(2) system call

2014-07-17 Thread Zach Brown
On Thu, Jul 17, 2014 at 04:54:17PM -0400, Theodore Ts'o wrote: > On Thu, Jul 17, 2014 at 12:48:12PM -0700, Zach Brown wrote: > > > > > + return urandom_read(NULL, buf, count, NULL); > > > > I wonder if we want to refactor the entry points a bit more instead of

Re: [PATCH, RFC -v2] random: introduce getrandom(2) system call

2014-07-17 Thread Zach Brown
> + if (r) > + return r; > + } > + return urandom_read(NULL, buf, count, NULL); > +} I like how tiny this ends up being. Feel free to add my rb:. Reviewed-by: Zach Brown - z

Re: [PATCH] Remove certain calls for releasing page cache

2014-07-30 Thread Zach Brown
On Wed, Jul 30, 2014 at 04:47:12PM -0400, Josef Bacik wrote: > On 07/30/2014 04:42 PM, Nicholas Krause wrote: > >This patch removes the lines for releasing the page cache in certain > >files as this may aid in performance with writes in the compression > >routines of btrfs. Please note that this

Re: [PATCH 1/1] Btrfs: fix sparse warning

2014-08-04 Thread Zach Brown
On Sat, Aug 02, 2014 at 02:24:49PM +0200, Fabian Frederick wrote: > On Thu, 17 Jul 2014 12:01:52 -0700 > Zach Brown wrote: > > > > > > @@ -515,7 +515,8 @@ static int write_buf(struct file *filp, const > > > > > void *buf, > > > > > u32

Re: [PATCH 1/1] Btrfs: fix sparse warning

2014-08-05 Thread Zach Brown
> > > Hello Zach, > > > > > >     Here's an untested patch which > > > > Try testing it.  It's easy with virtualization and xfstests. > > > > You'll find that sending to a file fails because each individual file > > write call that makes up a send starts at offset 0 -- at the start of > > the

Re: [PATCH 7/9] aio: add aio_kernel_() interface

2014-07-23 Thread Zach Brown
On Thu, Jul 24, 2014 at 06:55:28AM +0800, Ming Lei wrote: > From: Dave Kleikamp > > This adds an interface that lets kernel callers submit aio iocbs without > going through the user space syscalls. This lets kernel callers avoid > the management limits and overhead of the context. It will also

Re: [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-25 Thread Zach Brown
On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote: > Hi all, > > The topic of a readdirplus-like syscall had come up for discussion at last > year's > LSF/MM collab summit. I wrote a couple of syscalls with their GFS2 > implementations > to get at a directory's entries as well as

Re: [Cluster-devel] [RFC] readdirplus implementations: xgetdents vs dirreadahead syscalls

2014-07-25 Thread Zach Brown
On Fri, Jul 25, 2014 at 07:08:12PM +0100, Steven Whitehouse wrote: > Hi, > > On 25/07/14 18:52, Zach Brown wrote: > >On Fri, Jul 25, 2014 at 01:37:19PM -0400, Abhijith Das wrote: > >>Hi all, > >> > >>The topic of a readdirplus-like syscall had come up for

Re: [PATCH v1 1/9] aio: add aio_kernel_() interface

2014-08-14 Thread Zach Brown
On Thu, Aug 14, 2014 at 11:50:32PM +0800, Ming Lei wrote: > From: Dave Kleikamp > > This adds an interface that lets kernel callers submit aio iocbs without > going through the user space syscalls. This lets kernel callers avoid > the management limits and overhead of the context. It will also

Re: [PATCH 1/5] aio: Kill return value of aio_complete()

2012-10-09 Thread Zach Brown
from out of tree code?) Acked-by: Zach Brown Though, in the future please cc: aio patches to the maintainers. I'd have missed this if I wasn't sifting through lkml: $ ./scripts/get_maintainer.pl -f fs/aio.c Benjamin LaHaise (supporter:AIO) Alexander Viro (maintainer:FILESYSTEMS (VFS.

Re: [PATCH 2/5] aio: kiocb_cancel()

2012-10-09 Thread Zach Brown
On Mon, Oct 08, 2012 at 11:39:17PM -0700, Kent Overstreet wrote: > Minor refactoring, to get rid of some duplicated code Honestly: I wouldn't bother. Nothing of consequence uses cancel. I have an RFC patch series that tears it out. Let me polish that up and send it out, I'll cc: you. - z

Re: [PATCH 3/5] aio: Rewrite refcounting

2012-10-09 Thread Zach Brown
On Mon, Oct 08, 2012 at 11:39:18PM -0700, Kent Overstreet wrote: > The refcounting before wasn't very clear; there are two refcounts in > struct kioctx, with an unclear relationship between them (or between > them and ctx->dead). > > Now, reqs_active holds a refcount on users (when reqs_active is

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Zach Brown
On Mon, Oct 08, 2012 at 11:39:19PM -0700, Kent Overstreet wrote: > It simplifies a lot of stuff if the ringbuffer is contiguously mapped > into kernel space, and we can delete a lot of code - in particular, this > is useful for converting read_events() to cmpxchg. 1) I'm concerned that Our

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Zach Brown
On Mon, Oct 08, 2012 at 11:39:20PM -0700, Kent Overstreet wrote: > Bunch of cleanup Ugh. That's way too much noisy change for one patch with no description. Break it up into functional pieces and actually describe them. > events off the ringbuffer without racing with io_getevents(). Are you

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Zach Brown
> If it is measurable I'll take another stab at using memory from > __get_free_pages() for the ringbuffer. That really would be the ideal > solution. No, then you'll run into high order allocation failures with rings that don't fit in a single page. > The other reason I wanted to do this was for

Re: [PATCH 3/5] aio: Rewrite refcounting

2012-10-09 Thread Zach Brown
> Alright... send it out then. Workin' on it! :) > Also, do you know which branch Jens has his patches in? http://git.kernel.dk/?p=linux-block.git;a=commit;h=6b6723fc3e4f24dbd80526df935ca115ead578c6 https://plus.google.com/111643045511375507360/posts As far as I know, he hasn't had a chance

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Zach Brown
> If libaio is the only thing in userspace looking at the ringbuffer, and > if I'm looking at the latest libaio code this shouldn't break > anything... We can't assume that libaio is the only thing in userspace using the mapped buffer -- as scary a thought as that is :). If we wanted to change

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Zach Brown
> Not if we decouple the ringbuffer size from max_requests. Hmm, interesting. > This would be useful to do anyways because right now, allocating a kiocb > has to take a global refcount and check head and tail in the ringbuffer > just so it can avoid overflowing the ringbuffer. I'm not sure what

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Zach Brown
> Well, the ringbuffer does have those compat flags and incompat flags. > Which libaio conveniently doesn't check, but for what it does it > shouldn't really matter I guess. Well, the presumed point of the incompat flags would be to tell an app that it isn't going to get what it expects! Ideally

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-09 Thread Zach Brown
> The AIO ringbuffer stuff just annoys me more than most Not more than everyone, though, I can personally promise you that :). > (it wasn't until > the other day that I realized it was actually exported to userspace... > what led to figuring that out was noticing aio_context_t was a ulong, > and

Re: [PATCH 4/5] aio: vmap ringbuffer

2012-10-09 Thread Zach Brown
> The only situation you have to worry about is when the ringbuffer fills > up and stuff goes on the list, and then completions completely stop - > this should be a rare enough situation that maybe we could just hack > around it with a timer that gets flipped on when the list isn't empty. Right.

Re: [PATCH 2/5] aio: kiocb_cancel()

2012-10-10 Thread Zach Brown
> And maybe the current way of doing things isn't the best way. But it > would be nice if we didn't completely give up on the functionality of > aio_cancel. I sympathize, but the reality is that the current infrastructure is very bad and no one is using it. It's not like we're getting rid of

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-10 Thread Zach Brown
> True. But that could be solved with a separate interface that either > doesn't use a context to submit a call synchronously, or uses an > implicit per thread context. Sure, but why bother if we can make the one submission interface fast enough to satisfy quick callers? Less is more, and all

Re: [PATCH 5/5] aio: Refactor aio_read_evt, use cmxchg(), fix bug

2012-10-11 Thread Zach Brown
> Yeah, but that means the completion has to be delivered from process > context. That's not what aio does today, and it'd be a real performance > regression. It'd only have to complete from process context if it faults. The cheapest possible delivery mechanism is simple cpu stores. In the

Re: [BUG REPORT] [mm-hotplug, aio] aio ring_pages can't be offlined

2012-11-29 Thread Zach Brown
> The best I can think of is to make changes in or around > get_user_pages(), to steal the pages from userspace and replace them > with non-movable ones before pinning them. The performance cost of > something like this would surely be unacceptable for direct-io, but > maybe OK for the aio ring

Re: [patch] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-11-30 Thread Zach Brown
> + ret = cpulist_parse(buf, newmask); > + if (!ret) { > + spin_lock(>wb_lock); > + task = wb->task; > + get_task_struct(task); > + spin_unlock(>wb_lock); > + if (task) > + ret = set_cpus_allowed_ptr(task,

Re: [patch] bdi: add a user-tunable cpu_list for the bdi flusher threads

2012-12-03 Thread Zach Brown
On Mon, Dec 03, 2012 at 11:22:31AM -0500, Jeff Moyer wrote: > Jeff Moyer writes: > > >>> + bdi->flusher_cpumask = kmalloc(sizeof(cpumask_t), GFP_KERNEL); > >>> + if (!bdi->flusher_cpumask) > >>> + return -ENOMEM; > >> > >> The bare GFP_KERNEL raises an eyebrow.

Re: [PATCH] Update atime from future.

2012-12-04 Thread Zach Brown
On Tue, Dec 04, 2012 at 01:56:39AM +0800, yangsheng wrote: > Relatime should update the inode atime if it is more than a day in the > future. The original problem seen was a tarball that had a bad atime, > but could also happen if someone fat-fingers a "touch". The future > atime will never be
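
The rule quoted above (treat an atime more than a day in the future as stale) can be sketched in user-space C. This is only an illustration under stated assumptions: `atime_in_far_future()` and `ONE_DAY_SECS` are hypothetical names, timestamps are plain seconds, and this is not the kernel's relatime code.

```c
#include <stdbool.h>

/* Hypothetical constant: one day expressed in seconds. */
#define ONE_DAY_SECS (24LL * 60 * 60)

/*
 * Hypothetical helper, not a kernel function: returns true when the
 * stored atime lies more than one day ahead of the current time, which
 * is the case the quoted patch description says should force an update.
 */
static bool atime_in_far_future(long long atime_secs, long long now_secs)
{
	return atime_secs > now_secs &&
	       atime_secs - now_secs > ONE_DAY_SECS;
}
```

A caller would combine this with the usual relatime conditions; an atime exactly one day ahead is not treated as stale in this sketch.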

Re: next-20130117 - kernel BUG with aio

2013-01-24 Thread Zach Brown
> No, I didn't see that bug until after I'd fixed the other three, but as > far as I can tell everything's fixed with the patches I'm about to mail > out - my test VM has been running for the past two days without errors, > it's kill -9'ing a process that's got iocbs in flight to a loopback >

Re: lockdep warning with LTP dio test (v2.6.24-rc6-125-g5356f66)

2008-01-02 Thread Zach Brown
Erez Zadok wrote: > Setting: ltp-full-20071031, dio01 test on ext3 with Linus's latest tree. > Kernel w/ SMP, preemption, and lockdep configured. This is a real lock ordering problem. Thanks for reporting it. The updating of atime inside sys_mmap() orders the mmap_sem in the vfs outside of the

Re: [PATCH] aio: partial write should not return error code.

2008-01-03 Thread Zach Brown
Rusty Russell wrote: > When an AIO write gets an error after writing some data (eg. ENOSPC), > it should return the amount written already, not the error. Just like > write() is supposed to. Andrew, please don't queue this fix. I think the bug is valid but the patch is subtly dangerous. > diff
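
The write() convention that the quoted description appeals to can be sketched as below. This is a minimal user-space illustration, assuming a hypothetical helper `io_result()`; it is not the patch under discussion and does not address the danger raised in the reply.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical helper: picks the value to report for a write that
 * transferred `written` bytes before hitting error `err` (a positive
 * errno value, or 0 for no error). If any data was written, the byte
 * count wins; the error is reported only when nothing was written.
 */
static ssize_t io_result(ssize_t written, int err)
{
	if (written > 0)
		return written;
	return err ? -err : 0;
}
```

With this rule, a write that stored 4096 bytes and then ran out of space reports 4096, and the caller sees ENOSPC on its next attempt.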

Re: [PATCH] aio: negative offset should return -EINVAL

2008-01-03 Thread Zach Brown
Rusty Russell wrote: > An AIO read or write should return -EINVAL if the offset is negative. > This check matches the one in pread and pwrite. > > This was found by the libaio test suite. > > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> This looks fine to me. S
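
The check described in the quoted text is small enough to sketch directly. The helper name `check_io_offset()` is hypothetical and the offset is modelled as a plain `long long`; the real patch is not reproduced here.

```c
#include <errno.h>

/*
 * Hypothetical helper: rejects a negative file offset with -EINVAL,
 * mirroring the behaviour the quoted description attributes to pread
 * and pwrite. Returns 0 when the offset is acceptable.
 */
static int check_io_offset(long long pos)
{
	return pos < 0 ? -EINVAL : 0;
}
```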

Re: [PATCH] aio: partial write should not return error code.

2008-01-04 Thread Zach Brown
> > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> This looks good, feel free to push this from your tree. Acked-By: Zach Brown <[EMAIL PROTECTED]> - z

Re: [PATCH 5/6] syslets: add generic syslets infrastructure

2008-01-08 Thread Zach Brown
> Firstly, why not just specify an address for the return value and be done > with it? This infrastructure seems overkill, and you can always extend later > if required. Sorry, which infrastructure? Providing the function and stack to return to? Sure, I could certainly entertain the

Re: [PATCH 5/6] syslets: add generic syslets infrastructure

2008-01-09 Thread Zach Brown
>> Or do you mean the syscall return value ending up in the userspace >> completion event ring? That's mostly about being able to wait for >> pending syslets to complete. > > The latter. A ring is optimal for processing a huge number of requests, but > if you're really going to be firing off

Re: [PATCH 5/6] syslets: add generic syslets infrastructure

2008-01-09 Thread Zach Brown
Linus Torvalds wrote: > > On Thu, 10 Jan 2008, Rusty Russell wrote: >> I'd have to read his original statement, but eventfd doesn't build up state, >> so I think it qualifies. > > How about you guys battle it out by giving an example program using the > interface? > > Here's a favourite

Re: hwclock failure in x86.git

2008-01-10 Thread Zach Brown
I'm no expert, but I happened to notice this go by. > The first thing I notice about the path is that ioport_32.c and the unified > ioport.c use __clear_bit, > while ioport_64.c uses clear_bit. That doesn't seem too critical. > +#ifdef CONFIG_X86_32 > +asmlinkage long sys_iopl(unsigned long

[PATCH 1/6] indirect: use asmlinkage in i386 syscall table prototype

2007-12-06 Thread Zach Brown
call_indirect() was using the wrong calling convention for the system call handlers. System call handlers would get mixed-up arguments. Signed-off-by: Zach Brown <[EMAIL PROTECTED]> diff --git a/include/asm-x86/indirect_32.h b/include/asm-x86/indirect_32.h index a1b72ac..e3dea8e

syslets v7: back to basics

2007-12-06 Thread Zach Brown
The following patches are a substantial refactoring of the syslet code. I'm branding them as the v7 release of the syslet infrastructure, though they represent a significant change in focus. My current focus is to see the most fundamental functionality brought to maturity. To me, this means
