Re: dm-clock queue

2015-11-05 Thread Christoph Hellwig
Can someone explain what dm-clock is? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: dm-clock queue

2015-11-05 Thread Christoph Hellwig
Oh, ok - so it's not a device mapper module. Thanks for the clarification!

Re: newstore direction

2015-10-22 Thread Christoph Hellwig
On Wed, Oct 21, 2015 at 10:30:28AM -0700, Sage Weil wrote: > For example: we need to do an overwrite of an existing object that is > atomic with respect to a larger ceph transaction (we're updating a bunch > of other metadata at the same time, possibly overwriting or appending to > multiple

Re: [PATCH 12/18] target: compare and write backend driver sense handling

2015-09-06 Thread Christoph Hellwig
On Wed, Jul 29, 2015 at 04:23:49AM -0500, mchri...@redhat.com wrote: > From: Mike Christie > > Currently, backend drivers seem to only fail IO with > SAM_STAT_CHECK_CONDITION which gets us > TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE. > For compare and write support we will

Re: FileStore should not use syncfs(2)

2015-08-06 Thread Christoph Hellwig
On Wed, Aug 05, 2015 at 02:26:30PM -0700, Sage Weil wrote: Today I learned that syncfs(2) does an O(n) search of the superblock's inode list searching for dirty items. I've always assumed that it was only traversing dirty inodes (e.g., a list of dirty inodes), but that appears not to be

Re: FileStore should not use syncfs(2)

2015-08-06 Thread Christoph Hellwig
On Thu, Aug 06, 2015 at 06:00:42AM -0700, Sage Weil wrote: I'm guessing the strategy here should be to fsync the file (leaf) and then any affected ancestors, such that the directory fsyncs are effectively no-ops? Or does it matter? All metadata transactions log the involved parties (parent
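The strategy Sage asks about in the quoted text (fsync the leaf file, then each affected ancestor directory) can be sketched in user space roughly as below. This is only an illustration in Python, not FileStore code: `durable_write` and its `stop_at` parameter are hypothetical names, and whether the directory fsyncs end up being effectively no-ops depends on the filesystem, which is exactly the open question in the thread.

```python
import os
import tempfile


def durable_write(path, data, stop_at):
    """Write data to path, fsync the file, then fsync each ancestor
    directory from the file's parent up to and including stop_at.

    Returns the list of directories that were fsynced, leaf first.
    Hypothetical helper for illustration only.
    """
    path = os.path.realpath(path)
    stop_at = os.path.realpath(stop_at)

    # Step 1: write and fsync the leaf file itself.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: walk upwards and fsync every ancestor directory.
    synced = []
    current = os.path.dirname(path)
    while True:
        fd = os.open(current, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        synced.append(current)
        parent = os.path.dirname(current)
        if current == stop_at or parent == current:
            break  # reached the requested top (or the filesystem root)
        current = parent
    return synced


# Example: a small tree in a scratch directory, one object two levels down.
root = os.path.realpath(tempfile.mkdtemp())
leaf_dir = os.path.join(root, "a", "b")
os.makedirs(leaf_dir)
target = os.path.join(leaf_dir, "obj")
synced = durable_write(target, b"hello", stop_at=root)
```

The walk stops at `stop_at` so that a caller can limit the fsyncs to the directories it actually modified instead of going all the way to the root.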

Re: [PATCH 01/18] libceph: add scatterlist messenger data type

2015-07-30 Thread Christoph Hellwig
On Wed, Jul 29, 2015 at 06:40:01PM -0500, Mike Christie wrote: I guess I was viewing this similar to cephfs where it does not use rbd and the block layer. It just makes ceph/rados calls directly using libceph. I am using rbd.c for its helper/wrapper functions around the libceph ones, but I

Re: [PATCH 01/18] libceph: add scatterlist messenger data type

2015-07-29 Thread Christoph Hellwig
On Wed, Jul 29, 2015 at 04:23:38AM -0500, mchri...@redhat.com wrote: From: Mike Christie micha...@cs.wisc.edu LIO uses scatterlist for its page/data management. This patch adds a scatterlist messenger data type, so LIO can pass its sg down directly to rbd. Just as I mentioned for David's

Re: [RFC PATCH 0/5] rbd_tcm cluster COMPARE AND WRITE

2015-07-29 Thread Christoph Hellwig
Hi David, please introduce a proper compare and write API at the block layer instead of bypassing it. Thanks!

Re: Ceph write path optimization

2015-07-29 Thread Christoph Hellwig
On Tue, Jul 28, 2015 at 11:46:06PM +0200, Łukasz Redynk wrote: Hi, Have you tried to tune XFS mkfs options? From mkfs.xfs(8) a) (log section, -l) lazy-count=value // by default is 0 It's default. And less AGs aren't going to help you here. Please don't start micro tuning filesystem

Re: Ceph write path optimization

2015-07-29 Thread Christoph Hellwig
On Tue, Jul 28, 2015 at 09:08:27PM +, Somnath Roy wrote: 2. Each filestore Op threads is now doing O_DSYNC write followed by posix_fadvise(**fd, 0, 0, POSIX_FADV_DONTNEED); Why aren't you using O_DIRECT | O_DSYNC? 15. The main challenge I am facing in both the scheme is XFS metadata
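For illustration, the two write strategies contrasted in this exchange can be sketched as follows: a buffered O_DSYNC write followed by posix_fadvise(POSIX_FADV_DONTNEED), and a single O_DIRECT | O_DSYNC write that bypasses the page cache. This is a Python sketch, not FileStore code. The function names are hypothetical, and `BLOCK = 4096` is an assumed alignment: the real O_DIRECT alignment rules depend on the device and filesystem, and some filesystems reject O_DIRECT entirely.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed O_DIRECT alignment; real requirements vary


def write_dsync_then_drop(path, payload):
    """Buffered O_DSYNC write, then ask the kernel to drop the cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    try:
        written = os.write(fd, payload)
        # offset 0, length 0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return written


def write_direct_dsync(path, payload):
    """O_DIRECT | O_DSYNC write from a page-aligned buffer."""
    if not payload or len(payload) % BLOCK:
        raise ValueError("payload length must be a non-zero multiple of BLOCK")
    buf = mmap.mmap(-1, len(payload))  # anonymous mappings are page-aligned
    try:
        buf.write(payload)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT | os.O_DSYNC,
                     0o644)
        try:
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()


# Example run in a scratch directory.
scratch = tempfile.mkdtemp()
buffered_path = os.path.join(scratch, "buffered")
buffered_bytes = write_dsync_then_drop(buffered_path, b"x" * 100)
try:
    direct_bytes = write_direct_dsync(os.path.join(scratch, "direct"),
                                      b"y" * BLOCK)
except OSError:
    direct_bytes = None  # this filesystem rejected O_DIRECT
```

The direct variant needs no fadvise call because the data never enters the page cache, at the cost of the alignment constraints on buffer and length.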

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

2015-05-25 Thread Christoph Hellwig
On Fri, May 22, 2015 at 11:18:40AM -0700, Ming Lin wrote: From: Kent Overstreet kent.overstr...@gmail.com As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove

Re: [PATCH v4 08/11] block: kill merge_bvec_fn() completely

2015-05-25 Thread Christoph Hellwig
On Mon, May 25, 2015 at 06:02:30PM +0300, Ilya Dryomov wrote: I'm not Alex, but yeah, we have all the clone/split machinery and so we can handle a spanning case just fine. I think rbd_merge_bvec() exists to make sure we don't have to do that unless it's really necessary - like when a single

Re: [PATCH 11/12] fs: don't reassign dirty inodes to default_backing_dev_info

2015-03-24 Thread Christoph Hellwig
On Mon, Mar 23, 2015 at 06:40:13PM -0400, Mike Snitzer wrote: FYI, here is the DM fix I've staged for 4.0-rc6. I'll continue testing the various DM targets before requesting Linus to pull. Yeah, from looking at the bugzilla it seemed like dm was releasing the dev_t before the queue has been

Re: NewStore update

2015-02-22 Thread Christoph Hellwig
On Sat, Feb 21, 2015 at 09:53:45AM -0800, Sage Weil wrote: Ah, thanks. I guess in the buffered case though we won't block normally anyway (unless we've hit the bdi dirty threshold). So it's probably either aio direct or buffered write + aio fsync, depending on the cache hints? buffered

Re: NewStore update

2015-02-21 Thread Christoph Hellwig
On Thu, Feb 19, 2015 at 03:50:45PM -0800, Sage Weil wrote: - assemble the transaction - start any aio writes (we could use O_DIRECT here if the new hints include WONTNEED?) Note that kernel aio only is async if you specify O_DIRECT, otherwise io_submit will simply block.

Re: backing_dev_info cleanups lifetime rule fixes V2

2015-02-02 Thread Christoph Hellwig
as this: Make super_blocks and sb_lock static The only user outside of fs/super.c is gone now Signed-off-by: Al Viro v...@zeniv.linux.org.uk I'd say merge it through the block tree.. Acked-by: Christoph Hellwig h...@lst.de

[PATCH 01/12] fs: deduplicate noop_backing_dev_info

2015-01-14 Thread Christoph Hellwig
hugetlbfs, kernfs and dlmfs can simply use noop_backing_dev_info instead of creating a local duplicate. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Tejun Heo t...@kernel.org --- fs/hugetlbfs/inode.c| 14 +- fs/kernfs/inode.c | 14 +- fs/kernfs

backing_dev_info cleanups lifetime rule fixes V2

2015-01-14 Thread Christoph Hellwig
The first 8 patches are unchanged from the series posted a week ago and clean up how we use the backing_dev_info structure in preparation for fixing the life time rules for it. The most important change is to split the unrelated nommu mmap flags from it, but it also removes a backing_dev_info

[PATCH 12/12] fs: remove default_backing_dev_info

2015-01-14 Thread Christoph Hellwig
do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun Heo t...@kernel.org --- fs/btrfs/disk-io.c | 2 +- fs/ceph

[PATCH 06/12] nilfs2: set up s_bdi like the generic mount_bdev code

2015-01-14 Thread Christoph Hellwig
mapping->backing_dev_info will go away, so don't rely on it. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Ryusuke Konishi konishi.ryus...@lab.ntt.co.jp Reviewed-by: Tejun Heo t...@kernel.org --- fs/nilfs2/super.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs

[PATCH 03/12] fs: introduce f_op->mmap_capabilities for nommu mmap support

2015-01-14 Thread Christoph Hellwig
for the mtd_inodefs filesystem. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun Heo t...@kernel.org --- Documentation/nommu-mmap.txt| 8 +-- block/blk-core.c| 2 +- drivers/char/mem.c | 64

[PATCH 11/12] fs: don't reassign dirty inodes to default_backing_dev_info

2015-01-14 Thread Christoph Hellwig
. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun Heo t...@kernel.org --- mm/backing-dev.c | 91 +++- 1 file changed, 24 insertions(+), 67 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 52e0c76..3ebba25 100644

[PATCH 02/12] fs: kill BDI_CAP_SWAP_BACKED

2015-01-14 Thread Christoph Hellwig
This bdi flag isn't too useful - we can determine that a vma is backed by either swap or shmem trivially in the caller. This also allows removing the backing_dev_info instances for swap and shmem in favor of noop_backing_dev_info. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun

[PATCH 07/12] fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info

2015-01-14 Thread Christoph Hellwig
. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun Heo t...@kernel.org --- fs/btrfs/file.c | 2 +- fs/ceph/file.c | 2 +- fs/ext2/ialloc.c | 2 +- fs/ext4/super.c | 2 +- fs/fs-writeback.c| 3

[PATCH 04/12] block_dev: only write bdev inode on close

2015-01-14 Thread Christoph Hellwig
blkdev_put, but not when doing a blkdev_get. Factoring out the write out from the bdi list switch prepares for removing the list switch later in the series. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Tejun Heo t...@kernel.org --- fs/block_dev.c | 31 +++ 1

[PATCH 09/12] ceph: remove call to bdi_unregister

2015-01-14 Thread Christoph Hellwig
bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/ceph/super.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/ceph/super.c b/fs

[PATCH 08/12] fs: remove mapping->backing_dev_info

2015-01-14 Thread Christoph Hellwig
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Ryusuke Konishi konishi.ryus...@lab.ntt.co.jp Reviewed-by: Tejun Heo t...@kernel.org --- drivers/char

[PATCH 10/12] nfs: don't call bdi_unregister

2015-01-14 Thread Christoph Hellwig
bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Additionally remove the call during mount failure, as deactivate_super_locked will already call ->kill_sb and clean up the bdi for us. Signed-off-by: Christoph Hellwig h

[PATCH 05/12] block_dev: get bdev inode bdi directly from the block device

2015-01-14 Thread Christoph Hellwig
Directly grab the backing_dev_info from the request_queue instead of detouring through the address_space. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Tejun Heo t...@kernel.org --- fs/fs-writeback.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/fs

Re: [PATCH v2] rbd: convert to blk-mq

2015-01-13 Thread Christoph Hellwig
On Mon, Jan 12, 2015 at 08:10:48PM +0300, Ilya Dryomov wrote: Why is this call here? Why not above or below? I doubt it makes much difference, but from a clarity standpoint at least, shouldn't it be placed after all the checks and allocations, say before the call to rbd_img_request_submit()?

[PATCH v3] rbd: convert to blk-mq

2015-01-13 Thread Christoph Hellwig
8iops. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Alex Elder el...@linaro.org --- drivers/block/rbd.c | 121 +--- 1 file changed, 67 insertions(+), 54 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index

Re: [PATCH 04/12] block_dev: only write bdev inode on close

2015-01-12 Thread Christoph Hellwig
On Sun, Jan 11, 2015 at 12:32:09PM -0500, Tejun Heo wrote: Is this an optimization or something necessary for the following changes? If latter, maybe it's a good idea to state why this is necessary in the description? Otherwise, It gets rid of a bdi reassignment, and thus makes life a lot

Re: [PATCH 07/12] fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info

2015-01-12 Thread Christoph Hellwig
On Sun, Jan 11, 2015 at 01:16:51PM -0500, Tejun Heo wrote: +struct backing_dev_info *inode_to_bdi(struct inode *inode) { struct super_block *sb = inode->i_sb; #ifdef CONFIG_BLOCK @@ -75,6 +75,7 @@ static inline struct backing_dev_info *inode_to_bdi(struct inode *inode) #endif

[PATCH v2] rbd: convert to blk-mq

2015-01-12 Thread Christoph Hellwig
8iops. Signed-off-by: Christoph Hellwig h...@lst.de Reviewed-by: Alex Elder el...@linaro.org --- drivers/block/rbd.c | 120 +--- 1 file changed, 67 insertions(+), 53 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index

[PATCH] rbd: convert to blk-mq

2015-01-10 Thread Christoph Hellwig
8iops. Signed-off-by: Christoph Hellwig h...@lst.de --- drivers/block/rbd.c | 118 +--- 1 file changed, 67 insertions(+), 51 deletions(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 3ec85df..52cd677 100644 --- a/drivers/block

Re: [PATCH v2 00/10] locks: saner method for managing file locks

2015-01-09 Thread Christoph Hellwig
Modulo the minor nitpicks this looks fine to me: Acked-by: Christoph Hellwig h...@lst.de

Re: [PATCH v2 02/10] locks: have locks_release_file use flock_lock_file to release generic flock locks

2015-01-09 Thread Christoph Hellwig
On Thu, Jan 08, 2015 at 10:34:17AM -0800, Jeff Layton wrote: ...instead of open-coding it and removing flock locks directly. This simplifies some coming interim changes in the following patches when we have different file_lock types protected by different spinlocks. It took me quite a while to

Re: [PATCH v2 04/10] locks: move flock locks to file_lock_context

2015-01-09 Thread Christoph Hellwig
void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count) { struct file_lock *lock; + struct file_lock_context *ctx; *fcntl_count = 0; *flock_count = 0; + spin_lock(&inode->i_lock); Seems like moving the locking around is unrelated to

Re: [PATCH v2 02/10] locks: have locks_release_file use flock_lock_file to release generic flock locks

2015-01-09 Thread Christoph Hellwig
On Fri, Jan 09, 2015 at 06:42:57AM -0800, Jeff Layton wrote: I'd suggest keeping an open coded loop in locks_remove_flock, which should both be more efficient and easier to review. I don't know. On the one hand, I rather like keeping all of the lock removal logic in a single spot. On

[PATCH 01/12] fs: deduplicate noop_backing_dev_info

2015-01-08 Thread Christoph Hellwig
hugetlbfs, kernfs and dlmfs can simply use noop_backing_dev_info instead of creating a local duplicate. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/hugetlbfs/inode.c| 14 +- fs/kernfs/inode.c | 14 +- fs/kernfs/kernfs-internal.h | 1 - fs/kernfs

backing_dev_info cleanups lifetime rule fixes

2015-01-08 Thread Christoph Hellwig
The first 8 patches are unchanged from the series posted a week ago and clean up how we use the backing_dev_info structure in preparation for fixing the life time rules for it. The most important change is to split the unrelated nommu mmap flags from it, but it also removes a backing_dev_info

[PATCH 12/12] fs: remove default_backing_dev_info

2015-01-08 Thread Christoph Hellwig
do. - we can assign noop_backing_dev_info as the default one in alloc_super. All filesystems already either assigned their own or noop_backing_dev_info. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/btrfs/disk-io.c | 2 +- fs/ceph/super.c | 2 +- fs

[PATCH 11/12] fs: don't reassign dirty inodes to default_backing_dev_info

2015-01-08 Thread Christoph Hellwig
that the bdi must always outlive the super block. Signed-off-by: Christoph Hellwig h...@lst.de --- mm/backing-dev.c | 91 +++- 1 file changed, 24 insertions(+), 67 deletions(-) diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 52e0c76..3ebba25

[PATCH 02/12] fs: kill BDI_CAP_SWAP_BACKED

2015-01-08 Thread Christoph Hellwig
This bdi flag isn't too useful - we can determine that a vma is backed by either swap or shmem trivially in the caller. This also allows removing the backing_dev_info instances for swap and shmem in favor of noop_backing_dev_info. Signed-off-by: Christoph Hellwig h...@lst.de --- include/linux

[PATCH 07/12] fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info

2015-01-08 Thread Christoph Hellwig
. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/btrfs/file.c | 2 +- fs/ceph/file.c | 2 +- fs/ext2/ialloc.c | 2 +- fs/ext4/super.c | 2 +- fs/fs-writeback.c| 3 ++- fs/fuse/file.c | 10

[PATCH 10/12] nfs: don't call bdi_unregister

2015-01-08 Thread Christoph Hellwig
bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Additionally remove the call during mount failure, as deactivate_super_locked will already call ->kill_sb and clean up the bdi for us. Signed-off-by: Christoph Hellwig h

[PATCH 03/12] fs: introduce f_op->mmap_capabilities for nommu mmap support

2015-01-08 Thread Christoph Hellwig
for the mtd_inodefs filesystem. Signed-off-by: Christoph Hellwig h...@lst.de --- Documentation/nommu-mmap.txt| 8 +-- block/blk-core.c| 2 +- drivers/char/mem.c | 64 ++-- drivers/mtd/mtdchar.c

[PATCH 08/12] fs: remove mapping->backing_dev_info

2015-01-08 Thread Christoph Hellwig
Now that we never use the backing_dev_info pointer in struct address_space we can simply remove it and save 4 to 8 bytes in every inode. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Ryusuke Konishi konishi.ryus...@lab.ntt.co.jp --- drivers/char/raw.c | 4 +--- fs/aio.c

[PATCH 06/12] nilfs2: set up s_bdi like the generic mount_bdev code

2015-01-08 Thread Christoph Hellwig
mapping->backing_dev_info will go away, so don't rely on it. Signed-off-by: Christoph Hellwig h...@lst.de Acked-by: Ryusuke Konishi konishi.ryus...@lab.ntt.co.jp --- fs/nilfs2/super.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c

[PATCH 09/12] ceph: remove call to bdi_unregister

2015-01-08 Thread Christoph Hellwig
bdi_destroy already does all the work, and if we delay freeing the anon bdev we can get away with just that single call. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/ceph/super.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/fs/ceph/super.c b/fs

[PATCH 05/12] block_dev: get bdev inode bdi directly from the block device

2015-01-08 Thread Christoph Hellwig
Directly grab the backing_dev_info from the request_queue instead of detouring through the address_space. Signed-off-by: Christoph Hellwig h...@lst.de --- fs/fs-writeback.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index

Re: krbd blk-mq support ?

2014-12-10 Thread Christoph Hellwig
On Thu, Nov 13, 2014 at 10:44:18AM +0100, Alexandre DERUMIER wrote: Did you manage to get those numbers? Not yet, I'll try next week. What's the result? I'd really like to get rid of old request drivers as much as possible.

Re: krbd blk-mq support ?

2014-11-12 Thread Christoph Hellwig
On Tue, Nov 04, 2014 at 08:19:32AM +0100, Alexandre DERUMIER wrote: Now : 3.18 kernel + your patch : 12 iops 3.10 kernel : 8iops I'll try 3.18 kernel without your patch to compare. Did you manage to get those numbers?

blk-mq: allow to defer ->queue_rq invocations to workqueue

2014-11-03 Thread Christoph Hellwig
Drivers that need to do synchronous, blocking operations to do I/O generally want to defer all I/O to a driver-private workqueue. Examples for that are the loop driver, rbd, or ubi block driver, and probably lots more that haven't been evaluated yet.

Re: [PATCH 2/2] blk-mq: allow direct dispatch to a driver specific workqueue

2014-11-03 Thread Christoph Hellwig
On Mon, Nov 03, 2014 at 04:40:47PM +0800, Ming Lei wrote: The above two aren't enough because the big problem is that drivers need a per-request work structure instead of 'hctx->run_work', otherwise there are at most NR_CPUS concurrent submissions. So the per-request work structure should be

Re: krbd blk-mq support ?

2014-11-03 Thread Christoph Hellwig
Hi Alexandre, can you try the patch below instead of the previous three patches? This one uses a per-request work struct to allow for more concurrency. diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index 0a54c58..b981096 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@

Re: krbd blk-mq support ?

2014-10-28 Thread Christoph Hellwig
On Mon, Oct 27, 2014 at 11:00:46AM +0100, Alexandre DERUMIER wrote: Can you do a perf report -ag and then a perf report to see where these cycles are spent? Yes, sure. I have attached the perf report to this mail. (This is with kernel 3.14, don't have access to my 3.18 host for now) Oh,

Re: krbd blk-mq support ?

2014-10-27 Thread Christoph Hellwig
On Sun, Oct 26, 2014 at 02:46:03PM +0100, Alexandre DERUMIER wrote: Hi, some news: I have applied patches successfully on top of 3.18-rc1 kernel. But don't seem to help in my case. (I think that blk-mq is working because I don't see any io schedulers on rbd devices, as blk-mq don't

Re: krbd blk-mq support ?

2014-10-24 Thread Christoph Hellwig
If you're willing to experiment give the patches below a try, note that I don't have a ceph test cluster available, so the conversion is untested. From 00668f00afc6f0cfbce05d1186116469c1f3f9b3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig h...@lst.de Date: Fri, 24 Oct 2014 11:53:36 +0200

Re: kerberos / AD requirements, blueprint

2014-10-23 Thread Christoph Hellwig
On Wed, Oct 22, 2014 at 06:46:06PM -0400, m...@linuxbox.com wrote: I think the overwhelming common implementation is AD - at all sizes of organizations from small to large. But most of those will be microsoft-only environments, so aren't particularly relevant to ceph. I don't have good stats

Re: [PATCH 2/5] block: add function to issue compare and write

2014-10-18 Thread Christoph Hellwig
On Fri, Oct 17, 2014 at 07:38:37PM -0400, Martin K. Petersen wrote: The problem with this is that, as it stands, a bio has no type. And it would suck if we couldn't keep bio rw and request flags in sync. I wonder if it would make more sense to move the remaining rq types to cmd_flags after

Re: [PATCH 2/5] block: add function to issue compare and write

2014-10-17 Thread Christoph Hellwig
On Thu, Oct 16, 2014 at 12:37:12AM -0500, micha...@cs.wisc.edu wrote: @@ -160,7 +160,7 @@ enum rq_flag_bits { __REQ_DISCARD, /* request to discard sectors */ __REQ_SECURE, /* secure discard (used with __REQ_DISCARD) */ __REQ_WRITE_SAME, /* write same

Re: Weekly performance meeting

2014-09-26 Thread Christoph Hellwig
On Fri, Sep 26, 2014 at 08:58:56AM -0400, Milosz Tanski wrote: First, I have recently submitted a series of patches to kernel to add a new preadv2 syscall that lets you do a fast read out of the page cache the point being that you can skip the whole disk IO queue in user space in the cases

Re: [PATCH] rbd: rework rbd_request_fn()

2014-08-05 Thread Christoph Hellwig
On Tue, Aug 05, 2014 at 11:38:44AM +0400, Ilya Dryomov wrote: While it was never a good idea to sleep in request_fn(), commit 34c6bc2c919a (locking/mutexes: Add extra reschedule point) made it a *bad* idea. mutex_lock() since 3.15 may reschedule *before* putting task on the mutex wait queue,

Re: Forever growing data in ceph using RBD image

2014-07-17 Thread Christoph Hellwig
On Thu, Jul 17, 2014 at 11:27:31AM -0700, Sage Weil wrote: I assume you are using kvm/qemu? It may be that older versions aren't passing through trims; Josh would know more. Or maybe the trim sizes are too small to let rados effectively deallocate entire objects. Logs might help there.

Re: v0.80.4 Firefly released

2014-07-16 Thread Christoph Hellwig
On Tue, Jul 15, 2014 at 04:45:59PM -0700, Sage Weil wrote: This Firefly point release fixes a potential data corruption problem when ceph-osd daemons run on top of XFS and service Firefly librbd clients. A recently added allocation hint that RBD utilizes triggers an XFS bug on some kernels

Re: [PATCH 2/4] fs: Prevent doing FALLOC_FL_ZERO_RANGE on append only file

2014-04-12 Thread Christoph Hellwig
On Fri, Apr 11, 2014 at 08:57:43PM +0200, Lukas Czerner wrote: /* - * It's not possible to punch hole or perform collapse range - * on append only file + * It's not possible to punch hole, perform collapse range + * or zero range on append only file */ -

Re: [PATCH 3/4] fs: Remove i_size check from do_fallocate

2014-04-12 Thread Christoph Hellwig
Looks good, but the subject line is misleading, it should read something like: fs: move falloc collapse range check into the filesystem methods Might also be worth mentioning that size checks for the other modes are in the filesystems in the long description. Reviewed-by: Christoph Hellwig

Re: [PATCH 4/4] fs: Disallow all fallocate operation on active swapfile

2014-04-12 Thread Christoph Hellwig
Given that the earlier patches were about races - what protects us from swapon racing with the check outside the filesystem locks?

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-04 Thread Christoph Hellwig
On Tue, Feb 04, 2014 at 11:33:35AM +, Steven Whitehouse wrote: To diverge from that topic for a moment, this thread has also brought together some discussion on another issue which I've been pondering recently that of whether the inode operations for get/set_xattr should take a dentry

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-03 Thread Christoph Hellwig
On Thu, Jan 30, 2014 at 02:01:38PM -0800, Linus Torvalds wrote: In the end, all the original call-sites should have a dentry, and none of this is fundamental. But you're right, it looks like an absolute nightmare to add the dentry pointer through the whole chain. Damn. So I'm not thrilled

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-03 Thread Christoph Hellwig
On Mon, Feb 03, 2014 at 01:03:32PM -0800, Linus Torvalds wrote: Now, to be honest, pushing it down one more level (to generic_permission()) will actually start causing some trouble. In particular, gfs2_permission() fundamentally does not have a dentry for several of the callers. Looking over

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-03 Thread Christoph Hellwig
On Mon, Feb 03, 2014 at 09:19:55PM +, Al Viro wrote: Result *is* a function of inode alone; the problem with 9P is that we are caching FIDs in the wrong place. I don't think that's true for CIFS unfortunately, which is path based.

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-02-03 Thread Christoph Hellwig
On Mon, Feb 03, 2014 at 09:31:53PM +, Al Viro wrote: Yes, and...? CIFS also doesn't have hardlinks, so _there_ d_find_alias() is just fine. It does have hardlinks, look at cifs_hardlink and functions called from it.

Re: [GIT PULL] Ceph updates for -rc1

2014-01-30 Thread Christoph Hellwig
On Wed, Jan 29, 2014 at 06:30:00AM -0800, Sage Weil wrote: The set_acl inode_operation wasn't getting set, and the prototype needed to be adjusted a bit (it doesn't take a dentry anymore). All seems to be well with the below patch. Btw, there's a few minor bits that should go on top of

Re: [PATCH v2] ceph: fix posix ACL hooks

2014-01-29 Thread Christoph Hellwig
On Wed, Jan 29, 2014 at 11:09:18AM -0800, Linus Torvalds wrote: So attached is the incremental diff of the patch by Sage and Ilya, and I'll apply it (delayed a bit to see if I can get the sign-off from Ilya), but I also think we should fix the (non-cached) ACL functions that call down to the

Re: os recommendations

2013-11-27 Thread Christoph Hellwig
On Tue, Nov 26, 2013 at 06:50:33AM -0800, Sage Weil wrote: If syncfs(2) is not present, we have to use sync(2). That means you have N daemons calling sync(2) to force a commit on a single fs, but all other mounted fs's are also synced... which means N times the sync(2) calls. Fortunately
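The fallback described in the quoted text (use syncfs(2) when it is present, otherwise fall back to sync(2)) can be sketched as below. This is a Python illustration, not the ceph-osd implementation. It assumes a Linux libc reachable through ctypes, and `commit_filesystem` is a hypothetical helper name.

```python
import ctypes
import errno
import os

# Handle to the C library already loaded into this process (Linux assumption).
_libc = ctypes.CDLL(None, use_errno=True)


def commit_filesystem(path):
    """Flush the filesystem containing path.

    Uses syncfs(2) when libc and the kernel provide it, otherwise falls
    back to a global sync(2). Returns the name of the call that was used.
    """
    syncfs = getattr(_libc, "syncfs", None)  # None if libc lacks the symbol
    if syncfs is not None:
        fd = os.open(path, os.O_RDONLY)
        try:
            if syncfs(fd) == 0:
                return "syncfs"
            err = ctypes.get_errno()
        finally:
            os.close(fd)
        if err != errno.ENOSYS:
            raise OSError(err, os.strerror(err))
        # ENOSYS: the kernel does not implement syncfs, fall through.
    os.sync()  # flushes every mounted filesystem, not just this one
    return "sync"
```

With several daemons on one host, the `sync` branch is the expensive one: each call flushes all mounted filesystems rather than only the one holding the daemon's data.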

Re: os recommendations

2013-11-26 Thread Christoph Hellwig
On Tue, Nov 26, 2013 at 11:43:07AM +0100, Dominik Mostowiec wrote: Hi, I found in doc: http://ceph.com/docs/master/start/os-recommendations/ Putting multiple ceph-osd daemons using XFS or ext4 on the same host will not perform as well as they could. For now recommended filesystem is XFS.

Re: poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Christoph Hellwig
On Sun, Oct 20, 2013 at 08:58:58PM -0700, Sage Weil wrote: It looks like without LVM we're getting 128KB requests (which IIRC is typical), but with LVM it's only 4KB. Unfortunately my memory is a bit fuzzy here, but I seem to recall a property on the request_queue or device that affected

Re: poor read performance on rbd+LVM, LVM overload

2013-10-21 Thread Christoph Hellwig
On Mon, Oct 21, 2013 at 11:01:29AM -0400, Mike Snitzer wrote: It isn't DM that splits the IO into 4K chunks; it is the VM subsystem no? Well, it's the block layer based on what DM tells it. Take a look at dm_merge_bvec From dm_merge_bvec: /* * If the target doesn't support

Re: xattr limits

2013-10-04 Thread Christoph Hellwig
Might be good to send the crash report to the XFS list.. On Thu, Oct 03, 2013 at 11:54:29PM -0700, David Zafman wrote: Here is the test script: David Zafman Senior Developer http://www.inktank.com On Oct 3, 2013, at 11:02 PM, Loic Dachary l...@dachary.org wrote: Hi David,

Re: [PATCH v1 11/11] locks: give the blocked_hash its own spinlock

2013-06-04 Thread Christoph Hellwig
Having RCU for modification mostly workloads never is a good idea, so I don't think it makes sense to mention it here. If you care about the overhead it's worth trying to use per-cpu lists, though.

Re: Bobtail vs Argonaut Performance Preview

2012-12-22 Thread Christoph Hellwig
On Thu, Dec 20, 2012 at 11:08:19AM -0500, Patrick McGarry wrote: Hey All, Inktank's Mark Nelson just posted a great performance preview of Bobtail with comparison to Argonaut. Feel free to check it out: http://ow.ly/gg87B What's the problem with using a proper link instead of these

Re: Bobtail vs Argonaut Performance Preview

2012-12-22 Thread Christoph Hellwig
On Sat, Dec 22, 2012 at 07:36:41AM -0600, Mark Nelson wrote: Btw Christoph, thank you for taking the time to read my article. If I've done anything dumb or suboptimal regarding xfs, please do let me know. Soon I will be doing parametric sweeps over ceph parameter spaces to see how

Re: Bobtail vs Argonaut Performance Preview

2012-12-22 Thread Christoph Hellwig
On Sat, Dec 22, 2012 at 01:44:15PM -0600, Mark Nelson wrote: Is inode64 typically faster than inode32? I thought I remembered dchinner saying that the situation wasn't always particularly clear and it depended on the workload. Having said that, I can't really see it not being a good thing

Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX

2012-12-13 Thread Christoph Hellwig
On Mon, Dec 10, 2012 at 07:11:44AM -1000, Sam Lang wrote: Is libaio really needed to build ceph-fuse? I use macports on my system and the last time I tried to make a change set to let ceph/ceph-fuse build on my laptop failed as I didn't have libaio, though I could just write a port for it.

Re: TIER: combine SSDs and HDDs into a single block device

2012-08-03 Thread Christoph Hellwig
On Thu, Aug 02, 2012 at 04:49:11PM -0500, Mark Nelson wrote: I was thinking of doing that. Is the realtime allocator a good fit for this kind of thing? I think dchinner mentioned on the xfs mailing list last year that it's single threaded and not very well optimized (and maybe not production

Re: TIER: combine SSDs and HDDs into a single block device

2012-08-02 Thread Christoph Hellwig
On Thu, Aug 02, 2012 at 12:02:44PM -0500, Mark Nelson wrote: Alex is also trying to bug the XFS guys (and Sage bugged the BTRFS guys) about ways to put metadata on SSD while keeping data on spinning disk. It sounds like there is a hack for XFS that would let us keep inodes in the lower portion

Re: Unable to restart Mon after reboot

2012-07-03 Thread Christoph Hellwig
On Tue, Jul 03, 2012 at 09:44:38AM -0700, Tommi Virtanen wrote: We've seen similar issues with btrfs, and others have reported that the large metadata btrfs option helps. We're still compiling information, but as of right now I hear best performance tends to happen with xfs; however, the lead

Re: Unable to restart Mon after reboot

2012-07-03 Thread Christoph Hellwig
On Tue, Jul 03, 2012 at 10:09:33AM -0700, Sage Weil wrote: The OSD keeps directories small on its own by breaking the contents of large directories into smaller subdirectories. Right, that's what I remembered. At least for XFS that'll actually give you much worse allocation patterns as each

Re: FS / Kernel question choosing the correct kernel version

2012-06-26 Thread Christoph Hellwig
On Mon, Jun 25, 2012 at 03:11:17PM -0700, Sage Weil wrote: On Sat, 23 Jun 2012, Stefan Priebe wrote: Hi, i got stuck while selecting the right FS for ceph / RBD. XFS: - deadlock / hung task under 3.0.34 in xfs_ilock / xfs_buf_lock while syncfs There was an ilock fix that went

Re: all rbd users: set 'filestore fiemap = false'

2012-06-22 Thread Christoph Hellwig
On Mon, Jun 18, 2012 at 08:32:50AM -0700, Sage Weil wrote: On Mon, 18 Jun 2012, Christoph Hellwig wrote: On Sun, Jun 17, 2012 at 09:02:15PM -0700, Sage Weil wrote: that data over the wire. We have observed incorrect/changing FIEMAP on both btrfs: both btrfs and? Whoops

Re: [PATCH] ceph: use a shared zero page rather than one per messenger

2012-02-28 Thread Christoph Hellwig
On Tue, Feb 28, 2012 at 07:06:22PM -0800, Alex Elder wrote: Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Just allocate one at initialization time and have them all share it. Any reason you don't simply use the

Re: [PATCH 0/6] ceph: virtual extended attribute cleanup

2012-02-28 Thread Christoph Hellwig
On Tue, Feb 28, 2012 at 07:17:41PM -0800, Alex Elder wrote: This series cleans up some code involving ceph's virtual extended attributes. Three of them define some simple macros that are set up to help ensure the attributes are defined in a consistent way. One makes the size of certain constant

Re: [PATCH 1/2] vfs: export symbol d_find_any_alias()

2012-01-12 Thread Christoph Hellwig
-by: Christoph Hellwig h...@lst.de

Re: [PATCH 1/2] vfs: export symbol d_find_any_alias()

2012-01-11 Thread Christoph Hellwig
On Wed, Jan 11, 2012 at 10:46:41AM -0800, Sage Weil wrote: Ceph needs this. Signed-off-by: Sage Weil s...@newdream.net Can you add a kerneldoc comment now that it is exported? -static struct dentry * d_find_any_alias(struct inode *inode) +struct dentry * d_find_any_alias(struct inode

Re: [PATCH 2/2] ceph: enable/disable dentry complete flags via mount option

2012-01-11 Thread Christoph Hellwig
+ dcache +Use the dcache contents to perform negative lookups and +readdir when the client has the entire directory contents in +its cache. (This does not change correctness; the client uses +cached metadata only when a lease or capability ensures it is +

Re: [PATCH 1/3] ceph: take inode lock when finding an inode alias

2011-12-29 Thread Christoph Hellwig
On Wed, Dec 28, 2011 at 06:05:13PM -0800, Sage Weil wrote: +/* The following code copied from fs/dcache.c */ +static struct dentry * d_find_any_alias(struct inode *inode) +{ + struct dentry *de; + + spin_lock(&inode->i_lock); + de = __d_find_any_alias(inode); +

Re: [PATCH 2/3] ceph: take a reference to the dentry in d_find_any_alias()

2011-12-29 Thread Christoph Hellwig
On Wed, Dec 28, 2011 at 06:05:14PM -0800, Sage Weil wrote: From: Alex Elder el...@dreamhost.com The ceph code duplicates __d_find_any_alias(), but it currently does not take a reference to the returned dentry as it should. Replace the ceph implementation with an exact copy of what's found
