[RESEND][PATCH 0/2] fix a few root xattr bugs

2013-04-18 Thread Kuan Kai Chiu
The first patch fixes a bug that causes an MDS crash while setting or removing xattrs on the root directory. The second patch fixes another bug where root xattrs are not correctly logged in the MDS journal. Kuan Kai Chiu (2): mds: fix setting/removing xattrs on root mds: journal the projected root xattrs

[PATCH 1/2] mds: fix setting/removing xattrs on root

2013-04-18 Thread Kuan Kai Chiu
The MDS crashes while journaling the dirty root inode in handle_client_setxattr and handle_client_removexattr. We should use journal_dirty_inode to safely log the root inode here. Signed-off-by: Kuan Kai Chiu big.c...@bigtera.com --- src/mds/Server.cc | 6 ++ 1 file changed, 2 insertions(+), 4

[PATCH 2/2] mds: journal the projected root xattrs in add_root()

2013-04-18 Thread Kuan Kai Chiu
In EMetaBlob::add_root(), we should log the projected root xattrs instead of the original ones to reflect xattr changes. Signed-off-by: Kuan Kai Chiu big.c...@bigtera.com --- src/mds/events/EMetaBlob.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mds/events/EMetaBlob.h

Re: RBD Read performance

2013-04-18 Thread Mark Nelson
On 04/17/2013 11:35 PM, Malcolm Haak wrote: Hi all, Hi Malcolm! I jumped into the IRC channel yesterday and they said to email ceph-devel. I have been having some read performance issues, with reads being slower than writes by a factor of ~5-8. I recently saw this kind of behaviour

Re: [PATCH] mds: fix setting/removing xattrs on root

2013-04-18 Thread Big Chiu
I didn't notice the bug. Guessing it was hidden because CephFS had been accessed by other daemons in my test environment. Thank you for the hint! The signed-off patches are resent, also including your fix. On Wed, Apr 17, 2013 at 4:06 AM, Gregory Farnum g...@inktank.com wrote: On Mon, Apr 15,

Re: RBD Read performance

2013-04-18 Thread Malcolm Haak
Hi Mark! Thanks for the quick reply! I'll reply inline below. On 18/04/13 17:04, Mark Nelson wrote: On 04/17/2013 11:35 PM, Malcolm Haak wrote: Hi all, Hi Malcolm! I jumped into the IRC channel yesterday and they said to email ceph-devel. I have been having some read performance issues.

poor write performance

2013-04-18 Thread James Harper
I'm doing some basic testing so I'm not really fussed about poor performance, but my write performance appears to be so bad I think I'm doing something wrong. Using dd to test gives me kbytes/second for write performance for 4kb block sizes, while read performance is acceptable (for testing at
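The kind of sequential 4 KB write test being described can be sketched as below. This is a minimal, hypothetical harness writing to a temp file; a real test against RBD would target the mounted device and bypass the page cache (as dd's oflag=direct does), so the fsync here is only a rough stand-in:

```python
import os
import tempfile
import time

def seq_write_bw(path: str, bs: int = 4096, count: int = 2560) -> float:
    """Write `count` sequential blocks of `bs` bytes; return bytes/sec."""
    buf = b"\0" * bs
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache
    return bs * count / (time.perf_counter() - start)

with tempfile.NamedTemporaryFile() as tmp:
    bw = seq_write_bw(tmp.name)
    print(f"sequential 4 KiB write: {bw / 1024:.0f} KiB/s")
```

Results in the kilobytes-per-second range for such a test, as reported above, would indeed point at a configuration problem rather than a hardware limit.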

Re: poor write performance

2013-04-18 Thread Wolfgang Hennerbichler
Hi James, This is just pure speculation, but can you verify that the bonding works correctly? Maybe you have issues there. I have seen a lot of incorrectly configured bonding throughout my life as a Unix admin. Maybe this could help you a little:

[PATCH 7/7, v2] rbd: issue stat request before layered write

2013-04-18 Thread Alex Elder
(Since this hasn't been reviewed I have updated it slightly. I rebased the series onto the current testing branch. They are all available in the review/wip-4679-3 branch in the ceph-client git repository. I also made some minor changes in the definition of rbd_img_obj_exists_callback().) This is a

[PATCH V2] radosgw: receiving unexpected error code while accessing a non-existing object by an authorized not-owner user

2013-04-18 Thread Li Wang
This patch fixes a bug in the radosgw Swift compatibility code: if a not-owner but authorized user accesses a non-existing object in a container, he will receive an unexpected error code. To reproduce this bug, do the following steps: 1 User1 creates a container, and grants the read/write permission
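Assuming the intended Swift semantics are that an authorized reader of the container gets 404 for a missing object while an unauthorized user gets 403, the expected status selection can be sketched as a tiny decision function (names are illustrative, not radosgw code):

```python
# Hypothetical sketch of Swift-style status selection for a GET on an
# object; function and parameter names are illustrative only.
def object_get_status(user_authorized: bool, object_exists: bool) -> int:
    if not user_authorized:
        return 403  # Forbidden: no read grant on the container
    if not object_exists:
        return 404  # Not Found: authorized reader, but object is missing
    return 200      # OK: authorized and object present

print(object_get_status(user_authorized=True, object_exists=False))  # → 404
```

The bug described above is the authorized, not-owner case returning something other than the expected 404.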

Re: poor write performance

2013-04-18 Thread Mark Nelson
On 04/18/2013 06:46 AM, James Harper wrote: I'm doing some basic testing so I'm not really fussed about poor performance, but my write performance appears to be so bad I think I'm doing something wrong. Using dd to test gives me kbytes/second for write performance for 4kb block sizes, while

Re: test osd on zfs

2013-04-18 Thread Sage Weil
On Thu, 18 Apr 2013, Stefan Priebe - Profihost AG wrote: Am 17.04.2013 um 23:14 schrieb Brian Behlendorf behlendo...@llnl.gov: On 04/17/2013 01:16 PM, Mark Nelson wrote: I'll let Brian talk about the virtues of ZFS, I think the virtues of ZFS have been discussed at length in various

Xen blktap driver for Ceph RBD : Anybody wants to test ? :p

2013-04-18 Thread Sylvain Munaut
Hi, I've been working on a blktap driver that allows access to Ceph RBD block devices without relying on the RBD kernel driver, and it has finally gotten to a point where it works and is testable. Some of the advantages are: - Easier to update to a newer RBD version - Allows functionality

Re: [PATCH] Swift ACL .rlistings support

2013-04-18 Thread Yehuda Sadeh
Sorry for the late response; this somehow fell through the cracks. The main issue I see with this patch is that it introduces a new bit for object listing that is not really needed. You just need to set RGW_PERM_READ on the bucket. This way, by setting this flag through Swift, you'd be able to

Re: poor write performance

2013-04-18 Thread Andrey Korolyov
On Thu, Apr 18, 2013 at 5:43 PM, Mark Nelson mark.nel...@inktank.com wrote: On 04/18/2013 06:46 AM, James Harper wrote: I'm doing some basic testing so I'm not really fussed about poor performance, but my write performance appears to be so bad I think I'm doing something wrong. Using dd to

Re: poor write performance

2013-04-18 Thread Mark Nelson
On 04/18/2013 11:46 AM, Andrey Korolyov wrote: On Thu, Apr 18, 2013 at 5:43 PM, Mark Nelson mark.nel...@inktank.com wrote: On 04/18/2013 06:46 AM, James Harper wrote: I'm doing some basic testing so I'm not really fussed about poor performance, but my write performance appears to be so bad I

Re: [fuse-devel] fuse_lowlevel_notify_inval_inode deadlock

2013-04-18 Thread Sage Weil
On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 9:45 PM, Sage Weil s...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 5:43 PM, Sage Weil s...@inktank.com wrote: We've hit a new deadlock with fuse_lowlevel_notify_inval_inode,

Re: [RESEND][PATCH 0/2] fix a few root xattr bugs

2013-04-18 Thread Gregory Farnum
Thanks! I merged these into next (going to be Cuttlefish) in commits f379ce37bfdcb3670f52ef47c02787f82e50e612 and 87634d882fda80c4a2e3705c83a38bdfd613763f. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Wed, Apr 17, 2013 at 11:43 PM, Kuan Kai Chiu big.c...@bigtera.com

Re: [fuse-devel] fuse_lowlevel_notify_inval_inode deadlock

2013-04-18 Thread Anand Avati
On Apr 18, 2013, at 10:05 AM, Sage Weil s...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 9:45 PM, Sage Weil s...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 5:43 PM, Sage Weil s...@inktank.com wrote: We've

Re: [fuse-devel] fuse_lowlevel_notify_inval_inode deadlock

2013-04-18 Thread Sage Weil
On Thu, 18 Apr 2013, Anand Avati wrote: On Apr 18, 2013, at 10:05 AM, Sage Weil s...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 9:45 PM, Sage Weil s...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 5:43 PM,

Re: [fuse-devel] fuse_lowlevel_notify_inval_inode deadlock

2013-04-18 Thread Anand Avati
On 04/18/2013 12:12 PM, Sage Weil wrote: On Thu, 18 Apr 2013, Anand Avati wrote: On Apr 18, 2013, at 10:05 AM, Sage Weils...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote: On Wed, Apr 17, 2013 at 9:45 PM, Sage Weils...@inktank.com wrote: On Wed, 17 Apr 2013, Anand Avati wrote:

erasure coding (sorry)

2013-04-18 Thread Plaetinck, Dieter
sorry to bring this up again; googling revealed some people don't like the subject [anymore]. But I'm working on a new ~3 PB cluster for storage of immutable files, and it would be either all cold data or mostly cold. 150 MB avg file size, max size 5 GB (for now). For this use case, my impression
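To put numbers on why erasure coding is attractive for a mostly-cold cluster like this, a quick raw-capacity comparison of 3x replication against an erasure-code profile (the 8+3 profile is an illustrative assumption, not something from the thread):

```python
usable_pb = 3.0   # target usable capacity from the post (~3 PB)
replication = 3   # raw bytes stored per usable byte with 3x replication

k, m = 8, 3       # example erasure-code profile: 8 data + 3 coding chunks
ec_overhead = (k + m) / k  # raw bytes per usable byte = 11/8 = 1.375

print(f"3x replication: {usable_pb * replication:.2f} PB raw")
print(f"EC {k}+{m}:       {usable_pb * ec_overhead:.3f} PB raw")
```

For large immutable files the reconstruction-read cost of erasure coding matters less, which is why cold archives are the classic fit.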

Re: erasure coding (sorry)

2013-04-18 Thread Sage Weil
On Thu, 18 Apr 2013, Plaetinck, Dieter wrote: sorry to bring this up again, googling revealed some people don't like the subject [anymore]. but I'm working on a new +- 3PB cluster for storage of immutable files. and it would be either all cold data, or mostly cold. 150MB avg filesize, max

Re: erasure coding (sorry)

2013-04-18 Thread Mark Nelson
On 04/18/2013 04:08 PM, Josh Durgin wrote: On 04/18/2013 01:47 PM, Sage Weil wrote: On Thu, 18 Apr 2013, Plaetinck, Dieter wrote: sorry to bring this up again, googling revealed some people don't like the subject [anymore]. but I'm working on a new +- 3PB cluster for storage of immutable

Re: erasure coding (sorry)

2013-04-18 Thread Noah Watkins
On Apr 18, 2013, at 2:08 PM, Josh Durgin josh.dur...@inktank.com wrote: I talked to some folks interested in doing a more limited form of this yesterday. They started a blueprint [1]. One of their ideas was to have erasure coding done by a separate process (or thread perhaps). It would use

Re: erasure coding (sorry)

2013-04-18 Thread Sage Weil
On Thu, 18 Apr 2013, Noah Watkins wrote: On Apr 18, 2013, at 2:08 PM, Josh Durgin josh.dur...@inktank.com wrote: I talked to some folks interested in doing a more limited form of this yesterday. They started a blueprint [1]. One of their ideas was to have erasure coding done by a separate

RE: poor write performance

2013-04-18 Thread James Harper
Where should I start looking for performance problems? I've tried running some of the benchmark stuff in the documentation but I haven't gotten very far... Hi James! Sorry to hear about the performance trouble! Is it just sequential 4KB direct IO writes that are giving you trouble?

Re: RBD Read performance

2013-04-18 Thread Malcolm Haak
Morning all, Did the echoes on all boxes involved... and the results are in..
[root@dogbreath ~]# dd if=/todd-rbd-fs/DELETEME of=/dev/null bs=4M count=1 iflag=direct
1+0 records in
1+0 records out
4194304 bytes (42 GB) copied, 144.083 s, 291 MB/s

Re: erasure coding (sorry)

2013-04-18 Thread Christopher LILJENSTOLPE
Supposedly, on 2013-Apr-18, at 14.08 PDT(-0700), someone claiming to be Josh Durgin scribed: On 04/18/2013 01:47 PM, Sage Weil wrote: On Thu, 18 Apr 2013, Plaetinck, Dieter wrote: sorry to bring this up again, googling revealed some people don't like the subject [anymore]. but I'm working

Re: RBD Read performance

2013-04-18 Thread Mark Nelson
On 04/18/2013 07:27 PM, Malcolm Haak wrote: Morning all, Did the echos on all boxes involved... and the results are in.. [root@dogbreath ~]# [root@dogbreath ~]# dd if=/todd-rbd-fs/DELETEME of=/dev/null bs=4M count=1 iflag=direct 1+0 records in 1+0 records out 4194304 bytes (42

Re: erasure coding (sorry)

2013-04-18 Thread Christopher LILJENSTOLPE
Supposedly, on 2013-Apr-18, at 14.31 PDT(-0700), someone claiming to be Plaetinck, Dieter scribed: On Thu, 18 Apr 2013 16:09:52 -0500 Mark Nelson mark.nel...@inktank.com wrote: On 04/18/2013 04:08 PM, Josh Durgin wrote: On 04/18/2013 01:47 PM, Sage Weil wrote: On Thu, 18 Apr 2013,

Re: erasure coding (sorry)

2013-04-18 Thread Christopher LILJENSTOLPE
Supposedly, on 2013-Apr-18, at 14.24 PDT(-0700), someone claiming to be Noah Watkins scribed: On Apr 18, 2013, at 2:08 PM, Josh Durgin josh.dur...@inktank.com wrote: I talked to some folks interested in doing a more limited form of this yesterday. They started a blueprint [1]. One of their

Re: erasure coding (sorry)

2013-04-18 Thread Christopher LILJENSTOLPE
Supposedly, on 2013-Apr-18, at 14.26 PDT(-0700), someone claiming to be Sage Weil scribed: On Thu, 18 Apr 2013, Noah Watkins wrote: On Apr 18, 2013, at 2:08 PM, Josh Durgin josh.dur...@inktank.com wrote: I talked to some folks interested in doing a more limited form of this yesterday. They

Re: RBD Read performance

2013-04-18 Thread Malcolm Haak
Ok, this is getting interesting.
rados -p pool bench 300 write --no-cleanup
Total time run:         301.103933
Total writes made:      22477
Write size:             4194304
Bandwidth (MB/sec):     298.595
Stddev Bandwidth:       171.941
Max bandwidth (MB/sec): 832
Min bandwidth (MB/sec): 8
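As a sanity check, the reported bandwidth figure is consistent with dividing the total data written (writes × 4 MiB object size, counted in MB) by the elapsed time:

```python
# Recompute the rados bench bandwidth from the totals in the output above.
total_time = 301.103933                   # seconds
total_writes = 22477                      # objects written
write_size_mb = 4194304 / (1024 * 1024)   # 4 MiB per object

bandwidth = total_writes * write_size_mb / total_time
print(f"{bandwidth:.3f} MB/sec")  # → 298.595, matching the bench report
```

The large standard deviation (171.941) against that mean, with a minimum of 8 MB/sec, is the more interesting signal: throughput is highly bursty, not uniformly fast.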