Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Nathan Cutler
 We've since merged something 
 that stripes over several small xattrs so that we can keep things inline, 
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for hammer backport?

Nathan


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Abhishek L
On Wed, Jun 17, 2015 at 1:02 PM, Nathan Cutler ncut...@suse.cz wrote:
 We've since merged something
 that stripes over several small xattrs so that we can keep things inline,
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

 Hi Sage:

 You wrote "yet" - should we earmark it for hammer backport?

I'm guessing https://github.com/ceph/ceph/pull/4973 is the backport for hammer
(issue http://tracker.ceph.com/issues/11981)

Regards
Abhishek


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Sage Weil
On Wed, 17 Jun 2015, Nathan Cutler wrote:
  We've since merged something 
  that stripes over several small xattrs so that we can keep things inline, 
  but it hasn't been backported to hammer yet.  See
  c6cdb4081e366f471b372102905a1192910ab2da.
 
 Hi Sage:
 
 You wrote "yet" - should we earmark it for hammer backport?

Yes, please!

sage


[ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Cephers,
While looking at disk utilization on an OSD, I noticed the disk was constantly
busy with a large number of small writes. Further investigation showed that,
because radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.),
the xattrs spill from the inode (local format) to extents, which incurs extra I/O.
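One way to quantify this on a live OSD is to list the xattrs on an object file
under the filestore data directory. A minimal Python sketch (Linux only; the
object path below is hypothetical, and this only shows sizes, not whether XFS
kept the attr fork inside the inode):

import os

# Hypothetical path to one object file under an OSD's filestore data directory.
obj = "/var/lib/ceph/osd/ceph-0/current/11.7_head/myobject__head_0_obj"

total = 0
for name in os.listxattr(obj):           # names of all xattrs on the file
    size = len(os.getxattr(obj, name))   # size of each xattr value, in bytes
    total += size
    print("%-40s %5d bytes" % (name, size))
print("total xattr payload: %d bytes" % total)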

I would like to check if anybody has experience with offloading the metadata to 
omap:
  1. Offload everything to omap? If so, should we make the inode size 512 bytes
(instead of 2k)?
  2. Partially offload the metadata to omap, e.g. only offloading the
rgw-specific metadata to omap.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang 


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Somnath Roy
Guang,
Try to play around with the following conf attributes, especially
filestore_max_inline_xattr_size and filestore_max_inline_xattrs:

// Use omap for xattrs for attrs over
// filestore_max_inline_xattr_size or
OPTION(filestore_max_inline_xattr_size, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattr_size_xfs, OPT_U32, 65536)
OPTION(filestore_max_inline_xattr_size_btrfs, OPT_U32, 2048)
OPTION(filestore_max_inline_xattr_size_other, OPT_U32, 512)

// for more than filestore_max_inline_xattrs attrs
OPTION(filestore_max_inline_xattrs, OPT_U32, 0) //Override
OPTION(filestore_max_inline_xattrs_xfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_btrfs, OPT_U32, 10)
OPTION(filestore_max_inline_xattrs_other, OPT_U32, 2)

I think the behavior for XFS is that if there are more than 10 xattrs, it will use omap.
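To make that concrete, here is a rough sketch (not the actual FileStore code) of
the split those options express, using the XFS defaults quoted above; the
function name and structure are illustrative only:

# Simplified model of the inline-vs-omap split controlled by the options above;
# the real FileStore logic has more cases, so treat this only as an illustration.
MAX_INLINE_XATTR_SIZE_XFS = 65536   # filestore_max_inline_xattr_size_xfs
MAX_INLINE_XATTRS_XFS = 10          # filestore_max_inline_xattrs_xfs

def split_attrs(attrs):
    """attrs: dict of name -> value bytes. Returns (inline, omap) dicts."""
    inline, omap = {}, {}
    for name, value in attrs.items():
        too_big = len(value) > MAX_INLINE_XATTR_SIZE_XFS
        no_room = len(inline) >= MAX_INLINE_XATTRS_XFS
        if too_big or no_room:
            omap[name] = value       # pushed to the object map (leveldb)
        else:
            inline[name] = value     # kept as a filesystem xattr
    return inline, omap

# Example with the rgw attr sizes reported later in this thread.
rgw = {"user.rgw.idtag": b"x" * 15, "user.rgw.manifest": b"x" * 381,
       "user.rgw.acl": b"x" * 121, "user.rgw.etag": b"x" * 33}
inline, omap = split_attrs(rgw)
print(len(inline), "inline,", len(omap), "in omap")   # 4 inline, 0 in omap

Note that these options only control FileStore's xattr-vs-omap split; whether
XFS keeps the attr fork inside the inode is a separate question of the 255-byte
value limit that Sage mentions later in the thread.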

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
GuangYang
Sent: Tuesday, June 16, 2015 11:31 AM
To: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: [ceph-users] xattrs vs. omap with radosgw

Hi Cephers,
While looking at disk utilization on an OSD, I noticed the disk was constantly
busy with a large number of small writes. Further investigation showed that,
because radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.),
the xattrs spill from the inode (local format) to extents, which incurs extra I/O.

I would like to check if anybody has experience with offloading the metadata to 
omap:
  1. Offload everything to omap? If so, should we make the inode size 512 bytes
(instead of 2k)?
  2. Partially offload the metadata to omap, e.g. only offloading the
rgw-specific metadata to omap.

Any sharing is deeply appreciated. Thanks!

Thanks,
Guang


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread Sage Weil
On Wed, 17 Jun 2015, Zhou, Yuan wrote:
 FWIW, there was some discussion in OpenStack Swift and their performance
 tests showed 255 is not the best boundary in recent XFS. They decided to use a
 large xattr boundary size (65535).
 
 https://gist.github.com/smerritt/5e7e650abaa20599ff34

If I read this correctly, the total metadata they are setting is pretty
big:

PILE_O_METADATA = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512("thingy %d" % i).hexdigest())
    for i in range(200)))

So lots of small attrs won't really help since they'll have to spill out 
into other extents eventually no matter what.

In our case, we have big (2k) inodes and can easily fit everything in 
there.. as long as it is in 255 byte pieces.
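For reference, the approach in c6cdb4081e366f471b372102905a1192910ab2da is to
stripe one logical attribute value across several small xattr chunks so that
each piece stays inline-friendly. A minimal Python sketch of that striping idea
(the '@N' chunk-naming scheme here is made up for illustration, not Ceph's
actual on-disk naming):

CHUNK = 254  # keep each piece small enough for XFS to store the value inline

def stripe_xattr(name, value):
    """Split one logical xattr into (name, chunk) pairs of <= CHUNK bytes."""
    chunks = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    pairs = [(name, chunks[0])]
    pairs += [("%s@%d" % (name, n), c) for n, c in enumerate(chunks[1:], start=1)]
    return pairs

def unstripe_xattr(pairs):
    """Reassemble the logical value from its striped pieces."""
    def index(item):
        n, _ = item
        return 0 if "@" not in n else int(n.rsplit("@", 1)[1])
    return b"".join(c for _, c in sorted(pairs, key=index))

manifest = b"m" * 381                    # e.g. a 381-byte rgw.manifest value
pieces = stripe_xattr("user.rgw.manifest", manifest)
assert unstripe_xattr(pieces) == manifest
print([(n, len(c)) for n, c in pieces])  # two pieces: 254 + 127 bytes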

sage


 
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
 Sent: Wednesday, June 17, 2015 3:43 AM
 To: GuangYang
 Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 On Tue, 16 Jun 2015, GuangYang wrote:
  Hi Cephers,
  While looking at disk utilization on an OSD, I noticed the disk was constantly
  busy with a large number of small writes. Further investigation showed that,
  because radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.),
  the xattrs spill from the inode (local format) to extents, which incurs extra I/O.
  
  I would like to check if anybody has experience with offloading the 
  metadata to omap:
    1. Offload everything to omap? If so, should we make the
  inode size 512 bytes (instead of 2k)?
    2. Partially offload the metadata to omap, e.g. only offloading the
  rgw-specific metadata to omap.
  
  Any sharing is deeply appreciated. Thanks!
 
 Hi Guang,
 
 Is this hammer or firefly?
 
 With hammer the size of object_info_t crossed the 255 byte boundary, which is 
 the max xattr value that XFS can inline.  We've since merged something that 
 stripes over several small xattrs so that we can keep things inline, but it 
 hasn't been backported to hammer yet.  See 
 c6cdb4081e366f471b372102905a1192910ab2da.  Perhaps this is what you're seeing?
 
 I think we're still better off with larger XFS inodes and inline xattrs if it 
 means we avoid leveldb at all for most objects.
 
 sage


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
After back-porting Sage's patch to Giant, the xattrs written by radosgw can be
stored inline. I haven't run extensive testing yet; I will update once I have
some performance data to share.

Thanks,
Guang

 Date: Tue, 16 Jun 2015 15:51:44 -0500
 From: mnel...@redhat.com
 To: yguan...@outlook.com; s...@newdream.net
 CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 
 
 On 06/16/2015 03:48 PM, GuangYang wrote:
  Thanks Sage for the quick response.
 
  It is on Firefly v0.80.4.
 
  When putting objects with *rados* directly, the xattrs can stay inline. The
  problem comes to light when using radosgw, since we have a bunch of
  metadata to keep via xattrs, including:
  rgw.idtag  : 15 bytes
  rgw.manifest :  381 bytes
 
 Ah, that manifest will push us over the limit afaik resulting in every 
 inode getting a new extent.
 
  rgw.acl : 121 bytes
  rgw.etag : 33 bytes
 
  Given the background, it looks like the problem is that the rgw.manifest is
  too large, so XFS moves it to extents. If I understand correctly, if we
  port the change to Firefly, we should be able to keep everything inline in
  the inode since the accumulated size is still less than 2K (please correct
  me if I am wrong here).
 
 I think you are correct so long as the patch breaks that manifest down 
 into 254 byte or smaller chunks.
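A quick back-of-the-envelope check of that, using the sizes Guang listed above
(ignoring the OSD's own object_info_t xattr, which Sage notes also grew past
255 bytes):

import math

# Attr value sizes reported above (bytes).
attrs = {"rgw.idtag": 15, "rgw.manifest": 381, "rgw.acl": 121, "rgw.etag": 33}

CHUNK = 254                              # stripe pieces that stay inline-friendly
pieces = sum(math.ceil(size / CHUNK) for size in attrs.values())
payload = sum(attrs.values())

print(pieces, "xattr pieces,", payload, "bytes of attr values")
# 5 pieces, 550 bytes of values: even allowing for per-xattr name/header
# overhead, that fits comfortably in a 2k inode once the manifest is striped.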
 
 
  Thanks,
  Guang
 
 
  
  Date: Tue, 16 Jun 2015 12:43:08 -0700
  From: s...@newdream.net
  To: yguan...@outlook.com
  CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
  Subject: Re: xattrs vs. omap with radosgw
 
  On Tue, 16 Jun 2015, GuangYang wrote:
  Hi Cephers,
  While looking at disk utilization on an OSD, I noticed the disk was
  constantly busy with a large number of small writes. Further investigation
  showed that, because radosgw uses xattrs to store metadata (e.g. etag,
  content-type, etc.), the xattrs spill from the inode (local format) to
  extents, which incurs extra I/O.
 
  I would like to check if anybody has experience with offloading the 
  metadata to omap:
  1. Offload everything to omap? If so, should we make the
  inode size 512 bytes (instead of 2k)?
  2. Partially offload the metadata to omap, e.g. only offloading the
  rgw-specific metadata to omap.
 
  Any sharing is deeply appreciated. Thanks!
 
  Hi Guang,
 
  Is this hammer or firefly?
 
  With hammer the size of object_info_t crossed the 255 byte boundary, which
  is the max xattr value that XFS can inline. We've since merged something
  that stripes over several small xattrs so that we can keep things inline,
  but it hasn't been backported to hammer yet. See
  c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're
  seeing?
 
  I think we're still better off with larger XFS inodes and inline xattrs if
  it means we avoid leveldb at all for most objects.
 
  sage


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-16 Thread GuangYang
Hi Yuan,
Thanks for sharing the link, it is an interesting read. My understanding of the
test results is that, with a fixed total xattr size, using a smaller stripe size
incurs larger read latency, which kind of makes sense since there are more k-v
pairs, and at that size it needs to go to extents anyway.

Correct me if I am wrong here...
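To put rough numbers on that reading, here is a small extension of the gist's
own snippet (Python 3, so the strings are encoded before hashing): it rebuilds
the test payload and shows how many xattr pieces different stripe sizes would
produce.

import hashlib
import math
import pickle

# Rebuild the gist's test payload, then see how many xattr pieces
# different stripe boundaries would turn it into.
blob = pickle.dumps(dict(
    ("attribute%d" % i, hashlib.sha512(("thingy %d" % i).encode()).hexdigest())
    for i in range(200)))

print("payload:", len(blob), "bytes")
for stripe in (254, 65535):
    print("%5d-byte stripes -> %d xattr pieces"
          % (stripe, math.ceil(len(blob) / stripe)))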

Thanks,
Guang

 From: yuan.z...@intel.com
 To: s...@newdream.net; yguan...@outlook.com
 CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: RE: xattrs vs. omap with radosgw
 Date: Wed, 17 Jun 2015 01:32:35 +
 
 FWIW, there was some discussion in OpenStack Swift and their performance
 tests showed 255 is not the best boundary in recent XFS. They decided to use a
 large xattr boundary size (65535).
 
 https://gist.github.com/smerritt/5e7e650abaa20599ff34
 
 
 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
 Sent: Wednesday, June 17, 2015 3:43 AM
 To: GuangYang
 Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
 Subject: Re: xattrs vs. omap with radosgw
 
 On Tue, 16 Jun 2015, GuangYang wrote:
 Hi Cephers,
 While looking at disk utilization on an OSD, I noticed the disk was constantly
 busy with a large number of small writes. Further investigation showed that,
 because radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.),
 the xattrs spill from the inode (local format) to extents, which incurs extra I/O.
 
 I would like to check if anybody has experience with offloading the metadata 
 to omap:
   1. Offload everything to omap? If so, should we make the
 inode size 512 bytes (instead of 2k)?
   2. Partially offload the metadata to omap, e.g. only offloading the
 rgw-specific metadata to omap.
 
 Any sharing is deeply appreciated. Thanks!
 
 Hi Guang,
 
 Is this hammer or firefly?
 
 With hammer the size of object_info_t crossed the 255 byte boundary, which is 
 the max xattr value that XFS can inline. We've since merged something that 
 stripes over several small xattrs so that we can keep things inline, but it 
 hasn't been backported to hammer yet. See 
 c6cdb4081e366f471b372102905a1192910ab2da. Perhaps this is what you're seeing?
 
 I think we're still better off with larger XFS inodes and inline xattrs if it 
 means we avoid leveldb at all for most objects.
 
 sage