RE: ObjectStore collections

2014-11-10 Thread Xue, Chendi
Made some changes:
BufferTransaction::buffers now uses a map keyed by (coll_t, ghobject_t), whose values map (prefix, key) pairs to a bufferlist, to record the IOs in one transaction. The code is under test with 700s of 64k sequential writes and 700s of 4k random writes, with no OSD crash so far.
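
A minimal, self-contained sketch of the intended shape (std::string stands in for the Ceph types coll_t / ghobject_t / bufferlist here, and the exact template parameters in the PR may differ):

#include <map>
#include <string>
#include <utility>

struct BufferTransactionSketch {
  // outer key:  which object (collection id, object id) the IO touches
  // inner key:  (prefix, key) of the key/value entry being written
  // value:      the buffered data for that entry within this transaction
  std::map<std::pair<std::string, std::string>,            // (coll_t, ghobject_t)
           std::map<std::pair<std::string, std::string>,   // (prefix, key)
                    std::string>>                          // bufferlist
      buffers;
};

int main() {
  BufferTransactionSketch t;
  t.buffers[{"coll_0", "obj_A"}][{"_STRIP_", "0000000001"}] = "payload";
  // A later read in the same transaction consults this map first,
  // so it sees the not-yet-committed update.
  return 0;
}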

Best Regards,
-Chendi

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Xue, Chendi
Sent: Sunday, November 9, 2014 10:32 PM
To: Haomai Wang
Cc: ceph-devel@vger.kernel.org; Sage Weil
Subject: RE: ObjectStore collections

Hi, Haomai

As discussed this afternoon, I moved 'buffers' from strip_header to BufferTransaction.

The code has been tested, and there even seems to be a slight performance improvement. After a fairly long sequential/random write test, I observed no segmentation faults, no growing memory usage, and no other OSD crash problems.

I hope you can help review the code. Thanks.

https://github.com/ceph/ceph/pull/2889
 
===
For others, here is a brief explanation:

'strip_header' in KeyValueStore is used to cache all headers; 'buffers' is used to ensure that all IO within one transaction can read the latest update.

Moving 'buffers' from strip_header to BufferTransaction ensures that 'buffers' is destructed once the transaction has been submitted, which prevents an ever-growing 'strip_header->buffers' from causing an OOM problem.

Also, the 'meta collection' is always cached in strip_header, and there is no lock around parallel IO on the 'meta collection', so multi-threaded access to the 'meta collection' strip_header->buffers causes segmentation faults. Moving 'buffers' from strip_header to BufferTransaction prevents that scenario as well.

Best Regards,
-Chendi

-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Sunday, November 9, 2014 5:21 PM
To: Sage Weil
Cc: Xue, Chendi; ceph-devel@vger.kernel.org
Subject: Re: ObjectStore collections

On Sun, Nov 9, 2014 at 5:59 AM, Sage Weil s...@newdream.net wrote:
 On Sat, 8 Nov 2014, Haomai Wang wrote:
 As for the OOM, I think the root cause is the mistaken commit above as 
 well. The meta collection is updated by every transaction, and 
 StripObjectHeader::buffers is always kept in memory because of the 
 caching strategy, so this object's buffers keep growing over time. So I 
 think avoiding caching the meta collection's object should be fine. 
 Although we didn't observe OOM in previous releases apart from this 
 mistaken commit, I would prefer to add code that discards buffers each 
 time a transaction is submitted, to avoid potentially unbounded memory 
 growth.

 Do you have a clearer implementation in mind? I'm also thinking about a 
 better way to solve the performance bottleneck for meta collections.

 I would really like to see if we can eliminate collections from the 
 API entirely.  Or, perhaps more importantly, if that would be helpful.
 For the most part, hobject_t's already sort themselves into 
 collections based on the hash.  The exceptions are:

 - The 'meta' collection.  Mostly this includes the pg logs and pg info 
 objects (which are per-pg and would otherwise need no locking) and the 
 osdmap objects.

 - collection_move and collection_move_rename.  I think if we move 
 everything to collection_move_rename and use temporary objects with 
 unique names for everything (I think in-progress recovery objects is 
 the main user of collection_move) then this really just turns into a 
 rename operation.

 - object listing is currently in terms of pg, but could just as easily 
 specify a hash range.

 - collection attributes can be moved to the pginfo objects.

 It sounds like the problem in KeyValueStore is that the pglog/pginfo 
 objects are written by transactions in all PGs but the per-collection 
 index/cache structures weren't being locked.  If we can find a way to 
 fit these into the sorted hash in the right position that is 
 conceptually simpler.  But I'm not sure if that simplicity actually 
 helps with the implementation, where the data structure locking is the 
 important part.
 Perhaps we need to keep a collection concept simply for that purpose 
 and the only real problem is 'meta'?

Originally, KeyValueStore handled this simply by not caching the meta
collection, for concurrency reasons, and that was fine.

I'm just wondering whether there is a way to make meta collection ops more 
parallel. BTW, FileStore could also benefit, for the same reason.


 sage



--
Best Regards,

Wheat

Re: ObjectStore collections

2014-11-09 Thread Haomai Wang
On Sun, Nov 9, 2014 at 5:59 AM, Sage Weil s...@newdream.net wrote:
 On Sat, 8 Nov 2014, Haomai Wang wrote:
 As for the OOM, I think the root cause is the mistaken commit above as
 well. The meta collection is updated by every transaction, and
 StripObjectHeader::buffers is always kept in memory because of the
 caching strategy, so this object's buffers keep growing over time. So I
 think avoiding caching the meta collection's object should be fine.
 Although we didn't observe OOM in previous releases apart from this
 mistaken commit, I would prefer to add code that discards buffers each
 time a transaction is submitted, to avoid potentially unbounded memory
 growth.

 Do you have a clearer implementation in mind? I'm also thinking about a
 better way to solve the performance bottleneck for meta collections.

 I would really like to see if we can eliminate collections from the API
 entirely.  Or, perhaps more importantly, if that would be helpful.  For
 the most part, hobject_t's already sort themselves into collections based
 on the hash.  The exceptions are:

 - The 'meta' collection.  Mostly this includes the pg logs and pg info
 objects (which are per-pg and would otherwise need no locking) and the
 osdmap objects.

 - collection_move and collection_move_rename.  I think if we move
 everything to collection_move_rename and use temporary objects with
 unique names for everything (I think in-progress recovery objects is
 the main user of collection_move) then this really just turns into a
 rename operation.

 - object listing is currently in terms of pg, but could just as easily
 specify a hash range.

 - collection attributes can be moved to the pginfo objects.

 It sounds like the problem in KeyValueStore is that the pglog/pginfo
 objects are written by transactions in all PGs but the per-collection
 index/cache structures weren't being locked.  If we can find a way to fit
 these into the sorted hash in the right position that is conceptually
 simpler.  But I'm not sure if that simplicity actually helps with the
 implementation, where the data structure locking is the important part.
 Perhaps we need to keep a collection concept simply for that purpose and
 the only real problem is 'meta'?
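
Just to make the listing point concrete, a rough sketch of what a hash-range based listing could look like (hypothetical interface, not the current ObjectStore API):

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical interface: instead of naming a collection (pg), the caller
// names the hash interval that the pg owns, and the store returns the
// objects whose hobject hash falls inside that interval.
struct HashRange {
  uint32_t begin;  // inclusive
  uint32_t end;    // exclusive
};

struct ObjectRecord {
  uint32_t hash;
  std::string name;
};

std::vector<ObjectRecord> list_objects_in_hash_range(
    const std::vector<ObjectRecord>& sorted_by_hash, const HashRange& r) {
  std::vector<ObjectRecord> out;
  for (const ObjectRecord& o : sorted_by_hash) {
    if (o.hash >= r.begin && o.hash < r.end)
      out.push_back(o);
  }
  return out;
}

int main() {
  std::vector<ObjectRecord> objs = {{0x10u, "a"}, {0x80u, "b"}, {0xf0u, "c"}};
  HashRange pg_range{0x00u, 0x80u};  // e.g. the hash slice one pg owns
  std::vector<ObjectRecord> in_pg = list_objects_in_hash_range(objs, pg_range);
  (void)in_pg;  // contains only "a": 0x80 is outside the half-open range
  return 0;
}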

Originally, KeyValueStore handled this simply by not caching the meta
collection, for concurrency reasons, and that was fine.

I'm just wondering whether there is a way to make meta collection ops
more parallel. BTW, FileStore could also benefit, for the same reason.
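
One possible direction, sketched with hypothetical names (not actual KeyValueStore code): give the shared header its own lock so that transactions from different PGs can stage updates to it in parallel, and drop the staged entries at submit time, which also covers the "discard buffers at each submit" idea above.

#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch only: a shared (e.g. meta collection) header whose
// staged entries are guarded by a lock and discarded at submit time.
struct SharedHeaderSketch {
  std::mutex lock;
  std::map<std::pair<std::string, std::string>, std::string> buffers;

  // Stage an update; safe to call from several transactions in parallel.
  void stage(const std::string& prefix, const std::string& key,
             const std::string& data) {
    std::lock_guard<std::mutex> g(lock);
    buffers[{prefix, key}] = data;
  }

  // Called after the transaction is submitted: discard staged entries so
  // the cached header cannot grow without bound.
  void discard_on_submit() {
    std::lock_guard<std::mutex> g(lock);
    buffers.clear();
  }
};

int main() {
  SharedHeaderSketch meta;
  meta.stage("_pginfo_", "1.0", "info bytes");
  meta.discard_on_submit();
  return 0;
}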


 sage



-- 
Best Regards,

Wheat


RE: ObjectStore collections

2014-11-09 Thread Xue, Chendi
Hi, Haomai

As discussed this afternoon, I moved 'buffers' from strip_header to BufferTransaction.

The code has been tested, and there even seems to be a slight performance improvement. After a fairly long sequential/random write test, I observed no segmentation faults, no growing memory usage, and no other OSD crash problems.

I hope you can help review the code. Thanks.

https://github.com/ceph/ceph/pull/2889
 
===
For others, here is a brief explanation:

'strip_header' in KeyValueStore is used to cache all headers; 'buffers' is used 
to ensure that all IO within one transaction can read the latest update.

Moving 'buffers' from strip_header to BufferTransaction ensures that 'buffers' 
is destructed once the transaction has been submitted, which prevents an 
ever-growing 'strip_header->buffers' from causing an OOM problem.

Also, the 'meta collection' is always cached in strip_header, and there is no 
lock around parallel IO on the 'meta collection', so multi-threaded access to 
the 'meta collection' strip_header->buffers causes segmentation faults. Moving 
'buffers' from strip_header to BufferTransaction prevents that scenario as well.
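
To make the lifetime change concrete, here is a minimal sketch with hypothetical names (not the actual KeyValueStore code): once 'buffers' lives in the per-transaction object instead of the cached header, it is freed as soon as the transaction has been submitted, and no cached header is shared between threads through it.

#include <map>
#include <string>
#include <utility>

// Cached per-object header.  In the old layout it also owned 'buffers',
// so headers that stay cached forever (e.g. the meta collection's) grew
// without bound and were touched by many threads at once.
struct StripHeaderSketch {
  std::string key_prefix;
  // no 'buffers' member any more
};

// Per-transaction state; destroyed right after the transaction is submitted.
struct BufferTransactionSketch {
  std::map<std::pair<std::string, std::string>, std::string> buffers;
};

void submit_transaction(const BufferTransactionSketch& t) {
  // ... flush t.buffers to the backend here ...
  (void)t;
}

void handle_one_transaction(StripHeaderSketch& cached_header) {
  (void)cached_header;                              // header stays small
  BufferTransactionSketch t;                        // lives only for this transaction
  t.buffers[{"_pglog_", "entry_1"}] = "log bytes";  // staged update, visible to later
                                                    // reads in the same transaction
  submit_transaction(t);
}  // t (and its buffers) is destructed here, so memory cannot accumulate
   // across transactions and no other thread ever shares this map.

int main() {
  StripHeaderSketch header{"obj_A"};
  handle_one_transaction(header);
  return 0;
}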

Best Regards,
-Chendi

-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com] 
Sent: Sunday, November 9, 2014 5:21 PM
To: Sage Weil
Cc: Xue, Chendi; ceph-devel@vger.kernel.org
Subject: Re: ObjectStore collections

On Sun, Nov 9, 2014 at 5:59 AM, Sage Weil s...@newdream.net wrote:
 On Sat, 8 Nov 2014, Haomai Wang wrote:
 As for the OOM, I think the root cause is the mistaken commit above as 
 well. The meta collection is updated by every transaction, and 
 StripObjectHeader::buffers is always kept in memory because of the 
 caching strategy, so this object's buffers keep growing over time. So I 
 think avoiding caching the meta collection's object should be fine. 
 Although we didn't observe OOM in previous releases apart from this 
 mistaken commit, I would prefer to add code that discards buffers each 
 time a transaction is submitted, to avoid potentially unbounded memory 
 growth.

 Do you have a clearer implementation in mind? I'm also thinking about a 
 better way to solve the performance bottleneck for meta collections.

 I would really like to see if we can eliminate collections from the 
 API entirely.  Or, perhaps more importantly, if that would be helpful.  
 For the most part, hobject_t's already sort themselves into 
 collections based on the hash.  The exceptions are:

 - The 'meta' collection.  Mostly this includes the pg logs and pg info 
 objects (which are per-pg and would otherwise need no locking) and the 
 osdmap objects.

 - collection_move and collection_move_rename.  I think if we move 
 everything to collection_move_rename and use temporary objects with 
 unique names for everything (I think in-progress recovery objects is 
 the main user of collection_move) then this really just turns into a 
 rename operation.

 - object listing is currently in terms of pg, but could just as easily 
 specify a hash range.

 - collection attributes can be moved to the pginfo objects.

 It sounds like the problem in KeyValueStore is that the pglog/pginfo 
 objects are written by transactions in all PGs but the per-collection 
 index/cache structures weren't being locked.  If we can find a way to 
 fit these into the sorted hash in the right position that is 
 conceptually simpler.  But I'm not sure if that simplicity actually 
 helps with the implementation, where the data structure locking is the 
 important part.
 Perhaps we need to keep a collection concept simply for that purpose 
 and the only real problem is 'meta'?

Originally, KeyValueStore handled this simply by not caching the meta
collection, for concurrency reasons, and that was fine.

I'm just wondering whether there is a way to make meta collection ops more 
parallel. BTW, FileStore could also benefit, for the same reason.


 sage



--
Best Regards,

Wheat