Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread Ning Yao
also for short-term maintenance. Otherwise, it will trigger the backfill process. So let's wait for Sage's opinion. @sage If you are interested in this, we could cooperate on it. Regards Ning Yao 2015-12-25 14:23 GMT+08:00 Dong Wu : > Thanks, from this pull request I learned that this issue is n

Re: The max single write IOPS on single RBD

2015-12-11 Thread Ning Yao
Currently, yes, until we can improve the OSD code efficiency further. You can achieve better performance by using the client writeback cache if the application allows it. Regards Ning Yao 2015-12-11 18:00 GMT+08:00 Zhi Zhang : > Hi Guys, > > We have a small 4 nodes cluster. Here is the

Fwd: Long time between waiting_for_osdmap event and reached_pg event in dump_historic_ops

2015-12-09 Thread Ning Yao
this would not occur. We encounter this situation in two cases: 1) a high load of parallel requests coming from clients; 2) a high miss rate in the cache tier. Regards Ning Yao 2015-12-09 1:56 GMT+08:00 Rongze Zhu : > > Hi guys, > > I found out a strange issue in a ceph cluste

Fwd: problem about pgmeta object?

2015-12-09 Thread Ning Yao
108.444us on average (about a 15% improvement), and reduces the overall CPU time by 0.5% ~ 1% globally. Regards Ning Yao 2015-12-08 21:37 GMT+08:00 Sage Weil : > > On Tue, 8 Dec 2015, Ning Yao wrote: > > Umm, it seems that MemStore requires in memory meta object to keep the > > attribut

Re: problem about pgmeta object?

2015-12-08 Thread Ning Yao
for all backends. Regards Ning Yao 2015-11-18 21:12 GMT+08:00 Sage Weil : > On Wed, 18 Nov 2015, Ning Yao wrote: >> Hi, Sage >> >> pgmeta object is a meta-object (like __head___2) without >> significant information. It is created in PG::_init() when >>

problem about pgmeta object?

2015-11-17 Thread Ning Yao
osition &spos) { dout(15) << __func__ << " " << cid << "/" << hoid << dendl; Index index; int r; if(hoid.pgmeta()) goto out; *** *** *** out: r = object_map->set_keys(hoid, aset, &spos); dout(20) << __func__

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
2015-11-11 21:13 GMT+08:00 Sage Weil : > On Wed, 11 Nov 2015, Ning Yao wrote: >> >>>the code paths that touch the crc cache are bufferlist::crc32c and >> >>>invalidate_crc. >> >>Also for pg_log::_write_log(), but it seems it always misses and is used at >>

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
er::ptr length diff with ::encode(crc, bl), right? So the previous ebl.crc32c(0) calculation would also not need to be cached. Regards Ning Yao 2015-11-11 18:05 GMT+08:00 Ning Yao : >>>the code paths that touch the crc cache are bufferlist::crc32c and >>>invalidate_crc. >>Also

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
>>the code paths that touch the crc cache are bufferlist::crc32c and invalidate_crc. >Also for pg_log::_write_log(), but it seems it always misses and is used at >once, so there is no need to cache the crc, actually? Oh, no, it will be hit in FileJournal writing. Regards Ning Yao 2015-11-11 18:03 GMT+08

Re: disabling buffer::raw crc cache

2015-11-11 Thread Ning Yao
things like pg_info and attrs, as they are never reused in FileJournal writing? Regards Ning Yao 2015-11-11 16:25 GMT+08:00 Evgeniy Firsov : > Rb-tree construction, insertion, which needs memory allocation, mutex > lock, unlock is more CPU expensive than streamlined crc calculation of >

why keep and update rollback info for ReplicatedPG?

2015-11-10 Thread Ning Yao
? Could we avoid frequently updating that information, depending on the pool type? Regards Ning Yao -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Specify omap path for filestore

2015-11-05 Thread Ning Yao
, data_map (which can act like the inode in a filesystem), so that the whole HDD IOPS is available for real data, without interference from the filesystem journal and inode get/set. Regards Ning Yao 2015-11-04 23:19 GMT+08:00 Chen, Xiaoxi : > Hi Ning, > > Yes, we don't save any IO, or may even need m

Re: Specify omap path for filestore

2015-11-03 Thread Ning Yao
nable filestore_max_inline_xattr in the first test? If not, it may be reasonable. In my previous test, I remember only about a 20%~30% improvement. Can you also provide the CPU cost per op on the OSD node? Regards Ning Yao 2015-10-30 10:04 GMT+08:00 Xue, Chendi : > Hi, Sam > > Last week I introduced ab

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread Ning Yao
> BTW, latest code base is already separating out 2 transaction. No more append > call. Yeah, got it. But there is still extra memory allocation in local_t, such as op_bl. We could reuse the previous op_bl in op_t in most cases.

Re: pg scrub check problem

2015-10-31 Thread Ning Yao
> Good point. In my previous response I did "echo garbage > > ./foo__head_7FC1F406__1" to corrupt a replica. I think this may just happen when mixing O_DIRECT and buffer_io, which may just happen in Newstore. Or, inode content changes such as FileStore write " ./foo__head_7FC1F406

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread Ning Yao
improve this? Regards Ning Yao 2015-10-31 21:18 GMT+08:00 Sage Weil : > On Sat, 31 Oct 2015, ??? wrote: >> hi, all: >> >> There are two ObjectStore::Transaction in >> ReplicatedBackend::submit_transaction, one is op_t and the other one >> is local_t. Is that s

Re: Inline dedup/compression

2015-07-01 Thread Ning Yao
For compression, I prefer to implement it in the EC pool. It is much easier because objects in an EC pool are already striped, and this is what we have already finished (it is now in testing). Also, only append writes are allowed in EC pools, which makes the implementation convenient. Moreover, as is

Performance test for cache tier

2015-04-20 Thread Ning Yao
set the hit : miss rate for the test. Is there any way to do this? Regards Ning Yao

Re: RBD Discard issue for Cache_tier

2015-03-30 Thread Ning Yao
rollbacks so that we do not need to promote all snap objects if we just want the head (which is actually the case most of the time). Regards Ning Yao 2015-03-30 15:05 GMT+08:00 Wang, Zhiqiang : > How about handling the DELETE op in the cache tier like this: > 1) If the object is in the cache tier, we delete it in

RBD Discard issue for Cache_tier

2015-03-27 Thread Ning Yao
bject when calling can_skip_promote() and send a CEPH_OSD_OP_DELETE op to the cold pool through the Objecter interface, which would be much better when file deletion occurs. Is that possible? Regards Ning Yao

Re: keyvaluestore speed up?

2015-03-23 Thread Ning Yao
2015-03-20 10:22 GMT+08:00 Shu, Xinxin : > I think rocksdb can support this configuration. > I cannot find this option in rocksdb. If you know it, can you tell me which option redirects the WAL file? Regards Ning Yao > Cheers, > xinxin > > -Original Message- >

Re: Can i improve the performance of rbd rollback in this way?

2015-03-18 Thread Ning Yao
, and some data in the rbd is modified but some is not, then after calling rollback, it seems that lots of objects (which were not modified before) also run clone() and generate new objects. I have not investigated the reasons; you may try it. Regards Ning Yao 2015-03-17 17:13 GMT+08:00 徐昕 : > Hi Ning

Re: Fwd: Fwd: Reduce read latency and bandwidth for ec pool

2015-03-17 Thread Ning Yao
or four hours and tend to be stable after ten hours. So does LRU or LRU2 make sense here? Or are there other strategies to make the training process converge faster? Regards Ning Yao 2015-03-18 2:18 GMT+08:00 Josh Durgin : > On 03/17/2015 01:58 AM, Loic Dachary wrote: >> >> >> >>

what is the main reason for bad crc in data

2015-03-17 Thread Ning Yao
process if the content is owned by object data, or it may lead to osd error, if the inconsistent data occurs in pg_log, pg_info or osdmap?) Regards Ning Yao

Re: crc error when decode_message?

2015-03-17 Thread Ning Yao
Thanks, everyone. I got the idea. Regards Ning Yao 2015-03-17 21:58 GMT+08:00 Gregory Farnum : > On Tue, Mar 17, 2015 at 6:46 AM, Sage Weil wrote: >> On Tue, 17 Mar 2015, Ning Yao wrote: >>> 2015-03-16 22:06 GMT+08:00 Haomai Wang : >>> > On Mon, Mar 16, 2015

Re: Can i improve the performance of rbd rollback in this way?

2015-03-17 Thread Ning Yao
2015-03-17 15:25 GMT+08:00 徐昕 : > Hi Alexandre, > > I have tried this out. It can improve the performance of rbd rollback > greatly when the difference between the image and the snapshot is > small. > If the clone does not happen in the rollback process, you may consider it would properly happen wh

Re: crc error when decode_message?

2015-03-17 Thread Ning Yao
2015-03-16 22:06 GMT+08:00 Haomai Wang : > On Mon, Mar 16, 2015 at 10:04 PM, Xinze Chi wrote: >> How to process the write request in primary? >> >> Thanks. >> >> 2015-03-16 22:01 GMT+08:00 Haomai Wang : >>> AFAR Pipe and AsyncConnection both will mark self fault and shutdown >>> socket and peer wi

Re: FileStore performance: coalescing operations

2015-03-10 Thread Ning Yao
Can we also consider coalescing two OP_SETATTR transactions into a single OP_SETATTRS transaction? Regards Ning Yao 2015-03-05 15:04 GMT+08:00 Haomai Wang : > I think the performance improvement can be refer to > https://github.com/ceph/ceph/pull/2972 which I did a simple benchmark > c