Re: Re: journal alignment

2015-11-20 Thread
Because we should keep the logic of the journal write thread simple. It performs better on PCIe SSDs. But I think the strategy you mentioned above is good for HDDs or SATA SSDs. 2015-11-20 17:16 GMT+08:00 池信泽 : > Yes, you are right. But in the Ceph master branch, we already > call prepare_entry (add p

Re: Re: journal alignment

2015-11-20 Thread
Yes, you are right. But in the Ceph master branch, we already call prepare_entry (which adds padding) before submit_entry. If you have a good idea based on this, it would be great. 2015-11-20 17:12 GMT+08:00 changtao381 : > Hi All, > > Thanks for your reply! > > If direct IO + async IO require that alignment
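
A minimal, hypothetical sketch of the padding idea discussed in this thread (made-up names and sizes, not the actual prepare_entry code): the entry is rounded up to the block size and the gap is zero-filled so the buffer satisfies the direct I/O alignment requirement.

  // Hypothetical illustration of alignment padding for a direct-I/O journal
  // write; names and sizes are made up, not taken from the Ceph source.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  constexpr size_t kBlockSize = 4096;  // direct I/O wants block-aligned sizes

  // Round len up to the next multiple of the block size.
  size_t aligned_len(size_t len) {
    return (len + kBlockSize - 1) / kBlockSize * kBlockSize;
  }

  // Build a write buffer whose total size is block-aligned by appending
  // zero padding after the payload.
  std::vector<uint8_t> prepare_padded_entry(const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> buf(payload);
    buf.resize(aligned_len(buf.size()), 0);  // zero-fill up to the boundary
    return buf;
  }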

Re: Regarding op_t, local_t

2015-11-18 Thread
worker thread, I need to introduce heap allocation on that > path as well; reducing one transaction would help there. > > Thanks & Regards > Somnath > > -Original Message- > From: 池信泽 [mailto:xmdx...@gmail.com] > Sent: Wednesday, November 18, 2015 6:00 PM > To:

Re: Regarding op_t, local_t

2015-11-18 Thread
Good catch. I think it does make sense. 2015-11-19 9:54 GMT+08:00 Somnath Roy : > Hi Sage, > I saw we now have a single transaction in submit_transaction. But > in the replication path we still have two transactions; can't we merge > them into one there? > > Thanks & Regards > Somnath > --

Re: disabling buffer::raw crc cache

2015-11-11 Thread
Evgeniy Firsov : > Rb-tree construction and insertion, which need memory allocation and mutex > lock/unlock, are more CPU expensive than a streamlined crc calculation of > sometimes 100 bytes or less. > > On 11/11/15, 12:03 AM, "池信泽" wrote: > >>Ah, I am confused about why the crc

Re: disabling buffer::raw crc cache

2015-11-11 Thread
Ah, I am confused about why the crc cache logic would consume so much CPU. 2015-11-11 15:27 GMT+08:00 Evgeniy Firsov : > Hello, guys! > > While running a CPU-bound 4k block workload, I found that disabling the crc > cache in buffer::raw gives around a 7% performance improvement. > > If there is no strong u
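
A self-contained sketch of the tradeoff described above, with hypothetical types rather than the buffer::raw implementation: fetching a cached crc from a mutex-protected rb-tree costs a lock/unlock plus a possible node allocation, which can easily exceed recomputing the crc over a buffer of ~100 bytes.

  // Hypothetical sketch, not Ceph's buffer::raw code: contrast a mutex-protected
  // crc cache with direct recomputation over a small range.
  #include <cstddef>
  #include <cstdint>
  #include <map>
  #include <mutex>
  #include <utility>

  // Plain software CRC-32C: a tight loop with no allocation or locking.
  uint32_t crc32c_sw(const uint8_t* p, size_t len) {
    uint32_t c = ~0u;
    for (size_t i = 0; i < len; ++i) {
      c ^= p[i];
      for (int k = 0; k < 8; ++k)
        c = (c >> 1) ^ (0x82F63B78u & (0u - (c & 1u)));
    }
    return ~c;
  }

  // Cached variant: every lookup takes a mutex and walks an rb-tree, and a
  // miss also allocates a tree node, which is the cost being measured above.
  struct crc_cache {
    std::mutex lock;
    std::map<std::pair<size_t, size_t>, uint32_t> by_range;  // (offset, length) -> crc

    uint32_t get(const uint8_t* base, size_t off, size_t len) {
      std::lock_guard<std::mutex> l(lock);        // lock/unlock on every call
      auto key = std::make_pair(off, len);
      auto it = by_range.find(key);               // rb-tree lookup
      if (it != by_range.end())
        return it->second;
      uint32_t c = crc32c_sw(base + off, len);
      by_range.emplace(key, c);                   // node allocation + insert
      return c;
    }
  };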

Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread
I wonder, if we want to keep the PG from going out of scope at an inopportune time, why are snap_trim_queue and scrub_queue declared as xlist<PG*> instead of xlist<PGRef>? 2015-11-11 2:28 GMT+08:00 Gregory Farnum : > On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote: >> hi, all: >>

why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread
hi, all: op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > &op_wq. I do not know why we should use PGRef here, because the overhead of the smart pointer is not small. Maybe the raw pointer PG* would also be OK? If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, OpRequestRef> > &op
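
As a rough, hypothetical illustration of the overhead in question (toy types, not the real PGRef/PG): an intrusive reference-counted handle performs an atomic increment/decrement on every copy and destruction, which a raw pointer avoids, at the price of no lifetime protection.

  // Hypothetical sketch of why copying a ref-counted handle costs more than
  // copying a raw pointer: every copy and destruction touches an atomic counter.
  #include <atomic>

  struct Item {
    std::atomic<int> nref{0};   // stand-in for an intrusively ref-counted PG
  };

  // Minimal intrusive smart pointer, standing in for a handle like PGRef.
  class ItemRef {
    Item* p_;
  public:
    explicit ItemRef(Item* p = nullptr) : p_(p) { if (p_) p_->nref.fetch_add(1); }
    ItemRef(const ItemRef& o) : p_(o.p_) { if (p_) p_->nref.fetch_add(1); }  // atomic inc
    ~ItemRef() { if (p_) p_->nref.fetch_sub(1); }                            // atomic dec
    Item* get() const { return p_; }
  };

  // Queueing by handle copies it (atomic ops on every enqueue/dequeue) but keeps
  // the item alive while it sits in the queue; queueing by raw pointer copies
  // 8 bytes and relies on something else to keep the item from being destroyed.
  void enqueue_by_ref(ItemRef) {}
  void enqueue_by_ptr(Item*) {}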

Re: ceph encoding optimization

2015-11-04 Thread
I agree that pg_stat_t (and friends) is a good place to start. eversion_t and utime_t are also good choices to start with because they are used in many places. 2015-11-04 23:07 GMT+08:00 Gregory Farnum : > On Wed, Nov 4, 2015 at 7:00 AM, 池信泽 wrote: >> hi, all: >> >> I am f

ceph encoding optimization

2015-11-04 Thread
hi, all: I am focusing on the CPU usage of Ceph now. I find that struct encode and decode (for pg_info_t, transaction and so on) consume too much CPU. Currently, we encode every member variable one by one, each of which ends up calling encode_raw. When there are many members, we s
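
A standalone sketch of the pattern being described, using a made-up struct rather than the Ceph encoders: encoding member by member issues one small raw copy per field, whereas a fixed-layout POD struct could in principle be appended with a single bulk copy.

  // Hypothetical sketch (not the Ceph encode/decode framework): member-by-member
  // encoding does one small copy per field; a fixed-layout struct could be
  // appended in one bulk copy.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct sample_t {          // stand-in for a small struct with a fixed layout
    uint64_t epoch;
    uint64_t version;
    uint32_t flags;
  } __attribute__((packed)); // sketch only: force a padding-free layout

  // One encode_raw-style call per member.
  void encode_member_by_member(const sample_t& s, std::vector<char>& out) {
    auto put = [&out](const void* p, size_t n) {
      const char* c = static_cast<const char*>(p);
      out.insert(out.end(), c, c + n);
    };
    put(&s.epoch, sizeof(s.epoch));
    put(&s.version, sizeof(s.version));
    put(&s.flags, sizeof(s.flags));
  }

  // Single bulk copy of the whole struct.
  void encode_bulk(const sample_t& s, std::vector<char>& out) {
    const char* c = reinterpret_cast<const char*>(&s);
    out.insert(out.end(), c, c + sizeof(s));
  }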

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
If we keep them separate and pass them to ObjectStore::queue_transactions(), the CPU usage of ObjectStore::queue_transactions() goes up from 6.03% to 6.76% compared with re-using the op_t items. 2015-11-01 11:05 GMT+08:00 池信泽 : > Yes, I think so. > keeping them separate and pass t

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
Yes, I think so. Keeping them separate and passing them to ObjectStore::queue_transactions() would increase the time spent in the transaction encode process and consume more CPU. Transaction::append accounts for 0.8% of CPU in my environment. The transaction encoding is also really a bottleneck, which holds 1

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
ing out 2 transaction. No more append > call.. > > Thanks & Regards > Somnath > > -Original Message- > From: ceph-devel-ow...@vger.kernel.org > [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ning Yao > Sent: Saturday, October 31, 2015 8:35 AM >

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
The op_t items are encoded in issue_op, so after issue_op could we use them directly instead of the local_t items? 2015-10-31 21:18 GMT+08:00 Sage Weil : > On Sat, 31 Oct 2015, 池信泽 wrote: >> hi, all: >> >> There are two ObjectStore::Transactions in >> ReplicatedBackend::submit_transaction, one is op

why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
hi, all: There are two ObjectStore::Transactions in ReplicatedBackend::submit_transaction: one is op_t and the other is local_t. Is there some critical logic we should consider? If we could reuse the op_t variable it would be great, because it is expensive to call local_t.append
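
A toy model of the cost being discussed in this thread (hypothetical types, not the ObjectStore API): if a transaction is essentially a buffer of already-encoded ops, appending one transaction into another copies those bytes a second time, while handing both to the store unmerged avoids that copy.

  // Toy model (not the ObjectStore API): a "transaction" here is just a byte
  // buffer of already-encoded ops, so appending one into another re-copies
  // bytes that were already encoded once.
  #include <cstdint>
  #include <vector>

  struct ToyTransaction {
    std::vector<uint8_t> encoded_ops;

    void append(const ToyTransaction& other) {
      // second copy of other's already-encoded bytes
      encoded_ops.insert(encoded_ops.end(),
                         other.encoded_ops.begin(), other.encoded_ops.end());
    }
  };

  // Option A: merge into one transaction, paying an extra copy of t2's bytes.
  ToyTransaction merged(ToyTransaction t1, const ToyTransaction& t2) {
    t1.append(t2);
    return t1;
  }

  // Option B: keep them separate and hand both to the backing store,
  // avoiding the extra copy at the cost of submitting two transactions.
  void queue_both(std::vector<ToyTransaction*>& queue,
                  ToyTransaction* t1, ToyTransaction* t2) {
    queue.push_back(t1);
    queue.push_back(t2);
  }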

Re: why we should use two Mutex in OSD ShardData?

2015-10-30 Thread
I do not see any improvement by moving to a single mutex. I was just puzzled about why we use two mutexes; I also do not see any improvement from using two mutexes in my environment. Thanks for your explanation. 2015-10-30 22:59 GMT+08:00 Somnath Roy : > > Hi xinze, > This is mainly for reducing lock contentio

why we should use two Mutex in OSD ShardData?

2015-10-30 Thread
hi, all: There are two Mutexes in ShardData: one is sdata_lock and the other is sdata_op_ordering_lock. I wonder, could we replace sdata_lock with sdata_op_ordering_lock? -- Regards, xinze
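
For context, a minimal sketch of the general two-lock shard pattern the question is about (made-up names; the real ShardData is more involved): one mutex/condvar pair is used only to park and wake worker threads, while a second mutex protects the ordering queue, so waking an idle worker does not contend with threads that are busy manipulating the queue.

  // Hypothetical sketch of a two-lock shard; names are made up and the real
  // OSD ShardData is more involved.
  #include <condition_variable>
  #include <deque>
  #include <mutex>

  struct Shard {
    std::mutex wait_lock;              // like sdata_lock: only for sleeping/waking workers
    std::condition_variable cond;

    std::mutex queue_lock;             // like sdata_op_ordering_lock: protects the queue
    std::deque<int> pending;           // stand-in for queued ops

    void enqueue(int op) {
      {
        std::lock_guard<std::mutex> q(queue_lock);
        pending.push_back(op);
      }
      // wake a worker without holding the queue lock
      std::lock_guard<std::mutex> w(wait_lock);
      cond.notify_one();
    }

    int dequeue() {
      std::unique_lock<std::mutex> w(wait_lock);
      for (;;) {
        {
          std::lock_guard<std::mutex> q(queue_lock);
          if (!pending.empty()) {
            int op = pending.front();
            pending.pop_front();
            return op;
          }
        }
        cond.wait(w);                  // sleep on wait_lock; queue_lock is not held
      }
    }
  };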

Re: pg scrub check problem

2015-10-28 Thread
Yes, I think we should also check the scrub interval. In OSD::sched_scrub():

  if ((double)diff < cct->_conf->osd_scrub_min_interval) {
    dout(10) << "sched_scrub " << pgid << " at " << t << ": "
             << (double)diff << " < min ("
             << cct->_conf->osd_scrub_min_interval << " seconds)" << dendl;
    break;
  }

Re: pg scrub check problem

2015-10-28 Thread
Are you sure the OSD begins to scrub? Maybe you could check the OSD log, or use 'ceph pg dump' to check whether the scrub stamp changes or not, because there are some strategies which would reject the scrub command, such as the system load, osd_scrub_min_interval, osd_deep_scrub_interval and so o

ec pool history objects

2015-06-15 Thread
hi, all: when I use an EC pool, I see there are some history objects for object xx, such as: xx__head_610951D6__2_fe1_2, xx__head_610951D6__2_fe2_2, xx__head_610951D6__2__2. I think these objects are used for rollback when not all shards have written the object to

set different target_max_bytes for each pg in cache pool

2015-04-22 Thread
hi, all: In my production environment, each pg in the same pool has different IO pressure. The max is 3-4 times more than the min. Currently, the max size of each pg is pool.info.target_max_bytes / pg_num. So I think we could do better, such as setting a different target_max_bytes for each pg according to thei
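
One possible shape of the proposal, as a hedged sketch with hypothetical names (the real cache-tier agent does not do this): weight each pg's share of target_max_bytes by its observed IO instead of the uniform target_max_bytes / pg_num split.

  // Hypothetical sketch only: give each pg a share of the pool's
  // target_max_bytes proportional to its recent IO, instead of an even split.
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  std::vector<uint64_t> per_pg_targets(uint64_t target_max_bytes,
                                       const std::vector<uint64_t>& pg_io_bytes) {
    std::vector<uint64_t> targets(pg_io_bytes.size());
    if (pg_io_bytes.empty())
      return targets;

    uint64_t total_io = 0;
    for (uint64_t b : pg_io_bytes)
      total_io += b;

    if (total_io == 0) {
      // no IO statistics yet: fall back to the current even split
      for (auto& t : targets)
        t = target_max_bytes / pg_io_bytes.size();
      return targets;
    }
    for (size_t i = 0; i < pg_io_bytes.size(); ++i) {
      double share = static_cast<double>(pg_io_bytes[i]) / total_io;
      targets[i] = static_cast<uint64_t>(share * target_max_bytes);  // weighted share
    }
    return targets;
  }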

more human readable log to track request or using mapreduce for data statistics

2015-03-26 Thread
hi, ceph: Currently, the command "ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops" may return output like the following:

  {
    "description": "osd_op(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92)",
    "received_at": "2015-03-25 19:41:47.146145",

the cpu optimize in ceph

2014-12-12 Thread
hi, cephers: I want to reduce the CPU usage of the OSD in a full-SSD cluster. In my test case, Ceph runs out of CPU; the CPU idle is about 10%. The CPU in my cluster is an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz. Can you give me some suggestions? Thanks. Here are the cpu u

zeromq in ceph?

2014-11-06 Thread
hi, cephers: ZeroMQ is a very high-speed asynchronous I/O engine, which is also used in Storm, a distributed and fault-tolerant realtime computation system. Does anyone want to use it for communication between OSDs? Thanks

Fwd: Cache pool bug?

2014-10-27 Thread
-- Forwarded message -- From: 池信泽 Date: 2014-10-28 13:37 GMT+08:00 Subject: Re: Cache pool bug? To: "Wang, Zhiqiang" If we use the timestamp, in get_position_micro(int32_t v, uint64_t *lower, uint64_t *upper):

  unsigned bin = calc_bits_of(v); // this function will

Re: Cache pool bug?

2014-10-27 Thread
w - p->first; if (temp) ++(*temp); else return; } } } 2014-10-28 12:36 GMT+08:00 池信泽 : > I am sorry. We should just modify the logic in > ReplicatedPG::agent_estimate_atime_temp; atime should be now. > > if (temp) > *temp = 0; > if (hit_set->conta

Re: Cache pool bug?

2014-10-27 Thread
I am sorry. We should just modify the logic in ReplicatedPG::agent_estimate_atime_temp; atime should be now.

  if (temp)
    *temp = 0;
  if (hit_set->contains(oid)) {
    *atime = 0;
    if (temp)
      ++(*temp);
    else
      return;
  }

2014-10-28 12:28 GMT+08:00 池信泽 : > I think

Re: Cache pool bug?

2014-10-27 Thread
I think if it is changed to *atime = p->first, the logic below should also be modified. In ReplicatedPG::agent_maybe_evict:

  if (atime < 0 && obc->obs.oi.mtime != utime_t()) {
    if (obc->obs.oi.local_mtime != utime_t()) {
      atime = ceph_clock_now(NULL).sec() - obc->obs.oi.local_mtime;

Re: Cache pool bug?

2014-10-27 Thread
Because if there are multiple access times in agent_state for the same object, we should use the most recent one. 2014-10-28 9:42 GMT+08:00 池信泽 : > I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp. > I think we should change the following code: > for (map::iterator p = ag

Re: Cache pool bug?

2014-10-27 Thread
e < 0) *atime = now - p->first; if (temp) ++(*temp); else return; } } 2014-10-28 9:38 GMT+08:00 池信泽 : > I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp. > I think we should change the following code: > for (map::iterator p = agent_state->
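
To make the reasoning in this thread concrete (when an object appears in several hit-set intervals, use the most recent access time), here is a small self-contained sketch with toy types; it is not the ReplicatedPG code.

  // Toy illustration only: keep the most recent access time when an object
  // appears in more than one hit-set interval.
  #include <ctime>
  #include <map>
  #include <set>
  #include <string>

  // key: start time of a hit-set interval; value: objects seen in that interval
  using hit_history = std::map<std::time_t, std::set<std::string>>;

  // Return the age in seconds since the most recent hit, or -1 if never seen.
  long estimate_age(const hit_history& hist, const std::string& oid,
                    std::time_t now) {
    long age = -1;
    for (const auto& p : hist) {
      if (p.second.count(oid)) {
        long a = static_cast<long>(now - p.first);  // age relative to this interval
        if (age < 0 || a < age)
          age = a;                                  // prefer the most recent access
      }
    }
    return age;
  }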