Re: ceph encoding optimization

2015-12-03 Thread
I wrote a new patch for this: https://github.com/XinzeChi/ceph/commit/06eb471e463a4687e251273d0b5dfe170acbef2d. If we mark the struct with __attribute__ ((packed)), we can encode many struct members in one batch. There is no compatibility problem as long as we keep the order of the members as defined in the struct.

Re: Compiling for FreeBSD, Clang refuses to compile a test

2015-12-05 Thread
I think "const int k = 12; const int m = 4;" would pass the compile? 2015-12-05 20:56 GMT+08:00 Willem Jan Withagen : > src/test/erasure-code/TestErasureCodeIsa.cc > > contains snippets, function definitions like: > > buffer::ptr enc[k + m]; > // create buffers with a copy of

Re: FreeBSD is receiving traps on os/FileJournal.cc:1036

2015-12-16 Thread
Would you mind creating an issue in http://tracker.ceph.com/ and telling me how to reproduce the bug? Thanks. 2015-12-16 18:26 GMT+08:00 Willem Jan Withagen <w...@digiware.nl>: > On 16-12-2015 10:40, Xinze Chi (信泽) wrote: >> >> Because we use the new strategy for filejou

Re: FreeBSD is receiving traps on os/FileJournal.cc:1036

2015-12-15 Thread
You mean the ceph assert(0 == "bl should be align"), right? But in the master branch, line 1036 is not assert(0 == "bl should be align"). 2015-12-16 7:56 GMT+08:00 Willem Jan Withagen : > Hi, > > I'm receiving traps when running the tests going with 'gmake check' > and on one

Re: FreeBSD is receiving traps on os/FileJournal.cc:1036

2015-12-16 Thread
Willem Jan Withagen <w...@digiware.nl>: > On 16-12-2015 02:57, Xinze Chi (信泽) wrote: >> You mean your ceph assert(0 == "bl should be align"), right? >> >> But in master branch, the 1036 line is not assert(0 == "bl should be align"). > > Yes you

Re: FreeBSD Building and Testing

2015-12-20 Thread
Sorry for the delayed reply. Please try https://github.com/ceph/ceph/commit/ae4a8162eacb606a7f65259c6ac236e144bfef0a. 2015-12-21 0:10 GMT+08:00 Willem Jan Withagen : > Hi, > > Most of the Ceph is getting there in the most crude and rough state. > So beneath is a status update

Fwd: FreeBSD Building and Testing

2015-12-20 Thread
-- Forwarded message -- From: Xinze Chi (信泽) <xmdx...@gmail.com> Date: 2015-12-21 8:59 GMT+08:00 Subject: Re: FreeBSD Building and Testing To: Willem Jan Withagen <w...@digiware.nl> Please try this patch https://github.com/XinzeChi

Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread
Yeah, this is a good idea for recovery, but not for backfill. @YaoNing filed a pull request about this earlier this year: https://github.com/ceph/ceph/pull/3837. 2015-12-25 11:16 GMT+08:00 Dong Wu : > Hi, > I have doubt about pglog, the pglog contains (op,object,version) etc. > when

Re: Cache pool bug?

2014-10-27 Thread
) *atime = now - p->first; if (temp) ++(*temp); else return; } } 2014-10-28 9:38 GMT+08:00 池信泽 xmdx...@gmail.com: I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp. I think we should change the following code: for (map<time_t,HitSetRef>::iterator p = agent_state

Re: Cache pool bug?

2014-10-27 Thread
Because if there are multiple access times in agent_state for the same object, we should use the most recent one. 2014-10-28 9:42 GMT+08:00 池信泽 xmdx...@gmail.com: I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp. I think we should change the following code: for (map<time_t

Re: Cache pool bug?

2014-10-27 Thread
I think if it is changed to *atime = p->first, the logic below should also be modified. ReplicatedPG::agent_maybe_evict: if (atime < 0 && obc->obs.oi.mtime != utime_t()) { if (obc->obs.oi.local_mtime != utime_t()) { atime = ceph_clock_now(NULL).sec() - obc->obs.oi.local_mtime; }

Re: Cache pool bug?

2014-10-27 Thread
I am sorry. We should only modify the logic in ReplicatedPG::agent_estimate_atime_temp; the atime should be now (an age of 0). if (temp) *temp = 0; if (hit_set->contains(oid)) { *atime = 0; if (temp) ++(*temp); else return; } 2014-10-28 12:28 GMT+08:00 池信泽 xmdx...@gmail.com: I

Re: Cache pool bug?

2014-10-27 Thread
return; } } } 2014-10-28 12:36 GMT+08:00 池信泽 xmdx...@gmail.com: I am sorry. We should only modify the logic in ReplicatedPG::agent_estimate_atime_temp; the atime should be now. if (temp) *temp = 0; if (hit_set->contains(oid)) { *atime = 0; if (temp) ++(*temp

Fwd: Cache pool bug?

2014-10-27 Thread
-- Forwarded message -- From: 池信泽 xmdx...@gmail.com Date: 2014-10-28 13:37 GMT+08:00 Subject: Re: Cache pool bug? To: Wang, Zhiqiang zhiqiang.w...@intel.com If we use the timestamp, get_position_micro(int32_t v, uint64_t *lower, uint64_t *upper) unsigned bin = calc_bits_of

zeromq in ceph?

2014-11-06 Thread
hi, cephers: ZeroMQ is a very high-speed asynchronous I/O engine, which is also used in Storm, a distributed and fault-tolerant realtime computation system. Does anyone want to use it for the communication between OSDs? Thanks

the cpu optimize in ceph

2014-12-12 Thread
hi, cephers: I want to reduce the CPU usage of the OSD in a full-SSD cluster. In my test case, Ceph runs out of CPU; CPU idle is about 10%. The CPU in my cluster is an Intel(R) Core(TM) i5-4570 @ 3.20GHz. Can you give me some suggestions? Thanks. Here is the CPU

more human readable log to track request or using mapreduce for data statistics

2015-03-26 Thread
hi, ceph: Currently, the command "ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops" may return output like this: { "description": "osd_op(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92)", "received_at": "2015-03-25 19:41:47.146145", "age":

set different target_max_bytes for each pg in cache pool

2015-04-22 Thread
hi, all: In my production environment, each pg in the same pool has different io pressure. The max is 3-4 times the min. Currently, the max size of each pg is pool.info.target_max_bytes / pg_num. So I think we could do better, such as setting a different target_max_bytes for each pg according to

ec pool history objects

2015-06-15 Thread
hi, all: When I use an ec pool, I see some history objects for object xx, such as: xx__head_610951D6__2_fe1_2, xx__head_610951D6__2_fe2_2, xx__head_610951D6__2__2. I think these objects are used for rollback when not all shards have written the object

Re: pg scrub check problem

2015-10-28 Thread
Are you sure the osd began to scrub? Maybe you could check the osd log, or use 'ceph pg dump' to check whether the scrub stamp changes or not, because there are some strategies which would reject the scrub command, such as the system load, osd_scrub_min_interval, osd_deep_scrub_interval and so

Re: pg scrub check problem

2015-10-28 Thread
Yes, I think we should also check the scrub_interval. OSD::sched_scrub() { if ((double)diff < cct->_conf->osd_scrub_min_interval) { dout(10) << "sched_scrub " << pgid << " at " << t << ": " << (double)diff << " < min (" << cct->_conf->osd_scrub_min_interval << " seconds)" << dendl; break; }

why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread
hi, all: op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, OpRequestRef> > _wq. I do not know why we use PGRef here, because the overhead of the smart pointer is not small. Maybe the raw pointer PG* is also OK? If op_wq is changed to

Re: disabling buffer::raw crc cache

2015-11-11 Thread
Evgeniy Firsov <evgeniy.fir...@sandisk.com>: > Rb-tree construction, insertion, which needs memory allocation, mutex > lock, unlock is more CPU expensive than streamlined crc calculation of > sometimes 100 bytes or less. > > On 11/11/15, 12:03 AM, "池信泽" <xmdx...@g

Re: why ShardedWQ in osd using smart pointer for PG?

2015-11-10 Thread
I wonder, if we want to keep the PG from going out of scope at an inopportune time, why are snap_trim_queue and scrub_queue declared as xlist<PG*> instead of xlist<PGRef>? 2015-11-11 2:28 GMT+08:00 Gregory Farnum <gfar...@redhat.com>: > On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 <xmdx...@gmail

Re: disabling buffer::raw crc cache

2015-11-11 Thread
Ah, I am confused about why the crc cache logic would consume so much CPU. 2015-11-11 15:27 GMT+08:00 Evgeniy Firsov : > Hello, Guys! > > While running CPU bound 4k block workload, I found that disabling crc > cache in the buffer::raw gives around 7% performance

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
If we keep them separate and pass them to ObjectStore::queue_transactions(), the CPU usage of ObjectStore::queue_transactions() goes up from 6.03% to 6.76% compared with re-using op_t. 2015-11-01 11:05 GMT+08:00 池信泽 <xmdx...@gmail.com>: > Yes, I think so. > keeping t

Re: why we should use two Mutex in OSD ShardData?

2015-10-30 Thread
I do not see any improvement from moving to a single mutex. I am just puzzled about why we use two mutexes; but I also do not see any improvement from using two mutexes in my environment. Thanks for your explanation. 2015-10-30 22:59 GMT+08:00 Somnath Roy : > > Hi xinze, > This is mainly

why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
hi, all: There are two ObjectStore::Transaction in ReplicatedBackend::submit_transaction: one is op_t and the other is local_t. Is there some critical logic we should consider? If we could reuse the variable op_t, it would be great, because it is expensive to call

ceph encoding optimization

2015-11-04 Thread
hi, all: I am focusing on the CPU usage of Ceph now. I find that encoding and decoding structs (such as pg_info_t, transaction and so on) consume too much CPU. For now, we encode every member variable one by one, which finally calls encode_raw. When there are many members, we

Re: ceph encoding optimization

2015-11-04 Thread
I agree that pg_stat_t (and friends) is a good place to start. eversion_t and utime_t are also good choices, because they are used in many places. 2015-11-04 23:07 GMT+08:00 Gregory Farnum <gfar...@redhat.com>: > On Wed, Nov 4, 2015 at 7:00 AM, 池信泽 <xmdx...@gmail.com> wr

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
The op_t items are encoded in issue_op, so after issue_op, could we use op_t directly instead of the local_t items? 2015-10-31 21:18 GMT+08:00 Sage Weil : > On Sat, 31 Oct 2015, ??? wrote: >> hi, all: >> >> There are two ObjectStore::Transaction in >>

Re: why we use two ObjectStore::Transaction in ReplicatedBackend::submit_transaction?

2015-10-31 Thread
er 31, 2015 8:35 AM > To: Sage Weil > Cc: 池信泽; ceph-devel@vger.kernel.org > Subject: Re: why we use two ObjectStore::Transaction in > ReplicatedBackend::submit_transaction? > > Yeah, since issue_op is called before log_operation, we may consider to reuse > op_t after se

Re: Regarding op_t, local_t

2015-11-18 Thread
Good catch. I think it makes sense. 2015-11-19 9:54 GMT+08:00 Somnath Roy : > Hi Sage, > I saw we are now having single transaction in submit_transaction. But, > in the replication path we are still having two transaction, can't we merge > it to one there ? > >

Re: Regarding op_t, local_t

2015-11-18 Thread
FileStore worker thread , I need to introduce heap allocation on that > path as well , reducing one transaction would help there. > > Thanks & Regards > Somnath > > -Original Message- > From: 池信泽 [mailto:xmdx...@gmail.com] > Sent: Wednesday, November 18, 2

Re: 答复: journal alignment

2015-11-20 Thread
Yes, you are right. But in the ceph master branch, we already do prepare_entry (which adds padding) before submit_entry. If you have a good idea based on this, it would be great. 2015-11-20 17:12 GMT+08:00 changtao381 : > Hi All, > > Thanks for you apply! > > If directioIO + async IO

Re: 答复: journal alignment

2015-11-20 Thread
Because we should keep the logic of the journal write thread simple; it performs better on PCIe SSD. But I think the strategy you mentioned above is good for HDD or SATA SSD. 2015-11-20 17:16 GMT+08:00 池信泽 <xmdx...@gmail.com>: > Yes, You are right. But in ceph master branch, we hav