Because we should keep the logic of the journal write thread simple. It
performs better on PCIe SSDs. But I think the strategy you mentioned
above is good for HDDs or SATA SSDs.
2015-11-20 17:16 GMT+08:00 池信泽 :
Yes, you are right. But in the Ceph master branch, we already do
prepare_entry (which adds padding) before submit_entry. If you have a good idea
based on this, it would be great.
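A minimal sketch of the padding idea being referred to, with made-up names (padding_needed, block_size, the entry layout) rather than the real FileJournal code: the entry is rounded up to a direct-IO-friendly boundary before it is submitted.

#include <cstdint>
#include <cstddef>

// Round n up to the next multiple of align.
static inline uint64_t round_up_to(uint64_t n, uint64_t align) {
  return (n + align - 1) / align * align;
}

// How many zero bytes must be appended so the whole entry
// (header + payload + footer) ends on an O_DIRECT-friendly boundary.
size_t padding_needed(size_t header, size_t payload, size_t footer,
                      size_t block_size = 4096) {
  uint64_t raw = header + payload + footer;
  return round_up_to(raw, block_size) - raw;
}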
2015-11-20 17:12 GMT+08:00 changtao381 :
> Hi All,
>
> Thanks for your reply!
>
> If directIO + async IO requires that alignment
worker thread, I need to introduce heap allocation on that
> path as well, reducing one transaction would help there.
>
> Thanks & Regards
> Somnath
>
> -Original Message-
> From: 池信泽 [mailto:xmdx...@gmail.com]
> Sent: Wednesday, November 18, 2015 6:00 PM
> To:
Good catch. I think it does make sense.
2015-11-19 9:54 GMT+08:00 Somnath Roy :
> Hi Sage,
> I saw we now have a single transaction in submit_transaction. But,
> in the replication path we still have two transactions; can't we merge
> them into one there?
>
> Thanks & Regards
> Somnath
> --
Evgeniy Firsov :
> Rb-tree construction and insertion, which need memory allocation and mutex
> lock/unlock, are more CPU expensive than the streamlined crc calculation of
> sometimes 100 bytes or less.
>
> On 11/11/15, 12:03 AM, "池信泽" wrote:
>
Ah, I am confused about why the crc cache logic would consume so much CPU.
2015-11-11 15:27 GMT+08:00 Evgeniy Firsov :
> Hello, Guys!
>
> While running a CPU-bound 4k block workload, I found that disabling the crc
> cache in buffer::raw gives around a 7% performance improvement.
>
> If there is no strong u
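To make the trade-off in this sub-thread concrete, here is a minimal sketch -- not the actual buffer::raw code; the map layout and locking are assumptions -- of a cached-CRC lookup versus simply recomputing the checksum. For buffers of ~100 bytes, a cache miss pays a lock, a node allocation, and a red-black-tree insertion on top of the checksum it was supposed to save.

#include <cstdint>
#include <cstddef>
#include <map>
#include <mutex>
#include <utility>

// Plain bitwise CRC-32 just to keep the sketch self-contained; the real
// code would call an optimized crc32c implementation instead.
uint32_t crc32_sw(const uint8_t *p, size_t len) {
  uint32_t crc = 0xffffffffu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc >> 1) ^ (0xedb88320u & (-(crc & 1)));
  }
  return ~crc;
}

// Cached variant: every miss pays a mutex lock/unlock plus a tree
// insertion (with allocation) on top of the checksum itself.
struct CrcCache {
  std::mutex lock;
  std::map<std::pair<size_t, size_t>, uint32_t> cache;  // (off,len) -> crc

  uint32_t get(const uint8_t *buf, size_t off, size_t len) {
    std::lock_guard<std::mutex> l(lock);
    auto key = std::make_pair(off, len);
    auto it = cache.find(key);
    if (it != cache.end())
      return it->second;                 // hit: no recomputation
    uint32_t c = crc32_sw(buf + off, len);
    cache.emplace(key, c);               // miss: allocation + rebalancing
    return c;
  }
};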
I wonder, if we want to keep the PG from going out of scope at an
inopportune time, why are snap_trim_queue and scrub_queue declared as
xlist<PG*> instead of xlist<PGRef>?
2015-11-11 2:28 GMT+08:00 Gregory Farnum :
> On Tue, Nov 10, 2015 at 7:19 AM, 池信泽 wrote:
hi, all:
op_wq is declared as ShardedThreadPool::ShardedWQ< pair<PGRef, ...> > &op_wq. I do not know why we use PGRef here.
Because the overhead of the smart pointer is not small, maybe the
raw pointer PG* would also be OK?
If op_wq is changed to ShardedThreadPool::ShardedWQ< pair<PG*, ...> > &op
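A minimal sketch of why the question matters, using generic made-up types rather than the real PGRef/OSD code: every copy of an intrusive smart pointer does an atomic increment/decrement of the refcount, which a raw pointer avoids on the hot queueing path -- at the cost of losing lifetime protection.

#include <atomic>

// Stand-in for a refcounted PG-like object (hypothetical, for illustration only).
struct Obj {
  std::atomic<int> nref{1};
};

// Minimal intrusive smart pointer: every copy is an atomic increment and
// every destruction an atomic decrement.
struct Ref {
  Obj *p;
  explicit Ref(Obj *o) : p(o) { p->nref.fetch_add(1, std::memory_order_relaxed); }
  Ref(const Ref &r) : p(r.p) { p->nref.fetch_add(1, std::memory_order_relaxed); }
  ~Ref() { p->nref.fetch_sub(1, std::memory_order_acq_rel); }
};

// Queueing work items as pair<Ref, Op> copies the Ref at enqueue and again
// at dequeue: a couple of atomic read-modify-writes per op, which adds up
// at high IOPS. Queueing pair<Obj*, Op> copies only a raw pointer, but then
// the queue no longer keeps the object alive, so something else must pin it.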
I agree that pg_stat_t (and friends) is a good place to start.
eversion_t and utime_t are also good choices to start with because they
are used in many places.
2015-11-04 23:07 GMT+08:00 Gregory Farnum :
> On Wed, Nov 4, 2015 at 7:00 AM, 池信泽 wrote:
hi, all:
I am focusing on the cpu usage of ceph now. I find that the struct
(such as pg_info_t, transaction and so on) encode and decode consume too
much cpu resource.
For now, we encode every member variable one by one, which ends up
calling encode_raw. When there are many members, we s
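For context, a minimal sketch -- generic code, not Ceph's actual encode machinery, with made-up names -- of what member-by-member encoding looks like: each field is appended to the output buffer on its own, so a struct with dozens of members pays one call and one append step per member instead of one bulk copy.

#include <cstdint>
#include <vector>

// Append one fixed-size value to the buffer (stand-in for encode_raw).
template <typename T>
void encode_raw_sketch(const T &v, std::vector<uint8_t> &out) {
  const uint8_t *p = reinterpret_cast<const uint8_t *>(&v);
  out.insert(out.end(), p, p + sizeof(T));   // assumes a little-endian host
}

// A struct encoded field by field, the way pg_info_t-style types are:
// every member is a separate append, which is where the CPU goes when
// the struct has many members.
struct sample_info {
  uint64_t version;
  uint32_t epoch;
  uint64_t last_user_version;

  void encode(std::vector<uint8_t> &out) const {
    encode_raw_sketch(version, out);
    encode_raw_sketch(epoch, out);
    encode_raw_sketch(last_user_version, out);
  }
};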
If we keep them separate and pass them to
ObjectStore::queue_transactions(), the cpu usage of
ObjectStore::queue_transactions() goes up from 6.03% to 6.76%
compared with re-using the op_t items.
2015-11-01 11:05 GMT+08:00 池信泽 :
Yes, I think so.
Keeping them separate and passing them to
ObjectStore::queue_transactions() would increase the time spent on
transaction encoding and consume more cpu.
Transaction::append alone holds 0.8% cpu in my environment.
The transaction encoding is also really a bottleneck; that process
holds 1
ing out 2 transaction. No more append
> call..
>
> Thanks & Regards
> Somnath
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org
> [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Ning Yao
> Sent: Saturday, October 31, 2015 8:35 AM
>
The op_t items are encoded in issue_op, so after issue_op could we
use them directly instead of the local_t items?
2015-10-31 21:18 GMT+08:00 Sage Weil :
> On Sat, 31 Oct 2015, 池信泽 wrote:
hi, all:
There are two ObjectStore::Transaction in
ReplicatedBackend::submit_transaction; one is op_t and the other one
is local_t. Is there some critical logic we should consider here?
If we could reuse the variable op_t it would be great, because it is
expensive to call local_t.append.
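A rough sketch of the two options being compared in this thread, using generic byte buffers and made-up names rather than the real ObjectStore::Transaction API (re-using op_t from the start would avoid both costs): appending one encoded transaction onto another pays one bulk copy up front, while queueing them separately means each one is encoded and dispatched on its own downstream.

#include <cstdint>
#include <list>
#include <vector>

// Stand-in for an encoded transaction payload (not the real Transaction type).
struct Txn {
  std::vector<uint8_t> bytes;

  // Merge another transaction's ops onto the end of this one,
  // roughly what an append-style merge does.
  void append(const Txn &other) {
    bytes.insert(bytes.end(), other.bytes.begin(), other.bytes.end());
  }
};

// Option A: fold local_t into op_t and submit a single item.
void submit_merged(Txn &op_t, const Txn &local_t, std::list<Txn *> &queue) {
  op_t.append(local_t);      // one extra bulk copy here...
  queue.push_back(&op_t);    // ...but only one item to encode/dispatch later
}

// Option B: keep them separate; each transaction is encoded and handled
// individually downstream, which is where the extra CPU shows up.
void submit_separate(Txn &op_t, Txn &local_t, std::list<Txn *> &queue) {
  queue.push_back(&local_t);
  queue.push_back(&op_t);
}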
I do not see any improvement by moving to a single mutex; I just feel
puzzled about why we use two mutexes.
But I also do not see any improvement from using two mutexes in my environment.
Thanks for your explanation.
2015-10-30 22:59 GMT+08:00 Somnath Roy :
>
> Hi xinze,
> This is mainly for reducing lock contentio
hi, all:
There are two Mutexes in ShardData: one is sdata_lock and the other
one is sdata_op_ordering_lock.
I wonder, could we replace sdata_lock with sdata_op_ordering_lock?
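For readers without OSD.h open, a simplified sketch of the two-lock pattern being asked about, written with standard C++ types instead of Ceph's Mutex/Cond: one mutex is paired with the condition variable that idle worker threads sleep on, and a second mutex protects only the per-shard queue ordering, so a producer pushing work does not hold the queue lock while waking a sleeper.

#include <condition_variable>
#include <deque>
#include <mutex>

struct Shard {
  // Pairs with the condition variable; held only around wait/notify.
  std::mutex sdata_lock;
  std::condition_variable sdata_cond;

  // Protects only the queue and its ordering; held while pushing/popping.
  std::mutex sdata_op_ordering_lock;
  std::deque<int> pqueue;   // stand-in for the real work-item type

  void enqueue(int item) {
    {
      std::lock_guard<std::mutex> l(sdata_op_ordering_lock);
      pqueue.push_back(item);
    }
    // Wake a worker without holding the queue lock, so a slow wakeup
    // never blocks other threads touching the queue.
    std::lock_guard<std::mutex> l(sdata_lock);
    sdata_cond.notify_one();
  }

  bool try_dequeue(int *out) {
    std::lock_guard<std::mutex> l(sdata_op_ordering_lock);
    if (pqueue.empty())
      return false;
    *out = pqueue.front();
    pqueue.pop_front();
    return true;
  }
};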
--
Regards,
xinze
Yes, I think we should also check the scrub_interval.
OSD::sched_scrub()
{
  if ((double)diff < cct->_conf->osd_scrub_min_interval) {
    dout(10) << "sched_scrub " << pgid << " at " << t
             << ": " << (double)diff << " < min ("
             << cct->_conf->osd_scrub_min_interval << " seconds)" << dendl;
    break;
  }
Are you sure the osd actually began to scrub? Maybe you could check the osd
log, or use 'ceph pg dump' to
check whether the scrub stamp changes or not.
Because there are some strategies that would reject the scrub command,
such as the system load, osd_scrub_min_interval,
osd_deep_scrub_interval and so on.
hi, all:
When I use an ec pool, I see there are some object history versions for object xx,
such as: xx__head_610951D6__2_fe1_2,
xx__head_610951D6__2_fe2_2
xx__head_610951D6__2__2
I think these objects are used for roll_back when not all shards have
written the object to
hi, all:
In my production environment, each pg in the same pool has a
different io pressure. The max is 3-4 times more than the min.
Currently, the max size of each pg is
pool.info.target_max_bytes / pg_num.
So I think we could do better, such as setting a different
target_max_bytes for each pg according to their io pressure.
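A small illustration of that idea as a hypothetical helper, not existing Ceph code: weight each pg's share of target_max_bytes by its observed io pressure instead of dividing the pool budget evenly.

#include <cstdint>
#include <vector>

// Hypothetical sketch: split a pool's target_max_bytes across pgs in
// proportion to each pg's recent io pressure (e.g. ops or bytes written),
// instead of the flat target_max_bytes / pg_num split.
std::vector<uint64_t> per_pg_targets(uint64_t target_max_bytes,
                                     const std::vector<uint64_t> &io_pressure) {
  std::vector<uint64_t> targets(io_pressure.size());
  if (io_pressure.empty())
    return targets;

  uint64_t total = 0;
  for (uint64_t p : io_pressure)
    total += p;

  if (total == 0) {                 // no samples yet: fall back to the even split
    for (auto &t : targets)
      t = target_max_bytes / io_pressure.size();
    return targets;
  }
  for (size_t i = 0; i < io_pressure.size(); ++i)
    targets[i] = target_max_bytes * io_pressure[i] / total;  // ignoring overflow for the sketch
  return targets;
}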
hi, ceph:
Currently, the command "ceph --admin-daemon
/var/run/ceph/ceph-osd.0.asok dump_historic_ops" may return output like this:
{ "description": "osd_op(client.4436.1:11617
      rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92)",
  "received_at": "2015-03-25 19:41:47.146145",
hi, cephers:
Now, I want to reduce the cpu usage of the osd in a full-ssd
cluster. In my test case, ceph runs out of cpu; the cpu idle is about
10%.
The cpu in my cluster is an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz.
Can you give me some suggestions?
Thanks.
There are the cpu u
hi, cephers:
Zeromq is a very high-speed asynchronous I/O engine, which is
also used in Storm,
a distributed and fault-tolerant realtime computation system. Does anyone
want to use it for the
communication between osds?
Thanks
-- Forwarded message --
From: 池信泽
Date: 2014-10-28 13:37 GMT+08:00
Subject: Re: Cache pool bug?
To: "Wang, Zhiqiang"
If we use the timestamp,
get_position_micro(int32_t v, uint64_t *lower, uint64_t *upper)
unsigned bin = calc_bits_of(v); // this function will
w - p->first;
      if (temp)
        ++(*temp);
      else
        return;
    }
  }
}
2014-10-28 12:36 GMT+08:00 池信泽 :
I am sorry. We only need to modify the logic in
ReplicatedPG::agent_estimate_atime_temp; atime should be now.
  if (temp)
    *temp = 0;
  if (hit_set->contains(oid)) {
    *atime = 0;
    if (temp)
      ++(*temp);
    else
      return;
  }
2014-10-28 12:28 GMT+08:00 池信泽 :
I think if it is changed to *atime = p->first, the logic below should
also be modified:
ReplicatedPG::agent_maybe_evict
  if (atime < 0 && obc->obs.oi.mtime != utime_t()) {
    if (obc->obs.oi.local_mtime != utime_t()) {
      atime = ceph_clock_now(NULL).sec() - obc->obs.oi.local_mtime;
Because if there are multiple access times in agent_state for the same
object, we should use the most recent one.
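For context, a rough sketch of the scan this thread is discussing, with simplified stand-in types rather than a verbatim copy of ReplicatedPG: the hit sets are walked newest-first, so the first set containing the object determines the most recent access time, and each additional hit raises the temperature.

#include <ctime>
#include <map>
#include <memory>
#include <set>
#include <string>

// Stand-in for the real HitSet type.
struct HitSetStub {
  std::set<std::string> oids;
  bool contains(const std::string &oid) const { return oids.count(oid) > 0; }
};
using HitSetRef = std::shared_ptr<HitSetStub>;

// Walk the hit sets from newest to oldest: the first set containing the
// object gives the most recent access time, and every hit bumps the temperature.
void estimate_atime_temp(const std::string &oid,
                         const std::map<time_t, HitSetRef> &hit_set_map,
                         int *atime, int *temp) {
  *atime = -1;
  if (temp)
    *temp = 0;
  time_t now = time(nullptr);
  for (auto p = hit_set_map.rbegin(); p != hit_set_map.rend(); ++p) {
    if (p->second->contains(oid)) {
      if (*atime < 0)
        *atime = now - p->first;   // age relative to now; the newest hit wins
      if (temp)
        ++(*temp);
      else
        return;                    // caller only asked for atime
    }
  }
}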
2014-10-28 9:42 GMT+08:00 池信泽 :
> I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp.
> I think we should change the following code:
> for (map<time_t,HitSetRef>::iterator p = ag
e < 0)
        *atime = now - p->first;
      if (temp)
        ++(*temp);
      else
        return;
    }
  }
2014-10-28 9:38 GMT+08:00 池信泽 :
> I think there is also a bug in ReplicatedPG::agent_estimate_atime_temp.
> I think we should change the following code:
> for (map<time_t,HitSetRef>::iterator p = agent_state->