Re: [ceph-users] Fwd: Hammer OSD memory increase when add new machine

2016-11-08 Thread Dong Wu
Thanks. From the CERN 30PB cluster test, it seems the osdmap caches cause the
memory increase; I'll test how these configs (osd_map_cache_size,
osd_map_max_advance, etc.) influence the memory usage.
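
A rough sketch of how I plan to run that test (osd.0 and the values below are
only placeholders for the experiment, not recommendations):

    # how many osdmap epochs one OSD currently holds
    ceph daemon osd.0 status | grep -E 'oldest_map|newest_map'

    # shrink the map cache for the test run (may need an OSD restart to fully take effect)
    ceph tell osd.* injectargs '--osd_map_cache_size 100 --osd_map_max_advance 50'

    # resident memory of the ceph-osd processes, sampled before and after
    ps -C ceph-osd -o pid,rss,args

    # and, as Sage suggests below, capture the per-OSD counters around the change
    ceph daemon osd.0 perf dump | python -m json.tool > perf.before.json
    # (add the new group of machines, let it settle, then repeat)
    ceph daemon osd.0 perf dump | python -m json.tool > perf.after.json
    diff perf.before.json perf.after.json

mon_min_osdmap_epochs mentioned below is a monitor-side option, so it would be
set on the mons rather than injected into the OSDs.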

2016-11-08 22:48 GMT+08:00 zphj1987 <zphj1...@gmail.com>:
> I remember CERN had a 30PB test ceph cluster and the OSDs used more memory
> than usual, and they tuned the osdmap epochs. If it is the osdmap that makes it use
> more memory, I think you could test with fewer osdmap epochs to see if
> anything changes.
>
> default mon_min_osdmap_epochs is 500
>
>
> zphj1987
>
> 2016-11-08 22:08 GMT+08:00 Sage Weil <s...@newdream.net>:
>>
>> > -- Forwarded message --
>> > From: Dong Wu <archer.wud...@gmail.com>
>> > Date: 2016-10-27 18:50 GMT+08:00
>> > Subject: Re: [ceph-users] Hammer OSD memory increase when add new
>> > machine
>> > To: huang jun <hjwsm1...@gmail.com>
>> > Cc: ceph-users <ceph-users@lists.ceph.com>
>> >
>> >
>> > 2016-10-27 17:50 GMT+08:00 huang jun <hjwsm1...@gmail.com>:
>> > > How do you add the new machine?
>> > > Is it first added to the default ruleset and then you add the new rule
>> > > for this group?
>> > > Do you have data pools that use the default rule, and do these pools contain
>> > > data?
>> >
>> > We don't use the default ruleset. When we add a new group of machines,
>> > crush_location auto-generates the root and chassis buckets, then we add a new rule
>> > for this group.
>> >
>> >
>> > > 2016-10-27 17:34 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>> > >> Hi all,
>> > >>
>> > >> We have a ceph cluster that only uses rbd. The cluster contains several
>> > >> groups of machines, each group contains several machines, and each
>> > >> machine has 12 SSDs, each SSD as an OSD (journal and data together).
>> > >> e.g.:
>> > >> group1: machine1~machine12
>> > >> group2: machine13~machine24
>> > >> ..
>> > >> Each group is separated from the other groups, which means each group has
>> > >> separate pools.
>> > >>
>> > >> We use Hammer (0.94.6) compiled with jemalloc (4.2).
>> > >>
>> > >> We have found that when we add a new group of machines, the other groups'
>> > >> machines' memory increases by roughly 5% (OSD usage).
>> > >>
>> > >> Each group's data is separated from the others', so backfill only happens within a
>> > >> group,
>> > >> not across groups.
>> > >> Why does adding a group of machines cause the others' memory to increase? Is this
>> > >> reasonable?
>>
>> It could be cached OSDmaps (they get slightly larger when you add OSDs)
>> but it's hard to say.  It seems more likely that the pools and crush rules
>> aren't configured right and you're adding OSDs to the wrong group.
>>
>> If you look at the 'ceph daemon osd.NNN perf dump' output you can see,
>> among other things, how many PGs are on the OSD.  Can you capture the
>> output before and after the change (and 5% memory footprint increase)?
>>
>> sage
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Hammer OSD memory increase when add new machine

2016-11-07 Thread Dong Wu
Any suggestions?

Thanks.


-- Forwarded message --
From: Dong Wu <archer.wud...@gmail.com>
Date: 2016-10-27 18:50 GMT+08:00
Subject: Re: [ceph-users] Hammer OSD memory increase when add new machine
To: huang jun <hjwsm1...@gmail.com>
Cc: ceph-users <ceph-users@lists.ceph.com>


2016-10-27 17:50 GMT+08:00 huang jun <hjwsm1...@gmail.com>:
> How do you add the new machine?
> Is it first added to the default ruleset and then you add the new rule
> for this group?
> Do you have data pools that use the default rule, and do these pools contain data?

We don't use the default ruleset. When we add a new group of machines,
crush_location auto-generates the root and chassis buckets, then we add a new rule
for this group.
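
For reference, a per-group rule of that shape would look roughly like this in a
decompiled crush map (the bucket, rule and host names here are made up for
illustration, not our exact config):

    rule group2_rule {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take group2
            step chooseleaf firstn 0 type chassis
            step emit
    }

and the OSDs of that group carry a crush location such as
root=group2 chassis=group2-chassis01 host=machine13, so their PGs never map
onto another group's OSDs.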


> 2016-10-27 17:34 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>> Hi all,
>>
>> We have a ceph cluster that only uses rbd. The cluster contains several
>> groups of machines, each group contains several machines, and each
>> machine has 12 SSDs, each SSD as an OSD (journal and data together).
>> e.g.:
>> group1: machine1~machine12
>> group2: machine13~machine24
>> ..
>> Each group is separated from the other groups, which means each group has
>> separate pools.
>>
>> We use Hammer (0.94.6) compiled with jemalloc (4.2).
>>
>> We have found that when we add a new group of machines, the other groups'
>> machines' memory increases by roughly 5% (OSD usage).
>>
>> Each group's data is separated from the others', so backfill only happens within a group,
>> not across groups.
>> Why does adding a group of machines cause the others' memory to increase? Is this reasonable?
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
>
> --
> Thank you!
> HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hammer OSD memory increase when add new machine

2016-10-27 Thread Dong Wu
Hi all,

We have a ceph cluster that only uses rbd. The cluster contains several
groups of machines, each group contains several machines, and each
machine has 12 SSDs, each SSD as an OSD (journal and data together).
e.g.:
group1: machine1~machine12
group2: machine13~machine24
..
Each group is separated from the other groups, which means each group has
separate pools.

We use Hammer (0.94.6) compiled with jemalloc (4.2).

We have found that when we add a new group of machines, the other groups'
machines' memory increases by roughly 5% (OSD usage).

Each group's data is separated from the others', so backfill only happens within a group,
not across groups.
Why does adding a group of machines cause the others' memory to increase? Is this reasonable?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] lsof ceph-osd find many "can't identify protocol"

2016-05-17 Thread Dong Wu
Hi, cephers.
I used lsof on my system and found a lot of "can't identify protocol" entries;
does this mean socket descriptors are leaking?
ceph-osd   5389  root  112u  sock  0,7  0t0  295880018  can't identify protocol
ceph-osd   5389  root  136u  sock  0,7  0t0  295572256  can't identify protocol
ceph-osd   5389  root  176u  sock  0,7  0t0  292738022  can't identify protocol
ceph-osd   5389  root  240u  sock  0,7  0t0  297919149  can't identify protocol
ceph-osd   5389  root  301u  sock  0,7  0t0  313075907  can't identify protocol
ceph-osd   5389  root  351u  sock  0,7  0t0  295314260  can't identify protocol
ceph-osd   5389  root  617u  sock  0,7  0t0  296221898  can't identify protocol
ceph-osd   5389  root  657u  sock  0,7  0t0  313075919  can't identify protocol
ceph-osd   5389  root  714u  sock  0,7  0t0  295881042  can't identify protocol
ceph-osd   5389  root  743u  sock  0,7  0t0  295904170  can't identify protocol
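
A rough way to watch whether that count keeps growing (5389 is just the pid
from the listing above, adjust per OSD):

    # sockets lsof cannot map to a protocol, for one ceph-osd process
    sudo lsof -p 5389 2>/dev/null | grep -c "can't identify protocol"

    # total open descriptors for the same process, for comparison
    sudo ls /proc/5389/fd | wc -l

Sampling both numbers over time should show whether descriptors are actually
accumulating.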
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] after upgrade from 0.80.11 to 0.94.6, rbd cmd core dump

2016-03-25 Thread Dong Wu
Hi all, I upgraded my cluster from 0.80.11 to 0.94.6. Everything is OK
except that the rbd command core dumps on one host and succeeds on the others.
I have disabled auth in ceph.conf:
auth_cluster_required = none
auth_service_required = none
auth_client_required = none

Here is the core dump output:

$ sudo rbd ls
2016-03-25 16:00:43.043000 7f3ae6c13780  1 -- :/0 messenger.start
2016-03-25 16:00:43.043329 7f3ae6c13780  1 -- :/1008171 -->
10.180.0.46:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0
0x434a330 con 0x4349fc0
2016-03-25 16:00:43.043377 7f3ae6c13780  0 -- :/1008171 submit_message
auth(proto 0 30 bytes epoch 0) v1
 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : 
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : 
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : 

2016-03-25 16:00:43.043450 7f3adb7fe700  1 monclient(hunting): continuing hunt
2016-03-25 16:00:43.043489 7f3adb7fe700  1 -- :/1008171 mark_down
0x4349fc0 -- 0x4349d30
2016-03-25 16:00:43.043614 7f3adb7fe700  1 -- :/1008171 -->
10.180.0.31:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0
0x7f39cc001060 con 0x7f39cc000cf0
2016-03-25 16:00:43.043648 7f3adb7fe700  0 -- :/1008171 submit_message
auth(proto 0 30 bytes epoch 0) v1
 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : 
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : 
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : 

2016-03-25 16:00:43.043694 7f3ae6c13780  0 monclient(hunting):
authenticate timed out after 2.47033e-321
*** Caught signal (Segmentation fault) **
 in thread 7f3adbfff700
2016-03-25 16:00:43.043756 7f3adb7fe700  1 monclient(hunting): continuing hunt
2016-03-25 16:00:43.043749 7f3ae6c13780  0 librados: client.admin
authentication error (110) Connection timed out
 ceph version 0.94.6-2-gbb98b8f (bb98b8fcb0bb0bd3688310f6a1688736ef422b25)
 1: rbd() [0x60408c]
 2: (()+0xf8d0) [0x7f3ae4ea88d0]
 3: rbd() [0x52b841]
 4: (Mutex::~Mutex()+0x9b) [0x562a6b]
 5: (Connection::~Connection()+0x6e) [0x7f3ae5550fce]
 6: (Connection::~Connection()+0x9) [0x7f3ae5551049]
 7: (Pipe::~Pipe()+0x90) [0x7f3ae553f330]
 8: (Pipe::~Pipe()+0x9) [0x7f3ae553f4e9]
 9: (SimpleMessenger::reaper()+0x8a9) [0x7f3aebf9]
 10: (SimpleMessenger::reaper_entry()+0x88) [0x7f3ae5556b38]
 11: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f3ae555ba8d]
 12: (()+0x80a4) [0x7f3ae4ea10a4]
 13: (clone()+0x6d) [0x7f3ae3a2d04d]
2016-03-25 16:00:43.045278 7f3adbfff700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f3adbfff700

 ceph version 0.94.6-2-gbb98b8f (bb98b8fcb0bb0bd3688310f6a1688736ef422b25)
 1: rbd() [0x60408c]
 2: (()+0xf8d0) [0x7f3ae4ea88d0]
 3: rbd() [0x52b841]
 4: (Mutex::~Mutex()+0x9b) [0x562a6b]
 5: (Connection::~Connection()+0x6e) [0x7f3ae5550fce]
 6: (Connection::~Connection()+0x9) [0x7f3ae5551049]
 7: (Pipe::~Pipe()+0x90) [0x7f3ae553f330]
 8: (Pipe::~Pipe()+0x9) [0x7f3ae553f4e9]
 9: (SimpleMessenger::reaper()+0x8a9) [0x7f3aebf9]
 10: (SimpleMessenger::reaper_entry()+0x88) [0x7f3ae5556b38]
 11: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f3ae555ba8d]
 12: (()+0x80a4) [0x7f3ae4ea10a4]
 13: (clone()+0x6d) [0x7f3ae3a2d04d]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- begin dump of recent events ---
   -39> 2016-03-25 16:00:43.036565 7f3ae6c13780  5 asok(0x42f1830)
register_command perfcounters_dump hook 0x42f5000
   -38> 2016-03-25 16:00:43.036596 7f3ae6c13780  5 asok(0x42f1830)
register_command 1 hook 0x42f5000
   -37> 2016-03-25 16:00:43.036608 7f3ae6c13780  5 asok(0x42f1830)
register_command perf dump hook 0x42f5000
   -36> 2016-03-25 16:00:43.036621 7f3ae6c13780  5 asok(0x42f1830)
register_command perfcounters_schema hook 0x42f5000
   -35> 2016-03-25 16:00:43.036630 7f3ae6c13780  5 asok(0x42f1830)
register_command 2 hook 0x42f5000
   -34> 2016-03-25 16:00:43.036634 7f3ae6c13780  5 asok(0x42f1830)
register_command perf schema hook 0x42f5000
   -33> 2016-03-25 16:00:43.036639 7f3ae6c13780  5 asok(0x42f1830)
register_command perf reset hook 0x42f5000
   -32> 2016-03-25 16:00:43.036643 7f3ae6c13780  5 asok(0x42f1830)
register_command config show hook 0x42f5000
   -31> 2016-03-25 16:00:43.036651 7f3ae6c13780  5 asok(0x42f1830)
register_command config set hook 0x42f5000
   -30> 2016-03-25 16:00:43.036654 7f3ae6c13780  5 asok(0x42f1830)
register_command config get hook 0x42f5000
   -29> 2016-03-25 16:00:43.036659 7f3ae6c13780  5 asok(0x42f1830)
register_command config diff hook 0x42f5000
   -28> 2016-03-25 16:00:43.036662 7f3ae6c13780  5 asok(0x42f1830)
register_command log flush hook 0x42f5000
   -27> 2016-03-25 16:00:43.036667 7f3ae6c13780  5 asok(0x42f1830)
register_command log dump hook 0x42f5000
   -26> 2016-03-25 16:00:43.036670 7f3ae6c13780  5 asok(0x42f1830)

Re: [ceph-users] why not add (offset,len) to pglog

2016-03-14 Thread Dong Wu
Based on Yao Ning's PR, I have opened a new PR for this:
https://github.com/ceph/ceph/pull/8083

In this PR, I also solved the following upgrade problem. Consider an upgrade
to this can_recover_partial version, e.g. a pg 3.67 with acting set [0, 1, 2]:
1) First, we upgrade osd.0 (service ceph restart osd.0); it recovers
normally and everything goes on.
2) A write request (e.g. req1, which will write to obj1) is sent to the
primary (osd.0), and the pglog records this request.
3) Then we upgrade osd.1; sending req1 to osd.1 fails, but it is sent to
osd.2. While osd.2 is handling the request (in do_request), pg 3.67 starts
peering, so osd.2 calls can_discard_request and decides that req1 should be
dropped.
4) So req1 has only been written successfully on osd.0; because min_size=2,
osd.0 re-enqueues req1.
5) During peering, the primary finds that req1's object obj1 is missing on
osd.1 and osd.2, so it recovers the object.
6) Because osd.0 and osd.1 are already upgraded, osd.0 calculates the
partial data in prep_push_to_replica, and osd.1 can handle the partial data
correctly.
7) But osd.2 has not been upgraded; in its code path (submit_push_data) it
removes the original object first and then writes the partial data from
osd.0, so the original data of the object is lost (a rough sketch of the
compatibility check follows below).
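
To make the scenario concrete, the guard is conceptually something like the
following (a minimal sketch only; the feature bit name and value are
placeholders I made up, not the PR's actual code):

    #include <cstdint>

    // Placeholder bit, not a real Ceph feature value.
    static const uint64_t CAN_RECOVER_PARTIAL = 1ull << 62;

    // acting_features: AND of the feature bits advertised by every OSD in the
    // acting set. If any peer (like the not-yet-upgraded osd.2 above) lacks
    // the bit, the primary must fall back to pushing the whole object.
    bool can_push_partial(uint64_t acting_features) {
      return (acting_features & CAN_RECOVER_PARTIAL) != 0;
    }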

2016-01-22 19:40 GMT+08:00 Ning Yao <zay11...@gmail.com>:
> Great! Based on Sage's suggestion, we just add a flag
> can_recover_partial to indicate whether partial recovery can be used.
> And I have opened a new PR for this: https://github.com/ceph/ceph/pull/7325
> Please review and comment.
> Regards
> Ning Yao
>
>
> 2015-12-25 22:27 GMT+08:00 Sage Weil <s...@newdream.net>:
>> On Fri, 25 Dec 2015, Ning Yao wrote:
>>> Hi, Dong Wu,
>>>
>>> 1. As I am currently working on other things, this proposal has been abandoned for
>>> a long time.
>>> 2. This is a complicated task, as we need to consider a lot (not
>>> just writeOp, but also truncate and delete) and also need to
>>> consider the different effects on different backends (Replicated, EC).
>>> 3. I don't think it is a good time to redo this patch now, since
>>> BlueStore and KStore are in progress, and I'm afraid of bringing in some
>>> side effects.  We may prepare and propose the whole design at the next CDS.
>>> 4. Currently, we already have some tricks to deal with recovery (like
>>> throttling the max recovery ops, setting the priority for recovery, and so
>>> on). So this kind of patch may not solve a critical problem but just
>>> makes things better, and I am not quite sure that it will really
>>> bring a big improvement. Based on my previous test, it works
>>> excellently on slow disks (say HDD), and also for short-time
>>> maintenance. Otherwise, it will trigger the backfill process.  So wait
>>> for Sage's opinion @sage
>>>
>>> If you are interested in this, we may cooperate to do it.
>>
>> I think it's a great idea.  We didn't do it before only because it is
>> complicated.  The good news is that if we can't conclusively infer exactly
>> which parts of hte object need to be recovered from the log entry we can
>> always just fall back to recovering the whole thing.  Also, the place
>> where this is currently most visible is RBD small writes:
>>
>>  - osd goes down
>>  - client sends a 4k overwrite and modifies an object
>>  - osd comes back up
>>  - client sends another 4k overwrite
>>  - client io blocks while osd recovers 4mb
>>
>> So even if we initially ignore truncate and omap and EC and clones and
>> anything else complicated I suspect we'll get a nice benefit.
>>
>> I haven't thought about this too much, but my guess is that the hard part
>> is making the primary's missing set representation include a partial delta
>> (say, an interval_set<> indicating which ranges of the file have changed)
>> in a way that gracefully degrades to recovering the whole object if we're
>> not sure.
>>
>> In any case, we should definitely have the design conversation!
>>
>> sage
>>
>>>
>>> Regards
>>> Ning Yao
>>>
>>>
>>> 2015-12-25 14:23 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>>> > Thanks. From this pull request I learned that this work is not
>>> > complete; is there any new progress on it?
>>> >
>>> > 2015-12-25 12:30 GMT+08:00 Xinze Chi (??) <xmdx...@gmail.com>:
>>> >> Yeah, this is a good idea for recovery, but not for backfill.
>>> >> @YaoNing opened a pull request about this earlier this year:
>>> >> https://github.com/ceph/ceph/pull/3837
>>> >>
>>> >> 2015-12-25 11:16 GMT+08:00 Dong Wu <a

[ceph-users] how to downgrade when upgrade from firefly to hammer fail

2016-03-06 Thread Dong Wu
Hi, cephers,
I want to upgrade my ceph cluster from firefly (0.80.11) to hammer.
I successfully installed the hammer deb packages on all my hosts, then
upgraded the monitors first, and that succeeded.
But when I restarted the OSDs on one host to upgrade them, it failed: the
OSDs cannot start up. I then wanted to downgrade to firefly again to keep my
cluster going; after reinstalling the firefly deb packages, I failed to
start the OSDs on that host. Here is the log:

2016-03-07 09:47:14.704242 7f2f11ba87c0  0 ceph version 0.80.11
(8424145d49264624a3b0a204aedb127835161070), process ceph-osd, pid
37459
2016-03-07 09:47:14.709159 7f2f11ba87c0 -1
filestore(/var/lib/ceph/osd/ceph-0) FileStore::mount : stale version
stamp 4. Please run the FileStore update script before starting the
OSD, or set filestore_update_to to 3
2016-03-07 09:47:14.709176 7f2f11ba87c0 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-0: (22) Invalid argument
2016-03-07 09:47:18.385399 7f98478187c0  0 ceph version 0.80.11
(8424145d49264624a3b0a204aedb127835161070), process ceph-osd, pid
39041
2016-03-07 09:47:18.390320 7f98478187c0 -1
filestore(/var/lib/ceph/osd/ceph-0) FileStore::mount : stale version
stamp 4. Please run the FileStore update script before starting the
OSD, or set filestore_update_to to 3
2016-03-07 09:47:18.390337 7f98478187c0 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-0: (22) Invalid argument

  How can I downgrade to firefly successfully?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read IO to object while new data still in journal

2015-12-30 Thread Dong Wu
What I know is that librbd uses the applied callback. Here is the code:
send_write() calls librados::Rados::aio_create_completion with the parameter
rados_req_cb as cb_safe and cb_complete as NULL, and that cb_safe is just
the applied callback.

  void AbstractWrite::send_write() {
ldout(m_ictx->cct, 20) << "send_write " << this << " " << m_oid << " "
   << m_object_off << "~" << m_object_len << dendl;

m_state = LIBRBD_AIO_WRITE_FLAT;
guard_write();
add_write_ops(&m_write);
assert(m_write.size() != 0);

librados::AioCompletion *rados_completion =
  librados::Rados::aio_create_completion(this, NULL, rados_req_cb);
int r = m_ictx->data_ctx.aio_operate(m_oid, rados_completion, &m_write,
 m_snap_seq, m_snaps);
assert(r == 0);
rados_completion->release();
  }


librados::AioCompletion *librados::Rados::aio_create_completion(void *cb_arg,
callback_t cb_complete,
callback_t cb_safe)
{
  AioCompletionImpl *c;
  int r = rados_aio_create_completion(cb_arg, cb_complete, cb_safe, (void**)&c);
  assert(r == 0);
  return new AioCompletion(c);
}


anything wrong?

Regards,
Dong Wu

2015-12-31 10:33 GMT+08:00 min fang <louisfang2...@gmail.com>:
> Yes, the question here is that librbd uses the committed callback; as I
> understand it, when this callback returns, the librbd write will be regarded as
> completed. So I can issue a read IO even if the data is not readable. In
> this case, I would like to know what data will be returned for that read IO?
>
> 2015-12-31 10:29 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>>
>> There are two callbacks: committed and applied. Committed means the write
>> has reached all replicas' journals; applied means it has reached all replicas' file
>> systems. So when the applied callback returns to the client, it means the data can
>> be read.
>>
>> 2015-12-31 10:15 GMT+08:00 min fang <louisfang2...@gmail.com>:
>> > Hi, as I understand it, a write IO will commit data to the journal
>> > first,
>> > then give a safe callback to the ceph client. So it is possible that the data is
>> > still
>> > in the journal when I send a read IO to the same area. What data will be
>> > returned if the new data is still in the journal?
>> >
>> > Thanks.
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read IO to object while new data still in journal

2015-12-30 Thread Dong Wu
There are two callbacks: committed and applied. Committed means the write
has reached all replicas' journals; applied means it has reached all replicas' file
systems. So when the applied callback returns to the client, it means the data can
be read.
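
A small librados example of that difference (a sketch only; the pool and
object names are placeholders, not from the original discussion):

    #include <rados/librados.hpp>
    #include <cassert>

    int main() {
      librados::Rados cluster;
      assert(cluster.init(NULL) == 0);                 // connect as client.admin
      assert(cluster.conf_read_file(NULL) == 0);       // default ceph.conf
      assert(cluster.connect() == 0);

      librados::IoCtx ioctx;
      assert(cluster.ioctx_create("rbd", ioctx) == 0); // "rbd" pool is just an example

      librados::bufferlist bl;
      bl.append("hello");
      librados::AioCompletion *c = cluster.aio_create_completion();
      assert(ioctx.aio_write("test_obj", c, bl, bl.length(), 0) == 0);

      c->wait_for_complete();   // "applied": on all replicas' filesystems, reads now see it
      c->wait_for_safe();       // "safe"/committed: in all replicas' journals, durable
      c->release();

      ioctx.close();
      cluster.shutdown();
      return 0;
    }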

2015-12-31 10:15 GMT+08:00 min fang :
> Hi, as I understand it, a write IO will commit data to the journal first,
> then give a safe callback to the ceph client. So it is possible that the data is still
> in the journal when I send a read IO to the same area. What data will be
> returned if the new data is still in the journal?
>
> Thanks.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how io works when backfill

2015-12-27 Thread Dong Wu
Hi,
When an OSD is added or removed, ceph backfills to rebalance data.
e.g.:
- pg 1.0 [1, 2, 3]
- add an OSD (e.g. osd.7)
- ceph starts backfill, and pg 1.0's OSD set changes to [1, 2, 7]
- suppose [a, b, c, d, e] are objects that need to be backfilled to osd.7 and
object a is currently being backfilled
- when a write IO hits object a, the IO needs to wait for that object's
backfill to complete, then goes on.
- but if an IO hits object b, which has not been backfilled yet, the IO
reaches osd.1; osd.1 then sends the IO to osd.2 and osd.7, but osd.7 does not
have object b, so osd.7 needs to wait for object b to be backfilled before it
can write. Is that right? Or does osd.1 only send the IO to osd.2, not both?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why not add (offset,len) to pglog

2015-12-25 Thread Dong Wu
Thank you for your reply. I am looking forward to Sage's opinion too @sage.
Also, I'll keep up with BlueStore's and KStore's progress.

Regards

2015-12-25 14:48 GMT+08:00 Ning Yao <zay11...@gmail.com>:
> Hi, Dong Wu,
>
> 1. As I am currently working on other things, this proposal has been abandoned for
> a long time.
> 2. This is a complicated task, as we need to consider a lot (not
> just writeOp, but also truncate and delete) and also need to
> consider the different effects on different backends (Replicated, EC).
> 3. I don't think it is a good time to redo this patch now, since
> BlueStore and KStore are in progress, and I'm afraid of bringing in some
> side effects.  We may prepare and propose the whole design at the next CDS.
> 4. Currently, we already have some tricks to deal with recovery (like
> throttling the max recovery ops, setting the priority for recovery, and so
> on). So this kind of patch may not solve a critical problem but just
> makes things better, and I am not quite sure that it will really
> bring a big improvement. Based on my previous test, it works
> excellently on slow disks (say HDD), and also for short-time
> maintenance. Otherwise, it will trigger the backfill process.  So wait
> for Sage's opinion @sage
>
> If you are interested in this, we may cooperate to do it.
>
> Regards
> Ning Yao
>
>
> 2015-12-25 14:23 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>> Thanks. From this pull request I learned that this work is not
>> complete; is there any new progress on it?
>>
>> 2015-12-25 12:30 GMT+08:00 Xinze Chi (信泽) <xmdx...@gmail.com>:
>>> Yeah, this is a good idea for recovery, but not for backfill.
>>> @YaoNing opened a pull request about this earlier this year:
>>> https://github.com/ceph/ceph/pull/3837
>>>
>>> 2015-12-25 11:16 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>>>> Hi,
>>>> I have a doubt about the pglog: the pglog contains (op, object, version), etc.
>>>> When peering, the pglog is used to construct the missing list, and then the
>>>> whole object in the missing list is recovered even if the data that differs
>>>> among replicas is less than a whole object's worth of data (e.g. 4MB).
>>>> Why not add (offset, len) to the pglog? If so, the missing list could contain
>>>> (object, offset, len), and then we could reduce the amount of data to recover.
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Xinze Chi
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread Dong Wu
Thanks. From this pull request I learned that this work is not
complete; is there any new progress on it?

2015-12-25 12:30 GMT+08:00 Xinze Chi (信泽) <xmdx...@gmail.com>:
> Yeah, this is a good idea for recovery, but not for backfill.
> @YaoNing opened a pull request about this earlier this year:
> https://github.com/ceph/ceph/pull/3837
>
> 2015-12-25 11:16 GMT+08:00 Dong Wu <archer.wud...@gmail.com>:
>> Hi,
>> I have a doubt about the pglog: the pglog contains (op, object, version), etc.
>> When peering, the pglog is used to construct the missing list, and then the
>> whole object in the missing list is recovered even if the data that differs
>> among replicas is less than a whole object's worth of data (e.g. 4MB).
>> Why not add (offset, len) to the pglog? If so, the missing list could contain
>> (object, offset, len), and then we could reduce the amount of data to recover.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Regards,
> Xinze Chi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread Dong Wu
Hi,
I have a doubt about the pglog: the pglog contains (op, object, version), etc.
When peering, the pglog is used to construct the missing list, and then the
whole object in the missing list is recovered even if the data that differs
among replicas is less than a whole object's worth of data (e.g. 4MB).
Why not add (offset, len) to the pglog? If so, the missing list could contain
(object, offset, len), and then we could reduce the amount of data to recover.
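
A self-contained sketch of what such a missing-list entry could carry (only an
illustration of the idea, not Ceph code; a real version would also need to
handle truncate, clones, EC, and so on):

    #include <cstdint>
    #include <map>

    // One missing object as the primary could track it: either "recover the
    // whole object" or a set of dirty byte ranges accumulated from pglog
    // entries that carried (offset, len).
    struct MissingEntry {
      bool whole_object = false;               // fall back to full-object recovery
      std::map<uint64_t, uint64_t> dirty;      // offset -> length of changed range

      void note_write(uint64_t off, uint64_t len) {
        if (whole_object)
          return;
        // naive merge; a real implementation would union overlapping extents
        uint64_t &cur = dirty[off];
        if (len > cur)
          cur = len;
      }

      // called for any log entry whose extent we cannot describe
      // (truncate, omap, clone, unknown op): give up on the delta
      void note_unknown() {
        whole_object = true;
        dirty.clear();
      }
    };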
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com