Re: [ceph-users] why not add (offset,len) to pglog

2015-12-25 Thread Dong Wu
Thank you for your reply. I am looking forward to Sage's opinion too @sage.
I'll also keep following the progress of BlueStore and KStore.

Regards

2015-12-25 14:48 GMT+08:00 Ning Yao :
> Hi, Dong Wu,
>
> 1. As I am currently working on other things, this proposal has been
> abandoned for a long time.
> 2. This is a complicated task, as we need to consider a lot of cases
> (not just write ops, but also truncate and delete) and also the
> different effects on the different backends (Replicated, EC).
> 3. I don't think now is a good time to redo this patch, since
> BlueStore and KStore are in progress and I'm afraid of introducing
> side effects. We may prepare and propose the whole design at the next CDS.
> 4. Currently, we already have some tricks to deal with recovery (like
> throttling the max recovery ops, setting the recovery priority, and so
> on). So this kind of patch may not solve the critical problem but just
> make things better, and I am not quite sure that it will really
> bring a big improvement. Based on my previous tests, it works
> very well on slow disks (say, HDDs) and for short maintenance
> windows. Otherwise, the backfill process is triggered instead. So let's
> wait for Sage's opinion @sage
>
> If you are interested in this, we could cooperate on it.
>
> Regards
> Ning Yao
>
>
> 2015-12-25 14:23 GMT+08:00 Dong Wu :
>> Thanks. From this pull request I learned that this work is not
>> complete. Has there been any new progress on it?
>>
>> 2015-12-25 12:30 GMT+08:00 Xinze Chi (信泽) :
>>> Yeah, this is a good idea for recovery, but not for backfill.
>>> @YaoNing opened a pull request for this earlier this year:
>>> https://github.com/ceph/ceph/pull/3837
>>>
>>> 2015-12-25 11:16 GMT+08:00 Dong Wu :
>>>> Hi,
>>>> I have a question about the pglog. The pglog contains (op, object,
>>>> version), etc. During peering, the pglog is used to construct the
>>>> missing list, and then the whole object in the missing list is
>>>> recovered, even if the data that differs among replicas is less than
>>>> a whole object (e.g., 4MB).
>>>> Why not add (offset, len) to the pglog? Then the missing list could
>>>> contain (object, offset, len), and we could reduce the amount of
>>>> data recovered.
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-us...@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Xinze Chi
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ceph-users] why not add (offset,len) to pglog

2015-12-25 Thread Sage Weil
On Fri, 25 Dec 2015, Ning Yao wrote:
> Hi, Dong Wu,
> 
> 1. As I am currently working on other things, this proposal has been
> abandoned for a long time.
> 2. This is a complicated task, as we need to consider a lot of cases
> (not just write ops, but also truncate and delete) and also the
> different effects on the different backends (Replicated, EC).
> 3. I don't think now is a good time to redo this patch, since
> BlueStore and KStore are in progress and I'm afraid of introducing
> side effects. We may prepare and propose the whole design at the next CDS.
> 4. Currently, we already have some tricks to deal with recovery (like
> throttling the max recovery ops, setting the recovery priority, and so
> on). So this kind of patch may not solve the critical problem but just
> make things better, and I am not quite sure that it will really
> bring a big improvement. Based on my previous tests, it works
> very well on slow disks (say, HDDs) and for short maintenance
> windows. Otherwise, the backfill process is triggered instead. So let's
> wait for Sage's opinion @sage
> 
> If you are interested in this, we could cooperate on it.

I think it's a great idea.  We didn't do it before only because it is 
complicated.  The good news is that if we can't conclusively infer exactly 
which parts of the object need to be recovered from the log entry, we can 
always just fall back to recovering the whole thing.  Also, the place 
where this is currently most visible is RBD small writes:

 - osd goes down
 - client sends a 4k overwrite and modifies an object
 - osd comes back up
 - client sends another 4k overwrite
 - client io blocks while osd recovers 4mb

So even if we initially ignore truncate and omap and EC and clones and 
anything else complicated I suspect we'll get a nice benefit.
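
The scenario above can be sketched as follows. This is only an illustration in Python, not Ceph's C++ code: LogEntry, bytes_to_recover, and the offset/length fields are hypothetical names for the proposed extension, and the 4 MB object size is taken from the example in this thread. Any entry whose extent is unknown falls back to whole-object recovery.

```python
from dataclasses import dataclass
from typing import Optional

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB object, as in the RBD scenario above


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log entry; offset and length are the proposed extra fields."""
    op: str
    obj: str
    version: int
    offset: Optional[int] = None
    length: Optional[int] = None


def bytes_to_recover(entries, last_seen_version, partial=True):
    """Return how many bytes a returning OSD has to copy for the entries it missed."""
    missing = {}  # obj -> list of (offset, length), or None meaning "whole object"
    for e in entries:
        if e.version <= last_seen_version:
            continue  # already applied before the OSD went down
        extent_known = (partial and e.op == "write"
                        and e.offset is not None and e.length is not None)
        if not extent_known or missing.get(e.obj, []) is None:
            missing[e.obj] = None  # fall back to whole-object recovery
        else:
            missing.setdefault(e.obj, []).append((e.offset, e.length))
    total = 0
    for extents in missing.values():
        if extents is None:
            total += OBJECT_SIZE
        else:
            # Overlapping extents are counted twice here, so cap at the object size.
            total += min(sum(length for _, length in extents), OBJECT_SIZE)
    return total


log = [
    LogEntry("write", "rbd_obj_1", version=10, offset=0, length=4096),     # OSD still up
    LogEntry("write", "rbd_obj_1", version=11, offset=8192, length=4096),  # OSD down
]
print(bytes_to_recover(log, last_seen_version=10, partial=False))  # whole object
print(bytes_to_recover(log, last_seen_version=10, partial=True))   # only the dirty range
```

With whole-object recovery the blocked client waits for 4 MB to be copied; with the extent recorded in the log entry, only the 4 KB that actually changed would need to be recovered.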

I haven't thought about this too much, but my guess is that the hard part 
is making the primary's missing set representation include a partial delta 
(say, an interval_set<> indicating which ranges of the file have changed) 
in a way that gracefully degrades to recovering the whole object if we're 
not sure.
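
One possible shape for such a missing-set entry, as a rough Python sketch rather than the actual interval_set<> C++ template: MissingItem and its methods are hypothetical names, and the rule "anything that is not a write with a known extent marks the whole object" is an assumption made here to show the graceful degradation.

```python
class MissingItem:
    """Hypothetical missing-set entry: dirty byte ranges, or the whole object."""

    def __init__(self):
        self.whole_object = False
        self.ranges = []  # sorted, disjoint (start, end) pairs, end exclusive

    def mark_whole(self):
        """Degrade to whole-object recovery; the ranges no longer matter."""
        self.whole_object = True
        self.ranges = []

    def add_range(self, offset, length):
        """Record a dirty extent, merging it with overlapping or adjacent ones."""
        if self.whole_object or length <= 0:
            return
        start, end = offset, offset + length
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint from the new extent, keep as is
            else:
                start, end = min(start, s), max(end, e)  # absorb into the new extent
        merged.append((start, end))
        self.ranges = sorted(merged)

    def apply(self, op, offset=None, length=None):
        """Fold one log entry into the item; unknown cases mark the whole object."""
        if op == "write" and offset is not None and length is not None:
            self.add_range(offset, length)
        else:
            self.mark_whole()  # truncate, delete, omap, clone, ...: not sure, so degrade

    def bytes_needed(self, object_size):
        if self.whole_object:
            return object_size
        return sum(e - s for s, e in self.ranges)


item = MissingItem()
item.apply("write", 0, 4096)
item.apply("write", 4096, 4096)   # adjacent, merges into one range
item.apply("write", 16384, 4096)  # separate range
print(item.ranges, item.bytes_needed(4 * 1024 * 1024))
item.apply("truncate")            # extent not understood: recover everything
print(item.whole_object, item.bytes_needed(4 * 1024 * 1024))
```

Once an item has degraded to the whole object it stays that way, so later writes cannot accidentally narrow the recovery back down to a partial range.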

In any case, we should definitely have the design conversation!

sage



Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread Ning Yao
Hi, Dong Wu,

1. As I am currently working on other things, this proposal has been
abandoned for a long time.
2. This is a complicated task, as we need to consider a lot of cases
(not just write ops, but also truncate and delete) and also the
different effects on the different backends (Replicated, EC).
3. I don't think now is a good time to redo this patch, since
BlueStore and KStore are in progress and I'm afraid of introducing
side effects. We may prepare and propose the whole design at the next CDS.
4. Currently, we already have some tricks to deal with recovery (like
throttling the max recovery ops, setting the recovery priority, and so
on). So this kind of patch may not solve the critical problem but just
make things better, and I am not quite sure that it will really
bring a big improvement. Based on my previous tests, it works
very well on slow disks (say, HDDs) and for short maintenance
windows. Otherwise, the backfill process is triggered instead. So let's
wait for Sage's opinion @sage

If you are interested in this, we could cooperate on it.

Regards
Ning Yao




Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread Dong Wu
Thanks. From this pull request I learned that this work is not
complete. Has there been any new progress on it?



Re: [ceph-users] why not add (offset,len) to pglog

2015-12-24 Thread 信泽
Yeah, this is a good idea for recovery, but not for backfill.
@YaoNing opened a pull request for this earlier this year:
https://github.com/ceph/ceph/pull/3837

2015-12-25 11:16 GMT+08:00 Dong Wu :
> Hi,
> I have a question about the pglog. The pglog contains (op, object,
> version), etc. During peering, the pglog is used to construct the
> missing list, and then the whole object in the missing list is
> recovered, even if the data that differs among replicas is less than
> a whole object (e.g., 4MB).
> Why not add (offset, len) to the pglog? Then the missing list could
> contain (object, offset, len), and we could reduce the amount of
> data recovered.



-- 
Regards,
Xinze Chi