Re: [ceph-users] Revert a CephFS snapshot?

2019-12-03 Thread Luis Henriques
On Tue, Dec 03, 2019 at 02:09:30PM -0500, Jeff Layton wrote:
> On Tue, 2019-12-03 at 07:59 -0800, Robert LeBlanc wrote:
> > On Thu, Nov 14, 2019 at 11:48 AM Sage Weil  wrote:
> > > On Thu, 14 Nov 2019, Patrick Donnelly wrote:
> > > > On Wed, Nov 13, 2019 at 6:36 PM Jerry Lee  
> > > > wrote:
> > > > >
> > > > > On Thu, 14 Nov 2019 at 07:07, Patrick Donnelly  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Nov 13, 2019 at 2:30 AM Jerry Lee  
> > > > > > wrote:
> > > > > > > Recently, I've been evaluating the snapshot feature of CephFS from the
> > > > > > > kernel client, and everything works like a charm.  But it seems that
> > > > > > > reverting a snapshot is not available currently.  Is there some reason
> > > > > > > or technical limitation why the feature is not provided?  Any insights
> > > > > > > or ideas are appreciated.
> > > > > >
> > > > > > Please provide more information about what you tried to do (commands
> > > > > > run) and how it surprised you.
> > > > >
> > > > > The thing I would like to do is to roll back a snapshotted directory to a
> > > > > previous snapshot.  It looks like this can be done by overwriting the
> > > > > current version of the files/directories with the ones from a previous
> > > > > snapshot via cp, but cp may take a lot of time when there are many files
> > > > > and directories in the target directory.  Is there any way to achieve
> > > > > this much faster from within CephFS, via a command like "ceph fs   snap
> > > > > rollback " (just an example)?  Thank you!
> > > > 
> > > > RADOS doesn't support rollback of snapshots so it needs to be done
> > > > manually. The best tool to do this would probably be rsync of the
> > > > .snap directory with appropriate options including deletion of files
> > > > that do not exist in the source (snapshot).
> > > 
> > > rsync is the best bet now, yeah.
> > > 
> > > RADOS does have a rollback operation that uses clone where it can, but 
> > > it's a per-object operation, so something still needs to walk the 
> > > hierarchy and roll back each file's content.  The MDS could do this more 
> > > efficiently than rsync, given what it knows about the snapped inodes
> > > (skipping untouched inodes or, eventually, entire subtrees) but it's a 
> > > non-trivial amount of work to implement.
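
For reference, a rough sketch of what such an rsync-based rollback could look
like today; the mount point, directory and snapshot names below are made up
for illustration, and running with --dry-run first is a good idea:

  # CephFS snapshots are created/removed with mkdir/rmdir inside the hidden
  # .snap directory:
  mkdir /mnt/cephfs/mydir/.snap/before-change

  # Roll the live tree back to that snapshot; --delete removes files that
  # were created after the snapshot was taken.
  rsync -anv --delete /mnt/cephfs/mydir/.snap/before-change/ /mnt/cephfs/mydir/
  rsync -a   --delete /mnt/cephfs/mydir/.snap/before-change/ /mnt/cephfs/mydir/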
> > > 
> > 
> > Would it make sense to extend CephFS to leverage reflinks for cases like 
> > this? That could be faster than rsync and more space efficient. It would 
> > require some development time though.
> > 
> 
> I think reflink would be hard. Ceph hardcodes the inode number into the
> object name of the backing objects, so sharing between different inode
> numbers is really difficult to do. It could be done, but it means a new
> in-object-store layout scheme.
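
As an illustration of that layout: data objects are named after the file's
inode number plus a per-object index (both in hex), so two different inodes
cannot share a backing object.  The pool name and mount point here are
assumptions:

  # hex inode number of a file...
  printf '%x\n' "$(stat -c %i /mnt/cephfs/somefile)"
  # ...is the prefix of its objects in the data pool, e.g.
  # 10000000123.00000000, 10000000123.00000001, ...
  rados -p cephfs_data ls | head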
> 
> That said...I wonder if we could get better performance by just
> converting rsync to use copy_file_range in this situation. That has the
> potential to offload a lot of the actual copying work to the OSDs. 

Just to add my 2 cents: I haven't done any serious performance
measurements with copy_file_range.  However, the very limited
observations I've made surprised me a bit, showing that performance
isn't great.  In fact, when the file's object size is small, using
copy_file_range seems to be slower than a full read+write cycle.

It's still on my TODO list to do some more serious performance analysis
and figure out why.  It didn't seem to be an issue on the client side,
but I don't really have any hard evidence.  Once the COPY_FROM2
operation is stable, I plan to spend some time on this.
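
For anyone who wants to experiment with this in the meantime, a rough sketch
using xfs_io, whose copy_range command issues copy_file_range(2); the paths
are made up, and the exact option syntax is from memory, so please check
xfs_io(8) before relying on it:

  SRC=/mnt/cephfs/mydir/.snap/mysnap/bigfile
  DST=/mnt/cephfs/mydir/bigfile
  # copy the whole source file through copy_file_range(); on CephFS this can
  # be turned into object copies performed by the OSDs when the ranges are
  # object-aligned
  xfs_io -f -c "copy_range -l $(stat -c %s "$SRC") $SRC" "$DST"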

Cheers,
--
Luís


Re: [ceph-users] CephFS Quotas on Subdirectories

2019-02-26 Thread Luis Henriques
Hendrik Peyerl  writes:

> Thank you Ramana and Luis for your quick reply.
>
> @Ramana: I have a quota of 300G for this specific environment; I don't want
> to split this into 100G quotas for all the subdirectories, as I cannot yet
> foresee how big they will be.
>
> @Luis: The client does have access to the environment directory, as you can
> see from the client caps I sent as well.

Hmm.. Ok, I misunderstood your issue.

I've done a quick test and the fuse client seems to be able to handle
this scenario correctly, so I've created a bug in the tracker[1].  I'll
investigate and see if this can be fixed.

[1] https://tracker.ceph.com/issues/38482

Cheers,
-- 
Luis


>
> Thanks and best regards,
>
> Hendrik
>
>> On 26. Feb 2019, at 11:11, Luis Henriques  wrote:
>> 
>> On Tue, Feb 26, 2019 at 03:47:31AM -0500, Ramana Raja wrote:
>>> On Tue, Feb 26, 2019 at 1:38 PM, Hendrik Peyerl  
>>> wrote: 
>>>> 
>>>> Hello All,
>>>> 
>>>> I am having some trouble with CephFS quotas not working on subdirectories.
>>>> I am running with the following directory tree:
>>>> 
>>>> - customer
>>>>  - project
>>>>- environment
>>>>  - application1
>>>>  - application2
>>>>  - applicationx
>>>> 
>>>> I set a quota on environment, which works perfectly fine: the client sees
>>>> the quota and does not breach it.  The problem starts when I try to mount
>>>> a subdirectory like application1; this directory does not have any quota
>>>> at all.  Is there a possibility to set a quota for environment so that the
>>>> application directories will not be able to go over that quota?
>>> 
>>> Can you set quotas on the application directories as well?
>>> setfattr -n ceph.quota.max_bytes -v  
>>> /environment/application1 
>> 
>> Right, that would work of course.  The client needs to have access to
>> the 'environment' directory inode in order to enforce quotas, otherwise
>> it won't be aware of the existence of any quotas at all.  See
>> "Limitations" (#4 in particular) in
>> 
>> http://docs.ceph.com/docs/master/cephfs/quota/
>> 
>> Cheers,
>> --
>> Luís
>
>


Re: [ceph-users] CephFS Quotas on Subdirectories

2019-02-26 Thread Luis Henriques
On Tue, Feb 26, 2019 at 03:47:31AM -0500, Ramana Raja wrote:
> On Tue, Feb 26, 2019 at 1:38 PM, Hendrik Peyerl  wrote: 
> > 
> > Hello All,
> > 
> > I am having some trouble with CephFS quotas not working on subdirectories.
> > I am running with the following directory tree:
> > 
> > - customer
> >   - project
> > - environment
> >   - application1
> >   - application2
> >   - applicationx
> > 
> > I set a quota on environment, which works perfectly fine: the client sees
> > the quota and does not breach it.  The problem starts when I try to mount
> > a subdirectory like application1; this directory does not have any quota
> > at all.  Is there a possibility to set a quota for environment so that the
> > application directories will not be able to go over that quota?
> 
> Can you set quotas on the application directories as well?
> setfattr -n ceph.quota.max_bytes -v  
> /environment/application1 

Right, that would work of course.  The client needs to have access to
the 'environment' directory inode in order to enforce quotas, otherwise
it won't be aware of the existence of any quotas at all.  See
"Limitations" (#4 in particular) in

 http://docs.ceph.com/docs/master/cephfs/quota/
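
A rough sketch of what I mean, with a made-up mount point, monitor address
and a 300G quota:

  # set the quota on the directory itself...
  setfattr -n ceph.quota.max_bytes -v $((300 * 1024**3)) \
      /mnt/cephfs/customer/project/environment

  # ...and check that the client can read it back:
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/customer/project/environment

  # enforcement only happens if the client's caps/mount cover the directory
  # carrying the quota, e.g. mount at "environment" rather than only at one
  # of the application subdirectories:
  mount -t ceph mon1:6789:/customer/project/environment /mnt/env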

Cheers,
--
Luís


Re: [ceph-users] cephfs quota limit

2018-11-07 Thread Luis Henriques
Jan Fajerski  writes:

> On Tue, Nov 06, 2018 at 08:57:48PM +0800, Zhenshi Zhou wrote:
>>   Hi,
>>   I'm wondering whether cephfs have quota limit options.
>>   I use kernel client and ceph version is 12.2.8.
>>   Thanks
> CephFS has quota support, see 
> http://docs.ceph.com/docs/luminous/cephfs/quota/.
> The kernel has recently gained CephFS quota support too (before only the fuse
> client supported it) so it depends on your distro and kernel version.

Correct: in order to have quota support with a kernel client you'll need
to meet two requirements:

 - kernel >= 4.17 (or have the relevant patches backported)
 - ceph version >= mimic

That is, a 4.17 kernel client won't support quotas on a luminous-based
cluster.  The quota documentation for mimic states this too:

  http://docs.ceph.com/docs/mimic/cephfs/quota/
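
A quick way to sanity-check both sides before relying on kernel-client quota
enforcement (output is obviously environment-specific):

  uname -r        # client kernel: >= 4.17, or one carrying the backported
                  # quota patches
  ceph versions   # cluster daemons: should all report mimic (13.2.x) or newer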

-- 
Luis


Re: [ceph-users] Re : general protection fault: 0000 [#1] SMP

2017-10-12 Thread Luis Henriques
Olivier Bonvalet  writes:

> On Thursday, 12 October 2017 at 09:12 +0200, Ilya Dryomov wrote:
>> It's a crash in memcpy() in skb_copy_ubufs().  It's not in ceph, but
>> ceph-induced, it looks like.  I don't remember seeing anything
>> similar
>> in the context of krbd.
>> 
>> This is a Xen dom0 kernel, right?  What did the workload look like?
>> Can you provide dmesg before the crash?
>
> Hi,
>
> yes it's a Xen dom0 kernel. Linux 4.13.3, Xen 4.8.2, with an old
> 0.94.10 Ceph (so, Hammer).
>
> Before this error, I got these messages in the logs:
>
> Oct 11 16:00:41 lorunde kernel: [310548.899082] libceph: read_partial_message 88021a910200 data crc 2306836368 != exp. 2215155875
> Oct 11 16:00:41 lorunde kernel: [310548.899841] libceph: osd117 10.0.0.31:6804 bad crc/signature
> Oct 11 16:02:25 lorunde kernel: [310652.695015] libceph: read_partial_message 880220b10100 data crc 842840543 != exp. 2657161714
> Oct 11 16:02:25 lorunde kernel: [310652.695731] libceph: osd3 10.0.0.26:6804 bad crc/signature
> Oct 11 16:07:24 lorunde kernel: [310952.485202] libceph: read_partial_message 88025d1aa400 data crc 938978341 != exp. 4154366769
> Oct 11 16:07:24 lorunde kernel: [310952.485870] libceph: osd117 10.0.0.31:6804 bad crc/signature
> Oct 11 16:10:44 lorunde kernel: [311151.841812] libceph: read_partial_message 880260300400 data crc 2988747958 != exp. 319958859
> Oct 11 16:10:44 lorunde kernel: [311151.842672] libceph: osd9 10.0.0.51:6802 bad crc/signature
> Oct 11 16:10:57 lorunde kernel: [311165.211412] libceph: read_partial_message 8802208b8300 data crc 369498361 != exp. 906022772
> Oct 11 16:10:57 lorunde kernel: [311165.212135] libceph: osd87 10.0.0.5:6800 bad crc/signature
> Oct 11 16:12:27 lorunde kernel: [311254.635767] libceph: read_partial_message 880236f9a000 data crc 2586662963 != exp. 2886241494
> Oct 11 16:12:27 lorunde kernel: [311254.636493] libceph: osd90 10.0.0.5:6814 bad crc/signature
> Oct 11 16:14:31 lorunde kernel: [311378.808191] libceph: read_partial_message 88027e633c00 data crc 1102363051 != exp. 679243837
> Oct 11 16:14:31 lorunde kernel: [311378.808889] libceph: osd13 10.0.0.21:6804 bad crc/signature
> Oct 11 16:15:01 lorunde kernel: [311409.431034] libceph: read_partial_message 88024ce0a800 data crc 2467415342 != exp. 1753860323
> Oct 11 16:15:01 lorunde kernel: [311409.431718] libceph: osd111 10.0.0.30:6804 bad crc/signature
> Oct 11 16:15:11 lorunde kernel: [311418.891238] general protection fault: 0000 [#1] SMP
>
>
> We had to switch to TCP Cubic (instead of a badly configured TCP BBR, without
> FQ) to reduce the data crc errors.
> But since we still had some errors, last night we rebooted all the OSD nodes
> into Linux 4.4.91 instead of Linux 4.9.47 & 4.9.53.
>
> For the last 7 hours we haven't had any data crc errors from the OSDs, but we
> did get one from a MON, and no hang or crash.

Since there are a bunch of errors before the GPF, I suspect this bug is
related to some error paths that haven't been thoroughly tested (as is the
case for error paths in general, I guess).

My initial guess was a race in ceph_con_workfn:

 - An error returned from try_read() would cause a delayed retry (in
   function con_fault())
 - con_fault_finish() would then trigger a ceph_con_close/ceph_con_open in
   osd_fault.
 - the delayed retry then kicks in and races with the above close+open, which
   includes releasing con->in_msg and con->out_msg; that race could cause
   this GPF.

Unfortunately, I haven't yet been able to find any race there (probably
because there is none), but maybe there's a small window where this could occur.

I wonder if this occurred only once, or if this is something that is
easily triggerable.

Cheers,
-- 
Luis