Re: [ceph-users] cephfs slow delete
On Sat, Oct 15, 2016 at 1:36 AM, Heller, Chris wrote:
> Just a thought, but since a directory tree is a first-class item in cephfs,
> could the wire protocol be extended with a “recursive delete” operation,
> specifically for cases like this?

In principle yes, but the problem is that the POSIX filesystem interface
doesn't have a recursive delete operation (we just see a series of individual
unlinks), so the complicated part would be making the client clever enough to
notice when a series of unlink operations appears to be traversing a
particular directory, batching them up until all files in the directory are
unlinked, and only then sending a recursive unlink message to the server.

John
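To make the batching idea above a bit more concrete, here is a rough sketch
(in Java, purely for illustration) of the bookkeeping such a coalescing layer
would need. Every name in it (UnlinkCoalescer, MdsSession, sendRecursiveUnlink)
is hypothetical; nothing like this exists in the Ceph client or wire protocol
today:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /**
     * Illustrative only: coalesce a run of unlink() calls inside one
     * directory into a single hypothetical "recursive unlink" message.
     */
    public class UnlinkCoalescer {

        /** Hypothetical hook into the MDS session; not a real Ceph API. */
        public interface MdsSession {
            void sendUnlink(String path);              // one round trip per file
            void sendRecursiveUnlink(String dirPath);  // imagined batched operation
        }

        private final MdsSession mds;
        private final Map<String, Integer> entryCounts = new HashMap<>();   // from readdir
        private final Map<String, List<String>> pending = new HashMap<>();  // deferred unlinks

        public UnlinkCoalescer(MdsSession mds) {
            this.mds = mds;
        }

        /** Record how many entries a directory holds (e.g. learned from readdir). */
        public void observeDirectory(String dir, int entryCount) {
            entryCounts.put(dir, entryCount);
        }

        /** Called for each unlink() issued by the application. */
        public void unlink(String dir, String name) {
            Integer total = entryCounts.get(dir);
            if (total == null) {
                mds.sendUnlink(dir + "/" + name);  // no context: normal per-file unlink
                return;
            }
            List<String> batch = pending.computeIfAbsent(dir, d -> new ArrayList<>());
            batch.add(name);
            if (batch.size() == total) {
                // The whole directory has been unlinked locally: one message instead of N.
                pending.remove(dir);
                entryCounts.remove(dir);
                mds.sendRecursiveUnlink(dir);
            }
        }

        /** A real client would have to flush partial batches (e.g. on fsync/close). */
        public void flush(String dir) {
            List<String> batch = pending.remove(dir);
            if (batch != null) {
                for (String name : batch) {
                    mds.sendUnlink(dir + "/" + name);
                }
            }
        }
    }

The hard part, as noted, is the flush path: a real client would have to fall
back to individual unlinks whenever the run of deletes stops short of emptying
the directory, or whenever another client changes the directory underneath it.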
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 7:45 PM, <mykola.dvor...@gmail.com> wrote:
> I was doing parallel deletes until the point when there were >1M objects in
> the stray directory. Then the delete failed with a ‘no space left’ error. If
> one deep-scrubs the PGs containing the corresponding metadata, they turn out
> to be inconsistent. In the worst case one ends up with virtually empty
> folders that report a size of 16EB. Those are impossible to delete as they
> are ‘non empty’.

Yeah, as far as I can tell these are unrelated. You just got unlucky. :)
-Greg
Re: [ceph-users] cephfs slow delete
I was doing parallel deletes until the point when there were >1M objects in
the stray directory. Then the delete failed with a ‘no space left’ error. If
one deep-scrubs the PGs containing the corresponding metadata, they turn out
to be inconsistent. In the worst case one ends up with virtually empty folders
that report a size of 16EB. Those are impossible to delete as they are
‘non empty’.

-Mykola
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 6:26 PM, <mykola.dvor...@gmail.com> wrote:
> If you are running 10.2.3 on your cluster, then I would strongly recommend
> NOT deleting files in parallel, as you might hit
> http://tracker.ceph.com/issues/17177

I don't think these have anything to do with each other. What gave you the
idea simultaneous deletes could invoke that issue?
Re: [ceph-users] cephfs slow delete
If you are running 10.2.3 on your cluster, then I would strongly recommend
NOT deleting files in parallel, as you might hit
http://tracker.ceph.com/issues/17177

-Mykola
Re: [ceph-users] cephfs slow delete
Just a thought, but since a directory tree is a first-class item in cephfs,
could the wire protocol be extended with a “recursive delete” operation,
specifically for cases like this?
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris wrote:
> Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
> boundary, so there is a simple place to improve the throughput here. Good to
> know, I’ll work on a patch…

Ah yeah, if you're in whatever they call the recursive tree delete function
you can unroll that loop a whole bunch. I forget where the boundary is, so you
may need to go play with the JNI code; not sure.
-Greg
Re: [ceph-users] cephfs slow delete
Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
boundary, so there is a simple place to improve the throughput here. Good to
know, I’ll work on a patch…
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 11:41 AM, Heller, Chris wrote:
> Unfortunately, it was all in the unlink operation. Looks as if it took nearly
> 20 hours to remove the dir; the round trip is a killer there. What can be
> done to reduce RTT to the MDS? Does the client really have to sequentially
> delete directories, or can it have internal batching or parallelization?

It's bound by the same syscall APIs as anything else. You can spin off
multiple deleters; I'd either keep them on one client (if you want to work
within a single directory) or, if using multiple clients, assign them to
different portions of the hierarchy. That will let you parallelize across the
IO latency until you hit a cap on the MDS' total throughput (should be 1-10k
deletes/s based on latest tests IIRC).
-Greg
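For the cephfs-hadoop case discussed in this thread, a minimal sketch of that
multi-deleter approach could look like the following. It only assumes the
standard org.apache.hadoop.fs.FileSystem interface (which the Ceph Hadoop
bindings implement); the class name, worker count, and error handling are
illustrative, not a tested implementation:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    /**
     * Sketch: delete the children of one large directory with several
     * concurrent workers, so the per-file unlink round trips to the MDS
     * overlap instead of being issued strictly one after another.
     */
    public class ParallelDelete {
        public static void main(String[] args) throws Exception {
            Path root = new Path(args[0]);   // directory tree to remove
            int workers = 8;                 // illustrative; tune against MDS throughput

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(root.toUri(), conf);

            ExecutorService pool = Executors.newFixedThreadPool(workers);
            List<Future<Void>> results = new ArrayList<>();

            // Fan out: each top-level child (file or subtree) goes to a worker.
            for (FileStatus child : fs.listStatus(root)) {
                Path p = child.getPath();
                Callable<Void> task = () -> {
                    fs.delete(p, true);      // recursive delete of this subtree
                    return null;
                };
                results.add(pool.submit(task));
            }

            for (Future<Void> r : results) {
                r.get();                     // surface any failures
            }
            pool.shutdown();

            fs.delete(root, true);           // finally remove the now-empty root
        }
    }

Each worker still ends up issuing sequential unlinks within its own subtree,
so the speedup comes from overlapping subtrees; adding workers beyond the MDS'
delete throughput cap mentioned above won't help.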
Re: [ceph-users] cephfs slow delete
Unfortunately, it was all in the unlink operation. Looks as if it took nearly
20 hours to remove the dir; the round trip is a killer there. What can be done
to reduce RTT to the MDS? Does the client really have to sequentially delete
directories, or can it have internal batching or parallelization?

-Chris
Re: [ceph-users] cephfs slow delete
On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris wrote:
> I have a directory I’ve been trying to remove from cephfs (via
> cephfs-hadoop); the directory is a few hundred gigabytes in size and
> contains a few million files, but not in a single subdirectory. I started
> the delete yesterday at around 6:30 EST, and it’s still progressing. I can
> see from (ceph osd df) that the overall data usage on my cluster is
> decreasing, but at the rate it’s going it will be a month before the entire
> subdirectory is gone. Is a recursive delete of a directory known to be a
> slow operation in CephFS, or have I hit upon some bad configuration? What
> steps can I take to better debug this scenario?

Is it the actual unlink operation taking a long time, or just the reduction
in used space? Unlinks require a round trip to the MDS unfortunately, but you
should be able to speed things up at least some by issuing them in parallel
on different directories.

If it's the used space, you can let the MDS issue more RADOS delete ops by
adjusting the "mds max purge files" and "mds max purge ops" config values.
-Greg
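For reference, the two throttles mentioned above are ordinary MDS config
options and can be raised in the [mds] section of ceph.conf; the values below
are only an illustration, not recommendations:

    [mds]
    # Throttles on how fast the MDS purges deleted files from RADOS.
    mds max purge files = 128
    mds max purge ops = 8192

They can also be changed at runtime with "ceph tell mds.<id> injectargs",
assuming your release supports runtime injection of these particular options;
raise them gradually and watch the OSDs, since purging competes with client IO.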
[ceph-users] cephfs slow delete
I have a directory I’ve been trying to remove from cephfs (via cephfs-hadoop);
the directory is a few hundred gigabytes in size and contains a few million
files, but not in a single subdirectory. I started the delete yesterday at
around 6:30 EST, and it’s still progressing. I can see from (ceph osd df) that
the overall data usage on my cluster is decreasing, but at the rate it’s going
it will be a month before the entire subdirectory is gone. Is a recursive
delete of a directory known to be a slow operation in CephFS, or have I hit
upon some bad configuration? What steps can I take to better debug this
scenario?

-Chris