Re: [ceph-users] cephfs slow delete

2016-10-16 Thread John Spray
On Sat, Oct 15, 2016 at 1:36 AM, Heller, Chris  wrote:
> Just a thought, but since a directory tree is a first class item in cephfs, 
> could the wire protocol be extended with a “recursive delete” operation, 
> specifically for cases like this?

In principle yes, but the problem is that the POSIX filesystem
interface doesn't have a recursive delete operation (we just see a
series of individual unlinks), so the complicated part would be making
the client clever enough to notice when a series of unlink operations
appears to be traversing a particular directory, batch them up until
all files in the directory are unlinked, and then finally send a
recursive unlink message to the server.

John
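
For illustration, a toy sketch of the batching John describes; every name in it (UnlinkBatcher, the RECURSIVE_UNLINK message) is hypothetical and not part of any real CephFS client or protocol:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical client-side batching: instead of forwarding one unlink
    // per file, remember which directory each unlink belongs to and, once
    // every entry of that directory has been unlinked, send a single
    // (imaginary) RECURSIVE_UNLINK message for the whole directory.
    class UnlinkBatcher {
        private final Map<String, Set<String>> pending = new HashMap<>();

        // Called with the directory's full listing when the first unlink
        // inside 'dir' is seen.
        void startDirectory(String dir, Set<String> entries) {
            pending.put(dir, new HashSet<>(entries));
        }

        // Returns true once every entry has been unlinked, i.e. the caller
        // should send RECURSIVE_UNLINK(dir) instead of the buffered unlinks.
        boolean recordUnlink(String dir, String name) {
            Set<String> remaining = pending.get(dir);
            if (remaining == null) {
                return false;                 // not batching this directory
            }
            remaining.remove(name);
            if (remaining.isEmpty()) {
                pending.remove(dir);
                return true;
            }
            return false;
        }
    }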

>
> On 10/14/16, 4:16 PM, "Gregory Farnum"  wrote:
>
> On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris  wrote:
> > Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
> > boundary so there is a simple place to improve the throughput here. Good to
> > know, I’ll work on a patch…
>
> Ah yeah, if you're in whatever they call the recursive tree delete
> function you can unroll that loop a whole bunch. I forget where the
> boundary is so you may need to go play with the JNI code; not sure.
> -Greg


Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Gregory Farnum
On Fri, Oct 14, 2016 at 7:45 PM,  <mykola.dvor...@gmail.com> wrote:
> I was doing parallel deletes until the point where there are >1M objects in
> the stray directories. Then the delete fails with a ‘no space left’ error. If one
> deep-scrubs the PGs containing the corresponding metadata, they turn out to be
> inconsistent. In the worst case one ends up with virtually empty folders that
> have a size of 16EB; those are impossible to delete as they are ‘not empty’.

Yeah, as far as I can tell these are unrelated. You just got unlucky. :)
-Greg

>
> -Mykola
>
> From: Gregory Farnum
> Sent: Saturday, 15 October 2016 05:02
> To: Mykola Dvornik
> Cc: Heller, Chris; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] cephfs slow delete
>
> On Fri, Oct 14, 2016 at 6:26 PM,  <mykola.dvor...@gmail.com> wrote:
>> If you are running 10.2.3 on your cluster, then I would strongly recommend
>> to NOT delete files in parallel as you might hit
>> http://tracker.ceph.com/issues/17177
>
> I don't think these have anything to do with each other. What gave you
> the idea simultaneous deletes could invoke that issue?
>


Re: [ceph-users] cephfs slow delete

2016-10-14 Thread mykola.dvornik
I was doing parallel deletes until the point where there are >1M objects in the
stray directories. Then the delete fails with a ‘no space left’ error. If one
deep-scrubs the PGs containing the corresponding metadata, they turn out to be
inconsistent. In the worst case one ends up with virtually empty folders that
have a size of 16EB; those are impossible to delete as they are ‘not empty’.

-Mykola

From: Gregory Farnum
Sent: Saturday, 15 October 2016 05:02
To: Mykola Dvornik
Cc: Heller, Chris; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs slow delete

On Fri, Oct 14, 2016 at 6:26 PM,  <mykola.dvor...@gmail.com> wrote:
> If you are running 10.2.3 on your cluster, then I would strongly recommend
> to NOT delete files in parallel as you might hit
> http://tracker.ceph.com/issues/17177

I don't think these have anything to do with each other. What gave you
the idea simultaneous deletes could invoke that issue?



Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Gregory Farnum
On Fri, Oct 14, 2016 at 6:26 PM,   wrote:
> If you are running 10.2.3 on your cluster, then I would strongly recommend
> to NOT delete files in parallel as you might hit
> http://tracker.ceph.com/issues/17177

I don't think these have anything to do with each other. What gave you
the idea simultaneous deletes could invoke that issue?


Re: [ceph-users] cephfs slow delete

2016-10-14 Thread mykola.dvornik
If you are running 10.2.3 on your cluster, then I would strongly recommend to 
NOT delete files in parallel as you might hit 
http://tracker.ceph.com/issues/17177

-Mykola

From: Heller, Chris
Sent: Saturday, 15 October 2016 03:36
To: Gregory Farnum
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs slow delete

Just a thought, but since a directory tree is a first class item in cephfs, 
could the wire protocol be extended with a “recursive delete” operation, 
specifically for cases like this?

On 10/14/16, 4:16 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:

On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris <chel...@akamai.com> wrote:
> Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
> boundary so there is a simple place to improve the throughput here. Good to
> know, I’ll work on a patch…

Ah yeah, if you're in whatever they call the recursive tree delete
function you can unroll that loop a whole bunch. I forget where the
boundary is so you may need to go play with the JNI code; not sure.
-Greg




Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Heller, Chris
Just a thought, but since a directory tree is a first class item in cephfs, 
could the wire protocol be extended with a “recursive delete” operation, 
specifically for cases like this?

On 10/14/16, 4:16 PM, "Gregory Farnum"  wrote:

On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris  wrote:
> Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
> boundary so there is a simple place to improve the throughput here. Good to
> know, I’ll work on a patch…

Ah yeah, if you're in whatever they call the recursive tree delete
function you can unroll that loop a whole bunch. I forget where the
boundary is so you may need to go play with the JNI code; not sure.
-Greg




Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Gregory Farnum
On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris  wrote:
> Ok. Since I’m running through the Hadoop/ceph api, there is no syscall 
> boundary so there is a simple place to improve the throughput here. Good to 
> know, I’ll work on a patch…

Ah yeah, if you're in whatever they call the recursive tree delete
function you can unroll that loop a whole bunch. I forget where the
boundary is so you may need to go play with the JNI code; not sure.
-Greg
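
A minimal sketch of the kind of loop unrolling Greg suggests, assuming the libcephfs Java binding that cephfs-hadoop sits on (com.ceph.fs.CephMount, whose unlink/rmdir methods are recalled from memory; treat the exact signatures as an assumption). Error handling is skipped and the mount handle is assumed to tolerate concurrent calls:

    import com.ceph.fs.CephMount;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Instead of unlinking a directory's files one at a time, push the
    // unlinks onto a thread pool so several MDS round trips are in flight
    // at once, then remove the now-empty directory itself.
    class UnrolledDirDelete {
        static void deleteDir(final CephMount mount, String dir, String[] names)
                throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(16);
            List<Callable<Void>> unlinks = new ArrayList<Callable<Void>>();
            for (String name : names) {
                final String path = dir + "/" + name;
                unlinks.add(new Callable<Void>() {
                    public Void call() throws Exception {
                        mount.unlink(path);   // one MDS round trip per file
                        return null;
                    }
                });
            }
            pool.invokeAll(unlinks);          // NB: failures stay in the Futures
            pool.shutdown();
            mount.rmdir(dir);                 // directory should be empty now
        }
    }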


Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Heller, Chris
Ok. Since I’m running through the Hadoop/ceph api, there is no syscall boundary 
so there is a simple place to improve the throughput here. Good to know, I’ll 
work on a patch…

On 10/14/16, 3:58 PM, "Gregory Farnum"  wrote:

On Fri, Oct 14, 2016 at 11:41 AM, Heller, Chris  wrote:
> Unfortunately, it was all in the unlink operation. Looks as if it took
> nearly 20 hours to remove the dir, roundtrip is a killer there. What can be
> done to reduce RTT to the MDS? Does the client really have to sequentially
> delete directories or can it have internal batching or parallelization?

It's bound by the same syscall APIs as anything else. You can spin off
multiple deleters; I'd either keep them on one client (if you want to
work within a single directory) or if using multiple clients assign
them to different portions of the hierarchy. That will let you
parallelize across the IO latency until you hit a cap on the MDS'
total throughput (should be 1-10k deletes/s based on latest tests
IIRC).
-Greg

>
> -Chris
>
> On 10/13/16, 4:22 PM, "Gregory Farnum"  wrote:
>
> On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris  wrote:
> > I have a directory I’ve been trying to remove from cephfs (via
> > cephfs-hadoop), the directory is a few hundred gigabytes in size and
> > contains a few million files, but not in a single sub directory. I started
> > the delete yesterday at around 6:30 EST, and it’s still progressing. I can
> > see from (ceph osd df) that the overall data usage on my cluster is
> > decreasing, but at the rate it’s going it will be a month before the entire
> > sub directory is gone. Is a recursive delete of a directory known to be a
> > slow operation in CephFS or have I hit upon some bad configuration? What
> > steps can I take to better debug this scenario?
>
> Is it the actual unlink operation taking a long time, or just the
> reduction in used space? Unlinks require a round trip to the MDS
> unfortunately, but you should be able to speed things up at least some
> by issuing them in parallel on different directories.
>
> If it's the used space, you can let the MDS issue more RADOS delete
> ops by adjusting the "mds max purge files" and "mds max purge ops"
> config values.
> -Greg
>
>




Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Gregory Farnum
On Fri, Oct 14, 2016 at 11:41 AM, Heller, Chris  wrote:
> Unfortunately, it was all in the unlink operation. Looks as if it took nearly 
> 20 hours to remove the dir, roundtrip is a killer there. What can be done to 
> reduce RTT to the MDS? Does the client really have to sequentially delete 
> directories or can it have internal batching or parallelization?

It's bound by the same syscall APIs as anything else. You can spin off
multiple deleters; I'd either keep them on one client (if you want to
work within a single directory) or if using multiple clients assign
them to different portions of the hierarchy. That will let you
parallelize across the IO latency until you hit a cap on the MDS'
total throughput (should be 1-10k deletes/s based on latest tests
IIRC).
-Greg
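
A minimal sketch of that approach through the Hadoop FileSystem API Chris is already using; the thread count is arbitrary and the path argument is whatever cephfs-hadoop URI points at the doomed directory. Each worker recursively deletes one top-level subtree, so the unlink round trips to the MDS overlap instead of being issued strictly one after another:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Spin off multiple deleters, one per top-level subdirectory.
    public class ParallelTreeDelete {
        public static void main(String[] args) throws Exception {
            Path root = new Path(args[0]);
            Configuration conf = new Configuration();
            final FileSystem fs = root.getFileSystem(conf);

            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (final FileStatus entry : fs.listStatus(root)) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            // recursive delete of one subtree
                            fs.delete(entry.getPath(), true);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.DAYS);
            fs.delete(root, true);   // clean up whatever is left at the top
        }
    }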

>
> -Chris
>
> On 10/13/16, 4:22 PM, "Gregory Farnum"  wrote:
>
> On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris  wrote:
> > I have a directory I’ve been trying to remove from cephfs (via
> > cephfs-hadoop), the directory is a few hundred gigabytes in size and
> > contains a few million files, but not in a single sub directory. I started
> > the delete yesterday at around 6:30 EST, and it’s still progressing. I can
> > see from (ceph osd df) that the overall data usage on my cluster is
> > decreasing, but at the rate it’s going it will be a month before the entire
> > sub directory is gone. Is a recursive delete of a directory known to be a
> > slow operation in CephFS or have I hit upon some bad configuration? What
> > steps can I take to better debug this scenario?
>
> Is it the actual unlink operation taking a long time, or just the
> reduction in used space? Unlinks require a round trip to the MDS
> unfortunately, but you should be able to speed things up at least some
> by issuing them in parallel on different directories.
>
> If it's the used space, you can let the MDS issue more RADOS delete
> ops by adjusting the "mds max purge files" and "mds max purge ops"
> config values.
> -Greg
>
>


Re: [ceph-users] cephfs slow delete

2016-10-14 Thread Heller, Chris
Unfortunately, it was all in the unlink operation. Looks as if it took nearly 
20 hours to remove the dir, roundtrip is a killer there. What can be done to 
reduce RTT to the MDS? Does the client really have to sequentially delete 
directories or can it have internal batching or parallelization?

-Chris

On 10/13/16, 4:22 PM, "Gregory Farnum"  wrote:

On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris  wrote:
> I have a directory I’ve been trying to remove from cephfs (via
> cephfs-hadoop), the directory is a few hundred gigabytes in size and
> contains a few million files, but not in a single sub directory. I started
> the delete yesterday at around 6:30 EST, and it’s still progressing. I can
> see from (ceph osd df) that the overall data usage on my cluster is
> decreasing, but at the rate it’s going it will be a month before the entire
> sub directory is gone. Is a recursive delete of a directory known to be a
> slow operation in CephFS or have I hit upon some bad configuration? What
> steps can I take to better debug this scenario?

Is it the actual unlink operation taking a long time, or just the
reduction in used space? Unlinks require a round trip to the MDS
unfortunately, but you should be able to speed things up at least some
by issuing them in parallel on different directories.

If it's the used space, you can let the MDS issue more RADOS delete
ops by adjusting the "mds max purge files" and "mds max purge ops"
config values.
-Greg




Re: [ceph-users] cephfs slow delete

2016-10-13 Thread Gregory Farnum
On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris  wrote:
> I have a directory I’ve been trying to remove from cephfs (via
> cephfs-hadoop), the directory is a few hundred gigabytes in size and
> contains a few million files, but not in a single sub directory. I started
> the delete yesterday at around 6:30 EST, and it’s still progressing. I can
> see from (ceph osd df) that the overall data usage on my cluster is
> decreasing, but at the rate it’s going it will be a month before the entire
> sub directory is gone. Is a recursive delete of a directory known to be a
> slow operation in CephFS or have I hit upon some bad configuration? What
> steps can I take to better debug this scenario?

Is it the actual unlink operation taking a long time, or just the
reduction in used space? Unlinks require a round trip to the MDS
unfortunately, but you should be able to speed things up at least some
by issuing them in parallel on different directories.

If it's the used space, you can let the MDS issue more RADOS delete
ops by adjusting the "mds max purge files" and "mds max purge ops"
config values.
-Greg
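
For reference, a hedged example of where those throttles live; the numbers below are purely illustrative, not recommendations, and "mds.a" is a placeholder daemon name:

    # ceph.conf section on the MDS host
    [mds]
        mds max purge files = 256
        mds max purge ops = 32768

    # or changed at runtime via the admin socket on the MDS host
    ceph daemon mds.a config set mds_max_purge_files 256
    ceph daemon mds.a config set mds_max_purge_ops 32768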


[ceph-users] cephfs slow delete

2016-10-13 Thread Heller, Chris
I have a directory I’ve been trying to remove from cephfs (via cephfs-hadoop), 
the directory is a few hundred gigabytes in size and contains a few million 
files, but not in a single sub directory. I started the delete yesterday at 
around 6:30 EST, and it’s still progressing. I can see from (ceph osd df) that 
the overall data usage on my cluster is decreasing, but at the rate it’s going 
it will be a month before the entire sub directory is gone. Is a recursive 
delete of a directory known to be a slow operation in CephFS or have I hit upon 
some bad configuration? What steps can I take to better debug this scenario?

-Chris