Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-11-19 Thread Dylan McCulloch
> On Mon, Oct 22, 2018 at 7:47 PM Dylan McCulloch  wrote:
> >
> > > On Mon, Oct 22, 2018 at 2:37 PM Dylan McCulloch  
> > > wrote:
> > > >
> > > > >
> > > > > On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  > > > > unimelb.edu.au> wrote:
> > > > > >
> > > > > > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch  > > > > > unimelb.edu.au>> wrote:
> > > > > > >>
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >>
> > > > > > >> We have identified some unexpected blocking behaviour by the 
> > > > > > >> ceph-fs kernel client.
> > > > > > >>
> > > > > > >>
> > > > > > >> When performing 'rm' on large files (100+GB), there appears to 
> > > > > > >> be a significant delay of 10 seconds or more, before a 'stat' 
> > > > > > >> operation can be performed on the same directory on the 
> > > > > > >> filesystem.
> > > > > > >>
> > > > > > >>
> > > > > > >> Looking at the kernel client's mds inflight-ops, we observe that 
> > > > > > >> there are pending
> > > > > > >>
> > > > > > >> UNLINK operations corresponding to the deleted files.
> > > > > > >>
> > > > > > >>
> > > > > > >> We have noted some correlation between files being in the client 
> > > > > > >> page cache and the blocking behaviour. For example, if the cache 
> > > > > > >> is dropped or the filesystem remounted the blocking will not 
> > > > > > >> occur.
> > > > > > >>
> > > > > > >>
> > > > > > >> Test scenario below:
> > > > > > >>
> > > > > > >>
> > > > > > >> /mnt/cephfs_mountpoint type ceph 
> > > > > > >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> > > > > > >>
> > > > > > >>
> > > > > > >> Test1:
> > > > > > >>
> > > > > > >> 1) unmount & remount:
> > > > > > >>
> > > > > > >>
> > > > > > >> 2) Add 10 x 100GB files to a directory:
> > > > > > >>
> > > > > > >>
> > > > > > >> for i in {1..10}; do dd if=/dev/zero 
> > > > > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; 
> > > > > > >> done
> > > > > > >>
> > > > > > >>
> > > > > > >> 3) Delete all files in directory:
> > > > > > >>
> > > > > > >>
> > > > > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > > > > >>
> > > > > > >>
> > > > > > >> 4) Immediately perform ls on directory:
> > > > > > >>
> > > > > > >>
> > > > > > >> time ls /mnt/cephfs_mountpoint/test1
> > > > > > >>
> > > > > > >>
> > > > > > >> Result: delay ~16 seconds
> > > > > > >>
> > > > > > >> real0m16.818s
> > > > > > >>
> > > > > > >> user0m0.000s
> > > > > > >>
> > > > > > >> sys 0m0.002s
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > > > > Are cephfs metadata pool and data pool on the same set of OSD. Is 
> > > > > > > is
> > > > > > > possible that heavy data IO slowed down metadata IO?
> > > > > >
> > > > > > Test results are from a new pre-production cluster that does not 
> > > > > > have any significant data IO. We've also confirmed the same 
> > > > > > behaviour on another cluster with similar configuration. Both 
> > > > > > clusters have separate device-class/crush rule for metadata pool 
> > > > > > using NVME OSDs, while the data pool uses HDD OSDs.
> > > > > > Most metadata operations are unaffected. It appears that it is only 
> > > > > > metadata operations on files that exist in client page cache prior 
> > > > > > to rm that are delayed.
> > > > > >
> > > > >
> > > > > Ok. Please enable kernel debug when running 'ls' and send kernel log 
> > > > > to us.
> > > > >
> > > > > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
> > > > > time /mnt/cephfs_mountpoint/test1
> > > > > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;
> > > > >
> > > > > Yan, Zheng
> > > >
> > > > Thanks Yan, Zheng
> > > > I've attached two logfiles as I ran the test twice.
> > > > The first time as previously described Test1 - cephfs_kern.log
> > > > The second time I dropped caches prior to rm as in previous Test2 - 
> > > > cephfs_drop_caches_kern.log
> > > >
> > >
> > > The log shows that client waited 16 seconds for readdir reply. please
> > > try again with debug mds/ms enabled and send both kerne log and mds
> > > log to us.
> > >
> > > before writing data to files, enable debug_mds and debug_ms (On the
> > > machine where mds.0 runs, run 'ceph daemon mds.x config set debug_mds
> > > 10; ceph daemon mds.x config set debug_ms 1')
> > > ...
> > > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control
> > > time ls /mnt/cephfs_mountpoint/test1
> > > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control
> > > disable debug_mds and debug_ms
> > >
> > > Yan, Zheng
> >
> > tarball of kernel log and mds debug log uploaded:
> > https://swift.rc.nectar.org.au:/v1/AUTH_42/cephfs_issue/mds_debug_kern_logs_20181022_2141.tar.gz?temp_url_sig=51f74f07c77346138a164ed229dc8a92f18bed8d_url_expires=1545046086
> >
> > Thanks,
> > Dylan
> >
>
> The log shows that mds sent reply immediately after receiving readdir
> request. But the reply message 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Yan, Zheng
On Mon, Oct 22, 2018 at 7:47 PM Dylan McCulloch  wrote:
>
> > On Mon, Oct 22, 2018 at 2:37 PM Dylan McCulloch  
> > wrote:
> > >
> > > >
> > > > On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  
> > > > wrote:
> > > > >
> > > > > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch  > > > > unimelb.edu.au>> wrote:
> > > > > >>
> > > > > >> Hi all,
> > > > > >>
> > > > > >>
> > > > > >> We have identified some unexpected blocking behaviour by the 
> > > > > >> ceph-fs kernel client.
> > > > > >>
> > > > > >>
> > > > > >> When performing 'rm' on large files (100+GB), there appears to be 
> > > > > >> a significant delay of 10 seconds or more, before a 'stat' 
> > > > > >> operation can be performed on the same directory on the filesystem.
> > > > > >>
> > > > > >>
> > > > > >> Looking at the kernel client's mds inflight-ops, we observe that 
> > > > > >> there are pending
> > > > > >>
> > > > > >> UNLINK operations corresponding to the deleted files.
> > > > > >>
> > > > > >>
> > > > > >> We have noted some correlation between files being in the client 
> > > > > >> page cache and the blocking behaviour. For example, if the cache 
> > > > > >> is dropped or the filesystem remounted the blocking will not occur.
> > > > > >>
> > > > > >>
> > > > > >> Test scenario below:
> > > > > >>
> > > > > >>
> > > > > >> /mnt/cephfs_mountpoint type ceph 
> > > > > >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> > > > > >>
> > > > > >>
> > > > > >> Test1:
> > > > > >>
> > > > > >> 1) unmount & remount:
> > > > > >>
> > > > > >>
> > > > > >> 2) Add 10 x 100GB files to a directory:
> > > > > >>
> > > > > >>
> > > > > >> for i in {1..10}; do dd if=/dev/zero 
> > > > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > > > > >>
> > > > > >>
> > > > > >> 3) Delete all files in directory:
> > > > > >>
> > > > > >>
> > > > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > > > >>
> > > > > >>
> > > > > >> 4) Immediately perform ls on directory:
> > > > > >>
> > > > > >>
> > > > > >> time ls /mnt/cephfs_mountpoint/test1
> > > > > >>
> > > > > >>
> > > > > >> Result: delay ~16 seconds
> > > > > >>
> > > > > >> real0m16.818s
> > > > > >>
> > > > > >> user0m0.000s
> > > > > >>
> > > > > >> sys 0m0.002s
> > > > > >>
> > > > > >>
> > > > >
> > > > > > Are cephfs metadata pool and data pool on the same set of OSD. Is is
> > > > > > possible that heavy data IO slowed down metadata IO?
> > > > >
> > > > > Test results are from a new pre-production cluster that does not have 
> > > > > any significant data IO. We've also confirmed the same behaviour on 
> > > > > another cluster with similar configuration. Both clusters have 
> > > > > separate device-class/crush rule for metadata pool using NVME OSDs, 
> > > > > while the data pool uses HDD OSDs.
> > > > > Most metadata operations are unaffected. It appears that it is only 
> > > > > metadata operations on files that exist in client page cache prior to 
> > > > > rm that are delayed.
> > > > >
> > > >
> > > > Ok. Please enable kernel debug when running 'ls' and send kernel log to 
> > > > us.
> > > >
> > > > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
> > > > time /mnt/cephfs_mountpoint/test1
> > > > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;
> > > >
> > > > Yan, Zheng
> > >
> > > Thanks Yan, Zheng
> > > I've attached two logfiles as I ran the test twice.
> > > The first time as previously described Test1 - cephfs_kern.log
> > > The second time I dropped caches prior to rm as in previous Test2 - 
> > > cephfs_drop_caches_kern.log
> > >
> >
> > The log shows that client waited 16 seconds for readdir reply. please
> > try again with debug mds/ms enabled and send both kerne log and mds
> > log to us.
> >
> > before writing data to files, enable debug_mds and debug_ms (On the
> > machine where mds.0 runs, run 'ceph daemon mds.x config set debug_mds
> > 10; ceph daemon mds.x config set debug_ms 1')
> > ...
> > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control
> > time ls /mnt/cephfs_mountpoint/test1
> > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control
> > disable debug_mds and debug_ms
> >
> > Yan, Zheng
>
> tarball of kernel log and mds debug log uploaded:
> https://swift.rc.nectar.org.au:/v1/AUTH_42/cephfs_issue/mds_debug_kern_logs_20181022_2141.tar.gz?temp_url_sig=51f74f07c77346138a164ed229dc8a92f18bed8d_url_expires=1545046086
>
> Thanks,
> Dylan
>

The log shows that mds sent reply immediately after receiving readdir
request. But the reply message was delayed for 16 seconds. (mds sent 5
messages at 2018-10-22 21:39:12, the last one is readdir reply.  The
kclient received the first message at 18739.612013 and received the
readdir reply at 18755.894441). The delay pattern is that
kclient received a message, nothing happened for 4 seconds, received
another one or two messages, 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Dylan McCulloch
> On Mon, Oct 22, 2018 at 2:37 PM Dylan McCulloch  wrote:
> >
> > >
> > > On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  
> > > wrote:
> > > >
> > > > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch > 
> > > > wrote:
> > > > >>
> > > > >> Hi all,
> > > > >>
> > > > >>
> > > > >> We have identified some unexpected blocking behaviour by the ceph-fs 
> > > > >> kernel client.
> > > > >>
> > > > >>
> > > > >> When performing 'rm' on large files (100+GB), there appears to be a 
> > > > >> significant delay of 10 seconds or more, before a 'stat' operation 
> > > > >> can be performed on the same directory on the filesystem.
> > > > >>
> > > > >>
> > > > >> Looking at the kernel client's mds inflight-ops, we observe that 
> > > > >> there are pending
> > > > >>
> > > > >> UNLINK operations corresponding to the deleted files.
> > > > >>
> > > > >>
> > > > >> We have noted some correlation between files being in the client 
> > > > >> page cache and the blocking behaviour. For example, if the cache is 
> > > > >> dropped or the filesystem remounted the blocking will not occur.
> > > > >>
> > > > >>
> > > > >> Test scenario below:
> > > > >>
> > > > >>
> > > > >> /mnt/cephfs_mountpoint type ceph 
> > > > >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> > > > >>
> > > > >>
> > > > >> Test1:
> > > > >>
> > > > >> 1) unmount & remount:
> > > > >>
> > > > >>
> > > > >> 2) Add 10 x 100GB files to a directory:
> > > > >>
> > > > >>
> > > > >> for i in {1..10}; do dd if=/dev/zero 
> > > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > > > >>
> > > > >>
> > > > >> 3) Delete all files in directory:
> > > > >>
> > > > >>
> > > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > > >>
> > > > >>
> > > > >> 4) Immediately perform ls on directory:
> > > > >>
> > > > >>
> > > > >> time ls /mnt/cephfs_mountpoint/test1
> > > > >>
> > > > >>
> > > > >> Result: delay ~16 seconds
> > > > >>
> > > > >> real0m16.818s
> > > > >>
> > > > >> user0m0.000s
> > > > >>
> > > > >> sys 0m0.002s
> > > > >>
> > > > >>
> > > >
> > > > > Are cephfs metadata pool and data pool on the same set of OSD. Is is
> > > > > possible that heavy data IO slowed down metadata IO?
> > > >
> > > > Test results are from a new pre-production cluster that does not have 
> > > > any significant data IO. We've also confirmed the same behaviour on 
> > > > another cluster with similar configuration. Both clusters have separate 
> > > > device-class/crush rule for metadata pool using NVME OSDs, while the 
> > > > data pool uses HDD OSDs.
> > > > Most metadata operations are unaffected. It appears that it is only 
> > > > metadata operations on files that exist in client page cache prior to 
> > > > rm that are delayed.
> > > >
> > >
> > > Ok. Please enable kernel debug when running 'ls' and send kernel log to 
> > > us.
> > >
> > > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
> > > time /mnt/cephfs_mountpoint/test1
> > > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;
> > >
> > > Yan, Zheng
> >
> > Thanks Yan, Zheng
> > I've attached two logfiles as I ran the test twice.
> > The first time as previously described Test1 - cephfs_kern.log
> > The second time I dropped caches prior to rm as in previous Test2 - 
> > cephfs_drop_caches_kern.log
> >
> 
> The log shows that client waited 16 seconds for readdir reply. please
> try again with debug mds/ms enabled and send both kerne log and mds
> log to us.
> 
> before writing data to files, enable debug_mds and debug_ms (On the
> machine where mds.0 runs, run 'ceph daemon mds.x config set debug_mds
> 10; ceph daemon mds.x config set debug_ms 1')
> ...
> echo module ceph +p > /sys/kernel/debug/dynamic_debug/control
> time ls /mnt/cephfs_mountpoint/test1
> echo module ceph -p > /sys/kernel/debug/dynamic_debug/control
> disable debug_mds and debug_ms
> 
> Yan, Zheng

tarball of kernel log and mds debug log uploaded:
https://swift.rc.nectar.org.au:/v1/AUTH_42/cephfs_issue/mds_debug_kern_logs_20181022_2141.tar.gz?temp_url_sig=51f74f07c77346138a164ed229dc8a92f18bed8d_url_expires=1545046086

Thanks,
Dylan

> >
> > > > >>
> > > > >> Test2:
> > > > >>
> > > > >>
> > > > >> 1) unmount & remount
> > > > >>
> > > > >>
> > > > >> 2) Add 10 x 100GB files to a directory
> > > > >>
> > > > >> for i in {1..10}; do dd if=/dev/zero 
> > > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > > > >>
> > > > >>
> > > > >> 3) Either a) unmount & remount; or b) drop caches
> > > > >>
> > > > >>
> > > > >> echo 3 >>/proc/sys/vm/drop_caches
> > > > >>
> > > > >>
> > > > >> 4) Delete files in directory:
> > > > >>
> > > > >>
> > > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > > >>
> > > > >>
> > > > >> 5) Immediately perform ls on directory:
> > > > >>
> > > > >>
> > > > >> time ls /mnt/cephfs_mountpoint/test1
> 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Yan, Zheng
On Mon, Oct 22, 2018 at 2:37 PM Dylan McCulloch  wrote:
>
> >
> > On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  wrote:
> > >
> > > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch > 
> > > wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >>
> > > >> We have identified some unexpected blocking behaviour by the ceph-fs 
> > > >> kernel client.
> > > >>
> > > >>
> > > >> When performing 'rm' on large files (100+GB), there appears to be a 
> > > >> significant delay of 10 seconds or more, before a 'stat' operation can 
> > > >> be performed on the same directory on the filesystem.
> > > >>
> > > >>
> > > >> Looking at the kernel client's mds inflight-ops, we observe that there 
> > > >> are pending
> > > >>
> > > >> UNLINK operations corresponding to the deleted files.
> > > >>
> > > >>
> > > >> We have noted some correlation between files being in the client page 
> > > >> cache and the blocking behaviour. For example, if the cache is dropped 
> > > >> or the filesystem remounted the blocking will not occur.
> > > >>
> > > >>
> > > >> Test scenario below:
> > > >>
> > > >>
> > > >> /mnt/cephfs_mountpoint type ceph 
> > > >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> > > >>
> > > >>
> > > >> Test1:
> > > >>
> > > >> 1) unmount & remount:
> > > >>
> > > >>
> > > >> 2) Add 10 x 100GB files to a directory:
> > > >>
> > > >>
> > > >> for i in {1..10}; do dd if=/dev/zero 
> > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > > >>
> > > >>
> > > >> 3) Delete all files in directory:
> > > >>
> > > >>
> > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > >>
> > > >>
> > > >> 4) Immediately perform ls on directory:
> > > >>
> > > >>
> > > >> time ls /mnt/cephfs_mountpoint/test1
> > > >>
> > > >>
> > > >> Result: delay ~16 seconds
> > > >>
> > > >> real0m16.818s
> > > >>
> > > >> user0m0.000s
> > > >>
> > > >> sys 0m0.002s
> > > >>
> > > >>
> > >
> > > > Are cephfs metadata pool and data pool on the same set of OSD. Is is
> > > > possible that heavy data IO slowed down metadata IO?
> > >
> > > Test results are from a new pre-production cluster that does not have any 
> > > significant data IO. We've also confirmed the same behaviour on another 
> > > cluster with similar configuration. Both clusters have separate 
> > > device-class/crush rule for metadata pool using NVME OSDs, while the data 
> > > pool uses HDD OSDs.
> > > Most metadata operations are unaffected. It appears that it is only 
> > > metadata operations on files that exist in client page cache prior to rm 
> > > that are delayed.
> > >
> >
> > Ok. Please enable kernel debug when running 'ls' and send kernel log to us.
> >
> > echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
> > time /mnt/cephfs_mountpoint/test1
> > echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;
> >
> > Yan, Zheng
>
> Thanks Yan, Zheng
> I've attached two logfiles as I ran the test twice.
> The first time as previously described Test1 - cephfs_kern.log
> The second time I dropped caches prior to rm as in previous Test2 - 
> cephfs_drop_caches_kern.log
>

The log shows that client waited 16 seconds for readdir reply. please
try again with debug mds/ms enabled and send both kerne log and mds
log to us.

before writing data to files, enable debug_mds and debug_ms (On the
machine where mds.0 runs, run 'ceph daemon mds.x config set debug_mds
10; ceph daemon mds.x config set debug_ms 1')
...
echo module ceph +p > /sys/kernel/debug/dynamic_debug/control
time ls /mnt/cephfs_mountpoint/test1
echo module ceph -p > /sys/kernel/debug/dynamic_debug/control
disable debug_mds and debug_ms

Yan, Zheng

>
> > > >>
> > > >> Test2:
> > > >>
> > > >>
> > > >> 1) unmount & remount
> > > >>
> > > >>
> > > >> 2) Add 10 x 100GB files to a directory
> > > >>
> > > >> for i in {1..10}; do dd if=/dev/zero 
> > > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > > >>
> > > >>
> > > >> 3) Either a) unmount & remount; or b) drop caches
> > > >>
> > > >>
> > > >> echo 3 >>/proc/sys/vm/drop_caches
> > > >>
> > > >>
> > > >> 4) Delete files in directory:
> > > >>
> > > >>
> > > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > > >>
> > > >>
> > > >> 5) Immediately perform ls on directory:
> > > >>
> > > >>
> > > >> time ls /mnt/cephfs_mountpoint/test1
> > > >>
> > > >>
> > > >> Result: no delay
> > > >>
> > > >> real0m0.010s
> > > >>
> > > >> user0m0.000s
> > > >>
> > > >> sys 0m0.001s
> > > >>
> > > >>
> > > >> Our understanding of ceph-fs’ file deletion mechanism, is that there 
> > > >> should be no blocking observed on the client. 
> > > >> http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .
> > > >>
> > > >> It appears that if files are cached on the client, either by being 
> > > >> created or accessed recently  it will cause the kernel client to block 
> > > >> for 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-22 Thread Dylan McCulloch
>
> On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  wrote:
> >
> > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch > wrote:
> > >>
> > >> Hi all,
> > >>
> > >>
> > >> We have identified some unexpected blocking behaviour by the ceph-fs 
> > >> kernel client.
> > >>
> > >>
> > >> When performing 'rm' on large files (100+GB), there appears to be a 
> > >> significant delay of 10 seconds or more, before a 'stat' operation can 
> > >> be performed on the same directory on the filesystem.
> > >>
> > >>
> > >> Looking at the kernel client's mds inflight-ops, we observe that there 
> > >> are pending
> > >>
> > >> UNLINK operations corresponding to the deleted files.
> > >>
> > >>
> > >> We have noted some correlation between files being in the client page 
> > >> cache and the blocking behaviour. For example, if the cache is dropped 
> > >> or the filesystem remounted the blocking will not occur.
> > >>
> > >>
> > >> Test scenario below:
> > >>
> > >>
> > >> /mnt/cephfs_mountpoint type ceph 
> > >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> > >>
> > >>
> > >> Test1:
> > >>
> > >> 1) unmount & remount:
> > >>
> > >>
> > >> 2) Add 10 x 100GB files to a directory:
> > >>
> > >>
> > >> for i in {1..10}; do dd if=/dev/zero 
> > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > >>
> > >>
> > >> 3) Delete all files in directory:
> > >>
> > >>
> > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > >>
> > >>
> > >> 4) Immediately perform ls on directory:
> > >>
> > >>
> > >> time ls /mnt/cephfs_mountpoint/test1
> > >>
> > >>
> > >> Result: delay ~16 seconds
> > >>
> > >> real0m16.818s
> > >>
> > >> user0m0.000s
> > >>
> > >> sys 0m0.002s
> > >>
> > >>
> >
> > > Are cephfs metadata pool and data pool on the same set of OSD. Is is
> > > possible that heavy data IO slowed down metadata IO?
> >
> > Test results are from a new pre-production cluster that does not have any 
> > significant data IO. We've also confirmed the same behaviour on another 
> > cluster with similar configuration. Both clusters have separate 
> > device-class/crush rule for metadata pool using NVME OSDs, while the data 
> > pool uses HDD OSDs.
> > Most metadata operations are unaffected. It appears that it is only 
> > metadata operations on files that exist in client page cache prior to rm 
> > that are delayed.
> >
>
> Ok. Please enable kernel debug when running 'ls' and send kernel log to us.
>
> echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
> time /mnt/cephfs_mountpoint/test1
> echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;
>
> Yan, Zheng

Thanks Yan, Zheng
I've attached two logfiles as I ran the test twice.
The first time as previously described Test1 - cephfs_kern.log
The second time I dropped caches prior to rm as in previous Test2 - 
cephfs_drop_caches_kern.log


> > >>
> > >> Test2:
> > >>
> > >>
> > >> 1) unmount & remount
> > >>
> > >>
> > >> 2) Add 10 x 100GB files to a directory
> > >>
> > >> for i in {1..10}; do dd if=/dev/zero 
> > >> of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> > >>
> > >>
> > >> 3) Either a) unmount & remount; or b) drop caches
> > >>
> > >>
> > >> echo 3 >>/proc/sys/vm/drop_caches
> > >>
> > >>
> > >> 4) Delete files in directory:
> > >>
> > >>
> > >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> > >>
> > >>
> > >> 5) Immediately perform ls on directory:
> > >>
> > >>
> > >> time ls /mnt/cephfs_mountpoint/test1
> > >>
> > >>
> > >> Result: no delay
> > >>
> > >> real0m0.010s
> > >>
> > >> user0m0.000s
> > >>
> > >> sys 0m0.001s
> > >>
> > >>
> > >> Our understanding of ceph-fs’ file deletion mechanism, is that there 
> > >> should be no blocking observed on the client. 
> > >> http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .
> > >>
> > >> It appears that if files are cached on the client, either by being 
> > >> created or accessed recently  it will cause the kernel client to block 
> > >> for reasons we have not identified.
> > >>
> > >>
> > >> Is this a known issue, are there any ways to mitigate this behaviour?
> > >>
> > >> Our production system relies on our client’s processes having concurrent 
> > >> access to the file system, and access contention must be avoided.
> > >>
> > >>
> > >> An old mailing list post that discusses changes to client’s page cache 
> > >> behaviour may be relevant.
> > >>
> > >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
> > >>
> > >>
> > >> Client System:
> > >>
> > >>
> > >> OS: RHEL7
> > >>
> > >> Kernel: 4.15.15-1
> > >>
> > >>
> > >> Cluster: Ceph: Luminous 12.2.8
> > >>
> > >>
> >
> >
> >
> > >> Thanks,
> > >>
> > >> Dylan
> > >>
> > >>
> > >> ___
> > >> ceph-users mailing list
> > >> ceph-users@lists.ceph.com
> > >> 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-21 Thread Yan, Zheng
On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch  wrote:
>
> On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch > wrote:
> >>
> >> Hi all,
> >>
> >>
> >> We have identified some unexpected blocking behaviour by the ceph-fs 
> >> kernel client.
> >>
> >>
> >> When performing 'rm' on large files (100+GB), there appears to be a 
> >> significant delay of 10 seconds or more, before a 'stat' operation can be 
> >> performed on the same directory on the filesystem.
> >>
> >>
> >> Looking at the kernel client's mds inflight-ops, we observe that there are 
> >> pending
> >>
> >> UNLINK operations corresponding to the deleted files.
> >>
> >>
> >> We have noted some correlation between files being in the client page 
> >> cache and the blocking behaviour. For example, if the cache is dropped or 
> >> the filesystem remounted the blocking will not occur.
> >>
> >>
> >> Test scenario below:
> >>
> >>
> >> /mnt/cephfs_mountpoint type ceph 
> >> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> >>
> >>
> >> Test1:
> >>
> >> 1) unmount & remount:
> >>
> >>
> >> 2) Add 10 x 100GB files to a directory:
> >>
> >>
> >> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
> >> count=102400 bs=1048576; done
> >>
> >>
> >> 3) Delete all files in directory:
> >>
> >>
> >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> >>
> >>
> >> 4) Immediately perform ls on directory:
> >>
> >>
> >> time ls /mnt/cephfs_mountpoint/test1
> >>
> >>
> >> Result: delay ~16 seconds
> >>
> >> real0m16.818s
> >>
> >> user0m0.000s
> >>
> >> sys 0m0.002s
> >>
> >>
>
> > Are cephfs metadata pool and data pool on the same set of OSD. Is is
> > possible that heavy data IO slowed down metadata IO?
>
> Test results are from a new pre-production cluster that does not have any 
> significant data IO. We've also confirmed the same behaviour on another 
> cluster with similar configuration. Both clusters have separate 
> device-class/crush rule for metadata pool using NVME OSDs, while the data 
> pool uses HDD OSDs.
> Most metadata operations are unaffected. It appears that it is only metadata 
> operations on files that exist in client page cache prior to rm that are 
> delayed.
>

Ok. Please enable kernel debug when running 'ls' and send kernel log to us.

echo module ceph +p > /sys/kernel/debug/dynamic_debug/control;
time /mnt/cephfs_mountpoint/test1
echo module ceph -p > /sys/kernel/debug/dynamic_debug/control;

Yan, Zheng

> >>
> >> Test2:
> >>
> >>
> >> 1) unmount & remount
> >>
> >>
> >> 2) Add 10 x 100GB files to a directory
> >>
> >> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
> >> count=102400 bs=1048576; done
> >>
> >>
> >> 3) Either a) unmount & remount; or b) drop caches
> >>
> >>
> >> echo 3 >>/proc/sys/vm/drop_caches
> >>
> >>
> >> 4) Delete files in directory:
> >>
> >>
> >> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> >>
> >>
> >> 5) Immediately perform ls on directory:
> >>
> >>
> >> time ls /mnt/cephfs_mountpoint/test1
> >>
> >>
> >> Result: no delay
> >>
> >> real0m0.010s
> >>
> >> user0m0.000s
> >>
> >> sys 0m0.001s
> >>
> >>
> >> Our understanding of ceph-fs’ file deletion mechanism, is that there 
> >> should be no blocking observed on the client. 
> >> http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .
> >>
> >> It appears that if files are cached on the client, either by being created 
> >> or accessed recently  it will cause the kernel client to block for reasons 
> >> we have not identified.
> >>
> >>
> >> Is this a known issue, are there any ways to mitigate this behaviour?
> >>
> >> Our production system relies on our client’s processes having concurrent 
> >> access to the file system, and access contention must be avoided.
> >>
> >>
> >> An old mailing list post that discusses changes to client’s page cache 
> >> behaviour may be relevant.
> >>
> >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
> >>
> >>
> >> Client System:
> >>
> >>
> >> OS: RHEL7
> >>
> >> Kernel: 4.15.15-1
> >>
> >>
> >> Cluster: Ceph: Luminous 12.2.8
> >>
> >>
>
>
>
> >> Thanks,
> >>
> >> Dylan
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-21 Thread Dylan McCulloch
On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch > wrote:
>>
>> Hi all,
>>
>>
>> We have identified some unexpected blocking behaviour by the ceph-fs kernel 
>> client.
>>
>>
>> When performing 'rm' on large files (100+GB), there appears to be a 
>> significant delay of 10 seconds or more, before a 'stat' operation can be 
>> performed on the same directory on the filesystem.
>>
>>
>> Looking at the kernel client's mds inflight-ops, we observe that there are 
>> pending
>>
>> UNLINK operations corresponding to the deleted files.
>>
>>
>> We have noted some correlation between files being in the client page cache 
>> and the blocking behaviour. For example, if the cache is dropped or the 
>> filesystem remounted the blocking will not occur.
>>
>>
>> Test scenario below:
>>
>>
>> /mnt/cephfs_mountpoint type ceph 
>> (rw,relatime,name=ceph_filesystem,secret=>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
>>
>>
>> Test1:
>>
>> 1) unmount & remount:
>>
>>
>> 2) Add 10 x 100GB files to a directory:
>>
>>
>> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
>> count=102400 bs=1048576; done
>>
>>
>> 3) Delete all files in directory:
>>
>>
>> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>>
>>
>> 4) Immediately perform ls on directory:
>>
>>
>> time ls /mnt/cephfs_mountpoint/test1
>>
>>
>> Result: delay ~16 seconds
>>
>> real0m16.818s
>>
>> user0m0.000s
>>
>> sys 0m0.002s
>>
>>

> Are cephfs metadata pool and data pool on the same set of OSD. Is is
> possible that heavy data IO slowed down metadata IO?

Test results are from a new pre-production cluster that does not have any 
significant data IO. We've also confirmed the same behaviour on another cluster 
with similar configuration. Both clusters have separate device-class/crush rule 
for metadata pool using NVME OSDs, while the data pool uses HDD OSDs.
Most metadata operations are unaffected. It appears that it is only metadata 
operations on files that exist in client page cache prior to rm that are 
delayed.

>>
>> Test2:
>>
>>
>> 1) unmount & remount
>>
>>
>> 2) Add 10 x 100GB files to a directory
>>
>> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
>> count=102400 bs=1048576; done
>>
>>
>> 3) Either a) unmount & remount; or b) drop caches
>>
>>
>> echo 3 >>/proc/sys/vm/drop_caches
>>
>>
>> 4) Delete files in directory:
>>
>>
>> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>>
>>
>> 5) Immediately perform ls on directory:
>>
>>
>> time ls /mnt/cephfs_mountpoint/test1
>>
>>
>> Result: no delay
>>
>> real0m0.010s
>>
>> user0m0.000s
>>
>> sys 0m0.001s
>>
>>
>> Our understanding of ceph-fs’ file deletion mechanism, is that there should 
>> be no blocking observed on the client. 
>> http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .
>>
>> It appears that if files are cached on the client, either by being created 
>> or accessed recently  it will cause the kernel client to block for reasons 
>> we have not identified.
>>
>>
>> Is this a known issue, are there any ways to mitigate this behaviour?
>>
>> Our production system relies on our client’s processes having concurrent 
>> access to the file system, and access contention must be avoided.
>>
>>
>> An old mailing list post that discusses changes to client’s page cache 
>> behaviour may be relevant.
>>
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
>>
>>
>> Client System:
>>
>>
>> OS: RHEL7
>>
>> Kernel: 4.15.15-1
>>
>>
>> Cluster: Ceph: Luminous 12.2.8
>>
>>



>> Thanks,
>>
>> Dylan
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-19 Thread Yan, Zheng
On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch  wrote:
>
> Hi all,
>
>
> We have identified some unexpected blocking behaviour by the ceph-fs kernel 
> client.
>
>
> When performing 'rm' on large files (100+GB), there appears to be a 
> significant delay of 10 seconds or more, before a 'stat' operation can be 
> performed on the same directory on the filesystem.
>
>
> Looking at the kernel client's mds inflight-ops, we observe that there are 
> pending
>
> UNLINK operations corresponding to the deleted files.
>
>
> We have noted some correlation between files being in the client page cache 
> and the blocking behaviour. For example, if the cache is dropped or the 
> filesystem remounted the blocking will not occur.
>
>
> Test scenario below:
>
>
> /mnt/cephfs_mountpoint type ceph 
> (rw,relatime,name=ceph_filesystem,secret=,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
>
>
> Test1:
>
> 1) unmount & remount:
>
>
> 2) Add 10 x 100GB files to a directory:
>
>
> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
> count=102400 bs=1048576; done
>
>
> 3) Delete all files in directory:
>
>
> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
>
> 4) Immediately perform ls on directory:
>
>
> time ls /mnt/cephfs_mountpoint/test1
>
>
> Result: delay ~16 seconds
>
> real0m16.818s
>
> user0m0.000s
>
> sys 0m0.002s
>
>

Are cephfs metadata pool and data pool on the same set of OSD. Is is
possible that heavy data IO slowed down metadata IO?
>
> Test2:
>
>
> 1) unmount & remount
>
>
> 2) Add 10 x 100GB files to a directory
>
> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
> count=102400 bs=1048576; done
>
>
> 3) Either a) unmount & remount; or b) drop caches
>
>
> echo 3 >/proc/sys/vm/drop_caches
>
>
> 4) Delete files in directory:
>
>
> for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
>
> 5) Immediately perform ls on directory:
>
>
> time ls /mnt/cephfs_mountpoint/test1
>
>
> Result: no delay
>
> real0m0.010s
>
> user0m0.000s
>
> sys 0m0.001s
>
>
> Our understanding of ceph-fs’ file deletion mechanism, is that there should 
> be no blocking observed on the client. 
> http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .
>
> It appears that if files are cached on the client, either by being created or 
> accessed recently  it will cause the kernel client to block for reasons we 
> have not identified.
>
>
> Is this a known issue, are there any ways to mitigate this behaviour?
>
> Our production system relies on our client’s processes having concurrent 
> access to the file system, and access contention must be avoided.
>
>
> An old mailing list post that discusses changes to client’s page cache 
> behaviour may be relevant.
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
>
>
> Client System:
>
>
> OS: RHEL7
>
> Kernel: 4.15.15-1
>
>
> Cluster: Ceph: Luminous 12.2.8
>
>



> Thanks,
>
> Dylan
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-15 Thread Gregory Farnum
On Tue, Oct 9, 2018 at 10:57 PM Dylan McCulloch  wrote:

> Hi Greg,
>
> Nowhere in your test procedure do you mention syncing or flushing the
> files to disk. That is almost certainly the cause of the slowness
>
> We have tested performing sync after file creation and the delay still
> occurs. (See Test3 results below)
>
> To clarify, it appears the delay is observed only when ls is performed on
> the same directory in which the files were removed, provided the files have
> been recently cached.
>
> e.g. rm -f /mnt/cephfs_mountpoint/file*; ls /mnt/cephfs_mountpoint
>
> the client which wrote the data is required to flush it out before
> dropping enough file "capabilities" for the other client to do the rm.
>
> Our tests are performed on the same host.
>
> In Test1 the rm and ls are performed by the same client id. And for other
> tests in which an unmount & remount were performed, I would assume the
> unmount would cause that particular client id to terminate and drop any
> caps.
>
> Do you still believe held caps are contributing to slowness in these test
> scenarios?
>

Hmm, perhaps not. Or at least not in that way.

These tests are interesting; I'm not quite sure what might be going on
here, but I think I'll have to let one of our more dedicated kernel CephFS
people look at it, sorry.
-Greg


>
> We’ve added 3 additional test cases below.
>
> Test 3) Sync write (delay observed when writing files and syncing)
>
> Test 4) Bypass cache (no delay observed when files are not written to
> cache)
>
> Test 5) Read test (delay observed when removing files that have been read
> recently in to cache)
>
> Test3: Sync Write - File creation, with sync after write.
>
> 1) unmount & remount:
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576;done
>
> 3) sync
>
> 4) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m8.765s
>
> user0m0.001s
>
> sys 0m0.000s
>
>
> Test4: Bypass cache - File creation, with nocache options for dd.
>
> 1) unmount & remount:
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576 oflag=nocache,sync iflag=nocache;done
>
> 3) sync
>
> 4) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m0.003s
>
> user0m0.000s
>
> sys 0m0.001s
>
>
> Test5: Read test - Read files into empty page cache, before deletion.
>
> 1) unmount & remount
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576;done
>
> 3) sync
>
> 4) unmount & remount #empty cache
>
> 5) read files (to add back to cache)
>
> for i in {1..5};do cat /mnt/cephfs_mountpoint/file$i.txt > /dev/null; done
>
> 6) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m8.723s
>
> user0m0.000s
>
> sys 0m0.001s
>
> Regards,
>
> Dylan
> --
> *From:* Gregory Farnum 
> *Sent:* Wednesday, October 10, 2018 4:37:49 AM
> *To:* Dylan McCulloch
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] cephfs kernel client blocks when removing
> large files
>
> Nowhere in your test procedure do you mention syncing or flushing the
> files to disk. That is almost certainly the cause of the slowness — the
> client which wrote the data is required to flush it out before dropping
> enough file "capabilities" for the other client to do the rm.
> -Greg
>
> On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch 
> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> * Hi all, We have identified some unexpected blocking behaviour by the
> ceph-fs kernel client. When performing 'rm' on large files (100+GB), there
> appears to be a significant delay of 10 seconds or more, before a 'stat'
> operation can be performed on the same directory on the filesystem. Looking
> at the kernel client's mds inflight-ops, we observe that there ar

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-09 Thread Dylan McCulloch
Hi Greg,


Nowhere in your test procedure do you mention syncing or flushing the files to 
disk. That is almost certainly the cause of the slowness


We have tested performing sync after file creation and the delay still occurs. 
(See Test3 results below)


To clarify, it appears the delay is observed only when ls is performed on the 
same directory in which the files were removed, provided the files have been 
recently cached.

e.g. rm -f /mnt/cephfs_mountpoint/file*; ls /mnt/cephfs_mountpoint


the client which wrote the data is required to flush it out before dropping 
enough file "capabilities" for the other client to do the rm.


Our tests are performed on the same host.


In Test1 the rm and ls are performed by the same client id. And for other tests 
in which an unmount & remount were performed, I would assume the unmount would 
cause that particular client id to terminate and drop any caps.


Do you still believe held caps are contributing to slowness in these test 
scenarios?


We’ve added 3 additional test cases below.

Test 3) Sync write (delay observed when writing files and syncing)

Test 4) Bypass cache (no delay observed when files are not written to cache)

Test 5) Read test (delay observed when removing files that have been read 
recently in to cache)


Test3: Sync Write - File creation, with sync after write.


1) unmount & remount:


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576;done


3) sync


4) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real0m8.765s

user0m0.001s

sys 0m0.000s



Test4: Bypass cache - File creation, with nocache options for dd.


1) unmount & remount:


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576 oflag=nocache,sync iflag=nocache;done


3) sync


4) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real0m0.003s

user0m0.000s

sys 0m0.001s



Test5: Read test - Read files into empty page cache, before deletion.

1) unmount & remount


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576;done


3) sync


4) unmount & remount #empty cache


5) read files (to add back to cache)

for i in {1..5};do cat /mnt/cephfs_mountpoint/file$i.txt > /dev/null; done


6) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real0m8.723s

user0m0.000s

sys 0m0.001s

Regards,

Dylan


From: Gregory Farnum 
Sent: Wednesday, October 10, 2018 4:37:49 AM
To: Dylan McCulloch
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs kernel client blocks when removing large files

Nowhere in your test procedure do you mention syncing or flushing the files to 
disk. That is almost certainly the cause of the slowness — the client which 
wrote the data is required to flush it out before dropping enough file 
"capabilities" for the other client to do the rm.
-Greg

On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch 
mailto:d...@unimelb.edu.au>> wrote:

Hi all,


We have identified some unexpected blocking behaviour by the ceph-fs kernel 
client.


When performing 'rm' on large files (100+GB), there appears to be a significant 
delay of 10 seconds or more, before a 'stat' operation can be performed on the 
same directory on the filesystem.


Looking at the kernel client's mds inflight-ops, we observe that there are 
pending

UNLINK operations corresponding to the deleted files.


We have noted some correlation between files being in the client page cache and 
the blocking behaviour. For example, if the cache is dropped or the filesystem 
remounted the blocking will not occur.


Test scenario below:


/mnt/cephfs_mountpoint type ceph 
(rw,relatime,name=ceph_filesystem,secret=,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)


Test1:

1) unmount & remount:


2) Add 10 x 100GB files to a directory:


for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576; done


3) Delete all files in directory:


for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


4) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint/test1


Result: delay ~16 seconds

real0m16.818s

user0m0.000s

sys 0m0.002s



Test2:


1) unmount & remount


2) Add 10 x 100GB files to a directory

for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-09 Thread Gregory Farnum
Nowhere in your test procedure do you mention syncing or flushing the files
to disk. That is almost certainly the cause of the slowness — the client
which wrote the data is required to flush it out before dropping enough
file "capabilities" for the other client to do the rm.
-Greg

On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch  wrote:

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> * Hi all, We have identified some unexpected blocking behaviour by the
> ceph-fs kernel client. When performing 'rm' on large files (100+GB), there
> appears to be a significant delay of 10 seconds or more, before a 'stat'
> operation can be performed on the same directory on the filesystem. Looking
> at the kernel client's mds inflight-ops, we observe that there are pending
> UNLINK operations corresponding to the deleted files. We have noted some
> correlation between files being in the client page cache and the blocking
> behaviour. For example, if the cache is dropped or the filesystem remounted
> the blocking will not occur. Test scenario below: /mnt/cephfs_mountpoint
> type ceph
> (rw,relatime,name=ceph_filesystem,secret=,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> Test1: 1) unmount & remount: 2) Add 10 x 100GB files to a directory: for i
> in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576; done 3) Delete all files in directory: for i in
> {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done 4) Immediately
> perform ls on directory: time ls /mnt/cephfs_mountpoint/test1 Result: delay
> ~16 seconds real0m16.818s user0m0.000s sys 0m0.002s Test2: 1)
> unmount & remount 2) Add 10 x 100GB files to a directory for i in {1..10};
> do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400
> bs=1048576; done 3) Either a) unmount & remount; or b) drop caches echo 3
> >/proc/sys/vm/drop_caches 4) Delete files in directory: for i in {1..10};do
> rm -f /mnt/cephfs_mountpoint/file$i.txt; done 5) Immediately perform ls on
> directory: time ls /mnt/cephfs_mountpoint/test1 Result: no delay real
>0m0.010s user0m0.000s sys 0m0.001s Our understanding of ceph-fs’
> file deletion mechanism, is that there should be no blocking observed on
> the client. http://docs.ceph.com/docs/mimic/dev/delayed-delete/
>  . It appears that if
> files are cached on the client, either by being created or accessed
> recently  it will cause the kernel client to block for reasons we have not
> identified. Is this a known issue, are there any ways to mitigate this
> behaviour? Our production system relies on our client’s processes having
> concurrent access to the file system, and access contention must be
> avoided. An old mailing list post that discusses changes to client’s page
> cache behaviour may be relevant.
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
> 
> Client System: OS: RHEL7 Kernel: 4.15.15-1 Cluster: Ceph: Luminous 12.2.8
> Thanks, Dylan *
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs kernel client blocks when removing large files

2018-10-08 Thread Dylan McCulloch
Hi all,


We have identified some unexpected blocking behaviour by the ceph-fs kernel 
client.


When performing 'rm' on large files (100+GB), there appears to be a significant 
delay of 10 seconds or more, before a 'stat' operation can be performed on the 
same directory on the filesystem.


Looking at the kernel client's mds inflight-ops, we observe that there are 
pending

UNLINK operations corresponding to the deleted files.


We have noted some correlation between files being in the client page cache and 
the blocking behaviour. For example, if the cache is dropped or the filesystem 
remounted the blocking will not occur.


Test scenario below:


/mnt/cephfs_mountpoint type ceph 
(rw,relatime,name=ceph_filesystem,secret=,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)


Test1:

1) unmount & remount:


2) Add 10 x 100GB files to a directory:


for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576; done


3) Delete all files in directory:


for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


4) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint/test1


Result: delay ~16 seconds

real0m16.818s

user0m0.000s

sys 0m0.002s



Test2:


1) unmount & remount


2) Add 10 x 100GB files to a directory

for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt 
count=102400 bs=1048576; done


3) Either a) unmount & remount; or b) drop caches


echo 3 >/proc/sys/vm/drop_caches


4) Delete files in directory:


for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint/test1


Result: no delay

real0m0.010s

user0m0.000s

sys 0m0.001s


Our understanding of ceph-fs’ file deletion mechanism, is that there should be 
no blocking observed on the client. 
http://docs.ceph.com/docs/mimic/dev/delayed-delete/ .

It appears that if files are cached on the client, either by being created or 
accessed recently  it will cause the kernel client to block for reasons we have 
not identified.


Is this a known issue, are there any ways to mitigate this behaviour?

Our production system relies on our client’s processes having concurrent access 
to the file system, and access contention must be avoided.


An old mailing list post that discusses changes to client’s page cache 
behaviour may be relevant.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html


Client System:


OS: RHEL7

Kernel: 4.15.15-1


Cluster: Ceph: Luminous 12.2.8


Thanks,

Dylan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com