Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
Hello Zheng,

This is my initial email containing the ceph -s and session ls info. I will
send the cache dump shortly. Note that, per John's suggestion, I have
upgraded the offending clients to the 4.8 kernel, so my cache dump will be
current with these new clients.

Thanks,
Stephen

Begin forwarded message:

> From: Stephen Horton
> Date: October 3, 2016 at 6:45:49 PM CDT
> To: "ceph-users@lists.ceph.com"
> Subject: cephfs kernel driver - failing to respond to cache pressure
> Reply-To: Stephen Horton
>
> I am using Ceph to back OpenStack Nova ephemeral storage, Cinder volumes,
> Glance images, and OpenStack Manila file share storage. Originally, I was
> using ceph-fuse with Manila, but performance and resource usage were poor,
> so I changed to using the CephFS kernel driver. Now, however, I am getting
> messages from my MDS that all of my Manila file share clients are "failing
> to respond to cache pressure". Can anyone take a look and advise me? I
> earlier increased mds_cache_size from 100000 to 500000. Am I missing some
> configuration on the clients to correctly enable them to respond to cache
> instructions?
> thanks!
> shorton
>
> arcuser@arccloud01:~$ sudo ceph daemon mds.arccloud01 perf dump mds
> {
>     "mds": {
>         "request": 16147252,
>         "reply": 16147249,
>         "reply_latency": {
>             "avgcount": 16147249,
>             "sum": 19885.911791960
>         },
>         "forward": 0,
>         "dir_fetch": 535805,
>         "dir_commit": 35493,
>         "dir_split": 0,
>         "inode_max": 500000,
>         "inodes": 499912,
>         "inodes_top": 129548,
>         "inodes_bottom": 365853,
>         "inodes_pin_tail": 4511,
>         "inodes_pinned": 314789,
>         "inodes_expired": 3214675,
>         "inodes_with_caps": 314579,
>         "caps": 5338001,
>         "subtrees": 2,
>         "traverse": 19339004,
>         "traverse_hit": 16500851,
>         "traverse_forward": 0,
>         "traverse_discover": 0,
>         "traverse_dir_fetch": 2738658,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 2149,
>         "load_cent": 1614725200,
>         "q": 0,
>         "exported": 0,
>         "exported_inodes": 0,
>         "imported": 0,
>         "imported_inodes": 0
>     }
> }
>
> arcuser@arccloud01:~$ cat /etc/ceph/ceph.conf
> [global]
> fsid = 6e647506-631a-457e-a52a-f21a3866a023
> mon_initial_members = arccloud01, arccloud02, arccloud03
> mon_host = 10.155.92.128,10.155.92.129,10.155.92.130
> mon_pg_warn_max_per_osd = 400
> mon_lease = 50
> mon_lease_renew_interval = 30
> mon_lease_ack_timeout = 100
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public_network = 10.155.92.0/22
> cluster_network = 192.168.92.0/22
>
> [client.glanceimages]
> keyring = /etc/ceph/ceph.client.glanceimages.keyring
>
> [client.novapool]
> keyring = /etc/ceph/ceph.client.novapool.keyring
>
> [client.cindervolumes]
> keyring = /etc/ceph/ceph.client.cindervolumes.keyring
>
> [client.manila]
> client_mount_uid = 0
> client_mount_gid = 0
> log_file = /opt/stack/logs/ceph-client.manila.log
> admin_socket = /opt/stack/status/stack/ceph-$name.$pid.asok
> keyring = /etc/ceph/ceph.client.manila.keyring
>
> [mon.arccloud01]
> host = arccloud01
> mon addr = 10.155.92.128:6789
>
> [mon.arccloud02]
> host = arccloud02
> mon addr = 10.155.92.129:6789
>
> [mon.arccloud03]
> host = arccloud03
> mon addr = 10.155.92.130:6789
>
> [osd.2]
> host = arccloud01
> public addr = 10.155.92.128
> cluster addr = 192.168.92.128
>
> [osd.1]
> host = arccloud02
> public addr = 10.155.92.129
> cluster addr = 192.168.92.129
>
> [osd.0]
> host = arccloud03
> public addr = 10.155.92.130
> cluster addr = 192.168.92.130
>
> [mds]
> mds cache size = 500000
>
> arcuser@arccloud01:/usr/local/bin$ sudo ceph -s
>     cluster 6e647506-631a-457e-a52a-f21a3866a023
>      health HEALTH_WARN
>             mds0: Client ROSA-LIN-DESKTOP failing to respond to cache pressure
>             mds0: Client QI-DAI-DESKTOP failing to respond to cache pressure
>      monmap e1: 3 mons at {arccloud01=10.155.92.128:6789/0,arccloud02=10.155.92.129:6789/0,arccloud03=10.155.92.130:6789/0}
>             election epoch 5152, quorum 0,1,2 arccloud01,arccloud02,arccloud03
>       fsmap e1774: 1/1/1 up {0=arccloud01=up:active}
>      osdmap e1528: 3 osds: 3 up, 3 in
>             flags sortbitwise
>       pgmap v1637495: 384 pgs, 6 pools, 458 GB data, 1992 kobjects
>             1752 GB used, 40431 GB / 42184 GB avail
>                  384 active+clean
>   client io 36944 B/s wr, 0 op/s rd, 7 op/s wr
>
> arcuser@arccloud01:/usr/local/bin$ sudo ceph auth list
> installed auth entries:
>
> mds.arccloud01
>     key: AQCqk6tXSuIhBxAAhBWtOpaezVMooYlWJyRXCQ==
>     caps: [mds] allow
>     caps: [mon] allow profile mds
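For reference, a minimal sketch of how to compare the configured MDS cache
limit against current usage and raise it at runtime, assuming the daemon name
mds.arccloud01 from the output above (the admin-socket change is not
persistent; the ceph.conf line is the persistent form):

    # Configured limit vs. current usage ("inode_max" vs. "inodes" above)
    sudo ceph daemon mds.arccloud01 config get mds_cache_size
    sudo ceph daemon mds.arccloud01 perf dump mds | grep -E '"inode_max"|"inodes"'

    # Raise the limit at runtime (lost on daemon restart)
    sudo ceph daemon mds.arccloud01 config set mds_cache_size 1000000

    # Persistent form: in ceph.conf under [mds], set
    #   mds cache size = 1000000

A client that is genuinely ignoring cache pressure typically shows up in the
session list as an entry whose num_caps stays large even while the client is
idle:

    sudo ceph daemon mds.arccloud01 session ls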
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
Clients are almost all idle, very little load on the cluster. I can see no
errors or warnings in the client logs when the file share is unmounted. Thx!

> On Oct 4, 2016, at 10:31 PM, Yan, Zheng wrote:
>
>> On Tue, Oct 4, 2016 at 11:30 PM, John Spray wrote:
>>> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton wrote:
>>> Thank you John. Both my OpenStack hosts and the VMs are all running
>>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any
>>> of the VMs are holding large numbers of files open. If this is likely a
>>> client bug, is there some process I can follow to file a bug report?
>>
>> It might be worthwhile to file a bug report with Ubuntu, as they'd be
>> the ones who would ideally backport fixes to their stable kernels (in
>> this instance it's hard to know if this is a bug in the latest kernel
>> code or something fixed since 4.4).
>>
>> It would be really useful if you could try installing the latest
>> released kernel on the clients and see if the issue persists: if so
>> then a ticket on tracker.ceph.com will be a priority for us to fix.
>>
>> CCing Zheng -- are there any noteworthy fixes between 4.4 and the latest
>> kernel that might be relevant?
>
> No bug found in this area since 4.4.
>
>> John
>>
>>>> On Oct 4, 2016, at 9:39 AM, John Spray wrote:
>>>>
>>>>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
>>>>> Adding that all of my ceph components are version
>>>>> 10.2.2-0ubuntu0.16.04.2. OpenStack is Mitaka on Ubuntu 16.04.x. The
>>>>> Manila file share is 1:2.0.0-0ubuntu1.
>>>>>
>>>>> My scenario is that I have a 3-node ceph cluster running OpenStack
>>>>> Mitaka. Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30
>>>>> VMs running in OpenStack; all are mounted to the Manila file share
>>>>> using the CephFS native kernel client driver. Each VM user has put
>>>>> 10-20 GB of files on the share, but most of this is back-up, so the
>>>>> IO requirement is very low. I initially tried using ceph-fuse, but
>>>>> latency was poor; moving to the kernel client driver for mounting the
>>>>> share has improved performance greatly. However, I am getting the
>>>>> cache pressure issue.
>>>>
>>>> Aside: bear in mind that the kernel client doesn't support quotas, so
>>>> any size limits you set on your Manila shares won't be respected.
>>>>
>>>>> Can someone help me with the math to properly size the MDS cache? How
>>>>> do I know whether the cache size is too small (I think very few files
>>>>> are in use at any given time) versus the clients being broken and not
>>>>> releasing cache properly?
>>>>
>>>> It's almost never the case that your cache is too small unless your
>>>> workload is holding a silly number of files open at one time -- assume
>>>> this is a client bug (although some people work around it by creating
>>>> much bigger MDS caches!)
>>>>
>>>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>>>> kernel are you running?
>
> When do the warnings happen (client idle, or client under load)? Are
> there any kernel warnings when unmounting the client that emitted the
> warning?
>
> Regards,
> Yan, Zheng
>
>>>> John
>>>>
>>>>> Thank you!
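A rough sketch of what can be checked on a kernel-client VM to answer these
questions, assuming debugfs is available (the directory name under
/sys/kernel/debug/ceph varies per mount, hence the wildcard):

    # Kernel ring buffer: any ceph warnings, e.g. around mount/umount
    dmesg | grep -i ceph

    # The kernel client exposes its state via debugfs
    sudo mount -t debugfs none /sys/kernel/debug 2>/dev/null

    # Capability counts held by this client
    sudo cat /sys/kernel/debug/ceph/*/caps

    # In-flight MDS requests (a stuck entry here can pin caps)
    sudo cat /sys/kernel/debug/ceph/*/mdsc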
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
On Tue, Oct 4, 2016 at 11:30 PM, John Spray wrote:
> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton wrote:
>> Thank you John. Both my OpenStack hosts and the VMs are all running
>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any
>> of the VMs are holding large numbers of files open. If this is likely a
>> client bug, is there some process I can follow to file a bug report?
>
> It might be worthwhile to file a bug report with Ubuntu, as they'd be
> the ones who would ideally backport fixes to their stable kernels (in
> this instance it's hard to know if this is a bug in the latest kernel
> code or something fixed since 4.4).
>
> It would be really useful if you could try installing the latest
> released kernel on the clients and see if the issue persists: if so
> then a ticket on tracker.ceph.com will be a priority for us to fix.
>
> CCing Zheng -- are there any noteworthy fixes between 4.4 and the latest
> kernel that might be relevant?

No bug found in this area since 4.4.

> John
>
>>> On Oct 4, 2016, at 9:39 AM, John Spray wrote:
>>>
>>>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
>>>> Adding that all of my ceph components are version
>>>> 10.2.2-0ubuntu0.16.04.2. OpenStack is Mitaka on Ubuntu 16.04.x. The
>>>> Manila file share is 1:2.0.0-0ubuntu1.
>>>>
>>>> My scenario is that I have a 3-node ceph cluster running OpenStack
>>>> Mitaka. Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30
>>>> VMs running in OpenStack; all are mounted to the Manila file share
>>>> using the CephFS native kernel client driver. Each VM user has put
>>>> 10-20 GB of files on the share, but most of this is back-up, so the
>>>> IO requirement is very low. I initially tried using ceph-fuse, but
>>>> latency was poor; moving to the kernel client driver for mounting the
>>>> share has improved performance greatly. However, I am getting the
>>>> cache pressure issue.
>>>
>>> Aside: bear in mind that the kernel client doesn't support quotas, so
>>> any size limits you set on your Manila shares won't be respected.
>>>
>>>> Can someone help me with the math to properly size the MDS cache? How
>>>> do I know whether the cache size is too small (I think very few files
>>>> are in use at any given time) versus the clients being broken and not
>>>> releasing cache properly?
>>>
>>> It's almost never the case that your cache is too small unless your
>>> workload is holding a silly number of files open at one time -- assume
>>> this is a client bug (although some people work around it by creating
>>> much bigger MDS caches!)
>>>
>>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>>> kernel are you running?

When do the warnings happen (client idle, or client under load)? Are there
any kernel warnings when unmounting the client that emitted the warning?

Regards,
Yan, Zheng

>>> John
>>>
>>>> Thank you!
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
Thanks again John. I am installing the 4.8.0-040800 kernel on my VM clients
and will report back. Just to confirm: for this issue, there is no reason to
try the newer kernel on the MDS node as well, correct?

> On Oct 4, 2016, at 10:30 AM, John Spray wrote:
>
>> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton wrote:
>> Thank you John. Both my OpenStack hosts and the VMs are all running
>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any
>> of the VMs are holding large numbers of files open. If this is likely a
>> client bug, is there some process I can follow to file a bug report?
>
> It might be worthwhile to file a bug report with Ubuntu, as they'd be
> the ones who would ideally backport fixes to their stable kernels (in
> this instance it's hard to know if this is a bug in the latest kernel
> code or something fixed since 4.4).
>
> It would be really useful if you could try installing the latest
> released kernel on the clients and see if the issue persists: if so
> then a ticket on tracker.ceph.com will be a priority for us to fix.
>
> CCing Zheng -- are there any noteworthy fixes between 4.4 and the latest
> kernel that might be relevant?
>
> John
>
>>> On Oct 4, 2016, at 9:39 AM, John Spray wrote:
>>>
>>>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
>>>> Adding that all of my ceph components are version
>>>> 10.2.2-0ubuntu0.16.04.2. OpenStack is Mitaka on Ubuntu 16.04.x. The
>>>> Manila file share is 1:2.0.0-0ubuntu1.
>>>>
>>>> My scenario is that I have a 3-node ceph cluster running OpenStack
>>>> Mitaka. Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30
>>>> VMs running in OpenStack; all are mounted to the Manila file share
>>>> using the CephFS native kernel client driver. Each VM user has put
>>>> 10-20 GB of files on the share, but most of this is back-up, so the
>>>> IO requirement is very low. I initially tried using ceph-fuse, but
>>>> latency was poor; moving to the kernel client driver for mounting the
>>>> share has improved performance greatly. However, I am getting the
>>>> cache pressure issue.
>>>
>>> Aside: bear in mind that the kernel client doesn't support quotas, so
>>> any size limits you set on your Manila shares won't be respected.
>>>
>>>> Can someone help me with the math to properly size the MDS cache? How
>>>> do I know whether the cache size is too small (I think very few files
>>>> are in use at any given time) versus the clients being broken and not
>>>> releasing cache properly?
>>>
>>> It's almost never the case that your cache is too small unless your
>>> workload is holding a silly number of files open at one time -- assume
>>> this is a client bug (although some people work around it by creating
>>> much bigger MDS caches!)
>>>
>>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>>> kernel are you running?
>>>
>>> John
>>>
>>>> Thank you!
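For anyone repeating this test, a rough sketch of installing an Ubuntu
mainline kernel on a client VM; the exact .deb filenames under the mainline
directory vary by build, so the wildcard patterns here are placeholders:

    # Fetch the 4.8 mainline packages (generic flavour, amd64) plus the
    # architecture-independent headers package they depend on
    cd /tmp
    wget -r -l1 -nd -A '*4.8.0-040800*generic*amd64.deb' \
        http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8/
    wget -r -l1 -nd -A '*4.8.0-040800*all.deb' \
        http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8/

    # Install and reboot into the new kernel
    sudo dpkg -i linux-*4.8.0-040800*.deb
    sudo reboot

    # Afterwards, confirm the client is on the new kernel
    uname -r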
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton wrote:
> Thank you John. Both my OpenStack hosts and the VMs are all running
> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of
> the VMs are holding large numbers of files open. If this is likely a
> client bug, is there some process I can follow to file a bug report?

It might be worthwhile to file a bug report with Ubuntu, as they'd be
the ones who would ideally backport fixes to their stable kernels (in
this instance it's hard to know if this is a bug in the latest kernel
code or something fixed since 4.4).

It would be really useful if you could try installing the latest
released kernel on the clients and see if the issue persists: if so
then a ticket on tracker.ceph.com will be a priority for us to fix.

CCing Zheng -- are there any noteworthy fixes between 4.4 and the latest
kernel that might be relevant?

John

>> On Oct 4, 2016, at 9:39 AM, John Spray wrote:
>>
>>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
>>> Adding that all of my ceph components are version
>>> 10.2.2-0ubuntu0.16.04.2. OpenStack is Mitaka on Ubuntu 16.04.x. The
>>> Manila file share is 1:2.0.0-0ubuntu1.
>>>
>>> My scenario is that I have a 3-node ceph cluster running OpenStack
>>> Mitaka. Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30
>>> VMs running in OpenStack; all are mounted to the Manila file share
>>> using the CephFS native kernel client driver. Each VM user has put
>>> 10-20 GB of files on the share, but most of this is back-up, so the IO
>>> requirement is very low. I initially tried using ceph-fuse, but latency
>>> was poor; moving to the kernel client driver for mounting the share has
>>> improved performance greatly. However, I am getting the cache pressure
>>> issue.
>>
>> Aside: bear in mind that the kernel client doesn't support quotas, so
>> any size limits you set on your Manila shares won't be respected.
>>
>>> Can someone help me with the math to properly size the MDS cache? How
>>> do I know whether the cache size is too small (I think very few files
>>> are in use at any given time) versus the clients being broken and not
>>> releasing cache properly?
>>
>> It's almost never the case that your cache is too small unless your
>> workload is holding a silly number of files open at one time -- assume
>> this is a client bug (although some people work around it by creating
>> much bigger MDS caches!)
>>
>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>> kernel are you running?
>>
>> John
>>
>>> Thank you!
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
Thank you John. Both my OpenStack hosts and the VMs are all running
4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of
the VMs are holding large numbers of files open. If this is likely a client
bug, is there some process I can follow to file a bug report?

> On Oct 4, 2016, at 9:39 AM, John Spray wrote:
>
>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
>> Adding that all of my ceph components are version
>> 10.2.2-0ubuntu0.16.04.2. OpenStack is Mitaka on Ubuntu 16.04.x. The
>> Manila file share is 1:2.0.0-0ubuntu1.
>>
>> My scenario is that I have a 3-node ceph cluster running OpenStack
>> Mitaka. Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30 VMs
>> running in OpenStack; all are mounted to the Manila file share using the
>> CephFS native kernel client driver. Each VM user has put 10-20 GB of
>> files on the share, but most of this is back-up, so the IO requirement
>> is very low. I initially tried using ceph-fuse, but latency was poor;
>> moving to the kernel client driver for mounting the share has improved
>> performance greatly. However, I am getting the cache pressure issue.
>
> Aside: bear in mind that the kernel client doesn't support quotas, so
> any size limits you set on your Manila shares won't be respected.
>
>> Can someone help me with the math to properly size the MDS cache? How do
>> I know whether the cache size is too small (I think very few files are
>> in use at any given time) versus the clients being broken and not
>> releasing cache properly?
>
> It's almost never the case that your cache is too small unless your
> workload is holding a silly number of files open at one time -- assume
> this is a client bug (although some people work around it by creating
> much bigger MDS caches!)
>
> You've mentioned the versions of openstack/ubuntu/ceph, but what
> kernel are you running?
>
> John
>
>> Thank you!
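As a sanity check on the "large numbers of files open" question, something
like this on each VM gives a rough count of open file handles under the
share mountpoint (assuming the share is mounted at /mnt/share; adjust the
path):

    # Count open file handles below the CephFS mountpoint (can be slow)
    sudo lsof +D /mnt/share 2>/dev/null | wc -l

Note that the MDS-side cap count can stay high even with few files open,
because the kernel client also holds caps for cached dentries and inodes,
not just open files.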
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton wrote:
> Adding that all of my ceph components are version 10.2.2-0ubuntu0.16.04.2.
> OpenStack is Mitaka on Ubuntu 16.04.x. The Manila file share is
> 1:2.0.0-0ubuntu1.
>
> My scenario is that I have a 3-node ceph cluster running OpenStack Mitaka.
> Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30 VMs running
> in OpenStack; all are mounted to the Manila file share using the CephFS
> native kernel client driver. Each VM user has put 10-20 GB of files on the
> share, but most of this is back-up, so the IO requirement is very low. I
> initially tried using ceph-fuse, but latency was poor; moving to the
> kernel client driver for mounting the share has improved performance
> greatly. However, I am getting the cache pressure issue.

Aside: bear in mind that the kernel client doesn't support quotas, so
any size limits you set on your Manila shares won't be respected.

> Can someone help me with the math to properly size the MDS cache? How do I
> know whether the cache size is too small (I think very few files are in
> use at any given time) versus the clients being broken and not releasing
> cache properly?

It's almost never the case that your cache is too small unless your
workload is holding a silly number of files open at one time -- assume
this is a client bug (although some people work around it by creating
much bigger MDS caches!)

You've mentioned the versions of openstack/ubuntu/ceph, but what
kernel are you running?

John

> Thank you!
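To make the quota caveat above concrete: CephFS quotas are extended
attributes on directories, and only ceph-fuse enforces them at this point,
so a kernel-client mount will happily write past the limit. A small example
(the path and the 20 GB figure are placeholders):

    # Set a 20 GB quota on a share directory (enforced by ceph-fuse clients)
    setfattr -n ceph.quota.max_bytes -v 21474836480 /mnt/share/manila-share

    # Read it back
    getfattr -n ceph.quota.max_bytes /mnt/share/manila-share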
Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure
Adding that all of my ceph components are version 10.2.2-0ubuntu0.16.04.2.
OpenStack is Mitaka on Ubuntu 16.04.x. The Manila file share is
1:2.0.0-0ubuntu1.

My scenario is that I have a 3-node ceph cluster running OpenStack Mitaka.
Each node has 256 GB RAM and a 14 TB RAID 5 array. I have 30 VMs running in
OpenStack; all are mounted to the Manila file share using the CephFS native
kernel client driver. Each VM user has put 10-20 GB of files on the share,
but most of this is back-up, so the IO requirement is very low. I initially
tried using ceph-fuse, but latency was poor; moving to the kernel client
driver for mounting the share has improved performance greatly. However, I
am getting the cache pressure issue.

Can someone help me with the math to properly size the MDS cache? How do I
know whether the cache size is too small (I think very few files are in use
at any given time) versus the clients being broken and not releasing cache
properly?

Thank you!
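For context, the two mount paths being compared in this thread look roughly
like this; the monitor address, credentials, and mountpoints are placeholders
based on the setup described above:

    # Kernel client: better latency here, but no quota enforcement
    sudo mount -t ceph 10.155.92.128:6789:/ /mnt/share \
        -o name=manila,secretfile=/etc/ceph/manila.secret

    # ceph-fuse: enforces quotas, but with the latency cost noted above
    sudo ceph-fuse --id manila -m 10.155.92.128:6789 -r / /mnt/share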