Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-05 Thread Stephen Horton
Hello Zheng
This is my initial email containing the ceph -s and session ls info. I will send 
the cache dump shortly. Note that, per John's suggestion, I have upgraded the 
offending clients to the 4.8 kernel, so my cache dump will reflect these 
new clients.
Thanks,
Stephen
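
For reference, the cache dump mentioned above can be produced through the MDS
admin socket, roughly as follows (a sketch; the output path /tmp/mds-cache.dump
is just an example location):

arcuser@arccloud01:~$ sudo ceph daemon mds.arccloud01 dump cache /tmp/mds-cache.dump
arcuser@arccloud01:~$ less /tmp/mds-cache.dump   # one line per cached inode/dentry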


Begin forwarded message:

> From: Stephen Horton 
> Date: October 3, 2016 at 6:45:49 PM CDT
> To: "ceph-users@lists.ceph.com" 
> Subject: cephfs kernel driver - failing to respond to cache pressure
> Reply-To: Stephen Horton 
> 
> I am using Ceph to back Openstack Nova ephemeral storage, Cinder volumes, Glance 
> images, and Openstack Manila file share storage. Originally I was using 
> ceph-fuse with Manila, but performance and resource usage were poor, so I 
> changed to the CephFS kernel driver. Now, however, I am getting messages 
> from my MDS that all of my Manila file share clients are "failing to respond 
> to cache pressure". Can anyone take a look and advise me? I had earlier 
> increased mds_cache_size from 100000 to 500000. Am I missing some 
> configuration on the clients that would let them respond correctly to cache 
> pressure?
> thanks!
> shorton
> 
> arcuser@arccloud01:~$ sudo ceph daemon mds.arccloud01 perf dump mds
> {
> "mds": {
> "request": 16147252,
> "reply": 16147249,
> "reply_latency": {
> "avgcount": 16147249,
> "sum": 19885.911791960
> },
> "forward": 0,
> "dir_fetch": 535805,
> "dir_commit": 35493,
> "dir_split": 0,
> "inode_max": 50,
> "inodes": 499912,
> "inodes_top": 129548,
> "inodes_bottom": 365853,
> "inodes_pin_tail": 4511,
> "inodes_pinned": 314789,
> "inodes_expired": 3214675,
> "inodes_with_caps": 314579,
> "caps": 5338001,
> "subtrees": 2,
> "traverse": 19339004,
> "traverse_hit": 16500851,
> "traverse_forward": 0,
> "traverse_discover": 0,
> "traverse_dir_fetch": 2738658,
> "traverse_remote_ino": 0,
> "traverse_lock": 2149,
> "load_cent": 1614725200,
> "q": 0,
> "exported": 0,
> "exported_inodes": 0,
> "imported": 0,
> "imported_inodes": 0
> }
> }
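
A rough way to read the counters above: "inodes" is right up against "inode_max",
so the MDS cache is effectively full, and "inodes_with_caps" / "caps" show how much
of it is pinned by client capabilities. The per-client view that the warning is
really about can be pulled from the same admin socket, for example:

arcuser@arccloud01:~$ sudo ceph daemon mds.arccloud01 session ls
# each session entry lists the client id, its hostname and "num_caps"; a client
# whose num_caps never shrinks when the MDS asks it to release capabilities is
# the one flagged as "failing to respond to cache pressure"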
> 
> arcuser@arccloud01:~$ cat /etc/ceph/ceph.conf
> [global]
>   fsid = 6e647506-631a-457e-a52a-f21a3866a023
>   mon_initial_members = arccloud01, arccloud02, arccloud03
>   mon_host = 10.155.92.128,10.155.92.129,10.155.92.130
>   mon_pg_warn_max_per_osd = 400
>   mon_lease = 50
>   mon_lease_renew_interval = 30
>   mon_lease_ack_timeout = 100
>   auth_cluster_required = cephx
>   auth_service_required = cephx
>   auth_client_required = cephx
>   public_network = 10.155.92.0/22
>   cluster_network = 192.168.92.0/22
> 
> [client.glanceimages]
>   keyring = /etc/ceph/ceph.client.glanceimages.keyring
> 
> [client.novapool]
>   keyring = /etc/ceph/ceph.client.novapool.keyring
> 
> [client.cindervolumes]
>   keyring = /etc/ceph/ceph.client.cindervolumes.keyring
> 
> [client.manila]
>   client_mount_uid = 0
>   client_mount_gid = 0
>   log_file = /opt/stack/logs/ceph-client.manila.log
>   admin_socket = /opt/stack/status/stack/ceph-$name.$pid.asok
>   keyring = /etc/ceph/ceph.client.manila.keyring
> 
> [mon.arccloud01]
>   host = arccloud01
>   mon addr = 10.155.92.128:6789
> 
> [mon.arccloud02]
>   host = arccloud02
>   mon addr = 10.155.92.129:6789
> 
> [mon.arccloud03]
>   host = arccloud03
>   mon addr = 10.155.92.130:6789
> 
> [osd.2]
>   host = arccloud01
>   public addr = 10.155.92.128
>   cluster addr = 192.168.92.128
> 
> [osd.1]
>   host = arccloud02
>   public addr = 10.155.92.129
>   cluster addr = 192.168.92.129
> 
> [osd.0]
>   host = arccloud03
>   public addr = 10.155.92.130
>   cluster addr = 192.168.92.130
> 
> [mds]
>   mds cache size = 500000
> 
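If the cache limit needs further adjustment while experimenting, it can also be
changed at runtime rather than by editing ceph.conf and restarting the MDS (a
sketch; the value shown is only an example):

arcuser@arccloud01:~$ sudo ceph tell mds.arccloud01 injectargs '--mds_cache_size 1000000'
# or, equivalently, through the local admin socket:
arcuser@arccloud01:~$ sudo ceph daemon mds.arccloud01 config set mds_cache_size 1000000
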
> arcuser@arccloud01:/usr/local/bin$ sudo ceph -s
> cluster 6e647506-631a-457e-a52a-f21a3866a023
>  health HEALTH_WARN
> mds0: Client ROSA-LIN-DESKTOP failing to respond to cache pressure
> mds0: Client QI-DAI-DESKTOP failing to respond to cache pressure
>  monmap e1: 3 mons at 
> {arccloud01=10.155.92.128:6789/0,arccloud02=10.155.92.129:6789/0,arccloud03=10.155.92.130:6789/0}
> election epoch 5152, quorum 0,1,2 arccloud01,arccloud02,arccloud03
>   fsmap e1774: 1/1/1 up {0=arccloud01=up:active}
>  osdmap e1528: 3 osds: 3 up, 3 in
> flags sortbitwise
>   pgmap v1637495: 384 pgs, 6 pools, 458 GB data, 1992 kobjects
> 1752 GB used, 40431 GB / 42184 GB avail
>  384 active+clean
>   client io 36944 B/s wr, 0 op/s rd, 7 op/s wr
> 
> arcuser@arccloud01:/usr/local/bin$ sudo ceph auth list
> installed auth entries:
> 
> mds.arccloud01
> key: AQCqk6tXSuIhBxAAhBWtOpaezVMooYlWJyRXCQ==
> caps: [mds] allow
> caps: [mon] allow profile mds
> 

Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-05 Thread Stephen Horton
The clients are almost all idle, and there is very little load on the cluster. I can 
see no errors or warnings in the client logs when the file share is unmounted. Thanks!
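
One way to see what the kernel client itself is holding, independent of the MDS
view, is the cephfs debugfs directory on a client VM (a sketch; the hostname and
the exact directory name, which contains the cluster fsid and client id, will
differ per mount):

ubuntu@vm01:~$ sudo ls /sys/kernel/debug/ceph/
# one directory per cephfs mount, named <fsid>.client<id>
ubuntu@vm01:~$ sudo cat /sys/kernel/debug/ceph/*/caps
# shows how many capabilities this client currently holds and has reserved
ubuntu@vm01:~$ dmesg | grep -i ceph
# any kernel-side warnings from the cephfs client would show up here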

> On Oct 4, 2016, at 10:31 PM, Yan, Zheng  wrote:
> 
>> On Tue, Oct 4, 2016 at 11:30 PM, John Spray  wrote:
>>> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton  wrote:
>>> Thank you John. Both my Openstack hosts and the VMs are all running 
>>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of 
>>> the VMs are holding large numbers of files open. If this is likely a client 
>>> bug, is there some process I can follow to file a bug report?
>> 
>> It might be worthwhile to file a bug report with Ubuntu, as they'd be
>> the ones who would ideally backport fixes to their stable kernels (in
>> this instance it's hard to know if this is a bug in the latest kernel
>> code or something fixed since 4.4).
>> 
>> It would be really useful if you could try installing the latest
>> released kernel on the clients and see if the issue persists: if so
>> then a ticket on tracker.ceph.com will be a priority for us to fix.
>> 
>> CCing Zheng -- are there any noteworthy fixes between 4.4 and latest
>> kernel that might be relevant?
> 
> No bugs have been found in this area since 4.4.
> 
>> 
>> John
>> 
>> 
>>> 
> On Oct 4, 2016, at 9:39 AM, John Spray  wrote:
> 
> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
> Adding that all of my ceph components are version:
> 10.2.2-0ubuntu0.16.04.2
> 
> Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 
> 1:2.0.0-0ubuntu1
> 
> My scenario is that I have a 3-node ceph cluster running openstack 
> mitaka. Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running 
> in openstack; all are mounted to the Manila file share using cephfs 
> native kernel client driver. Each VM user has put 10-20 gb of files on 
> the share, but most of this is back-up, so IO requirement is very low. 
> However, I initially tried using ceph-fuse but performance latency was 
> poor. Moving to kernel client driver for mounting the share has improved 
> performance greatly. However, I am getting the cache pressure issue.
 
 Aside: bear in mind that the kernel client doesn't support quotas, so
 any size limits you set on your Manila shares won't be respected.
 
> Can someone help me with the math to properly size the mds cache? How do 
> I know if the cache size is too small (I think very few files in-use at 
> any given time) versus the clients are broken and not releasing cache 
> properly?
 
 It's almost never the case that your cache is too small unless your
 workload is holding a silly number of files open at one time -- assume
 this is a client bug (although some people work around it by creating
 much bigger MDS caches!)
 
 You've mentioned the versions of openstack/ubuntu/ceph, but what
 kernel are you running?
 
> 
> When do the warnings happen (is the client idle or under load)? Are
> there any kernel warnings when unmounting the client that emitted the
> warning?
> 
> Regards
> Yan, Zheng
> 
> 
 John
 
> Thank you!
>>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread Yan, Zheng
On Tue, Oct 4, 2016 at 11:30 PM, John Spray  wrote:
> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton  wrote:
>> Thank you John. Both my Openstack hosts and the VMs are all running 
>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of 
>> the VMs are holding large numbers of files open. If this is likely a client 
>> bug, is there some process I can follow to file a bug report?
>
> It might be worthwhile to file a bug report with Ubuntu, as they'd be
> the ones who would ideally backport fixes to their stable kernels (in
> this instance it's hard to know if this is a bug in the latest kernel
> code or something fixed since 4.4).
>
> It would be really useful if you could try installing the latest
> released kernel on the clients and see if the issue persists: if so
> then a ticket on tracker.ceph.com will be a priority for us to fix.
>
> CCing Zheng -- are there any noteworthy fixes between 4.4 and latest
> kernel that might be relevant?

No bugs have been found in this area since 4.4.

>
> John
>
>
>>
>>> On Oct 4, 2016, at 9:39 AM, John Spray  wrote:
>>>
 On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
 Adding that all of my ceph components are version:
 10.2.2-0ubuntu0.16.04.2

 Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1

 My scenario is that I have a 3-node ceph cluster running openstack mitaka. 
 Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in 
 openstack; all are mounted to the Manila file share using cephfs native 
 kernel client driver. Each VM user has put 10-20 gb of files on the share, 
 but most of this is back-up, so IO requirement is very low. However, I 
 initially tried using ceph-fuse but performance latency was poor. Moving 
 to kernel client driver for mounting the share has improved performance 
 greatly. However, I am getting the cache pressure issue.
>>>
>>> Aside: bear in mind that the kernel client doesn't support quotas, so
>>> any size limits you set on your Manila shares won't be respected.
>>>
 Can someone help me with the math to properly size the mds cache? How do I 
 know if the cache size is too small (I think very few files in-use at any 
 given time) versus the clients are broken and not releasing cache properly?
>>>
>>> It's almost never the case that your cache is too small unless your
>>> workload is holding a silly number of files open at one time -- assume
>>> this is a client bug (although some people work around it by creating
>>> much bigger MDS caches!)
>>>
>>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>>> kernel are you running?
>>>

When do the warnings happen (is the client idle or under load)? Are
there any kernel warnings when unmounting the client that emitted the
warning?

Regards
Yan, Zheng


>>> John
>>>
 Thank you!
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread Stephen Horton
Thanks again, John. I am installing the 4.8.0-040800 kernel on my VM clients and 
will report back. Just to confirm: for this issue, there is no reason to try the 
newer kernel on the MDS node as well, correct?
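
For anyone following along, the mainline 4.8 build comes from the Ubuntu kernel
team's mainline archive; roughly (a sketch -- the exact .deb file names under
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8/ should be checked against
the index there):

# download the headers and image .debs for v4.8 from the mainline archive, then:
sudo dpkg -i linux-headers-4.8.0-040800*_all.deb \
             linux-headers-4.8.0-040800-generic*_amd64.deb \
             linux-image-4.8.0-040800-generic*_amd64.deb
sudo reboot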

> On Oct 4, 2016, at 10:30 AM, John Spray  wrote:
> 
>> On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton  wrote:
>> Thank you John. Both my Openstack hosts and the VMs are all running 
>> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of 
>> the VMs are holding large numbers of files open. If this is likely a client 
>> bug, is there some process I can follow to file a bug report?
> 
> It might be worthwhile to file a bug report with Ubuntu, as they'd be
> the ones who would ideally backport fixes to their stable kernels (in
> this instance it's hard to know if this is a bug in the latest kernel
> code or something fixed since 4.4).
> 
> It would be really useful if you could try installing the latest
> released kernel on the clients and see if the issue persists: if so
> then a ticket on tracker.ceph.com will be a priority for us to fix.
> 
> CCing Zheng -- are there any noteworthy fixes between 4.4 and latest
> kernel that might be relevant?
> 
> John
> 
> 
>> 
 On Oct 4, 2016, at 9:39 AM, John Spray  wrote:
 
 On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
 Adding that all of my ceph components are version:
 10.2.2-0ubuntu0.16.04.2
 
 Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1
 
 My scenario is that I have a 3-node ceph cluster running openstack mitaka. 
 Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in 
 openstack; all are mounted to the Manila file share using cephfs native 
 kernel client driver. Each VM user has put 10-20 gb of files on the share, 
 but most of this is back-up, so IO requirement is very low. However, I 
 initially tried using ceph-fuse but performance latency was poor. Moving 
 to kernel client driver for mounting the share has improved performance 
 greatly. However, I am getting the cache pressure issue.
>>> 
>>> Aside: bear in mind that the kernel client doesn't support quotas, so
>>> any size limits you set on your Manila shares won't be respected.
>>> 
 Can someone help me with the math to properly size the mds cache? How do I 
 know if the cache size is too small (I think very few files in-use at any 
 given time) versus the clients are broken and not releasing cache properly?
>>> 
>>> It's almost never the case that your cache is too small unless your
>>> workload is holding a silly number of files open at one time -- assume
>>> this is a client bug (although some people work around it by creating
>>> much bigger MDS caches!)
>>> 
>>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>>> kernel are you running?
>>> 
>>> John
>>> 
 Thank you!
>> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread John Spray
On Tue, Oct 4, 2016 at 5:09 PM, Stephen Horton  wrote:
> Thank you John. Both my Openstack hosts and the VMs are all running 
> 4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of the 
> VMs are holding large numbers of files open. If this is likely a client bug, 
> is there some process I can follow to file a bug report?

It might be worthwhile to file a bug report with Ubuntu, as they'd be
the ones who would ideally backport fixes to their stable kernels (in
this instance it's hard to know if this is a bug in the latest kernel
code or something fixed since 4.4).

It would be really useful if you could try installing the latest
released kernel on the clients and see if the issue persists: if so
then a ticket on tracker.ceph.com will be a priority for us to fix.

CCing Zheng -- are there any noteworthy fixes between 4.4 and latest
kernel that might be relevant?

John


>
>> On Oct 4, 2016, at 9:39 AM, John Spray  wrote:
>>
>>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
>>> Adding that all of my ceph components are version:
>>> 10.2.2-0ubuntu0.16.04.2
>>>
>>> Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1
>>>
>>> My scenario is that I have a 3-node ceph cluster running openstack mitaka. 
>>> Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in 
>>> openstack; all are mounted to the Manila file share using cephfs native 
>>> kernel client driver. Each VM user has put 10-20 gb of files on the share, 
>>> but most of this is back-up, so IO requirement is very low. However, I 
>>> initially tried using ceph-fuse but performance latency was poor. Moving to 
>>> kernel client driver for mounting the share has improved performance 
>>> greatly. However, I am getting the cache pressure issue.
>>
>> Aside: bear in mind that the kernel client doesn't support quotas, so
>> any size limits you set on your Manila shares won't be respected.
>>
>>> Can someone help me with the math to properly size the mds cache? How do I 
>>> know if the cache size is too small (I think very few files in-use at any 
>>> given time) versus the clients are broken and not releasing cache properly?
>>
>> It's almost never the case that your cache is too small unless your
>> workload is holding a silly number of files open at one time -- assume
>> this is a client bug (although some people work around it by creating
>> much bigger MDS caches!)
>>
>> You've mentioned the versions of openstack/ubuntu/ceph, but what
>> kernel are you running?
>>
>> John
>>
>>> Thank you!
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread Stephen Horton
Thank you John. Both my Openstack hosts and the VMs are all running 
4.4.0-38-generic #57-Ubuntu SMP x86_64. I can see no evidence that any of the 
VMs are holding large numbers of files open. If this is likely a client bug, is 
there some process I can follow to file a bug report?

> On Oct 4, 2016, at 9:39 AM, John Spray  wrote:
> 
>> On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
>> Adding that all of my ceph components are version:
>> 10.2.2-0ubuntu0.16.04.2
>> 
>> Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1
>> 
>> My scenario is that I have a 3-node ceph cluster running openstack mitaka. 
>> Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in 
>> openstack; all are mounted to the Manila file share using cephfs native 
>> kernel client driver. Each VM user has put 10-20 gb of files on the share, 
>> but most of this is back-up, so IO requirement is very low. However, I 
>> initially tried using ceph-fuse but performance latency was poor. Moving to 
>> kernel client driver for mounting the share has improved performance 
>> greatly. However, I am getting the cache pressure issue.
> 
> Aside: bear in mind that the kernel client doesn't support quotas, so
> any size limits you set on your Manila shares won't be respected.
> 
>> Can someone help me with the math to properly size the mds cache? How do I 
>> know if the cache size is too small (I think very few files in-use at any 
>> given time) versus the clients are broken and not releasing cache properly?
> 
> It's almost never the case that your cache is too small unless your
> workload is holding a silly number of files open at one time -- assume
> this is a client bug (although some people work around it by creating
> much bigger MDS caches!)
> 
> You've mentioned the versions of openstack/ubuntu/ceph, but what
> kernel are you running?
> 
> John
> 
>> Thank you!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread John Spray
On Tue, Oct 4, 2016 at 4:27 PM, Stephen Horton  wrote:
> Adding that all of my ceph components are version:
> 10.2.2-0ubuntu0.16.04.2
>
> Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1
>
> My scenario is that I have a 3-node ceph cluster running openstack mitaka. 
> Each node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in 
> openstack; all are mounted to the Manila file share using cephfs native 
> kernel client driver. Each VM user has put 10-20 gb of files on the share, 
> but most of this is back-up, so IO requirement is very low. However, I 
> initially tried using ceph-fuse but performance latency was poor. Moving to 
> kernel client driver for mounting the share has improved performance greatly. 
> However, I am getting the cache pressure issue.

Aside: bear in mind that the kernel client doesn't support quotas, so
any size limits you set on your Manila shares won't be respected.
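
For context on how those limits are expressed: CephFS quotas are directory
extended attributes, which ceph-fuse enforces but which the kernel clients of
this era simply ignore. A sketch, with /mnt/share standing in for the mounted
share path:

setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/share    # 10 GiB limit
getfattr -n ceph.quota.max_bytes /mnt/share                    # read it back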

> Can someone help me with the math to properly size the mds cache? How do I 
> know if the cache size is too small (I think very few files in-use at any 
> given time) versus the clients are broken and not releasing cache properly?

It's almost never the case that your cache is too small unless your
workload is holding a silly number of files open at one time -- assume
this is a client bug (although some people work around it by creating
much bigger MDS caches!)
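
As a rough order-of-magnitude check on "much bigger MDS caches", assuming the
commonly quoted figure of a few kilobytes of MDS memory per cached inode plus
its dentry (the real cost varies with path lengths and xattrs):

    mds_cache_size = 500,000    =>  500,000 x ~2-3 KB  ~=  1.2-1.5 GB of MDS RAM
    mds_cache_size = 5,000,000  =>  roughly 12-15 GB, still comfortable on a 256 GB node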

You've mentioned the versions of openstack/ubuntu/ceph, but what
kernel are you running?

John

> Thank you!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel driver - failing to respond to cache pressure

2016-10-04 Thread Stephen Horton
Adding that all of my ceph components are version:
10.2.2-0ubuntu0.16.04.2

Openstack is Mitaka on Ubuntu 16.04x. Manila file share is 1:2.0.0-0ubuntu1

My scenario is that I have a 3-node ceph cluster running openstack mitaka. Each 
node has 256gb ram, 14tb raid 5 array. I have 30 VMs running in openstack; all 
are mounted to the Manila file share using cephfs native kernel client driver. 
Each VM user has put 10-20 gb of files on the share, but most of this is 
back-up, so IO requirement is very low. However, I initially tried using 
ceph-fuse but performance latency was poor. Moving to kernel client driver for 
mounting the share has improved performance greatly. However, I am getting the 
cache pressure issue.
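
For completeness, the shares are mounted in the guests with the in-kernel cephfs
driver along these lines (a sketch; the /volumes/... share path and the secret
file are placeholders for what Manila hands out per share):

sudo mount -t ceph 10.155.92.128:6789,10.155.92.129:6789,10.155.92.130:6789:/volumes/_nogroup/<share-id> \
    /mnt/share -o name=manila,secretfile=/etc/ceph/manila.secret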

Can someone help me with the math to properly size the mds cache? How do I know 
if the cache size is too small (I think very few files in-use at any given 
time) versus the clients are broken and not releasing cache properly?

Thank you!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com