Re: [ceph-users] NFS interaction with RBD

2015-06-11 Thread Christian Schnidrig
Hi George

Well that’s strange. I wonder why our systems behave so differently.

We’ve got:

Hypervisors running on Ubuntu 14.04. 
VMs with 9 ceph volumes: 2TB each.
XFS instead of your ext4

Maybe the number of placement groups plays a major role as well. Jens-Christian 
may be able to give you the specifics of our ceph cluster. 
I’m about to leave on vacation and don’t have time to look that up anymore.

Best regards
Christian


On 29 May 2015, at 14:42, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote:

 All,
 
 I've tried to recreate the issue without success!
 
 My configuration is the following:
 
 OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64)
 QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
 Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), 20x4TB 
 OSDs equally distributed on two disk nodes, 3xMonitors
 
 
 OpenStack Cinder has been configured to provide RBD Volumes from Ceph.
 
 I have created 10x 500GB Volumes which were then all attached to a single 
 Virtual Machine.
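 For reference, a rough sketch of the create/attach loop (the volume names and the
 era-specific CLI syntax here are only illustrative, not the exact commands used):
 
   for i in $(seq 1 10); do
     # create a 500GB Cinder volume backed by the RBD pool
     cinder create --display-name rbdvol$i 500
   done
   # then attach each volume to the test VM, e.g.:
   # nova volume-attach <vm-id> <volume-id>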
 
 All volumes were formatted twice for comparison, once using mkfs.xfs and once 
 using mkfs.ext4.
 I did try to issue the commands all at the same time (or as close to that as possible).
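 In practice that meant something along these lines (the device names are hypothetical;
 the volumes showed up as virtio disks):
 
   # format all attached volumes in parallel
   for dev in /dev/vd{b..k}; do
     mkfs.xfs -f "$dev" &
   done
   wait
   # (the ext4 pass used mkfs.ext4 instead)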
 
 In both tests I didn't notice any interruption. It may have taken longer than 
 doing one at a time, but the system was continuously up and everything was 
 responding without problems.
 
 While these processes were running, there were 100 open connections to one of the 
 OSD nodes and 111 to the other.
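 (For what it's worth, a sketch of how such counts can be gathered on the hypervisor,
 tallying established TCP connections per remote host:
 
   netstat -tn | awk '/ESTABLISHED/ {split($5,a,":"); print a[1]}' | sort | uniq -c | sort -rn
 
 The OSD node IPs then show up with their connection counts.)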
 
 So I guess I am not experiencing the issue due to the low number of OSDs I have. 
 Is my assumption correct?
 
 
 Best regards,
 
 George
 
 
 
 Thanks a million for the feedback Christian!
 
 I've tried to recreate the issue with 10 RBD volumes mounted on a
 single server, without success!
 
 I've issued the mkfs.xfs commands simultaneously (or at least as fast as I
 could in different terminals) without noticing any problems. Can you please
 tell me what the size of each of your RBD volumes was? I have a feeling that
 mine were too small, and if so I'll have to test it on our bigger cluster.
 
 I've also thought that besides the QEMU version, the underlying OS might
 also be important, so what was your testbed?
 
 
 All the best,
 
 George
 
 Hi George
 
 In order to experience the error it was enough to simply run mkfs.xfs
 on all the volumes.
 
 
 In the meantime it became clear what the problem was:
 
 ~ ; cat /proc/183016/limits
 ...
 Max open files            1024                 4096                 files
 ..
 
 This can be changed by setting a decent value in
 /etc/libvirt/qemu.conf for max_files.
 
 Regards
 Christian
 
 
 
 On 27 May 2015, at 16:23, Jens-Christian Fischer
 jens-christian.fisc...@switch.ch wrote:
 
 George,
 
 I will let Christian provide you the details. As far as I know, it was 
 enough to just do a ‘ls’ on all of the attached drives.
 
 we are using Qemu 2.0:
 
 $ dpkg -l | grep qemu
 ii  ipxe-qemu           1.0.0+git-2013.c3d1e78-2ubuntu1  all    PXE boot firmware - ROM images for qemu
 ii  qemu-keymaps        2.0.0+dfsg-2ubuntu1.11           all    QEMU keyboard maps
 ii  qemu-system         2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries
 ii  qemu-system-arm     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (arm)
 ii  qemu-system-common  2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (common files)
 ii  qemu-system-mips    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (mips)
 ii  qemu-system-misc    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (miscelaneous)
 ii  qemu-system-ppc     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (ppc)
 ii  qemu-system-sparc   2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (sparc)
 ii  qemu-system-x86     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (x86)
 ii  qemu-utils          2.0.0+dfsg-2ubuntu1.11           amd64  QEMU utilities
 
 cheers
 jc
 
 --
 SWITCH
 Jens-Christian Fischer, Peta Solutions
 Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
 phone +41 44 268 15 15, direct +41 44 268 15 71
 jens-christian.fisc...@switch.ch
 http://www.switch.ch
 
 http://www.switch.ch/stories
 
 On 26.05.2015, at 19:12, Georgios Dimitrakakis gior...@acmac.uoc.gr 
 wrote:
 
 Jens-Christian,
 
 How did you test that? Did you just try to write to them simultaneously? 
 Are there any other tests one can perform to verify this?
 
 In our installation we have a VM with 30 RBD volumes mounted, all of which are 
 exported via NFS to other VMs.
 No one has complained so far, but the load/usage is very minimal.

Re: [ceph-users] NFS interaction with RBD

2015-06-11 Thread Christian Schnidrig
Hi George

In order to experience the error it was enough to simply run mkfs.xfs on all 
the volumes.


In the meantime it became clear what the problem was:

 ~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
..

This can be changed by setting a decent value in /etc/libvirt/qemu.conf for 
max_files.
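A hedged example (the number is only illustrative; size it to roughly
volumes x OSDs per VM, plus headroom):

  # /etc/libvirt/qemu.conf
  max_files = 32768

After changing it, restart libvirtd; guests pick up the new limit when their
QEMU process is (re)started.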

Regards
Christian



On 27 May 2015, at 16:23, Jens-Christian Fischer 
jens-christian.fisc...@switch.ch wrote:

 George,
 
 I will let Christian provide you the details. As far as I know, it was enough 
 to just do a ‘ls’ on all of the attached drives.
 
 we are using Qemu 2.0:
 
 $ dpkg -l | grep qemu
 ii  ipxe-qemu           1.0.0+git-2013.c3d1e78-2ubuntu1  all    PXE boot firmware - ROM images for qemu
 ii  qemu-keymaps        2.0.0+dfsg-2ubuntu1.11           all    QEMU keyboard maps
 ii  qemu-system         2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries
 ii  qemu-system-arm     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (arm)
 ii  qemu-system-common  2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (common files)
 ii  qemu-system-mips    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (mips)
 ii  qemu-system-misc    2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (miscelaneous)
 ii  qemu-system-ppc     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (ppc)
 ii  qemu-system-sparc   2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (sparc)
 ii  qemu-system-x86     2.0.0+dfsg-2ubuntu1.11           amd64  QEMU full system emulation binaries (x86)
 ii  qemu-utils          2.0.0+dfsg-2ubuntu1.11           amd64  QEMU utilities
 
 cheers
 jc
 
 -- 
 SWITCH
 Jens-Christian Fischer, Peta Solutions
 Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
 phone +41 44 268 15 15, direct +41 44 268 15 71
 jens-christian.fisc...@switch.ch
 http://www.switch.ch
 
 http://www.switch.ch/stories
 
 On 26.05.2015, at 19:12, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote:
 
 Jens-Christian,
 
 How did you test that? Did you just try to write to them simultaneously? 
 Are there any other tests one can perform to verify this?
 
 In our installation we have a VM with 30 RBD volumes mounted, all of which are 
 exported via NFS to other VMs.
 No one has complained so far, but the load/usage is very minimal.
 If this problem really exists, then very soon, once the trial phase is over, 
 we will have millions of complaints :-(
 
 What version of QEMU are you using? We are using the one provided by Ceph in 
 qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm
 
 Best regards,
 
 George
 
 I think we (i.e. Christian) found the problem:
 
 We created a test VM with 9 mounted RBD volumes (no NFS server). As soon as
 he hit all the disks, we started to experience these 120-second timeouts. We
 realized that the QEMU process on the hypervisor opens a TCP connection to
 every OSD for every mounted volume, exceeding the 1024 FD limit.
 
 So no deep scrubbing etc., but simply too many connections…
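 (A quick way to confirm this on the hypervisor, sketched here with a hypothetical
 instance name, is to count the open file descriptors and TCP sockets of the QEMU process:
 
   PID=$(pgrep -f 'qemu.*instance-000000xx' | head -1)
   ls /proc/$PID/fd | wc -l                      # total open FDs
   netstat -tnp 2>/dev/null | grep -c "$PID/"    # TCP connections owned by that PID
 
 With N volumes and M OSDs this grows roughly as N x M, so even 9 volumes on a
 modestly sized cluster blows past 1024 quickly.)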
 
 cheers
 jc
 
 --
 SWITCH
 Jens-Christian Fischer, Peta Solutions
 Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
 phone +41 44 268 15 15, direct +41 44 268 15 71
 jens-christian.fisc...@switch.ch [3]
 http://www.switch.ch
 
 http://www.switch.ch/stories
 
 On 25.05.2015, at 06:02, Christian Balzer  wrote:
 
 Hello,
 
 Let's compare your case with John-Paul's.
 
 Different OS and Ceph versions (thus we can assume different NFS versions as
 well). The only common thing is that both of you added OSDs and are likely
 suffering from delays stemming from Ceph re-balancing or deep-scrubbing.
 
 Ceph logs will only pipe up when things have been blocked for more than 30
 seconds; NFS might take offense at lower values (or the accumulation of
 several distributed delays).
 
 You added 23 OSDs; tell us more about your cluster, HW, network.
 Were these added to the existing 16 nodes, or are they on new storage nodes
 (so could there be something different with those nodes?), and how busy are
 your network and CPU?
 Running something like collectd to gather all Ceph perf data and other data
 from the storage nodes and then feeding it to graphite (or similar) can be
 VERY helpful to identify if something is going wrong and what it is in
 particular.
 Otherwise run atop on your storage nodes to identify if CPU, network, or
 specific HDDs/OSDs are bottlenecks.
 
 Deep scrubbing can be _very_ taxing; do your problems persist if you inject
 into your running cluster an osd_scrub_sleep value of 0.5 (lower