Hi George
Well that’s strange. I wonder why our systems behave so differently.
We’ve got:
Hypervisors running on Ubuntu 14.04.
VMs with 9 Ceph volumes, 2 TB each.
XFS instead of your ext4
Maybe the number of placement groups plays a major role as well. Jens-Christian
may be able to give you
Hi George
In order to experience the error it was enough to simply run mkfs.xfs on all
the volumes.
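As a reference, the reproduction described above can be sketched as a loop over the attached volumes (device names are assumptions; inside a VM, RBD volumes typically show up as virtio disks):

```shell
# WARNING: destroys all data on the listed devices. Device names are
# examples only, not taken from the original post.
for dev in /dev/vdb /dev/vdc /dev/vdd; do
    mkfs.xfs -f "$dev" &   # format all volumes concurrently
done
wait
```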
In the meantime it became clear what the problem was:
~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
..
This can be changed by
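The limit in question is the per-process open-file limit. A minimal sketch of how one might inspect and raise it (the persistent values and file path are common defaults, not taken from the original post):

```shell
# Inspect the limit of a running process via /proc (shown for the current
# shell; on a hypervisor, substitute the QEMU PID, e.g. 183016 above):
grep 'open files' /proc/self/limits

# Raise the soft limit to the hard limit for this shell, so processes
# started from it inherit the higher ceiling:
ulimit -n "$(ulimit -Hn)"

# For a persistent change (assumption: pam_limits is in use), one would
# add lines like these to /etc/security/limits.conf:
#   *  soft  nofile  65536
#   *  hard  nofile  65536
```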
All,
I've tried to recreate the issue without success!
My configuration is the following:
OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64)
QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047),
20x4TB OSDs equally
In the end this came down to one slow OSD. There were no hardware
issues, so we have to assume something gummed up during rebalancing and
peering.
I restarted the osd process after setting the cluster to noout. After
the osd was restarted the rebalance completed and the cluster returned
to
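The recovery steps described read roughly as follows (a sketch assuming the standard `ceph` CLI and a sysvinit-managed OSD; `osd.N` stands in for the slow OSD's id):

```shell
ceph osd set noout              # keep CRUSH from marking the OSD out during the restart
/etc/init.d/ceph restart osd.N  # restart the slow OSD daemon (init system dependent)
ceph osd unset noout            # re-enable normal out-marking once it is back
ceph -s                         # watch until the cluster returns to HEALTH_OK
```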
To follow up on the original post,
Further digging indicates this is a problem with RBD image access and is
not related to NFS-RBD interaction as initially suspected. The nfsd is
simply hanging as a result of a hung request to the XFS file system
mounted on our RBD-NFS gateway. This hung XFS
Thanks a million for the feedback, Christian!
I've tried to recreate the issue with 10 RBD volumes mounted on a
single server without success!
I've issued the mkfs.xfs command simultaneously (or at least as fast as
I could in different terminals) without noticing any problems. Can
you
Jens-Christian Fischer jens-christian.fischer@... writes:
I think we (i.e. Christian) found the problem:
We created a test VM with 9 mounted RBD volumes (no NFS server). As soon as
he hit all disks, we started to experience these 120 second timeouts. We
realized that the QEMU process on the
George,
I will let Christian provide you the details. As far as I know, it was enough
to just do a ‘ls’ on all of the attached drives.
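The trigger described can be sketched as follows (the mount points are assumptions, not taken from the original post):

```shell
# Listing every attached volume concurrently was enough to hit the hang:
for m in /mnt/vol1 /mnt/vol2 /mnt/vol3; do
    ls "$m" &
done
wait
```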
we are using Qemu 2.0:
$ dpkg -l | grep qemu
ii  ipxe-qemu  1.0.0+git-2013.c3d1e78-2ubuntu1  all  PXE boot firmware -
I think we (i.e. Christian) found the problem:
We created a test VM with 9 mounted RBD volumes (no NFS server). As soon as he
hit all disks, we started to experience these 120 second timeouts. We realized
that the QEMU process on the hypervisor is opening a TCP connection to every
OSD for
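If QEMU opens a TCP socket to every OSD for each volume, descriptor usage grows roughly as volumes x OSDs and quickly exceeds the default soft limit of 1024 seen earlier. One way to check is via /proc (shown against the current shell; on a hypervisor one would substitute the QEMU PID, e.g. from `pidof`):

```shell
pid=$$   # assumption: replace with the QEMU PID, e.g. $(pidof qemu-system-x86_64)
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "process $pid holds $fd_count open file descriptors"
grep 'open files' "/proc/$pid/limits"   # compare against the soft limit
```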
Jens-Christian,
how did you test that? Did you just try to write to them
simultaneously? Any other tests that one can perform to verify that?
In our installation we have a VM with 30 RBD volumes mounted which are
all exported via NFS to other VMs.
No one has complained so far, but
Hello,
let's compare your case with John-Paul's.
Different OS and Ceph versions (thus we can assume different NFS versions
as well).
The only common thing is that both of you added OSDs and are likely
suffering from delays stemming from Ceph re-balancing or deep-scrubbing.
Ceph logs will only
We see something very similar on our Ceph cluster, starting as of today.
We use a 16 node, 102 OSD Ceph installation as the basis for an Icehouse
OpenStack cluster (we applied the RBD patches for live migration etc)
On this cluster we have a big ownCloud installation (Sync Share) that stores
We've had an NFS gateway serving up RBD images successfully for over a year.
Ubuntu 12.04 and ceph .73 iirc.
In the past couple of weeks we have developed a problem where the nfs clients
hang while accessing exported rbd containers.
We see errors on the server about nfsd hanging for 120sec