[ceph-users] Issues with CentOS RDO Liberty (OpenStack) and Ceph Repo (dependency resolution failed)
Dear Ceph-users,

I just set up a new CentOS 7 Ceph and OpenStack cluster. When "ceph-deploy install compute2" starts to set up the Ceph repo, it fails at dependency resolution:

===
Loaded plugins: fastestmirror, langpacks, priorities
Loading mirror speeds from cached hostfile
 * base: mirror.23media.de
 * elrepo: mirrors.ircam.fr
 * epel: mirror.23media.de
 * extras: ftp.rrzn.uni-hannover.de
 * updates: ftp.rrzn.uni-hannover.de
5977 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package ceph.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: libcephfs1 = 1:0.94.5-0.el7 for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-rbd = 1:0.94.5-0.el7 for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: ceph-common = 1:0.94.5-0.el7 for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-cephfs = 1:0.94.5-0.el7 for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-rados = 1:0.94.5-0.el7 for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-flask for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: hdparm for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libboost_program_options-mt.so.1.53.0()(64bit) for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libtcmalloc.so.4()(64bit) for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libleveldb.so.1()(64bit) for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libcephfs.so.1()(64bit) for package: 1:ceph-0.94.5-0.el7.x86_64
---> Package ceph-radosgw.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: libfcgi.so.0()(64bit) for package: 1:ceph-radosgw-0.94.5-0.el7.x86_64
--> Running transaction check
---> Package boost-program-options.x86_64 0:1.53.0-23.el7 will be installed
---> Package ceph-common.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: redhat-lsb-core for package: 1:ceph-common-0.94.5-0.el7.x86_64
---> Package fcgi.x86_64 0:2.4.0-22.el7 will be installed
---> Package gperftools-libs.x86_64 0:2.1-1.el7 will be installed
--> Processing Dependency: libunwind.so.8()(64bit) for package: gperftools-libs-2.1-1.el7.x86_64
---> Package hdparm.x86_64 0:9.43-5.el7 will be installed
---> Package leveldb.x86_64 0:1.12.0-5.el7 will be installed
---> Package libcephfs1.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-cephfs.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-flask.noarch 1:0.10.1-3.el7 will be installed
--> Processing Dependency: python-itsdangerous for package: 1:python-flask-0.10.1-3.el7.noarch
---> Package python-rados.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-rbd.x86_64 1:0.94.5-0.el7 will be installed
--> Running transaction check
---> Package libunwind.x86_64 0:1.1-10.el7 will be installed
---> Package python-itsdangerous.noarch 0:0.23-1.el7 will be installed
---> Package redhat-lsb-core.x86_64 0:4.1-27.el7.centos.1 will be installed
--> Processing Dependency: redhat-lsb-submod-security(x86-64) = 4.1-27.el7.centos.1 for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: spax for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/patch for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/m4 for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/lpr for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/lp for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Running transaction check
---> Package cups-client.x86_64 1:1.6.3-17.el7 will be installed
--> Processing Dependency: cups-libs(x86-64) = 1:1.6.3-17.el7 for package: 1:cups-client-1.6.3-17.el7.x86_64
---> Package m4.x86_64 0:1.4.16-9.el7 will be installed
---> Package patch.x86_64 0:2.7.1-8.el7 will be installed
---> Package redhat-lsb-submod-security.x86_64 0:4.1-27.el7.centos.1 will be installed
---> Package spax.x86_64 0:1.5.2-11.el7 will be installed
--> Finished Dependency Resolution
Error: Package: 1:cups-client-1.6.3-17.el7.x86_64 (core-0)
           Requires: cups-libs(x86-64) = 1:1.6.3-17.el7
           Installed: 1:cups-libs-1.6.3-17.el7_1.1.x86_64 (@updates)
               cups-libs(x86-64) = 1:1.6.3-17.el7_1.1
           Available: 1:cups-libs-1.6.3-17.el7.x86_64 (core-0)
               cups-libs(x86-64) = 1:1.6.3-17.el7
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
===

How can I solve this problem?

Regards,
Kevin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
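In the error, cups-client from the core-0 repository requires cups-libs at exactly 1:1.6.3-17.el7, while the installed cups-libs (from @updates) is 1:1.6.3-17.el7_1.1. The Python sketch below is a simplified version of rpm's epoch/version/release ordering (it deliberately skips rpm's special `~` and `^` rules) and shows that the installed build sorts as newer, so the `=` requirement could only be met by a downgrade. A plausible reading, not confirmed anywhere in this thread, is that the priorities plugin (which excluded 5977 packages) hid an updates-repo cups-client that would have matched the installed cups-libs.

```python
import re


def _segments(s):
    # rpm compares maximal runs of digits or letters; separators such as '.', '_' and '-' are dropped.
    return re.findall(r"[0-9]+|[a-zA-Z]+", s)


def rpmvercmp(a, b):
    """Simplified rpm version comparison: 1 if a > b, -1 if a < b, 0 if equal."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)          # numeric segments compare as integers
            if xi != yi:
                return 1 if xi > yi else -1
        elif x.isdigit():
            return 1                         # a numeric segment sorts newer than an alphabetic one
        elif y.isdigit():
            return -1
        elif x != y:
            return 1 if x > y else -1        # alphabetic segments compare as strings
    if len(sa) != len(sb):
        return 1 if len(sa) > len(sb) else -1  # the string with extra segments is newer
    return 0


def compare_evr(evr_a, evr_b):
    """Compare (epoch, version, release) tuples: epoch first, then version, then release."""
    (ea, va, ra), (eb, vb, rb) = evr_a, evr_b
    if ea != eb:
        return 1 if ea > eb else -1
    return rpmvercmp(va, vb) or rpmvercmp(ra, rb)


installed = (1, "1.6.3", "17.el7_1.1")   # cups-libs from @updates
required = (1, "1.6.3", "17.el7")        # what cups-client (core-0) pins with "="
print(compare_evr(installed, required))  # 1: the installed package is newer
```

Because the release strings share the segments `17`, `el`, `7` and the installed one has extra segments, it wins the comparison.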
Re: [ceph-users] Ceph OSDs with bcache experience
On 11/05/2015 11:03 PM, Michal Kozanecki wrote: > Why did you guys go with partitioning the SSD for ceph journals, instead of > just using the whole SSD for bcache and leaving the journal on the filesystem > (which itself is on top of bcache)? Was there really a benefit to separating the > journals from the bcache-fronted HDDs? > > I ask because it has been shown in the past that separating the journal on > SSD-based pools doesn't really do much. > Well, the I/O for the journal bypasses bcache completely in this case. The less code the I/O travels through, the better, we figured. We didn't try with the journal on bcache. This works for us, so we didn't bother testing anything different. Wido > Michal Kozanecki | Linux Administrator | mkozane...@evertz.com > > > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido > den Hollander > Sent: October-28-15 5:49 AM > To: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Ceph OSDs with bcache experience > > > > On 21-10-15 15:30, Mark Nelson wrote: >> >> >> On 10/21/2015 01:59 AM, Wido den Hollander wrote: >>> On 10/20/2015 07:44 PM, Mark Nelson wrote: On 10/20/2015 09:00 AM, Wido den Hollander wrote: > Hi, > > In the "newstore direction" thread on ceph-devel I wrote that I'm > using bcache in production and Mark Nelson asked me to share some details. > > Bcache is running in two clusters now that I manage, but I'll limit > this information to one of them (the one at PCextreme behind CloudStack). > > This cluster has been running for over 2 years now: > > epoch 284353 > fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307 > created 2013-09-23 11:06:11.819520 > modified 2015-10-20 15:27:48.734213 > > The system consists of 39 hosts: > > 2U SuperMicro chassis: > * 80GB Intel SSD for OS > * 240GB Intel S3700 SSD for Journaling + Bcache > * 6x 3TB disks > > This isn't the newest hardware. The next batch of hardware will be > more disks per chassis, but this is it for now.
> > All systems were installed with Ubuntu 12.04, but they are all > running > 14.04 now with bcache. > > The Intel S3700 SSD is partitioned with a GPT label: > - 5GB Journal for each OSD > - 200GB Partition for bcache > > root@ceph11:~# df -h|grep osd > /dev/bcache0 2.8T 1.1T 1.8T 38% /var/lib/ceph/osd/ceph-60 > /dev/bcache1 2.8T 1.2T 1.7T 41% /var/lib/ceph/osd/ceph-61 > /dev/bcache2 2.8T 930G 1.9T 34% /var/lib/ceph/osd/ceph-62 > /dev/bcache3 2.8T 970G 1.8T 35% /var/lib/ceph/osd/ceph-63 > /dev/bcache4 2.8T 814G 2.0T 30% /var/lib/ceph/osd/ceph-64 > /dev/bcache5 2.8T 915G 1.9T 33% /var/lib/ceph/osd/ceph-65 > root@ceph11:~# > > root@ceph11:~# lsb_release -a > No LSB modules are available. > Distributor ID: Ubuntu > Description: Ubuntu 14.04.3 LTS > Release: 14.04 > Codename: trusty > root@ceph11:~# uname -r > 3.19.0-30-generic > root@ceph11:~# > > "apply_latency": { > "avgcount": 2985023, > "sum": 226219.891559000 > } > > What did we notice? > - Fewer spikes on the disks > - Lower commit latencies on the OSDs > - Almost no 'slow requests' during backfills > - Cache-hit ratio of about 60% > > Max backfills and recovery active are both set to 1 on all OSDs. > > For the next generation hardware we are looking into using 3U > chassis with 16 4TB SATA drives and a 1.2TB NVMe SSD for bcache, > but we haven't tested those yet, so nothing to say about it. > > The current setup is 200GB of cache for 18TB of disks. The new > setup will be 1200GB for 64TB, curious to see what that does. > > Our main conclusion however is that it does smooth out the > I/O pattern towards the disks and that gives an overall better > response of the disks. Hi Wido, thanks for the big writeup! Did you guys happen to do any benchmarking? I think Xiaoxi looked at flashcache a while back but had mixed results if I remember right. It would be interesting to know how bcache is affecting performance in different scenarios. >>> >>> No, we didn't do any benchmarking.
Initially this cluster was built >>> for just the RADOS Gateway, so we went for 2Gbit (2x 1Gbit) per >>> machine. 90% is still Gbit networking and we are in the process of >>> upgrading it all to 10Gbit. >>> >>> Since the 1Gbit network latency is about 4 times higher than 10Gbit >>> we aren't really benchmarking the cluster. >>> >>> What counts most for us is that we can do recovery operations without >>> any slow requests. >>> >>> Before bcache we saw disks spike to 100% busy while a backfill was running. >>> Now bcache smooths this out and we see peaks of maybe 70%, but that's it. >> >> In the testing I was doing to
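Two of the figures in this thread are easier to compare once reduced to a single number. A Ceph time-average perf counter such as `apply_latency` reports a running `sum` and a sample count `avgcount`, so the mean is their ratio (assuming, as for Ceph's latency counters, that `sum` is in seconds). The cache sizing is simply cache capacity over raw disk capacity. This is plain arithmetic; the function names are made up for illustration.

```python
def avg_latency_ms(counter):
    # Time-average counters expose a running total ("sum", assumed to be seconds)
    # and the number of samples ("avgcount"); their ratio is the mean latency.
    return counter["sum"] / counter["avgcount"] * 1000.0


def cache_fraction(cache_gb, disk_tb):
    # Share of the backing capacity that fits in the bcache device (1 TB = 1000 GB).
    return cache_gb / (disk_tb * 1000.0)


apply_latency = {"avgcount": 2985023, "sum": 226219.891559000}
print(f"mean apply latency: {avg_latency_ms(apply_latency):.1f} ms")  # 75.8 ms

current = cache_fraction(200, 6 * 3)    # 200 GB bcache partition for 6x 3TB
planned = cache_fraction(1200, 16 * 4)  # 1.2 TB NVMe for 16x 4TB
print(f"cache/disk ratio: current {current:.2%}, planned {planned:.2%}")
```

The mean covers the counter's whole lifetime, so it says nothing about the latency spikes discussed above; treat it as a rough baseline only. The planned chassis raises the cache share from roughly 1.1% to roughly 1.9% of raw capacity.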
[ceph-users] Soft removal of RBD images
Hi,

Since Ceph Hammer we can protect pools from being removed from the cluster, but we can't protect against this:

$ rbd ls|xargs -n 1 rbd rm

That would remove all RBD images that are not currently open from the cluster.

This requires direct access to your Ceph cluster and keys with the proper permissions, but it could also happen that somebody gains access to an OpenStack or CloudStack API with the proper credentials and issues a removal for all volumes.

*Stack will then remove the RBD images, and you have just lost the data or face a very long restore procedure.

What about a soft-delete for RBD images? I don't know how it should work, since if you gain native RADOS access you can still remove all objects:

$ rados -p rbd ls|xargs -n 1 rados -p rbd rm

I don't have a design idea yet, but it's something that came to mind. I'd personally like a double-double backup before Ceph decides to remove the data.

But for example:

When an RBD image is removed, we set the "removed" bit in the RBD header, and every RADOS object also gets a "removed" bit set.

After a period X, the OSD which is primary for a PG starts to remove all objects which have that bit set.

In the meantime you can still get the RBD image back by reverting it in a special way, with a special cephx capability for example.

This goes a bit in the direction of soft pool removals as well; it might be combined.

Comments?

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
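To make the proposal concrete, here is a toy in-memory model in Python of the flow described above: removal only flags objects, a restore needs a special capability, and a periodic purge deletes objects that have been flagged for longer than the grace period X. Everything here (class and method names, the capability flag, object naming) is hypothetical. It is not an RBD or RADOS API, and it ignores the header object and PG placement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoftDeletePool:
    """Hypothetical model of soft removal; not a Ceph API."""
    grace_seconds: float  # the "period X" before flagged objects are really deleted
    # image name -> names of its RADOS objects
    images: Dict[str, List[str]] = field(default_factory=dict)
    # object name -> time its "removed" bit was set (None while the object is live)
    removed_at: Dict[str, Optional[float]] = field(default_factory=dict)

    def create(self, image: str, n_objects: int) -> None:
        names = [f"{image}.{i:04x}" for i in range(n_objects)]
        self.images[image] = names
        for name in names:
            self.removed_at[name] = None

    def remove(self, image: str, now: float) -> None:
        # Instead of deleting, set the "removed" bit on every object of the image.
        for name in self.images[image]:
            self.removed_at[name] = now

    def restore(self, image: str, has_restore_cap: bool) -> None:
        # Reverting requires a special (hypothetical) cephx capability.
        if not has_restore_cap:
            raise PermissionError("restore capability required")
        for name in self.images[image]:
            self.removed_at[name] = None

    def purge(self, now: float) -> List[str]:
        # What each primary OSD would do for its PGs: drop objects flagged for at
        # least grace_seconds, then forget images that have no objects left.
        expired = [n for n, t in self.removed_at.items()
                   if t is not None and now - t >= self.grace_seconds]
        for name in expired:
            del self.removed_at[name]
        for image in list(self.images):
            left = [n for n in self.images[image] if n in self.removed_at]
            if left:
                self.images[image] = left
            else:
                del self.images[image]
        return expired


pool = SoftDeletePool(grace_seconds=7 * 86400)  # one week, an arbitrary choice
pool.create("volume-1", 3)
pool.remove("volume-1", now=0)
print(pool.purge(now=3600))        # [] : still inside the grace period
pool.restore("volume-1", has_restore_cap=True)
```

Keeping the flag per object (rather than only in the header) matches the idea that even raw `rados rm` access should not bypass the grace period, at the cost of touching every object on removal.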
Re: [ceph-users] Ceph Openstack deployment
There must be something in /var/log/cinder/volume.log or /var/log/nova/nova-compute.log that points to the problem. Can you post it here? 2015-11-06 20:14 GMT+08:00 Iban Cabrillo: > Hi Vasiliy, > Thanks, but I still see the same error: > > cinder.conf (of course I restarted the cinder-volume service) > > # default volume type to use (string value) > > [rbd-cephvolume] > rbd_user = cinder > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx > volume_backend_name=rbd > volume_driver = cinder.volume.drivers.rbd.RBDDriver > rbd_pool = volumes > rbd_ceph_conf = /etc/ceph/ceph.conf > rbd_flatten_volume_from_snapshot = false > rbd_max_clone_depth = 5 > rbd_store_chunk_size = 4 > rados_connect_timeout = -1 > glance_api_version = 2 > > > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > xen be: qdisk-51760: error: Could not open > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > directory > xen be: qdisk-51760: initialise() failed > > Regards, I > > 2015-11-06 13:00 GMT+01:00 Vasiliy Angapov : >> >> In cinder.conf you should place these options: >> >> rbd_user = cinder >> rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx >> >> in the [rbd-cephvolume] section
instead of DEFAULT. >> >> 2015-11-06 19:45 GMT+08:00 Iban Cabrillo : >> > Hi, >> > One more step in debugging this issue (hypervisor/nova-compute node is >> > XEN >> > 4.4.2): >> > >> > I think the problem is that libvirt is not getting the correct user or >> > credentials to access the pool; in the instance qemu log I see: >> > >> > xen be: qdisk-51760: error: Could not open >> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or >> > directory >> > xen be: qdisk-51760: initialise() failed >> > xen be: qdisk-51760: error: Could not open >> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or >> > directory >> > xen be: qdisk-51760: initialise() failed >> > xen be: qdisk-51760: error: Could not open >> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or >> > directory >> > >> > But using the user cinder on pool volumes: >> > >> > rbd ls -p volumes --id cinder >> > test >> > volume-4d26bb31-91e8-4646-8010-82127b775c8e >> > volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a >> > volume-7da08f12-fb0f-4269-931a-d528c1507fee >> > >> > Using: >> > qemu-img info -f rbd rbd:volumes/test >> > does not work, but specifying the user cinder and the ceph.conf file directly >> > works fine: >> > >> > qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf >> > >> > I think nova.conf is set correctly (section libvirt): >> > images_rbd_pool = volumes >> > images_rbd_ceph_conf = /etc/ceph/ceph.conf >> > hw_disk_discard=unmap >> > rbd_user = cinder >> > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9- >> > >> > And looking at libvirt: >> > >> > # virsh secret-list >> > setlocale: No such file or directory >> > UUID Usage >> > >> > >> > 67a6d4a1-e53a-42c7-9bc9- ceph client.cinder secret >> > >> > >> > virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9- >> > setlocale: No such file or directory >> > AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg== >> > cat /etc/ceph/ceph.client.cinder.keyring >> > [client.cinder] >> > key =
AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg== >> > >> > >> > Any ideas are welcome. >> > regards, I >> > >> > 2015-11-04 10:51 GMT+01:00 Iban Cabrillo : >> >> >> >> Dear Cephers, >> >> >> >>I still can attach volume to my cloud machines, ceph version is >> >> 0.94.5 >> >> (9764da52395923e0b32908d83a9f7304401fee43) and Openstack Juno >> >> >> >>Nova+cinder are able to create volumes on Ceph >> >> cephvolume:~ # rados ls --pool volumes >> >> rbd_header.1f7784a9e1c2e >> >> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a >> >> rbd_directory >> >> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee >> >> rbd_header.23d5e33b4c15c >> >> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e >> >> rbd_header.20407190ce77f >> >> >> >> cloud:~ #
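The qemu-img experiment suggests the difference lies in whether the cephx user and ceph.conf path reach librbd. In qemu's `rbd:` filename syntax, extra options are appended to `pool/image` as `:key=value` pairs. The Python helper below builds such a string from the values in cinder.conf/nova.conf so the failing and working forms can be compared side by side. The function name is made up, and the backslash-escaping of `:` reflects my understanding of qemu's parser rather than anything stated in this thread.

```python
def qemu_rbd_spec(pool, image, user=None, conf=None, snap=None):
    """Build a qemu 'rbd:' filename string (hypothetical helper)."""
    def esc(value):
        # ':' separates options in the rbd: syntax, so literal colons
        # (and the backslash itself) inside values are backslash-escaped.
        return value.replace("\\", "\\\\").replace(":", "\\:")

    spec = f"rbd:{esc(pool)}/{esc(image)}"
    if snap:
        spec += f"@{esc(snap)}"
    if user:
        spec += f":id={esc(user)}"    # cephx user name without the "client." prefix
    if conf:
        spec += f":conf={esc(conf)}"
    return spec


# The two qemu-img invocations from the thread:
print(qemu_rbd_spec("volumes", "test"))
# rbd:volumes/test   (no id=, so librados falls back to its default user, client.admin)
print(qemu_rbd_spec("volumes", "test", user="cinder", conf="/etc/ceph/ceph.conf"))
# rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf
```

Note that with libvirt the key itself normally comes from the libvirt secret rather than from this string, so the helper only illustrates which identity and config file qemu is told to use.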
Re: [ceph-users] Ceph Openstack deployment
Hi Vasily, Of course, from cinder-volume.log 2015-11-06 12:28:52.865 366 WARNING oslo_config.cfg [req-41a4-4bec-40d2-a7c1-6e8d73644b4c b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 13:09:31.863 15534 WARNING oslo_config.cfg [req-dd47624d-cf25-4beb-9d9e-70f532b2e8f9 - - - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 13:09:44.375 15544 WARNING oslo_config.cfg [req-696a1282-b84c-464c-a220-d4e41a7dbd02 b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 13:11:02.024 15722 WARNING oslo_config.cfg [req-db3c3775-3607-4fb7-acc9-5dba207bde56 - - - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 13:11:40.042 15729 WARNING oslo_config.cfg [req-45458cfd-4e3a-4be2-b858-cece77072829 b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 13:16:49.331 15729 WARNING cinder.quota [req-4e2c2f71-5bfa-487e-a99f-a6bb63bf1bc1 - - - - -] Deprecated: Default quota for resource: gigabytes_rbd is set by the default quota flag: quota_gigabytes_rbd, it is now deprecated. Please use the default quota class for default quota. 2015-11-06 13:16:49.332 15729 WARNING cinder.quota [req-4e2c2f71-5bfa-487e-a99f-a6bb63bf1bc1 - - - - -] Deprecated: Default quota for resource: volumes_rbd is set by the default quota flag: quota_volumes_rbd, it is now deprecated. Please use the default quota class for default quota. 
2015-11-06 13:18:16.163 16635 WARNING oslo_config.cfg [req-503543b9-c2df-4483-a8b3-11f622a9cbe8 - - - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 14:17:08.288 16970 WARNING oslo_config.cfg [req-a4ce4dbf-4119-427b-b555-930e66b9a2e3 58981d56c6cd4c5cacd59e518220a0eb 4d778e83692b44778f71cbe44da0bc0b - - -] Option "lock_path" from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency". 2015-11-06 14:17:08.674 16970 WARNING cinder.quota [req-fe21f3ad-7160-45b4-8adf-4cbe4bb85fc3 - - - - -] Deprecated: Default quota for resource: gigabytes_rbd is set by the default quota flag: quota_gigabytes_rbd, it is now deprecated. Please use the default quota class for default quota. 2015-11-06 14:17:08.676 16970 WARNING cinder.quota [req-fe21f3ad-7160-45b4-8adf-4cbe4bb85fc3 - - - - -] Deprecated: Default quota for resource: volumes_rbd is set by the default quota flag: quota_volumes_rbd, it is now deprecated. Please use the default quota class for default quota. 
And from nova-compute.log 2015-11-06 12:28:20.260 25915 INFO oslo_messaging._drivers.impl_rabbit [req-dd85618c-ab24-43df-8192-b069d00abeeb - - - - -] Connected to AMQP server on rabbitmq01:5672 2015-11-06 12:28:51.864 25915 INFO nova.compute.manager [req-030d8966-cbe7-46c3-9d95-a1c886553fbd b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance: 08f6fef5-7c98-445b-abfe-636c4c6fee89] Detach volume 4d26bb31-91e8-4646-8010-82127b775c8e from mountpoint /dev/xvdd 2015-11-06 12:29:18.255 25915 INFO nova.compute.resource_tracker [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Auditing locally available compute resources for node cms01.ifca.es 2015-11-06 12:29:18.480 25915 INFO nova.compute.resource_tracker [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Total usable vcpus: 24, total allocated vcpus: 24 2015-11-06 12:29:18.481 25915 INFO nova.compute.resource_tracker [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Final resource view: name=cms01.ifca.es phys_ram=49143MB used_ram=47616MB phys_disk=270GB used_disk=220GB total_vcpus=24 used_vcpus=24 pci_stats= 2015-11-06 12:29:18.508 25915 INFO nova.scheduler.client.report [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Compute_service record updated for ('cms01', 'cms01.ifca.es') 2015-11-06 12:29:18.508 25915 INFO nova.compute.resource_tracker [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Compute_service record updated for cms01:cms01.ifca.es 2015-11-06 12:29:49.825 25915 INFO nova.compute.manager [req-92d8810c-bea8-4eba-b682-c0d4e9d90c89 b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance: 08f6fef5-7c98-445b-abfe-636c4c6fee89] Attaching volume 4d26bb31-91e8-4646-8010-82127b775c8e to /dev/xvdd 2015-11-06 12:30:20.389 25915 INFO nova.compute.resource_tracker [req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Auditing locally available compute resources for node cms01.ifca.es 2015-11-06 12:30:20.595 25915 INFO nova.compute.resource_tracker 
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001
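The excerpts above contain only deprecation warnings from cinder plus nova's attach/detach events for volume 4d26bb31-91e8-4646-8010-82127b775c8e; neither contains an ERROR line. When logs are this noisy, a small filter helps. The Python sketch below assumes every line follows the layout visible here (timestamp, PID, level, logger, message); the function names are made up, and the sample lines are copied from the logs in this thread.

```python
import re
from typing import Iterable, List, NamedTuple

LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    pid: int
    level: str
    logger: str
    msg: str


def parse(lines: Iterable[str]) -> List[Entry]:
    # Lines that don't match the layout (continuations, tracebacks) are skipped.
    out = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            out.append(Entry(m["ts"], int(m["pid"]), m["level"], m["logger"], m["msg"]))
    return out


def interesting(entries: Iterable[Entry], volume_id: str) -> List[Entry]:
    # Keep real errors plus anything mentioning the volume; deprecation warnings drop out.
    return [e for e in entries
            if e.level in ("ERROR", "CRITICAL") or volume_id in e.msg]


SAMPLE = [
    '2015-11-06 12:28:52.865 366 WARNING oslo_config.cfg [req-41a4-4bec-40d2-a7c1-6e8d73644b4c '
    'b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" '
    'from group "DEFAULT" is deprecated. Use option "lock_path" from group "oslo_concurrency".',
    '2015-11-06 12:28:51.864 25915 INFO nova.compute.manager [req-030d8966-cbe7-46c3-9d95-a1c886553fbd '
    'b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance: '
    '08f6fef5-7c98-445b-abfe-636c4c6fee89] Detach volume 4d26bb31-91e8-4646-8010-82127b775c8e '
    'from mountpoint /dev/xvdd',
    '2015-11-06 12:29:49.825 25915 INFO nova.compute.manager [req-92d8810c-bea8-4eba-b682-c0d4e9d90c89 '
    'b7aadbb4a85745feb498b74e437129cc ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance: '
    '08f6fef5-7c98-445b-abfe-636c4c6fee89] Attaching volume 4d26bb31-91e8-4646-8010-82127b775c8e '
    'to /dev/xvdd',
]

for e in interesting(parse(SAMPLE), "4d26bb31-91e8-4646-8010-82127b775c8e"):
    print(e.ts, e.level, e.logger)
```

If such a filter turns up nothing at ERROR level, the failure is probably happening below nova, which fits the qdisk errors in the instance's qemu log.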
Re: [ceph-users] Soft removal of RBD images
On Fri, Nov 6, 2015 at 2:03 AM, Wido den Hollander wrote: > Hi, > > Since Ceph Hammer we can protect pools from being removed from the > cluster, but we can't protect against this: > > $ rbd ls|xargs -n 1 rbd rm > > That would remove all not opened RBD images from the cluster. > > This requires direct access to your Ceph cluster and keys with the > proper permission, but it could also be that somebody gains access to a > OpenStack or CloudStack API with the proper credentials and issues a > removal for all volumes. > > *Stack will then remove the RBD image and you just lost the data or you > face a very long restore procedure. > > What about a soft-delete for RBD images? I don't know how it should > work, since if you gain native RADOS access you can still remove all > objects: > > $ rados -p rbd ls|xargs -n 1 rados -p rbd rm > > I don't have a design idea yet, but it's something that came to mind. > I'd personally like a double-double backup before Ceph decides to remove > the data. > > But for example: > > When a RBD image is removed we set the "removed" bit in the RBD header, > but every RADOS object also gets a "removed" bit set. > > After a X period the OSD which is primary for a PG starts to remove all > objects which have that bit set. > > In the meantime you can still get back the RBD image by reverting it in > a special way. With a special cephx capability for example. > > This goes a bit in the direction of soft pool-removals as well, it might > be combined. > > Comments? Besides the work of implementing lazy object deletes, I'm not sure it's a good idea: when somebody's cluster fills up (and there's always somebody!) we need a way to do deletes, and for that data to go away immediately. We have enough trouble with people testing cache pools and finding out there isn't instant deletion of the underlying data. ;) -Greg
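Greg's objection can be made concrete: with soft deletion, space only comes back after the grace period, so a nearly full cluster needs an escape hatch. The Python sketch below is purely hypothetical (nothing like it exists in Ceph as far as this thread goes). It always purges soft-deleted images whose grace period has expired and, only if utilization is still above a target ratio, purges the oldest remaining ones early.

```python
from typing import List, Tuple


def plan_purge(used: int, total: int, pending: List[Tuple[float, int]],
               now: float, grace: float, target_ratio: float) -> List[int]:
    """Return indexes into `pending` (removed_at, size_bytes) to purge now.

    Expired entries are always purged; if usage is still above target_ratio,
    the oldest not-yet-expired entries are purged early, oldest first.
    """
    order = sorted(range(len(pending)), key=lambda i: pending[i][0])  # oldest first
    chosen = [i for i in order if now - pending[i][0] >= grace]
    remaining = used - sum(pending[i][1] for i in chosen)
    for i in order:
        if remaining / total <= target_ratio:
            break                      # enough space freed; keep the rest recoverable
        if i not in chosen:
            chosen.append(i)           # early purge: the "delete it now" escape hatch
            remaining -= pending[i][1]
    return chosen


# 95% full, three soft-deleted images removed at t=0, 50 and 90 (sizes in arbitrary units).
pending = [(0, 10), (50, 5), (90, 20)]
print(plan_purge(95, 100, pending, now=100, grace=60, target_ratio=0.85))  # [0]
print(plan_purge(95, 100, pending, now=100, grace=60, target_ratio=0.70))  # [0, 1, 2]
```

The trade-off is visible in the second call: once the cluster is under pressure, the grace period stops protecting anything, which is exactly the tension Greg describes.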
Re: [ceph-users] Group permission problems with CephFS
I'm seeing similar behavior as well. -rw-rw-r-- 1 testuser testgroup 6 Nov 6 07:41 testfile aaron@testhost$ groups ... testgroup ... aaron@testhost$ cat > testfile -bash: testfile: Permission denied Running version 9.0.2. Were you able to make any progress on this? Thanks, -Aaron On Tue, Aug 4, 2015 at 4:19 AM, Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > Hi, > > I've encountered some problems accessing files on CephFS: > > $ ls -al syntenyPlot.png > -rw-r----- 1 edgar edgar 9329 Jun 11 2014 syntenyPlot.png > > $ groups > ... edgar ... > > $ cat syntenyPlot.png > cat: syntenyPlot.png: Permission denied > > CephFS is mounted via ceph-fuse: > ceph-fuse on /ceph type fuse.ceph-fuse > (rw,nosuid,nodev,noatime,allow_other,default_permissions) > > OS is Ubuntu 14.04, Ceph version is 0.94.2 > > I've isolated a test machine and activated debugging (debug_client = > 20/20). The following lines correspond to the 'cat' invocation: > > 2015-08-04 12:59:44.030372 7f574dffb700 20 client.421984 _ll_get > 0x7f5758024da0 100022310be -> 13 > 2015-08-04 12:59:44.030398 7f574dffb700 3 client.421984 ll_getattr > 100022310be.head > 2015-08-04 12:59:44.030403 7f574dffb700 10 client.421984 _getattr mask > pAsLsXsFs issued=1 > 2015-08-04 12:59:44.030413 7f574dffb700 10 client.421984 fill_stat on > 100022310be snap/devhead mode 042770 mtime 2014-06-12 09:31:39.00 ctime > 2015-07-31 14:17:12.364416 > 2015-08-04 12:59:44.030426 7f574dffb700 3 client.421984 ll_getattr > 100022310be.head = 0 > 2015-08-04 12:59:44.030443 7f574dffb700 3 client.421984 ll_forget > 100022310be 1 > 2015-08-04 12:59:44.030447 7f574dffb700 20 client.421984 _ll_put > 0x7f5758024da0 100022310be 1 -> 12 > 2015-08-04 12:59:44.030459 7f574dffb700 20 client.421984 _ll_get > 0x7f5758024da0 100022310be -> 13 > 2015-08-04 12:59:44.030463 7f574dffb700 3 client.421984 ll_lookup > 0x7f5758024da0 syntenyPlot.png > 2015-08-04 12:59:44.030469 7f574dffb700 20 client.421984 _lookup have dn > syntenyPlot.png
mds.-1 ttl 0.00 seq 0 > 2015-08-04 12:59:44.030476 7f574dffb700 10 client.421984 _lookup > 100022310be.head(ref=3 ll_ref=13 cap_refs={} open={} mode=42770 size=0/0 > mtime=2014-06-12 09:31:39.00 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE > parents=0x7f57580261d0 0x7f5758024da0) syntenyPlot.png = > 1000223121e.head(ref=2 ll_ref=20 cap_refs={} open={} mode=100640 > size=9329/0 mtime=2014-06-11 09:05:47.00 > caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000223121e ts 0/0 objects 0 > dirty_or_tx 0] parents=0x7f575802d290 0x7f575802c5a0) > 2015-08-04 12:59:44.030530 7f574dffb700 10 client.421984 fill_stat on > 1000223121e snap/devhead mode 0100640 mtime 2014-06-11 09:05:47.00 > ctime 2015-08-04 11:07:53.623370 > 2015-08-04 12:59:44.030539 7f574dffb700 20 client.421984 _ll_get > 0x7f575802c5a0 1000223121e -> 21 > 2015-08-04 12:59:44.030542 7f574dffb700 3 client.421984 ll_lookup > 0x7f5758024da0 syntenyPlot.png -> 0 (1000223121e) > 2015-08-04 12:59:44.030555 7f574dffb700 3 client.421984 ll_forget > 100022310be 1 > 2015-08-04 12:59:44.030558 7f574dffb700 20 client.421984 _ll_put > 0x7f5758024da0 100022310be 1 -> 12 > 2015-08-04 12:59:44.030628 7f57467fc700 20 client.421984 _ll_get > 0x7f575802c5a0 1000223121e -> 22 > 2015-08-04 12:59:44.030645 7f57467fc700 3 client.421984 ll_getattr > 1000223121e.head > 2015-08-04 12:59:44.030649 7f57467fc700 10 client.421984 _getattr mask > pAsLsXsFs issued=1 > 2015-08-04 12:59:44.030659 7f57467fc700 10 client.421984 fill_stat on > 1000223121e snap/devhead mode 0100640 mtime 2014-06-11 09:05:47.00 > ctime 2015-08-04 11:07:53.623370 > 2015-08-04 12:59:44.030672 7f57467fc700 3 client.421984 ll_getattr > 1000223121e.head = 0 > 2015-08-04 12:59:44.030690 7f57467fc700 3 client.421984 ll_forget > 1000223121e 1 > 2015-08-04 12:59:44.030695 7f57467fc700 20 client.421984 _ll_put > 0x7f575802c5a0 1000223121e 1 -> 21 > 2015-08-04 12:59:44.030760 7f574e7fc700 20 client.421984 _ll_get > 0x7f575802c5a0 1000223121e -> 22 > 2015-08-04 12:59:44.030775 7f574e7fc700 
3 client.421984 ll_open > 1000223121e.head 32768 > 2015-08-04 12:59:44.030779 7f574e7fc700 3 client.421984 ll_open > 1000223121e.head 32768 = -13 (0) > 2015-08-04 12:59:44.030797 7f574e7fc700 3 client.421984 ll_forget > 1000223121e 1 > 2015-08-04 12:59:44.030802 7f574e7fc700 20 client.421984 _ll_put > 0x7f575802c5a0 1000223121e 1 -> 21 > > > The return value of -13 in the open call is probably 'permission denied'. > > The setup looks ok with respect to permissions. The root user is able to > read the file in question. The owning user is also able to read the file > (after sudo). The problem occurs on several hosts for a number of files, > but not all files or all users on CephFS are affected. User and group > information is stored in LDAP and made available via SSSD; ls -l displays > the correct group and user names, and id(1) lists the correct id and names. > > Any hints on what's going wrong here? > > Best regards, > Burkhard >
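The -13 returned by `ll_open` is `-EACCES` ("Permission denied"). The file is mode 0640 (`mode=100640` in the debug output), and the reading user belongs to the file's group only through a supplementary group coming from LDAP/SSSD. The minimal Python sketch of the classic POSIX owner/group/other read check (ignoring ACLs and capabilities) below shows that the read should succeed when supplementary groups are consulted and fails exactly as observed when only the primary group is compared. That this is the actual cause is an assumption on my part; the tracker issue linked later in this thread has the authoritative details. All uid/gid numbers are made up.

```python
import errno
import os
import stat
from typing import Iterable


def may_read(mode: int, file_uid: int, file_gid: int,
             uid: int, gid: int, supplementary: Iterable[int] = ()) -> bool:
    """Classic POSIX read-permission check (no ACLs, no capabilities)."""
    if uid == 0:
        return True                            # root bypasses read permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)       # owner class
    if gid == file_gid or file_gid in supplementary:
        return bool(mode & stat.S_IRGRP)       # group class, including supplementary groups
    return bool(mode & stat.S_IROTH)           # everyone else


MODE = 0o100640              # "-rw-r-----", as in the debug log (mode=100640)
OWNER, GROUP = 2001, 3001    # hypothetical ids for edgar:edgar
USER, USER_GID = 1000, 1000  # hypothetical reader whose membership in GROUP is supplementary

print(may_read(MODE, OWNER, GROUP, USER, USER_GID, supplementary=[GROUP]))  # True
print(may_read(MODE, OWNER, GROUP, USER, USER_GID))                         # False
print(errno.EACCES, os.strerror(errno.EACCES))  # 13 Permission denied (on Linux)
```

A client that only compares the caller's primary gid would take the "everyone else" branch here and deny the read, even though `groups` lists the file's group.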
Re: [ceph-users] Group permission problems with CephFS
Hi, On 11/06/2015 04:52 PM, Aaron Ten Clay wrote: I'm seeing similar behavior as well. -rw-rw-r-- 1 testuser testgroup 6 Nov 6 07:41 testfile aaron@testhost$ groups ... testgroup ... aaron@testhost$ cat > testfile -bash: testfile: Permission denied Running version 9.0.2. Were you able to make any progress on this? There's a pending pull request that needs to be included in the next release (see http://tracker.ceph.com/issues/12617) Regards, Burkhard
Re: [ceph-users] Ceph Openstack deployment
Hi Vasiliy, Thanks, but I still see the same error: cinder.conf (of course I restarted the cinder-volume service) # default volume type to use (string value) [rbd-cephvolume] rbd_user = cinder rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx volume_backend_name=rbd volume_driver = cinder.volume.drivers.rbd.RBDDriver rbd_pool = volumes rbd_ceph_conf = /etc/ceph/ceph.conf rbd_flatten_volume_from_snapshot = false rbd_max_clone_depth = 5 rbd_store_chunk_size = 4 rados_connect_timeout = -1 glance_api_version = 2 xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory xen be: qdisk-51760: initialise() failed Regards, I 2015-11-06 13:00 GMT+01:00 Vasiliy Angapov: > In cinder.conf you should place these options: > > rbd_user = cinder > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx > > in the [rbd-cephvolume] section instead of DEFAULT.
> > 2015-11-06 19:45 GMT+08:00 Iban Cabrillo : > > Hi, > > One more step in debugging this issue (hypervisor/nova-compute node is XEN > > 4.4.2): > > > > I think the problem is that libvirt is not getting the correct user or > > credentials to access the pool; in the instance qemu log I see: > > > > xen be: qdisk-51760: error: Could not open > > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > > directory > > xen be: qdisk-51760: initialise() failed > > xen be: qdisk-51760: error: Could not open > > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > > directory > > xen be: qdisk-51760: initialise() failed > > xen be: qdisk-51760: error: Could not open > > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or > > directory > > > > But using the user cinder on pool volumes: > > > > rbd ls -p volumes --id cinder > > test > > volume-4d26bb31-91e8-4646-8010-82127b775c8e > > volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a > > volume-7da08f12-fb0f-4269-931a-d528c1507fee > > > > Using: > > qemu-img info -f rbd rbd:volumes/test > > does not work, but specifying the user cinder and the ceph.conf file directly > > works fine: > > > > qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf > > > > I think nova.conf is set correctly (section libvirt): > > images_rbd_pool = volumes > > images_rbd_ceph_conf = /etc/ceph/ceph.conf > > hw_disk_discard=unmap > > rbd_user = cinder > > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9- > > > > And looking at libvirt: > > > > # virsh secret-list > > setlocale: No such file or directory > > UUID Usage > > > > > 67a6d4a1-e53a-42c7-9bc9- ceph client.cinder secret > > > > > > virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9- > > setlocale: No such file or directory > > AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg== > > cat /etc/ceph/ceph.client.cinder.keyring > > [client.cinder] > > key = AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg== > > > > > > Any ideas are welcome.
> > regards, I > > > > 2015-11-04 10:51 GMT+01:00 Iban Cabrillo : > >> > >> Dear Cephers, > >> > >>I still can attach volume to my cloud machines, ceph version is > 0.94.5 > >> (9764da52395923e0b32908d83a9f7304401fee43) and Openstack Juno > >> > >>Nova+cinder are able to create volumes on Ceph > >> cephvolume:~ # rados ls --pool volumes > >> rbd_header.1f7784a9e1c2e > >> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a > >> rbd_directory > >> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee > >> rbd_header.23d5e33b4c15c > >> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e > >> rbd_header.20407190ce77f > >> > >> cloud:~ # cinder list > >> > >> > +--++--+--+-+--+--+ > >> | ID | > >> Status | Display Name | Size | Volume Type | Bootable | > >> Attached to | > >> > >> >
[ceph-users] ceph-deploy on lxc container - 'initctl: Event failed'
Hello, everyone!

I just tried to create a new Ceph cluster, using 3 LXC containers as monitors, and the 'ceph-deploy mon create-initial' command fails for each of the monitors with an 'initctl: Event failed' error when running the following command:

[ceph-mon-01][INFO ] Running command: sudo initctl emit ceph-mon cluster=ceph id=ceph-mon-01
[ceph-mon-01][WARNIN] initctl: Event failed

Is it OK to use LXC containers as Ceph MONs? If yes, is there anything special which needs to be done prior to the 'mon create-initial' phase?

Thank you!

Regards,
Bogdan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] osd fails to start, rbd hangs
Hi,

I have an issue with my (small) ceph cluster after an osd failed.
ceph -s reports the following:

    cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
     health HEALTH_WARN
            31 pgs down
            31 pgs peering
            31 pgs stuck inactive
            31 pgs stuck unclean
     monmap e1: 1 mons at {0=192.168.19.13:6789/0}
            election epoch 1, quorum 0 0
     osdmap e138: 3 osds: 2 up, 2 in
      pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
            1290 GB used, 8021 GB / 9315 GB avail
                  33 active+clean
                  31 down+peering

I am now unable to map the rbd image; the command just times out.
The log is at the end of the message.

Is there a way to recover the osd / the ceph cluster from this?

thanks in advance
Philipp

-2> 2015-10-30 01:04:59.689116 7f4bb741e700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
-1> 2015-10-30 01:04:59.689140 7f4bb741e700 1 heartbeat_map is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out after 150
0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x77) [0xb12457]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 5: (CephContextServiceThread::entry()+0x164) [0xb21974]
 6: (()+0x76f5) [0x7f4bbdb0c6f5]
 7: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- begin dump of recent events ---
     0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
--- begin dump of recent events ---
     0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 ***
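Philipp's `ceph -s` output shows 31 of 64 PGs in down+peering. As a rough sketch of summarizing that programmatically, assuming `ceph -s --format json` on this release exposes a `pgmap` object with `num_pgs` and a `pgs_by_state` list of `state_name`/`count` entries (check the field names against your own cluster's output), the following tallies every PG state other than active+clean. The sample data is hand-built from the numbers in the message, and the function names are hypothetical:

```python
import json

# Hand-built sample mirroring the `ceph -s` numbers above. The JSON layout
# (pgmap -> pgs_by_state -> state_name/count, num_pgs) is an assumption.
SAMPLE_STATUS = json.dumps({
    "pgmap": {
        "num_pgs": 64,
        "pgs_by_state": [
            {"state_name": "active+clean", "count": 33},
            {"state_name": "down+peering", "count": 31},
        ],
    }
})


def unhealthy_pgs(status_json: str) -> dict:
    """Return {state: count} for every PG state other than active+clean."""
    pgmap = json.loads(status_json)["pgmap"]
    return {
        entry["state_name"]: entry["count"]
        for entry in pgmap["pgs_by_state"]
        if entry["state_name"] != "active+clean"
    }


def unhealthy_fraction(status_json: str) -> float:
    """Share of all PGs that are not active+clean."""
    pgmap = json.loads(status_json)["pgmap"]
    return sum(unhealthy_pgs(status_json).values()) / pgmap["num_pgs"]


if __name__ == "__main__":
    print(unhealthy_pgs(SAMPLE_STATUS))                # {'down+peering': 31}
    print(f"{unhealthy_fraction(SAMPLE_STATUS):.0%}")  # 48%
```

Here 31 of 64 PGs (about 48%) are inactive, which is consistent with the hanging `rbd map`; `ceph pg dump_stuck inactive` lists the individual PGs for closer inspection.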
Re: [ceph-users] osd fails to start, rbd hangs
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

:)

On Friday, November 6, 2015, Philipp Schwaha wrote:
> Hi,
>
> I have an issue with my (small) ceph cluster after an osd failed.
Re: [ceph-users] Ceph Openstack deployment
Hi,

One more step debugging this issue (hypervisor/nova-compute node is XEN 4.4.2):

I think the problem is that libvirt is not getting the correct user or credentials to access the pool. In the instance's qemu log I see:

xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or directory

But using the user cinder on the pool volumes works:

rbd ls -p volumes --id cinder
test
*volume-4d26bb31-91e8-4646-8010-82127b775c8e*
volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
volume-7da08f12-fb0f-4269-931a-d528c1507fee

Using:

qemu-img info -f rbd rbd:volumes/test

does not work, but passing the user cinder and the ceph.conf file directly works fine:

*qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf*

I think nova.conf is set correctly (section libvirt):

images_rbd_pool = volumes
images_rbd_ceph_conf = /etc/ceph/ceph.conf
hw_disk_discard=unmap
rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-

And looking at libvirt:

# virsh secret-list
setlocale: No such file or directory
UUID Usage
67a6d4a1-e53a-42c7-9bc9- ceph client.cinder secret

virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9-
setlocale: No such file or directory
*AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==*

cat /etc/ceph/ceph.client.cinder.keyring
[client.cinder]
key = *AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==*

Any ideas are welcome.
regards, I

2015-11-04 10:51 GMT+01:00 Iban Cabrillo:
> Dear Cephers,
>
> I still can't attach volumes to my cloud machines. Ceph version is 0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43) and OpenStack Juno.
>
> Nova+cinder are able to create volumes on Ceph:
> cephvolume:~ # rados ls --pool volumes
> rbd_header.1f7784a9e1c2e
> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
> rbd_directory
> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
> rbd_header.23d5e33b4c15c
> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e
> rbd_header.20407190ce77f
>
> cloud:~ # cinder list
> +--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
> | ID                                   | Status | Display Name | Size | Volume Type | Bootable | Attached to                          |
> +--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
> | 4d26bb31-91e8-4646-8010-82127b775c8e | in-use | None         | 2    | rbd         | false    | 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb |
> +--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
>
> nova:~ # nova volume-attach 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb 4d26bb31-91e8-4646-8010-82127b775c8e auto
> +----------+--------------------------------------+
> | Property | Value                                |
> +----------+--------------------------------------+
> | device   | /dev/xvdd                            |
> | id       | 4d26bb31-91e8-4646-8010-82127b775c8e |
> | serverId | 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb |
> | volumeId | 4d26bb31-91e8-4646-8010-82127b775c8e |
> +----------+--------------------------------------+
>
> From the nova-compute node (Ubuntu 14.04 LTS) I see the attaching/detaching:
> cloud01:~ # dpkg -l | grep ceph
> ii  ceph-common    0.94.5-1trusty  amd64  common utilities to mount and interact with a ceph storage cluster
> ii  libcephfs1     0.94.5-1trusty  amd64  Ceph distributed file system client library
> ii  python-cephfs  0.94.5-1trusty  amd64  Python libraries for the Ceph libcephfs library
> ii  librbd1        0.94.5-1trusty  amd64  RADOS block device client library
> ii  python-rbd     0.94.5-1trusty  amd64  Python libraries for the Ceph librbd library
>
> *at cinder.conf*
>
> *rbd_user = cinder*
> *rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx*
>
> *[rbd-cephvolume]*
> *volume_backend_name=rbd*
> *volume_driver = cinder.volume.drivers.rbd.RBDDriver*
> *rbd_pool = volumes*
> *rbd_ceph_conf =
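Iban's two manual checks in the message above, running qemu-img with and without explicit `id`/`conf` options and comparing the libvirt secret with the keyring, can be reproduced in a few lines. This is a minimal sketch, assuming QEMU's `rbd:pool/image[@snapshot][:key=value...]` filename syntax; the helper names `qemu_rbd_filename`, `keyring_key` and `secret_matches` are hypothetical, and the inputs are the text outputs quoted in the thread rather than live calls to `virsh` or `qemu-img`:

```python
from typing import Optional


def qemu_rbd_filename(pool: str, image: str, snapshot: Optional[str] = None,
                      **options: str) -> str:
    """Build a QEMU rbd filename such as rbd:volumes/test:id=cinder:conf=...

    Option values containing ':' or '@' would need escaping; not handled here.
    """
    name = f"rbd:{pool}/{image}"
    if snapshot:
        name += f"@{snapshot}"
    for key, value in options.items():  # keyword order is preserved
        name += f":{key}={value}"
    return name


def keyring_key(keyring_text: str, entity: str = "client.cinder") -> str:
    """Return the 'key = ...' value from the [entity] section of a keyring."""
    section = None
    for raw in keyring_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == entity and "=" in line:
            # Split on the first '=' only: base64 keys end in '=' padding.
            name, value = (part.strip() for part in line.split("=", 1))
            if name == "key":
                return value
    raise KeyError(f"no key found for {entity}")


def secret_matches(virsh_output: str, keyring_text: str,
                   entity: str = "client.cinder") -> bool:
    """Compare `virsh secret-get-value` output with the keyring key,
    skipping blank lines and the 'setlocale:' warning seen in the thread."""
    values = [ln.strip() for ln in virsh_output.splitlines()
              if ln.strip() and not ln.strip().startswith("setlocale:")]
    return bool(values) and values[-1] == keyring_key(keyring_text, entity)


KEYRING = "[client.cinder]\n\tkey = AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==\n"
VIRSH = "setlocale: No such file or directory\nAQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==\n"

print(qemu_rbd_filename("volumes", "test"))
# rbd:volumes/test
print(qemu_rbd_filename("volumes", "test", id="cinder", conf="/etc/ceph/ceph.conf"))
# rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf
print(secret_matches(VIRSH, KEYRING))  # True
```

In the thread the secret and the keyring match, and qemu-img only succeeds when `id=cinder` and the conf path are given explicitly. That is consistent with Iban's suspicion that the Xen/libvirt side is not handing the cinder user and secret to qemu, though these checks alone do not prove it.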
Re: [ceph-users] Ceph Openstack deployment
At cinder.conf you should place these options:

rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx

in the [rbd-cephvolume] section instead of DEFAULT.

2015-11-06 19:45 GMT+08:00 Iban Cabrillo:
> Hi,
> One more step debugging this issue (hypervisor/nova-compute node is XEN
> 4.4.2):
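Vasiliy's advice rests on cinder reading a multi-backend driver's options from that backend's own section (here [rbd-cephvolume]) rather than from [DEFAULT]. A quick way to see which RBD options are actually set in the backend section is Python's configparser, with one caveat: by default configparser copies [DEFAULT] keys into every section, which would hide exactly this mistake, so the sketch sets `default_section` to an unused name. The function name and the list of required options are illustrative assumptions; the sample reproduces Iban's original layout from the thread:

```python
import configparser

# Iban's original layout: the auth options sit in [DEFAULT].
CINDER_CONF = """\
[DEFAULT]
rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx

[rbd-cephvolume]
volume_backend_name=rbd
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
"""

# Illustrative list; not an authoritative set of required cinder options.
REQUIRED = ("rbd_user", "rbd_secret_uuid", "rbd_pool", "rbd_ceph_conf")


def missing_backend_options(conf_text: str, backend: str = "rbd-cephvolume") -> list:
    """List REQUIRED options that are not set in the backend's own section."""
    # An unused default_section makes [DEFAULT] an ordinary section, so its
    # keys are no longer inherited by [rbd-cephvolume] during lookups.
    parser = configparser.ConfigParser(default_section="__no_defaults__",
                                       interpolation=None)
    parser.read_string(conf_text)
    return [opt for opt in REQUIRED if not parser.has_option(backend, opt)]


print(missing_backend_options(CINDER_CONF))  # ['rbd_user', 'rbd_secret_uuid']
```

Iban's later reply in this thread shows both options moved into [rbd-cephvolume] and the attach still failing, so passing this check rules out one cause rather than confirming a fix.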
Re: [ceph-users] osd fails to start, rbd hangs
On 11/06/2015 09:25 PM, Gregory Farnum wrote:
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
>
> :)
>

Thanks. I tried to follow the advice to "... start that ceph-osd and things will recover." for the better part of the last two days, but did not succeed in reviving the crashed osd :(
I do not understand the message the osd is giving, since the files appear to be there:

beta ~ # ls -lrt /var/lib/ceph/osd/ceph-2/
total 1048656
-rw-r--r-- 1 root root         37 Oct 26 16:25 fsid
-rw-r--r-- 1 root root          4 Oct 26 16:25 store_version
-rw-r--r-- 1 root root         53 Oct 26 16:25 superblock
-rw-r--r-- 1 root root         21 Oct 26 16:25 magic
-rw-r--r-- 1 root root          2 Oct 26 16:25 whoami
-rw-r--r-- 1 root root         37 Oct 26 16:25 ceph_fsid
-rw-r--r-- 1 root root          6 Oct 26 16:25 ready
-rw------- 1 root root         56 Oct 26 16:25 keyring
drwxr-xr-x 1 root root        752 Oct 26 16:47 snap_16793
drwxr-xr-x 1 root root        752 Oct 26 16:47 snap_16773
drwxr-xr-x 1 root root        230 Oct 30 01:01 snap_242352
drwxr-xr-x 1 root root        230 Oct 30 01:01 snap_242378
-rw-r--r-- 1 root root 1073741824 Oct 30 01:02 journal
drwxr-xr-x 1 root root        256 Nov  6 21:55 current

and current/ also shows up as a btrfs subvolume:

btrfs subvolume list /var/lib/ceph/osd/ceph-2/
ID 8005 gen 8336 top level 5 path snap_242352
ID 8006 gen 8467 top level 5 path snap_242378
ID 8070 gen 8468 top level 5 path current

Still, the osd says "current/ missing entirely (unusual, but okay)" and then completely fails to mount the object store.
Is this something where I should give up on the osd completely, mark it as lost and try to go on from there?
The machine on which the osd runs did not have any other issues; only the osd apparently self-destructed ~3.5 days after it was added.

Or is the recovery of the osd simple (enough) and I just missed the point somewhere?
;)

thanks in advance
Philipp

The log of an attempted start of the osd continues to give:

2015-11-06 21:41:53.213174 7f44755a77c0 0 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 3751
2015-11-06 21:41:53.254418 7f44755a77c0 10 filestore(/var/lib/ceph/osd/ceph-2) dump_stop
2015-11-06 21:41:53.275694 7f44755a77c0 10 ErasureCodePluginSelectJerasure: load: jerasure_sse4
2015-11-06 21:41:53.291133 7f44755a77c0 10 load: jerasure load: lrc
2015-11-06 21:41:53.291543 7f44755a77c0 5 filestore(/var/lib/ceph/osd/ceph-2) test_mount basedir /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292043 7f44755a77c0 2 osd.2 0 mounting /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292152 7f44755a77c0 5 filestore(/var/lib/ceph/osd/ceph-2) basedir /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292216 7f44755a77c0 10 filestore(/var/lib/ceph/osd/ceph-2) mount fsid is 2662df9c-fd60-425c-ac89-4fe07a2a1b2f
2015-11-06 21:41:53.292412 7f44755a77c0 0 filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
2015-11-06 21:41:59.753329 7f44755a77c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is supported and appears to work
2015-11-06 21:41:59.753395 7f44755a77c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-11-06 21:42:00.968438 7f44755a77c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-11-06 21:42:00.969431 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: CLONE_RANGE ioctl is supported
2015-11-06 21:42:03.033742 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE is supported
2015-11-06 21:42:03.034262 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_DESTROY is supported
2015-11-06 21:42:03.042168 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: START_SYNC is supported (transid 8453)
2015-11-06 21:42:04.144516 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: WAIT_SYNC is supported
2015-11-06 21:42:04.309323 7f44755a77c0 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature: SNAP_CREATE_V2 is supported
2015-11-06 21:42:04.310562 7f44755a77c0 10 filestore(/var/lib/ceph/osd/ceph-2) current/ missing entirely (unusual, but okay)
2015-11-06 21:42:04.310686 7f44755a77c0 10 filestore(/var/lib/ceph/osd/ceph-2) most recent snap from <242352,242378> is 242378
2015-11-06 21:42:04.310763 7f44755a77c0 10 filestore(/var/lib/ceph/osd/ceph-2) mount rolling back to consistent snap 242378
2015-11-06 21:42:04.310812 7f44755a77c0 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: to 'snap_242378'
2015-11-06 21:42:06.384894 7f44755a77c0 5 filestore(/var/lib/ceph/osd/ceph-2) mount op_seq is 0
2015-11-06
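After reporting current/ as missing, the log shows FileStore picking 242378 as the most recent of the snap_<N> subvolumes and rolling back to it. A simplified illustration of that choice (not FileStore's actual code; the function names are hypothetical and the input is the `btrfs subvolume list` output from this message) picks the largest snapshot number:

```python
import re

SUBVOLUME_LIST = """\
ID 8005 gen 8336 top level 5 path snap_242352
ID 8006 gen 8467 top level 5 path snap_242378
ID 8070 gen 8468 top level 5 path current
"""

# Matches subvolume paths of the form snap_<digits> at the end of a line.
SNAP_RE = re.compile(r"\bpath snap_(\d+)$")


def snap_sequence_numbers(listing: str) -> list:
    """Extract the numeric suffix of every snap_<N> subvolume, sorted."""
    seqs = []
    for line in listing.splitlines():
        match = SNAP_RE.search(line.strip())
        if match:
            seqs.append(int(match.group(1)))
    return sorted(seqs)


def most_recent_snap(listing: str) -> int:
    """Highest snap number, i.e. the snapshot a rollback would target."""
    seqs = snap_sequence_numbers(listing)
    if not seqs:
        raise ValueError("no snap_<N> subvolumes found")
    return seqs[-1]


print(snap_sequence_numbers(SUBVOLUME_LIST))  # [242352, 242378]
print(most_recent_snap(SUBVOLUME_LIST))       # 242378
```

The numbers are compared as integers rather than strings so that, for example, snap_99999 would not be ranked above snap_242352.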
Re: [ceph-users] ceph-deploy on lxc container - 'initctl: Event failed'
I've put monitors in LXC but I haven't done it with ceph-deploy. I've had no problems with it.

- Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Fri, Nov 6, 2015 at 12:55 PM, Bogdan SOLGA wrote:
> Is it OK to use LXC containers as Ceph MONs? if yes - is there anything
> special which needs to be done prior to the 'mon create-initial' phase?
Re: [ceph-users] osd fails to start, rbd hangs
Hi Philipp,

I see you only have 2 osds. Have you checked that your pool size ("osd pool get size") is 2, and min_size is 1?

Cheers, I

2015-11-06 22:05 GMT+01:00 Philipp Schwaha:
> On 11/06/2015 09:25 PM, Gregory Farnum wrote:
> > http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> >
> > :)
> >
>
> Thanks, I tried to follow the advice to "... start that ceph-osd and
> things will recover.", for the better part of the last two days but did
> not succeed in reviving the crashed osd :(
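Iban's question concerns the pool's replication settings, which can be read with `ceph osd pool get <pool> size` and `ceph osd pool get <pool> min_size`. In simplified form, a PG only accepts I/O while at least min_size of its replicas are up. The helper below is a hypothetical sketch of just that rule, not Ceph code: it ignores peering entirely, so it cannot explain PGs in down+peering, which the Ceph PG troubleshooting docs describe as waiting on a down OSD that holds necessary data.

```python
from dataclasses import dataclass


@dataclass
class PoolReplication:
    size: int      # desired replica count ("ceph osd pool get <pool> size")
    min_size: int  # replicas needed to serve I/O ("... min_size")


def can_serve_io(pool: PoolReplication, replicas_up: int) -> bool:
    """Simplified rule: I/O is allowed while replicas_up >= min_size.

    Peering is not modelled, so a PG may still be blocked when this is True.
    """
    if not 1 <= pool.min_size <= pool.size:
        raise ValueError("expected 1 <= min_size <= size")
    return replicas_up >= pool.min_size


print(can_serve_io(PoolReplication(size=2, min_size=1), replicas_up=1))  # True
print(can_serve_io(PoolReplication(size=2, min_size=2), replicas_up=1))  # False
```

With size=2 and min_size=1, losing one of the two replicas still leaves I/O possible in this model; with min_size=2 it does not.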
[ceph-users] v9.2.0 Infernalis released
[I'm going to break my own rule and do this on a Friday only because this has been built and in the repos for a couple of days now; I've just been traveling and haven't had time to announce it.]

This major release will be the foundation for the next stable series. There have been some major changes since v0.94.x Hammer, and the upgrade process is non-trivial. Please read these release notes carefully.

Major Changes from Hammer
-------------------------

- General:

  * Ceph daemons are now managed via systemd (with the exception of Ubuntu Trusty, which still uses upstart).
  * Ceph daemons run as the 'ceph' user instead of root.
  * On Red Hat distros, there is also an SELinux policy.

- RADOS:

  * The RADOS cache tier can now proxy write operations to the base tier, allowing writes to be handled without forcing migration of an object into the cache.
  * The SHEC erasure coding support is no longer flagged as experimental. SHEC trades some additional storage space for faster repair.
  * There is now a unified queue (and thus prioritization) of client IO, recovery, scrubbing, and snapshot trimming.
  * There have been many improvements to low-level repair tooling (ceph-objectstore-tool).
  * The internal ObjectStore API has been significantly cleaned up in order to facilitate new storage backends like NewStore.

- RGW:

  * The Swift API now supports object expiration.
  * There are many Swift API compatibility improvements.

- RBD:

  * The ``rbd du`` command shows actual usage (quickly, when object-map is enabled).
  * The object-map feature has seen many stability improvements.
  * Object-map and exclusive-lock features can be enabled or disabled dynamically.
  * You can now store user metadata and set persistent librbd options associated with individual images.
  * The new deep-flatten feature allows flattening of a clone and all of its snapshots. (Previously snapshots could not be flattened.)
  * The export-diff command is now faster (it uses aio). There is also a new fast-diff feature.
  * The --size argument can be specified with a suffix for units (e.g., ``--size 64G``).
  * There is a new ``rbd status`` command that, for now, shows who has the image open/mapped.

- CephFS:

  * You can now rename snapshots.
  * There have been ongoing improvements around administration, diagnostics, and the check and repair tools.
  * The caching and revocation of client cache state due to unused inodes has been dramatically improved.
  * The ceph-fuse client behaves better on 32-bit hosts.

Distro compatibility
--------------------

We have decided to drop support for many older distributions so that we can move to a newer compiler toolchain (e.g., C++11). Although it is still possible to build Ceph on older distributions by installing backported development tools, we are not building and publishing release packages for them on ceph.com.

We now build packages for:

* CentOS 7 or later. We have dropped support for CentOS 6 (and other RHEL 6 derivatives, like Scientific Linux 6).
* Debian Jessie 8.x or later. Debian Wheezy 7.x's g++ has incomplete support for C++11 (and no systemd).
* Ubuntu Trusty 14.04 or later. Ubuntu Precise 12.04 is no longer supported.
* Fedora 22 or later.

Upgrading from Firefly
----------------------

Upgrading directly from Firefly v0.80.z is not recommended. It is possible to do a direct upgrade, but not without downtime. We recommend that clusters are first upgraded to Hammer v0.94.4 or a later v0.94.z release; only then is it possible to upgrade to Infernalis 9.2.z for an online upgrade (see below).

To do an offline upgrade directly from Firefly, all Firefly OSDs must be stopped and marked down before any Infernalis OSDs will be allowed to start up. This fencing is enforced by the Infernalis monitor, so use an upgrade procedure like:

1. Upgrade Ceph on monitor hosts
2. Restart all ceph-mon daemons
3. Upgrade Ceph on all OSD hosts
4. Stop all ceph-osd daemons
5. Mark all OSDs down with something like::

     ceph osd down `seq 0 1000`

6. Start all ceph-osd daemons
7. Upgrade and restart remaining daemons (ceph-mds, radosgw)

Upgrading from Hammer
---------------------

* For all distributions that support systemd (CentOS 7, Fedora, Debian Jessie 8.x, OpenSUSE), Ceph daemons are now managed using native systemd files instead of the legacy sysvinit scripts. For example::

    systemctl start ceph.target       # start all daemons
    systemctl status ceph-osd@12      # check status of osd.12

  The main notable distro that is *not* yet using systemd is Ubuntu Trusty 14.04. (The next Ubuntu LTS, 16.04, will use systemd instead of upstart.)

* Ceph daemons now run as user and group ``ceph`` by default. The ceph user has a static UID assigned by Fedora and Debian (also used by derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the ceph user will currently get a