[ceph-users] Issues with CentOS RDO Liberty (OpenStack) and Ceph Repo (dependency resolution failed)

2015-11-06 Thread c...@dolphin-it.de


Dear Ceph-users,

I just set up a new CentOS 7 Ceph and OpenStack cluster. When "ceph-deploy 
install compute2" starts to set up the Ceph repo, it fails at dependency 
resolution:


===
Loaded plugins: fastestmirror, langpacks, priorities
Loading mirror speeds from cached hostfile
 * base: mirror.23media.de
 * elrepo: mirrors.ircam.fr
 * epel: mirror.23media.de
 * extras: ftp.rrzn.uni-hannover.de
 * updates: ftp.rrzn.uni-hannover.de
5977 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package ceph.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: libcephfs1 = 1:0.94.5-0.el7 for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-rbd = 1:0.94.5-0.el7 for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: ceph-common = 1:0.94.5-0.el7 for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-cephfs = 1:0.94.5-0.el7 for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-rados = 1:0.94.5-0.el7 for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: python-flask for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: hdparm for package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libboost_program_options-mt.so.1.53.0()(64bit) for 
package: 1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libtcmalloc.so.4()(64bit) for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libleveldb.so.1()(64bit) for package: 
1:ceph-0.94.5-0.el7.x86_64
--> Processing Dependency: libcephfs.so.1()(64bit) for package: 
1:ceph-0.94.5-0.el7.x86_64
---> Package ceph-radosgw.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: libfcgi.so.0()(64bit) for package: 
1:ceph-radosgw-0.94.5-0.el7.x86_64
--> Running transaction check
---> Package boost-program-options.x86_64 0:1.53.0-23.el7 will be installed
---> Package ceph-common.x86_64 1:0.94.5-0.el7 will be installed
--> Processing Dependency: redhat-lsb-core for package: 
1:ceph-common-0.94.5-0.el7.x86_64
---> Package fcgi.x86_64 0:2.4.0-22.el7 will be installed
---> Package gperftools-libs.x86_64 0:2.1-1.el7 will be installed
--> Processing Dependency: libunwind.so.8()(64bit) for package: 
gperftools-libs-2.1-1.el7.x86_64
---> Package hdparm.x86_64 0:9.43-5.el7 will be installed
---> Package leveldb.x86_64 0:1.12.0-5.el7 will be installed
---> Package libcephfs1.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-cephfs.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-flask.noarch 1:0.10.1-3.el7 will be installed
--> Processing Dependency: python-itsdangerous for package: 
1:python-flask-0.10.1-3.el7.noarch
---> Package python-rados.x86_64 1:0.94.5-0.el7 will be installed
---> Package python-rbd.x86_64 1:0.94.5-0.el7 will be installed
--> Running transaction check
---> Package libunwind.x86_64 0:1.1-10.el7 will be installed
---> Package python-itsdangerous.noarch 0:0.23-1.el7 will be installed
---> Package redhat-lsb-core.x86_64 0:4.1-27.el7.centos.1 will be installed
--> Processing Dependency: redhat-lsb-submod-security(x86-64) = 
4.1-27.el7.centos.1 for package: redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: spax for package: 
redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/patch for package: 
redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/m4 for package: 
redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/lpr for package: 
redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Processing Dependency: /usr/bin/lp for package: 
redhat-lsb-core-4.1-27.el7.centos.1.x86_64
--> Running transaction check
---> Package cups-client.x86_64 1:1.6.3-17.el7 will be installed
--> Processing Dependency: cups-libs(x86-64) = 1:1.6.3-17.el7 for package: 
1:cups-client-1.6.3-17.el7.x86_64
---> Package m4.x86_64 0:1.4.16-9.el7 will be installed
---> Package patch.x86_64 0:2.7.1-8.el7 will be installed
---> Package redhat-lsb-submod-security.x86_64 0:4.1-27.el7.centos.1 will be 
installed
---> Package spax.x86_64 0:1.5.2-11.el7 will be installed
--> Finished Dependency Resolution
Error: Package: 1:cups-client-1.6.3-17.el7.x86_64 (core-0)
   Requires: cups-libs(x86-64) = 1:1.6.3-17.el7
   Installed: 1:cups-libs-1.6.3-17.el7_1.1.x86_64 (@updates)
   cups-libs(x86-64) = 1:1.6.3-17.el7_1.1
   Available: 1:cups-libs-1.6.3-17.el7.x86_64 (core-0)
   cups-libs(x86-64) = 1:1.6.3-17.el7
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

===
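The root cause appears to be the priorities plugin: cups-libs was already
updated to 1:1.6.3-17.el7_1.1 from the updates repo, while the priority
protections (note the "5977 packages excluded" line) hide the matching
cups-client update, leaving only the older base-repo build. Two common
workarounds, sketched here without having verified them against this exact
mirror set:

```shell
# Option 1: retry the install once with priority protections disabled,
# so yum can pick the cups-client build that matches the installed
# cups-libs from the updates repo:
yum --disableplugin=priorities install ceph ceph-radosgw

# Option 2: downgrade cups-libs to the base-repo version so the excluded
# update is no longer required (check the impact on installed packages
# before running this on a production host):
yum downgrade cups-libs
```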

How can I solve this problem?

Regards,
Kevin


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSDs with bcache experience

2015-11-06 Thread Wido den Hollander
On 11/05/2015 11:03 PM, Michal Kozanecki wrote:
> Why did you guys go with partitioning the SSD for ceph journals, instead of 
> just using the whole SSD for bcache and leaving the journal on the filesystem 
> (which itself is ontop bcache)? Was there really a benefit to separating the 
> journals from the bcache fronted HDDs?
> 
> I ask because it has been shown in the past that separating the journal on 
> SSD based pools doesn't really do much.
> 

Well, the journal I/O bypasses bcache completely in this case. We figured
the less code the I/O travels through, the better.

We didn't try with the journal on bcache. This setup works for us, so we
didn't bother testing anything different.

Wido

> Michal Kozanecki | Linux Administrator | mkozane...@evertz.com
> 
> 
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido 
> den Hollander
> Sent: October-28-15 5:49 AM
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Ceph OSDs with bcache experience
> 
> 
> 
> On 21-10-15 15:30, Mark Nelson wrote:
>>
>>
>> On 10/21/2015 01:59 AM, Wido den Hollander wrote:
>>> On 10/20/2015 07:44 PM, Mark Nelson wrote:
 On 10/20/2015 09:00 AM, Wido den Hollander wrote:
> Hi,
>
> In the "newstore direction" thread on ceph-devel I wrote that I'm 
> using bcache in production and Mark Nelson asked me to share some details.
>
> Bcache is running in two clusters that I manage now, but I'll keep 
> this information to one of them (the one at PCextreme behind CloudStack).
>
> This cluster has been running for over 2 years now:
>
> epoch 284353
> fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307
> created 2013-09-23 11:06:11.819520
> modified 2015-10-20 15:27:48.734213
>
> The system consists of 39 hosts:
>
> 2U SuperMicro chassis:
> * 80GB Intel SSD for OS
> * 240GB Intel S3700 SSD for Journaling + Bcache
> * 6x 3TB disk
>
> This isn't the newest hardware. The next batch of hardware will have 
> more disks per chassis, but this is it for now.
>
> All systems were installed with Ubuntu 12.04, but they are all 
> running
> 14.04 now with bcache.
>
> The Intel S3700 SSD is partitioned with a GPT label:
> - 5GB Journal for each OSD
> - 200GB Partition for bcache
>
> root@ceph11:~# df -h|grep osd
> /dev/bcache0   2.8T  1.1T  1.8T  38% /var/lib/ceph/osd/ceph-60
> /dev/bcache1   2.8T  1.2T  1.7T  41% /var/lib/ceph/osd/ceph-61
> /dev/bcache2   2.8T  930G  1.9T  34% /var/lib/ceph/osd/ceph-62
> /dev/bcache3   2.8T  970G  1.8T  35% /var/lib/ceph/osd/ceph-63
> /dev/bcache4   2.8T  814G  2.0T  30% /var/lib/ceph/osd/ceph-64
> /dev/bcache5   2.8T  915G  1.9T  33% /var/lib/ceph/osd/ceph-65
> root@ceph11:~#
>
> root@ceph11:~# lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 14.04.3 LTS
> Release:        14.04
> Codename:       trusty
> root@ceph11:~# uname -r
> 3.19.0-30-generic
> root@ceph11:~#
>
> "apply_latency": {
>   "avgcount": 2985023,
>   "sum": 226219.891559000
> }
>
> What did we notice?
> - Less spikes on the disk
> - Lower commit latencies on the OSDs
> - Almost no 'slow requests' during backfills
> - Cache-hit ratio of about 60%
>
> Max backfills and recovery active are both set to 1 on all OSDs.
>
> For the next generation hardware we are looking into using 3U 
> chassis with 16 4TB SATA drives and a 1.2TB NVMe SSD for bcache, 
> but we haven't tested those yet, so nothing to say about it.
>
> The current setup is 200GB of cache for 18TB of disks. The new 
> setup will be 1200GB for 64TB, curious to see what that does.
>
> Our main conclusion however is that it does smoothen the 
> I/O pattern towards the disks, and that gives an overall better 
> response from the disks.

 Hi Wido, thanks for the big writeup!  Did you guys happen to do any 
 benchmarking?  I think Xiaoxi looked at flashcache a while back but 
 had mixed results if I remember right.  It would be interesting to 
 know how bcache is affecting performance in different scenarios.

>>>
>>> No, we didn't do any benchmarking. Initially this cluster was built 
>>> for just the RADOS Gateway, so we went for 2Gbit (2x 1Gbit) per 
>>> machine. 90% is still Gbit networking and we are in the process of 
>>> upgrading it all to 10Gbit.
>>>
>>> Since the 1Gbit network latency is about 4 times higher than 10Gbit, 
>>> we aren't really benchmarking the cluster.
>>>
>>> What counts for us most is that we can do recovery operations without 
>>> any slow requests.
>>>
>>> Before bcache we saw disks spike to 100% busy while a backfill was busy.
>>> Now bcache smoothens this and we see peaks of maybe 70%, but that's it.
>>
>> In the testing I was doing to 
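The partition layout Wido describes above (per-OSD 5GB journal partitions
plus one ~200GB bcache cache partition on the S3700) can be sketched
roughly as follows. Device names (/dev/sdg for the SSD, /dev/sda..sdf for
the 3TB disks) are hypothetical, and this is not his exact provisioning:

```shell
# Carve the SSD into six 5GB journal partitions and one 200GB cache
# partition using GPT labels:
for i in 1 2 3 4 5 6; do
  sgdisk -n "${i}:0:+5G" -c "${i}:osd-journal-${i}" /dev/sdg
done
sgdisk -n 7:0:+200G -c 7:bcache-cache /dev/sdg

# Create the bcache cache set on the SSD partition, then register each
# spinning disk as a backing device and attach it to the cache set:
make-bcache -C /dev/sdg7
cset=$(bcache-super-show /dev/sdg7 | awk '/cset.uuid/ {print $2}')
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
  make-bcache -B "$d"                                   # exposes /dev/bcacheN
  echo "$cset" > "/sys/block/$(basename "$d")/bcache/attach"
done
```

The OSD data filesystems then go on /dev/bcache0..5 while the journals
point directly at /dev/sdg1..6, which is what lets journal writes bypass
bcache entirely.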

[ceph-users] Soft removal of RBD images

2015-11-06 Thread Wido den Hollander
Hi,

Since Ceph Hammer we can protect pools from being removed from the
cluster, but we can't protect against this:

$ rbd ls|xargs -n 1 rbd rm

That would remove every RBD image that is not currently open from the cluster.

This requires direct access to your Ceph cluster and keys with the
proper permissions, but it could also be that somebody gains access to an
OpenStack or CloudStack API with the proper credentials and issues a
removal for all volumes.

*Stack will then remove the RBD images, and you have either lost the data
or you face a very long restore procedure.

What about a soft-delete for RBD images? I don't know how it should
work, since if you gain native RADOS access you can still remove all
objects:

$ rados -p rbd ls|xargs -n 1 rados -p rbd rm

I don't have a design idea yet, but it's something that came to mind.
I'd personally like a double-double backup before Ceph decides to remove
the data.

But for example:

When an RBD image is removed we set a "removed" bit in the RBD header,
and every RADOS object also gets a "removed" bit set.

After a period X, the OSD that is primary for a PG starts to remove all
objects which have that bit set.

In the meantime you can still get back the RBD image by reverting it in
a special way. With a special cephx capability for example.

This goes a bit in the direction of soft pool removals as well; the two
might be combined.

Comments?
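As one crude illustration of what a user-side stopgap could look like
today, here is a sketch using only the stock rbd CLI: rename images into a
quarantine prefix instead of removing them, and purge from cron after a
grace period. The TRASH prefix and the 14-day window are invented for the
example:

```shell
# Soft delete: rename instead of rm; a later cron job purges old entries.
POOL=rbd
soft_rm() { rbd mv "$POOL/$1" "$POOL/TRASH_$(date +%s)_$1"; }

# Purge quarantined images whose rename timestamp is older than 14 days:
purge_trash() {
  now=$(date +%s)
  rbd ls "$POOL" | while read -r name; do
    case "$name" in
      TRASH_*)
        ts=${name#TRASH_}; ts=${ts%%_*}      # extract the epoch timestamp
        [ $((now - ts)) -gt $((14 * 24 * 3600)) ] && rbd rm "$POOL/$name"
        ;;
    esac
  done
}
```

Of course this only narrows the window for the *Stack credential case; as
noted above, anyone with native RADOS access can still delete objects
directly.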

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] Ceph Openstack deployment

2015-11-06 Thread Vasiliy Angapov
There must be something in /var/log/cinder/volume.log or
/var/log/nova/nova-compute.log that points to the problem. Can you
post it here?

2015-11-06 20:14 GMT+08:00 Iban Cabrillo :
> Hi Vasiliy,
>   Thanks, but I still see the same error:
>
> cinder.conf (of course I restarted the cinder-volume service)
>
> # default volume type to use (string value)
>
> [rbd-cephvolume]
> rbd_user = cinder
> rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx
> volume_backend_name=rbd
> volume_driver = cinder.volume.drivers.rbd.RBDDriver
> rbd_pool = volumes
> rbd_ceph_conf = /etc/ceph/ceph.conf
> rbd_flatten_volume_from_snapshot = false
> rbd_max_clone_depth = 5
> rbd_store_chunk_size = 4
> rados_connect_timeout = -1
> glance_api_version = 2
>
>
>   xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
>
> Regards, I
>
> 2015-11-06 13:00 GMT+01:00 Vasiliy Angapov :
>>
>> In cinder.conf you should place these options:
>>
>> rbd_user = cinder
>> rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx
>>
>> in the [rbd-cephvolume] section instead of DEFAULT.
>>
>> 2015-11-06 19:45 GMT+08:00 Iban Cabrillo :
>> > Hi,
>> >   One more step debugging this issue (hypervisor/nova-compute node is
>> > XEN
>> > 4.4.2):
>> >
>> >   I think the problem is that libvirt is not getting the correct user or
>> > credentials to access the pool; in the instance qemu log I see:
>> >
>> > xen be: qdisk-51760: error: Could not open
>> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
>> > directory
>> > xen be: qdisk-51760: initialise() failed
>> > xen be: qdisk-51760: error: Could not open
>> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
>> > directory
>> > xen be: qdisk-51760: initialise() failed
>> > xen be: qdisk-51760: error: Could not open
>> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
>> > directory
>> >
>> > But using the user cinder on the volumes pool works:
>> >
>> > rbd ls -p volumes --id cinder
>> > test
>> > volume-4d26bb31-91e8-4646-8010-82127b775c8e
>> > volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
>> > volume-7da08f12-fb0f-4269-931a-d528c1507fee
>> >
>> > Using:
>> > qemu-img info -f rbd rbd:volumes/test
>> > does not work, but passing the user cinder and the ceph.conf file
>> > directly works fine:
>> >
>> > qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf
>> >
>> > I think nova.conf is set correctly (section libvirt):
>> > images_rbd_pool = volumes
>> > images_rbd_ceph_conf = /etc/ceph/ceph.conf
>> > hw_disk_discard=unmap
>> > rbd_user = cinder
>> > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-
>> >
>> > And looking at libvirt:
>> >
>> > # virsh secret-list
>> > setlocale: No such file or directory
>> >  UUID  Usage
>> >
>> > 
>> >  67a6d4a1-e53a-42c7-9bc9-  ceph client.cinder secret
>> >
>> >
>> > virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9-
>> > setlocale: No such file or directory
>> > AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
>> > cat /etc/ceph/ceph.client.cinder.keyring
>> > [client.cinder]
>> > key = AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
>> >
>> >
>> > Any idea will be welcomed.
>> > regards, I
>> >
>> > 2015-11-04 10:51 GMT+01:00 Iban Cabrillo :
>> >>
>> >> Dear Cephers,
>> >>
>> >>    I still can't attach volumes to my cloud machines; ceph version is
>> >> 0.94.5
>> >> (9764da52395923e0b32908d83a9f7304401fee43) and OpenStack Juno
>> >>
>> >>Nova+cinder are able to create volumes on Ceph
>> >> cephvolume:~ # rados ls --pool volumes
>> >> rbd_header.1f7784a9e1c2e
>> >> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
>> >> rbd_directory
>> >> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
>> >> rbd_header.23d5e33b4c15c
>> >> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e
>> >> rbd_header.20407190ce77f
>> >>
>> >> cloud:~ # 
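A hedged checklist for this class of qemu/libvirt RBD auth failure,
building on the commands already shown in the thread (the secret UUID is
elided there, so it stays elided here):

```shell
# 1. qemu must be able to open the image when the cephx id and conf are
#    passed explicitly in the rbd: source string:
qemu-img info -f rbd "rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf"

# 2. the libvirt secret value must equal the key in the cinder keyring:
uuid="67a6d4a1-e53a-42c7-9bc9-..."      # elided as in the thread
secret=$(virsh secret-get-value "$uuid")
keyring=$(awk '$1 == "key" {print $3}' /etc/ceph/ceph.client.cinder.keyring)
[ "$secret" = "$keyring" ] && echo "libvirt secret matches keyring"

# 3. With Xen's qdisk backend, libvirt secrets may not reach the disk
#    backend at all, in which case the id/conf has to be embedded in the
#    disk source string itself (an assumption worth verifying for Xen 4.4).
```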

Re: [ceph-users] Ceph Openstack deployment

2015-11-06 Thread Iban Cabrillo
Hi Vasiliy,
  Of course,
from cinder-volume.log

 2015-11-06 12:28:52.865 366 WARNING oslo_config.cfg
[req-41a4-4bec-40d2-a7c1-6e8d73644b4c b7aadbb4a85745feb498b74e437129cc
ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group
"DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 13:09:31.863 15534 WARNING oslo_config.cfg
[req-dd47624d-cf25-4beb-9d9e-70f532b2e8f9 - - - - -] Option "lock_path"
from group "DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 13:09:44.375 15544 WARNING oslo_config.cfg
[req-696a1282-b84c-464c-a220-d4e41a7dbd02 b7aadbb4a85745feb498b74e437129cc
ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group
"DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 13:11:02.024 15722 WARNING oslo_config.cfg
[req-db3c3775-3607-4fb7-acc9-5dba207bde56 - - - - -] Option "lock_path"
from group "DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 13:11:40.042 15729 WARNING oslo_config.cfg
[req-45458cfd-4e3a-4be2-b858-cece77072829 b7aadbb4a85745feb498b74e437129cc
ce2dd2951bd24c1ea3b43c3b3716f604 - - -] Option "lock_path" from group
"DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 13:16:49.331 15729 WARNING cinder.quota
[req-4e2c2f71-5bfa-487e-a99f-a6bb63bf1bc1 - - - - -] Deprecated: Default
quota for resource: gigabytes_rbd is set by the default quota flag:
quota_gigabytes_rbd, it is now deprecated. Please use the default quota
class for default quota.
2015-11-06 13:16:49.332 15729 WARNING cinder.quota
[req-4e2c2f71-5bfa-487e-a99f-a6bb63bf1bc1 - - - - -] Deprecated: Default
quota for resource: volumes_rbd is set by the default quota flag:
quota_volumes_rbd, it is now deprecated. Please use the default quota class
for default quota.
2015-11-06 13:18:16.163 16635 WARNING oslo_config.cfg
[req-503543b9-c2df-4483-a8b3-11f622a9cbe8 - - - - -] Option "lock_path"
from group "DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 14:17:08.288 16970 WARNING oslo_config.cfg
[req-a4ce4dbf-4119-427b-b555-930e66b9a2e3 58981d56c6cd4c5cacd59e518220a0eb
4d778e83692b44778f71cbe44da0bc0b - - -] Option "lock_path" from group
"DEFAULT" is deprecated. Use option "lock_path" from group
"oslo_concurrency".
2015-11-06 14:17:08.674 16970 WARNING cinder.quota
[req-fe21f3ad-7160-45b4-8adf-4cbe4bb85fc3 - - - - -] Deprecated: Default
quota for resource: gigabytes_rbd is set by the default quota flag:
quota_gigabytes_rbd, it is now deprecated. Please use the default quota
class for default quota.
2015-11-06 14:17:08.676 16970 WARNING cinder.quota
[req-fe21f3ad-7160-45b4-8adf-4cbe4bb85fc3 - - - - -] Deprecated: Default
quota for resource: volumes_rbd is set by the default quota flag:
quota_volumes_rbd, it is now deprecated. Please use the default quota class
for default quota.

And from nova-compute.log

2015-11-06 12:28:20.260 25915 INFO oslo_messaging._drivers.impl_rabbit
[req-dd85618c-ab24-43df-8192-b069d00abeeb - - - - -] Connected to AMQP
server on rabbitmq01:5672
2015-11-06 12:28:51.864 25915 INFO nova.compute.manager
[req-030d8966-cbe7-46c3-9d95-a1c886553fbd b7aadbb4a85745feb498b74e437129cc
ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance:
08f6fef5-7c98-445b-abfe-636c4c6fee89] Detach volume
4d26bb31-91e8-4646-8010-82127b775c8e from mountpoint /dev/xvdd
2015-11-06 12:29:18.255 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Auditing locally
available compute resources for node cms01.ifca.es
2015-11-06 12:29:18.480 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Total usable vcpus:
24, total allocated vcpus: 24
2015-11-06 12:29:18.481 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Final resource view:
name=cms01.ifca.es phys_ram=49143MB used_ram=47616MB phys_disk=270GB
used_disk=220GB total_vcpus=24 used_vcpus=24
pci_stats=
2015-11-06 12:29:18.508 25915 INFO nova.scheduler.client.report
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Compute_service record
updated for ('cms01', 'cms01.ifca.es')
2015-11-06 12:29:18.508 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Compute_service record
updated for cms01:cms01.ifca.es
2015-11-06 12:29:49.825 25915 INFO nova.compute.manager
[req-92d8810c-bea8-4eba-b682-c0d4e9d90c89 b7aadbb4a85745feb498b74e437129cc
ce2dd2951bd24c1ea3b43c3b3716f604 - - -] [instance:
08f6fef5-7c98-445b-abfe-636c4c6fee89] Attaching volume
4d26bb31-91e8-4646-8010-82127b775c8e to /dev/xvdd
2015-11-06 12:30:20.389 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 - - - - -] Auditing locally
available compute resources for node cms01.ifca.es
2015-11-06 12:30:20.595 25915 INFO nova.compute.resource_tracker
[req-0a4b7821-1b11-4ff7-a78d-d7e2b7b5a001 

Re: [ceph-users] Soft removal of RBD images

2015-11-06 Thread Gregory Farnum
On Fri, Nov 6, 2015 at 2:03 AM, Wido den Hollander  wrote:
> Hi,
>
> Since Ceph Hammer we can protect pools from being removed from the
> cluster, but we can't protect against this:
>
> $ rbd ls|xargs -n 1 rbd rm
>
> That would remove every RBD image that is not currently open from the cluster.
>
> This requires direct access to your Ceph cluster and keys with the
> proper permissions, but it could also be that somebody gains access to an
> OpenStack or CloudStack API with the proper credentials and issues a
> removal for all volumes.
>
> *Stack will then remove the RBD images, and you have either lost the data
> or you face a very long restore procedure.
>
> What about a soft-delete for RBD images? I don't know how it should
> work, since if you gain native RADOS access you can still remove all
> objects:
>
> $ rados -p rbd ls|xargs -n 1 rados -p rbd rm
>
> I don't have a design idea yet, but it's something that came to mind.
> I'd personally like a double-double backup before Ceph decides to remove
> the data.
>
> But for example:
>
> When an RBD image is removed we set a "removed" bit in the RBD header,
> and every RADOS object also gets a "removed" bit set.
>
> After a period X, the OSD that is primary for a PG starts to remove all
> objects which have that bit set.
>
> In the meantime you can still get back the RBD image by reverting it in
> a special way. With a special cephx capability for example.
>
> This goes a bit in the direction of soft pool removals as well; the two
> might be combined.
>
> Comments?

Besides the work of implementing lazy object deletes, I'm not sure
it's a good idea — when somebody's cluster fills up (and there's
always somebody!) we need a way to do deletes, and for that data to go
away immediately. We have enough trouble with people testing cache
pools and finding out there isn't instant deletion of the underlying
data. ;)
-Greg


Re: [ceph-users] Group permission problems with CephFS

2015-11-06 Thread Aaron Ten Clay
I'm seeing similar behavior as well.

-rw-rw-r-- 1 testuser testgroup 6 Nov  6 07:41 testfile
aaron@testhost$ groups
... testgroup ...
aaron@testhost$ cat > testfile
-bash: testfile: Permission denied

Running version 9.0.2. Were you able to make any progress on this?

Thanks,
-Aaron



On Tue, Aug 4, 2015 at 4:19 AM, Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:

> Hi,
>
> I've encountered some problems accesing files on CephFS:
>
> $ ls -al syntenyPlot.png
> -rw-r- 1 edgar edgar 9329 Jun 11  2014 syntenyPlot.png
>
> $ groups
> ... edgar ...
>
> $ cat syntenyPlot.png
> cat: syntenyPlot.png: Permission denied
>
> CephFS is mounted via ceph-fuse:
> ceph-fuse on /ceph type fuse.ceph-fuse
> (rw,nosuid,nodev,noatime,allow_other,default_permissions)
>
> OS is Ubuntu 14.04, Ceph version is 0.94.2
>
> I've isolated a test machine and activated debugging (debug_client =
> 20/20). The following lines correspond to the 'cat' invocation:
>
> 2015-08-04 12:59:44.030372 7f574dffb700 20 client.421984 _ll_get
> 0x7f5758024da0 100022310be -> 13
> 2015-08-04 12:59:44.030398 7f574dffb700  3 client.421984 ll_getattr
> 100022310be.head
> 2015-08-04 12:59:44.030403 7f574dffb700 10 client.421984 _getattr mask
> pAsLsXsFs issued=1
> 2015-08-04 12:59:44.030413 7f574dffb700 10 client.421984 fill_stat on
> 100022310be snap/devhead mode 042770 mtime 2014-06-12 09:31:39.00 ctime
> 2015-07-31 14:17:12.364416
> 2015-08-04 12:59:44.030426 7f574dffb700  3 client.421984 ll_getattr
> 100022310be.head = 0
> 2015-08-04 12:59:44.030443 7f574dffb700  3 client.421984 ll_forget
> 100022310be 1
> 2015-08-04 12:59:44.030447 7f574dffb700 20 client.421984 _ll_put
> 0x7f5758024da0 100022310be 1 -> 12
> 2015-08-04 12:59:44.030459 7f574dffb700 20 client.421984 _ll_get
> 0x7f5758024da0 100022310be -> 13
> 2015-08-04 12:59:44.030463 7f574dffb700  3 client.421984 ll_lookup
> 0x7f5758024da0 syntenyPlot.png
> 2015-08-04 12:59:44.030469 7f574dffb700 20 client.421984 _lookup have dn
> syntenyPlot.png mds.-1 ttl 0.00 seq 0
> 2015-08-04 12:59:44.030476 7f574dffb700 10 client.421984 _lookup
> 100022310be.head(ref=3 ll_ref=13 cap_refs={} open={} mode=42770 size=0/0
> mtime=2014-06-12 09:31:39.00 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE
> parents=0x7f57580261d0 0x7f5758024da0) syntenyPlot.png =
> 1000223121e.head(ref=2 ll_ref=20 cap_refs={} open={} mode=100640
> size=9329/0 mtime=2014-06-11 09:05:47.00
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000223121e ts 0/0 objects 0
> dirty_or_tx 0] parents=0x7f575802d290 0x7f575802c5a0)
> 2015-08-04 12:59:44.030530 7f574dffb700 10 client.421984 fill_stat on
> 1000223121e snap/devhead mode 0100640 mtime 2014-06-11 09:05:47.00
> ctime 2015-08-04 11:07:53.623370
> 2015-08-04 12:59:44.030539 7f574dffb700 20 client.421984 _ll_get
> 0x7f575802c5a0 1000223121e -> 21
> 2015-08-04 12:59:44.030542 7f574dffb700  3 client.421984 ll_lookup
> 0x7f5758024da0 syntenyPlot.png -> 0 (1000223121e)
> 2015-08-04 12:59:44.030555 7f574dffb700  3 client.421984 ll_forget
> 100022310be 1
> 2015-08-04 12:59:44.030558 7f574dffb700 20 client.421984 _ll_put
> 0x7f5758024da0 100022310be 1 -> 12
> 2015-08-04 12:59:44.030628 7f57467fc700 20 client.421984 _ll_get
> 0x7f575802c5a0 1000223121e -> 22
> 2015-08-04 12:59:44.030645 7f57467fc700  3 client.421984 ll_getattr
> 1000223121e.head
> 2015-08-04 12:59:44.030649 7f57467fc700 10 client.421984 _getattr mask
> pAsLsXsFs issued=1
> 2015-08-04 12:59:44.030659 7f57467fc700 10 client.421984 fill_stat on
> 1000223121e snap/devhead mode 0100640 mtime 2014-06-11 09:05:47.00
> ctime 2015-08-04 11:07:53.623370
> 2015-08-04 12:59:44.030672 7f57467fc700  3 client.421984 ll_getattr
> 1000223121e.head = 0
> 2015-08-04 12:59:44.030690 7f57467fc700  3 client.421984 ll_forget
> 1000223121e 1
> 2015-08-04 12:59:44.030695 7f57467fc700 20 client.421984 _ll_put
> 0x7f575802c5a0 1000223121e 1 -> 21
> 2015-08-04 12:59:44.030760 7f574e7fc700 20 client.421984 _ll_get
> 0x7f575802c5a0 1000223121e -> 22
> 2015-08-04 12:59:44.030775 7f574e7fc700  3 client.421984 ll_open
> 1000223121e.head 32768
> 2015-08-04 12:59:44.030779 7f574e7fc700  3 client.421984 ll_open
> 1000223121e.head 32768 = -13 (0)
> 2015-08-04 12:59:44.030797 7f574e7fc700  3 client.421984 ll_forget
> 1000223121e 1
> 2015-08-04 12:59:44.030802 7f574e7fc700 20 client.421984 _ll_put
> 0x7f575802c5a0 1000223121e 1 -> 21
>
>
> The return value of -13 in the open call is probably 'permission denied'.
>
> The setup looks ok with respect to permissions. The root user is able to
> read the file in question. The owning user is also able to read the file
> (after sudo). The problem occurs on several hosts for a number of files,
> but not all files or all users on CephFS are affected. User and group
> information is stored in LDAP and made available via SSSD; ls -l displays
> the correct group and user names, and id(1) lists the correct ids and names.
>
> Any hints on what's going wrong here?
>
> Best regards,
> Burkhard
> 
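One way to narrow down whether the failure is the fuse client's own
permission check ignoring supplementary groups (rather than the mode bits
themselves) is to compare access with the group as primary versus
supplementary; a sketch, with the file and group names from the report:

```shell
# The file is group-readable (-rw-r----- edgar:edgar) and the user has
# 'edgar' only as a supplementary group.
ls -l syntenyPlot.png
id                                   # 'edgar' should appear under groups=

# Run the same read with 'edgar' as the *primary* group of the process:
sg edgar -c 'cat syntenyPlot.png > /dev/null && echo "group access ok"'

# If the sg invocation succeeds while a plain 'cat' fails, the client-side
# permission check is only honoring the primary gid, not the full
# supplementary group list.
```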

Re: [ceph-users] Group permission problems with CephFS

2015-11-06 Thread Burkhard Linke

Hi,

On 11/06/2015 04:52 PM, Aaron Ten Clay wrote:

I'm seeing similar behavior as well.

-rw-rw-r-- 1 testuser testgroup 6 Nov  6 07:41 testfile
aaron@testhost$ groups
... testgroup ...
aaron@testhost$ cat > testfile
-bash: testfile: Permission denied

Running version 9.0.2. Were you able to make any progress on this?
There's a pending pull request that needs to be included in the next 
release (see http://tracker.ceph.com/issues/12617).


Regards,
Burkhard


Re: [ceph-users] Ceph Openstack deployment

2015-11-06 Thread Iban Cabrillo
Hi Vasiliy,
  Thanks, but I still see the same error:

cinder.conf (of course I restarted the cinder-volume service)

# default volume type to use (string value)

[rbd-cephvolume]
rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx
volume_backend_name=rbd
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2


  xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed

Regards, I

2015-11-06 13:00 GMT+01:00 Vasiliy Angapov :

> In cinder.conf you should place these options:
>
> rbd_user = cinder
> rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx
>
> in the [rbd-cephvolume] section instead of DEFAULT.
>
> 2015-11-06 19:45 GMT+08:00 Iban Cabrillo :
> > Hi,
> >   One more step debugging this issue (hypervisor/nova-compute node is XEN
> > 4.4.2):
> >
> >   I think the problem is that libvirt is not getting the correct user or
> > credentials to access the pool; in the instance qemu log I see:
> >
> > xen be: qdisk-51760: error: Could not open
> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> > directory
> > xen be: qdisk-51760: initialise() failed
> > xen be: qdisk-51760: error: Could not open
> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> > directory
> > xen be: qdisk-51760: initialise() failed
> > xen be: qdisk-51760: error: Could not open
> > 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> > directory
> >
> > But using the user cinder on the volumes pool works:
> >
> > rbd ls -p volumes --id cinder
> > test
> > volume-4d26bb31-91e8-4646-8010-82127b775c8e
> > volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
> > volume-7da08f12-fb0f-4269-931a-d528c1507fee
> >
> > Using:
> > qemu-img info -f rbd rbd:volumes/test
> > does not work, but passing the user cinder and the ceph.conf file
> > directly works fine:
> >
> > qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf
> >
> > I think nova.conf is set correctly (section libvirt):
> > images_rbd_pool = volumes
> > images_rbd_ceph_conf = /etc/ceph/ceph.conf
> > hw_disk_discard=unmap
> > rbd_user = cinder
> > rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-
> >
> > And looking at libvirt:
> >
> > # virsh secret-list
> > setlocale: No such file or directory
> >  UUID  Usage
> >
> 
> >  67a6d4a1-e53a-42c7-9bc9-  ceph client.cinder secret
> >
> >
> > virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9-
> > setlocale: No such file or directory
> > AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
> > cat /etc/ceph/ceph.client.cinder.keyring
> > [client.cinder]
> > key = AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
> >
> >
> > Any idea will be welcomed.
> > regards, I
> >
> > 2015-11-04 10:51 GMT+01:00 Iban Cabrillo :
> >>
> >> Dear Cephers,
> >>
> >>I still can't attach volumes to my cloud machines; ceph version is
> 0.94.5
> >> (9764da52395923e0b32908d83a9f7304401fee43) and Openstack Juno
> >>
> >>Nova+cinder are able to create volumes on Ceph
> >> cephvolume:~ # rados ls --pool volumes
> >> rbd_header.1f7784a9e1c2e
> >> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
> >> rbd_directory
> >> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
> >> rbd_header.23d5e33b4c15c
> >> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e
> >> rbd_header.20407190ce77f
> >>
> >> cloud:~ # cinder list
> >>
> >>
> +--++--+--+-+--+--+
> >> |  ID   |
> >> Status  | Display Name | Size | Volume Type | Bootable |
> >> Attached to  |
> >>
> >>
> 

[ceph-users] ceph-deploy on lxc container - 'initctl: Event failed'

2015-11-06 Thread Bogdan SOLGA
Hello, everyone!

I just tried to create a new Ceph cluster, using 3 LXC containers as
monitors, and the 'ceph-deploy mon create-initial' command fails for each
of the monitors with a 'initctl: Event failed' error, when running the
following command:

[ceph-mon-01][INFO  ] Running command: sudo initctl emit ceph-mon
cluster=ceph id=ceph-mon-01
[ceph-mon-01][WARNIN] initctl: Event failed

Is it OK to use LXC containers as Ceph MONs? If yes, is there anything
special which needs to be done prior to the 'mon create-initial' phase?

Thank you!

Regards,
Bogdan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd fails to start, rbd hangs

2015-11-06 Thread Philipp Schwaha
Hi,

I have an issue with my (small) ceph cluster after an osd failed.
ceph -s reports the following:
cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
 health HEALTH_WARN
31 pgs down
31 pgs peering
31 pgs stuck inactive
31 pgs stuck unclean
 monmap e1: 1 mons at {0=192.168.19.13:6789/0}
election epoch 1, quorum 0 0
 osdmap e138: 3 osds: 2 up, 2 in
  pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
1290 GB used, 8021 GB / 9315 GB avail
  33 active+clean
  31 down+peering

I am now unable to map the rbd image; the command will just time out.
The log is at the end of the message.

Is there a way to recover the osd / the ceph cluster from this?

thanks in advance
Philipp



-2> 2015-10-30 01:04:59.689116 7f4bb741e700  1 heartbeat_map
is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
-1> 2015-10-30 01:04:59.689140 7f4bb741e700  1 heartbeat_map
is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out
after 150
 0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1
common/HeartbeatMap.cc: In function 'bool
ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x77) [0xb12457]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 5: (CephContextServiceThread::entry()+0x164) [0xb21974]
 6: (()+0x76f5) [0x7f4bbdb0c6f5]
 7: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- begin dump of recent events ---
 0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal
(Aborted) **
 in thread 7f4bb741e700

 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
 1: /usr/bin/ceph-osd() [0xa11c84]
 2: (()+0x10690) [0x7f4bbdb15690]
 3: (gsignal()+0x37) [0x7f4bbbfe63c7]
 4: (abort()+0x16a) [0x7f4bbbfe77fa]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
 6: (()+0x5dda7) [0x7f4bbc8c5da7]
 7: (()+0x5ddf2) [0x7f4bbc8c5df2]
 8: (()+0x5e008) [0x7f4bbc8c6008]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x252) [0xb12632]
 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
long)+0x119) [0xa47179]
 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
 12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
 13: (CephContextServiceThread::entry()+0x164) [0xb21974]
 14: (()+0x76f5) [0x7f4bbdb0c6f5]
 15: (__clone()+0x6d) [0x7f4bbc09cedd]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

--- begin dump of recent events ---
 0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** 

Re: [ceph-users] osd fails to start, rbd hangs

2015-11-06 Thread Gregory Farnum
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/

:)

On Friday, November 6, 2015, Philipp Schwaha  wrote:

> Hi,
>
> I have an issue with my (small) ceph cluster after an osd failed.
> ceph -s reports the following:
> cluster 2752438a-a33e-4df4-b9ec-beae32d00aad
>  health HEALTH_WARN
> 31 pgs down
> 31 pgs peering
> 31 pgs stuck inactive
> 31 pgs stuck unclean
>  monmap e1: 1 mons at {0=192.168.19.13:6789/0}
> election epoch 1, quorum 0 0
>  osdmap e138: 3 osds: 2 up, 2 in
>   pgmap v77979: 64 pgs, 1 pools, 844 GB data, 211 kobjects
> 1290 GB used, 8021 GB / 9315 GB avail
>   33 active+clean
>   31 down+peering
>
> I am now unable to map the rbd image; the command will just time out.
> The log is at the end of the message.
>
> Is there a way to recover the osd / the ceph cluster from this?
>
> thanks in advance
> Philipp
>
>
>
> -2> 2015-10-30 01:04:59.689116 7f4bb741e700  1 heartbeat_map
> is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had timed out after 15
> -1> 2015-10-30 01:04:59.689140 7f4bb741e700  1 heartbeat_map
> is_healthy 'OSD::osd_tp thread 0x7f4ba13cd700' had suicide timed out
> after 150
>  0> 2015-10-30 01:04:59.906546 7f4bb741e700 -1
> common/HeartbeatMap.cc: In function 'bool
> ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*,
> time_t)' thread 7f4bb741e700 time 2015-10-30 01:04:59.689176
> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x77) [0xb12457]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa47179]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  5: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  6: (()+0x76f5) [0x7f4bbdb0c6f5]
>  7: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> --- logging levels ---
>0/ 5 none
>0/ 1 lockdep
>0/ 1 context
>1/ 1 crush
>1/ 5 mds
>1/ 5 mds_balancer
>1/ 5 mds_locker
>1/ 5 mds_log
>1/ 5 mds_log_expire
>1/ 5 mds_migrator
>0/ 1 buffer
>0/ 1 timer
>0/ 1 filer
>0/ 1 striper
>0/ 1 objecter
>0/ 5 rados
>0/ 5 rbd
>0/ 5 rbd_replay
>0/ 5 journaler
>0/ 5 objectcacher
>0/ 5 client
>0/ 5 osd
>0/ 5 optracker
>0/ 5 objclass
>1/ 3 filestore
>1/ 3 keyvaluestore
>1/ 3 journal
>0/ 5 ms
>1/ 5 mon
>0/10 monc
>1/ 5 paxos
>0/ 5 tp
>1/ 5 auth
>1/ 5 crypto
>1/ 1 finisher
>1/ 5 heartbeatmap
>1/ 5 perfcounter
>1/ 5 rgw
>1/10 civetweb
>1/ 5 javaclient
>1/ 5 asok
>1/ 1 throttle
>0/ 0 refs
>1/ 5 xio
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 1
>   max_new 1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa47179]
>  11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xa47b76]
>  12: (ceph::HeartbeatMap::check_touch_file()+0x18) [0xa48258]
>  13: (CephContextServiceThread::entry()+0x164) [0xb21974]
>  14: (()+0x76f5) [0x7f4bbdb0c6f5]
>  15: (__clone()+0x6d) [0x7f4bbc09cedd]
>  NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> --- begin dump of recent events ---
>  0> 2015-10-30 01:05:00.193324 7f4bb741e700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f4bb741e700
>
>  ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>  1: /usr/bin/ceph-osd() [0xa11c84]
>  2: (()+0x10690) [0x7f4bbdb15690]
>  3: (gsignal()+0x37) [0x7f4bbbfe63c7]
>  4: (abort()+0x16a) [0x7f4bbbfe77fa]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bbc8c7d45]
>  6: (()+0x5dda7) [0x7f4bbc8c5da7]
>  7: (()+0x5ddf2) [0x7f4bbc8c5df2]
>  8: (()+0x5e008) [0x7f4bbc8c6008]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x252) [0xb12632]
>  10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*,
> long)+0x119) [0xa4

Re: [ceph-users] Ceph Openstack deployment

2015-11-06 Thread Iban Cabrillo
Hi,
  One more step debugging this issue (hypervisor/nova-compute node is XEN
4.4.2):

  I think the problem is that libvirt is not getting the correct user or
credentials to access the pool; in the instance qemu log I see:

xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory
xen be: qdisk-51760: initialise() failed
xen be: qdisk-51760: error: Could not open
'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
directory

But accessing the volumes pool as user cinder works:

rbd ls -p volumes --id cinder
test
*volume-4d26bb31-91e8-4646-8010-82127b775c8e*
volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
volume-7da08f12-fb0f-4269-931a-d528c1507fee

Using:
qemu-img info -f rbd rbd:volumes/test
does not work, but passing the cinder user and the ceph.conf file directly
works fine:

*qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf*
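The difference between the two invocations is only the extra key=value options
appended to the rbd: spec. A tiny illustrative helper (hypothetical, not part of
qemu or ceph) that builds such a spec:

```python
def rbd_spec(pool, image, **opts):
    """Build a qemu 'rbd:' file spec like rbd:pool/image:key=value:..."""
    spec = f"rbd:{pool}/{image}"
    for key, value in opts.items():
        # qemu separates rbd options from the pool/image name with ':'
        spec += f":{key}={value}"
    return spec

# rbd_spec("volumes", "test", id="cinder", conf="/etc/ceph/ceph.conf")
# -> "rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf"
```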

I think nova.conf is set correctly (section libvirt):
images_rbd_pool = volumes
images_rbd_ceph_conf = /etc/ceph/ceph.conf
hw_disk_discard=unmap
rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-

And looking at libvirt:

# virsh secret-list
setlocale: No such file or directory
 UUID  Usage

 67a6d4a1-e53a-42c7-9bc9-  ceph client.cinder secret


virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9-
setlocale: No such file or directory
*AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==*
cat /etc/ceph/ceph.client.cinder.keyring
[client.cinder]
key = *AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==*
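When the attach works, the disk element libvirt generates for the guest should
carry that secret, roughly like this sketch (layout from memory, values taken
from the mails above and kept truncated/elided; treat as illustrative):

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <auth username='cinder'>
    <secret type='ceph' uuid='67a6d4a1-e53a-42c7-9bc9-'/>
  </auth>
  <source protocol='rbd' name='volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e'>
    <host name='...' port='6789'/>
  </source>
  <target dev='xvdd' bus='xen'/>
</disk>
```

If the generated XML lacks the auth/secret part, the backend opens the image
without credentials, which would match the "Could not open" errors above.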


Any idea will be welcomed.
regards, I

2015-11-04 10:51 GMT+01:00 Iban Cabrillo :

> Dear Cephers,
>
>I still can't attach volumes to my cloud machines; ceph version is  0.94.5
> (9764da52395923e0b32908d83a9f7304401fee43) and Openstack Juno
>
>Nova+cinder are able to create volumes on Ceph
> cephvolume:~ # rados ls --pool volumes
> rbd_header.1f7784a9e1c2e
> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
> rbd_directory
> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
> rbd_header.23d5e33b4c15c
> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e
> rbd_header.20407190ce77f
>
> cloud:~ # cinder list
>
> +--++--+--+-+--+--+
> |  ID   |
> Status  | Display Name | Size | Volume Type | Bootable |
>  Attached to  |
>
> +--++--+--+-+--+|-+
> | 4d26bb31-91e8-4646-8010-82127b775c8e | in-use | None | 2
>   | rbd |  false   |
> 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb |
>
> +--++--+--+-+--+--+
>
>
>nova:~ # nova volume-attach 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb
> 4d26bb31-91e8-4646-8010-82127b775c8e auto
> +--++
> | Property |  Value
>  |
> +--++
> | device  | /dev/xvdd
> |
> | id | 4d26bb31-91e8-4646-8010-82127b775c8e |
> | serverId   | 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb  |
> | volumeId | 4d26bb31-91e8-4646-8010-82127b775c8e |
> +--+--+
>
> From nova-compute (Ubuntu 14.04 LTS \n \l) node I see the
> attaching/detaching:
> cloud01:~ # dpkg -l | grep ceph
> ii  ceph-common 0.94.5-1trusty
>  amd64common utilities to mount and interact with a ceph storage
> cluster
> ii  libcephfs1   0.94.5-1trusty
>amd64Ceph distributed file system client library
> ii  python-cephfs 0.94.5-1trusty
>amd64Python libraries for the Ceph libcephfs library
> ii  librbd10.94.5-1trusty
>  amd64RADOS block device client library
> ii  python-rbd  0.94.5-1trusty
>  amd64Python libraries for the Ceph librbd library
>
> *at cinder.conf*
>
>  *rbd_user = cinder*
> *rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx*
>
> *[rbd-cephvolume]*
> *volume_backend_name=rbd*
> *volume_driver = cinder.volume.drivers.rbd.RBDDriver*
> *rbd_pool = volumes*
> *rbd_ceph_conf = 

Re: [ceph-users] Ceph Openstack deployment

2015-11-06 Thread Vasiliy Angapov
At cinder.conf you should place this options:

rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx

to [rbd-cephvolume] section instead of DEFAULT.
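For reference, a minimal sketch of how the relevant part of cinder.conf would
look with those options moved under the backend section (the secret uuid is
truncated in the quoted mails and is kept truncated here):

```ini
[DEFAULT]
enabled_backends = rbd-cephvolume

[rbd-cephvolume]
volume_backend_name = rbd
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-xxx
```

With multiple backends, options placed in [DEFAULT] are not picked up by the
per-backend driver instance, which is why they belong in the backend section.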

2015-11-06 19:45 GMT+08:00 Iban Cabrillo :
> Hi,
>   One more step debugging this issue (hypervisor/nova-compute node is XEN
> 4.4.2):
>
>   I think the problem is that libvirt is not getting the correct user or
> credentials to access the pool; in the instance qemu log I see:
>
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
> xen be: qdisk-51760: initialise() failed
> xen be: qdisk-51760: error: Could not open
> 'volumes/volume-4d26bb31-91e8-4646-8010-82127b775c8e': No such file or
> directory
>
> But accessing the volumes pool as user cinder works:
>
> rbd ls -p volumes --id cinder
> test
> volume-4d26bb31-91e8-4646-8010-82127b775c8e
> volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
> volume-7da08f12-fb0f-4269-931a-d528c1507fee
>
> Using:
> qemu-img info -f rbd rbd:volumes/test
> does not work, but passing the cinder user and the ceph.conf file directly
> works fine:
>
> qemu-img info -f rbd rbd:volumes/test:id=cinder:conf=/etc/ceph/ceph.conf
>
> I think nova.conf is set correctly (section libvirt):
> images_rbd_pool = volumes
> images_rbd_ceph_conf = /etc/ceph/ceph.conf
> hw_disk_discard=unmap
> rbd_user = cinder
> rbd_secret_uuid = 67a6d4a1-e53a-42c7-9bc9-
>
> And looking at libvirt:
>
> # virsh secret-list
> setlocale: No such file or directory
>  UUID  Usage
> 
>  67a6d4a1-e53a-42c7-9bc9-  ceph client.cinder secret
>
>
> virsh secret-get-value 67a6d4a1-e53a-42c7-9bc9-
> setlocale: No such file or directory
> AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
> cat /etc/ceph/ceph.client.cinder.keyring
> [client.cinder]
> key = AQAonAdWS3iMJxxj9iErv001a0k+vyFdUg==
>
>
> Any idea will be welcomed.
> regards, I
>
> 2015-11-04 10:51 GMT+01:00 Iban Cabrillo :
>>
>> Dear Cephers,
>>
> >>I still can't attach volumes to my cloud machines; ceph version is  0.94.5
>> (9764da52395923e0b32908d83a9f7304401fee43) and Openstack Juno
>>
>>Nova+cinder are able to create volumes on Ceph
>> cephvolume:~ # rados ls --pool volumes
>> rbd_header.1f7784a9e1c2e
>> rbd_id.volume-5e2ab5c2-4710-4c28-9755-b5bc4ff6a52a
>> rbd_directory
>> rbd_id.volume-7da08f12-fb0f-4269-931a-d528c1507fee
>> rbd_header.23d5e33b4c15c
>> rbd_id.volume-4d26bb31-91e8-4646-8010-82127b775c8e
>> rbd_header.20407190ce77f
>>
>> cloud:~ # cinder list
>>
>> +--++--+--+-+--+--+
>> |  ID   |
>> Status  | Display Name | Size | Volume Type | Bootable |
>> Attached to  |
>>
>> +--++--+--+-+--+|-+
>> | 4d26bb31-91e8-4646-8010-82127b775c8e | in-use | None | 2
>> | rbd |  false   | 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb
>> |
>>
>> +--++--+--+-+--+--+
>>
>>
>>nova:~ # nova volume-attach 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb
>> 4d26bb31-91e8-4646-8010-82127b775c8e auto
>> +--++
>> | Property |  Value
>> |
>> +--++
>> | device  | /dev/xvdd
>> |
>> | id | 4d26bb31-91e8-4646-8010-82127b775c8e |
>> | serverId   | 59aa021e-bb4c-4154-9b18-9d09f5fd3aeb  |
>> | volumeId | 4d26bb31-91e8-4646-8010-82127b775c8e |
>> +--+--+
>>
>> From nova-compute (Ubuntu 14.04 LTS \n \l) node I see the
>> attaching/detaching:
>> cloud01:~ # dpkg -l | grep ceph
>> ii  ceph-common 0.94.5-1trusty
>> amd64common utilities to mount and interact with a ceph storage
>> cluster
>> ii  libcephfs1   0.94.5-1trusty
>> amd64Ceph distributed file system client library
>> ii  python-cephfs 0.94.5-1trusty
>> amd64Python libraries for the Ceph libcephfs library
>> ii  librbd10.94.5-1trusty
>> amd64RADOS block device client library
>> ii  python-rbd  

Re: [ceph-users] osd fails to start, rbd hangs

2015-11-06 Thread Philipp Schwaha
On 11/06/2015 09:25 PM, Gregory Farnum wrote:
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> 
> :)
> 

Thanks, I tried to follow the advice to "... start that ceph-osd and
things will recover.", for the better part of the last two days but did
not succeed in reviving the crashed osd :(
I do not understand the message the osd is giving, since the files
appear to be there:

beta ~ # ls -lrt /var/lib/ceph/osd/ceph-2/
total 1048656
-rw-r--r-- 1 root root 37 Oct 26 16:25 fsid
-rw-r--r-- 1 root root  4 Oct 26 16:25 store_version
-rw-r--r-- 1 root root 53 Oct 26 16:25 superblock
-rw-r--r-- 1 root root 21 Oct 26 16:25 magic
-rw-r--r-- 1 root root  2 Oct 26 16:25 whoami
-rw-r--r-- 1 root root 37 Oct 26 16:25 ceph_fsid
-rw-r--r-- 1 root root  6 Oct 26 16:25 ready
-rw--- 1 root root 56 Oct 26 16:25 keyring
drwxr-xr-x 1 root root752 Oct 26 16:47 snap_16793
drwxr-xr-x 1 root root752 Oct 26 16:47 snap_16773
drwxr-xr-x 1 root root230 Oct 30 01:01 snap_242352
drwxr-xr-x 1 root root230 Oct 30 01:01 snap_242378
-rw-r--r-- 1 root root 1073741824 Oct 30 01:02 journal
drwxr-xr-x 1 root root256 Nov  6 21:55 current

as well as a subvolume:

btrfs subvolume list /var/lib/ceph/osd/ceph-2/
ID 8005 gen 8336 top level 5 path snap_242352
ID 8006 gen 8467 top level 5 path snap_242378
ID 8070 gen 8468 top level 5 path current

still the osd complains "current/ missing entirely (unusual, but
okay)" and then fails to mount the object store.
Is this a case where I should give up on the osd completely, mark it as
lost, and try to go on from there?
The machine on which the osd runs did not have any other issues, only
the osd apparently self destructed ~3.5 days after it was added.

Or is the recovery of the osd simple (enough) and I just missed the
point somewhere? ;)

thanks in advance
Philipp

The log of an attempted start of the osd continues to give:

2015-11-06 21:41:53.213174 7f44755a77c0  0 ceph version 0.94.3
(95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 3751
2015-11-06 21:41:53.254418 7f44755a77c0 10
filestore(/var/lib/ceph/osd/ceph-2) dump_stop
2015-11-06 21:41:53.275694 7f44755a77c0 10
ErasureCodePluginSelectJerasure: load: jerasure_sse4
2015-11-06 21:41:53.291133 7f44755a77c0 10 load: jerasure load: lrc
2015-11-06 21:41:53.291543 7f44755a77c0  5
filestore(/var/lib/ceph/osd/ceph-2) test_mount basedir
/var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292043 7f44755a77c0  2 osd.2 0 mounting
/var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292152 7f44755a77c0  5
filestore(/var/lib/ceph/osd/ceph-2) basedir /var/lib/ceph/osd/ceph-2
journal /var/lib/ceph/osd/ceph-2/journal
2015-11-06 21:41:53.292216 7f44755a77c0 10
filestore(/var/lib/ceph/osd/ceph-2) mount fsid is
2662df9c-fd60-425c-ac89-4fe07a2a1b2f
2015-11-06 21:41:53.292412 7f44755a77c0  0
filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
2015-11-06 21:41:59.753329 7f44755a77c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is supported and appears to work
2015-11-06 21:41:59.753395 7f44755a77c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-11-06 21:42:00.968438 7f44755a77c0  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2015-11-06 21:42:00.969431 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
CLONE_RANGE ioctl is supported
2015-11-06 21:42:03.033742 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
SNAP_CREATE is supported
2015-11-06 21:42:03.034262 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
SNAP_DESTROY is supported
2015-11-06 21:42:03.042168 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
START_SYNC is supported (transid 8453)
2015-11-06 21:42:04.144516 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
WAIT_SYNC is supported
2015-11-06 21:42:04.309323 7f44755a77c0  0
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
SNAP_CREATE_V2 is supported
2015-11-06 21:42:04.310562 7f44755a77c0 10
filestore(/var/lib/ceph/osd/ceph-2)  current/ missing entirely (unusual,
but okay)
2015-11-06 21:42:04.310686 7f44755a77c0 10
filestore(/var/lib/ceph/osd/ceph-2)  most recent snap from
<242352,242378> is 242378
2015-11-06 21:42:04.310763 7f44755a77c0 10
filestore(/var/lib/ceph/osd/ceph-2) mount rolling back to consistent
snap 242378
2015-11-06 21:42:04.310812 7f44755a77c0 10
btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: to
'snap_242378'
2015-11-06 21:42:06.384894 7f44755a77c0  5
filestore(/var/lib/ceph/osd/ceph-2) mount op_seq is 0
2015-11-06 

Re: [ceph-users] ceph-deploy on lxc container - 'initctl: Event failed'

2015-11-06 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I've put monitors in LXC but I haven't done it with ceph-deploy. I've
had no problems with it.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Nov 6, 2015 at 12:55 PM, Bogdan SOLGA  wrote:
> Hello, everyone!
>
> I just tried to create a new Ceph cluster, using 3 LXC containers as monitors,
> and the 'ceph-deploy mon create-initial' command fails for each of the
> monitors with a 'initctl: Event failed' error, when running the following
> command:
>
> [ceph-mon-01][INFO  ] Running command: sudo initctl emit ceph-mon
> cluster=ceph id=ceph-mon-01
> [ceph-mon-01][WARNIN] initctl: Event failed
>
> Is it OK to use LXC containers as Ceph MONs? If yes, is there anything
> special which needs to be done prior to the 'mon create-initial' phase?
>
> Thank you!
>
> Regards,
> Bogdan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.2.3
Comment: https://www.mailvelope.com

wsFcBAEBCAAQBQJWPRu7CRDmVDuy+mK58QAAqoUP/0CM1aRSm6XRWVeRvWzb
kWWrgHyypNbHKhGXe07F8bHS1jberhKs9RCuU+RKN2aJ7M3zL1xr5ysspZ4R
+1fMHVW4enW5haBKa1Z1/1C5uPBQvVOwjEE+7k8XncvP4+mnICtBqtEQPc1g
+62CY9Ke39btPXwGJiTC8by2Uh6pvrtnfGf7UGh6nWrnoOxJmTnZImmQKbpg
PLvqw/Dl/KJD4DcQoS3nzLRXhZXOohpUsAJBMegq422+iYa31f0QVdddzoC7
DYfqxV2xszOeh24McTXZjOVulC1w2Xni3R9vOWjbJGPlMbg1xnBqX/G+Fn2z
2UAOYTMx5bK/j3wzAryMYs9/dtr4JhpO8cVWSm1fxM4J3V/96ug4Y3eYHoCZ
FoTGDmPwFDXQkwTFwjWWgoIMQh/1Zi6Nm6cLnggVlQcotdfka/glcLEHXXMb
uPXKcrY6kwwIbw+JFUbn6GUlK1ZSURKnmwXmVroHnoxnWH7bH7hhNv+GYzxJ
AjOxlds8E4igFHxwh0A7xIq/IosKgwxIuxbO2BlnYTYCoCrjOWoesiFtQdpX
q+tRSo03gC4PSqrjsm7xsMdSW/3uaIEzZPx/SQJU/JBDKarNY2eCo7VYntUx
7uxkWGEA4sibLdjNIGkRJHSrZDVdSJMlaPNBNrxmREl0t9b+DVBtbLgSvHeW
Tj4D
=aGAZ
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd fails to start, rbd hangs

2015-11-06 Thread Iban Cabrillo
Hi Philipp,
  I see you only have 2 osds up. Have you checked that your pool size
("ceph osd pool get <pool> size") is 2 and that min_size is 1?
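For reference, those settings can be inspected and adjusted like this
(illustrative commands; substitute the real pool name, and note that whether
this helps depends on why the pgs are down):

```
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size
# with one of three osds down, min_size must not exceed the surviving replicas:
ceph osd pool set <pool> min_size 1
```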
Cheers, I

2015-11-06 22:05 GMT+01:00 Philipp Schwaha :

> On 11/06/2015 09:25 PM, Gregory Farnum wrote:
> >
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> >
> > :)
> >
>
> Thanks, I tried to follow the advice to "... start that ceph-osd and
> things will recover.", for the better part of the last two days but did
> not succeed in reviving the crashed osd :(
> I do not understand the message the osd is giving, since the files
> appear to be there:
>
> beta ~ # ls -lrt /var/lib/ceph/osd/ceph-2/
> total 1048656
> -rw-r--r-- 1 root root 37 Oct 26 16:25 fsid
> -rw-r--r-- 1 root root  4 Oct 26 16:25 store_version
> -rw-r--r-- 1 root root 53 Oct 26 16:25 superblock
> -rw-r--r-- 1 root root 21 Oct 26 16:25 magic
> -rw-r--r-- 1 root root  2 Oct 26 16:25 whoami
> -rw-r--r-- 1 root root 37 Oct 26 16:25 ceph_fsid
> -rw-r--r-- 1 root root  6 Oct 26 16:25 ready
> -rw--- 1 root root 56 Oct 26 16:25 keyring
> drwxr-xr-x 1 root root752 Oct 26 16:47 snap_16793
> drwxr-xr-x 1 root root752 Oct 26 16:47 snap_16773
> drwxr-xr-x 1 root root230 Oct 30 01:01 snap_242352
> drwxr-xr-x 1 root root230 Oct 30 01:01 snap_242378
> -rw-r--r-- 1 root root 1073741824 Oct 30 01:02 journal
> drwxr-xr-x 1 root root256 Nov  6 21:55 current
>
> as well as a subvolume:
>
> btrfs subvolume list /var/lib/ceph/osd/ceph-2/
> ID 8005 gen 8336 top level 5 path snap_242352
> ID 8006 gen 8467 top level 5 path snap_242378
> ID 8070 gen 8468 top level 5 path current
>
> still the osd complains "current/ missing entirely (unusual, but
> okay)" and then fails to mount the object store.
> Is this a case where I should give up on the osd completely, mark it as
> lost, and try to go on from there?
> The machine on which the osd runs did not have any other issues, only
> the osd apparently self destructed ~3.5 days after it was added.
>
> Or is the recovery of the osd simple (enough) and I just missed the
> point somewhere? ;)
>
> thanks in advance
> Philipp
>
> The log of an attempted start of the osd continues to give:
>
> 2015-11-06 21:41:53.213174 7f44755a77c0  0 ceph version 0.94.3
> (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 3751
> 2015-11-06 21:41:53.254418 7f44755a77c0 10
> filestore(/var/lib/ceph/osd/ceph-2) dump_stop
> 2015-11-06 21:41:53.275694 7f44755a77c0 10
> ErasureCodePluginSelectJerasure: load: jerasure_sse4
> 2015-11-06 21:41:53.291133 7f44755a77c0 10 load: jerasure load: lrc
> 2015-11-06 21:41:53.291543 7f44755a77c0  5
> filestore(/var/lib/ceph/osd/ceph-2) test_mount basedir
> /var/lib/ceph/osd/ceph-2 journal /var/lib/ceph/osd/ceph-2/journal
> 2015-11-06 21:41:53.292043 7f44755a77c0  2 osd.2 0 mounting
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> 2015-11-06 21:41:53.292152 7f44755a77c0  5
> filestore(/var/lib/ceph/osd/ceph-2) basedir /var/lib/ceph/osd/ceph-2
> journal /var/lib/ceph/osd/ceph-2/journal
> 2015-11-06 21:41:53.292216 7f44755a77c0 10
> filestore(/var/lib/ceph/osd/ceph-2) mount fsid is
> 2662df9c-fd60-425c-ac89-4fe07a2a1b2f
> 2015-11-06 21:41:53.292412 7f44755a77c0  0
> filestore(/var/lib/ceph/osd/ceph-2) backend btrfs (magic 0x9123683e)
> 2015-11-06 21:41:59.753329 7f44755a77c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2015-11-06 21:41:59.753395 7f44755a77c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2015-11-06 21:42:00.968438 7f44755a77c0  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2015-11-06 21:42:00.969431 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> CLONE_RANGE ioctl is supported
> 2015-11-06 21:42:03.033742 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> SNAP_CREATE is supported
> 2015-11-06 21:42:03.034262 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> SNAP_DESTROY is supported
> 2015-11-06 21:42:03.042168 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> START_SYNC is supported (transid 8453)
> 2015-11-06 21:42:04.144516 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> WAIT_SYNC is supported
> 2015-11-06 21:42:04.309323 7f44755a77c0  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_feature:
> SNAP_CREATE_V2 is supported
> 2015-11-06 21:42:04.310562 7f44755a77c0 10
> filestore(/var/lib/ceph/osd/ceph-2)  current/ missing entirely (unusual,
> but okay)
> 2015-11-06 21:42:04.310686 7f44755a77c0 10
> filestore(/var/lib/ceph/osd/ceph-2)  most recent snap from
> 

[ceph-users] v9.2.0 Infernalis released

2015-11-06 Thread Sage Weil
[I'm going to break my own rule and do this on a Friday only because this 
has been built and in the repos for a couple of days now; I've just been 
traveling and haven't had time to announce it.]

This major release will be the foundation for the next stable series.
There have been some major changes since v0.94.x Hammer, and the
upgrade process is non-trivial.  Please read these release notes carefully.

Major Changes from Hammer
-------------------------

- General:

  * Ceph daemons are now managed via systemd (with the exception of
Ubuntu Trusty, which still uses upstart).
  * Ceph daemons run as the 'ceph' user instead of root.
  * On Red Hat distros, there is also an SELinux policy.

- RADOS:

  * The RADOS cache tier can now proxy write operations to the base
tier, allowing writes to be handled without forcing migration of
an object into the cache.
  * The SHEC erasure coding support is no longer flagged as
experimental. SHEC trades some additional storage space for faster
repair.
  * There is now a unified queue (and thus prioritization) of client
IO, recovery, scrubbing, and snapshot trimming.
  * There have been many improvements to low-level repair tooling
(ceph-objectstore-tool).
  * The internal ObjectStore API has been significantly cleaned up in order
to facilitate new storage backends like NewStore.
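
  As an illustration of the SHEC change, a shec erasure-code profile can
  now be created without any experimental-feature flag.  A hedged sketch
  (the profile and pool names and the k/m/c values are arbitrary choices,
  not from the release notes; check ``ceph osd erasure-code-profile set
  --help`` on your version)::

    PROFILE=myshec        # hypothetical profile name
    POOL=ecpool           # hypothetical pool name

    if command -v ceph >/dev/null 2>&1; then
        # k data chunks, m coding chunks; c tunes how many chunks must be
        # read during repair (smaller c = faster recovery, more space)
        ceph osd erasure-code-profile set "$PROFILE" \
            plugin=shec k=4 m=3 c=2
        ceph osd pool create "$POOL" 64 64 erasure "$PROFILE"
    fi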

- RGW:

  * The Swift API now supports object expiration.
  * There are many Swift API compatibility improvements.

- RBD:

  * The ``rbd du`` command shows actual usage (quickly, when
object-map is enabled).
  * The object-map feature has seen many stability improvements.
  * Object-map and exclusive-lock features can be enabled or disabled
dynamically.
  * You can now store user metadata and set persistent librbd options
associated with individual images.
  * The new deep-flatten feature allows flattening of a clone and all
of its snapshots.  (Previously snapshots could not be flattened.)
  * The export-diff command is now faster (it uses aio).  There is also
a new fast-diff feature.
  * The --size argument can be specified with a suffix for units
(e.g., ``--size 64G``).
  * There is a new ``rbd status`` command that, for now, shows who has
the image open/mapped.
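
  For reference, the RBD changes above map onto ``rbd`` CLI invocations
  roughly like the following sketch (``rbd/myimage`` and the metadata
  key/value are made-up examples; verify exact subcommand spellings with
  ``rbd help`` on your version)::

    IMG=rbd/myimage   # hypothetical pool/image name

    if command -v rbd >/dev/null 2>&1; then
        rbd create "$IMG" --size 64G            # size suffix now accepted
        rbd du "$IMG"                           # actual usage (fast with object-map)
        rbd status "$IMG"                       # who has the image open/mapped
        rbd feature enable "$IMG" object-map    # toggle features dynamically
        rbd feature disable "$IMG" object-map
        rbd image-meta set "$IMG" owner alice   # per-image user metadata
    fi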

- CephFS:

  * You can now rename snapshots.
  * There have been ongoing improvements around administration, diagnostics,
and the check and repair tools.
  * The caching and revocation of client cache state due to unused
inodes has been dramatically improved.
  * The ceph-fuse client behaves better on 32-bit hosts.

Distro compatibility
--------------------

We have decided to drop support for many older distributions so that we can
move to a newer compiler toolchain (e.g., C++11).  Although it is still possible
to build Ceph on older distributions by installing backported development tools,
we are not building and publishing release packages for ceph.com.

We now build packages for:

* CentOS 7 or later.  We have dropped support for CentOS 6 (and other
  RHEL 6 derivatives, like Scientific Linux 6).
* Debian Jessie 8.x or later.  Debian Wheezy 7.x's g++ has incomplete
  support for C++11 (and no systemd).
* Ubuntu Trusty 14.04 or later.  Ubuntu Precise 12.04 is no longer
  supported.
* Fedora 22 or later.

Upgrading from Firefly
----------------------

Upgrading directly from Firefly v0.80.z is not recommended.  It is
possible to do a direct upgrade, but not without downtime.  We
recommend that clusters are first upgraded to Hammer v0.94.4 or a
later v0.94.z release; only then is it possible to upgrade to
Infernalis 9.2.z for an online upgrade (see below).

To do an offline upgrade directly from Firefly, all Firefly OSDs must
be stopped and marked down before any Infernalis OSDs will be allowed
to start up.  This fencing is enforced by the Infernalis monitor, so
use an upgrade procedure like:

  1. Upgrade Ceph on monitor hosts
  2. Restart all ceph-mon daemons
  3. Upgrade Ceph on all OSD hosts
  4. Stop all ceph-osd daemons
  5. Mark all OSDs down with something like::

       ceph osd down `seq 0 1000`
  6. Start all ceph-osd daemons
  7. Upgrade and restart remaining daemons (ceph-mds, radosgw)
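
The seven steps above can be sketched as a shell script.  This is a
dry-run illustration only: the host lists are hypothetical, the package
and service commands vary by distro, and ``run`` merely prints each
command so the ordering can be reviewed before anything is executed::

  #!/bin/sh
  # Offline Firefly -> Infernalis upgrade order, as a reviewable dry run.
  set -e

  MON_HOSTS="mon1 mon2 mon3"   # hypothetical monitor hosts
  OSD_HOSTS="osd1 osd2"        # hypothetical OSD hosts

  run() { echo "+ $*"; }       # dry-run helper; replace with real ssh/exec

  # 1+2: upgrade Ceph on monitor hosts, then restart all ceph-mon daemons
  for h in $MON_HOSTS; do run ssh "$h" "yum -y update ceph"; done
  for h in $MON_HOSTS; do run ssh "$h" "restart ceph-mon"; done

  # 3+4: upgrade Ceph on all OSD hosts, then stop every ceph-osd daemon
  for h in $OSD_HOSTS; do run ssh "$h" "yum -y update ceph"; done
  for h in $OSD_HOSTS; do run ssh "$h" "stop ceph-osd"; done

  # 5: mark all OSDs down; the Infernalis monitors fence old Firefly OSDs
  run ceph osd down $(seq 0 1000)

  # 6+7: start the (now Infernalis) OSDs, then handle ceph-mds and radosgw
  for h in $OSD_HOSTS; do run ssh "$h" "start ceph-osd"; done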

Upgrading from Hammer
---------------------

* For all distributions that support systemd (CentOS 7, Fedora, Debian
  Jessie 8.x, OpenSUSE), ceph daemons are now managed using native systemd
  files instead of the legacy sysvinit scripts.  For example,::

systemctl start ceph.target   # start all daemons
systemctl status ceph-osd@12  # check status of osd.12

  The main notable distro that is *not* yet using systemd is Ubuntu Trusty
  14.04.  (The next Ubuntu LTS, 16.04, will use systemd instead of upstart.)

* Ceph daemons now run as user and group ``ceph`` by default.  The
  ceph user has a static UID assigned by Fedora and Debian (also used
  by derivative distributions like RHEL/CentOS and Ubuntu).  On SUSE
  the ceph user will currently get a