Re: [ceph-users] PGs issue
Hello, Nick! Thank you for your reply!

I have tested with the number of replicas set to both 2 and 3, by setting 'osd pool default size = (2|3)' in the .conf file. Either I'm doing something incorrectly, or both settings produce the same result. Can you give any troubleshooting advice? I have purged and re-created the cluster several times, but the result is the same.

Thank you for your help!

Regards,
Bogdan

On Thu, Mar 19, 2015 at 11:29 PM, Nick Fisk n...@fisk.me.uk wrote:

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bogdan SOLGA
> Sent: 19 March 2015 20:51
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] PGs issue
>
> Hello, everyone!
>
> I have created a Ceph cluster (v0.87.1-1) using the info from the 'Quick deploy' page, with the following setup:
>
> • 1 x admin / deploy node;
> • 3 x OSD and MON nodes;
>   o each OSD node has 2 x 8 GB HDDs;
>
> The setup was made using VirtualBox images, on Ubuntu 14.04.2. After performing all the steps, the 'ceph health' output lists the cluster in the HEALTH_WARN state, with the following details:
>
> HEALTH_WARN 64 pgs degraded; 64 pgs stuck degraded; 64 pgs stuck unclean; 64 pgs stuck undersized; 64 pgs undersized; too few pgs per osd (10 < min 20)
>
> The output of 'ceph -s':
>
>     cluster b483bc59-c95e-44b1-8f8d-86d3feffcfab
>      health HEALTH_WARN 64 pgs degraded; 64 pgs stuck degraded; 64 pgs stuck unclean; 64 pgs stuck undersized; 64 pgs undersized; too few pgs per osd (10 < min 20)
>      monmap e1: 3 mons at {osd-003=192.168.122.23:6789/0,osd-002=192.168.122.22:6789/0,osd-001=192.168.122.21:6789/0}, election epoch 6, quorum 0,1,2 osd-001,osd-002,osd-003
>      osdmap e20: 6 osds: 6 up, 6 in
>       pgmap v36: 64 pgs, 1 pools, 0 bytes data, 0 objects
>             199 MB used, 18166 MB / 18365 MB avail
>                   64 active+undersized+degraded
>
> I have tried to increase the pg_num and pgp_num to 512, as advised here, but Ceph refused to do that, with the following error:
>
> Error E2BIG: specified pg_num 512 is too large (creating 384 new PGs on ~6 OSDs exceeds per-OSD max of 32)
>
> After changing the pg*_num to 256, as advised here, the warning was changed to:
>
> health HEALTH_WARN 256 pgs degraded; 256 pgs stuck unclean; 256 pgs undersized
>
> What is the issue behind these warnings? And what do I need to do to fix it?

It's basically telling you that your currently available OSDs don't meet the requirements to suit the number of replicas you have requested. What replica size have you configured for that pool?

> I'm a newcomer in the Ceph world, so please don't shoot me if this issue has been answered / discussed countless times before :) I have searched the web and the mailing list for the answers, but I couldn't find a valid solution. Any help is highly appreciated.
>
> Thank you!
>
> Regards,
> Bogdan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
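[Editor's note] Following Nick's question, the pool's replica size can be checked and changed at runtime; a minimal sketch, assuming the default 'rbd' pool from the output above. Note that 'osd pool default size' in the .conf only applies to pools created after it is set, which may be why changing it appeared to have no effect on the existing pool:

```shell
# Show the configured replica count for the pool
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# Change the replica count of an existing pool at runtime
ceph osd pool set rbd size 2
```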
[ceph-users] OSD remains down
there was a blackout and one of my osds remains down. I have noticed that the journal partition and the data partition are no longer shown, so the device cannot be mounted…

   8      114    5241856 sdh2
   8      128 3906249728 sdi
   8      129 3901005807 sdi1
   8      130    5241856 sdi2
   8      144 3906249728 sdj
   8      145 3901005807 sdj1
   8      146    5241856 sdj2
   8      192 3906249728 sdm
   8      176 3906249728 sdl
   8      177 3901005807 sdl1
   8      178    5241856 sdl2
   8      160 3906249728 sdk
   8      161 3901005807 sdk1
   8      162    5241856 sdk2
 253        0   52428800 dm-0
 253        1    4194304 dm-1
 253        2   37588992 dm-2

The device is /dev/sdm and the osd is number 110, so what does that mean? That I have lost everything in OSD 110?

Thanks

/dev/mapper/rhel-root   50G  4.4G   46G   9% /
devtmpfs               126G     0  126G   0% /dev
tmpfs                  126G   92K  126G   1% /dev/shm
tmpfs                  126G   11M  126G   1% /run
tmpfs                  126G     0  126G   0% /sys/fs/cgroup
/dev/sda1              494M  165M  330M  34% /boot
/dev/sdj1              3.7T  220M  3.7T   1% /var/lib/ceph/osd/ceph-80
/dev/mapper/rhel-home   36G   49M   36G   1% /home
/dev/sdg1              3.7T  256M  3.7T   1% /var/lib/ceph/osd/ceph-50
/dev/sdd1              3.7T  320M  3.7T   1% /var/lib/ceph/osd/ceph-20
/dev/sdc1              3.7T  257M  3.7T   1% /var/lib/ceph/osd/ceph-10
/dev/sdi1              3.7T  252M  3.7T   1% /var/lib/ceph/osd/ceph-70
/dev/sdl1              3.7T  216M  3.7T   1% /var/lib/ceph/osd/ceph-100
/dev/sdh1              3.7T  301M  3.7T   1% /var/lib/ceph/osd/ceph-60
/dev/sde1              3.7T  268M  3.7T   1% /var/lib/ceph/osd/ceph-30
/dev/sdf1              3.7T  299M  3.7T   1% /var/lib/ceph/osd/ceph-40
/dev/sdb1              3.7T  244M  3.7T   1% /var/lib/ceph/osd/ceph-0
/dev/sdk1              3.7T  240M  3.7T   1% /var/lib/ceph/osd/ceph-90
[root@capricornio ~]#

  0  3.63  osd.0    up    1
 10  3.63  osd.10   up    1
 20  3.63  osd.20   up    1
 30  3.63  osd.30   up    1
 40  3.63  osd.40   up    1
 50  3.63  osd.50   up    1
 60  3.63  osd.60   up    1
 70  3.63  osd.70   up    1
 80  3.63  osd.80   up    1
 90  3.63  osd.90   up    1
100  3.63  osd.100  up    1
110  3.63  osd.110  down  0

Jesus Chavez
SYSTEMS ENGINEER - C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com http://www.cisco.com/
Think before you print.
This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here http://www.cisco.com/web/about/doing_business/legal/cri/index.html for Company Registration Information.
Re: [ceph-users] OSD remains down
Hi,

If the mounted device is not coming up, you can replace it with a new disk and Ceph will handle rebalancing the data. Here are the steps if you would like to replace the failed disk with a new one:

1. Mark the OSD out:
       ceph osd out osd.110

2. Remove the failed OSD from the CRUSH map; as soon as it is removed from the CRUSH map, the recovery process will start:
       ceph osd crush remove osd.110

3. Delete the keyring for that OSD and finally remove the OSD:
       ceph auth del osd.110
       ceph osd rm 110

4. Once recovery is done and the ceph status is active+clean, remove the old drive and insert the new drive, say /dev/sdb.

5. Now create the OSD using ceph-deploy (or the way you added OSDs at first):
       ceph-deploy osd create node:/dev/sdb --zap-disk

Thanks
Sahana

On Fri, Mar 20, 2015 at 12:10 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

> there was a blackout and one of my osds remains down, I have noticed that the journal partition and data partition are not showed anymore so the device cannot be mounted… the device is /dev/sdm and the osd is number 110, so what does that mean? that I have lost everything in OSD 110? Thanks
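[Editor's note] The steps above can be collected into one sequence; a sketch using the osd.110 and /dev/sdb examples from this thread (the node name is a placeholder, and the commands assume an admin keyring on the host running them):

```shell
# 1. Mark the OSD out so Ceph stops mapping data to it
ceph osd out osd.110

# 2. Remove it from the CRUSH map; recovery starts at this point
ceph osd crush remove osd.110

# 3. Delete its authentication key, then remove the OSD id
ceph auth del osd.110
ceph osd rm 110

# 4. After 'ceph -s' reports active+clean, swap the physical drive,
#    then re-create the OSD (node name is a placeholder):
ceph-deploy osd create <node>:/dev/sdb --zap-disk
```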
Re: [ceph-users] PGs issue
Hi Bogdan,

Please paste the output of `ceph osd dump` and `ceph osd tree`.

Thanks
Sahana

On Fri, Mar 20, 2015 at 11:47 AM, Bogdan SOLGA bogdan.so...@gmail.com wrote:

> Hello, Nick! Thank you for your reply! I have tested with the number of replicas set to both 2 and 3, by setting the 'osd pool default size = (2|3)' in the .conf file. Either I'm doing something incorrectly, or both settings produce the same result. Can you give any troubleshooting advice? I have purged and re-created the cluster several times, but the result is the same. Thank you for your help! Regards, Bogdan
Re: [ceph-users] PHP Rados failed in read operation if object size is large (say more than 10 MB )
If I run from the command prompt, it gives the below error in $piece = rados_read($ioRados, 'TEMP_object', $pieceSize['psize'], 0);

    Segmentation fault (core dumped)

I have tried a new version of librados too:

    php --ri rados

    rados
    Rados => enabled
    Rados extension version => 0.9.3
    librados version (linked) => 0.69.1
    librados version (compiled) => 0.69.1
    Maximum length snapshot name => 64
    Maximum snapshots per pool => 256

Also tried a new version of php too:

    php5 -v
    PHP 5.6.6-1+deb.sury.org~trusty+1 (cli) (built: Feb 20 2015 11:22:10)
    Copyright (c) 1997-2015 The PHP Group
    Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies

    php5-fpm -v
    PHP 5.6.6-1+deb.sury.org~trusty+1 (fpm-fcgi) (built: Feb 20 2015 11:27:03)
    Copyright (c) 1997-2015 The PHP Group
    Zend Engine v2.6.0, Copyright (c) 1998-2015 Zend Technologies

On Wed, Mar 18, 2015 at 11:00 AM, Gaurang Vyas gdv...@gmail.com wrote:

> Used from the MASTER branch.
>
> /etc/php5/cli/conf.d/rados.ini
> rados
> librados version (linked) => 0.69.0
> librados version (compiled) => 0.69.0
>
> Seems like the error is due to rados_osd_op_timeout or rados_mon_op_timeout
>
> On Mon, Mar 16, 2015 at 7:26 PM, Wido den Hollander w...@42on.com wrote:
>
>> On 03/16/2015 01:55 PM, Gaurang Vyas wrote:
>>> running on ubuntu with nginx + php-fpm
>>>
>>> <?php
>>> $rados = rados_create('admin');
>>> rados_conf_read_file($rados, '/etc/ceph/ceph.conf');
>>> rados_conf_set($rados, 'keyring', '/etc/ceph/ceph.client.admin.keyring');
>>>
>>> $temp = rados_conf_get($rados, 'rados_osd_op_timeout');
>>> echo 'osd '; echo $temp;
>>> $temp = rados_conf_get($rados, 'client_mount_timeout');
>>> echo 'client '; echo $temp;
>>> $temp = rados_conf_get($rados, 'rados_mon_op_timeout');
>>> echo 'mon '; echo $temp;
>>>
>>> $err = rados_connect($rados);
>>> $ioRados = rados_ioctx_create($rados, 'dev_whereis');
>>> $pieceSize = rados_stat($ioRados, 'TEMP_object');
>>> var_dump($pieceSize);
>>> $piece = rados_read($ioRados, 'TEMP_object', $pieceSize['psize'], 0);
>>
>> So what is the error exactly? Are you running phprados from the master branch on Github?
>>
>>> echo $piece;
>>> ?>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant
Phone: +31 (0)20 700 9902
Skype: contact42on
[ceph-users] OSD Forece Removal
Hi all, can anybody tell me how I can force-delete osds? The thing is that one node got corrupted because of the outage, so there is no way to get those osds up and back. Is there any way to force the removal from the ceph-deploy node?

Thanks

Jesus Chavez
SYSTEMS ENGINEER - C.SALES
jesch...@cisco.com
Re: [ceph-users] PGs issue
Hello, Sahana!

The output of the requested commands is listed below:

admin@cp-admin:~/safedrive$ ceph osd dump
epoch 26
fsid 7db3cf23-ddcb-40d9-874b-d7434bd8463d
created 2015-03-20 07:53:37.948969
modified 2015-03-20 08:11:18.813790
flags
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 26 flags hashpspool stripe_width 0
max_osd 6
osd.0 up in weight 1 up_from 4 up_thru 24 down_at 0 last_clean_interval [0,0) 192.168.122.21:6800/10437 192.168.122.21:6801/10437 192.168.122.21:6802/10437 192.168.122.21:6803/10437 exists,up c6f241e1-2e98-4fb5-b376-27bade093428
osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.122.21:6805/11079 192.168.122.21:6806/11079 192.168.122.21:6807/11079 192.168.122.21:6808/11079 exists,up a4f2aeea-4e45-4d5f-ab9e-dff8295fb5ea
osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.122.22:6800/9375 192.168.122.22:6801/9375 192.168.122.22:6802/9375 192.168.122.22:6803/9375 exists,up f879ef15-7c9a-41a8-88a6-cde013dc2d07
osd.3 up in weight 1 up_from 14 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.122.22:6805/10008 192.168.122.22:6806/10008 192.168.122.22:6807/10008 192.168.122.22:6808/10008 exists,up 99b3ff05-78b9-4f9f-a8f1-dbead9baddc6
osd.4 up in weight 1 up_from 17 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.122.23:6800/9158 192.168.122.23:6801/9158 192.168.122.23:6802/9158 192.168.122.23:6803/9158 exists,up 9217fcdd-201b-47c1-badf-b352a639d122
osd.5 up in weight 1 up_from 20 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.122.23:6805/9835 192.168.122.23:6806/9835 192.168.122.23:6807/9835 192.168.122.23:6808/9835 exists,up ec2c4764-5e30-431b-bc3e-755a7614b90d

admin@cp-admin:~/safedrive$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0               host osd-001
0       0                       osd.0   up      1
1       0                       osd.1   up      1
-3      0               host osd-002
2       0                       osd.2   up      1
3       0                       osd.3   up      1
-4      0               host osd-003
4       0                       osd.4   up      1
5       0                       osd.5   up      1

Please let me know if there's anything else I can / should do. Thank you very much!

Regards,
Bogdan

On Fri, Mar 20, 2015 at 9:17 AM, Sahana shna...@gmail.com wrote:

> Hi Bogdan,
>
> Please paste the output of `ceph osd dump` and `ceph osd tree`.
>
> Thanks
> Sahana
Re: [ceph-users] 'pgs stuck unclean ' problem
Hi,

On 03/20/2015 01:58 AM, houguanghua wrote:
> Dear all,
>
> Ceph 0.72.2 is deployed in three hosts. But the ceph's status is HEALTH_WARN. The status is as follows:
>
> # ceph -s
>     cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
>      health HEALTH_WARN 768 pgs degraded; 768 pgs stuck unclean; recovery 2/3 objects degraded (66.667%)
>      monmap e3: 3 mons at {ceph-node1=192.168.57.101:6789/0,ceph-node2=192.168.57.102:6789/0,ceph-node3=192.168.57.103:6789/0}, election epoch 34, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
>      osdmap e170: 9 osds: 9 up, 9 in
>       pgmap v1741: 768 pgs, 7 pools, 36 bytes data, 1 objects
>             367 MB used, 45612 MB / 45980 MB avail
>             2/3 objects degraded (66.667%)
>                  768 active+degraded

*snipsnap*

> Other info is depicted here.
>
> # ceph osd tree
> # id    weight  type name               up/down reweight
> -1      0       root default
> -7      0               rack rack03
> -4      0                       host ceph-node3
> 6       0                               osd.6   up      1
> 7       0                               osd.7   up      1
> 8       0                               osd.8   up      1
> -6      0               rack rack02
> -3      0                       host ceph-node2
> 3       0                               osd.3   up      1
> 4       0                               osd.4   up      1
> 5       0                               osd.5   up      1
> -5      0               rack rack01
> -2      0                       host ceph-node1
> 0       0                               osd.0   up      1
> 1       0                               osd.1   up      1
> 2       0                               osd.2   up      1

The weights for all OSD devices are 0. As a result, all OSDs are considered unusable by Ceph and are not considered for storing objects. This problem usually occurs in test setups with very small OSD devices. If this is the case in your setup, you can adjust the weight of the OSDs or use larger devices. If your devices should have a sufficient size, you need to check why the weights of the OSDs are not adjusted accordingly.

Best regards,
Burkhard
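[Editor's note] Following up on Burkhard's point, one workaround for a small test setup is to reweight the OSDs by hand; a sketch, not taken from the thread's replies, which assigns a nominal equal weight to each of the nine OSDs listed above (on real hardware the weight should instead reflect each device's capacity in TB):

```shell
# Give each tiny test OSD a nominal CRUSH weight of 1 so placement works
for id in $(seq 0 8); do
    ceph osd crush reweight "osd.${id}" 1
done
```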
Re: [ceph-users] SSD Hardware recommendation
> On 19 Mar 2015, at 08:17, Christian Balzer ch...@gol.com wrote:
>
> On Wed, 18 Mar 2015 08:59:14 +0100 Josef Johansson wrote:
>
>> Hi,
>>
>>> On 18 Mar 2015, at 05:29, Christian Balzer ch...@gol.com wrote:
>>>
>>> Hello,
>>>
>>> On Wed, 18 Mar 2015 03:52:22 +0100 Josef Johansson wrote:
>>>
>>> [snip]
>>>
>>>> We thought of doing a cluster with 3 servers, and any recommendation of supermicro servers would be appreciated.
>>>
>>> Why 3, replication of 3? With Intel SSDs and diligent (SMART/NAGIOS) wear level monitoring I'd personally feel safe with a replication factor of 2.
>>
>> I've seen recommendations of replication 2! The Intel SSDs are indeed endurable. This is only with Intel SSDs, I assume?
>
> From the specifications and reviews I've seen, the Samsung 845DC PRO, the SM 843T and even more so the SV843 (http://www.samsung.com/global/business/semiconductor/product/flash-ssd/overview — don't you love it when the same company has different, competing products?) should do just fine when it comes to endurance and performance. Alas I have no first-hand experience with either, just the (read-optimized) 845DC EVO.

The 845DC Pro does look really nice, comparable with the S3700 in TBW even. The price is what really does it, as it's almost a third compared with the S3700. With a replication set of 3 it's the same price as the S3610 with a replication set of 2. How enterprise-ish is it to run with a replication set of 2 according to the Inktank guys? Really thinking of going with the 845DC Pro here actually.

>> This 1U http://www.supermicro.com.tw/products/system/1U/1028/SYS-1028U-TR4T_.cfm is really nice, missing the SuperDOM peripherals though..
>
> While I certainly see use cases for SuperDOM, not all models have 2 connectors, so no chance to RAID1 things, thus the need to _definitely_ have to pull the server out (and re-install the OS) should it fail.

Yeah, I fancy using hot swap for OS disks, and with 24 front hot-swap bays there's plenty of room to have a couple of OS drives =) The 2U also has the possibility of an extra 2x10GbE card, totalling 4x10GbE, which is needed. So you really get 8 drives if you need two for OS.

And the rails.. don't get me started, but lately they do just snap into the racks! No screws needed. That's a refresh from the earlier 1U SM rails.

> Ah, the only 1U servers I'm currently deploying from SM are older ones, so still no snap-in rails. Everything 2U has been that way for at least 2 years, though. ^^

It's awesome, I tell you. :)

Cheers,
Josef

> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
Re: [ceph-users] PGs issue
Hi Bogdan,

Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for osds is 1TB. But as Nick said, Ceph OSDs must be at least 10GB to get a weight of 0.01. Here is the snippet from the crushmaps section of the ceph docs:

    Weighting Bucket Items

    Ceph expresses bucket weights as doubles, which allows for fine weighting. A weight is the relative difference between device capacities. We recommend using 1.00 as the relative weight for a 1TB storage device. In such a scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 3.00 would represent approximately 3TB. Higher level buckets have a weight that is the sum total of the leaf items aggregated by the bucket.

Thanks
Sahana

On Fri, Mar 20, 2015 at 2:08 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote:

> Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'.
>
> Is this information clearly stated in the documentation, and I have missed it? In case it isn't, I think it would be recommended to add it, as the issue might be encountered by other users as well.
>
> Kind regards,
> Bogdan
>
> On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote:
>
>> I see the problem: as your OSDs are only 8GB, they have a zero weight. I think the minimum size you can get away with is 10GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running:
>>
>> ceph osd crush reweight osd.X 1
>>
>> for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.
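[Editor's note] As a sanity check of the "size in TB with 2 decimal places" rule Nick describes, the arithmetic can be tried directly; a sketch where 3901005807 KB is the size of a ~4 TB data partition (the figure from the 'OSD remains down' thread in this digest) and 1073741824 KB is 1 TiB:

```shell
# A ~4 TB data partition expressed as a CRUSH weight (capacity / 1 TiB);
# prints 3.63, matching the 'ceph osd tree' output in that thread.
awk 'BEGIN { printf "%.2f\n", 3901005807 / 1073741824 }'

# An 8 GB VirtualBox disk is far below the 0.01 granularity; prints 0.0078.
awk 'BEGIN { printf "%.4f\n", 8388608 / 1073741824 }'
```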
Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
Hi,

On 03/19/2015 10:41 PM, Nick Fisk wrote:
> I'm looking at trialling OSDs with a small flashcache device over them to hopefully reduce the impact of metadata updates when doing small block io. Inspiration from here:
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083
>
> One thing I suspect will happen is that when the OSD node starts up, udev could possibly mount the base OSD partition instead of the flashcached device, as the base disk will have the ceph partition uuid type. This could result in quite nasty corruption.

I ran into this problem with an enhanceio-based cache for one of our database servers. I think you can prevent this problem by using bcache, which is also integrated into the official kernel tree. It does not act as a drop-in replacement, but creates a new device that is only available if the cache is initialized correctly. A GPT partition table on the bcache device should be enough to allow the standard udev rules to kick in.

I haven't used bcache in this scenario yet, and I cannot comment on its speed and reliability compared to other solutions. But from the operational point of view it is safer than enhanceio/flashcache.

Best regards,
Burkhard
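[Editor's note] Not from the thread itself, but a minimal bcache setup along the lines Burkhard suggests might look like this (a sketch assuming bcache-tools is installed; /dev/sdb and /dev/nvme0n1 are placeholder device names, and both make-bcache commands destroy existing data):

```shell
# Format the backing (data) device and the cache (SSD) device for bcache
make-bcache -B /dev/sdb
make-bcache -C /dev/nvme0n1

# Attach the cache set to the backing device via sysfs; the UUID comes
# from 'bcache-super-show /dev/nvme0n1'
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

# Create the OSD on the composite device, so udev/ceph-disk only ever
# sees /dev/bcache0 and never the raw partition underneath
ceph-disk prepare /dev/bcache0
```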
Re: [ceph-users] how to compute Ceph durability?
Hi Ghislain,

You will find more information about tools and methods at

On 20/03/2015 11:47, ghislain.cheval...@orange.com wrote:
> Hi all,
>
> I would like to compute the durability of data stored in a ceph environment according to the cluster topology (failure domains) and the data resiliency (replication/erasure coding). Does a tool exist?
>
> Best regards
>
> Ghislain Chevalier
> ORANGE
> +33299124432
> +33788624370
> ghislain.cheval...@orange.com
>
> Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.

--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] how to compute Ceph durability?
(that's what happens when typing Control-Enter V instead of Control-V Enter ;-)

On 20/03/2015 11:50, Loic Dachary wrote:
> Hi Ghislain,
>
> You will find more information about tools and methods at
> https://wiki.ceph.com/Development/Reliability_model/Final_report
>
> Enjoy !
>
> On 20/03/2015 11:47, ghislain.cheval...@orange.com wrote:
>> Hi all,
>>
>> I would like to compute the durability of data stored in a ceph environment according to the cluster topology (failure domains) and the data resiliency (replication/erasure coding). Does a tool exist?
>>
>> Best regards
>>
>> Ghislain Chevalier
>> ORANGE

--
Loïc Dachary, Artisan Logiciel Libre
[ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean
Hello guys,

My Ceph cluster lost data and now it's not recovering. The problem occurred while Ceph was performing recovery with one of the nodes down. All the nodes are back up now, but Ceph is showing PGs as incomplete, unclean, and recovering. I have tried several things to recover them — scrub, deep-scrub, pg repair, changing primary affinity and then scrubbing, osd_pool_default_size, etc. — but no luck. Could you please advise how to recover the PGs and get back to HEALTH_OK?

# ceph -s
cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
health HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs stuck inactive; 23 pgs stuck unclean; 2 requests are blocked > 32 sec; recovery 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
monmap e3: 3 mons at {xxx=:6789/0,xxx=:6789:6789/0,xxx=:6789:6789/0}, election epoch 1474, quorum 0,1,2 xx,xx,xx
osdmap e261536: 239 osds: 239 up, 238 in
pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
20316 GB used, 844 TB / 864 TB avail
531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
1 creating
18409 active+clean
3 active+recovering
19 incomplete

# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
10.70 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.534911 0'0 261536:1015 [153,140,80] 153 [153,140,80] 153 0'0 2015-03-12 17:59:43.275049 0'0 2015-03-09 17:55:58.745662
3.dde 68 66 0 66 552861709 297 297 incomplete 2015-03-20 12:19:49.584839 33547'297 261536:228352 [174,5,179] 174 [174,5,179] 174 33547'297 2015-03-12 14:19:15.261595 28522'43 2015-03-11 14:19:13.894538
5.a20 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.560756 0'0 261536:897 [214,191,170] 214 [214,191,170] 214 0'0 2015-03-12 17:58:29.257085 0'0 2015-03-09 17:55:07.684377
13.1b6 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.846253 0'0 261536:1050 [0,176,131] 0 [0,176,131] 0 0'0 2015-03-12 18:00:13.286920 0'0
2015-03-09 17:56:18.715208
7.25b 16 0 0 0 67108864 16 16 incomplete 2015-03-20 12:19:49.639102 27666'16 261536:4777 [194,145,45] 194 [194,145,45] 194 27666'16 2015-03-12 17:59:06.357864 2330'3 2015-03-09 17:55:30.754522
5.190 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.742698 0'0 261536:25410 [212,43,131] 212 [212,43,131] 212 0'0 2015-03-12 13:51:37.777026 0'0 2015-03-11 13:51:35.406246
3.a2f 0 0 0 0 0 0 0 creating 2015-03-20 12:42:15.586372 0'0 0:0 [] -1 [] -1 0'0 0.00 0'0 0.00
7.298 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.566966 0'0 261536:900 [187,95,225] 187 [187,95,225] 187 27666'13 2015-03-12 17:59:10.308423 2330'4 2015-03-09 17:55:35.750109
3.a5a 77 87 261 87 623902741 325 325 active+recovering 2015-03-20 10:54:57.443670 33569'325 261536:182464 [150,149,181] 150 [150,149,181] 150 33569'325 2015-03-12 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
1.1e7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610547 0'0 261536:772 [175,182] 175 [175,182] 175 0'0 2015-03-12 17:55:45.203232 0'0 2015-03-09 17:53:49.694822
3.774 79 0 0 0 645136397 339 339 incomplete 2015-03-20 12:19:49.821708 33570'339 261536:166857 [162,39,161] 162 [162,39,161] 162 33570'339 2015-03-12 14:49:03.869447 2226'2 2015-03-09 13:46:49.783950
3.7d0 78 0 0 0 609222686 376 376 incomplete 2015-03-20 12:19:49.534004 33538'376 261536:182810 [117,118,177] 117 [117,118,177] 117 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288
3.d60 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.647196 0'0 261536:833 [154,172,1] 154 [154,172,1] 154 33552'321 2015-03-12
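For incomplete/unfound PGs like the ones in the dump above, there is a fairly standard triage sequence (not from this thread; a sketch using pg ids from the dump — `mark_unfound_lost` discards data and is a last resort). Printed here as a dry run:

```shell
# Dry-run checklist of the usual incomplete/unfound PG triage commands.
# The pg ids 3.dde and 3.a5a come from the dump above; nothing is executed.
pg_triage() {
  echo "ceph health detail                      # which PGs, which OSDs are involved"
  echo "ceph pg dump_stuck inactive             # list PGs stuck inactive"
  echo "ceph pg 3.dde query                     # detailed peering/recovery state"
  echo "ceph pg 3.a5a list_missing              # enumerate the unfound objects"
  echo "ceph pg 3.a5a mark_unfound_lost revert  # LAST RESORT: give up on unfound objects"
}
pg_triage
```

`ceph pg <pgid> query` is usually the most informative: its `peering_blocked_by` / `down_osds_we_would_probe` fields say which (possibly dead) OSDs the PG is still waiting for.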
[ceph-users] how to compute Ceph durability?
Hi all,

I would like to compute the durability of data stored in a Ceph environment according to the cluster topology (failure domains) and the data resiliency (replication/erasure coding). Does a tool exist?

Best regards

Ghislain Chevalier
ORANGE
+33299124432 +33788624370
ghislain.cheval...@orange.com
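Short of a dedicated tool, a back-of-envelope estimate is possible by hand. The following is my own first-order approximation, not something from the thread or the linked report: assume $N$ independent disks, an annualized disk failure rate $f$, a recovery window $t_r$ (expressed in years), and $R$-way replication; ignore CRUSH placement spread and correlated failures.

```latex
% Annualized probability of losing a piece of data under R-way replication:
% one disk fails (N f per year), and the remaining R-1 replicas all fail
% within the recovery window t_r before re-replication completes.
P_{\text{loss/year}} \;\approx\; N \, f \, \left( f \, t_r \right)^{R-1}
```

For example, with $N = 100$, $f = 0.04$/year, $t_r = 24\,\text{h} \approx 0.0027$ years and $R = 3$, this gives on the order of $5 \times 10^{-8}$ per year — purely illustrative; the Markov-chain models in the report linked elsewhere in this thread are more rigorous.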
Re: [ceph-users] PGs issue
Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'.

Is this information clearly stated in the documentation and I have missed it? If it isn't, I think it would be worth adding, as other users are likely to run into the same issue.

Kind regards, Bogdan

On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote:

I see the problem: as your OSDs are only 8 GB, they get a zero weight. I think the minimum size you can get away with in Ceph is 10 GB, as the size is measured in TB and only has 2 decimal places. As a workaround, try running:

ceph osd crush reweight osd.X 1

for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.

admin@cp-admin:~/safedrive$ ceph osd tree
# id  weight  type name         up/down  reweight
-1    0       root default
-2    0         host osd-001
0     0           osd.0         up       1
1     0           osd.1         up       1
-3    0         host osd-002
2     0           osd.2         up       1
3     0           osd.3         up       1
-4    0         host osd-003
4     0           osd.4         up       1
5     0           osd.5         up       1
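The per-OSD reweighting Nick describes can be done in a small loop. A sketch, printed as a dry run (OSD ids 0–5 match the tree above; drop the `echo` to actually execute on a node with an admin keyring):

```shell
# The OSDs show CRUSH weight 0 because 8 GB rounds to 0.00 in TB-denominated
# weights. Print the reweight command for each of osd.0 .. osd.5.
reweight_all() {
  for id in 0 1 2 3 4 5; do
    echo "ceph osd crush reweight osd.${id} 1"
  done
}
reweight_all
```

Giving every OSD the same nominal weight (1) is fine on a uniform test cluster; on a mixed cluster the weights should stay proportional to capacity.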
Re: [ceph-users] OSD Force Removal
- Mail original -

Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?

Hi,
Try the manual procedure:
* http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com

Think before you print. This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here for Company Registration Information.
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. I remembered that zlib is slower but more 'safe for old kernels', so I tried that:

find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec btrfs filesystem defragment -v -czlib -- {} +

After much, much waiting, all files had been rewritten, but the OSD still gets stuck at the same point. I've now unset the compress attribute on all files and started the defragment process again, but I'm not too hopeful, since the files must be readable/writeable given that I didn't get a failure during the defrag process.

find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec chattr -c -- {} +
find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec btrfs filesystem defragment -v -- {} +

(latter command still running)

Any other ideas at all? In the absence of the problem being spelled out to me with an error of some sort, I'm not sure how to troubleshoot further. Is it safe to upgrade a problematic cluster, when the time comes, in case this ultimately is a Ceph bug which is fixed in something later than 0.80.9?

-----Original Message-----
From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: 18 March 2015 14:01
To: Chris Murray
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

On Wed, Mar 18, 2015 at 3:28 AM, Chris Murray chrismurra...@gmail.com wrote:
Hi again Greg :-) No, it doesn't seem to progress past that point.
I started the OSD again a couple of nights ago:

2015-03-16 21:34:46.221307 7fe4a8aa7780 10 journal op_apply_finish 13288339 open_ops 1 -> 0, max_applied_seq 13288338 -> 13288339
2015-03-16 21:34:46.221445 7fe4a8aa7780 3 journal journal_replay: r = 0, op_seq now 13288339
2015-03-16 21:34:46.221513 7fe4a8aa7780 2 journal read_entry 3951706112 : seq 13288340 1755 bytes
2015-03-16 21:34:46.221547 7fe4a8aa7780 3 journal journal_replay: applying op seq 13288340
2015-03-16 21:34:46.221579 7fe4a8aa7780 10 journal op_apply_start 13288340 open_ops 0 -> 1
2015-03-16 21:34:46.221610 7fe4a8aa7780 10 filestore(/var/lib/ceph/osd/ceph-1) _do_transaction on 0x3142480
2015-03-16 21:34:46.221651 7fe4a8aa7780 15 filestore(/var/lib/ceph/osd/ceph-1) _omap_setkeys meta/16ef7597/infos/head//-1
2015-03-16 21:34:46.222017 7fe4a8aa7780 10 filestore oid: 16ef7597/infos/head//-1 not skipping op, *spos 13288340.0.1
2015-03-16 21:34:46.222053 7fe4a8aa7780 10 filestore header.spos 0.0.0
2015-03-16 21:34:48.096002 7fe49a5ac700 20 filestore(/var/lib/ceph/osd/ceph-1) sync_entry woke after 5.000178
2015-03-16 21:34:48.096037 7fe49a5ac700 10 journal commit_start max_applied_seq 13288339, open_ops 1
2015-03-16 21:34:48.096040 7fe49a5ac700 10 journal commit_start waiting for 1 open ops to drain

There's the success line for 13288339, like you mentioned, but not one for 13288340. Intriguing. So, those same 1755 bytes seem problematic every time the journal is replayed? Interestingly, there is a lot (in time, not exactly data mass or IOPs, but still more than 1755 bytes!) of activity while the log is at this line:

2015-03-16 21:34:48.096040 7fe49a5ac700 10 journal commit_start waiting for 1 open ops to drain

... but then the IO ceases and the log still doesn't go any further. I wonder why 13288339 doesn't have that same 'waiting for ... open ops to drain' line. Or the 'woke after' one, for that matter.
While there is activity on sdb, it 'pulses' every 10 seconds or so, like this:

sdb  20.00  0.00  3404.00  0  3404
sdb  16.00  0.00  2100.00  0  2100
sdb  10.00  0.00  1148.00  0  1148
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   1.00  0.00   496.00  0   496
sdb  32.00  0.00  4940.00  0  4940
sdb   8.00  0.00  1144.00  0  1144
sdb   1.00  0.00     4.00  0     4
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00     0.00  0     0
sdb   0.00  0.00
Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray chrismurra...@gmail.com wrote:
> Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. So, I remembered that ZLIB is slower, but is more 'safe for old kernels'. I try that:
> find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec btrfs filesystem defragment -v -czlib -- {} +
> After much, much waiting, all files have been rewritten, but the OSD still gets stuck at the same point. I've now unset the compress attribute on all files and started the defragment process again, but I'm not too hopeful since the files must be readable/writeable if I didn't get some failure during the defrag process.
> find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec chattr -c -- {} +
> find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f -o -type d \) -exec btrfs filesystem defragment -v -- {} +
> (latter command still running)
> Any other ideas at all? In the absence of the problem being spelled out to me with an error of some sort, I'm not sure how to troubleshoot further.

Not much, sorry.

> Is it safe to upgrade a problematic cluster, when the time comes, in case this ultimately is a CEPH bug which is fixed in something later than 0.80.9?

In general it should be fine since we're careful about backwards compatibility, but without knowing the actual issue I can't promise anything.
-Greg
Re: [ceph-users] Question Blackout
Hi!

We have experienced several blackouts on our small Ceph cluster. The most annoying problem is time desync just after a blackout: the mons do not start working until time is resynced, and even after a resync and a manual restart of the monitors, some PGs can get stuck in an inactive or peering state for a significant period of time; restarting the OSDs holding such PGs can unstick them.

Pavel.

On 18 March 2015 at 6:32, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Hi everyone, I am ready to launch Ceph in production, but there is one thing that keeps on my mind... If there were a blackout where all the Ceph nodes went off, what would really happen to the filesystem? Would it get corrupted? Or does Ceph have some kind of mechanism to survive something like that? Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
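The quorum problem Pavel describes comes from the monitors' clock-drift check: mons warn (and behave badly) once clocks drift beyond `mon_clock_drift_allowed`, 0.05 s by default. A small sketch that flags offenders given per-mon NTP offsets in seconds — the mon names and offsets below are made up; in practice you would feed it values taken from `ntpq -p` on each monitor host:

```shell
# Flag monitors whose clock offset exceeds mon_clock_drift_allowed (0.05 s).
# Input format: "<mon-name> <offset-in-seconds>" per line.
check_skew() {
  awk -v max=0.05 '{ o = ($2 < 0 ? -$2 : $2);
                     printf "%s %s (%.3fs)\n", $1, (o > max ? "SKEWED" : "ok"), o }'
}

# Sample (fabricated) offsets, e.g. as read after a blackout:
printf 'mon-a 0.002\nmon-b -0.480\nmon-c 0.010\n' | check_skew
```

`ceph health detail` also reports "clock skew detected on mon.X" once a mon is over the threshold, which is the quickest post-blackout check.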
Re: [ceph-users] OSD Force Removal
Yes, at this point I'd export the CRUSH map, edit it, and import it back in. What version are you running?

Robert LeBlanc
Sent from a mobile device, please excuse any typos.

On Mar 20, 2015 4:28 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Is that what you said?

[root@capricornio ~]# ceph auth del osd.9
entity osd.9 does not exist
[root@capricornio ~]# ceph auth del osd.19
entity osd.19 does not exist

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com

On Mar 20, 2015, at 4:13 PM, Robert LeBlanc rob...@leblancnet.us wrote:

Does it show DNE in the entry? That stands for "Does Not Exist". It will disappear on its own after a while. I don't know what the timeout is, but they have always gone away within 24 hours. I've edited the CRUSH map before, and I don't think it removed the OSD when it was already DNE; I just had to wait for it to go away on its own.

On Fri, Mar 20, 2015 at 3:55 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Maybe I should edit the crushmap and delete the osd... Is that a way to force them? Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433

On Mar 20, 2015, at 2:21 PM, Robert LeBlanc rob...@leblancnet.us wrote:

Removing the OSD from the CRUSH map and deleting the auth key is how you force-remove an OSD.
The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed.

On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Any idea how to force-remove? Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433

Begin forwarded message:
From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr
Date: March 20, 2015 at 3:49:11 AM CST
To: Jesus Chavez (jeschave) jesch...@cisco.com
Cc: ceph-users ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD Force Removal

Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?

Hi,
Try the manual procedure:
- http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com
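The export/edit/import cycle Robert mentions uses `crushtool`, which ships with Ceph. A dry-run sketch of the steps (filenames are placeholders):

```shell
# Print the CRUSH export/edit/import sequence for removing dead OSD entries.
# Nothing is executed here; run the real commands from a node with an admin
# keyring once you are sure of the edits.
crush_edit_steps() {
  echo "ceph osd getcrushmap -o crushmap.bin       # fetch compiled map from the mons"
  echo "crushtool -d crushmap.bin -o crushmap.txt  # decompile to editable text"
  echo "# edit crushmap.txt: delete the dead osd device and bucket entries"
  echo "crushtool -c crushmap.txt -o crushmap.new  # recompile"
  echo "ceph osd setcrushmap -i crushmap.new       # inject; clients receive the new map"
}
crush_edit_steps
```

Injecting a new CRUSH map can trigger significant data movement, so it is worth diffing `crushmap.txt` against your edits before recompiling.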
Re: [ceph-users] RADOS Gateway Maturity
Hi Jerry,

We are using RGW and RBD in our OpenStack clusters and as stand-alone clusters. We have six large clusters and are adding more. Most of the issues we have faced have been self-inflicted, such as not currently supporting bucket names that look like host names. Some S3 tools only work that way, which causes some of our developer customers grief. We are addressing that.

We have built extensive testing frameworks around S3 RGW testing, using OpenStack, AWS EC2, or Google Cloud Platform to dynamically spin up worker nodes that distribute load for stress and performance monitoring. I'm actually building a project called IQStack, to be released on GitHub soon, that does that plus other OpenStack testing for scalability.

Anyway, there may be some incompatibilities depending on the feature set, but most can be abstracted away and addressed. Just as a footnote, I finished running long load tests against AWS S3 and RGW. After tuning our load balancers, firewall rules, and a few other tweaks, I was able to get parity with AWS S3 up to 10GbE (the max size of the load balancer I was testing with). We use several CDNs on video clips, so my tests were with 2 MB byte-range requests.

Thanks, Chris

On Fri, Mar 20, 2015 at 6:06 PM, Craig Lewis cle...@centraldesktop.com wrote:

I have found a few incompatibilities, but so far they're all on the Ceph side. One example I remember was having to change the way we delete objects. The function we originally used fetches a list of object versions and deletes all versions. Ceph is implementing object versions now (I believe that'll ship with Hammer), so we had to call a different function to delete the object without iterating over the versions. AFAIK, that code should work fine if we pointed it at Amazon; I haven't tried it, though.

I've been using RGW (with replication) in production for 2 years now, although I'm not large. So far, all of my RGW issues have been Ceph issues.
Most of my issues are caused by my under-powered hardware, or shooting myself in the foot with aggressive optimizations. Things are better with my journals on SSD, but the best thing I did was slow down with my changes. For example, I have 7 OSD nodes and 72 OSDs. When I add new OSDs, I add a couple at a time instead of adding all the disks in a node at once. Guess how I learned that lesson. :-) On Wed, Mar 18, 2015 at 10:03 AM, Jerry Lam jerry@oicr.on.ca wrote: Hi Chris, Thank you for your reply. We are also thinking about using the S3 API but we are concerned about how compatible it is with the real S3. For instance, we would like to design the system using pre-signed URL for storing some objects. I read the ceph documentation, it does not mention if it supports it or not. My question is do you guys find that the code using the RADOS S3 API can easily run in Amazon S3 without any change? If no, how much effort it is needed to make it compatible? Best Regards, Jerry From: Chris Jones cjo...@cloudm2.com Date: Tuesday, March 17, 2015 at 4:39 PM To: Jerry Lam jerry@oicr.on.ca Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Subject: Re: [ceph-users] RADOS Gateway Maturity Hi Jerry, I currently work at Bloomberg and we currently have a very large Ceph installation in production and we use the S3 compatible API for rados gateway. We are also re-architecting our new RGW and evaluating a different Apache configuration for a little better performance. We only use replicas right now, no erasure coding yet. Actually, you can take a look at our current configuration at https://github.com/bloomberg/chef-bcpc. -Chris On Tue, Mar 17, 2015 at 10:40 AM, Jerry Lam jerry@oicr.on.ca wrote: Hi Ceph user, I’m new to Ceph but I need to use Ceph as the storage for the Cloud we are building in house. Did anyone use RADOS Gateway in production? How mature it is in terms of compatibility with S3 / Swift? Anyone can share their experience on it? 
Best Regards, Jerry

--
Best Regards,
Chris Jones
http://www.cloudm2.com
cjo...@cloudm2.com
(p) 770.655.0770

This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged or confidential or otherwise legally exempt from disclosure. If you are not the named addressee, you are not authorized to read, print, retain, copy or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately by e-mail and delete all copies of the message.
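On Jerry's pre-signed URL question: RGW accepted S3 v2 query-string authentication, where the URL carries `AWSAccessKeyId`, `Expires`, and a `Signature` that is the HMAC-SHA1 of a fixed string-to-sign. A hand-rolled sketch of that scheme — the keys, bucket, and host below are dummies, and real code would use an SDK such as boto rather than doing this by hand:

```shell
# Build an S3 v2 pre-signed GET URL by hand (dummy credentials and host).
# String-to-sign for query-string auth: "GET\n\n\n<Expires>\n/<bucket>/<key>"
AK="AKIAEXAMPLE"; SK="secretexample"
BUCKET="mybucket"; KEY="clip.mp4"; EXPIRES=1500000000

STRING_TO_SIGN="GET\n\n\n${EXPIRES}\n/${BUCKET}/${KEY}"
SIG=$(printf "$STRING_TO_SIGN" | openssl dgst -sha1 -hmac "$SK" -binary | openssl base64)
# URL-encode the +, / and = that can appear in base64 output
SIG=$(printf '%s' "$SIG" | sed -e 's/+/%2B/g' -e 's|/|%2F|g' -e 's/=/%3D/g')

URL="http://rgw.example.com/${BUCKET}/${KEY}?AWSAccessKeyId=${AK}&Expires=${EXPIRES}&Signature=${SIG}"
echo "$URL"
```

Anyone holding the URL can GET the object until the `Expires` timestamp, with no other credentials — which is what makes pre-signed URLs convenient for handing clips to CDNs.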
[ceph-users] Unable to create rbd snapshot on Centos 7
Hi guys,

I'm trying to test rbd snapshots on CentOS 7.

# rbd -p rbd ls
test-a
test-b
test-c
test-d
# rbd snap create rbd/test-b@snap
rbd: failed to create snapshot: (22) Invalid argument
2015-03-20 15:22:56.300731 7f78f7afe880 -1 librbd: failed to create snap id: (22) Invalid argument

I tried the same exact command on Ubuntu 14.04.2 LTS:

# rbd snap create rbd/test-a@snap
# rbd snap ls --image test-a
SNAPID NAME SIZE
2 snap 10240 MB

Does anyone have any clue?

Thank you, Gianfranco
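One documented cause of EINVAL on `rbd snap create` is a pool that already has pool-level snapshots (created via `ceph osd pool mksnap`): pool snapshots and self-managed RBD snapshots are mutually exclusive on the same pool. That is a guess at the cause here, not a diagnosis; a dry-run list of checks worth running on the CentOS 7 box (pool and image names from the post):

```shell
# Print diagnostic commands for the rbd snap EINVAL; nothing is executed.
rbd_snap_checks() {
  echo "rados -p rbd lssnap    # any pool-level snapshots present on 'rbd'?"
  echo "rbd -p rbd info test-b # image format/order details"
  echo "rbd --version          # client version on the failing host"
}
rbd_snap_checks
```

If `rados lssnap` shows pool snapshots, removing them (or using a fresh pool) would be the thing to try; a client-version mismatch between the two hosts is the other obvious variable.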
[ceph-users] centos vs ubuntu for production ceph cluster ?
Hi,

I'll be building my full-SSD production cluster soon, and I wonder which distro is best tested by Inktank and the Ceph team.

The ceph.com docs are quite old and have no reference for Giant or Hammer:
http://ceph.com/docs/master/start/os-recommendations/

It seems that in the past only Ubuntu and RHEL were well tested; I'm not sure about CentOS.

Regards, Alexandre
Re: [ceph-users] OSD Force Removal
Have you tried it from a different node, like the ceph-mon or another ceph-osd node?

On Fri, Mar 20, 2015 at 11:23 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Thanks Stephane. The thing is that those steps need to be run on the node where the OSD lives, and I don't have that node any more since the operating system got corrupted, so I couldn't make it work :( Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433

On Mar 20, 2015, at 3:49 AM, Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr wrote:

Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?

Hi,
Try the manual procedure:
- http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com
Re: [ceph-users] OSD Force Removal
Thanks Stephane. The thing is that those steps need to be run on the node where the OSD lives, and I don't have that node any more since the operating system got corrupted, so I couldn't make it work :( Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433

On Mar 20, 2015, at 3:49 AM, Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr wrote:

Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?

Hi,
Try the manual procedure:
* http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Cisco.com
Re: [ceph-users] centos vs ubuntu for production ceph cluster ?
For all intents and purposes, CentOS and RHEL are equivalent, so I'd not be too concerned about that distinction. I can't comment as to which distro is better tested by the Ceph devs, but assuming the packages are built appropriately, with similar dependency versions and whatnot, that shouldn't matter much either. Though distro-specific bugs are certainly a thing, they are generally rare aside from packaging quirks.

In my experience, the biggest differentiator by distro for the quality of a deployed service is the skill of the people administering it. In other words, deploy on the one your ops team knows better. Everything else will come out in the wash.

QH

On Fri, Mar 20, 2015 at 8:16 AM, Alexandre DERUMIER aderum...@odiso.com wrote:

Hi, I'll be building my full-SSD production cluster soon, and I wonder which distro is best tested by Inktank and the Ceph team. The ceph.com docs are quite old and have no reference for Giant or Hammer: http://ceph.com/docs/master/start/os-recommendations/ It seems that in the past only Ubuntu and RHEL were well tested; I'm not sure about CentOS.

Regards, Alexandre
Re: [ceph-users] OSD Forece Removal
On 20/03/2015 15:23, Jesus Chavez (jeschave) wrote: Thanks, Stéphane. The thing is that those steps need to be run on the node where the OSD lives; I don't have that node any more, since the operating system got corrupted, so I couldn't make it work :( Assuming the OSD is already down+out, you can skip straight to the "Removing the OSD" part [1] and run those commands from one of your mons. John 1. http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
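John's pointer boils down to three monitor-side commands from the add-or-rm-osds doc. A dry-run sketch — `OSD_ID=3` is a made-up example, and the `echo`s only print the commands; drop them to run against a real cluster:

```shell
# Manual removal of a dead OSD, per the "Removing the OSD" doc section.
OSD_ID=3                                    # hypothetical OSD number
echo "ceph osd crush remove osd.${OSD_ID}"  # drop it from the CRUSH map
echo "ceph auth del osd.${OSD_ID}"          # delete its auth key
echo "ceph osd rm osd.${OSD_ID}"            # remove it from the OSD map
```

Because these run against the cluster maps rather than the OSD's host, they work even when the original node is gone.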
Re: [ceph-users] mds log message
On Fri, Mar 20, 2015 at 12:39 PM, Daniel Takatori Ohara dtoh...@mochsl.org.br wrote: Hello, Can anybody help me, please? Some messages appear in the log of my MDS, and afterwards the shell on my clients freezes. 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 962.02 Well, this one means that it asked a client to revoke some file capabilities 962 seconds ago, and the client still hasn't. 2015-03-20 12:23:54.068135 7f1608d49700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for 962.028297 secs 2015-03-20 12:23:54.068142 7f1608d49700 0 log_channel(default) log [WRN] : slow request 962.028297 seconds old, received at 2015-03-20 12:07:52.039805: client_request(client.3197487:391527 create #11b And this is a request from the same client to create a file, also received ~962 seconds ago. This is probably blocked by the aforementioned capability drop. Everything that follows these has a good chance of being a follow-on effect. The issue will probably clear itself up if you just restart the MDS. We've fixed a lot of bugs around this recently (although it's an ongoing source of them), so unless you're running very new code I would just restart and not worry about it. -Greg
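Greg's suggested fix amounts to bouncing the MDS daemon. A minimal sketch, printed rather than executed here; the daemon id `a` is a placeholder, and the exact service syntax depends on your init system (the sysvinit form common on 2015-era clusters is shown):

```shell
MDS_ID=a   # placeholder daemon id; check /var/lib/ceph/mds/ for yours
# Restart only the MDS; clients should replay their hung requests afterwards.
echo "sudo service ceph restart mds.${MDS_ID}"
```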
Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS
On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid ridwan...@gmail.com wrote: Gregory Farnum greg@... writes: On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed hadoop-1.1.1 in the nodes and changed the conf/core-site.xml file according to the ceph documentation http://ceph.com/docs/master/cephfs/hadoop/ but after changing the file the namenode is not starting (namenode can be formatted) but the other services(datanode, jobtracker, tasktracker) are running in hadoop. The default hadoop works fine but when I change the core-site.xml file as above I get the following bindException as can be seen from the namenode log: 2015-03-19 01:37:31,436 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to node1/10.242.144.225:6789 : Cannot assign requested address I have one monitor for the ceph cluster (node1/10.242.144.225) and I included in the core-site.xml file ceph://10.242.144.225:6789 as the value of fs.default.name. The 6789 port is the default port being used by the monitor node of ceph, so that may be the reason for the bindException but the ceph documentation mentions that it should be included like this in the core-site.xml file. It would be really helpful to get some pointers to where I am doing wrong in the setup. I'm a bit confused. The NameNode is only used by HDFS, and so shouldn't be running at all if you're using CephFS. Nor do I have any idea why you've changed anything in a way that tells the NameNode to bind to the monitor's IP address; none of the instructions that I see can do that, and they certainly shouldn't be. -Greg Hi Greg, I want to run a hadoop job (e.g. terasort) and want to use cephFS instead of HDFS. 
In the "Using Hadoop with CephFS" documentation at http://ceph.com/docs/master/cephfs/hadoop/, if you look into the Hadoop configuration section, the first property, fs.default.name, has to be set to the Ceph URI, and in the notes it's given as ceph://[monaddr:port]/. My core-site.xml of my hadoop conf looks like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ceph://10.242.144.225:6789</value>
  </property>
</configuration>

Yeah, that all makes sense. But I don't understand why or how you're starting up a NameNode at all, nor what config values it's drawing from to try and bind to that port. The NameNode is the problem because it shouldn't even be invoked. -Greg
Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan...@gmail.com wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed hadoop-1.1.1 in the nodes and changed the conf/core-site.xml file according to the ceph documentation http://ceph.com/docs/master/cephfs/hadoop/ but after changing the file the namenode is not starting (namenode can be formatted) but the other services(datanode, jobtracker, tasktracker) are running in hadoop. The default hadoop works fine but when I change the core-site.xml file as above I get the following bindException as can be seen from the namenode log: 2015-03-19 01:37:31,436 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to node1/10.242.144.225:6789 : Cannot assign requested address I have one monitor for the ceph cluster (node1/10.242.144.225) and I included in the core-site.xml file ceph://10.242.144.225:6789 as the value of fs.default.name. The 6789 port is the default port being used by the monitor node of ceph, so that may be the reason for the bindException but the ceph documentation mentions that it should be included like this in the core-site.xml file. It would be really helpful to get some pointers to where I am doing wrong in the setup. I'm a bit confused. The NameNode is only used by HDFS, and so shouldn't be running at all if you're using CephFS. Nor do I have any idea why you've changed anything in a way that tells the NameNode to bind to the monitor's IP address; none of the instructions that I see can do that, and they certainly shouldn't be. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs issue
Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were (yet) unknown to me. Perhaps some information on the PG weights should be provided in the 'quick deployment' page, as this issue might be encountered by other users in the future, as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 12:05 PM, Sahana shna...@gmail.com wrote: Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for OSDs is 1 TB. But as Nick said, Ceph OSDs must be at least 10 GB to get a weight of 0.01. Here is the snippet from the crush maps section of the Ceph docs: Weighting Bucket Items Ceph expresses bucket weights as doubles, which allows for fine weighting. A weight is the relative difference between device capacities. We recommend using 1.00 as the relative weight for a 1TB storage device. In such a scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 3.00 would represent approximately 3TB. Higher level buckets have a weight that is the sum total of the leaf items aggregated by the bucket. Thanks Sahana On Fri, Mar 20, 2015 at 2:08 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'. Is this information clearly stated in the documentation, and I have missed it? In case it isn't, I think it would be recommended to add it, as the issue might be encountered by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote: I see the problem: as your OSDs are only 8 GB, they have a zero weight. I think the minimum size you can get away with is 10 GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running: ceph osd crush reweight osd.X 1 for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.

admin@cp-admin:~/safedrive$ ceph osd tree
# id  weight  type name       up/down  reweight
-1    0       root default
-2    0       host osd-001
0     0       osd.0           up       1
1     0       osd.1           up       1
-3    0       host osd-002
2     0       osd.2           up       1
3     0       osd.3           up       1
-4    0       host osd-003
4     0       osd.4           up       1
5     0       osd.5           up       1
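Nick's workaround, applied to all six OSDs from the osd tree, can be scripted. A dry-run sketch that only prints the commands (drop the `echo` to apply them for real):

```shell
# Reweight each tiny OSD so CRUSH will actually place data on it.
# OSD ids 0-5 match the osd tree in the thread above.
for osd_id in 0 1 2 3 4 5; do
  echo "ceph osd crush reweight osd.${osd_id} 1"
done
```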
[ceph-users] Fwd: OSD Forece Removal
Any idea how to force remove? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr Date: March 20, 2015 at 3:49:11 AM CST To: Jesus Chavez (jeschave) jesch...@cisco.com Cc: ceph-users ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD Forece Removal Hi all, can anybody tell me how I can force delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node? Hi, Try manual : * http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual Thanks
Re: [ceph-users] PGs issue
This seems to be a fairly consistent problem for new users. The create-or-move is adjusting the crush weight, not the osd weight. Perhaps the init script should set the default weight to 0.01 if it's <= 0? It seems like there's a downside to this, but I don't see it. On Fri, Mar 20, 2015 at 1:25 PM, Robert LeBlanc rob...@leblancnet.us wrote: The weight can be based on anything: size, speed, capability, some random value, etc. The important thing is that it makes sense to you and that you are consistent. Ceph by default (ceph-disk, and I believe ceph-deploy) takes the approach of using size. So if you use a different weighting scheme, you should manually add the OSDs, or clean up after using ceph-disk/ceph-deploy. Size works well for most people, unless the disks are less than 10 GB, so most people don't bother messing with it. On Fri, Mar 20, 2015 at 12:06 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were (yet) unknown to me. Perhaps some information on the PG weights should be provided in the 'quick deployment' page, as this issue might be encountered by other users in the future, as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 12:05 PM, Sahana shna...@gmail.com wrote: Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for OSDs is 1 TB. But as Nick said, Ceph OSDs must be at least 10 GB to get a weight of 0.01. Here is the snippet from the crush maps section of the Ceph docs: Weighting Bucket Items Ceph expresses bucket weights as doubles, which allows for fine weighting. A weight is the relative difference between device capacities. We recommend using 1.00 as the relative weight for a 1TB storage device. In such a scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 3.00 would represent approximately 3TB. Higher level buckets have a weight that is the sum total of the leaf items aggregated by the bucket. Thanks Sahana On Fri, Mar 20, 2015 at 2:08 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'. Is this information clearly stated in the documentation, and I have missed it? In case it isn't, I think it would be recommended to add it, as the issue might be encountered by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote: I see the problem: as your OSDs are only 8 GB, they have a zero weight. I think the minimum size you can get away with is 10 GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running: ceph osd crush reweight osd.X 1 for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.

admin@cp-admin:~/safedrive$ ceph osd tree
# id  weight  type name       up/down  reweight
-1    0       root default
-2    0       host osd-001
0     0       osd.0           up       1
1     0       osd.1           up       1
-3    0       host osd-002
2     0       osd.2           up       1
3     0       osd.3           up       1
-4    0       host osd-003
4     0       osd.4           up       1
5     0       osd.5           up       1
Re: [ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean
osdmap e261536: 239 osds: 239 up, 238 in. Why is that last OSD not IN? The history you need is probably there. Run ceph pg <pgid> query on some of the stuck PGs and look for the recovery_state section. That should tell you what Ceph needs to complete the recovery. If you need more help, post the output of a couple of pg queries. On Fri, Mar 20, 2015 at 4:22 AM, Karan Singh karan.si...@csc.fi wrote: Hello Guys, My Ceph cluster lost data and now it's not recovering. This problem occurred when Ceph performed recovery while one of the nodes was down. Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean, recovering. I have tried several things to recover them, like *scrub, deep-scrub, pg repair, changing primary affinity and then scrubbing, osd_pool_default_size, etc. BUT NO LUCK* Could you please advise how to recover the PGs and achieve HEALTH_OK?

# ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health *HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs stuck inactive; 23 pgs stuck unclean*; 2 requests are blocked > 32 sec; recovery 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
     monmap e3: 3 mons at {xxx=:6789/0,xxx=:6789:6789/0,xxx=:6789:6789/0}, election epoch 1474, quorum 0,1,2 xx,xx,xx
     osdmap e261536: 239 osds: 239 up, 238 in
      pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
            20316 GB used, 844 TB / 864 TB avail
            531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
                   1 creating
               18409 active+clean
                   3 active+recovering
                  19 incomplete

# ceph pg dump_stuck unclean
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
10.70 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.534911 0'0 261536:1015 [153,140,80] 153 [153,140,80] 153 0'0 2015-03-12 17:59:43.275049 0'0 2015-03-09 17:55:58.745662
3.dde 68 66 0 66 552861709 297 297 incomplete 2015-03-20 12:19:49.584839 33547'297 261536:228352 [174,5,179] 174 [174,5,179] 174 33547'297 2015-03-12 14:19:15.261595 28522'43 2015-03-11 14:19:13.894538
5.a2 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.560756 0'0 261536:897 [214,191,170] 214 [214,191,170] 214 0'0 2015-03-12 17:58:29.257085 0'0 2015-03-09 17:55:07.684377
13.1b6 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.846253 0'0 261536:1050 [0,176,131] 0 [0,176,131] 0 0'0 2015-03-12 18:00:13.286920 0'0 2015-03-09 17:56:18.715208
7.25b 16 0 0 0 67108864 16 16 incomplete 2015-03-20 12:19:49.639102 27666'16 261536:4777 [194,145,45] 194 [194,145,45] 194 27666'16 2015-03-12 17:59:06.357864 2330'3 2015-03-09 17:55:30.754522
5.19 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.742698 0'0 261536:25410 [212,43,131] 212 [212,43,131] 212 0'0 2015-03-12 13:51:37.777026 0'0 2015-03-11 13:51:35.406246
3.a2f 0 0 0 0 0 0 0 creating 2015-03-20 12:42:15.586372 0'0 0:0 [] -1 [] -1 0'0 0.00 0'0 0.00
7.298 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.566966 0'0 261536:900 [187,95,225] 187 [187,95,225] 187 27666'13 2015-03-12 17:59:10.308423 2330'4 2015-03-09 17:55:35.750109
3.a5a 77 87 261 87 623902741 325 325 active+recovering 2015-03-20 10:54:57.443670 33569'325 261536:182464 [150,149,181] 150 [150,149,181] 150 33569'325 2015-03-12 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
1.1e7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610547 0'0 261536:772 [175,182] 175 [175,182] 175 0'0 2015-03-12 17:55:45.203232 0'0 2015-03-09 17:53:49.694822
3.774 79 0 0 0 645136397 339 339 incomplete 2015-03-20 12:19:49.821708 33570'339 261536:166857 [162,39,161] 162 [162,39,161] 162 33570'339 2015-03-12 14:49:03.869447 2226'2 2015-03-09 13:46:49.783950
3.7d0 78 0 0 0 609222686 376 376 incomplete 2015-03-20 12:19:49.534004 33538'376 261536:182810 [117,118,177] 117 [117,118,177] 117 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288
3.d60 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.647196 0'0 261536:833 [154,172,1] 154 [154,172,1] 154 33552'321 2015-03-12 13:44:43.502907 28356'39 2015-03-11 13:44:41.663482
4.1fc 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610103 0'0 261536:1069 [70,179,58] 70 [70,179,58] 70 0'0 2015-03-12 17:58:19.254170 0'0 2015-03-09 17:54:55.720479
3.e02 72 0 0 0 585105425 304 304 incomplete 2015-03-20 12:19:49.564768 33568'304 261536:167428 [15,102,147] 15 [15,102,147] 15 33568'304 2015-03-16 10:04:19.894789 2246'4 2015-03-09 11:43:44.176331
8.1d4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.614727 0'0 261536:19611 [126,43,174] 126 [126,43,174] 126 0'0 2015-03-12 14:34:35.258338 0'0 2015-03-12 14:34:35.258338
4.2f4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.595109 0'0 261536:113791 [181,186,13] 181 [181,186,13] 181 0'0 2015-03-12 14:59:03.529264 0'0 2015-03-09 13:46:40.601301
3.52c 65 23 69 23
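The per-PG inspection Robert suggests can be looped over the stuck PGs. A dry-run sketch — the PG ids are taken from the dump above, and the commands are only printed; on a live cluster you would run them and read the recovery_state section of the JSON output:

```shell
# Print the query command for a few of the incomplete PGs listed above.
for pgid in 10.70 3.dde 5.a2; do
  echo "ceph pg ${pgid} query"
done
```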
Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS
Gregory Farnum greg@... writes: On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed hadoop-1.1.1 in the nodes and changed the conf/core-site.xml file according to the ceph documentation http://ceph.com/docs/master/cephfs/hadoop/ but after changing the file the namenode is not starting (namenode can be formatted) but the other services(datanode, jobtracker, tasktracker) are running in hadoop. The default hadoop works fine but when I change the core-site.xml file as above I get the following bindException as can be seen from the namenode log: 2015-03-19 01:37:31,436 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to node1/10.242.144.225:6789 : Cannot assign requested address I have one monitor for the ceph cluster (node1/10.242.144.225) and I included in the core-site.xml file ceph://10.242.144.225:6789 as the value of fs.default.name. The 6789 port is the default port being used by the monitor node of ceph, so that may be the reason for the bindException but the ceph documentation mentions that it should be included like this in the core-site.xml file. It would be really helpful to get some pointers to where I am doing wrong in the setup. I'm a bit confused. The NameNode is only used by HDFS, and so shouldn't be running at all if you're using CephFS. Nor do I have any idea why you've changed anything in a way that tells the NameNode to bind to the monitor's IP address; none of the instructions that I see can do that, and they certainly shouldn't be. -Greg Hi Greg, I want to run a hadoop job (e.g. terasort) and want to use cephFS instead of HDFS. In Using Hadoop with cephFS documentation in http://ceph.com/docs/master/cephfs/hadoop/ if you look into the Hadoop configuration section, the first property fs.default.name has to be set as the ceph URI and in the notes it's mentioned as ceph://[monaddr:port]/. 
My core-site.xml of my hadoop conf looks like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ceph://10.242.144.225:6789</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    <description></description>
  </property>
  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>
  <property>
    <name>ceph.root.dir</name>
    <value>/</value>
  </property>
  <property>
    <name>ceph.mon.address</name>
    <value>10.242.144.225:6789</value>
    <description>This is the primary monitor node IP address in our installation.</description>
  </property>
  <property>
    <name>ceph.auth.id</name>
    <value>admin</value>
  </property>
  <property>
    <name>ceph.auth.keyring</name>
    <value>/etc/ceph/ceph.client.admin.keyring</value>
  </property>
  <property>
    <name>ceph.object.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>ceph.data.pools</name>
    <value>data</value>
  </property>
  <property>
    <name>ceph.localize.reads</name>
    <value>true</value>
  </property>
</configuration>
Re: [ceph-users] Server Specific Pools
You can create CRUSH rulesets and then assign pools to different rulesets. http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds On Thu, Mar 19, 2015 at 7:28 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote: Hi, I have a Ceph cluster with both ARM and x86 based servers in the same cluster. Is there a way for me to define pools, or some logical separation, that would allow me to use only one set of machines for a particular test? That would make it easy for me to run tests on either x86 or ARM and do some comparison testing. Thanks Pankaj
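As a rough sketch of what that doc describes: create a rule that draws only from one branch of the CRUSH tree, then point a pool at it. The names (`x86_root`, `x86_rule`, `x86_test`) and the ruleset id are made up, the `crush_ruleset` pool setting is the pre-Luminous name, and the commands are only printed here:

```shell
# Assumes the ARM and x86 hosts have been placed under separate CRUSH roots.
echo "ceph osd crush rule create-simple x86_rule x86_root host"
echo "ceph osd pool create x86_test 128 128"
echo "ceph osd pool set x86_test crush_ruleset 1"  # 1 = id of x86_rule (assumed)
```

Repeat with an `arm_rule`/`arm_root` pair to get a matching ARM-only pool for the comparison runs.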
[ceph-users] mds log message
Hello, Can anybody help me, please? Some messages appear in the log of my MDS, and afterwards the shell on my clients freezes.

2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 962.02
2015-03-20 12:23:54.068135 7f1608d49700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for 962.028297 secs
2015-03-20 12:23:54.068142 7f1608d49700 0 log_channel(default) log [WRN] : slow request 962.028297 seconds old, received at 2015-03-20 12:07:52.039805: client_request(client.3197487:391527 create #11b
2015-03-20 12:39:54.096730 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 1922.0
2015-03-20 12:39:54.096884 7f1608d49700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for 1922.057031 secs
2015-03-20 12:39:54.096891 7f1608d49700 0 log_channel(default) log [WRN] : slow request 1922.057031 seconds old, received at 2015-03-20 12:07:52.039805: client_request(client.3197487:391527 create #11
2015-03-20 13:11:54.161299 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 3842.1
2015-03-20 13:11:54.161478 7f1608d49700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for 3842.121637 secs
2015-03-20 13:11:54.161500 7f1608d49700 0 log_channel(default) log [WRN] : slow request 3842.121637 seconds old, received at 2015-03-20 12:07:52.039805: client_request(client.3197487:391527 create #11
2015-03-20 14:15:54.301949 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued pAsxLsXsxFsxcrwb, sent 7682.2
2015-03-20 14:15:54.302097 7f1608d49700 0 log_channel(default) log [WRN] : 1 slow requests, 1 included below; oldest blocked for 7682.262249 secs
2015-03-20 14:15:54.302105 7f1608d49700 0 log_channel(default) log [WRN] : slow request 7682.262249 seconds old, received at 2015-03-20 12:07:52.039805: client_request(client.3197487:391527 create #11
2015-03-20 16:14:49.549932 7f1608d49700 0 log_channel(default) log [WRN] : 2 slow requests, 1 included below; oldest blocked for 14817.510091 secs
2015-03-20 16:14:49.549954 7f1608d49700 0 log_channel(default) log [WRN] : slow request 32.442747 seconds old, received at 2015-03-20 16:14:17.107149: client_request(client.1727647:56325699 create #11
2015-03-20 16:15:19.550915 7f1608d49700 0 log_channel(default) log [WRN] : 2 slow requests, 1 included below; oldest blocked for 14847.511071 secs
2015-03-20 16:15:19.550942 7f1608d49700 0 log_channel(default) log [WRN] : slow request 62.443727 seconds old, received at 2015-03-20 16:14:17.107149: client_request(client.1727647:56325699 create #11
2015-03-20 16:16:19.552948 7f1608d49700 0 log_channel(default) log [WRN] : 2 slow requests, 1 included below; oldest blocked for 14907.513103 secs
2015-03-20 16:16:19.552970 7f1608d49700 0 log_channel(default) log [WRN] : slow request 122.445759 seconds old, received at 2015-03-20 16:14:17.107149: client_request(client.1727647:56325699 create #1
2015-03-20 16:18:19.556521 7f1608d49700 0 log_channel(default) log [WRN] : 2 slow requests, 1 included below; oldest blocked for 15027.516678 secs
2015-03-20 16:18:19.556544 7f1608d49700 0 log_channel(default) log [WRN] : slow request 242.449334 seconds old, received at 2015-03-20 16:14:17.107149: client_request(client.1727647:56325699 create #1
2015-03-20 16:19:44.559211 7f1608d49700 0 log_channel(default) log [WRN] : 3 slow requests, 1 included below; oldest blocked for 15112.519357 secs
2015-03-20 16:19:44.559241 7f1608d49700 0 log_channel(default) log [WRN] : slow request 34.882902 seconds old, received at 2015-03-20 16:19:09.676260: client_request(client.4880:2515834 getattr pAsLsXsFs
2015-03-20 16:20:14.560108 7f1608d49700 0 log_channel(default) log [WRN] : 3 slow requests, 1 included below; oldest blocked for 15142.520256 secs
2015-03-20 16:20:14.560137 7f1608d49700 0 log_channel(default) log [WRN] : slow request 64.883801 seconds old, received at 2015-03-20 16:19:09.676260: client_request(client.4880:2515834 getattr pAsLsXsFs
2015-03-20 16:21:14.562233 7f1608d49700 0 log_channel(default) log [WRN] : 3 slow requests, 1 included below; oldest blocked for 15202.522380 secs
2015-03-20 16:21:14.562265 7f1608d49700 0 log_channel(default) log [WRN] : slow request 124.885925 seconds old, received at 2015-03-20 16:19:09.676260: client_request(client.4880:2515834 getattr pAsLsXsFs
2015-03-20 16:22:19.564455 7f1608d49700 0 log_channel(default) log [WRN] : 3 slow requests, 1 included below; oldest blocked for 15267.524608 secs
2015-03-20
Re: [ceph-users] Fwd: OSD Forece Removal
Removing the OSD from the CRUSH map and deleting the auth key is how you force remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Any idea how to force remove? Thanks *Jesus Chavez* SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: *+52 55 5267 3146* Mobile: *+51 1 5538883255* CCIE - 44433 Begin forwarded message: *From:* Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr *Date:* March 20, 2015 at 3:49:11 AM CST *To:* Jesus Chavez (jeschave) jesch...@cisco.com *Cc:* ceph-users ceph-users@lists.ceph.com *Subject:* *Re: [ceph-users] OSD Forece Removal* -- Hi all, can anybody tell me how I can force delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node? Hi, Try manual : - http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual Thanks
Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
We tested bcache and abandoned it for two reasons. 1. It didn't give us any better performance than journals on SSD. 2. We had lots of corruption of the OSDs and were rebuilding them frequently. Since removing bcache, the OSDs have been much more stable. On Fri, Mar 20, 2015 at 4:03 AM, Nick Fisk n...@fisk.me.uk wrote: -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Burkhard Linke Sent: 20 March 2015 09:09 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid Hi, On 03/19/2015 10:41 PM, Nick Fisk wrote: I'm looking at trialling OSDs with a small flashcache device over them, to hopefully reduce the impact of metadata updates when doing small-block IO. Inspiration from here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083 One thing I suspect will happen is that when the OSD node starts up, udev could possibly mount the base OSD partition instead of the flashcache device, as the base disk will have the Ceph partition UUID type. This could result in quite nasty corruption. I ran into this problem with an enhanceio-based cache for one of our database servers. I think you can prevent this problem by using bcache, which is also integrated into the official kernel tree. It does not act as a drop-in replacement, but creates a new device that is only available if the cache is initialized correctly. A GPT partition table on the bcache device should be enough to allow the standard udev rules to kick in. I haven't used bcache in this scenario yet, and I cannot comment on its speed and reliability compared to other solutions, but from the operational point of view it is safer than enhanceio/flashcache. I did look at bcache, but there are a lot of worrying messages on the mailing list about hangs and panics that have discouraged me slightly. I do think it is probably the best solution, but I'm not convinced about the stability.
Best regards, Burkhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs issue
The weight can be based on anything: size, speed, capability, some random value, etc. The important thing is that it makes sense to you and that you are consistent. Ceph by default (ceph-disk and, I believe, ceph-deploy) takes the approach of using size. So if you use a different weighting scheme, you should manually add the OSDs, or clean up after using ceph-disk/ceph-deploy. Size works well for most people, unless the disks are smaller than 10 GB, so most people don't bother messing with it. On Fri, Mar 20, 2015 at 12:06 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were (yet) unknown to me. Perhaps some information on the PGs' weight should be provided in the 'quick deployment' page, as this issue might be encountered in the future by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 12:05 PM, Sahana shna...@gmail.com wrote: Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for OSDs is 1TB. But as Nick said, Ceph OSDs must be at least 10GB to get a weight of 0.01. Here is the snippet from the crush maps section of the ceph docs: Weighting Bucket Items Ceph expresses bucket weights as doubles, which allows for fine weighting. A weight is the relative difference between device capacities. We recommend using 1.00 as the relative weight for a 1TB storage device. In such a scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 3.00 would represent approximately 3TB. Higher level buckets have a weight that is the sum total of the leaf items aggregated by the bucket. Thanks Sahana On Fri, Mar 20, 2015 at 2:08 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'.
Is this information clearly stated in the documentation, and I have missed it? In case it isn't, I think it would be worth adding, as the issue might be encountered by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote: I see the problem: as your OSDs are only 8GB, they have a zero weight. I think the minimum size you can get away with is 10GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running:

ceph osd crush reweight osd.X 1

for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.

admin@cp-admin:~/safedrive$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0       host osd-001
0       0       osd.0   up      1
1       0       osd.1   up      1
-3      0       host osd-002
2       0       osd.2   up      1
3       0       osd.3   up      1
-4      0       host osd-003
4       0       osd.4   up      1
5       0       osd.5   up      1

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
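Nick's rounding explanation can be checked with a quick sketch. This is an illustrative reimplementation, not Ceph's actual code; it assumes the tooling reports size in binary TiB truncated to two decimal places, which matches the 3.63 weight commonly shown for 4 TB drives:

```python
import math

def crush_weight_from_bytes(size_bytes):
    """Illustrative size-based CRUSH weight: size in TiB, truncated to 2 decimals."""
    tib = size_bytes / float(2 ** 40)
    return math.floor(tib * 100) / 100.0

print(crush_weight_from_bytes(4 * 1000 ** 4))  # 4 TB drive -> 3.63
print(crush_weight_from_bytes(8 * 1000 ** 3))  # 8 GB VirtualBox disk -> 0.0
```

With only two decimal places, an 8 GB disk truncates to a weight of 0.00, so CRUSH never selects it, which is exactly why the PGs stay undersized until the OSDs are manually reweighted.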
Re: [ceph-users] Ceiling on number of PGs in an OSD
This isn't a hard limit on the number, but it's recommended that you keep it around 100. Smaller values cause data-distribution evenness problems. Larger values cause the OSD processes to use more CPU, RAM, and file descriptors, particularly during recovery. With that many OSDs, you're going to want to increase your sysctls, particularly open file descriptors, open sockets, FDs per process, etc. You don't need the same number of placement groups for every pool. Pools without much data don't need as many PGs. For example, I have a bunch of pools for RGW zones, and they have 32 PGs each. I have a total of 2600 PGs; 2048 are in the .rgw.buckets pool. Also keep in mind that your pg_num and pgp_num need to be multiplied by the number of replicas to get the PG-per-OSD count. I have 2600 PGs and replication 3, so I really have 7800 PGs spread over 72 OSDs. Assuming you have one big pool, 750 OSDs, and replication 3, I'd go with 32k PGs on the big pool. Same thing, but replication 2: I'd still go 32k, but prepare to expand PGs with your next addition of OSDs. If you're going to have several big pools (i.e., you're using RGW and RBD heavily), I'd go with 16k PGs for the big pools, and adjust those over time depending on which is used more heavily. If RBD is consuming 2x the space, then increase its pg_num and pgp_num during the next OSD expansion, but don't increase RGW's pg_num and pgp_num. The number of PGs per OSD should stay around 100 as you add OSDs. If you add 10x the OSDs, you'll multiply the pg_num and pgp_num by 10 too, which gives you the same number of PGs per OSD. My (pg_num / osd_num) fluctuates between 75 and 200, depending on when I do the pg_num and pgp_num increase relative to the OSD adds. When you increase pg_num and pgp_num, don't do a large jump. Ceph will only allow you to double the value. Even that is extreme: it will cause every OSD in the cluster to start splitting PGs.
When you want to double your pg_num and pgp_num, it's recommended that you make several passes. I don't recall seeing any recommendations, but I'm planning to break my next increase up into 10 passes. I'm at 2048 now, so I'll probably add 204 PGs at a time until I get to 4096. On Thu, Mar 19, 2015 at 6:12 AM, Sreenath BH bhsreen...@gmail.com wrote: Hi, Is there a ceiling on the number of placement groups in an OSD beyond which steady-state and/or recovery performance will start to suffer? Example: I need to create a pool with 750 OSDs (25 OSDs per server, 50 servers). The PG calculator gives me 65536 placement groups with 300 PGs per OSD. Now as the cluster expands, the number of PGs in an OSD has to increase as well. If the cluster size increases by a factor of 10, the number of PGs per OSD will also need to be increased. What would be the impact of a large PG number in an OSD on peering and rebalancing? There is 3GB per OSD available. thanks, Sreenath ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
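To make the arithmetic in this thread concrete, here is a small sketch (the function name is mine, not a Ceph API): the number of PG copies each OSD carries is pg_num times the replica count divided by the OSD count.

```python
def pg_instances_per_osd(pg_num, replicas, osd_count):
    """Average number of PG copies each OSD carries."""
    return pg_num * replicas / float(osd_count)

# Craig's numbers: 2600 PGs, replication 3, 72 OSDs -> ~108 per OSD.
print(round(pg_instances_per_osd(2600, 3, 72)))    # 108

# Sreenath's case: one big pool of 32k PGs, replication 3, 750 OSDs -> ~131.
print(round(pg_instances_per_osd(32768, 3, 750)))  # 131
```

Both land near the recommended ~100 PG copies per OSD, which is why 32k PGs (rather than the calculator's 65536) is suggested for the 750-OSD pool.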
Re: [ceph-users] OSD Force Removal
Yes, that's exactly what I did, but ceph osd tree still shows the OSDs. Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 9:41 AM, John Spray john.sp...@redhat.com wrote: On 20/03/2015 15:23, Jesus Chavez (jeschave) wrote: Thanks Stephane, the thing is that those steps need to be run on the node where the OSD lives. I don't have that node any more, since the operating system got corrupted, so I couldn't make it work :( Assuming the OSD is already down+out, you can skip straight to the Removing the OSD part [1] and run those commands from one of your mons. John 1. http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
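For reference, the "Removing the OSD" steps John points to boil down to a few monitor-side commands. The sketch below wraps them in a function with a dry-run switch; `remove_dead_osd` and `DRY_RUN` are my own scaffolding, not Ceph tooling, but the four ceph commands are the documented ones.

```shell
# Remove a dead OSD from the cluster, run from a mon/admin node.
# remove_dead_osd and DRY_RUN are illustrative wrappers, not Ceph tooling.
remove_dead_osd() {
    osd="osd.$1"
    run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }
    run ceph osd out "$osd"           # mark out (harmless if already out)
    run ceph osd crush remove "$osd"  # drop it from the CRUSH map
    run ceph auth del "$osd"          # delete its cephx key
    run ceph osd rm "$osd"            # remove it from the OSD map
}

# Preview the commands for osd.6 without touching the cluster:
DRY_RUN=1 remove_dead_osd 6
```

With the dry run removed, this is the sequence that makes the OSD disappear from `ceph osd tree` once the new maps propagate.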
Re: [ceph-users] PGs issue
I like this idea. I was under the impression that udev did not call the init script, but ceph-disk directly. I don't see ceph-disk calling create-or-move, but I know it does, because I see it in ceph -w when I boot up OSDs.

/lib/udev/rules.d/95-ceph-osd.rules
# activate ceph-tagged partitions
ACTION=="add", SUBSYSTEM=="block", \
ENV{DEVTYPE}=="partition", \
ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
RUN+="/usr/sbin/ceph-disk-activate /dev/$name"

On Fri, Mar 20, 2015 at 2:36 PM, Craig Lewis cle...@centraldesktop.com wrote: This seems to be a fairly consistent problem for new users. The create-or-move is adjusting the crush weight, not the osd weight. Perhaps the init script should set the default weight to 0.01 if it's <= 0? It seems like there's a downside to this, but I don't see it. On Fri, Mar 20, 2015 at 1:25 PM, Robert LeBlanc rob...@leblancnet.us wrote: The weight can be based on anything: size, speed, capability, some random value, etc. The important thing is that it makes sense to you and that you are consistent. Ceph by default (ceph-disk and, I believe, ceph-deploy) takes the approach of using size. So if you use a different weighting scheme, you should manually add the OSDs, or clean up after using ceph-disk/ceph-deploy. Size works well for most people, unless the disks are smaller than 10 GB, so most people don't bother messing with it. On Fri, Mar 20, 2015 at 12:06 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were (yet) unknown to me. Perhaps some information on the PGs' weight should be provided in the 'quick deployment' page, as this issue might be encountered in the future by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 12:05 PM, Sahana shna...@gmail.com wrote: Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives.
As per this link, the minimum size recommended for OSDs is 1TB. But as Nick said, Ceph OSDs must be at least 10GB to get a weight of 0.01. Here is the snippet from the crush maps section of the ceph docs: Weighting Bucket Items Ceph expresses bucket weights as doubles, which allows for fine weighting. A weight is the relative difference between device capacities. We recommend using 1.00 as the relative weight for a 1TB storage device. In such a scenario, a weight of 0.5 would represent approximately 500GB, and a weight of 3.00 would represent approximately 3TB. Higher level buckets have a weight that is the sum total of the leaf items aggregated by the bucket. Thanks Sahana On Fri, Mar 20, 2015 at 2:08 PM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'. Is this information clearly stated in the documentation, and I have missed it? In case it isn't, I think it would be worth adding, as the issue might be encountered by other users as well. Kind regards, Bogdan On Fri, Mar 20, 2015 at 10:33 AM, Nick Fisk n...@fisk.me.uk wrote: I see the problem: as your OSDs are only 8GB, they have a zero weight. I think the minimum size you can get away with is 10GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running:

ceph osd crush reweight osd.X 1

for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.
admin@cp-admin:~/safedrive$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0       host osd-001
0       0       osd.0   up      1
1       0       osd.1   up      1
-3      0       host osd-002
2       0       osd.2   up      1
3       0       osd.3   up      1
-4      0       host osd-003
4       0       osd.4   up      1
5       0       osd.5   up      1
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Fwd: OSD Force Removal
Maybe I should edit the crush map and delete the OSDs... Is that a way to force them? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 2:21 PM, Robert LeBlanc rob...@leblancnet.us wrote: Removing the OSD from the CRUSH map and deleting the auth key is how you force-remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Any idea how to force-remove? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr Date: March 20, 2015 at 3:49:11 AM CST To: Jesus Chavez (jeschave) jesch...@cisco.com Cc: ceph-users ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD Force Removal Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node? Hi, Try manual: * http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Cisco.com http://www.cisco.com/ Think before you print.
This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here http://www.cisco.com/web/about/doing_business/legal/cri/index.html for Company Registration Information. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] PGs issue
I see the problem: as your OSDs are only 8GB, they have a zero weight. I think the minimum size you can get away with is 10GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running:

ceph osd crush reweight osd.X 1

for each OSD; this will reweight the OSDs. Assuming this is a test cluster and you won't be adding any larger OSDs in the future, this shouldn't cause any problems.

admin@cp-admin:~/safedrive$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0       root default
-2      0       host osd-001
0       0       osd.0   up      1
1       0       osd.1   up      1
-3      0       host osd-002
2       0       osd.2   up      1
3       0       osd.3   up      1
-4      0       host osd-003
4       0       osd.4   up      1
5       0       osd.5   up      1

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cciss driver package for RHEL7
On 19/03/2015, at 17.46, O'Reilly, Dan daniel.orei...@dish.com wrote: The problem with using the hpsa driver is that I need to install RHEL 7.1 on a ProLiant system using the SmartArray 400 controller. Therefore, I need a driver that supports it to even install RHEL 7.1. RHEL 7.1 doesn't generically recognize that controller out of the box. I know, I got the same issue when utilizing old ProLiants for test/PoC with newer SW. Maybe we should try to use old RAID controllers like this for OSD journaling and avoid wear issues as with SSDs :) /Steffen From: Steffen W Sørensen [mailto:ste...@me.com] Sent: Thursday, March 19, 2015 10:08 AM To: O'Reilly, Dan Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] cciss driver package for RHEL7 On 19/03/2015, at 15.57, O'Reilly, Dan daniel.orei...@dish.com wrote: I understand there's a KMOD_CCISS package available. However, I can't find it for download. Anybody have any ideas? Oh, I believe HP swapped cciss for the hpsa (Smart Array) driver long ago… so maybe just download the latest cciss source and compile it yourself, or… SourceForge says: *New* The cciss driver has been removed from RHEL7 and SLES12. If you really want cciss on RHEL7, check out the elrepo directory. /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Burkhard Linke Sent: 20 March 2015 09:09 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid Hi, On 03/19/2015 10:41 PM, Nick Fisk wrote: I'm looking at trialling OSDs with a small flashcache device over them, to hopefully reduce the impact of metadata updates when doing small block IO. Inspiration from here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083 One thing I suspect will happen is that when the OSD node starts up, udev could possibly mount the base OSD partition instead of the flashcache device, as the base disk will have the ceph partition uuid type. This could result in quite nasty corruption. I ran into this problem with an enhanceio-based cache for one of our database servers. I think you can prevent this problem by using bcache, which is also integrated into the official kernel tree. It does not act as a drop-in replacement, but creates a new device that is only available if the cache is initialized correctly. A GPT partition table on the bcache device should be enough to allow the standard udev rules to kick in. I haven't used bcache in this scenario yet, and I cannot comment on its speed and reliability compared to other solutions. But from the operational point of view it is safer than enhanceio/flashcache. I did look at bcache, but there are a lot of worrying messages on the mailing list about hangs and panics that have discouraged me slightly from it. I do think it is probably the best solution, but I'm not convinced about its stability. Best regards, Burkhard ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS Gateway Maturity
I have found a few incompatibilities, but so far they're all on the Ceph side. One example I remember was having to change the way we delete objects. The function we originally used fetches a list of object versions and deletes all versions. Ceph is implementing object versions now (I believe that'll ship with Hammer), so we had to call a different function to delete the object without iterating over the versions. AFAIK, that code should work fine if we point it at Amazon. I haven't tried it, though. I've been using RGW (with replication) in production for 2 years now, although I'm not large. So far, all of my RGW issues have been Ceph issues. Most of my issues are caused by my under-powered hardware, or by shooting myself in the foot with aggressive optimizations. Things are better with my journals on SSD, but the best thing I did was slow down with my changes. For example, I have 7 OSD nodes and 72 OSDs. When I add new OSDs, I add a couple at a time instead of adding all the disks in a node at once. Guess how I learned that lesson. :-) On Wed, Mar 18, 2015 at 10:03 AM, Jerry Lam jerry@oicr.on.ca wrote: Hi Chris, Thank you for your reply. We are also thinking about using the S3 API, but we are concerned about how compatible it is with the real S3. For instance, we would like to design the system using pre-signed URLs for storing some objects. I read the Ceph documentation; it does not mention whether it supports this or not. My question is: do you find that code using the RADOS S3 API can easily run against Amazon S3 without any change? If not, how much effort is needed to make it compatible?
Best Regards, Jerry From: Chris Jones cjo...@cloudm2.com Date: Tuesday, March 17, 2015 at 4:39 PM To: Jerry Lam jerry@oicr.on.ca Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Subject: Re: [ceph-users] RADOS Gateway Maturity Hi Jerry, I currently work at Bloomberg, and we have a very large Ceph installation in production; we use the S3-compatible API for the RADOS gateway. We are also re-architecting our new RGW and evaluating a different Apache configuration for a little better performance. We only use replicas right now, no erasure coding yet. You can take a look at our current configuration at https://github.com/bloomberg/chef-bcpc. -Chris On Tue, Mar 17, 2015 at 10:40 AM, Jerry Lam jerry@oicr.on.ca wrote: Hi Ceph users, I'm new to Ceph, but I need to use Ceph as the storage for the cloud we are building in house. Did anyone use RADOS Gateway in production? How mature is it in terms of compatibility with S3 / Swift? Can anyone share their experience with it? Best Regards, Jerry ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Chris Jones http://www.cloudm2.com cjo...@cloudm2.com (p) 770.655.0770 This message is intended exclusively for the individual or entity to which it is addressed. This communication may contain information that is proprietary, privileged or confidential or otherwise legally exempt from disclosure. If you are not the named addressee, you are not authorized to read, print, retain, copy or disseminate this message or any part of it. If you have received this message in error, please notify the sender immediately by e-mail and delete all copies of the message.
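On the pre-signed URL question raised in this thread: RGW's S3 API accepts the same query-string authentication parameters (AWSAccessKeyId, Expires, Signature) as Amazon's signature v2, so a URL signer written for one can usually be pointed at the other. Below is a stdlib-only sketch; the endpoint, bucket, object key, and credentials are made-up placeholders.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

def presign_get(endpoint, bucket, key, access_key, secret_key, expires_in=3600):
    """Build an S3 signature-v2 pre-signed GET URL (query-string auth)."""
    expires = int(time.time()) + expires_in
    # v2 string-to-sign: method, Content-MD5, Content-Type, expiry, resource.
    string_to_sign = "GET\n\n\n%d\n/%s/%s" % (expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return "%s/%s/%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s" % (
        endpoint, bucket, key, access_key, expires, signature)

url = presign_get("http://rgw.example.com", "demo-bucket", "hello.txt",
                  "ACCESSKEY", "SECRETKEY")
print(url)
```

Anyone holding the URL can GET the object until the expiry time, with no other credentials, which is what makes the scheme attractive for handing objects to browsers.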
Re: [ceph-users] Fwd: OSD Force Removal
Does it show DNE in the entry? That stands for Does Not Exist. It will disappear on its own after a while. I don't know what the timeout is, but they have always gone away within 24 hours. I've edited the CRUSH map before, and I don't think it removed it when it was already DNE; I just had to wait for it to go away on its own. On Fri, Mar 20, 2015 at 3:55 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Maybe I should edit the crush map and delete the OSDs... Is that a way to force them? Thanks *Jesus Chavez* SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 2:21 PM, Robert LeBlanc rob...@leblancnet.us wrote: Removing the OSD from the CRUSH map and deleting the auth key is how you force-remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Any idea how to force-remove? Thanks *Jesus Chavez* SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: *From:* Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr *Date:* March 20, 2015 at 3:49:11 AM CST *To:* Jesus Chavez (jeschave) jesch...@cisco.com *Cc:* ceph-users ceph-users@lists.ceph.com *Subject:* *Re: [ceph-users] OSD Force Removal* -- Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?
Hi, Try manual : - http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual Thanks * Jesus Chavez* SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: *+52 55 5267 3146 %2B52%2055%205267%203146* Mobile: *+51 1 5538883255* CCIE - 44433 Cisco.com http://www.cisco.com/ Think before you print. This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. Please click here http://www.cisco.com/web/about/doing_business/legal/cri/index.html for Company Registration Information. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Uneven CPU usage on OSD nodes
I would say you're a little light on RAM. With 4TB disks 70% full, I've seen some ceph-osd processes using 3.5GB of RAM during recovery. You'll be fine during normal operation, but you might run into issues at the worst possible time. I have 8 OSDs per node and 32G of RAM. I've had ceph-osd processes start swapping, and that's a great way to get them kicked out for being unresponsive. I'm not a dev, but I can make some wild and uninformed guesses :-) . The primary OSD uses more CPU than the replicas, and I suspect that you have more primaries on the hot nodes. Since you're testing, try repeating the test on 3 OSD nodes instead of 4. If you don't want to run that test, you can generate a histogram from ceph pg dump data, and see if there are more primary OSDs (the first one in the acting array) on the hot nodes. On Wed, Mar 18, 2015 at 7:18 AM, f...@univ-lr.fr f...@univ-lr.fr wrote: Hi to the ceph-users list! We're setting up a new Ceph infrastructure:
- 1 MDS admin node
- 4 OSD storage nodes (60 OSDs), each of them running a monitor
- 1 client
Each 32GB RAM / 16-core OSD node supports 15 x 4TB SAS OSDs (XFS) and 1 SSD with 5GB journal partitions, all in JBOD attachment. Every node has 2x10Gb LACP attachment. The OSD nodes were freshly installed with puppet, then from the admin node. Default OSD weights in the OSD tree; 1 test pool with 4096 PGs. During the setup phase, we're trying to qualify the performance characteristics of our setup.
Rados benchmarks are done from a client with these commands:

rados -p pool -b 4194304 bench 60 write -t 32 --no-cleanup
rados -p pool -b 4194304 bench 60 seq -t 32 --no-cleanup

Each time we observed a recurring phenomenon: 2 of the 4 OSD nodes have twice the CPU load: http://www.4shared.com/photo/Ua0umPVbba/UnevenLoad.html (What to look at is the real-time %CPU and the cumulated CPU time per ceph-osd process.) And after a fresh complete reinstall to be sure, this twice-as-high CPU load is observed again, but not on the same 2 nodes: http://www.4shared.com/photo/2AJfd1B_ba/UnevenLoad-v2.html Nothing obvious about the installation seems able to explain that. The CRUSH distribution function doesn't have more than 4.5% inequality between the 4 OSD nodes for the primary OSDs of the objects, and less than 3% between the hosts if we consider the whole acting sets for the objects used during the benchmark. And these differences are nowhere near the scale of the CPU load differences, so the cause has to be elsewhere. I cannot be sure it has no impact on performance. Even if we have enough CPU core headroom, logic would say it has to have some consequences on delays and also on performance. Would someone have any idea, or reproduce the test on their setup to see if this is common behavior? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
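The primary-count check suggested above can be scripted. This sketch assumes the JSON shape of `ceph pg dump --format json`, where each PG stat carries an `acting` list whose first element is the primary; the sample data is invented for illustration.

```python
from collections import Counter

def primary_histogram(pg_stats):
    """Count how many PGs each OSD serves as primary (acting[0])."""
    return Counter(pg["acting"][0] for pg in pg_stats if pg["acting"])

sample = [  # made-up pg dump fragments
    {"pgid": "0.0", "acting": [4, 1, 7]},
    {"pgid": "0.1", "acting": [4, 7, 2]},
    {"pgid": "0.2", "acting": [1, 4, 7]},
]
print(primary_histogram(sample))  # osd.4 is primary twice, osd.1 once
```

Mapping the resulting per-OSD counts onto hosts would show whether the hot nodes really do hold more primaries.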
Re: [ceph-users] OSD Force Removal
This is the output if I try to remove it from the crush map; it says that it is already out…

[root@capricornio ~]# ceph osd crush remove osd.29
device 'osd.29' does not appear in the crush map
[root@capricornio ~]#
[root@capricornio ~]# ceph osd tree | grep down
# id    weight  type name       up/down reweight
6       3.63    osd.6   down    0
16      3.63    osd.16  down    0
26      3.63    osd.26  down    0
36      3.63    osd.36  down    0
46      3.63    osd.46  down    0
56      3.63    osd.56  down    0
66      3.63    osd.66  down    0
76      3.63    osd.76  down    0
86      3.63    osd.86  down    0
96      3.63    osd.96  down    0
106     3.63    osd.106 down    0
116     3.63    osd.116 down    0
9       0       osd.9   down    0
19      0       osd.19  down    0
29      0       osd.29  down    0
39      0       osd.39  down    0
49      0       osd.49  down    0
59      0       osd.59  down    0
69      0       osd.69  down    0
79      0       osd.79  down    0
89      0       osd.89  down    0
99      0       osd.99  down    0
109     0       osd.109 down    0
119     0       osd.119 down    0

Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Cisco.com http://www.cisco.com/ On Mar 20, 2015, at 4:13 PM, Robert LeBlanc rob...@leblancnet.us wrote: Does it show DNE in the entry? That stands for Does Not Exist. It will disappear on its own after a while. I don't know what the timeout is, but they have always gone away within 24 hours. I've edited the CRUSH map before, and I don't think it removed it when it was already DNE; I just had to wait for it to go away on its own.
On Fri, Mar 20, 2015 at 3:55 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Maybe I should edit the crush map and delete the OSDs... Is that a way to force them? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 2:21 PM, Robert LeBlanc rob...@leblancnet.us wrote: Removing the OSD from the CRUSH map and deleting the auth key is how you force-remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Any idea how to force-remove? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: From: Stéphane DUGRAVOT stephane.dugra...@univ-lorraine.fr Date: March 20, 2015 at 3:49:11 AM CST To: Jesus Chavez (jeschave) jesch...@cisco.com Cc: ceph-users ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD Force Removal Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one node got corrupted because of an outage, so there is no way to get those OSDs up and back. Is there any way to force the removal from the ceph-deploy node?
Hi,

Try the manual procedure:
* http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Thanks
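[Editor's note, not part of the thread:] The manual procedure at that link boils down to a short command sequence; a sketch for one dead OSD, using id 9 from this thread as the example:

```shell
# Manual removal of a dead OSD (example id: 9), per the add-or-rm-osds docs.
# 1. Mark it out so data rebalances away (a no-op if it is already out):
ceph osd out 9
# 2. Remove it from the CRUSH map so it no longer receives data:
ceph osd crush remove osd.9
# 3. Delete its authentication key:
ceph auth del osd.9
# 4. Remove it from the OSD map; after this it disappears from `ceph osd tree`:
ceph osd rm 9
```

Steps 2 and 3 may report "does not appear" / "does not exist" if they were already done, as seen later in this thread; step 4 is the one that clears the `ceph osd tree` entry.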
Re: [ceph-users] OSD Forece Removal
Is that what you said?

[root@capricornio ~]# ceph auth del osd.9
entity osd.9 does not exist
[root@capricornio ~]# ceph auth del osd.19
entity osd.19 does not exist
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Question Blackout
I'm not a CephFS user, but I have had a few cluster outages. Each OSD has a journal, and Ceph ensures that a write is in all of the journals (primary and replicas) before it acknowledges the write. If an OSD process crashes, it replays the journal on startup and recovers the write. I've lost power at my data center and had the whole cluster down; Ceph came back up when power was restored without me getting involved. You might want the paid support package. For extra peace of mind, you can get a paid cluster review, and an engineer will go through your use case with you.

On Tue, Mar 17, 2015 at 8:32 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote:

Hi everyone, I am ready to launch Ceph in production, but there is one thing that stays on my mind... If there were a blackout where all the Ceph nodes went off, what would really happen to the filesystem? Would it get corrupted? Or does Ceph have some kind of mechanism to survive something like that? Thanks

Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
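[Editor's note, not part of the thread:] The write-ahead behavior described above — acknowledge only after the write is in every journal, replay the journal after a crash — can be sketched as a toy model. This is not Ceph code; `ToyOSD` and `replicated_write` are invented names for illustration only.

```python
# Toy model of journal-then-ack replication (NOT Ceph source code).
# A write is acknowledged only after it is durable in the journal of the
# primary AND every replica; on restart, an OSD replays its journal.

class ToyOSD:
    def __init__(self, name):
        self.name = name
        self.journal = []   # stands in for the durable write-ahead journal
        self.store = {}     # stands in for the backing object store

    def journal_write(self, key, value):
        # Durably log the write before any acknowledgement.
        self.journal.append((key, value))

    def replay(self):
        # After a crash/blackout: re-apply everything in the journal.
        for key, value in self.journal:
            self.store[key] = value

def replicated_write(primary, replicas, key, value):
    # Ack only once the write sits in all journals (primary + replicas).
    for osd in [primary] + replicas:
        osd.journal_write(key, value)
    return "ack"

if __name__ == "__main__":
    primary = ToyOSD("osd.0")
    replicas = [ToyOSD("osd.1"), ToyOSD("osd.2")]
    assert replicated_write(primary, replicas, "obj1", b"data") == "ack"

    # Simulate a blackout: stores are rebuilt by replaying the journals.
    for osd in [primary] + replicas:
        osd.replay()
    assert all(osd.store["obj1"] == b"data" for osd in [primary] + replicas)
```

The point of the sketch: because the ack happens only after the journal writes, any acknowledged write survives a whole-cluster power loss and reappears on replay, which matches the recovery behavior described above.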