[ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type
Hi, I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS 6.5) complains: internal error missing backend for pool type 8. Is it possible that the libvirt 0.10.2 shipped with CentOS 6.5 was not compiled with RBD support? I can't find how to check this... I'm able to use qemu-img to create rbd images etc. Here is the cloudstack-agent DEBUG output, all seems fine...

<pool type='rbd'>
  <name>1e119e4c-20d1-3fbc-a525-a5771944046d</name>
  <uuid>1e119e4c-20d1-3fbc-a525-a5771944046d</uuid>
  <source>
    <host name='10.44.253.10' port='6789'/>
    <name>cloudstack</name>
    <auth username='cloudstack' type='ceph'>
      <secret uuid='1e119e4c-20d1-3fbc-a525-a5771944046d'/>
    </auth>
  </source>
</pool>

-- Andrija Panić
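One way to probe whether a given libvirtd build has the RBD storage backend, without reading the spec file; a minimal sketch, assuming the CentOS 6 default binary path and dynamic linkage (the grep targets are assumptions, not an official libvirt diagnostic):

    # a build with the RBD pool backend references librbd/librados:
    ldd /usr/sbin/libvirtd | grep -E 'librbd|librados'
    strings /usr/sbin/libvirtd | grep -i 'rbd'
    # or simply try to define a minimal RBD pool; a build without the
    # backend fails with the same "missing backend for pool type" error:
    virsh pool-define /tmp/rbd-pool.xml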
Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type
Thank you very much Wido. Any suggestion on compiling libvirt with RBD support (I already found a way), or perhaps a prebuilt package that you would recommend? Best

On 28 April 2014 13:25, Wido den Hollander w...@42on.com wrote:
On 04/28/2014 12:49 PM, Andrija Panic wrote: Hi, I'm trying to add CEPH as Primary Storage, but my libvirt 0.10.2 (CentOS 6.5) complains: internal error missing backend for pool type 8. Is it possible that the libvirt 0.10.2 (shipped with CentOS 6.5) was not compiled with RBD support? Can't find how to check this...

No, it's probably not compiled with RBD storage pool support. As far as I know CentOS doesn't compile libvirt with that support yet.

I'm able to use qemu-img to create rbd images etc... Here is the cloudstack-agent DEBUG output, all seems fine...

<pool type='rbd'>
  <name>1e119e4c-20d1-3fbc-a525-a5771944046d</name>
  <uuid>1e119e4c-20d1-3fbc-a525-a5771944046d</uuid>
  <source>
    <host name='10.44.253.10' port='6789'/>

I recommend creating a Round Robin DNS record which points to all your monitors.

    <name>cloudstack</name>
    <auth username='cloudstack' type='ceph'>
      <secret uuid='1e119e4c-20d1-3fbc-a525-a5771944046d'/>
    </auth>
  </source>
</pool>

-- Wido den Hollander, 42on B.V., Ceph trainer and consultant, Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
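The round-robin DNS record Wido recommends lets the pool definition reference one hostname that resolves to every monitor; illustrative BIND zone entries (the name ceph-mon and the .11/.12 addresses are assumptions extrapolated from the single 10.44.253.10 monitor above):

    ceph-mon    IN  A   10.44.253.10
    ceph-mon    IN  A   10.44.253.11
    ceph-mon    IN  A   10.44.253.12

The pool source then becomes <host name='ceph-mon.example.com' port='6789'/> instead of a single monitor IP, so clients keep working when one monitor is down.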
Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type
Thanks Dan :)

On 28 April 2014 15:02, Dan van der Ster daniel.vanders...@cern.ch wrote:
On 28/04/14 14:54, Wido den Hollander wrote:
On 04/28/2014 02:15 PM, Andrija Panic wrote: Thank you very much Wido. Any suggestion on compiling libvirt with RBD support (I already found a way), or perhaps a prebuilt package that you would recommend?

No special suggestions, just make sure you use at least Ceph 0.67.7. I'm not aware of any pre-built packages for CentOS.

Look for qemu-kvm-rhev ... el6 ... That's the Red Hat-built version of kvm which supports RBD. Cheers, Dan

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Unable to add CEPH as Primary Storage - libvirt error undefined storage pool type
Dan, is that maybe just RBD support for the kvm package? (I already have RBD-enabled qemu, qemu-img etc. from the ceph.com site.) I need just libvirt with RBD support? Thanks

On 28 April 2014 15:05, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Dan :)

On 28 April 2014 15:02, Dan van der Ster daniel.vanders...@cern.ch wrote: No special suggestions, just make sure you use at least Ceph 0.67.7. I'm not aware of any pre-built packages for CentOS. Look for qemu-kvm-rhev ... el6 ... That's the Red Hat-built version of kvm which supports RBD. Cheers, Dan

-- Andrija Panić -- http://admintweets.com --
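The qemu side can be verified independently of libvirt; a quick sketch (the exact help output wording varies between qemu versions, and test-image is a placeholder):

    # rbd shows up among the supported formats when qemu-img was built with RBD:
    qemu-img --help | grep -o 'Supported formats:.*'
    # or exercise it directly, as the original poster already did:
    qemu-img create -f rbd rbd:cloudstack/test-image 1G

If qemu-img can create RBD images but libvirt still reports "missing backend for pool type 8", only the libvirt storage-pool backend is missing, which matches Wido's diagnosis above.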
[ceph-users] OSD not starting at boot time
Hi, I was wondering why the OSDs would not start at boot time; this happens on 1 server (2 OSDs). If I check with chkconfig ceph --list, I can see that it should start; that is, the MON on this server does start, but the OSDs do not. I can start them manually with: service ceph start osd.X This is CentOS 6.5 and CEPH 0.72.2, deployed with the ceph-deploy tool. I did not forget the ceph osd activate... for sure. Thanks -- Andrija Panić
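For reference, a blunt stopgap until the boot-time cause is found; a sketch assuming sysvinit on CentOS 6 and that osd.0/osd.1 are the two OSD ids on the affected server (placeholders):

    # the manual start that is known to work, run once at boot via rc.local:
    echo 'service ceph start osd.0' >> /etc/rc.local
    echo 'service ceph start osd.1' >> /etc/rc.local

This only papers over the problem; on ceph-deploy installs the OSDs are normally activated from their partition labels at boot, so a disk that isn't mounted in time will show exactly this symptom.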
[ceph-users] Migrate system VMs from local storage to CEPH
Hi. I was wondering what would be the correct way to migrate system VMs (storage, console, VR) from local storage to CEPH. I'm on CS 4.2.1 and will soon be updating to 4.3... Is it enough to just change the global setting system.vm.use.local.storage from true to FALSE and then destroy the system VMs (CloudStack will recreate them in 1-2 minutes)? Also, how do I make sure that system VMs will NOT end up on NFS storage? Thanks for any input... -- Andrija Panić
Re: [ceph-users] Migrate system VMs from local storage to CEPH
Thank you very much Wido, that's exactly what I was looking for. Thanks

On 4 May 2014 18:30, Wido den Hollander w...@42on.com wrote:
On 05/02/2014 04:06 PM, Andrija Panic wrote: Hi. I was wondering what would be the correct way to migrate system VMs (storage, console, VR) from local storage to CEPH. I'm on CS 4.2.1 and will soon be updating to 4.3... Is it enough to just change the global setting system.vm.use.local.storage from true to FALSE and then destroy the system VMs (CloudStack will recreate them in 1-2 minutes)?

Yes, that would be sufficient. CloudStack will then deploy the SSVMs on your RBD storage.

Also how to make sure that system VMs will NOT end up on NFS storage?

Make use of the tagging. Tag the RBD pools with 'rbd' and change the Service Offering for the SSVMs so that they require 'rbd' as a storage tag.

-- Wido den Hollander, 42on B.V., Ceph trainer and consultant, Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Migrate system VMs from local storage to CEPH
Will try creating the tag inside the CS database, since GUI/cloudmonkey editing of an existing offering is NOT possible...

On 5 May 2014 16:04, Brian Rak b...@gameservers.com wrote: This would be a better question for the Cloudstack community.

On 5/2/2014 10:06 AM, Andrija Panic wrote: Hi. I was wondering what would be the correct way to migrate system VMs (storage, console, VR) from local storage to CEPH. [...] Also, how do I make sure that system VMs will NOT end up on NFS storage? Thanks for any input...

-- Andrija Panić -- http://admintweets.com --
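Instead of editing the database, a new system offering carrying the storage tag can also be created through the API; an illustrative cloudmonkey call (all values except tags=rbd are placeholder assumptions; the parameter names come from the standard createServiceOffering API):

    create serviceoffering name=SSVM-rbd displaytext="SSVM on RBD" \
        cpunumber=1 cpuspeed=500 memory=512 \
        issystem=true systemvmtype=secondarystoragevm \
        storagetype=shared tags=rbd

With the RBD primary storage tagged 'rbd', a system VM deployed from such an offering can only land on the Ceph pool, which answers the "how to keep them off NFS" question above.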
Re: [ceph-users] Migrate system VMs from local storage to CEPH
Hi Wido, thanks again for the input. Everything is fine except for the Software Router - it doesn't seem to get created on CEPH, no matter what I try. I created a new offering for CPVM and SSVM and used the guide here: https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html-single/Admin_Guide/index.html#sys-offering-sysvm to start using these new system offerings, and that all works. I did the same for the Software Router, but it keeps using the original system offering instead of the one I created. CS keeps creating the VR on NFS storage, chosen randomly among the 3 NFS storage nodes... Any suggestion, please? Thanks, Andrija

On 5 May 2014 16:11, Andrija Panic andrija.pa...@gmail.com wrote: Will try creating the tag inside the CS database, since GUI/cloudmonkey editing of an existing offering is NOT possible...

On 5 May 2014 16:04, Brian Rak b...@gameservers.com wrote: This would be a better question for the Cloudstack community.

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Replace journals disk
If you have a dedicated disk for the journal that you want to replace, consider (this may not be optimal, but it crosses my mind...) stopping the OSD (if that is possible), maybe with noout etc., then dd the old disk to the new one, and just resize the filesystem and partitions if needed... I guess there are more elegant ways than these manual steps... Cheers

On 6 May 2014 12:52, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2014-05-06 12:39 GMT+02:00 Andrija Panic andrija.pa...@gmail.com: Good question - I'm also interested. Do you want to move the journal to a dedicated disk/partition, i.e. on an SSD, or just replace a (failed) disk with a new/bigger one?

I would like to replace the disk with a bigger one (in fact, my new disk is smaller, but this should not change the workflow)

-- Andrija Panić -- http://admintweets.com --
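The usual alternative to dd'ing the whole journal device is to flush and recreate the journal, since its contents don't need to survive the swap; a sketch of that flow for one OSD (X is the OSD id, and the disk-swap step in the middle is site-specific):

    ceph osd set noout                # keep the cluster from rebalancing while the OSD is down
    service ceph stop osd.X
    ceph-osd -i X --flush-journal     # write out anything still sitting in the journal
    # ... physically swap in the new disk, recreate the journal partition/symlink ...
    ceph-osd -i X --mkjournal         # initialize a fresh journal on the new device
    service ceph start osd.X
    ceph osd unset noout

This avoids copying stale journal blocks and works regardless of whether the new device is bigger or smaller than the old one.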
Re: [ceph-users] Migrate system VMs from local storage to CEPH
I apologize, I posted to the wrong mailing list - too many emails these days :) @Wido, yes I did check, and there is a separate offering, but you can't change it the same way you change it for CPVM and SSVM... Will post to the CS mailing list, sorry for this..

On 6 May 2014 17:52, Wido den Hollander w...@42on.com wrote:
On 05/05/2014 11:40 PM, Andrija Panic wrote: Hi Wido, thanks again for the input. Everything is fine, except for the Software Router - it doesn't seem to get created on CEPH, no matter what I try.

There is a separate offering for the VR, have you checked that? But this is more something for the CloudStack users list as it's not related to Ceph. Wido

-- Wido den Hollander, 42on B.V., Ceph trainer and consultant, Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] NFS over CEPH - best practice
Mapping an RBD image to 2 or more servers is the same as sharing a storage device (SAN) - so from there on, you could do any clustering you want, based on what Wido said...

On 7 May 2014 12:43, Andrei Mikhailovsky and...@arhont.com wrote: Wido, would this work if I were to run nfs over two or more servers with a virtual IP? I can see what you've suggested working in a one-server setup. What about if you want to have two nfs servers in an active/backup or active/active setup? Thanks, Andrei

From: Wido den Hollander w...@42on.com, To: ceph-users@lists.ceph.com, Sent: Wednesday, 7 May, 2014 11:15:39 AM, Subject: Re: [ceph-users] NFS over CEPH - best practice

On 05/07/2014 11:46 AM, Andrei Mikhailovsky wrote: Hello guys, I would like to offer an NFS service to XenServer and VMWare hypervisors for storing vm images. I am currently running ceph rbd with kvm, which is working reasonably well. What would be the best way of running NFS services over CEPH, so that the XenServer and VMWare vm disk images are stored in ceph storage over NFS?

Use kernel RBD, put XFS on it and re-export that with NFS? Would that be something that works? I'd however suggest that you use a recent kernel so that you have a new version of krbd. For example Ubuntu 14.04 LTS. Many thanks, Andrei

-- Wido den Hollander, 42on B.V., Ceph trainer and consultant, Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
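A minimal sketch of the krbd + XFS + NFS chain discussed above (pool/image names, mount point and export network are placeholders):

    rbd create nfs/vmimages --size 1048576      # 1 TB image in a pool named "nfs"
    rbd map nfs/vmimages                        # kernel RBD; device appears as e.g. /dev/rbd0
    mkfs.xfs /dev/rbd0
    mkdir -p /export/vmimages && mount /dev/rbd0 /export/vmimages
    echo '/export/vmimages 10.0.0.0/24(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -ra

For the active/backup variant Andrei asks about: XFS is not a cluster filesystem, so the image must be mapped and mounted on only one server at a time, with the virtual IP and a failover manager deciding which node currently exports it.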
[ceph-users] qemu-img break cloudstack snapshot
Hi, just to share my issue with the qemu-img provided by CEPH (Red Hat caused the problem, not CEPH): the newest qemu-img - qemu-img-0.12.1.2-2.415.el6.3ceph.x86_64.rpm - was built from RHEL 6.5 source code, where Red Hat removed the -s parameter, so snapshotting in CloudStack up to 4.2.1 does not work. I guess there are also problems with OpenStack... The older CEPH RPM for qemu-img that I have, which works fine (I suppose it was built from RHEL 6.4 source), is qemu-img-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm. I raised a ticket, although this is not a problem caused by CEPH but by Red Hat. The ticket was raised in the hope that CEPH's developers will provide an older qemu-img that works fine (the one that I have) - or possibly compile a new one based on the RHEL 6.4 source. http://tracker.ceph.com/issues/8329 Best, -- Andrija Panić
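For context, the flag in question is qemu-img convert's -s option, used to read a single named snapshot out of an image; an illustrative invocation of the kind CloudStack issues (snapshot and path names are placeholders), which works with the .355 build but errors out on the .415 build where -s was removed:

    qemu-img convert -f raw -O qcow2 -s mysnapshot \
        rbd:cloudstack/<image-uuid> /backup/<image-uuid>.qcow2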
Re: [ceph-users] client: centos6.4 no rbd.ko
Try a 3.x kernel from the ELRepo repo... works for me, cloudstack/ceph... Sent from Google Nexus 4

On May 14, 2014 11:56 AM, maoqi1982 maoqi1...@126.com wrote: Hi list, our Ceph (0.72) cluster on Ubuntu 12.04 is OK. The client server runs OpenStack on CentOS 6.4 final, with the kernel upgraded to kernel-2.6.32-358.123.2.openstack.el6.x86_64. The problem is that this kernel does not include rbd.ko or ceph.ko. Can anyone help me add rbd.ko and ceph.ko to kernel-2.6.32-358.123.2.openstack.el6.x86_64, or suggest another way short of upgrading the kernel? Thanks.
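A sketch of the ELRepo route on CentOS 6 (the elrepo-release package version number changes over time, so treat it as a placeholder):

    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
    yum --enablerepo=elrepo-kernel install kernel-ml   # mainline 3.x kernel, ships rbd.ko/ceph.ko
    # point the default entry in /boot/grub/grub.conf at the new kernel, then reboot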
[ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi, I have a 3-node (2 OSDs per node) CEPH cluster, running fine, not much data, network also fine: Ceph 0.72.2. When I issue the ceph status command, I randomly get HEALTH_OK, and immediately afterwards, repeating the command, I get HEALTH_WARN. Example given below - these commands were issued less than 1 second apart. There are NO occurrences of the word warn in the logs (grep -ir warn /var/log/ceph) on any of the servers... I get false alerts from my status monitoring script for this reason... Any help would be greatly appreciated. Thanks,

[root@cs3 ~]# ceph status
    cluster cab20370-bf6a-4589-8010-8d5fc8682eab
     health HEALTH_OK
     monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0}, election epoch 122, quorum 0,1,2 cs1,cs2,cs3
     osdmap e890: 6 osds: 6 up, 6 in
      pgmap v2379904: 448 pgs, 4 pools, 862 GB data, 217 kobjects
            2576 GB used, 19732 GB / 22309 GB avail
                 448 active+clean
  client io 17331 kB/s rd, 113 kB/s wr, 176 op/s

[root@cs3 ~]# ceph status
    cluster cab20370-bf6a-4589-8010-8d5fc8682eab
     health HEALTH_WARN
     monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0}, election epoch 122, quorum 0,1,2 cs1,cs2,cs3
     osdmap e890: 6 osds: 6 up, 6 in
      pgmap v2379905: 448 pgs, 4 pools, 862 GB data, 217 kobjects
            2576 GB used, 19732 GB / 22309 GB avail
                 448 active+clean
  client io 28383 kB/s rd, 566 kB/s wr, 321 op/s

[root@cs3 ~]# ceph status
    cluster cab20370-bf6a-4589-8010-8d5fc8682eab
     health HEALTH_OK
     monmap e2: 3 mons at {cs1=10.44.xxx.10:6789/0,cs2=10.44.xxx.11:6789/0,cs3=10.44.xxx.12:6789/0}, election epoch 122, quorum 0,1,2 cs1,cs2,cs3
     osdmap e890: 6 osds: 6 up, 6 in
      pgmap v2379913: 448 pgs, 4 pools, 862 GB data, 217 kobjects
            2576 GB used, 19732 GB / 22309 GB avail
                 448 active+clean
  client io 21632 kB/s rd, 49354 B/s wr, 283 op/s

-- Andrija Panić
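For what it's worth, the monitoring-script pattern that trips over this is the obvious one; an illustrative sketch (the alert delivery command and address are placeholders):

    #!/bin/bash
    # naive check: alert whenever a single sample is not HEALTH_OK
    health=$(ceph health)
    if [ "$health" != "HEALTH_OK" ]; then
        echo "ceph alert: $health" | mail -s "ceph $health" admin@example.com
    fi

A transient HEALTH_WARN between two HEALTH_OK samples, as shown above, fires a false alert every time; polling ceph health detail instead at least records which check flapped.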
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi Christian, that seems true, thanks. But again, there are only occurrences in the gzipped log files (which were logrotated), not in the current log files. Example:

[root@cs2 ~]# grep -ir WRN /var/log/ceph/
Binary file /var/log/ceph/ceph-mon.cs2.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140614.gz matches
Binary file /var/log/ceph/ceph.log-20140611.gz matches
Binary file /var/log/ceph/ceph.log-20140612.gz matches
Binary file /var/log/ceph/ceph.log-20140613.gz matches

Thanks, Andrija

On 17 June 2014 10:48, Christian Balzer ch...@gol.com wrote: Hello, On Tue, 17 Jun 2014 10:30:44 +0200 Andrija Panic wrote: [...] There are NO occurrences of the word warn in the logs (grep -ir warn /var/log/ceph) on any of the servers... I get false alerts from my status monitoring script for this reason...

If I recall correctly, the logs will show INF, WRN and ERR, so grep for WRN. Regards, Christian

-- Christian Balzer, Network/Systems Engineer, ch...@gol.com, Global OnLine Japan/Fusion Communications, http://www.gol.com/

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi, thanks for that, but it's not a space issue: the OSD drives are only 12% full, and the /var drive on which the MON lives is over 70% full only on the CS3 server - but I have increased the alert thresholds in ceph.conf (mon data avail warn = 15, mon data avail crit = 5), and since I increased them those alerts are gone (anyway, the alerts for /var being over 70% full can be seen normally in the logs and in the ceph -w output). Here I get no normal/visible warning in either the logs or the ceph -w output... Thanks, Andrija

On 17 June 2014 11:00, Stanislav Yanchev s.yanc...@maxtelecom.bg wrote: Try the grep on cs1 and cs3 - could be a disk space issue. Regards, Stanislav Yanchev, Core System Administrator, MAX TELECOM, Mobile: +359 882 549 441, s.yanc...@maxtelecom.bg, www.maxtelecom.bg

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Hi Gregory, indeed - I still have warnings about 20% free space on the CS3 server, where the MON lives... strange that I don't get these warnings with prolonged ceph -w output...

[root@cs2 ~]# ceph health detail
HEALTH_WARN mon.cs3 addr 10.44.xxx.12:6789/0 has 20% avail disk space -- low disk space!

I don't understand how it is still possible to get warnings - I have the following in each ceph.conf file, under the general section:

mon data avail warn = 15
mon data avail crit = 5

I found these settings on the ceph mailing list... Thanks a lot, Andrija

On 17 June 2014 19:22, Gregory Farnum g...@inktank.com wrote: Try running ceph health detail on each of the monitors. Your disk space thresholds probably aren't configured correctly or something. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
As stupid as it gets... After lowering the mon data avail warn threshold from 20% to 15%, it seems I forgot to restart the MON service on this one node... I apologize for the noise, and thanks again everybody. Andrija

On 18 June 2014 09:49, Andrija Panic andrija.pa...@gmail.com wrote: Hi Gregory, indeed - I still have warnings about 20% free space on the CS3 server, where the MON lives... strange that I don't get these warnings with prolonged ceph -w output...

-- Andrija Panić
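So, putting the resolution in one place: the thresholds go into ceph.conf on each monitor node (this thread put them in the general section) and only take effect once that node's mon daemon is restarted - the step that was missed here; a sketch using the exact values from the thread:

    [global]
        mon data avail warn = 15
        mon data avail crit = 5

    # then, per monitor node (cs3 was the one missed in this case):
    service ceph restart mon.cs3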
Re: [ceph-users] Cluster status reported wrongly as HEALTH_WARN
Thanks Greg, seems like I'm going to update soon... Thanks again, Andrija

On 18 June 2014 14:06, Gregory Farnum g...@inktank.com wrote: The lack of warnings in ceph -w for this issue is a bug in Emperor. It's resolved in Firefly. -Greg

-- Andrija Panić
[ceph-users] Mixing CEPH versions on new ceph nodes...
Hi, I have an existing CEPH cluster of 3 nodes, version 0.72.2. I'm in the process of installing CEPH on a 4th node, but now the CEPH version is 0.80.1. Will running mixed CEPH versions cause problems? I intend to upgrade CEPH on the existing 3 nodes anyway. Recommended steps? Thanks -- Andrija Panić
Re: [ceph-users] Mixing CEPH versions on new ceph nodes...
Hi Wido, thanks for the answers - I have mons and OSDs on each host... server1: mon + 2 OSDs, same for server2 and server3. Any proposed upgrade path, or do I just start with 1 server and move along to the others? Thanks again. Andrija

On 2 July 2014 16:34, Wido den Hollander w...@42on.com wrote:
On 07/02/2014 04:08 PM, Andrija Panic wrote: Hi, I have an existing CEPH cluster of 3 nodes, version 0.72.2. I'm in the process of installing CEPH on a 4th node, but now the CEPH version is 0.80.1. Will running mixed CEPH versions cause problems?

No, but the recommendation is not to have this running for a very long period. Try to upgrade all nodes to the same version within a reasonable amount of time.

I intend to upgrade CEPH on the existing 3 nodes anyway. Recommended steps?

Always upgrade the monitors first! Then the OSDs, one by one.

-- Wido den Hollander, 42on B.V., Ceph trainer and consultant, Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić
Re: [ceph-users] Mixing CEPH versions on new ceph nodes...
Thanks a lot Wido, will do... Andrija

On 3 July 2014 13:12, Wido den Hollander w...@42on.com wrote:
On 07/03/2014 10:59 AM, Andrija Panic wrote: Hi Wido, thanks for the answers - I have mons and OSDs on each host... server1: mon + 2 OSDs, same for server2 and server3. Any proposed upgrade path, or do I just start with 1 server and move along to the others?

Upgrade the packages, but don't restart the daemons yet, then:
1. Restart the mon leader
2. Restart the two other mons
3. Restart all the OSDs one by one

I suggest that you wait for the cluster to become fully healthy again before restarting the next OSD. Wido

-- Wido den Hollander, Ceph consultant and trainer, 42on B.V., Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
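Spelled out as commands for this 3-node layout (mon + 2 OSDs per host; cs1 as leader and the OSD ids are assumptions), Wido's sequence would look like:

    yum update ceph                  # on every node; daemons keep running the old code until restarted
    service ceph restart mon.cs1     # 1. the mon leader first
    service ceph restart mon.cs2     # 2. the other two mons
    service ceph restart mon.cs3
    service ceph restart osd.0       # 3. then OSDs one by one...
    ceph health                      # ...waiting for HEALTH_OK before the next
    service ceph restart osd.1

As noted later in this thread, the CentOS ceph RPMs may restart all daemons on update by themselves, which defeats this ordering.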
Re: [ceph-users] Mixing CEPH versions on new ceph nodes...
Wido, one final question: since I compiled libvirt 1.2.3 using ceph-devel 0.72 - do I need to recompile libvirt again now with ceph-devel 0.80? Perhaps not a smart question, but I need to make sure I don't screw something up... Thanks for your time, Andrija

On 3 July 2014 14:27, Andrija Panic andrija.pa...@gmail.com wrote: Thanks a lot Wido, will do... Andrija

On 3 July 2014 13:12, Wido den Hollander w...@42on.com wrote: Upgrade the packages, but don't restart the daemons yet, then: 1. Restart the mon leader 2. Restart the two other mons 3. Restart all the OSDs one by one. I suggest that you wait for the cluster to become fully healthy again before restarting the next OSD. Wido

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] Mixing CEPH versions on new ceph nodes...
Thanks again a lot.

On 3 July 2014 15:20, Wido den Hollander w...@42on.com wrote:
On 07/03/2014 03:07 PM, Andrija Panic wrote: Wido, one final question: since I compiled libvirt 1.2.3 using ceph-devel 0.72 - do I need to recompile libvirt again now with ceph-devel 0.80? Perhaps not a smart question, but I need to make sure I don't screw something up...

No, no need to. The librados API didn't change, in case you are using RBD storage pool support. Otherwise it just talks to Qemu and that talks to librbd/librados. Wido

-- Wido den Hollander, Ceph consultant and trainer, 42on B.V., Phone: +31 (0)20 700 9902, Skype: contact42on

-- Andrija Panić -- http://admintweets.com --
[ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80
Hi, sorry to bother you, but I have an urgent situation: I upgraded CEPH from 0.72 to 0.80 (CentOS 6.5), and now none of my CloudStack HOSTS can connect. I did a basic yum update ceph on the first MON leader, and all CEPH services on that HOST were restarted - then did the same on the other CEPH nodes (I have 1 MON + 2 OSDs per physical host). Then I set the variables to optimal with ceph osd crush tunables optimal, and after some rebalancing ceph shows HEALTH_OK. Also, I can create new images with qemu-img -f rbd rbd:/cloudstack. Libvirt 1.2.3 was compiled while ceph was 0.72, but I got instructions from Wido that I don't need to REcompile now with ceph 0.80... Libvirt logs: libvirt: Storage Driver error : Storage pool not found: no storage pool with matching uuid ÎhyJ~`a*× Note the strange uuid - not sure what is happening? Did I forget to do something after the CEPH upgrade? Any help will be VERY much appreciated... Andrija -- Andrija Panić
Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80
Hi Mark, actually CEPH is running fine, and I have deployed a NEW host (libvirt freshly compiled against ceph 0.80 devel, and a newer kernel) - and it works... so I'm migrating some VMs to this new host... I have 3 physical hosts that each run a MON and 2 OSDs; on all 3 the cloudstack/libvirt combination doesn't work... Any suggestion on the need to recompile libvirt? I got info from Wido that libvirt does NOT need to be recompiled. Best

On 13 July 2014 08:35, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
On 13/07/14 17:07, Andrija Panic wrote: Hi, sorry to bother you, but I have an urgent situation: I upgraded CEPH from 0.72 to 0.80 (CentOS 6.5), and now none of my CloudStack HOSTS can connect. [...] Did I forget to do something after the CEPH upgrade?

Have you got any ceph logs to examine on the host running libvirt? When I try to connect a v0.72 client to a v0.81 cluster I get:

2014-07-13 18:21:23.860898 7fc3bd2ca700 0 -- 192.168.122.41:0/1002012 >> 192.168.122.21:6789/0 pipe(0x7fc3c00241f0 sd=3 :49451 s=1 pgs=0 cs=0 l=1 c=0x7fc3c0024450).connect protocol feature mismatch, my f < peer 5f missing 50

Regards, Mark

-- Andrija Panić -- http://admintweets.com --
Re: [ceph-users] [URGENT]. Can't connect to CEPH after upgrade from 0.72 to 0.80
Hi Mark, update: after restarting libvirtd, cloudstack-agent and the management server God knows how many times - it WORKS now! Not sure what is happening here, but it works again... I know for sure it was not the CEPH cluster, since it was fine and accessible via qemu-img, etc... Thanks Mark for your time on my issue... Best. Andrija

On 13 July 2014 10:20, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
On 13/07/14 19:15, Mark Kirkwood wrote:
On 13/07/14 18:38, Andrija Panic wrote: Any suggestion on the need to recompile libvirt? I got info from Wido that libvirt does NOT need to be recompiled

Thinking about this a bit more - Wido *may* have meant:
- *libvirt* does not need to be rebuilt
- ...but you need to get/build a later ceph client, i.e. 0.80

Of course, depending on how your libvirt build was set up (e.g. static linkage), this *might* have meant you needed to rebuild it too. Regards, Mark

-- Andrija Panić -- http://admintweets.com --
[ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Hi, after finishing the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding new OSDs... Result: all VMs living on CEPH storage went mad, with effectively no disk access; blocked, so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalancings at the same time? Is this behaviour normal - to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić
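One way to soften such a rebalance while it is already running is to throttle recovery on the fly; a hedged sketch using injectargs, with the conservative values discussed in the reply below:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 2'

The same options can go into ceph.conf under [osd] to persist across daemon restarts; as the follow-up notes, though, even these settings may not save client I/O during a full tunables-change reshuffle.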
Re: [ceph-users] Mixing CEPH versions on new ceph nodes...
Hi Wido, you said previously: Upgrade the packages, but don't restart the daemons yet, then: 1. Restart the mon leader 2. Restart the two other mons 3. Restart all the OSDs one by one But in reality (yum update or by using ceph-deploy install nodename) - the package manager does restart ALL ceph services on that node by its own... So, I have upgraded - MON leader and 2 OSD on this 1st upgraded host were restarted, folowed by doing the same with other 2 servers (1 MON peon and 2 OSD per host). Is this perhaps a package (RPM) bug - restarting daemons automatically ? Since it makes sense to have all MONs updated first, and than OSD (and perhaps after that MDS if using it...) Upgraded to 0.80.3 release btw. Thanks for your help again. Andrija On 3 July 2014 15:21, Andrija Panic andrija.pa...@gmail.com wrote: Thanks again a lot. On 3 July 2014 15:20, Wido den Hollander w...@42on.com wrote: On 07/03/2014 03:07 PM, Andrija Panic wrote: Wido, one final question: since I compiled libvirt1.2.3 usinfg ceph-devel 0.72 - do I need to recompile libvirt again now with ceph-devel 0.80 ? Perhaps not smart question, but need to make sure I don't screw something... No, no need to. The librados API didn't change in case you are using RBD storage pool support. Otherwise it just talks to Qemu and that talks to librbd/librados. Wido Thanks for your time, Andrija On 3 July 2014 14:27, Andrija Panic andrija.pa...@gmail.com mailto:andrija.pa...@gmail.com wrote: Thanks a lot Wido, will do... Andrija On 3 July 2014 13:12, Wido den Hollander w...@42on.com mailto:w...@42on.com wrote: On 07/03/2014 10:59 AM, Andrija Panic wrote: Hi Wido, thanks for answers - I have mons and OSD on each host... server1: mon + 2 OSDs, same for server2 and server3. Any Proposed upgrade path, or just start with 1 server and move along to others ? Upgrade the packages, but don't restart the daemons yet, then: 1. Restart the mon leader 2. Restart the two other mons 3. Restart all the OSDs one by one I suggest that you wait for the cluster to become fully healthy again before restarting the next OSD. Wido Thanks again. Andrija On 2 July 2014 16:34, Wido den Hollander w...@42on.com mailto:w...@42on.com mailto:w...@42on.com mailto:w...@42on.com wrote: On 07/02/2014 04:08 PM, Andrija Panic wrote: Hi, I have existing CEPH cluster of 3 nodes, versions 0.72.2 I'm in a process of installing CEPH on 4th node, but now CEPH version is 0.80.1 Will this make problems running mixed CEPH versions ? No, but the recommendation is not to have this running for a very long period. Try to upgrade all nodes to the same version within a reasonable amount of time. I intend to upgrade CEPH on exsiting 3 nodes anyway ? Recommended steps ? Always upgrade the monitors first! Then to the OSDs one by one. Thanks -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com mailto:ceph-us...@lists.ceph.__com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph._ ___com http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com http://lists.ceph.com/__listinfo.cgi/ceph-users-ceph.__com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. 
Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
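A rough sketch of the restart order Wido describes, assuming the sysvinit service names used on CentOS (mon ids and OSD numbers are examples):

  # upgrade packages on every node first, then restart the mon leader, then the peons
  service ceph restart mon.server1
  service ceph restart mon.server2
  service ceph restart mon.server3
  # then the OSDs one by one, waiting for HEALTH_OK in between
  service ceph restart osd.0
  ceph health    # wait for HEALTH_OK before restarting the next OSD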
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Hi Andrei, nice to meet you again ;) Thanks for sharing this info with me - I thought it was my mistake to introduce the new OSD components at the same time - I thought that since it's rebalancing anyway, let's add those new OSDs so it all rebalances at once and I don't cause 2 data rebalances - but during a normal OSD restart and data rebalancing (I did not set osd noout etc...) I did have somewhat lower VM performance, but everything was UP and fine. Also, 30% of the data moved during my upgrade/tunables change... although the documents say 10%, as you said. I did not lose any data, but finding all the VMs that use CEPH as storage is somewhat of a PITA... So, any CEPH developers' input would be greatly appreciated... Thanks again for such detailed info, Andrija On 14 July 2014 10:52, Andrei Mikhailovsky and...@arhont.com wrote: Hi Andrija, I've got at least two more stories of a similar nature. One is from a friend running a ceph cluster and one is from me. Both of our clusters are pretty small. My cluster has only two osd servers with 8 osds each, 3 mons. I have an ssd journal per 4 osds. My friend has a cluster of 3 mons and 3 osd servers with 4 osds each and an ssd per 4 osds as well. Both clusters are connected with 40gbit/s IP over Infiniband links. We had the same issue while upgrading to firefly. However, we did not add any new disks; we just ran the ceph osd crush tunables optimal command following the upgrade. Both of our clusters were down as far as the virtual machines are concerned. All vms crashed because of the lack of IO. It was a bit problematic, taking into account that ceph is typically so great at staying alive during failures and upgrades. So, there seems to be a problem with the upgrade. I wish the devs had added a big note in red letters saying that if you run this command it will likely affect your cluster performance and most likely all your vms will die, so please shut down your vms if you do not want to have data loss. I changed the default values to reduce the load during recovery and also to tune a few things performance wise. My settings were:

osd recovery max chunk = 8388608
osd recovery op priority = 2
osd max backfills = 1
osd recovery max active = 1
osd recovery threads = 1
osd disk threads = 2
filestore max sync interval = 10
filestore op threads = 20
filestore_flusher = false

However, this didn't help much, and I noticed that shortly after running the tunables command my guest vms' iowait quickly jumped to 50%, and to 99% a minute after. This happened on all vms at once. During the recovery phase I ran the rbd -p poolname ls -l command several times and it took between 20-40 minutes to complete. It typically takes less than 2 seconds when the cluster is not in recovery mode. My mate's cluster had the same tunables apart from the last three. He saw exactly the same behaviour. One other thing I've noticed: somewhere in the docs I read that running the tunables optimal command should move no more than 10% of your data. However, in both of our cases the status was just over 30% degraded and it took the better part of 9 hours to complete the data reshuffling. Any comments from the ceph team or other ceph gurus on: 1. What have we done wrong in our upgrade process? 2.
What options should we have used to keep our vms alive? Cheers Andrei -- From: Andrija Panic andrija.pa...@gmail.com To: ceph-users@lists.ceph.com Sent: Sunday, 13 July, 2014 9:54:17 PM Subject: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding the new OSDs... Result - all VMs living on CEPH storage went mad, effectively no disk access, blocked so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
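For reference, recovery-throttling settings like Andrei's can also be applied at runtime without restarting the OSDs; a sketch using injectargs (which takes the underscore form of the option names):

  ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 2'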
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Perhaps here: http://ceph.com/releases/v0-80-firefly-released/ Thanks On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage On Sun, 13 Jul 2014, Andrija Panic wrote: Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding the new OSDs... Result - all VMs living on CEPH storage went mad, effectively no disk access, blocked so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Udo, I had all VMs completely non-operational - so don't set optimal for now... On 14 July 2014 20:48, Udo Lembke ulem...@polarzone.de wrote: Hi, which values are all changed with ceph osd crush tunables optimal? Is it perhaps possible to change some of the parameters on the weekends before the upgrade is run, to have more time? (depends on whether the parameters are available in 0.72...). The warning says it can take days... We have a cluster with 5 storage nodes and 12 4TB osd disks each (60 osds), replica 2. The cluster is 60% filled. Network connection is 10Gb. Does tunables optimal take one, two or more days in such a configuration? Udo On 14.07.2014 18:18, Sage Weil wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
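For reference, the tunables currently in effect can be dumped, and the change can be rolled back if needed - though note the rollback triggers another rebalance of its own (a sketch, assuming a Firefly-era ceph CLI):

  ceph osd crush show-tunables    # dump the tunables currently in effect
  ceph osd crush tunables legacy  # revert to pre-Firefly behaviour (also rebalances)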
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
Hi Sage, since this problem is tunables-related, do we need to expect the same behavior or not when we do a regular data rebalancing caused by adding new/removing OSDs? I guess not, but I would like your confirmation. I'm already on optimal tunables, but I'm afraid to test this by e.g. shutting down 1 OSD. Thanks, Andrija On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage On Sun, 13 Jul 2014, Andrija Panic wrote: Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster... So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding the new OSDs... Result - all VMs living on CEPH storage went mad, effectively no disk access, blocked so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] v0.80.4 Firefly released
Hi Sage, can anyone confirm whether there is still a bug in the RPMs that causes an automatic CEPH service restart after updating the packages? We are instructed to first update/restart the MONs, and after that the OSDs - but that is impossible if we have MON+OSDs on the same host... since ceph is automatically restarted by YUM/RPM, but NOT automatically restarted on Ubuntu/Debian (as reported by another list member...) Thanks On 16 July 2014 01:45, Sage Weil s...@inktank.com wrote: This Firefly point release fixes a potential data corruption problem when ceph-osd daemons run on top of XFS and service Firefly librbd clients. A recently added allocation hint that RBD utilizes triggers an XFS bug on some kernels (Linux 3.2, and likely others) that leads to data corruption and deep-scrub errors (and inconsistent PGs). This release avoids the situation by disabling the allocation hint until we can validate which kernels are affected and/or are known to be safe to use the hint on. We recommend that all v0.80.x Firefly users urgently upgrade, especially if they are using RBD. Notable Changes --- * osd: disable XFS extsize hint by default (#8830, Samuel Just) * rgw: fix extra data pool default name (Yehuda Sadeh) For more detailed information, see: http://ceph.com/docs/master/_downloads/v0.80.4.txt Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at http://ceph.com/download/ceph-0.80.4.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
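One way to check what the installed RPM actually does on upgrade is to inspect its scriptlets; a diagnostic sketch, not a fix:

  rpm -q --scripts ceph | grep -B2 -A2 restart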
Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time
For me, 3 nodes, 1 MON + 2x2TB OSDs on each node... no mds used... I went through the pain of waiting for the data rebalancing and am now on optimal tunables... Cheers On 16 July 2014 14:29, Andrei Mikhailovsky and...@arhont.com wrote: Quenten, We've got two monitors sitting on the osd servers and one on a different server. Andrei -- Andrei Mikhailovsky Director Arhont Information Security Web: http://www.arhont.com http://www.wi-foo.com Tel: +44 (0)870 4431337 Fax: +44 (0)208 429 3111 PGP: Key ID - 0x2B3438DE PGP: Server - keyserver.pgp.com DISCLAIMER The information contained in this email is intended only for the use of the person(s) to whom it is addressed and may be confidential or contain legally privileged information. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited. If you have received this email in error please immediately advise us by return email at and...@arhont.com and delete and purge the email and any attachments without making a copy. -- From: Quenten Grasso qgra...@onq.com.au To: Andrija Panic andrija.pa...@gmail.com, Sage Weil sw...@redhat.com Cc: ceph-users@lists.ceph.com Sent: Wednesday, 16 July, 2014 1:20:19 PM Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time Hi Sage, Andrija, List, I have seen the tunables issue on our cluster when I upgraded to firefly. I ended up going back to the legacy settings after about an hour, as my cluster is 55 3TB OSDs over 5 nodes and it decided it needed to move around 32% of our data; after an hour all of our vms were frozen and I had to revert the change back to the legacy settings, wait about the same time again until our cluster had recovered, and reboot our vms. (wasn't really expecting that one from the patch notes) Also our CPU usage went through the roof on our nodes; do you perchance have your metadata servers co-located on your osd nodes as we do? I've been thinking about moving these to dedicated nodes, as it may resolve our issues. Regards, Quenten From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Andrija Panic Sent: Tuesday, 15 July 2014 8:38 PM To: Sage Weil Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at the same time Hi Sage, since this problem is tunables-related, do we need to expect the same behavior or not when we do a regular data rebalancing caused by adding new/removing OSDs? I guess not, but I would like your confirmation. I'm already on optimal tunables, but I'm afraid to test this by e.g. shutting down 1 OSD. Thanks, Andrija On 14 July 2014 18:18, Sage Weil sw...@redhat.com wrote: I've added some additional notes/warnings to the upgrade and release notes: https://github.com/ceph/ceph/commit/fc597e5e3473d7db6548405ce347ca7732832451 If there is somewhere else where you think a warning flag would be useful, let me know! Generally speaking, we want to be able to cope with huge data rebalances without interrupting service. It's an ongoing process of improving the recovery vs client prioritization, though, and removing sources of overhead related to rebalancing... and it's clearly not perfect yet. :/ sage On Sun, 13 Jul 2014, Andrija Panic wrote: Hi, after the ceph upgrade (0.72.2 to 0.80.3) I issued ceph osd crush tunables optimal, and after only a few minutes I added 2 more OSDs to the CEPH cluster...
So these 2 changes were more or less done at the same time - rebalancing because of tunables optimal, and rebalancing because of adding the new OSDs... Result - all VMs living on CEPH storage went mad, effectively no disk access, blocked so to speak. Since this rebalancing took 5h-6h, I had a bunch of VMs down for that long... Did I do wrong by causing 2 rebalances to happen at the same time? Is this behaviour normal, to cause great load on all VMs because they are unable to access CEPH storage effectively? Thanks for any input... -- Andrija Panić -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Show IOps per VM/client to find heavy users...
Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer some precise op/s per ceph image at least... Will check CloudStack then... Thx On 8 August 2014 13:53, Wido den Hollander w...@42on.com wrote: On 08/08/2014 01:51 PM, Andrija Panic wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? This is not very easy to do with Ceph, but CloudStack keeps track of this in the usage database. With newer versions of CloudStack you can also limit the IOps of Instances to prevent such situations. Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hm, true... One final question, I might be a noob... 13923 B/s rd, 4744 kB/s wr, 1172 op/s - what does this op/s represent? Is it classic IOPS (4k reads/writes) or something else? And how much is too much :) - I'm familiar with SATA/SSD IO/s specs/tests, etc, but not sure what CEPH means by op/s - could not find anything with google... Thanks again Wido. Andrija On 8 August 2014 14:07, Wido den Hollander w...@42on.com wrote: On 08/08/2014 02:02 PM, Andrija Panic wrote: Thanks Wido, yes I'm aware of CloudStack in that sense, but I would prefer some precise op/s per ceph image at least... Will check CloudStack then... Ceph doesn't really know that, since RBD is just a layer on top of RADOS. In the end the CloudStack hypervisors are doing I/O towards RADOS objects, so giving exact stats of how many IOps you are seeing per image is hard to figure out. The hypervisor knows this best, since it sees all the I/O going through. Wido Thx On 8 August 2014 13:53, Wido den Hollander w...@42on.com wrote: On 08/08/2014 01:51 PM, Andrija Panic wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? This is not very easy to do with Ceph, but CloudStack keeps track of this in the usage database. With newer versions of CloudStack you can also limit the IOps of Instances to prevent such situations. Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
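For raw per-daemon counters there is also the admin socket; a sketch (run on the OSD host; socket path and counter names may vary by version):

  ceph daemon osd.0 perf dump    # per-OSD counters, including cumulative op_r / op_w totals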
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi Dan, thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done... This seems to read only gzipped logs? And since it is read-only, I guess it is safe to run it on a production cluster now...? The script will also check multiple OSDs, as far as I can understand, not just the osd.0 given in the script comment? Thanks a lot. Andrija On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster. Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Thanks again, and btw, besides it being Friday I'm also on vacation - so double the joy of troubleshooting performance problems :))) Thx :) On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done... This seems to read only gzipped logs? Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;) And since it is read-only, I guess it is safe to run it on a production cluster now...? I personally don't do anything new on a Friday just before leaving ;) But it's just grepping the log files, so start with one, then two, then... The script will also check multiple OSDs, as far as I can understand, not just the osd.0 given in the script comment? Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster. If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs. Cheers, Dan Thanks a lot. Andrija On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster. Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Will do so definitely, thanks Wido and Dan... Cheers guys On 8 August 2014 16:13, Wido den Hollander w...@42on.com wrote: On 08/08/2014 03:44 PM, Dan Van Der Ster wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster. +1 I'd strongly advise setting I/O limits for Instances. I've had multiple occasions where a runaway script inside a VM was hammering on the underlying storage, killing all I/O. Not only with Ceph, but over the many years I've worked with storage. I/O == expensive. CloudStack supports I/O limiting, so I recommend you set a limit. Set it to 750 write IOps for example. That way one Instance can't kill the whole cluster, but it still has enough I/O to run (usually). Wido Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
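Outside of CloudStack, the same kind of cap can be applied directly through libvirt; a sketch, where vm1 and vda are placeholder domain/device names:

  virsh blkdeviotune vm1 vda --write-iops-sec 750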
Re: [ceph-users] Show IOps per VM/client to find heavy users...
Hi Dan, the script provided seems to not work on my ceph cluster :( This is ceph version 0.80.3. I get empty results, on both debug level 10 and the maximum level of 20...

[root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz
Writes per OSD:
Writes per pool:
Writes per PG:
Writes per RBD:
Writes per object:
Writes per length:
. . .

On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done... This seems to read only gzipped logs? Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;) And since it is read-only, I guess it is safe to run it on a production cluster now...? I personally don't do anything new on a Friday just before leaving ;) But it's just grepping the log files, so start with one, then two, then... The script will also check multiple OSDs, as far as I can understand, not just the osd.0 given in the script comment? Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster. If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs. Cheers, Dan Thanks a lot. Andrija On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster. Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Show IOps per VM/client to find heavy users...
I apologize, I clicked the Send button too fast... Anyway, I can see there are lines like this in the log file: 2014-08-11 12:43:25.477693 7f022d257700 10 filestore(/var/lib/ceph/osd/ceph-0) write 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3 3641344~4608 = 4608 Not sure if I can do anything to fix this...? Thanks, Andrija On 11 August 2014 12:46, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, the script provided seems to not work on my ceph cluster :( This is ceph version 0.80.3. I get empty results, on both debug level 10 and the maximum level of 20... [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz Writes per OSD: Writes per pool: Writes per PG: Writes per RBD: Writes per object: Writes per length: . . . On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done... This seems to read only gzipped logs? Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;) And since it is read-only, I guess it is safe to run it on a production cluster now...? I personally don't do anything new on a Friday just before leaving ;) But it's just grepping the log files, so start with one, then two, then... The script will also check multiple OSDs, as far as I can understand, not just the osd.0 given in the script comment? Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster. If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs. Cheers, Dan Thanks a lot. Andrija On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster. Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar...
Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
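Given filestore log lines in the format shown above, a rough one-liner (an improvised sketch, not the CERN script) to count writes per RBD image prefix:

  zcat ceph-osd.0.log-20140811.gz | grep ' write ' | grep -o 'rbd_data\.[0-9a-f]*' | sort | uniq -c | sort -rn | head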
Re: [ceph-users] Show IOps per VM/client to find heavy users...
That's better :D Thanks a lot, now I will be able to troubleshoot my problem :) Thanks Dan, Andrija On 11 August 2014 13:21, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, I changed the script to be a bit more flexible with the osd path. Give this a try again: https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 11 Aug 2014, at 12:48, Andrija Panic andrija.pa...@gmail.com wrote: I apologize, I clicked the Send button too fast... Anyway, I can see there are lines like this in the log file: 2014-08-11 12:43:25.477693 7f022d257700 10 filestore(/var/lib/ceph/osd/ceph-0) write 3.48_head/14b1ca48/rbd_data.41e16619f5eb6.1bd1/head//3 3641344~4608 = 4608 Not sure if I can do anything to fix this...? Thanks, Andrija On 11 August 2014 12:46, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, the script provided seems to not work on my ceph cluster :( This is ceph version 0.80.3. I get empty results, on both debug level 10 and the maximum level of 20... [root@cs1 ~]# ./rbd-io-stats.pl /var/log/ceph/ceph-osd.0.log-20140811.gz Writes per OSD: Writes per pool: Writes per PG: Writes per RBD: Writes per object: Writes per length: . . . On 8 August 2014 16:01, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, On 08 Aug 2014, at 15:55, Andrija Panic andrija.pa...@gmail.com wrote: Hi Dan, thank you very much for the script, will check it out... no throttling so far, but I guess it will have to be done... This seems to read only gzipped logs? Well it's pretty simple, and it zcat's each input file. So yes, only gz files in the current script. But you can change that pretty trivially ;) And since it is read-only, I guess it is safe to run it on a production cluster now...? I personally don't do anything new on a Friday just before leaving ;) But it's just grepping the log files, so start with one, then two, then... The script will also check multiple OSDs, as far as I can understand, not just the osd.0 given in the script comment? Yup, what I do is gather all of the OSD logs for a single day in a single directory (in CephFS ;), then run that script on all of the OSDs. It takes a while, but it will give you the overall daily totals for the whole cluster. If you are only trying to find the top users, then it is sufficient to check a subset of OSDs, since by their nature the client IOs are spread across most/all OSDs. Cheers, Dan Thanks a lot. Andrija On 8 August 2014 15:44, Dan Van Der Ster daniel.vanders...@cern.ch wrote: Hi, Here's what we do to identify our top RBD users. First, enable log level 10 for the filestore so you can see all the IOs coming from the VMs. Then use a script like this (used on a dumpling cluster): https://github.com/cernceph/ceph-scripts/blob/master/tools/rbd-io-stats.pl to summarize the osd logs and identify the top clients. Then it's just a matter of scripting to figure out the ops/sec per volume, but for us at least the main use-case has been to identify who is responsible for a new peak in overall ops — and the daily-granular statistics from the above script tend to suffice. BTW, do you throttle your clients? We found that it's absolutely necessary, since without a throttle just a few active VMs can eat up the entire iops capacity of the cluster.
Cheers, Dan -- Dan van der Ster || Data Storage Services || CERN IT Department -- On 08 Aug 2014, at 13:51, Andrija Panic andrija.pa...@gmail.com wrote: Hi, we just got some new clients, and have suffered a very big degradation in CEPH performance for some reason (we are using CloudStack). I'm wondering if there is a way to monitor op/s or similar usage per connected client, so we can isolate the heavy client? Also, what is the general best practice for monitoring these kinds of changes in CEPH? I'm talking about R/W or op/s changes or similar... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Multiple OSDs per host strategy ?
Hi, I have 2 x 2TB disks in 3 servers, so a total of 6 disks... I have deployed a total of 6 OSDs, i.e.: host1 = osd.0 and osd.1, host2 = osd.2 and osd.3, host3 = osd.4 and osd.5. Now, since I will have a total of 3 replicas (original + 2 replicas), I want my replica placement to be such that I don't end up having 2 replicas on 1 host (a replica on osd.0 and osd.1 (both on host1) and a replica on osd.2). I want all 3 replicas spread over different hosts... I know this is to be done via crush maps, but I'm not sure whether it would be better to have 2 pools, 1 pool on osd.0,2,4 and another pool on osd.1,3,5. If possible, I would want only 1 pool, spread across all 6 OSDs, but with data placement such that I don't end up having 2 replicas on 1 host... not sure if this is possible at all... Is that possible, or maybe I should go for RAID0 in each server (2 x 2TB = 4TB for osd.0), or maybe JBOD (1 volume, so 1 OSD per host)? Any suggestions about best practice? Regards, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Multiple OSDs per host strategy ?
well, nice one :) step chooseleaf firstn 0 type host - it is part of the default crush map (3 hosts, 2 OSDs per host). It means: write 3 replicas (in my case) to 3 hosts... and randomly select an OSD on each host? I already read all the docs... and still not sure how to proceed... On 16 October 2013 23:27, Mike Dawson mike.daw...@cloudapt.com wrote: Andrija, You can use a single pool and the proper CRUSH rule step chooseleaf firstn 0 type host to accomplish your goal. http://ceph.com/docs/master/rados/operations/crush-map/ Cheers, Mike Dawson On 10/16/2013 5:16 PM, Andrija Panic wrote: Hi, I have 2 x 2TB disks in 3 servers, so a total of 6 disks... I have deployed a total of 6 OSDs, i.e.: host1 = osd.0 and osd.1, host2 = osd.2 and osd.3, host3 = osd.4 and osd.5. Now, since I will have a total of 3 replicas (original + 2 replicas), I want my replica placement to be such that I don't end up having 2 replicas on 1 host (a replica on osd.0 and osd.1 (both on host1) and a replica on osd.2). I want all 3 replicas spread over different hosts... I know this is to be done via crush maps, but I'm not sure whether it would be better to have 2 pools, 1 pool on osd.0,2,4 and another pool on osd.1,3,5. If possible, I would want only 1 pool, spread across all 6 OSDs, but with data placement such that I don't end up having 2 replicas on 1 host... not sure if this is possible at all... Is that possible, or maybe I should go for RAID0 in each server (2 x 2TB = 4TB for osd.0), or maybe JBOD (1 volume, so 1 OSD per host)? Any suggestions about best practice? Regards, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- http://admintweets.com -- ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
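For completeness, the rule Mike refers to lives in the decompiled CRUSH map; a sketch of the edit round trip, with the default rule body shown as a comment:

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # the replicated rule in crush.txt should contain:
  #   step take default
  #   step chooseleaf firstn 0 type host
  #   step emit
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new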
Re: [ceph-users] RBD read-ahead not working in 0.87.1
Actually, good question - is RBD caching possible at all with Windows guests, if they are using the latest VirtIO drivers? Linux caching (write caching, writeback) is working fine with the newer virtio drivers... Thanks On 18 March 2015 at 10:39, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, I don't know how rbd read-ahead works, but with qemu virtio-scsi you can have read merge requests (for sequential reads), so it does bigger ops to the ceph cluster and improves throughput. virtio-blk merge requests will be supported in the coming qemu 2.3. (I'm not sure about virtio-win driver support for these features) - Mail original - De: Stephen Taylor stephen.tay...@storagecraft.com À: ceph-users ceph-us...@ceph.com Envoyé: Mardi 17 Mars 2015 21:22:59 Objet: Re: [ceph-users] RBD read-ahead not working in 0.87.1 Never mind. After digging through the history on Github it looks like the docs are wrong. The code for the RBD read-ahead feature appears in 0.88, not 0.86, which explains why I can't get it to work in 0.87.1. Steve From: Stephen Taylor Sent: Tuesday, March 17, 2015 11:32 AM To: 'ceph-us...@ceph.com' Subject: RBD read-ahead not working in 0.87.1 Hello, fellow Ceph users, I'm trying to utilize the RBD read-ahead settings with 0.87.1 (documented as new in 0.86) to convince the Windows boot loader to boot a Windows RBD in a reasonable amount of time using QEMU on Ubuntu 14.04.2. Below is the output of "ceph -w" during the Windows VM boot process. During the boot loader phase it's almost a perfect correspondence of kB/s rd and op/s, which I interpret as the boot loader doing LOTS of non-cached, 1kB reads. This is what the [client] section of my ceph.conf looks like:

[client]
rbd_cache = true
rbd_cache_size = 268435456
rbd_cache_max_dirty = 201326592
rbd_cache_target_dirty = 134217728
rbd_readahead_trigger_requests = 1
rbd_readahead_max_bytes = 524288
rbd_readahead_disable_after_bytes = 0
rbd_cache_writethrough_until_flush = true
admin_socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

Some of those values are not what I would use in production. This is just a test environment to try to prove that the RBD read-ahead caching works as I expect. Another interesting note is that "sudo ceph daemon /var/run/ceph/<admin socket> config show | grep rbd_readahead" yields nothing. The "config show" lists all of the config settings with the values I expect, but the rbd_readahead_* settings are absent. I have tried all kinds of different values in my ceph.conf file with the same result. The reason I'm convinced that read-ahead caching is my problem here is that I can mount my RBD via rbd-fuse and use the same QEMU command with the -drive parameter changed to use the rbd-fuse mount as a raw file instead of direct librbd, and the same Windows VM boots in a fraction of the time with much lower op/s numbers in the Ceph status output. I assume this is due to the Linux page cache helping me out with the rbd-fuse mount. Are the RBD read-ahead settings simply not working? That's what it looks like, but I figure I must be doing something wrong. Thanks for any help.
Steve Taylor

2015-03-17 09:50:19.209721 mon.0 [INF] pgmap v20871: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 3 B/s rd, 0 op/s
2015-03-17 09:50:24.199327 mon.0 [INF] pgmap v20872: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 7 B/s rd, 0 op/s
2015-03-17 10:02:03.471846 mon.0 [INF] pgmap v20873: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 1 B/s rd, 0 op/s
2015-03-17 10:02:05.739547 mon.0 [INF] pgmap v20874: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 754 B/s rd, 0 op/s
2015-03-17 10:02:08.008245 mon.0 [INF] pgmap v20875: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 144 kB/s rd, 156 op/s
2015-03-17 10:02:09.286862 mon.0 [INF] pgmap v20876: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 130 kB/s rd, 147 op/s
2015-03-17 10:02:10.543695 mon.0 [INF] pgmap v20877: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 614 kB/s rd, 614 op/s
2015-03-17 10:02:11.832906 mon.0 [INF] pgmap v20878: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 828 kB/s rd, 828 op/s
2015-03-17 10:02:12.998471 mon.0 [INF] pgmap v20879: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 387 kB/s rd, 387 op/s
2015-03-17 10:02:14.378462 mon.0 [INF] pgmap v20880: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 76889 B/s rd, 75 op/s
2015-03-17 10:02:15.656530 mon.0 [INF] pgmap v20881: 8192 pgs: 8192 active+clean; 47678 MB data, 163 GB used, 381 TB / 381 TB avail; 73924 B/s rd, 72 op/s
2015-03-17 10:02:16.935335 mon.0 [INF] pgmap
Re: [ceph-users] Doesn't Support Qcow2 Disk images
Ceph (RBD) stores images in RAW format - so all should be fine... the VM will be using that RAW format. On 12 March 2015 at 09:03, Azad Aliyar azad.ali...@sparksupport.com wrote: Community, please explain the 2nd warning on this page: http://ceph.com/docs/master/rbd/rbd-openstack/ Important: Ceph doesn't support QCOW2 for hosting a virtual machine disk. Thus if you want to boot virtual machines in Ceph (ephemeral backend or boot from volume), the Glance image format must be RAW. -- Warm Regards, Azad Aliyar Linux Server Engineer Email: azad.ali...@sparksupport.com | Skype: spark.azad http://www.sparksupport.com http://www.sparkmycloud.com https://www.facebook.com/sparksupport http://www.linkedin.com/company/244846 https://twitter.com/sparksupport 3rd Floor, Leela Infopark, Phase-2, Kakanad, Kochi-30, Kerala, India Phone: +91 484 6561696, Mobile: 91-8129270421. Confidentiality Notice: Information in this e-mail is proprietary to SparkSupport and is intended for use only by the addressed, and may contain information that is privileged, confidential or exempt from disclosure. If you are not the intended recipient, you are notified that any use of this information in any manner is strictly prohibited. Please delete this mail and notify us immediately at i...@sparksupport.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
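If an existing image is QCOW2, it can be converted while importing into RBD; a sketch with placeholder pool/image names, assuming qemu-img was built with rbd support:

  qemu-img convert -f qcow2 -O raw disk.qcow2 rbd:rbd/disk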
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Hi Robert, it seems I have not listened well to your advice - I set the osd to out instead of stopping it - and now, instead of some ~3% degraded objects, there are 0.000% degraded and around 6% misplaced - and rebalancing is happening again, but this is a small percentage... Do you know if, later, when I remove this OSD from the crush map, more data will be rebalanced or not (as per the CEPH official documentation) - since the already misplaced objects are getting distributed away to all other nodes? (after service ceph stop osd.0 there was 2.45% degraded data - but no backfilling was happening for some reason... it just stayed degraded... which is the reason I started the OSD back up and then set it to out...) Thanks On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote: Hi Robert, I already have this stuff set. Ceph is 0.87.0 now... Thanks, will schedule this for the weekend; 10G network and 36 OSDs - should move the data in less than 8h; per my last experience it was around 8h, but some 1G OSDs were included... Thx! On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote: You will most likely have a very high relocation percentage. Backfills are always more impactful on smaller clusters, but osd max backfills should be what you need to help reduce the impact. The default is 10, you will want to use 1. I didn't catch which version of Ceph you are running, but I think there was some priority work done in firefly to help make backfills lower priority. I think it has gotten better in later versions. On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic andrija.pa...@gmail.com wrote: Thank you Robert - I'm wondering, when I do remove a total of 7 OSDs from the crush map, whether that will cause more than 37% of the data to be moved (80% or whatever). I'm also wondering whether the throttling that I applied is fine or not - I will introduce the osd_recovery_delay_start 10 sec as Irek said. I'm just wondering how much the performance impact will be, because: - when stopping an OSD, the impact while backfilling was fine, more or less - I can live with this - when I removed an OSD from the crush map - for the first 1h or so the impact was tremendous, and later on during the recovery process the impact was much less but still noticeable... Thanks for the tip of course! Andrija On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote: I would be inclined to shut down both OSDs in a node, let the cluster recover. Once it is recovered, shut down the next two, let it recover. Repeat until all the OSDs are taken out of the cluster. Then I would set nobackfill and norecover. Then remove the hosts/disks from the CRUSH map, then unset nobackfill and norecover. That should give you a few small changes (when you shut down OSDs) and then one big one to get everything into its final place. If you are still adding new nodes, while nobackfill and norecover are set you can add them in, so that the one big relocation fills the new drives too. On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com wrote: Thx Irek. The number of replicas is 3. I have 3 servers with 2 OSDs on them on a 1G switch (1 OSD already decommissioned), which is further connected to a new 10G switch/network with 3 servers with 12 OSDs each. I'm decommissioning the old 3 nodes on the 1G network... So you suggest removing the whole node with 2 OSDs manually from the crush map? To my knowledge, ceph never places 2 replicas on 1 node; all 3 replicas were originally distributed over all 3 nodes.
So anyway, it should be safe to remove 2 OSDs at once together with the node itself... since the replica count is 3...? Thx again for your time On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote: Since you have only three nodes in the cluster, I recommend you add the new nodes to the cluster first, and then delete the old ones. 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com: What is your replication count? 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi Irek, yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data degraded and moved/recovered. When I afterwards removed it from the crush map (ceph osd crush rm id), that's when the thing with 37% happened. And thanks Irek for the help - could you kindly just let me know the preferred steps when removing a whole node? Do you mean I first stop all OSDs again, or just remove each OSD from the crush map, or perhaps just decompile the crush map, delete the node completely, compile it back in, and let it heal/recover? Do you think this would result in fewer misplaced objects and less data moved around? Sorry for bugging you, I really appreciate your help. Thanks On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote: A large percentage of the rebuild of the cluster map (But low
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Thanks a lot, Robert. I have actually already tried the following: a) set one OSD to out (6% of data misplaced, CEPH recovered fine), stop the OSD, remove the OSD from the crush map (again 36% of data misplaced !!!) - then inserted the OSD back into the crush map - and those 36% misplaced objects disappeared, of course - I've undone the crush removal... so damage undone - the OSD is just out and the cluster healthy again. b) set norecover and nobackfill, and then: - removed one OSD from crush (a running OSD, not the one from point a) - only 18% of data misplaced !!! (no recovery was happening though, because of norecover, nobackfill) - removed another OSD from the same node - a total of only 20% of objects misplaced (with 2 OSDs on the same node removed from the crush map). So these 2 OSDs were still running UP and IN, and I just removed them from the crush map, per the advice to avoid calculating the CRUSH map twice - from: http://image.slidesharecdn.com/scalingcephatcern-140311134847-phpapp01/95/scaling-ceph-at-cern-ceph-day-frankfurt-19-638.jpg?cb=1394564547 - And I added these 2 OSDs back to the crush map; this was just a test... So the algorithm is very funny in some aspects... but it's all pseudo-random stuff, so I kind of understand... I will share my findings during the rest of the OSD demotion, after I demote them... Thanks for your detailed inputs! Andrija On 5 March 2015 at 22:51, Robert LeBlanc rob...@leblancnet.us wrote: Setting an OSD out will start the rebalance with the degraded object count. The OSD is still alive and can participate in the relocation of the objects. This is preferable so that you don't happen to drop below min_size because a disk fails during the rebalance and I/O stops on the cluster. Because CRUSH is an algorithm, anything that changes its inputs will cause a change in the output (location). When you set/fail an OSD, it changes the CRUSH computation, but the host and the weight of the host are still in effect. When you remove the host or change the weight of the host (by removing a single OSD), it makes a change to the algorithm's inputs which will also cause some changes in how it computes the locations. Disclaimer - I have not tried this. It may be possible to minimize the data movement by doing the following: 1. Set norecover and nobackfill on the cluster 2. Set the OSDs to be removed to out 3. Adjust the weight of the hosts in the CRUSH map (if removing all OSDs for the host, set it to zero) 4. If you have new OSDs to add, add them into the cluster now 5. Once all OSD changes have been entered, unset norecover and nobackfill 6. This will migrate the data off the old OSDs and onto the new OSDs in one swoop 7. Once the data migration is complete, set norecover and nobackfill on the cluster again 8. Remove the old OSDs 9. Unset norecover and nobackfill The theory is that by setting the host weights to 0, removing the OSDs/hosts later should minimize the data movement afterwards, because the algorithm should have already dropped them out as candidates for placement. If this works right, then you basically queue up a bunch of small changes, do one data movement, always keep all copies of your objects online, and minimize the impact of the data movement by leveraging both your old and new hardware at the same time. If you try this, please report back on your experience. I might try it in my lab, but I'm really busy at the moment, so I don't know if I'll get to it real soon.
On Thu, Mar 5, 2015 at 12:53 PM, Andrija Panic andrija.pa...@gmail.com wrote: Hi Robert, it seems I have not listened well to your advice - I set the osd to out, instead of stopping it - and now instead of some ~ 3% of degraded objects, there is 0.000% degraded and around 6% misplaced - and rebalancing is happening again, but this is a small percentage... Do you know if later, when I remove this OSD from the crush map, no more data will be rebalanced (as per the official CEPH documentation) - since the already misplaced objects are getting distributed away to all the other nodes ? (after service ceph stop osd.0 there was 2.45% degraded data - but no backfilling was happening for some reason... it just stayed degraded... so this is the reason why I started the OSD back up, and then set it to out...) Thanks On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote: Hi Robert, I already have this stuff set. Ceph is 0.87.0 now... Thanks, will schedule this for the weekend; 10G network and 36 OSDs - should move the data in less than 8h per my last experience (that was around 8h, but some 1G OSDs were included)... Thx! On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote: You will most likely have a very high relocation percentage. Backfills are always more impactful on smaller clusters, but osd max backfills should be what you need to help reduce the impact. The default is 10, you will want to use 1. I didn't catch which version of Ceph you are running, but I think there was some priority work done in firefly to help make backfills lower priority.
[ceph-users] [rbd cache experience - given]
Hi there, just wanted to share some benchmark experience with RBD caching, which I have just (partially) implemented. These are not nicely formatted results, just raw numbers to understand the difference. *INFRASTRUCTURE: - 3 hosts with: 12 x 4TB drives, 6 journals on one SSD, 6 journals on a second SSD - 10G NICs on both Compute and Storage nodes - 10G dedicated replication/private CEPH network - Libvirt 1.2.3 - Qemu 0.12.1.2 - qemu drive-cache=none (set by CloudStack) *** CEPH SETTINGS (ceph.conf on KVM hosts): [client] rbd cache = true rbd cache size = 67108864 # (64MB) rbd cache max dirty = 50331648 # (48MB) rbd cache target dirty = 33554432 # (32MB) rbd cache max dirty age = 2 rbd cache writethrough until flush = true # For safety reasons *NUMBERS (CentOS 6.6 VM - FIO/sysbench tools): Random write, 16k IO size (yes I know, this is not "iops" because true IOPS is considered to be 4K size - but it is good enough for comparison): Random write, NO RBD cache: 170 IOPS Random write, RBD cache 64MB: 6500 IOPS. Sequential writes improved from ~ 40 MB/s to 800 MB/s Will check latency also... and let you know *** IMPORTANT: Make sure to have the latest VirtIO drivers, because: - CentOS 6.6, kernel 2.6.32.x - *RBD caching does not work* (the 2.6.32 VirtIO driver does not send flushes properly) - CentOS 6.6, kernel 3.10 from Elrepo - *RBD caching works fine* (the newer VirtIO drivers send flushes fine) I don't know about Windows yet, but will give you before and after numbers very soon. Best, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
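The exact fio invocation is not given above, but a 16k random-write test of this kind would look something like the following sketch (file name, size, iodepth and runtime are assumptions, not from the original post):

fio --name=randwrite16k --filename=/root/fio.test --size=2G \
    --rw=randwrite --bs=16k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based

With rbd cache writethrough until flush = true, the cache only switches to writeback after the guest issues its first flush - which is why the VirtIO driver version noted above matters so much for the results.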
Re: [ceph-users] Adding Monitor
Georgios, you need to be on the deployment server and cd into the folder that you originally used while deploying CEPH - in this folder you should already have ceph.conf, the client.admin keyring and other stuff - which is required to connect to the cluster... and to provision new MONs or OSDs, etc. The message: [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to create a new cluster... ... means (if I'm not mistaken) that you are running ceph-deploy NOT from the original folder... On 13 March 2015 at 23:03, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Not a firewall problem!! Firewall is disabled ... Loic I've tried mon create because of this: http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors Should I first create and then add?? What is the proper order??? Should I do it from the already existing monitor node or can I run it from the new one? If I try add from the beginning I am getting this: [ceph_deploy.conf][DEBUG ] found configuration file at: /home/.cephdeploy.conf [ceph_deploy.cli][INFO ] Invoked (1.5.22): /usr/bin/ceph-deploy mon add jin [ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to create a new cluster Regards, George Hi, I think ceph-deploy mon add (instead of create) is what you should be using. Cheers On 13/03/2015 22:25, Georgios Dimitrakakis wrote: On an already available cluster I've tried to add a new monitor! I have used ceph-deploy mon create {NODE} where {NODE}=the name of the node, and then I restarted the /etc/init.d/ceph service with success at the node, where it showed that the monitor is running, like: # /etc/init.d/ceph restart === mon.jin === === mon.jin === Stopping Ceph mon.jin on jin...kill 36388...done === mon.jin === Starting Ceph mon.jin on jin... Starting ceph-create-keys on jin... But checking the quorum it doesn't show the newly added monitor! Plus ceph mon stat gives out only 1 monitor!!! # ceph mon stat e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu Any ideas on what I have done wrong??? Regards, George ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
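In other words, something like this (the directory name is hypothetical - use whatever folder ceph-deploy new was originally run from; jin is the new monitor node as in the thread):

cd ~/my-cluster
ls ceph.conf ceph.mon.keyring    # both should already be here
ceph-deploy mon add jin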
Re: [ceph-users] Adding Monitor
Check the firewall - I hit this issue over and over again... On 13 March 2015 at 22:25, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: On an already available cluster I've tried to add a new monitor! I have used ceph-deploy mon create {NODE} where {NODE}=the name of the node, and then I restarted the /etc/init.d/ceph service with success at the node, where it showed that the monitor is running, like: # /etc/init.d/ceph restart === mon.jin === === mon.jin === Stopping Ceph mon.jin on jin...kill 36388...done === mon.jin === Starting Ceph mon.jin on jin... Starting ceph-create-keys on jin... But checking the quorum it doesn't show the newly added monitor! Plus ceph mon stat gives out only 1 monitor!!! # ceph mon stat e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 fu Any ideas on what I have done wrong??? Regards, George ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
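If the firewall does turn out to be the culprit, a minimal sketch for CentOS 6 iptables would be along these lines (6789 is the default mon port; 6800:7300 is the usual OSD/daemon port range - an assumption, adjust to your setup):

iptables -I INPUT -p tcp --dport 6789 -j ACCEPT
iptables -I INPUT -p tcp --dport 6800:7300 -j ACCEPT
service iptables save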
Re: [ceph-users] Public Network Meaning
The public network is client-to-OSD traffic - and if you have NOT explicitly defined a cluster network, then OSD-to-OSD replication also takes place over the same network. Otherwise, you can define a public and a cluster (private) network - so OSD replication will happen over dedicated NICs (the cluster network) and thus speed things up. If e.g. the replica count on a pool is 3, then each 1GB written by a client results in 3 x 1GB written across the OSDs (the primary copy plus two replicas)... - and the replica traffic ideally takes place over separate NICs to speed things up... On 14 March 2015 at 17:43, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Hi all!! What is the meaning of public_network in ceph.conf? Is it the network over which the OSDs talk and transfer data? I have two nodes with two IP addresses each. One for the internal network 192.168.1.0/24 and one external 15.12.6.* I see the following in my logs: osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 I've managed to add a second node and during rebalancing I see that data is transferred through the internal 192.* but the external link is also saturated! What is being transferred over that? Any help much appreciated! Regards, George ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [SPAM] Changing pg_num = RBD VM down !
Changing the PG number causes a LOT of data rebalancing (in my case it was 80%), which I learned the hard way... On 14 March 2015 at 18:49, Gabri Mate mailingl...@modernbiztonsag.org wrote: I had the same issue a few days ago. I was increasing the pg_num of one pool from 512 to 1024 and all the VMs in that pool stopped. I came to the conclusion that doubling the pg_num caused such a high load in ceph that the VMs were blocked. The next time I will test with small increments. On 12:38 Sat 14 Mar , Florent B wrote: Hi all, I have a Giant cluster in production. Today one of my RBD pools had the "too few pgs" warning. So I changed pg_num and pgp_num. And at this moment, some of the VMs stored in this pool stopped (on some hosts, not all - it depends, no logic to it). All was running fine for months... Have you ever seen this ? What could have caused this ? Thank you. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
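A gentler approach along the lines Gabri suggests - small increments with a health check in between - might look like this sketch (the pool name "rbd" and the step size are placeholders; note that pgp_num must be raised to follow pg_num):

ceph osd pool set rbd pg_num 576
ceph osd pool set rbd pgp_num 576
# wait for ceph -s to settle back to HEALTH_OK, then repeat:
ceph osd pool set rbd pg_num 640
ceph osd pool set rbd pgp_num 640
# ... continue in small steps until the target (e.g. 1024) is reached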
Re: [ceph-users] Public Network Meaning
This is how I did it - then restart each OSD one by one, but monitor with ceph -s; when ceph is healthy, proceed with the next OSD restart... Make sure the networks are fine on the physical nodes, i.e. that you can ping between them... [global] x x x x x x # ### REPLICATION NETWORK ON SEPARATE 10G NICs # replication network cluster network = 10.44.251.0/24 # public/client network public network = 10.44.253.0/16 # [mon.xx] mon_addr = x.x.x.x:6789 host = xx [mon.yy] mon_addr = x.x.x.x:6789 host = yy [mon.zz] mon_addr = x.x.x.x:6789 host = zz On 14 March 2015 at 19:14, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: I thought that it was easy but apparently it's not! I have the following in my conf file: mon_host = 192.168.1.100,192.168.1.101,192.168.1.102 public_network = 192.168.1.0/24 mon_initial_members = fu,rai,jin but still the 15.12.6.21 link is being saturated. Any ideas why??? Should I put the cluster network as well?? Should I put each OSD in the conf file??? Regards, George Andrija, thanks a lot for the useful info! I would also like to thank Kingrat at the IRC channel for his useful advice! I was under the wrong impression that "public" is the one used for RADOS. So I thought that public=external=internet and therefore I used that one in my conf. I understand now that I should have specified as CEPH's public network what I call internal, which is the one over which all machines talk directly to each other. Thank you all for the feedback! Regards, George The public network is client-to-OSD traffic - and if you have NOT explicitly defined a cluster network, then OSD-to-OSD replication also takes place over the same network. Otherwise, you can define a public and a cluster (private) network - so OSD replication will happen over dedicated NICs (the cluster network) and thus speed things up. If e.g. the replica count on a pool is 3, then each 1GB written by a client results in 3 x 1GB written across the OSDs (the primary copy plus two replicas)... - and the replica traffic ideally takes place over separate NICs to speed things up... On 14 March 2015 at 17:43, Georgios Dimitrakakis wrote: Hi all!! What is the meaning of public_network in ceph.conf? Is it the network over which the OSDs talk and transfer data? I have two nodes with two IP addresses each. One for the internal network 192.168.1.0/24 and one external 15.12.6.* I see the following in my logs: osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 I've managed to add a second node and during rebalancing I see that data is transferred through the internal 192.* but the external link is also saturated! What is being transferred over that? Any help much appreciated! Regards, George ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com
Re: [ceph-users] Public Network Meaning
Georgios, no need to put ANYTHING if you don't plan to split client-to-OSD vs OSD-to-OSD replication over 2 different network cards/networks - for performance reasons. If you have only 1 network - simply DON'T configure networks at all inside your ceph.conf file... If you have 2 x 1G cards in the servers, then you may use the first 1G for client traffic, and the second 1G for OSD-to-OSD replication... best On 14 March 2015 at 19:33, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Andrija, Thanks for your help! In my case I just have one 192.* network, so should I put that for both? Besides monitors, do I have to list the OSDs as well? Thanks again! Best, George This is how I did it - then restart each OSD one by one, but monitor with ceph -s; when ceph is healthy, proceed with the next OSD restart... Make sure the networks are fine on the physical nodes, i.e. that you can ping between them... [global] x x x x x x # ### REPLICATION NETWORK ON SEPARATE 10G NICs # replication network cluster network = 10.44.251.0/24 # public/client network public network = 10.44.253.0/16 # [mon.xx] mon_addr = x.x.x.x:6789 host = xx [mon.yy] mon_addr = x.x.x.x:6789 host = yy [mon.zz] mon_addr = x.x.x.x:6789 host = zz On 14 March 2015 at 19:14, Georgios Dimitrakakis wrote: I thought that it was easy but apparently it's not! I have the following in my conf file: mon_host = 192.168.1.100,192.168.1.101,192.168.1.102 public_network = 192.168.1.0/24 mon_initial_members = fu,rai,jin but still the 15.12.6.21 link is being saturated. Any ideas why??? Should I put the cluster network as well?? Should I put each OSD in the conf file??? Regards, George Andrija, thanks a lot for the useful info! I would also like to thank Kingrat at the IRC channel for his useful advice! I was under the wrong impression that "public" is the one used for RADOS. So I thought that public=external=internet and therefore I used that one in my conf. I understand now that I should have specified as CEPH's public network what I call internal, which is the one over which all machines talk directly to each other. Thank you all for the feedback! Regards, George The public network is client-to-OSD traffic - and if you have NOT explicitly defined a cluster network, then OSD-to-OSD replication also takes place over the same network. Otherwise, you can define a public and a cluster (private) network - so OSD replication will happen over dedicated NICs (the cluster network) and thus speed things up. If e.g. the replica count on a pool is 3, then each 1GB written by a client results in 3 x 1GB written across the OSDs (the primary copy plus two replicas)... - and the replica traffic ideally takes place over separate NICs to speed things up... On 14 March 2015 at 17:43, Georgios Dimitrakakis wrote: Hi all!! What is the meaning of public_network in ceph.conf? Is it the network over which the OSDs talk and transfer data? I have two nodes with two IP addresses each. One for the internal network 192.168.1.0/24 and one external 15.12.6.* I see the following in my logs: osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354
Re: [ceph-users] Public Network Meaning
In that case - yes... put everything on 1 card - or, if both cards are 1G (or the same speed, for that matter...), then you might want to block all external traffic except e.g. SSH and web, but allow ALL traffic between all CEPH OSDs... so you can still use that network for public/client traffic - not sure how you connect to/use CEPH - from the internet ??? or do you have some more VMs/servers/clients on the 192.* network... ? On 14 March 2015 at 19:38, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Andrija, I have two cards! One on 15.12.* and one on 192.* Obviously the 15.12.* is the external network (real public IP address, e.g. used to access the node via SSH). That's why I am saying that my public network for CEPH is the 192.* one - and should I use the cluster network for that as well? Best, George Georgios, no need to put ANYTHING if you don't plan to split client-to-OSD vs OSD-to-OSD replication over 2 different network cards/networks - for performance reasons. If you have only 1 network - simply DON'T configure networks at all inside your ceph.conf file... If you have 2 x 1G cards in the servers, then you may use the first 1G for client traffic, and the second 1G for OSD-to-OSD replication... best On 14 March 2015 at 19:33, Georgios Dimitrakakis wrote: Andrija, Thanks for your help! In my case I just have one 192.* network, so should I put that for both? Besides monitors, do I have to list the OSDs as well? Thanks again! Best, George This is how I did it - then restart each OSD one by one, but monitor with ceph -s; when ceph is healthy, proceed with the next OSD restart... Make sure the networks are fine on the physical nodes, i.e. that you can ping between them... [global] x x x x x x # ### REPLICATION NETWORK ON SEPARATE 10G NICs # replication network cluster network = 10.44.251.0/24 # public/client network public network = 10.44.253.0/16 # [mon.xx] mon_addr = x.x.x.x:6789 host = xx [mon.yy] mon_addr = x.x.x.x:6789 host = yy [mon.zz] mon_addr = x.x.x.x:6789 host = zz On 14 March 2015 at 19:14, Georgios Dimitrakakis wrote: I thought that it was easy but apparently it's not! I have the following in my conf file: mon_host = 192.168.1.100,192.168.1.101,192.168.1.102 public_network = 192.168.1.0/24 mon_initial_members = fu,rai,jin but still the 15.12.6.21 link is being saturated. Any ideas why??? Should I put the cluster network as well?? Should I put each OSD in the conf file??? Regards, George Andrija, thanks a lot for the useful info! I would also like to thank Kingrat at the IRC channel for his useful advice! I was under the wrong impression that "public" is the one used for RADOS. So I thought that public=external=internet and therefore I used that one in my conf. I understand now that I should have specified as CEPH's public network what I call internal, which is the one over which all machines talk directly to each other. Thank you all for the feedback! Regards, George The public network is client-to-OSD traffic - and if you have NOT explicitly defined a cluster network, then OSD-to-OSD replication also takes place over the same network. Otherwise, you can define a public and a cluster (private) network - so OSD replication will happen over dedicated NICs (the cluster network) and thus speed things up. If e.g. the replica count on a pool is 3, then each 1GB written by a client results in 3 x 1GB written across the OSDs (the primary copy plus two replicas)... - and the replica traffic ideally takes place over separate NICs to speed things up... On 14 March 2015 at 17:43, Georgios Dimitrakakis wrote: Hi all!! What is the meaning of public_network in ceph.conf? Is it the network over which the OSDs talk and transfer data? I have two nodes with two IP addresses each. One for the internal network 192.168.1.0/24 and one external 15.12.6.* I see the following in my logs: osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463
Re: [ceph-users] Turning on SCRUB back on - any suggestion ?
Thanks Wido - I will do that. On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com wrote: On 13-03-15 09:42, Andrija Panic wrote: Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we are upgraded with hardware/networking/SSDs and I would like to activate - or unset these flags. Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I was wondering what is the best way to unset flags - meaning if I just unset the flags, should I expect that the SCRUB will start all of the sudden on all disks - or is there way to let the SCRUB do drives one by one... So, I *think* that unsetting these flags will trigger a big scrub, since all PGs have a very old last_scrub_stamp and last_deepscrub_stamp You can verify this with: $ ceph pg pgid query A solution would be to scrub each PG manually first in a timely fashion. $ ceph pg scrub pgid That way you set the timestamps and slowly scrub each PG. When that's done, unset the flags. Wido In other words - should I expect BIG performance impact ornot ? Any experience is very appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Turning on SCRUB back on - any suggestion ?
Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we have upgraded the hardware/networking/SSDs and I would like to activate - or unset - these flags. Since I now have 3 servers with 12 OSDs each (SSD-based journals), I was wondering what is the best way to unset the flags - meaning, if I just unset the flags, should I expect that the SCRUB will start all of a sudden on all disks, or is there a way to let the SCRUB do the drives one by one... In other words - should I expect a BIG performance impact or not ? Any experience is much appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Turning on SCRUB back on - any suggestion ?
Nice - so I just realized I need to manually scrub 1216 placement groups :) On 13 March 2015 at 10:16, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Wido - I will do that. On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com wrote: On 13-03-15 09:42, Andrija Panic wrote: Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we have upgraded the hardware/networking/SSDs and I would like to activate - or unset - these flags. Since I now have 3 servers with 12 OSDs each (SSD-based journals), I was wondering what is the best way to unset the flags - meaning, if I just unset the flags, should I expect that the SCRUB will start all of a sudden on all disks, or is there a way to let the SCRUB do the drives one by one... So, I *think* that unsetting these flags will trigger a big scrub, since all PGs have a very old last_scrub_stamp and last_deepscrub_stamp. You can verify this with: $ ceph pg pgid query A solution would be to scrub each PG manually first in a timely fashion. $ ceph pg scrub pgid That way you set the timestamps and slowly scrub each PG. When that's done, unset the flags. Wido In other words - should I expect a BIG performance impact or not ? Any experience is much appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Turning on SCRUB back on - any suggestion ?
Interesting, thx for that Henrik. BTW, my placement groups hold around 1800 objects each (ceph pg dump) - meaning a max of 7GB of data at the moment; a regular scrub just took 5-10 sec to finish. A deep scrub would, I guess, take some minutes for sure. What about deep scrub - the timestamp is still from some months ago, but the regular scrub is fine now with a fresh timestamp...? I don't see separate max deep scrub settings - or are these settings applied in general to both kinds of scrubs ? Thanks On 13 March 2015 at 12:22, Henrik Korkuc li...@kirneh.eu wrote: I think that there will be no big scrub, as there are limits on the maximum scrubs at a time. http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing If we take osd max scrubs, which is 1 by default, then you will not get more than 1 scrub per OSD. I couldn't quickly find out if there are cluster-wide limits. On 3/13/15 10:46, Wido den Hollander wrote: On 13-03-15 09:42, Andrija Panic wrote: Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we have upgraded the hardware/networking/SSDs and I would like to activate - or unset - these flags. Since I now have 3 servers with 12 OSDs each (SSD-based journals), I was wondering what is the best way to unset the flags - meaning, if I just unset the flags, should I expect that the SCRUB will start all of a sudden on all disks, or is there a way to let the SCRUB do the drives one by one... So, I *think* that unsetting these flags will trigger a big scrub, since all PGs have a very old last_scrub_stamp and last_deepscrub_stamp. You can verify this with: $ ceph pg pgid query A solution would be to scrub each PG manually first in a timely fashion. $ ceph pg scrub pgid That way you set the timestamps and slowly scrub each PG. When that's done, unset the flags. Wido In other words - should I expect a BIG performance impact or not ? Any experience is much appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Turning on SCRUB back on - any suggestion ?
Hm, nice. Thx guys On 13 March 2015 at 12:33, Henrik Korkuc li...@kirneh.eu wrote: I think the settings apply to both kinds of scrubs On 3/13/15 13:31, Andrija Panic wrote: Interesting, thx for that Henrik. BTW, my placement groups hold around 1800 objects each (ceph pg dump) - meaning a max of 7GB of data at the moment; a regular scrub just took 5-10 sec to finish. A deep scrub would, I guess, take some minutes for sure. What about deep scrub - the timestamp is still from some months ago, but the regular scrub is fine now with a fresh timestamp...? I don't see separate max deep scrub settings - or are these settings applied in general to both kinds of scrubs ? Thanks On 13 March 2015 at 12:22, Henrik Korkuc li...@kirneh.eu wrote: I think that there will be no big scrub, as there are limits on the maximum scrubs at a time. http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing If we take osd max scrubs, which is 1 by default, then you will not get more than 1 scrub per OSD. I couldn't quickly find out if there are cluster-wide limits. On 3/13/15 10:46, Wido den Hollander wrote: On 13-03-15 09:42, Andrija Panic wrote: Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we have upgraded the hardware/networking/SSDs and I would like to activate - or unset - these flags. Since I now have 3 servers with 12 OSDs each (SSD-based journals), I was wondering what is the best way to unset the flags - meaning, if I just unset the flags, should I expect that the SCRUB will start all of a sudden on all disks, or is there a way to let the SCRUB do the drives one by one... So, I *think* that unsetting these flags will trigger a big scrub, since all PGs have a very old last_scrub_stamp and last_deepscrub_stamp. You can verify this with: $ ceph pg pgid query A solution would be to scrub each PG manually first in a timely fashion. $ ceph pg scrub pgid That way you set the timestamps and slowly scrub each PG. When that's done, unset the flags. Wido In other words - should I expect a BIG performance impact or not ? Any experience is much appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
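For reference, the scrub throttles mentioned above live in the [osd] section of ceph.conf; a minimal sketch, with values (in seconds) shown only as an illustration of the usual defaults:

[osd]
osd max scrubs = 1                # max concurrent scrub ops per OSD
osd scrub min interval = 86400    # don't scrub a PG more often than daily
osd scrub max interval = 604800   # force a scrub at least weekly
osd deep scrub interval = 604800  # deep scrub weekly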
Re: [ceph-users] Turning on SCRUB back on - any suggestion ?
Will do, of course :) Thx Wido for the quick help, as always ! On 13 March 2015 at 12:04, Wido den Hollander w...@42on.com wrote: On 13-03-15 12:00, Andrija Panic wrote: Nice - so I just realized I need to manually scrub 1216 placement groups :) With "manually" I meant using a script. Loop through 'ceph pg dump', get the PGid, issue a scrub, sleep for X seconds and issue the next scrub. Wido On 13 March 2015 at 10:16, Andrija Panic andrija.pa...@gmail.com mailto:andrija.pa...@gmail.com wrote: Thanks Wido - I will do that. On 13 March 2015 at 09:46, Wido den Hollander w...@42on.com mailto:w...@42on.com wrote: On 13-03-15 09:42, Andrija Panic wrote: Hi all, I have set nodeep-scrub and noscrub while I had small/slow hardware for the cluster. It has been off for a while now. Now we have upgraded the hardware/networking/SSDs and I would like to activate - or unset - these flags. Since I now have 3 servers with 12 OSDs each (SSD-based journals), I was wondering what is the best way to unset the flags - meaning, if I just unset the flags, should I expect that the SCRUB will start all of a sudden on all disks, or is there a way to let the SCRUB do the drives one by one... So, I *think* that unsetting these flags will trigger a big scrub, since all PGs have a very old last_scrub_stamp and last_deepscrub_stamp. You can verify this with: $ ceph pg pgid query A solution would be to scrub each PG manually first in a timely fashion. $ ceph pg scrub pgid That way you set the timestamps and slowly scrub each PG. When that's done, unset the flags. Wido In other words - should I expect a BIG performance impact or not ? Any experience is much appreciated... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- Andrija Panić -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
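Such a script could be as simple as the following sketch (the awk pattern is an assumption that matches pgids like 3.7f at the start of the pg lines in ceph pg dump output; the sleep interval is arbitrary):

for pg in $(ceph pg dump 2>/dev/null | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}'); do
    ceph pg scrub "$pg"
    sleep 60
done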
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
Thanks Irek. Does this mean that after peering for each PG, there will be a delay of 10 sec, meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then the recovery takes place for that PG, then another 10 sec the cluster is fine, and then it is stressed again ? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process) Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi. Use the value osd_recovery_delay_start, for example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: HI Guys, yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this happened when I removed it from the Crush Map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and the VMs were completely unusable; then, for the last 4h of the recovery, this recovery rate went down to, say, 100-200 MB/s, and during this the VM performance was still pretty impacted, but at least I could work more or less. So my question: is this behaviour expected, and is the throttling here working as expected? During the first 1h there was almost no throttling applied, judging by the recovery rate of 1500 MB/s and the impact on the VMs, while the last 4h seemed pretty fine (although still a lot of impact in general). I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, of which 6 journals on one SSD, 6 journals on another SSD) - I have 3 of these hosts. Any thoughts are welcome. -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best regards, Irek Fasikhov Mob.: +79229045757 -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
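Following Irek's example, the setting can be injected on the fly and then verified via the admin socket (osd.0 here is just an example id):

ceph tell osd.* injectargs '--osd_recovery_delay_start 10'
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_recovery_delay_start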
Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?
Thx Irek. The number of replicas is 3. I have 3 servers with 2 OSDs on them on a 1G switch (1 OSD already decommissioned), which is further connected to a new 10G switch/network with 3 servers on it with 12 OSDs each. I'm decommissioning the old 3 nodes on the 1G network... So you suggest removing the whole node with 2 OSDs manually from the crush map? To my knowledge, ceph never places 2 replicas on 1 node; all 3 replicas were originally distributed over all 3 nodes. So anyway, it could be safe to remove 2 OSDs at once, together with the node itself... since the replica count is 3... ? Thx again for your time On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote: Since you have only three nodes in the cluster, I recommend you add the new nodes to the cluster first, and then delete the old ones. 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com: What replica count do you have? 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Hi Irek, yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data degraded and moved/recovered. When I afterwards removed it from the Crush map (ceph osd crush rm id), that's when the stuff with 37% happened. And thanks Irek for the help - could you kindly just let me know the preferred steps when removing a whole node? Do you mean I first stop all OSDs again, or just remove each OSD from the crush map, or perhaps just decompile the crush map, delete the node completely, compile it back in, and let it heal/recover ? Do you think this would result in less data being misplaced and moved around ? Sorry for bugging you, I really appreciate your help. Thanks On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote: A large percentage of the cluster map is rebuilt (but with a low percentage of degradation). If you had not done ceph osd crush rm id, the percentage would be low. In your case, the correct option is to remove the entire node, rather than each disk individually. 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the crush map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds ? I understand this is because of the object placement algorithm of CEPH, but still, 37% of objects misplaced just by removing 1 OSD out of 44 from the crush map makes me wonder why this percentage is so large ? Seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I can potentially go through 7 x the same number of misplaced objects...? Any thoughts ? Thanks On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote: Thanks Irek. Does this mean that after peering for each PG, there will be a delay of 10 sec, meaning that every once in a while I will have 10 sec of the cluster NOT being stressed/overloaded, then the recovery takes place for that PG, then another 10 sec the cluster is fine, and then it is stressed again ? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process) Thanks, Andrija On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote: Hi. Use the value osd_recovery_delay_start, for example: [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start osd_recovery_delay_start: 10 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com: HI Guys, yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this happened when I removed it from the Crush Map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and the VMs were completely unusable; then, for the last 4h of the recovery, this recovery rate went down to, say, 100-200 MB/s, and during this the VM performance was still pretty impacted, but at least I could work more or less. So my question: is this behaviour expected, and is the throttling here working as expected? During the first 1h there was almost no throttling applied, judging by the recovery rate of 1500 MB/s and the impact on the VMs, while the last 4h seemed pretty fine (although still a lot of impact in general). I changed the throttling on the fly with: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' My journals are on SSDs (12 OSDs per server, of which 6 journals on one SSD, 6 journals on another SSD) - I have 3 of these hosts. Any thoughts are welcome. -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map
Hi people, I had one OSD crash, so rebalancing happened - all fine (some 3% of the data was moved around and rebalanced), and my previous recovery/backfill throttling was applied fine, so we didn't have an unusable cluster. Then I used the procedure to remove this crashed OSD completely from CEPH ( http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd ) and when I issued the "ceph osd crush remove osd.0" command, all of a sudden CEPH started to rebalance once again, this time with 37% of the objects misplaced - and based on the experience inside the VMs and the recovery rate in MB/s, I can tell that my throttling of backfilling and recovery is not being taken into consideration. Why are 37% of all objects again being moved around? Any help, hint, or explanation greatly appreciated. This is CEPH 0.87.0 from the CEPH repo of course; 42 OSDs total after the crash etc. The throttling that I have applied from before is the following: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' Please advise... Thanks -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
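For reference, the removal procedure from that documentation page boils down to the following sequence (osd.0 as the example id):

ceph osd out 0
/etc/init.d/ceph stop osd.0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

It is the "ceph osd crush remove" step that changes the CRUSH map and triggers the large rebalance discussed below.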
Re: [ceph-users] [URGENT-HELP] - Ceph rebalancing again after taking OSD out of CRUSH map
OK thx Wido. Then can we at least update the documentation, so that it says MAJOR data rebalancing will happen AGAIN - and not 3%, but 37% in my case. Because I would never run this during work hours, while clients are hammering the VMs... This reminds me of those tunables changes a couple of months ago, when my cluster completely collapsed during data rebalancing... I don't see any option to contribute to the documentation ? Best On 2 March 2015 at 16:07, Wido den Hollander w...@42on.com wrote: On 03/02/2015 03:56 PM, Andrija Panic wrote: Hi people, I had one OSD crash, so rebalancing happened - all fine (some 3% of the data was moved around and rebalanced), and my previous recovery/backfill throttling was applied fine, so we didn't have an unusable cluster. Then I used the procedure to remove this crashed OSD completely from CEPH ( http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd ) and when I issued the "ceph osd crush remove osd.0" command, all of a sudden CEPH started to rebalance once again, this time with 37% of the objects misplaced - and based on the experience inside the VMs and the recovery rate in MB/s, I can tell that my throttling of backfilling and recovery is not being taken into consideration. Why are 37% of all objects again being moved around? Any help, hint, or explanation greatly appreciated. This has been discussed a couple of times on the list. If you remove an item from the CRUSHMap, although it has a weight of 0, a rebalance still happens since the CRUSHMap changes. This is CEPH 0.87.0 from the CEPH repo of course; 42 OSDs total after the crash etc. The throttling that I have applied from before is the following: ceph tell osd.* injectargs '--osd_recovery_max_active 1' ceph tell osd.* injectargs '--osd_recovery_op_priority 1' ceph tell osd.* injectargs '--osd_max_backfills 1' Please advise... Thanks ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Implement replication network with live cluster
That was my thought, yes - I found this blog that confirms what you are saying, I guess: http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/ I will do that... Thx I guess it doesn't matter, since my Crush Map will still reference old OSDs that are stopped (and the cluster resynced after that) ? Thx again for the help On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote: If I remember right, someone has done this on a live cluster without any issues. I seem to remember that it had a fallback mechanism if the OSDs couldn't be reached on the cluster network to contact them on the public network. You could test it pretty easily without much impact. Take one OSD that has both networks, configure it and restart the process. If all the nodes (specifically the old ones with only one network) are able to connect to it, then you are good to go by restarting one OSD at a time. On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi, I'm having a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated Replication/Private and Public network. I've read the manual and know how to do it in ceph.conf, but I'm wondering - since this is an already running cluster - what should I do after I change ceph.conf on all nodes ? Restarting OSDs one by one, or... ? Is there any downtime expected ? - for the replication network to actually be implemented completely. Another related question: Also, I'm demoting some old OSDs on old servers; I will have them all stopped, but would like to implement the replication network before actually removing the old OSDs from the crush map - since a lot of data will be moved around. My old nodes/OSDs (that will be stopped before I implement the replication network) do NOT have a dedicated NIC for the replication network, in contrast to the new nodes/OSDs. So there will still be references to these old OSDs in the crush map. Will this be a problem - me changing/implementing a replication network that WILL work on the new nodes/OSDs, but not on the old ones since they don't have a dedicated NIC ? I guess not, since the old OSDs are stopped anyway, but I would like an opinion. Or perhaps I might remove the OSDs from the crush map with prior setting of nobackfill and norecover (so no rebalancing happens) and then implement the replication network? Sorry for the long post, but... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Implement replication network with live cluster
Thx Wido, I needed this confirmation - thanks! On 4 March 2015 at 17:49, Wido den Hollander w...@42on.com wrote: On 03/04/2015 05:44 PM, Robert LeBlanc wrote: If I remember right, someone has done this on a live cluster without any issues. I seem to remember that it had a fallback mechanism if the OSDs couldn't be reached on the cluster network to contact them on the public network. You could test it pretty easily without much impact. Take one OSD that has both networks, configure it and restart the process. If all the nodes (specifically the old ones with only one network) are able to connect to it, then you are good to go by restarting one OSD at a time. In the OSDMap each OSD has a public and a cluster network address. If the cluster network address is not set, replication to that OSD will be done over the public network. So you can push a new configuration to all OSDs and restart them one by one. Make sure the network is of course up and running, and it should work. On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi, I'm having a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated Replication/Private and Public network. I've read the manual and know how to do it in ceph.conf, but I'm wondering - since this is an already running cluster - what should I do after I change ceph.conf on all nodes ? Restarting OSDs one by one, or... ? Is there any downtime expected ? - for the replication network to actually be implemented completely. Another related question: Also, I'm demoting some old OSDs on old servers; I will have them all stopped, but would like to implement the replication network before actually removing the old OSDs from the crush map - since a lot of data will be moved around. My old nodes/OSDs (that will be stopped before I implement the replication network) do NOT have a dedicated NIC for the replication network, in contrast to the new nodes/OSDs. So there will still be references to these old OSDs in the crush map. Will this be a problem - me changing/implementing a replication network that WILL work on the new nodes/OSDs, but not on the old ones since they don't have a dedicated NIC ? I guess not, since the old OSDs are stopped anyway, but I would like an opinion. Or perhaps I might remove the OSDs from the crush map with prior setting of nobackfill and norecover (so no rebalancing happens) and then implement the replication network? Sorry for the long post, but... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Ceph trainer and consultant Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
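Putting Wido's advice together, the rollout might look like this sketch on sysvinit (OSD ids are placeholders; the loop waits for HEALTH_OK before touching the next OSD):

# after pushing the new ceph.conf (with cluster network set) to all nodes:
for id in 12 13 14; do
    /etc/init.d/ceph restart osd.$id
    until ceph health | grep -q HEALTH_OK; do sleep 10; done
done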
Re: [ceph-users] Implement replication network with live cluster
Thx again - I really appreciate the help guys ! On 4 March 2015 at 17:51, Robert LeBlanc rob...@leblancnet.us wrote: If the data has been replicated to the new OSDs, it will be able to function properly even with them down or only on the public network. On Wed, Mar 4, 2015 at 9:49 AM, Andrija Panic andrija.pa...@gmail.com wrote: I guess it doesn't matter, since my Crush Map will still reference old OSDs that are stopped (and the cluster resynced after that) ? I wanted to say: it doesn't matter (I guess?) that my Crush map is still referencing old OSD nodes that are already stopped. Tired, sorry... On 4 March 2015 at 17:48, Andrija Panic andrija.pa...@gmail.com wrote: That was my thought, yes - I found this blog that confirms what you are saying, I guess: http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/ I will do that... Thx I guess it doesn't matter, since my Crush Map will still reference old OSDs that are stopped (and the cluster resynced after that) ? Thx again for the help On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote: If I remember right, someone has done this on a live cluster without any issues. I seem to remember that it had a fallback mechanism if the OSDs couldn't be reached on the cluster network to contact them on the public network. You could test it pretty easily without much impact. Take one OSD that has both networks, configure it and restart the process. If all the nodes (specifically the old ones with only one network) are able to connect to it, then you are good to go by restarting one OSD at a time. On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi, I'm having a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated Replication/Private and Public network. I've read the manual and know how to do it in ceph.conf, but I'm wondering - since this is an already running cluster - what should I do after I change ceph.conf on all nodes ? Restarting OSDs one by one, or... ? Is there any downtime expected ? - for the replication network to actually be implemented completely. Another related question: Also, I'm demoting some old OSDs on old servers; I will have them all stopped, but would like to implement the replication network before actually removing the old OSDs from the crush map - since a lot of data will be moved around. My old nodes/OSDs (that will be stopped before I implement the replication network) do NOT have a dedicated NIC for the replication network, in contrast to the new nodes/OSDs. So there will still be references to these old OSDs in the crush map. Will this be a problem - me changing/implementing a replication network that WILL work on the new nodes/OSDs, but not on the old ones since they don't have a dedicated NIC ? I guess not, since the old OSDs are stopped anyway, but I would like an opinion. Or perhaps I might remove the OSDs from the crush map with prior setting of nobackfill and norecover (so no rebalancing happens) and then implement the replication network? Sorry for the long post, but... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- Andrija Panić -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Implement replication network with live cluster
I guess it doesn't matter, since my Crush Map will still reference old OSDs that are stopped (and the cluster resynced after that) ? I wanted to say: it doesn't matter (I guess?) that my Crush map is still referencing old OSD nodes that are already stopped. Tired, sorry... On 4 March 2015 at 17:48, Andrija Panic andrija.pa...@gmail.com wrote: That was my thought, yes - I found this blog that confirms what you are saying, I guess: http://www.sebastien-han.fr/blog/2012/07/29/tip-ceph-public-slash-private-network-configuration/ I will do that... Thx I guess it doesn't matter, since my Crush Map will still reference old OSDs that are stopped (and the cluster resynced after that) ? Thx again for the help On 4 March 2015 at 17:44, Robert LeBlanc rob...@leblancnet.us wrote: If I remember right, someone has done this on a live cluster without any issues. I seem to remember that it had a fallback mechanism if the OSDs couldn't be reached on the cluster network to contact them on the public network. You could test it pretty easily without much impact. Take one OSD that has both networks, configure it and restart the process. If all the nodes (specifically the old ones with only one network) are able to connect to it, then you are good to go by restarting one OSD at a time. On Wed, Mar 4, 2015 at 4:17 AM, Andrija Panic andrija.pa...@gmail.com wrote: Hi, I'm having a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated Replication/Private and Public network. I've read the manual and know how to do it in ceph.conf, but I'm wondering - since this is an already running cluster - what should I do after I change ceph.conf on all nodes ? Restarting OSDs one by one, or... ? Is there any downtime expected ? - for the replication network to actually be implemented completely. Another related question: Also, I'm demoting some old OSDs on old servers; I will have them all stopped, but would like to implement the replication network before actually removing the old OSDs from the crush map - since a lot of data will be moved around. My old nodes/OSDs (that will be stopped before I implement the replication network) do NOT have a dedicated NIC for the replication network, in contrast to the new nodes/OSDs. So there will still be references to these old OSDs in the crush map. Will this be a problem - me changing/implementing a replication network that WILL work on the new nodes/OSDs, but not on the old ones since they don't have a dedicated NIC ? I guess not, since the old OSDs are stopped anyway, but I would like an opinion. Or perhaps I might remove the OSDs from the crush map with prior setting of nobackfill and norecover (so no rebalancing happens) and then implement the replication network? Sorry for the long post, but... Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Andrija Panić -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Hi Robert, I already have this stuff set. Ceph is 0.87.0 now... Thanks, will schedule this for the weekend. With a 10G network and 36 OSDs it should move the data in less than 8h, per my last experience - that was around 8h, but some 1G OSDs were included... Thx!

On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:
You will most likely have a very high relocation percentage. Backfills are always more impactful on smaller clusters, but osd max backfills should be what you need to help reduce the impact. The default is 10; you will want to use 1. I didn't catch which version of Ceph you are running, but I think there was some priority work done in firefly to help make backfills lower priority. I think it has gotten better in later versions.
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Thank you Robert - I'm wondering, when I do remove the total of 7 OSDs from the CRUSH map, whether that will cause more than 37% of data to be moved (80% or whatever). I'm also wondering whether the throttling I applied is fine or not - I will introduce the osd_recovery_delay_start 10sec as Irek said. I'm just wondering how big the performance impact will be, because: when stopping an OSD, the impact while backfilling was fine, more or less - I can live with this; when I removed an OSD from the CRUSH map, the impact was tremendous for the first 1h or so, and later on during the recovery process the impact was much less, but still noticeable... Thanks for the tip of course! Andrija

On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:
I would be inclined to shut down both OSDs in a node, let the cluster recover. Once it is recovered, shut down the next two, let it recover. Repeat until all the OSDs are taken out of the cluster. Then I would set nobackfill and norecover. Then remove the hosts/disks from the CRUSH map, then unset nobackfill and norecover. That should give you a few small changes (when you shut down OSDs) and then one big one to get everything into its final place. If you are still adding new nodes, while nobackfill and norecover are set, you can add them in so that the one big relocate fills the new drives too.

On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com wrote:
Thx Irek. Number of replicas is 3. I have 3 servers with 2 OSDs on them on a 1G switch (1 OSD already decommissioned), which is further connected to a new 10G switch/network with 3 servers on it with 12 OSDs each. I'm decommissioning the old 3 nodes on the 1G network... So you suggest removing the whole node with 2 OSDs manually from the CRUSH map? Per my knowledge, Ceph never places 2 replicas on 1 node; all 3 replicas were originally distributed over all 3 nodes. So it should anyway be safe to remove 2 OSDs at once together with the node itself... since the replica count is 3...? Thx again for your time.

On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:
You only have three nodes in the cluster. I recommend you add the new nodes to the cluster first, and then delete the old ones.

2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:
What is your replication count?
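For concreteness, here is a minimal sketch of the sequence Robert describes above - set the flags, drop the already-stopped node, then unset; the OSD IDs and the hostname are placeholders, not taken from the thread:

ceph osd set nobackfill
ceph osd set norecover
# with the node's OSDs already stopped, remove each one and then the empty host bucket
for id in 6 7; do ceph osd crush remove osd.$id; ceph auth del osd.$id; ceph osd rm $id; done
ceph osd crush remove <hostname>
ceph osd unset nobackfill
ceph osd unset norecover

With the flags set, the CRUSH removals are batched into one rebalance that only starts once the flags are unset, instead of one rebalance per removed OSD.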
[ceph-users] Implement replication network with live cluster
Hi, I have a live cluster with only a public network (so no explicit network configuration in the ceph.conf file). I'm wondering what the procedure is to implement a dedicated replication/private network alongside the public one. I've read the manual and know how to do it in ceph.conf, but since this is an already running cluster - what should I do after I change ceph.conf on all nodes? Restart the OSDs one by one, or...? Is there any downtime expected before the replication network is actually implemented completely?

Another related question: I'm also demoting some old OSDs on old servers. I will have them all stopped, but I would like to implement the replication network before actually removing the old OSDs from the CRUSH map, since a lot of data will be moved around. My old nodes/OSDs (which will be stopped before I implement the replication network) do NOT have a dedicated NIC for the replication network, in contrast to the new nodes/OSDs, and there will still be references to these old OSDs in the CRUSH map. Will it be a problem that the replication network WILL work on the new nodes/OSDs but not on the old ones, since they don't have the dedicated NIC? I guess not, since the old OSDs are stopped anyway, but I would like an opinion. Or perhaps I could remove the OSDs from the CRUSH map with nobackfill and norecover set beforehand (so no rebalancing happens) and then implement the replication network?

Sorry for the long post, but... Thanks,

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
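For reference, the split being asked about is driven by two ceph.conf keys; a minimal sketch with made-up example subnets (not the poster's actual addressing):

[global]
public network = 10.44.0.0/16
cluster network = 192.168.44.0/24

OSDs only bind to the cluster network when (re)started, so after pushing the change to all nodes, a rolling restart of one OSD at a time - as Robert suggests earlier in this digest - rolls the change in without downtime.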
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Another question - I mentioned here 37% of objects being moved around - these are MISPLACED objects (degraded objects were 0.001%) after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so). Can anybody confirm this is normal behaviour - and are there any workarounds? I understand this is because of CEPH's object placement algorithm, but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is this large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially end up with 7 x the same number of misplaced objects...? Any thoughts? Thanks

On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:
Thanks Irek. Does this mean that after peering, for each PG there will be a delay of 10sec - meaning that every once in a while I will have 10sec of the cluster NOT being stressed/overloaded, then the recovery takes place for that PG, then for another 10sec the cluster is fine, and then it is stressed again? I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com but I don't fully understand the process). Thanks, Andrija

On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:
Hi. Use the value osd_recovery_delay_start. Example:
[root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
osd_recovery_delay_start: 10

-- Best regards, Фасихов Ирек Нургаязович. Mob.: +79229045757

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?
Hi Irek, yes, stopping the OSD (or setting it OUT) resulted in only 3% of data degraded and moved/recovered. When I afterwards removed it from the CRUSH map (ceph osd crush rm id), that's when the 37% thing happened. And thanks for the help, Irek - could you kindly just let me know the preferred steps when removing a whole node? Do you mean I first stop all OSDs again, or just remove each OSD from the CRUSH map, or perhaps decompile the CRUSH map, delete the node completely, compile it back in, and let it heal/recover? Do you think this would result in less data being misplaced and moved around? Sorry for bugging you, I really appreciate your help. Thanks

On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:
The large percentage comes from the rebuild of the cluster map (but a low percentage of degradation). If you had not run ceph osd crush rm id, the percentage would be low. In your case the correct option is to remove the entire node, rather than each disk individually.

-- Best regards, Фасихов Ирек Нургаязович. Mob.: +79229045757

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Rebalance/Backfill Throttling - anything missing here?
Hi Guys, yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this is when I removed it from the CRUSH map). I'm wondering - I had previously set some throttling mechanisms, but during the first 1h of rebalancing my recovery rate was going up to 1500 MB/s - and the VMs were completely unusable - and then for the last 4h of the recovery this rate went down to, say, 100-200 MB/s, during which VM performance was still pretty impacted, but at least I could work more or less. So my question: is this behaviour expected, and is the throttling here working as expected? During the first 1h almost no throttling seemed to be applied, judging by the 1500 MB/s recovery rate and the impact on VMs, while the last 4h seemed pretty fine (although still a lot of impact in general). I changed these throttles on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'

My journals are on SSDs (12 OSDs per server, of which 6 journals are on one SSD and 6 journals on another SSD) - I have 3 of these hosts. Any thoughts are welcome.

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
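While throttles like these are being tuned, the effect can be watched from the monitor side and double-checked per OSD over the admin socket - a small sketch, with osd.0 as an example:

ceph -s   # one-shot status; shows the current recovery rate in MB/s
ceph -w   # streams the same, useful while backfill runs
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority'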
Re: [ceph-users] replace dead SSD journal
Well, seems like they are on a satellite :)

On 6 May 2015 at 02:58, Matthew Monaco m...@monaco.cx wrote:
On 05/05/2015 08:55 AM, Andrija Panic wrote:
> Hi, small update: in 3 months we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days in between each SSD death) - can't believe it - NOT due to wearing out... I really hope we got a defective series from the supplier...
That's ridiculous. Are these drives mounted un-shielded on a satellite? I didn't know the ISS had a ceph cluster.

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
Hi, small update: in 3 months we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days in between each SSD death) - can't believe it - NOT due to wearing out... I really hope we got a defective series from the supplier... Regards

On 18 April 2015 at 14:24, Andrija Panic andrija.pa...@gmail.com wrote:
yes I know, but too late now, I'm afraid :)

On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote:
Have you looked into the Samsung 845 DC? They were not that expensive last time I checked. /Josef

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] replace dead SSD journal
Hi guys, I have 1 SSD that hosted 6 OSDs' journals and is now dead, so 6 OSDs are down, Ceph rebalanced, etc. Now I have the new SSD in, and I will partition it etc. - but I would like to know how to proceed with the journal recreation for those 6 OSDs that are down now. Should I flush the journals (flush to where? the journals don't exist any more...), or just recreate the journals from scratch (making symbolic links again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and start the OSDs? I expect the following procedure, but would like confirmation please:

rm -f /var/lib/ceph/osd/ceph-$ID/journal (sym link)
ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
ceph-osd -i $ID --mkjournal
ll /var/lib/ceph/osd/ceph-$ID/journal
service ceph start osd.$ID

Any thoughts greatly appreciated! Thanks,

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
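For what it's worth, ceph-osd does have a dedicated command for the flush step the post asks about, but it presupposes that the old journal device is still readable - which it is not after an SSD death - so --mkjournal against a fresh partition (or the delete-and-re-add route the thread settles on below) is the practical path:

ceph-osd -i $ID --flush-journal   # only works while the old journal device still exists
ceph-osd -i $ID --mkjournal       # writes a fresh journal to whatever the journal symlink points at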
Re: [ceph-users] replace dead SSD journal
An SSD died that hosted the journals for 6 OSDs - 2 x SSDs died in total, so 12 OSDs are down, and the rebalancing is about to finish... after which I need to fix the OSDs.

On 17 April 2015 at 19:01, Josef Johansson jo...@oderland.se wrote:
Hi, did 6 other OSDs go down when re-adding? /Josef

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
Thx guys, that's what I will be doing in the end. Cheers

On Apr 17, 2015 6:24 PM, Robert LeBlanc rob...@leblancnet.us wrote:
Delete and re-add all six OSDs.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
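A sketch of the delete-and-re-add cycle Robert recommends, per OSD; osd.12 and the device names are examples, with recreation shown via ceph-deploy as used elsewhere in this digest:

ceph osd out 12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12
# recreate against a partition on the new journal SSD
ceph-deploy osd create SERVER:sdi:/dev/sdb5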
Re: [ceph-users] replace dead SSD journal
12 OSDs down - I expect less work with removing and re-adding the OSDs?

On Apr 17, 2015 6:35 PM, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote:
Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the existing OSD UUID, copy the keyring and let it populate itself?

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
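Krzysztof's alternative, sketched under the assumption that only the data filesystem is rebuilt and the OSD keeps its identity; the ID, device, and UUID here are placeholders (the UUID being the one 'ceph osd dump' reports for that OSD):

mkfs.xfs -f /dev/sdX1
mount /dev/sdX1 /var/lib/ceph/osd/ceph-12
ceph-osd -i 12 --mkfs --mkjournal --osd-uuid <uuid>
# restore the OSD's keyring into the data dir, then start the OSD and let it backfill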
Re: [ceph-users] replace dead SSD journal
nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... the wear level is at 96%, so only 4% worn... (yes, I know these are not enterprise, etc...)

On 17 April 2015 at 21:01, Josef Johansson jose...@gmail.com wrote:
Tough luck, hope everything comes up ok afterwards. What models are the SSDs? /Josef

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
damn, good news for me, possibly bad news for you :) What is the wear level (smartctl -a /dev/sdX - the attribute near the end of the attribute list)? thx

On 17 April 2015 at 21:12, Krzysztof Nowicki krzysztof.a.nowi...@gmail.com wrote:
I have two of them in my cluster (plus one 256GB version) for about half a year now. So far so good. I'll be keeping a closer eye on them.

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
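The check being asked for, spelled out - on Samsung drives the relevant attribute is typically 177 Wear_Leveling_Count (an assumption about the SMART table, not something confirmed in the thread):

smartctl -a /dev/sdX | grep -i wear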
Re: [ceph-users] replace dead SSD journal
yes I know, but too late now, I'm afraid :)

On 18 April 2015 at 14:18, Josef Johansson jose...@gmail.com wrote:
Have you looked into the Samsung 845 DC? They were not that expensive last time I checked. /Josef

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
might be true, yes - we had Intel 128GB (Intel S3500 or S3700), but those have horrible random/sequential speeds - Samsung 850 PROs are at least 3 times faster on sequential, and more than 3 times faster on random/IOPS measures. And of course modern enterprise drives = ...

On 18 April 2015 at 12:42, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote:
Yes, it sure is - my experience with 'consumer' SSDs is that they die with obscure firmware bugs (wrong capacity, zero capacity, not detected in the BIOS anymore) rather than flash wear-out. It seems that the 'enterprise' tagged drives are less inclined to suffer this fate. Regards, Mark

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] ceph-deploy journal on separate partition - quick info needed
Hi all, when I run:

ceph-deploy osd create SERVER:sdi:/dev/sdb5

(sdi = previously ZAP-ed 4TB drive; sdb5 = previously manually created empty partition with fdisk)

Is ceph-deploy going to create the journal properly on sdb5 (something similar to: ceph-osd -i $ID --mkjournal), or do I need to do something before this? I have actually already run this command but haven't seen any mkjournal commands in the output. The OSD shows as up and in, but I have doubts whether the journal is fine (the symlink does link to /dev/sdb5), but again... Any confirmation is welcomed. Thanks,

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-deploy journal on separate partition - quick info needed
ok, thx Robert - I expected that, so this is fine then - just done it on 12 OSDs and all fine... thx again

On 17 April 2015 at 23:38, Robert LeBlanc rob...@leblancnet.us wrote:
If the journal file on the OSD is a symlink to the partition and the OSD process is running, then the journal was created properly. The OSD would not start if the journal had not been created.

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
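Robert's verification, as commands (the OSD ID is an example):

ls -l /var/lib/ceph/osd/ceph-12/journal   # should be a symlink pointing at /dev/sdb5
ceph osd tree | grep osd.12               # should show the OSD up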
Re: [ceph-users] replace dead SSD journal
heh :) yes, interesting last name :) Anyway, all are exactly the same age; we implemented the new CEPH nodes at exactly the same time - but it's not a wearing problem - the dead SSDs were simply DEAD - smartctl -a showing nothing except a 600 PB space/size :)

On 18 April 2015 at 09:41, Steffen W Sørensen ste...@me.com wrote:
On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote:
> nah... Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... the wear level is at 96%, so only 4% worn... (yes, I know these are not enterprise, etc…)
Damn… but maybe your surname says it all - Don't Panic :) But making sure SSD devices of the same type aren't of nearly the same age, and doing preventive replacement rotation, might be good practice I guess. /Steffen

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
These 2 drives are on the regular (on-board) SATA controller, and besides them there are 12 x 4TB drives on the front of the servers - a normal backplane on the front. Anyway, we are going to check those dead SSDs in a PC/laptop or so, just to confirm they are really dead - but this is the way they die: not wear-out, but simply showing a different size instead of the real one - these were only 3 months old when they died...

On 18 April 2015 at 11:55, Josef Johansson jose...@gmail.com wrote:
If the same chassis/chip/backplane is behind both drives, and maybe other drives in the chassis have troubles, it may be a defect there as well.

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Repair inconsistent pgs..
Guys, I'm Igor's colleague, working a bit on CEPH together with Igor. This is a production cluster, and we are becoming more desperate as time goes by. I'm not sure if this is an appropriate place to seek commercial support, but anyhow, I'll do it... If anyone feels like it and has some experience with this particular kind of PG troubleshooting, we are also ready to seek commercial support to solve our issue - company or individual, it doesn't matter. Thanks, Andrija

On 20 August 2015 at 19:07, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Inktank: https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
Mail-list: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18338.html

2015-08-20 20:06 GMT+03:00 Samuel Just sj...@redhat.com:
Which docs? -Sam

On Thu, Aug 20, 2015 at 9:57 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Not yet. I will create one. But according to the mailing lists and the Inktank docs, it's expected behaviour when a cache tier is enabled.

2015-08-20 19:56 GMT+03:00 Samuel Just sj...@redhat.com:
Is there a bug for this in the tracker? -Sam

On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
The issue is that in forward mode fstrim doesn't work properly, and when we take a snapshot the data is not properly updated in the cache layer, so the client (ceph) sees a damaged snap, as the headers are requested from the cache layer.

2015-08-20 19:53 GMT+03:00 Samuel Just sj...@redhat.com:
What was the issue? -Sam

On Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Samuel, we turned off the cache layer a few hours ago... I will post ceph.log in a few minutes. For the snap - we found the issue; it was connected with the cache tier.

2015-08-20 19:23 GMT+03:00 Samuel Just sj...@redhat.com:
Ok, you appear to be using a replicated cache tier in front of a replicated base tier. Please scrub both inconsistent pgs and post the ceph.log from before you started the scrub until after. Also, what command are you using to take snapshots? -Sam

On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Hi Samuel, we tried to fix it in a tricky way. We checked all the affected rbd_data chunks from the (OSD) logs, then queried rbd info to see which rbd contains the bad rbd_data; after that we mounted that rbd as rbd0, created an empty rbd, and DD-ed all the data from the bad volume to the new one. But after that the scrub errors kept growing... It was 15 errors... now 35... We also tried to out the OSD which was the lead, but after rebalancing these 2 pgs still have 35 scrub errors... ceph osd getmap -o outfile - attached.

2015-08-18 18:48 GMT+03:00 Samuel Just sj...@redhat.com:
Is the number of inconsistent objects growing? Can you attach the whole ceph.log from the 6 hours before and after the snippet you linked above? Are you using cache/tiering? Can you attach the osdmap (ceph osd getmap -o outfile)? -Sam

On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
ceph - 0.94.2. It happened during rebalancing. I thought too that some OSD was missing a copy, but it looks like all of them miss it... So any advice on which direction I need to go?

2015-08-18 14:14 GMT+03:00 Gregory Farnum gfar...@redhat.com:
From a quick peek it looks like some of the OSDs are missing clones of objects. I'm not sure how that could happen, and I'd expect the pg repair to handle it, but if it's not, there's probably something wrong; what version of Ceph are you running? Sam, is this something you've seen, a new bug, or some kind of config issue? -Greg

On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Hi all, at our production cluster, due to heavy rebalancing ((( we have 2 pgs in an inconsistent state...
root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
From the OSD logs, after a recovery attempt:
root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
dumped all in format plain
instructing pg 2.490 on osd.56 to repair
instructing pg 2.c4 on osd.56 to repair
/var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.04da/head//2
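The repair loop quoted above only queues repairs; Sam's request maps to explicit deep-scrubs of the two PGs named earlier, followed by pulling the errors out of the cluster log:

ceph pg deep-scrub 2.490
ceph pg deep-scrub 2.c4
grep ERR /var/log/ceph/ceph.log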
Re: [ceph-users] Broken snapshots... CEPH 0.94.2
This was related to the caching layer, which doesn't support snapshotting per the docs... for the sake of closing the thread.

On 17 August 2015 at 21:15, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
Hi all, can you please help me with an unexplained situation... All snapshots inside ceph are broken... So, as an example, we have a VM template as an rbd inside ceph. We can map it and mount it to check that all is ok with it:

root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test:~# parted /dev/rbd0 print
Model: Unknown (unknown)
Disk /dev/rbd0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type     File system  Flags
1       1049kB  525MB   524MB   primary  ext4         boot
2       525MB   10.7GB  10.2GB  primary               lvm

Then I want to create a snap, so I do:
root@test:~# rbd snap create cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap

And now I want to map it:
root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
/dev/rbd1
root@test:~# parted /dev/rbd1 print
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Warning: Unable to open /dev/rbd1 read-write (Read-only file system). /dev/rbd1 has been opened read-only.
Error: /dev/rbd1: unrecognised disk label

Even the md5 sums differ...
root@ix-s2:~# md5sum /dev/rbd0
9a47797a07fee3a3d71316e22891d752  /dev/rbd0
root@ix-s2:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1

Ok, now I protect the snap and create a clone... but same thing... the md5 for the clone is the same as for the snap:
root@test:~# rbd unmap /dev/rbd1
root@test:~# rbd snap protect cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap
root@test:~# rbd clone cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5@new_snap cold-storage/test-image
root@test:~# rbd map cold-storage/test-image
/dev/rbd1
root@test:~# md5sum /dev/rbd1
e450f50b9ffa0073fae940ee858a43ce  /dev/rbd1

but it's broken...
root@test:~# parted /dev/rbd1 print
Error: /dev/rbd1: unrecognised disk label

= tech details:
root@test:~# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

We have 2 inconsistent pgs, but none of these images are placed on those pgs...
root@test:~# ceph health detail
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
18 scrub errors

root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5' - pg 2.74458f70 (2.770) - up ([37,15,14], p37) acting ([37,15,14], p37)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@snap
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@snap' - pg 2.793cd4a3 (2.4a3) - up ([12,23,17], p12) acting ([12,23,17], p12)
root@test:~# ceph osd map cold-storage 0e23c701-401d-4465-b9b4-c02939d57bb5@test-image
osdmap e16770 pool 'cold-storage' (2) object '0e23c701-401d-4465-b9b4-c02939d57bb5@test-image' - pg 2.9519c2a9 (2.2a9) - up ([12,44,23], p12) acting ([12,44,23], p12)

Also, we use a cache layer, which at the current moment is in forward mode... Can you please help me with this, as my brain has stopped understanding what is going on... Thanks in advance!

-- Andrija Panić
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
Make sure you test whatever you decide. We just learned this the hard way with the Samsung 850 Pro, which is total crap, more than you could imagine... Andrija

On Aug 25, 2015 11:25 AM, Jan Schermer j...@schermer.cz wrote:
I would recommend the Samsung 845 DC PRO (not the EVO, not the plain PRO). Very cheap, better than the Intel 3610 for sure (and I think it beats even the 3700). Jan

On 25 Aug 2015, at 11:23, Christopher Kunz chrisl...@de-punkt.de wrote:
On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
Hi, most of the time I get the recommendation from resellers to go with the Intel S3700 for the journaling.
Check out the Intel S3610: 3 drive writes per day for 5 years. Plus, it is cheaper than the S3700. Regards, --ck

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
First read please: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

We are getting 200 IOPS, in comparison to the Intel S3500's 18,000 IOPS - and those are sustained performance numbers, i.e. avoiding the drive's cache and running for a longer period of time... Also, if checking with fio, you will get better latencies on the Intel S3500 (the model tested in our case) along with 20x better IOPS results... We observed the original issue as a high speed at the beginning of e.g. a file transfer inside a VM, which then halts to zero... We moved the journals back to the HDDs and performance was acceptable... now we are upgrading to the Intel S3500... Best

> any details on that?
>
> On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic andrija.pa...@gmail.com wrote:
> Make sure you test whatever you decide. We just learned this the hard way with the Samsung 850 Pro, which is total crap, more than you could imagine... Andrija
>
> --
> Mariusz Gronczewski, Administrator, Efigence S.A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13, F: [+48] 22 380 13 14
> E: mariusz.gronczew...@efigence.com

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
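The test from the linked post boils down to a small synchronous direct write - a sketch; /dev/sdX is a placeholder, and note the command overwrites data on the target device:

dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync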
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
We have some 850 Pro 256GB SSDs, if anyone is interested in buying :) Also, there was a new 850 Pro firmware that broke people's disks and was later pulled, etc... I'm sticking to only vacuum cleaners from Samsung for now, maybe... :)

On Aug 25, 2015 12:02 PM, Voloshanenko Igor igor.voloshane...@gmail.com wrote:
To be honest, the Samsung 850 PRO is not a 24/7 series... it's something like a desktop+ series, but anyway - the results from these drives are very, very bad in any scenario acceptable in real life... Possibly the 845 PRO is better, but we don't want to experiment anymore... So we chose the S3500 240GB. Yes, it's cheaper than the S3700 (about 2x), and not as durable for writes, but we think it's better to replace 1 SSD per year than to pay double the price now.

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
And should I mention that in another CEPH installation we had Samsung 850 Pro 128GB drives, and all 6 SSDs died within a 2-month period - they simply disappeared from the system, so it was not wear-out... Never again will we buy Samsung :) On Aug 25, 2015 11:57 AM, Andrija Panic andrija.pa...@gmail.com wrote: First read please: http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ [...] ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
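Given that these drives died with no smartctl errors at all, the wear counters are clearly not a reliable early warning here, but they are still worth watching; a hedged example (attribute names and IDs vary by vendor, and /dev/sdX is a placeholder):

  # Print the vendor attribute table (needs smartmontools installed)
  smartctl -A /dev/sdX
  # On Samsung SSDs the relevant attributes are typically:
  #   177 Wear_Leveling_Count - remaining erase-cycle headroom
  #   241 Total_LBAs_Written  - lifetime host writes
  smartctl -A /dev/sdX | grep -Ei 'wear|lbas_written'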
Re: [ceph-users] which SSD / experiences with Samsung 843T vs. Intel s3700
Quentin, try fio or dd with the O_DIRECT and O_DSYNC flags and you will see less than 1 MB/s - that is common for most "home" drives - see the post below to understand why. We removed all the Samsung 850 Pro 256GB drives from our new CEPH installation and replaced them with Intel S3500s (18,000 sustained 4KB write IOPS with O_DIRECT and O_DSYNC, in comparison to 200 IOPS for the Samsung 850 Pro - you can imagine the difference...): http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ (a fio sketch of this test follows after the quoted thread below) Best On 4 September 2015 at 21:09, Quentin Hartman <qhart...@direwolfdigital.com> wrote: > Mine are also mostly 850 Pros. I have a few 840s, and a few 850 EVOs in > there just because I couldn't find 14 Pros at the time we were ordering > hardware. I have 14 nodes, each with a single 128 or 120GB SSD that serves > as the boot drive and the journal for 3 OSDs. And similarly, mine just > started disappearing a few weeks ago. I've now had four fail (three 850 > Pros, one 840 Pro). I expect the rest to fail any day. > > As it turns out, I had a phone conversation today with the support rep who has > been helping me with the RMAs, and he's putting together a report with my > pertinent information to forward on to someone. > > FWIW, I tried to get your 845s for this deployment, but couldn't find them > anywhere, and since the 850s looked about as durable on paper I figured > they would do OK. Seems not to be the case. > > QH > > On Fri, Sep 4, 2015 at 12:53 PM, Andrija Panic <andrija.pa...@gmail.com> > wrote: >> Hi James, >> >> I had 3 CEPH nodes as follows: 12 OSDs (HDD) and 2 SSDs (6 journal >> partitions on each SSD) - the SSDs just vanished with no warning, no smartctl >> errors, nothing... so the 2 SSDs in each of the 3 servers vanished within... 2-3 weeks, >> after 3-4 months of being in production (VMs/KVM/CloudStack) >> >> Mine were also Samsung 850 PRO 128GB. >> >> Best, >> Andrija >> >> On 4 September 2015 at 19:27, James (Fei) Liu-SSI < >> james@ssi.samsung.com> wrote: >>> Hi Quentin and Andrija, >>> Thanks so much for reporting the problems with Samsung. >>> Would it be possible to get to know your system configuration? What kind of >>> workload are you running? You use the Samsung SSDs as separate journal disks, right? >>> Thanks so much. >>> James >>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf >>> Of Quentin Hartman Sent: Thursday, September 03, 2015 1:06 PM To: Andrija >>> Panic Cc: ceph-users Subject: Re: [ceph-users] which SSD / experiences with >>> Samsung 843T vs. Intel s3700 >>> Yeah, we've ordered some S3700s to replace them already. They should be here >>> early next week. Hopefully they arrive before we have multiple nodes die at >>> once and can no longer rebalance successfully. >>> Most of the drives I have are the 850 Pro 128GB (specifically MZ7KE128HMGA). >>> There are a couple of 120GB 850 EVOs in there too, but ironically, none of >>> them have pooped out yet. >>> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic <andrija.pa...@gmail.com> >>> wrote: >>> I really advise removing the bastards before they die... no rebalancing >>> happening, just a temporary OSD down while replacing the journals... >>> What size and model are your Samsungs? >>> On Sep 3, 2015 7:10 PM, "Quentin Hartman" <qhart...@direwolfdigital.com> >>> wrote: >>> We also just started having our 850 Pros die one after the other after >>> about 9 months of service.
3 down, 11 to go... No warning at all, the drive >>> is fine, and then it's not even visible to the machine. According to the >>> stats in hdparm and the calcs I did, they should have had years of life >>> left, so it seems that ceph journals definitely do something they do not >>> like, which is not reflected in their stats. >>> >>> QH >>> >>> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus <t10te...@gmail.com> wrote: >>> Hi, >>> We got a good deal on the 843T and we are using it in our Openstack setup... as journals. >>> The
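As referenced above, a minimal fio version of the same sync-write journal test (a sketch along the lines of the linked post; /dev/sdX is a placeholder and the run destroys data on it):

  # Single-threaded 4k synchronous writes at queue depth 1 - the Ceph
  # journal pattern; --sync=1 opens the target with O_SYNC.
  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test

The drives discussed in this thread as journal-suitable (S3500/S3610/S3700, 845 DC PRO) hold up under this load; the 850 Pro results quoted above are what collapse to ~200 IOPS.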