[ceph-users] FileStore OSD, journal direct symlinked, permission troubles.
I've just finished a double upgrade of my Ceph (PVE-based) cluster, from hammer to jewel and from jewel to luminous. All went well, apart from the fact that the OSDs do not restart automatically, because of permission trouble on the journal:

Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to open journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 7fa505a43e00 -1 ** ERROR: osd init failed: (13) Permission denied

A quick rewind: when I set up the cluster I used some 'old' servers, with a couple of SSD disks acting as both OS and journal disks. Because the servers were old, I was forced to partition the boot disk in DOS (MBR) mode, not GPT. While creating the OSDs, I received some warnings:

WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. Symlinking directly.

Looking at the cluster now, it seems to me that the OSD init scripts try to identify the journal based on GPT partition label/info, and clearly fail. Note that if I run, on the servers that hold OSDs:

for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown ceph: $l; done

the OSDs start flawlessly. Is there something I can do? Thanks.

-- dott. Marco Gaiarin GNUPG Key ID: 240A3D66 Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/ Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN) marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797 Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA! 
http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000 (cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
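The chown loop above fixes ownership only until the next reboot, because udev recreates the journal device nodes owned by root:disk (the ceph-disk udev rules that normally fix this key off GPT partition type GUIDs, which MBR partitions lack). A persistent workaround is a local udev rule; this is only a sketch, and the file name and the partition names (sda5, sda8) are assumptions to adapt to your layout:

```
# /etc/udev/rules.d/90-ceph-journal.rules  (hypothetical file name)
# Hand the FileStore journal partitions to the ceph user when the
# device node is created, so ceph-osd can open them at boot.
KERNEL=="sda5", OWNER="ceph", GROUP="ceph", MODE="0660"
KERNEL=="sda8", OWNER="ceph", GROUP="ceph", MODE="0660"
```

After writing the rule, `udevadm trigger` (or a reboot) should apply it.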
Re: [ceph-users] Post-mortem analysis?
Hi Martin Verges! In your message you wrote:
> first of all, hyperconverged setups with publicly accessible VMs could be affected by DDoS attacks or other harmful issues that cause cascading errors in your infrastructure.
No, it is a private cluster.
> Are you sure your network worked correctly at the time?
Not completely sure, but I had redirected the syslog of the switches to a server, and I did not catch any major error/failure signs there.
[ceph-users] Post-mortem analysis?
[It is not really a 'mortem', but...]

On Saturday afternoon my 3-node Proxmox Ceph cluster had a big 'slowdown', which started at 12:35:24 with an OOM condition on one of the 3 storage nodes, followed by OOM on another node at 12:43:31. After that, all the bad things happened: stuck requests, SCSI timeouts on VMs, MONs flip-flopping as seen by the RBD clients.

I run a 'ceph -s' every hour, so at 14:17:01 I had, on two nodes:

cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
        26 requests are blocked > 32 sec
 monmap e9: 5 mons at {2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0,capitanmarvel=10.27.251.8:6789/0}
        election epoch 3930, quorum 0,1,2,3,4 blackpanther,capitanmarvel,4,2,3
 osdmap e15713: 12 osds: 12 up, 12 in
  pgmap v67358590: 768 pgs, 3 pools, GB data, 560 kobjects
        6639 GB used, 11050 GB / 17689 GB avail
             768 active+clean
  client io 266 kB/s wr, 25 op/s

and on the third:

cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
        5 mons down, quorum
 monmap e9: 5 mons at {2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0,capitanmarvel=10.27.251.8:6789/0}
        election epoch 3931, quorum
 osdmap e15713: 12 osds: 12 up, 12 in
  pgmap v67358598: 768 pgs, 3 pools, GB data, 560 kobjects
        6639 GB used, 11050 GB / 17689 GB avail
             767 active+clean
               1 active+clean+scrubbing
  client io 617 kB/s wr, 70 op/s

At that hour, the site served by the cluster had just closed (i.e., no users around). The only task running, looking at the logs, seems to have been a backup (Bacula), but it was only saving the catalog, i.e. a database workload on a container, and it ended at 14:27.

All of that continued, more or less, until Sunday morning, then everything went back to normal. There seem to have been no hardware failures on the nodes. The backup tasks (all VM/LXC backups) on Saturday night completed with no errors.

Can someone provide some hints on how to 'correlate' the various logs, and so (try to) understand what happened? Thanks.
Re: [ceph-users] HW failure cause client IO drops
Hi M Ranga Swami Reddy! In your message you wrote:
> Hello - Recently we had an issue with a storage node's battery failure, which caused Ceph client IO to drop to '0' bytes. That means the Ceph cluster couldn't perform IO operations on the cluster until the node was taken out. This is not expected from Ceph: when some HW fails, the respective OSDs should be marked out/down and IO should continue as before.
> Please let me know if anyone has seen similar behavior, and whether this issue is resolved?

Does 'battery' mean 'CMOS battery'? OSDs and MONs need accurate clock sync between them. So, if a node reboots with a clock skew of more than (if I remember correctly) 5 seconds, the OSD does not start. Provide a stable NTP server for all your OSDs and MONs, and restart the OSDs after the clocks are in sync.
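A quick way to verify the clock hypothesis after such a reboot (a sketch; `chronyc` assumes chrony is the NTP daemon on the node, substitute `ntpq -p` if you run ntpd):

```shell
# Check whether the rebooted node's clock is in sync with its NTP sources,
# and whether the cluster itself is complaining about skew.
timedatectl status                    # "System clock synchronized: yes/no"
chronyc tracking                      # offset of the local clock vs. NTP
ceph health detail | grep -i clock    # any "clock skew detected" warnings
```

Only once the skew is gone is it worth restarting the affected OSDs.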
Re: [ceph-users] OSD after OS reinstallation.
Hi Alfredo Deza! In your message you wrote:
> There are ways to create partitions without a PARTUUID. We have an example in our docs with parted that will produce what is needed:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#partitioning
> But then again... I would strongly suggest avoiding all of this and just using the new way of doing OSDs with LVM

Ahem... I'm still on hammer... ;-)))
Re: [ceph-users] Prevent rebalancing in the same host?
Hi Christian Balzer! In your message you wrote:
> You pretty much answered your question, as in a limit of "osd" would do the trick, though not just for intra-host.
Oh, the documentation does not list the possible values... good to know.
> But of course everybody will (rightly) tell you that you need enough capacity to at the very least deal with a single OSD loss.
Super clear. Thanks.
Re: [ceph-users] OSD after OS reinstallation.
Hi Alfredo Deza! In your message you wrote:
> The problem is that if there is no PARTUUID, ceph-volume can't ensure which device is the one actually pointing to data/journal. Being 'GPT' alone will not be enough here :(

OK. Is there some way to 'force' a PARTUUID onto a partition, GPT or non-GPT, even, if needed, by destroying it?

I've also tried to create a GPT partition inside a DOS partition (e.g., in /dev/sda5), and the GPT partition seems to get created correctly, but the (sub)partition still has no PARTUUID...
Re: [ceph-users] OSD after OS reinstallation.
Hi Alfredo Deza! In your message you wrote:
> > Ahem, how can I add a GPT label to a non-GPT partition (even losing data)?
> If you are coming from ceph-disk (or something else custom-made) and don't care about losing data, why not fully migrate to the new OSDs?
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#rados-replacing-an-osd

I'm using Proxmox, so the 'pveceph' helper; but indeed my trouble is with the journal labels, not with the main filesystem labels...
Re: [ceph-users] OSD after OS reinstallation.
Hi Alfredo Deza! In your message you wrote:
> I think this is what happens with a non-GPT partition. GPT labels will use a PARTUUID to identify the partition, and I just confirmed that ceph-volume will enforce looking for PARTUUID if the JSON identified a partition (vs. an LV).
> From what I briefly researched, it is not possible to add a GPT label on a non-GPT partition without losing data.

Ahem, how can I add a GPT label to a non-GPT partition (even losing data)? This seems to be the culprit behind my 'Proxmox 4.4, Ceph hammer, OSD cache link...' thread... Thanks.
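For the record, when data loss is acceptable, the usual route is to relabel the whole disk rather than a single partition. A destructive sketch, assuming /dev/sdb is a journal disk whose contents you can afford to lose (double-check the device name before running anything):

```shell
# DESTRUCTIVE sketch: rewrite an MBR partition table as GPT so that
# every partition gains a PARTUUID. sgdisk -g (--mbrtogpt) tries to
# preserve the partition layout, but treat the disk as expendable.
sgdisk -g /dev/sdb       # convert the MBR label to GPT in place
partprobe /dev/sdb       # make the kernel re-read the new table
blkid /dev/sdb1          # should now report a PARTUUID
```

After this, the journals on that disk have to be recreated (flush + mkjournal), since the conversion is not guaranteed to keep their contents intact.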
[ceph-users] Prevent rebalancing in the same host?
Little cluster: 3 nodes, 4 OSDs per node. An OSD died, and Ceph started to rebalance data between the OSDs of the same node (not completing it, and leading to a 'near full' warning). As there exists:

mon osd down out subtree limit = host

to prevent whole-host rebalancing, is there some way to prevent intra-host OSD rebalancing? Thanks.
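For reference, the host-level guard mentioned above lives in ceph.conf on the monitors; a minimal sketch of the relevant fragment:

```
[mon]
# Do not automatically mark OSDs "out" when an entire subtree at
# host level (or above) goes down - e.g. a whole node rebooting -
# so no mass rebalance is triggered for a full-host outage.
mon osd down out subtree limit = host
```

Note this only suppresses the automatic out-marking for whole-host failures; a single dead OSD is still marked out after `mon osd down out interval`, which is what triggers the intra-host rebalance asked about here.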
Re: [ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...
Hi Michel Raabe! In your message you wrote:
> Have you changed/added the journal_uuid from the old partition?
> https://ceph.com/geen-categorie/ceph-recover-osds-after-ssd-journal-failure/

root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-15
total 56
drwxr-xr-x   3 root root  199 nov 21 23:08 .
drwxr-xr-x   6 root root 4096 nov 21 23:08 ..
-rw-r--r--   1 root root  903 nov 21 23:08 activate.monmap
-rw-r--r--   1 root root    3 nov 21 23:08 active
-rw-r--r--   1 root root   37 nov 21 23:08 ceph_fsid
drwxr-xr-x 292 root root 8192 dic  9 15:02 current
-rw-r--r--   1 root root   37 nov 21 23:08 fsid
lrwxrwxrwx   1 root root    9 nov 21 23:08 journal -> /dev/sda8
-rw-------   1 root root   57 nov 21 23:08 keyring
-rw-r--r--   1 root root   21 nov 21 23:08 magic
-rw-r--r--   1 root root    6 nov 21 23:08 ready
-rw-r--r--   1 root root    4 nov 21 23:08 store_version
-rw-r--r--   1 root root   53 nov 21 23:08 superblock
-rw-r--r--   1 root root    0 nov 21 23:08 sysvinit
-rw-r--r--   1 root root    3 nov 21 23:08 whoami

Ahem, I have no 'journal_uuid' file on the OSD...
Re: [ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...
I come back to this.

> I've recently added a host to my Ceph cluster, using the Proxmox 'helpers' to add the OSD, e.g.:
>
> pveceph createosd /dev/sdb -journal_dev /dev/sda5
>
> and now I have:
>
> root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-12
> total 60
> drwxr-xr-x   3 root root   199 nov 21 17:02 .
> drwxr-xr-x   6 root root  4096 nov 21 23:08 ..
> -rw-r--r--   1 root root   903 nov 21 17:02 activate.monmap
> -rw-r--r--   1 root root     3 nov 21 17:02 active
> -rw-r--r--   1 root root    37 nov 21 17:02 ceph_fsid
> drwxr-xr-x 432 root root 12288 dic  1 18:21 current
> -rw-r--r--   1 root root    37 nov 21 17:02 fsid
> lrwxrwxrwx   1 root root     9 nov 21 17:02 journal -> /dev/sda5
> -rw-------   1 root root    57 nov 21 17:02 keyring
> -rw-r--r--   1 root root    21 nov 21 17:02 magic
> -rw-r--r--   1 root root     6 nov 21 17:02 ready
> -rw-r--r--   1 root root     4 nov 21 17:02 store_version
> -rw-r--r--   1 root root    53 nov 21 17:02 superblock
> -rw-r--r--   1 root root     0 nov 21 17:02 sysvinit
> -rw-r--r--   1 root root     3 nov 21 17:02 whoami
>
> and all works as expected; only, I expected to find as the journal not the device (/dev/sda5) but the uuid (/dev/disk/by-uuid/...).
>
> But it seems that the journal partition does not have a UUID associated:
>
> root@blackpanther:~# ls -la /dev/disk/by-uuid/ | grep sda5
> root@blackpanther:~# blkid /dev/sda5
> /dev/sda5: PARTUUID="a222c6bf-05"
>
> I'm a bit ''puzzled'' because if I have to add a disk ''before'' sda, all device names will change with, I suppose, unexpected results.
>
> Am I missing something? Thanks.

I was forced to change some journals, using some (MBR) partitions; I stopped the OSD, flushed the old journal, changed the symlink and then did a 'journal format':

root@deadpool:/var/lib/ceph/osd/ceph-6# ls -la
total 64
drwxr-xr-x   3 root root   199 feb  6 17:45 .
drwxr-xr-x   6 root root  4096 dic 14  2016 ..
-rw-r--r--   1 root root   751 dic 14  2016 activate.monmap
-rw-r--r--   1 root root     3 dic 14  2016 active
-rw-r--r--   1 root root    37 dic 14  2016 ceph_fsid
drwxr-xr-x 378 root root 20480 feb  6 17:12 current
-rw-r--r--   1 root root    37 dic 14  2016 fsid
lrwxrwxrwx   1 root root     9 feb  6 17:45 journal -> /dev/sda5
-rw-------   1 root root    56 dic 14  2016 keyring
-rw-r--r--   1 root root    21 dic 14  2016 magic
-rw-r--r--   1 root root     6 dic 14  2016 ready
-rw-r--r--   1 root root     4 dic 14  2016 store_version
-rw-r--r--   1 root root    53 dic 14  2016 superblock
-rw-r--r--   1 root root     0 feb  6 17:10 sysvinit
-rw-r--r--   1 root root     2 dic 14  2016 whoami

root@deadpool:/var/lib/ceph/osd/ceph-6# ceph-osd -i 6 --mkjournal
2019-02-06 17:45:35.030359 7ff679c24880 -1 journal check: ondisk fsid ---- doesn't match expected 70357923-3227-4d57-980f-92b8c853dc76, invalid (someone else's?) journal
2019-02-06 17:45:35.038522 7ff679c24880 -1 created new journal /var/lib/ceph/osd/ceph-6/journal for object store /var/lib/ceph/osd/ceph-6

Clearly I changed the journal partition by hand (i.e., a direct link), so I expected the link to be 'direct to partition'; but, as the warning about the fsid shows, there is still no 'id' associated with that partition (i.e., no link in /dev/disk/by-*/). If I rerun the 'mkjournal':

root@deadpool:/var/lib/ceph/osd/ceph-6# ceph-osd -i 6 --mkjournal
2019-02-06 17:45:37.621855 7f3391377880 -1 created new journal /var/lib/ceph/osd/ceph-6/journal for object store /var/lib/ceph/osd/ceph-6

So it seems the journal partition does effectively get 'tagged' in some way. But I'm still confused... does using an ID link for journal partitions work only with GPT partitioning? Thanks.
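The sequence used above, written out end to end (a sketch, assuming osd.6, a new journal partition /dev/sda5, and hammer-era FileStore tooling with sysvinit service names):

```shell
# Move a FileStore OSD journal to a new partition, step by step.
service ceph stop osd.6                  # stop the daemon (systemd: systemctl stop ceph-osd@6)
ceph-osd -i 6 --flush-journal            # drain pending writes from the old journal
ln -sf /dev/sda5 /var/lib/ceph/osd/ceph-6/journal   # point the symlink at the new partition
ceph-osd -i 6 --mkjournal                # initialise the new journal (tags it with the OSD fsid)
service ceph start osd.6
```

The fsid warning seen on the first `--mkjournal` run is expected: the new partition did not yet carry this OSD's fsid, which is exactly what the command stamps onto it.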
Re: [ceph-users] Decommissioning cluster - rebalance questions
Hi si...@turka.nl! In your message you wrote:
> What I don't get is: when I perform 'ceph osd out <id>' the cluster is rebalancing, but when I perform 'ceph osd crush remove osd.<id>' it again starts to rebalance. Why does this happen?

I've recently hit the same 'strangeness'. Note that I'm not a Ceph developer nor a 'power' (or 'old') user.

It seems to me that there are two kinds of ''rebalance'': one for safety, one for optimization. If you take an OSD 'out', Ceph rebalances the data for safety. But you haven't touched the crushmap, so the data is still scattered according to the 'old' crushmap. So if you then remove that OSD (or touch the crushmap in any other way), a rebalance for 'optimization' starts.

In the same way, you can take an OSD 'slowly out' with:

ceph osd reweight <id> X (with 0 <= X <= 1)

but this still does not touch the crushmap. You can also 'slowly remove' an OSD with:

ceph osd crush reweight osd.<id> X (with 0 <= X <= <current weight>)

In this way you can 'deweight' the OSD in the crushmap down to 0, and then you can safely remove it. I hope I've not said too much blasphemy... ;-)
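The 'slow remove' described above can be scripted; a sketch, assuming osd.2 is the one being drained, steps of roughly 0.2, and a simple HEALTH_OK poll between steps (tune both to your cluster):

```shell
# Gradually drain osd.2 by lowering its CRUSH weight in small steps,
# letting recovery settle before each further step.
for w in 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0; do
    ceph osd crush reweight osd.2 "$w"
    # wait until the cluster is clean again before the next step
    until ceph health | grep -q HEALTH_OK; do
        sleep 60
    done
done
# With CRUSH weight 0 and the cluster clean, the OSD holds no data:
# ceph osd out 2 && ceph osd crush remove osd.2 && ceph auth del osd.2 && ceph osd rm 2
```

Because the crushmap is adjusted a little at a time, there is only one small "optimization" rebalance per step instead of one big one at removal time.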
[ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...
I've recently added a host to my Ceph cluster, using the Proxmox 'helpers' to add the OSD, e.g.:

pveceph createosd /dev/sdb -journal_dev /dev/sda5

and now I have:

root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-12
total 60
drwxr-xr-x   3 root root   199 nov 21 17:02 .
drwxr-xr-x   6 root root  4096 nov 21 23:08 ..
-rw-r--r--   1 root root   903 nov 21 17:02 activate.monmap
-rw-r--r--   1 root root     3 nov 21 17:02 active
-rw-r--r--   1 root root    37 nov 21 17:02 ceph_fsid
drwxr-xr-x 432 root root 12288 dic  1 18:21 current
-rw-r--r--   1 root root    37 nov 21 17:02 fsid
lrwxrwxrwx   1 root root     9 nov 21 17:02 journal -> /dev/sda5
-rw-------   1 root root    57 nov 21 17:02 keyring
-rw-r--r--   1 root root    21 nov 21 17:02 magic
-rw-r--r--   1 root root     6 nov 21 17:02 ready
-rw-r--r--   1 root root     4 nov 21 17:02 store_version
-rw-r--r--   1 root root    53 nov 21 17:02 superblock
-rw-r--r--   1 root root     0 nov 21 17:02 sysvinit
-rw-r--r--   1 root root     3 nov 21 17:02 whoami

and all works as expected; only, I expected to find as the journal not the device (/dev/sda5) but the uuid (/dev/disk/by-uuid/...).

But it seems that the journal partition does not have a UUID associated:

root@blackpanther:~# ls -la /dev/disk/by-uuid/ | grep sda5
root@blackpanther:~# blkid /dev/sda5
/dev/sda5: PARTUUID="a222c6bf-05"

I'm a bit ''puzzled'' because if I have to add a disk ''before'' sda, all device names will change with, I suppose, unexpected results.

Am I missing something? Thanks.
Re: [ceph-users] Degraded objects after 'ceph osd in $osd'
I reply to myself.

> I've added the new node, and slowly added 4 new OSDs, but in the meantime an OSD (not one of the new ones, not on the node to be removed) died. My situation now is:
>
> root@blackpanther:~# ceph osd df tree
> ID WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  TYPE NAME
> -1 21.41985        -  5586G 2511G  3074G     0    0 root default
> -2  5.45996        -  5586G 2371G  3214G 42.45 0.93     host capitanamerica
>  0  1.81999  1.0     1862G  739G  1122G 39.70 0.87         osd.0
>  1  1.81999  1.0     1862G  856G  1005G 46.00 1.00         osd.1
> 10  0.90999  1.0      931G  381G   549G 40.95 0.89         osd.10
> 11  0.90999  1.0      931G  394G   536G 42.35 0.92         osd.11
> -3  5.03996        -  5586G 2615G  2970G 46.82 1.02     host vedovanera
>  2  1.3      1.0     1862G  684G  1177G 36.78 0.80         osd.2
>  3  1.81999  1.0     1862G 1081G   780G 58.08 1.27         osd.3
>  4  0.90999  1.0      931G  412G   518G 44.34 0.97         osd.4
>  5  0.90999  1.0      931G  436G   494G 46.86 1.02         osd.5
> -4  5.45996        -   931G  583G   347G     0    0     host deadpool
>  6  1.81999  1.0     1862G  898G   963G 48.26 1.05         osd.6
>  7  1.81999  1.0     1862G  839G  1022G 45.07 0.98         osd.7
>  8  0.90999  0           0     0      0     0    0         osd.8
>  9  0.90999  1.0      931G  583G   347G 62.64 1.37         osd.9
> -5  5.45996        -  5586G 2511G  3074G 44.96 0.98     host blackpanther
> 12  1.81999  1.0     1862G  828G  1033G 44.51 0.97         osd.12
> 13  1.81999  1.0     1862G  753G  1108G 40.47 0.88         osd.13
> 14  0.90999  1.0      931G  382G   548G 41.11 0.90         osd.14
> 15  0.90999  1.0      931G  546G   384G 58.66 1.28         osd.15
>               TOTAL 21413G 9819G 11594G 45.85
> MIN/MAX VAR: 0/1.37 STDDEV: 7.37
>
> Perfectly healthy. But I've tried to, slowly, remove an OSD from 'vedovanera', so I've tried with:
>
> ceph osd crush reweight osd.2 <weight>
>
> As you can see, I've arrived at weight 1.4 (from 1.81999), but if I go lower than that I get:
[...]
> recovery 2/2556513 objects degraded (0.000%)

It seems that the trouble came from osd.8, which was out and down, but not removed from the crushmap (it still had weight 0.90999). After removing osd.8, a massive rebalance started. After that, I can now lower the weight of the OSDs on node 'vedovanera' and I have no more degraded objects.

I think I'm starting to understand how the CRUSH algorithm concretely works. ;-)
Re: [ceph-users] Degraded objects after 'ceph osd in $osd'
Hi Janne Johansson! In your message you wrote:
> It is a slight mistake in reporting it in the same way as an error, even if it looks to the cluster just as if it was in error and needs fixing.

I think I've hit a similar situation, and I also feel that something has to be 'fixed'. I'm seeking an explanation...

I'm adding a node (blackpanther, 4 OSDs, done) and removing a node (vedovanera[1], 4 OSDs, to be done). I've added the new node, and slowly added 4 new OSDs, but in the meantime an OSD (not one of the new ones, not on the node to be removed) died. My situation now is:

root@blackpanther:~# ceph osd df tree
ID WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  TYPE NAME
-1 21.41985        -  5586G 2511G  3074G     0    0 root default
-2  5.45996        -  5586G 2371G  3214G 42.45 0.93     host capitanamerica
 0  1.81999  1.0     1862G  739G  1122G 39.70 0.87         osd.0
 1  1.81999  1.0     1862G  856G  1005G 46.00 1.00         osd.1
10  0.90999  1.0      931G  381G   549G 40.95 0.89         osd.10
11  0.90999  1.0      931G  394G   536G 42.35 0.92         osd.11
-3  5.03996        -  5586G 2615G  2970G 46.82 1.02     host vedovanera
 2  1.3      1.0     1862G  684G  1177G 36.78 0.80         osd.2
 3  1.81999  1.0     1862G 1081G   780G 58.08 1.27         osd.3
 4  0.90999  1.0      931G  412G   518G 44.34 0.97         osd.4
 5  0.90999  1.0      931G  436G   494G 46.86 1.02         osd.5
-4  5.45996        -   931G  583G   347G     0    0     host deadpool
 6  1.81999  1.0     1862G  898G   963G 48.26 1.05         osd.6
 7  1.81999  1.0     1862G  839G  1022G 45.07 0.98         osd.7
 8  0.90999  0           0     0      0     0    0         osd.8
 9  0.90999  1.0      931G  583G   347G 62.64 1.37         osd.9
-5  5.45996        -  5586G 2511G  3074G 44.96 0.98     host blackpanther
12  1.81999  1.0     1862G  828G  1033G 44.51 0.97         osd.12
13  1.81999  1.0     1862G  753G  1108G 40.47 0.88         osd.13
14  0.90999  1.0      931G  382G   548G 41.11 0.90         osd.14
15  0.90999  1.0      931G  546G   384G 58.66 1.28         osd.15
              TOTAL 21413G 9819G 11594G 45.85
MIN/MAX VAR: 0/1.37 STDDEV: 7.37

Perfectly healthy. But I've tried to, slowly, remove an OSD from 'vedovanera', so I've tried with:

ceph osd crush reweight osd.2 <weight>

As you can see, I've arrived at weight 1.4 (from 1.81999), but if I go lower than that I get:

cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
        6 pgs backfill
        1 pgs backfilling
        7 pgs stuck unclean
        recovery 2/2556513 objects degraded (0.000%)
        recovery 7721/2556513 objects misplaced (0.302%)
 monmap e6: 6 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
        election epoch 2780, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
 osdmap e9302: 16 osds: 15 up, 15 in; 7 remapped pgs
  pgmap v54971897: 768 pgs, 3 pools, 3300 GB data, 830 kobjects
        9911 GB used, 11502 GB / 21413 GB avail
        2/2556513 objects degraded (0.000%)
        7721/2556513 objects misplaced (0.302%)
             761 active+clean
               6 active+remapped+wait_backfill
               1 active+remapped+backfilling
  client io 9725 kB/s rd, 772 kB/s wr, 153 op/s

i.e., 2 objects 'degraded'. This really puzzles me. Why?! Thanks.

[1] Some Marvel Comics heroes got translated in Italian, so 'vedovanera' is 'Black Widow' and 'capitanamerica' is clearly 'Captain America'.
Re: [ceph-users] Disable intra-host replication?
Hi Janne Johansson! In your message you wrote:
> The default crush rules with replication=3 would only place PGs on separate hosts, so in that case it would go into degraded mode if a node goes away, and not place replicas on different disks on the remaining hosts.

'hosts' means 'hosts with OSDs', right? Because my cluster has 5 hosts, but 2 of them are MON-only. Thanks.
[ceph-users] Disable intra-host replication?
Previous (partial) node failures and my current experiments with adding a node have led me to the fact that, when rebalancing is needed, Ceph also rebalances intra-node: e.g., if an OSD of a node dies, data is rebalanced across all the OSDs, even if I have pool replication (size) 3 and 3 nodes.

This, indeed, makes perfect sense: overall data scattering gives better performance and safety.

But... is there some way to tell CRUSH 'don't rebalance within the same node, go into degraded mode instead'? Thanks.
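For context, the placement behaviour described above follows from the chooseleaf step of the default replicated rule; a decompiled-crushmap fragment as a sketch (names as in a stock hammer-era map):

```
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    # Each replica lands on a different *host* bucket. If a whole host
    # dies, with 3 hosts and size 3 the PGs stay degraded (nowhere valid
    # to re-replicate). But if a single OSD dies, its PGs are remapped
    # onto the surviving OSDs of the *same* host - the intra-host
    # rebalance observed here - since host-level separation still holds.
    step chooseleaf firstn 0 type host
    step emit
}
```

So the rule alone cannot forbid intra-host re-replication; that is governed by whether the dead OSD gets marked "out" at all.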
Re: [ceph-users] New OSD with weight 0, rebalance still happen...
Hi Paweł Sadowski! In your message you wrote:
> This is most probably due to the big difference in weights between your hosts (the new one has a 20x lower weight than the old ones), which in combination with the straw algorithm is a 'known' issue.

OK. I've reweighted that disk back to '1' and the status went back to HEALTH_OK.

> You could try to increase choose_total_tries in your crush map from 50 to some bigger number. The best IMO would be to use straw2 (which will cause some rebalance) and then use 'ceph osd crush reweight' (instead of 'ceph osd reweight') with small steps to slowly rebalance data onto the new OSDs.

For now I'm putting in the new disks with 'ceph osd reweight'; probably when I'm at 50% of the new disks I'll start to use 'ceph osd crush reweight' against the old ones. Thanks.
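Raising choose_total_tries as suggested means round-tripping the crushmap through crushtool; a sketch (the file names are arbitrary, and the sed line assumes the tunable is currently at its default of 50):

```shell
# Decompile the live crushmap, raise choose_total_tries, recompile, inject.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# bump the tunable (or edit crushmap.txt by hand)
sed -i 's/tunable choose_total_tries 50/tunable choose_total_tries 100/' crushmap.txt
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

Changing the map this way can itself trigger some data movement, so do it in a quiet window.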
Re: [ceph-users] New OSD with weight 0, rebalance still happen...
Mandi! Paweł Sadowski
  In chel di` si favelave...

> Exactly, your 'new' OSDs have weight 1.81999 (osd.12, osd.13) and 0.90999
> (osd.14, osd.15). As Jarek pointed out you should add them using
> 'osd crush initial weight = 0'
> and then use
> 'ceph osd crush reweight osd.x 0.05'
> to slowly increase the weight on them.
> From your osd tree it looks like you used 'ceph osd reweight'.

Reading the ceph docs led me to think that 'ceph osd reweight' and 'ceph osd crush reweight' were roughly the same: the first effectively 'temporary' and expressed as a fraction (0-1), the second 'permanent' and normally expressed as disk terabytes.

You are saying that instead the first modifies only the disk occupation, while only the latter alters the crush map. Right? Is this true only for the 'straw' algorithm, or is it general?

Thanks.
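A rough way to think about how the two interact (my simplified intuition, not an exact ceph formula): CRUSH first places data proportionally to the CRUSH weight, then the 0-1 reweight rejects roughly that fraction of placements, pushing them elsewhere. So the data an OSD ends up holding scales very roughly with the product:

```shell
# Simplified intuition only, not an exact ceph formula:
# effective placement weight ~ crush_weight * reweight.
# Example values taken from osd.2 in the tree below.
crush_weight=1.81999
reweight=0.95000
effective=$(awk "BEGIN{printf \"%.5f\", $crush_weight * $reweight}")
echo "effective placement weight ~ $effective"
```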
Re: [ceph-users] New OSD with weight 0, rebalance still happen...
Mandi! Paweł Sadowski
  In chel di` si favelave...

> From your osd tree it looks like you used 'ceph osd reweight'.

Yes, and I supposed I was doing the right thing! Now I've tried to lower the to-be-dismissed OSD, using:

	ceph osd reweight 2 0.95

leading to an osd tree like:

root@blackpanther:~# ceph osd tree
ID WEIGHT   TYPE NAME                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 21.83984 root default
-2  5.45996     host capitanamerica
 0  1.81999         osd.0                 up      1.0              1.0
 1  1.81999         osd.1                 up      1.0              1.0
10  0.90999         osd.10                up      1.0              1.0
11  0.90999         osd.11                up      1.0              1.0
-3  5.45996     host vedovanera
 2  1.81999         osd.2                 up  0.95000              1.0
 3  1.81999         osd.3                 up      1.0              1.0
 4  0.90999         osd.4                 up      1.0              1.0
 5  0.90999         osd.5                 up      1.0              1.0
-4  5.45996     host deadpool
 6  1.81999         osd.6                 up      1.0              1.0
 7  1.81999         osd.7                 up      1.0              1.0
 8  0.90999         osd.8                 up      1.0              1.0
 9  0.90999         osd.9                 up      1.0              1.0
-5  5.45996     host blackpanther
12  1.81999         osd.12                up  0.04999              1.0
13  1.81999         osd.13                up  0.04999              1.0
14  0.90999         osd.14                up  0.04999              1.0
15  0.90999         osd.15                up  0.04999              1.0

and, after rebalancing, to:

root@blackpanther:~# ceph -s
    cluster 8794c124-c2ec-4e81-8631-742992159bd6
     health HEALTH_WARN
            6 pgs stuck unclean
            recovery 4/2550363 objects degraded (0.000%)
            recovery 11282/2550363 objects misplaced (0.442%)
     monmap e6: 6 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
            election epoch 2750, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
     osdmap e7300: 16 osds: 16 up, 16 in; 6 remapped pgs
      pgmap v54737590: 768 pgs, 3 pools, 3299 GB data, 830 kobjects
            9870 GB used, 12474 GB / 22344 GB avail
            4/2550363 objects degraded (0.000%)
            11282/2550363 objects misplaced (0.442%)
                 761 active+clean
                   6 active+remapped
                   1 active+clean+scrubbing
  client io 13476 B/s rd, 654 kB/s wr, 95 op/s

Why are there pgs in state 'stuck unclean'?
Re: [ceph-users] New OSD with weight 0, rebalance still happen...
Mandi! Zongyou Yao
  In chel di` si favelave...

> The reason for the rebalance is you are using the straw algorithm. If you
> switch to straw2, no data will be moved.

I'm still on hammer, so:

	http://docs.ceph.com/docs/hammer/rados/operations/crush-map/

it seems there's no 'straw2'...
Re: [ceph-users] New OSD with weight 0, rebalance still happen...
Mandi! Paweł Sadowski
  In chel di` si favelave...

> We did similar changes many times and it always behaved as expected.

Ok. Good.

> Can you show your crushmap/ceph osd tree?

Sure!

root@blackpanther:~# ceph osd tree
ID WEIGHT   TYPE NAME                UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 21.83984 root default
-2  5.45996     host capitanamerica
 0  1.81999         osd.0                 up      1.0              1.0
 1  1.81999         osd.1                 up      1.0              1.0
10  0.90999         osd.10                up      1.0              1.0
11  0.90999         osd.11                up      1.0              1.0
-3  5.45996     host vedovanera
 2  1.81999         osd.2                 up      1.0              1.0
 3  1.81999         osd.3                 up      1.0              1.0
 4  0.90999         osd.4                 up      1.0              1.0
 5  0.90999         osd.5                 up      1.0              1.0
-4  5.45996     host deadpool
 6  1.81999         osd.6                 up      1.0              1.0
 7  1.81999         osd.7                 up      1.0              1.0
 8  0.90999         osd.8                 up      1.0              1.0
 9  0.90999         osd.9                 up      1.0              1.0
-5  5.45996     host blackpanther
12  1.81999         osd.12                up  0.04999              1.0
13  1.81999         osd.13                up  0.04999              1.0
14  0.90999         osd.14                up  0.04999              1.0
15  0.90999         osd.15                up  0.04999              1.0

OSDs 12-15 are the new OSDs; after creating them with 'noin' I reweighted them to '0.05' (to make a test). Crush map attached.

Thanks.
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host capitanamerica {
	id -2		# do not change unnecessarily
	# weight 5.460
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 1.820
	item osd.1 weight 1.820
	item osd.10 weight 0.910
	item osd.11 weight 0.910
}
host vedovanera {
	id -3		# do not change unnecessarily
	# weight 5.460
	alg straw
	hash 0	# rjenkins1
	item osd.2 weight 1.820
	item osd.3 weight 1.820
	item osd.4 weight 0.910
	item osd.5 weight 0.910
}
host deadpool {
	id -4		# do not change unnecessarily
	# weight 5.460
	alg straw
	hash 0	# rjenkins1
	item osd.6 weight 1.820
	item osd.7 weight 1.820
	item osd.8 weight 0.910
	item osd.9 weight 0.910
}
host blackpanther {
	id -5		# do not change unnecessarily
	# weight 5.460
	alg straw
	hash 0	# rjenkins1
	item osd.12 weight 1.820
	item osd.13 weight 1.820
	item osd.14 weight 0.910
	item osd.15 weight 0.910
}
root default {
	id -1		# do not change unnecessarily
	# weight 21.840
	alg straw
	hash 0	# rjenkins1
	item capitanamerica weight 5.460
	item vedovanera weight 5.460
	item deadpool weight 5.460
	item blackpanther weight 5.460
}

# rules
rule replicated_ruleset {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
# end crush map
[ceph-users] New OSD with weight 0, rebalance still happen...
Ceph still surprises me: when I'm sure I've fully understood it, something 'strange' (to my knowledge) happens.

I need to move a server out of my ceph hammer cluster (3 nodes, 4 OSDs per node), and for some reasons I cannot simply move the disks. So I've added a new node, and yesterday I set up the new 4 OSDs.

My plan was to add the 4 OSDs with weight 0, and then slowly lower the weight of the old OSDs while increasing the weight of the new ones. So I first did:

	ceph osd set noin

and then added the OSDs, and (as expected) the new OSDs started with weight 0.

But despite the weight being zero, a rebalance happened anyway, and the percentage of data rebalanced was 'weighted' to the size of the new disks (e.g., I had about 18 TB of space, I added 2 TB of disks, and roughly 10% of the data started to rebalance). Why?

Thanks.
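PS: if I understand the replies correctly, 'noin' only controls the in/out state, not the CRUSH weight, which is what triggers data movement; the behaviour I wanted can apparently be forced at creation time. A hedged ceph.conf sketch:

```
# ceph.conf: make new OSDs join the CRUSH map with weight 0,
# so no data moves until you 'ceph osd crush reweight' them up
[osd]
osd crush initial weight = 0
```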
Re: [ceph-users] Hammer and a (little) disk/partition shrink...
Mandi! David Turner
  In chel di` si favelave...

> Replace the raid controller in the chassis with an HBA before moving into
> the new hardware? ;)

Eh... any hint on a controller I can buy?

> If you do move to the HP controller, make sure you're monitoring the health
> of the cache battery in the controller.

I've no battery in the controller... ;-)
[ceph-users] Hammer and a (little) disk/partition shrink...
Probably a complex question with a simple answer: NO. ;-)

I need to move the disks of a ceph node (still on hammer) from one piece of hardware to another. The source hardware has a plain SATA/SAS controller; the 'new' server has a RAID controller with no JBOD mode (the infamous HP P410i), so I need to create some 'RAID 0 with a single disk' fake arrays.

These controllers seem to ''eat'' some space at the end of the disk: doing some tests, the disk does not get corrupted by the 'raid0-ification', but it loses some bytes at the end, and Linux then complains that the (last) partition is corrupted.

Hammer uses filestore, so practically I need to shrink an XFS filesystem, which XFS does not support. Clearly I can do an 'xfsdump' of the disks to some scratch space and rebuild the filesystem, but...

Do I have some escape path?

Thanks.
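PS: for the archives, the dump-and-rebuild route I mentioned would look roughly like this; device names, OSD id and scratch path are hypothetical, and the OSD must be stopped first (xfsdump works on a mounted filesystem):

```
# stop the OSD and dump its (still mounted) filestore filesystem
service ceph stop osd.2
xfsdump -l 0 -f /mnt/scratch/osd-2.dump /var/lib/ceph/osd/ceph-2

# swap in the slightly smaller 'RAID-0' disk, rebuild and restore
umount /var/lib/ceph/osd/ceph-2
mkfs.xfs -f /dev/sdc1
mount /dev/sdc1 /var/lib/ceph/osd/ceph-2
xfsrestore -f /mnt/scratch/osd-2.dump /var/lib/ceph/osd/ceph-2
```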
Re: [ceph-users] RAID question for Ceph
Mandi! Troy Ablan
  In chel di` si favelave...

> Even worse, the P410i doesn't appear to support a pass-thru (JBOD/HBA)
> mode, so your only sane option for using this card is to create RAID-0s.

I confirm. Even worse, the P410i can define a maximum of 2 'arrays' (even fake arrays composed of one disk in RAID-0) without the (expensive!) cache module.

Digging around, I've found references to alternative firmwares that enable JBOD/HBA mode, but I've never walked that way...
Re: [ceph-users] Snapshot removed, cluster thrashed...
Mandi! David Turner
  In chel di` si favelave...

> Snapshots are not a free action.

Thanks to all for the info!
Re: [ceph-users] Snapshot removed, cluster thrashed...
Mandi! Lindsay Mathieson
  In chel di` si favelave...

> Have you tried restoring a snapshot? I found it unusably slow - as in hours.

No, not yet; I've never restored a snapshot...
[ceph-users] Snapshot removed, cluster thrashed...
data, 7016 GB used, 9742 GB / 16758 GB avail; 1918 B/s rd, 925 kB/s wr, 34 op/s
2017-06-23 18:19:44.212196 mon.0 10.27.251.7:6789/0 1408936 : cluster [INF] pgmap v17394404: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 477 B/s rd, 63935 B/s wr, 17 op/s
2017-06-23 18:19:46.322367 mon.0 10.27.251.7:6789/0 1408937 : cluster [INF] pgmap v17394405: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 319 B/s rd, 71906 B/s wr, 17 op/s
2017-06-23 18:19:47.385412 mon.0 10.27.251.7:6789/0 1408938 : cluster [INF] pgmap v17394406: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 6434 B/s rd, 379 kB/s wr, 27 op/s
2017-06-23 18:19:49.458938 mon.0 10.27.251.7:6789/0 1408939 : cluster [INF] pgmap v17394407: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 6498 B/s rd, 338 kB/s wr, 15 op/s
2017-06-23 18:19:50.568192 mon.0 10.27.251.7:6789/0 1408940 : cluster [INF] pgmap v17394408: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 2589 B/s rd, 12625 B/s wr, 3 op/s
2017-06-23 18:19:51.660889 mon.0 10.27.251.7:6789/0 1408941 : cluster [INF] pgmap v17394409: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 28382 B/s rd, 276 kB/s wr, 52 op/s
2017-06-23 18:19:52.735827 mon.0 10.27.251.7:6789/0 1408942 : cluster [INF] pgmap v17394410: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 47066 B/s rd, 2849 kB/s wr, 166 op/s
2017-06-23 18:19:53.802365 mon.0 10.27.251.7:6789/0 1408943 : cluster [INF] pgmap v17394411: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 51028 B/s rd, 4144 kB/s wr, 174 op/s
2017-06-23 18:19:56.814558 mon.0 10.27.251.7:6789/0 1408945 : cluster [INF] pgmap v17394412: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 136 kB/s rd, 1159 kB/s wr, 52 op/s
2017-06-23 18:19:58.034610 mon.0 10.27.251.7:6789/0 1408946 : cluster [INF] pgmap v17394413: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 112 kB/s rd, 989 kB/s wr, 116 op/s
2017-06-23 18:19:59.112915 mon.0 10.27.251.7:6789/0 1408947 : cluster [INF] pgmap v17394414: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 43695 B/s rd, 1380 kB/s wr, 146 op/s
2017-06-23 18:20:00.223605 mon.0 10.27.251.7:6789/0 1408948 : cluster [INF] HEALTH_OK

Three questions:

a) why does a 'snapshot remove' action put the system under load?

b) similarly to the recovery options:

	osd scrub during recovery = false
	osd recovery op priority = 1
	osd recovery max active = 5
	osd max backfills = 1

   are there options to reduce the impact of a snapshot removal?

c) are snapshots handled differently from other IO ops, or should I expect a similar result doing similar things (e.g., a restore from a backup)?

Thanks.
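PS, regarding (b): if I read the docs correctly, at least on hammer there is a snap-trim counterpart to the recovery knobs; a hedged sketch, the value is just an example:

```
# ceph.conf [osd]: throttle snapshot trimming the way the options
# above throttle recovery (seconds of sleep between trim operations)
osd snap trim sleep = 0.1
```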
Re: [ceph-users] Network redundancy...
> The switches you're using, can they stack? If so you could spread the LACP
> across the two switches.

And:

> Just use balance-alb, this will do the trick with no stacked switches.

Thanks for the answers, I'll do some tests! ;-)
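For the archives, the balance-alb suggestion translates to something like this on a Debian/Proxmox node (interface names and address are made up):

```
# /etc/network/interfaces: adaptive-load-balancing bond across two
# unstacked switches (eth0 -> switch A, eth1 -> switch B)
auto bond0
iface bond0 inet static
    address 10.27.251.7
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode balance-alb
    bond-miimon 100
```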
[ceph-users] Network redundancy...
I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a single switch, using 2x1 Gbit/s LACP links.

Supposing I had two identical switches, is there some way to set up a ''redundant'' configuration? For example, something similar to 'iSCSI multipath'?

I'm reading switch manuals and the ceph documentation, but with no luck.

Thanks.
Re: [ceph-users] Limit bandwidth on RadosGW?
Mandi! Marc Roos
  In chel di` si favelave...

> Just a thought, what about marking connections with iptables and using
> that mark with tc?

Surely, but many things have to be taken into account:

a) doing traffic control means disabling ALL the NIC hardware optimizations (queues, offload checksumming, ...), and I don't know the impact of that on ceph.

b) simple control (e.g., clamping traffic on an interface) adds little overhead, but if complex setups are needed (multiqueue, traffic shaping by IP/port/...) I think more overhead gets added.

c) again, simple control can be done on ingress easily, but for complex setups the ingress traffic must be redirected to another interface (typically an IFB interface) and proper egress shaping done there. Also, on ifbX interfaces there's no netfilter.

I'm using this stuff on firewalls, where performance on modern/decent hardware is not a trouble at all. So, no, I have no benchmarks at all. ;)
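For reference, the egress/IFB setup from point (c) looks roughly like this (eth0 and the rates are placeholders; needs root and the ifb module):

```
# egress: plain HTB shaping on the real interface
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 500mbit ceil 900mbit

# ingress: redirect into an IFB device and shape it there as egress
modprobe ifb numifbs=1
ip link set dev ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 500mbit
```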
Re: [ceph-users] Power Failure
Mandi! Santu Roy
  In chel di` si favelave...

> I am very new to Ceph, studying for a few days for a deployment of a Ceph
> cluster. I am going to deploy ceph in a small data center where power
> failure is a big problem. We have a single power supply, a single UPS and a
> stand-by generator. So what happens if all nodes go down due to power
> failure? Will it create any problem to restart the services when power is
> restored? Looking for your suggestion...

First message on the list.

I manage a little Ceph cluster (3 nodes, 6 OSDs), used exclusively for Proxmox (2 computing nodes, which are also MONs).

I've experienced some pain tearing down the whole cluster (for a general maintenance task I had to move the servers from one place to another...): after rebooting the storage nodes, clock skew prevented Ceph from working correctly.

It was partially my mistake (ahem, I had pointed the storage nodes at an internal NTP server that was... ahem... a VM in the same cluster! ;-), but I had some panic because, even after syncing the time and after the monitors stopped complaining about clock skew, I was forced to reboot/restart all the mons/OSDs one by one to get Ceph working properly.

So, in my experience, don't tear down a Ceph cluster without at least an NTP server available. ;-)

Also, I've configured the UPSes so that the computing nodes start shutting down before the storage nodes, and the storage nodes power on as soon as possible while the computing nodes are delayed as much as possible. But I've never done a power off / power on test again...