[ceph-users] FileStore OSD, journal direct symlinked, permission troubles.

2019-08-29 Thread Marco Gaiarin


I've just finished a double upgrade on my ceph (PVE-based) from hammer
to jewel and from jewel to luminous.

All went well, apart from the fact that the OSDs do not restart automatically,
because of permission troubles on the journal:

 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data 
/var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 
7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to open 
journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 
7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 
7fa505a43e00 -1 #033[0;31m ** ERROR: osd init failed: (13) Permission 
denied#033[0m


A little fast rewind: when i set up the cluster i used some 'old'
servers, with a couple of SSD disks for the OS and for the journals.
Because the servers were old, i was forced to partition the boot disk in
DOS mode, not GPT mode.

While creating the OSD, i've received some warnings:

WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. 
Symlinking directly.


Looking at the cluster now, it seems to me that the OSD init scripts try to
identify the journal based on GPT partition labels/info, and clearly fail.


Note that if i do, on the servers that hold OSDs:

 for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown ceph: $l; done

the OSDs start flawlessly.


Is there something i can do? Thanks.
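
For reference, a persistent variant of the chown workaround above could be a
udev rule that sets the journal ownership at boot; a minimal sketch, assuming
the journal is /dev/sda8 (device name, rules-file name and mode are
placeholders to adapt to the actual layout):

 # create a udev rule that chowns the raw journal partition to ceph:ceph
 echo 'KERNEL=="sda8", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"' \
 > /etc/udev/rules.d/90-ceph-journal.rules
 udevadm control --reload
 udevadm trigger --sysname-match=sda8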

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Post-mortem analysis?

2019-05-13 Thread Marco Gaiarin
Mandi! Martin Verges
  In chel di` si favelave...

> first of all, hyperconverged setups with publicly accessible VMs could be
> affected by DDoS attacks or other harmful issues that cause cascading errors
> in your infrastructure.

No, private cluster.


> Are you sure your network worked correctly at the time?

Not completely sure, but i've redirected the switches' syslogs to a
server, and i've not caught any major error/failure signs.



[ceph-users] Post-mortem analysis?

2019-05-13 Thread Marco Gaiarin


[It is not really a 'mortem', but...]


Saturday afternoon, my 3-node proxmox ceph cluster had a big
'slowdown', which started at 12:35:24 with an OOM condition on one of
the 3 storage nodes, followed by an OOM on another node at
12:43:31.

After that, all sorts of bad things happened: stuck requests, SCSI timeouts on
VMs, MONs flip-flopping on RBD clients.

I run a 'ceph -s' every hour, so at 14:17:01 i had, on two nodes:

cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
26 requests are blocked > 32 sec
 monmap e9: 5 mons at 
{2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0,capitanmarvel=10.27.251.8:6789/0}
election epoch 3930, quorum 0,1,2,3,4 
blackpanther,capitanmarvel,4,2,3
 osdmap e15713: 12 osds: 12 up, 12 in
  pgmap v67358590: 768 pgs, 3 pools,  GB data, 560 kobjects
6639 GB used, 11050 GB / 17689 GB avail
 768 active+clean
  client io 266 kB/s wr, 25 op/s

and on the third:
cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
5 mons down, quorum
 monmap e9: 5 mons at 
{2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0,capitanmarvel=10.27.251.8:6789/0}
election epoch 3931, quorum
 osdmap e15713: 12 osds: 12 up, 12 in
  pgmap v67358598: 768 pgs, 3 pools,  GB data, 560 kobjects
6639 GB used, 11050 GB / 17689 GB avail
 767 active+clean
   1 active+clean+scrubbing
  client io 617 kB/s wr, 70 op/s


At that hour, the site served by the cluster had just closed (eg, no
users). The only task running, looking at the logs, seems to have been a backup
(bacula), but it was just saving the catalog, eg a database workload on a
container, and it ended at 14:27.


All that continued, more or less, till sunday morning, then everything went
back to normal.
It seems there were no hardware failures on the nodes.

Backup tasks (all VM/LXC backups) on saturday night completed with no
errors.


Can someone provide some hints on how to 'correlate' the various logs, and
so (try to) understand what happened?


Thanks.



Re: [ceph-users] HW failure cause client IO drops

2019-04-16 Thread Marco Gaiarin
Mandi! M Ranga Swami Reddy
  In chel di` si favelave...

> Hello - Recently we had an issue with a storage node's battery failure, which
> caused ceph client IO to drop to '0' bytes. This means the ceph cluster couldn't
> perform IO operations on the cluster till the node was taken out. This is not
> expected from Ceph: when some HW fails, the respective OSDs should be marked
> out/down and IO should go on as before..
> Please let me know if anyone has seen similar behavior and whether this issue is
> resolved?

Does 'battery' mean 'CMOS battery'?


OSDs and MONs need accurate clock sync between them. So, if a node
reboots with a clock skew of more than (if i remember well) 5 seconds, the
OSDs do not start.

Provide a stable NTP server for all your OSDs and MONs, and restart the
OSDs after the clocks are in sync.
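
A quick way to check and recover (a sketch; adjust the daemon names to the
installed init system, e.g. 'service ceph restart osd.2' on sysvinit nodes):

 ceph health detail | grep -i 'clock skew'   # which mons complain, and by how much
 ntpq -p                                     # verify the node is really syncing to a peer
 systemctl restart ceph-osd@2                # restart the affected OSD(s) once clocks agree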



Re: [ceph-users] OSD after OS reinstallation.

2019-02-25 Thread Marco Gaiarin
Mandi! Alfredo Deza
  In chel di` si favelave...

> There are ways to create partitions without a PARTUUID. We have an
> example in our docs with parted that will produce what is needed:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#partitioning
> But then again... I would strongly suggest avoiding all of this and
> just using the new way of doing OSDs with LVM

Ahem... i'm still on hammer... ;-)))



Re: [ceph-users] Prevent rebalancing in the same host?

2019-02-22 Thread Marco Gaiarin
Mandi! Christian Balzer
  In chel di` si favelave...

> You pretty much answered your question, as in a limit of 'osd' would do
> the trick, though not just for intra-host.

Oh, the documentation does not list the possible values... good to know.


> But of course everybody will (rightly) tell you that you need enough
> capacity to at the very least deal with a single OSD loss.

Super clear. Thanks.



Re: [ceph-users] OSD after OS reinstallation.

2019-02-22 Thread Marco Gaiarin
Mandi! Alfredo Deza
  In chel di` si favelave...

> The problem is that if there is no PARTUUID ceph-volume can't ensure
> what device is the one actually pointing to data/journal. Being 'GPT'
> alone will not be enough here :(

Ok. Is there some way to 'force' a PARTUUID onto a GPT or non-GPT
partition, even, if needed, by destroying it?


I've also tried to create a GPT partition inside a DOS partition (eg, in
/dev/sda5), and it seems that the GPT partition gets created correctly, but
the (sub)partition still has no PARTUUID...
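
For what it's worth, a destructive sketch of forcing a GPT label (and thus a
PARTUUID) onto a whole disk; /dev/sdX and the sizes are placeholders, and this
wipes all data on the disk:

 sgdisk --zap-all /dev/sdX
 parted -s /dev/sdX mklabel gpt
 parted -s /dev/sdX mkpart ceph-journal 1MiB 5GiB
 blkid /dev/sdX1     # should now report a PARTUUID for the new partition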



Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Marco Gaiarin
Mandi! Alfredo Deza
  In chel di` si favelave...

> > Ahem, how can i add a GPT label to a non-GPT partition (even losing
> > data)?
> If you are coming from ceph-disk (or something else custom-made) and
> don't care about losing data, why not fully migrate to the
> new OSDs? 
> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#rados-replacing-an-osd

I'm using proxmox, so the 'pveceph' helper; but indeed my trouble is with the
journal labels, not the main filesystem labels...



Re: [ceph-users] OSD after OS reinstallation.

2019-02-20 Thread Marco Gaiarin
Mandi! Alfredo Deza
  In chel di` si favelave...

> I think this is what happens with a non-gpt partition. GPT labels will
> use a PARTUUID to identify the partition, and I just confirmed that
> ceph-volume will enforce looking for PARTUUID if the JSON
> identified a partition (vs. an LV).
> From what I briefly researched it is not possible to add a GPT label
> on a non-gpt partition without losing data.

Ahem, how can i add a GPT label to a non-GPT partition (even losing
data)?

This seems to be the culprit behind my 'Proxmox 4.4, Ceph hammer, OSD cache
link...' thread...


Thanks.



[ceph-users] Prevent rebalancing in the same host?

2019-02-19 Thread Marco Gaiarin


Little cluster, 3 nodes, 4 OSDs per node.

An OSD died, and ceph started to rebalance data between the OSDs of the
same node (not completing it, and leading to a 'near full osd' warning).

As there exists:
mon osd down out subtree limit = host

to prevent host-level rebalancing, is there some way to prevent intra-host OSD
rebalancing?
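
One possible approach (a sketch, to be verified against the running release) is
to stop the cluster from automatically marking a failed OSD 'out', so no
intra-host backfill starts until an operator decides:

 # in ceph.conf on the monitors (values are examples):
 [mon]
 mon osd down out subtree limit = host
 mon osd down out interval = 0    # 0 = never mark a down OSD out automatically

 # or, as a temporary cluster-wide flag:
 ceph osd set noout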


Thanks.



Re: [ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...

2019-02-12 Thread Marco Gaiarin
Mandi! Michel Raabe
  In chel di` si favelave...

> Have you changed/add the journal_uuid from the old partition?
> https://ceph.com/geen-categorie/ceph-recover-osds-after-ssd-journal-failure/

 root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-15
 totale 56
 drwxr-xr-x   3 root root  199 nov 21 23:08 .
 drwxr-xr-x   6 root root 4096 nov 21 23:08 ..
 -rw-r--r--   1 root root  903 nov 21 23:08 activate.monmap
 -rw-r--r--   1 root root3 nov 21 23:08 active
 -rw-r--r--   1 root root   37 nov 21 23:08 ceph_fsid
 drwxr-xr-x 292 root root 8192 dic  9 15:02 current
 -rw-r--r--   1 root root   37 nov 21 23:08 fsid
 lrwxrwxrwx   1 root root9 nov 21 23:08 journal -> /dev/sda8
 -rw---   1 root root   57 nov 21 23:08 keyring
 -rw-r--r--   1 root root   21 nov 21 23:08 magic
 -rw-r--r--   1 root root6 nov 21 23:08 ready
 -rw-r--r--   1 root root4 nov 21 23:08 store_version
 -rw-r--r--   1 root root   53 nov 21 23:08 superblock
 -rw-r--r--   1 root root0 nov 21 23:08 sysvinit
 -rw-r--r--   1 root root3 nov 21 23:08 whoami

Ahem, i've no 'journal_uuid' file on OSD...



Re: [ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...

2019-02-06 Thread Marco Gaiarin


I'm coming back to this.

> I've recently added a host to my ceph cluster, using proxmox 'helpers'
> to add OSD, eg:
> 
>   pveceph createosd /dev/sdb -journal_dev /dev/sda5
> 
> and now i've:
> 
>  root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-12
>  totale 60
>  drwxr-xr-x   3 root root   199 nov 21 17:02 .
>  drwxr-xr-x   6 root root  4096 nov 21 23:08 ..
>  -rw-r--r--   1 root root   903 nov 21 17:02 activate.monmap
>  -rw-r--r--   1 root root 3 nov 21 17:02 active
>  -rw-r--r--   1 root root37 nov 21 17:02 ceph_fsid
>  drwxr-xr-x 432 root root 12288 dic  1 18:21 current
>  -rw-r--r--   1 root root37 nov 21 17:02 fsid
>  lrwxrwxrwx   1 root root 9 nov 21 17:02 journal -> /dev/sda5
>  -rw---   1 root root57 nov 21 17:02 keyring
>  -rw-r--r--   1 root root21 nov 21 17:02 magic
>  -rw-r--r--   1 root root 6 nov 21 17:02 ready
>  -rw-r--r--   1 root root 4 nov 21 17:02 store_version
>  -rw-r--r--   1 root root53 nov 21 17:02 superblock
>  -rw-r--r--   1 root root 0 nov 21 17:02 sysvinit
>  -rw-r--r--   1 root root 3 nov 21 17:02 whoami
> 
> and all works as expected, only i supposed to find as the journal not the
> device (/dev/sda5) but the uuid (/dev/disk/by-uuid/).
> 
> But seems that the cache partition does not have an UUID associated:
> 
>   root@blackpanther:~# ls -la /dev/disk/by-uuid/ | grep sda5
>   root@blackpanther:~# blkid /dev/sda5
>   /dev/sda5: PARTUUID="a222c6bf-05"
> 
> I'm a bit 'puzzled' because if i have to add a disk 'before' sda, all the
> device names will change with, i suppose, unexpected results.
> 
> I'm missing something? Thanks.

I was forced to change some journals, using some (MBR) partitions; i've
stopped the osd, flushed the old journal, changed the symlink and then done a
'journal format':

 root@deadpool:/var/lib/ceph/osd/ceph-6# ls -la
 totale 64
 drwxr-xr-x   3 root root   199 feb  6 17:45 .
 drwxr-xr-x   6 root root  4096 dic 14  2016 ..
 -rw-r--r--   1 root root   751 dic 14  2016 activate.monmap
 -rw-r--r--   1 root root 3 dic 14  2016 active
 -rw-r--r--   1 root root37 dic 14  2016 ceph_fsid
 drwxr-xr-x 378 root root 20480 feb  6 17:12 current
 -rw-r--r--   1 root root37 dic 14  2016 fsid
 lrwxrwxrwx   1 root root 9 feb  6 17:45 journal -> /dev/sda5
 -rw---   1 root root56 dic 14  2016 keyring
 -rw-r--r--   1 root root21 dic 14  2016 magic
 -rw-r--r--   1 root root 6 dic 14  2016 ready
 -rw-r--r--   1 root root 4 dic 14  2016 store_version
 -rw-r--r--   1 root root53 dic 14  2016 superblock
 -rw-r--r--   1 root root 0 feb  6 17:10 sysvinit
 -rw-r--r--   1 root root 2 dic 14  2016 whoami
 root@deadpool:/var/lib/ceph/osd/ceph-6# ceph-osd -i 6 --mkjournal
 2019-02-06 17:45:35.030359 7ff679c24880 -1 journal check: ondisk fsid 
---- doesn't match expected 
70357923-3227-4d57-980f-92b8c853dc76, invalid (someone else's?) journal
 2019-02-06 17:45:35.038522 7ff679c24880 -1 created new journal 
/var/lib/ceph/osd/ceph-6/journal for object store /var/lib/ceph/osd/ceph-6

Clearly i've changed the journal partition by hand (eg, a direct link), so
i'm expecting the link to be 'direct to partition'; but, as the warning about
the fsid shows, there's still no 'id' associated with that partition
(eg, no link in /dev/disk/by-*/).


If i rerun the 'mkjournal':

 root@deadpool:/var/lib/ceph/osd/ceph-6# ceph-osd -i 6 --mkjournal
 2019-02-06 17:45:37.621855 7f3391377880 -1 created new journal 
/var/lib/ceph/osd/ceph-6/journal for object store /var/lib/ceph/osd/ceph-6

So it seems that the journal partition effectively gets 'tagged' in some way.


But i'm still confused... does using an ID link for journal partitions work
only with GPT partitioning?


Thanks.



Re: [ceph-users] Decommissioning cluster - rebalance questions

2018-12-04 Thread Marco Gaiarin
Mandi! si...@turka.nl
  In chel di` si favelave...

> What I don't get is, when I perform 'ceph osd out ' the cluster is
> rebalancing, but when I perform 'ceph osd crush remove osd.' it again
> starts to rebalance. Why does this happen?

I've recently hit the same 'strangeness'. Note that i'm not a ceph
developer or 'power' (or 'old') user.

It seems to me that there are two 'rebalances': one for safety, one for
optimization.


If you take 'out' an OSD, ceph rebalances the data for safety. But you
haven't touched the crushmap, so data is still scattered according to the 'old'
crushmap.
So if you then remove that OSD (or touch the crushmap in any other way),
a rebalance for 'optimization' starts.

In the same way, you can slowly put 'out' an OSD with:
ceph osd reweight  X

(with 0 <= X <= 1) but you still don't touch the crushmap.

You can also 'slowly remove' an OSD with:
ceph osd crush reweight osd. X

(with 0 <= X <= ); in this way you can 'deweight' the
OSD in the crushmap down to 0, and then you can safely remove it.
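
As a concrete sketch of that 'slow removal' (osd.5 and the weight steps are
just examples):

 for w in 1.4 1.0 0.6 0.2 0; do
     ceph osd crush reweight osd.5 $w
     # wait for backfill to finish before lowering further
     while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
 done
 ceph osd out 5
 ceph osd crush remove osd.5
 ceph auth del osd.5
 ceph osd rm 5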


I hope i've not said too much blasphemy... ;-)



[ceph-users] Proxmox 4.4, Ceph hammer, OSD cache link...

2018-12-03 Thread Marco Gaiarin


I've recently added a host to my ceph cluster, using the proxmox 'helpers'
to add OSDs, eg:

pveceph createosd /dev/sdb -journal_dev /dev/sda5

and now i've:

 root@blackpanther:~# ls -la /var/lib/ceph/osd/ceph-12
 totale 60
 drwxr-xr-x   3 root root   199 nov 21 17:02 .
 drwxr-xr-x   6 root root  4096 nov 21 23:08 ..
 -rw-r--r--   1 root root   903 nov 21 17:02 activate.monmap
 -rw-r--r--   1 root root 3 nov 21 17:02 active
 -rw-r--r--   1 root root37 nov 21 17:02 ceph_fsid
 drwxr-xr-x 432 root root 12288 dic  1 18:21 current
 -rw-r--r--   1 root root37 nov 21 17:02 fsid
 lrwxrwxrwx   1 root root 9 nov 21 17:02 journal -> /dev/sda5
 -rw---   1 root root57 nov 21 17:02 keyring
 -rw-r--r--   1 root root21 nov 21 17:02 magic
 -rw-r--r--   1 root root 6 nov 21 17:02 ready
 -rw-r--r--   1 root root 4 nov 21 17:02 store_version
 -rw-r--r--   1 root root53 nov 21 17:02 superblock
 -rw-r--r--   1 root root 0 nov 21 17:02 sysvinit
 -rw-r--r--   1 root root 3 nov 21 17:02 whoami

and all works as expected, only i supposed to find as the journal not the
device (/dev/sda5) but the uuid (/dev/disk/by-uuid/).

But it seems that the cache partition does not have a UUID associated:

  root@blackpanther:~# ls -la /dev/disk/by-uuid/ | grep sda5
  root@blackpanther:~# blkid /dev/sda5
  /dev/sda5: PARTUUID="a222c6bf-05"

I'm a bit 'puzzled' because if i have to add a disk 'before' sda, all the
device names will change with, i suppose, unexpected results.


Am i missing something? Thanks.



Re: [ceph-users] Degraded objects afte: ceph osd in $osd

2018-11-29 Thread Marco Gaiarin


I reply to myself.

> I've added a new node, added slowly 4 new OSD, but in the meantime an
> OSD (not the new, not the node to remove) died. My situation now is:
>  root@blackpanther:~# ceph osd df tree
>  ID WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  TYPE NAME   
>  -1 21.41985-  5586G 2511G  3074G 00 root default
>  -2  5.45996-  5586G 2371G  3214G 42.45 0.93 host capitanamerica 
>   0  1.81999  1.0  1862G  739G  1122G 39.70 0.87 osd.0   
>   1  1.81999  1.0  1862G  856G  1005G 46.00 1.00 osd.1   
>  10  0.90999  1.0   931G  381G   549G 40.95 0.89 osd.10  
>  11  0.90999  1.0   931G  394G   536G 42.35 0.92 osd.11  
>  -3  5.03996-  5586G 2615G  2970G 46.82 1.02 host vedovanera 
>   2  1.3  1.0  1862G  684G  1177G 36.78 0.80 osd.2   
>   3  1.81999  1.0  1862G 1081G   780G 58.08 1.27 osd.3   
>   4  0.90999  1.0   931G  412G   518G 44.34 0.97 osd.4   
>   5  0.90999  1.0   931G  436G   494G 46.86 1.02 osd.5   
>  -4  5.45996-   931G  583G   347G 00 host deadpool   
>   6  1.81999  1.0  1862G  898G   963G 48.26 1.05 osd.6   
>   7  1.81999  1.0  1862G  839G  1022G 45.07 0.98 osd.7   
>   8  0.909990  0 0  0 00 osd.8   
>   9  0.90999  1.0   931G  583G   347G 62.64 1.37 osd.9   
>  -5  5.45996-  5586G 2511G  3074G 44.96 0.98 host blackpanther   
>  12  1.81999  1.0  1862G  828G  1033G 44.51 0.97 osd.12  
>  13  1.81999  1.0  1862G  753G  1108G 40.47 0.88 osd.13  
>  14  0.90999  1.0   931G  382G   548G 41.11 0.90 osd.14  
>  15  0.90999  1.0   931G  546G   384G 58.66 1.28 osd.15  
> TOTAL 21413G 9819G 11594G 45.85  
>  MIN/MAX VAR: 0/1.37  STDDEV: 7.37
> 
> Perfectly healthy. But i've tried to, slowly, remove an OSD from
> 'vedovanera', and so i've tried with:
>   ceph osd crush reweight osd.2 
> as you can see, i'm arrived to weight 1.4 (from 1.81999), but if i go
> lower than that i catch:
[...]
> recovery 2/2556513 objects degraded (0.000%)

It seems the trouble came from osd.8, which was out and down but not removed
from the crushmap (it still had weight 0.90999).

After removing osd.8, a massive rebalance started. After that, i can now
lower the weight of the OSDs of node vedovanera and i have no more degraded
objects.

I think i'm starting to understand how the crush algorithm concretely
works. ;-)



Re: [ceph-users] Degraded objects afte: ceph osd in $osd

2018-11-26 Thread Marco Gaiarin
Mandi! Janne Johansson
  In chel di` si favelave...

> It is a slight mistake in reporting it in the same way as an error, even if 
> it looks to the
> cluster just as if it was in error and needs fixing.

I think i've hit a similar situation, and i also feel that
something has to be 'fixed'. I seek an explanation...

I'm adding a node (blackpanther, 4 OSDs, done) and removing a
node (vedovanera[1], 4 OSDs, to be done).

I've added the new node and slowly added 4 new OSDs, but in the meantime an
OSD (not one of the new ones, not on the node to remove) died. My situation now is:

 root@blackpanther:~# ceph osd df tree
 ID WEIGHT   REWEIGHT SIZE   USE   AVAIL  %USE  VAR  TYPE NAME   
 -1 21.41985-  5586G 2511G  3074G 00 root default
 -2  5.45996-  5586G 2371G  3214G 42.45 0.93 host capitanamerica 
  0  1.81999  1.0  1862G  739G  1122G 39.70 0.87 osd.0   
  1  1.81999  1.0  1862G  856G  1005G 46.00 1.00 osd.1   
 10  0.90999  1.0   931G  381G   549G 40.95 0.89 osd.10  
 11  0.90999  1.0   931G  394G   536G 42.35 0.92 osd.11  
 -3  5.03996-  5586G 2615G  2970G 46.82 1.02 host vedovanera 
  2  1.3  1.0  1862G  684G  1177G 36.78 0.80 osd.2   
  3  1.81999  1.0  1862G 1081G   780G 58.08 1.27 osd.3   
  4  0.90999  1.0   931G  412G   518G 44.34 0.97 osd.4   
  5  0.90999  1.0   931G  436G   494G 46.86 1.02 osd.5   
 -4  5.45996-   931G  583G   347G 00 host deadpool   
  6  1.81999  1.0  1862G  898G   963G 48.26 1.05 osd.6   
  7  1.81999  1.0  1862G  839G  1022G 45.07 0.98 osd.7   
  8  0.909990  0 0  0 00 osd.8   
  9  0.90999  1.0   931G  583G   347G 62.64 1.37 osd.9   
 -5  5.45996-  5586G 2511G  3074G 44.96 0.98 host blackpanther   
 12  1.81999  1.0  1862G  828G  1033G 44.51 0.97 osd.12  
 13  1.81999  1.0  1862G  753G  1108G 40.47 0.88 osd.13  
 14  0.90999  1.0   931G  382G   548G 41.11 0.90 osd.14  
 15  0.90999  1.0   931G  546G   384G 58.66 1.28 osd.15  
TOTAL 21413G 9819G 11594G 45.85  
 MIN/MAX VAR: 0/1.37  STDDEV: 7.37

Perfectly healthy. But i've tried to slowly remove an OSD from
'vedovanera', and so i've tried with:

ceph osd crush reweight osd.2 

as you can see, i've arrived at weight 1.4 (from 1.81999), but if i go
lower than that i get:

   cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
6 pgs backfill
1 pgs backfilling
7 pgs stuck unclean
recovery 2/2556513 objects degraded (0.000%)
recovery 7721/2556513 objects misplaced (0.302%)
 monmap e6: 6 mons at 
{0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
election epoch 2780, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
 osdmap e9302: 16 osds: 15 up, 15 in; 7 remapped pgs
  pgmap v54971897: 768 pgs, 3 pools, 3300 GB data, 830 kobjects
9911 GB used, 11502 GB / 21413 GB avail
2/2556513 objects degraded (0.000%)
7721/2556513 objects misplaced (0.302%)
 761 active+clean
   6 active+remapped+wait_backfill
   1 active+remapped+backfilling
  client io 9725 kB/s rd, 772 kB/s wr, 153 op/s

eg, 2 objects 'degraded'. This really puzzles me.

Why?! Thanks.


[1] some Marvel Comics heroes got translated into Italian, so 'vedovanera'
  is 'black widow' and 'capitanamerica' is clearly 'Captain America'.



Re: [ceph-users] Disable intra-host replication?

2018-11-26 Thread Marco Gaiarin
Mandi! Janne Johansson
  In chel di` si favelave...

> The default crush rules with replication=3 would only place PGs on
> separate hosts,
> so in that case it would go into degraded mode if a node goes away,
> and not place
> replicas on different disks on the remaining hosts.

'hosts' means 'hosts with OSDs', right?

Because my cluster has 5 hosts, 2 of which are only MONs.


Thanks.



[ceph-users] Disable intra-host replication?

2018-11-23 Thread Marco Gaiarin


Previous (partial) node failures and my current experiments on adding a
node have led me to the fact that, when rebalancing is needed, ceph also
rebalances intra-node: eg, if an OSD of a node dies, data is rebalanced onto
all the OSDs, even if i have pool multiplicity 3 and 3 nodes.

This, indeed, makes perfect sense: overall data scattering gives better
performance and safety.


But... is there some way to tell crush 'don't rebalance within the same node, go
into degraded mode'?


Thanks.



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-23 Thread Marco Gaiarin
Mandi! Paweł Sadowski
  In chel di` si favelave...

> This is most probably due to big difference in weights between your hosts (the
> new one has 20x lower weight than the old ones) which in combination with 
> straw
> algorithm is a 'known' issue.

Ok. I've reweighted that disk back to '1' and the status went back to
HEALTH_OK.


> You could try to increase choose_total_tries in
> your crush map from 50 to some bigger number. The best IMO would be to use
> straw2 (which will cause some rebalance) and then use 'ceph osd crush 
> reweight'
> (instead of 'ceph osd reweight') with small steps to slowly rebalance data 
> onto
> new OSDs.

For now i'm putting in the new disks with 'ceph osd reweight';
probably when i'm at 50% of the new disks i'll start to use 'ceph osd crush reweight'
against the old ones.

Thanks.



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-23 Thread Marco Gaiarin
Mandi! Paweł Sadowsk
  In chel di` si favelave...

> Exactly, your 'new' OSD have weight 1.81999 (osd.12, osd.13) and 0.90999
> (osd.14, osd.15). As Jarek pointed out you should add them using
>   'osd crush initial weight = 0'
> and the use
>   'ceph osd crush reweight osd.x 0.05'
> to slowly increase weight on them.
> From your osd tree it looks like you used 'ceph osd reweight'.

Reading the ceph docs led me to think that 'ceph osd reweight' and 'ceph osd crush
reweight' were roughly the same, the first being effectively 'temporary'
and expressed as a percentage (0-1), while the second is 'permanent' and
expressed, normally, in disk terabytes.

You are saying that instead the first modifies only the disk occupation,
while only the latter alters the crush map.

Right?


Is this true only for the 'straw' algorithm? Or is it general? Thanks.
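
As a small illustration of the difference (using the commands already seen in
this thread):

 # temporary override, range 0-1, shown in the REWEIGHT column of 'ceph osd tree';
 # it may be reset to 1 if the OSD is marked out and then in again
 ceph osd reweight 2 0.95

 # permanent weight stored in the crushmap (WEIGHT column, normally the disk
 # size in TB); changing it is what actually alters the crush placement
 ceph osd crush reweight osd.2 1.4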



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Paweł Sadowsk
  In chel di` si favelave...

> From your osd tree it looks like you used 'ceph osd reweight'.

Yes, and i also supposed i was doing the right thing!

Now, i've tried to lower the weight of the to-be-dismissed OSD, using:
ceph osd reweight 2 0.95

leading to an osd map tree like:

 root@blackpanther:~# ceph osd tree
 ID WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 21.83984 root default  
 -2  5.45996 host capitanamerica   
  0  1.81999 osd.0up  1.0  1.0 
  1  1.81999 osd.1up  1.0  1.0 
 10  0.90999 osd.10   up  1.0  1.0 
 11  0.90999 osd.11   up  1.0  1.0 
 -3  5.45996 host vedovanera   
  2  1.81999 osd.2up  0.95000  1.0 
  3  1.81999 osd.3up  1.0  1.0 
  4  0.90999 osd.4up  1.0  1.0 
  5  0.90999 osd.5up  1.0  1.0 
 -4  5.45996 host deadpool 
  6  1.81999 osd.6up  1.0  1.0 
  7  1.81999 osd.7up  1.0  1.0 
  8  0.90999 osd.8up  1.0  1.0 
  9  0.90999 osd.9up  1.0  1.0 
 -5  5.45996 host blackpanther 
 12  1.81999 osd.12   up  0.04999  1.0 
 13  1.81999 osd.13   up  0.04999  1.0 
 14  0.90999 osd.14   up  0.04999  1.0 
 15  0.90999 osd.15   up  0.04999  1.0 

and, after rebalancing, to:

 root@blackpanther:~# ceph -s
cluster 8794c124-c2ec-4e81-8631-742992159bd6
 health HEALTH_WARN
6 pgs stuck unclean
recovery 4/2550363 objects degraded (0.000%)
recovery 11282/2550363 objects misplaced (0.442%)
 monmap e6: 6 mons at 
{0=10.27.251.7:6789/0,1=10.27.251.8:6789/0,2=10.27.251.11:6789/0,3=10.27.251.12:6789/0,4=10.27.251.9:6789/0,blackpanther=10.27.251.2:6789/0}
election epoch 2750, quorum 0,1,2,3,4,5 blackpanther,0,1,4,2,3
 osdmap e7300: 16 osds: 16 up, 16 in; 6 remapped pgs
  pgmap v54737590: 768 pgs, 3 pools, 3299 GB data, 830 kobjects
9870 GB used, 12474 GB / 22344 GB avail
4/2550363 objects degraded (0.000%)
11282/2550363 objects misplaced (0.442%)
 761 active+clean
   6 active+remapped
   1 active+clean+scrubbing
  client io 13476 B/s rd, 654 kB/s wr, 95 op/s

Why are there pgs in the 'stuck unclean' state?



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Zongyou Yao
  In chel di` si favelave...

> The reason for the rebalance is that you are using the straw algorithm.  If you
> switch to straw2, no data will be moved.

I'm still on hammer, so:

http://docs.ceph.com/docs/hammer/rados/operations/crush-map/

it seems there's no 'straw2'...



Re: [ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin
Mandi! Paweł Sadowsk
  In chel di` si favelave...

> We did similar changes a many times and it always behave as expected.

Ok. Good.

> Can you show you crushmap/ceph osd tree?

Sure!

 root@blackpanther:~# ceph osd tree
 ID WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY 
 -1 21.83984 root default  
 -2  5.45996 host capitanamerica   
  0  1.81999 osd.0up  1.0  1.0 
  1  1.81999 osd.1up  1.0  1.0 
 10  0.90999 osd.10   up  1.0  1.0 
 11  0.90999 osd.11   up  1.0  1.0 
 -3  5.45996 host vedovanera   
  2  1.81999 osd.2up  1.0  1.0 
  3  1.81999 osd.3up  1.0  1.0 
  4  0.90999 osd.4up  1.0  1.0 
  5  0.90999 osd.5up  1.0  1.0 
 -4  5.45996 host deadpool 
  6  1.81999 osd.6up  1.0  1.0 
  7  1.81999 osd.7up  1.0  1.0 
  8  0.90999 osd.8up  1.0  1.0 
  9  0.90999 osd.9up  1.0  1.0 
 -5  5.45996 host blackpanther 
 12  1.81999 osd.12   up  0.04999  1.0 
 13  1.81999 osd.13   up  0.04999  1.0 
 14  0.90999 osd.14   up  0.04999  1.0 
 15  0.90999 osd.15   up  0.04999  1.0 

OSDs 12-15 are the new ones; after creating them with 'noin' i've
reweighted them to '0.05' (as a test).


Crush map attached. Thanks.

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host capitanamerica {
id -2   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.0 weight 1.820
item osd.1 weight 1.820
item osd.10 weight 0.910
item osd.11 weight 0.910
}
host vedovanera {
id -3   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.2 weight 1.820
item osd.3 weight 1.820
item osd.4 weight 0.910
item osd.5 weight 0.910
}
host deadpool {
id -4   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.6 weight 1.820
item osd.7 weight 1.820
item osd.8 weight 0.910
item osd.9 weight 0.910
}
host blackpanther {
id -5   # do not change unnecessarily
# weight 5.460
alg straw
hash 0  # rjenkins1
item osd.12 weight 1.820
item osd.13 weight 1.820
item osd.14 weight 0.910
item osd.15 weight 0.910
}
root default {
id -1   # do not change unnecessarily
# weight 21.840
alg straw
hash 0  # rjenkins1
item capitanamerica weight 5.460
item vedovanera weight 5.460
item deadpool weight 5.460
item blackpanther weight 5.460
}

# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map


[ceph-users] New OSD with weight 0, rebalance still happen...

2018-11-22 Thread Marco Gaiarin


Ceph still surprises me: when i'm sure i've fully understood it,
something 'strange' (to my knowledge) happens.


I need to move a server out of my ceph hammer cluster (3 nodes, 4 OSDs
per node), and for some reason i cannot simply move the disks.
So i've added a new node, and yesterday i set up the 4 new OSDs.
My plan was to add the 4 OSDs with weight 0, and then slowly lower
the old OSDs' weight and increase the weight of the new ones.

Beforehand i've done:

ceph osd set noin

and then added the OSDs, and (as expected) the new OSDs started with weight 0.

But, despite the fact that the weight is zero, a rebalance happens, with the
percentage of rebalanced data 'weighted' to the size of the new disk (eg,
i had circa 18TB of space, i added a 2TB disk and roughly 10% of
the data started to rebalance).
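
For reference, a sketch of how new OSDs can be made to join the crushmap with
zero weight (so nothing moves until they are explicitly reweighted), as the
replies earlier in this digest suggest; option name as in the hammer-era docs:

 # in ceph.conf, before creating the new OSDs:
 [osd]
 osd crush initial weight = 0

 # then bring each new OSD in gradually:
 ceph osd crush reweight osd.12 0.05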


Why? Thanks.



Re: [ceph-users] Hammer and a (little) disk/partition shrink...

2018-08-29 Thread Marco Gaiarin
Mandi! David Turner
  In chel di` si favelave...

> Replace the raid controller in the chassis with an HBA before moving into the
> new hardware? ;)

Eh... some hint on a controller i can buy?


> If you do move to the HP controller, make sure you're monitoring the health of
> the cache battery in the controller.

I've no battery in the controller... ;-)



[ceph-users] Hammer and a (little) disk/partition shrink...

2018-08-29 Thread Marco Gaiarin


Probably a complex question, with a simple answer: NO. ;-)


I need to move the disks of a ceph node (still on hammer) from one piece of
hardware to another. The source hardware has a simple SATA/SAS
controller, the 'new' server has a RAID controller with no JBOD mode
(the infamous HP P410i), so i need to create some 'RAID 0 with a single
disk' fake raids.

These controllers seem to 'eat' some space at the end of the disk,
so (doing some tests) the disk does not get corrupted by the
'raid0-ification', but loses some bytes at the end, and linux then
complains that the (last) partition is corrupted.

hammer uses filestore, so in practice i need to shrink an xfs
filesystem, which is not supported by XFS.
Clearly i can 'xfsdump' the disks to some scratch space and rebuild the
filesystem but...
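
A sketch of that dump-and-rebuild path (device names, paths and the init-style
service commands are placeholders; the OSD must be stopped first):

 ceph osd set noout
 service ceph stop osd.3
 xfsdump -l 0 -f /scratch/osd-3.dump /var/lib/ceph/osd/ceph-3
 umount /var/lib/ceph/osd/ceph-3
 mkfs.xfs -f /dev/sdX1          # the new, slightly smaller partition
 mount /dev/sdX1 /var/lib/ceph/osd/ceph-3
 xfsrestore -f /scratch/osd-3.dump /var/lib/ceph/osd/ceph-3
 service ceph start osd.3
 ceph osd unset noout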


Do i have some escape path?


Thanks.



Re: [ceph-users] RAID question for Ceph

2018-07-19 Thread Marco Gaiarin
Mandi! Troy Ablan
  In chel di` si favelave...

> Even worse, the P410i doesn't appear to support a pass-thru (JBOD/HBA)
> mode, so your only sane option for using this card is to create RAID-0s.

I confirm. Even worse, the P410i can define a maximum of 2 'arrays' (even
fake arrays composed of one disk in raid-0) without the (expensive!)
cache module.

Digging around, i've found references to alternative firmwares that
enable JBOD/HBA, but i've never walked that way...



Re: [ceph-users] Snapshot removed, cluster thrashed...

2017-06-27 Thread Marco Gaiarin
Mandi! David Turner
  In chel di` si favelave...

> Snapshots are not a free action.

Thanks to all for the info!



Re: [ceph-users] Snapshot removed, cluster thrashed...

2017-06-26 Thread Marco Gaiarin
Mandi! Lindsay Mathieson
  In chel di` si favelave...

> Have you tried restoring a snapshot? I found it unusably slow - as in hours

No, still no; i've never restored a snapshot...



[ceph-users] Snapshot removed, cluster thrashed...

2017-06-26 Thread Marco Gaiarin
data, 7016 GB used, 9742 GB 
/ 16758 GB avail; 1918 B/s rd, 925 kB/s wr, 34 op/s
 2017-06-23 18:19:44.212196 mon.0 10.27.251.7:6789/0 1408936 : cluster [INF] 
pgmap v17394404: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 477 B/s rd, 63935 B/s wr, 17 op/s
 2017-06-23 18:19:46.322367 mon.0 10.27.251.7:6789/0 1408937 : cluster [INF] 
pgmap v17394405: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 319 B/s rd, 71906 B/s wr, 17 op/s
 2017-06-23 18:19:47.385412 mon.0 10.27.251.7:6789/0 1408938 : cluster [INF] 
pgmap v17394406: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 6434 B/s rd, 379 kB/s wr, 27 op/s
 2017-06-23 18:19:49.458938 mon.0 10.27.251.7:6789/0 1408939 : cluster [INF] 
pgmap v17394407: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 6498 B/s rd, 338 kB/s wr, 15 op/s
 2017-06-23 18:19:50.568192 mon.0 10.27.251.7:6789/0 1408940 : cluster [INF] 
pgmap v17394408: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 2589 B/s rd, 12625 B/s wr, 3 op/s
 2017-06-23 18:19:51.660889 mon.0 10.27.251.7:6789/0 1408941 : cluster [INF] 
pgmap v17394409: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 28382 B/s rd, 276 kB/s wr, 52 op/s
 2017-06-23 18:19:52.735827 mon.0 10.27.251.7:6789/0 1408942 : cluster [INF] 
pgmap v17394410: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 47066 B/s rd, 2849 kB/s wr, 166 op/s
 2017-06-23 18:19:53.802365 mon.0 10.27.251.7:6789/0 1408943 : cluster [INF] 
pgmap v17394411: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 51028 B/s rd, 4144 kB/s wr, 174 op/s
 2017-06-23 18:19:56.814558 mon.0 10.27.251.7:6789/0 1408945 : cluster [INF] 
pgmap v17394412: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 136 kB/s rd, 1159 kB/s wr, 52 op/s
 2017-06-23 18:19:58.034610 mon.0 10.27.251.7:6789/0 1408946 : cluster [INF] 
pgmap v17394413: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 112 kB/s rd, 989 kB/s wr, 116 op/s
 2017-06-23 18:19:59.112915 mon.0 10.27.251.7:6789/0 1408947 : cluster [INF] 
pgmap v17394414: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB 
/ 16758 GB avail; 43695 B/s rd, 1380 kB/s wr, 146 op/s
 2017-06-23 18:20:00.223605 mon.0 10.27.251.7:6789/0 1408948 : cluster [INF] 
HEALTH_OK


Three questions:

a) why does a 'snapshot remove' action put the system under such load?

b) as with the options like:

osd scrub during recovery = false
osd recovery op priority = 1
osd recovery max active = 5
osd max backfills = 1

 (for recovery), are there options to reduce the impact of a snapshot
 remove? (see the sketch after this list)

c) are snapshots handled differently from other IO ops, or, doing
 similar things (eg, a restore from a backup), should i expect a
 similar result?
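
For question (b), a sketch of the knob that existed in the hammer era for
throttling snapshot trimming (name and value to be verified against the
running version):

 # runtime injection on all OSDs:
 ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

 # or persistently in ceph.conf:
 [osd]
 osd snap trim sleep = 0.1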


Thanks.



Re: [ceph-users] Network redundancy...

2017-05-30 Thread Marco Gaiarin

> The switches you're using, can they stack?
> If so you could spread the LACP across the two switches.

And:

> Just use balance-alb, this will do the trick with non-stacked switches

Thanks for the answers, i'll do some tests! ;-)
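
For reference, a sketch of a balance-alb bond in Debian/Proxmox-style
/etc/network/interfaces (interface names and the address are placeholders):

 auto bond0
 iface bond0 inet static
     address 10.27.251.11
     netmask 255.255.255.0
     bond-slaves eth0 eth1
     bond-mode balance-alb
     bond-miimon 100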



[ceph-users] Network redundancy...

2017-05-29 Thread Marco Gaiarin

I've set up a little Ceph cluster (3 hosts, 12 OSDs), all connected to a
single switch, using 2x1Gbit/s LACP links.

Supposing i have two identical switches, is there some way to set up a
'redundant' configuration?
For example, something similar to 'iSCSI multipath'?


I'm reading the switch manuals and the ceph documentation, but with no luck.


Thanks.



Re: [ceph-users] Limit bandwidth on RadosGW?

2017-05-04 Thread Marco Gaiarin
Mandi! Marc Roos
  In chel di` si favelave...

> Just a thought, what about marking connections with iptables and using 
> that mark with tc? 

Surely, but many things have to be taken into account:

a) doing traffic control means disabling ALL the network hardware
 optimizations (queues, checksum offloading, ...), and i don't know
 the impact on ceph.

b) doing simple control (eg, traffic clamping on an interface) adds
 little overhead, but if a complex setup is needed (multiqueue, traffic
 shaping by IP/port/...) i think more overhead gets added.

c) again, simple control can be done on ingress easily, but if a
 complex setup is needed the ingress traffic must be redirected to another
 interface (usually an IFB interface) and the proper egress shaping is done
 there. On ifbX interfaces, also, there's no netfilter.


I'm using that stuff on firewalls, where performance on modern/decent
hardware is not a problem at all.

So, no, i've no benchmark at all. ;)
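
For completeness, a minimal sketch of the mark-with-iptables-and-shape-with-tc
idea (assuming RadosGW answers on port 7480 and egress is on eth0; untested):

 # mark outgoing RadosGW traffic
 iptables -t mangle -A OUTPUT -p tcp --sport 7480 -j MARK --set-mark 10
 # HTB root qdisc, one shaped class for the marked traffic
 tc qdisc add dev eth0 root handle 1: htb default 30
 tc class add dev eth0 parent 1: classid 1:10 htb rate 100mbit
 tc class add dev eth0 parent 1: classid 1:30 htb rate 1000mbit
 tc filter add dev eth0 parent 1: protocol ip handle 10 fw flowid 1:10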



Re: [ceph-users] Power Failure

2017-04-26 Thread Marco Gaiarin
Mandi! Santu Roy
  In chel di` si favelave...

> I am very new to Ceph, studying for a few days for a deployment of a Ceph cluster.
> I am going to deploy ceph in a small data center where power failure is a big
> problem. We have a single power supply, a single UPS and a standby generator. So
> what happens if all nodes go down due to a power failure? Will it create any problem
> restarting the services when power is restored?
> Looking for your suggestions..

First message on the list. I manage a little Ceph cluster (3 nodes, 6 OSDs),
used exclusively for Proxmox (2 computing nodes, which are also MONs).

I've experienced some pain tearing down the whole cluster (for a general
maintenance task, i had to move the servers from one place to another...): after
rebooting the storage nodes, clock skew prevented Ceph from working
correctly.

It was partially my own mistake (ahem, i had pointed the storage nodes
toward an internal NTP server that was... ahem... a VM in the same
cluster! ;-), but i had some panic because, even after syncing the time
and after the monitors stopped complaining about clock skew, i was forced to
reboot/restart all the mons/osds one by one to get ceph working properly.


So, in my experience, don't tear down a Ceph cluster without at least an
NTP server available. ;-)


Also, i've configured the UPSes so that the computing nodes start shutting down
before the storage nodes, and the storage nodes power on as soon as possible,
while the computing nodes are delayed as much as possible.
But i've never done a power-off / power-on test again...
