Re: [ceph-users] Importance of Stable Mon and OSD IPs

2018-01-23 Thread Mayank Kumar
Thanks Burkhard for the detailed explanation. Regarding the following: >>> The ceph client (librbd accessing a volume in this case) gets asynchronous notifications from the ceph mons in case of relevant changes, e.g. updates to the osd map reflecting the failure of an OSD. I have some more
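As a rough illustration of the notification path being described, these standard Ceph CLI commands (not from the original mail) show the osdmap epoch that clients track and the map updates the mons publish:

    # current osdmap epoch known to the cluster
    ceph osd dump | head -1
    # watch cluster events live; an OSD failure shows up as a new osdmap epoch
    ceph -w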

Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Blair Bethwaite
+1 to Warren's advice on checking for memory fragmentation. Are you seeing kmem allocation failures in dmesg on these hosts? On 24 January 2018 at 10:44, Warren Wang wrote: > Check /proc/buddyinfo for memory fragmentation. We have some pretty severe > memory frag issues
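For reference, a minimal sketch of checking both symptoms mentioned here, using standard Linux tooling:

    # per-zone free pages by order; few entries in the higher orders means fragmentation
    cat /proc/buddyinfo
    # kernel page/kmem allocation failures
    dmesg -T | grep -iE 'page allocation failure|kmem'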

Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Warren Wang
Check /proc/buddyinfo for memory fragmentation. We have some pretty severe memory fragmentation issues with Ceph, to the point where we keep an unusually large min_free_kbytes configured (8 GB) and are starting to order more memory than we actually need. If you have a lot of objects, you may find that you need to
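As a sketch of the workaround described (the 8 GB figure from above expressed in kilobytes; tune to your own hosts):

    # reserve ~8 GB for the kernel at runtime
    sysctl -w vm.min_free_kbytes=8388608
    # persist across reboots (hypothetical file name)
    echo 'vm.min_free_kbytes = 8388608' > /etc/sysctl.d/99-ceph-memory.conf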

Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Marc Roos
Maybe first check what is using the swap? swap-use.sh | sort -k 5,5 -n #!/bin/bash SUM=0 OVERALL=0 for DIR in `find /proc/ -maxdepth 1 -type d | egrep "^/proc/[0-9]"` do PID=`echo $DIR | cut -d / -f 3` PROGNAME=`ps -p $PID -o comm --no-headers` for SWAP in `grep Swap $DIR/smaps
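The script is cut off by the archive preview; a complete version of the same per-process swap accounting idea might look like the following reconstruction (not necessarily Marc's exact script):

    #!/bin/bash
    # sum the Swap: entries in /proc/<pid>/smaps for every running process
    OVERALL=0
    for DIR in $(find /proc/ -maxdepth 1 -type d | egrep "^/proc/[0-9]"); do
        PID=$(echo "$DIR" | cut -d / -f 3)
        PROGNAME=$(ps -p "$PID" -o comm --no-headers)
        SUM=0
        for SWAP in $(grep Swap "$DIR/smaps" 2>/dev/null | awk '{ print $2 }'); do
            SUM=$((SUM + SWAP))
        done
        [ "$SUM" -gt 0 ] && echo "PID $PID ($PROGNAME) is using $SUM kB of swap"
        OVERALL=$((OVERALL + SUM))
    done
    echo "Total swap in use: $OVERALL kB"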

Re: [ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Lincoln Bryant
Hi Sam, What happens if you just disable swap altogether? i.e., with `swapoff -a` --Lincoln On Tue, 2018-01-23 at 19:54 +, Samuel Taylor Liston wrote: > We have a 9-node cluster (16 x 8 TB OSDs per node) running Jewel on CentOS > 7.4. The OSDs are configured with encryption. The cluster is >
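For completeness, turning swap off and keeping it off is roughly the following (a sketch; make sure the node really has the memory headroom first):

    swapoff -a                                 # move swapped pages back into RAM now
    sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab  # comment out swap entries so it stays off after reboot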

[ceph-users] OSD servers swapping despite having free memory capacity

2018-01-23 Thread Samuel Taylor Liston
We have a 9-node cluster (16 x 8 TB OSDs per node) running Jewel on CentOS 7.4. The OSDs are configured with encryption. The cluster is accessed via two RGWs and there are 3 mon servers. The data pool uses 6+3 erasure coding. About 2 weeks ago I found two of the nine servers wedged and had
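For readers following along, a 6+3 erasure-coded pool like the one described is typically created along these lines (profile name, pool name and PG counts are placeholders, not Sam's actual values; on Jewel the failure-domain key is ruleset-failure-domain rather than the later crush-failure-domain):

    ceph osd erasure-code-profile set ec63 k=6 m=3 ruleset-failure-domain=host
    ceph osd pool create rgw-data 1024 1024 erasure ec63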

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
On 23/01/2018 16:49, c...@jack.fr.eu.org wrote: On 01/23/2018 04:33 PM, Massimiliano Cuttini wrote: With Ceph you have to install a third-party orchestrator in order to have a clear picture of what is going on. Which can be ok, but not always feasible. Just as with everything As said

Re: [ceph-users] Ceph Future

2018-01-23 Thread ceph
On 01/23/2018 04:33 PM, Massimiliano Cuttini wrote: With Ceph you have to install a third-party orchestrator in order to have a clear picture of what is going on. Which can be ok, but not always feasible. Just as with everything. As Wikipedia puts it, for instance, "Proxmox VE supports local

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
On 23/01/2018 14:32, c...@jack.fr.eu.org wrote: I think I was not clear. There are VM management systems; look at https://fr.wikipedia.org/wiki/Proxmox_VE, https://en.wikipedia.org/wiki/Ganeti, probably https://en.wikipedia.org/wiki/OpenStack too. These systems interact with Ceph.

Re: [ceph-users] Ceph Future

2018-01-23 Thread Volker Theile
Hello Massimiliano, > >> You're more than welcome - we have a lot of work ahead of us... >> Feel free to join our Freenode IRC channel #openattic to get in touch! > > A curiosity! > As far as I understood, this software was created to manage only Ceph. > Is that right? > So... why such a "far away"

Re: [ceph-users] Ceph Future

2018-01-23 Thread ceph
I think I was not clear. There are VM management systems; look at https://fr.wikipedia.org/wiki/Proxmox_VE, https://en.wikipedia.org/wiki/Ganeti, probably https://en.wikipedia.org/wiki/OpenStack too. These systems interact with Ceph. When you create a VM, an RBD volume is created. When you
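The RBD side of that interaction is ordinary image management, for example (pool and image names are made up):

    rbd create --size 20480 vms/vm-101-disk-0   # 20 GB image, size given in MB
    rbd resize --size 40960 vms/vm-101-disk-0   # grow to 40 GB
    rbd ls vms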

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
You're more than welcome - we have a lot of work ahead of us... Feel free to join our Freenode IRC channel #openattic to get in touch! A curiosity! As far as I understood, this software was created to manage only Ceph. Is that right? So... why such a "far away" name for software dedicated to

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
On 23/01/2018 13:20, c...@jack.fr.eu.org wrote: - USER tasks: create new images, grow image sizes, shrink image sizes, check daily status and change broken disks whenever needed. Who does that? For instance, Ceph can be used for VMs. Your VM system creates images, resizes images,

Re: [ceph-users] Ceph Future

2018-01-23 Thread Lenz Grimmer
Ciao Massimiliano, On 01/23/2018 01:29 PM, Massimiliano Cuttini wrote: >> https://www.openattic.org/features.html > > Oh god THIS is the answer! :) > Lenz, if you need help I can also join development. You're more than welcome - we have a lot of work ahead of us... Feel free to join our

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
https://www.openattic.org/features.html Oh god THIS is the answer! Lenz, if you need help I can also join development. Lenz

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
Hey Lenz, OpenAttic seems to implement several good features and to be more or less what I was asking for. I'll go through the whole website. :) THANKS! On 16/01/2018 09:04, Lenz Grimmer wrote: Hi Massimiliano, On 01/11/2018 12:15 PM, Massimiliano Cuttini wrote: 3) Management

Re: [ceph-users] Ceph Future

2018-01-23 Thread ceph
On 01/23/2018 11:04 AM, Massimiliano Cuttini wrote: On 22/01/2018 21:55, Jack wrote: On 01/22/2018 08:38 PM, Massimiliano Cuttini wrote: The web interface is needed because: *command lines are prone to typos.* And you never misclick, indeed; Do you really mean: 1) misclick once on an option

Re: [ceph-users] How to set mon-clock-drift-allowed tunable

2018-01-23 Thread Hüseyin Atatür YILDIRIM
Hello everyone, I fixed my NTP server as you suggested and did not change the default value of the mon-clock-drift-allowed tunable. Thank you, Atatur -- Hüseyin Atatür YILDIRIM, Systems Engineer
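For anyone hitting the same warning, the usual checks look roughly like this (chrony vs. ntpd depends on the distribution):

    ceph health detail | grep -i clock   # which mon is skewed, and by how much
    chronyc tracking                     # or: ntpq -p  -- confirm the mon's time source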

Re: [ceph-users] udev rule or script to auto add bcache devices?

2018-01-23 Thread Jens-U. Mozdzen
Hi Stefan, Quoting Stefan Priebe - Profihost AG: Hello, bcache didn't support partitions in the past, so a lot of our OSDs have their data directly on /dev/bcache[0-9]. But that means I can't give them the needed partition type of 4fbd7e29-9d25-41b8-afd0-062c0ceff05d, and that means that
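One workaround used in this situation is a small custom udev rule that hands the bcache devices to the ceph user, since a whole-device OSD cannot carry the GPT type code itself (a sketch only; the file name is hypothetical, and deciding which bcache devices belong to Ceph is site-specific):

    # /etc/udev/rules.d/99-ceph-bcache.rules
    ACTION=="add|change", KERNEL=="bcache[0-9]*", OWNER="ceph", GROUP="ceph", MODE="0660"

    # reload and apply
    udevadm control --reload && udevadm trigger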

[ceph-users] Ruleset for optimized Ceph hybrid storage

2018-01-23 Thread Niklas
My question is: is it possible to create a 3-copy ruleset where the first copy is stored on class nvme and all other copies are stored on class hdd, while making sure that the copies stored on class hdd are not placed in the same datacenter as the first copy on class nvme? Below is a
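For what it's worth, the usual shape of such a rule, hand-edited into the decompiled CRUSH map, is sketched below; note that separate take/emit sequences do not by themselves exclude the datacenter already used for the nvme copy:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # add a rule like the following to crush.txt, then compile and set it again:
    #   rule hybrid-nvme-hdd {
    #       id 5
    #       type replicated
    #       min_size 2
    #       max_size 3
    #       step take default class nvme
    #       step chooseleaf firstn 1 type host
    #       step emit
    #       step take default class hdd
    #       step chooseleaf firstn -1 type datacenter
    #       step emit
    #   }
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new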

Re: [ceph-users] Ceph Future

2018-01-23 Thread Massimiliano Cuttini
On 22/01/2018 21:55, Jack wrote: On 01/22/2018 08:38 PM, Massimiliano Cuttini wrote: The web interface is needed because: *command lines are prone to typos.* And you never misclick, indeed; Do you really mean: 1) misclick once on an option list, 2) misclick once on the form, 3) mistype the

Re: [ceph-users] Missing udev rule for FC disks (Re: mkjournal error creating journal ... : (13) Permission denied)

2018-01-23 Thread Fulvio Galeazzi
Thanks a lot, Tom, glad this was already taken care of! Will keep the patch around until the official one somehow gets into my distribution. Ciao ciao Fulvio Original Message Subject: Re: [ceph-users] Missing udev rule for FC disks (Re: mkjournal

Re: [ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-23 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com): > > So, first question is: why didn't that OSD get detected as failing > much earlier? We have noticed that "mon osd adjust heartbeat grace" made the cluster "realize" OSDs going down _much_ later than the MONs / OSDs themselves. Setting this
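In case it helps others, the option referred to can be pinned on the mons either in ceph.conf or at runtime (a sketch; whether turning it off is a good idea depends on how noisy your failure reports are):

    # ceph.conf on the mon hosts:
    #   [mon]
    #   mon osd adjust heartbeat grace = false

    # or at runtime, without a restart
    ceph tell mon.* injectargs '--mon-osd-adjust-heartbeat-grace=false'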

Re: [ceph-users] Importance of Stable Mon and OSD IPs

2018-01-23 Thread Burkhard Linke
Hi, On 01/23/2018 09:53 AM, Mayank Kumar wrote: Hi Ceph Experts, I am a new user of Ceph and am currently using Kubernetes to deploy Ceph RBD volumes. We are doing some initial work rolling it out to internal customers, and in doing that we are using the IP of the host as the IP of the OSDs and

Re: [ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-23 Thread Gregory Farnum
On Mon, Jan 22, 2018 at 8:46 PM, Dan van der Ster wrote: > Here's a bit more info as I read the logs. Firstly, these are in fact > Filestore OSDs... I was confused, but I don't think it makes a big > difference. > > Next, all the other OSDs had indeed noticed that osd.2 had

[ceph-users] Importance of Stable Mon and OSD IPs

2018-01-23 Thread Mayank Kumar
Hi Ceph Experts, I am a new user of Ceph and am currently using Kubernetes to deploy Ceph RBD volumes. We are doing some initial work rolling it out to internal customers, and in doing that we are using the IP of the host as the IP of the OSDs and mons. This means that if a host goes down, we lose that
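Background for the question: clients only use the mon addresses as initial contact points, typically from the mon_host line in ceph.conf (a generic example with placeholder IPs, not the actual deployment), which is one reason those addresses are expected to stay stable:

    # /etc/ceph/ceph.conf on the client:
    #   [global]
    #   fsid = <cluster-fsid>
    #   mon_host = 10.0.0.11,10.0.0.12,10.0.0.13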

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
Hey Burkhard, we did actually restart osd.61, which led to the current status. Best, Nico Burkhard Linke writes:> > On 01/23/2018 08:54 AM, Nico Schottelius wrote: >> Good morning, >> >> the osd.61 actually just crashed and the disk is still

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Burkhard Linke
Hi, On 01/23/2018 08:54 AM, Nico Schottelius wrote: Good morning, the osd.61 actually just crashed and the disk is still intact. However, after 8 hours of rebuilding, the unfound objects are still missing: *snipsnap* Is there any chance to recover those pgs or did we actually lose data
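The commands usually involved in that recover-or-give-up decision are roughly the following (a generic sketch; substitute the real PG id and think twice before the last one):

    ceph health detail | grep unfound            # which PGs report unfound objects
    ceph pg <pg.id> list_missing                 # list the unfound objects in one PG
    ceph pg <pg.id> mark_unfound_lost revert     # or: ... mark_unfound_lost delete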

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-23 Thread Nico Schottelius
... while trying to locate which VMs are potentially affected by a revert/delete, we noticed that root@server1:~# rados -p one-hdd ls hangs. Where does Ceph store the index of block devices found in a pool? And is it possible that this information is in one of the damaged pgs? Nico
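Partial answer to the question above: RBD keeps the per-pool index of image names in a single rbd_directory object, so it is easy to check whether that object lives in one of the damaged PGs (generic commands; the object name is the standard RBD one):

    ceph osd map one-hdd rbd_directory   # shows which PG and OSDs hold the image index
    rbd ls -p one-hdd                    # reads rbd_directory, so it hangs the same way rados ls does if that PG is down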