Re: [ceph-users] Mimic osd fails to start.

2018-08-20 Thread Alfredo Deza
On Mon, Aug 20, 2018 at 10:23 AM, Daznis wrote: > Hello, > > It appears that something is horribly wrong with the cluster itself. I > can't create or add any new osds to it at all. Have you added new monitors? Or replaced monitors? I would check that all your versions match, something seems to

Re: [ceph-users] missing dependency in ubuntu packages

2018-08-20 Thread Alfredo Daniel Rezinovsky
On 20/08/18 06:44, John Spray wrote: On Sun, Aug 19, 2018 at 9:21 PM Alfredo Daniel Rezinovsky wrote: both in ubuntu 16.04 and 18.04 ceph-mgr fails to start when the package python-routes is not installed I guess you mean that the dashboard doesn't work, as opposed to the whole ceph-mgr process

Re: [ceph-users] missing dependency in ubuntu packages

2018-08-20 Thread John Spray
On Mon, Aug 20, 2018 at 6:50 PM Alfredo Daniel Rezinovsky wrote: > > > > On 20/08/18 06:44, John Spray wrote: > > On Sun, Aug 19, 2018 at 9:21 PM Alfredo Daniel Rezinovsky > > wrote: > >> both in ubuntu 16.04 and 18.04 ceph-mgr fails to start when package > >> python-routes is not installed > >

[ceph-users] Still risky to remove RBD-Images?

2018-08-20 Thread Mehmet
Hello, AFAIK removing big RBD images would lead ceph to produce blocked requests - I don't mean ones caused by poor disks. Is this still the case with "Luminous (12.2.4)"? I have a few images of - 2 terabyte - 5 terabyte and - 20 terabyte in size and have to delete them. Would

[ceph-users] what is Implicated osds

2018-08-20 Thread Satish Patel
Folks, Today I found that "ceph -s" is really slow, just hanging for a minute or two before giving me output; the same goes for "ceph osd tree", the command just hangs a long time before giving output. This is the output I am seeing: one OSD is down, not sure why it's down and what the relation is with

[ceph-users] Removing all rados objects based on a prefix

2018-08-20 Thread David Turner
The general talk about the rados cleanup command is to clean things up after benchmarking. Could this command also be used for deleting an old RGW bucket or an RBD? For instance, a bucket with a prefix of `25ff9eff-058b-41e3-8724-cfffecb979c0.9709451.1` such that all objects in the
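
For readers searching the archive: a minimal sketch of doing this with the plain rados CLI (the pool name is an assumption for a default RGW setup; the prefix is the one quoted above):

    # list objects, keep those starting with the bucket marker, delete them one by one
    rados -p default.rgw.buckets.data ls \
      | grep '^25ff9eff-058b-41e3-8724-cfffecb979c0.9709451.1' \
      | while read obj; do rados -p default.rgw.buckets.data rm "$obj"; done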

Re: [ceph-users] Mimic osd fails to start.

2018-08-20 Thread Daznis
Hello, Medic shows everything fine. The whole cluster is on the latest mimic version. It was updated to mimic when the stable version of mimic was released, and recently it was updated to "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)". For some reason one mgr service is

[ceph-users] QEMU/Libvirt + librbd issue using Luminous 12.2.7

2018-08-20 Thread Andre Goree
This issue first started while using Luminous 12.2.5, I upgraded to 12.2.7 and it's still present.  This issue is _not_ present in 12.2.4. With Ceph 12.2.4, using QEMU/KVM + Libvirt, I'm able to mount an rbd image using the following syntax and populated xml: 'virsh attach-device $vm foo.xml
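
For context, the attach described above works against a disk element roughly like the following (the pool/image name, monitor address and secret UUID below are placeholders, not values taken from the thread):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='REPLACE-WITH-SECRET-UUID'/>
      </auth>
      <source protocol='rbd' name='rbd/disk01'>
        <host name='10.0.0.1' port='6789'/>
      </source>
      <target dev='vdb' bus='virtio'/>
    </disk>

followed by: virsh attach-device $vm foo.xml --live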

[ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread Stefan Priebe - Profihost AG
Hello, since Loic seems to have left Ceph development and his wonderful crush optimization tool isn't working anymore, I'm trying to get a good distribution with the ceph balancer. Sadly it does not work as well as I want. # ceph osd df | sort -k8 shows 75 to 83% usage, which is an 8% difference
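
For readers following the thread, the balancer workflow on Luminous/Mimic is roughly this (a sketch; the plan name is arbitrary and upmap mode requires all clients to be luminous or newer):

    ceph balancer status
    ceph osd set-require-min-compat-client luminous   # prerequisite for upmap mode
    ceph balancer mode upmap
    ceph balancer eval                # score the current distribution (lower is better)
    ceph balancer optimize myplan
    ceph balancer eval myplan         # score the proposed plan before running it
    ceph balancer execute myplan
    ceph balancer on                  # or let it keep optimizing automatically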

Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-20 Thread Ilya Dryomov
On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder wrote: > > Hi Cephers, > > > I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to > luminous? > As far as I see there is some luminous related stuff that was > backported, however, > the "ceph features" command just reports "jewel"

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Yehuda Sadeh-Weinraub
That message has been there since 2014. We should lower the log level though. Yehuda On Mon, Aug 20, 2018 at 6:08 AM, David Turner wrote: > In luminous they consolidated a lot of the rgw metadata pools by using > namespace inside of the pools. I would say that the GC pool was consolidated >

Re: [ceph-users] failing to respond to cache pressure

2018-08-20 Thread Zhenshi Zhou
Hi Eugen, I think it does have a positive effect on the messages, since I get fewer messages than before. Eugen Block wrote on Monday, 20 August 2018 at 21:29: > Update: we are getting these messages again. > > So the search continues... > > > Zitat von Eugen Block : > > > Hi, > > > > Depending on your kernel

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Yehuda Sadeh-Weinraub
There was an existing bug reported for this one, and it's fixed on master: http://tracker.ceph.com/issues/23801 It will be backported to luminous and mimic. On Mon, Aug 20, 2018 at 9:25 AM, Yehuda Sadeh-Weinraub wrote: > That message has been there since 2014. We should lower the log level

Re: [ceph-users] Removing all rados objects based on a prefix

2018-08-20 Thread Wido den Hollander
On 08/20/2018 05:20 PM, David Turner wrote: > The general talk about the rados cleanup command is to clean things up > after benchmarking.  Could this command also be used for deleting an old > RGW bucket or an RBD.  For instance, a bucket with a prefix of >

Re: [ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-20 Thread Dan van der Ster
On Mon, Aug 20, 2018 at 5:37 PM Ilya Dryomov wrote: > > On Mon, Aug 20, 2018 at 4:52 PM Dietmar Rieder > wrote: > > > > Hi Cephers, > > > > > > I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to > > luminous? > > As far as I see there is some luminous related stuff that was >

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread Stefan Priebe - Profihost AG
On 20.08.2018 at 22:13, David Turner wrote: > You might just have too much data per PG. If a single PG can account > for 4% of your OSD, then a 9% difference in used space on your OSDs is > caused by an OSD having only 2 more PGs than another OSD. If you do > have very large PGs, increasing your

Re: [ceph-users] what is Implicated osds

2018-08-20 Thread Brad Hubbard
On Tue, Aug 21, 2018 at 2:37 AM, Satish Patel wrote: > Folks, > > Today i found ceph -s is really slow and just hanging for minute or 2 > minute to give me output also same with "ceph osd tree" output, > command just hanging long time to give me output.. > > This is what i am seeing output, one

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread David Turner
I didn't ask how many PGs you have per OSD, I asked how large your PGs are in comparison to your OSDs. For instance, my primary data pool in my home cluster has 10914GB of data in it and 256 PGs. That means that each PG accounts for 42GB of data. I'm using 5TB disks in this cluster. Each PG on an

Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-20 Thread Kees Meijs
Hi again, I'm starting to feel really unlucky here... At the moment, the situation is "sort of okay": 1387 active+clean, 11 active+clean+inconsistent, 7 active+recovery_wait+degraded, 1

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread Stefan Priebe - Profihost AG
On 20.08.2018 at 21:52, Sage Weil wrote: > On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote: >> Hello, >> >> since Loic seems to have left Ceph development and his wonderful crush >> optimization tool isn't working anymore I'm trying to get a good >> distribution with the ceph balancer.

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread Stefan Priebe - Profihost AG
On 20.08.2018 at 22:38, Dan van der Ster wrote: > On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG > wrote: >> >> >> On 20.08.2018 at 21:52, Sage Weil wrote: >>> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote: Hello, since Loic seems to have left ceph

Re: [ceph-users] ceph balancer: further optimizations?

2018-08-20 Thread David Turner
You might just have too much data per PG. If a single PG can account for 4% of your OSD, then 9% difference in used space on your OSDs is caused by an OSD having only 2 more PGs than another OSD. If you do have very large PGs, increasing your PG count in those pools should improve your data
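
A minimal sketch of the PG increase being suggested (pool name and target count are placeholders; raise pg_num in steps and follow with pgp_num, which is what actually moves data):

    ceph osd pool get rbd pg_num          # check the current value
    ceph osd pool set rbd pg_num 2048     # increase gradually, not in one big jump
    ceph osd pool set rbd pgp_num 2048    # placement follows only once pgp_num is raised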

Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time

2018-08-20 Thread Kees Meijs
Hi there, A few hours ago I started the given OSD again and gave it weight 1.0. Backfilling started and more PGs became active+clean. After a while the same crashing behaviour started to act up so I stopped the backfilling. Running with noout,nobackfill,norebalance,noscrub,nodeep-scrub
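
For reference, the flags mentioned above are cluster-wide and are toggled like this (a sketch):

    ceph osd set noout
    ceph osd set nobackfill
    ceph osd set norebalance
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # and later, once things look healthy again:
    ceph osd unset nobackfill
    ceph osd unset norebalance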

Re: [ceph-users] what is Implicated osds

2018-08-20 Thread Satish Patel
Thanks Brad, This is what I found: the issue was MTU. I had set MTU 9000 on all my OSD nodes and the mon node, but somehow it got reverted back to 1500 on the mon node. The mismatched MTU caused some strange communication issues between the osd and mon nodes. After fixing the MTU on the mon, things started
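
For anyone hitting the same symptom, the mismatch is easy to verify and fix at runtime (the interface name is a placeholder; make the change persistent in your network configuration as well):

    ip link show dev eth0 | grep mtu      # check the effective MTU on every node
    ip link set dev eth0 mtu 9000         # runtime fix on the node that reverted
    ping -M do -s 8972 <peer-ip>          # confirm jumbo frames pass end to end (9000 minus 28 bytes of headers)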

Re: [ceph-users] packages names for ubuntu/debian

2018-08-20 Thread Alfredo Daniel Rezinovsky
On 20/08/18 03:50, Bastiaan Visser wrote: you should only use the 18.04 repo in 18.04, and remove the 16.04 repo. use: https://download.ceph.com/debian-luminous bionic main - Bastiaan Right. But if I came from a working 16.04 system upgraded to 18.04 the ceph (xenial) packages are already

Re: [ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Kees Meijs
Hi Lincoln, We're looking at (now existing) RBD support using KVM/QEMU, so this is an upgrade path. Regards, Kees On 20-08-18 16:37, Lincoln Bryant wrote: What interfaces do your Hammer clients need? If you're looking at CephFS, we have had reasonable success moving our older clients (EL6)

Re: [ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Lincoln Bryant
Hi Kees, What interfaces do your Hammer clients need? If you're looking at CephFS, we have had reasonable success moving our older clients (EL6) to NFS Ganesha with the Ceph FSAL. --Lincoln On Mon, 2018-08-20 at 12:22 +0200, Kees Meijs wrote: > Good afternoon Cephers, > > While I'm fixing our

Re: [ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread Jonathan Proulx
On Mon, Aug 20, 2018 at 06:13:26AM -0400, David Turner wrote: :There is a thread from the ceph-large users ML that covered a way to do :this change without shifting data for an HDD only cluster. Hopefully it :will be helpful for you. : :

Re: [ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread Eugen Block
The correct URL should be: http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000106.html Zitat von Jonathan Proulx : On Mon, Aug 20, 2018 at 06:13:26AM -0400, David Turner wrote: :There is a thread from the ceph-large users ML that covered a way to do :this change without

[ceph-users] cephfs client version in RedHat/CentOS 7.5

2018-08-20 Thread Dietmar Rieder
Hi Cephers, I wonder if the cephfs client in RedHat/CentOS 7.5 will be updated to luminous? As far as I see there is some luminous related stuff that was backported, however, the "ceph features" command just reports "jewel" as release of my cephfs clients running CentOS 7.5 (kernel

Re: [ceph-users] packages names for ubuntu/debian

2018-08-20 Thread Bastiaan Visser
you should only use the 18.04 repo in 18.04, and remove the 16.04 repo. use: https://download.ceph.com/debian-luminous bionic main - Bastiaan - Original Message - From: "Alfredo Daniel Rezinovsky" To: "ceph-users" Sent: Sunday, August 19, 2018 10:15:00 PM Subject: [ceph-users]

Re: [ceph-users] Invalid Object map without flags set

2018-08-20 Thread Glen Baars
Hello K, We have found our issue: we were only fixing the main RBD image in our script rather than the snapshots. It is working fine now. Thanks for your help. Kind regards, Glen Baars From: Konstantin Shalygin Sent: Friday, 17 August 2018 11:20 AM To: ceph-users@lists.ceph.com; Glen Baars

Re: [ceph-users] Librados Keyring Issues

2018-08-20 Thread Benjamin Cherian
OK... after a bit more searching, I realized you can specify the username directly in the constructor of the "Rados" object. I'm still not entirely clear how one would do it through the config file, but this works for me as well. import rados cluster = rados.Rados(conffile="python_ceph.conf",

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Hi David, Thanks for your advice. My end goal is BlueStore, so upgrading to Jewel and then Luminous would be ideal. Currently all monitors are (successfully) running Infernalis, one OSD node is running Infernalis and all other OSD nodes have Hammer. I'll try freeing up one Infernalis OSD at

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Bad news: I've got a PG stuck in down+peering now. Please advise. K. On 20-08-18 12:12, Kees Meijs wrote: > Thanks for your advice. My end goal is BlueStore, so upgrading to Jewel > and then Luminous would be ideal. > > Currently all monitors are (successfully) running Infernalis, one OSD > node

Re: [ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread David Turner
All of the data moves because all of the crush IDs for the hosts and osds changes when you configure a crush rule to only use SSDs or HDDs. Crush creates shadow hosts and shadow osds in the crush map that only have each class of osd. So if you had node1 with osd.0 as an hdd and osd.1 as an SSD,
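
On a Luminous cluster the shadow hierarchy David describes can be inspected directly (a sketch):

    ceph osd crush tree --show-shadow     # shows per-class shadow buckets such as default~hdd
    ceph osd crush class ls               # device classes known to the cluster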

[ceph-users] BlueStore sizing

2018-08-20 Thread Harald Staub
As mentioned here recently, the sizing recommendations for BlueStore have been updated: http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing In our ceph cluster, we have some ratios that are much lower, like 20GB of SSD (WAL and DB) per 7TB of spinning space. This
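
For comparison, the DB/WAL split is fixed at OSD creation time; a minimal ceph-volume sketch (device names are placeholders, and the DB partition size is whatever was carved out on the SSD/NVMe):

    # spinning data device, DB (and implicitly WAL) on a faster partition
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1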

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
The given PG is back online, phew... Meanwhile, some OSDs still on Hammer seem to crash with errors like: > 2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In > function 'void ReplicatedPG::scan_range(int, int, > PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700

[ceph-users] Ensure Hammer client compatibility

2018-08-20 Thread Kees Meijs
Good afternoon Cephers, While I'm fixing our upgrade-semi-broken cluster (see the thread "Upgrade to Infernalis: failed to pick suitable auth object") I'm wondering about ensuring client compatibility. My end goal is BlueStore (i.e. running Luminous) and unfortunately I'm obliged to offer Hammer
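
For readers with the same question, the knobs usually checked when older clients must keep working are roughly these (a sketch, not the answer given in the thread):

    ceph features                                      # what the connected clients actually report
    ceph osd crush show-tunables | grep required       # which crush tunables profile is in effect
    ceph osd set-require-min-compat-client hammer      # refuse changes that would lock out hammer clients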

[ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread Enrico Kern
Hello, right now we have multiple HDD-only clusters with either filestore journals on SSDs or, on newer installations, WAL etc. on SSD. I plan to extend our ceph clusters with SSDs to provide SSD-only pools. In Luminous we have device classes, so I should be able to do this without editing
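
For reference, the device-class route in Luminous looks roughly like this (rule and pool names are placeholders; note the data-movement caveat discussed elsewhere in this thread):

    ceph osd crush rule create-replicated replicated_hdd default host hdd
    ceph osd crush rule create-replicated replicated_ssd default host ssd
    ceph osd pool set mypool crush_rule replicated_hdd    # pin an existing pool to hdd OSDs only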

Re: [ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread Enrico Kern
Hmm, then it is not really an option for me. Maybe someone from the devs can shed some light on why it is doing a migration as long as you only have OSDs of the same class? I have a few petabytes of storage in each cluster. When it starts migrating everything again, that will result in a super big performance

Re: [ceph-users] Librados Keyring Issues

2018-08-20 Thread David Turner
It isn't possible in the config file. You have to do it via the rados constructor. You came to the correct conclusion. On Mon, Aug 20, 2018, 2:59 AM Benjamin Cherian wrote: > Ok...after a bit more searching. I realized you can specify the username > directly in the constructor of the "Rados"

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Jakub Jaszewski
Issue tracker http://tracker.ceph.com/issues/23801. Still don't know why only particular OSDs write this information to log files. Jakub On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski wrote: > Hi All, exactly the same story today, same 8 OSDs and a lot of garbage > collection objects to

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread David Turner
I'm assuming you use RGW and that you have a GC pool for RGW. It also might be that your GC pool only has 8 PGs. Are any of those guesses correct? On Mon, Aug 20, 2018, 5:13 AM Jakub Jaszewski wrote: > Issue tracker http://tracker.ceph.com/issues/23801. > Still don't know why only

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Hi again, Overnight some other PGs turned out to be inconsistent as well after being deep scrubbed. All affected OSDs log similar errors, like: > log [ERR] : 3.13 soid -5/0013/temp_3.13_0_16175425_287/head: > failed to pick suitable auth object Since there's "temp" in the name and we're running a

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread Kees Meijs
Ehrm, that should of course be rebuilding. (I.e. removing the OSD, reformat, re-add.) On 20-08-18 11:51, Kees Meijs wrote: > Since there's temp in the name and we're running a 3-replica cluster, > I'm thinking of just reboiling the comprised OSDs.

Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object

2018-08-20 Thread David Turner
My suggestion would be to remove the osds and let the cluster recover from all of the other copies. I would deploy the node back to Hammer instead of Infernalis. Either that or remove these osds, let the cluster backfill, and then upgrade to Jewel, and then luminous, and maybe mimic if you're
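
A minimal sketch of the removal being suggested, pre-Luminous style (the OSD id 12 is a placeholder; wait for backfill to finish and the cluster to report healthy between steps):

    ceph osd out 12                     # start draining; let backfill complete
    # stop the ceph-osd daemon for osd.12 on its host (init system depends on the release)
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12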

Re: [ceph-users] Set existing pools to use hdd device class only

2018-08-20 Thread Marc Roos
I just recently did the same. Take into account that everything starts migrating. However weird it may be, I had an hdd-only test cluster and changed the crush rule to hdd; it took a few days, totally unnecessarily as far as I am concerned. -Original Message- From: Enrico Kern

Re: [ceph-users] missing dependency in ubuntu packages

2018-08-20 Thread John Spray
On Sun, Aug 19, 2018 at 9:21 PM Alfredo Daniel Rezinovsky wrote: > > both in ubuntu 16.04 and 18.04 ceph-mgr fails to start when the package > python-routes is not installed I guess you mean that the dashboard doesn't work, as opposed to the whole ceph-mgr process not starting? If it's the latter

Re: [ceph-users] BlueStore sizing

2018-08-20 Thread Paul Emmerich
It will automatically spill over to the slower storage if necessary; it's better to have some fast storage for the DB than just slow storage. Paul 2018-08-20 13:07 GMT+02:00 Harald Staub : > As mentioned here recently, the sizing recommendations for BlueStore have > been updated: >

Re: [ceph-users] FreeBSD rc.d script: sta.rt not found

2018-08-20 Thread Norman Gray
Willem Jan, hello. On 16 Aug 2018, at 12:07, Willem Jan Withagen wrote: In the meantime I have uploaded a PR to fix this in the manual, which should read: gpart create -s GPT ada1 gpart add -t freebsd-zfs -l osd.1 ada1 zpool create osd.1 gpt/osd.1 zfs create -o

Re: [ceph-users] Questions on CRUSH map

2018-08-20 Thread Cody
Hi Konstantin, Thank you for looking into my question. I was trying to understand how to set up CRUSH hierarchies and set rules for different failure domains. I am particularly confused by the 'step take' and 'step choose|chooseleaf' settings, which I think are the keys to defining a failure
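
For illustration, a small rule in decompiled crushmap syntax (assuming a hierarchy with rack buckets under the default root) showing how 'step take' picks the subtree to start from and 'step chooseleaf' sets the failure domain:

    rule replicated_racks {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class hdd          # start at the 'default' root, hdd devices only
        step chooseleaf firstn 0 type rack   # pick one OSD (leaf) in each of N distinct racks
        step emit
    }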

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread David Turner
In luminous they consolidated a lot of the rgw metadata pools by using namespace inside of the pools. I would say that the GC pool was consolidated into the log pool based on the correlation you've found with the primary osds. At least that mystery is solved as to why those 8 osds. I don't know
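
To confirm this locally, the gc namespace and the PGs (and hence primary OSDs) behind it can be inspected like this (the pool name assumes default zone naming):

    rados -p default.rgw.log -N gc ls           # gc objects live in the 'gc' namespace of the log pool
    ceph pg ls-by-pool default.rgw.log          # shows which OSDs act as primaries for those PGs
    ceph osd pool get default.rgw.log pg_num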

Re: [ceph-users] failing to respond to cache pressure

2018-08-20 Thread Eugen Block
Update: we are getting these messages again. So the search continues... Zitat von Eugen Block : Hi, Depending on your kernel (memory leaks with CephFS) increasing the mds_cache_memory_limit could be of help. What is your current setting now? ceph:~ # ceph daemon mds. config show |
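
For the record, the setting being discussed can be checked and raised at runtime roughly like this (the mds name and the 4 GB value are only examples):

    ceph daemon mds.<name> config show | grep mds_cache_memory_limit
    ceph tell mds.<name> injectargs '--mds_cache_memory_limit=4294967296'   # 4 GB, runtime only
    # to persist it, set "mds cache memory limit = 4294967296" under [mds] in ceph.conf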

Re: [ceph-users] Mimic osd fails to start.

2018-08-20 Thread Daznis
Hello, It appears that something is horribly wrong with the cluster itself. I can't create or add any new osds to it at all. On Mon, Aug 20, 2018 at 11:04 AM Daznis wrote: > > Hello, > > > Zapping the journal didn't help. I tried to create the journal after > zapping it. Also failed. I'm not