Re: [ceph-users] balancer mgr module

2018-02-16 Thread Dan van der Ster
Hi Caspar, I've been trying the mgr balancer for a couple weeks now and can share some experience. Currently there are two modes implemented: upmap and crush-compat. Upmap requires all clients to be running luminous -- it uses this new pg-upmap mechanism to precisely move PGs one by one to a
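
For reference, turning the balancer on in upmap mode on a Luminous cluster generally looks like the sketch below (upmap requires that all clients speak Luminous, hence the min-compat step):

    ceph osd set-require-min-compat-client luminous   # refuse pre-Luminous clients so upmap is safe
    ceph mgr module enable balancer                   # usually already on in 12.2.x
    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status                              # shows the active mode and plan state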

[ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Ansgar Jazdzewski
Hi Folks, I'm just trying to get the prometheus plugin up and running, but as soon as I browse /metrics I get: 500 Internal Server Error The server encountered an unexpected condition which prevented it from fulfilling the request. Traceback (most recent call last): File
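
For anyone hitting this, the plugin itself is enabled as follows; 9283 is the module's default listen port (a sketch, with the mgr hostname as a placeholder):

    ceph mgr module enable prometheus
    curl http://<active-mgr-host>:9283/metrics   # should return the metric dump rather than a 500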

Re: [ceph-users] Monitor won't upgrade

2018-02-16 Thread Mark Schouten
On Friday, 16 February 2018 00:21:34 CET, Gregory Farnum wrote: > If mon.0 is not connected to the cluster, the monitor version report won’t > update — how could it? > > So you need to figure out why that’s not working. A monitor that’s running > but isn’t part of the active set is not good.

[ceph-users] balancer mgr module

2018-02-16 Thread Caspar Smit
Hi, After watching Sage's talk at LinuxConfAU about making distributed storage easy, he mentioned the Balancer Manager module. After enabling this module, PGs should get balanced automagically around the cluster. The module was added in Ceph Luminous v12.2.2. Since I couldn't find much

[ceph-users] radosgw: Huge Performance impact during dynamic bucket index resharding

2018-02-16 Thread Micha Krause
Hi, Radosgw decided to reshard a bucket with 25 million objects from 256 to 512 shards. Resharding took about 1 hour; during this time all buckets on the cluster had a huge performance drop. "GET" requests for small objects (on other buckets) took multiple seconds. Are there any
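
One common mitigation (a sketch, not taken from this thread; check the exact option spelling for your release) is to turn off automatic resharding and reshard large buckets manually during a quiet window:

    # ceph.conf on the rgw hosts
    rgw dynamic resharding = false

    # then reshard by hand when load is low
    radosgw-admin reshard add --bucket=<bucket> --num-shards=512
    radosgw-admin reshard process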

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Konstantin Shalygin
> i just try to get the prometheus plugin up and running Use the module from master. From this commit it should work with 12.2.2; just wget it and replace the stock module. https://github.com/ceph/ceph/blob/d431de74def1b8889ad568ab99436362833d063e/src/pybind/mgr/prometheus/module.py k
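
Roughly, the replacement goes like this (use the raw file, not the HTML blob page; the module path is an assumption and differs per distro, commonly /usr/lib64/ceph/mgr/ on RPM-based systems):

    wget -O module.py https://raw.githubusercontent.com/ceph/ceph/d431de74def1b8889ad568ab99436362833d063e/src/pybind/mgr/prometheus/module.py
    cp module.py /usr/lib64/ceph/mgr/prometheus/module.py   # path assumption; check mgr_module_path for your install
    ceph mgr module disable prometheus
    ceph mgr module enable prometheus                       # reload the module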

Re: [ceph-users] Luminous and calamari

2018-02-16 Thread Lenz Grimmer
On 02/16/2018 07:16 AM, Kai Wagner wrote: > yes there are plans to add management functionality to the dashboard as > well. As soon as we've covered all the existing functionality to create > the initial PR, we'll start with the management stuff. The big benefit > here is that we can profit from what

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Jan Fajerski
On Fri, Feb 16, 2018 at 09:27:08AM +0100, Ansgar Jazdzewski wrote: Hi Folks, I'm just trying to get the prometheus plugin up and running, but as soon as I browse /metrics I get: 500 Internal Server Error The server encountered an unexpected condition which prevented it from fulfilling the request.

Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?

2018-02-16 Thread Robin H. Johnson
On Fri, Feb 16, 2018 at 07:06:21PM -0600, Graham Allan wrote: [snip great debugging] This seems similar to two open issues, could be either of them depending on how old that bucket is. http://tracker.ceph.com/issues/22756 http://tracker.ceph.com/issues/22928 - I have a mitigation posted to

Re: [ceph-users] High Load and High Apply Latency

2018-02-16 Thread John Petrini
I thought I'd follow up on this just in case anyone else experiences similar issues. We ended up increasing the tcmalloc thread cache size and saw a huge improvement in latency. This got us out of the woods because we were finally in a state where performance was good enough that it was no longer
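
For reference, the tcmalloc thread cache is usually raised via an environment variable picked up by the Ceph unit files (a sketch; the file path depends on the distro), followed by an OSD restart:

    # /etc/sysconfig/ceph (RHEL/CentOS) or /etc/default/ceph (Debian/Ubuntu)
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128 MiB, up from the 32 MiB default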

Re: [ceph-users] Newbie question: stretch ceph cluster

2018-02-16 Thread Alex Gorbachev
On Wed, Feb 14, 2018 at 3:20 AM Maged Mokhtar wrote: > Hi, > > You need to set the min_size to 2 in crush rule. > > The exact location and replication flow when a client writes data depends > on the object name and num of pgs. the crush rule determines which osds > will

[ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?

2018-02-16 Thread Graham Allan
Sorry to be posting a second mystery at the same time - though this feels unconnected to my other one. We had a user complain that they can't list the contents of one of their buckets (they can access certain objects within the bucket). I started by running a simple command to get data on

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Ansgar Jazdzewski
Thanks, I will have a look into it -- Ansgar 2018-02-16 10:10 GMT+01:00 Konstantin Shalygin : >> i just try to get the prometheus plugin up and running > > > > Use module from master. > > From this commit should work with 12.2.2, just wget it and replace stock > module. > >

[ceph-users] libvirt on ceph - external snapshots?

2018-02-16 Thread João Pagaime
Hello all, I have a VM system with libvirt/KVM on top of a Ceph storage system. I can't take an external snapshot (disk + RAM) of a running VM. The option view->snapshots is disabled in the virt-manager application. On another VM, on the same hypervisor, that runs on local storage, I can take

Re: [ceph-users] Migrating to new pools

2018-02-16 Thread Jens-U. Mozdzen
Dear list, hello Jason, you may have seen my message on the Ceph mailing list about RBD pool migration - it's a common subject that pools were created in a sub-optimal fashion and i.e. pg_num is (not yet) reducible, so we're looking into means to "clone" an RBD pool into a new pool within

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Ansgar Jazdzewski
Hi, while I added the "class" to all my OSDs the ceph-mgr crashed :-( but the prometheus plugin works now. for i in {1..9}; do ceph osd crush set-device-class hdd osd.$i; done Thanks, Ansgar 2018-02-16 10:12 GMT+01:00 Jan Fajerski : > On Fri, Feb 16, 2018 at 09:27:08AM

Re: [ceph-users] Efficient deletion of large radosgw buckets

2018-02-16 Thread Sean Purdy
Thanks David. > purging the objects and bypassing the GC is definitely the way to go Cool. > What rebalancing do you expect to see during this operation that you're > trying to avoid I think I just have a poor understanding or wasn't thinking very hard :) I suppose the question really was
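
For reference, the operation being discussed is along these lines (a sketch; verify the flags against your radosgw-admin version before running it):

    radosgw-admin bucket rm --bucket=<bucket-name> --purge-objects --bypass-gc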

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread Ansgar Jazdzewski
We upgraded the cluster from Jewel to Luminous, but the restart of the ceph-osd service did not add the 'CLASS (hdd/ssd)' on its own if it did not exist, so I had to add it myself to make it work. Shouldn't this be mentioned somewhere in the upgrade process? Like: for all OSDs, make sure that the "class" is set
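
A quick way to check (and fix) this after the upgrade, as a sketch:

    ceph osd crush class ls                     # list known device classes
    ceph osd tree                               # the CLASS column should show hdd/ssd per OSD
    ceph osd crush set-device-class hdd osd.3   # set it manually where it is missing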

[ceph-users] puppet for the deployment of ceph

2018-02-16 Thread Александр Пивушков
Colleagues, please tell me: who uses Puppet for the deployment of Ceph in production? And also, where can I get the Puppet modules for Ceph? Александр Пивушков

Re: [ceph-users] Migrating to new pools

2018-02-16 Thread Jason Dillaman
On Fri, Feb 16, 2018 at 5:36 AM, Jens-U. Mozdzen wrote: > Dear list, hello Jason, > > you may have seen my message on the Ceph mailing list about RBD pool > migration - it's a common subject that pools were created in a sub-optimal > fashion and i. e. pgnum is (not yet)

[ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-16 Thread Flemming Frandsen
I'm trying out cephfs and I'm in the process of copying over some real-world data to see what happens. I have created a number of cephfs file systems; the only one I've started working on is the one named jenkins, which lives in fs_jenkins_data and

Re: [ceph-users] Migrating to new pools

2018-02-16 Thread Jason Dillaman
On Fri, Feb 16, 2018 at 8:08 AM, Jason Dillaman wrote: > On Fri, Feb 16, 2018 at 5:36 AM, Jens-U. Mozdzen wrote: >> Dear list, hello Jason, >> >> you may have seen my message on the Ceph mailing list about RBD pool >> migration - it's a common subject that

Re: [ceph-users] balancer mgr module

2018-02-16 Thread Caspar Smit
2018-02-16 10:16 GMT+01:00 Dan van der Ster : > Hi Caspar, > > I've been trying the mgr balancer for a couple weeks now and can share > some experience. > > Currently there are two modes implemented: upmap and crush-compat. > > Upmap requires all clients to be running

Re: [ceph-users] Ceph-mgr Python error with prometheus plugin

2018-02-16 Thread John Spray
On Fri, Feb 16, 2018 at 8:27 AM, Ansgar Jazdzewski wrote: > Hi Folks, > > i just try to get the prometheus plugin up and running but as soon as i > browse /metrics i got: > > 500 Internal Server Error > The server encountered an unexpected condition which prevented it

[ceph-users] ceph luminous - ceph tell osd bench performance

2018-02-16 Thread Steven Vacaroaia
Hi, For every CONSECUTIVE run of the "ceph tell osd.x bench" command I get different and MUCH worse results. Is this expected? The OSD was created with the following command (/dev/sda is an enterprise-class SSD): ceph-deploy osd create --zap-disk --bluestore osd01:sdc --block-db /dev/sda
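
For reference, the command and its defaults look like this (a sketch, with osd.0 as the example):

    ceph tell osd.0 bench                      # default: write 1 GiB in 4 MiB blocks
    ceph tell osd.0 bench 1073741824 4194304   # same parameters spelled out explicitly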

Re: [ceph-users] Migrating to new pools

2018-02-16 Thread Eugen Block
Hi Jason, ... also forgot to mention "rbd export --export-format 2" / "rbd import --export-format 2" that will also deeply export/import all snapshots associated with an image and that feature is available in the Luminous release. thanks for that information, this could be very valuable for
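
As a sketch, a deep copy between pools with this feature can be piped directly (pool and image names are placeholders):

    rbd export --export-format 2 oldpool/myimage - | rbd import --export-format 2 - newpool/myimage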

Re: [ceph-users] Migrating to new pools

2018-02-16 Thread Jason Dillaman
On Fri, Feb 16, 2018 at 11:20 AM, Eugen Block wrote: > Hi Jason, > >> ... also forgot to mention "rbd export --export-format 2" / "rbd >> import --export-format 2" that will also deeply export/import all >> snapshots associated with an image and that feature is available in >> the

Re: [ceph-users] Monitor won't upgrade

2018-02-16 Thread Mark Schouten
On Friday, 16 February 2018 13:33:32 CET, David Turner wrote: > Can you send us a `ceph status` and `ceph health detail`? Something is > still weird. Also can you query the running daemon for its version instead > of asking the cluster? You should also be able to find it in the logs when > it

Re: [ceph-users] Is the minimum length of a part in a RGW multipart upload configurable?

2018-02-16 Thread Casey Bodley
On 02/16/2018 12:39 AM, F21 wrote: I am uploading parts to RGW using the S3 multipart upload functionality. I tried uploading a part sized at 500 KB and received a EntityTooSmall error from the server. I am assuming that it expects each part to have a minimum size of 5MB like S3. I found
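
If it helps, the threshold is an RGW config option rather than a hard-coded S3 constant (a sketch; confirm the exact option name for your release):

    # ceph.conf, rgw section -- default is 5 MiB, mirroring S3's minimum part size
    rgw multipart min part size = 5242880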

Re: [ceph-users] Monitor won't upgrade

2018-02-16 Thread David Turner
Can you send us a `ceph status` and `ceph health detail`? Something is still weird. Also, can you query the running daemon for its version instead of asking the cluster? You should also be able to find it in the logs when it starts. On Fri, Feb 16, 2018, 4:24 AM Mark Schouten
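
For reference, a couple of ways to ask the daemon itself rather than the cluster map (run the first on the monitor host):

    ceph daemon mon.$(hostname -s) version   # via the local admin socket
    ceph versions                            # Luminous: version summary of all running daemons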

[ceph-users] mon service failed to start

2018-02-16 Thread Behnam Loghmani
Hi there, I have a Ceph cluster version 12.2.2 on CentOS 7. It is a testing cluster and I have set it up 2 weeks ago. after some days, I see that one of the three mons has stopped(out of quorum) and I can't start it anymore. I checked the mon service log and the output shows this error: """

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread Bryan Banister
Well I decided to try the increase in PGs to 4096 and that seems to have caused some issues: 2018-02-16 12:38:35.798911 mon.carf-ceph-osd01 [ERR] overall HEALTH_ERR 61802168/241154376 objects misplaced (25.628%); Reduced data availability: 2081 pgs inactive, 322 pgs peering; Degraded data

[ceph-users] Ceph Crush for 2 room setup

2018-02-16 Thread Karsten Becker
Hi. I want to run my Ceph cluster in a 2 datacenter/room setup with pool size/replica 3. But I can't manage to define the ruleset correctly - or at least I am unsure whether it is correct. I have the following setup of my Ceph cluster: > ID CLASS WEIGHT TYPE NAME
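
A common pattern for a size-3 pool across two rooms is a rule like the sketch below (rule name and id are hypothetical); it picks two rooms and up to two hosts in each, so the first three OSDs chosen end up split 2+1 across the rooms:

    rule replicated_two_rooms {
            id 1
            type replicated
            min_size 2
            max_size 4
            step take default
            step choose firstn 2 type room
            step chooseleaf firstn 2 type host
            step emit
    }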

[ceph-users] Orphaned entries in Crush map

2018-02-16 Thread Karsten Becker
Hi. During the reorganization of my cluster I removed some OSDs. Obviously something went wrong for 2 of them, osd.19 and osd.20. If I get my current Crush map, decompile and edit it, I see 2 orphaned/stale entries for the former OSDs: > device 16 osd.16 class hdd > device 17 osd.17 class hdd

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread Bryan Banister
Thanks David, I have set the nobackfill, norecover, noscrub, and nodeep-scrub options at this point and the backfills have stopped. I’ll also stop the backups from pushing into Ceph for now. I don’t want to make things worse, so I’m asking for some more guidance now. 1) In looking at a PG
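
For reference, those flags are toggled like this (unset them again once the cluster has settled):

    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # later: ceph osd unset nobackfill   (and so on for each flag)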

Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-16 Thread Gregory Farnum
What does the cluster deployment look like? Usually this happens when you’re sharing disks with the OS, or have co-located file journals or something. On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen < flemming.frand...@stibosystems.com> wrote: > I'm trying out cephfs and I'm in the process of

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Michel Raabe
On 02/16/18 @ 18:59, Nico Schottelius wrote: > Saw that, too, however it does not work: > > root@server3:/var/lib/ceph/mon/ceph-server3# ceph -n mon. --keyring keyring > auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *' > 2018-02-16 17:23:38.154282 7f7e257e3700 0 librados: mon.

Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-02-16 Thread Graham Allan
On 02/16/2018 12:31 PM, Graham Allan wrote: If I set debug rgw=1 and debug ms=1 before running the "object stat" command, it seems to stall in a loop of trying to communicate with osds for pool 96, which is .rgw.control 10.32.16.93:0/2689814946 --> 10.31.0.68:6818/8969 --

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
Your problem might have been creating too many PGs at once. I generally increase pg_num and pgp_num by no more than 256 at a time, making sure that all PGs are created, peered, and healthy (other than backfilling). To help you get back to a healthy state, let's start off by getting all of your
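
In command form, one such step looks like the sketch below (pool name and numbers are placeholders); pgp_num follows once the new PGs have peered:

    ceph osd pool set <pool> pg_num 2304    # e.g. 2048 + 256
    # wait until the new PGs are created and healthy apart from backfill, then:
    ceph osd pool set <pool> pgp_num 2304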

Re: [ceph-users] mon service failed to start

2018-02-16 Thread Gregory Farnum
On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani wrote: > Hi there, > > I have a Ceph cluster version 12.2.2 on CentOS 7. > > It is a testing cluster and I have set it up 2 weeks ago. > after some days, I see that one of the three mons has stopped(out of > quorum) and I

Re: [ceph-users] Bluestore Hardwaresetup

2018-02-16 Thread Michel Raabe
Hi Peter, On 02/15/18 @ 19:44, Jan Peters wrote: > I want to evaluate ceph with bluestore, so I need some hardware/configure > advices from you. > > My Setup should be: > > 3 Nodes Cluster, on each with: > > - Intel Gold Processor SP 5118, 12 core / 2.30Ghz > - 64GB RAM > - 6 x 7,2k, 4 TB

Re: [ceph-users] Bluestore Hardwaresetup

2018-02-16 Thread Joe Comeau
I have a question about block.db and block.wal. How big should they be? Relative to drive size or SSD size? Thanks Joe >>> Michel Raabe 2/16/2018 9:12 AM >>> Hi Peter, On 02/15/18 @ 19:44, Jan Peters wrote: > I want to evaluate ceph with bluestore, so I need some

[ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
Hello, on a test cluster I issued a few seconds ago: ceph auth caps client.admin mgr 'allow *' instead of what I really wanted to do ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \ mds allow Now any access to the cluster using client.admin correctly results in
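
For reference, once you can authenticate with a key that is still allowed to change caps (for example the mon. key), restoring the full admin capabilities is a single command:

    ceph auth caps client.admin mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'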

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
Saw that, too, however it does not work: root@server3:/var/lib/ceph/mon/ceph-server3# ceph -n mon. --keyring keyring auth caps client.admin mds 'allow *' osd 'allow *' mon 'allow *' 2018-02-16 17:23:38.154282 7f7e257e3700 0 librados: mon. authentication error (13) Permission denied [errno

Re: [ceph-users] Bluestore Hardwaresetup

2018-02-16 Thread Jan Peters
Hi, thank you. The network setup is like this: 2 x 10 GBit LACP for public, 2 x 10 GBit LACP for the cluster network, 1 x 1 GBit for management. Yes Joe, the sizing for block.db and block.wal would be interesting! Is there any other advice on SSDs, like the blog from Sébastien Han?:

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Michel Raabe
On 02/16/18 @ 18:21, Nico Schottelius wrote: > on a test cluster I issued a few seconds ago: > > ceph auth caps client.admin mgr 'allow *' > > instead of what I really wanted to do > > ceph auth caps client.admin mgr 'allow *' mon 'allow *' osd 'allow *' \ > mds allow > > Now any access

Re: [ceph-users] Luminous and calamari

2018-02-16 Thread Ronny Aasen
On 16.02.2018 06:20, Laszlo Budai wrote: Hi, I've just started up the dashboard component of the ceph mgr. It looks OK, but from what can be seen, and what I was able to find in the docs, the dashboard is just for monitoring. Is there any plugin that allows management of the ceph resources

Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-02-16 Thread Graham Allan
On 02/15/2018 05:33 PM, Gregory Farnum wrote: On Thu, Feb 15, 2018 at 3:10 PM Graham Allan wrote: A lot more in xattrs which I won't paste, though the keys are: > root@cephmon1:~# ssh ceph03 find

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread David Turner
What is the output of `ceph osd stat`? My guess is that they are still considered to be part of the cluster and going through the process of removing OSDs from your cluster is what you need to do. In particular `ceph osd rm 19`. On Fri, Feb 16, 2018 at 2:31 PM Karsten Becker

Re: [ceph-users] mon service failed to start

2018-02-16 Thread Behnam Loghmani
Thanks for your reply Do you mean, that's the problem with the disk I use for WAL and DB? On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum wrote: > > On Fri, Feb 16, 2018 at 7:37 AM Behnam Loghmani > wrote: > >> Hi there, >> >> I have a Ceph

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
I'll answer the questions I definitely know the answer to first, and then we'll continue from there. If an OSD is blocking peering but is online, when you mark it as down in the cluster it receives a message in its log saying it was wrongly marked down and tells the mons it is online. That gets it to stop

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread David Turner
That sounds like a good next step. Start with the OSDs involved in the longest blocked requests. Wait a couple of minutes after the osd marks itself back up and continue through them. Hopefully things will start clearing up so that you don't need to mark all of them down. There is usually only a

Re: [ceph-users] mon service failed to start

2018-02-16 Thread Gregory Farnum
The disk that the monitor is on...there isn't anything for you to configure about a monitor WAL though so I'm not sure how that enters into it? On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani wrote: > Thanks for your reply > > Do you mean, that's the problem with the

Re: [ceph-users] Restoring keyring capabilities

2018-02-16 Thread Nico Schottelius
It seems your monitor capabilities are different to mine: root@server3:/opt/ungleich-tools# ceph -k /var/lib/ceph/mon/ceph-server3/keyring -n mon. auth list 2018-02-16 20:34:59.257529 7fe0d5c6b700 0 librados: mon. authentication error (13) Permission denied [errno 13] error connecting to the

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread Karsten Becker
Hi David. So far everything else is fine. > 46 osds: 46 up, 46 in; 1344 remapped pgs And the rm gives: > root@kong[/0]:~ # ceph osd rm 19 > osd.19 does not exist. > root@kong[/0]:~ # ceph osd rm 20 > osd.20 does not exist. The "devices" do NOT show up in "ceph osd tree" or "ceph osd df"...

Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-16 Thread Bryan Banister
Thanks David, Taking the list of all OSDs that are stuck reports that a little over 50% of all OSDs are in this condition. There isn’t any discernible pattern that I can find and they are spread across the three servers. All of the OSDs are online as far as the service is concerned. I have

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread Karsten Becker
Here is what I did - bash history: > 1897 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd down osd.$n; done > 1920 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd out osd.$n; done > 1921 for n in 6 7 14 15 16 17 18 19 3 9 10 11 12 20; do ceph osd down osd.$n;

Re: [ceph-users] Orphaned entries in Crush map

2018-02-16 Thread David Turner
First you stop the service, then make sure they're down, out, crush remove, auth del, and finally osd rm. You had it almost in the right order, but you were down and outing them before you stopped them. That would allow them to mark themselves back up and in. The down and out commands don't
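
In command form, the order described above is roughly (osd.19 as the example):

    systemctl stop ceph-osd@19
    ceph osd down 19
    ceph osd out 19
    ceph osd crush remove osd.19
    ceph auth del osd.19
    ceph osd rm 19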

Re: [ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-02-16 Thread Gregory Farnum
On Fri, Feb 16, 2018 at 12:17 PM Graham Allan wrote: > On 02/16/2018 12:31 PM, Graham Allan wrote: > > > > If I set debug rgw=1 and debug ms=1 before running the "object stat" > > command, it seems to stall in a loop of trying to communicate with osds for > > pool 96, which is

Re: [ceph-users] mon service failed to start

2018-02-16 Thread Behnam Loghmani
I checked the disk that the monitor is on with smartctl and it didn't return any error and it doesn't have any Current_Pending_Sector. Do you recommend any disk checks to make sure that this disk has a problem, so that I can send the report to the provider for replacing the disk? On Sat, Feb 17, 2018
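
As a sketch, the usual checks on the suspect device would be (device path is a placeholder):

    smartctl -a /dev/sdX        # health status, error log, reallocated/pending sector counters
    smartctl -t long /dev/sdX   # start an extended self-test; review later with smartctl -a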

Re: [ceph-users] Signature check failures.

2018-02-16 Thread Gregory Farnum
On Thu, Feb 15, 2018 at 10:28 AM Cary wrote: > Hello, > > I have enabled debugging on my MONs and OSDs to help troubleshoot > these signature check failures. I was watching osd.4's log and saw > these errors when the signature check failure happened. > > 2018-02-15