Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?

2019-02-07 Thread Wido den Hollander
On 2/7/19 8:41 AM, Brett Chancellor wrote: > This seems right. You are doing a single benchmark from a single client. > Your limiting factor will be the network latency. For most networks this > is between 0.2 and 0.3ms.  if you're trying to test the potential of > your cluster, you'll need

Re: [ceph-users] v12.2.11 Luminous released

2019-01-31 Thread Wido den Hollander
On 2/1/19 8:44 AM, Abhishek wrote: > We are glad to announce the eleventh bug fix release of the Luminous > v12.2.x long term stable release series. We recommend that all users > upgrade to this release. Please note the following precautions while > upgrading. > > Notable Changes >

Re: [ceph-users] block storage over provisioning

2019-01-30 Thread Wido den Hollander
On 1/30/19 9:12 PM, Void Star Nill wrote: > Hello, > > When a Ceph block device is created with a given size, does Ceph > allocate all that space right away or is that allocated as the user > starts storing the data? > > I want to know if we can over provision the Ceph cluster. For example, >

Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-30 Thread Wido den Hollander
it to me. In a few weeks I'll be performing an expansion with a customer where I'm expecting this to show up again. I'll check again and note the use on all OSDs and report back. Wido > > David > > On 1/27/19 11:34 PM, Wido den Hollander wrote: >> >> On 1/25/19 8:33 AM, Grego

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Wido den Hollander
On 1/30/19 2:02 PM, PHARABOT Vincent wrote: > Hello, > >   > > I have my cluster set up correctly now (thank you again for the help) > >   > > I am seeking now a way to get cluster health thru API (REST) with curl > command. > > I had a look at manager / RESTful and Dashboard but none seems
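
A minimal workaround sketch, in case a full RESTful setup is not wanted (any HTTP wrapper around it is left as an exercise): the ceph CLI can already emit machine-readable health that a small script can expose:

$ ceph health --format json
$ ceph -s --format json
$ ceph health detail --format json-pretty

If the RESTful manager module is used instead, it first needs to be enabled with "ceph mgr module enable restful" and provided with a certificate and an API key before it serves requests.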

Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Wido den Hollander
On 1/30/19 2:00 PM, Fyodor Ustinov wrote: > Hi! > > I thought I should first do "ceph osd out", wait for the end relocation of > the misplaced objects and after that do "ceph osd purge". > But after "purge" the cluster starts relocation again. > > Maybe I'm doing something wrong? Then what
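
One commonly suggested sequence on this list (a sketch; osd.12 is a placeholder) is to drain the OSD via its CRUSH weight first, so the data only moves once, and only mark it out and purge it afterwards:

$ ceph osd crush reweight osd.12 0
# wait until all PGs are active+clean again
$ ceph osd out 12
$ ceph osd purge 12 --yes-i-really-mean-it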

Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-27 Thread Wido den Hollander
ime I saw many PGs in the backfill_toofull state for a long time. This is new since Mimic. Wido > On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander <mailto:w...@42on.com>> wrote: > > Hi, > > I've got a couple of PGs which are stuck in backfill_toofull, but non

[ceph-users] backfill_toofull while OSDs are not full

2019-01-22 Thread Wido den Hollander
Hi, I've got a couple of PGs which are stuck in backfill_toofull, but none of them are actually full. "up": [ 999, 1900, 145 ], "acting": [ 701, 1146, 1880 ], "backfill_targets": [ "145", "999", "1900" ], "acting_recovery_backfill": [ "145",

Re: [ceph-users] dropping python 2 for nautilus... go/no-go

2019-01-18 Thread Wido den Hollander
On 1/16/19 4:54 PM, c...@jack.fr.eu.org wrote: > Hi, > > My 2 cents: > - do drop python2 support I wouldn't agree. Python 2 needs to be dropped. > - do not drop python2 support unexpectedly, aka do a deprecation phase > Indeed. Deprecate it at the Nautilus release and drop it after N.

Re: [ceph-users] Offsite replication scenario

2019-01-16 Thread Wido den Hollander
On 1/16/19 8:08 PM, Anthony Verevkin wrote: > I would definitely see huge value in going to 3 MONs here (and btw 2 on-site > MGR and 2 on-site MDS) > However 350Kbps is quite low and MONs may be latency sensitive, so I suggest > you do heavy QoS if you want to use that link for ANYTHING else.

Re: [ceph-users] /var/lib/ceph/mon/ceph-{node}/store.db on mon nodes

2019-01-16 Thread Wido den Hollander
y advice: Do something about this right now. I can't stress it enough: If you lose that single Monitor you will be in trouble, big trouble. Wido > Sent from my iPhone > >> On Jan 16, 2019, at 02:56, Wido den Hollander wrote: >> >> >> >>> On 1/16/19 10:36 AM, Matt

Re: [ceph-users] /var/lib/ceph/mon/ceph-{node}/store.db on mon nodes

2019-01-16 Thread Wido den Hollander
On 1/16/19 10:36 AM, Matthew Vernon wrote: > Hi, > > On 16/01/2019 09:02, Brian Topping wrote: > >> I’m looking at writes to a fragile SSD on a mon node, >> /var/lib/ceph/mon/ceph-{node}/store.db is the big offender at the >> moment. >> Is it required to be on a physical disk or can it be in

Re: [ceph-users] ceph-osd processes restart during Luminous -> Mimic upgrade on CentOS 7

2019-01-15 Thread Wido den Hollander
> > On Tue, Jan 15, 2019 at 11:33 AM Wido den Hollander wrote: >> >> Hi, >> >> I'm in the middle of upgrading a 12.2.8 cluster to 13.2.4 and I've >> noticed that during the Yum/RPM upgrade the OSDs are being restarted. >> >> Jan 15 11:24:25 x

[ceph-users] ceph-osd processes restart during Luminous -> Mimic upgrade on CentOS 7

2019-01-15 Thread Wido den Hollander
Hi, I'm in the middle of upgrading a 12.2.8 cluster to 13.2.4 and I've noticed that during the Yum/RPM upgrade the OSDs are being restarted. Jan 15 11:24:25 x yum[2348259]: Updated: 2:ceph-base-13.2.4-0.el7.x86_64 Jan 15 11:24:47 x systemd[1]: Stopped target ceph target allowing to

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Wido den Hollander
On 1/14/19 3:18 PM, Massimo Sgaravatto wrote: > Thanks for the prompt reply > > Indeed I have different racks with different weights.  > Below the ceph osd tree" output > Can you also show the output of 'ceph osd df' ? The amount of PGs might be on the low side which also causes this

Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Wido den Hollander
On 1/11/19 8:08 PM, Kenneth Van Alstyne wrote: > Hello all (and maybe this would be better suited for the ceph devel > mailing list): > I’d like to use RBD mirroring between two sites (to each other), but I > have the following limitations: > - The clusters use the same name (“ceph”) > - The

Re: [ceph-users] two OSDs with high out rate

2019-01-10 Thread Wido den Hollander
On 1/10/19 12:59 PM, Marc wrote: > Hi, > > for support reasons we're still running firefly (part of MCP 6). In our > grafana monitoring we noticed that two out of 128 OSD processes show > significantly higher outbound IO than all the others and this is > constant (cant see first occurance of

Re: [ceph-users] cephfs free space issue

2019-01-09 Thread Wido den Hollander
On 1/9/19 2:33 PM, Yoann Moulin wrote: > Hello, > > I have a CEPH cluster in luminous 12.2.10 dedicated to cephfs. > > The raw size is 65.5 TB, with a replica 3, I should have ~21.8 TB usable. > > But the size of the cephfs view by df is *only* 19 TB, is that normal ? > Yes. Ceph will

Re: [ceph-users] repair do not work for inconsistent pg which three replica are the same

2019-01-09 Thread Wido den Hollander
On 1/10/19 8:36 AM, hnuzhoulin2 wrote: > > Hi,cephers > > I have two inconsistent pg.I try list inconsistent obj,got nothing. > > rados list-inconsistent-obj 388.c29 > No scrub information available for pg 388.c29 > error 2: (2) No such file or directory > Have you tried to run a
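
A sketch of the usual approach for this situation, using the PG id from the quoted output: run a deep-scrub first so the inconsistency details are populated, then list them again and issue a repair:

$ ceph pg deep-scrub 388.c29
# wait for the scrub to finish, then:
$ rados list-inconsistent-obj 388.c29 --format=json-pretty
$ ceph pg repair 388.c29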

Re: [ceph-users] set-require-min-compat-client failed

2019-01-09 Thread Wido den Hollander
maybe you are running a older patch version which doesn't have the required Luminous code? Wido > -Original Message----- > From: Wido den Hollander [mailto:w...@42on.com] > Sent: Wednesday, January 09, 2019 4:48 PM > To: 楼锴毅; ceph-users@lists.ceph.com > Subject: Re: [ceph-users

Re: [ceph-users] Ceph Dashboard Rewrite

2019-01-08 Thread Wido den Hollander
On 1/8/19 8:58 PM, Marc Schöchlin wrote: > Hello ceph-users, > > we are using ceph luminous 12.2.10. > > We run 3 mgrs - if i access the dashboard on a non-active mgr i get a > location redirect to the hostname. > Because this is not a fqdn, i cannot access the dashboard in a convenient >

Re: [ceph-users] rocksdb mon stores growing until restart

2019-01-08 Thread Wido den Hollander
On 8/30/18 10:28 AM, Dan van der Ster wrote: > Hi, > > Is anyone else seeing rocksdb mon stores slowly growing to >15GB, > eventually triggering the 'mon is using a lot of disk space' warning? > > Since upgrading to luminous, we've seen this happen at least twice. > Each time, we restart all

Re: [ceph-users] Is it possible to increase Ceph Mon store?

2019-01-08 Thread Wido den Hollander
On 1/7/19 11:15 PM, Pardhiv Karri wrote: > Thank you Bryan, for the information. We have 816 OSDs of size 2TB each. > The mon store too big popped up when no rebalancing happened in that > month. It is slightly above the 15360 threshold around 15900 or 16100 > and stayed there for more than a

Re: [ceph-users] [Ceph-large] Help with setting device-class rule on pool without causing data to move

2018-12-31 Thread Wido den Hollander
Recently Dan from CERN showcased an interesting way. Use upmap to map all PGs to the current OSDs and then change the CRUSH topology. Then enable the balancer module and have it slowly move the PGs. Yes, you will rebalance, but it can be done over a very long period with Health OK in between.

Re: [ceph-users] Mounting DR copy as Read-Only

2018-12-12 Thread Wido den Hollander
On 12/12/18 4:44 PM, Vikas Rana wrote: > Hi, > > We are using Luminous and copying a 100TB RBD image to DR site using RBD > Mirror. > > Everything seems to works fine. > > The question is, can we mount the DR copy as Read-Only? We can do it on > Netapp and we are trying to figure out if

Re: [ceph-users] Cephalocon Barcelona 2019 CFP now open!

2018-12-10 Thread Wido den Hollander
On 12/10/18 5:00 PM, Mike Perez wrote: > Hello everyone! > > It gives me great pleasure to announce the CFP for Cephalocon Barcelona > 2019 is now open [1]! > > Cephalocon Barcelona aims to bring together more than 800 technologists > and adopters from across the globe to showcase Ceph’s

Re: [ceph-users] PG problem after reweight (1 PG active+remapped)

2018-12-03 Thread Wido den Hollander
ight make it worse. Wido > I am starting to get anxious for data integrity or ceph readonly state > also in case OSD 6 or 26 have availability issues... > > Thanks for quick reply! > > Regards, > Nasos Panterlis > ---

Re: [ceph-users] PG problem after reweight (1 PG active+remapped)

2018-12-03 Thread Wido den Hollander
Hi, How old is this cluster? As this might be a CRUSH tunables issue where this pops up. You can try (might move a lot of data!) $ ceph osd getcrushmap -o crushmap.backup $ ceph osd crush tunables optimal If things go wrong you always have the old CRUSHmap: $ ceph osd setcrushmap -i
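
For completeness, a sketch of the full sequence (the rollback is assumed to use the backup file written by the first command):

$ ceph osd getcrushmap -o crushmap.backup
$ ceph osd crush tunables optimal
# if the resulting data movement causes trouble, roll back with:
$ ceph osd setcrushmap -i crushmap.backup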

Re: [ceph-users] rbd IO monitoring

2018-11-29 Thread Wido den Hollander
On 11/30/18 5:48 AM, Michael Green wrote: > Hello collective wisdom, > > Ceph neophyte here, running v13.2.2 (mimic). > > Question: what tools are available to monitor IO stats on RBD level? > That is, IOPS, Throughput, IOs inflight and so on? > I'm testing with FIO and want to verify

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Wido den Hollander
ion error: Corruption: block checksum mismatch RocksDB got corrupted on that OSD and won't be able to start now. I wouldn't know where to start with this OSD. Wido > Il giorno gio 29 nov 2018 alle ore 10:43 Wido den Hollander > mailto:w...@42on.com>> ha scritto: > > > > On 1

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Wido den Hollander
On 11/29/18 10:28 AM, Mario Giammarco wrote: > Hello, > I have a ceph installation in a proxmox cluster. > Due to a temporary hardware glitch now I get this error on osd startup > > -6> 2018-11-26 18:02:33.179327 7fa1d784be00  0 osd.0 1033 crush map > has features 1009089991638532096,

Re: [ceph-users] Monitor disks for SSD only cluster

2018-11-26 Thread Wido den Hollander
On 11/26/18 2:21 PM, Gregory Farnum wrote: > As the monitors limit their transaction rates, I would tend for the > higher-durability drives. I don't think any monitor throughput issues > have been reported on clusters with SSDs for storage. I can confirm that. Just make sure you have proper

Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander
search the cluster logs to find out that an auto repair took place on a Placement Group. Wido > > -K. > > > On 2018-11-15 20:45, Mark Schouten wrote: >> As a user, I’m very surprised that this isn’t a default setting. >> >> Mark Schouten >> >>>

Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander
On 11/15/18 7:45 PM, Mark Schouten wrote: > As a user, I’m very surprised that this isn’t a default setting. > That is because you can also have FileStore OSDs in a cluster on which such an auto-repair is not safe. Wido > Mark Schouten > >> Op 15 nov. 2018 om 18:40 heeft

Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander
Hi, This question is actually still outstanding. Is there any good reason to keep auto repair for scrub errors disabled with BlueStore? I couldn't think of a reason when using size=3 and min_size=2, so just wondering. Thanks! Wido On 8/24/18 8:55 AM, Wido den Hollander wrote: >
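
For reference, the setting in question is a plain OSD option; a minimal ceph.conf sketch to enable it on a BlueStore-only cluster would be:

[osd]
osd scrub auto repair = true
# optional: limit how many errors a scrub may repair automatically (default 5)
osd scrub auto repair num errors = 5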

[ceph-users] Removing orphaned radosgw bucket indexes from pool

2018-11-15 Thread Wido den Hollander
Hi, Recently we've seen multiple messages on the mailinglists about people seeing HEALTH_WARN due to large OMAP objects on their cluster. This is due to the fact that starting with 12.2.6 OSDs warn about this. I've got multiple people asking me the same questions and I've done some digging

Re: [ceph-users] Placement Groups undersized after adding OSDs

2018-11-15 Thread Wido den Hollander
e vacated from this OSD. The side-effect is that it took 14 hours before these PGs started to backfill. I would say that a PG which is in undersized+degraded should get the highest possible priority to be repaired asap. Wido > On Wed, Nov 14, 2018 at 8:09 PM Wido den Hollander <mailt

[ceph-users] Placement Groups undersized after adding OSDs

2018-11-14 Thread Wido den Hollander
Hi, I'm in the middle of expanding a Ceph cluster and while having 'ceph -s' open I suddenly saw a bunch of Placement Groups go undersized. My first hint was that one or more OSDs have failed, but none did. So I checked and I saw these Placement Groups undersized: 11.3b54

Re: [ceph-users] upgrade ceph from L to M

2018-11-13 Thread Wido den Hollander
On 11/13/18 12:49 PM, Zhenshi Zhou wrote: > Hi > > I remember that there was a bug when using cephfs after  > upgrading ceph from L to M. Is that bug fixed now? > No. 13.2.2 still has this bug, you will need to wait for 13.2.3 before upgrading if you use CephFS. Wido >

Re: [ceph-users] Ceph Influx Plugin in luminous

2018-11-12 Thread Wido den Hollander
On 11/12/18 12:54 PM, mart.v wrote: > Hi, > > I'm trying to set up a Influx plugin > (http://docs.ceph.com/docs/mimic/mgr/influx/). The docs says that it > will be available in Mimic release, but I can see it (and enable) in > current Luminous. It seems that someone else acutally used it in >

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
On 11/8/18 12:28 PM, Hector Martin wrote: > On 11/8/18 5:52 PM, Wido den Hollander wrote: >> [osd] >> bluestore_cache_size_ssd = 1G >> >> The BlueStore Cache size for SSD has been set to 1GB, so the OSDs >> shouldn't use more then that. >> >&g

Re: [ceph-users] mount rbd read only

2018-11-08 Thread Wido den Hollander
On 11/8/18 1:05 PM, ST Wong (ITSC) wrote: > Hi, > >   > > We created a testing rbd block device image as following: > >   > > - cut here --- > > # rbd create 4copy/foo --size 10G > > # rbd feature disable 4copy/foo object-map fast-diff deep-flatten > > # rbd --image 4copy/foo
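
A short sketch of mapping and mounting such an image read-only with krbd (device name and mountpoint are assumptions):

# rbd map 4copy/foo --read-only
# mount -o ro /dev/rbd0 /mnt/foo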

Re: [ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
On 11/8/18 11:34 AM, Stefan Kooman wrote: > Quoting Wido den Hollander (w...@42on.com): >> Hi, >> >> Recently I've seen a Ceph cluster experience a few outages due to memory >> issues. >> >> The machines: >> >> - Intel Xeon E3 CPU >> - 3

[ceph-users] Unexplainable high memory usage OSD with BlueStore

2018-11-08 Thread Wido den Hollander
Hi, Recently I've seen a Ceph cluster experience a few outages due to memory issues. The machines: - Intel Xeon E3 CPU - 32GB Memory - 8x 1.92TB SSD - Ubuntu 16.04 - Ceph 12.2.8 Looking at one of the machines: root@ceph22:~# free -h total  used  free  shared

Re: [ceph-users] librados3

2018-10-29 Thread Wido den Hollander
On 10/29/18 12:42 PM, kefu chai wrote: > + ceph-user for more inputs in hope to get more inputs from librados > and librbd 's C++ interfaces. > > On Wed, Oct 24, 2018 at 1:34 AM Jason Dillaman wrote: >> >> On Tue, Oct 23, 2018 at 11:38 AM kefu chai wrote: >>> >>> we plan to introduce some

Re: [ceph-users] Monitor Recovery

2018-10-24 Thread Wido den Hollander
On 10/24/18 2:22 AM, John Petrini wrote: > Hi List, > > I've got a monitor that won't stay up. It comes up and joins the > cluster but crashes within a couple of minutes with no info in the > logs. At this point I'd prefer to just give up on it and assume it's > in a bad state and recover it

Re: [ceph-users] safe to remove leftover bucket index objects

2018-10-22 Thread Wido den Hollander
On 8/31/18 5:31 PM, Dan van der Ster wrote: > So it sounds like you tried what I was going to do, and it broke > things. Good to know... thanks. > > In our case, what triggered the extra index objects was a user running > PUT /bucketname/ around 20 million times -- this apparently recreates >

Re: [ceph-users] why set pg_num do not update pgp_num

2018-10-19 Thread Wido den Hollander
On 10/19/18 7:51 AM, xiang@iluvatar.ai wrote: > Hi! > > I use ceph 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic > (stable), and find that: > > When expand whole cluster, i update pg_num, all succeed, but the status > is as below: >   cluster: >     id:
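
The usual follow-up is to raise pgp_num to the same value so the new PGs are actually rebalanced across the OSDs; a sketch, assuming a pool named "data" and a target of 1024 PGs:

$ ceph osd pool set data pg_num 1024
$ ceph osd pool set data pgp_num 1024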

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-16 Thread Wido den Hollander
On 10/16/18 11:32 AM, Igor Fedotov wrote: > > > On 10/16/2018 6:57 AM, Wido den Hollander wrote: >> >> On 10/16/2018 12:04 AM, Igor Fedotov wrote: >>> On 10/15/2018 11:47 PM, Wido den Hollander wrote: >>>> Hi, >>>> >>

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
On 10/16/2018 12:04 AM, Igor Fedotov wrote: > > On 10/15/2018 11:47 PM, Wido den Hollander wrote: >> Hi, >> >> On 10/15/2018 10:43 PM, Igor Fedotov wrote: >>> Hi Wido, >>> >>> once you apply the PR you'll probably see the initial error i

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
es that I missed something.. > > > Thanks, > > Igor > > > On 10/15/2018 10:12 PM, Wido den Hollander wrote: >> >> On 10/15/2018 08:23 PM, Gregory Farnum wrote: >>> I don't know anything about the BlueStore code, but given the snippets >>> you've

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
4543 It will stop the spamming, but that's not the root cause. The OSDs in this case are at max 80% full and they do have a lot of OMAP (RGW indexes) in them, but that's all. I'm however not sure why this is happening suddenly in this cluster. Wido > -Greg > > On Mon, Oct 15, 2018 at 10:02 AM Wi

Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread Wido den Hollander
On 10/15/2018 07:50 PM, solarflow99 wrote: > I think the answer is, yes.  I'm pretty sure only the OSDs require very > long life enterprise grade SSDs > Yes and No. Please use reliable Datacenter Grade SSDs for your MON databases. Something like 200GB is more than enough in your MON servers.

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
On 10/11/2018 12:08 AM, Wido den Hollander wrote: > Hi, > > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm > seeing OSDs writing heavily to their logfiles spitting out these lines: > > > 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidalloc 0x

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-11 Thread Wido den Hollander
s a mix of SSDs and HDDs. The problem is with the SSD OSDs. So we moved a pool from SSD to HDD and that seems to have fixed the problem for now. But it will probably get back as soon as some OSDs go >80%. Wido > On Wed, Oct 10, 2018 at 6:37 PM Wido den Hollander <mailto:w...@42on.com>&g

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-10 Thread Wido den Hollander
On 10/11/2018 12:08 AM, Wido den Hollander wrote: > Hi, > > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm > seeing OSDs writing heavily to their logfiles spitting out these lines: > > > 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidalloc 0x

[ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-10 Thread Wido den Hollander
Hi, On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm seeing OSDs writing heavily to their logfiles spitting out these lines: 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidalloc 0x0x55828ae047d0 dump 0x15cd2078000~34000 2018-10-10 21:52:04.019038 7f90c2f0f700 0

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-09 Thread Wido den Hollander
832 osds: 791 up, 790 in >>  flags noout,nodeep-scrub >> >>   data: >> pools: 10 pools, 52336 pgs >> objects: 47.78M objects, 238TiB >> usage: 854TiB used, 1.28PiB / 2.12PiB avail >> pgs: 52336 active+clean >> >>   io: >

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-08 Thread Wido den Hollander
On 10/08/2018 05:04 PM, Aleksei Zakharov wrote: > Hi all, > > We've upgraded our cluster from jewel to luminous and re-created monitors > using rocksdb. > Now we see, that mon's are using a lot of disk space and used space only > grows. It is about 17GB for now. It was ~13GB when we used
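
A sketch of a common mitigation while the root cause is investigated (the monitor name is a placeholder): compact the store online, or let it compact on startup:

$ ceph tell mon.mon01 compact

# or in ceph.conf on the monitor hosts:
[mon]
mon compact on start = true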

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Wido den Hollander
Hi, $ ceph-volume lvm list Does that work for you? Wido On 10/08/2018 12:01 PM, Kevin Olbrich wrote: > Hi! > > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? > Before I migrated from filestore with simple-mode to bluestore with lvm, > I was able to find the raw disk with
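
A short usage sketch of that command (the example device is an assumption; the device argument and JSON output are standard options):

$ ceph-volume lvm list                 # all OSDs on this host, including backing devices
$ ceph-volume lvm list /dev/sdd        # limit the output to one raw device
$ ceph-volume lvm list --format json   # machine-readable, easy to filter for an OSD id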

[ceph-users] PG auto repair with BlueStore

2018-08-24 Thread Wido den Hollander
Hi, osd_scrub_auto_repair still defaults to false and I was wondering how we think about enabling this feature by default. Would we say it's safe to enable this with BlueStore? Wido ___ ceph-users mailing list ceph-users@lists.ceph.com

Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-24 Thread Wido den Hollander
On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: > Hi! > > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two > - ssd). Each host located in own rack. > > I make such crush configuration on fresh ceph installation: > >sudo ceph osd crush add-bucket R-26-3-1 rack >

Re: [ceph-users] Removing all rados objects based on a prefix

2018-08-20 Thread Wido den Hollander
On 08/20/2018 05:20 PM, David Turner wrote: > The general talk about the rados cleanup command is to clean things up > after benchmarking.  Could this command also be used for deleting an old > RGW bucket or an RBD.  For instance, a bucket with a prefix of >

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
elp. Keep in mind that the 'journal' doesn't apply anymore with BlueStore. That was a FileStore thing. Wido > On Wed, Aug 15, 2018 at 10:58 AM, Wido den Hollander <mailto:w...@42on.com>> wrote: > > > > On 08/15/2018 05:57 PM, Robert Stanford wrote: > > &g

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
itions for each? > Yes, that is correct. Each OSD needs roughly 10GB of DB space per 1TB of storage, so size your SSD according to your storage needs. However, whether you need to offload WAL+DB to an SSD depends on the workload. What is the workload? Wido > On Wed, Aug 15, 2018 at 1:59 AM, Wido den Hollander

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Wido den Hollander
On 08/14/2018 09:12 AM, Burkhard Linke wrote: > Hi, > > > AFAIk SD cards (and SATA DOMs) do not have any kind of wear-leveling > support. Even if the crappy write endurance of these storage systems > would be enough to operate a server for several years on average, you > will always have some

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
On 08/15/2018 04:17 AM, Robert Stanford wrote: > I am keeping the wal and db for a ceph cluster on an SSD.  I am using > the masif_bluestore_block_db_size / masif_bluestore_block_wal_size > parameters in ceph.conf to specify how big they should be.  Should these > values be the same, or should
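
For reference, the option names in the quoted mail look mistyped; the BlueStore options are bluestore_block_db_size and bluestore_block_wal_size. A minimal ceph.conf sketch (the sizes are purely illustrative, not a recommendation):

[osd]
bluestore block db size = 32212254720   # 30 GiB
bluestore block wal size = 1073741824   # 1 GiB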

Re: [ceph-users] failing to respond to cache pressure

2018-08-13 Thread Wido den Hollander
On 08/13/2018 01:22 PM, Zhenshi Zhou wrote: > Hi, > Recently, the cluster runs healthy, but I get warning messages everyday: > Which version of Ceph? Which version of clients? Can you post: $ ceph versions $ ceph features $ ceph fs status Wido > 2018-08-13 17:39:23.682213 [INF]  Cluster is

Re: [ceph-users] Beginner's questions regarding Ceph, Deployment with ceph-ansible

2018-08-07 Thread Wido den Hollander
On 08/07/2018 11:23 AM, Jörg Kastning wrote: > Am 06.08.2018 um 22:01 schrieb Pawel S: >> On Mon, Aug 6, 2018 at 3:08 PM J?rg Kastning > wrote: >>> But what are agents, rgws, nfss, restapis, rbdmirrors, clients and >>> iscsi-gws? Where could I found additional information about them? Where >>>

Re: [ceph-users] ceph-mgr dashboard behind reverse proxy

2018-08-04 Thread Wido den Hollander
On 08/04/2018 09:04 AM, Tobias Florek wrote: > Hi! > > Thank you for your reply. > >>> I want to set up the dashboard behind a reverse proxy. How do >>> people determine which ceph-mgr is active? Is there any simple and >>> elegant solution? >> >> You can use haproxy. It supports periodic
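
A minimal haproxy sketch of that idea (addresses and the dashboard port are assumptions): health-check every mgr and, since the standby dashboards answer with a redirect rather than a 200, only the active mgr receives traffic:

frontend ceph_dashboard_in
    bind *:80
    default_backend ceph_dashboard

backend ceph_dashboard
    option httpchk GET /
    http-check expect status 200
    server mgr01 192.0.2.11:7000 check
    server mgr02 192.0.2.12:7000 check
    server mgr03 192.0.2.13:7000 check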

Re: [ceph-users] Run ceph-rest-api in Mimic

2018-08-01 Thread Wido den Hollander
On 08/01/2018 12:00 PM, Ha, Son Hai wrote: > Hello everybody! > >   > > Because some of my applications are dependent on the obsolete > ceph-rest-api module, I would like to know if there is a way to run it > in Mimic? If I understood correctly, the new restful plugin >

Re: [ceph-users] Mimi Telegraf plugin on Luminous

2018-07-31 Thread Wido den Hollander
On 07/31/2018 09:38 AM, Denny Fuchs wrote: > hi, > > I try to get the Telegraf plugin from Mimic on Luminous running (Debian > Stretch). I copied the files from the Git into > /usr/lib/ceph/mgr/telegraf; enabled the plugin and get: > > > 2018-07-31 09:25:46.501858 7f496cfc9700 -1

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1). rx > client.? seq 1 0x55aa46be4fc0 auth(proto 0 30 bytes epoch 0) v1 > 2018-07-26 11:46:24.004914 7f819e167700 10 -- 10.111.73.1:6789/0 >> > 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 > cs=1 l=1).handle_write >

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
o get some more information about the Messenger Traffic. Wido > kind regards > > Ben > >> Wido den Hollander hat am 26. Juli 2018 um 10:18 > geschrieben: >> >> >> >> >> On 07/26/2018 10:12 AM, Benjamin Naber wrote: >> > Hi together, >> >

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
On 07/26/2018 10:12 AM, Benjamin Naber wrote: > Hi together, > > we currently have some problems with monitor quorum after shutting down all > cluster nodes for migration to another location. > > mon_status gives uns the following outputt: > > { > "name": "mon01", > "rank": 0, > "state":

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Wido den Hollander
On 07/24/2018 12:51 PM, Mateusz Skala (UST, POL) wrote: > Hello again, > > How can I determine $cctid for specific rbd name? Or is there any good > way to map admin-socket with rbd? > Yes, check the output of 'perf dump', you can fetch the RBD image information from that JSON output. Wido >
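
A sketch of the lookup, assuming the client has an admin socket configured (the socket path below is an assumption; by default it embeds the pid and cctid): the "perf dump" output contains one section per open image, named after the image id, pool and image name, which ties a socket to an RBD:

$ ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.94857583921.asok perf dump
# look for sections named like "librbd-<image id>-<pool>-<image name>"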

Re: [ceph-users] Mimic 13.2.1 release date

2018-07-23 Thread Wido den Hollander
Any news on this yet? 13.2.1 would be very welcome! :-) Wido On 07/09/2018 05:11 PM, Wido den Hollander wrote: > Hi, > > Is there a release date for Mimic 13.2.1 yet? > > There are a few issues which currently make deploying with Mimic 13.2.0 > a bit difficult, for exa

Re: [ceph-users] Safe to use rados -p rbd cleanup?

2018-07-16 Thread Wido den Hollander
On 07/15/2018 11:12 AM, Mehmet wrote: > hello guys, > > in my production cluster i've many objects like this > > "#> rados -p rbd ls | grep 'benchmark'" > ... .. . > benchmark_data_inkscope.example.net_32654_object1918 > benchmark_data_server_26414_object1990 > ... .. . > > Is it safe to run
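
The cleanup subcommand is intended for exactly these leftovers from "rados bench"; a sketch, using the prefix visible in the quoted listing:

$ rados -p rbd cleanup --prefix benchmark_data
# removes only objects whose names start with the given prefix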

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:22 AM, ceph.nov...@habmalnefrage.de wrote: > at about the same time we also updated the Linux OS via "YUM" to: > > # more /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > > > from the given error message, it seems like there are 32 "measure

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:10 AM, Robert Stanford wrote: > >  In a recent thread the Samsung SM863a was recommended as a journal > SSD.  Are there any recommendations for data SSDs, for people who want > to use just SSDs in a new Ceph cluster? > Depends on what you are looking for, SATA, SAS3 or NVMe?

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:02 AM, ceph.nov...@habmalnefrage.de wrote: > anyone with "mgr Zabbix enabled" migrated from Luminous (12.2.5 or 5) and has > the same problem in Mimic now? > if I disable and re-enable the "zabbix" module, the status is "HEALTH_OK" for > some sec. and changes to "HEALTH_WARN"
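
A sketch of the quick checks when the module misbehaves after an upgrade (host and key values are site-specific): re-enable it, verify its configuration and trigger a manual send to see the error directly:

$ ceph mgr module disable zabbix
$ ceph mgr module enable zabbix
$ ceph zabbix config-show
$ ceph zabbix send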

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread Wido den Hollander
is? Wido > > Cheers, > > Linh > > > > > *From:* John Spray > *Sent:* Tuesday, 10 July 2018 7:11 PM > *To:* Linh Vu > *Cc:* Wido den Hollander; ceph-users@lists.ceph.com > *Subject:* Re: [ceph

[ceph-users] Mimic 13.2.1 release date

2018-07-09 Thread Wido den Hollander
Hi, Is there a release date for Mimic 13.2.1 yet? There are a few issues which currently make deploying with Mimic 13.2.0 a bit difficult, for example: - https://tracker.ceph.com/issues/24423 - https://github.com/ceph/ceph/pull/22393 Especially the first one makes it difficult. 13.2.1 would

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-06 Thread Wido den Hollander
On 07/06/2018 01:47 PM, John Spray wrote: > On Fri, Jul 6, 2018 at 12:19 PM Wido den Hollander wrote: >> >> >> >> On 07/05/2018 03:36 PM, John Spray wrote: >>> On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS) wrote: >>>> >>>>

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-06 Thread Wido den Hollander
On 07/05/2018 03:36 PM, John Spray wrote: > On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS) wrote: >> >> Hi list, >> >> I have a serious problem now... I think. >> >> One of my users just informed me that a file he created (.doc file) has >> a different content then before. It looks like

Re: [ceph-users] Designating an OSD as a spare

2018-06-21 Thread Wido den Hollander
On 06/21/2018 03:35 PM, Drew Weaver wrote: > Yes, > >   > > Eventually however you would probably want to replace that physical disk > that has died and sometimes with remote deployments it is nice to not > have to do that instantly which is how enterprise arrays and support > contracts have

Re: [ceph-users] Planning all flash cluster

2018-06-20 Thread Wido den Hollander
On 06/20/2018 02:00 PM, Robert Sander wrote: > On 20.06.2018 13:58, Nick A wrote: > >> We'll probably add another 2 OSD drives per month per node until full >> (24 SSD's per node), at which point, more nodes. > > I would add more nodes earlier to achieve better overall performance. Exactly.

Re: [ceph-users] PM1633a

2018-06-16 Thread Wido den Hollander
On 06/15/2018 09:02 PM, Brian : wrote: > Hello List - anyone using these drives and have any good / bad things > to say about them? > Not really experience with them. I was about to order them in a SuperMicro chassis which supports SAS3 but then I found that the PM963a NVMe disks have the

Re: [ceph-users] Journal flushed on osd clean shutdown?

2018-06-13 Thread Wido den Hollander
On 06/13/2018 11:39 AM, Chris Dunlop wrote: > Hi, > > Is the osd journal flushed completely on a clean shutdown? > > In this case, with Jewel, and FileStore osds, and a "clean shutdown" being: > It is, a Jewel OSD will flush its journal on a clean shutdown. The flush-journal is no longer

Re: [ceph-users] Ceph bonding vs separate provate public network

2018-06-12 Thread Wido den Hollander
On 06/12/2018 02:00 PM, Steven Vacaroaia wrote: > Hi, > > I am designing a new ceph cluster and was wondering whether I should > bond the 10 GB adapters or use one for public one for private > > The advantage of bonding is simplicity and, maybe, performance  > The catch though is that I cannot

Re: [ceph-users] Adding cluster network to running cluster

2018-06-07 Thread Wido den Hollander
On 06/07/2018 01:39 PM, mj wrote: > Hi, > > Please allow me to ask one more question: > > We currently have a seperated network: cluster on 10.10.x.x and public > on 192.168.x.x. > > I would like to migrate all network to 192.168.x.x setup, which would > give us 2*10G. > > Is simply

Re: [ceph-users] Adding cluster network to running cluster

2018-06-07 Thread Wido den Hollander
Keep it simple, one network to run the cluster on. Fewer components that can fail or complicate things. Wido > Thank you. > > Kevin > > 2018-06-07 10:44 GMT+02:00 Wido den Hollander <mailto:w...@42on.com>>: > > > > On 06/07/2018 09:46 AM, Kevin

Re: [ceph-users] Adding cluster network to running cluster

2018-06-07 Thread Wido den Hollander
On 06/07/2018 09:46 AM, Kevin Olbrich wrote: > Hi! > > When we installed our new luminous cluster, we had issues with the > cluster network (setup of mon's failed). > We moved on with a single network setup. > > Now I would like to set the cluster network again but the cluster is in > use (4

Re: [ceph-users] Stop scrubbing

2018-06-07 Thread Wido den Hollander
On 06/06/2018 08:32 PM, Joe Comeau wrote: > When I am upgrading from filestore to bluestore > or any other server maintenance for a short time > (ie high I/O while rebuilding) >   > ceph osd set noout > ceph osd set noscrub > ceph osd set nodeep-scrub >   > when finished >   > ceph osd unset

Re: [ceph-users] Data recovery after loosing all monitors

2018-05-22 Thread Wido den Hollander
On 05/22/2018 03:38 PM, George Shuklin wrote: > Good news, it's not an emergency, just a curiosity. > > Suppose I lost all monitors in a ceph cluster in my laboratory. I have > all OSDs intact. Is it possible to recover something from Ceph? Yes, there is. Using ceph-objectstore-tool you are
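
A rough sketch of that recovery path (paths are assumptions, and the real procedure has more steps, such as rebuilding the store with ceph-monstore-tool and restoring keyrings/caps): scrape the cluster maps out of every OSD into a fresh mon store:

$ systemctl stop ceph-osd@0
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
      --op update-mon-db --mon-store-path /tmp/mon-store
# repeat for every OSD, accumulating into the same /tmp/mon-store,
# then rebuild a monitor from it with ceph-monstore-tool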

Re: [ceph-users] A question about HEALTH_WARN and monitors holding onto cluster maps

2018-05-17 Thread Wido den Hollander
On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote: > Hi all, > >   > > As far as I understand, the monitor stores will grow while not HEALTH_OK > as they hold onto all cluster maps. Is this true for all HEALTH_WARN > reasons? Our cluster recently went into HEALTH_WARN due to a few weeks >

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-17 Thread Wido den Hollander
On 05/16/2018 03:34 PM, Wido den Hollander wrote: > > > On 05/16/2018 01:22 PM, Blair Bethwaite wrote: >> On 15 May 2018 at 08:45, Wido den Hollander <w...@42on.com >> <mailto:w...@42on.com>> wrote: >> >> > We've got some S

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-16 Thread Wido den Hollander
On 05/16/2018 01:22 PM, Blair Bethwaite wrote: > On 15 May 2018 at 08:45, Wido den Hollander <w...@42on.com > <mailto:w...@42on.com>> wrote: > > > We've got some Skylake Ubuntu based hypervisors that we can look at to > > compare tomorrow... >

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander
On 05/15/2018 02:51 PM, Blair Bethwaite wrote: > Sorry, bit late to get back to this... > > On Wed., 2 May 2018, 06:19 Nick Fisk, > wrote: > > 4.16 required? > > > Looks like it - thanks for pointing that out. > > Wido, I don't think you are

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander
having some issues with getting intel_pstate loaded, but with 4.16 it loaded without any problems, but still, CPUs keep clocking down. Wido > > > -Original Message- > From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Wido den > Hollander > Sent: 14 Ma
