Re: [ceph-users] Ceph capacity versus pool replicated size discrepancy?

2019-08-14 Thread Konstantin Shalygin
On 8/14/19 6:19 PM, Kenneth Van Alstyne wrote: > Got it! I can calculate individual clone usage using “rbd du”, but does anything exist to show total clone usage across the pool? Otherwise it looks like phantom space is just missing. rbd du for each snapshot, I think... k
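
A minimal sketch of what is being suggested here, assuming a pool named "rbd" (the pool and image names are placeholders):

    for img in $(rbd ls rbd); do rbd du rbd/"$img"; done   # per-image breakdown, including each snapshot
    rbd du --pool rbd                                      # pool-wide usage for every image in the pool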

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Mike Christie
On 08/14/2019 02:09 PM, Mike Christie wrote: > On 08/14/2019 07:35 AM, Marc Schöchlin wrote: 3. I wonder if we are hitting a PF_MEMALLOC bug that Ilya hit with krbd. He removed that code from the krbd. I will ping him on that. >> >> Interesting. I activated Coredumps for those processes -

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Anthony D'Atri
Good points in both posts, but I think there’s still some unclarity. Absolutely let’s talk about DB and WAL together. By “bluestore goes on flash” I assume you mean WAL+DB? “Simply allocate DB and WAL will appear there automatically” Forgive me please if this is obvious, but I’d like to see a
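
As a hedged illustration of the "allocate DB and the WAL appears there automatically" point, a ceph-volume invocation might look like this (device names are placeholders):

    # /dev/sdb is the HDD, /dev/nvme0n1p1 a partition on the NVMe.
    # With only --block.db given, BlueStore also places the WAL on the DB device,
    # so a separate --block.wal is unnecessary in the two-device case.
    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1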

Re: [ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Mike Christie
On 08/14/2019 07:35 AM, Marc Schöchlin wrote: >>> 3. I wonder if we are hitting a PF_MEMALLOC bug that Ilya hit with krbd. >>> He removed that code from the krbd. I will ping him on that. > > Interesting. I activated Coredumps for those processes - probably we can > find something interesting
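
A rough sketch of enabling core dumps for rbd-nbd on a typical Linux host (the dump directory is an assumption, and systemd-coredump may override the pattern):

    ulimit -c unlimited                                            # in the shell or unit that starts rbd-nbd
    echo '/var/crash/core.%e.%p.%t' | sudo tee /proc/sys/kernel/core_pattern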

Re: [ceph-users] ceph device list empty

2019-08-14 Thread Gary Molenkamp
I've had no luck in tracing this down. I've tried setting debugging and log channels to find what is failing, with no success. With debug_mgr at 20/20, the logs show: log_channel(audit) log [DBG] : from='client.10424012 -' entity='client.admin' cmd=[{"prefix": "device ls",
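
One way to reproduce the debugging described above, assuming a Nautilus-style central config (the chosen levels are a judgement call):

    ceph config set mgr debug_mgr 20/20    # raise mgr log verbosity
    ceph device ls                         # re-run the failing query and watch the active mgr's log
    ceph config rm mgr debug_mgr           # drop the override again afterwards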

Re: [ceph-users] New Cluster Failing to Start (Resolved)

2019-08-14 Thread DHilsbos
All; We found the problem, we had the v2 ports incorrect in the monmap. Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. dhils...@performair.com www.PerformAir.com -Original Message- From: ceph-users
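
For reference, a hedged sketch of inspecting and correcting v2 addresses in a monmap (the mon name "mon1" and the IP are placeholders; v2 normally listens on 3300, v1 on 6789):

    ceph-mon -i mon1 --extract-monmap /tmp/monmap      # with that mon stopped
    monmaptool --print /tmp/monmap                     # check the recorded v1/v2 addresses
    monmaptool --rm mon1 /tmp/monmap
    monmaptool --addv mon1 '[v2:192.168.1.11:3300,v1:192.168.1.11:6789]' /tmp/monmap
    ceph-mon -i mon1 --inject-monmap /tmp/monmap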

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Mark Nelson
On 8/14/19 1:06 PM, solarflow99 wrote: Actually standalone WAL is required when you have either very small fast device (and don't want db to use it) or three devices (different in performance) behind OSD (e.g. hdd, ssd, nvme). So WAL is to be located at the fastest one.
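
A sketch of the three-device layout described above, with placeholder devices (data on HDD, DB on SSD, WAL on the fastest device, here NVMe):

    ceph-volume lvm create --bluestore \
        --data /dev/sdc \
        --block.db /dev/sdb1 \
        --block.wal /dev/nvme0n1p2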

Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-14 Thread Jeff Layton
On Wed, 2019-08-14 at 19:29 +0200, Ilya Dryomov wrote: > On Tue, Aug 13, 2019 at 1:06 PM Hector Martin wrote: > > I just had a minor CephFS meltdown caused by underprovisioned RAM on the > > MDS servers. This is a CephFS with two ranks; I manually failed over the > > first rank and the new MDS

Re: [ceph-users] WAL/DB size

2019-08-14 Thread solarflow99
> Actually standalone WAL is required when you have either very small fast > device (and don't want db to use it) or three devices (different in > performance) behind OSD (e.g. hdd, ssd, nvme). So WAL is to be located > at the fastest one. > > For the given use case you just have HDD and NVMe and

Re: [ceph-users] CephFS meltdown fallout: mds assert failure, kernel oopses

2019-08-14 Thread Ilya Dryomov
On Tue, Aug 13, 2019 at 1:06 PM Hector Martin wrote: > > I just had a minor CephFS meltdown caused by underprovisioned RAM on the > MDS servers. This is a CephFS with two ranks; I manually failed over the > first rank and the new MDS server ran out of RAM in the rejoin phase > (ceph-mds didn't
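
For context, the MDS cache budget is controlled by mds_cache_memory_limit; a hedged example follows (the value is an assumption, and actual MDS RSS runs noticeably above the configured limit):

    ceph config set mds mds_cache_memory_limit 8589934592   # ~8 GiB cache budget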

[ceph-users] New Cluster Failing to Start

2019-08-14 Thread DHilsbos
All; We're working to deploy our first production Ceph cluster, and we've run into a snag. The MONs start, but the "cluster" doesn't appear to come up. Ceph -s never returns. These are the last lines in the event log of one of the mons: 2019-08-13 16:20:03.706 7f668108f180 0 starting

Re: [ceph-users] MDS corruption

2019-08-14 Thread ☣Adam
I was able to get this resolved, thanks again to Pierre Dittes! The reason the recovery did not work the first time I tried it was because I still had the filesystem mounted (or at least attempted to have it mounted). This was causing sessions to be active. After rebooting all the machines
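
A rough outline of the checks implied here, with placeholder names, paths and IDs:

    ceph daemon mds.<name> session ls      # on the MDS host: which clients still hold sessions?
    umount /mnt/cephfs                     # unmount on every client first
    ceph tell mds.0 client evict id=1234   # last resort for a client that cannot be unmounted cleanly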

Re: [ceph-users] Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2019-08-14 Thread Troy Ablan
Paul, Thanks for the reply. All of these seemed to fail except for pulling the osdmap from the live cluster. -Troy -[~:#]- ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/ceph-45/ --file osdmap45 terminate called after throwing an instance of

[ceph-users] Question to developers about iscsi

2019-08-14 Thread Fyodor Ustinov
Hi! As I understand it, the iSCSI gateway is part of Ceph. The documentation says: Note The iSCSI management functionality of Ceph Dashboard depends on the latest version 3 of the ceph-iscsi project. Make sure that your operating system provides the correct version, otherwise the dashboard won’t enable the
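
A hedged way to check what is installed before enabling the dashboard feature (the package query depends on the distribution):

    rpm -q ceph-iscsi                      # or: dpkg -s ceph-iscsi; check that it is a 3.x release
    ceph dashboard iscsi-gateway-list      # gateways currently registered with the dashboard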

Re: [ceph-users] Canonical Livepatch broke CephFS client

2019-08-14 Thread Ilya Dryomov
On Wed, Aug 14, 2019 at 1:54 PM Tim Bishop wrote: > > On Wed, Aug 14, 2019 at 12:44:15PM +0200, Ilya Dryomov wrote: > > On Tue, Aug 13, 2019 at 10:56 PM Tim Bishop wrote: > > > This email is mostly a heads up for others who might be using > > > Canonical's livepatch on Ubuntu on a CephFS client.

Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Simon Oosthoek
On 14/08/2019 10:44, Wido den Hollander wrote: > > > On 8/14/19 9:48 AM, Simon Oosthoek wrote: >> Is it a good idea to give the above commands or other commands to speed >> up the backfilling? (e.g. like increasing "osd max backfills") >> > > Yes, as right now the OSDs aren't doing that many
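
Besides osd_max_backfills, two related throttles are often raised alongside it; an illustrative sketch (the values are assumptions, to be dialled back once the backfill finishes):

    ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'   # more concurrent recovery ops per OSD
    ceph tell 'osd.*' injectargs '--osd-recovery-sleep-hdd 0'    # optionally remove the per-op recovery throttle on HDDs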

[ceph-users] reproducible rbd-nbd crashes

2019-08-14 Thread Marc Schöchlin
Hello Mike, see my inline comments. On 14.08.19 at 02:09, Mike Christie wrote: >>> - >>> Previous tests crashed in a reproducible manner with "-P 1" (single io >>> gzip/gunzip) after a few minutes up to 45 minutes. >>> >>> Overview of my tests: >>> >>> - SUCCESSFUL: kernel 4.15, ceph

Re: [ceph-users] Canonical Livepatch broke CephFS client

2019-08-14 Thread Tim Bishop
On Wed, Aug 14, 2019 at 12:44:15PM +0200, Ilya Dryomov wrote: > On Tue, Aug 13, 2019 at 10:56 PM Tim Bishop wrote: > > This email is mostly a heads up for others who might be using > > Canonical's livepatch on Ubuntu on a CephFS client. > > > > I have an Ubuntu 18.04 client with the standard

Re: [ceph-users] Ceph capacity versus pool replicated size discrepancy?

2019-08-14 Thread Kenneth Van Alstyne
Got it! I can calculate individual clone usage using “rbd du”, but does anything exist to show total clone usage across the pool? Otherwise it looks like phantom space is just missing. Thanks, -- Kenneth Van Alstyne Systems Architect M: 228.547.8045 15052 Conference Center Dr, Chantilly, VA

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Igor Fedotov
Hi Wido & Hemant, On 8/14/2019 11:36 AM, Wido den Hollander wrote: On 8/14/19 9:33 AM, Hemant Sonawane wrote: Hello guys, Thank you so much for your responses really appreciate it. But I would like to mention one more thing which I forgot in my last email is that I am going to use this

Re: [ceph-users] Canonical Livepatch broke CephFS client

2019-08-14 Thread Ilya Dryomov
On Tue, Aug 13, 2019 at 10:56 PM Tim Bishop wrote: > > Hi, > > This email is mostly a heads up for others who might be using > Canonical's livepatch on Ubuntu on a CephFS client. > > I have an Ubuntu 18.04 client with the standard kernel currently at > version linux-image-4.15.0-54-generic

Re: [ceph-users] Scrub start-time and end-time

2019-08-14 Thread Thomas Byrne - UKRI STFC
Hi Torben, > Is it allowed to have the scrub period cross midnight ? eg have start time at > 22:00 and end time 07:00 next morning. Yes, I think that's the way it is mostly used, primarily to reduce the scrub impact during waking/working hours. > I assume that if you only configure the
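
A minimal sketch of the window Torben describes, set in ceph.conf ([osd] section) or, on Nautilus and later, via the central config:

    ceph config set osd osd_scrub_begin_hour 22
    ceph config set osd osd_scrub_end_hour 7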

Re: [ceph-users] Sudden loss of all SSD OSDs in a cluster, immediate abort on restart [Mimic 13.2.6]

2019-08-14 Thread Paul Emmerich
Starting point to debug/fix this would be to extract the osdmap from one of the dead OSDs: ceph-objectstore-tool --op get-osdmap --data-path /var/lib/ceph/osd/... Then try to run osdmaptool on that osdmap to see if it also crashes, set some --debug options (don't know which one off the top of my
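
Spelled out, Paul's suggestion might look roughly like this (paths are placeholders; the debug option is a guess, as he notes):

    ceph-objectstore-tool --op get-osdmap \
        --data-path /var/lib/ceph/osd/ceph-45 --file /tmp/osdmap.45
    osdmaptool --print /tmp/osdmap.45 --debug-osd 20   # does decoding the map crash outside the OSD too?
    ceph osd getmap -o /tmp/osdmap.live                # known-good map from the live cluster, for comparison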

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Burkhard Linke
Hi, please keep in mind that due to the rocksdb level concept, only certain db partition sizes are useful. Larger partitions are a waste of capacity, since rocksdb will only use whole level sizes. There has been a lot of discussion about this on the mailing list in recent months. A plain
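
To see how much of a DB partition is actually in use (and whether metadata has spilled over to the slow device), one hedged check is the bluefs perf counters:

    ceph daemon osd.0 perf dump bluefs   # compare db_used_bytes with db_total_bytes; slow_used_bytes > 0 means spillover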

Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Janne Johansson
On Wed, 14 Aug 2019 at 09:49, Simon Oosthoek wrote: > Hi all, > > Yesterday I marked out all the osds on one node in our new cluster to > reconfigure them with WAL/DB on their NVMe devices, but it is taking > ages to rebalance. > > > ceph tell 'osd.*' injectargs '--osd-max-backfills 16' > >

Re: [ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Wido den Hollander
On 8/14/19 9:48 AM, Simon Oosthoek wrote: > Hi all, > > Yesterday I marked out all the osds on one node in our new cluster to > reconfigure them with WAL/DB on their NVMe devices, but it is taking > ages to rebalance. The whole cluster (and thus the osds) is only ~1% > full, therefore the full

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Wido den Hollander
On 8/14/19 9:33 AM, Hemant Sonawane wrote: > Hello guys, > > Thank you so much for your responses really appreciate it. But I would > like to mention one more thing which I forgot in my last email is that I > am going to use this storage for openstack VM's. So still the answer > will be the

Re: [ceph-users] Cephfs cannot mount with kernel client

2019-08-14 Thread Serkan Çoban
Hi, I just double-checked the stack trace and I can confirm it is the same as in the tracker. Compaction also worked for me; I can now mount cephfs without problems. Thanks for the help, Serkan On Tue, Aug 13, 2019 at 6:44 PM Ilya Dryomov wrote: > > On Tue, Aug 13, 2019 at 4:30 PM Serkan Çoban wrote: > > >

[ceph-users] strange backfill delay after outing one node

2019-08-14 Thread Simon Oosthoek
Hi all, Yesterday I marked out all the osds on one node in our new cluster to reconfigure them with WAL/DB on their NVMe devices, but it is taking ages to rebalance. The whole cluster (and thus the osds) is only ~1% full, therefore the full ratio is nowhere in sight. We have 14 osd nodes with 12

Re: [ceph-users] WAL/DB size

2019-08-14 Thread Hemant Sonawane
Hello guys, Thank you so much for your responses, I really appreciate it. But I would like to mention one more thing which I forgot in my last email: I am going to use this storage for OpenStack VMs. So will the answer still be the same, that I should use 1 GB for the WAL? On Wed, 14 Aug 2019 at