Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
Ceph-fuse? No, I am using the kernel module. Was there "Client xxx failing to respond to cache pressure" health warning? At first, yes (at least with the Mimic client). There were also warnings about being behind on trimming. I haven't seen these warnings with Nautilus now, but the effect is pretty
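For context, a quick way to inspect these warnings and the clients behind them (the MDS daemon name is a placeholder):

    # Show the full text of current health warnings (cache pressure, trim backlog)
    ceph health detail
    # List client sessions on a given MDS to see per-client caps and inode counts
    ceph daemon mds.<name> session ls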

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, on 24.07.19 at 00:40 Jason Dillaman wrote: > >> Sure, which kernel do you prefer? > You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen > environment. Can you use a matching kernel version? That's true, the virtual machines in our Xen environments run completely on

Re: [ceph-users] Iscsi in the nautilus Dashboard

2019-07-23 Thread Brent Kennedy
The devs came through and added a 3.0.1 ceph-iscsi release that works with 14.2.2 (detects as version 9). I then went into the dashboard to add a target and then hit a wall. I used the auto-generated target IQN, then selected the two gateways as the “add portal” option and hit “create
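As a cross-check outside the dashboard, a target and its gateways can also be created from gwcli; a rough sketch, with the IQN, gateway hostnames and IPs all being placeholders:

    # On one of the iSCSI gateway nodes
    gwcli
    /> cd /iscsi-targets
    /iscsi-targets> create iqn.2003-01.com.example.iscsi-gw:ceph-igw
    /iscsi-targets> cd iqn.2003-01.com.example.iscsi-gw:ceph-igw/gateways
    .../gateways> create ceph-gw-1 192.168.1.101
    .../gateways> create ceph-gw-2 192.168.1.102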

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Alex Litvak
I just had an OSD crash with no logs (debug was not enabled). It happened 24 hours after the actual upgrade from 14.2.1 to 14.2.2. Nothing else changed as far as environment or load. The disk is OK. I restarted the OSD and it came back. The cluster had been up for 2 months before the upgrade without an issue.
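On Nautilus the mgr crash module usually keeps a backtrace even when file logging is disabled, and debug logging can be raised ahead of a recurrence; a sketch, with the OSD id as a placeholder:

    # List and inspect crash reports collected by the mgr crash module
    ceph crash ls
    ceph crash info <crash-id>
    # Raise debug logging on the affected OSD until the next occurrence
    ceph config set osd.12 debug_osd 10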

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Yan, Zheng
On Wed, Jul 24, 2019 at 4:06 AM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Thanks for your reply. > > On 23/07/2019 21:03, Nathan Fish wrote: > > What Ceph version? Do the clients match? What CPUs do the MDS servers > > have, and how is their CPU usage when this occurs? > >

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
Addition: the MDS just crashed with a cache size of over 100GB. Nothing in the logs though (not at this level at least). It just went from spamming "handle_client_request" to its usual bootup log spill. On 23/07/2019 23:52, Janek Bevendorff wrote: >> Multiple active MDS's is a somewhat new

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
> Multiple active MDS's is a somewhat new feature, and it might obscure > debugging information. > I'm not sure what the best way to restore stability temporarily is, > but if you can manage it, > I would go down to one MDS, crank up the debugging, and try to > reproduce the problem. That didn't

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Nathan Fish
Multiple active MDS's is a somewhat new feature, and it might obscure debugging information. I'm not sure what the best way to restore stability temporarily is, but if you can manage it, I would go down to one MDS, crank up the debugging, and try to reproduce the problem. How are your OSDs
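A sketch of that suggestion as commands (the filesystem name and debug levels are assumptions):

    # Drop back to a single active MDS
    ceph fs set <fsname> max_mds 1
    # Crank up MDS debug logging cluster-wide
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1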

Re: [ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Casey Bodley
The /admin/metadata APIs require caps of type "metadata"; source: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rest_metadata.h#L37 On 7/23/19 12:53 PM, Benjeman Meekhof wrote: Ceph Nautilus, 14.2.2, RGW civetweb. Trying to read from the RGW admin api /metadata/user with request URL
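For reference, that cap can be granted with radosgw-admin (the uid is a placeholder):

    # Grant read access to the /admin/metadata APIs, then verify
    radosgw-admin caps add --uid="someuser" --caps="metadata=read"
    radosgw-admin user info --uid="someuser"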

Re: [ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Benjeman Meekhof
Please disregard, the listed caps are sufficient and there does not seem to be any issue here. Between adding the metadata caps and re-testing I made a mistake in passing credentials to the module and naturally received an AccessDenied for bad credentials. thanks, Ben On Tue, Jul 23, 2019 at

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
Here's some additional information. While MDS daemons are trying to restore rank 0, they always load up to about 26M inodes before being replaced by a standby: | Rank | State | MDS | Activity | dns | inos |
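To watch this while it happens, the per-daemon counters can be polled alongside the status table; a rough sketch, with the MDS name as a placeholder:

    # Rank / standby overview
    ceph fs status
    # Cache and inode counters on the admin socket of the rejoining MDS
    ceph daemon mds.<name> cache status
    ceph daemon mds.<name> perf dump mds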

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Nathan Fish
I have not had any more OSDs crash, but the 3 that crashed still crash on startup. I may purge and recreate them, but there's no hurry. I have 18 OSDs per host and plenty of free space currently. On Tue, Jul 23, 2019 at 2:19 AM Ashley Merrick wrote: > > Have they been stable since, or still had

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
Hi Jason, on 23.07.19 at 14:41 Jason Dillaman wrote: > Can you please test a consistent Ceph release w/ a known working > kernel release? It sounds like you have changed two variables, so it's > hard to know which one is broken. We need *you* to isolate what > specific Ceph or kernel release

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
Thanks for your reply. On 23/07/2019 21:03, Nathan Fish wrote: > What Ceph version? Do the clients match? What CPUs do the MDS servers > have, and how is their CPU usage when this occurs? Sorry, I totally forgot to mention that while transcribing my post. The cluster runs Nautilus (I upgraded

Re: [ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Nathan Fish
What Ceph version? Do the clients match? What CPUs do the MDS servers have, and how is their CPU usage when this occurs? While migrating to a Nautilus cluster recently, we had up to 14 million inodes open, and we increased the cache limit to 16GiB. Other than warnings about oversized cache, this
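For reference, the cache limit mentioned here is mds_cache_memory_limit (a byte value); a sketch:

    # Raise the MDS cache memory target to 16 GiB
    ceph config set mds mds_cache_memory_limit 17179869184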

Re: [ceph-users] MON crashing when upgrading from Hammer to Luminous

2019-07-23 Thread Armin Ranjbar
Thanks JC! I can confirm that I was able to first upgrade to jewel, and then directly upgrade to Luminous. thank you all! --- Armin ranjbar On Tue, Jul 23, 2019 at 1:22 AM JC Lopez wrote: > First link should be this one >

[ceph-users] MDS fails repeatedly while handling many concurrent meta data operations

2019-07-23 Thread Janek Bevendorff
Hi, Disclaimer: I posted this before to the ceph.io mailing list, but from the lack of answers and a look at the archives, I concluded that that list is effectively dead. So apologies if anyone has read this before. I am trying to copy our storage server to a CephFS. We have 5 MONs in our cluster

Re: [ceph-users] Ceph Scientific Computing User Group

2019-07-23 Thread Kevin Hrpcek
Update We're going to hold off until August for this so we can promote it on the Ceph twitter with more notice. Sorry for the inconvenience if you were planning on the meeting tomorrow. Keep a watch on the list, twitter, or ceph calendar for updates. Kevin On 7/5/19 11:15 PM, Kevin Hrpcek

[ceph-users] RGW Admin REST metadata caps

2019-07-23 Thread Benjeman Meekhof
Ceph Nautilus, 14.2.2, RGW civetweb. Trying to read from the RGW admin API /metadata/user with a request URL like: GET /admin/metadata/user?key=someuser&format=json But I am getting a 403 denied error from RGW. Shouldn't the caps below be sufficient, or am I missing something? "caps": [ {
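As a local cross-check that the metadata record itself is readable (independent of the REST caps), the same entry can be fetched with radosgw-admin; the uid is a placeholder:

    # Read the same metadata record directly on an admin node
    radosgw-admin metadata get user:someuser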

Re: [ceph-users] Mark CephFS inode as lost

2019-07-23 Thread Robert LeBlanc
Thanks, I created a ticket. http://tracker.ceph.com/issues/40906 Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jul 22, 2019 at 11:45 PM Yan, Zheng wrote: > please create a ticket at http://tracker.ceph.com/projects/cephfs and >

Re: [ceph-users] Iscsi in the nautilus Dashboard

2019-07-23 Thread Brent Kennedy
I had installed 3.0 and it's detected by the dashboard as version 8. I checked common.py in the source from the ceph-iscsi site and it shows as version 8 on line 59 (meaning I confirmed what the dashboard is saying). From what I can tell, there is no version 9 posted to the ceph-iscsi

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Jason Dillaman
On Tue, Jul 23, 2019 at 6:58 AM Marc Schöchlin wrote: > > > On 23.07.19 at 07:28 Marc Schöchlin wrote: > > > > Okay, I already experimented with high timeouts (i.e. 600 seconds). As far as I can > > remember this led to a pretty unusable system if I put high amounts of IO > > on the EC volume. > >

Re: [ceph-users] Observation of bluestore db/wal performance

2019-07-23 Thread Виталий Филиппов
Bluestore's deferred write queue doesn't act like Filestore's journal because (a) it's very small (64 requests) and (b) it doesn't have a background flush thread. Bluestore basically refuses to do writes faster than the HDD can do them _on_average_. With Filestore you can have 1000-2000 write iops
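The knobs behind this behaviour can be inspected per OSD; a sketch, assuming the option names below (present in recent releases, defaults vary):

    # Writes at or below this size (on HDD) take the deferred/WAL path
    ceph config get osd bluestore_prefer_deferred_size_hdd
    # How many deferred writes are batched before being flushed to the data device
    ceph config get osd bluestore_deferred_batch_ops_hdd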

Re: [ceph-users] Future of Filestore?

2019-07-23 Thread Stuart Longland
On 19/7/19 8:21 pm, Stuart Longland wrote: > I'm now getting about 5MB/sec I/O speeds in my VMs. > > I'm contemplating whether I migrate back to using Filestore (on XFS this > time, since BTRFS appears to be a rude word despite Ceph v10 docs > suggesting it as a good option), but I'm not sure

Re: [ceph-users] Please help: change IP address of a cluster

2019-07-23 Thread Manuel Lausch
Hi, I had to change the IPs of my cluster some time ago. The process was quite easy. I don't understand what you mean by configuring and deleting static routes. The easiest way is if the router allows (at least for the duration of the change) all traffic between the old and the new network. I did the
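For reference, the monmap-editing route documented for monitor address changes looks roughly like this (hostnames, IPs and paths are placeholders; OSDs and clients follow via ceph.conf/DNS updates):

    # Extract the current monmap, rewrite the monitor addresses, and inject it
    ceph mon getmap -o /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool --rm mon1 /tmp/monmap
    monmaptool --add mon1 10.1.2.3:6789 /tmp/monmap
    # Stop the monitor, inject the edited map, then start it again
    ceph-mon -i mon1 --inject-monmap /tmp/monmap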

Re: [ceph-users] reproducable rbd-nbd crashes

2019-07-23 Thread Marc Schöchlin
On 23.07.19 at 07:28 Marc Schöchlin wrote: > > Okay, I already experimented with high timeouts (i.e. 600 seconds). As far as I can > remember this led to a pretty unusable system if I put high amounts of IO on > the EC volume. > This system also runs a krbd volume which saturates the system with
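For completeness, the timeout under discussion is passed at map time; a sketch under the assumption that this rbd-nbd build still accepts the older --timeout flag (newer releases rename it --io-timeout):

    # Map with a 600 second NBD request timeout (flag name varies by release)
    rbd-nbd map --timeout 600 rbd_pool/my_image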

[ceph-users] v14.2.2 Nautilus released

2019-07-23 Thread Nathan Cutler
This is the second bug fix release of the Ceph Nautilus release series. We recommend all Nautilus users upgrade to this release. When upgrading from older releases of Ceph, the general guidelines for upgrading to Nautilus must be followed. Notable Changes --- * The no{up,down,in,out} related
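After upgrading, a quick sanity check that every daemon is actually running 14.2.2:

    # Per daemon type version summary; all entries should report 14.2.2
    ceph versions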

Re: [ceph-users] Repair statsfs fail some osd 14.2.1 to 14.2.2

2019-07-23 Thread Igor Fedotov
Hi Manuel, this looks like either corrupted data in the BlueStore database or a memory-related (some leakage?) issue. This is reproducible, right? Could you please make a ticket in the upstream tracker, rerun the repair with debug bluestore set to 5/20, and upload the corresponding log. Please observe
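A sketch of the requested repair run (the OSD path is a placeholder, and the exact debug plumbing is an assumption; the level can also be set via ceph.conf):

    # Stop the OSD, then re-run the repair with verbose bluestore logging
    systemctl stop ceph-osd@NN
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-NN \
        --debug-bluestore 5/20 --log-file /tmp/osd-NN-repair.log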

[ceph-users] Repair statsfs fail some osd 14.2.1 to 14.2.2

2019-07-23 Thread EDH - Manuel Rios Fernandez
Hi Ceph, Upgraded last night from 14.2.1 to 14.2.2; 36 OSDs with old stats. We're still repairing stats one by one, but one failed. Hope this helps. CentOS version: Linux CEPH006 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Re: [ceph-users] Mark CephFS inode as lost

2019-07-23 Thread Yan, Zheng
please create a ticket at http://tracker.ceph.com/projects/cephfs and upload mds log with debug_mds =10 On Tue, Jul 23, 2019 at 6:00 AM Robert LeBlanc wrote: > > We have a Luminous cluster which has filled up to 100% multiple times and > this causes an inode to be left in a bad state. Doing

Re: [ceph-users] Nautilus 14.2.1 / 14.2.2 crash

2019-07-23 Thread Ashley Merrick
Have they been stable since, or have they still had some crashes? Thanks. On Sat, 20 Jul 2019 10:09:08 +0800 Nigel Williams wrote: On Sat, 20 Jul 2019 at 04:28, Nathan Fish wrote: On further investigation, it seems to be this bug: