Re: [ceph-users] mon sudden crash loop - pinned map

2019-10-09 Thread Philippe D'Anjou
I don't think this has anything to do with CephFS; the mon crashes for the same reason even without the MDS running. I still have the old RocksDB files, but they had a corruption issue, not sure if that's easier to fix; there haven't been any changes on the cluster in between. This is a disaster
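
For reference, a cautious first step in this situation is to stop the mon, keep a verbatim copy of the store, and try an offline compaction; the paths and mon id below are assumptions, not taken from the poster's setup:

  systemctl stop ceph-mon@$(hostname -s)
  cp -a /var/lib/ceph/mon/ceph-$(hostname -s)/store.db /root/mon-store.db.backup
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db compact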

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-09 Thread Gregory Farnum
On Wed, Oct 9, 2019 at 10:58 AM Vladimir Brik < vladimir.b...@icecube.wisc.edu> wrote: > Best I can tell, automatic cache sizing is enabled and all related > settings are at their default values. > > Looking through cache tunables, I came across > osd_memory_expected_fragmentation, which the docs
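
For reference, a minimal sketch of verifying the autotuning settings and current cache memory from the admin socket (OSD id 0 is a placeholder; assumes a Nautilus-era CLI):

  ceph daemon osd.0 config get osd_memory_target
  ceph daemon osd.0 config get bluestore_cache_autotune
  ceph daemon osd.0 dump_mempools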

Re: [ceph-users] Ceph pg repair clone_missing?

2019-10-09 Thread Brad Hubbard
Awesome! Sorry it took so long. On Thu, Oct 10, 2019 at 12:44 AM Marc Roos wrote: > > > Brad, many thanks!!! My cluster finally has HEALTH_OK after 1.5 years or so! > :) > > > -Original Message- > Subject: Re: Ceph pg repair clone_missing? > > On Fri, Oct 4, 2019 at 6:09 PM Marc Roos >

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-09 Thread Patrick Donnelly
Looks like this bug: https://tracker.ceph.com/issues/41148 On Wed, Oct 9, 2019 at 1:15 PM David C wrote: > > Hi Daniel > > Thanks for looking into this. I hadn't installed ceph-debuginfo, here's the > bt with line numbers: > > #0 operator uint64_t (this=0x10) at >

Re: [ceph-users] [Nfs-ganesha-devel] 2.7.3 with CEPH_FSAL Crashing

2019-10-09 Thread David C
Hi Daniel, thanks for looking into this. I hadn't installed ceph-debuginfo, here's the bt with line numbers: #0 operator uint64_t (this=0x10) at /usr/src/debug/ceph-14.2.2/src/include/object.h:123 #1 Client::fill_statx (this=this@entry=0x274b980, in=0x0, mask=mask@entry=341,
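
For reference, a rough sketch of how such a line-numbered backtrace can be produced on an EL7 host; the debuginfo package names and the core file path are assumptions:

  yum install -y ceph-debuginfo nfs-ganesha-debuginfo
  gdb /usr/bin/ganesha.nfsd /path/to/core
  (gdb) bt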

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-09 Thread Vladimir Brik
Best I can tell, automatic cache sizing is enabled and all related settings are at their default values. Looking through cache tunables, I came across osd_memory_expected_fragmentation, which the docs define as "estimate the percent of memory fragmentation". What's the formula to compute
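
For reference, the option's description, type and default can be queried from a Nautilus cluster (a small sketch; assumes the Nautilus-era 'ceph config help' subcommand is available):

  ceph config help osd_memory_expected_fragmentation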

Re: [ceph-users] mon sudden crash loop - pinned map

2019-10-09 Thread Gregory Farnum
On Mon, Oct 7, 2019 at 11:11 PM Philippe D'Anjou wrote: > > Hi, > unfortunately it's a single mon, because we had a major outage on this cluster > and it's just being used to copy off data now. We weren't able to add more > mons because once a second mon was added it crashed the first one (there's a
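
For reference, a cautious sketch of preserving the mon store and inspecting the current monmap before any further recovery attempt; the mon id and paths are placeholders, and the mon must be stopped first:

  systemctl stop ceph-mon@mon1
  cp -a /var/lib/ceph/mon/ceph-mon1 /root/ceph-mon1.backup
  ceph-monstore-tool /var/lib/ceph/mon/ceph-mon1 get monmap -- --out /tmp/monmap
  monmaptool --print /tmp/monmap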

Re: [ceph-users] Ceph multi site outage question

2019-10-09 Thread Melzer Pinto
Thanks - yeah, Jewel is old. But I meant to say Nautilus and not Luminous. The first option probably won't work for me, since both sides are active and application1 needs to write in both places as http://application1.something.com. The second one in theory should work. I'm using haproxy and it
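
For reference, a minimal haproxy sketch of the active/backup idea under discussion; hostnames and ports are placeholders, not the poster's actual configuration:

  frontend rgw_in
      bind *:80
      default_backend rgw_out

  backend rgw_out
      server zone1 rgw-zone1.something.com:8080 check
      server zone2 rgw-zone2.something.com:8080 check backup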

Re: [ceph-users] Unexpected increase in the memory usage of OSDs

2019-10-09 Thread Gregory Farnum
On Mon, Oct 7, 2019 at 7:20 AM Vladimir Brik wrote: > > > Do you have statistics on the size of the OSDMaps or count of them > > which were being maintained by the OSDs? > No, I don't think so. How can I find this information? Hmm I don't know if we directly expose the size of maps. There are
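
For reference, a quick sketch of seeing how many map epochs an OSD is currently holding (OSD id 0 is a placeholder); the gap between newest_map and oldest_map is roughly the number of full maps kept locally:

  ceph daemon osd.0 status
  ceph report | grep -E 'osdmap_first_committed|osdmap_last_committed'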

Re: [ceph-users] Ceph multi site outage question

2019-10-09 Thread Ed Fisher
Boy, Jewel is pretty old. Even Luminous is getting up there. There have been a lot of multisite improvements in Mimic and now Nautilus, so you might want to consider upgrading all the way to 14.2.4. Anyway, the way we solve this is by giving each zone a different name (eg
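
For reference, a rough sketch of pointing each zone at its own DNS name and committing the period; zone names and endpoints are placeholders:

  radosgw-admin zone modify --rgw-zone=zone1 --endpoints=http://zone1.something.com:8080
  radosgw-admin zone modify --rgw-zone=zone2 --endpoints=http://zone2.something.com:8080
  radosgw-admin period update --commit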

[ceph-users] Ceph multi site outage question

2019-10-09 Thread Melzer Pinto
Hello, I have a question about multi-site configuration. I have 2 clusters configured in a single realm and zonegroup. One cluster is the master zone and the other the slave. Let's assume the first cluster can be reached at http://application1.something.com and the 2nd one is
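
For reference, the realm/zonegroup layout and the replication state can be checked from either side with the commands below (a small sketch; assumes radosgw-admin is run on an RGW node of the respective cluster):

  radosgw-admin period get
  radosgw-admin zonegroup get
  radosgw-admin sync status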

Re: [ceph-users] Ceph pg repair clone_missing?

2019-10-09 Thread Marc Roos
Brad, many thanks!!! My cluster finally has HEALTH_OK after 1.5 years or so! :) -Original Message- Subject: Re: Ceph pg repair clone_missing? On Fri, Oct 4, 2019 at 6:09 PM Marc Roos wrote: > > > > >Try something like the following on each OSD that holds a copy of >
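
For context, the kind of per-OSD clean-up referenced above is usually done with ceph-objectstore-tool against the stopped OSD; a hedged sketch in which the OSD id, pgid, object and clone id are all placeholders:

  systemctl stop ceph-osd@<id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --pgid <pgid> '<object>' remove-clone-metadata <cloneid>
  systemctl start ceph-osd@<id>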