[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-01 Thread Satish Patel
Now when I run "ceph orch ps" it works, but the following command throws an error. I tried to bring up a second mgr with the "ceph orch apply mgr" command, but it didn't help.
root@ceph1:/ceph-disk# ceph version
ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
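For reference, a minimal sketch of asking cephadm for two mgr daemons (the count and host names below are illustrative, taken from this thread):

  # request two mgr daemons and let cephadm pick hosts
  ceph orch apply mgr 2
  # or pin them to specific hosts
  ceph orch apply mgr --placement="ceph1 ceph2"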

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-01 Thread Satish Patel
Never mind, I found the doc related to that and was able to get one mgr up: https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon On Fri, Sep 2, 2022 at 1:21 AM Satish Patel wrote: > Folks, > > I am having a little fun time with cephadm and it's very annoying to deal >
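A rough sketch of the manual mgr deployment the linked troubleshooting page describes (the daemon name and fsid below are placeholders; take the exact steps from the doc itself):

  # create a keyring for the new mgr daemon
  ceph auth get-or-create mgr.ceph1.newmgr mon "profile mgr" osd "allow *" mds "allow *"
  # deploy it on the host with cephadm; config-json carries the keyring and ceph.conf
  cephadm deploy --fsid <fsid> --name mgr.ceph1.newmgr --config-json config-json.json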

[ceph-users] Re: how to fix mds stuck at dispatched without restarting mds

2022-09-01 Thread Xiubo Li
On 9/1/22 4:29 PM, zxcs wrote: Thanks a lot, Xiubo!!! This time we still restarted the mds to fix this, because a user urgently needed to list /path/to/A/; I will try enabling mds debug logs if we hit it again. Also, we haven't tried flushing the mds journal before; are there any side effects to doing this? This cephfs cluster is a
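If it helps next time, a hedged sketch of the commands involved (the mds name is a placeholder; on a production cluster, verify against your version's docs before flushing the journal):

  # temporarily raise mds debug logging
  ceph config set mds debug_mds 10
  # flush the mds journal via the admin socket on the mds host
  ceph daemon mds.<name> flush journal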

[ceph-users] [cephadm] mgr: no daemons active

2022-09-01 Thread Satish Patel
Folks, I am having a little fun time with cephadm and it's very annoying to deal with. I have deployed a ceph cluster using cephadm on two nodes. When I was trying to upgrade, I noticed hiccups where it upgraded a single mgr to 16.2.10 but not the other, so I started messing around and

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Great, thanks! Don't ask me how many commands I have typed to fix my issue. Finally I did it. Basically I fixed /etc/hosts and then removed the mgr daemon using the following command: ceph orch daemon rm mgr.ceph1.xmbvsb. Then cephadm automatically deployed a new working mgr. I found ceph orch ps was hanging and

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
I'm not sure exactly what needs to be done to fix that, but I'd imagine just editing the /etc/hosts file on all your hosts to be correct would be the start (the cephadm shell would have taken its /etc/hosts off of whatever host you ran the shell from). Unfortunately I'm not much of a networking

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Hi Adam, You are correct, it looks like it was a naming issue in my /etc/hosts file. Is there a way to correct it? As you can see, I have ceph1 two times. :(
10.73.0.191 ceph1.example.com ceph1
10.73.0.192 ceph2.example.com ceph1
On Thu, Sep 1, 2022 at 8:06 PM Adam King wrote: > the naming for daemons
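For what it's worth, the corrected file would presumably map each address to its own short name:

  10.73.0.191 ceph1.example.com ceph1
  10.73.0.192 ceph2.example.com ceph2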

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
the naming for daemons is a bit different for each daemon type, but for mgr daemons it's always "mgr.<hostname>.<random suffix>". The daemons cephadm will be able to find for something like a daemon redeploy are pretty much always whatever is reported in "ceph orch ps". Given that "mgr.ceph1.xmbvsb" isn't listed there,

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Hi Adam, I have also noticed a very strange thing, which is a duplicate name in the following output. Is this normal? I don't know how it got here. Is there a way I can rename them?
root@ceph1:~# ceph orch ps
NAME  HOST  PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Hi Adam, Getting the following error, not sure why it's not able to find it. root@ceph1:~# ceph orch daemon redeploy mgr.ceph1.xmbvsb Error EINVAL: Unable to find mgr.ceph1.xmbvsb daemon(s) On Thu, Sep 1, 2022 at 5:57 PM Adam King wrote: > what happens if you run `ceph orch daemon redeploy

[ceph-users] Re: Remove corrupt PG

2022-09-01 Thread Jesper Lykkegaard Karlsen
Well, not the total solution after all. There is still some metadata and header structure left that I still cannot delete with ceph-objectstore-tool --op remove; it core dumps. I think I need to declare the OSD lost anyway to get through this. Unless somebody has a better suggestion?
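If it comes to that, marking the OSD lost looks roughly like this (the OSD id is a placeholder; this is destructive, so only once the data on it is confirmed unrecoverable):

  ceph osd lost <osd-id> --yes-i-really-mean-it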

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
what happens if you run `ceph orch daemon redeploy mgr.ceph1.xmbvsb`? On Thu, Sep 1, 2022 at 5:12 PM Satish Patel wrote: > Hi Adam, > > Here is requested output > > root@ceph1:~# ceph health detail > HEALTH_WARN 4 stray daemon(s) not managed by cephadm > [WRN] CEPHADM_STRAY_DAEMON: 4 stray

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Hi Adam, Here is the requested output:
root@ceph1:~# ceph health detail
HEALTH_WARN 4 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 4 stray daemon(s) not managed by cephadm
    stray daemon mon.ceph1 on host ceph1 not managed by cephadm
    stray daemon osd.0 on host ceph1 not

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
cephadm deploys the containers with --rm so they will get removed if you stop them. As for getting the 2nd mgr back, if it still lists the 2nd one in `ceph orch ps` you should be able to do a `ceph orch daemon redeploy <daemon name>`, where <daemon name> should match the name given in the orch ps output for the one that

[ceph-users] Re: Remove corrupt PG

2022-09-01 Thread Jesper Lykkegaard Karlsen
To answer my own question: the removal of the corrupt PG could be fixed by doing a ceph-objectstore-tool fuse mount. Then, from the mount point, delete everything in the PG's head directory. This took only a few seconds (compared to 7.5 days), and after unmounting and restarting the OSD it
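For anyone landing here later, a hedged sketch of that fuse-mount approach (paths and the PG id are placeholders; the OSD must be stopped first):

  # expose the offline OSD's object store as a FUSE filesystem
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op fuse --mountpoint /mnt/osd-<id>
  # then remove the contents of the corrupt <pgid>_head directory under the mountpoint,
  # unmount, and restart the OSD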

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Adam, I posted a question related to upgrading earlier, and this thread is related to that; I opened a new one because I found that error in the logs and thought the upgrade may be stuck because of duplicate OSDs.
root@ceph1:~# ls -l /var/lib/ceph/f270ad9e-1f6f-11ed-b6f8-a539d87379ea/
total

[ceph-users] Ceph Mgr/Dashboard Python depedencies: a new approach

2022-09-01 Thread Ernesto Puerta
Hi all, For the Reef release, the Dashboard team wants to deliver Python packages embedded in the ceph-dashboard distro package. [1] This will make it unnecessary to depend on distro Python packages and allow us to: - *Use new/er Python packages:* that are not available in all distros (e.g.:

[ceph-users] Re: The next quincy point release

2022-09-01 Thread Ilya Dryomov
On Thu, Sep 1, 2022 at 8:19 PM Yuri Weinstein wrote: > > I have several PRs that are ready for merge but failing "make check" > > https://github.com/ceph/ceph/pull/47650 (main related to quincy) > https://github.com/ceph/ceph/pull/47057 > https://github.com/ceph/ceph/pull/47621 >

[ceph-users] Re: cephadm upgrade from octopus to pasific stuck

2022-09-01 Thread Adam King
Does "ceph orch upgrade status" give any insights (e.g. an error message of some kind)? If not, maybe you could try looking at https://tracker.ceph.com/issues/56485#note-2 because it seems like a similar issue and I see you're using --ceph-version (which we need to fix, sorry about that). On Wed,

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
Are there any extra directories in /var/lib/ceph or /var/lib/ceph/<fsid> that appear to be for those OSDs on that host? When cephadm builds the info it uses for "ceph orch ps" it's actually scraping those directories. The output of "cephadm ls" on the host with the duplicates could also potentially have
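A quick way to cross-check, assuming jq is available (the fsid path is a placeholder):

  # daemon directories cephadm scrapes for "ceph orch ps"
  ls /var/lib/ceph/<fsid>/
  # what cephadm itself thinks is deployed on this host
  cephadm ls | jq '.[].name'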

[ceph-users] [cephadm] Found duplicate OSDs

2022-09-01 Thread Satish Patel
Folks, I am playing with cephadm and life was good until I started upgrading from octopus to pacific. My upgrade process got stuck after upgrading the mgr, and in the logs I can now see the following error:
root@ceph1:~# ceph log last cephadm
2022-09-01T14:40:45.739804+ mgr.ceph2.hmbdla (mgr.265806) 8 :

[ceph-users] More recovery pain

2022-09-01 Thread Wyll Ingersoll
We are in the middle of a massive recovery event and our monitor DBs keep exploding to the point that they fill their disk partition (800GB disk). We cannot compact them because there is no room on the device for compaction to happen. We cannot add another disk at this time either. We
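For reference, the usual compaction knobs look roughly like this (the mon id is a placeholder; neither will help until there is free space for the store to compact into):

  # ask a running monitor to compact its store
  ceph tell mon.<id> compact
  # or compact the store at monitor startup
  ceph config set mon mon_compact_on_start true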

[ceph-users] Re: Fwd: Active-Active MDS RAM consumption

2022-09-01 Thread Gerdriaan Mulder
Hi Kamil, This seems like the issue where the MDS loads all direntries into memory. You should probably take a look at the mds_oft_prefetch_dirfrags setting (which changed its default from true to
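If that setting turns out to be the culprit, disabling it would look something like this (check what your version defaults it to before changing anything):

  ceph config set mds mds_oft_prefetch_dirfrags false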

[ceph-users] Re: The next quincy point release

2022-09-01 Thread Venky Shankar
On Tue, Aug 30, 2022 at 10:48 PM Yuri Weinstein wrote: > > I have several PRs in testing: > > https://github.com/ceph/ceph/labels/wip-yuri2-testing > https://github.com/ceph/ceph/labels/wip-yuri-testing (needs fs review) > > Assuming they were merged, anything else is a must to be added to the >

[ceph-users] Re: Ceph Leadership Team Meeting Minutes - August 31, 2022

2022-09-01 Thread Venky Shankar
On Thu, Sep 1, 2022 at 10:12 AM Neha Ojha wrote: > > Hi everyone, > > Here are the topics discussed in today's meeting. > > - David Galoway's last CLT meeting, mixed emotions but we wish David > all the best for all his future endeavors > - Tracker upgrade postponed for now > - OVH payment

[ceph-users] Fwd: Active-Active MDS RAM consumption

2022-09-01 Thread Kamil Madac
Hi Ceph Community, One of my customers has an issue with the MDS cluster. The Ceph cluster is deployed with cephadm and is at version 16.2.7. As soon as MDS is switched from Active-Standby to Active-Active-Standby, the MDS daemon starts to consume a lot of RAM. After some time it consumes 48GB of RAM, and
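For context, the active-active switch and the relevant memory knob look roughly like this (the filesystem name is a placeholder):

  # go from 1 to 2 active MDS ranks
  ceph fs set <fsname> max_mds 2
  # per-MDS cache memory target the daemon is expected to respect
  ceph config get mds mds_cache_memory_limit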

[ceph-users] Re: Questions about the QA process and the data format of both OSD and MON

2022-09-01 Thread Neha Ojha
Hi Satoru, Apologies for the delay in responding to your questions. In the case of https://github.com/ceph/ceph/pull/45963, we caught the bug in an upgrade test (as described in https://tracker.ceph.com/issues/55444) and not in the rados test suite. Our upgrade test suites are meant to be run

[ceph-users] Re: how to speed up hundreds of millions small files read base on cephfs?

2022-09-01 Thread Maged Mokhtar
Hi experts, We are using cephfs (15.2.*) with kernel mounts in our production environment. These days, when we do massive reads from the cluster (multiple processes), ceph health always reports slow ops for some OSDs (built with 8TB HDDs using SSD as DB cache). Our cluster has more reads than

[ceph-users] Re: OSDs crush - Since Pacific

2022-09-01 Thread Igor Fedotov
Hi Wissem, given the log output it looks like a suicide timeout has fired. From my experience this is often observed when DB performance is degraded after bulk removals, and offline compaction should provide some relief, at least temporarily... But if deletes are ongoing (e.g. due to
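For reference, offline compaction of a stopped OSD's RocksDB is typically done along these lines (the OSD path is a placeholder; stop the OSD first):

  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact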

[ceph-users] how to speed up hundreds of millions small files read base on cephfs?

2022-09-01 Thread zxcs
Hi experts, We are using cephfs (15.2.*) with kernel mounts in our production environment. These days, when we do massive reads from the cluster (multiple processes), ceph health always reports slow ops for some OSDs (built with 8TB HDDs using SSD as DB cache). Our cluster has more reads than
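A hedged sketch of how to see which OSDs report slow ops and what those ops were (the OSD id is a placeholder; run the daemon command on the OSD's host):

  ceph health detail
  ceph daemon osd.<id> dump_historic_ops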

[ceph-users] Re: Wide variation in osd_mclock_max_capacity_iops_hdd

2022-09-01 Thread Sridhar Seshasayee
Hello Vladimir,
> I have noticed that our osd_mclock_max_capacity_iops_hdd varies widely for OSDs on identical drives in identical machines (from ~600 to ~2800).
The IOPS shouldn't vary widely if the drives are of similar age and running the same workloads. The
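For comparing OSDs, the measured value can be inspected and, if needed, overridden per OSD (the value shown is illustrative):

  # show the capacity value an OSD is currently using
  ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
  # override it manually if the measured number looks off
  ceph config set osd.0 osd_mclock_max_capacity_iops_hdd 600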

[ceph-users] Re: how to fix mds stuck at dispatched without restarting mds

2022-09-01 Thread zxcs
Thanks a lot, Xiubo!!! This time we still restarted the mds to fix this, because a user urgently needed to list /path/to/A/; I will try enabling mds debug logs if we hit it again. Also, we haven't tried flushing the mds journal before; are there any side effects to doing this? This cephfs cluster is a production environment, we need very