[ceph-users] ceph-ansible install failure

2022-10-21 Thread Zhongzhou Cai
Hi folks,

I'm trying to install Ceph on GCE VMs (Debian/Ubuntu) with PD-SSDs using the
ceph-ansible image. A clean installation works fine, but when I purged the
Ceph cluster and tried to re-install, I saw this error:

```

Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore
bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap
--keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid
08d766d5-e843-4c65-9f4f-db7f0129b4e9 --setuser ceph --setgroup ceph

stderr: 2022-10-21T22:07:58.447+ 7f71afead080 -1
bluestore(/var/lib/ceph/osd/ceph-1/) _read_fsid unparsable uuid

stderr: 2022-10-21T22:07:58.455+ 7f71afead080 -1 bluefs _replay 0x0:
stop: uuid 8b1ce55d-10c1-a33d-1817-8a8427657694 != super.uuid
3d8aa673-00bd-473c-a725-06ac31c6b945, block dump:

stderr:   6a bc c7 44 83 87 8b 1c  e5 5d 10 c1 a3 3d 18 17
|j..D.]...=..|

stderr: 0010  8a 84 27 65 76 94 bd 12  3c 11 4a c4 32 6c eb a4
|..'ev...<.J.2l..|

…

stderr: 0ff0  2b 57 4e a4 ad da be cb  bf df 61 fc f7 ce 4a 14
|+WN...a...J.|

stderr: 1000

stderr: 2022-10-21T22:07:58.987+ 7f71afead080 -1 rocksdb:
verify_sharding unable to list column families: NotFound:

stderr: 2022-10-21T22:07:58.987+ 7f71afead080 -1
bluestore(/var/lib/ceph/osd/ceph-1/) _open_db erroring opening db:

stderr: 2022-10-21T22:07:59.515+ 7f71afead080 -1 OSD::mkfs:
ObjectStore::mkfs failed with error (5) Input/output error

stderr: 2022-10-21T22:07:59.515+ 7f71afead080 -1 ** ERROR: error
creating empty object store in /var/lib/ceph/osd/ceph-1/: (5) Input/output
error

--> Was unable to complete a new OSD, will rollback changes
```

Can someone explain what "uuid != super.uuid" means? The issue doesn't seem to
happen when installing on a clean disk. Could it be related to the purge
process not cleaning up the disks properly? FWIW, I'm using
https://github.com/ceph/ceph-ansible/blob/main/infrastructure-playbooks/purge-cluster.yml
to purge the cluster.
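
For reference, the kind of manual cleanup I could add before re-installing would
be something like the following (just a sketch; /dev/sdb is a placeholder for an
OSD data device):

```
# Wipe any leftover BlueStore/LVM metadata from a previous deployment.
# /dev/sdb is a placeholder -- adjust to the actual OSD device.
ceph-volume lvm zap --destroy /dev/sdb

# Belt and braces: clear partition tables and filesystem signatures too.
sgdisk --zap-all /dev/sdb
wipefs --all /dev/sdb
```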

Thanks,
Zhongzhou Cai
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-10-21 Thread Eugen Block
IIRC cephadm refreshes its daemons within 15 minutes, at least that  
was my last impression. So sometimes you have to be patient. :-)



Quoting Satish Patel :


Hi Eugen,

My error cleared up by itself. It looks like it just took some time, but now
I am not seeing any errors and the output is very clean. Thank you so much.




On Fri, Oct 21, 2022 at 1:46 PM Eugen Block  wrote:


Do you still see it with ‚cephadm ls‘ on that node? If yes you could
try ‚cephadm rm-daemon --name osd.3‘. Or you try it with the
orchestrator: ceph orch daemon rm…
I don’t have the exact command at the moment, you should check the docs.

Quoting Satish Patel :

> Hi Eugen,
>
> I have delected osd.3 directory from datastorn4 node as you mentioned but
> still i am seeing that duplicate osd in ps output.
>
> root@datastorn1:~# ceph orch ps | grep osd.3
> osd.3  datastorn4stopped  5m
> ago   3w-42.6G 
> osd.3  datastorn5running (3w) 5m
> ago   3w2587M42.6G  17.2.3 0912465dcea5  d139f8a1234b
>
> How do I clean up permanently?
>
>
> On Fri, Oct 21, 2022 at 6:24 AM Eugen Block  wrote:
>
>> Hi,
>>
>> it looks like the OSDs haven't been cleaned up after removing them. Do
>> you see the osd directory in /var/lib/ceph//osd.3 on datastorn4?
>> Just remove the osd.3 directory, then cephadm won't try to activate it.
>>
>>
>> Quoting Satish Patel :
>>
>> > Folks,
>> >
>> > I have deployed 15 OSDs node clusters using cephadm and encount
duplicate
>> > OSD on one of the nodes and am not sure how to clean that up.
>> >
>> > root@datastorn1:~# ceph health
>> > HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
>> > configured
>> >
>> > osd.3 is duplicated on two nodes, i would like to remove it from
>> > datastorn4 but I'm not sure how to remove it. In the ceph osd tree I
am
>> not
>> > seeing any duplicate.
>> >
>> > root@datastorn1:~# ceph orch ps | grep osd.3
>> > osd.3  datastorn4stopped
7m
>> > ago   3w-42.6G 
>> > osd.3  datastorn5running (3w)
 7m
>> > ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b
>> >
>> >
>> > Getting following error in logs
>> >
>> > 2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188)
>> 1098186 :
>> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
>> datastorn4,
>> > osd.3 in status running on datastorn5
>> > 2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188)
>> 1098221 :
>> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
>> datastorn4,
>> > osd.3 in status running on datastorn5
>> > 2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188)
>> 1098256 :
>> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
>> datastorn4,
>> > osd.3 in status running on datastorn5
>> > 2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188)
>> 1098293 :
>> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
>> datastorn4,
>> > osd.3 in status running on datastorn5
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>








___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: subdirectory pinning and reducing ranks / max_mds

2022-10-21 Thread Robert Gallop
In my experience it just falls back to behaving like it's unpinned.

For my use case I do the following:

/ pinned to rank 0
/env1 to rank 1
/env2 to rank 2
/env3 to rank 3

If I do an upgrade it will collapse to a single rank; all access/IO continues
after what would be a normal failover-type interval, i.e. IO may stop on
clients for 10-60 seconds or so, as if a normal MDS rank failover had
occurred.

But it will not remain in a locked state for the entire time from what I’ve
seen.

YMMV, but as long as the reduction in ranks actually works (we’ve had them
crash when trying to shut down and stuff), you should be in good shape.

If you do hit issues of ranks crashing, be ready to pause the upgrade, and
set your max_mds back to 3 or 4 to stop the immediate bleeding and continue
your troubleshooting without impact to the clients.
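
For reference, pins like the ones above are just the ceph.dir.pin extended
attribute, and the rank reduction is the usual max_mds change. Roughly like
this (the mount point and filesystem name below are placeholders):

```
# Pin subtrees to MDS ranks via the ceph.dir.pin xattr
# (assumes the filesystem is mounted at /mnt/cephfs).
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/env1
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/env2
setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/env3

# During an upgrade the ranks are collapsed and later restored
# ("cephfs" is a placeholder filesystem name):
ceph fs set cephfs max_mds 1
# ... upgrade the MDS daemons ...
ceph fs set cephfs max_mds 4
```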

On Fri, Oct 21, 2022 at 12:29 PM Wesley Dillingham 
wrote:

> In a situation where you have say 3 active MDS (and 3 standbys).
> You have 3 ranks, 0,1,2
> In your filesystem you have three directories at the root level [/a, /b,
> /c]
>
> you pin:
> /a to rank 0
> /b to rank 1
> /c to rank 2
>
> and you need to upgrade your Ceph Version. When it becomes time to reduce
> max_mds to 1 and thereby reduce the number of ranks to 1, just rank 0 what
> happens to directories /b and /c do they become unavailable between the
> time when max_mds is reduced to 1 and after the upgrade when max_mds is
> restored to 3. Alternatively if a rank disappears does the CephFS client
> understand this and begin to ignore the pinned rank and makes use of the
> remaining ranks? Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] subdirectory pinning and reducing ranks / max_mds

2022-10-21 Thread Wesley Dillingham
In a situation where you have, say, 3 active MDS daemons (and 3 standbys),
you have 3 ranks: 0, 1, 2.
In your filesystem you have three directories at the root level: /a, /b, /c.

you pin:
/a to rank 0
/b to rank 1
/c to rank 2

and you need to upgrade your Ceph version. When it becomes time to reduce
max_mds to 1, and thereby reduce the number of ranks to just rank 0, what
happens to directories /b and /c? Do they become unavailable between the time
max_mds is reduced to 1 and the time it is restored to 3 after the upgrade?
Alternatively, if a rank disappears, does the CephFS client understand this
and begin to ignore the pinned rank and make use of the remaining ranks?
Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy 22.04/Jammy packages

2022-10-21 Thread Konstantin Shalygin
Thank you Ilya!

> On 21 Oct 2022, at 21:02, Ilya Dryomov  wrote:
> 
> On Fri, Oct 21, 2022 at 12:48 PM Konstantin Shalygin  wrote:
>> 
>> CC'ed David
> 
> Hi Konstantin,
> 
> David has decided to pursue something else and is no longer working on
> Ceph [1].
> 
>> 
>> Maybe Ilya can tag someone from DevOps additionally
> 
> I think Dan answered this question yesterday [2]:
> 
>> there are no current plans as far as I'm aware to build earlier
>> releases for Jammy (22.04).
> 
> Also, David previously indicated that that wouldn't be straightforward:
> 
>> I'm working on backporting changes to quincy so we can get quincy
>> CentOS 9 packages.  That would be a much more monumental task for
>> Ubuntu 22.04.
> 
> [1] 
> https://lists.ceph.io/hyperkitty/list/d...@ceph.io/message/YK2LCTZ7DQ7DTGBAXMOUYDFMUILRKOOO/
> [2] 
> https://lists.ceph.io/hyperkitty/list/se...@ceph.io/thread/5CPTJCEGTV5KKZLBSLYE5OB6VTQXWE2N/
> 
> Thanks,
> 
>Ilya
> 
>> 
>> 
>> Thanks,
>> k
>> 
>> On 20 Oct 2022, at 20:07, Goutham Pacha Ravi  wrote:
>> 
>> +1
>> The OpenStack community is interested in this as well. We're trying to move
>> all our ubuntu testing to Ubuntu Jammy/22.04 [1]; and we consume packages
>> from download.ceph.com.
>> 
>> While we're adopting cephadm, a lot of OpenStack and Ceph deployers still
>> use other installers, and so the OpenStack CI system has had a barebones
>> install-from-package mechanism [2] that we use for our integration testing
>> with services like OpenStack Manila, Cinder, Glance and Nova.
>> 
>> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy 22.04/Jammy packages

2022-10-21 Thread Ilya Dryomov
On Fri, Oct 21, 2022 at 12:48 PM Konstantin Shalygin  wrote:
>
> CC'ed David

Hi Konstantin,

David has decided to pursue something else and is no longer working on
Ceph [1].

>
> Maybe Ilya can tag someone from DevOps additionally

I think Dan answered this question yesterday [2]:

> there are no current plans as far as I'm aware to build earlier
> releases for Jammy (22.04).

Also, David previously indicated that that wouldn't be straightforward:

> I'm working on backporting changes to quincy so we can get quincy
> CentOS 9 packages.  That would be a much more monumental task for
> Ubuntu 22.04.

[1] 
https://lists.ceph.io/hyperkitty/list/d...@ceph.io/message/YK2LCTZ7DQ7DTGBAXMOUYDFMUILRKOOO/
[2] 
https://lists.ceph.io/hyperkitty/list/se...@ceph.io/thread/5CPTJCEGTV5KKZLBSLYE5OB6VTQXWE2N/

Thanks,

Ilya

>
>
> Thanks,
> k
>
> On 20 Oct 2022, at 20:07, Goutham Pacha Ravi  wrote:
>
> +1
> The OpenStack community is interested in this as well. We're trying to move
> all our ubuntu testing to Ubuntu Jammy/22.04 [1]; and we consume packages
> from download.ceph.com.
>
> While we're adopting cephadm, a lot of OpenStack and Ceph deployers still
> use other installers, and so the OpenStack CI system has had a barebones
> install-from-package mechanism [2] that we use for our integration testing
> with services like OpenStack Manila, Cinder, Glance and Nova.
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-10-21 Thread Satish Patel
Hi Eugen,

My error cleared up by itself. It looks like it just took some time, but now
I am not seeing any errors and the output is very clean. Thank you so much.




On Fri, Oct 21, 2022 at 1:46 PM Eugen Block  wrote:

> Do you still see it with ‚cephadm ls‘ on that node? If yes you could
> try ‚cephadm rm-daemon --name osd.3‘. Or you try it with the
> orchestrator: ceph orch daemon rm…
> I don’t have the exact command at the moment, you should check the docs.
>
> Quoting Satish Patel :
>
> > Hi Eugen,
> >
> > I have delected osd.3 directory from datastorn4 node as you mentioned but
> > still i am seeing that duplicate osd in ps output.
> >
> > root@datastorn1:~# ceph orch ps | grep osd.3
> > osd.3  datastorn4stopped  5m
> > ago   3w-42.6G 
> > osd.3  datastorn5running (3w) 5m
> > ago   3w2587M42.6G  17.2.3 0912465dcea5  d139f8a1234b
> >
> > How do I clean up permanently?
> >
> >
> > On Fri, Oct 21, 2022 at 6:24 AM Eugen Block  wrote:
> >
> >> Hi,
> >>
> >> it looks like the OSDs haven't been cleaned up after removing them. Do
> >> you see the osd directory in /var/lib/ceph//osd.3 on datastorn4?
> >> Just remove the osd.3 directory, then cephadm won't try to activate it.
> >>
> >>
> >> Quoting Satish Patel :
> >>
> >> > Folks,
> >> >
> >> > I have deployed 15 OSDs node clusters using cephadm and encount
> duplicate
> >> > OSD on one of the nodes and am not sure how to clean that up.
> >> >
> >> > root@datastorn1:~# ceph health
> >> > HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
> >> > configured
> >> >
> >> > osd.3 is duplicated on two nodes, i would like to remove it from
> >> > datastorn4 but I'm not sure how to remove it. In the ceph osd tree I
> am
> >> not
> >> > seeing any duplicate.
> >> >
> >> > root@datastorn1:~# ceph orch ps | grep osd.3
> >> > osd.3  datastorn4stopped
> 7m
> >> > ago   3w-42.6G 
> >> > osd.3  datastorn5running (3w)
>  7m
> >> > ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b
> >> >
> >> >
> >> > Getting following error in logs
> >> >
> >> > 2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188)
> >> 1098186 :
> >> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> >> datastorn4,
> >> > osd.3 in status running on datastorn5
> >> > 2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188)
> >> 1098221 :
> >> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> >> datastorn4,
> >> > osd.3 in status running on datastorn5
> >> > 2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188)
> >> 1098256 :
> >> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> >> datastorn4,
> >> > osd.3 in status running on datastorn5
> >> > 2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188)
> >> 1098293 :
> >> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> >> datastorn4,
> >> > osd.3 in status running on datastorn5
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-10-21 Thread Eugen Block
Do you still see it with ‚cephadm ls‘ on that node? If yes, you could
try ‚cephadm rm-daemon --name osd.3‘. Or you can try it with the
orchestrator: ceph orch daemon rm…

I don’t have the exact command at the moment, you should check the docs.
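
From memory it would be roughly the following, but please verify the exact
syntax against the docs (<fsid> is a placeholder for your cluster fsid):

```
# On the affected node: see what cephadm still has registered.
cephadm ls | grep osd.3

# Remove the stale daemon entry directly on that node ...
cephadm rm-daemon --name osd.3 --fsid <fsid>

# ... or via the orchestrator from a node with an admin keyring.
ceph orch daemon rm osd.3 --force
```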

Quoting Satish Patel :


Hi Eugen,

I have deleted the osd.3 directory from the datastorn4 node as you mentioned,
but I am still seeing that duplicate OSD in the ps output.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4stopped  5m
ago   3w-42.6G 
osd.3  datastorn5running (3w) 5m
ago   3w2587M42.6G  17.2.3 0912465dcea5  d139f8a1234b

How do I clean up permanently?


On Fri, Oct 21, 2022 at 6:24 AM Eugen Block  wrote:


Hi,

it looks like the OSDs haven't been cleaned up after removing them. Do
you see the osd directory in /var/lib/ceph//osd.3 on datastorn4?
Just remove the osd.3 directory, then cephadm won't try to activate it.


Quoting Satish Patel :

> Folks,
>
> I have deployed 15 OSDs node clusters using cephadm and encount duplicate
> OSD on one of the nodes and am not sure how to clean that up.
>
> root@datastorn1:~# ceph health
> HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
> configured
>
> osd.3 is duplicated on two nodes, i would like to remove it from
> datastorn4 but I'm not sure how to remove it. In the ceph osd tree I am
not
> seeing any duplicate.
>
> root@datastorn1:~# ceph orch ps | grep osd.3
> osd.3  datastorn4stopped  7m
> ago   3w-42.6G 
> osd.3  datastorn5running (3w) 7m
> ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b
>
>
> Getting following error in logs
>
> 2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188)
1098186 :
> cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
datastorn4,
> osd.3 in status running on datastorn5
> 2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188)
1098221 :
> cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
datastorn4,
> osd.3 in status running on datastorn5
> 2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188)
1098256 :
> cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
datastorn4,
> osd.3 in status running on datastorn5
> 2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188)
1098293 :
> cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
datastorn4,
> osd.3 in status running on datastorn5
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-10-21 Thread Satish Patel
Hi Eugen,

I have deleted the osd.3 directory from the datastorn4 node as you mentioned,
but I am still seeing that duplicate OSD in the ps output.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4stopped  5m
ago   3w-42.6G 
osd.3  datastorn5running (3w) 5m
ago   3w2587M42.6G  17.2.3 0912465dcea5  d139f8a1234b

How do I clean up permanently?


On Fri, Oct 21, 2022 at 6:24 AM Eugen Block  wrote:

> Hi,
>
> it looks like the OSDs haven't been cleaned up after removing them. Do
> you see the osd directory in /var/lib/ceph//osd.3 on datastorn4?
> Just remove the osd.3 directory, then cephadm won't try to activate it.
>
>
> Quoting Satish Patel :
>
> > Folks,
> >
> > I have deployed 15 OSDs node clusters using cephadm and encount duplicate
> > OSD on one of the nodes and am not sure how to clean that up.
> >
> > root@datastorn1:~# ceph health
> > HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
> > configured
> >
> > osd.3 is duplicated on two nodes, i would like to remove it from
> > datastorn4 but I'm not sure how to remove it. In the ceph osd tree I am
> not
> > seeing any duplicate.
> >
> > root@datastorn1:~# ceph orch ps | grep osd.3
> > osd.3  datastorn4stopped  7m
> > ago   3w-42.6G 
> > osd.3  datastorn5running (3w) 7m
> > ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b
> >
> >
> > Getting following error in logs
> >
> > 2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188)
> 1098186 :
> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> datastorn4,
> > osd.3 in status running on datastorn5
> > 2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188)
> 1098221 :
> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> datastorn4,
> > osd.3 in status running on datastorn5
> > 2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188)
> 1098256 :
> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> datastorn4,
> > osd.3 in status running on datastorn5
> > 2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188)
> 1098293 :
> > cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on
> datastorn4,
> > osd.3 in status running on datastorn5
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_CLIENT_LATE_RELEASE after setting up scheduled CephFS snapshots

2022-10-21 Thread Edward R Huyer
Great, thank you both for the confirmation!

-Original Message-
From: Xiubo Li  
Sent: Friday, October 21, 2022 8:43 AM
To: Rishabh Dave ; Edward R Huyer 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS_CLIENT_LATE_RELEASE after setting up 
scheduled CephFS snapshots


On 21/10/2022 19:39, Rishabh Dave wrote:
> Hi Edward,
>
> On Wed, 19 Oct 2022 at 21:27, Edward R Huyer  wrote:
>> I recently set up scheduled snapshots on my CephFS filesystem, and ever 
>> since the cluster has been intermittently going into HEALTH_WARN with an 
>> MDS_CLIENT_LATE_RELEASE notification.
>>
>> Specifically:
>>
>> [WARN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to 
>> capability release
>>
>>  mds.[redacted](mds.0): Client [redacted] failing to respond 
>> to capability release client_id: 806270628
>>
>>
>> I catch errors like this in the logs:
>> client.785155718 isn't responding to mclientcaps(revoke), ino 
>> 0x10004c1abc6 pending pAsLsXsFsc issued pAsLsXsFsc, sent 3844.269321 
>> seconds ago
>>
>> (Ignore the fact that the client numbers don't match in this case; 
>> one was captured before the client was rebooted, the other after.  
>> There's only one CephFS client and the numbers normally match.)
>>
>> If left alone, the issue eventually resolves itself, then comes back at some 
>> point in the future.
>>
>> It appears to be the same as this bug:  
>> https://tracker.ceph.com/issues/49434
>> Which leads here:  https://tracker.ceph.com/issues/57244
>> And then this pull request:  https://github.com/ceph/ceph/pull/47752
>>
>> My reading is that this is simply a matter of the MDS not recognizing that 
>> the caps have, in fact, been released, and that I can safely ignore the 
>> warnings until the patch percolates down to a Pacific release.  Is that 
>> right, or am I missing something significant?
>>
> The PR has been marked for being backported to Pacific, so the patch 
> will eventually end up in Pacific. In running tests for CephFS, I 
> haven't seen more complications from this bug than the issue you seem 
> to hit. This is what I see normally - "cluster [WRN] client.x 
> isn't responding to mclientcaps(revoke)". So ignoring it probably is 
> safe. I'll try to contact the PR author and ask for an opinion.

Yeah, normally, if it doesn't cause your applications to get stuck, it should be
okay. As far as I know it won't.

And the Ceph PR is still under review and testing; once it gets merged I
will backport it ASAP.

Thanks!

- Xiubo

> - Regards,
> Rishabh
>
>> -
>> Edward Huyer
>> Golisano College of Computing and Information Sciences Rochester 
>> Institute of Technology Golisano 70-2373
>> 152 Lomb Memorial Drive
>> Rochester, NY 14623
>> 585-475-6651
>> erh...@rit.edu
>>
>> Obligatory Legalese:
>> The information transmitted, including attachments, is intended only for the 
>> person(s) or entity to which it is addressed and may contain confidential 
>> and/or privileged material. Any review, retransmission, dissemination or 
>> other use of, or taking of any action in reliance upon this information by 
>> persons or entities other than the intended recipient is prohibited. If you 
>> received this in error, please contact the sender and destroy any copies of 
>> this information.
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
>> email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using multiple SSDs as DB

2022-10-21 Thread Robert Sander

On 21.10.22 at 13:38, Christian wrote:


The spec I used does not fully utilize the SSDs though. Instead of 1/8th of
the SSD, it uses about 28GB, so 1/32nd of the SSD.


This is a bug in certain versions of ceph-volume:

https://tracker.ceph.com/issues/56031

It should be fixed in the latest releases.
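
As a workaround until you run a fixed version, you could also size the DB
volumes explicitly in the spec. A sketch based on your spec (the 111G value is
an assumption matching your 8-HDDs-per-SSD layout; check your version's docs
for the accepted size format):

```
spec:
  objectstore: bluestore
  filter_logic: AND
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  # explicit DB size per OSD instead of relying on the automatic split
  block_db_size: '111G'
```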

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory information per §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing Director: Peer Heinlein -- Registered office: Berlin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_CLIENT_LATE_RELEASE after setting up scheduled CephFS snapshots

2022-10-21 Thread Xiubo Li



On 21/10/2022 19:39, Rishabh Dave wrote:

Hi Edward,

On Wed, 19 Oct 2022 at 21:27, Edward R Huyer  wrote:

I recently set up scheduled snapshots on my CephFS filesystem, and ever since 
the cluster has been intermittently going into HEALTH_WARN with an 
MDS_CLIENT_LATE_RELEASE notification.

Specifically:

[WARN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability 
release

 mds.[redacted](mds.0): Client [redacted] failing to respond to 
capability release client_id: 806270628


I catch errors like this in the logs:
client.785155718 isn't responding to mclientcaps(revoke), ino 0x10004c1abc6 
pending pAsLsXsFsc issued pAsLsXsFsc, sent 3844.269321 seconds ago

(Ignore the fact that the client numbers don't match in this case; one was 
captured before the client was rebooted, the other after.  There's only one 
CephFS client and the numbers normally match.)

If left alone, the issue eventually resolves itself, then comes back at some 
point in the future.

It appears to be the same as this bug:  https://tracker.ceph.com/issues/49434
Which leads here:  https://tracker.ceph.com/issues/57244
And then this pull request:  https://github.com/ceph/ceph/pull/47752

My reading is that this is simply a matter of the MDS not recognizing that the 
caps have, in fact, been released, and that I can safely ignore the warnings 
until the patch percolates down to a Pacific release.  Is that right, or am I 
missing something significant?


The PR has been marked for being backported to Pacific, so the patch
will eventually end up in Pacific. In running tests for CephFS, I
haven't seen more complications from this bug than the issue you seem
to hit. This is what I see normally - "cluster [WRN] client.x
isn't responding to mclientcaps(revoke)". So ignoring it probably is
safe. I'll try to contact the PR author and ask for an opinion.


Yeah, normally, if it doesn't cause your applications to get stuck, it
should be okay. As far as I know it won't.


And the Ceph PR is still under review and testing; once it gets
merged I will backport it ASAP.


Thanks!

- Xiubo


- Regards,
Rishabh


-
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_CLIENT_LATE_RELEASE after setting up scheduled CephFS snapshots

2022-10-21 Thread Rishabh Dave
Hi Edward,

On Wed, 19 Oct 2022 at 21:27, Edward R Huyer  wrote:
>
> I recently set up scheduled snapshots on my CephFS filesystem, and ever since 
> the cluster has been intermittently going into HEALTH_WARN with an 
> MDS_CLIENT_LATE_RELEASE notification.
>
> Specifically:
>
> [WARN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability 
> release
>
> mds.[redacted](mds.0): Client [redacted] failing to respond to 
> capability release client_id: 806270628
>
>
> I catch errors like this in the logs:
> client.785155718 isn't responding to mclientcaps(revoke), ino 0x10004c1abc6 
> pending pAsLsXsFsc issued pAsLsXsFsc, sent 3844.269321 seconds ago
>
> (Ignore the fact that the client numbers don't match in this case; one was 
> captured before the client was rebooted, the other after.  There's only one 
> CephFS client and the numbers normally match.)
>
> If left alone, the issue eventually resolves itself, then comes back at some 
> point in the future.
>
> It appears to be the same as this bug:  https://tracker.ceph.com/issues/49434
> Which leads here:  https://tracker.ceph.com/issues/57244
> And then this pull request:  https://github.com/ceph/ceph/pull/47752
>
> My reading is that this is simply a matter of the MDS not recognizing that 
> the caps have, in fact, been released, and that I can safely ignore the 
> warnings until the patch percolates down to a Pacific release.  Is that 
> right, or am I missing something significant?
>

The PR has been marked for being backported to Pacific, so the patch
will eventually end up in Pacific. In running tests for CephFS, I
haven't seen more complications from this bug than the issue you seem
to hit. This is what I see normally - "cluster [WRN] client.x
isn't responding to mclientcaps(revoke)". So ignoring it probably is
safe. I'll try to contact the PR author and ask for an opinion.

- Regards,
Rishabh

> -
> Edward Huyer
> Golisano College of Computing and Information Sciences
> Rochester Institute of Technology
> Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
> Obligatory Legalese:
> The information transmitted, including attachments, is intended only for the 
> person(s) or entity to which it is addressed and may contain confidential 
> and/or privileged material. Any review, retransmission, dissemination or 
> other use of, or taking of any action in reliance upon this information by 
> persons or entities other than the intended recipient is prohibited. If you 
> received this in error, please contact the sender and destroy any copies of 
> this information.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using multiple SSDs as DB

2022-10-21 Thread Christian
Hi,

I have a problem fully utilizing some disks with a cephadm OSD service spec.
The host has the following disks:
4 SSD 900GB
32 HDD 10TB

I would like to use the SSDs as DB devices and the HDDs as block devices,
with 8 HDDs per SSD, so the available size for each DB would be about 111GB
(900GB/8).
The spec I used does not fully utilize the SSDs though. Instead of 1/8th of
the SSD, it uses about 28GB, so 1/32nd of the SSD.

The spec I use:
spec:
  objectstore: bluestore
  filter_logic: AND
  data_devices:
rotational: 1
  db_devices:
rotational: 0

I saw "limit" in the docs, but it sounds like it would limit the number of
SSDs used for DB devices.

How can I use all of the SSDs' capacity?

Best,
Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy 22.04/Jammy packages

2022-10-21 Thread Konstantin Shalygin
CC'ed David

Maybe Ilya can tag someone from DevOps additionally


Thanks,
k

> On 20 Oct 2022, at 20:07, Goutham Pacha Ravi  wrote:
> 
> +1
> The OpenStack community is interested in this as well. We're trying to move
> all our ubuntu testing to Ubuntu Jammy/22.04 [1]; and we consume packages
> from download.ceph.com .
> 
> While we're adopting cephadm, a lot of OpenStack and Ceph deployers still
> use other installers, and so the OpenStack CI system has had a barebones
> install-from-package mechanism [2] that we use for our integration testing
> with services like OpenStack Manila, Cinder, Glance and Nova.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-10-21 Thread Eugen Block

Hi,

it looks like the OSDs haven't been cleaned up after removing them. Do  
you see the osd directory in /var/lib/ceph//osd.3 on datastorn4?  
Just remove the osd.3 directory, then cephadm won't try to activate it.
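
Roughly like this on datastorn4 (the <fsid> path component is a placeholder for
your cluster fsid; double-check it really is a stale leftover before deleting
anything):

```
# On datastorn4: check whether a leftover directory for the removed OSD exists.
ls -l /var/lib/ceph/<fsid>/osd.3

# If it is only a stale leftover, remove it so cephadm stops picking it up,
# then let the orchestrator refresh its daemon inventory.
rm -rf /var/lib/ceph/<fsid>/osd.3
ceph orch ps --refresh
```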



Quoting Satish Patel :


Folks,

I have deployed a 15-node OSD cluster using cephadm and encountered a
duplicate OSD on one of the nodes, and I am not sure how to clean that up.

root@datastorn1:~# ceph health
HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
configured

osd.3 is duplicated on two nodes. I would like to remove it from
datastorn4 but I'm not sure how to remove it. In the ceph osd tree output I am
not seeing any duplicate.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4stopped  7m
ago   3w-42.6G 
osd.3  datastorn5running (3w) 7m
ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b


Getting following error in logs

2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188) 1098186 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188) 1098221 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188) 1098256 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188) 1098293 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [cephadm] Found duplicate OSDs

2022-10-21 Thread Satish Patel
Folks,

I have deployed a 15-node OSD cluster using cephadm and encountered a
duplicate OSD on one of the nodes, and I am not sure how to clean that up.

root@datastorn1:~# ceph health
HEALTH_WARN 1 failed cephadm daemon(s); 1 pool(s) have no replicas
configured

osd.3 is duplicated on two nodes. I would like to remove it from
datastorn4 but I'm not sure how to remove it. In the ceph osd tree output I am
not seeing any duplicate.

root@datastorn1:~# ceph orch ps | grep osd.3
osd.3  datastorn4stopped  7m
ago   3w-42.6G 
osd.3  datastorn5running (3w) 7m
ago   3w2584M42.6G  17.2.3 0912465dcea5  d139f8a1234b


Getting following error in logs

2022-10-21T09:10:45.226872+ mgr.datastorn1.nciiiu (mgr.14188) 1098186 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:11:46.254979+ mgr.datastorn1.nciiiu (mgr.14188) 1098221 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:12:53.009252+ mgr.datastorn1.nciiiu (mgr.14188) 1098256 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
2022-10-21T09:13:59.283251+ mgr.datastorn1.nciiiu (mgr.14188) 1098293 :
cephadm [INF] Found duplicate OSDs: osd.3 in status stopped on datastorn4,
osd.3 in status running on datastorn5
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw networking

2022-10-21 Thread Janne Johansson
On Thu, 20 Oct 2022 at 18:57, Wyll Ingersoll
wrote:
> What network does radosgw use when it reads/writes the objects to the cluster?

Everything in Ceph EXCEPT osd<->osd traffic uses the public network.
Anything that isn't backfill or replication between OSDs always
uses the public network, so you should design the networks based on
this.
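
For reference, the split is controlled by the two network options, for example
(the subnets below are placeholders):

```
# ceph.conf sketch -- subnets are placeholders
[global]
    # clients, mons, mgrs, rgw and the OSD "front" traffic
    public_network = 192.168.1.0/24
    # OSD<->OSD replication and backfill only
    cluster_network = 192.168.2.0/24
```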

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io