[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Mary Zhang
Thank you Eugen so much for your insights! We will definitely apply this
method next time. :-)

Best Regards,
Mary

On Sat, Apr 27, 2024 at 1:29 AM Eugen Block  wrote:

> If the rest of the cluster is healthy and your resiliency is
> configured properly, for example to sustain the loss of one or more
> hosts at a time, you don’t need to worry about a single disk. Just
> take it out and remove it (forcefully) so it doesn’t have any clients
> anymore. Ceph will immediately assign different primary OSDs and your
> clients will be happy again. ;-)
>
> Quoting Mary Zhang :
>
> > Thank you Wesley for the clear explanation of the difference between the 2 methods!
> > The tracker issue you mentioned https://tracker.ceph.com/issues/44400
> talks
> > about primary-affinity. Could primary-affinity help remove an OSD with
> > hardware issue from the cluster gracefully?
> >
> > Thanks,
> > Mary
> >
> >
> > On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham  >
> > wrote:
> >
> >> What you want to do is to stop the OSD (and all its copies of data it
> >> contains) by stopping the OSD service immediately. The downside of this
> >> approach is it causes the PGs on that OSD to be degraded. But the
> upside is
> >> the OSD which has bad hardware is immediately no longer participating
> in
> >> any client IO (the source of your RGW 503s). In this situation the PGs
> go
> >> into degraded+backfilling
> >>
> >> The alternative method is to keep the failing OSD up and in the cluster
> >> but slowly migrate the data off of it, this would be a long drawn out
> >> period of time in which the failing disk would continue to serve client
> >> reads and also facilitate backfill but you wouldn't take a copy of the
> data
> >> out of the cluster and cause degraded PGs. In this scenario the PGs
> would
> >> be remapped+backfilling
> >>
> >> I tried to find a way to have your cake and eat it too in relation to
> this
> >> "predicament" in this tracker issue:
> https://tracker.ceph.com/issues/44400
> >> but it was deemed "wont fix".
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> LinkedIn 
> >> w...@wesdillingham.com
> >>
> >>
> >>
> >>
> >> On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang 
> >> wrote:
> >>
> >>> Thank you Eugen for your warm help!
> >>>
> >>> I'm trying to understand the difference between 2 methods.
> >>> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
> >>> Documentation
> >>> 
> >>> says
> >>> it involves 2 steps:
> >>>
> >>>    1. evacuating all placement groups (PGs) from the OSD
> >>>    2. removing the PG-free OSD from the cluster
> >>>
> >>> For method 2, or the procedure you recommended, Adding/Removing OSDs —
> >>> Ceph
> >>> Documentation
> >>> <
> >>>
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
> >>> >
> >>> says
> >>> "After the OSD has been taken out of the cluster, Ceph begins
> rebalancing
> >>> the cluster by migrating placement groups out of the OSD that was
> removed.
> >>> "
> >>>
> >>> What's the difference between "evacuating PGs" in method 1 and
> "migrating
> >>> PGs" in method 2? I think method 1 must read the OSD to be removed.
> >>> Otherwise, we would not see slow ops warning. Does method 2 not involve
> >>> reading this OSD?
> >>>
> >>> Thanks,
> >>> Mary
> >>>
> >>> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > if you remove the OSD this way, it will be drained. Which means that
> >>> > it will try to recover PGs from this OSD, and in case of hardware
> >>> > failure it might lead to slow requests. It might make sense to
> >>> > forcefully remove the OSD without draining:
> >>> >
> >>> > - stop the osd daemon
> >>> > - mark it as out
> >>> > - osd purge  [--force] [--yes-i-really-mean-it]
> >>> >
> >>> > Regards,
> >>> > Eugen
> >>> >
> >>> > Quoting Mary Zhang :
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > We recently removed an osd from our Ceph cluster. Its underlying disk
> >>> > > has a hardware issue.
> >>> > >
> >>> > > We use command: ceph orch osd rm osd_id --zap
> >>> > >
> >>> > > During the process, the ceph cluster sometimes enters a warning state
> >>> > > with slow ops on this osd. Our rgw also failed to respond to requests
> >>> > > and returned 503.
> >>> > >
> >>> > > We restarted the rgw daemon to make it work again. But the same
> >>> > > failure occurred from time to time. Eventually we noticed that the rgw
> >>> > > 503 error is a result of osd slow ops.
> >>> > >
> >>> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD
> >>> > > with a hardware issue won't impact cluster performance & rgw
> >>> > > availability. Is our expectation reasonable? What's the best way to
> >>> > > handle OSDs with hardware failures?
> >>> > >
> >>> > > Thank you in advance for any comments or suggestions.
> >>> > 

[ceph-users] Re: [EXTERN] cache pressure?

2024-04-27 Thread Erich Weiler
Actually, should I be excluding my whole cephfs filesystem? Like, if I 
mount it as /cephfs, should my stanza look something like:


{
    "files.watcherExclude": {
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        "**/node_modules/*/**": true,
        "**/.cache/**": true,
        "**/.conda/**": true,
        "**/.local/**": true,
        "**/.nextflow/**": true,
        "**/work/**": true,
        "**/cephfs/**": true
    }
}

On 4/27/24 12:24 AM, Dietmar Rieder wrote:

Hi Erich,

hope it helps. Let us know.

Dietmar


On 26 April 2024 15:52:06 MESZ, Erich Weiler wrote:

Hi Dietmar,

We do in fact have a bunch of users running vscode on our HPC head
node as well (in addition to a few of our general purpose
interactive compute servers). I'll suggest they make the mods you
referenced! Thanks for the tip.

cheers,
erich

On 4/24/24 12:58 PM, Dietmar Rieder wrote:

Hi Erich,

in our case the "client failing to respond to cache pressure"
situation is/was often caused by users who have vscode
connecting via ssh to our HPC head node. vscode makes heavy use
of file watchers and we have seen users with > 400k watchers.
All these watched files must be held in the MDS cache and if you
have multiple users at the same time running vscode it gets
problematic.

Unfortunately there is no global setting - at least none that we
are aware of - for vscode to exclude certain files or
directories from being watched. We asked the users to configure
their vscode (Remote Settings -> Watcher Exclude) as follows:

{
    "files.watcherExclude": {
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        "**/node_modules/*/**": true,
        "**/.cache/**": true,
        "**/.conda/**": true,
        "**/.local/**": true,
        "**/.nextflow/**": true,
        "**/work/**": true
    }
}

~/.vscode-server/data/Machine/settings.json

To monitor and find processes with watchers you may use inotify-info.
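A quick way to get a rough per-process count of inotify instances without
extra tools is a one-liner along these lines (Linux procfs only; it counts
inotify instances, not individual watches):

  # list the PIDs holding the most inotify file descriptors
  find /proc/[0-9]*/fd -lname anon_inode:inotify 2>/dev/null | cut -d/ -f3 | sort | uniq -c | sort -rn | head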

HTH
   Dietmar

On 4/23/24 15:47, Erich Weiler wrote:

So I'm trying to figure out ways to reduce the number of
warnings I'm getting and I'm thinking about the one "client
failing to respond to cache pressure".

Is there maybe a way to tell a client (or all clients) to
reduce the amount of cache it uses or to release caches
quickly?  Like, all the time?

I know the linux kernel (and maybe ceph) likes to cache
everything for a while, and rightfully so, but I suspect in
my use case it may be more efficient to more quickly purge
the cache or to in general just cache way less overall...?

We have many thousands of threads all doing different things
that are hitting our filesystem, so I suspect the caching
isn't really doing me much good anyway due to the churn, and
probably is causing more problems than it is helping...

-erich


ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Day NYC 2024 Slides

2024-04-27 Thread Matt Vandermeulen

Hi folks!

Thanks for a great Ceph Day event in NYC! I wanted to make sure I posted 
my slides before I forget (and encourage others to do the same). Feel 
free to reach out in the Ceph Slack 
https://ceph.io/en/community/connect/


How we Operate Ceph at Scale (DigitalOcean):

- 
https://do-matt-ams3.ams3.digitaloceanspaces.com/2024%20Ceph%20Day%20NYC%20How%20we%20Operate%20Ceph%20at%20Scale.pdf
- 
https://do-matt-sfo3.sfo3.digitaloceanspaces.com/2024%20Ceph%20Day%20NYC%20How%20we%20Operate%20Ceph%20at%20Scale.pdf


Discards in Ceph (DigitalOcean):

- 
https://do-matt-ams3.ams3.digitaloceanspaces.com/2024%20Ceph%20Day%20NYC%20Discards%20Lightning%20Talk.pdf
- 
https://do-matt-sfo3.sfo3.digitaloceanspaces.com/2024%20Ceph%20Day%20NYC%20Discards%20Lightning%20Talk.pdf


Matt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crash

2024-04-27 Thread Alexey GERASIMOV
I don't know why, but my replies keep missing the original topic. Moderators, 
please delete the unnecessary topics and move my answer to the correct one.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crash

2024-04-27 Thread Alexey GERASIMOV
Colleagues, thank you for the advice to check the operability of the MGRs. In 
fact, it is also strange: we checked our nodes for network issues (ip 
connectivity, sockets, ACL, DNS) and found nothing wrong - but suddenly just 
restarting all MGRs solved the problem with the stale PGs and the hanging ceph 
commands!

So, we are at the starting point again - ceph is working except that the MDS 
daemons crash. But now we see some additional errors in the MDS logs when 
trying to start the daemon:

dir 0x1000dd10fa0 object missing on disk; some files may be lost 
(/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/gallery/pc-12083932925583528732)

dir 0x1000dd10f9d object missing on disk; some files may be lost 
(/volumes/csi/csi-vol-2eb40f89-f2e1-11ee-b657-3aa98da4c4a6/1080803d-1277-4ad8-ae80-a004bd3a5699/cadserver-filevault/project-files/661fb14d341d3746ea5c2a8f

I promised to create the bug report, so I will do that a bit later. But should 
I try to do something more from my side as well? What I did exactly last time:
cephfs-journal-tool journal reset
cephfs-table-tool all reset session
cephfs-data-scan scan_extents
cephfs-data-scan scan_inodes
cephfs-data-scan scan_links
cephfs-data-scan cleanup

And one more question: is it possible to access the cephfs content directly, 
without the MDS?
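For completeness, I also plan to dump what the MDS itself has recorded as
damaged, roughly like this (taken from the disaster-recovery docs as far as I
remember them; the filesystem name and rank below are placeholders):

  # list damage table entries recorded by rank 0 of the filesystem
  ceph tell mds.<fs_name>:0 damage ls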
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-27 Thread Eugen Block
If the rest of the cluster is healthy and your resiliency is  
configured properly, for example to sustain the loss of one or more  
hosts at a time, you don’t need to worry about a single disk. Just  
take it out and remove it (forcefully) so it doesn’t have any clients  
anymore. Ceph will immediately assign different primary OSDs and your  
clients will be happy again. ;-)
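For a hypothetical osd.12 on a cephadm-managed cluster the sequence I mean
would look roughly like this (adjust the id and names to your deployment):

  ceph orch daemon stop osd.12    # stop the daemon so it no longer serves client I/O
  ceph osd out 12                 # mark it out so its PGs get remapped to other OSDs
  ceph osd purge 12 --force --yes-i-really-mean-it   # remove it from the CRUSH map, auth and OSD map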


Quoting Mary Zhang :


Thank you Wesley for the clear explanation of the difference between the 2 methods!
The tracker issue you mentioned https://tracker.ceph.com/issues/44400 talks
about primary-affinity. Could primary-affinity help remove an OSD with
hardware issue from the cluster gracefully?

Thanks,
Mary


On Fri, Apr 26, 2024 at 8:43 AM Wesley Dillingham 
wrote:


What you want to do is to stop the OSD (and all its copies of data it
contains) by stopping the OSD service immediately. The downside of this
approach is it causes the PGs on that OSD to be degraded. But the upside is
the OSD which has bad hardware is immediately no longer participating in
any client IO (the source of your RGW 503s). In this situation the PGs go
into degraded+backfilling

The alternative method is to keep the failing OSD up and in the cluster
but slowly migrate the data off of it, this would be a long drawn out
period of time in which the failing disk would continue to serve client
reads and also facilitate backfill but you wouldn't take a copy of the data
out of the cluster and cause degraded PGs. In this scenario the PGs would
be remapped+backfilling
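Either way, you can watch which of the two states your PGs are in while this
runs, for example with:

  ceph pg stat                                        # summary of PG states
  ceph health detail | grep -Ei 'degraded|remapped'   # which PGs are degraded vs remapped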

I tried to find a way to have your cake and eat it too in relation to this
"predicament" in this tracker issue: https://tracker.ceph.com/issues/44400
but it was deemed "wont fix".

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang 
wrote:


Thank you Eugen for your warm help!

I'm trying to understand the difference between 2 methods.
For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
Documentation

says
it involves 2 steps:

   1. evacuating all placement groups (PGs) from the OSD
   2. removing the PG-free OSD from the cluster

For method 2, or the procedure you recommended, Adding/Removing OSDs —
Ceph
Documentation
<
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
>
says
"After the OSD has been taken out of the cluster, Ceph begins rebalancing
the cluster by migrating placement groups out of the OSD that was removed.
"

What's the difference between "evacuating PGs" in method 1 and "migrating
PGs" in method 2? I think method 1 must read the OSD to be removed.
Otherwise, we would not see slow ops warning. Does method 2 not involve
reading this OSD?

Thanks,
Mary

On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:

> Hi,
>
> if you remove the OSD this way, it will be drained. Which means that
> it will try to recover PGs from this OSD, and in case of hardware
> failure it might lead to slow requests. It might make sense to
> forcefully remove the OSD without draining:
>
> - stop the osd daemon
> - mark it as out
> - osd purge  [--force] [--yes-i-really-mean-it]
>
> Regards,
> Eugen
>
> Quoting Mary Zhang :
>
> > Hi,
> >
> > We recently removed an osd from our Ceph cluster. Its underlying disk
> > has a hardware issue.
> >
> > We use command: ceph orch osd rm osd_id --zap
> >
> > During the process, the ceph cluster sometimes enters a warning state with
> > slow ops on this osd. Our rgw also failed to respond to requests and
> > returned 503.
> >
> > We restarted the rgw daemon to make it work again. But the same failure
> > occurred from time to time. Eventually we noticed that the rgw 503 error
> > is a result of osd slow ops.
> >
> > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD with
> > a hardware issue won't impact cluster performance & rgw availability. Is
> > our expectation reasonable? What's the best way to handle OSDs with
> > hardware failures?
> >
> > Thank you in advance for any comments or suggestions.
> >
> > Best Regards,
> > Mary Zhang
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror get status updates quicker

2024-04-27 Thread Eugen Block
Hi, I didn't find any config options other than the ones you already did.
Just wanted to note that I did read your message. :-)

Maybe one of the Devs can comment.

Quoting Stefan Kooman :


Hi,

We're testing with rbd-mirror (mode snapshot) and are trying to get status
updates about snapshots as fast as possible. We want to use rbd-mirror as a
migration tool between two clusters and keep downtime during migration as
short as possible. Therefore we have tuned the following parameters and set
them to 1 second (default 30 seconds):


rbd_mirror_pool_replayers_refresh_interval
rbd_mirror_image_state_check_interval
rbd_mirror_sync_point_update_age
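(For reference, we set them roughly like this; we used the global section
here, though scoping them to the rbd-mirror client section may be cleaner
depending on your deployment:)

  ceph config set global rbd_mirror_pool_replayers_refresh_interval 1
  ceph config set global rbd_mirror_image_state_check_interval 1
  ceph config set global rbd_mirror_sync_point_update_age 1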

However, on the destination cluster, the "last_update:" field is  
only updated every 30 seconds. Is this tunable?


The goal is to determine when the last snapshot made on the source has made
it to the target, so that a demote (source) and promote (target) can be
initiated.
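(The commands we intend to drive this with are along these lines, with
pool/image names as placeholders:)

  rbd mirror image status mypool/myimage    # on the target, watch for the last snapshot to arrive
  rbd mirror image demote mypool/myimage    # on the source cluster
  rbd mirror image promote mypool/myimage   # on the destination cluster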


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERN] cache pressure?

2024-04-27 Thread Dietmar Rieder
Hi Erich,

hope it helps. Let us know.

Dietmar 

On 26 April 2024 15:52:06 MESZ, Erich Weiler wrote:
>Hi Dietmar,
>
>We do in fact have a bunch of users running vscode on our HPC head node as 
>well (in addition to a few of our general purpose interactive compute 
>servers).  I'll suggest they make the mods you referenced!  Thanks for the tip.
>
>cheers,
>erich
>
>On 4/24/24 12:58 PM, Dietmar Rieder wrote:
>> Hi Erich,
>> 
>> in our case the "client failing to respond to cache pressure" situation 
>> is/was often caused by users who have vscode connecting via ssh to our HPC 
>> head node. vscode makes heavy use of file watchers and we have seen users 
>> with > 400k watchers. All these watched files must be held in the MDS cache 
>> and if you have multiple users at the same time running vscode it gets 
>> problematic.
>> 
>> Unfortunately there is no global setting - at least none that we are aware 
>> of - for vscode to exclude certain files or directories from being watched. 
>> We asked the users to configure their vscode (Remote Settings -> Watcher 
>> Exclude) as follows:
>> 
>> {
>>    "files.watcherExclude": {
>>   "**/.git/objects/**": true,
>>   "**/.git/subtree-cache/**": true,
>>   "**/node_modules/*/**": true,
>>      "**/.cache/**": true,
>>      "**/.conda/**": true,
>>      "**/.local/**": true,
>>      "**/.nextflow/**": true,
>>      "**/work/**": true
>>    }
>> }
>> 
>> ~/.vscode-server/data/Machine/settings.json
>> 
>> To monitor and find processes with watchers you may use inotify-info
>> 
>> 
>> HTH
>>    Dietmar
>> 
>> On 4/23/24 15:47, Erich Weiler wrote:
>>> So I'm trying to figure out ways to reduce the number of warnings I'm 
>>> getting and I'm thinking about the one "client failing to respond to cache 
>>> pressure".
>>> 
>>> Is there maybe a way to tell a client (or all clients) to reduce the amount 
>>> of cache it uses or to release caches quickly?  Like, all the time?
>>> 
>>> I know the linux kernel (and maybe ceph) likes to cache everything for a 
>>> while, and rightfully so, but I suspect in my use case it may be more 
>>> efficient to more quickly purge the cache or to in general just cache way 
>>> less overall...?
>>> 
>>> We have many thousands of threads all doing different things that are 
>>> hitting our filesystem, so I suspect the caching isn't really doing me much 
>>> good anyway due to the churn, and probably is causing more problems than it 
>>> is helping...
>>> 
>>> -erich
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph reef and (slow) backfilling - how to speed it up

2024-04-27 Thread Götz Reinicke
Dear ceph community,

I have a ceph cluster which got upgraded from nautilus/pacific/… to reef over 
time. Now I added two new nodes to an existing EC pool, as I did with the 
previous versions of ceph.

Now I face the fact that the previous „backfilling tuning“ I used - increasing 
injectargs --osd-max-backfills=XX --osd-recovery-max-active=YY - does not work 
anymore.

By adjusting those parameters the backfill was running with up to 2k +- 
objects/s.

As I'm not (yet) familiar with the reef options, the only speed-up I have found 
so far is „ceph config set osd osd_mclock_profile high_recovery_ops“, which 
currently runs the backfill with up to 600 objects/s.

My question: what is the best (simple) way to speed that backfill up?

I've tried to understand the custom profiles (?) but without success - and 
have not applied anything else yet.
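For the record, what I used on the previous releases was along the lines of

  ceph tell osd.* injectargs '--osd-max-backfills=XX --osd-recovery-max-active=YY'

and the knobs I am currently looking at for reef (taken from the mclock
documentation as I read it, not yet tested here) would be something like

  ceph config set osd osd_mclock_profile high_recovery_ops
  ceph config set osd osd_mclock_override_recovery_settings true
  ceph config set osd osd_max_backfills 8           # only honoured with the override flag set
  ceph config set osd osd_recovery_max_active 8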

Thanks for feedback and suggestions! Best regards, Götz





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io