[ceph-users] Re: osd removal leaves 'stray daemon'

2022-12-05 Thread Holger Naundorf

Hello,
a mgr failover did not change the situation - the osd still shows up in 
'ceph node ls'. I assume that this is more or less 'working as intended', 
as I did ask for the OSD to be kept in the CRUSH map to be replaced 
later - but as we are still not so experienced with Ceph here, I wanted 
to get some input from other sites.
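
For completeness, the state of the pending removal/replacement can be checked 
with something like this (a sketch, assuming a cephadm-managed cluster):

# ceph orch osd rm status
# ceph osd tree | grep destroyed

The first command shows OSDs still queued for draining/removal; the second 
confirms the OSD stays in the CRUSH map as 'destroyed' until it is recreated.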


Regards,
Holger

On 30.11.22 16:28, Adam King wrote:
I typically don't see this when I do OSD replacement. If you do a mgr 
failover ("ceph mgr fail") and wait a few minutes, does this still show 
up? The stray daemon/host warning is roughly equivalent to comparing the 
daemons in `ceph node ls` and `ceph orch ps` and seeing if there's 
anything in the former but not the latter. Sometimes I have seen the mgr 
hold some out of date info in the node ls, and a failover will 
refresh it.
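
For reference, that comparison can be reproduced by hand, roughly like this 
(a sketch):

# ceph node ls
# ceph orch ps --daemon-type osd
# ceph mgr fail

Anything listed by the first command but missing from the second is reported 
as a stray daemon; the failover forces the standby mgr to rebuild its cached 
inventory.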


On Fri, Nov 25, 2022 at 6:07 AM Holger Naundorf wrote:


Hello,
I have a question about osd removal/replacement:

I just removed an OSD where the disk was still running but had read
errors, leading to failed deep scrubs. As the intent is to replace this
as soon as we manage to get a spare, I removed it with the
'--replace' flag:

# ceph orch osd rm 224 --replace

After all placement groups are evacuated I now have 1 osd down/out
and showing as 'destroyed':

# ceph osd tree
ID   CLASS  WEIGHT      TYPE NAME        STATUS     REWEIGHT  PRI-AFF
(...)
214    hdd    14.55269          osd.214         up   1.0  1.0
224    hdd    14.55269          osd.224  destroyed         0  1.0
234    hdd    14.55269          osd.234         up   1.0  1.0
(...)

All as expected - but now the health check complains that the
(destroyed) osd is not managed:

# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
      stray daemon osd.224 on host ceph19 not managed by cephadm

Is this expected behaviour - do I have to live with the yellow health check
until we get a replacement disk and recreate the OSD - or did something
not finish correctly?

Regards,
Holger

-- 
Dr. Holger Naundorf

Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de 
___
ceph-users mailing list -- ceph-users@ceph.io

To unsubscribe send an email to ceph-users-le...@ceph.io




--
Dr. Holger Naundorf
Christian-Albrechts-Universität zu Kiel
Rechenzentrum / HPC / Server und Storage
Tel: +49 431 880-1990
Fax:  +49 431 880-1523
naund...@rz.uni-kiel.de
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Sean Matheny
Hi all, 

Thanks for the great responses. Confirming that this was the issue (feature). 
No idea why this was set differently for us in Nautilus. 

This should make the recovery benchmarking a bit faster now. :) 

Cheers,
Sean

> On 6/12/2022, at 3:09 PM, Wesley Dillingham  wrote:
> 
> I think you are experiencing the mon_osd_down_out_interval 
> 
> https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval
> 
> Ceph waits 10 minutes before marking a down osd as out for the reasons you 
> mention, but this would have been the case in nautilus as well. 
> 
> Respectfully,
> 
> Wes Dillingham
> w...@wesdillingham.com 
> LinkedIn 
> 
> 
> On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny wrote:
>> Hi all,
>> 
>> New Quincy cluster here that I'm just running through some benchmarks 
>> against:
>> 
>> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
>> (stable)
>> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>> 
>> I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node 
>> from the cluster until actual recovery IO begins. This is much different 
>> behaviour than what I'm used to in Nautilus previously, where recovery IO 
>> would commence within seconds. Downed OSDs are reflected in ceph health 
>> within a few seconds (as expected), and affected PGs show as undersized a 
>> few seconds later (as expected). I guess this 10-minute delay may even be a 
>> feature-- accidentally rebooting a node before setting recovery flags would 
>> prevent rebalancing, for example. Just thought it was worth asking in case 
>> it's a bug or something to look deeper into.  
>> 
>> I've read through the OSD config and all of my recovery tuneables look ok, 
>> for example: 
>> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ 
>> 
>> 
>> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
>> 0.10
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
>> 0.00
>> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
>> 0.025000
>> 
>> Thanks in advance.
>> 
>> Ngā mihi,
>> 
>> Sean Matheny
>> HPC Cloud Platform DevOps Lead
>> New Zealand eScience Infrastructure (NeSI)
>> 
>> e: sean.math...@nesi.org.nz 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io 
>> To unsubscribe send an email to ceph-users-le...@ceph.io 
>> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Wesley Dillingham
I think you are experiencing the mon_osd_down_out_interval

https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval

Ceph waits 10 minutes before marking a down osd as out for the reasons you
mention, but this would have been the case in nautilus as well.
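
If needed for benchmarking, the interval can be checked and temporarily lowered
(a sketch; the value is in seconds, default 600):

# ceph config get mon mon_osd_down_out_interval
# ceph config set mon mon_osd_down_out_interval 60

Just remember to set it back afterwards - a short interval makes the cluster
start rebalancing on every brief node reboot.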

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny 
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is much different
> behaviour than what I'm used to in Nautilus previously, where recovery IO
> would commence within seconds. Downed OSDs are reflected in ceph health
> within a few seconds (as expected), and affected PGs show as undersized a
> few seconds later (as expected). I guess this 10-minute delay may even be a
> feature-- accidentally rebooting a node before setting recovery flags would
> prevent rebalancing, for example. Just thought it was worth asking in case
> it's a bug or something to look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.10
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.math...@nesi.org.nz
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-12-05 Thread Matt Larson
Frank,

 Then if you have only a few OSDs with excessive PG counts / usage, do you
reweight them down by something like 10-20% to achieve a better distribution
and improve capacity?  Do you weight them back to normal after the PGs have moved?

 I wondered if manually picking on some of the higher data usage OSDs could
get to a good outcome and avoid continuous rebalancing or other issues.
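
(For context, the kind of manual adjustment I have in mind would be something
like the following - a rough sketch, where 214 stands in for one of the
fullest OSDs:

# ceph osd df tree
# ceph osd reweight 214 0.9

i.e. spot the fullest OSDs in the first output, lower their override reweight
a little, and raise it again once usage evens out.)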

 Thanks,
   Matt

On Mon, Dec 5, 2022 at 4:32 AM Frank Schilder  wrote:

> Hi Matt,
>
> I can't comment on balancers, I don't use them. I manually re-weight OSDs,
> which fits well with our pools' OSD allocation. Also, we don't aim for
> perfect balance, we just remove the peak of allocation on the fullest few
> OSDs to avoid excessive capacity loss. Not balancing too much has the pro
> of being fairly stable under OSD failures/additions at the expense of a few
> % less capacity.
>
> Maybe someone else can help here?
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Matt Larson 
> Sent: 04 December 2022 02:00:11
> To: Eneko Lacunza
> Cc: Frank Schilder; ceph-users
> Subject: Re: [ceph-users] Re: What to expect on rejoining a host to
> cluster?
>
> Thank you Frank and Eneko,
>
>  Without help and support from ceph admins like you, I would be adrift.  I
> really appreciate this.
>
>  I rejoined the host now one week ago, and the cluster has been dealing
> with the misplaced objects and recovering well.
>
> I will use this strategy in the future:
>
> "If you consider replacing the host and all disks, get a new host first
> and give it the host name in the crush map. Just before you deploy the new
> host, simply purge all down OSDs in its bucket (set norebalance) and
> deploy. Then, the data movement is restricted to re-balancing to the new
> host.
>
> If you just want to throw out the old host, destroy the OSDs but keep the
> IDs intact (ceph osd destroy). Then, no further re-balancing will happen
> and you can re-use the OSD ids later when adding a new host. That's a
> stable situation from an operations point of view."
>
> Last question: I am now seeing that some OSDs have an uneven load of PGs.
> Which balancer do you recommend, and are there any caveats for how the
> balancer operations can affect/slow the cluster?
>
> Thanks,
>   Matt
>
> On Mon, Nov 28, 2022 at 2:23 AM Eneko Lacunza <elacu...@binovo.es> wrote:
> Hi Matt,
>
> Also, make sure that when rejoining host has correct time. I have seen
> clusters going down when rejoining hosts that were down for maintenance for
> various weeks and came in with datetime deltas of some months (no idea why
> that happened, I arrived with the firefighter team ;-) )
>
> Cheers
>
> El 27/11/22 a las 13:27, Frank Schilder escribió:
>
> Hi Matt,
>
> if you didn't touch the OSDs on that host, they will join and only objects
> that have been modified will actually be updated. Ceph keeps some basic
> history information and can detect changes. 2 weeks is not a very long
> time. If you have a lot of cold data, re-integration will go fast.
>
> Initially, you will see a huge amount of misplaced objects. However, this
> count will go down much faster than objects/s recovery.
>
> Before you rejoin the host, I would fix its issues though. Now that you
> have it out of the cluster, do the maintenance first. There is no rush. In
> fact, you can buy a new host, install the OSDs in the new one and join that
> to the cluster with the host-name of the old host.
>
> If you consider replacing the host and all disks, then get a new host first
> and give it the host name in the crush map. Just before you deploy the new
> host, simply purge all down OSDs in its bucket (set norebalance) and
> deploy. Then, the data movement is restricted to re-balancing to the new
> host.
>
> If you just want to throw out the old host, destroy the OSDs but keep the
> IDs intact (ceph osd destroy). Then, no further re-balancing will happen
> and you can re-use the OSD ids later when adding a new host. That's a
> stable situation from an operations point of view.
>
> Hope that helps.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Matt Larson 
> Sent: 26 November 2022 21:07:41
> To: ceph-users
> Subject: [ceph-users] What to expect on rejoining a host to cluster?
>
> Hi all,
>
>  I have had a host with 16 OSDs, each 14TB in capacity that started having
> hardware issues causing it to crash.  I took this host down 2 weeks ago,
> and the data rebalanced to the remaining 11 server hosts in the Ceph
> cluster over this time period.
>
>  My initial goal was to then remove the host completely from the cluster
> with `ceph osd rm XX` and `ceph osd purge XX` (Adding/Removing OSDs — Ceph
> Documentation

[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Tyler Brekke
Sounds like your OSDs were down, but not marked out. Recovery will only
occur once they are actually marked out. The default
mon_osd_down_out_interval is 10 minutes.

You can mark them out explicitly with 'ceph osd out <osd-id>'.
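
For example (a sketch; 12 stands in for the affected OSD id):

# ceph osd out 12
# ceph config get mon mon_osd_down_out_interval

The first command lets backfill start immediately instead of waiting; the
second shows the interval currently in effect (600 seconds by default).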

On Mon, Dec 5, 2022 at 2:20 PM Sean Matheny 
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is much different
> behaviour than what I'm used to in Nautilus previously, where recovery IO
> would commence within seconds. Downed OSDs are reflected in ceph health
> within a few seconds (as expected), and affected PGs show as undersized a
> few seconds later (as expected). I guess this 10-minute delay may even be a
> feature-- accidentally rebooting a node before setting recovery flags would
> prevent rebalancing, for example. Just thought it was worth asking in case
> it's a bug or something to look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.10
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.math...@nesi.org.nz
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Tyler Brekke
Senior Engineer I
tbre...@digitalocean.com
--
We're Hiring!  | @digitalocean
 | YouTube

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Stephen Smith6
The 10 minute delay is the default wait period Ceph allows before it attempts 
to heal the data. See "mon_osd_report_timeout" – I believe the default is 900 
seconds.

From: Sean Matheny 
Date: Monday, December 5, 2022 at 5:20 PM
To: ceph-users@ceph.io 
Cc: Blair Bethwaite , pi...@stackhpc.com 
, Michal Nasiadka 
Subject: [EXTERNAL] [ceph-users] Odd 10-minute delay before recovery IO begins
Hi all,

New Quincy cluster here that I'm just running through some benchmarks against:

ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs

I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node from 
the cluster until actual recovery IO begins. This is much different behaviour 
than what I'm used to in Nautilus previously, where recovery IO would commence 
within seconds. Downed OSDs are reflected in ceph health within a few seconds 
(as expected), and affected PGs show as undersized a few seconds later (as 
expected). I guess this 10-minute delay may even be a feature-- accidentally 
rebooting a node before setting recovery flags would prevent rebalancing, for 
example. Just thought it was worth asking in case it's a bug or something to 
look deeper into.

I've read through the OSD config and all of my recovery tuneables look ok, for 
example:
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ 

[ceph: root@ /]# ceph config get osd osd_recovery_delay_start
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
0.10
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
0.025000

Thanks in advance.

Ngā mihi,

Sean Matheny
HPC Cloud Platform DevOps Lead
New Zealand eScience Infrastructure (NeSI)

e: sean.math...@nesi.org.nz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Sean Matheny
Hi all,

New Quincy cluster here that I'm just running through some benchmarks against:

ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs

I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node from 
the cluster until actual recovery IO begins. This is much different behaviour 
than what I'm used to in Nautilus previously, where recovery IO would commence 
within seconds. Downed OSDs are reflected in ceph health within a few seconds 
(as expected), and affected PGs show as undersized a few seconds later (as 
expected). I guess this 10-minute delay may even be a feature-- accidentally 
rebooting a node before setting recovery flags would prevent rebalancing, for 
example. Just thought it was worth asking in case it's a bug or something to 
look deeper into.  

I've read through the OSD config and all of my recovery tuneables look ok, for 
example: 
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/

[ceph: root@ /]# ceph config get osd osd_recovery_delay_start
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
0.10
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
0.00
[ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
0.025000

Thanks in advance.

Ngā mihi,

Sean Matheny
HPC Cloud Platform DevOps Lead
New Zealand eScience Infrastructure (NeSI)

e: sean.math...@nesi.org.nz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quincy - Node does not detect ssd disks...?

2022-12-05 Thread Ralph Soika

Hello,

I have installed a Ceph cluster (Quincy) with 3 nodes. The problem I have 
been facing for days is that new hosts added to my cluster do not 
show the disks I want to use for OSDs.


For example, one of my nodes has 2 disks (500G SSDs).

/dev/sda is used for the OS (Debian 11).

The second disk /dev/sdb should be used by Ceph for an OSD. But Ceph 
does not recognize this 2nd disk on the node:


Calling ceph-volume directly on the node:

$ sudo cephadm ceph-volume inventory

results in an empty list - no disks recognized. But on the other hand, 
the command 'lsblk' shows that the disk exists:


$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:16   1 476.9G  0 disk
├─sda1   8:17   1    32G  0 part [SWAP]
├─sda2   8:18   1     1G  0 part /boot
└─sda3   8:19   1 443.9G  0 part /
sdb      8:0    1 476.9G  0 disk


So it looks to me like something with my partitions or logical volumes 
is totally wrong. Or is it something else?


I tried this with several different hosts. It seems that nodes with 
NVMe/SSD have no problem, but that may be a wrong observation on my part. 
I have already added OSDs successfully from my initial node.


My question is: how can I figure out why the server does not show its 
disk to Ceph? The hardware is not completely identical, so maybe it is 
something with the controller? But how can we analyze this?
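
(For completeness, the cephadm-side checks I am aware of so far - a sketch, 
where node2 stands in for the actual host name:

# ceph orch device ls --wide

The --wide output is supposed to include the reject reasons (existing 
partitions, LVM signatures, device too small, ...). And if the disk only 
carries leftover signatures, zapping it should make it available again:

# ceph orch device zap node2 /dev/sdb --force )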


Thanks for any hints

===
Ralph

--

*Imixs Software Solutions GmbH*
*Web:* www.imixs.com  *Phone:* +49 (0)89-452136 16
*Timezone:* Europe/Berlin - CET/CEST
*Office:* Agnes-Pockels-Bogen 1, 80992 München
Registergericht: Amtsgericht Muenchen, HRB 136045
Geschaeftsführer: Gaby Heinle u. Ralph Soika

*Imixs* is an open source company, read more: www.imixs.org 


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No Authentication/Authorization for creating topics on RGW?

2022-12-05 Thread Yuval Lifshitz
Hi Ulrich,
You are correct, there is no specific authorization needed for creating
topics. User authentication is done as with any other REST call, but there
are no restrictions and any user can create a topic.
Would probably make sense to limit that ability. Would appreciate if you
could open a tracker for that.
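
In the meantime, existing topics can at least be audited and cleaned up from 
the admin side, along these lines (a sketch, reusing the topic name from your 
example):

# radosgw-admin topic list
# radosgw-admin topic rm --topic topictest2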

Thanks,

Yuval


On Mon, Dec 5, 2022 at 2:26 PM Ulrich Klein 
wrote:

> Hi,
>
> I'm experimenting with notifications for S3 buckets.
> I got it working with notifications to HTTP(S) endpoints.
>
> What I did:
>
> Create a topic:
> # cat create_topic.data
> Action=CreateTopic
> =topictest2
> =verify-ssl=false
> =use-ssl=false
> =OpaqueData=Hallodrio
> =push-endpoint=
> http://helper.example.com/cgi-bin/topictest
> =persistent=false
> =cloudevents=false
> 
>
> # curl --request POST 'https://rgw.example.com' --data @create_topic.data
> https://sns.amazonaws.com/doc/2010-03-31/
> ">arn:aws:sns:example::topictest2f0904533-f4ed-4d60-886c-4125fcbed97b.4944109.3169009808426767767
>
>
> And then created a notification for some user, which I received ok via
> http.
>
>
> What I'm wondering:
> There was no authentication/authorization necessary at all to create the
> topic??
> Is that normal? Any <...> could create a million topics that way.
>
> Is there a way to prevent that from happening? I haven't found one in the
> docs.
>
> I guess - being new to the topic of notifications - that I'm missing
> something obvious?
>
> Ciao, Uli
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OMAP data growth

2022-12-05 Thread Wyll Ingersoll


But why is OMAP data usage growing at a rate 10x the amount of the actual data 
being written to RGW?

From: Robert Sander 
Sent: Monday, December 5, 2022 3:06 AM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: OMAP data growth

Am 02.12.22 um 21:09 schrieb Wyll Ingersoll:

>*   What is causing the OMAP data consumption to grow so fast and can it 
> be trimmed/throttled?

S3 is a heavy user of OMAP data. RBD and CephFS not so much.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-12-05 Thread Stefan Kooman

On 12/5/22 10:32, Frank Schilder wrote:

Hi Matt,

I can't comment on balancers, I don't use them. I manually re-weight OSDs, 
which fits well with our pools' OSD allocation. Also, we don't aim for perfect 
balance, we just remove the peak of allocation on the fullest few OSDs to avoid 
excessive capacity loss. Not balancing too much has the pro of being fairly 
stable under OSD failures/additions at the expanse of a few % less capacity.

Maybe someone else an help here?



Try JJ's Ceph balancer [1]. In our case it turned out to be *way* more 
efficient than the built-in balancer (faster convergence, fewer movements 
involved), and it was able to achieve a very good PG distribution and "reclaim" 
lots of space. I highly recommend it.
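
Whichever balancer you end up using, the effect on the PG spread is easy to 
watch with the standard tools (a sketch):

# ceph osd df tree
# ceph balancer status

The first shows per-OSD PG counts and utilisation; the second shows whether 
the built-in balancer is active, which you probably want to switch off while 
an external balancer is applying its changes.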


Gr. Stefan

[1]: https://github.com/TheJJ/ceph-balancer
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

2022-12-05 Thread Kilian Ries
Looks like we are still waiting for a merge here ... can anybody help out? 
Really looking forward to the fix getting merged ...


https://github.com/ceph/ceph/pull/47189

https://tracker.ceph.com/issues/56650


Thanks


Von: Gregory Farnum 
Gesendet: Donnerstag, 28. Juli 2022 17:01:34
An: Nicolas FONTAINE
Cc: Kilian Ries; ceph-users
Betreff: Re: [ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

https://tracker.ceph.com/issues/56650

There's a PR in progress to resolve this issue now. (Thanks, Prashant!)
-Greg

On Thu, Jul 28, 2022 at 7:52 AM Nicolas FONTAINE  wrote:
>
> Hello,
>
> We have exactly the same problem. Did you find an answer or should we
> open a bug report?
>
> Sincerely,
>
> Nicolas.
>
> Le 23/06/2022 à 11:42, Kilian Ries a écrit :
> > Hi Joachim,
> >
> >
> > yes i assigned the stretch rule to the pool (4x replica / 2x min). The rule 
> > says that two replicas should be in every datacenter.
> >
> >
> > $ ceph osd tree
> > ID   CLASS  WEIGHTTYPE NAME   STATUS  REWEIGHT  PRI-AFF
> >   -1 62.87799  root default
> > -17 31.43900  datacenter site1
> > -15 31.43900  rack b7
> >   -3 10.48000  host host01
> >0ssd   1.74699  osd.0   up   1.0  1.0
> >1ssd   1.74699  osd.1   up   1.0  1.0
> >2ssd   1.74699  osd.2   up   1.0  1.0
> >3ssd   1.74699  osd.3   up   1.0  1.0
> >4ssd   1.74699  osd.4   up   1.0  1.0
> >5ssd   1.74699  osd.5   up   1.0  1.0
> >   -5 10.48000  host host02
> >6ssd   1.74699  osd.6   up   1.0  1.0
> >7ssd   1.74699  osd.7   up   1.0  1.0
> >8ssd   1.74699  osd.8   up   1.0  1.0
> >9ssd   1.74699  osd.9   up   1.0  1.0
> >   10ssd   1.74699  osd.10  up   1.0  1.0
> >   11ssd   1.74699  osd.11  up   1.0  1.0
> >   -7 10.48000  host host03
> >   12ssd   1.74699  osd.12  up   1.0  1.0
> >   13ssd   1.74699  osd.13  up   1.0  1.0
> >   14ssd   1.74699  osd.14  up   1.0  1.0
> >   15ssd   1.74699  osd.15  up   1.0  1.0
> >   16ssd   1.74699  osd.16  up   1.0  1.0
> >   17ssd   1.74699  osd.17  up   1.0  1.0
> > -18 31.43900  datacenter site2
> > -16 31.43900  rack h2
> >   -9 10.48000  host host04
> >   18ssd   1.74699  osd.18  up   1.0  1.0
> >   19ssd   1.74699  osd.19  up   1.0  1.0
> >   20ssd   1.74699  osd.20  up   1.0  1.0
> >   21ssd   1.74699  osd.21  up   1.0  1.0
> >   22ssd   1.74699  osd.22  up   1.0  1.0
> >   23ssd   1.74699  osd.23  up   1.0  1.0
> > -11 10.48000  host host05
> >   24ssd   1.74699  osd.24  up   1.0  1.0
> >   25ssd   1.74699  osd.25  up   1.0  1.0
> >   26ssd   1.74699  osd.26  up   1.0  1.0
> >   27ssd   1.74699  osd.27  up   1.0  1.0
> >   28ssd   1.74699  osd.28  up   1.0  1.0
> >   29ssd   1.74699  osd.29  up   1.0  1.0
> > -13 10.48000  host host06
> >   30ssd   1.74699  osd.30  up   1.0  1.0
> >   31ssd   1.74699  osd.31  up   1.0  1.0
> >   32ssd   1.74699  osd.32  up   1.0  1.0
> >   33ssd   1.74699  osd.33  up   1.0  1.0
> >   34ssd   1.74699  osd.34  up   1.0  1.0
> >   35ssd   1.74699  osd.35  up   1.0  1.0
> >
> >
> > So regarding my calculation it should be
> >
> >
> > (6x Nodes * 6x SSD * 1,8TB) / 4 = 16 TB
> >
> >
> > Is this maybe a bug in the stretch mode, where I only get shown half of 
> > the available size?
> >
> >
> > Regards,
> >
> > Kilian
> >
> >
> > 
> > Von: Clyso GmbH - Ceph Foundation Member 
> > Gesendet: Mittwoch, 22. Juni 2022 18:20:59
> > An: Kilian Ries; ceph-users(a)ceph.io
> > Betreff: Re: [ceph-users] Ceph Stretch Cluster - df 

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-05 Thread Sven Kieske
On Sa, 2022-12-03 at 01:54 +0100, Boris Behrens wrote:
> hi,
> maybe someone here can help me to debug an issue we faced today.
> 
> Today one of our clusters came to a grinding halt with 2/3 of our OSDs
> reporting slow ops.
> Only option to get it back to work fast, was to restart all OSDs daemons.
> 
> The cluster is an octopus cluster with 150 enterprise SSD OSDs. Last work
> on the cluster: synced in a node 4 days ago.
> 
> The only health issue, that was reported, was the SLOW_OPS. No slow pings
> on the networks. No restarting OSDs. Nothing.
> 
> I was able to pin it to a 20s timeframe and I read ALL the logs in a 20
> minute timeframe around this issue.
> 
> I haven't found any clues.
> 
> Maybe someone encountered this in the past?

do you happen to run your rocksdb on a dedicated caching device (nvme ssd)?

I observed slow ops in octopus after a faulty nvme ssd was inserted in one ceph 
server.
as was said in other mails, try to isolate your root cause.

maybe the node added 4 days ago was the culprit here?

we were able to pinpoint the nvme by monitoring the slow osds
and the commonality in this case was the same nvme cache device.

you should always benchmark new hardware/perform burn-in tests imho, which
is not always possible due to environment constraints.
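
as a starting point for that isolation, something like this narrows down which
osds (and thus which backing devices) are involved (a sketch; osd.42 is just a
placeholder):

# ceph health detail | grep -i slow
# ceph daemon osd.42 dump_ops_in_flight
# ceph daemon osd.42 dump_historic_ops

the ceph daemon commands have to be run on the host where that osd lives.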

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske
Systementwickler / systems engineer
 
 
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 4-6
32339 Espelkamp
 
Tel.: 05772 / 293-900
Fax: 05772 / 293-333
 
https://www.mittwald.de
 
Geschäftsführer: Robert Meyer, Florian Jürgens
 
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Informationen zur Datenverarbeitung im Rahmen unserer Geschäftstätigkeit 
gemäß Art. 13-14 DSGVO sind unter www.mittwald.de/ds abrufbar.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] No Authentication/Authorization for creating topics on RGW?

2022-12-05 Thread Ulrich Klein
Hi,

I'm experimenting with notifications for S3 buckets.
I got it working with notifications to HTTP(S) endpoints.

What I did:

Create a topic:
# cat create_topic.data
Action=CreateTopic
=topictest2
=verify-ssl=false
=use-ssl=false
=OpaqueData=Hallodrio
=push-endpoint=http://helper.example.com/cgi-bin/topictest
=persistent=false
=cloudevents=false

# curl --request POST 'https://rgw.example.com' --data @create_topic.data
https://sns.amazonaws.com/doc/2010-03-31/;>arn:aws:sns:example::topictest2f0904533-f4ed-4d60-886c-4125fcbed97b.4944109.3169009808426767767


And then created a notification for some user, which I received ok via http.


What I'm wondering:
There was no authentication/authorization necessary at all to create the topic??
Is that normal? Any <...> could create a million topics that way.

Is there a way to prevent that from happening? I haven't found one in the docs.

I guess - being new to the topic of notifications - that I'm missing something 
obvious?

Ciao, Uli
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade Ceph 16.2.10 to 17.2.x for Openstack RBD storage

2022-12-05 Thread Zakhar Kirpichenko
Answering my own question: Wallaby's cinder doesn't support Ceph Quincy,
https://docs.openstack.org/cinder/latest/configuration/block-storage/drivers/ceph-rbd-volume-driver.html


"Supported Ceph versions

The current release cycle model for Ceph targets a new release yearly on 1
March, with there being at most two active stable releases at any time.

For a given OpenStack release, Cinder supports the current Ceph active
stable releases plus the two prior releases.

For example, at the time of the OpenStack Wallaby release in April 2021,
the Ceph active supported releases are Pacific and Octopus. The Cinder
Wallaby release therefore supports Ceph Pacific, Octopus, Nautilus, and
Mimic.

Additionally, it is expected that the version of the Ceph client available
to Cinder or any of its associated libraries (os-brick, cinderlib) is
aligned with the Ceph server version. Mixing server and client versions is
unsupported and may lead to anomalous behavior."

As pointed out by a kind soul on reddit.
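
For anyone checking their own setup, the versions in play can be listed with 
(a sketch; run the second command on the cinder/compute hosts):

# ceph versions
# rbd --version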

/Z

On Mon, 5 Dec 2022 at 06:01, Zakhar Kirpichenko  wrote:

> Hi!
>
> I'm planning to upgrade our Ceph cluster from Pacific (16.2.10) to Quincy
> (17.2.x). The cluster is used for Openstack block storage (RBD), Openstack
> version is Wallaby built on Ubuntu 20.04.
>
> Is anyone using Ceph Quincy (17.2.x) with Openstack Wallaby? If you are,
> please let me know if you've encountered any issues specific to these Ceph
> and Openstack versions.
>
> Many thanks!
>
> Best regards,
> Zakhar
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-12-05 Thread Frank Schilder
Hi Matt,

I can't comment on balancers, I don't use them. I manually re-weight OSDs, 
which fits well with our pools' OSD allocation. Also, we don't aim for perfect 
balance, we just remove the peak of allocation on the fullest few OSDs to avoid 
excessive capacity loss. Not balancing too much has the pro of being fairly 
stable under OSD failures/additions at the expense of a few % less capacity.

Maybe someone else can help here?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Matt Larson 
Sent: 04 December 2022 02:00:11
To: Eneko Lacunza
Cc: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: What to expect on rejoining a host to cluster?

Thank you Frank and Eneko,

 Without help and support from ceph admins like you, I would be adrift.  I 
really appreciate this.

 I rejoined the host now one week ago, and the cluster has been dealing with 
the misplaced objects and recovering well.

I will use this strategy in the future:

"If you consider replacing the host and all disks, get a new host first and 
give it the host name in the crush map. Just before you deploy the new host, 
simply purge all down OSDs in its bucket (set norebalance) and deploy. Then, 
the data movement is restricted to re-balancing to the new host.

If you just want to throw out the old host, destroy the OSDs but keep the IDs 
intact (ceph osd destroy). Then, no further re-balancing will happen and you 
can re-use the OSD ids later when adding a new host. That's a stable situation 
from an operations point of view."

Last question: I am now seeing that some OSDs have an uneven load of PGs. 
Which balancer do you recommend, and are there any caveats for how the balancer 
operations can affect/slow the cluster?

Thanks,
  Matt

On Mon, Nov 28, 2022 at 2:23 AM Eneko Lacunza <elacu...@binovo.es> wrote:
Hi Matt,

Also, make sure that when rejoining host has correct time. I have seen clusters 
going down when rejoining hosts that were down for maintenance for various 
weeks and came in with datetime deltas of some months (no idea why that 
happened, I arrived with the firefighter team ;-) )

Cheers

El 27/11/22 a las 13:27, Frank Schilder escribió:

Hi Matt,

if you didn't touch the OSDs on that host, they will join and only objects that 
have been modified will actually be updated. Ceph keeps some basic history 
information and can detect changes. 2 weeks is not a very long time. If you 
have a lot of cold data, re-integration will go fast.

Initially, you will see a huge amount of misplaced objects. However, this count 
will go down much faster than objects/s recovery.

Before you rejoin the host, I would fix its issues though. Now that you have it 
out of the cluster, do the maintenance first. There is no rush. In fact, you 
can buy a new host, install the OSDs in the new one and join that to the 
cluster with the host-name of the old host.

If you consider replacing the host and all disks, then get a new host first and 
give it the host name in the crush map. Just before you deploy the new host, 
simply purge all down OSDs in its bucket (set norebalance) and deploy. Then, 
the data movement is restricted to re-balancing to the new host.

If you just want to throw out the old host, destroy the OSDs but keep the IDs 
intact (ceph osd destroy). Then, no further re-balancing will happen and you 
can re-use the OSD ids later when adding a new host. That's a stable situation 
from an operations point of view.
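
(As a concrete sketch of that last step - not a verbatim procedure, the id and 
device path are placeholders:

# ceph osd destroy 224 --yes-i-really-mean-it
# ceph-volume lvm create --osd-id 224 --data /dev/sdX

The destroy call keeps the id and the CRUSH entry; the second command, run 
later on the replacement host, lets the new disk take over the freed id.)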

Hope that helps.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Matt Larson 
Sent: 26 November 2022 21:07:41
To: ceph-users
Subject: [ceph-users] What to expect on rejoining a host to cluster?

Hi all,

 I have had a host with 16 OSDs, each 14TB in capacity that started having
hardware issues causing it to crash.  I took this host down 2 weeks ago,
and the data rebalanced to the remaining 11 server hosts in the Ceph
cluster over this time period.

 My initial goal was to then remove the host completely from the cluster
with `ceph osd rm XX` and `ceph osd purge XX` (Adding/Removing OSDs — Ceph
Documentation
).
However, I found that after the large amount of data migration from the
recovery, that the purge and removal from the crush map for an OSDs still
required another large data move.  It appears that it would have been a
better strategy to assign a 0 weight to an OSD to have only a single larger
data move instead of twice.

 I'd like to join the downed server back into the Ceph cluster.  It still
has 14 OSDs that are listed as out/down that would be brought back online.
My question is what can I expect if I bring this host online?  Will the
OSDs of a host that has been offline for an extended 

[ceph-users] Re: OMAP data growth

2022-12-05 Thread Robert Sander

Am 02.12.22 um 21:09 schrieb Wyll Ingersoll:


   *   What is causing the OMAP data consumption to grow so fast and can it be 
trimmed/throttled?


S3 is a heavy user of OMAP data. RBD and CephFS not so much.
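
If you want to see where the OMAP bytes actually sit, a quick check is 
(a sketch):

# ceph osd df
# radosgw-admin bucket limit check

The first shows the OMAP column per OSD; the second lists objects per bucket 
index shard, which is usually where RGW's OMAP growth ends up.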

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io