[ceph-users] Re: slow recovery with Quincy

2023-10-11 Thread Ben
Thanks. I tried it and it improved the situation, roughly doubling the speed to
about 10MB/s. Good catch for the fix!

It would be good to get to 50MB/s of recovery, as far as the cluster
infrastructure can support it in my case. Are there other constraints on
resource utilization for recovery that I am not aware of?
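
For what it's worth, the only other knob I am aware of is the mClock profile:
on Quincy the scheduler throttles recovery by default. This is only a sketch,
I have not verified it is the right approach for 17.2.5, so please check
before applying:

# favour recovery over client I/O while backfill is running
ceph config set osd osd_mclock_profile high_recovery_ops
# remove the override afterwards to return to the version default
ceph config rm osd osd_mclock_profile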

胡 玮文 wrote on Wed, Oct 11, 2023 at 00:18:

> Hi Ben,
>
> Please see this thread
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PWHG6QJ6N2TJEYD2U4AXJAJ23CRPJG4E/#7ZMBM23GXYFIGY52ZWJDY5NUSYSDSYL6
>  for
> possible workaround.
>
> Sent from my iPad
>
> On Oct 10, 2023, at 22:26, Ben wrote:
>
> Dear cephers,
>
> with one OSD down (200GB/9.1TB data), rebalancing has taken 3 hours and is still in
> progress. Client bandwidth can go as high as 200MB/s. With little client
> request throughput, recovery goes at only a couple of MB/s. I wonder if there is
> configuration to tune for improvement. The cluster runs Quincy 17.2.5,
> deployed by cephadm. The slowness can do harm during peak hours of usage.
>
> Best wishes,
>
> Ben
> -
>volumes: 1/1 healthy
>pools:   8 pools, 209 pgs
>objects: 93.04M objects, 4.8 TiB
>usage:   15 TiB used, 467 TiB / 482 TiB avail
>pgs: 1206837/279121971 objects degraded (0.432%)
> 208 active+clean
> 1   active+undersized+degraded+remapped+backfilling
>
>  io:
>client:   80 KiB/s rd, 420 KiB/s wr, 12 op/s rd, 29 op/s wr
>recovery: 6.2 MiB/s, 113 objects/s
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-11 Thread Wesley Dillingham
Just to be clear, you should remove the osd by stopping the daemon and
marking it out before you repair the PG. The pg may not be able to be
repaired until you remove the bad disk.

1 - identify the bad disk (via scrubs or SMART/dmesg inspection)
2 - stop daemon and mark it out
3 - wait for PG to finish backfill
4 - issue the pg repair
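
In command form, the plan is roughly the following (a sketch only, assuming
osd.238 turns out to be the bad disk and a non-containerized deployment;
adjust the OSD id and unit name to your environment):

systemctl stop ceph-osd@238      # on the host carrying the OSD
ceph osd out 238
ceph -s                          # wait until backfill finishes
ceph pg repair 15.f4f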

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Oct 11, 2023 at 4:38 PM Wesley Dillingham 
wrote:

> If I recall correctly, when the acting or up set of a PG changes, the scrub
> information is lost. This was likely lost when you stopped osd.238 and
> changed the sets.
>
> I do not believe based on your initial post you need to be using the
> objectstore tool currently. Inconsistent PGs are a common occurrence and
> can be repaired.
>
> After your most recent post I would get osd.238 back in the cluster unless
> you have reason to believe it is the failing hardware. But it could be any
> of the osds in the following set (from your initial post)
> [238,106,402,266,374,498,590,627,684,73,66]
>
> You should inspect the SMART data and dmesg on the drives and servers
> supporting the above OSDs to determine which one is failing.
>
> After you get the PG back to active+clean+inconsistent (get osd.238 back
> in and it finishes its backfill) you can re-issue a manual deep-scrub of it
> and once that deep-scrub finishes the rados list-inconsistent-obj 15.f4f
> should return and implicate a single osd with errors.
>
> Finally you should issue the PG repair again.
>
> In order to get your manually issued scrubs and repairs to start sooner
> you may want to set the noscrub and nodeep-scrub flags until you can get
> your PG repaired.
>
> As an aside osd_max_scrubs of 9 is too aggressive IMO I would drop that
> back to 3, max
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
>
>
> On Wed, Oct 11, 2023 at 10:51 AM Siddhit Renake 
> wrote:
>
>> Hello Wes,
>>
>> Thank you for your response.
>>
>> brc1admin:~ # rados list-inconsistent-obj 15.f4f
>> No scrub information available for pg 15.f4f
>>
>> brc1admin:~ # ceph osd ok-to-stop osd.238
>> OSD(s) 238 are ok to stop without reducing availability or risking data,
>> provided there are no other concurrent failures or interventions.
>> 341 PGs are likely to be degraded (but remain available) as a result.
>>
>> Before I proceed with your suggested action plan, needed clarification on
>> below.
>> In order to list all objects residing on the inconsistent PG, we had
>> stopped the primary osd (osd.238) and extracted the list of all objects
>> residing on this osd using the ceph-objectstore tool. We noticed that when
>> we stop the osd (osd.238) using systemctl, the RGW gateways continuously
>> restart, which is impacting our S3 service availability. This was observed
>> twice when we stopped osd.238 for general maintenance activity w.r.t
>> ceph-objectstore tool. How can we ensure that stopping and marking out
>> osd.238 ( primary osd of inconsistent pg) does not impact RGW service
>> availability ?
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-11 Thread Wesley Dillingham
If I recall correctly, when the acting or up set of a PG changes, the scrub
information is lost. This was likely lost when you stopped osd.238 and
changed the sets.

Based on your initial post, I do not believe you need to be using the
objectstore tool currently. Inconsistent PGs are a common occurrence and
can be repaired.

After your most recent post I would get osd.238 back in the cluster unless
you have reason to believe it is the failing hardware. But it could be any
of the osds in the following set (from your initial post)
[238,106,402,266,374,498,590,627,684,73,66]

You should inspect the SMART data and dmesg on the drives and servers
supporting the above OSDs to determine which one is failing.

After you get the PG back to active+clean+inconsistent (get osd.238 back in
and it finishes its backfill) you can re-issue a manual deep-scrub of it
and once that deep-scrub finishes the rados list-inconsistent-obj 15.f4f
should return and implicate a single osd with errors.

Finally you should issue the PG repair again.
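
Roughly, once the PG is active+clean+inconsistent again (a sketch; adjust the
PG id as needed):

ceph pg deep-scrub 15.f4f
rados list-inconsistent-obj 15.f4f --format=json-pretty
ceph pg repair 15.f4f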

In order to get your manually issued scrubs and repairs to start sooner you
may want to set the noscrub and nodeep-scrub flags until you can get your
PG repaired.

As an aside, osd_max_scrubs of 9 is too aggressive IMO; I would drop that
back to 3, max.
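
For reference, the flags and the scrub setting are cluster-wide, so something
like the following (remember to unset the flags once the repair is done):

ceph osd set noscrub
ceph osd set nodeep-scrub
ceph config set osd osd_max_scrubs 3
# ... after the PG has been repaired:
ceph osd unset noscrub
ceph osd unset nodeep-scrub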


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Oct 11, 2023 at 10:51 AM Siddhit Renake 
wrote:

> Hello Wes,
>
> Thank you for your response.
>
> brc1admin:~ # rados list-inconsistent-obj 15.f4f
> No scrub information available for pg 15.f4f
>
> brc1admin:~ # ceph osd ok-to-stop osd.238
> OSD(s) 238 are ok to stop without reducing availability or risking data,
> provided there are no other concurrent failures or interventions.
> 341 PGs are likely to be degraded (but remain available) as a result.
>
> Before I proceed with your suggested action plan, needed clarification on
> below.
> In order to list all objects residing on the inconsistent PG, we had
> stopped the primary osd (osd.238) and extracted the list of all objects
> residing on this osd using the ceph-objectstore tool. We noticed that when
> we stop the osd (osd.238) using systemctl, the RGW gateways continuously
> restart, which is impacting our S3 service availability. This was observed
> twice when we stopped osd.238 for general maintenance activity w.r.t
> ceph-objectstore tool. How can we ensure that stopping and marking out
> osd.238 ( primary osd of inconsistent pg) does not impact RGW service
> availability ?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou
I've run additional tests with the Pacific releases, and with "ceph-volume 
inventory" things went wrong starting with the first v16.2.11 release 
(v16.2.11-20230125).


=== Ceph v16.2.10-20220920 ===

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True  SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC WD5003ABYX-1

=== Ceph v16.2.11-20230125 ===

Device Path   Size Device nodes    rotates available 
Model name



Maybe this could help to see what has changed?

Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy 
(and/or Reef) as well? I don't recall what inventory does in the 
background exactly, I believe Adam King mentioned that in some thread, 
maybe that can help here. I'll search for that thread tomorrow.


Zitat von Patrick Begou :


Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG 


cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG 


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: 
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid 
250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman: 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: 
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: 
docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph image 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock 
140679815723776 on 
/run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776 
acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBUG sestatus: 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou
This afternoon I had a look at the Python file, but I could not work out how it 
works with containers, as I am only a Fortran HPC programmer... However, I 
found that "cephadm gather-facts" shows all the HDDs in Pacific.


Some quick tests show:

== Nautilus ==

[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v14 ceph-volume 
inventory

Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True  SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC WD5003ABYX-1

== Octopus ==

[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v15 ceph-volume 
inventory

Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True  SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC WD5003ABYX-1

== Pacific ==

[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v16 ceph-volume 
inventory

Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size Device nodes    rotates available 
Model name


== Quincy ==

[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v17 ceph-volume 
inventory

Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size Device nodes    rotates available 
Model name


== Reef ==

[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v18 ceph-volume 
inventory

Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size Device nodes    rotates available 
Model name


Could it be related to deprecated hardware support in Ceph for SATA drives?


Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy 
(and/or Reef) as well? I don't recall what inventory does in the 
background exactly, I believe Adam King mentioned that in some thread, 
maybe that can help here. I'll search for that thread tomorrow.


Zitat von Patrick Begou :


Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG 


cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG 


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: 
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Thank you, Frank. This confirms that monitors indeed do this.

Our boot drives in 3 systems are smaller 1 DWPD drives (RAID1 to protect
against a random single drive failure), and over 3 years the mons have eaten
through 60% of their endurance. Other systems have larger boot drives, and
2% of their endurance was used up over 1.5 years.

It would still be good to get an understanding of why monitors do this, and
whether there is any way to reduce the amount of writes. Unfortunately, the
Ceph documentation in this regard is severely lacking.

I'm copying this to ceph-docs, perhaps someone will find it useful and
adjust the hardware recommendations.

/Z

On Wed, 11 Oct 2023, 18:23 Frank Schilder,  wrote:

> Oh wow! I never bothered looking, because on our hardware the wear is so
> low:
>
> # iotop -ao -bn 2 -d 300
> Total DISK READ :   0.00 B/s | Total DISK WRITE :   6.46 M/s
> Actual DISK READ:   0.00 B/s | Actual DISK WRITE:   6.47 M/s
> TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
>2230 be/4 ceph  0.00 B   1818.71 M  0.00 %  0.46 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:low0]
>2256 be/4 ceph  0.00 B 19.27 M  0.00 %  0.43 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [safe_timer]
>2250 be/4 ceph  0.00 B 42.38 M  0.00 %  0.26 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [fn_monstore]
>2231 be/4 ceph  0.00 B 58.36 M  0.00 %  0.01 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65
> [rocksdb:high0]
> 644 be/3 root  0.00 B576.00 K  0.00 %  0.00 % [jbd2/sda3-8]
>2225 be/4 ceph  0.00 B128.00 K  0.00 %  0.00 % ceph-mon
> --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01
> --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [log]
> 1637141 be/4 root  0.00 B  0.00 B  0.00 %  0.00 %
> [kworker/u113:2-flush-8:0]
> 1636453 be/4 root  0.00 B  0.00 B  0.00 %  0.00 %
> [kworker/u112:0-ceph0]
>1560 be/4 root  0.00 B 20.00 K  0.00 %  0.00 % rsyslogd -n
> [in:imjournal]
>1561 be/4 root  0.00 B 56.00 K  0.00 %  0.00 % rsyslogd -n
> [rs:main Q:Reg]
>
> 1.8GB every 5 minutes, that's 518GB per day. The 400G drives we have are
> rated 10DWPD and with the 6-drive RAID10 config this gives plenty of
> life-time. I guess this write load will kill any low-grade SSD (typical
> boot devices, even enterprise ones), specifically smaller drives where
> the controller doesn't reallocate cells according to remaining write
> endurance.
>
> I guess there was a reason for the recommendations by Dell. I always
> thought that the recent recommendation for MON store storage in the ceph
> docs are a "bit unrealistic", apparently both, in size and in performance
> (including endurance). Well, I guess you need to look for write intensive
> drives with decent specs. If you do, also go for sufficient size. This will
> absorb temporary usage peaks that can be very large and also provide extra
> endurance with SSDs with good controllers.
>
> I also think the recommendations on the ceph docs deserve a reality check.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zakhar Kirpichenko 
> Sent: Wednesday, October 11, 2023 4:30 PM
> To: Eugen Block
> Cc: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes
>
> Eugen,
>
> Thanks for your response. May I ask what numbers you're referring to?
>
> I am not referring to monitor store.db sizes. I am specifically referring
> to writes monitors do to their store.db file by frequently rotating and
> replacing them with new versions during compactions. The size of the
> store.db remains more or less the same.
>
> This is a 300s iotop snippet, sorted by aggregated disk writes:
>
> Total DISK READ:35.56 M/s | Total DISK WRITE:23.89 M/s
> Current DISK READ:  35.64 M/s | Current DISK WRITE:  24.09 M/s
> TID  PRIO  USER DISK READ DISK WRITE>  SWAPIN  IOCOMMAND
>4919 be/4 167  16.75 M  2.24 G  0.00 %  1.34 % ceph-mon -n
> mon.ceph03 -f --setuser ceph --setgr~lt-mon-cluster-log-to-stderr=true
> [rocksdb:low0]
>   15122 be/4 167   0.00 B652.91 M  0.00 %  0.27 % ceph-osd -n
> osd.31 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
> [bstore_kv_sync]
>   17073 be/4 167   0.00 B651.86 M  0.00 %  0.27 % ceph-osd -n
> 

[ceph-users] CLT weekly notes October 11th 2023

2023-10-11 Thread Adam King
Here are the notes from this week's CLT call. The call focused heavily on
release process, specifically around figuring out which patches are
required for a release.


   - 17.2.7 status


   - A few more FS PRs and one core PR then we can start release process


   - Trying to finalize list of PRs needed for 18.2.1


   - General discussion about the process for getting the list of required
   PRs for a given release
  - Using per-release github milestones. E.g. a milestone specifically
  for 18.2.1 rather than just reef


   - Would require fixing some scripts that refer to the milestone


   -
  https://github.com/ceph/ceph/blob/main/src/script/backport-resolve-issue


   - For now, continue using etherpad until something more automated exists


   - Create pads a lot earlier


   - Could use existing clt call to try to finalize required PRs for
   releases


   - should be on agenda for every clt call


   - a couple of build-related PRs that were stalled.


   - for a while, it's not possible to build w/FIO


   - PR https://github.com/ceph/ceph/pull/53346


   - for a while, it's not possible to "make (or ninja) install" with
   dashboard disabled


   - PR https://github.com/ceph/ceph/pull/52313


   - Some more general discussion of how to get more attention for build PRs
  - Laura will start grouping some build PRs with RADOS PRs for
  build/testing in the ci


   - Can make CI builds with CMAKE_BUILD_TYPE=Debug


   - https://github.com/ceph/ceph-build/pull/2167


   - https://github.com/ceph/ceph/pull/53855#issuecomment-1751367302


   -
   
https://shaman.ceph.com/builds/ceph/wip-batrick-testing-20231006.014828-debug/cfbdc475a5ca4098c0330e42cd978c9fd647e012/
   - relies on us removing centos 8 from all testing suites and dropping
   that as a build target


   - Last Pacific?


   - Yes, 17.2.7, then 18.2.1, then 16.2.15 (final)


   - PTLs will need to go through and find what backports still need to get
   into pacific


   - A lot of open pacific backports right now


Thanks,
  - Adam King
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Eugen Block
That's really strange. Just out of curiosity, have you tried Quincy  
(and/or Reef) as well? I don't recall what inventory does in the  
background exactly, I believe Adam King mentioned that in some thread,  
maybe that can help here. I'll search for that thread tomorrow.


Zitat von Patrick Begou :


Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG  


cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG  


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config:  
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman:  
0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman:  
fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman:  
0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman:  
fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid  
250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman:  
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman:  
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman:  
docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph image  
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock  
140679815723776 on  
/run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776  
acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux  
status: disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBUG sestatus: SELinux  
status: disabled

2023-10-11 16:16:06,700 7ff2a5c08b80 DEBUG /usr/bin/podman:
2023-10-11 16:16:06,701 7ff2a5c08b80 DEBUG /usr/bin/podman: Device  
Path   Size Device nodes    rotates available  
Model name



I have only one version of cephadm in /var/lib/ceph/{fsid} :
[root@mostha1 ~]# ls -lrt  
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm*
-rw-r--r-- 1 root root 350889 28 sept. 16:39  
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78



Running " python3  
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78 ceph-volume 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Frank Schilder
Oh wow! I never bothered looking, because on our hardware the wear is so low:

# iotop -ao -bn 2 -d 300
Total DISK READ :   0.00 B/s | Total DISK WRITE :   6.46 M/s
Actual DISK READ:   0.00 B/s | Actual DISK WRITE:   6.47 M/s
TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN  IOCOMMAND
   2230 be/4 ceph  0.00 B   1818.71 M  0.00 %  0.46 % ceph-mon 
--cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 
--mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 
[rocksdb:low0]
   2256 be/4 ceph  0.00 B 19.27 M  0.00 %  0.43 % ceph-mon 
--cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 
--mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 
[safe_timer]
   2250 be/4 ceph  0.00 B 42.38 M  0.00 %  0.26 % ceph-mon 
--cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 
--mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 
[fn_monstore]
   2231 be/4 ceph  0.00 B 58.36 M  0.00 %  0.01 % ceph-mon 
--cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 
--mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 
[rocksdb:high0]
644 be/3 root  0.00 B576.00 K  0.00 %  0.00 % [jbd2/sda3-8]
   2225 be/4 ceph  0.00 B128.00 K  0.00 %  0.00 % ceph-mon 
--cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 
--mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [log]
1637141 be/4 root  0.00 B  0.00 B  0.00 %  0.00 % 
[kworker/u113:2-flush-8:0]
1636453 be/4 root  0.00 B  0.00 B  0.00 %  0.00 % 
[kworker/u112:0-ceph0]
   1560 be/4 root  0.00 B 20.00 K  0.00 %  0.00 % rsyslogd -n 
[in:imjournal]
   1561 be/4 root  0.00 B 56.00 K  0.00 %  0.00 % rsyslogd -n 
[rs:main Q:Reg]

1.8GB every 5 minutes, that's 518GB per day. The 400G drives we have are rated 
10DWPD, and with the 6-drive RAID10 config this gives plenty of life-time. I 
guess this write load will kill any low-grade SSD (typical boot devices, even 
enterprise ones), specifically smaller drives where the controller doesn't 
reallocate cells according to remaining write endurance.
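
As a rough sanity check (assuming writes spread evenly across the three mirror 
pairs of the RAID10): 518GB/day of logical writes is about 1TB/day of physical 
writes, or roughly 173GB/day per 400G drive, i.e. around 0.43 DWPD against the 
10 DWPD rating.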

I guess there was a reason for the recommendations by Dell. I always thought 
that the recent recommendations for MON store storage in the ceph docs are a 
"bit unrealistic", apparently both in size and in performance (including 
endurance). Well, I guess you need to look for write-intensive drives with 
decent specs. If you do, also go for sufficient size. This will absorb 
temporary usage peaks, which can be very large, and also provide extra 
endurance with SSDs with good controllers.

I also think the recommendations in the ceph docs deserve a reality check.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zakhar Kirpichenko 
Sent: Wednesday, October 11, 2023 4:30 PM
To: Eugen Block
Cc: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

Eugen,

Thanks for your response. May I ask what numbers you're referring to?

I am not referring to monitor store.db sizes. I am specifically referring to 
writes monitors do to their store.db file by frequently rotating and replacing 
them with new versions during compactions. The size of the store.db remains 
more or less the same.

This is a 300s iotop snippet, sorted by aggregated disk writes:

Total DISK READ:35.56 M/s | Total DISK WRITE:23.89 M/s
Current DISK READ:  35.64 M/s | Current DISK WRITE:  24.09 M/s
TID  PRIO  USER DISK READ DISK WRITE>  SWAPIN  IOCOMMAND
   4919 be/4 167  16.75 M  2.24 G  0.00 %  1.34 % ceph-mon -n 
mon.ceph03 -f --setuser ceph --setgr~lt-mon-cluster-log-to-stderr=true 
[rocksdb:low0]
  15122 be/4 167   0.00 B652.91 M  0.00 %  0.27 % ceph-osd -n 
osd.31 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  17073 be/4 167   0.00 B651.86 M  0.00 %  0.27 % ceph-osd -n 
osd.32 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  17268 be/4 167   0.00 B490.86 M  0.00 %  0.18 % ceph-osd -n 
osd.25 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  18032 be/4 167   0.00 B463.57 M  0.00 %  0.17 % ceph-osd -n 
osd.26 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  16855 be/4 167   0.00 B402.86 M  0.00 %  0.15 % ceph-osd -n 
osd.22 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  17406 be/4 167   0.00 B387.03 M  0.00 %  0.14 % ceph-osd -n 
osd.27 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug 
[bstore_kv_sync]
  17932 be/4 167   0.00 B375.42 M  0.00 %  0.13 % ceph-osd -n 
osd.29 -f --setuser ceph 

[ceph-users] Re: CephFS: convert directory into subvolume

2023-10-11 Thread jie . zhang7
Eugen,

Thank you, however I am still lost. I can create a subvolume group, that I 
understand. The issue is that '/volume/' isn't a real directory on the host; 
it's a 'virtual directory' in CephFS. '/mnt/tank/database' is a real folder 
structure on the host. I can't `mv /mnt/tank/database /volume/`.

Re-reading the thread, is the answer basically:
1) Create group and subvolume
2) mount the subvolume onto the host and then move the data?
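
That is, something along the lines of this sketch (the group and subvolume 
names are placeholders I made up; I have not verified the exact syntax):

ceph fs subvolumegroup create <fs_name> mygroup
ceph fs subvolume create <fs_name> database --group_name mygroup
ceph fs subvolume getpath <fs_name> database --group_name mygroup
# then mount the returned path (or the fs root) and move/rsync /mnt/tank/database into it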

Or is there a more direct way to convert '/mnt/tank/database' to 
'/volume//database'?

Thx!

Jie
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What's the best practices of accessing ceph over flaky network connection?

2023-10-11 Thread nanericwang
(transferred from https://github.com/ceph/ceph-csi/discussions/4181)

> What's the best practices of accessing ceph over flaky network connection? 
> For example, can I setup a local dm-cache binding ceph with a local SSD to 
> buffer the I/O? Thanks.

A flaky network will usually be quite problematic. There is no guarantee that 
data has not been modified on another system once the network comes back after 
a temporary interruption. In case the data was changed remotely, and something 
wrote to the local dm-cache device, a split-brain can happen. Someone needs to 
decide which side of the data is the right one.

Maybe it is possible to tune RBD to work with dm-cache and network 
interruptions. Or, for CephFS, you may want to look into FS-Cache (I don't know 
if CephFS supports that, though). The best venue to ask for guidance and 
experience from others is 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/ . The discussions in 
this GitHub project are for Ceph-CSI, which is _only_ a driver to 
provision/mount Ceph-based storage.
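
As a pointer, the kernel CephFS client does expose an 'fsc' mount option for 
FS-Cache. A sketch only, assuming the kernel was built with FS-Cache support 
and cachefilesd is running on the client; the monitor address and credentials 
below are placeholders, and note this only caches reads locally and does not 
solve the consistency problem described above:

# mount CephFS with the 'fsc' option to enable FS-Cache on the client
mount -t ceph 192.168.1.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,fsc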
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-11 Thread Siddhit Renake
Hello Wes,

Thank you for your response.

brc1admin:~ # rados list-inconsistent-obj 15.f4f
No scrub information available for pg 15.f4f

brc1admin:~ # ceph osd ok-to-stop osd.238
OSD(s) 238 are ok to stop without reducing availability or risking data, 
provided there are no other concurrent failures or interventions.
341 PGs are likely to be degraded (but remain available) as a result.

Before I proceed with your suggested action plan, I need clarification on the 
point below.
In order to list all objects residing on the inconsistent PG, we had stopped 
the primary osd (osd.238) and extracted the list of all objects residing on 
this osd using the ceph-objectstore tool. We noticed that when we stop the osd 
(osd.238) using systemctl, the RGW gateways continuously restart, which is 
impacting our S3 service availability. This was observed twice when we stopped 
osd.238 for general maintenance activity with the ceph-objectstore tool. How 
can we ensure that stopping and marking out osd.238 (the primary osd of the 
inconsistent pg) does not impact RGW service availability?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to fix 1 Inconsistent PG

2023-10-11 Thread samdto987
Hello All,
Greetings. We have a Ceph cluster running version
ceph version 14.2.16-402-g7d47dbaf4d
(7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable)


===

Issue: 1 PG is in an inconsistent state and does not recover.

# ceph -s
  cluster:
id: 30d6f7ee-fa02-4ab3-8a09-9321c8002794
health: HEALTH_ERR
2 large omap objects
1 pools have many more objects per pg than average
159224 scrub errors
Possible data damage: 1 pg inconsistent
2 pgs not deep-scrubbed in time
2 pgs not scrubbed in time

# ceph health detail

HEALTH_ERR 2 large omap objects; 1 pools have many more objects per pg than 
average; 159224 scrub errors; Possible data damage: 1 pg inconsistent; 2 pgs 
not deep-scrubbed in time; 2 pgs not scrubbed in time
LARGE_OMAP_OBJECTS 2 large omap objects
2 large objects found in pool 'default.rgw.log'
Search the cluster log for 'Large omap object found' for more details.
MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
pool iscsi-images objects per pg (541376) is more than 14.9829 times 
cluster average (36133)
OSD_SCRUB_ERRORS 159224 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 15.f4f is active+clean+inconsistent, acting 
[238,106,402,266,374,498,590,627,684,73,66]
PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
pg 1.5c not deep-scrubbed since 2021-04-05 23:20:13.714446
pg 1.55 not deep-scrubbed since 2021-04-11 07:12:37.185074
PG_NOT_SCRUBBED 2 pgs not scrubbed in time
pg 1.5c not scrubbed since 2023-07-10 21:15:50.352848
pg 1.55 not scrubbed since 2023-06-24 10:02:10.038311
 
==


We have tried the following to resolve it:

1. We ran the pg repair command "ceph pg repair 15.f4f"
2. We restarted the associated OSDs that are mapped to pg 15.f4f
3. We tuned the osd_max_scrubs value and set it to 9.
4. We ran a scrub and a deep scrub with "ceph pg scrub 15.f4f" and "ceph pg 
deep-scrub 15.f4f"
5. We also tried the ceph-objectstore-tool command to fix it
==

We have checked the logs of the primary OSD of the respective inconsistent PG 
and found the below errors.
[ERR] : 15.f4fs0 shard 402(2) 
15:f2f3fff4:::94a51ddb-a94f-47bc-9068-509e8c09af9a.7862003.20_c%2f4%2fd61%2f885%2f49627697%2f192_1.ts:head
 : missing
/var/log/ceph/ceph-osd.238.log:339:2023-10-06 00:37:06.410 7f65024cb700 -1 
log_channel(cluster) log [ERR] : 15.f4fs0 shard 266(3) 
15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
 : missing
/var/log/ceph/ceph-osd.238.log:340:2023-10-06 00:37:06.410 7f65024cb700 -1 
log_channel(cluster) log [ERR] : 15.f4fs0 shard 402(2) 
15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
 : missing
/var/log/ceph/ceph-osd.238.log:341:2023-10-06 00:37:06.410 7f65024cb700 -1 
log_channel(cluster) log [ERR] : 15.f4fs0 shard 590(6) 
15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
 : missing
===
We also noticed that the number of scrub errors in the ceph health status 
matches the number of ERR log entries in the primary OSD's log for the 
inconsistent PG:
grep -Hn 'ERR' /var/log/ceph/ceph-osd.238.log|wc -l
159226

Ceph is cleaning the scrub errors, but the rate of scrub repair is very slow (an 
average of 200 scrub errors per day); we want to increase the repair rate to 
finish cleaning up the remaining 159224 scrub errors.

#ceph pg 15.f4f query


{
"state": "active+clean+inconsistent",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 409009,
"up": [
238,
106,
402,
266,
374,
498,
590,
627,
684,
73,
66
],
"acting": [
238,
106,
402,
266,
374,
498,
590,
627,
684,
73,
66
],
"acting_recovery_backfill": [
"66(10)",
"73(9)",
"106(1)",
"238(0)",
"266(3)",
"374(4)",
"402(2)",
"498(5)",
"590(6)",
"627(7)",
"684(8)"
],
"info": {
"pgid": "15.f4fs0",
"last_update": "409009'7998",
"last_complete": "409009'7998",
"log_tail": "382701'4900",
"last_user_version": 592883,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 19813,
"epoch_pool_created": 16141,
"last_epoch_started": 407097,

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Eugen,

Thanks for your response. May I ask what numbers you're referring to?

I am not referring to monitor store.db sizes. I am specifically referring
to writes monitors do to their store.db file by frequently rotating and
replacing them with new versions during compactions. The size of the
store.db remains more or less the same.

This is a 300s iotop snippet, sorted by aggregated disk writes:

Total DISK READ:35.56 M/s | Total DISK WRITE:23.89 M/s
Current DISK READ:  35.64 M/s | Current DISK WRITE:  24.09 M/s
TID  PRIO  USER DISK READ DISK WRITE>  SWAPIN  IOCOMMAND
   4919 be/4 167  16.75 M  2.24 G  0.00 %  1.34 % ceph-mon -n
mon.ceph03 -f --setuser ceph --setgr~lt-mon-cluster-log-to-stderr=true
[rocksdb:low0]
  15122 be/4 167   0.00 B652.91 M  0.00 %  0.27 % ceph-osd -n
osd.31 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17073 be/4 167   0.00 B651.86 M  0.00 %  0.27 % ceph-osd -n
osd.32 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17268 be/4 167   0.00 B490.86 M  0.00 %  0.18 % ceph-osd -n
osd.25 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  18032 be/4 167   0.00 B463.57 M  0.00 %  0.17 % ceph-osd -n
osd.26 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  16855 be/4 167   0.00 B402.86 M  0.00 %  0.15 % ceph-osd -n
osd.22 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17406 be/4 167   0.00 B387.03 M  0.00 %  0.14 % ceph-osd -n
osd.27 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17932 be/4 167   0.00 B375.42 M  0.00 %  0.13 % ceph-osd -n
osd.29 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  18017 be/4 167   0.00 B359.38 M  0.00 %  0.13 % ceph-osd -n
osd.28 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17420 be/4 167   0.00 B332.83 M  0.00 %  0.12 % ceph-osd -n
osd.23 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17975 be/4 167   0.00 B312.06 M  0.00 %  0.11 % ceph-osd -n
osd.30 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]
  17273 be/4 167   0.00 B303.49 M  0.00 %  0.11 % ceph-osd -n
osd.24 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug
[bstore_kv_sync]

Not the best example, because sometimes the mon writes even more intensively,
but it is very apparent that thread 4919 of the monitor process is the top
disk writer in the system.
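
For scale: the 2.24 G that thread accumulated over the 300 s sample works out
to roughly 2.24 GB x (86400 / 300) ~ 645 GB per day if that rate were
sustained, which is where the "hundreds of GB per day" estimate comes from.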

This is the mon thread producing lots of writes:

   4919 167   20   0 2031116   1.1g  10652 S   0.0   0.3 288:48.65
rocksdb:low0

Then with a combination of lsof and sysdig I determine that the writes are
being made to /var/lib/ceph/mon/ceph-ceph03/store.db/*.sst, i.e. the mon's
rocksdb store:

ceph-mon 4838  167  200r  REG 253,11 67319253 14812899
/var/lib/ceph/mon/ceph-ceph03/store.db/3677146.sst
ceph-mon 4838  167  203r  REG 253,11 67228736 14813270
/var/lib/ceph/mon/ceph-ceph03/store.db/3677147.sst
ceph-mon 4838  167  205r  REG 253,11 67243212 14813275
/var/lib/ceph/mon/ceph-ceph03/store.db/3677148.sst
ceph-mon 4838  167  208r  REG 253,11 67247953 14813316
/var/lib/ceph/mon/ceph-ceph03/store.db/3677149.sst
ceph-mon 4838  167  220r  REG 253,11 67261659 14813332
/var/lib/ceph/mon/ceph-ceph03/store.db/3677150.sst
ceph-mon 4838  167  221r  REG 253,11 67242500 14813345
/var/lib/ceph/mon/ceph-ceph03/store.db/3677151.sst
ceph-mon 4838  167  224r  REG 253,11 67264969 14813348
/var/lib/ceph/mon/ceph-ceph03/store.db/3677152.sst
ceph-mon 4838  167  228r  REG 253,11 64346933 14813381
/var/lib/ceph/mon/ceph-ceph03/store.db/3677153.sst

By matching iotop and sysdig write records to mon's log entries, I see that
the writes happen during "manual compaction" events - whatever they are,
because there's no documentation on this whatsoever, and each time around
0.56GB is being written to disk to a new set of *.sst files, which is the
total size of the store.db. Looks like from time to time the monitor just
reads its store.db and writes it out to a new set of files, as the file
names "numbers" increase with each write:

ceph-mon 4838  167  175r  REG 253,11 67220863 14812310
/var/lib/ceph/mon/ceph-ceph03/store.db/3677167.sst
ceph-mon 4838  167  200r  REG 253,11 67358627 14812899
/var/lib/ceph/mon/ceph-ceph03/store.db/3677168.sst
ceph-mon 4838  167  203r  REG 253,11 67277978 14813270
/var/lib/ceph/mon/ceph-ceph03/store.db/3677169.sst
ceph-mon 4838  167  205r  REG 253,11 67256312 14813275

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG 


cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG 


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: 
/etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 
fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 
0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 
fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid 
250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman: 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: 
quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: 
docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph image 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock 
140679815723776 on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776 acquired 
on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux 
status: disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBUG sestatus: SELinux 
status: disabled

2023-10-11 16:16:06,700 7ff2a5c08b80 DEBUG /usr/bin/podman:
2023-10-11 16:16:06,701 7ff2a5c08b80 DEBUG /usr/bin/podman: Device 
Path   Size Device nodes    rotates available Model name



I have only one version of cephadm in /var/lib/ceph/{fsid} :
[root@mostha1 ~]# ls -lrt 
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm*
-rw-r--r-- 1 root root 350889 28 sept. 16:39 
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78



Running " python3 
/var/lib/ceph/250f9864-0142-11ee-8e5f-00266cf8869c/cephadm.f6868821c084cd9740b59c7c5eb59f0dd47f6e3b1e6fecb542cb44134ace8d78 
ceph-volume inventory" give the same output and the same log (execpt the 
valu of the lock):


2023-10-11 16:21:35,965 7f467cf31b80 DEBUG 


cephadm ['ceph-volume', 'inventory']
2023-10-11 16:21:36,009 7f467cf31b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:21:36,012 7f467cf31b80 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Eugen Block
Can you check which cephadm version is installed on the host? And then  
please add (only the relevant) output from the cephadm.log when you  
run the inventory (without the --image ). Sometimes the  
version mismatch on the host and the one the orchestrator uses can  
cause some disruptions. You could try the same with the latest cephadm  
you have in /var/lib/ceph/${fsid}/ (ls -lrt  
/var/lib/ceph/${fsid}/cephadm.*). I mentioned that in this thread [1].  
So you could try the following:


$ chmod +x /var/lib/ceph/{fsid}/cephadm.{latest}

$ python3 /var/lib/ceph/{fsid}/cephadm.{latest} ceph-volume inventory

Does the output differ? Paste the relevant cephadm.log from that  
attempt as well.


[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS/


Zitat von Patrick Begou :


Hi Eugen,

first many thanks for the time spent on this problem.

"ceph osd purge 2 --force --yes-i-really-mean-it" works and clean  
all the bas status.


*[root@mostha1 ~]# cephadm shell
*Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image  
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

*
*
*[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it *
purged osd.2
*
*
*[ceph: root@mostha1 /]# ceph osd tree*
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
-9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
-7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
-3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
*
*
*[ceph: root@mostha1 /]# lsblk*
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda 8:0    1 232.9G  0 disk
|-sda1 8:1    1   3.9G  0 part /rootfs/boot
|-sda2 8:2    1   3.9G  0 part [SWAP]
`-sda3 8:3    1   225G  0 part
|-al8vg-rootvol 253:0    0  48.8G  0 lvm  /rootfs
|-al8vg-homevol 253:2    0   9.8G  0 lvm  /rootfs/home
|-al8vg-tmpvol 253:3    0   9.8G  0 lvm  /rootfs/tmp
`-al8vg-varvol 253:4    0  19.8G  0 lvm  /rootfs/var
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0  
lvm

sdc 8:32   1 232.9G  0 disk

"cephadm ceph-volume inventory" returns nothing:

*[root@mostha1 ~]# cephadm ceph-volume inventory **
*Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image  
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e


Device Path   Size Device nodes    rotates  
available Model name


[root@mostha1 ~]#

But running the same command within cephadm 15.2.17 works:

*[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory*
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True  SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC WD5003ABYX-1

[root@mostha1 ~]#

*[root@mostha1 ~]# podman images -a**
*REPOSITORY    TAG IMAGE ID CREATED    SIZE
quay.io/ceph/ceph v16.2.14    f13d80acdbb5  2 weeks  
ago    1.21 GB
quay.io/ceph/ceph v15.2.17    93146564743f  14  
months ago  1.24 GB




Patrick

Le 11/10/2023 à 15:14, Eugen Block a écrit :
Your response is a bit confusing since it seems to be mixed up with  
the previous answer. So you still need to remove the OSD properly,  
so purge it from the crush tree:


ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!)

If everything is clean (OSD has been removed, disk has been zapped,  
lsblk shows no LVs for that disk) you can check the inventory:


cephadm ceph-volume inventory

Please also add the output of 'ceph orch ls osd --export'.

Zitat von Patrick Begou :


Hi Eugen,

- the OS is Alma Linux 8 with latests updates.

- this morning I've worked with ceph-volume but it ends with a  
strange final state. I was connected on host mostha1 where  
/dev/sdc was not recognized. These are the steps followed based on  
the ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >  
/var/lib/ceph/bootstrap-osd/ceph.keyring

[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0  

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

Hi Eugen,

first many thanks for the time spent on this problem.

"ceph osd purge 2 --force --yes-i-really-mean-it" works and clean all 
the bas status.


*[root@mostha1 ~]# cephadm shell
*Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

*
*
*[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it *
purged osd.2
*
*
*[ceph: root@mostha1 /]# ceph osd tree*
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
-9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
-7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
-3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
*
*
*[ceph: root@mostha1 /]# lsblk*
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda 8:0    1 232.9G  0 disk
|-sda1 8:1    1   3.9G  0 part /rootfs/boot
|-sda2 8:2    1   3.9G  0 part [SWAP]
`-sda3 8:3    1   225G  0 part
|-al8vg-rootvol 253:0    0  48.8G  0 lvm  /rootfs
|-al8vg-homevol 253:2    0   9.8G  0 lvm  /rootfs/home
|-al8vg-tmpvol 253:3    0   9.8G  0 lvm  /rootfs/tmp
`-al8vg-varvol 253:4    0  19.8G  0 lvm  /rootfs/var
sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 
253:1    0 465.8G  0 lvm

sdc 8:32   1 232.9G  0 disk

"cephadm ceph-volume inventory" returns nothing:

[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image 
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e


Device Path   Size Device nodes    rotates available Model name


[root@mostha1 ~]#

But running the same command within cephadm 15.2.17 works:

[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path   Size rotates available Model name
/dev/sdc  232.83 GB    True    True  SAMSUNG HE253GJ
/dev/sda  232.83 GB    True    False SAMSUNG HE253GJ
/dev/sdb  465.76 GB    True    False WDC WD5003ABYX-1

[root@mostha1 ~]#

[root@mostha1 ~]# podman images -a
REPOSITORY         TAG       IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph  v16.2.14  f13d80acdbb5  2 weeks ago    1.21 GB
quay.io/ceph/ceph  v15.2.17  93146564743f  14 months ago  1.24 GB
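
A possible next step (not taken in this thread) would be to compare what the
two ceph-volume releases report for that one disk in JSON form, which exposes
a per-device "rejected_reasons" list that the plain table hides. A sketch,
assuming the cephadm wrapper forwards the arguments after "--" unchanged and
that ceph-volume's --format json option behaves as documented:

  # Pacific (default image) view of the single disk:
  cephadm ceph-volume -- inventory --format json /dev/sdc
  # Octopus view, reusing the older image as above:
  cephadm --image 93146564743f ceph-volume -- inventory --format json /dev/sdc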




Patrick

Le 11/10/2023 à 15:14, Eugen Block a écrit :
Your response is a bit confusing since it seems to be mixed up with 
the previous answer. So you still need to remove the OSD properly, so 
purge it from the crush tree:


ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!)

If everything is clean (OSD has been removed, disk has been zapped, 
lsblk shows no LVs for that disk) you can check the inventory:


cephadm ceph-volume inventory

Please also add the output of 'ceph orch ls osd --export'.

Zitat von Patrick Begou :


Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > 
/var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data 
/dev/sdc


Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 
253:1    0 465.8G  0 lvm

sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 
253:5    0 232.8G  0 lvm


Then I've tried to activate this osd but it fails as in podman I have 
not access to systemctl:


[ceph: root@mostha1 /]# ceph-volume lvm activate 2 
45c8e92c-caf9-4fe7-9a42-7b45a0794632

.
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
[ceph: root@mostha1 /]# ceph osd tree

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Eugen Block
That all looks normal to me, to be honest. Can you show some details on  
how you calculate the "hundreds of GB per day"? I see similar stats to  
Frank's on different clusters with different client IO.
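
For reference, one way to put a number on it is to sample the kernel's
per-process I/O counters on a mon host; a minimal sketch, assuming a single
ceph-mon process and root access:

  PID=$(pgrep -o ceph-mon)                              # the ceph-mon on this host
  B1=$(awk '/^write_bytes/ {print $2}' /proc/$PID/io)
  sleep 600                                             # sample over 10 minutes
  B2=$(awk '/^write_bytes/ {print $2}' /proc/$PID/io)
  echo "ceph-mon wrote $(( (B2 - B1) / 1024 / 1024 )) MiB in 10 min (x144 for a day)"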


Zitat von Zakhar Kirpichenko :


Sure, nothing unusual there:

---

  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_OK

  services:
mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
mgr: ceph01.vankui(active, since 12d), standbys: ceph02.shsinf
osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

  data:
pools:   10 pools, 2400 pgs
objects: 6.23M objects, 16 TiB
usage:   61 TiB used, 716 TiB / 777 TiB avail
pgs: 2396 active+clean
 3    active+clean+scrubbing+deep
 1    active+clean+scrubbing

  io:
client:   2.7 GiB/s rd, 27 MiB/s wr, 46.95k op/s rd, 2.17k op/s wr

---

Please disregard the big read number, a customer is running a
read-intensive job. Mon store writes keep happening when the cluster is
much more quiet, thus I think that intensive reads have no effect on the
mons.

Mgr:

"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

---
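
The lists above look like the relevant fields of 'ceph mgr module ls'; assuming
jq is available, they can be extracted with something like:

  ceph mgr module ls | jq '{always_on_modules, enabled_modules}'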

/Z


On Wed, 11 Oct 2023 at 14:50, Eugen Block  wrote:


Can you add some more details as requested by Frank? Which mgr modules
are enabled? What's the current 'ceph -s' output?

> Is autoscaler running and doing stuff?
> Is balancer running and doing stuff?
> Is backfill going on?
> Is recovery going on?
> Is your ceph version affected by the "excessive logging to MON
> store" issue that was present starting with pacific but should have
> been addressed


Zitat von Zakhar Kirpichenko :

> We don't use CephFS at all and don't have RBD snapshots apart from some
> cloning for Openstack images.
>
> The size of mon stores isn't an issue, it's < 600 MB. But it gets
> overwritten often causing lots of disk writes, and that is an issue for
us.
>
> /Z
>
> On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:
>
>> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
>> monitor usage, we've seen large mon stores on  customer clusters with
>> rbd mirroring on snapshot basis. In a healthy cluster they have mon
>> stores of around 2GB in size.
>>
>> >> @Eugen: Was there not an option to limit logging to the MON store?
>>
I don't recall at the moment, worth checking though.
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Thank you, Frank.
>> >
>> > The cluster is healthy, operating normally, nothing unusual is going
on.
>> We
>> > observe lots of writes by mon processes into mon rocksdb stores,
>> > specifically:
>> >
>> > /var/lib/ceph/mon/ceph-cephXX/store.db:
>> > 65M 3675511.sst
>> > 65M 3675512.sst
>> > 65M 3675513.sst
>> > 65M 3675514.sst
>> > 65M 3675515.sst
>> > 65M 3675516.sst
>> > 65M 3675517.sst
>> > 65M 3675518.sst
>> > 62M 3675519.sst
>> >
> >> > The size of the files is not huge, but monitors rotate and write out
>> these
>> > files often, sometimes several times per minute, resulting in lots of
>> data
>> > written to disk. The writes coincide with "manual compaction" events
>> logged
>> > by the monitors, for example:
>> >
>> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting
1@5
>> +
>> > 9@6 files to L6, score -1.00
>> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
EVENT_LOG_v1
>> > {"time_micros": 1697022610487624, "job": 70854, "event":
>> > "compaction_started", "compaction_reason": "ManualCompaction",
>> "files_L5":
>> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
>> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
>> > 601117031}
>> > debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
table
>> > #3675544: 2015 keys, 67287115 bytes
>> > debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
table
>> > #3675545: 24343 keys, 67336225 bytes
>> > debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
table
>> > #3675546: 1196 keys, 67225813 bytes
>> > debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
table
>> > #3675547: 1049 keys, 67252678 bytes
>> > debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
>> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
table
>> > #3675548: 1081 keys, 67216638 bytes
>> > 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Eugen Block
Your response is a bit confusing since it seems to be mixed up with  
the previous answer. So you still need to remove the OSD properly, so  
purge it from the crush tree:


ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!)

If everything is clean (OSD has been removed, disk has been zapped,  
lsblk shows no LVs for that disk) you can check the inventory:


cephadm ceph-volume inventory

Please also add the output of 'ceph orch ls osd --export'.
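
To make that concrete, a minimal post-cleanup checklist could look like this
(illustrative only, using osd.2 and /dev/sdc from this thread):

  ceph osd tree                    # osd.2 should no longer be listed
  lsblk /dev/sdc                   # no ceph-* LV should remain on the disk
  lvs                              # nor any leftover osd-block logical volume
  cephadm ceph-volume inventory    # the disk should show up as available again
  ceph orch ls osd --export        # the OSD service specs currently in effect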

Zitat von Patrick Begou :


Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >  
/var/lib/ceph/bootstrap-osd/ceph.keyring

[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0  
lvm

sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5    0 232.8G  0  
lvm


Then I've tried to activate this osd but it fails as in podman I  
have not access to systemctl:


[ceph: root@mostha1 /]# ceph-volume lvm activate 2  
45c8e92c-caf9-4fe7-9a42-7b45a0794632

.
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
[ceph: root@mostha1 /]# ceph osd tree

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
-9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
-7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
-3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
 2   0  osd.2   down 0  1.0

I've tried to destroy the osd as you suggested; the command returns no
error and "lsblk" no longer shows /dev/sdc as a ceph osd device, but the
osd still appears in the tree.


[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is  
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
 stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero  
of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 bs=1M count=10  
conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group  
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt  
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net  
--uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f  
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
 stderr: Removing  
ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632  
(253:1)
 stderr: Releasing logical volume  
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
 stderr: Archiving volume group  
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
 stdout: Logical volume  
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632" successfully removed.
 stderr: Removing physical volume "/dev/sdc" from volume group  
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
 stdout: Volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"  
successfully removed
 stderr: Creating volume group backup  
"/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" (seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt  
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net  
--uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc

 stdout: Labels on physical volume "/dev/sdc" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10  
conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: 

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME STATUS  

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

Hi Eugen,

sorry for posting twice, my zimbra server returned an error at the first 
attempt.


My initial problem is that ceph cannot detect these HDDs since Pacific.
So I have deployed Octopus, where "ceph orch apply osd 
--all-available-devices" works fine, and then upgraded to Pacific.
But during the upgrade, 2 OSDs went to "out" and "down" and I'm looking 
for a solution to manually re-integrate these 2 HDDs in the cluster, as 
Pacific is not able to do this automatically with "ceph orch..." like 
Octopus.
But it is a test cluster to understand and get basic knowledge of Ceph  
(and I'm allowed to break everything).


Patrick


Le 11/10/2023 à 14:35, Eugen Block a écrit :
Don't use ceph-volume manually to deploy OSDs if your cluster is 
managed by cephadm. I just wanted to point out that you hadn't wiped 
the disk properly to be able to re-use it. Let the orchestrator handle 
the OSD creation and activation. I recommend to remove the OSD again, 
wipe it properly (cephadm ceph-volume lvm zap --destroy /dev/sdc) and 
then let the orchestrator add it as an OSD. Depending on your 
drivegroup configuration it will happen automatically (if 
"all-available-devices" is enabled or your osd specs are already 
applied). If it doesn't happen automatically, deploy it with 'ceph 
orch daemon add osd **:**' [1].


[1] https://docs.ceph.com/en/quincy/cephadm/services/osd/#deploy-osds
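
Put together, the re-add for this particular host and disk could look roughly
like the following sketch, built only from the commands already mentioned in
this thread (mostha1 and /dev/sdc as the example host and device):

  ceph osd purge 2 --force --yes-i-really-mean-it   # acceptable on a test cluster only
  cephadm ceph-volume lvm zap --destroy /dev/sdc    # wipe LVM metadata and data
  ceph orch daemon add osd mostha1:/dev/sdc         # or let all-available-devices pick it up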

Zitat von Patrick Begou :


Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:


   *[root@mostha1 ~]# cephadm shell**
   **[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >
   /var/lib/ceph/bootstrap-osd/ceph.keyring**
   **[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data
   /dev/sdc**
   *
   *[ceph: root@mostha1 /]# ceph-volume lvm list


   == osd.2 ===

   *  [block]
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

      block device
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
      block uuid Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
      cephx lockbox secret
      cluster fsid 250f9864-0142-11ee-8e5f-00266cf8869c
      cluster name  ceph
      crush device class
      encrypted 0
   *osd fsid 45c8e92c-caf9-4fe7-9a42-7b45a0794632*
      osd id    2
      osdspec affinity
      type  block
      vdo   0
   *  devices   /dev/sdc

   *


Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 
253:1    0 465.8G  0 lvm

*sdc 8:32   1 232.9G  0 disk **
**`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 
253:5    0 232.8G  0 lvm **

*

But this osd.2 is "down" and "out" with a strange status (no related 
cluster host, no weight) and I cannot activate it as within the 
podman container systemctl is not working.


   [ceph: root@mostha1 /]# ceph osd tree
   ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT PRI-AFF
   -1 1.72823  root default
   -5 0.45477  host dean
     0    hdd  0.22739  osd.0 up   1.0 1.0
     4    hdd  0.22739  osd.4 up   1.0 1.0
   -9 0.22739  host ekman
     6    hdd  0.22739  osd.6 up   1.0 1.0
   -7 0.45479  host mostha1
     5    hdd  0.45479  osd.5 up   1.0 1.0
   -3 0.59128  host mostha2
     1    hdd  0.22739  osd.1 up   1.0 1.0
     3    hdd  0.36389  osd.3 up   1.0 1.0
   *2   0  osd.2   down 0 1.0*

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 
45c8e92c-caf9-4fe7-9a42-7b45a0794632

Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph 
prime-osd-dir --dev 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
--path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
/var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph 
/var/lib/ceph/osd/ceph-2/block

Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > 
/var/lib/ceph/bootstrap-osd/ceph.keyring

[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 
253:1    0 465.8G  0 lvm

sdc 8:32   1 232.9G  0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 
253:5    0 232.8G  0 lvm


Then I've tried to activate this osd but it fails as in podman I have 
not access to systemctl:


[ceph: root@mostha1 /]# ceph-volume lvm activate 2 
45c8e92c-caf9-4fe7-9a42-7b45a0794632

.
Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
[ceph: root@mostha1 /]# ceph osd tree

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
-9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
-7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
-3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
 2   0  osd.2   down 0  1.0

I've tried to destroy the osd as you suggested; the command returns no
error and "lsblk" no longer shows /dev/sdc as a ceph osd device, but the
osd still appears in the tree.


[ceph: root@mostha1 /]# ceph-volume lvm zap --destroy /dev/sdc
--> Zapping: /dev/sdc
--> Zapping lvm member /dev/sdc. lv_path is 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

--> Unmounting /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/umount -v /var/lib/ceph/osd/ceph-2
 stderr: umount: /var/lib/ceph/osd/ceph-2 unmounted
Running command: /usr/bin/dd if=/dev/zero 
of=/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
bs=1M count=10 conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.575633 s, 18.2 MB/s
--> Only 1 LV left in VG, will proceed to destroy volume group 
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/vgremove -v -f 
ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c
 stderr: Removing 
ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 
(253:1)
 stderr: Releasing logical volume 
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632"
 stderr: Archiving volume group 
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" metadata (seqno 5).
 stdout: Logical volume 
"osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632" successfully removed.
 stderr: Removing physical volume "/dev/sdc" from volume group 
"ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c"
 stdout: Volume group "ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" 
successfully removed
 stderr: Creating volume group backup 
"/etc/lvm/backup/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c" (seqno 6).
Running command: nsenter --mount=/rootfs/proc/1/ns/mnt 
--ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net 
--uts=/rootfs/proc/1/ns/uts /sbin/pvremove -v -f -f /dev/sdc

 stdout: Labels on physical volume "/dev/sdc" successfully wiped.
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 
conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.590652 s, 17.8 MB/s
--> Zapping successful for: 

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT PRI-AFF
-1 1.72823  root default
-5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
-9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
-7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
-3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Eugen Block
Don't use ceph-volume manually to deploy OSDs if your cluster is  
managed by cephadm. I just wanted to point out that you hadn't wiped  
the disk properly to be able to re-use it. Let the orchestrator handle  
the OSD creation and activation. I recommend to remove the OSD again,  
wipe it properly (cephadm ceph-volume lvm zap --destroy /dev/sdc) and  
then let the orchestrator add it as an OSD. Depending on your  
drivegroup configuration it will happen automatically (if  
"all-available-devices" is enabled or your osd specs are already  
applied). If it doesn't happen automatically, deploy it with 'ceph  
orch daemon add osd **:**' [1].


[1] https://docs.ceph.com/en/quincy/cephadm/services/osd/#deploy-osds

Zitat von Patrick Begou :


Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:


   *[root@mostha1 ~]# cephadm shell**
   **[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >
   /var/lib/ceph/bootstrap-osd/ceph.keyring**
   **[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data
   /dev/sdc**
   *
   *[ceph: root@mostha1 /]# ceph-volume lvm list


   == osd.2 ===

   *  [block]

/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632


  block device

/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

  block uuid Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
  cephx lockbox secret
  cluster fsid 250f9864-0142-11ee-8e5f-00266cf8869c
  cluster name  ceph
  crush device class
  encrypted 0
   *osd fsid 45c8e92c-caf9-4fe7-9a42-7b45a0794632*
  osd id    2
  osdspec affinity
  type  block
  vdo   0
   *  devices   /dev/sdc

   *


Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1    0 465.8G  0  
lvm

*sdc 8:32   1 232.9G  0 disk **
**`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5    0 232.8G  0 lvm  
**

*

But this osd.2 is "down" and "out" with a strange status (no related  
cluster host, no weight) and I cannot activate it as within the  
podman container systemctl is not working.


   [ceph: root@mostha1 /]# ceph osd tree
   ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
   -1 1.72823  root default
   -5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
   -9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
   -7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
   -3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
   *2   0  osd.2   down 0 1.0*

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2  
45c8e92c-caf9-4fe7-9a42-7b45a0794632

Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph  
prime-osd-dir --dev  
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 --path /var/lib/ceph/osd/ceph-2  
--no-mon-config
Running command: /usr/bin/ln -snf  
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632  
/var/lib/ceph/osd/ceph-2/block

Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable  
ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632
 stderr: Created symlink  
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632.service ->  
/usr/lib/systemd/system/ceph-volume@.service.

Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
 stderr: Created symlink  
/run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service ->  
/usr/lib/systemd/system/ceph-osd@.service.

Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1

Patrick


Le 11/10/2023 à 11:00, 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

Hi Eugen,

- the OS is Alma Linux 8 with latest updates.

- this morning I've worked with ceph-volume but it ends with a
strange final state. I was connected on host mostha1 where /dev/sdc
was not recognized. These are the steps I followed, based on the
ceph-volume documentation I've read:


   *[root@mostha1 ~]# cephadm shell**
   **[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd >
   /var/lib/ceph/bootstrap-osd/ceph.keyring**
   **[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data
   /dev/sdc**
   *
   *[ceph: root@mostha1 /]# ceph-volume lvm list


   == osd.2 ===

   *  [block]
   
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632

  block device
   
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
  block uuid Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
  cephx lockbox secret
  cluster fsid 250f9864-0142-11ee-8e5f-00266cf8869c
  cluster name  ceph
  crush device class
  encrypted 0
   *osd fsid 45c8e92c-caf9-4fe7-9a42-7b45a0794632*
  osd id    2
  osdspec affinity
  type  block
  vdo   0
   *  devices   /dev/sdc

   *


Now lsblk command shows sdc as an osd:

sdb 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 
253:1    0 465.8G  0 lvm

*sdc 8:32   1 232.9G  0 disk **
**`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 
253:5    0 232.8G  0 lvm **

*

But this osd.2 is "down" and "out" with a strange status (no related 
cluster host, no weight) and I cannot activate it as within the 
podman container systemctl is not working.


   [ceph: root@mostha1 /]# ceph osd tree
   ID  CLASS  WEIGHT   TYPE NAME STATUS  REWEIGHT  PRI-AFF
   -1 1.72823  root default
   -5 0.45477  host dean
 0    hdd  0.22739  osd.0 up   1.0  1.0
 4    hdd  0.22739  osd.4 up   1.0  1.0
   -9 0.22739  host ekman
 6    hdd  0.22739  osd.6 up   1.0  1.0
   -7 0.45479  host mostha1
 5    hdd  0.45479  osd.5 up   1.0  1.0
   -3 0.59128  host mostha2
 1    hdd  0.22739  osd.1 up   1.0  1.0
 3    hdd  0.36389  osd.3 up   1.0  1.0
   *2   0  osd.2   down 0 1.0*

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 
45c8e92c-caf9-4fe7-9a42-7b45a0794632

Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph 
prime-osd-dir --dev 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
--path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf 
/dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 
/var/lib/ceph/osd/ceph-2/block

Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable 
ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632
 stderr: Created symlink 
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632.service 
-> /usr/lib/systemd/system/ceph-volume@.service.

Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
 stderr: Created symlink 
/run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service -> 
/usr/lib/systemd/system/ceph-osd@.service.

Running command: /usr/bin/systemctl start ceph-osd@2
 stderr: Failed to connect to bus: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1

Patrick


Le 11/10/2023 à 11:00, Eugen Block a écrit :

Hi,

just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help 
here. From your previous output you didn't specify the --destroy flag.
Which cephadm version is installed on the host? Did you also upgrade 
the OS when moving to Pacific? (Sorry if I missed that.)
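
Both can be checked quickly on the host, for example:

  cephadm version        # version of the cephadm binary itself
  ceph versions          # versions of the running cluster daemons
  cat /etc/os-release    # OS release of the host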



Zitat von Patrick Begou :


Le 02/10/2023 à 18:22, Patrick Bégou a écrit :

Hi all,

still stuck with this problem.

I've deployed octopus and all my HDDs have been set up as OSDs. Fine.
I've upgraded to pacific and 2 OSDs have failed. They have been 
automatically removed and the upgrade finished. Cluster health is finally 
OK, no data loss.


But now I cannot re-add these osd with pacific (I had previous 
troubles on these old HDDs, lost one osd in octopus and was able to 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Sure, nothing unusual there:

---

  cluster:
id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
health: HEALTH_OK

  services:
mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
mgr: ceph01.vankui(active, since 12d), standbys: ceph02.shsinf
osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

  data:
pools:   10 pools, 2400 pgs
objects: 6.23M objects, 16 TiB
usage:   61 TiB used, 716 TiB / 777 TiB avail
pgs: 2396 active+clean
 3    active+clean+scrubbing+deep
 1    active+clean+scrubbing

  io:
client:   2.7 GiB/s rd, 27 MiB/s wr, 46.95k op/s rd, 2.17k op/s wr

---

Please disregard the big read number, a customer is running a
read-intensive job. Mon store writes keep happening when the cluster is
much more quiet, thus I think that intensive reads have no effect on the
mons.

Mgr:

"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

---

/Z


On Wed, 11 Oct 2023 at 14:50, Eugen Block  wrote:

> Can you add some more details as requested by Frank? Which mgr modules
> are enabled? What's the current 'ceph -s' output?
>
> > Is autoscaler running and doing stuff?
> > Is balancer running and doing stuff?
> > Is backfill going on?
> > Is recovery going on?
> > Is your ceph version affected by the "excessive logging to MON
> > store" issue that was present starting with pacific but should have
> > been addressed
>
>
> Zitat von Zakhar Kirpichenko :
>
> > We don't use CephFS at all and don't have RBD snapshots apart from some
> > cloning for Openstack images.
> >
> > The size of mon stores isn't an issue, it's < 600 MB. But it gets
> > overwritten often causing lots of disk writes, and that is an issue for
> us.
> >
> > /Z
> >
> > On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:
> >
> >> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
> >> monitor usage, we've seen large mon stores on  customer clusters with
> >> rbd mirroring on snapshot basis. In a healthy cluster they have mon
> >> stores of around 2GB in size.
> >>
> >> >> @Eugen: Was there not an option to limit logging to the MON store?
> >>
> >> I don't recall at the moment, worth checking though.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Thank you, Frank.
> >> >
> >> > The cluster is healthy, operating normally, nothing unusual is going
> on.
> >> We
> >> > observe lots of writes by mon processes into mon rocksdb stores,
> >> > specifically:
> >> >
> >> > /var/lib/ceph/mon/ceph-cephXX/store.db:
> >> > 65M 3675511.sst
> >> > 65M 3675512.sst
> >> > 65M 3675513.sst
> >> > 65M 3675514.sst
> >> > 65M 3675515.sst
> >> > 65M 3675516.sst
> >> > 65M 3675517.sst
> >> > 65M 3675518.sst
> >> > 62M 3675519.sst
> >> >
> >> > The size of the files is not huge, but monitors rotate and write out
> >> these
> >> > files often, sometimes several times per minute, resulting in lots of
> >> data
> >> > written to disk. The writes coincide with "manual compaction" events
> >> logged
> >> > by the monitors, for example:
> >> >
> >> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting
> 1@5
> >> +
> >> > 9@6 files to L6, score -1.00
> >> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> EVENT_LOG_v1
> >> > {"time_micros": 1697022610487624, "job": 70854, "event":
> >> > "compaction_started", "compaction_reason": "ManualCompaction",
> >> "files_L5":
> >> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> >> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> >> > 601117031}
> >> > debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
> table
> >> > #3675544: 2015 keys, 67287115 bytes
> >> > debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
> table
> >> > #3675545: 24343 keys, 67336225 bytes
> >> > debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
> table
> >> > #3675546: 1196 keys, 67225813 bytes
> >> > debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
> table
> >> > #3675547: 1049 keys, 67252678 bytes
> >> > debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated
> table
> >> > #3675548: 1081 keys, 67216638 bytes
> >> > debug 2023-10-11T11:10:11.303+ 7f48a3a9b700 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Eugen Block
Can you add some more details as requested by Frank? Which mgr modules  
are enabled? What's the current 'ceph -s' output?



Is autoscaler running and doing stuff?
Is balancer running and doing stuff?
Is backfill going on?
Is recovery going on?
Is your ceph version affected by the "excessive logging to MON  
store" issue that was present starting with pacific but should have  
been addressed



Zitat von Zakhar Kirpichenko :


We don't use CephFS at all and don't have RBD snapshots apart from some
cloning for Openstack images.

The size of mon stores isn't an issue, it's < 600 MB. But it gets
overwritten often causing lots of disk writes, and that is an issue for us.

/Z

On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:


Do you use many snapshots (rbd or cephfs)? That can cause a heavy
monitor usage, we've seen large mon stores on  customer clusters with
rbd mirroring on snapshot basis. In a healthy cluster they have mon
stores of around 2GB in size.

>> @Eugen: Was there not an option to limit logging to the MON store?

I don't recall at the moment, worth checking though.

Zitat von Zakhar Kirpichenko :

> Thank you, Frank.
>
> The cluster is healthy, operating normally, nothing unusual is going on.
We
> observe lots of writes by mon processes into mon rocksdb stores,
> specifically:
>
> /var/lib/ceph/mon/ceph-cephXX/store.db:
> 65M 3675511.sst
> 65M 3675512.sst
> 65M 3675513.sst
> 65M 3675514.sst
> 65M 3675515.sst
> 65M 3675516.sst
> 65M 3675517.sst
> 65M 3675518.sst
> 62M 3675519.sst
>
> The size of the files is not huge, but monitors rotate and write out
these
> files often, sometimes several times per minute, resulting in lots of
data
> written to disk. The writes coincide with "manual compaction" events
logged
> by the monitors, for example:
>
> debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5
+
> 9@6 files to L6, score -1.00
> debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1697022610487624, "job": 70854, "event":
> "compaction_started", "compaction_reason": "ManualCompaction",
"files_L5":
> [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> 601117031}
> debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675544: 2015 keys, 67287115 bytes
> debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675545: 24343 keys, 67336225 bytes
> debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675546: 1196 keys, 67225813 bytes
> debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675547: 1049 keys, 67252678 bytes
> debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675548: 1081 keys, 67216638 bytes
> debug 2023-10-11T11:10:11.303+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675549: 1196 keys, 67245376 bytes
> debug 2023-10-11T11:10:12.023+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675550: 1195 keys, 67246813 bytes
> debug 2023-10-11T11:10:13.059+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675551: 1205 keys, 67223302 bytes
> debug 2023-10-11T11:10:13.903+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675552: 1312 keys, 56416011 bytes
> debug 2023-10-11T11:10:13.911+ 7f48a3a9b700  4 rocksdb:
> [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5
+
> 9@6 files to L6 => 594449971 bytes
> debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
> [default] compacted to: base level 5 level multiplier 10.00 max bytes
base
> 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9
wr,
> level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
> read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
> records dropped: 516 output_compression: NoCompression
> debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
> Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
> 1697022613921002, "job": 70854, "event": "compaction_finished",
> "compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
> "output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
> "num_input_records": 35108, 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
We don't use CephFS at all and don't have RBD snapshots apart from some
cloning for Openstack images.

The size of mon stores isn't an issue, it's < 600 MB. But it gets
overwritten often causing lots of disk writes, and that is an issue for us.

/Z

On Wed, 11 Oct 2023 at 14:37, Eugen Block  wrote:

> Do you use many snapshots (rbd or cephfs)? That can cause a heavy
> monitor usage, we've seen large mon stores on  customer clusters with
> rbd mirroring on snapshot basis. In a healthy cluster they have mon
> stores of around 2GB in size.
>
> >> @Eugen: Was there not an option to limit logging to the MON store?
>
> I don't recall at the moment, worth checking though.
>
> Zitat von Zakhar Kirpichenko :
>
> > Thank you, Frank.
> >
> > The cluster is healthy, operating normally, nothing unusual is going on.
> We
> > observe lots of writes by mon processes into mon rocksdb stores,
> > specifically:
> >
> > /var/lib/ceph/mon/ceph-cephXX/store.db:
> > 65M 3675511.sst
> > 65M 3675512.sst
> > 65M 3675513.sst
> > 65M 3675514.sst
> > 65M 3675515.sst
> > 65M 3675516.sst
> > 65M 3675517.sst
> > 65M 3675518.sst
> > 62M 3675519.sst
> >
> > The size of the files is not huge, but monitors rotate and write out
> these
> > files often, sometimes several times per minute, resulting in lots of
> data
> > written to disk. The writes coincide with "manual compaction" events
> logged
> > by the monitors, for example:
> >
> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5
> +
> > 9@6 files to L6, score -1.00
> > debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
> > {"time_micros": 1697022610487624, "job": 70854, "event":
> > "compaction_started", "compaction_reason": "ManualCompaction",
> "files_L5":
> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> > 601117031}
> > debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675544: 2015 keys, 67287115 bytes
> > debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675545: 24343 keys, 67336225 bytes
> > debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675546: 1196 keys, 67225813 bytes
> > debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675547: 1049 keys, 67252678 bytes
> > debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675548: 1081 keys, 67216638 bytes
> > debug 2023-10-11T11:10:11.303+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675549: 1196 keys, 67245376 bytes
> > debug 2023-10-11T11:10:12.023+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675550: 1195 keys, 67246813 bytes
> > debug 2023-10-11T11:10:13.059+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675551: 1205 keys, 67223302 bytes
> > debug 2023-10-11T11:10:13.903+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> > #3675552: 1312 keys, 56416011 bytes
> > debug 2023-10-11T11:10:13.911+ 7f48a3a9b700  4 rocksdb:
> > [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5
> +
> > 9@6 files to L6 => 594449971 bytes
> > debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
> > Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
> > [default] compacted to: base level 5 level multiplier 10.00 max bytes
> base
> > 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9
> wr,
> > level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
> > read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
> > records dropped: 516 output_compression: NoCompression
> > debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
> > Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
> > 1697022613921002, "job": 70854, "event": "compaction_finished",
> > "compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
> > "output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
> > "num_input_records": 35108, "num_output_records": 34592,
> > "num_subcompactions": 1, "output_compression": "NoCompression",
> > "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
> > "lsm_state": [0, 0, 0, 0, 0, 0, 9]}
> >
> > The log even mentions the huge 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Eugen Block
Do you use many snapshots (rbd or cephfs)? That can cause a heavy  
monitor usage, we've seen large mon stores on  customer clusters with  
rbd mirroring on snapshot basis. In a healthy cluster they have mon  
stores of around 2GB in size.



@Eugen: Was there not an option to limit logging to the MON store?


I don't recall at the moment, worth checking though.

Zitat von Zakhar Kirpichenko :


Thank you, Frank.

The cluster is healthy, operating normally, nothing unusual is going on. We
observe lots of writes by mon processes into mon rocksdb stores,
specifically:

/var/lib/ceph/mon/ceph-cephXX/store.db:
65M 3675511.sst
65M 3675512.sst
65M 3675513.sst
65M 3675514.sst
65M 3675515.sst
65M 3675516.sst
65M 3675517.sst
65M 3675518.sst
62M 3675519.sst

The size of the files is not huge, but monitors rotate and write out these
files often, sometimes several times per minute, resulting in lots of data
written to disk. The writes coincide with "manual compaction" events logged
by the monitors, for example:

debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5 +
9@6 files to L6, score -1.00
debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1697022610487624, "job": 70854, "event":
"compaction_started", "compaction_reason": "ManualCompaction", "files_L5":
[3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
601117031}
debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675544: 2015 keys, 67287115 bytes
debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675545: 24343 keys, 67336225 bytes
debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675546: 1196 keys, 67225813 bytes
debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675547: 1049 keys, 67252678 bytes
debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675548: 1081 keys, 67216638 bytes
debug 2023-10-11T11:10:11.303+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675549: 1196 keys, 67245376 bytes
debug 2023-10-11T11:10:12.023+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675550: 1195 keys, 67246813 bytes
debug 2023-10-11T11:10:13.059+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675551: 1205 keys, 67223302 bytes
debug 2023-10-11T11:10:13.903+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675552: 1312 keys, 56416011 bytes
debug 2023-10-11T11:10:13.911+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5 +
9@6 files to L6 => 594449971 bytes
debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
[default] compacted to: base level 5 level multiplier 10.00 max bytes base
268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9 wr,
level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
records dropped: 516 output_compression: NoCompression
debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
1697022613921002, "job": 70854, "event": "compaction_finished",
"compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
"output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
"num_input_records": 35108, "num_output_records": 34592,
"num_subcompactions": 1, "output_compression": "NoCompression",
"num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
"lsm_state": [0, 0, 0, 0, 0, 0, 9]}

The log even mentions the huge write multiplication. I wonder whether this
is normal and what can be done about it.

/Z

On Wed, 11 Oct 2023 at 13:55, Frank Schilder  wrote:


I need to ask here: where exactly do you observe the hundreds of GB
written per day? Are the mon logs huge? Is it the mon store? Is your
cluster unhealthy?

We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and about
800 librbd clients. Per week our mon logs are  about 70M, the cluster logs
about 120M , the audit logs about 70M and I see between 100-200Kb/s writes
to the mon store. That's in the lower-digit GB range per day. Hundreds of
GB per day sound completely over the top 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Thank you, Frank.

The cluster is healthy, operating normally, nothing unusual is going on. We
observe lots of writes by mon processes into mon rocksdb stores,
specifically:

/var/lib/ceph/mon/ceph-cephXX/store.db:
65M 3675511.sst
65M 3675512.sst
65M 3675513.sst
65M 3675514.sst
65M 3675515.sst
65M 3675516.sst
65M 3675517.sst
65M 3675518.sst
62M 3675519.sst

The size of the files is not huge, but monitors rotate and write out these
files often, sometimes several times per minute, resulting in lots of data
written to disk. The writes coincide with "manual compaction" events logged
by the monitors, for example:

debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5 +
9@6 files to L6, score -1.00
debug 2023-10-11T11:10:10.483+ 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1697022610487624, "job": 70854, "event":
"compaction_started", "compaction_reason": "ManualCompaction", "files_L5":
[3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
601117031}
debug 2023-10-11T11:10:10.619+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675544: 2015 keys, 67287115 bytes
debug 2023-10-11T11:10:10.763+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675545: 24343 keys, 67336225 bytes
debug 2023-10-11T11:10:10.899+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675546: 1196 keys, 67225813 bytes
debug 2023-10-11T11:10:11.035+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675547: 1049 keys, 67252678 bytes
debug 2023-10-11T11:10:11.167+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675548: 1081 keys, 67216638 bytes
debug 2023-10-11T11:10:11.303+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675549: 1196 keys, 67245376 bytes
debug 2023-10-11T11:10:12.023+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675550: 1195 keys, 67246813 bytes
debug 2023-10-11T11:10:13.059+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675551: 1205 keys, 67223302 bytes
debug 2023-10-11T11:10:13.903+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675552: 1312 keys, 56416011 bytes
debug 2023-10-11T11:10:13.911+ 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5 +
9@6 files to L6 => 594449971 bytes
debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
[default] compacted to: base level 5 level multiplier 10.00 max bytes base
268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9 wr,
level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
records dropped: 516 output_compression: NoCompression
debug 2023-10-11T11:10:13.915+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
1697022613921002, "job": 70854, "event": "compaction_finished",
"compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
"output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
"num_input_records": 35108, "num_output_records": 34592,
"num_subcompactions": 1, "output_compression": "NoCompression",
"num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
"lsm_state": [0, 0, 0, 0, 0, 0, 9]}

The log even mentions the huge write multiplication. I wonder whether this
is normal and what can be done about it.

/Z
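
A rough back-of-the-envelope check of those numbers (illustrative arithmetic
only): each such compaction rewrites about 594 MB ("total_output_size":
594449971 bytes), so even at just one compaction per minute that is already

  echo "$(( 594449971 * 60 * 24 / 1000000000 )) GB/day"   # ~856 GB/day

which lines up with the hundreds of gigabytes per day reported above.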

On Wed, 11 Oct 2023 at 13:55, Frank Schilder  wrote:

> I need to ask here: where exactly do you observe the hundreds of GB
> written per day? Are the mon logs huge? Is it the mon store? Is your
> cluster unhealthy?
>
> We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and about
> 800 librbd clients. Per week our mon logs are  about 70M, the cluster logs
> about 120M , the audit logs about 70M and I see between 100-200Kb/s writes
> to the mon store. That's in the lower-digit GB range per day. Hundreds of
> GB per day sound completely over the top on a healthy cluster, unless you
> have MGR modules changing the OSD/cluster map continuously.
>
> Is autoscaler running and doing stuff?
> Is balancer running and doing stuff?
> Is backfill going on?
> Is recovery going on?
> Is your ceph version affected by the "excessive logging to MON store"
> issue that was present starting with pacific but should have been addressed
> by 

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Frank Schilder
I need to ask here: where exactly do you observe the hundreds of GB written per 
day? Are the mon logs huge? Is it the mon store? Is your cluster unhealthy?

We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and about 800 
librbd clients. Per week our mon logs are  about 70M, the cluster logs about 
120M , the audit logs about 70M and I see between 100-200Kb/s writes to the mon 
store. That's in the lower-digit GB range per day. Hundreds of GB per day sound 
completely over the top on a healthy cluster, unless you have MGR modules 
changing the OSD/cluster map continuously.

Is autoscaler running and doing stuff?
Is balancer running and doing stuff?
Is backfill going on?
Is recovery going on?
Is your ceph version affected by the "excessive logging to MON store" issue 
that was present starting with pacific but should have been addressed by now?
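
For reference, a quick way to check the first four points (assuming admin access
to the cluster):

ceph osd pool autoscale-status
ceph balancer status
ceph -s    # recovery/backfill activity shows up in the io: and pgs: sections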

@Eugen: Was there not an option to limit logging to the MON store?

For information to readers, we followed old recommendations from a Dell white 
paper for building a ceph cluster and have a 1TB RAID10 array on 6x
write-intensive SSDs for the MON stores. After 5 years we are below 10% wear. Average
size of the MON store for a healthy cluster is 500M-1G, but we have seen this 
ballooning to 100+GB in degraded conditions.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zakhar Kirpichenko 
Sent: Wednesday, October 11, 2023 12:00 PM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

Thank you, Eugen.

I'm interested specifically to find out whether the huge amount of data
written by monitors is expected. It is eating through the endurance of our
system drives, which were not specced for high DWPD/TBW, as this is not a
documented requirement, and monitors produce hundreds of gigabytes of
writes per day. I am looking for ways to reduce the amount of writes, if
possible.

/Z

On Wed, 11 Oct 2023 at 12:41, Eugen Block  wrote:

> Hi,
>
> what you report is the expected behaviour, at least I see the same on
> all clusters. I can't answer why the compaction is required that
> often, but you can control the log level of the rocksdb output:
>
> ceph config set mon debug_rocksdb 1/5 (default is 4/5)
>
> This reduces the log entries and you wouldn't see the manual
> compaction logs anymore. There are a couple more rocksdb options but I
> probably wouldn't change too much, only if you know what you're doing.
> Maybe Igor can comment if some other tuning makes sense here.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any input from anyone, please?
> >
> > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko 
> wrote:
> >
> >> Any input from anyone, please?
> >>
> >> It's another thing that seems to be rather poorly documented: it's
> unclear
> >> what to expect, what 'normal' behavior should be, and what can be done
> >> about the huge amount of writes by monitors.
> >>
> >> /Z
> >>
> >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Monitors in our 16.2.14 cluster appear to quite often run "manual
> >>> compaction" tasks:
> >>>
> >>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb:
> EVENT_LOG_v1
> >>> {"time_micros": 1696843853892760, "job": 64225, "event":
> "flush_started",
> >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> >>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> >>> "Manual Compaction"}
> >>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.910204)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> 'paxos;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.911004)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-5 to level-6 from 

[ceph-users] Re: Ceph 18: Unable to delete image after incomplete migration "image being migrated"

2023-10-11 Thread Rhys Goodwin


Thanks Eugen. Operation complete:

root@hcn03:/imagework# ceph osd pool delete infra-pool infra-pool 
--yes-i-really-really-mean-it
pool 'infra-pool' removed

Everything clean and tidy again.

Thanks for your help and support. 
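
For the record, the full live-migration workflow looks roughly like this (image
names as used earlier in this thread); the commit step is the part that was
missing here, and abort is the way to roll back an unfinished migration without
deleting anything:

rbd migration prepare infra-pool/sophosbuild images/sophosbuild
rbd migration execute images/sophosbuild
rbd migration commit images/sophosbuild   # finalizes the migration and removes the trashed source
rbd migration abort images/sophosbuild    # alternative to commit: revert and restore the source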

--- Original Message ---
On Wednesday, October 11th, 2023 at 7:21 PM, Eugen Block  wrote:


> Hi,
> 
> then I misinterpreted your message and thought you were actually
> surprised about the trash image. Yeah I don't think messing with
> hexedit really helped here, but I'm not sure either. Anyway, let us
> know how it went.
> 
> Zitat von Rhys Goodwin rhys.good...@proton.me:
> 
> > Thanks again Eugen. Looking at my command history it does look like
> > I did execute the migration but didn't commit it. I wasn't surprised
> > to see it in the trash based on the doc you mentioned, I only tried
> > the restore as a desperate measure to clean up my mess. It doesn't
> > help that I messed around like this, including with hexedit :O. I
> > should have reached out before messing around.
> > 
> > I'll proceed with the migrate/re-create and report back. I'm just
> > crossing my fingers that I'll be allowed to delete the pool. It's a
> > lesson to me to take more care of my wee cluster.
> > 
> > Cheers,
> > Rhys
> > 
> > --- Original Message ---
> > On Wednesday, October 11th, 2023 at 7:54 AM, Eugen Block
> > ebl...@nde.ag wrote:
> > 
> > > Hi,
> > > 
> > > I just re-read the docs on rbd migration [1], haven't done that in a
> > > while, and it states the following:
> > > 
> > > > Note that the source image will be moved to the RBD trash to avoid
> > > > mistaken usage during the migration process
> > > 
> > > So it was expected that your source image was in the trash during the
> > > migration, no need to restore. According to your history you also ran
> > > the "execute" command, do you remember if ran successfully as well?
> > > Did you "execute" after the prepare command completed? But you also
> > > state that the target image isn't there anymore, so it's hard to tell
> > > what exactly happened here. I'm not sure how to continue from here,
> > > maybe migrating/re-creating is the only way now.
> > > 
> > > [1] https://docs.ceph.com/en/quincy/rbd/rbd-live-migration/
> > > 
> > > Zitat von Rhys Goodwin rhys.good...@proton.me:
> > > 
> > > > Thanks Eugen.
> > > > 
> > > > root@hcn03:~# rbd status infra-pool/sophosbuild
> > > > 2023-10-10T09:44:21.234+ 7f1675c524c0 -1 librbd::Migration:
> > > > open_images: failed to open destination image images/65d188c5f5a34:
> > > > (2) No such file or directory
> > > > rbd: getting migration status failed: (2) No such file or directory
> > > > Watchers: none
> > > > 
> > > > I've checked over the other pools again, but they only contain
> > > > Openstack images. There are only 42 images in total across all
> > > > pools. In fact, the "infra-pool" pool only has 3 images, including
> > > > the faulty one. So migrating/re-creating is not a big deal. It's
> > > > more just that I'd like to learn more about how to resolve such
> > > > issues, if possible.
> > > > 
> > > > Good call on the history. I found this smoking gun with: 'history
> > > > |grep "rbd migration":
> > > > rbd migration prepare infra-pool/sophosbuild images/sophosbuild
> > > > rbd migration execute images/sophosbuild
> > > > 
> > > > But images/sophosbuild is definitely not there anymore, and not in
> > > > the trash. It looks like I was missing the commit.
> > > > 
> > > > Kind regards,
> > > > Rhys
> > > > 
> > > > --- Original Message ---
> > > > 
> > > > Eugen Block Wrote:
> > > > 
> > > > Hi, there are a couple of things I would check before migrating all
> > > > images. What's the current 'rbd status infra-pool/sophosbuild'? You
> > > > probably don't have an infinite number of pools so I would also
> > > > check if any of the other pools contains an image with the same
> > > > name, just in case you wanted to keep its original name and only
> > > > change the pool. Even if you don't have the terminal output, maybe
> > > > you find some of the commands in the history?
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > 
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm configuration in git

2023-10-11 Thread Michał Nasiadka
Hello Kamil,

There is a stackhpc.cephadm Ansible collection
(https://galaxy.ansible.com/ui/repo/published/stackhpc/cephadm/) that would
probably fit most of your needs - another option is to run ceph orch ls --export and
then store the output in git for import purposes - but it won't cover everything.
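
As an illustration, a minimal sketch of that export/import round trip (the file
name is just an example):

ceph orch ls --export > cluster-specs.yaml   # commit this file to git
ceph orch apply -i cluster-specs.yaml        # re-apply the specs on a cluster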

Best regards,

Michal

> On 11 Oct 2023, at 11:42, Kamil Madac  wrote:
> 
> Hello ceph community,
> 
> Currently we have deployed ceph clusters with ceph-ansible and whole
> configuration (number od daemons, osd configurations, rgw configurations,
> crush configuration, ...) of each cluster is stored in git and ansible
> variables and we can recreate clusters with ceph-ansible in case we need
> it.
> To change the configuration of a cluster we change appropriate Ansible
> variable, we test it on testing cluster and if new configuration works
> correctly we apply it on prod cluster.
> 
> Is it possible to do it with cephadm? Is it possible to have some config
> files in git and then apply  same cluster configuration on multiple
> clusters? Or is this approach not aligned with cephadm and we should do it
> different way?
> 
> Kamil Madac
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Thank you, Eugen.

I'm interested specifically to find out whether the huge amount of data
written by monitors is expected. It is eating through the endurance of our
system drives, which were not specced for high DWPD/TBW, as this is not a
documented requirement, and monitors produce hundreds of gigabytes of
writes per day. I am looking for ways to reduce the amount of writes, if
possible.

/Z

On Wed, 11 Oct 2023 at 12:41, Eugen Block  wrote:

> Hi,
>
> what you report is the expected behaviour, at least I see the same on
> all clusters. I can't answer why the compaction is required that
> often, but you can control the log level of the rocksdb output:
>
> ceph config set mon debug_rocksdb 1/5 (default is 4/5)
>
> This reduces the log entries and you wouldn't see the manual
> compaction logs anymore. There are a couple more rocksdb options but I
> probably wouldn't change too much, only if you know what you're doing.
> Maybe Igor can comment if some other tuning makes sense here.
>
> Regards,
> Eugen
>
> Zitat von Zakhar Kirpichenko :
>
> > Any input from anyone, please?
> >
> > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko 
> wrote:
> >
> >> Any input from anyone, please?
> >>
> >> It's another thing that seems to be rather poorly documented: it's
> unclear
> >> what to expect, what 'normal' behavior should be, and what can be done
> >> about the huge amount of writes by monitors.
> >>
> >> /Z
> >>
> >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Monitors in our 16.2.14 cluster appear to quite often run "manual
> >>> compaction" tasks:
> >>>
> >>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb:
> EVENT_LOG_v1
> >>> {"time_micros": 1696843853892760, "job": 64225, "event":
> "flush_started",
> >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> >>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> >>> "Manual Compaction"}
> >>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.910204)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> 'paxos;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:30:53.911004)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
> 'paxos;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb:
> EVENT_LOG_v1
> >>> {"time_micros": 1696843928961390, "job": 64228, "event":
> "flush_started",
> >>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
> >>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
> >>> "Manual Compaction"}
> >>> debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original
> Log
> >>> Time 2023/10/09-09:32:08.977739)
> [db_impl/db_impl_compaction_flush.cc:2516]
> >>> [default] Manual compaction from level-0 to level-5 from 'logm ..
> 'logm;
> >>> will stop at (end)
> >>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
> >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
> >>> starting
> >>> 

[ceph-users] Re: cephadm configuration in git

2023-10-11 Thread Robert Sander

On 10/11/23 11:42, Kamil Madac wrote:


Is it possible to do it with cephadm? Is it possible to have some config
files in git and then apply  same cluster configuration on multiple
clusters? Or is this approach not aligned with cephadm and we should do it
different way?


You can export the service specifications with "ceph orch ls --export" 
and import the YAML file with "ceph orch apply -i …".


This does not cover the hosts in the cluster.
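
The host entries can still be kept in git as a hand-written spec file and
applied the same way; a rough sketch, with made-up hostnames, addresses and
labels (see the cephadm host management docs for the exact format):

cat > hosts.yaml <<'EOF'
service_type: host
hostname: node-01
addr: 192.168.10.11
labels:
  - _admin
---
service_type: host
hostname: node-02
addr: 192.168.10.12
EOF
ceph orch apply -i hosts.yaml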

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-dashboard python warning with new pyo3 0.17 lib (debian12)

2023-10-11 Thread Max Carrara
On 9/5/23 16:53, Max Carrara wrote:
> Hello there,
> 
> could you perhaps provide some more information on how (or where) this
> got fixed? It doesn't seem to be fixed yet on the latest Ceph Quincy
> and Reef versions, but maybe I'm mistaken. I've provided some more
> context regarding this below, in case that helps.
> 
> 
> On Ceph Quincy 17.2.6 I'm encountering the following error when trying
> to enable the dashboard (so, the same error that was posted above):
> 
>   root@ceph-01:~# ceph --version
>   ceph version 17.2.6 (810db68029296377607028a6c6da1ec06f5a2b27) quincy 
> (stable)
> 
>   root@ceph-01:~#  ceph mgr module enable dashboard
>   Error ENOENT: module 'dashboard' reports that it cannot run on the active 
> manager daemon: PyO3 modules may only be initialized once per interpreter 
> process (pass --force to force enablement)
> 
> I was then able to find this Python traceback in the systemd journal:
> 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 
> 7fecdc91e000 -1 mgr[py] Traceback (most recent call last):
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/__init__.py", line 60, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .module import Module, 
> StandbyModule  # noqa: F401
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 
> ^
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/module.py", line 30, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .controllers import 
> Router, json_error_page
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 1, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._api_router import 
> APIRouter
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/controllers/_api_router.py", line 1, in 
> 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._router import Router
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/controllers/_router.py", line 7, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._base_controller import 
> BaseController
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 11, in 
> 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ..services.auth import 
> AuthManager, JwtManager
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/usr/share/ceph/mgr/dashboard/services/auth.py", line 12, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: import jwt
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/jwt/__init__.py", line 1, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .api_jwk import PyJWK, 
> PyJWKSet
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/jwt/api_jwk.py", line 6, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .algorithms import 
> get_default_algorithms
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/jwt/algorithms.py", line 6, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .utils import (
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/jwt/utils.py", line 7, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from 
> cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/ec.py", 
> line 11, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from cryptography.hazmat._oid 
> import ObjectIdentifier
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]:   File 
> "/lib/python3/dist-packages/cryptography/hazmat/_oid.py", line 7, in 
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from 
> cryptography.hazmat.bindings._rust import (
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: ImportError: PyO3 modules may only 
> be initialized once per interpreter process
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 
> 7fecdc91e000 -1 mgr[py] Class not found in module 'dashboard'
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 
> 7fecdc91e000 -1 mgr[py] Error loading module 'dashboard': (2) No such file or 
> directory
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.470+0200 
> 7fecdc91e000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 
> 7fecdc91e000 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
>   Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 
> 7fecdc91e000 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr 
> modules: dashboard
> 
> 
> As the traceback above reveals, the dashboard uses `PyJWT`, which in
> turn uses `cryptography`, and `cryptography` uses `PyO3`.

[ceph-users] cephadm configuration in git

2023-10-11 Thread Kamil Madac
Hello ceph community,

Currently we have deployed ceph clusters with ceph-ansible, and the whole
configuration (number of daemons, osd configurations, rgw configurations,
crush configuration, ...) of each cluster is stored in git and Ansible
variables, so we can recreate clusters with ceph-ansible in case we need
it.
To change the configuration of a cluster we change the appropriate Ansible
variable, test it on a testing cluster, and if the new configuration works
correctly we apply it on the prod cluster.

Is it possible to do this with cephadm? Is it possible to have some config
files in git and then apply the same cluster configuration on multiple
clusters? Or is this approach not aligned with cephadm, so that we should
do it a different way?

Kamil Madac
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Eugen Block

Hi,

what you report is the expected behaviour, at least I see the same on  
all clusters. I can't answer why the compaction is required that  
often, but you can control the log level of the rocksdb output:


ceph config set mon debug_rocksdb 1/5 (default is 4/5)

This reduces the log entries and you wouldn't see the manual  
compaction logs anymore. There are a couple more rocksdb options but I  
probably wouldn't change too much, only if you know what you're doing.  
Maybe Igor can comment if some other tuning makes sense here.
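
As a quick sanity check that the change is active (assuming an admin keyring):

ceph config get mon debug_rocksdb    # should report 1/5 after the change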


Regards,
Eugen

Zitat von Zakhar Kirpichenko :


Any input from anyone, please?

On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko  wrote:


Any input from anyone, please?

It's another thing that seems to be rather poorly documented: it's unclear
what to expect, what 'normal' behavior should be, and what can be done
about the huge amount of writes by monitors.

/Z

On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko  wrote:


Hi,

Monitors in our 16.2.14 cluster appear to quite often run "manual
compaction" tasks:

debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
"num_memtables": 1, "num_entries": 715, "num_deletes": 251,
"total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
"Manual Compaction"}
debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
will stop at (end)
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
will stop at (end)
debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
"num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
"total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
"Manual Compaction"}
debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
will stop at (end)
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
[default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
will stop at (end)
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
starting
debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
[db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Zakhar Kirpichenko
Any input from anyone, please?

On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko  wrote:

> Any input from anyone, please?
>
> It's another thing that seems to be rather poorly documented: it's unclear
> what to expect, what 'normal' behavior should be, and what can be done
> about the huge amount of writes by monitors.
>
> /Z
>
> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko  wrote:
>
>> Hi,
>>
>> Monitors in our 16.2.14 cluster appear to quite often run "manual
>> compaction" tasks:
>>
>> debug 2023-10-09T09:30:53.888+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843853892760, "job": 64225, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:30:53.904+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:30:53.908+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.956+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696843928961390, "job": 64228, "event": "flush_started",
>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
>> "Manual Compaction"}
>> debug 2023-10-09T09:32:08.972+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-0 to level-5 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:08.976+ 7f48a3a9b700  4 rocksdb: (Original Log
>> Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516]
>> [default] Manual compaction from level-5 to level-6 from 'logm .. 'logm;
>> will stop at (end)
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:32:12.764+ 7f4899286700  4 rocksdb:
>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction
>> starting
>> debug 2023-10-09T09:33:29.028+ 7f48a329a700  4 rocksdb: EVENT_LOG_v1
>> {"time_micros": 1696844009033151, "job": 64231, "event": "flush_started",
>> "num_memtables": 1, 

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Eugen Block

Hi,

just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help  
here. From your previous output you didn't specify the --destroy flag.
Which cephadm version is installed on the host? Did you also upgrade  
the OS when moving to Pacific? (Sorry if I missed that.)
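
A rough sketch of what I mean, run from the cephadm shell on that host
(double-check the device name first, --destroy wipes the LVM metadata and the
partition table):

ceph-volume lvm zap --destroy /dev/sdc
ceph orch device ls --refresh    # then ask the orchestrator to re-scan the devices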



Zitat von Patrick Begou :


On 02/10/2023 at 18:22, Patrick Bégou wrote:

Hi all,

still stuck with this problem.

I've deployed octopus and all my HDD have been setup as osd. Fine.
I've upgraded to pacific and 2 osd have failed. They have been  
automatically removed and upgrade finishes. Cluster Health is  
finaly OK, no data loss.


But now I cannot re-add these osd with pacific (I had previous  
troubles on these old HDDs, lost one osd in octopus and was able to  
reset and re-add it).


I've tried manually to add the first osd on the node where it is  
located, following  
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ (not  
sure it's the best idea...) but it fails too. This node was the one  
used for deploying the cluster.


[ceph: root@mostha1 /]# *ceph-volume lvm zap /dev/sdc*
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will  
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M  
count=10 conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: 


[ceph: root@mostha1 /]# *ceph-volume lvm create --bluestore --data /dev/sdc*
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name  
client.bootstrap-osd --keyring  
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new  
9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 auth: unable  
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)  
No such file or directory
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1  
AuthRegistry(0x7fb4e405c4d8) no keyring found at  
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 auth: unable  
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)  
No such file or directory
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1  
AuthRegistry(0x7fb4e40601d0) no keyring found at  
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 auth: unable  
to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2)  
No such file or directory
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1  
AuthRegistry(0x7fb4eb8bee90) no keyring found at  
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+ 7fb4e965c700 -1  
monclient(hunting): handle_auth_bad_method server allowed_methods  
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e9e5d700 -1  
monclient(hunting): handle_auth_bad_method server allowed_methods  
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e8e5b700 -1  
monclient(hunting): handle_auth_bad_method server allowed_methods  
[2] but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4eb8c0700 -1 monclient:  
authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to  
the cluster)

-->  RuntimeError: Unable to create a new OSD id

Any idea of what is wrong ?

Thanks

Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



I'm still trying to understand what can be wrong or how to debug  
this situation where Ceph cannot see the devices.


The device :dev/sdc exists:

   [root@mostha1 ~]# cephadm shell lsmcli ldl
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image

quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

   Path | SCSI VPD 0x83    | Link Type | Serial Number   | Health
   Status
   -
   /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
   /dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
   /dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it:

   [root@mostha1 ~]# cephadm shell ceph orch device zap
   mostha1.legi.grenoble-inp.fr /dev/sdc --force
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image

quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

   Error EINVAL: Device path '/dev/sdc' not found on host
   'mostha1.legi.grenoble-inp.fr'

Since I moved from octopus to Pacific.

Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Copying big objects (>5GB) doesn't work after upgrade to Quincy on S3

2023-10-11 Thread Arvydas Opulskis
Hi Casey,

thank you for the update. Now it's clear why it happened, and we will adapt
our code to use multipart.
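
For others hitting this, the difference is easy to see with the AWS CLI against
RGW (bucket/object names are examples, and --endpoint-url pointing at the RGW is
assumed to be configured): the low-level call issues a single CopyObject and
fails above rgw_max_put_size, while the high-level cp switches to multipart
UploadPartCopy under the hood:

aws s3api copy-object --copy-source mybucket/bigobject --bucket mybucket --key bigobject-copy
aws s3 cp s3://mybucket/bigobject s3://mybucket/bigobject-copy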

Cheers,
Arvydas


On Tue, Oct 10, 2023, 18:36 Casey Bodley  wrote:

> hi Arvydas,
>
> it looks like this change corresponds to
> https://tracker.ceph.com/issues/48322 and
> https://github.com/ceph/ceph/pull/38234. the intent was to enforce the
> same limitation as AWS S3 and force clients to use multipart copy
> instead. this limit is controlled by the config option
> rgw_max_put_size which defaults to 5G. the same option controls other
> operations like Put/PostObject, so i wouldn't recommend raising it as
> a workaround for copy
>
> this change really should have been mentioned in the release notes -
> apologies for that omission
>
> On Tue, Oct 10, 2023 at 10:58 AM Arvydas Opulskis 
> wrote:
> >
> > Hi all,
> >
> > after upgrading our cluster from Nautilus -> Pacific -> Quincy we noticed
> > we can't copy bigger objects anymore via S3.
> >
> > An error we get:
> > "Aws::S3::Errors::EntityTooLarge (Aws::S3::Errors::EntityTooLarge)"
> >
> > After some tests we have following findings:
> > * Problems starts for objects bigger than 5 GB (multipart limit)
> > * Issue starts after upgrading to Quincy (17.2.6). In latest Pacific
> > (16.2.13) it works fine.
> > * For Quincy it works ok with AWS S3 CLI "cp" command, but doesn't work
> > using AWS Ruby3 SDK client with copy_object command.
> > * For Pacific setup both clients work ok
> > * From RGW logs seems like AWS S3 CLI client handles multipart copying
> > "under the hood", so it is succesful.
> >
> > It is stated in AWS documentation, that for uploads (and copying) bigger
> > than 5GB files we should use multi part API for AWS S3. For some reason
> it
> > worked for years in Ceph and stopped working after Quincy release, even I
> > couldn't find something in release notes addressing this change.
> >
> > So, is this change permanent and should be considered as bug fix?
> >
> > Both Pacific and Quincy clusters were running on Rocky 8.6 OS, using
> Beast
> > frontend.
> >
> > Arvydas
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-11 Thread Patrick Begou

On 02/10/2023 at 18:22, Patrick Bégou wrote:

Hi all,

still stuck with this problem.

I've deployed octopus and all my HDDs have been set up as OSDs. Fine.
I've upgraded to pacific and 2 OSDs have failed. They have been 
automatically removed and the upgrade finished. Cluster health is finally 
OK, no data loss.


But now I cannot re-add these OSDs with pacific (I had previous 
trouble with these old HDDs, lost one OSD in octopus and was able to 
reset and re-add it).


I've tried to manually add the first OSD on the node where it is 
located, following 
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ 
(not sure it's the best idea...) but it fails too. This node was the 
one used for deploying the cluster.


[ceph: root@mostha1 /]# *ceph-volume lvm zap /dev/sdc*
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will 
remove the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 
conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: 


[ceph: root@mostha1 /]# *ceph-volume lvm create --bluestore --data 
/dev/sdc*

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No 
such file or directory
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e405c4d8) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No 
such file or directory
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e40601d0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No 
such file or directory
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4eb8bee90) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+ 7fb4e965c700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e9e5d700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e8e5b700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4eb8c0700 -1 monclient: 
authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the 
cluster)

-->  RuntimeError: Unable to create a new OSD id

Any idea of what is wrong ?

Thanks

Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



I'm still trying to understand what can be wrong or how to debug this 
situation where Ceph cannot see the devices.


The device /dev/sdc exists:

   [root@mostha1 ~]# cephadm shell lsmcli ldl
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image
   
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Path | SCSI VPD 0x83    | Link Type | Serial Number   | Health
   Status
   -
   /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
   /dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
   /dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it:

   [root@mostha1 ~]# cephadm shell ceph orch device zap
   mostha1.legi.grenoble-inp.fr /dev/sdc --force
   Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
   Using recent ceph image
   
quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
   Error EINVAL: Device path '/dev/sdc' not found on host
   'mostha1.legi.grenoble-inp.fr'

Since I moved from octopus to Pacific.

Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: convert directory into subvolume

2023-10-11 Thread Eugen Block

Hi,

check out this thread [1] as well, where Anh Phan pointed out:
Not really sure what you want, but for simplicity, just move the folder  
to the following structure:


/volumes/[Sub Volume Group Name]/[Sub Volume Name]

ceph will recognize it (no extended attr needed); if you use a  
subvolumegroup name different than "_nogroup", you must provide it  
in all subvolume commands [--group_name ]


You'll need an existing group (the _nogroup won't work) where you can  
move your directory tree to. That worked in my test as expected.
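
A minimal sketch of that approach, using the volume and directory names from the
question below (the CephFS mount point is assumed to be /mnt/tank and 'mygroup'
is a made-up group name):

ceph fs subvolumegroup create tank mygroup
mv /mnt/tank/database /mnt/tank/volumes/mygroup/database
ceph fs subvolume ls tank --group_name mygroup   # the moved directory should now be listed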


Regards,
Eugen

[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/G4ZWGGUPPFQIOVB4SFAIK73H3NLU2WRF/#HB3WW2ENNBBC2ODVSWX2DBGZO2KVB5VK



Zitat von jie.zha...@gmail.com:


Hello,

I'm following this thread and the original. I'm trying to convert  
directories into subvolumes. Where I'm stuck is how to move a  
directory into the subvolume root directory.


I have a volume 'tank' and it's mounted on the host as '/mnt/tank'.  
I have subfolders '/mnt/tank/database', '/mnt/tank/gitlab', etc...


I create a subvolume and getpath gives me:
/volumes/_nogroup/database/4a74

Questions:
1) How do I move /mnt/tank/database into /volumes/_nogroup/database/4a...74
2) Each of the directories has a different pool associated with  
it; do I need to create the subvolume in the same pool?
3) Or can I just move '/mnt/tank/gitlab' -->  
/volumes/_nogroup/gitlab without first creating the subvolume? This  
would skip question 2.


Thx!

Jie
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 18: Unable to delete image after incomplete migration "image being migrated"

2023-10-11 Thread Eugen Block

Hi,

then I misinterpreted your message and thought you were actually  
surprised about the trash image. Yeah I don't think messing with  
hexedit really helped here, but I'm not sure either. Anyway, let us  
know how it went.


Zitat von Rhys Goodwin :

Thanks again Eugen. Looking at my command history it does look like  
I did execute the migration but didn't commit it. I wasn't surprised  
to see it in the trash based on the doc you mentioned, I only tried  
the restore as a desperate measure to clean up my mess. It doesn't  
help that I messed around like this, including with hexedit :O. I  
should have reached out before messing around.


I'll proceed with the migrate/re-create and report back. I'm just  
crossing my fingers that I'll be allowed to delete the pool. It's a  
lesson to me to take more care of my wee cluster.


Cheers,
Rhys

--- Original Message ---
On Wednesday, October 11th, 2023 at 7:54 AM, Eugen Block  
 wrote:




Hi,

I just re-read the docs on rbd migration [1], haven't done that in a
while, and it states the following:

> Note that the source image will be moved to the RBD trash to avoid
> mistaken usage during the migration process


So it was expected that your source image was in the trash during the
migration, no need to restore. According to your history you also ran
the "execute" command, do you remember if ran successfully as well?
Did you "execute" after the prepare command completed? But you also
state that the target image isn't there anymore, so it's hard to tell
what exactly happened here. I'm not sure how to continue from here,
maybe migrating/re-creating is the only way now.

[1] https://docs.ceph.com/en/quincy/rbd/rbd-live-migration/

Zitat von Rhys Goodwin rhys.good...@proton.me:

> Thanks Eugen.
>
> root@hcn03:~# rbd status infra-pool/sophosbuild
> 2023-10-10T09:44:21.234+ 7f1675c524c0 -1 librbd::Migration:
> open_images: failed to open destination image images/65d188c5f5a34:
> (2) No such file or directory
> rbd: getting migration status failed: (2) No such file or directory
> Watchers: none
>
> I've checked over the other pools again, but they only contain
> Openstack images. There are only 42 images in total across all
> pools. In fact, the "infra-pool" pool only has 3 images, including
> the faulty one. So migrating/re-creating is not a big deal. It's
> more just that I'd like to learn more about how to resolve such
> issues, if possible.
>
> Good call on the history. I found this smoking gun with: 'history
> |grep "rbd migration":
> rbd migration prepare infra-pool/sophosbuild images/sophosbuild
> rbd migration execute images/sophosbuild
>
> But images/sophosbuild is definitely not there anymore, and not in
> the trash. It looks like I was missing the commit.
>
> Kind regards,
> Rhys
>
> --- Original Message ---
>
> Eugen Block Wrote:
>
> Hi, there are a couple of things I would check before migrating all
> images. What's the current 'rbd status infra-pool/sophosbuild'? You
> probably don't have an infinite number of pools so I would also
> check if any of the other pools contains an image with the same
> name, just in case you wanted to keep its original name and only
> change the pool. Even if you don't have the terminal output, maybe
> you find some of the commands in the history?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io