[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-25 Thread Michel Niyoyita
It seems these are different OSDs, as shown here. How have you managed to
sort this out?

ceph pg dump | grep -F 6.78
dumped all
6.78   44268   0 0  00
1786796401180   0  10099 10099
 active+clean  2024-01-26T03:51:26.781438+0200  107547'115445304
107547:225274427  [12,36,37]  12  [12,36,37]  12
106977'114532385  2024-01-24T08:37:53.597331+0200  101161'109078277
2024-01-11T16:07:54.875746+0200  0
root@ceph-osd3:~# ceph pg dump | grep -F 6.60
dumped all
6.60   9   0 0  00
179484338742  716  36  10097 10097
 active+clean  2024-01-26T03:50:44.579831+0200  107547'153238805
107547:287193139   [32,5,29]  32   [32,5,29]  32
107231'152689835  2024-01-25T02:34:01.849966+0200  102171'147920798
2024-01-13T19:44:26.922000+0200  0
6.3a   44807   0 0  00
1809690056940   0  10093 10093
 active+clean  2024-01-26T03:53:28.837685+0200  107547'114765984
107547:238170093  [22,13,11]  22  [22,13,11]  22
106945'113739877  2024-01-24T04:10:17.224982+0200  102863'109559444
2024-01-15T05:31:36.606478+0200  0
root@ceph-osd3:~# ceph pg dump | grep -F 6.5c
6.5c   44277   0 0  00
1787649782300   0  10051 10051
 active+clean  2024-01-26T03:55:23.339584+0200  107547'126480090
107547:264432655  [22,37,30]  22  [22,37,30]  22
107205'125858697  2024-01-24T22:32:10.365869+0200  101941'120957992
2024-01-13T09:07:24.780936+0200  0
dumped all
root@ceph-osd3:~# ceph pg dump | grep -F 4.12
dumped all
4.12   0   0 0  00
   00   0  0 0
 active+clean  2024-01-24T08:36:48.284388+0200   0'0
 107546:152711   [22,19,7]  22   [22,19,7]  22
 0'0  2024-01-24T08:36:48.284307+0200   0'0
2024-01-13T09:09:22.176240+0200  0
root@ceph-osd3:~# ceph pg dump | grep -F 10.d
dumped all
10.d   0   0 0  00
   00   0  0 0
 active+clean  2024-01-24T04:04:33.641541+0200   0'0
 107546:142651   [14,28,1]  14   [14,28,1]  14
 0'0  2024-01-24T04:04:33.641451+0200   0'0
2024-01-12T08:04:02.078062+0200  0
root@ceph-osd3:~# ceph pg dump | grep -F 5.f
dumped all
5.f0   0 0  00
   00   0  0 0
 active+clean  2024-01-25T08:19:04.148941+0200   0'0
 107546:161331  [11,24,35]  11  [11,24,35]  11
 0'0  2024-01-25T08:19:04.148837+0200   0'0
2024-01-12T06:06:00.970665+0200  0
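
For reference, a minimal sketch of manually queueing deep scrubs for these six
PGs and loosening the scrub limits a little (standard Ceph commands; the
two-week interval value is only an illustrative assumption, adjust to taste):

for pg in 6.78 6.60 6.5c 4.12 10.d 5.f; do
    ceph pg deep-scrub "$pg"            # queue a deep scrub for this PG
done
# allow one more concurrent scrub per OSD than the default
ceph config set osd osd_max_scrubs 2
# optionally relax the deep-scrub warning window while catching up
ceph config set osd osd_deep_scrub_interval 1209600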


On Fri, Jan 26, 2024 at 8:58 AM E Taka <0eta...@gmail.com> wrote:

> We had the same problem. It turned out that one disk was slowly dying. It
> was easy to identify by the commands (in your case):
>
> ceph pg dump | grep -F 6.78
> ceph pg dump | grep -F 6.60
> …
>
> These commands show the OSDs of a PG in square brackets. If there is always
> the same number, then you've found the OSD which causes the slow scrubs.
>
> On Fri, 26 Jan 2024 at 07:45, Michel Niyoyita <
> mico...@gmail.com> wrote:
>
>> Hello team,
>>
>> I have a cluster in production composed of 3 OSD servers with 20 disks
>> each, deployed using ceph-ansible and Ubuntu OS, and the version is Pacific.
>> These days it is in WARN state, caused by PGs which are not deep-scrubbed in
>> time. I tried to deep-scrub some PGs manually, but it seems that the cluster
>> can be slow. I would like your assistance so that my cluster can be in
>> HEALTH_OK state as before, without any interruption of service. The cluster
>> is used as OpenStack backend storage.
>>
>> Best Regards
>>
>> Michel
>>
>>
>>  ceph -s
>>   cluster:
>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
>> health: HEALTH_WARN
>> 6 pgs not deep-scrubbed in time
>>
>>   services:
>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
>> osd: 48 osds: 48 up (since 11M), 48 in (since 11M)
>> rgw: 6 daemons active (6 hosts, 1 zones)
>>
>>   data:
>> pools:   10 pools, 385 pgs
>> objects: 5.97M objects, 23 TiB
>> usage:   151 TiB used, 282 TiB / 433 TiB avail
>> pgs: 381 active+clean
>>  4   active+clean+scrubbing+deep
>>
>>   io:
>> client:   59 MiB/s rd, 860 MiB/s wr, 155 op/s rd, 665 op/s wr
>>
>> root@ceph-osd3:~# ceph health detail
>> HEALTH_WARN 6 pgs not deep-scrubbed in time
>> [WRN] PG_NOT_DEEP_SCRUBBED: 6 pgs not deep-scrubbed in time
>> pg 6.78 not deep-scrubbed since 2024-01-11T16:07:54.875746+0200
>> pg 6.60 not 

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-25 Thread E Taka
We had the same problem. It turned out that one disk was slowly dying. It
was easy to identify by the commands (in your case):

ceph pg dump | grep -F 6.78
ceph pg dump | grep -F 6.60
…

These commands show the OSDs of a PG in square brackets. If there is always
the same number, then you've found the OSD which causes the slow scrubs.
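
A minimal sketch of tallying that across all of the affected PGs in one go
(assuming the PG IDs from your health detail; each "ceph pg map" call prints
both the up and acting sets, so every OSD is counted twice, but a recurring
disk still stands out):

for pg in 6.78 6.60 6.5c 4.12 10.d 5.f; do
    ceph pg map "$pg"
done | grep -o '\[[0-9,]*\]' | tr -d '[]' | tr ',' '\n' | sort -n | uniq -c | sort -rn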

On Fri, 26 Jan 2024 at 07:45, Michel Niyoyita <
mico...@gmail.com> wrote:

> Hello team,
>
> I have a cluster in production composed of 3 OSD servers with 20 disks
> each, deployed using ceph-ansible and Ubuntu OS, and the version is Pacific.
> These days it is in WARN state, caused by PGs which are not deep-scrubbed in
> time. I tried to deep-scrub some PGs manually, but it seems that the cluster
> can be slow. I would like your assistance so that my cluster can be in
> HEALTH_OK state as before, without any interruption of service. The cluster
> is used as OpenStack backend storage.
>
> Best Regards
>
> Michel
>
>
>  ceph -s
>   cluster:
> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
> health: HEALTH_WARN
> 6 pgs not deep-scrubbed in time
>
>   services:
> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
> osd: 48 osds: 48 up (since 11M), 48 in (since 11M)
> rgw: 6 daemons active (6 hosts, 1 zones)
>
>   data:
> pools:   10 pools, 385 pgs
> objects: 5.97M objects, 23 TiB
> usage:   151 TiB used, 282 TiB / 433 TiB avail
> pgs: 381 active+clean
>  4   active+clean+scrubbing+deep
>
>   io:
> client:   59 MiB/s rd, 860 MiB/s wr, 155 op/s rd, 665 op/s wr
>
> root@ceph-osd3:~# ceph health detail
> HEALTH_WARN 6 pgs not deep-scrubbed in time
> [WRN] PG_NOT_DEEP_SCRUBBED: 6 pgs not deep-scrubbed in time
> pg 6.78 not deep-scrubbed since 2024-01-11T16:07:54.875746+0200
> pg 6.60 not deep-scrubbed since 2024-01-13T19:44:26.922000+0200
> pg 6.5c not deep-scrubbed since 2024-01-13T09:07:24.780936+0200
> pg 4.12 not deep-scrubbed since 2024-01-13T09:09:22.176240+0200
> pg 10.d not deep-scrubbed since 2024-01-12T08:04:02.078062+0200
> pg 5.f not deep-scrubbed since 2024-01-12T06:06:00.970665+0200
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 6 pgs not deep-scrubbed in time

2024-01-25 Thread Michel Niyoyita
Hello team,

I have a cluster in production composed of 3 OSD servers with 20 disks
each, deployed using ceph-ansible and Ubuntu OS, and the version is Pacific.
These days it is in WARN state, caused by PGs which are not deep-scrubbed in
time. I tried to deep-scrub some PGs manually, but it seems that the cluster
can be slow. I would like your assistance so that my cluster can be in
HEALTH_OK state as before, without any interruption of service. The cluster
is used as OpenStack backend storage.

Best Regards

Michel


 ceph -s
  cluster:
id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
health: HEALTH_WARN
6 pgs not deep-scrubbed in time

  services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
osd: 48 osds: 48 up (since 11M), 48 in (since 11M)
rgw: 6 daemons active (6 hosts, 1 zones)

  data:
pools:   10 pools, 385 pgs
objects: 5.97M objects, 23 TiB
usage:   151 TiB used, 282 TiB / 433 TiB avail
pgs: 381 active+clean
 4   active+clean+scrubbing+deep

  io:
client:   59 MiB/s rd, 860 MiB/s wr, 155 op/s rd, 665 op/s wr

root@ceph-osd3:~# ceph health detail
HEALTH_WARN 6 pgs not deep-scrubbed in time
[WRN] PG_NOT_DEEP_SCRUBBED: 6 pgs not deep-scrubbed in time
pg 6.78 not deep-scrubbed since 2024-01-11T16:07:54.875746+0200
pg 6.60 not deep-scrubbed since 2024-01-13T19:44:26.922000+0200
pg 6.5c not deep-scrubbed since 2024-01-13T09:07:24.780936+0200
pg 4.12 not deep-scrubbed since 2024-01-13T09:09:22.176240+0200
pg 10.d not deep-scrubbed since 2024-01-12T08:04:02.078062+0200
pg 5.f not deep-scrubbed since 2024-01-12T06:06:00.970665+0200
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Anthony D'Atri



> 
>>> forth), so this is why "ceph df" will tell you a pool has X free
>>> space, where X is "smallest free space on the OSDs on which this pool
>>> lies, times the number of OSDs".

To be even more precise, this depends on the failure domain.  With the typical 
"rack" failure domain, say you use 3x replication and have 3 racks, you'll be 
limited to the capacity of the smallest rack. If you have more racks than the 
failure domains require, though, you are less affected by racks that vary 
somewhat in CRUSH weight.

With respect to OSDs, the above is still true, which is one reason we have the 
balancer module.  Say your OSDs are on average 50% full but you have one that 
is 70% full.  The most-full outlier will limit the reported available space.
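
A minimal sketch of spotting such an outlier and checking the balancer
(standard commands; enabling upmap mode is only a suggestion and assumes all
clients are Luminous or newer):

# per-OSD utilization -- the %USE and PGS columns show outliers at a glance
ceph osd df tree
# confirm the balancer module is active and which mode it uses
ceph balancer status
ceph balancer mode upmap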

The available space for each pool is also a function of the replication 
strategy -- replication vs EC as well as the prevailing full ratio setting.


>>> Given the pseudorandom placement of
>>> objects to PGs, there is nothing to prevent you from having the worst
>>> luck ever and all the objects you create end up on the OSD with least
>>> free space.
>> 
>> This is why you need a decent amount of PGs, to not run into statistical
>> edge cases.
> 
> Yes, just take the experiment to someone with one PG only, then it can
> only fill one OSD. Someone with a pool with only 2 PGs could at the
> very best case only fill two and so on. If you have 100+ PGs per OSD,
> the chances for many files to end up only on a few PGs becomes very
> small.

Indeed, a healthy number of PG shards per OSD is important as well for this 
reason.  I use an analogy of filling a 55 gallon drum with sportsballs.  You 
can fit maybe two beach balls in there with a ton of air space, but you could 
fit thousands of pingpong balls in there with a lot less air space.  

Having a power of 2 number of PGs per pool also helps with uniform distribution 
-- the description of why this is the case is a bit abstruse so I'll spare the 
list, but enquiring minds can read chapter 8 ;)

> and every client can't have a complete list of millions of objects in
> the cluster, so it does client-side computations.


This is one reason we have PGs -- so that there's a manageable number of things 
to juggle, while not being so few as to run into statistical and other 
imbalances.
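
A minimal sketch of that client-side computation from the CLI, assuming a pool
named "rbd" and an arbitrary object name (the object does not need to exist):

# prints the PG the object name hashes to and the up/acting OSD set for that PG
ceph osd map rbd some-object-name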


> 
> -- 
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Throughput metrics missing when updating Ceph Quincy to Reef

2024-01-25 Thread Eugen Block

Ah, there they are (different port):

reef01:~ # curl http://localhost:9926/metrics | grep ceph_osd_op | head
  % Total% Received % Xferd  Average Speed   TimeTime  
Time  Current

 Dload  Upload   Total   SpentLeft  Speed
100  124k  100  124k0 0   111M  0 --:--:-- --:--:-- --:--:--  121M
# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.1"} 25
ceph_osd_op{ceph_daemon="osd.4"} 543
ceph_osd_op{ceph_daemon="osd.5"} 12192
# HELP ceph_osd_op_delayed_degraded Count of ops delayed due to target  
object being degraded

# TYPE ceph_osd_op_delayed_degraded counter
ceph_osd_op_delayed_degraded{ceph_daemon="osd.1"} 0
ceph_osd_op_delayed_degraded{ceph_daemon="osd.4"} 0
ceph_osd_op_delayed_degraded{ceph_daemon="osd.5"} 0

I can't check the dashboard right now; I will definitely do that tomorrow.
Good night!
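
(A minimal sketch of the end-to-end check, assuming cephadm and the default
ports 9283 for the mgr module and 9926 for ceph-exporter:)

# deploy the per-host exporter and keep the mgr prometheus module lean
ceph orch apply ceph-exporter
ceph config set mgr mgr/prometheus/exclude_perf_counters true
# mgr endpoint: cluster-level metrics only
curl -s http://localhost:9283/metrics | grep -c '^ceph_osd_op'
# ceph-exporter endpoint: per-daemon performance counters
curl -s http://localhost:9926/metrics | grep '^ceph_osd_op' | head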

Quoting Eugen Block:


Yeah, it's mentioned in the upgrade docs [2]:


Monitoring & Alerting
  Ceph-exporter: Now the performance metrics for Ceph daemons  
are exported by ceph-exporter, which deploys on each daemon rather  
than using prometheus exporter. This will reduce performance  
bottlenecks.



[2] https://docs.ceph.com/en/latest/releases/reef/#major-changes-from-quincy

Quoting Eugen Block:


Hi,

I got those metrics back after setting:

reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false

reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
 % Total% Received % Xferd  Average Speed   TimeTime  
Time  Current

Dload  Upload   Total   SpentLeft  Speed
100  324k  100  324k0 0  72.5M  0 --:--:-- --:--:--  
--:--:-- 79.1M

# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.0"} 139650.0
ceph_osd_op{ceph_daemon="osd.11"} 9711090.0
ceph_osd_op{ceph_daemon="osd.2"} 3864.0
ceph_osd_op{ceph_daemon="osd.1"} 25.0
ceph_osd_op{ceph_daemon="osd.4"} 543.0
ceph_osd_op{ceph_daemon="osd.5"} 12192.0
ceph_osd_op{ceph_daemon="osd.3"} 3661521.0
ceph_osd_op{ceph_daemon="osd.6"} 2030.0


I found the option in the docs [1], but the same section is in the
quincy docs as well, although there's no such option in my quincy
cluster; maybe that's why my quincy cluster still exports those
performance counters:


quincy-1:~ # ceph config get mgr mgr/prometheus/exclude_perf_counters
Error ENOENT: unrecognized key 'mgr/prometheus/exclude_perf_counters'

Anyway, this should bring back the metrics the "legacy" way (I  
guess). Apparently, the ceph-exporter daemon is now required on  
your hosts to collect those metrics.
After adding the ceph-exporter service (ceph orch apply  
ceph-exporter) and setting mgr/prometheus/exclude_perf_counters  
back to "true" I see that there are "ceph_osd_op" metrics defined  
but no values yet. Apparently, I'm still missing something, I'll  
check tomorrow. But this could/should be in the upgrade docs IMO.


Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics


Quoting Martin:


Hi,

Confirmed that this happens to me as well.
After upgrading from 18.2.0 to 18.2.1 OSD metrics  
like: ceph_osd_op_* are missing from ceph-mgr.


The Grafana dashboard also doesn't display all graphs correctly.

ceph-dashboard/Ceph - Cluster : Capacity used, Cluster I/O, OSD  
Capacity Utilization, PGs per OSD


curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time Time  
Time  Current

 Dload  Upload   Total Spent    Left  Speed
100 38317  100 38317    0 0   9.8M  0 --:--:-- --:--:--  
--:--:-- 12.1M


Before upgrading to reef 18.2.1 I could get all the metrics.

Martin

On 18/01/2024 12:32, Jose Vicente wrote:

Hi,
After upgrading from Quincy to Reef the ceph-mgr daemon is not  
throwing some throughput OSD metrics like: ceph_osd_op_*

curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time  Time      
Time  Current
                                 Dload  Upload   Total Spent    
 Left  Speed
100  295k  100  295k    0     0   144M      0 --:--:-- --:--:--  
--:--:--  144M

However I can get other metrics like:
# curl http://localhost:9283/metrics | grep -i ceph_osd_apply
# HELP ceph_osd_apply_latency_ms OSD stat apply_latency_ms
# TYPE ceph_osd_apply_latency_ms gauge
ceph_osd_apply_latency_ms{ceph_daemon="osd.275"} 152.0
ceph_osd_apply_latency_ms{ceph_daemon="osd.274"} 102.0
...
Before upgrading to reef (from quincy) I could get all the
metrics. MGR module prometheus is enabled.

Rocky Linux release 8.8 (Green Obsidian)
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76)  
reef (stable)

# netstat -nap | grep 9283
tcp        0      0 127.0.0.1:53834         127.0.0.1:9283      
 ESTABLISHED 3561/prometheus
tcp6       0      

[ceph-users] Re: Throughput metrics missing when updating Ceph Quincy to Reef

2024-01-25 Thread Eugen Block

Yeah, it's mentioned in the upgrade docs [2]:


Monitoring & Alerting
   Ceph-exporter: Now the performance metrics for Ceph daemons  
are exported by ceph-exporter, which deploys on each daemon rather  
than using prometheus exporter. This will reduce performance  
bottlenecks.



[2] https://docs.ceph.com/en/latest/releases/reef/#major-changes-from-quincy

Quoting Eugen Block:


Hi,

I got those metrics back after setting:

reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false

reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
  % Total% Received % Xferd  Average Speed   TimeTime  
Time  Current

 Dload  Upload   Total   SpentLeft  Speed
100  324k  100  324k0 0  72.5M  0 --:--:-- --:--:--  
--:--:-- 79.1M

# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.0"} 139650.0
ceph_osd_op{ceph_daemon="osd.11"} 9711090.0
ceph_osd_op{ceph_daemon="osd.2"} 3864.0
ceph_osd_op{ceph_daemon="osd.1"} 25.0
ceph_osd_op{ceph_daemon="osd.4"} 543.0
ceph_osd_op{ceph_daemon="osd.5"} 12192.0
ceph_osd_op{ceph_daemon="osd.3"} 3661521.0
ceph_osd_op{ceph_daemon="osd.6"} 2030.0


I found the option in the docs [1], but the same section is in the
quincy docs as well, although there's no such option in my quincy
cluster; maybe that's why my quincy cluster still exports those
performance counters:


quincy-1:~ # ceph config get mgr mgr/prometheus/exclude_perf_counters
Error ENOENT: unrecognized key 'mgr/prometheus/exclude_perf_counters'

Anyway, this should bring back the metrics the "legacy" way (I  
guess). Apparently, the ceph-exporter daemon is now required on your  
hosts to collect those metrics.
After adding the ceph-exporter service (ceph orch apply  
ceph-exporter) and setting mgr/prometheus/exclude_perf_counters back  
to "true" I see that there are "ceph_osd_op" metrics defined but no  
values yet. Apparently, I'm still missing something, I'll check  
tomorrow. But this could/should be in the upgrade docs IMO.


Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics


Quoting Martin:


Hi,

Confirmed that this happens to me as well.
After upgrading from 18.2.0 to 18.2.1 OSD metrics  
like: ceph_osd_op_* are missing from ceph-mgr.


The Grafana dashboard also doesn't display all graphs correctly.

ceph-dashboard/Ceph - Cluster : Capacity used, Cluster I/O, OSD  
Capacity Utilization, PGs per OSD


curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time Time Time  Current
 Dload  Upload   Total Spent    Left  Speed
100 38317  100 38317    0 0   9.8M  0 --:--:-- --:--:--  
--:--:-- 12.1M


Before upgrading to reef 18.2.1 I could get all the metrics.

Martin

On 18/01/2024 12:32, Jose Vicente wrote:

Hi,
After upgrading from Quincy to Reef the ceph-mgr daemon is not  
throwing some throughput OSD metrics like: ceph_osd_op_*

curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time  Time      
Time  Current

                                 Dload  Upload   Total Spent    Left  Speed
100  295k  100  295k    0     0   144M      0 --:--:-- --:--:--  
--:--:--  144M

However I can get other metrics like:
# curl http://localhost:9283/metrics | grep -i ceph_osd_apply
# HELP ceph_osd_apply_latency_ms OSD stat apply_latency_ms
# TYPE ceph_osd_apply_latency_ms gauge
ceph_osd_apply_latency_ms{ceph_daemon="osd.275"} 152.0
ceph_osd_apply_latency_ms{ceph_daemon="osd.274"} 102.0
...
Before upgrading to reef (from quincy) I could get all the
metrics. MGR module prometheus is enabled.

Rocky Linux release 8.8 (Green Obsidian)
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76)  
reef (stable)

# netstat -nap | grep 9283
tcp        0      0 127.0.0.1:53834         127.0.0.1:9283      
 ESTABLISHED 3561/prometheus
tcp6       0      0 :::9283                 :::*      LISTEN      
 804985/ceph-mgr

Thanks,
Jose C.

___
ceph-users mailing list --ceph-users@ceph.io
To unsubscribe send an email toceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Throughput metrics missing when updating Ceph Quincy to Reef

2024-01-25 Thread Eugen Block

Hi,

I got those metrics back after setting:

reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false

reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
  % Total% Received % Xferd  Average Speed   TimeTime  
Time  Current

 Dload  Upload   Total   SpentLeft  Speed
100  324k  100  324k0 0  72.5M  0 --:--:-- --:--:-- --:--:-- 79.1M
# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.0"} 139650.0
ceph_osd_op{ceph_daemon="osd.11"} 9711090.0
ceph_osd_op{ceph_daemon="osd.2"} 3864.0
ceph_osd_op{ceph_daemon="osd.1"} 25.0
ceph_osd_op{ceph_daemon="osd.4"} 543.0
ceph_osd_op{ceph_daemon="osd.5"} 12192.0
ceph_osd_op{ceph_daemon="osd.3"} 3661521.0
ceph_osd_op{ceph_daemon="osd.6"} 2030.0


I found the option in the docs [1], but the same section is in the
quincy docs as well, although there's no such option in my quincy
cluster; maybe that's why my quincy cluster still exports those
performance counters:


quincy-1:~ # ceph config get mgr mgr/prometheus/exclude_perf_counters
Error ENOENT: unrecognized key 'mgr/prometheus/exclude_perf_counters'

Anyway, this should bring back the metrics the "legacy" way (I guess).  
Apparently, the ceph-exporter daemon is now required on your hosts to  
collect those metrics.
After adding the ceph-exporter service (ceph orch apply ceph-exporter)  
and setting mgr/prometheus/exclude_perf_counters back to "true" I see  
that there are "ceph_osd_op" metrics defined but no values yet.  
Apparently, I'm still missing something, I'll check tomorrow. But this  
could/should be in the upgrade docs IMO.


Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics


Quoting Martin:


Hi,

Confirmed that this happens to me as well.
After upgrading from 18.2.0 to 18.2.1 OSD metrics  
like: ceph_osd_op_* are missing from ceph-mgr.


The Grafana dashboard also doesn't display all graphs correctly.

ceph-dashboard/Ceph - Cluster : Capacity used, Cluster I/O, OSD  
Capacity Utilization, PGs per OSD


curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time Time Time  Current
 Dload  Upload   Total Spent    Left  Speed
100 38317  100 38317    0 0   9.8M  0 --:--:-- --:--:--  
--:--:-- 12.1M


Before upgrading to reef 18.2.1 I could get all the metrics.

Martin

On 18/01/2024 12:32, Jose Vicente wrote:

Hi,
After upgrading from Quincy to Reef the ceph-mgr daemon is not  
throwing some throughput OSD metrics like: ceph_osd_op_*

curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time  Time      
Time  Current

                                 Dload  Upload   Total Spent    Left  Speed
100  295k  100  295k    0     0   144M      0 --:--:-- --:--:--  
--:--:--  144M

However I can get other metrics like:
# curl http://localhost:9283/metrics | grep -i ceph_osd_apply
# HELP ceph_osd_apply_latency_ms OSD stat apply_latency_ms
# TYPE ceph_osd_apply_latency_ms gauge
ceph_osd_apply_latency_ms{ceph_daemon="osd.275"} 152.0
ceph_osd_apply_latency_ms{ceph_daemon="osd.274"} 102.0
...
Before upgrading to reef (from quincy) I could get all the
metrics. MGR module prometheus is enabled.

Rocky Linux release 8.8 (Green Obsidian)
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
# netstat -nap | grep 9283
tcp        0      0 127.0.0.1:53834         127.0.0.1:9283      
 ESTABLISHED 3561/prometheus
tcp6       0      0 :::9283                 :::*      LISTEN      
 804985/ceph-mgr

Thanks,
Jose C.

___
ceph-users mailing list --ceph-users@ceph.io
To unsubscribe send an email toceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: podman / docker issues

2024-01-25 Thread Daniel Brown


For the OP - IBM appears to have some relevant info in their CEPH docs: 

https://www.ibm.com/docs/en/storage-ceph/5?topic=cluster-performing-disconnected-installation


Questions: 

Is it possible to reset “container_image” after the cluster has been deployed? 

sudo ceph config dump |grep container_image
global  basic  container_image  quay.io/ceph/ceph@sha256:aca35483144ab3548a7f670db9b79772e6fc51167246421c66c0bd56a6585468  *


And, can it be set using just the repo name (quay.io/ceph/ceph:v18.2.1)  or is 
it going to, under the covers, change it over to the Image ID 
(quay.io/ceph/ceph@sha256:aca3548 …)
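
(What I would try, as a minimal sketch; ceph config set and ceph orch upgrade
are standard cephadm-era commands, but whether the tag later gets pinned back
to a digest is exactly the part I'm unsure about:)

# point the cluster at a tag instead of a digest
sudo ceph config set global container_image quay.io/ceph/ceph:v18.2.1
# or let the orchestrator handle it as part of an upgrade/redeploy
sudo ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.1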


Questionable Use Case — having a cephadm-managed cluster that has more than 
one architecture — in my case, arm64 and amd64. Maybe not best practice for 
production, but possibly useful for dev/test and/or migration from one 
architecture to another. 



> On Jan 25, 2024, at 1:46 PM, Kai Stian Olstad  wrote:
> 
> On 25.01.2024 18:19, Marc wrote:
>> More and more I am annoyed with the 'dumb' design decisions of redhat. Just 
>> now I have an issue on an 'air gapped' vm that I am unable to start a 
>> docker/podman container because it tries to contact the repository to update 
>> the image and instead of using the on disk image it just fails. (Not to 
>> mention the %$#$%#$ that design containers to download stuff from the 
>> internet on startup)
>> I was wondering if this is also an issue with ceph-admin. Is there an issue 
>> with starting containers when container image repositories are not available 
>> or when there is no internet connection.
> 
> Of course cephadm will fail if the container registry is not available 
> and the image isn't pulled locally.
> 
> But you don't need to use the official registry, so using it air-gapped is not 
> a problem.
> Just download the images you need to your local registry and specify it, some 
> details are here
> https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment
> 
> The containers themselves don't need to download anything at start.
> 
> 
> -- 
> Kai Stian Olstad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Özkan Göksu
These are client-side metrics from a client that was warned for "failing to
respond to cache pressure".

root@datagen-27:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1282187#
cat bdi/stats
BdiWriteback:0 kB
BdiReclaimable:  0 kB
BdiDirtyThresh:  0 kB
DirtyThresh:  35979376 kB
BackgroundThresh: 17967720 kB
BdiDirtied:3071616 kB
BdiWritten:3036864 kB
BdiWriteBandwidth:  20 kBps
b_dirty: 0
b_io:0
b_more_io:   0
b_dirty_time:0
bdi_list:1
state:   1



root@d27:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1282187#
cat metrics
item   total
--
opened files  / total inodes   4 / 14129
pinned i_caps / total inodes   14129 / 14129
opened inodes / total inodes   2 / 14129

item  total   avg_lat(us) min_lat(us) max_lat(us)
stdev(us)
---
read  1218753 3116208 8741271
2154
write 34945   24003   30172191493
16156
metadata  1703642 8395127 17936115
 1497

item  total   avg_sz(bytes)   min_sz(bytes)   max_sz(bytes)
 total_sz(bytes)

read  1218753 227009  1   4194304
276668475618
write 34945   85860   1   4194304
3000382055

item  total   misshit
-
d_lease   306 19110   3317071969
caps  14129   145404  3761682333

On Thu, 25 Jan 2024 at 20:25, Özkan Göksu wrote:

> Every user has a 1x subvolume and I only have 1 pool.
> At the beginning we were using each subvolume for ldap home directory +
> user data.
> When a user logins any docker on any host, it was using the cluster for
> home and the for user related data, we was have second directory in the
> same subvolume.
> Time to time users were feeling a very slow home environment and after a
> month it became almost impossible to use home. VNC sessions became
> unresponsive and slow etc.
>
> 2 weeks ago, I had to migrate home to a ZFS storage and now the overall
> performance is better for only user_data without home.
> But still the performance is not good enough as I expected because of the
> problems related to MDS.
> The usage is low but allocation is high and Cpu usage is high. You saw the
> IO Op/s, it's nothing but allocation is high.
>
> I develop a fio benchmark script and I run the script on 4x test server at
> the same time, the results are below:
> Script:
> https://github.com/ozkangoksu/benchmark/blob/8f5df87997864c25ef32447e02fcd41fda0d2a67/iobench.sh
>
>
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-01.txt
>
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-02.txt
>
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-03.txt
>
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-04.txt
>
> While running benchmark, I take sample values for each type of iobench run.
>
> Seq Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client:   70 MiB/s rd, 762 MiB/s wr, 337 op/s rd, 24.41k op/s wr
> client:   60 MiB/s rd, 551 MiB/s wr, 303 op/s rd, 35.12k op/s wr
> client:   13 MiB/s rd, 161 MiB/s wr, 101 op/s rd, 41.30k op/s wr
>
> Seq Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client:   1.6 GiB/s rd, 219 KiB/s wr, 28.76k op/s rd, 89 op/s wr
> client:   370 MiB/s rd, 475 KiB/s wr, 90.38k op/s rd, 89 op/s wr
>
> Rand Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client:   63 MiB/s rd, 1.5 GiB/s wr, 8.77k op/s rd, 5.50k op/s wr
> client:   14 MiB/s rd, 1.8 GiB/s wr, 81 op/s rd, 13.86k op/s wr
> client:   6.6 MiB/s rd, 1.2 GiB/s wr, 61 op/s rd, 30.13k op/s wr
>
> Rand Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client:   317 MiB/s rd, 841 MiB/s wr, 426 op/s rd, 10.98k op/s wr
> client:   2.8 GiB/s rd, 882 MiB/s wr, 25.68k op/s rd, 291 op/s wr
> client:   4.0 GiB/s rd, 226 MiB/s wr, 89.63k op/s rd, 124 op/s wr
> client:   2.4 GiB/s rd, 295 KiB/s wr, 197.86k op/s rd, 20 op/s wr
>
> It seems I only have problems with the 4K,8K,16K other sector sizes.
>
>
>
>
> On Thu, 25 Jan 2024 at 19:06, Eugen Block wrote:
>
>> I understand that your MDS shows a high CPU usage, but other than that
>> what is your performance issue? Do users complain? Do some operations
>> take longer than expected? Are OSDs 

[ceph-users] Re: podman / docker issues

2024-01-25 Thread Kai Stian Olstad

On 25.01.2024 18:19, Marc wrote:
More and more I am annoyed with the 'dumb' design decisions of redhat. 
Just now I have an issue on an 'air gapped' vm that I am unable to 
start a docker/podman container because it tries to contact the 
repository to update the image and instead of using the on disk image 
it just fails. (Not to mention the %$#$%#$ that design containers to 
download stuff from the internet on startup)


I was wondering if this is also an issue with ceph-admin. Is there an 
issue with starting containers when container image repositories are 
not available or when there is no internet connection.


Of course cephadm will fail if the container registry is not available 
and the image isn't pulled locally.


But you don't need to use the official registry, so using it air-gapped 
is not a problem.
Just download the images you need to your local registry and specify it, 
some details are here

https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment

The containers themselves don't need to download anything at start.
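
A minimal sketch of that mirroring step with podman (registry.local:5000 is
just a placeholder for your local registry):

# on a machine with internet access
podman pull quay.io/ceph/ceph:v18.2.1
podman tag quay.io/ceph/ceph:v18.2.1 registry.local:5000/ceph/ceph:v18.2.1
podman push registry.local:5000/ceph/ceph:v18.2.1
# then point the cluster at the mirror
ceph config set global container_image registry.local:5000/ceph/ceph:v18.2.1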


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Janne Johansson
Den tors 25 jan. 2024 kl 17:47 skrev Robert Sander
:
> > forth), so this is why "ceph df" will tell you a pool has X free
> > space, where X is "smallest free space on the OSDs on which this pool
> > lies, times the number of OSDs". Given the pseudorandom placement of
> > objects to PGs, there is nothing to prevent you from having the worst
> > luck ever and all the objects you create end up on the OSD with least
> > free space.
>
> This is why you need a decent amount of PGs, to not run into statistical
> edge cases.

Yes, just take the experiment to the extreme: a pool with only one PG can
only fill one OSD. A pool with only 2 PGs could in the very best case only
fill two, and so on. If you have 100+ PGs per OSD, the chance for many
files to end up on only a few PGs becomes very small.
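
A quick CLI sanity check for this (a sketch; the PGS column of "ceph osd df"
shows the per-OSD PG count, and the autoscaler shows per-pool PG targets):

# per-OSD PG counts and utilization in one view
ceph osd df tree
# the autoscaler's view of PG counts per pool
ceph osd pool autoscale-status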

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Özkan Göksu
Every user has one subvolume and I only have 1 pool.
At the beginning we were using each subvolume for the LDAP home directory +
user data.
When a user logged in to any docker on any host, it was using the cluster for
home, and for the user-related data we had a second directory in the
same subvolume.
From time to time users were feeling a very slow home environment, and after a
month it became almost impossible to use home. VNC sessions became
unresponsive and slow, etc.

2 weeks ago I had to migrate home to ZFS storage, and now the overall
performance is better with only user_data and without home.
But the performance is still not as good as I expected, because of the
problems related to the MDS.
The usage is low but allocation is high, and CPU usage is high. You saw the
IO op/s: it's nothing, but allocation is high.

I developed a fio benchmark script and I ran it on 4 test servers at
the same time; the results are below:
Script:
https://github.com/ozkangoksu/benchmark/blob/8f5df87997864c25ef32447e02fcd41fda0d2a67/iobench.sh

https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-01.txt
https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-02.txt
https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-03.txt
https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-04.txt

While running benchmark, I take sample values for each type of iobench run.

Seq Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
client:   70 MiB/s rd, 762 MiB/s wr, 337 op/s rd, 24.41k op/s wr
client:   60 MiB/s rd, 551 MiB/s wr, 303 op/s rd, 35.12k op/s wr
client:   13 MiB/s rd, 161 MiB/s wr, 101 op/s rd, 41.30k op/s wr

Seq Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
client:   1.6 GiB/s rd, 219 KiB/s wr, 28.76k op/s rd, 89 op/s wr
client:   370 MiB/s rd, 475 KiB/s wr, 90.38k op/s rd, 89 op/s wr

Rand Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
client:   63 MiB/s rd, 1.5 GiB/s wr, 8.77k op/s rd, 5.50k op/s wr
client:   14 MiB/s rd, 1.8 GiB/s wr, 81 op/s rd, 13.86k op/s wr
client:   6.6 MiB/s rd, 1.2 GiB/s wr, 61 op/s rd, 30.13k op/s wr

Rand Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
client:   317 MiB/s rd, 841 MiB/s wr, 426 op/s rd, 10.98k op/s wr
client:   2.8 GiB/s rd, 882 MiB/s wr, 25.68k op/s rd, 291 op/s wr
client:   4.0 GiB/s rd, 226 MiB/s wr, 89.63k op/s rd, 124 op/s wr
client:   2.4 GiB/s rd, 295 KiB/s wr, 197.86k op/s rd, 20 op/s wr

It seems I only have problems with 4K, 8K and 16K; the other sector sizes are fine.
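
For example, the kind of run that shows it (a sketch using the same parameters
as the script: size=1G, direct=1, numjobs=3, iodepth=32; the target directory
is a placeholder):

fio --name=randwrite-4k --directory=/mnt/ud-data/bench \
    --rw=randwrite --bs=4k --size=1G --direct=1 \
    --numjobs=3 --iodepth=32 --ioengine=libaio --group_reporting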




On Thu, 25 Jan 2024 at 19:06, Eugen Block wrote:

> I understand that your MDS shows a high CPU usage, but other than that
> what is your performance issue? Do users complain? Do some operations
> take longer than expected? Are OSDs saturated during those phases?
> Because the cache pressure messages don’t necessarily mean that users
> will notice.
> MDS daemons are single-threaded so that might be a bottleneck. In that
> case multi-active mds might help, which you already tried and
> experienced OOM killers. But you might have to disable the mds
> balancer as someone else mentioned. And then you could think about
> pinning, is it possible to split the CephFS into multiple
> subdirectories and pin them to different ranks?
> But first I’d still like to know what the performance issue really is.
>
> Quoting Özkan Göksu:
>
> > I will try my best to explain my situation.
> >
> > I don't have a separate mds server. I have 5 identical nodes, 3 of them
> > mons, and I use the other 2 as active and standby mds. (currently I have
> > left overs from max_mds 4)
> >
> > root@ud-01:~# ceph -s
> >   cluster:
> > id: e42fd4b0-313b-11ee-9a00-31da71873773
> > health: HEALTH_WARN
> > 1 clients failing to respond to cache pressure
> >
> >   services:
> > mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
> > mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
> > mds: 1/1 daemons up, 4 standby
> > osd: 80 osds: 80 up (since 9d), 80 in (since 5M)
> >
> >   data:
> > volumes: 1/1 healthy
> > pools:   3 pools, 2305 pgs
> > objects: 106.58M objects, 25 TiB
> > usage:   45 TiB used, 101 TiB / 146 TiB avail
> > pgs: 2303 active+clean
> >  2active+clean+scrubbing+deep
> >
> >   io:
> > client:   16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr
> >
> > --
> > root@ud-01:~# ceph fs status
> > ud-data - 84 clients
> > ===
> > RANK  STATE   MDS  ACTIVITY DNSINOS   DIRS
> > CAPS
> >  0active  ud-data.ud-02.xcoojt  Reqs:   40 /s  2579k  2578k   169k
> >  3048k
> > POOL   TYPE USED  AVAIL
> > cephfs.ud-data.meta  metadata   136G  44.9T
> > cephfs.ud-data.datadata44.3T  44.9T
> >
> > --
> > root@ud-01:~# ceph health detail
> > HEALTH_WARN 1 

[ceph-users] podman / docker issues

2024-01-25 Thread Marc
More and more I am annoyed with the 'dumb' design decisions of redhat. Just now 
I have an issue on an 'air gapped' VM where I am unable to start a docker/podman 
container, because it tries to contact the repository to update the image and, 
instead of using the on-disk image, it just fails. (Not to mention the %$#$%#$ 
that design containers to download stuff from the internet on startup.)

I was wondering if this is also an issue with cephadm. Is there an issue 
with starting containers when container image repositories are not available or 
when there is no internet connection?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Robert Sander

On 1/25/24 13:32, Janne Johansson wrote:


It doesn't take OSD usage into consideration except at creation time
or OSD in/out/reweighing (or manual displacements with upmap and so
forth), so this is why "ceph df" will tell you a pool has X free
space, where X is "smallest free space on the OSDs on which this pool
lies, times the number of OSDs". Given the pseudorandom placement of
objects to PGs, there is nothing to prevent you from having the worst
luck ever and all the objects you create end up on the OSD with least
free space.


This is why you need a decent amount of PGs, to not run into statistical 
edge cases.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Henry lol
Oh! That's why data imbalance occurs in Ceph.
I totally misunderstood Ceph's placement algorithm until just now.

Thank you a lot for your detailed explanation :)

Sincerely,

On Thu, 25 Jan 2024 at 21:32, Janne Johansson wrote:
>
> Den tors 25 jan. 2024 kl 11:57 skrev Henry lol :
> >
> > It's reasonable enough.
> > actually, I expected the client to have just? thousands of
> > "PG-to-OSDs" mappings.
>
> Yes, but filename to PG is done with a pseudorandom algo.
>
> > Nevertheless, it’s so heavy that the client calculates location on
> > demand, right?
>
> Yes, and I guess the client has some kind of algorithm that makes it
> possible to know that PG 1.a4 should be on OSD 4, 93, 44 but also if 4
> is missing, the next candidate would be 51, if 93 isn't up either then
> 66 would be the next logical OSD to contact for that copy and so on.
> Since all parts (client, mons, OSDs) have the same code, when osd 4
> dies, 51 knows it needs to get a copy from either 93 or 44 and as soon
> as that copy is made, the PG will stop being active+degraded but might
> possibly be active+remapped, since it knows it wants to go back to OSD
> 4 if it comes back with the same size again.
>
> > if the client with the outdated map sends a request to the wrong OSD,
> > then does the OSD handle it somehow through redirection or something?
>
> I think it would get told it has the wrong osdmap.
>
> > Lastly, not only CRUSH map but also other factors like storage usage
> > are considered when doing CRUSH?
> > because it seems that the target OSD set isn’t deterministic given only it.
>
> It doesn't take OSD usage into consideration except at creation time
> or OSD in/out/reweighing (or manual displacements with upmap and so
> forth), so this is why "ceph df" will tell you a pool has X free
> space, where X is "smallest free space on the OSDs on which this pool
> lies, times the number of OSDs". Given the pseudorandom placement of
> objects to PGs, there is nothing to prevent you from having the worst
> luck ever and all the objects you create end up on the OSD with least
> free space.
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Eugen Block
I understand that your MDS shows a high CPU usage, but other than that  
what is your performance issue? Do users complain? Do some operations  
take longer than expected? Are OSDs saturated during those phases?  
Because the cache pressure messages don’t necessarily mean that users  
will notice.
MDS daemons are single-threaded so that might be a bottleneck. In that  
case multi-active mds might help, which you already tried and  
experienced OOM killers. But you might have to disable the mds  
balancer as someone else mentioned. And then you could think about  
pinning: is it possible to split the CephFS into multiple  
subdirectories and pin them to different ranks?
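
(For illustration, a minimal sketch of what I mean, assuming a second active
rank and a client mount at /mnt/ud-data; the subtree paths are placeholders:)

# add a second active MDS rank (your extra standbys can cover it)
ceph fs set ud-data max_mds 2
# pin one subtree to rank 0 and another to rank 1
setfattr -n ceph.dir.pin -v 0 /mnt/ud-data/volumes/group-a
setfattr -n ceph.dir.pin -v 1 /mnt/ud-data/volumes/group-b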

But first I’d still like to know what the performance issue really is.

Quoting Özkan Göksu:


I will try my best to explain my situation.

I don't have a separate mds server. I have 5 identical nodes, 3 of them
mons, and I use the other 2 as active and standby mds. (currently I have
left overs from max_mds 4)

root@ud-01:~# ceph -s
  cluster:
id: e42fd4b0-313b-11ee-9a00-31da71873773
health: HEALTH_WARN
1 clients failing to respond to cache pressure

  services:
mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
mds: 1/1 daemons up, 4 standby
osd: 80 osds: 80 up (since 9d), 80 in (since 5M)

  data:
volumes: 1/1 healthy
pools:   3 pools, 2305 pgs
objects: 106.58M objects, 25 TiB
usage:   45 TiB used, 101 TiB / 146 TiB avail
pgs: 2303 active+clean
 2active+clean+scrubbing+deep

  io:
client:   16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr

--
root@ud-01:~# ceph fs status
ud-data - 84 clients
===
RANK  STATE   MDS  ACTIVITY DNSINOS   DIRS
CAPS
 0active  ud-data.ud-02.xcoojt  Reqs:   40 /s  2579k  2578k   169k
 3048k
POOL   TYPE USED  AVAIL
cephfs.ud-data.meta  metadata   136G  44.9T
cephfs.ud-data.datadata44.3T  44.9T

--
root@ud-01:~# ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to respond to
cache pressure client_id: 1275577

--
When I check the failing client with session ls I see only "num_caps: 12298"

ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] | "clientid:
\(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
request_load_avg: \(.request_load_avg), num_completed_requests:
\(.num_completed_requests), num_completed_flushes:
\(.num_completed_flushes)"' | sort -n -t: -k3

clientid: 1275577= num_caps: 12298, num_leases: 0, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1294542= num_caps: 13000, num_leases: 12, request_load_avg: 105,
num_completed_requests: 0, num_completed_flushes: 6
clientid: 1282187= num_caps: 16869, num_leases: 1, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1275589= num_caps: 18943, num_leases: 0, request_load_avg: 52,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1282154= num_caps: 24747, num_leases: 1, request_load_avg: 57,
num_completed_requests: 2, num_completed_flushes: 2
clientid: 1275553= num_caps: 25120, num_leases: 2, request_load_avg: 116,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1282142= num_caps: 27185, num_leases: 6, request_load_avg: 128,
num_completed_requests: 0, num_completed_flushes: 8
clientid: 1275535= num_caps: 40364, num_leases: 6, request_load_avg: 111,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1282130= num_caps: 41483, num_leases: 0, request_load_avg: 135,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1275547= num_caps: 42953, num_leases: 4, request_load_avg: 119,
num_completed_requests: 2, num_completed_flushes: 6
clientid: 1282139= num_caps: 45435, num_leases: 27, request_load_avg: 84,
num_completed_requests: 2, num_completed_flushes: 34
clientid: 1282136= num_caps: 48374, num_leases: 8, request_load_avg: 0,
num_completed_requests: 1, num_completed_flushes: 1
clientid: 1275532= num_caps: 48664, num_leases: 7, request_load_avg: 115,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1191789= num_caps: 130319, num_leases: 0, request_load_avg: 1753,
num_completed_requests: 0, num_completed_flushes: 0
clientid: 1275571= num_caps: 139488, num_leases: 0, request_load_avg: 2,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1282133= num_caps: 145487, num_leases: 0, request_load_avg: 8,
num_completed_requests: 1, num_completed_flushes: 1
clientid: 1534496= num_caps: 1041316, num_leases: 0, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1

--
When I check the dashboard/service/mds I see %120+ CPU usage on 

[ceph-users] Re: RGW crashes when rgw_enable_ops_log is enabled

2024-01-25 Thread Marc Singer

Hi

I am using a unix socket client to connect with it and read the data 
from it.
Do I need to do anything like signal the socket that this data has been 
read? Or am I not reading fast enough and data is backing up?
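
For comparison, a minimal stand-in consumer that just keeps draining the
socket (socat here is only an assumption for illustration; any blocking
reader should behave the same):

# stream ops-log entries out of the unix socket, reconnecting if it drops
while true; do
    socat -u UNIX-CONNECT:/tmp/ops/rgw-ops.socket - >> /var/log/rgw-ops.log
    sleep 1
done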


What I am also noticing is that at some point (probably after something 
happens with the ops socket), the log level seems to increase for some 
reason. I have not yet found anything in the logs explaining why this would be the case.


*Normal:*

2024-01-25T15:47:58.444+ 7fe98a5c0b00  1 == starting new request 
req=0x7fe98712c720 =
2024-01-25T15:47:58.548+ 7fe98b700b00  1 == req done 
req=0x7fe98712c720 op status=0 http_status=200 latency=0.104001537s ==
2024-01-25T15:47:58.548+ 7fe98b700b00  1 beast: 0x7fe98712c720: 
redacted - redacted [25/Jan/2024:15:47:58.444 +] "PUT 
/redacted/redacted/chunks/27/27242/27242514_10_4194304 HTTP/1.1" 200 
4194304 - "redacted" - latency=0.104001537s


*Close before crashing:*

  -509> 2024-01-25T14:54:31.588+ 7f5186648b00  1 == starting 
new request req=0x7f517ffca720 =
  -508> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s initializing for trans_id = 
tx023a42eb7515dcdc0-0065b27627-823feaa-central
  -507> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s getting op 1
  -506> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj verifying requester
  -505> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj normalizing buckets and tenants
  -504> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj init permissions
  -503> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj recalculating target
  -502> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj reading permissions
  -501> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj init op
  -500> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj verifying op mask
  -499> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj verifying op permissions
  -498> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Searching permissions for 
identity=rgw::auth::SysReqApplier -> 
rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted, 
subuser=, perm_mask=15, is_admin=0) mask=50
  -497> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Searching permissions for 
uid=redacted
  -496> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Found permission: 15
  -495> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Searching permissions for 
group=1 mask=50
  -494> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Permissions for group not found
  -493> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Searching permissions for 
group=2 mask=50
  -492> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj Permissions for group not found
  -491> 2024-01-25T14:54:31.588+ 7f5186648b00  5 req 
2568229052387020224 0.0s s3:put_obj -- Getting permissions done 
for identity=rgw::auth::SysReqApplier -> 
rgw::auth::LocalApplier(acct_user=redacted, acct_name=redacted, 
subuser=, perm_mask=15, is_admin=0), owner=redacted, perm=2
  -490> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj verifying op params
  -489> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj pre-executing
  -488> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj check rate limiting
  -487> 2024-01-25T14:54:31.588+ 7f5186648b00  2 req 
2568229052387020224 0.0s s3:put_obj executing
  -486> 2024-01-25T14:54:31.624+ 7f5183898b00  5 req 
2568229052387020224 0.036000550s s3:put_obj NOTICE: call to 
do_aws4_auth_completion
  -485> 2024-01-25T14:54:31.624+ 7f5183898b00  5 req 
2568229052387020224 0.036000550s s3:put_obj NOTICE: call to 
do_aws4_auth_completion
  -484> 2024-01-25T14:54:31.680+ 7f5185bc8b00  2 req 
2568229052387020224 0.092001401s s3:put_obj completing
  -483> 2024-01-25T14:54:31.680+ 7f5185bc8b00  2 req 
2568229052387020224 0.092001401s s3:put_obj op status=0
  -482> 2024-01-25T14:54:31.680+ 7f5185bc8b00  2 req 
2568229052387020224 0.092001401s s3:put_obj http status=200


  -481> 2024-01-25T14:54:31.680+ 7f5185bc8b00  1 == req done 
req=0x7f517ffca720 op status=0 http_status=200 latency=0.092001401s ==


Thanks for your help.

Marc 

[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-25 Thread David C.
It would be cool, actually, to have the metrics working in 18.2.2 for
IPv6-only setups.

Otherwise, everything works fine on my side.


Regards,

*David CASIER*




On Thu, 25 Jan 2024 at 16:12, Nicolas FOURNIL wrote:

> Gotcha !
>
> I've got the point, after restarting the CA certificate creation with :
> ceph restful create-self-signed-cert
>
> I get this error :
> Module 'cephadm' has failed: Expected 4 octets in
> 'fd30:::0:1101:2:0:501'
>
>
> *Ouch 4 octets = IP4 address expected... some nice code in perspective.*
>
> I go through podman to get more traces :
>
>   File "/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py", line 49, in
> generate_root_cert
> [x509.IPAddress(ipaddress.IPv4Address(addr))]
>   File "/lib64/python3.6/ipaddress.py", line 1284, in __init__
> self._ip = self._ip_int_from_string(addr_str)
>   File "/lib64/python3.6/ipaddress.py", line 1118, in _ip_int_from_string
> raise AddressValueError("Expected 4 octets in %r" % ip_str)
> ipaddress.AddressValueError: Expected 4 octets in
> 'fd30:::0:1101:2:0:501'
>
> So I github this and find this fix in 19.0.0 (with backport not yet
> released) :
>
>
> https://github.com/ceph/ceph/commit/647b5d67a8a800091acea68d20e87354373b0fac
>
> This example shows that it's impossible to get any metrics in an IPv6-only
> network (discovery is impossible), and it's visible at install time, so was
> there no test for an IPv6-only environment before release?
>
> Now I'm seriously asking myself to put a crappy IPv4 subnet only for my
> ceph cluster, because it's always a headache to get it working in an IPv6
> environment.
>
>
> On Tue, 23 Jan 2024 at 17:58, David C. wrote:
>
>> According to sources, the certificates are generated automatically at
>> startup. Hence my question if the service started correctly.
>>
>> I also had problems with IPv6 only, but I don't immediately have more info
>> 
>>
>> Regards,
>>
>> *David CASIER*
>> 
>>
>>
>> On Tue, 23 Jan 2024 at 17:46, Nicolas FOURNIL wrote:
>>
>>> IPv6 only : Yes, the -ms_bind_ipv6=true is already set-
>>>
>>> I had tried a rotation of the keys for node-exporter and I get this :
>>>
>>> 2024-01-23T16:43:56.098796+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
>>> 87074 : cephadm [INF] Rotating authentication key for
>>> node-exporter.srv06-r2b-fl1
>>> 2024-01-23T16:43:56.099224+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
>>> 87075 : cephadm [ERR] unknown daemon type node-exporter
>>> Traceback (most recent call last):
>>>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1039, in
>>> _check_daemons
>>> self.mgr._daemon_action(daemon_spec, action=action)
>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2203, in
>>> _daemon_action
>>> return self._rotate_daemon_key(daemon_spec)
>>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2147, in
>>> _rotate_daemon_key
>>> 'entity': daemon_spec.entity_name(),
>>>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line
>>> 108, in entity_name
>>> return get_auth_entity(self.daemon_type, self.daemon_id,
>>> host=self.host)
>>>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line
>>> 47, in get_auth_entity
>>> raise OrchestratorError(f"unknown daemon type {daemon_type}")
>>> orchestrator._interface.OrchestratorError: unknown daemon type
>>> node-exporter
>>>
>>> Tried to remove & recreate service : it's the same ... how to stop the
>>> rotation now :-/
>>>
>>>
>>>
 On Tue, 23 Jan 2024 at 17:18, David C. wrote:
>>>
 Is the cephadm http server service starting correctly (in the mgr logs)?

 IPv6 ?
 

 Regards,

 *David CASIER*
 




 On Tue, 23 Jan 2024 at 16:29, Nicolas FOURNIL <
 nicolas.four...@gmail.com> wrote:

> Hello,
>
> Thanks for advice but Prometheus cert is ok, (Self signed) and tested
> with curl and web navigator.
>
>  it seems to be the "Service discovery" certificate from cephadm who
> is missing but I cannot figure out how to set it.
>
> There's in the code a function to create this certificate inside the
> Key store but how ... that's the point :-(
>
> Regards.
>
>
>
> On Tue, 23 Jan 2024 at 15:52, David C. wrote:
>
>> Hello Nicolas,
>>
>> I don't know if it's an update issue.
>>
>> If this is not a problem for you, you can consider redeploying
>> grafana/prometheus.
>>
>> It is also possible to inject your own certificates :
>>
>> https://docs.ceph.com/en/latest/cephadm/services/monitoring/#example
>>
>>
>> 

[ceph-users] Re: RGW crashes when rgw_enable_ops_log is enabled

2024-01-25 Thread Matt Benjamin
Hi Marc,

The ops log code is designed to discard data if the socket is
flow-controlled, iirc.  Maybe we just need to handle the signal.

Of course, you should have something consuming data on the socket, but it's
still a problem if radosgw exits unexpectedly.
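
For what it's worth, the simplest possible consumer is just something that keeps
reading from that socket. A minimal sketch, assuming the socket path from Marc's
config and that socat is available (any tool that stays connected and keeps
draining the stream would do):

# Keep a reader attached to the RGW ops log socket.
# Path is the one from the original post; adjust to your rgw_ops_log_socket_path.
socat -u UNIX-CONNECT:/tmp/ops/rgw-ops.socket STDOUT >> /var/log/rgw-ops.log

Run it under systemd or another supervisor so the consumer reconnects if radosgw
is restarted and the socket is recreated.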

Matt

On Thu, Jan 25, 2024 at 10:08 AM Marc Singer  wrote:

> Hi Ceph Users
>
> I am encountering a problem with the RGW Admin Ops Socket.
>
> I am setting up the socket as follows:
>
> rgw_enable_ops_log = true
> rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
> rgw_ops_log_data_backlog = 16Mi
>
> It seems like the socket fills up over time and doesn't get flushed;
> at some point the process hits its file size limit.
>
> Do I need to configure something, or send something on the socket, for it to flush?
>
> See the log here:
>
> 0> 2024-01-25T13:10:13.908+ 7f247b00eb00 -1 *** Caught signal (File
> size limit exceeded) **
>   in thread 7f247b00eb00 thread_name:ops_log_file
>
>   ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef
> (stable)
>   NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 rbd_mirror
> 0/ 5 rbd_replay
> 0/ 5 rbd_pwl
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 immutable_obj_cache
> 0/ 5 client
> 1/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 0 ms
> 1/ 5 mon
> 0/10 monc
> 1/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 1 reserver
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 rgw_sync
> 1/ 5 rgw_datacache
> 1/ 5 rgw_access
> 1/ 5 rgw_dbstore
> 1/ 5 rgw_flight
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> 0/ 0 refs
> 1/ 5 compressor
> 1/ 5 bluestore
> 1/ 5 bluefs
> 1/ 3 bdev
> 1/ 5 kstore
> 4/ 5 rocksdb
> 4/ 5 leveldb
> 1/ 5 fuse
> 2/ 5 mgr
> 1/ 5 mgrc
> 1/ 5 dpdk
> 1/ 5 eventtrace
> 1/ 5 prioritycache
> 0/ 5 test
> 0/ 5 cephfs_mirror
> 0/ 5 cephsqlite
> 0/ 5 seastore
> 0/ 5 seastore_onode
> 0/ 5 seastore_odata
> 0/ 5 seastore_omap
> 0/ 5 seastore_tm
> 0/ 5 seastore_t
> 0/ 5 seastore_cleaner
> 0/ 5 seastore_epm
> 0/ 5 seastore_lba
> 0/ 5 seastore_fixedkv_tree
> 0/ 5 seastore_cache
> 0/ 5 seastore_journal
> 0/ 5 seastore_device
> 0/ 5 seastore_backref
> 0/ 5 alienstore
> 1/ 5 mclock
> 0/ 5 cyanstore
> 1/ 5 ceph_exporter
> 1/ 5 memstore
>-2/-2 (syslog threshold)
>99/99 (stderr threshold)
> --- pthread ID / name mapping for recent threads ---
>7f2472a89b00 / safe_timer
>7f2472cadb00 / radosgw
>...
>log_file
>
> /var/lib/ceph/crash/2024-01-25T13:10:13.909546Z_01ee6e6a-e946-4006-9d32-e17ef2f9df74/log
> --- end dump of recent events ---
> reraise_fatal: default handler for signal 25 didn't terminate the process?
>
> Thank you for your help.
>
> Marc
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-25 Thread Nicolas FOURNIL
Gotcha !

I've got it: after re-running the CA certificate creation with:
ceph restful create-self-signed-cert

I get this error:
Module 'cephadm' has failed: Expected 4 octets in
'fd30:::0:1101:2:0:501'


*Ouch: 4 octets = an IPv4 address expected... that promises some interesting code.*

I went through podman to get more traces:

  File "/usr/share/ceph/mgr/cephadm/ssl_cert_utils.py", line 49, in
generate_root_cert
[x509.IPAddress(ipaddress.IPv4Address(addr))]
  File "/lib64/python3.6/ipaddress.py", line 1284, in __init__
self._ip = self._ip_int_from_string(addr_str)
  File "/lib64/python3.6/ipaddress.py", line 1118, in _ip_int_from_string
raise AddressValueError("Expected 4 octets in %r" % ip_str)
ipaddress.AddressValueError: Expected 4 octets in
'fd30:::0:1101:2:0:501'

So I searched GitHub and found this fix in 19.0.0 (with the backport not yet
released):

https://github.com/ceph/ceph/commit/647b5d67a8a800091acea68d20e87354373b0fac

This case shows that it's impossible to get any metrics in an IPv6-only
network (discovery is impossible), and it's visible right at install time. Is
there really no test for IPv6-only environments before release?

Now I'm seriously considering adding a crappy IPv4-only subnet just for my
Ceph cluster, because it's always a headache to get it working in an IPv6
environment.


Le mar. 23 janv. 2024 à 17:58, David C.  a écrit :

> According to sources, the certificates are generated automatically at
> startup. Hence my question if the service started correctly.
>
> I also had problems with IPv6 only, but I don't immediately have more info
> 
>
> Cordialement,
>
> *David CASIER*
> 
>
>
> Le mar. 23 janv. 2024 à 17:46, Nicolas FOURNIL 
> a écrit :
>
>> IPv6 only: Yes, ms_bind_ipv6=true is already set.
>>
>> I had tried a rotation of the keys for node-exporter and I get this :
>>
>> 2024-01-23T16:43:56.098796+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
>> 87074 : cephadm [INF] Rotating authentication key for
>> node-exporter.srv06-r2b-fl1
>> 2024-01-23T16:43:56.099224+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
>> 87075 : cephadm [ERR] unknown daemon type node-exporter
>> Traceback (most recent call last):
>>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1039, in
>> _check_daemons
>> self.mgr._daemon_action(daemon_spec, action=action)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2203, in
>> _daemon_action
>> return self._rotate_daemon_key(daemon_spec)
>>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2147, in
>> _rotate_daemon_key
>> 'entity': daemon_spec.entity_name(),
>>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line
>> 108, in entity_name
>> return get_auth_entity(self.daemon_type, self.daemon_id,
>> host=self.host)
>>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line 47,
>> in get_auth_entity
>> raise OrchestratorError(f"unknown daemon type {daemon_type}")
>> orchestrator._interface.OrchestratorError: unknown daemon type
>> node-exporter
>>
>> Tried to remove & recreate service : it's the same ... how to stop the
>> rotation now :-/
>>
>>
>>
>> Le mar. 23 janv. 2024 à 17:18, David C.  a écrit :
>>
>>> Is the cephadm http server service starting correctly (in the mgr logs)?
>>>
>>> IPv6 ?
>>> 
>>>
>>> Cordialement,
>>>
>>> *David CASIER*
>>> 
>>>
>>>
>>>
>>>
>>> Le mar. 23 janv. 2024 à 16:29, Nicolas FOURNIL <
>>> nicolas.four...@gmail.com> a écrit :
>>>
 Hello,

 Thanks for advice but Prometheus cert is ok, (Self signed) and tested
 with curl and web navigator.

 it seems to be the "Service discovery" certificate from cephadm that is
 missing, but I cannot figure out how to set it.

 There's a function in the code to create this certificate inside the
 key store, but how to call it ... that's the point :-(

 Regards.



 Le mar. 23 janv. 2024 à 15:52, David C.  a
 écrit :

> Hello Nicolas,
>
> I don't know if it's an update issue.
>
> If this is not a problem for you, you can consider redeploying
> grafana/prometheus.
>
> It is also possible to inject your own certificates :
>
> https://docs.ceph.com/en/latest/cephadm/services/monitoring/#example
>
>
> https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
>
> 
>
> Cordialement,
>
> *David CASIER*
> 
>
>
>
> Le mar. 23 janv. 2024 à 10:56, Nicolas FOURNIL <
> nicolas.four...@gmail.com> a écrit :
>
>>  Hello,
>>
>> I've just fresh upgrade from Quincy to Reef and my graphs are now

[ceph-users] RGW crashes when rgw_enable_ops_log is enabled

2024-01-25 Thread Marc Singer

Hi Ceph Users

I am encountering a problem with the RGW Admin Ops Socket.

I am setting up the socket as follows:

rgw_enable_ops_log = true
rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
rgw_ops_log_data_backlog = 16Mi

It seems like the socket fills up over time and doesn't get flushed;
at some point the process hits its file size limit.


Do I need to configure something, or send something on the socket, for it to flush?

See the log here:

0> 2024-01-25T13:10:13.908+ 7f247b00eb00 -1 *** Caught signal (File 
size limit exceeded) **

 in thread 7f247b00eb00 thread_name:ops_log_file

 ceph version 18.2.0 (5dd24139a1eada541a3bc16b6941c5dde975e26d) reef 
(stable)
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 rbd_pwl
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 immutable_obj_cache
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 0 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 rgw_sync
   1/ 5 rgw_datacache
   1/ 5 rgw_access
   1/ 5 rgw_dbstore
   1/ 5 rgw_flight
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 fuse
   2/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
   1/ 5 prioritycache
   0/ 5 test
   0/ 5 cephfs_mirror
   0/ 5 cephsqlite
   0/ 5 seastore
   0/ 5 seastore_onode
   0/ 5 seastore_odata
   0/ 5 seastore_omap
   0/ 5 seastore_tm
   0/ 5 seastore_t
   0/ 5 seastore_cleaner
   0/ 5 seastore_epm
   0/ 5 seastore_lba
   0/ 5 seastore_fixedkv_tree
   0/ 5 seastore_cache
   0/ 5 seastore_journal
   0/ 5 seastore_device
   0/ 5 seastore_backref
   0/ 5 alienstore
   1/ 5 mclock
   0/ 5 cyanstore
   1/ 5 ceph_exporter
   1/ 5 memstore
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
  7f2472a89b00 / safe_timer
  7f2472cadb00 / radosgw
  ...
  log_file 
/var/lib/ceph/crash/2024-01-25T13:10:13.909546Z_01ee6e6a-e946-4006-9d32-e17ef2f9df74/log

--- end dump of recent events ---
reraise_fatal: default handler for signal 25 didn't terminate the process?

Thank you for your help.

Marc
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Özkan Göksu
I will try my best to explain my situation.

I don't have a separate MDS server. I have 5 identical nodes, 3 of them are
mons, and I use the other 2 as active and standby MDS (currently I have
leftovers from max_mds 4).

root@ud-01:~# ceph -s
  cluster:
id: e42fd4b0-313b-11ee-9a00-31da71873773
health: HEALTH_WARN
1 clients failing to respond to cache pressure

  services:
mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
mds: 1/1 daemons up, 4 standby
osd: 80 osds: 80 up (since 9d), 80 in (since 5M)

  data:
volumes: 1/1 healthy
pools:   3 pools, 2305 pgs
objects: 106.58M objects, 25 TiB
usage:   45 TiB used, 101 TiB / 146 TiB avail
pgs: 2303 active+clean
 2active+clean+scrubbing+deep

  io:
client:   16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr

--
root@ud-01:~# ceph fs status
ud-data - 84 clients
===
RANK  STATE   MDS  ACTIVITY DNSINOS   DIRS
CAPS
 0active  ud-data.ud-02.xcoojt  Reqs:   40 /s  2579k  2578k   169k
 3048k
POOL   TYPE USED  AVAIL
cephfs.ud-data.meta  metadata   136G  44.9T
cephfs.ud-data.datadata44.3T  44.9T

--
root@ud-01:~# ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to respond to
cache pressure client_id: 1275577

--
When I check the failing client with session ls I see only "num_caps: 12298"

ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] | "clientid:
\(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
request_load_avg: \(.request_load_avg), num_completed_requests:
\(.num_completed_requests), num_completed_flushes:
\(.num_completed_flushes)"' | sort -n -t: -k3

clientid: 1275577= num_caps: 12298, num_leases: 0, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1294542= num_caps: 13000, num_leases: 12, request_load_avg: 105,
num_completed_requests: 0, num_completed_flushes: 6
clientid: 1282187= num_caps: 16869, num_leases: 1, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1275589= num_caps: 18943, num_leases: 0, request_load_avg: 52,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1282154= num_caps: 24747, num_leases: 1, request_load_avg: 57,
num_completed_requests: 2, num_completed_flushes: 2
clientid: 1275553= num_caps: 25120, num_leases: 2, request_load_avg: 116,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1282142= num_caps: 27185, num_leases: 6, request_load_avg: 128,
num_completed_requests: 0, num_completed_flushes: 8
clientid: 1275535= num_caps: 40364, num_leases: 6, request_load_avg: 111,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1282130= num_caps: 41483, num_leases: 0, request_load_avg: 135,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1275547= num_caps: 42953, num_leases: 4, request_load_avg: 119,
num_completed_requests: 2, num_completed_flushes: 6
clientid: 1282139= num_caps: 45435, num_leases: 27, request_load_avg: 84,
num_completed_requests: 2, num_completed_flushes: 34
clientid: 1282136= num_caps: 48374, num_leases: 8, request_load_avg: 0,
num_completed_requests: 1, num_completed_flushes: 1
clientid: 1275532= num_caps: 48664, num_leases: 7, request_load_avg: 115,
num_completed_requests: 2, num_completed_flushes: 8
clientid: 1191789= num_caps: 130319, num_leases: 0, request_load_avg: 1753,
num_completed_requests: 0, num_completed_flushes: 0
clientid: 1275571= num_caps: 139488, num_leases: 0, request_load_avg: 2,
num_completed_requests: 0, num_completed_flushes: 1
clientid: 1282133= num_caps: 145487, num_leases: 0, request_load_avg: 8,
num_completed_requests: 1, num_completed_flushes: 1
clientid: 1534496= num_caps: 1041316, num_leases: 0, request_load_avg: 0,
num_completed_requests: 0, num_completed_flushes: 1
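
To see which host and mount that client id actually belongs to, the session
entries also carry client metadata; a quick filter (MDS name taken from the fs
status above, the jq expression is just an example):

ceph tell mds.ud-data.ud-02.xcoojt session ls | jq '.[] | select(.id==1275577) | .client_metadata'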

--
When I check dashboard/service/mds I see 120%+ CPU usage on the active MDS,
but on the host everything is almost idle and disk waits are very low.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   0.610.000.380.410.00   98.60

Devicer/s rMB/s   rrqm/s  %rrqm r_await rareq-sz w/s
  wMB/s   wrqm/s  %wrqm w_await wareq-sz d/s dMB/s   drqm/s  %drqm
d_await dareq-sz f/s f_await  aqu-sz  %util
sdc  2.00  0.01 0.00   0.000.50 6.00   20.00
   0.04 0.00   0.000.50 2.000.00  0.00 0.00   0.00
   0.00 0.00   10.000.600.02   1.20
sdd  3.00  0.02 0.00   0.000.67 8.00  285.00
   1.8477.00  21.270.44 6.610.00  0.00 0.00   0.00
   0.00 0.00  114.000.830.22  

[ceph-users] Re: TLS 1.2 for dashboard

2024-01-25 Thread Nizamudeen A
Understood, thank you.

On Thu, Jan 25, 2024, 20:24 Sake Ceph  wrote:

> I would say drop it for the Squid release; or, if you keep it in Squid but
> plan to disable it in a minor release later, please make a note in the
> release notes when the option is being removed.
> Just my 2 cents :)
>
> Best regards,
> Sake
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: TLS 1.2 for dashboard

2024-01-25 Thread Sake Ceph
I would say drop it for the Squid release; or, if you keep it in Squid but plan to
disable it in a minor release later, please make a note in the release notes when
the option is being removed.
Just my 2 cents :) 

Best regards,
Sake
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: TLS 1.2 for dashboard

2024-01-25 Thread Nizamudeen A
Ah okay, thanks for the clarification.

In that case, we'll probably need to keep this 1.2 fix for Squid, I guess.
I'll check and update as necessary.

On Thu, Jan 25, 2024, 20:12 Sake Ceph  wrote:

> Hi Nizamudeen,
>
> Thank you for your quick response!
>
> The load balancers support TLS 1.3, but the administrators need to
> reconfigure the health checks. The only problem is that it's a global change
> for all load balancers... so not something they can change overnight; they
> need to plan and test for it.
>
> Best regards,
> Sake
>
> > Op 25-01-2024 15:22 CET schreef Nizamudeen A :
> >
> >
> > Hi,
> >
> > I'll re-open the PR and merge it to Quincy. Btw, I want to know if
> the load balancers will support TLS 1.3 in the future, because we were
> planning to completely drop TLS 1.2 support from the dashboard for
> security reasons. (But so far we are planning to keep it as it is, at least
> for the older releases.)
> >
> > Regards,
> > Nizam
> >
> >
> > On Thu, Jan 25, 2024, 19:41 Sake Ceph  wrote:
> > > After upgrading to 17.2.7 our load balancers can't check the status of
> the manager nodes for the dashboard. After some troubleshooting I noticed
only TLS 1.3 is available for the dashboard.
> > >
> > >  Looking at the source (quincy), TLS config got changed from 1.2 to
> 1.3. Searching in the tracker I found out that we are not the only one with
troubles, and an option will be added to the dashboard config. Tracker
> ID 62940 got backports and the ones for reef and pacific already merged.
> But the pull request (63068) for Quincy is closed :(
> > >
> > >  What to do? I hope this one can get merged for 17.2.8.
> > >  ___
> > >  ceph-users mailing list -- ceph-users@ceph.io
> > >  To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: TLS 1.2 for dashboard

2024-01-25 Thread Sake Ceph
Hi Nizamudeen, 

Thank you for your quick response! 

The load balancers support TLS 1.3, but the administrators need to reconfigure
the health checks. The only problem is that it's a global change for all load
balancers... so not something they can change overnight; they need to plan and test for it.

Best regards, 
Sake

> Op 25-01-2024 15:22 CET schreef Nizamudeen A :
> 
> 
> Hi,
> 
> I'll re-open the PR and merge it to Quincy. Btw, I want to know if the
> load balancers will support TLS 1.3 in the future, because we were planning
> to completely drop TLS 1.2 support from the dashboard for security
> reasons. (But so far we are planning to keep it as it is, at least for the
> older releases.)
> 
> Regards,
> Nizam
> 
> 
> On Thu, Jan 25, 2024, 19:41 Sake Ceph  wrote:
> > After upgrading to 17.2.7 our load balancers can't check the status of the 
> > manager nodes for the dashboard. After some troubleshooting I noticed only 
> > TLS 1.3 is available for the dashboard.
> >  
> >  Looking at the source (quincy), TLS config got changed from 1.2 to 1.3. 
> > Searching in the tracker I found out that we are not the only one with 
> > troubles, and an option will be added to the dashboard config. Tracker
> > ID 62940 got backports and the ones for reef and pacific already merged. 
> > But the pull request (63068) for Quincy is closed :(
> >  
> >  What to do? I hope this one can get merged for 17.2.8.
> >  ___
> >  ceph-users mailing list -- ceph-users@ceph.io
> >  To unsubscribe send an email to ceph-users-le...@ceph.io
> >  
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: TLS 1.2 for dashboard

2024-01-25 Thread Nizamudeen A
Hi,

I'll re-open the PR and merge it to Quincy. Btw, I want to know if the
load balancers will support TLS 1.3 in the future, because we were
planning to completely drop TLS 1.2 support from the dashboard for
security reasons. (But so far we are planning to keep it as it is, at least
for the older releases.)

Regards,
Nizam

On Thu, Jan 25, 2024, 19:41 Sake Ceph  wrote:

> After upgrading to 17.2.7 our load balancers can't check the status of the
> manager nodes for the dashboard. After some troubleshooting I noticed only
> TLS 1.3 is available for the dashboard.
>
> Looking at the source (quincy), TLS config got changed from 1.2 to 1.3.
> Searching in the tracker I found out that we are not the only one with
> troubles, and an option will be added to the dashboard config. Tracker
> ID 62940 got backports and the ones for reef and pacific already merged.
> But the pull request (63068) for Quincy is closed :(
>
> What to do? I hope this one can get merged for 17.2.8.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] TLS 1.2 for dashboard

2024-01-25 Thread Sake Ceph
After upgrading to 17.2.7 our load balancers can't check the status of the
manager nodes for the dashboard. After some troubleshooting I noticed that only
TLS 1.3 is available for the dashboard.
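
A quick way to verify what the dashboard endpoint accepts, e.g. from the load
balancer side, is a handshake probe with a reasonably recent openssl (host and
port below are placeholders for your mgr endpoint):

# Succeeds only if the dashboard still accepts TLS 1.2:
openssl s_client -connect mgr-host.example.com:8443 -tls1_2 </dev/null
# Should succeed on 17.2.7, where only TLS 1.3 is enabled:
openssl s_client -connect mgr-host.example.com:8443 -tls1_3 </dev/null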

Looking at the source (Quincy), the TLS config got changed from 1.2 to 1.3.
Searching in the tracker I found out that we are not the only ones with this trouble
and that an option will be added to the dashboard config. Tracker ID 62940 got
backports, and the ones for Reef and Pacific are already merged. But the pull
request (63068) for Quincy is closed :(

What to do? I hope this one can get merged for 17.2.8.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread David C.
It would be a pleasure to complete the documentation but we would need to
test or have someone confirm what I have assumed.

Concerning the warning, I think we should not talk about the disaster recovery
procedure.
While the disaster recovery procedure has already saved some clusters, it has
also put clusters at risk when misused.


Cordialement,

*David CASIER*




Le jeu. 25 janv. 2024 à 14:45, Eugen Block  a écrit :

> Oh right, I forgot about that, good point! But if that is (still) true
> then this should definitely be in the docs as a warning for EC pools
> in cephfs!
>
> Zitat von "David C." :
>
> > In case the root is EC, it is likely that it is not possible to apply the
> > disaster recovery procedure (no xattr layout/parent on the data pool).
> >
> > 
> >
> > Cordialement,
> >
> > *David CASIER*
> > 
> >
> >
> > Le jeu. 25 janv. 2024 à 13:03, Eugen Block  a écrit :
> >
> >> I'm not sure if using EC as default data pool for cephfs is still
> >> discouraged as stated in the output when attempting to do that, the
> >> docs don't mention that (at least not in the link I sent in the last
> >> mail):
> >>
> >> ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data
> >> Error EINVAL: pool 'cephfs_data' (id '8') is an erasure-coded pool.
> >> Use of an EC pool for the default data pool is discouraged; see the
> >> online CephFS documentation for more information. Use --force to
> >> override.
> >>
> >> ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data --force
> >> new fs with metadata pool 6 and data pool 8
> >>
> >> CC'ing Zac here to hopefully clear that up.
> >>
> >> Zitat von "David C." :
> >>
> >> > Albert,
> >> > Never used EC for (root) data pool.
> >> >
> >> > Le jeu. 25 janv. 2024 à 12:08, Albert Shih  a
> >> écrit :
> >> >
> >> >> Le 25/01/2024 à 08:42:19+, Eugen Block a écrit
> >> >> > Hi,
> >> >> >
> >> >> > it's really as easy as it sounds (fresh test cluster on 18.2.1
> without
> >> >> any
> >> >> > pools yet):
> >> >> >
> >> >> > ceph:~ # ceph fs volume create cephfs
> >> >>
> >> >> Yes...I already try that with the label and works fine.
> >> >>
> >> >> But I prefer to use «my» pools. Because I have ssd/hdd and want also
> try
> >> >> «erasure coding» pool for the data.
> >> >>
> >> >
> >> >> I also need to set the pg_num and pgp_num (I know I can do that after
> >> the
> >> >> creation).
> >> >
> >> >
> >> >> So I manage to do ... half what I want...
> >> >>
> >> >> In fact
> >> >>
> >> >>   ceph fs volume create thing
> >> >>
> >> >> will create two pools
> >> >>
> >> >>   cephfs.thing.meta
> >> >>   cephfs.thing.data
> >> >>
> >> >> and if those pool already existe it will use them.
> >> >>
> >> >> But that's only if the data are replicated no with erasure
> >> coding(maybe
> >> >> I forget something config on the pool).
> >> >>
> >> >> Well I will currently continue my test with replicated data.
> >> >>
> >> >> > The pools and the daemons are created automatically (you can
> control
> >> the
> >> >> > placement of the daemons with the --placement option). Note that
> the
> >> >> > metadata pool needs to be on fast storage, so you might need to
> change
> >> >> the
> >> >> > ruleset for the metadata pool after creation in case you have HDDs
> in
> >> >> place.
> >> >> > Changing pools after the creation can be done via ceph fs commands:
> >> >> >
> >> >> > ceph:~ # ceph osd pool create cephfs_data2
> >> >> > pool 'cephfs_data2' created
> >> >> >
> >> >> > ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
> >> >> >   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is
> not
> >> >> marked
> >> >> > as bulk.
> >> >> >   Consider setting the flag by running
> >> >> > # ceph osd pool set cephfs_data2 bulk true
> >> >> > added data pool 4 to fsmap
> >> >> >
> >> >> > ceph:~ # ceph fs status
> >> >> > cephfs - 0 clients
> >> >> > ==
> >> >> > RANK  STATE MDS   ACTIVITY DNSINOS
> >>  DIRS
> >> >> > CAPS
> >> >> >  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13
> >>  12
> >> >> > 0
> >> >> >POOL   TYPE USED  AVAIL
> >> >> > cephfs.cephfs.meta  metadata  64.0k  13.8G
> >> >> > cephfs.cephfs.datadata   0   13.8G
> >> >> >cephfs_data2   data   0   13.8G
> >> >> >
> >> >> >
> >> >> > You can't remove the default data pool, though (here it's
> >> >> > cephfs.cephfs.data). If you want to control the pool creation you
> can
> >> >> fall
> >> >> > back to the method you mentioned, create pools as you require them
> and
> >> >> then
> >> >> > create a new cephfs, and deploy the mds service.
> >> >>
> >> >> Yes, but I'm guessing the
> >> >>
> >> >>   ceph fs volume
> >> >>
> >> >> are the «future» so it would be super nice to add (at least) the
> option
> >> to
> >> >> choose the 

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Eugen Block
There is no definitive answer wrt MDS tuning. As is mentioned everywhere,
it's about finding the right setup for your specific workload. If you can
synthesize your workload (maybe scaled down a bit), try optimizing it in a
test cluster without interrupting your developers too much.
But what you haven't explained yet is what exactly you are experiencing as a
performance issue. Do you have numbers or a detailed description?
From the fs status output you don't seem to have much activity
going on (around 140 requests per second), but that's probably not the
usual traffic? What does ceph report in its client I/O output?

Can you paste the 'ceph osd df' output as well?
Do you have dedicated MDS servers or are they colocated with other services?

Zitat von Özkan Göksu :


Hello  Eugen.

I read all of your MDS related topics and thank you so much for your effort
on this.
There is not much information and I couldn't find a MDS tuning guide at
all. It  seems that you are the correct person to discuss mds debugging and
tuning.

Do you have any documents or may I learn what is the proper way to debug
MDS and clients ?
Which debug logs will guide me to understand the limitations and will help
to tune according to the data flow?

While searching, I find this:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
quote:"A user running VSCodium, keeping 15k caps open.. the opportunistic
caps recall eventually starts recalling those but the (el7 kernel) client
won't release them. Stopping Codium seems to be the only way to release."

Because of this I think I also need to play around with the client side too.

My main goal is increasing the speed and reducing the latency and I wonder
if these ideas are correct or not:
- Maybe I need to increase client side cache size because via each client,
multiple users request a lot of objects and clearly the
client_cache_size=16 default is not enough.
-  Maybe I need to increase client side maximum cache limit for
object "client_oc_max_objects=1000 to 1" and data "client_oc_size=200mi
to 400mi"
- The client cache cleaning threshold is not aggressive enough to keep the
free cache size in the desired range. I need to make it aggressive but this
should not reduce speed and increase latency.

mds_cache_memory_limit=4gi to 16gi
client_oc_max_objects=1000 to 1
client_oc_size=200mi to 400mi
client_permissions=false #to reduce latency.
client_cache_size=16 to 128


What do you think?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2024-01-25 Thread Adrien Georget

We are heavily impacted by this issue with the MGR in Pacific.
This has to be fixed.

As someone suggested in the issue tracker, we limited the memory usage
of the MGR in the systemd unit (MemoryLimit=16G) so that the MGR is killed
before it consumes all the memory of the server and impacts other
services.
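
For anyone wanting to do the same, a sketch of such an override; the unit name
depends on the deployment (ceph-mgr@<id>.service for package installs,
ceph-<fsid>@mgr.<name>.service under cephadm), so adjust accordingly:

# Example drop-in limiting the mgr's memory (unit name must match your deployment).
mkdir -p /etc/systemd/system/ceph-mgr@.service.d
cat > /etc/systemd/system/ceph-mgr@.service.d/memory.conf <<'EOF'
[Service]
MemoryLimit=16G
EOF
systemctl daemon-reload
systemctl restart ceph-mgr.target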


Adrien

Le 25/01/2024 à 08:06, Zakhar Kirpichenko a écrit :

I have to say that not including a fix for a serious issue into the last
minor release of Pacific is a rather odd decision.

/Z

On Thu, 25 Jan 2024 at 09:00, Konstantin Shalygin  wrote:


Hi,

The backport to pacific was rejected [1], you may switch to reef, when [2]
merged and released


[1] https://github.com/ceph/ceph/pull/55109
[2] https://github.com/ceph/ceph/pull/55110

k
Sent from my iPhone


On Jan 25, 2024, at 04:12, changzhi tan <544463...@qq.com> wrote:

Is there any way to solve this problem?thanks

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread Eugen Block
Oh right, I forgot about that, good point! But if that is (still) true  
then this should definitely be in the docs as a warning for EC pools  
in cephfs!


Zitat von "David C." :


In case the root is EC, it is likely that it is not possible to apply the
disaster recovery procedure (no xattr layout/parent on the data pool).



Cordialement,

*David CASIER*



Le jeu. 25 janv. 2024 à 13:03, Eugen Block  a écrit :


I'm not sure if using EC as default data pool for cephfs is still
discouraged as stated in the output when attempting to do that, the
docs don't mention that (at least not in the link I sent in the last
mail):

ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data
Error EINVAL: pool 'cephfs_data' (id '8') is an erasure-coded pool.
Use of an EC pool for the default data pool is discouraged; see the
online CephFS documentation for more information. Use --force to
override.

ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data --force
new fs with metadata pool 6 and data pool 8

CC'ing Zac here to hopefully clear that up.

Zitat von "David C." :

> Albert,
> Never used EC for (root) data pool.
>
> Le jeu. 25 janv. 2024 à 12:08, Albert Shih  a
écrit :
>
>> Le 25/01/2024 à 08:42:19+, Eugen Block a écrit
>> > Hi,
>> >
>> > it's really as easy as it sounds (fresh test cluster on 18.2.1 without
>> any
>> > pools yet):
>> >
>> > ceph:~ # ceph fs volume create cephfs
>>
>> Yes...I already try that with the label and works fine.
>>
>> But I prefer to use «my» pools. Because I have ssd/hdd and want also try
>> «erasure coding» pool for the data.
>>
>
>> I also need to set the pg_num and pgp_num (I know I can do that after
the
>> creation).
>
>
>> So I manage to do ... half what I want...
>>
>> In fact
>>
>>   ceph fs volume create thing
>>
>> will create two pools
>>
>>   cephfs.thing.meta
>>   cephfs.thing.data
>>
>> and if those pool already existe it will use them.
>>
>> But that's only if the data are replicated no with erasure
coding(maybe
>> I forget something config on the pool).
>>
>> Well I will currently continue my test with replicated data.
>>
>> > The pools and the daemons are created automatically (you can control
the
>> > placement of the daemons with the --placement option). Note that the
>> > metadata pool needs to be on fast storage, so you might need to change
>> the
>> > ruleset for the metadata pool after creation in case you have HDDs in
>> place.
>> > Changing pools after the creation can be done via ceph fs commands:
>> >
>> > ceph:~ # ceph osd pool create cephfs_data2
>> > pool 'cephfs_data2' created
>> >
>> > ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
>> >   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not
>> marked
>> > as bulk.
>> >   Consider setting the flag by running
>> > # ceph osd pool set cephfs_data2 bulk true
>> > added data pool 4 to fsmap
>> >
>> > ceph:~ # ceph fs status
>> > cephfs - 0 clients
>> > ==
>> > RANK  STATE MDS   ACTIVITY DNSINOS
 DIRS
>> > CAPS
>> >  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13
 12
>> > 0
>> >POOL   TYPE USED  AVAIL
>> > cephfs.cephfs.meta  metadata  64.0k  13.8G
>> > cephfs.cephfs.datadata   0   13.8G
>> >cephfs_data2   data   0   13.8G
>> >
>> >
>> > You can't remove the default data pool, though (here it's
>> > cephfs.cephfs.data). If you want to control the pool creation you can
>> fall
>> > back to the method you mentioned, create pools as you require them and
>> then
>> > create a new cephfs, and deploy the mds service.
>>
>> Yes, but I'm guessing the
>>
>>   ceph fs volume
>>
>> are the «future» so it would be super nice to add (at least) the option
to
>> choose the couple of pool...
>>
>> >
>> > I haven't looked too deep into changing the default pool yet, so there
>> might
>> > be a way to switch that as well.
>>
>> Ok. I will also try but...well...newbie ;-)
>>
>> Anyway thanks.
>>
>> regards
>>
>> --
>> Albert SHIH 嶺 
>> France
>> Heure locale/Local time:
>> jeu. 25 janv. 2024 12:00:08 CET
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread David C.
In case the root is EC, it is likely that it is not possible to apply the
disaster recovery procedure (no xattr layout/parent on the data pool).



Cordialement,

*David CASIER*



Le jeu. 25 janv. 2024 à 13:03, Eugen Block  a écrit :

> I'm not sure if using EC as default data pool for cephfs is still
> discouraged as stated in the output when attempting to do that, the
> docs don't mention that (at least not in the link I sent in the last
> mail):
>
> ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data
> Error EINVAL: pool 'cephfs_data' (id '8') is an erasure-coded pool.
> Use of an EC pool for the default data pool is discouraged; see the
> online CephFS documentation for more information. Use --force to
> override.
>
> ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data --force
> new fs with metadata pool 6 and data pool 8
>
> CC'ing Zac here to hopefully clear that up.
>
> Zitat von "David C." :
>
> > Albert,
> > Never used EC for (root) data pool.
> >
> > Le jeu. 25 janv. 2024 à 12:08, Albert Shih  a
> écrit :
> >
> >> Le 25/01/2024 à 08:42:19+, Eugen Block a écrit
> >> > Hi,
> >> >
> >> > it's really as easy as it sounds (fresh test cluster on 18.2.1 without
> >> any
> >> > pools yet):
> >> >
> >> > ceph:~ # ceph fs volume create cephfs
> >>
> >> Yes...I already try that with the label and works fine.
> >>
> >> But I prefer to use «my» pools. Because I have ssd/hdd and want also try
> >> «erasure coding» pool for the data.
> >>
> >
> >> I also need to set the pg_num and pgp_num (I know I can do that after
> the
> >> creation).
> >
> >
> >> So I manage to do ... half what I want...
> >>
> >> In fact
> >>
> >>   ceph fs volume create thing
> >>
> >> will create two pools
> >>
> >>   cephfs.thing.meta
> >>   cephfs.thing.data
> >>
> >> and if those pool already existe it will use them.
> >>
> >> But that's only if the data are replicated no with erasure
> coding(maybe
> >> I forget something config on the pool).
> >>
> >> Well I will currently continue my test with replicated data.
> >>
> >> > The pools and the daemons are created automatically (you can control
> the
> >> > placement of the daemons with the --placement option). Note that the
> >> > metadata pool needs to be on fast storage, so you might need to change
> >> the
> >> > ruleset for the metadata pool after creation in case you have HDDs in
> >> place.
> >> > Changing pools after the creation can be done via ceph fs commands:
> >> >
> >> > ceph:~ # ceph osd pool create cephfs_data2
> >> > pool 'cephfs_data2' created
> >> >
> >> > ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
> >> >   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not
> >> marked
> >> > as bulk.
> >> >   Consider setting the flag by running
> >> > # ceph osd pool set cephfs_data2 bulk true
> >> > added data pool 4 to fsmap
> >> >
> >> > ceph:~ # ceph fs status
> >> > cephfs - 0 clients
> >> > ==
> >> > RANK  STATE MDS   ACTIVITY DNSINOS
>  DIRS
> >> > CAPS
> >> >  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13
>  12
> >> > 0
> >> >POOL   TYPE USED  AVAIL
> >> > cephfs.cephfs.meta  metadata  64.0k  13.8G
> >> > cephfs.cephfs.datadata   0   13.8G
> >> >cephfs_data2   data   0   13.8G
> >> >
> >> >
> >> > You can't remove the default data pool, though (here it's
> >> > cephfs.cephfs.data). If you want to control the pool creation you can
> >> fall
> >> > back to the method you mentioned, create pools as you require them and
> >> then
> >> > create a new cephfs, and deploy the mds service.
> >>
> >> Yes, but I'm guessing the
> >>
> >>   ceph fs volume
> >>
> >> are the «future» so it would be super nice to add (at least) the option
> to
> >> choose the couple of pool...
> >>
> >> >
> >> > I haven't looked too deep into changing the default pool yet, so there
> >> might
> >> > be a way to switch that as well.
> >>
> >> Ok. I will also try but...well...newbie ;-)
> >>
> >> Anyway thanks.
> >>
> >> regards
> >>
> >> --
> >> Albert SHIH 嶺 
> >> France
> >> Heure locale/Local time:
> >> jeu. 25 janv. 2024 12:00:08 CET
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrubbing?

2024-01-25 Thread Jan Marek
Hello Sridhar,

Dne Čt, led 25, 2024 at 09:53:26 CET napsal(a) Sridhar Seshasayee:
> Hello Jan,
> 
> The meaning of my previous post was that the Ceph cluster didn't fulfill
> my needs and, although I had set the mClock profile to
> "high_client_ops" (because I have plenty of time for rebalancing
> and scrubbing), my clients ran into problems.
> 
> As far as the question around mClock is concerned, there are further
> improvements in the works to handle QoS between client ops and
> background scrub ops. This should help address the issue you are
> currently facing. See PR: https://github.com/ceph/ceph/pull/51171
> for more information.
> Also, it would be helpful to know the Ceph version you are currently using.

thanks for your reply.

I'm just in the process of upgrading from 17.2.6 to 18.2.1 (you can
see my previous posts about being stuck in the upgrade to Reef).

Maybe this was the cause of my problem...

Now I've tried to give the cluster a rest to do some "background"
tasks (and it seems that this was correct, because on my hosts
there is around 50-100 MB/s read and ca. 10-50 MB/s write traffic -
ca. 1/4-1/2 of the previous load).

On Saturday I will change some network settings and I will
try to start the upgrade process, maybe with --limit=1, to be "soft"
on the cluster and on our clients...

> -Sridhar

Sincerely
Jan Marek
-- 
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Janne Johansson
Den tors 25 jan. 2024 kl 11:57 skrev Henry lol :
>
> It's reasonable enough.
> actually, I expected the client to have just? thousands of
> "PG-to-OSDs" mappings.

Yes, but filename to PG is done with a pseudorandom algo.

> Nevertheless, it’s so heavy that the client calculates location on
> demand, right?

Yes, and I guess the client has an algorithm that makes it possible to
know that PG 1.a4 should be on OSDs 4, 93 and 44, but also that if 4 is
missing, the next candidate would be 51; if 93 isn't up either, then
66 would be the next logical OSD to contact for that copy, and so on.
Since all parts (client, mons, OSDs) have the same code, when OSD 4
dies, 51 knows it needs to get a copy from either 93 or 44, and as soon
as that copy is made the PG will stop being active+degraded, but might
be active+remapped instead, since it knows it wants to go back to OSD
4 if it comes back with the same size again.
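
Both mapping steps are easy to inspect on a live cluster, by the way (pool and
object names below are just examples):

# Object name -> PG -> up/acting OSD set, computed on the fly:
ceph osd map mypool myobject
# Or look at a specific PG, e.g. the 1.a4 from above:
ceph pg map 1.a4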

> if the client with the outdated map sends a request to the wrong OSD,
> then does the OSD handle it somehow through redirection or something?

I think it would get told it has the wrong osdmap.

> Lastly, not only CRUSH map but also other factors like storage usage
> are considered when doing CRUSH?
> because it seems that the target OSD set isn’t deterministic given only it.

It doesn't take OSD usage into consideration except at creation time
or OSD in/out/reweighing (or manual displacements with upmap and so
forth), so this is why "ceph df" will tell you a pool has X free
space, where X is "smallest free space on the OSDs on which this pool
lies, times the number of OSDs". Given the pseudorandom placement of
objects to PGs, there is nothing to prevent you from having the worst
luck ever and all the objects you create end up on the OSD with least
free space.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-25 Thread Özkan Göksu
Hello  Eugen.

I read all of your MDS related topics and thank you so much for your effort
on this.
There is not much information out there and I couldn't find an MDS tuning guide at
all. It seems that you are the right person to discuss MDS debugging and
tuning with.

Do you have any documents, or can you tell me the proper way to debug
MDS and clients?
Which debug logs will guide me to understand the limitations and help me
tune according to the data flow?

While searching, I find this:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
quote:"A user running VSCodium, keeping 15k caps open.. the opportunistic
caps recall eventually starts recalling those but the (el7 kernel) client
won't release them. Stopping Codium seems to be the only way to release."

Because of this I think I also need to play around with the client side too.

My main goal is increasing the speed and reducing the latency and I wonder
if these ideas are correct or not:
- Maybe I need to increase client side cache size because via each client,
multiple users request a lot of objects and clearly the
client_cache_size=16 default is not enough.
-  Maybe I need to increase client side maximum cache limit for
object "client_oc_max_objects=1000 to 1" and data "client_oc_size=200mi
to 400mi"
- The client cache cleaning threshold is not aggressive enough to keep the
free cache size in the desired range. I need to make it aggressive but this
should not reduce speed and increase latency.

mds_cache_memory_limit=4gi to 16gi
client_oc_max_objects=1000 to 1
client_oc_size=200mi to 400mi
client_permissions=false #to reduce latency.
client_cache_size=16 to 128


What do you think?
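
In case it helps the discussion, this is roughly how such values would be
applied at runtime; the numbers are only the ones proposed above, not
recommendations, and the client_* options only affect userspace clients
(ceph-fuse/libcephfs), not kernel mounts:

# Sketch only - values are the proposals from this mail, not tuning advice.
ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB
ceph config set client client_oc_size 419430400          # ~400 MiB
ceph config set client client_permissions false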
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs-top causes 16 mgr modules have recently crashed

2024-01-25 Thread Özkan Göksu
Hello Jos.

I checked the diff and noticed the difference:
https://github.com/ceph/ceph/pull/52127/files

Thank you for the guide link and for the fix.
Have a great day.

Regards.



23 Oca 2024 Sal 11:07 tarihinde Jos Collin  şunu yazdı:

> This fix is in the mds.
> I think you need to read
> https://docs.ceph.com/en/quincy/cephadm/upgrade/#staggered-upgrade.
>
> On 23/01/24 12:19, Özkan Göksu wrote:
>
> Hello Jos.
> Thank you for the reply.
>
> I can upgrade to 17.2.7, but I wonder: can I upgrade only MON+MGR for this
> issue, or do I need to upgrade all the parts?
> Otherwise I need to wait a few weeks. I don't want to request maintenance
> during delivery time.
>
> root@ud-01:~# ceph orch upgrade ls
> {
> "image": "quay.io/ceph/ceph",
> "registry": "quay.io",
> "bare_image": "ceph/ceph",
> "versions": [
> "18.2.1",
> "18.2.0",
> "18.1.3",
> "18.1.2",
> "18.1.1",
> "18.1.0",
> "17.2.7",
> "17.2.6",
> "17.2.5",
> "17.2.4",
> "17.2.3",
> "17.2.2",
> "17.2.1",
> "17.2.0"
> ]
> }
>
> Best regards
>
> Jos Collin , 23 Oca 2024 Sal, 07:42 tarihinde şunu
> yazdı:
>
>> Please have this fix: https://tracker.ceph.com/issues/59551. It's
>> backported to quincy.
>>
>> On 23/01/24 03:11, Özkan Göksu wrote:
>> > Hello
>> >
>> > When I run cephfs-top it causes mgr module crash. Can you please tell me
>> > the reason?
>> >
>> > My environment:
>> > My ceph version 17.2.6
>> > Operating System: Ubuntu 22.04.2 LTS
>> > Kernel: Linux 5.15.0-84-generic
>> >
>> > I created the cephfs-top user with the following command:
>> > ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd
>> 'allow
>> > r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring
>> >
>> > This is the crash report:
>> >
>> > root@ud-01:~# ceph crash info
>> > 2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801
>> > {
>> >  "backtrace": [
>> >  "  File \"/usr/share/ceph/mgr/stats/module.py\", line 32, in
>> > notify\nself.fs_perf_stats.notify_cmd(notify_id)",
>> >  "  File \"/usr/share/ceph/mgr/stats/fs/perf_stats.py\", line
>> 177,
>> > in notify_cmd\nmetric_features =
>> >
>> int(metadata[CLIENT_METADATA_KEY][\"metric_spec\"][\"metric_flags\"][\"feature_bits\"],
>> > 16)",
>> >  "ValueError: invalid literal for int() with base 16: '0x'"
>> >  ],
>> >  "ceph_version": "17.2.6",
>> >  "crash_id":
>> > "2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801",
>> >  "entity_name": "mgr.ud-01.qycnol",
>> >  "mgr_module": "stats",
>> >  "mgr_module_caller": "ActivePyModule::notify",
>> >  "mgr_python_exception": "ValueError",
>> >  "os_id": "centos",
>> >  "os_name": "CentOS Stream",
>> >  "os_version": "8",
>> >  "os_version_id": "8",
>> >  "process_name": "ceph-mgr",
>> >  "stack_sig":
>> > "971ae170f1fff7f7bc0b7ae86d164b2b0136a8bd5ca7956166ea5161e51ad42c",
>> >  "timestamp": "2024-01-22T21:25:59.313305Z",
>> >  "utsname_hostname": "ud-01",
>> >  "utsname_machine": "x86_64",
>> >  "utsname_release": "5.15.0-84-generic",
>> >  "utsname_sysname": "Linux",
>> >  "utsname_version": "#93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023"
>> > }
>> >
>> >
>> > Best regards.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread Eugen Block
I'm not sure if using EC as default data pool for cephfs is still  
discouraged as stated in the output when attempting to do that, the  
docs don't mention that (at least not in the link I sent in the last  
mail):


ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data
Error EINVAL: pool 'cephfs_data' (id '8') is an erasure-coded pool.  
Use of an EC pool for the default data pool is discouraged; see the  
online CephFS documentation for more information. Use --force to  
override.


ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data --force
new fs with metadata pool 6 and data pool 8

CC'ing Zac here to hopefully clear that up.

Zitat von "David C." :


Albert,
Never used EC for (root) data pool.

Le jeu. 25 janv. 2024 à 12:08, Albert Shih  a écrit :


Le 25/01/2024 à 08:42:19+, Eugen Block a écrit
> Hi,
>
> it's really as easy as it sounds (fresh test cluster on 18.2.1 without
any
> pools yet):
>
> ceph:~ # ceph fs volume create cephfs

Yes...I already try that with the label and works fine.

But I prefer to use «my» pools. Because I have ssd/hdd and want also try
«erasure coding» pool for the data.




I also need to set the pg_num and pgp_num (I know I can do that after the
creation).




So I manage to do ... half what I want...

In fact

  ceph fs volume create thing

will create two pools

  cephfs.thing.meta
  cephfs.thing.data

and if those pool already existe it will use them.

But that's only if the data are replicated no with erasure coding(maybe
I forget something config on the pool).

Well I will currently continue my test with replicated data.

> The pools and the daemons are created automatically (you can control the
> placement of the daemons with the --placement option). Note that the
> metadata pool needs to be on fast storage, so you might need to change
the
> ruleset for the metadata pool after creation in case you have HDDs in
place.
> Changing pools after the creation can be done via ceph fs commands:
>
> ceph:~ # ceph osd pool create cephfs_data2
> pool 'cephfs_data2' created
>
> ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
>   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not
marked
> as bulk.
>   Consider setting the flag by running
> # ceph osd pool set cephfs_data2 bulk true
> added data pool 4 to fsmap
>
> ceph:~ # ceph fs status
> cephfs - 0 clients
> ==
> RANK  STATE MDS   ACTIVITY DNSINOS   DIRS
> CAPS
>  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13 12
> 0
>POOL   TYPE USED  AVAIL
> cephfs.cephfs.meta  metadata  64.0k  13.8G
> cephfs.cephfs.datadata   0   13.8G
>cephfs_data2   data   0   13.8G
>
>
> You can't remove the default data pool, though (here it's
> cephfs.cephfs.data). If you want to control the pool creation you can
fall
> back to the method you mentioned, create pools as you require them and
then
> create a new cephfs, and deploy the mds service.

Yes, but I'm guessing the

  ceph fs volume

are the «future» so it would be super nice to add (at least) the option to
choose the couple of pool...

>
> I haven't looked too deep into changing the default pool yet, so there
might
> be a way to switch that as well.

Ok. I will also try but...well...newbie ;-)

Anyway thanks.

regards

--
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 25 janv. 2024 12:00:08 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread Eugen Block

Did you set the ec-overwrites flag for the pool as mentioned in the docs?
https://docs.ceph.com/en/latest/cephfs/createfs/#using-erasure-coded-pools-with-cephfs

If you plan to use pre-created pools anyway then the slightly more  
manual method is the way to go.
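
For an EC data pool the sequence looks roughly like this (pool, profile and
fs names are examples; keep a replicated pool as the default data pool):

# Create an EC pool, allow overwrites, attach it as an additional data pool:
ceph osd pool create cephfs_ec_data 128 128 erasure my_ec_profile
ceph osd pool set cephfs_ec_data allow_ec_overwrites true
ceph fs add_data_pool thing cephfs_ec_data
# Then point a directory at it via a file layout:
setfattr -n ceph.dir.layout.pool -v cephfs_ec_data /mnt/cephfs/ec-data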

You can set the pg_num (and pgp_num) while you create a pool:

ceph:~ # ceph osd pool create testpool 16 16 replicated
pool 'testpool' created



Zitat von Albert Shih :


Le 25/01/2024 à 08:42:19+, Eugen Block a écrit

Hi,

it's really as easy as it sounds (fresh test cluster on 18.2.1 without any
pools yet):

ceph:~ # ceph fs volume create cephfs


Yes...I already try that with the label and works fine.

But I prefer to use «my» pools. Because I have ssd/hdd and want also try
«erasure coding» pool for the data.

I also need to set the pg_num and pgp_num (I know I can do that after the
creation).


So I manage to do ... half what I want...

In fact

  ceph fs volume create thing

will create two pools

  cephfs.thing.meta
  cephfs.thing.data

and if those pool already existe it will use them.

But that's only if the data are replicated no with erasure coding(maybe
I forget something config on the pool).

Well I will currently continue my test with replicated data.


The pools and the daemons are created automatically (you can control the
placement of the daemons with the --placement option). Note that the
metadata pool needs to be on fast storage, so you might need to change the
ruleset for the metadata pool after creation in case you have HDDs in place.
Changing pools after the creation can be done via ceph fs commands:

ceph:~ # ceph osd pool create cephfs_data2
pool 'cephfs_data2' created

ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
  Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not marked
as bulk.
  Consider setting the flag by running
# ceph osd pool set cephfs_data2 bulk true
added data pool 4 to fsmap

ceph:~ # ceph fs status
cephfs - 0 clients
==
RANK  STATE MDS   ACTIVITY DNSINOS   DIRS
CAPS
 0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13 12

   POOL   TYPE USED  AVAIL
cephfs.cephfs.meta  metadata  64.0k  13.8G
cephfs.cephfs.datadata   0   13.8G
   cephfs_data2   data   0   13.8G


You can't remove the default data pool, though (here it's
cephfs.cephfs.data). If you want to control the pool creation you can fall
back to the method you mentioned, create pools as you require them and then
create a new cephfs, and deploy the mds service.


Yes, but I'm guessing the

  ceph fs volume

are the «future» so it would be super nice to add (at least) the option to
choose the couple of pool...



I haven't looked too deep into changing the default pool yet, so there might
be a way to switch that as well.


Ok. I will also try but...well...newbie ;-)

Anyway thanks.

regards

--
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 25 janv. 2024 12:00:08 CET



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread David C.
Albert,
Never used EC for (root) data pool.

Le jeu. 25 janv. 2024 à 12:08, Albert Shih  a écrit :

> Le 25/01/2024 à 08:42:19+, Eugen Block a écrit
> > Hi,
> >
> > it's really as easy as it sounds (fresh test cluster on 18.2.1 without
> any
> > pools yet):
> >
> > ceph:~ # ceph fs volume create cephfs
>
> Yes...I already try that with the label and works fine.
>
> But I prefer to use «my» pools. Because I have ssd/hdd and want also try
> «erasure coding» pool for the data.
>

> I also need to set the pg_num and pgp_num (I know I can do that after the
> creation).


> So I manage to do ... half what I want...
>
> In fact
>
>   ceph fs volume create thing
>
> will create two pools
>
>   cephfs.thing.meta
>   cephfs.thing.data
>
> and if those pool already existe it will use them.
>
> But that's only if the data are replicated no with erasure coding(maybe
> I forget something config on the pool).
>
> Well I will currently continue my test with replicated data.
>
> > The pools and the daemons are created automatically (you can control the
> > placement of the daemons with the --placement option). Note that the
> > metadata pool needs to be on fast storage, so you might need to change
> the
> > ruleset for the metadata pool after creation in case you have HDDs in
> place.
> > Changing pools after the creation can be done via ceph fs commands:
> >
> > ceph:~ # ceph osd pool create cephfs_data2
> > pool 'cephfs_data2' created
> >
> > ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
> >   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not
> marked
> > as bulk.
> >   Consider setting the flag by running
> > # ceph osd pool set cephfs_data2 bulk true
> > added data pool 4 to fsmap
> >
> > ceph:~ # ceph fs status
> > cephfs - 0 clients
> > ==
> > RANK  STATE MDS   ACTIVITY DNSINOS   DIRS
> > CAPS
> >  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13 12
> > 0
> >POOL   TYPE USED  AVAIL
> > cephfs.cephfs.meta  metadata  64.0k  13.8G
> > cephfs.cephfs.datadata   0   13.8G
> >cephfs_data2   data   0   13.8G
> >
> >
> > You can't remove the default data pool, though (here it's
> > cephfs.cephfs.data). If you want to control the pool creation you can
> fall
> > back to the method you mentioned, create pools as you require them and
> then
> > create a new cephfs, and deploy the mds service.
>
> Yes, but I'm guessing the
>
>   ceph fs volume
>
> are the «future» so it would be super nice to add (at least) the option to
> choose the couple of pool...
>
> >
> > I haven't looked too deep into changing the default pool yet, so there
> might
> > be a way to switch that as well.
>
> Ok. I will also try but...well...newbie ;-)
>
> Anyway thanks.
>
> regards
>
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> jeu. 25 janv. 2024 12:00:08 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread Albert Shih
On 25/01/2024 at 08:42:19+, Eugen Block wrote:
> Hi,
> 
> it's really as easy as it sounds (fresh test cluster on 18.2.1 without any
> pools yet):
> 
> ceph:~ # ceph fs volume create cephfs

Yes... I already tried that with the label and it works fine.

But I prefer to use «my» pools, because I have SSD/HDD and also want to try
an «erasure coding» pool for the data.

I also need to set the pg_num and pgp_num (I know I can do that after the
creation).


So I managed to do... half of what I want...

In fact 

  ceph fs volume create thing 

will create two pools 

  cephfs.thing.meta
  cephfs.thing.data

and if those pools already exist, it will use them.

But that only works if the data pool is replicated, not with erasure coding
(maybe I'm missing some configuration on the pool).

Well, for now I will continue my tests with replicated data.

> The pools and the daemons are created automatically (you can control the
> placement of the daemons with the --placement option). Note that the
> metadata pool needs to be on fast storage, so you might need to change the
> ruleset for the metadata pool after creation in case you have HDDs in place.
> Changing pools after the creation can be done via ceph fs commands:
> 
> ceph:~ # ceph osd pool create cephfs_data2
> pool 'cephfs_data2' created
> 
> ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
>   Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not marked
> as bulk.
>   Consider setting the flag by running
> # ceph osd pool set cephfs_data2 bulk true
> added data pool 4 to fsmap
> 
> ceph:~ # ceph fs status
> cephfs - 0 clients
> ==
> RANK  STATE MDS   ACTIVITY DNSINOS   DIRS
> CAPS
>  0active  cephfs.soc9-ceph.uqcybj  Reqs:0 /s10 13 12
> 0
>POOL   TYPE USED  AVAIL
> cephfs.cephfs.meta  metadata  64.0k  13.8G
> cephfs.cephfs.datadata   0   13.8G
>cephfs_data2   data   0   13.8G
> 
> 
> You can't remove the default data pool, though (here it's
> cephfs.cephfs.data). If you want to control the pool creation you can fall
> back to the method you mentioned, create pools as you require them and then
> create a new cephfs, and deploy the mds service.

Yes, but I'm guessing that

  ceph fs volume

is the «future», so it would be super nice to add (at least) an option to
choose the pair of pools...

> 
> I haven't looked too deep into changing the default pool yet, so there might
> be a way to switch that as well.

Ok. I will also try but...well...newbie ;-)

Anyway thanks. 

regards

-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 25 janv. 2024 12:00:08 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the CRUSH details

2024-01-25 Thread Henry lol
It's reasonable enough.
Actually, I expected the client to hold just (?) thousands of
"PG-to-OSD" mappings. Nevertheless, it's heavy enough that the client
calculates locations on demand, right?

If a client with an outdated map sends a request to the wrong OSD,
does the OSD handle it somehow, through redirection or something?

Lastly, are factors beyond the CRUSH map, such as storage usage, also
considered when doing CRUSH? It seems that the target OSD set isn't
deterministic given the map alone.
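
(Side note: the mapping a client computes for a given object can be inspected
with the ceph osd map helper, which prints the PG id and the up/acting OSD
sets for that object; the pool and object names below are just examples.)

  ceph osd map cephfs_data someobject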

On Thu, Jan 25, 2024 at 4:42 PM, Janne Johansson wrote:
>
> On Thu, Jan 25, 2024 at 03:05, Henry lol wrote:
> >
> > Do you mean object location (osds) is initially calculated only using its
> > name and crushmap,
> > and then the result is reprocessed with the map of the PGs?
> >
> > and I'm still skeptical about computation on the client-side.
> > is it possible to obtain object location without computation on the client
> > because ceph-mon already updates that information to PG map?
>
> The client should not need to contact the mon for each object access
> and every client can't have a complete list of millions of objects in
> the cluster, so it does client-side computations.
>
> The mon connection will more or less only require new updates if/when
> OSDs change weight or goes in/out. This way, clients can run on
> "autopilot" even if all mons are down, as long as OSD states don't
> change.
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrubbing?

2024-01-25 Thread Sridhar Seshasayee
Hello Jan,


> Meaning of my previous post was, that CEPH cluster didn't fulfill
> my needs and, although I had set mClock profile to
> "high_client_ops" (because I have a plenty of time to rebalancing
> and scrubbing), my clients went to problems.
>

As far as the question around mClock is concerned, there are further
improvements in the works to handle QoS between client ops and
background scrub ops. This should help address the issue you are
currently facing. See PR: https://github.com/ceph/ceph/pull/51171
for more information.

Also, it would be helpful to know the Ceph version you are currently using.
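
For example, both the version and the active mClock profile can be gathered
with the commands below (the second option exists on Quincy and later;
adjust as needed):

  ceph versions
  ceph config get osd osd_mclock_profile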

-Sridhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stupid question about ceph fs volume

2024-01-25 Thread Eugen Block

Hi,

it's really as easy as it sounds (fresh test cluster on 18.2.1 without  
any pools yet):


ceph:~ # ceph fs volume create cephfs

(wait a minute or two)

ceph:~ # ceph fs status
cephfs - 0 clients
==
RANK  STATE            MDS               ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs.soc9-ceph.uqcybj  Reqs:    0 /s    10     13     12      0
        POOL           TYPE     USED  AVAIL
cephfs.cephfs.meta   metadata  64.0k  13.8G
cephfs.cephfs.data     data       0  13.8G
      STANDBY MDS
cephfs.soc9-ceph.cgkvrf
MDS version: ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)


The pools and the daemons are created automatically (you can control  
the placement of the daemons with the --placement option). Note that  
the metadata pool needs to be on fast storage, so you might need to  
change the ruleset for the metadata pool after creation in case you  
have HDDs in place.

Changing pools after the creation can be done via ceph fs commands:

ceph:~ # ceph osd pool create cephfs_data2
pool 'cephfs_data2' created

ceph:~ # ceph fs add_data_pool cephfs cephfs_data2
  Pool 'cephfs_data2' (id '4') has pg autoscale mode 'on' but is not  
marked as bulk.

  Consider setting the flag by running
# ceph osd pool set cephfs_data2 bulk true
added data pool 4 to fsmap

ceph:~ # ceph fs status
cephfs - 0 clients
==
RANK  STATE            MDS               ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  cephfs.soc9-ceph.uqcybj  Reqs:    0 /s    10     13     12      0
        POOL           TYPE     USED  AVAIL
cephfs.cephfs.meta   metadata  64.0k  13.8G
cephfs.cephfs.data     data       0  13.8G
   cephfs_data2        data       0  13.8G


You can't remove the default data pool, though (here it's  
cephfs.cephfs.data). If you want to control the pool creation you can  
fall back to the method you mentioned, create pools as you require  
them and then create a new cephfs, and deploy the mds service.


I haven't looked too deep into changing the default pool yet, so there  
might be a way to switch that as well.


Regards,
Eugen


Quoting Albert Shih:


Hi everyone,

Stupid question about

  ceph fs volume create

how can I specify the metadata pool and the data pool ?

I was able to create a cephfs «manually» with something like

  ceph fs new vo cephfs_metadata cephfs_data

but as I understand the documentation, with this method I need to deploy
the mds, and the «new» way to do it is to use ceph fs volume.

But with ceph fs volume I didn't find any documentation of how to set the
metadata/data pool

I also didn't find any way to change the pools after the creation of the
volume.

Thanks

--
Albert SHIH 嶺 
France
Heure locale/Local time:
mer. 24 janv. 2024 19:24:23 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrubbing?

2024-01-25 Thread Jan Marek
Hello Peter,

your irony is perfect, it is worth noticing.

The meaning of my previous post was that the Ceph cluster didn't fulfill
my needs and, although I had set the mClock profile to
"high_client_ops" (because I have plenty of time for rebalancing
and scrubbing), my clients ran into problems.

And the question was whether the scheduler manages the Ceph cluster's
background (and client) operations in such a way that the cluster is
still usable for clients.
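
A common workaround, independent of the scheduler, is to confine deep scrubs
to off-peak hours; the option names below have existed for a long time, but
please verify them against the running release:

  ceph config set osd osd_scrub_begin_hour 22
  ceph config set osd osd_scrub_end_hour 6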

I've tried to send feedback to developers.

Thanks for understanding.

Sincerely
Jan Marek

On Wed, Jan 24, 2024 at 11:18:20 CET, Peter Grandi wrote:
> > [...] After a few days, I have on our OSD nodes around 90MB/s
> > read and 70MB/s write while 'ceph -s' have client io as
> > 2,5MB/s read and 50MB/s write. [...]
> 
> This is one of my pet-peeves: that a storage system must have
> capacity (principally IOPS) to handle both a maintenance
> workload and a user workload, and since the former often
> involves whole-storage or whole-metadata operations it can be
> quite heavy, especially in the case of Ceph where rebalancing
> and scrubbing and checking should be fairly frequent to detect
> and correct inconsistencies.
> 
> > Is this activity OK? [...]
> 
> Indeed. Some "clever" people "save money" by "rightsizing" their
> storage so it cannot run at the same time the maintenance and
> the user workload, and so turn off the maintenance workload,
> because they "feel lucky" I guess, but I do not recommend that.
> :-). I have seen more than one Ceph cluster that did not have
> the capacity even to run *just* the maintenance workload.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
Ing. Jan Marek
University of South Bohemia
Academic Computer Centre
Phone: +420389032080
http://www.gnu.org/philosophy/no-word-attachments.cs.html


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io