[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2023-05-29 Thread 胡 玮文
Hi Dan,

We also experienced very high network usage and memory pressure with our 
machine learning workload. This patch [1] (currently testing, may be merged in 
6.5) may fix it. See [2] for more about my experiments with this issue.

[1]: 
https://lkml.kernel.org/ceph-devel/20230515012044.98096-1-xiu...@redhat.com/T/#t
[2]: 
https://lore.kernel.org/ceph-devel/20230504082510.247-1-seh...@mail.scut.edu.cn

Weiwen Hu

On 30 May 2023, at 02:26, Dan van der Ster wrote:

Hi,

Sorry for poking this old thread, but does this issue still persist in
the 6.3 kernels?

Cheers, Dan

__
Clyso GmbH | https://www.clyso.com


On Wed, Dec 7, 2022 at 3:42 AM William Edwards  wrote:


On 7 Dec 2022 at 11:59, Stefan Kooman wrote the following:

On 5/13/22 09:38, Xiubo Li wrote:
On 5/12/22 12:06 AM, Stefan Kooman wrote:
Hi List,

We have quite a few linux kernel clients for CephFS. One of our customers has 
been running mainline kernels (CentOS 7 elrepo) for the past two years. They 
started out with 3.x kernels (default CentOS 7), but upgraded to mainline when 
those kernels would frequently generate MDS warnings like "failing to respond 
to capability release". That worked fine until 5.14 kernel. 5.14 and up would 
use a lot of CPU and *way* more bandwidth on CephFS than older kernels (order 
of magnitude). After the MDS was upgraded from Nautilus to Octopus that 
behavior is gone (comparable CPU / bandwidth usage as older kernels). However, 
the newer kernels are now the ones that give "failing to respond to capability 
release", and worse, clients get evicted (unresponsive as far as the MDS is 
concerned). Even the latest 5.17 kernels have that. No difference is observed 
between using messenger v1 or v2. MDS version is 15.2.16.
Surprisingly the latest stable kernels from CentOS 7 work flawlessly now. 
Although that is good news, newer operating systems come with newer kernels.

Does anyone else observe the same behavior with newish kernel clients?
There are some known bugs, which have been fixed or are still being fixed, 
even in mainline, and I'm not sure whether they are related; such as 
[1][2][3][4]. For more detail please see the ceph-client repo testing branch [5].

None of the issues you mentioned were related. We gained some more experience 
with newer kernel clients, specifically on Ubuntu Focal / Jammy (5.15). 
Performance issues seem to arise in certain workloads, specifically 
load-balanced Apache shared web hosting clusters with CephFS. We have tested 
linux kernel clients from 5.8 up to and including 6.0 with a production 
workload and the short summary is:

< 5.13, everything works fine
5.13 and up is giving issues

I see this issue on 6.0.0 as well.


We tested 5.13-rc1 as well, and already that kernel is giving issues. So 
something changed in 5.13 that results in a performance regression in certain 
workloads. And I wonder if it has something to do with the changes related to 
fscache that have happened, and are still happening, in the kernel. These web 
servers might access the same directories / files concurrently.

Note: we have quite a few 5.15 kernel clients not doing any (load-balanced) web 
based workload (container clusters on CephFS) that don't have any performance 
issue running these kernels.

Issue: poor CephFS performance
Symptom / result: excessive CephFS network usage (an order of magnitude higher 
than for older kernels not having this issue); within a minute there are a 
bunch of slow web service processes claiming loads of virtual memory, which 
results in heavy swap usage, basically rendering the node unusably slow.

Other users that replied to this thread experienced similar symptoms. It is 
reproducible on both CentOS (EPEL mainline kernels) and Ubuntu (HWE as well as 
the default release kernel).

MDS version used: 15.2.16 (with a backported patch from 15.2.17) (single active 
/ standby-replay)

Does this ring a bell?

Gr. Stefan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2023-05-29 Thread Dan van der Ster
Hi,

Sorry for poking this old thread, but does this issue still persist in
the 6.3 kernels?

Cheers, Dan

__
Clyso GmbH | https://www.clyso.com


On Wed, Dec 7, 2022 at 3:42 AM William Edwards  wrote:
>
>
> > On 7 Dec 2022 at 11:59, Stefan Kooman wrote the following:
> >
> > On 5/13/22 09:38, Xiubo Li wrote:
> >>> On 5/12/22 12:06 AM, Stefan Kooman wrote:
> >>> Hi List,
> >>>
> >>> We have quite a few linux kernel clients for CephFS. One of our customers 
> >>> has been running mainline kernels (CentOS 7 elrepo) for the past two 
> >>> years. They started out with 3.x kernels (default CentOS 7), but upgraded 
> >>> to mainline when those kernels would frequently generate MDS warnings 
> >>> like "failing to respond to capability release". That worked fine until 
> >>> 5.14 kernel. 5.14 and up would use a lot of CPU and *way* more bandwidth 
> >>> on CephFS than older kernels (order of magnitude). After the MDS was 
> >>> upgraded from Nautilus to Octopus that behavior is gone (comparable CPU / 
> >>> bandwidth usage as older kernels). However, the newer kernels are now the 
> >>> ones that give "failing to respond to capability release", and worse, 
> >>> clients get evicted (unresponsive as far as the MDS is concerned). Even 
> >>> the latest 5.17 kernels have that. No difference is observed between 
> >>> using messenger v1 or v2. MDS version is 15.2.16.
> >>> Surprisingly the latest stable kernels from CentOS 7 work flawlessly now. 
> >>> Although that is good news, newer operating systems come with newer 
> >>> kernels.
> >>>
> >>> Does anyone else observe the same behavior with newish kernel clients?
> >> There are some known bugs, which have been fixed or are still being fixed, 
> >> even in mainline, and I'm not sure whether they are related; such as 
> >> [1][2][3][4]. For more detail please see the ceph-client repo testing 
> >> branch [5].
> >
> > None of the issues you mentioned were related. We gained some more 
> > experience with newer kernel clients, specifically on Ubuntu Focal / Jammy 
> > (5.15). Performance issues seem to arise in certain workloads, specifically 
> > load-balanced Apache shared web hosting clusters with CephFS. We have 
> > tested linux kernel clients from 5.8 up to and including 6.0 with a 
> > production workload and the short summary is:
> >
> > < 5.13, everything works fine
> > 5.13 and up is giving issues
>
> I see this issue on 6.0.0 as well.
>
> >
> > We tested 5.13-rc1 as well, and already that kernel is giving issues. 
> > So something changed in 5.13 that results in a performance regression in 
> > certain workloads. And I wonder if it has something to do with the changes 
> > related to fscache that have happened, and are still happening, in the 
> > kernel. These web servers might access the same directories / files concurrently.
> >
> > Note: we have quite a few 5.15 kernel clients not doing any (load-balanced) 
> > web based workload (container clusters on CephFS) that don't have any 
> > performance issue running these kernels.
> >
> > Issue: poor CephFS performance
> > Symptom / result: excessive CephFS network usage (an order of magnitude higher 
> > than for older kernels not having this issue); within a minute there are a 
> > bunch of slow web service processes claiming loads of virtual memory, which 
> > results in heavy swap usage, basically rendering the node unusably slow.
> >
> > Other users that replied to this thread experienced similar symptoms. It is 
> > reproducible on both CentOS (EPEL mainline kernels) as well as on Ubuntu 
> > (HWE as well as the default release kernel).
> >
> > MDS version used: 15.2.16 (with a backported patch from 15.2.17) (single 
> > active / standby-replay)
> >
> > Does this ring a bell?
> >
> > Gr. Stefan
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Pacific - mds How to know how many sequences still to be replayed

2023-05-29 Thread Emmanuel Jaep
Hi,

I just restarted one of our mds servers. I can find some "progress" in logs
as below:
mds.beacon.icadmin006 Sending beacon up:replay seq 461
mds.beacon.icadmin006 received beacon reply up:replay seq 461 rtt 0

How do I know how long the sequence is (i.e. when the node will have finished
replaying)?

Best,

Emmanuel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: BlueStore fragmentation woes

2023-05-29 Thread Hector Martin
On 29/05/2023 22.26, Igor Fedotov wrote:
> So fragmentation score calculation was improved recently indeed, see 
> https://github.com/ceph/ceph/pull/49885
> 
> 
> And yeah, one can see some fragmentation in allocations for the first two
> OSDs. It doesn't look as dramatic as the fragmentation scores suggest, though.
> 
> 
> Additionally you might want to collect a free extents dump using the 'ceph
> tell osd.N bluestore allocator dump block' command and do more
> analysis on these data.
> 
> E.g. I'd recommend building something like a histogram showing the amount of
> chunks per size range:
> 
> [1-4K]: N1 chunks
> 
> (4K-16K]: N2 chunks
> 
> (16K-64K]: N3 chunks
> 
> ...
> 
> [16M-inf) : Nn chunks
> 
> 
> This should be even more informative about fragmentation state -
> particularly if observed in evolution.
> 
> Looking for volunteers to write a script for building such a histogram... ;)

I'm up for that, once I get through some other cluster maintenance I
need to deal with first :)

Backfill is almost done and I was finally able to destroy two OSDs, will
be doing a bunch of restructuring in the coming weeks. I can probably
get the script done partway through doing this, so I can see how the
distributions evolve over a bunch of data movement.

> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 28/05/2023 08:31, Hector Martin wrote:
>> So chiming in, I think something is definitely wrong with at *least* the
>> frag score.
>>
>> Here's what happened so far:
>>
>> 1. I had 8 OSDs (all 8T HDDs)
>> 2. I added 2 more (osd.0,1) , with Quincy defaults
>> 3. I marked 2 old ones out (the ones that seemed to be struggling the
>> most with IOPS)
>> 4. I added 2 more (osd.2,3), but this time I had previously set
>> bluestore_min_alloc_size_hdd to 16K as an experiment
>>
>> This has all happened in the space of a ~week. That means there was data
>> movement into the first 2 new OSDs, then before that completed I added 2
>> new OSDs. So I would expect some data thashing on the first 2, but
>> nothing extreme.
>>
>> The fragmentation scores for the 4 new OSDs are, respectively:
>>
>> 0.746, 0.835, 0.160, 0.067
>>
>> That seems ridiculous for the first two, it's only been a week. The
>> newest two seem in better shape, though those mostly would've seen only
>> data moving in, not out. The rebalance isn't done yet, but it's almost
>> done and all 4 OSDs have a similar fullness level at this time.
>>
>> Looking at alloc stats:
>>
>> ceph-0)  allocation stats probe 6: cnt: 2219302 frags: 2328003 size:
>> 1238454677504
>> ceph-0)  probe -1: 1848577,  1970325, 1022324588544
>> ceph-0)  probe -2: 848301,  862622, 505329963008
>> ceph-0)  probe -6: 2187448,  2187448, 1055241568256
>> ceph-0)  probe -14: 0,  0, 0
>> ceph-0)  probe -22: 0,  0, 0
>>
>> ceph-1)  allocation stats probe 6: cnt: 1882396 frags: 1947321 size:
>> 1054829641728
>> ceph-1)  probe -1: 2212293,  2345923, 1215418728448
>> ceph-1)  probe -2: 1471623,  1525498, 826984652800
>> ceph-1)  probe -6: 2095298,  2095298, 165933312
>> ceph-1)  probe -14: 0,  0, 0
>> ceph-1)  probe -22: 0,  0, 0
>>
>> ceph-2)  allocation stats probe 3: cnt: 2760200 frags: 2760200 size:
>> 1554513903616
>> ceph-2)  probe -1: 2584046,  2584046, 1498140393472
>> ceph-2)  probe -3: 1696921,  1696921, 869424496640
>> ceph-2)  probe -7: 0,  0, 0
>> ceph-2)  probe -11: 0,  0, 0
>> ceph-2)  probe -19: 0,  0, 0
>>
>> ceph-3)  allocation stats probe 3: cnt: 2544818 frags: 2544818 size:
>> 1432225021952
>> ceph-3)  probe -1: 2688015,  2688015, 1515260739584
>> ceph-3)  probe -3: 1086875,  1086875, 622025424896
>> ceph-3)  probe -7: 0,  0, 0
>> ceph-3)  probe -11: 0,  0, 0
>> ceph-3)  probe -19: 0,  0, 0
>>
>> So OSDs 2 and 3 (the latest ones to be added, note that these 4 new OSDs
>> are 0-3 since those IDs were free) are in good shape, but 0 and 1 are
>> already suffering from at least some fragmentation of objects, which is
>> a bit worrying when they are only ~70% full right now and only a week old.
>>
>> I did delete a couple million small objects during the rebalance to try
>> to reduce load (I had some nasty directories), but that was cumulatively
>> only about 60GB of data. So while that could explain a high frag score
>> if there are now a million little holes in the free space map of the
>> OSDs (how is it calculated?), it should not actually cause new data
>> moving in to end up fragmented since there should be plenty of
>> unfragmented free space going around still.
>>
>> I am now restarting OSDs 0 and 1 to see whether that makes the frag
>> score go down over time. I will do further analysis later with the raw
>> bluestore free space map, since I still have a bunch of rebalancing and
>> moving data around planned (I'm moving my cluster to new machines).
>>
>> On 26/05/2023 00.29, Igor Fedotov wrote:
>>> Hi Hector,
>>>
>>> I can advise two tools for further fragmentation analysis:
>>>
>>> 1) One might want to use ceph-bluestore-tool's free-dump command to get 
>>> a list of free chunks for an OSD and try to 

[ceph-users] Re: BlueStore fragmentation woes

2023-05-29 Thread Igor Fedotov

Hi Stefan,

Given that allocation probes include every allocation (including short 
4K ones), your stats look pretty high indeed.


Although you omitted the historic probes, so it's hard to tell whether there 
is a negative trend.


As I mentioned in my reply to Hector, one might want to investigate further 
by e.g. building a histogram (chunk size, number of chunks) using the output 
from the 'ceph tell osd.N bluestore allocator dump block' command and 
monitoring how it evolves over time. A script to build such a histogram is 
still to be written. ;)
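
A minimal sketch of such a histogram builder (assuming the dump has been saved 
with something like `ceph tell osd.N bluestore allocator dump block > dump.json`, 
and that it contains an "extents" list whose entries carry offset/length values 
as hex strings or integers; the key names are assumptions to check against the 
actual dump):

```python
#!/usr/bin/env python3
# Sketch: bucket free-extent sizes from a BlueStore allocator dump into a histogram.
# Assumes the dump JSON contains an "extents" list whose entries have a "length"
# field (hex string or integer); adjust key names if your release differs.
import json
import sys

BOUNDS = [4096, 16384, 65536, 262144, 1048576, 4194304, 16777216]  # 4K .. 16M
LABELS = ["[1-4K]", "(4K-16K]", "(16K-64K]", "(64K-256K]",
          "(256K-1M]", "(1M-4M]", "(4M-16M]", "[16M-inf)"]

def to_int(value):
    """Accept decimal integers or hex strings like '0x10000'."""
    return value if isinstance(value, int) else int(value, 0)

def histogram(extents):
    counts = [0] * len(LABELS)
    for extent in extents:
        length = to_int(extent["length"])
        for i, bound in enumerate(BOUNDS):
            if length <= bound:
                counts[i] += 1
                break
        else:
            counts[-1] += 1          # larger than 16M
    return counts

if __name__ == "__main__":
    dump = json.load(open(sys.argv[1]))
    # Some dump layouts may nest the extent list one level down; try both.
    extents = dump.get("extents") or dump.get("block", {}).get("extents", [])
    for label, count in zip(LABELS, histogram(extents)):
        print(f"{label}: {count} chunks")
```

Running it against dumps taken a few days apart should show whether the 
small-chunk buckets grow over time, which is the evolution of interest here.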



As for the Pacific release being the culprit - likely it is. But there were 
two major updates which could have had an impact. Both came in the same PR 
(https://github.com/ceph/ceph/pull/34588):


1. 4K allocation unit for spinners

2. Switch to avl/hybrid allocator.

Honestly I'd rather bet on 1.

> BlueFS 4K allocation unit will not be backported to Pacific [3]. Would 
> it make sense to skip re-provisioning OSDs in Pacific altogether and do 
> re-provisioning in the Quincy release with BlueFS 4K alloc size support [4]?


IIRC this feature doesn't require OSD redeployment - new superblock 
format is applied on-the-fly and 4K allocations are enabled immediately. 
So there is no specific requirement to re-provision OSD at Quincy+. 
Hence you're free to go with Pacific and enable 4K for BlueFS later in 
Quincy.



Thanks,

Igor

On 26/05/2023 16:03, Stefan Kooman wrote:

On 5/25/23 22:12, Igor Fedotov wrote:


On 25/05/2023 20:36, Stefan Kooman wrote:

On 5/25/23 18:17, Igor Fedotov wrote:

Perhaps...

I don't like the idea of using the fragmentation score as a real index. 
IMO it's mostly a very imprecise first-pass marker to alert 
that something might be wrong, not a real quantitative 
high-quality estimate.


Chiming in on the high fragmentation issue. We started collecting the 
"fragmentation_rating" of each OSD this afternoon. All OSDs that 
were provisioned a year ago have a fragmentation rating of ~0.9. 
Not sure for how long they have been at this level.


Could you please collect allocation probes from existing OSD logs? 
Just a few samples from different OSDs...


10 OSDs from one host, but I have checked other nodes and they are 
similar:


CNT    FRAG    Size (bytes)    Ratio    Avg frag size (bytes)
21350923    37146899    317040259072    1.73982637659271 8534.77053554322
20951932    38122769    317841477632    1.8195347808498 8337.31352599283
21188454    37298950    278389411840    1.76034315670223 7463.73321072041
21605451    39369462    270427185152    1.82220042525379 6868.95810646333
19215230    36063713    290967818240    1.87682962941375 8068.16032059705
19293599    35464928    269238423552    1.83817068033807 7591.68109835159
19963538    36088151    315796836352    1.80770317365589 8750.70702159277
18030613    31753098    297826177024    1.76106591606176 9379.43683554909
17889602    31718012    299550142464    1.77298589426417 9444.16511551859
18475332    33264944    266053271552    1.80050588536109 7998.0074985847
18618154    31914219    254801883136    1.71414518324427 7983.96110323113
16437108    29421873    275350355968    1.78996651965784 9358.69568766067
17164338    28605353    249404649472    1.66655731202683 8718.81040838755
17895480    29658102    309047177216    1.65729569701399 10420.3288941416
19546560    34588509    301368737792    1.76954456436324 8712.97279081905
18525784    34806856    314875801600    1.87883309014075 9046.37297893266
18550989    35236438    273069948928    1.89943716747393 7749.64679823767
19085807    34605572    255512043520    1.81315738967705 7383.55209155335
17203820    31205542    277097357312    1.81387284916954 8879.74826112618
18003801    33723670    269696761856    1.87314167713807 7997.25420916525
18655425    33227176    306511810560    1.78109992133655 9224.7325069094
26380965    45627920    33528040    1.72957736762093 7348.15680925188
24923956    44721109    328790982656    1.79430219664968 7352.03106559813
25312482    43035393    287792226304    1.70016488308021 6687.33817079351
25841471    46276699    288168476672    1.79079197929561 6227.07502693742
25618384    43785917    321591488512    1.70915999229303 7344.63294469772
26006097    45056206    298747666432    1.73252472295247 6630.55532088077
26684805    45196730    351100243968    1.69372532420604 7768.26650883814
24025872    42450135    353265467392    1.76685095966548 8321.89267223768
24080466    45510525    371726323712    1.88993539410741 8167.91991988666
23195936    45095051    326473826304    1.94409274969546 7239.68193990955
23653302    43312705    307549573120    1.83114835298683 7100.67803707942
21589455    40034670    322982109184    1.85436223378497 8067.56017182107
22469039    42042723    314323701760    1.87114023879704 7476.29266924504
23647633    43486098    370003841024    1.83891969230071 8508.55464254346
23750561    37387139    320471453696    1.57415814304344 8571.70305799542
23142315    38640274    329341046784    1.66968058294946 8523.25857689312
23539469 
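
For reference, a minimal sketch that derives the Ratio (FRAG/CNT) and average 
fragment size (Size/FRAG) columns above straight from the "allocation stats 
probe" lines in an OSD log; the log path is a placeholder, not a value from 
this thread:

```python
#!/usr/bin/env python3
# Sketch: recompute Ratio (frags/cnt) and average fragment size (size/frags)
# from OSD log lines of the form
#   "allocation stats probe N: cnt: <cnt> frags: <frags> size: <size>"
import re
import sys

PROBE_RE = re.compile(r"allocation stats probe \d+: cnt: (\d+) frags: (\d+) size: (\d+)")

def main(path):
    print(f"{'CNT':>12} {'FRAG':>12} {'Size':>16} {'Ratio':>10} {'Avg frag':>12}")
    with open(path) as log:
        for line in log:
            match = PROBE_RE.search(line)
            if not match:
                continue
            cnt, frags, size = (int(x) for x in match.groups())
            if cnt == 0 or frags == 0:
                continue  # skip empty probes
            print(f"{cnt:>12} {frags:>12} {size:>16} "
                  f"{frags / cnt:>10.4f} {size / frags:>12.2f}")

if __name__ == "__main__":
    # Placeholder path: point this at the ceph-osd log you want to analyse.
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-osd.0.log")
```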

[ceph-users] Re: BlueStore fragmentation woes

2023-05-29 Thread Igor Fedotov
So fragmentation score calculation was improved recently indeed, 
see https://github.com/ceph/ceph/pull/49885



And yeah, one can see some fragmentation in allocations for the first two 
OSDs. It doesn't look as dramatic as the fragmentation scores suggest, though.



Additionally you might want to collect a free extents dump using the 'ceph 
tell osd.N bluestore allocator dump block' command and do more 
analysis on these data.


E.g. I'd recommend building something like a histogram showing the amount of 
chunks per size range:


[1-4K]: N1 chunks

(4K-16K]: N2 chunks

(16K-64K]: N3 chunks

...

[16M-inf) : Nn chunks


This should be even more informative about fragmentation state - 
particularly if observed in evolution.


Looking for volunteers to write a script for building such a histogram... ;)


Thanks,

Igor


On 28/05/2023 08:31, Hector Martin wrote:

So chiming in, I think something is definitely wrong with at *least* the
frag score.

Here's what happened so far:

1. I had 8 OSDs (all 8T HDDs)
2. I added 2 more (osd.0,1) , with Quincy defaults
3. I marked 2 old ones out (the ones that seemed to be struggling the
most with IOPS)
4. I added 2 more (osd.2,3), but this time I had previously set
bluestore_min_alloc_size_hdd to 16K as an experiment

This has all happened in the space of a ~week. That means there was data
movement into the first 2 new OSDs, then before that completed I added 2
new OSDs. So I would expect some data thrashing on the first 2, but
nothing extreme.

The fragmentation scores for the 4 new OSDs are, respectively:

0.746, 0.835, 0.160, 0.067

That seems ridiculous for the first two, it's only been a week. The
newest two seem in better shape, though those mostly would've seen only
data moving in, not out. The rebalance isn't done yet, but it's almost
done and all 4 OSDs have a similar fullness level at this time.

Looking at alloc stats:

ceph-0)  allocation stats probe 6: cnt: 2219302 frags: 2328003 size:
1238454677504
ceph-0)  probe -1: 1848577,  1970325, 1022324588544
ceph-0)  probe -2: 848301,  862622, 505329963008
ceph-0)  probe -6: 2187448,  2187448, 1055241568256
ceph-0)  probe -14: 0,  0, 0
ceph-0)  probe -22: 0,  0, 0

ceph-1)  allocation stats probe 6: cnt: 1882396 frags: 1947321 size:
1054829641728
ceph-1)  probe -1: 2212293,  2345923, 1215418728448
ceph-1)  probe -2: 1471623,  1525498, 826984652800
ceph-1)  probe -6: 2095298,  2095298, 165933312
ceph-1)  probe -14: 0,  0, 0
ceph-1)  probe -22: 0,  0, 0

ceph-2)  allocation stats probe 3: cnt: 2760200 frags: 2760200 size:
1554513903616
ceph-2)  probe -1: 2584046,  2584046, 1498140393472
ceph-2)  probe -3: 1696921,  1696921, 869424496640
ceph-2)  probe -7: 0,  0, 0
ceph-2)  probe -11: 0,  0, 0
ceph-2)  probe -19: 0,  0, 0

ceph-3)  allocation stats probe 3: cnt: 2544818 frags: 2544818 size:
1432225021952
ceph-3)  probe -1: 2688015,  2688015, 1515260739584
ceph-3)  probe -3: 1086875,  1086875, 622025424896
ceph-3)  probe -7: 0,  0, 0
ceph-3)  probe -11: 0,  0, 0
ceph-3)  probe -19: 0,  0, 0

So OSDs 2 and 3 (the latest ones to be added, note that these 4 new OSDs
are 0-3 since those IDs were free) are in good shape, but 0 and 1 are
already suffering from at least some fragmentation of objects, which is
a bit worrying when they are only ~70% full right now and only a week old.

I did delete a couple million small objects during the rebalance to try
to reduce load (I had some nasty directories), but that was cumulatively
only about 60GB of data. So while that could explain a high frag score
if there are now a million little holes in the free space map of the
OSDs (how is it calculated?), it should not actually cause new data
moving in to end up fragmented since there should be plenty of
unfragmented free space going around still.

I am now restarting OSDs 0 and 1 to see whether that makes the frag
score go down over time. I will do further analysis later with the raw
bluestore free space map, since I still have a bunch of rebalancing and
moving data around planned (I'm moving my cluster to new machines).

On 26/05/2023 00.29, Igor Fedotov wrote:

Hi Hector,

I can advise two tools for further fragmentation analysis:

1) One might want to use ceph-bluestore-tool's free-dump command to get
a list of free chunks for an OSD and try to analyze whether it's really
highly fragmented and lacks long enough extents. free-dump just returns
a list of extents in JSON format; I can take a look at the output if
shared...

2) You might want to look for allocation probes in OSD logs and see how
fragmentation in allocated chunks has evolved.

E.g.

allocation stats probe 33: cnt: 8148921 frags: 10958186 size: 1704348508>
probe -1: 35168547,  46401246, 1199516209152
probe -3: 27275094,  35681802, 200121712640
probe -5: 34847167,  52539758, 271272230912
probe -9: 44291522,  60025613, 523997483008
probe -17: 10646313,  10646313, 155178434560

The first probe refers to the last day while others match days (or
rather probes) -1, -3, -5, -9, -17

'cnt' 

[ceph-users] Re: Recoveries without any misplaced objects?

2023-05-29 Thread Hector Martin
On 29/05/2023 20.55, Anthony D'Atri wrote:
> Check the uptime for the OSDs in question

I restarted all my OSDs within the past 10 days or so. Maybe OSD
restarts are somehow breaking these stats?

> 
>> On May 29, 2023, at 6:44 AM, Hector Martin  wrote:
>>
>> Hi,
>>
>> I'm watching a cluster finish a bunch of backfilling, and I noticed that
>> quite often PGs end up with zero misplaced objects, even though they are
>> still backfilling.
>>
>> Right now the cluster is down to 6 backfilling PGs:
>>
>>  data:
>>volumes: 1/1 healthy
>>pools:   6 pools, 268 pgs
>>objects: 18.79M objects, 29 TiB
>>usage:   49 TiB used, 25 TiB / 75 TiB avail
>>pgs: 262 active+clean
>> 6   active+remapped+backfilling
>>
>> But there are no misplaced objects, and the misplaced column in `ceph pg
>> dump` is zero for all PGs.
>>
>> If I do a `ceph pg dump_json`, I can see `num_objects_recovered`
>> increasing for these PGs... but the misplaced count is still 0.
>>
>> Is there something else that would cause recoveries/backfills other than
>> misplaced objects? Or perhaps there is a bug somewhere causing the
>> misplaced object count to be misreported as 0 sometimes?
>>
>> # ceph -v
>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
>> (stable)
>>
>> - Hector
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 

- Hector
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-29 Thread Milind Changire
An MDS-wide lock is acquired before the cache dump is done.
After the dump is complete, the lock is released.

So, the MDS freezing temporarily during the cache dump is expected.


On Fri, May 26, 2023 at 12:51 PM Emmanuel Jaep 
wrote:

> Hi Milind,
>
> I finally managed to dump the cache and find the file.
> It generated a 1.5 GB file with about 7 million lines. It's kind of hard to
> know what is out of the ordinary…
>
> Furthermore, I noticed that dumping the cache was actually stopping the
> MDS. Is it a normal behavior?
>
> Best,
>
> Emmanuel
>
> On Thu, May 25, 2023 at 1:19 PM Milind Changire 
> wrote:
>
>> try the command with the --id argument:
>>
>> # ceph --id admin --cluster floki daemon mds.icadmin011 dump cache
>> /tmp/dump.txt
>>
>> I presume that your keyring has an appropriate entry for the client.admin
>> user
>>
>>
>> On Wed, May 24, 2023 at 5:10 PM Emmanuel Jaep 
>> wrote:
>>
>>> Absolutely! :-)
>>>
>>> root@icadmin011:/tmp# ceph --cluster floki daemon mds.icadmin011 dump
>>> cache /tmp/dump.txt
>>> root@icadmin011:/tmp# ll
>>> total 48
>>> drwxrwxrwt 12 root root 4096 May 24 13:23  ./
>>> drwxr-xr-x 18 root root 4096 Jun  9  2022  ../
>>> drwxrwxrwt  2 root root 4096 May  4 12:43  .ICE-unix/
>>> drwxrwxrwt  2 root root 4096 May  4 12:43  .Test-unix/
>>> drwxrwxrwt  2 root root 4096 May  4 12:43  .X11-unix/
>>> drwxrwxrwt  2 root root 4096 May  4 12:43  .XIM-unix/
>>> drwxrwxrwt  2 root root 4096 May  4 12:43  .font-unix/
>>> drwx--  2 root root 4096 May 24 13:23  ssh-Sl5AiotnXp/
>>> drwx--  3 root root 4096 May  8 13:26
>>> 'systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-ceph-mds@icadmin011.service-SGZrKf
>>> '/
>>> drwx--  3 root root 4096 May  4 12:43
>>>  
>>> systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-systemd-logind.service-uU1GAi/
>>> drwx--  3 root root 4096 May  4 12:43
>>>  
>>> systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-systemd-resolved.service-KYHd7f/
>>> drwx--  3 root root 4096 May  4 12:43
>>>  
>>> systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-systemd-timesyncd.service-1Qtj5i/
>>>
>>> On Wed, May 24, 2023 at 1:17 PM Milind Changire 
>>> wrote:
>>>
 I hope the daemon mds.icadmin011 is running on the same machine that
 you are looking for /tmp/dump.txt, since the file is created on the system
 which has that daemon running.


 On Wed, May 24, 2023 at 2:16 PM Emmanuel Jaep 
 wrote:

> Hi Milind,
>
> you are absolutely right.
>
> The dump_ops_in_flight is giving a good hint about what's happening:
> {
> "ops": [
> {
> "description": "internal op exportdir:mds.5:975673",
> "initiated_at": "2023-05-23T17:49:53.030611+0200",
> "age": 60596.355186077999,
> "duration": 60596.355234167997,
> "type_data": {
> "flag_point": "failed to wrlock, waiting",
> "reqid": "mds.5:975673",
> "op_type": "internal_op",
> "internal_op": 5377,
> "op_name": "exportdir",
> "events": [
> {
> "time": "2023-05-23T17:49:53.030611+0200",
> "event": "initiated"
> },
> {
> "time": "2023-05-23T17:49:53.030611+0200",
> "event": "throttled"
> },
> {
> "time": "2023-05-23T17:49:53.030611+0200",
> "event": "header_read"
> },
> {
> "time": "2023-05-23T17:49:53.030611+0200",
> "event": "all_read"
> },
> {
> "time": "2023-05-23T17:49:53.030611+0200",
> "event": "dispatched"
> },
> {
> "time": "2023-05-23T17:49:53.030657+0200",
> "event": "requesting remote authpins"
> },
> {
> "time": "2023-05-23T17:49:53.050253+0200",
> "event": "failed to wrlock, waiting"
> }
> ]
> }
> }
> ],
> "num_ops": 1
> }
>
> However, the dump cache does not seem to produce an output:
> root@icadmin011:~# ceph --cluster floki daemon mds.icadmin011 dump
> cache /tmp/dump.txt
> root@icadmin011:~# ls /tmp
> ssh-cHvP3iF611
>
> systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-ceph-mds@icadmin011.service-SGZrKf
>
> systemd-private-18c17b770fc24c48a0507b8faa1c0ec2-systemd-logind.service-uU1GAi
>
> 

[ceph-users] Recoveries without any misplaced objects?

2023-05-29 Thread Hector Martin
Hi,

I'm watching a cluster finish a bunch of backfilling, and I noticed that
quite often PGs end up with zero misplaced objects, even though they are
still backfilling.

Right now the cluster is down to 6 backfilling PGs:

  data:
volumes: 1/1 healthy
pools:   6 pools, 268 pgs
objects: 18.79M objects, 29 TiB
usage:   49 TiB used, 25 TiB / 75 TiB avail
pgs: 262 active+clean
 6   active+remapped+backfilling

But there are no misplaced objects, and the misplaced column in `ceph pg
dump` is zero for all PGs.

If I do a `ceph pg dump_json`, I can see `num_objects_recovered`
increasing for these PGs... but the misplaced count is still 0.
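
For anyone wanting to watch those two counters side by side, a minimal sketch, 
assuming `ceph pg dump --format json` exposes a pg_map -> pg_stats list whose 
entries carry "pgid", "state" and a "stat_sum" dict (key names can differ 
slightly between releases):

```python
#!/usr/bin/env python3
# Sketch: print recovered vs. misplaced object counters for backfilling PGs.
# Assumes `ceph pg dump --format json` output with pg_map -> pg_stats entries
# carrying "pgid", "state" and a "stat_sum" dict; adjust keys if needed.
import json
import subprocess

def backfilling_pg_counters():
    raw = subprocess.run(["ceph", "pg", "dump", "--format", "json"],
                         capture_output=True, check=True, text=True).stdout
    dump = json.loads(raw)
    pg_stats = dump.get("pg_map", dump).get("pg_stats", [])
    for pg in pg_stats:
        if "backfilling" not in pg.get("state", ""):
            continue
        stat_sum = pg.get("stat_sum", {})
        yield (pg["pgid"],
               stat_sum.get("num_objects_recovered", 0),
               stat_sum.get("num_objects_misplaced", 0))

if __name__ == "__main__":
    for pgid, recovered, misplaced in backfilling_pg_counters():
        print(f"{pgid}: recovered={recovered} misplaced={misplaced}")
```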

Is there something else that would cause recoveries/backfills other than
misplaced objects? Or perhaps there is a bug somewhere causing the
misplaced object count to be misreported as 0 sometimes?

# ceph -v
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
(stable)

- Hector
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Ceph | Quency ]The scheduled snapshots are not getting created till we create a manual backup.

2023-05-29 Thread Sake Paulusma
Hi!

I noticed the same: the snapshot scheduler seemed to do nothing, but after 
a manager failover the creation of snapshots started to work (including the 
retention rules).

Best regards,
Sake


From: Lokendra Rathour 
Sent: Monday, May 29, 2023 10:11:54 AM
To: ceph-users ; Ceph Users 
Subject: [ceph-users] [Ceph | Quency ]The scheduled snapshots are not getting 
created till we create a manual backup.

Hi Team,



*Problem:*

Create scheduled snapshots of the ceph subvolume.



*Expected Result:*

The scheduled snapshots should be created at the given scheduled time.



*Actual Result:*

The scheduled snapshots are not getting created till we create a manual
backup.



*Description:*

*Ceph version: 17(quincy)*

OS: Centos/Almalinux





The scheduled snapshot creation is not working and we were only able to see
the following logs in the file "ceph-mgr.storagenode3.log":



*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule INFO mgr_util]
scanning for idle connections..*

*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule DEBUG
mgr_util] fs_name (cephfs) connections ([])*

*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule INFO mgr_util]
cleaning up connections: [*





The command which we were executing to add the snapshot schedule:

*ceph fs snap-schedule add /volumes//
 *

*eg.*

*ceph fs snap-schedule add /volumes/xyz/test_restore_53 1h
2023-05-26T11:05:00*



We can make sure that the schedule has been created using the following
commands:

*#ceph fs snap-schedule list / --recursive=true*

*#ceph fs snap-schedule status /volumes/xyz/test_restore_53*



Even though we created the snapshot schedule, snapshots were not getting
created.

We then tried creating a manual snapshot for one of the sub-volumes using
the following command:

*#ceph fs subvolume snapshot create cephfs  
--group_name *

*eg. ceph fs subvolume snapshot create cephfs test_restore_53 snapshot-1
--group_name xyz*



To check the snapshots created we can use the following command:

*ceph fs subvolume snapshot ls cephfs  
*

*eg. ceph fs subvolume snapshot ls cephfs test_restore_53 snapshot-1 xyz*



To delete the manually created snapshot:

*ceph fs subvolume snapshot rm cephfs  
*

*eg. ceph fs subvolume snapshot rm cephfs test_restore_53 snapshot-1 xyz*



To our surprise, the scheduled snapshots started working. We also applied
the retention policy and seems to be working fine.

We re-tested this understanding for another subvolume. And the scheduled
snapshots only started once we triggered a manual snapshot.



Could you please help us out with this?



--
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-29 Thread Sandip Divekar
Hi Chris / Gregory,

Did you get a chance to investigate this issue?

Thanks and Regards
   Sandip Divekar

From: Sandip Divekar
Sent: Thursday, May 25, 2023 11:16 PM
To: Chris Palmer ; ceph-users@ceph.io
Cc: d...@ceph.io; Gavin Lucas ; Joseph 
Fernandes ; Simon Crosland 

Subject: RE: [ceph-users] Re: Unexpected behavior of directory mtime after 
being set explicitly

Hi Chris,

I think you have missed one step, which is to change the mtime for the 
directory explicitly. Please have a look at the highlighted steps.



CEPHFS

===

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
 mkdir dir1

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
 stat dir1

  File: dir1

  Size: 0   Blocks: 0  IO Block: 65536  directory

Device: 28h/40d Inode: 1099511714911  Links: 2

Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)

Access: 2023-05-24 11:09:25.260851345 +0530

Modify: 2023-05-24 11:09:25.260851345 +0530

Change: 2023-05-24 11:09:25.260851345 +0530

Birth: 2023-05-24 11:09:25.260851345 +0530

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
  touch -m -d '26 Aug 1982 22:00' dir1

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
 stat dir1/

  File: dir1/

  Size: 0   Blocks: 0  IO Block: 65536  directory

Device: 28h/40d Inode: 1099511714911  Links: 2

Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)

Access: 2023-05-24 11:09:25.260851345 +0530

Modify: 1982-08-26 22:00:00.0 +0530

Change: 2023-05-24 11:10:04.881454967 +0530

Birth: 2023-05-24 11:09:25.260851345 +0530

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
 mkdir dir1/dir2

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#
 stat dir1/

  File: dir1/

  Size: 1   Blocks: 0  IO Block: 65536  directory

Device: 28h/40d Inode: 1099511714911  Links: 3

Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)

Access: 2023-05-24 11:09:25.260851345 +0530

Modify: 1982-08-26 22:00:00.0 +0530

Change: 2023-05-24 11:10:19.141672220 +0530

Birth: 2023-05-24 11:09:25.260851345 +0530

root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9b0b-2091e1723538#



Note: In the last step, it is expected that the “Modify” time should change.
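
For completeness, the same check as a small script; the mount point is a 
placeholder and the expectation tested is the one described above (creating 
dir1/dir2 should update dir1's mtime even after it was set explicitly):

```python
#!/usr/bin/env python3
# Sketch: set a directory's mtime explicitly on a CephFS mount, create a
# subdirectory, and report whether the parent's mtime was updated.
import os
import time

base = "/mnt/cephfs/mtime-test"        # placeholder CephFS mount path
dir1 = os.path.join(base, "dir1")

os.makedirs(dir1)
past = time.time() - 10 * 365 * 86400  # some arbitrary time in the past
os.utime(dir1, (time.time(), past))    # (atime, mtime): set mtime explicitly
before = os.stat(dir1).st_mtime

os.mkdir(os.path.join(dir1, "dir2"))   # should bump dir1's mtime
after = os.stat(dir1).st_mtime

print("mtime before:", before, "after:", after,
      "-> updated" if after != before else "-> NOT updated (reported behaviour)")
```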

Thanks and Regards
   Sandip Divekar



From: Chris Palmer <chris.pal...@idnet.com>
Sent: Thursday, May 25, 2023 9:46 PM
To: Sandip Divekar <sandip.dive...@hitachivantara.com>; ceph-users@ceph.io
Cc: d...@ceph.io; Gavin Lucas <gavin.lu...@hitachivantara.com>; Joseph Fernandes <joseph.fernan...@hitachivantara.com>; Simon Crosland <simon.crosl...@hitachivantara.com>
Subject: Re: [ceph-users] Re: Unexpected behavior of directory mtime after 
being set explicitly

Hi Sandip

Ceph servers (debian11/ceph base with Proxmox installed on top - NOT the ceph 
that comes with Proxmox!):

ceph@pve1:~$ uname -a

Linux pve1 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) x86_64 
GNU/Linux

ceph@pve1:~$ ceph version

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)


Fedora workstation. I waited until the minute had clicked over before doing 
each step:

[chris@rex mtime]$ uname -a

Linux rex.palmer 6.2.15-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu May 11 
17:37:39 UTC 2023 x86_64 GNU/Linux

[chris@rex mtime]$ rpm -q ceph-common

ceph-common-17.2.6-2.fc38.x86_64



[chris@rex mtime]$ df .

Filesystem                                            1K-blocks       Used  Available Use% Mounted on
192.168.80.121,192.168.80.122,192.168.80.123:/data2  8589930496 4944801792 3645128704  58% /mnt/data2

[chris@rex mtime]$ mount|grep data2

systemd-1 on /mnt/data2 type autofs 
(rw,relatime,fd=61,pgrp=1,timeout=600,minproto=5,maxproto=5,direct,pipe_ino=22804)

192.168.80.121,192.168.80.122,192.168.80.123:/data2 on /mnt/data2 type ceph 
(rw,noatime,nodiratime,name=data2-rex,secret=,fsid=----,acl,_netdev,x-systemd.mount-timeout=30,x-systemd.automount,x-systemd.idle-timeout=600)



[chris@rex mtime]$ date; mkdir one; ls -ld one

Thu 25 May 16:57:28 BST 2023

drwxrwxr-x 2 chris groupname 0 May 25 16:57 one



[chris@rex mtime]$ date; touch one; ls -ld one

Thu 25 May 16:58:14 BST 2023

drwxrwxr-x 2 chris groupname 0 May 25 16:58 one



[chris@rex mtime]$ date; mkdir one/two; ls -ld one

Thu 25 May 16:59:26 BST 2023

drwxrwxr-x 3 chris groupname 1 May 25 16:59 one




I also repeated it with the test run on the ceph debian11 server, having 
mounted the cephfs filesystem on the ceph server - exactly the same result.

I then repeated it again on a 

[ceph-users] Re: Creating a bucket with bucket constructor in Ceph v16.2.7

2023-05-29 Thread Robert Hish
Ramin,

I think you're still going to experience what Casey described.

If your intent is to completely isolate bucket metadata/data in one
zonegroup from another, then I believe you need multiple independent
realms. Each with its own endpoint.

For instance;

Ceph Cluster A
Realm1/zonegroup1/zone1 (endpoint.left)
Realm2/zonegroup2/zone2 (endpoint.right)

Then, you don't need to bother with location constraint. Attempts to
create buckets via endpoint.right will be created at realm2/zonegroup2.
Buckets created via endpoint.left will show up at zonegroup1/zone1. They
will be completely isolated from one another, yet reside on the same
cluster.

Location constraint could be used for different placement targets within
zonegroup2. For instance, location constraint zonegroup2:put-data-on-
replicated-storage or zonegroup2:put-data-on-erasure-storage. Without
separate realms, I believe you will continue to experience what Casey
explained.
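
As an illustration, a hedged boto3 sketch of passing such a
zonegroup:placement-target location constraint at bucket creation time; the
endpoint URL, credentials, bucket name and placement-target name are
placeholders, not values taken from this thread:

```python
#!/usr/bin/env python3
# Sketch: create a bucket against a specific placement target by sending a
# "zonegroup:placement-target" LocationConstraint to RGW. All names below are
# placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://endpoint.right.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(
    Bucket="test-bucket",
    CreateBucketConfiguration={
        # "<zonegroup api name>:<placement target>", e.g. an erasure-coded target
        "LocationConstraint": "zonegroup2:put-data-on-erasure-storage",
    },
)
```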

If it is your intent to have multiple isolated zonegroups within the
same cluster, then I would read through the docs for creating a multi-
realm deployment. And just stop short of creating an additional
synchronized replicated realm. (step #15)

https://www.ibm.com/docs/en/storage-ceph/5?topic=administration-configuring-multiple-realms-in-same-storage-cluster

You should then have two independent realms residing on the same
cluster. You can further isolate the two zonegroups by deploying new
root pools. By default, I believe the zonegroup/zone settings are stored
in .rgw.root

Any actions without specifying an alternative zonegroup/zone will by
default fall under the .rgw.root hierarchy.

You could, for example, deploy two new realms, each with its own root
pool.

zonegroup1 --> .zonegroup1.rgw.root
zonegroup2 --> .zonegroup2.rgw.root 

These two zonegroups will be isolated in their own realms, and also
isolated from the default .rgw.root

Once you have that setup, it's like each realm is its own completely
separate rgw deployment, but reside within a single Ceph cluster. It
takes a bit of forethought to get everything set up correctly. You will
need to create keys for each realm and be mindful of which realm you are
operating on. It may take several attempts to get everything in place
the way you expect it to be.


On Sat, 2023-05-20 at 15:42 +0330, Ramin Najjarbashi wrote:
> I had a typo in s3cmd output :D
> 
> I am encountering an issue with the LocationConstraint when using
> s3cmd to
> create a bucket. In my setup, I have two zone groups within the same
> realm.
> My intention is to create the bucket in the second zone group by
> correctly
> setting the bucket location using bucket_location:
> abrak nt>.
> However, despite configuring it this way, the bucket is still being
> created
> in the first zone group.
> 
> https://gist.github.com/RaminNietzsche/1ff266a6158b437319dcd2eb10eeb34e
> 
> ```sh
> s3cmd --region zg2-api-name mb s3://test-zg2
> s3cmd info s3://test-zg2
> s3://test-zg2/ (bucket):
>    Location:  zg1-api-name
>    Payer: BucketOwner
>    Expiration Rule: none
>    Policy:    none
>    CORS:  none
>    ACL:   development: FULL_CONTROL
> ```
> 
> 
> On Thu, May 18, 2023 at 4:29 PM Ramin Najjarbashi <
> ramin.najarba...@gmail.com> wrote:
> 
> > Thank Casey
> > 
> > Currently, when I create a new bucket and specify the bucket
> > location as
> > zone group 2, I expect the request to be handled by the master zone
> > in zone
> > group 1, as it is the expected behavior. However, I noticed that
> > regardless
> > of the specified bucket location, the zone group ID for all buckets
> > created
> > using this method remains the same as zone group 1.
> > 
> > 
> > My expectation was that when I create a bucket in zone group 2, the
> > zone
> > group ID in the bucket metadata would reflect the correct zone group
> > ID.
> > 
> > 
> > 
> > On Thu, May 18, 2023 at 15:54 Casey Bodley 
> > wrote:
> > 
> > > On Wed, May 17, 2023 at 11:13 PM Ramin Najjarbashi
> > >  wrote:
> > > > 
> > > > Hi
> > > > 
> > > > I'm currently using Ceph version 16.2.7 and facing an issue with
> > > > bucket
> > > > creation in a multi-zone configuration. My setup includes two
> > > > zone
> > > groups:
> > > > 
> > > > ZG1 (Master) and ZG2, with one zone in each zone group (zone-1
> > > > in ZG1
> > > and
> > > > zone-2 in ZG2).
> > > > 
> > > > The objective is to create buckets in a specific zone group
> > > > (ZG2) using
> > > the
> > > > bucket constructor.
> > > > However, despite setting the desired zone group (abrak) in the
> > > > request,
> > > the
> > > > bucket is still being created in the master zone group (ZG1).
> > > > I have defined the following endpoint pattern for each zone
> > > > group:
> > > > 
> > > > s3.{zg}.mydomain.com
> > > > 
> > > > I am using the s3cmd client to interact with the Ceph cluster. I
> > > > have
> > > > ensured that I provide the necessary endpoint and region
> > > > information
> > > while
> > > > executing the bucket creation command. Despite my 

[ceph-users] [Ceph | Quency ]The scheduled snapshots are not getting created till we create a manual backup.

2023-05-29 Thread Lokendra Rathour
Hi Team,



*Problem:*

Create scheduled snapshots of the ceph subvolume.



*Expected Result:*

The scheduled snapshots should be created at the given scheduled time.



*Actual Result:*

The scheduled snapshots are not getting created till we create a manual
backup.



*Description:*

*Ceph version: 17(quincy)*

OS: Centos/Almalinux





The scheduled snapshot creation is not working and we were only able to see
the following logs in the file "ceph-mgr.storagenode3.log":



*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule INFO mgr_util]
scanning for idle connections..*

*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule DEBUG
mgr_util] fs_name (cephfs) connections ([])*

*2023-05-29T04:59:35.101+ 7f4cd3ad8700  0 [snap_schedule INFO mgr_util]
cleaning up connections: [*





The command which we were executing to add the snapshot schedule:

*ceph fs snap-schedule add /volumes//
 *

*eg.*

*ceph fs snap-schedule add /volumes/xyz/test_restore_53 1h
2023-05-26T11:05:00*



We can make sure that the schedule has been created using the following
commands:

*#ceph fs snap-schedule list / --recursive=true*

*#ceph fs snap-schedule status /volumes/xyz/test_restore_53*



Even though we created the snapshot schedule, snapshots were not getting
created.

We then tried creating a manual snapshot for one of the sub-volumes using
the following command:

*#ceph fs subvolume snapshot create cephfs  
--group_name *

*eg. ceph fs subvolume snapshot create cephfs test_restore_53 snapshot-1
--group_name xyz*



To check the snapshots created we can use the following command:

*ceph fs subvolume snapshot ls cephfs  
*

*eg. ceph fs subvolume snapshot ls cephfs test_restore_53 snapshot-1 xyz*



To delete the manually created snapshot:

*ceph fs subvolume snapshot rm cephfs  
*

*eg. ceph fs subvolume snapshot rm cephfs test_restore_53 snapshot-1 xyz*



To our surprise, the scheduled snapshots started working. We also applied
the retention policy and seems to be working fine.

We re-tested this understanding for another subvolume. And the scheduled
snapshots only started once we triggered a manual snapshot.



Could you please help us out with this?



-- 
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io