[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Matthew Darwin
I have basically given up relying on bucket sync to work properly in 
Quincy.  I have been running a cron job to manually sync files between 
datacentres to catch the files that don't get replicated.  It's pretty 
inefficient, but at least all the files get to the backup datacentre.
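(Roughly along these lines, for the curious -- not the exact script, and the
rclone remote names "primary" and "backup" are just placeholders for S3 remotes
pointing at each zone's RGW endpoint:)

#!/bin/sh
# Nightly catch-up: copy every bucket from the primary zone to the backup zone.
for bucket in $(rclone lsd primary: | awk '{print $NF}'); do
    # --checksum skips objects that already match on the far side
    rclone sync "primary:${bucket}" "backup:${bucket}" --checksum
done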


Would love to have this working properly.

On 2023-04-24 16:56, Matt Benjamin wrote:

I'm unclear whether all of this currently works on upstream Quincy
(apologies if all such backports have been done). You might retest against
Reef or your ceph/main branch.

Matt

On Mon, Apr 24, 2023 at 2:52 PM Yixin Jin  wrote:


  Actually, "bucket sync run" somehow made it worse since now the
destination zone shows "bucket is caught up with source" from "bucket sync
status" even though it clearly missed an object.

 On Monday, April 24, 2023 at 02:37:46 p.m. EDT, Yixin Jin <
yji...@yahoo.ca> wrote:

   An update:
After creating and enabling the bucket sync policy, I ran "bucket sync
markers" and saw that each shard had the status of "init". The run "bucket
sync run" in the end marked the status to be "incremental-sync", which
seems to go through full-sync stage. However, the lone object in the source
zone wasn't synced over to the destination zone.
I actually used gdb to walk through radosgw-admin to run "bucket sync
run". It seems not to do anything for full-sync and it printed a log saying
"finished iterating over all available prefixes:...", which actually broke
off the do-while loop after the call to
prefix_handler.revalidate_marker(_marker). This call returned false
because it couldn't find rules from the sync pipe. I haven't drilled deeper
to see why it didn't get rules, whatever it means. Nevertheless, the
workaround with "bucket sync run" doesn't seem to work, at least not with
Quincy.

Regards, Yixin

 On Monday, April 24, 2023 at 12:37:24 p.m. EDT, Soumya Koduri <
skod...@redhat.com> wrote:

  On 4/24/23 21:52, Yixin Jin wrote:

Hello ceph gurus,

We are trying the bucket-specific sync policy feature with the Quincy release
and we encountered something strange. Our test setup is very simple. I use
mstart.sh to spin up 3 clusters, configure them with a single realm, a
single zonegroup and 3 zones – z0, z1, z2, with z0 being the master. I
created a zonegroup-level sync policy with “allowed”, a symmetrical flow
among all 3 zones and a pipe allowing all zones to all zones. I created a
single bucket “test-bucket” at z0 and uploaded a single object to it. By
now, there should be no sync since the policy is “allowed” only and I can
see that the single file only exists in z0 and “bucket sync status” shows the
sync is actually disabled. Finally, I created a bucket-specific sync policy
being “enabled” and a pipe between z0 and z1 only. I expected that sync
should be kicked off between z0 and z1 and I did see from “sync info” that
there are sources/dests being z0/z1. “bucket sync status” also shows the
source zone and source bucket. At z0, it shows everything is caught up but
at z1 it shows one shard is behind, which is expected since that only
object exists in z0 but not in z1.



Now, here comes the strange part. Although z1 shows there is one shard
behind, it doesn’t seem to make any progress on syncing it. It doesn’t seem
to do any full sync at all since “bucket sync status” shows “full sync:
0/11 shards”. There hasn’t been any full sync since otherwise, z1 should
have that only object. It is stuck in this condition forever until I make
another upload on the same object. I suspect the update of the object
triggers a new data log, which triggers the sync. Why wasn’t there a full
sync and how can one force a full sync?

Yes, this is a known issue yet to be addressed with bucket-level sync
policy (https://tracker.ceph.com/issues/57489). The interim
workaround to sync existing objects is to either

* create new objects (or)

* execute "bucket sync run"

after creating/enabling the bucket policy.

Please note that this issue is specific to only bucket policy but
doesn't exist for sync-policy set at zonegroup level.


Thanks,

Soumya






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Matt Benjamin
I'm unclear whether all of this currently works on upstream Quincy
(apologies if all such backports have been done). You might retest against
Reef or your ceph/main branch.

Matt

On Mon, Apr 24, 2023 at 2:52 PM Yixin Jin  wrote:

>  Actually, "bucket sync run" somehow made it worse since now the
> destination zone shows "bucket is caught up with source" from "bucket sync
> status" even though it clearly missed an object.
>
> On Monday, April 24, 2023 at 02:37:46 p.m. EDT, Yixin Jin <
> yji...@yahoo.ca> wrote:
>
>   An update:
> After creating and enabling the bucket sync policy, I ran "bucket sync
> markers" and saw that each shard had the status of "init". The run "bucket
> sync run" in the end marked the status to be "incremental-sync", which
> seems to go through full-sync stage. However, the lone object in the source
> zone wasn't synced over to the destination zone.
> I actually used gdb to walk through radosgw-admin to run "bucket sync
> run". It seems not to do anything for full-sync and it printed a log saying
> "finished iterating over all available prefixes:...", which actually broke
> off the do-while loop after the call to
> prefix_handler.revalidate_marker(_marker). This call returned false
> because it couldn't find rules from the sync pipe. I haven't drilled deeper
> to see why it didn't get rules, whatever it means. Nevertheless, the
> workaround with "bucket sync run" doesn't seem to work, at least not with
> Quincy.
>
> Regards, Yixin
>
> On Monday, April 24, 2023 at 12:37:24 p.m. EDT, Soumya Koduri <
> skod...@redhat.com> wrote:
>
>  On 4/24/23 21:52, Yixin Jin wrote:
> > Hello ceph gurus,
> >
> > We are trying the bucket-specific sync policy feature with the Quincy release
> and we encountered something strange. Our test setup is very simple. I use
> mstart.sh to spin up 3 clusters, configure them with a single realm, a
> single zonegroup and 3 zones – z0, z1, z2, with z0 being the master. I
> created a zonegroup-level sync policy with “allowed”, a symmetrical flow
> among all 3 zones and a pipe allowing all zones to all zones. I created a
> single bucket “test-bucket” at z0 and uploaded a single object to it. By
> now, there should be no sync since the policy is “allowed” only and I can
> see that the single file only exists in z0 and “bucket sync status” shows the
> sync is actually disabled. Finally, I created a bucket-specific sync policy
> being “enabled” and a pipe between z0 and z1 only. I expected that sync
> should be kicked off between z0 and z1 and I did see from “sync info” that
> there are sources/dests being z0/z1. “bucket sync status” also shows the
> source zone and source bucket. At z0, it shows everything is caught up but
> at z1 it shows one shard is behind, which is expected since that only
> object exists in z0 but not in z1.
> >
> >
> >
> > Now, here comes the strange part. Although z1 shows there is one shard
> behind, it doesn’t seem to make any progress on syncing it. It doesn’t seem
> to do any full sync at all since “bucket sync status” shows “full sync:
> 0/11 shards”. There hasn’t been any full sync since otherwise, z1 should
> have that only object. It is stuck in this condition forever until I make
> another upload on the same object. I suspect the update of the object
> triggers a new data log, which triggers the sync. Why wasn’t there a full
> sync and how can one force a full sync?
>
> Yes, this is a known issue yet to be addressed with bucket-level sync
> policy (https://tracker.ceph.com/issues/57489). The interim
> workaround to sync existing objects is to either
>
> * create new objects (or)
>
> * execute "bucket sync run"
>
> after creating/enabling the bucket policy.
>
> Please note that this issue is specific to only bucket policy but
> doesn't exist for sync-policy set at zonegroup level.
>
>
> Thanks,
>
> Soumya
>
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.13 point release

2023-04-24 Thread Cory Snyder
Hi Yuri,

We were hoping that the following patch would make it in for 16.2.13 if 
possible:

https://github.com/ceph/ceph/pull/51200

Thanks,

Cory Snyder



From: Yuri Weinstein 
Sent: Monday, April 24, 2023 11:39 AM
To: dev ; ceph-users ; clt 
Subject: pacific 16.2.13 point release 
 
We want to do the next urgent point release for pacific 16.2.13 ASAP.

The tip of the current pacific branch will be used as a base for this
release and we will build it later today.

Dev leads - if you have any outstanding PRs that must be included, please
merge them now.

Thx
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Yixin Jin
Re-post my reply to make sure it goes through the mailing list.

----- Forwarded Message -----
From: Yixin Jin
To: Soumya Koduri
Sent: Monday, April 24, 2023 at 02:37:46 p.m. EDT
Subject: Re: [ceph-users] Re: Bucket sync policy
  An update:
After creating and enabling the bucket sync policy, I ran "bucket sync markers" 
and saw that each shard had the status of "init". Then running "bucket sync run" 
eventually marked the status as "incremental-sync", which suggests it went through 
the full-sync stage. However, the lone object in the source zone wasn't synced over 
to the destination zone.
I actually used gdb to walk through radosgw-admin to run "bucket sync run". It 
seems not to do anything for full-sync and it printed a log saying "finished 
iterating over all available prefixes:...", which actually broke off the 
do-while loop after the call to prefix_handler.revalidate_marker(_marker). 
This call returned false because it couldn't find rules from the sync pipe. I 
haven't drilled deeper to see why it didn't get rules, whatever it means. 
Nevertheless, the workaround with "bucket sync run" doesn't seem to work, at 
least not with Quincy.
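For reference, the exact commands were along these lines (run against z1, the 
destination zone; flags may differ slightly between releases):

radosgw-admin bucket sync markers --bucket=test-bucket --source-zone=z0
radosgw-admin bucket sync run --bucket=test-bucket --source-zone=z0
radosgw-admin bucket sync status --bucket=test-bucket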

Regards, Yixin

On Monday, April 24, 2023 at 12:37:24 p.m. EDT, Soumya Koduri 
 wrote:  
 
 On 4/24/23 21:52, Yixin Jin wrote:
> Hello ceph gurus,
>
> We are trying the bucket-specific sync policy feature with the Quincy release and we 
> encountered something strange. Our test setup is very simple. I use mstart.sh 
> to spin up 3 clusters, configure them with a single realm, a single zonegroup 
> and 3 zones – z0, z1, z2, with z0 being the master. I created a 
> zonegroup-level sync policy with “allowed”, a symmetrical flow among all 3 
> zones and a pipe allowing all zones to all zones. I created a single bucket 
> “test-bucket” at z0 and uploaded a single object to it. By now, there should 
> be no sync since the policy is “allowed” only and I can see that the single file 
> only exists in z0 and “bucket sync status” shows the sync is actually 
> disabled. Finally, I created a bucket-specific sync policy being “enabled” 
> and a pipe between z0 and z1 only. I expected that sync should be kicked off 
> between z0 and z1 and I did see from “sync info” that there are sources/dests 
> being z0/z1. “bucket sync status” also shows the source zone and source 
> bucket. At z0, it shows everything is caught up but at z1 it shows one shard 
> is behind, which is expected since that only object exists in z0 but not in 
> z1.
>
>  
>
> Now, here comes the strange part. Although z1 shows there is one shard 
> behind, it doesn’t seem to make any progress on syncing it. It doesn’t seem 
> to do any full sync at all since “bucket sync status” shows “full sync: 0/11 
> shards”. There hasn’t been any full sync since otherwise, z1 should have that 
> only object. It is stuck in this condition forever until I make another 
> upload on the same object. I suspect the update of the object triggers a new 
> data log, which triggers the sync. Why wasn’t there a full sync and how can 
> one force a full sync?

Yes, this is a known issue yet to be addressed with bucket-level sync 
policy (https://tracker.ceph.com/issues/57489). The interim 
workaround to sync existing objects is to either

* create new objects (or)

* execute "bucket sync run"

after creating/enabling the bucket policy.

Please note that this issue is specific to only bucket policy but 
doesn't exist for sync-policy set at zonegroup level.


Thanks,

Soumya


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Yixin Jin
 Actually, "bucket sync run" somehow made it worse since now the destination 
zone shows "bucket is caught up with source" from "bucket sync status" even 
though it clearly missed an object.

On Monday, April 24, 2023 at 02:37:46 p.m. EDT, Yixin Jin  
wrote:  
 
  An update:
After creating and enabling the bucket sync policy, I ran "bucket sync markers" 
and saw that each shard had the status of "init". Then running "bucket sync run" 
eventually marked the status as "incremental-sync", which suggests it went through 
the full-sync stage. However, the lone object in the source zone wasn't synced over 
to the destination zone.
I actually used gdb to walk through radosgw-admin to run "bucket sync run". It 
seems not to do anything for full-sync and it printed a log saying "finished 
iterating over all available prefixes:...", which actually broke off the 
do-while loop after the call to prefix_handler.revalidate_marker(_marker). 
This call returned false because it couldn't find rules from the sync pipe. I 
haven't drilled deeper to see why it didn't get rules, whatever it means. 
Nevertheless, the workaround with "bucket sync run" doesn't seem to work, at 
least not with Quincy.

Regards, Yixin

On Monday, April 24, 2023 at 12:37:24 p.m. EDT, Soumya Koduri 
 wrote:  
 
 On 4/24/23 21:52, Yixin Jin wrote:
> Hello ceph gurus,
>
> We are trying the bucket-specific sync policy feature with the Quincy release and we 
> encountered something strange. Our test setup is very simple. I use mstart.sh 
> to spin up 3 clusters, configure them with a single realm, a single zonegroup 
> and 3 zones – z0, z1, z2, with z0 being the master. I created a 
> zonegroup-level sync policy with “allowed”, a symmetrical flow among all 3 
> zones and a pipe allowing all zones to all zones. I created a single bucket 
> “test-bucket” at z0 and uploaded a single object to it. By now, there should 
> be no sync since the policy is “allowed” only and I can see that the single file 
> only exists in z0 and “bucket sync status” shows the sync is actually 
> disabled. Finally, I created a bucket-specific sync policy being “enabled” 
> and a pipe between z0 and z1 only. I expected that sync should be kicked off 
> between z0 and z1 and I did see from “sync info” that there are sources/dests 
> being z0/z1. “bucket sync status” also shows the source zone and source 
> bucket. At z0, it shows everything is caught up but at z1 it shows one shard 
> is behind, which is expected since that only object exists in z0 but not in 
> z1.
>
>  
>
> Now, here comes the strange part. Although z1 shows there is one shard 
> behind, it doesn’t seem to make any progress on syncing it. It doesn’t seem 
> to do any full sync at all since “bucket sync status” shows “full sync: 0/11 
> shards”. There hasn’t been any full sync since otherwise, z1 should have that 
> only object. It is stuck in this condition forever until I make another 
> upload on the same object. I suspect the update of the object triggers a new 
> data log, which triggers the sync. Why wasn’t there a full sync and how can 
> one force a full sync?

Yes, this is a known issue yet to be addressed with bucket-level sync 
policy (https://tracker.ceph.com/issues/57489). The interim 
workaround to sync existing objects is to either

* create new objects (or)

* execute "bucket sync run"

after creating/enabling the bucket policy.

Please note that this issue is specific to only bucket policy but 
doesn't exist for sync-policy set at zonegroup level.


Thanks,

Soumya


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Soumya Koduri

On 4/24/23 21:52, Yixin Jin wrote:

Hello ceph gurus,

We are trying the bucket-specific sync policy feature with the Quincy release and we 
encountered something strange. Our test setup is very simple. I use mstart.sh to 
spin up 3 clusters, configure them with a single realm, a single zonegroup and 
3 zones – z0, z1, z2, with z0 being the master. I created a zonegroup-level 
sync policy with “allowed”, a symmetrical flow among all 3 zones and a pipe 
allowing all zones to all zones. I created a single bucket “test-bucket” at z0 
and uploaded a single object to it. By now, there should be no sync since the 
policy is “allowed” only and I can see that the single file only exists in z0 and 
“bucket sync status” shows the sync is actually disabled. Finally, I created a 
bucket-specific sync policy being “enabled” and a pipe between z0 and z1 only. 
I expected that sync should be kicked off between z0 and z1 and I did see from 
“sync info” that there are sources/dests being z0/z1. “bucket sync status” also 
shows the source zone and source bucket. At z0, it shows everything is caught 
up but at z1 it shows one shard is behind, which is expected since that only 
object exists in z0 but not in z1.

  


Now, here comes the strange part. Although z1 shows there is one shard behind, 
it doesn’t seem to make any progress on syncing it. It doesn’t seem to do any 
full sync at all since “bucket sync status” shows “full sync: 0/11 shards”. 
There hasn’t been any full sync since otherwise, z1 should have that only 
object. It is stuck in this condition forever until I make another upload on 
the same object. I suspect the update of the object triggers a new data log, 
which triggers the sync. Why wasn’t there a full sync and how can one force a 
full sync?


Yes, this is a known issue yet to be addressed with bucket-level sync 
policy (https://tracker.ceph.com/issues/57489). The interim 
workaround to sync existing objects is to either


* create new objects (or)

* execute "bucket sync run"

after creating/enabling the bucket policy.

Please note that this issue is specific to bucket-level sync policy and 
doesn't exist for sync policy set at the zonegroup level.
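For example, something like this on the destination zone once the bucket policy 
is enabled (bucket/zone names taken from your test; exact options may vary a bit 
across releases):

# replay the bucket sync for objects that already exist in the source zone
radosgw-admin bucket sync run --bucket=test-bucket --source-zone=z0
# then check whether the shards have caught up
radosgw-admin bucket sync status --bucket=test-bucket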



Thanks,

Soumya


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bucket sync policy

2023-04-24 Thread Yixin Jin
Hello ceph gurus,

We are trying the bucket-specific sync policy feature with the Quincy release and we 
encountered something strange. Our test setup is very simple. I use mstart.sh to 
spin up 3 clusters, configure them with a single realm, a single zonegroup and 
3 zones – z0, z1, z2, with z0 being the master. I created a zonegroup-level 
sync policy with “allowed”, a symmetrical flow among all 3 zones and a pipe 
allowing all zones to all zones. I created a single bucket “test-bucket” at z0 
and uploaded a single object to it. By now, there should be no sync since the 
policy is “allowed” only and I can see that the single file only exists in z0 and 
“bucket sync status” shows the sync is actually disabled. Finally, I created a 
bucket-specific sync policy being “enabled” and a pipe between z0 and z1 only. 
I expected that sync should be kicked off between z0 and z1 and I did see from 
“sync info” that there are sources/dests being z0/z1. “bucket sync status” also 
shows the source zone and source bucket. At z0, it shows everything is caught 
up but at z1 it shows one shard is behind, which is expected since that only 
object exists in z0 but not in z1.
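(For reference, the policies were created roughly as below, following the 
multisite sync-policy documentation; the group/flow/pipe IDs are just the names 
I picked and the exact flags may differ by release.)

# zonegroup-level policy: status "allowed", symmetrical flow, pipe for all zones
radosgw-admin sync group create --group-id=group-all --status=allowed
radosgw-admin sync group flow create --group-id=group-all --flow-id=flow-all \
        --flow-type=symmetrical --zones=z0,z1,z2
radosgw-admin sync group pipe create --group-id=group-all --pipe-id=pipe-all \
        --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
radosgw-admin period update --commit

# bucket-level policy: status "enabled", pipe between z0 and z1 only
radosgw-admin sync group create --bucket=test-bucket --group-id=group-bucket \
        --status=enabled
radosgw-admin sync group pipe create --bucket=test-bucket --group-id=group-bucket \
        --pipe-id=pipe-z0-z1 --source-zones=z0 --dest-zones=z1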

 

Now, here comes the strange part. Although z1 shows there is one shard behind, 
it doesn’t seem to make any progress on syncing it. It doesn’t seem to do any 
full sync at all since “bucket sync status” shows “full sync: 0/11 shards”. 
There hasn’t been any full sync since otherwise, z1 should have that only 
object. It is stuck in this condition forever until I make another upload on 
the same object. I suspect the update of the object triggers a new data log, 
which triggers the sync. Why wasn’t there a full sync and how can one force a 
full sync?

 
I also tried “sync error list” and they are all empty. I also tried to apply 
the fix in https://tracker.ceph.com/issues/57853, although I am not sure if it 
is relevant. The fix didn’t change the behavior that we observed. I also tried 
"bucket sync init" and "bucket sync run" via radosgw-admin. They don't seem to 
do what I expected. They simply mark z1 as not behind anymore but the single 
object still lives in z0 only.
I wonder how mature this sync policy feature is for production use.
Thanks, Yixin





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pacific 16.2.13 point release

2023-04-24 Thread Yuri Weinstein
We want to do the next urgent point release for pacific 16.2.13 ASAP.

The tip of the current pacific branch will be used as a base for this
release and we will build it later today.

Dev leads - if you have any outstanding PRs that must be included, please
merge them now.

Thx
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.12 Pacific (hot-fix) released

2023-04-24 Thread Michel Jouvin

Hi Wesley,

I can only answer your second question and give an opinion on the last one!

- Yes, the OSD activation problem (in cephadm clusters only) was 
introduced by an unfortunate change (an indentation problem in Python code) 
in 16.2.11. The issue doesn't exist in 16.2.10 and is one of the issues 
fixed in 16.2.12 (with the current caveat for this version).


- Because of the 16.2.11 issue and the missing validation for some of the 
fixes in the current 16.2.12, I'd say that if you want to upgrade to 
Pacific, it is advisable to use 16.2.10. At least, that is what we did 
successfully on 2 cephadm-based Ceph clusters.
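(With cephadm, pinning the upgrade to that version is just a matter of something 
like the following; the explicit image form is equivalent:)

ceph orch upgrade start --ceph-version 16.2.10
# or, with an explicit container image
ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.10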


Cheers,

Michel

Le 24/04/2023 à 15:46, Wesley Dillingham a écrit :

A few questions:

- Will the 16.2.12 packages be "corrected" and reuploaded to the ceph.com
mirror? or will 16.2.13 become what 16.2.12 was supposed to be?

- Was the osd activation regression introduced in 16.2.11 (or does 16.2.10
have it as well)?

- Were the hotfixes in 16.2.12 just related to perf / time-to-activation or
was there a total failure to activate / other breaking issue?

- Which version of Pacific is recommended at this time?

Thank you very much.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Apr 24, 2023 at 3:16 AM Simon Oosthoek 
wrote:


Dear List

we upgraded to 16.2.12 on April 17th; since then we've seen some
unexplained downed OSD services in our cluster (264 OSDs). Is there any
risk of data loss? If so, would it be possible to downgrade, or is a fix
expected soon? If so, when? ;-)

FYI, we are running a cluster without cephadm, installed from packages.

Cheers

/Simon

On 23/04/2023 03:03, Yuri Weinstein wrote:

We are writing to inform you that Pacific v16.2.12, released on April
14th, has more unintended commits in the changelog than listed in the
release notes [1].

As these extra commits are not fully tested, we request that all users
please refrain from upgrading to v16.2.12 at this time. The current
v16.2.12 will be QE validated and released as soon as possible.

v16.2.12 was a hotfix release meant to resolve several performance
flaws in ceph-volume, particularly during osd activation. The extra
commits target v16.2.13.

We apologize for the inconvenience. Please reach out to the mailing
list with any questions.

[1]

https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$

On Fri, Apr 14, 2023 at 9:42 AM Yuri Weinstein 

wrote:

We're happy to announce the 12th hot-fix release in the Pacific series.



https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$

Notable Changes
---
This is a hotfix release that resolves several performance flaws in

ceph-volume,

particularly during osd activation (

https://urldefense.com/v3/__https://tracker.ceph.com/issues/57627__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fg0yeu7U$
)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at

https://urldefense.com/v3/__https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fBEJl5p4$

* Containers at

https://urldefense.com/v3/__https://quay.io/repository/ceph/ceph__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fc7HeSms$

* For packages, see

https://urldefense.com/v3/__https://docs.ceph.com/en/latest/install/get-packages/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fAKdWZK4$

* Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Disks are filling up

2023-04-24 Thread Omar Siam

Hi list,

we created a cluster for using cephfs with a kubernetes cluster. For a 
few weeks now the cluster has kept filling up at an alarming rate 
(100 GB per day).
This is while the most relevant pg is deep scrubbing and was interrupted 
a few times.


We use about 150G (du using the mounted filesystem) on the cephfs 
filesystem and try not to use snapshots (.snap directories "exist" but 
are empty).
We do not understand why the pgs get bigger and bigger while cephfs 
stays about the same size (overwrites on files certainly happen).

I suspect some snapshot mechanism. Any ideas how to debug this and stop it?

Maybe we should try to speed up the deep scrubbing somehow?
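(Concretely, these are the kinds of checks I have in mind, though I'm not sure 
they are the right ones; the MDS and pool names below are from our cluster.)

# on the MDS host (or via 'cephadm shell'): how many stray (deleted but still
# referenced) files is the MDS holding on to?
ceph daemon mds.ceph-mds.acdh-gluster-hdd1.pqydya perf dump mds_cache | grep -i stray
# are there any pool-level (self-managed) snapshots on the data pools?
rados -p rancherFsPoolMainData lssnap
rados -p rancherFsPoolDefaultData lssnap
# allow more scrubs in parallel so the deep scrub finishes sooner
ceph config set osd osd_max_scrubs 2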

ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific 
(stable)


  cluster:
    id:     ece0290c-cd32-11ec-a0e2-005056a9dd02
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            3 nearfull osd(s)
            13 pool(s) nearfull

  services:
    mon: 3 daemons, quorum acdh-gluster-hdd3,acdh-gluster-hdd1,acdh-gluster-hdd2 (age 3d)
    mgr: acdh-gluster-hdd3.kzsplh(active, since 5d), standbys: acdh-gluster-hdd2.kiotbg, acdh-gluster-hdd1.ywgyfx
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 4d), 3 in (since 7w)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 292 pgs
    objects: 167.25M objects, 1.2 TiB
    usage:   3.3 TiB used, 1.2 TiB / 4.5 TiB avail
    pgs:     290 active+clean
             1   active+clean+scrubbing+deep
             1   active+clean+scrubbing

  io:
    client:   58 MiB/s rd, 3.6 MiB/s wr, 51 op/s rd, 148 op/s wr

rancher-ceph-fs - 227 clients
===============================
RANK  STATE   MDS                                ACTIVITY      DNS    INOS   DIRS   CAPS
 0    active  ceph-mds.acdh-gluster-hdd1.pqydya  Reqs:  68 /s  793k   792k   102k   210k

          POOL             TYPE      USED   AVAIL
 rancherFsPoolMetadata     metadata  160G   329G
rancherFsPoolDefaultData   data      2268k  329G
 rancherFsPoolMainData     data      2584G  658G

          STANDBY MDS
ceph-mds.acdh-gluster-hdd2.zfleqe
ceph-mds.acdh-gluster-hdd3.etaobl
MDS version: ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable)


(rancherFsPoolMainData is a 2+1 erasure encoded pool)

--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    4.5 TiB  1.2 TiB  3.3 TiB   3.3 TiB      73.46
TOTAL  4.5 TiB  1.2 TiB  3.3 TiB   3.3 TiB      73.46

--- POOLS ---
POOL                        ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics        1    1  0 B            0  0 B          0    331 GiB
rancher-rbd-erasure          2   32  8.4 GiB    2.16k   13 GiB   1.25    661 GiB
rancher-rbd-meta             3   32  55 B          11   36 KiB      0    331 GiB
rancherFsPoolMetadata        4   32  53 GiB     5.18M  160 GiB  13.88    331 GiB
rancherFsPoolDefaultData     5    1  29 KiB    80.00M  2.2 MiB      0    331 GiB
rancherFsPoolMainData        6    1  1.7 TiB   82.08M  2.5 TiB  72.23    661 GiB
.rgw.root                    7   32  1.3 KiB        4   48 KiB      0    331 GiB
default.rgw.log              8   32  3.6 KiB      209  408 KiB      0    331 GiB
default.rgw.control          9   32  0 B            8  0 B          0    331 GiB
default.rgw.meta            10   32  3.8 KiB       11  124 KiB      0    331 GiB
default.rgw.buckets.index   11   32  2.4 MiB       33  7.2 MiB      0    331 GiB
default.rgw.buckets.non-ec  12   32  0 B            0  0 B          0    331 GiB
default.rgw.buckets.data    14    1  55 GiB    16.57k   83 GiB   7.70    661 GiB


HEALTH_WARN 1 MDSs report slow metadata IOs; 3 nearfull osd(s); 13 pool(s) nearfull
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph-mds.acdh-gluster-hdd1.pqydya(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 306 secs
[WRN] OSD_NEARFULL: 3 nearfull osd(s)
    osd.0 is near full
    osd.2 is near full
    osd.3 is near full
[WRN] POOL_NEARFULL: 13 pool(s) nearfull
    pool 'device_health_metrics' is nearfull
    pool 'rancher-rbd-erasure' is nearfull
    pool 'rancher-rbd-meta' is nearfull
    pool 'rancherFsPoolMetadata' is nearfull
    pool 'rancherFsPoolDefaultData' is nearfull
    pool 'rancherFsPoolMainData' is nearfull
    pool '.rgw.root' is nearfull
    pool 'default.rgw.log' is nearfull
    pool 'default.rgw.control' is nearfull
    pool 'default.rgw.meta' is nearfull
    pool 'default.rgw.buckets.index' is nearfull
    pool 'default.rgw.buckets.non-ec' is nearfull
    pool 'default.rgw.buckets.data' is nearfull

(near full is set to 0.66)

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for 
disabled persons
Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295

[ceph-users] Re: v16.2.12 Pacific (hot-fix) released

2023-04-24 Thread Wesley Dillingham
A few questions:

- Will the 16.2.12 packages be "corrected" and reuploaded to the ceph.com
mirror? or will 16.2.13 become what 16.2.12 was supposed to be?

- Was the osd activation regression introduced in 16.2.11 (or does 16.2.10
have it as well)?

- Were the hotfixes in 16.2.12 just related to perf / time-to-activation or
was there a total failure to activate / other breaking issue?

- Which version of Pacific is recommended at this time?

Thank you very much.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Apr 24, 2023 at 3:16 AM Simon Oosthoek 
wrote:

> Dear List
>
> we upgraded to 16.2.12 on April 17th; since then we've seen some
> unexplained downed OSD services in our cluster (264 OSDs). Is there any
> risk of data loss? If so, would it be possible to downgrade, or is a fix
> expected soon? If so, when? ;-)
>
> FYI, we are running a cluster without cephadm, installed from packages.
>
> Cheers
>
> /Simon
>
> On 23/04/2023 03:03, Yuri Weinstein wrote:
> > We are writing to inform you that Pacific v16.2.12, released on April
> > 14th, has more unintended commits in the changelog than listed in the
> > release notes [1].
> >
> > As these extra commits are not fully tested, we request that all users
> > please refrain from upgrading to v16.2.12 at this time. The current
> > v16.2.12 will be QE validated and released as soon as possible.
> >
> > v16.2.12 was a hotfix release meant to resolve several performance
> > flaws in ceph-volume, particularly during osd activation. The extra
> > commits target v16.2.13.
> >
> > We apologize for the inconvenience. Please reach out to the mailing
> > list with any questions.
> >
> > [1]
> https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$
> >
> > On Fri, Apr 14, 2023 at 9:42 AM Yuri Weinstein 
> wrote:
> >>
> >> We're happy to announce the 12th hot-fix release in the Pacific series.
> >>
> >>
> https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$
> >>
> >> Notable Changes
> >> ---
> >> This is a hotfix release that resolves several performance flaws in
> ceph-volume,
> >> particularly during osd activation (
> https://urldefense.com/v3/__https://tracker.ceph.com/issues/57627__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fg0yeu7U$
> )
> >> Getting Ceph
> >>
> >> 
> >> * Git at git://github.com/ceph/ceph.git
> >> * Tarball at
> https://urldefense.com/v3/__https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fBEJl5p4$
> >> * Containers at
> https://urldefense.com/v3/__https://quay.io/repository/ceph/ceph__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fc7HeSms$
> >> * For packages, see
> https://urldefense.com/v3/__https://docs.ceph.com/en/latest/install/get-packages/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fAKdWZK4$
> >> * Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-24 Thread Michel Jouvin

Hi,

I'm still interested in getting feedback from those using the LRC 
plugin about the right way to configure it... Last week I upgraded from 
Pacific to Quincy (17.2.6) with cephadm, which does the upgrade host 
by host, checking if an OSD is ok to stop before actually upgrading it. 
I was surprised to see 1 or 2 PGs down at some points in the upgrade 
(it didn't happen for all OSDs, but it did for every site/datacenter). Looking at 
the details with "ceph health detail", I saw that for these PGs there 
were 3 OSDs down, but I was expecting the pool to be resilient to 6 OSDs 
down (5 for R/W access), so I'm wondering if there is something wrong in 
our pool configuration (k=9, m=6, l=5).


Cheers,

Michel

Le 06/04/2023 à 08:51, Michel Jouvin a écrit :

Hi,

Is somebody using the LRC plugin?

I came to the conclusion that LRC  k=9, m=3, l=4 is not the same as 
jerasure k=9, m=6 in terms of protection against failures and that I 
should use k=9, m=6, l=5 to get a level of resilience >= jerasure k=9, 
m=6. The example in the documentation (k=4, m=2, l=3) suggests that 
this LRC configuration gives something better than jerasure k=4, m=2 
as it is resilient to 3 drive failures (but not 4 if I understood 
properly). So how many drives can fail in the k=9, m=6, l=5 
configuration, first without losing RW access and second without 
losing data?


Another thing that I don't quite understand is that a pool created 
with this configuration (and failure domain=osd, locality=datacenter) 
has a min_size=3 (max_size=18 as expected). It seems wrong to me; I'd have 
expected something ~10 (depending on the answer to the previous question)...


Thanks in advance if somebody could provide some sort of authoritative 
answer on these 2 questions. Best regards,


Michel

Le 04/04/2023 à 15:53, Michel Jouvin a écrit :
Replying to myself: I found the reason for 2147483647. It's 
documented as a failure to find enough OSDs (missing OSDs), and it is 
normal as I selected different hosts for the 15 OSDs but I have only 
12 hosts!


I'm still interested in an "expert" confirming that the LRC k=9, m=3, 
l=4 configuration is equivalent, in terms of redundancy, to a 
jerasure configuration with k=9, m=6.


Michel

Le 04/04/2023 à 15:26, Michel Jouvin a écrit :

Hi,

As discussed in another thread (Crushmap rule for multi-datacenter 
erasure coding), I'm trying to create an EC pool spanning 3 
datacenters (datacenters are present in the crushmap), with the 
objective to be resilient to 1 DC down, at least keeping the 
readonly access to the pool and if possible the read-write access, 
and have a storage efficiency better than 3-replica (let's say a 
storage overhead <= 2).


In the discussion, somebody mentioned LRC plugin as a possible 
jerasure alternative to implement this without tweaking the crushmap 
rule to implement the 2-step OSD allocation. I looked at the 
documentation 
(https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/) 
but I have some questions if someone has experience/expertise with 
this LRC plugin.
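(For reference, the kind of profile and pool I am talking about would be created 
with something like the following; names are just examples and I am not certain 
all the options are right:)

ceph osd erasure-code-profile set lrc_k9m3l4 \
        plugin=lrc k=9 m=3 l=4 \
        crush-failure-domain=host crush-locality=datacenter
ceph osd pool create test-lrc-pool 128 128 erasure lrc_k9m3l4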


I tried to create a rule for using 5 OSDs per datacenter (15 in 
total), with 3 (9 in total) being data chunks and others being 
coding chunks. For this, based on my understanding of the examples, I 
used k=9, m=3, l=4. Is it right? Is this configuration equivalent, 
in terms of redundancy, to a jerasure configuration with k=9, m=6?


The resulting rule, which looks correct to me, is:



{
    "rule_id": 6,
    "rule_name": "test_lrc_2",
    "ruleset": 6,
    "type": 3,
    "min_size": 3,
    "max_size": 15,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -4,
    "item_name": "default~hdd"
    },
    {
    "op": "choose_indep",
    "num": 3,
    "type": "datacenter"
    },
    {
    "op": "chooseleaf_indep",
    "num": 5,
    "type": "host"
    },
    {
    "op": "emit"
    }
    ]
}



Unfortunately, it doesn't work as expected: a pool created with this 
rule ends up with its PGs active+undersized, which is unexpected 
for me. Looking at the 'ceph health detail' output, I see for each PG 
something like:


pg 52.14 is stuck undersized for 27m, current state 
active+undersized, last acting 
[90,113,2147483647,103,64,147,164,177,2147483647,133,58,28,8,32,2147483647]


For each PG, there are 3 '2147483647' entries and I guess that is the 
reason for the problem. What are these entries about? Clearly they are 
not OSD entries... It looks like a negative number, -1, which in terms 
of crushmap ID is the crushmap root (named "default" in our 
configuration). Is there any trivial mistake I could have made?


Thanks in advance for any help or for sharing any successful 
configuration.


Best regards,

Michel
___

[ceph-users] Re: Troubleshooting cephadm OSDs aborting start

2023-04-24 Thread Clyso GmbH - Ceph Foundation Member

Hi André,

at Cephalocon 2023 last week in Amsterdam there were two 
presentations by Adam and Mark that might help you.


Joachim

___
Clyso GmbH - Ceph Foundation Member

On 21.04.23 at 10:53, André Gemünd wrote:

Dear Ceph-users,

in the meantime I found this ticket which seems to have the same assertion / 
stacktrace but was solved: https://tracker.ceph.com/issues/44532

Anyone have any ideas how it could still happen in 16.2.7?

Greetings
André


- Am 17. Apr 2023 um 10:30 schrieb Andre Gemuend 
andre.gemu...@scai.fraunhofer.de:


Dear Ceph-users,

we have trouble with a Ceph cluster after a full shutdown. A couple of OSDs
don't start anymore, exiting with SIGABRT very quickly. With debug logs and
lots of work (I find cephadm clusters hard to debug btw) we received the
following stack trace:

debug-16> 2023-04-14T11:52:17.617+ 7f10ab4d2700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7/rpm/el8/BUILD/ceph-16.2.7/src/osd/PGLog.h:
In function 'void PGLog::IndexedLog::add(const pg_log_entry_t&, bool)' thread
7f10ab4d2700 time 2023-04-14T11:52:17.614095+

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.7/rpm/el8/BUILD/ceph-16.2.7/src/osd/PGLog.h:
607: FAILED ceph_assert(head.version == 0 || e.version.version > head.version)


ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158)
[0x55b2dafc7b7e]

2: /usr/bin/ceph-osd(+0x56ad98) [0x55b2dafc7d98]

3: (bool PGLog::append_log_entries_update_missing

(hobject_t const&, std::__cxx11::list
mempool::pool_allocator<(mempool::pool_index_t)22, pg_log_entry_t> > const&,
bool, PGLog::IndexedLog*, pg_missing_set&, PGLog::LogEntryHandler*,
DoutPrefixProvider const*)+0xc19) [0x55b2db1bb6b9]

4: (PGLog::merge_log(pg_info_t&, pg_log_t&&, pg_shard_t, pg_info_t&,
PGLog::LogEntryHandler*, bool&, bool&)+0xee2) [0x55b2db1adf22]

5: (PeeringState::merge_log(ceph::os::Transaction&, pg_info_t&, pg_log_t&&,
pg_shard_t)+0x75) [0x55b2db33c165]

6: (PeeringState::Stray::react(MLogRec const&)+0xcc) [0x55b2db37adec]

7: (boost::statechart::simple_state,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xd5) [0x55b2db3a6e65]

8: (boost::statechart::state_machine,
boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
const&)+0x5b) [0x55b2db18ef6b]

9: (PG::do_peering_event(std::shared_ptr, PeeringCtx&)+0x2d1)
[0x55b2db1839e1]

10: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr,
ThreadPool::TPHandle&)+0x29c) [0x55b2db0fde5c]

11: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*,
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x56) [0x55b2db32d0e6]

12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28)
[0x55b2db0efd48]

13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x55b2db7615b4]

14: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55b2db764254]

15: /lib64/libpthread.so.0(+0x817f) [0x7f10cef1117f]

16: clone()

debug-15> 2023-04-14T11:52:17.618+ 7f10b64e8700  3 osd.70 72507
handle_osd_map epochs [72507,72507], i have 72507, src has [68212,72507]

debug-14> 2023-04-14T11:52:17.619+ 7f10b64e8700  3 osd.70 72507
handle_osd_map epochs [72507,72507], i have 72507, src has [68212,72507]

debug-13> 2023-04-14T11:52:17.619+ 7f10ac4d4700  5 osd.70 pg_epoch:
72507 pg[18.7( v 64162'106 (0'0,64162'106] local-lis/les=72506/72507 n=14
ec=17104/17104 lis/c=72506/72480 les/c/f=72507/72481/0 sis=72506
pruub=9.160680771s) [70,86,41] r=0 lpr=72506 pi=[72480,72506)/1 crt=64162'106
lcod 0'0 mlcod 0'0 active+wait pruub 12.822580338s@ mbc={}] exit
Started/Primary/Active/Activating 0.011269 7 0.000114

# ceph versions
{
"mon": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 5
},
"mgr": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 2
},
"osd": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 92
},
"mds": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 2
},
"rgw": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 2
},
"overall": {
"ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
(stable)": 103
}
}
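(For completeness: the failed assert is in PGLog::IndexedLog::add(), so I assume 
the on-disk log of the affected PG could be inspected offline with something like 
the following; I have not verified this on a cephadm-managed OSD, and the OSD/PG 
IDs are the ones from the log above.)

# with the OSD stopped, enter the daemon's container environment
cephadm shell --name osd.70
# then dump the PG log of the PG named in the crash
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-70 \
        --op log --pgid 18.7 > /tmp/pg-18.7-log.json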


Another thing is that things like `ceph -s`, `ceph osd tree`, `rbd ls`,  etc.
work, but `ceph orch ps` (or generally any orch commands) simply hang forever,
seemingly in a futex waiting on a socket to the mons.

If anyone 

[ceph-users] Re: Rados gateway data-pool replacement.

2023-04-24 Thread Gaël THEROND
Hi Casey,

I’ve tested that while you answered me actually :-)

So, all in all, we can’t stop the radosgw for now, and the cache tier option
can’t work as we use EC-based pools (at least on Nautilus).

Due to those constraints we’re currently thinking of the following
procedure:

1°/- Create the new EC Profile.
2°/- Create the new EC based pool and assign it the new profile.
3°/- Create a new storage class that use this new pool.
4°/- Add this storage class to the default placement policy.
5°/- Force a bucket lifecycle objects migration (possible??).

It seems at least one user attempted to do just that in here:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RND652IBFIG6ESSQXVGNX7NAGCNEVYOU

The only part of that thread that I don’t get is the:

« I think actually moving an already-stored object requires a lifecycle
transition policy… » part of the Matt Benjamin answer.

What kind of policy should I write to do that ??
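To make it concrete, here is roughly what I have in mind for steps 3 to 5; the
storage class, pool and bucket names are placeholders and I am not sure about the
exact lifecycle syntax RGW expects:

# 3/4: declare the new storage class in the default placement and point it at the new pool
radosgw-admin zonegroup placement add --rgw-zonegroup default \
        --placement-id default-placement --storage-class NEW_EC
radosgw-admin zone placement add --rgw-zone default \
        --placement-id default-placement --storage-class NEW_EC \
        --data-pool new-ec-data-pool
# commit only if running with a realm/period
radosgw-admin period update --commit

# 5: per-bucket lifecycle rule transitioning existing objects to the new storage class
aws --endpoint-url http://my-rgw.example:8080 s3api put-bucket-lifecycle-configuration \
        --bucket my-bucket --lifecycle-configuration '{
    "Rules": [{
        "ID": "move-to-new-ec",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": 1, "StorageClass": "NEW_EC"}]
    }]
}'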

Is this procedure something that looks ok to you?

Kind regards!

Le mer. 19 avr. 2023 à 14:49, Casey Bodley  a écrit :

> On Wed, Apr 19, 2023 at 5:13 AM Gaël THEROND 
> wrote:
> >
> > Hi everyone, quick question regarding radosgw zone data-pool.
> >
> > I’m currently planning to migrate an old data-pool that was created with
> > inappropriate failure-domain to a newly created pool with appropriate
> > failure-domain.
> >
> > If I’m doing something like:
> > radosgw-admin zone modify --rgw-zone default --data-pool 
> >
> > Will data from the old pool be migrated to the new one or do I need to do
> > something else to migrate those data out of the old pool?
>
> radosgw won't migrate anything. you'll need to use rados tools to do
> that first. make sure you stop all radosgws in the meantime so it
> doesn't write more objects to the old data pool
>
> > I’ve read a lot
> > of the mail archives with people willing to do that but I can’t get a clear
> > answer from those archives.
> >
> > I’m running the Nautilus release, if it ever helps.
> >
> > Thanks a lot!
> >
> > PS: This mail is a redo of the old one as I’m not sure the former one
> > worked (missing tags).
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.12 Pacific (hot-fix) released

2023-04-24 Thread Simon Oosthoek

Dear List

we upgraded to 16.2.12 on April 17th; since then we've seen some 
unexplained downed OSD services in our cluster (264 OSDs). Is there any 
risk of data loss? If so, would it be possible to downgrade, or is a fix 
expected soon? If so, when? ;-)


FYI, we are running a cluster without cephadm, installed from packages.

Cheers

/Simon

On 23/04/2023 03:03, Yuri Weinstein wrote:

We are writing to inform you that Pacific v16.2.12, released on April
14th, has more unintended commits in the changelog than listed in the
release notes [1].

As these extra commits are not fully tested, we request that all users
please refrain from upgrading to v16.2.12 at this time. The current
v16.2.12 will be QE validated and released as soon as possible.

v16.2.12 was a hotfix release meant to resolve several performance
flaws in ceph-volume, particularly during osd activation. The extra
commits target v16.2.13.

We apologize for the inconvenience. Please reach out to the mailing
list with any questions.

[1] 
https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$

On Fri, Apr 14, 2023 at 9:42 AM Yuri Weinstein  wrote:


We're happy to announce the 12th hot-fix release in the Pacific series.

https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$

Notable Changes
---
This is a hotfix release that resolves several performance flaws in ceph-volume,
particularly during osd activation 
(https://urldefense.com/v3/__https://tracker.ceph.com/issues/57627__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fg0yeu7U$
 )
Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at 
https://urldefense.com/v3/__https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fBEJl5p4$
* Containers at 
https://urldefense.com/v3/__https://quay.io/repository/ceph/ceph__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fc7HeSms$
* For packages, see 
https://urldefense.com/v3/__https://docs.ceph.com/en/latest/install/get-packages/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fAKdWZK4$
* Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io