[ceph-users] Re: Ovirt integration with Ceph

2023-04-25 Thread Konstantin Shalygin
Hi,

Can you see the logs in the vdsm.log file? What exactly happened on the storage
domain connection?


k
Sent from my iPhone

> On 26 Apr 2023, at 00:37, kushagra.gu...@hsc.com wrote:
> 
> Hi Team,
> 
> We are trying to integrate ceph with ovirt.
> We have deployed ovirt 4.4.
> We want to create a storage domain of POSIX compliant type for mounting a 
> ceph based infrastructure in ovirt.
> We have done SRV based resolution in our DNS server for ceph mon nodes but we 
> are unable to create a storage domain using that.
> 
> We are able to manually mount the ceph-mon nodes using the following command 
> on the deployment hosts:
> 
> sudo mount -t ceph :/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9 
> /rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9 
> -o rw,name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA==
> 
> [root@deployment-host mnt]# df -kh
> df: /run/user/0/gvfs: Transport endpoint is not connected
> Filesystem
>  Size  Used Avail 
> Use% Mounted on
> 
> [abcd:abcd:abcd::51]:6789,[abcd:abcd:abcd::52]:6789,[abcd:abcd:abcd::53]:6789:/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9
>19G 0   19G   0% 
> /rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9
> 
> 
> Query:
> 1. Could anyone help us out with storage domain creation in ovirt for SRV 
> resolved ceph-mon nodes.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rados gateway data-pool replacement.

2023-04-25 Thread Richard Bade
Hi Gaël,
I'm actually embarking on a similar project to migrate an EC pool from
k=2,m=1 to k=4,m=2 using rgw multisite sync.
Before you do a lot of work for nothing, I just want to check: when you say
failure domain, do you mean the crush failure domain rather than k and m?
If it is the crush failure domain, you may not realise that you can change
the crush rule on an EC pool.
You can change the rule the same as other pool types like this:
sudo ceph osd pool set {pool_name} crush_rule {rule_name}
At least that is my understanding and I have done so on a couple of my
pools (changed from Host to Chassis failure domain).
I found the docs a bit confusing here: you can't change the EC profile of a
pool because of the k and m numbers, and the crush rule is defined in the
profile as well, but the rule can still be changed outside of the profile.
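
For completeness, the whole thing end to end looks roughly like this (the
profile, rule and pool names here are made up, so treat it as a sketch):

sudo ceph osd erasure-code-profile set ec-profile-chassis k=2 m=1 crush-failure-domain=chassis
sudo ceph osd crush rule create-erasure ec-rule-chassis ec-profile-chassis
sudo ceph osd pool set {pool_name} crush_rule ec-rule-chassis

The new profile above is only there to generate the rule; the pool keeps the
profile (and k/m) it was created with.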

Regards,
Rich

On Mon, 24 Apr 2023 at 20:55, Gaël THEROND  wrote:
>
> Hi casey,
>
> I’ve tested that while you answered me actually :-)
>
> So, all in all, we can’t stop the radosgw for now and tier cache option
> can’t work as we use EC based pools (at least for nautilus).
>
> Due to those constraints we’re currently thinking of the following
> procedure:
>
> 1°/- Create the new EC Profile.
> 2°/- Create the new EC based pool and assign it the new profile.
> 3°/- Create a new storage class that use this new pool.
> 4°/- Add this storage class to the default placement policy.
> 5°/- Force a bucket lifecycle objects migration (possible??).
>
> It seems at least one user attempted to do just that in here:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RND652IBFIG6ESSQXVGNX7NAGCNEVYOU
>
> The only part of that thread that I don’t get is the:
>
> « I think actually moving an already-stored object requires a lifecycle
> transition policy… » part of the Matt Benjamin answer.
>
> What kind of policy should I write to do that ??
>
> Is this procedure something that looks ok to you?
>
> Kind regards!
>
> Le mer. 19 avr. 2023 à 14:49, Casey Bodley  a écrit :
>
> > On Wed, Apr 19, 2023 at 5:13 AM Gaël THEROND 
> > wrote:
> > >
> > > Hi everyone, quick question regarding radosgw zone data-pool.
> > >
> > > I’m currently planning to migrate an old data-pool that was created with
> > > inappropriate failure-domain to a newly created pool with appropriate
> > > failure-domain.
> > >
> > > If I’m doing something like:
> > > radosgw-admin zone modify —rgw-zone default —data-pool 
> > >
> > > Will data from the old pool be migrated to the new one or do I need to do
> > > something else to migrate those data out of the old pool?
> >
> > radosgw won't migrate anything. you'll need to use rados tools to do
> > that first. make sure you stop all radosgws in the meantime so it
> > doesn't write more objects to the old data pool
> >
> > > I’ve read a lot
> > > of mail archive with peoples willing to do that but I can’t get a clear
> > > answer from those archives.
> > >
> > > I’m running on nautilus release of it ever help.
> > >
> > > Thanks a lot!
> > >
> > > PS: This mail is a redo of the old one as I’m not sure the former one
> > > worked (missing tags).
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Could you please explain the PG concept

2023-04-25 Thread Anthony D'Atri
Absolutely.

Moreover, PGs are not a unit of size, they are a logical grouping of smaller 
RADOS objects, because a few thousand PGs are a lot easier and less expensive 
to manage than tens or hundreds of millions of small underlying RADOS objects.  
They’re for efficiency, and are not any set size in bytes.

With respect to PG calculators, conventional wisdom is for the number of PGs in 
a given pool to be a power of 2:  1024, 2048, 4096, etc.  The reasons for this 
aren’t as impactful as they were with previous releases, but it still has 
benefits and is good practice.  That’s one reason why a pool forecast for 15% 
and one for 18% may recommend the same number of PGs: the usual practice is to 
round to the nearest power of 2, so if the calculations suggest, say, 903 PGs 
for 15% and 1138 for 18%, both land on the same 1024.

If you only have one pool in your cluster, which usually means you’re only 
using RBD, then the calculations are very simple.  When you have multiple 
pools, it becomes more complicated because you’re solving for the number of PG 
replicas that end up on each OSD, which involves the (potentially different) 
replication factor of each pool, the relative expected capacities of the pools, 
the use of each pool (an RGW pool and an RBD pool experience different 
workloads), etc.
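
A rough worked example (all numbers made up): with a target of 100 PGs per OSD, 
100 OSDs, and 3x replication,

  (100 PGs/OSD x 100 OSDs x 0.15) / 3 replicas = 500  -> rounds to 512
  (100 PGs/OSD x 100 OSDs x 0.18) / 3 replicas = 600  -> rounds to 512

which is why two fairly different percentages can come out of the calculator 
with the same PG count.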

> On Apr 25, 2023, at 7:45 PM, Alex Gorbachev  wrote:
> 
> Hi Wodel,
> 
> The simple explanation is that PGs are a level of storage abstraction above
> the drives (OSD) and below objects (pools).  The links below may be
> helpful.  PGs consume resources, so they should be planned as best you
> can.  Now you can scale them up and down, and use autoscaler, so you don't
> have to be spot on right away.  PGs peer up and replicate data according to
> your chosen CRUSH rules.
> 
> https://ceph.io/en/news/blog/2014/how-data-is-stored-in-ceph-cluster/
> 
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/storage_strategies_guide/placement_groups_pgs
> 
> https://www.sebastien-han.fr/blog/2012/10/15/ceph-data-placement/
> --
> Alex Gorbachev
> ISS Storcium
> 
> 
> 
> On Tue, Apr 25, 2023 at 6:10 PM wodel youchi  wrote:
> 
>> Hi,
>> 
>> I am learning Ceph and I am having a hard time understanding PG and PG
>> calculus .
>> 
>> I know that a PG is a collection of objects, and that PG are replicated
>> over the hosts to respect the replication size, but...
>> 
>> In traditional storage, we use size in Gb, Tb and so on, we create a pool
>> from a bunch of disks or raid arrays of some size then we create volumes of
>> a certain size and use them. If the storage is full we add disks, then we
>> extend our pools/volumes.
>> The idea of size is simple to understand.
>> 
>> Ceph, although it supports the notion of pool size in Gb, Tb ...etc. Pools
>> are created using PGs, and now there is also the notion of % of data.
>> 
>> When I use pg calc from ceph or from redhat, the generated yml file
>> contains the % variable, but the commands file contains only the PGs, and
>> when you are configuring 15% and 18% have the same number of PGs
>> ???
>> 
>> The pg calc encourages you to create a %data multiple of 100, in other
>> words, it assumes that you know all your pools from the start. What if you
>> won't consume all your raw disk space.
>> What happens when you need to add a new pool?
>> 
>> Also when you create several pools, and then execute ceph osd df tree, you
>> can see that all pools show the raw size as a free space, it is like all
>> pools share the same raw space regardless of their PG number.
>> 
>> If someone can put some light on this concept and how to manage it wisely,
>> because the documentation keeps saying that it's an important concept, that
>> you have to pay attention when choosing the number of PGs for a pool from
>> the start.
>> 
>> Regards.
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Could you please explain the PG concept

2023-04-25 Thread Alex Gorbachev
Hi Wodel,

The simple explanation is that PGs are a level of storage abstraction above
the drives (OSD) and below objects (pools).  The links below may be
helpful.  PGs consume resources, so they should be planned as best you
can.  Now you can scale them up and down, and use autoscaler, so you don't
have to be spot on right away.  PGs peer up and replicate data according to
your chosen CRUSH rules.

https://ceph.io/en/news/blog/2014/how-data-is-stored-in-ceph-cluster/

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/storage_strategies_guide/placement_groups_pgs

https://www.sebastien-han.fr/blog/2012/10/15/ceph-data-placement/
--
Alex Gorbachev
ISS Storcium



On Tue, Apr 25, 2023 at 6:10 PM wodel youchi  wrote:

> Hi,
>
> I am learning Ceph and I am having a hard time understanding PG and PG
> calculus .
>
> I know that a PG is a collection of objects, and that PG are replicated
> over the hosts to respect the replication size, but...
>
> In traditional storage, we use size in Gb, Tb and so on, we create a pool
> from a bunch of disks or raid arrays of some size then we create volumes of
> a certain size and use them. If the storage is full we add disks, then we
> extend our pools/volumes.
> The idea of size is simple to understand.
>
> Ceph, although it supports the notion of pool size in Gb, Tb ...etc. Pools
> are created using PGs, and now there is also the notion of % of data.
>
> When I use pg calc from ceph or from redhat, the generated yml file
> contains the % variable, but the commands file contains only the PGs, and
> when you are configuring 15% and 18% have the same number of PGs
> ???
>
> The pg calc encourages you to create a %data multiple of 100, in other
> words, it assumes that you know all your pools from the start. What if you
> won't consume all your raw disk space.
> What happens when you need to add a new pool?
>
> Also when you create several pools, and then execute ceph osd df tree, you
> can see that all pools show the raw size as a free space, it is like all
> pools share the same raw space regardless of their PG number.
>
> If someone can put some light on this concept and how to manage it wisely,
> because the documentation keeps saying that it's an important concept, that
> you have to pay attention when choosing the number of PGs for a pool from
> the start.
>
> Regards.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Could you please explain the PG concept

2023-04-25 Thread wodel youchi
Hi,

I am learning Ceph and I am having a hard time understanding PGs and the PG
calculation.

I know that a PG is a collection of objects, and that PGs are replicated
over the hosts to respect the replication size, but...

In traditional storage, we use sizes in GB, TB and so on: we create a pool
from a bunch of disks or RAID arrays of some size, then we create volumes of
a certain size and use them. If the storage is full we add disks, then we
extend our pools/volumes.
The idea of size is simple to understand.

In Ceph, although it supports the notion of pool size in GB, TB, etc., pools
are created using PGs, and now there is also the notion of % of data.

When I use the PG calc from Ceph or from Red Hat, the generated yml file
contains the % variable, but the commands file contains only the PGs, and
when you configure them, 15% and 18% end up with the same number of PGs???

The PG calc encourages you to make the %data values add up to 100; in other
words, it assumes that you know all your pools from the start. What if you
won't consume all your raw disk space?
What happens when you need to add a new pool?

Also, when you create several pools and then execute ceph osd df tree, you
can see that all pools show the raw size as free space; it is as if all
pools share the same raw space regardless of their PG number.

Could someone shed some light on this concept and how to manage it wisely?
The documentation keeps saying that it's an important concept and that you
have to pay attention when choosing the number of PGs for a pool from the
start.

Regards.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Move ceph to new addresses and hostnames

2023-04-25 Thread Jan Marek
Hi there,

I have a Ceph cluster created with ceph-volume (BlueStore); every node has
12 HDDs and 1 NVMe, which is divided into 24 LVM partitions for DB and WAL.

I've converted this cluster to 'ceph orch' management, then moved to the
Quincy release (I'm now on version 17.2.5).

I had to move the whole cluster to other addresses and other hostnames.

MON, MGR and MDS went without problems, but the OSDs were a really painful
process :-(

Now I have a cluster with this problem:

# ceph orch ps
NAMEHOST  PORTS   STATUS  REFRESHED  AGE  MEM USE  
MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mds.cephfs.mon1.ulytsa  mon1  running (11w)  3m ago  11w5609M   
 -  17.2.5   cc65afd6173a  db1aa336263a
mds.cephfs.mon2.zxhxqk  mon2  running (11w)  3m ago  11w33.1M   
 -  17.2.5   cc65afd6173a  5b9ced4a4b71
mds.cephfs.mon3.rpkvlt  mon3  running (11w)  3m ago  11w32.4M   
 -  17.2.5   cc65afd6173a  045e23f124aa
mgr.mon1.buqyga mon1  *:8080  running (11w)  3m ago  11w2300M   
 -  17.2.5   cc65afd6173a  9577239527b5
mgr.mon2.goghws mon2  *:8080  running (11w)  3m ago  11w 495M   
 -  17.2.5   cc65afd6173a  4fb1ae26765e
mgr.mon3.slpgay mon3  *:8080  running (11w)  3m ago  11w 495M   
 -  17.2.5   cc65afd6173a  06e491084a5e
mon.mon1mon1  running (11w)  3m ago  11w1576M   
 2048M  17.2.5   cc65afd6173a  2f18c737faa9
mon.mon2mon2  running (11w)  3m ago  11w1598M   
 2048M  17.2.5   cc65afd6173a  31091cbbfb8e
mon.mon3mon3  running (11w)  3m ago  11w1463M   
 2048M  17.2.5   cc65afd6173a  4d0b094c9ca1
osd.0   osd1  running (9w)   3m ago  10w5133M   
 3745M  17.2.5   cc65afd6173a  3b28e48d3630
osd.1   osd1  running (7w)   3m ago  10w5425M   
 3745M  17.2.5   cc65afd6173a  3336ccdfd232
osd.2   osd1  running (9w)   3m ago  10w5223M   
 3745M  17.2.5   cc65afd6173a  e8fc077aef59
osd.3   osd1  running (9w)   3m ago  10w5050M   
 3745M  17.2.5   cc65afd6173a  4fbf34450237
osd.4   osd1  running (9w)   3m ago  10w7526M   
 3745M  17.2.5   cc65afd6173a  a4875c354540
osd.5   osd1  running (9w)   3m ago  10w4854M   
 3745M  17.2.5   cc65afd6173a  b006526228ae
osd.6   osd1  running (9w)   3m ago  10w6498M   
 3745M  17.2.5   cc65afd6173a  4c326271e188
osd.7   osd1  running (9w)   3m ago  10w4410M   
 3745M  17.2.5   cc65afd6173a  ca0f3ce31031
osd.8   osd1  running (9w)   3m ago  10w7337M   
 3745M  17.2.5   cc65afd6173a  99269a832819
osd.9   osd1  running (9w)   3m ago  10w4717M   
 3745M  17.2.5   cc65afd6173a  f39ce0bb5316
osd.10  osd1  running (9w)   3m ago  10w4295M   
 3745M  17.2.5   cc65afd6173a  0871793fa261
osd.11  osd1  running (9w)   3m ago  10w5552M   
 3745M  17.2.5   cc65afd6173a  32a8b589b3bd
osd.24  osd3  running (109m) 3m ago   6M3306M   
 3745M  17.2.5   cc65afd6173a  466d80a55d96
osd.25  osd3  running (109m) 3m ago   6M3145M   
 3745M  17.2.5   cc65afd6173a  b1705621116a
osd.26  osd3  running (109m) 3m ago   6M3063M   
 3745M  17.2.5   cc65afd6173a  c30253a1a83f
osd.27  osd3  running (109m) 3m ago   6M3257M   
 3745M  17.2.5   cc65afd6173a  aa0a647d93f1
osd.28  osd3  running (109m) 3m ago   6M2244M   
 3745M  17.2.5   cc65afd6173a  d3c68ed6572b
osd.29  osd3  running (109m) 3m ago   6M3509M   
 3745M  17.2.5   cc65afd6173a  2c425b17abf7
osd.30  osd3  running (109m) 3m ago   6M3814M   
 3745M  17.2.5   cc65afd6173a  44747256b34a
osd.31  osd3  running (109m) 3m ago   6M2958M   
 3745M  17.2.5   cc65afd6173a  b7b7946fa24e
osd.32  osd3  running (109m) 3m ago   6M3016M   
 3745M  17.2.5   cc65afd6173a  fc9c024fed4f
osd.33  osd3  running (109m) 3m ago   6M5366M   
 3745M  17.2.5   cc65afd6173a  edc2dbd9c556
osd.34  osd3  running (109m) 3m ago   6M4577M   
 3745M  17.2.5   cc65afd6173a  46d7668742cf
osd.35  osd3  running (109m) 3m ago   6M2538M   
 3745M  17.2.5   cc65afd6173a  96a15a9ad3d7
osd.36  osd4  running (103m) 3m ago   8w2707M   
 3745M  17.2.5   cc65afd6173a  adf884af609b
osd.37  osd4  running (103m) 3m ago   6M3347M   
 3745M  17.2.5   cc65afd6173a  8f824026c6ae
osd.38 

[ceph-users] Re: Bucket sync policy

2023-04-25 Thread vitaly . goot
On Quincy:

The zone-group policy has the same problem with a similar setup. Way to 
reproduce:
- Setup 3-way symmetrical replication on zone-group level 
- Switch zone-group to 'allowed' (zones are not in sync).
- Add content to the detached zone(s) 
- Switch it back to 'enabled' (3-way symmetrical replication should be in 
effect)

Newly added content stays in each zone.  Then "bucket sync run" does *nothing*, 
at either the zone-group or the bucket level, to trigger replication.

The only way to trigger replication is to modify some object (commit a new or 
existing object); then replication is triggered at the *shard* level. To fully 
replicate the content, multiple commits are required (as many as there are shards).
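
For reference, the status flips above were done with commands along these lines 
(the group id is made up; zonegroup-level changes also need a period commit):

radosgw-admin sync group modify --group-id=group1 --status=allowed
radosgw-admin period update --commit
  ... add content to the detached zones ...
radosgw-admin sync group modify --group-id=group1 --status=enabled
radosgw-admin period update --commit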
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.13 point release

2023-04-25 Thread Radoslaw Zarzynski
How about tracking the stuff in a single Etherpad?

https://pad.ceph.com/p/prs-to-have-16.2.13

On Mon, Apr 24, 2023 at 9:01 PM Cory Snyder  wrote:
>
> Hi Yuri,
>
> We were hoping that the following patch would make it in for 16.2.13 if 
> possible:
>
> https://github.com/ceph/ceph/pull/51200
>
> Thanks,
>
> Cory Snyder
>
>
>
> From: Yuri Weinstein 
> Sent: Monday, April 24, 2023 11:39 AM
> To: dev ; ceph-users ; clt 
> Subject: pacific 16.2.13 point release
>
> We want to do the next urgent point release for pacific 16.2.13 ASAP.
>
> The tip of the current pacific branch will be used as a base for this
> release and we will build it later today.
>
> Dev leads - if you have any outstanding PRs that must be included pls
> merged them now.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bucket sync policy

2023-04-25 Thread yjin77
Hello, ceph gurus,

We are trying multisite sync policy feature with Quincy release and we 
encounter something strange, which we cannot solve even after combing through 
the internet for clues. Our test setup is very simple. I use mstart.sh to spin 
up 3 clusters, configure them with a single realm, a single zonegroup and 3 
zones – z0, z1, z2, with z0 being the master. I created a zonegroup-level sync 
policy with “allowed”, a symmetrical flow among all 3 zones and a pipe allowing 
all zones to all zones. I created a single bucket “test-bucket” at z0 and 
uploaded a single object to it. By now, there should be no sync since the 
policy is “allowed” only and I can see the single file only exists in z0 and 
“bucket sync status” shows the sync is actually disabled. Finally, I created a 
bucket-level sync policy being “enabled” and a pipe between z0 and z1 only. I 
expected that sync should be kicked off between z0 and z1 and I did see from 
“sync info” that there are sources/dests being z0/z1. “bucket sync status” also 
shows the source zone and source bucket. At z0, it shows everything is caught 
up but at z1 it shows one shard is behind, which is expected since that only 
object exists in z0 but not in z1.
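
For reference, the policies described above were created with commands roughly 
like these (a sketch: the group/flow/pipe ids are made up, and only the 
zonegroup-level changes were followed by a period commit):

radosgw-admin sync group create --group-id=group1 --status=allowed
radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-mirror --flow-type=symmetrical --zones=z0,z1,z2
radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
radosgw-admin period update --commit
radosgw-admin sync group create --bucket=test-bucket --group-id=bucket-group --status=enabled
radosgw-admin sync group pipe create --bucket=test-bucket --group-id=bucket-group --pipe-id=bucket-pipe --source-zones=z0,z1 --dest-zones=z0,z1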
 
Now, here comes the strange part. Although z1 shows there is one shard behind, 
it doesn’t seem to make any progress on syncing it. It doesn’t seem to do any 
full sync at all since “bucket sync status” shows “full sync: 0/11 shards”. 
There hasn’t been any full sync since otherwise, z1 should have that only 
object. It is stuck in this condition forever until I make another upload on 
the same object. I suspect the update of the object triggers a new data log, 
which triggers the sync. So, why wasn’t there a full sync and how can one force 
a full sync?
 
I also tried “sync error list” and they are all empty. I also tried to apply 
the fix in https://tracker.ceph.com/issues/57853, although I am not sure if it 
is relevant. The fix didn’t change the behavior that we observed.

Thanks,
Yixin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to control omap capacity?

2023-04-25 Thread WeiGuo Ren
I have two OSDs. These OSDs are used for the RGW index pool. After a lot of
stress tests, these two OSDs were written to 99.90% full. The full ratio
(95%) did not take effect? I don't know much about this. Could it be that if
an OSD is full of omap data, it cannot be limited by the full ratio?
I also used ceph-bluestore-tool to expand it, after first adding a partition,
but it failed and I don't know why.
In my cluster every OSD has 55 GB (DB, WAL and data on the same device), and
ceph -v is 14.2.5. Can anyone give me some ideas on how to fix it?
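
For reference, the checks and the expand attempt look roughly like this (the 
OSD path is illustrative, and the OSD must be stopped before expanding):

ceph osd dump | grep -i ratio                    # confirm full/nearfull ratios
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0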
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Deep-scrub much slower than HDD speed

2023-04-25 Thread Niklas Hambüchen
I observed that on an otherwise idle cluster, scrubbing cannot fully utilise 
the speed of my HDDs.

`iostat` shows only 8-10 MB/s per disk, instead of the ~100 MB/s most HDDs can 
easily deliver.

Changing scrubbing settings does not help (see below).
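
The settings I mean are scrub options along these lines (illustrative values, 
not necessarily what I tested):

ceph config set osd osd_max_scrubs 2              # concurrent scrubs per OSD
ceph config set osd osd_scrub_sleep 0             # sleep between scrub chunks
ceph config set osd osd_deep_scrub_stride 1048576 # read size used by deep scrub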

Environment:

* 6 active+clean+scrubbing+deep
* Ceph version 16.2.7.
* BlueStore
* My cluster has many small objects ("402.32M objects, 38 TiB" from 
"ceph status") due to small files (4 - 32 KiB) on CephFS.

`iostat -x 5` with default settings:

Devicer/s rkB/s   rrqm/s  %rrqm r_await rareq-sz w/s
 wkB/s   wrqm/s  %wrqm w_await wareq-sz d/s dkB/s   drqm/s  %drqm 
d_await dareq-sz f/s f_await  aqu-sz  %util
dm-0   198.60   6878.40 0.00   0.00   12.7834.63   51.80   
2612.80 0.00   0.00   14.8250.440.00  0.00 0.00   0.00
0.00 0.000.000.003.30  91.24
dm-1 0.80  3.20 0.00   0.00   11.50 4.00   52.60   
2582.40 0.00   0.00   13.6949.100.00  0.00 0.00   0.00
0.00 0.000.000.000.73   3.78
dm-10   11.20 71.20 0.00   0.000.09 6.36  145.80
583.20 0.00   0.000.14 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.02   2.62
dm-11  192.60   6737.60 0.00   0.00   10.7434.98   34.80   
1684.80 0.00   0.00   11.4748.410.00  0.00 0.00   0.00
0.00 0.000.000.002.47  91.40
dm-12  245.40  10194.40 0.00   0.009.4341.54   21.20
575.20 0.00   0.003.9427.130.00  0.00 0.00   0.00
0.00 0.000.000.002.40  87.92
dm-13   30.80   1772.80 0.00   0.00   11.6157.56   78.80   
4507.20 0.00   0.00   19.7857.200.00  0.00 0.00   0.00
0.00 0.000.000.001.92   9.54
dm-143.20 24.80 0.00   0.000.12 7.75  125.20
500.80 0.00   0.000.12 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.18
dm-152.80 19.20 0.00   0.000.14 6.86  105.40
421.60 0.00   0.000.05 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   1.76
dm-160.80  6.40 0.00   0.000.00 8.00  111.00
444.00 0.00   0.000.10 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   1.82
dm-17   10.80 76.80 0.00   0.000.09 7.11  151.40
605.60 0.00   0.000.08 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.92
dm-18   10.20 67.20 0.00   0.000.08 6.59  115.60
462.40 0.00   0.000.04 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.16
dm-19   10.20 56.80 0.00   0.000.10 5.57  109.00
436.00 0.00   0.000.07 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.34
dm-2 4.80435.20 0.00   0.000.1290.67  751.80   
6292.80 0.00   0.000.07 8.370.00  0.00 0.00   0.00
0.00 0.000.000.000.05  16.14
dm-200.40  2.40 0.00   0.000.00 6.00  265.00   
2459.20 0.00   0.000.10 9.280.00  0.00 0.00   0.00
0.00 0.000.000.000.03   5.36
dm-21  191.00   6105.60 0.00   0.006.3431.97   67.80   
3748.00 0.00   0.00   19.5655.280.00  0.00 0.00   0.00
0.00 0.000.000.002.54  42.22
dm-3 1.00  8.80 0.00   0.000.00 8.80   91.00
364.00 0.00   0.000.04 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.00   1.54
dm-4   167.60   4973.60 0.00   0.00   10.1529.68   49.20   
2511.20 0.00   0.00   11.3951.040.00  0.00 0.00   0.00
0.00 0.000.000.002.26  89.18
dm-511.20 73.60 0.00   0.000.12 6.57  124.40
497.60 0.00   0.000.07 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.16
dm-627.20   1644.80 0.00   0.00   12.2260.47   57.20   
3316.80 0.00   0.00   15.9357.990.00  0.00 0.00   0.00
0.00 0.000.000.001.24   6.78
dm-7   217.40   8032.80 0.00   0.00   12.0436.95   64.40   
3654.40 0.00   0.00   23.6956.750.00  0.00 0.00   0.00
0.00 0.000.000.004.14  97.28
dm-810.80 70.40 0.00   0.000.15 6.52  111.80
447.20 0.00   0.000.04 4.000.00  0.00 0.00   0.00
0.00 0.000.000.000.01   2.32
  

[ceph-users] Re: How to replace an HDD in a OSD with shared SSD for DB/WAL

2023-04-25 Thread enochlew
Thank you for your suggestion!
I have already deleted the LVs of both the block and the DB devices.
I monitored the creation process of osd.23 with the command line "podman ps
-a".
The osd.23 container appeared for a short time, then was deleted.
The feedback of the command line
"compute11:data_devices=/dev/sda,db_devices=/dev/sdc,osds_per_device=1" was
"Created osd(s) 23 on host 'compute11'".
The OSD was shown in the dashboard, but was always down and cannot be started.
Thanks again.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Increase timeout for marking osd down

2023-04-25 Thread Nicola Mori

Dear Ceph users,

my cluster is made of very old machines on a Gbit ethernet. I see that 
sometimes some OSDs are marked down due to slow networking, especially 
on heavy network load like during recovery. This causes problems, for 
example PGs keep being deactivated and activated as the OSDs are marked 
down and up (at least to my best understanding). So I'd need to know if 
there is some way to increase the timeout after which an OSD is marked 
down, to cope with my slow network.
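
From what I could find, the relevant knobs seem to be these (values are just 
examples; I would appreciate confirmation that they are the right ones):

ceph config set osd osd_heartbeat_grace 60   # grace before peers report an OSD down (default 20 s)
ceph config set mon osd_heartbeat_grace 60   # the mons need the same value to honour it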

Thanks,

Nicola
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] advise on adding RGW and NFS/iSCSI on proxmox

2023-04-25 Thread MartijnF
Hello,

I'm maintaining 3 small infrastructures in NL:
7 primary, 4 secondary and 3 testing Proxmox nodes (hyperconverged).

Until now we have only used the embedded RBD facilities of Proxmox, on Ceph 17.2.5.
To provide S3/Swift and persistent storage to our internal k8s cluster I am 
looking around for best practices, like:

- Should I install it on top of the Proxmox hosts, or create a few HA VMs with 
public and private LAN segments to avoid upgrade errors?
- Can I use the new Ceph dashboard to create and install the services needed?

Should I virtualize Proxmox or another OS for this? I can imagine it is not 
their primary focus.


A lot of information can be found with some Google foo, but it is hard to find 
answers reflecting my setup.

Thanks in advance for any pointers!

Martijn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [sync policy] multisite bucket full sync

2023-04-25 Thread yjin77
Hi folks,

We are trying multisite sync policy feature with Quincy release and we 
encounter something strange. Maybe our understanding of sync policy is 
incorrect. I hope the community could help us uncover the mystery.

Our test setup is very simple. I use mstart.sh to spin up 3 clusters, configure 
them with a single realm "world", a single zonegroup "zg" and 3 zones – z0, z1, 
z2, with z0 being the master. I created a zonegroup-level sync policy with 
status “allowed”, a symmetrical flow among all 3 zones and a pipe allowing all 
zones to all zones. I created a single bucket “test-bucket” at z0 and uploaded 
a single object to it. By now, there should be no sync since the policy status 
is “allowed” only and I can see the single file only exists in z0 and “bucket 
sync status” shows the sync is actually disabled. Finally, I created a 
bucket-level sync policy with status “enabled” and a pipe between z0 and z1 
only. I expected that sync should be kicked off between z0 and z1 and I did see 
from “sync info” that there are sources/dests being z0/z1. “bucket sync status” 
also shows the source zone and source bucket. At z0, it shows everything is 
caught up but at z1 it shows one shard is behind for data sync, which is 
expected since that only object exists in z0 but not in z1.
 
Now, here comes the strange part. Although z1 shows there is one shard behind, 
it doesn’t seem to make any progress on syncing it. It doesn’t seem to do any 
full sync at all since “bucket sync status” shows “full sync: 0/11 shards”. 
There hasn’t been any full sync since otherwise, z1 should have that only 
object. It is stuck in this condition forever until I make another upload on 
the same object. I suspect the update of the object triggers a new data log, 
which triggers the sync. My questions are:

1. With bucket-specific sync policy, why wasn’t there a full sync and how can 
one force a full sync?
2. Does the sync policy only act on data logs, and take no action without 
them? I wonder if that would explain why it didn't sync even though it knows it 
is behind?
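
One thing I have not tried yet, in case it is the intended way to force a 
resync (I am not sure it applies when a bucket-level policy is in play), would 
be something like the following on the z1 side:

radosgw-admin bucket sync init --bucket=test-bucket --source-zone=z0
radosgw-admin bucket sync run --bucket=test-bucket --source-zone=z0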

BTW, I also tried “sync error list” and they are all empty. I also tried to 
apply the fix in https://tracker.ceph.com/issues/57853, although I am not sure 
if it is relevant. The fix didn’t change the behavior that we observed.

Thanks in advance,
Yixin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Massive OMAP remediation

2023-04-25 Thread Ben . Zieglmeier
Hi All,

We have a RGW cluster running Luminous (12.2.11) that has one object with an 
extremely large OMAP database in the index pool. Listomapkeys on the object 
returned 390 Million keys to start. Through bilog trim commands, we’ve whittled 
that down to about 360 Million. This is a bucket index for a regrettably 
unsharded bucket. There are only about 37K objects actually in the bucket, but 
through years of neglect, the bilog grown completely out of control. We’ve hit 
some major problems trying to deal with this particular OMAP object. We just 
crashed 4 OSDs when a bilog trim caused enough churn to knock one of the OSDs 
housing this PG out of the cluster temporarily. The OSD disks are 6.4TB NVMe, 
but are split into 4 partitions, each housing their own OSD daemon (collocated 
journal).

We want to be rid of this large OMAP object, but are running out of options to 
deal with it. Reshard outright does not seem like a viable option, as we 
believe the deletion would deadlock OSDs and could cause impact. Continuing to 
run `bilog trim` 1000 records at a time has been what we’ve done, but this also 
seems to be creating impacts to performance/stability. We are seeking options 
to remove this problematic object without creating additional problems. It is 
quite likely this bucket is abandoned, so we could remove the data, but I fear 
even the deletion of such a large OMAP could bring OSDs down and cause 
potential for metadata loss (the other bucket indexes on that same PG).
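
For reference, the incremental trimming described above is done with commands 
along these lines (the index pool name, bucket instance id and markers are 
placeholders here):

rados -p <rgw-index-pool> listomapkeys .dir.<bucket-instance-id> | wc -l   # watch the key count shrink
radosgw-admin bilog list --bucket=<bucket> --max-entries=1000
radosgw-admin bilog trim --bucket=<bucket> --end-marker=<marker-from-listing>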

Any insight available would be highly appreciated.

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to replace an HDD in a OSD with shared SSD for DB/WAL

2023-04-25 Thread enochlew
HI,

I built a Ceph cluster with cephadm.
Every Ceph node has 4 OSDs. These 4 OSDs were built with 4 HDDs (block) and 1 SSD
(DB).
At present, one HDD is broken, and I am trying to replace it and build the OSD
with the new HDD and the free space of the SSD. I did the following:

#ceph osd stop osd.23
#ceph osd out osd.23
#ceph osd crush remove osd.23
#ceph osd rm osd.23
#ceph orch daemon rm osd.23 --force
#lvremove 
/dev/ceph-ae21e618-601e-4273-9185-99180edb8453/osd-block-96eda371-1a3f-4139-9123-24ec1ba362c4
#wipefs -af /dev/sda
#lvremove 
/dev/ceph-e50203a6-8b8e-480f-965c-790e21515395/osd-db-70f7a032-cf2c-4964-b979-2b90f43f2216
#ceph orch daemon add osd 
compute11:data_devices=/dev/sda,db_devices=/dev/sdc,osds_per_device=1
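
For reference, the equivalent sequence in the cephadm docs seems to be roughly 
the following, if I read them correctly (the --zap option may need a newer 
release):

#ceph orch osd rm 23 --replace --zap
#ceph orch device zap compute11 /dev/sda --force
#ceph orch daemon add osd compute11:data_devices=/dev/sda,db_devices=/dev/sdc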

The OSD can be built, but is always down.

Is there anything that I missed during the build?

Thank you very much!

Regards,

LIUTao
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] I am unable to execute 'rbd map xxx' as it returns the error 'rbd: map failed: (5) Input/output error'.

2023-04-25 Thread siriusa51
$ rbd map xxx

rbd: sysfs write failed
2023-04-21 11:29:13.786418 7fca1bfff700 -1 librbd::image::OpenRequest: failed 
to retrieve image id: (5) Input/output error
2023-04-21 11:29:13.786456 7fca1b7fe700 -1 librbd::ImageState: 0x55a60108a040 
failed to open image: (5) Input/output error
rbd: error opening image xxx: (5) Input/output error
In some cases useful info is found in syslog - try "dmesg | tail" or so.
rbd: map failed: (5) Input/output error

the command 'dmesg | tail' did not show any useful results.

and other command: 
$ rbd info xxx

2023-04-21 11:33:10.223701 7f3547fff700 -1 librbd::image::OpenRequest: failed 
to retrieve image id: (5) Input/output error
2023-04-21 11:33:10.223739 7f35477fe700 -1 librbd::ImageState: 0x5647d5cfeeb0 
failed to open image: (5) Input/output error
rbd: error opening image xxx: (5) Input/output error

I know the header id is c2c061579478fe, so I get the omap vals: 

$ rados -p rbd listomapvals rbd_header.c2c061579478fe

features
value (8 bytes) :
  01 00 00 00 00 00 00 00   ||
0008

object_prefix
value (27 bytes) :
  17 00 00 00 72 62 64 5f  64 61 74 61 2e 63 32 63  |rbd_data.c2c|
0010  30 36 31 35 37 39 34 37  38 66 65 |061579478fe|
001b

order
value (1 bytes) :
  16|.|
0001

size
value (8 bytes) :
  00 00 00 00 71 02 00 00   |q...|
0008

snap_seq
value (8 bytes) :
  00 00 00 00 00 00 00 00   ||
0008

I scanned all existing data blocks using the command 'rados -p rbd list | grep 
c2c061579478fe', and found many 'rbd_data' blocks. I suspect that the loss of 
important blocks caused this situation. Currently, my plan is to obtain all 
'rbd' blocks, concatenate them into an XFS file, and then perform a repair. 
However, as the data size is huge, I am still experimenting with this method. 
Are there any other feasible methods to repair this data?
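
The reassembly loop I am experimenting with looks roughly like this (a rough 
sketch only: it assumes the usual 16-hex-digit object suffixes and 4 MiB 
objects, i.e. order 22 = 0x16 as shown above, treats missing objects as 
zero-filled holes, and the object count is my own estimate from the highest 
suffix seen):

OBJ_COUNT=640000        # estimate: highest data suffix seen is ...0009c3ff
for i in $(seq 0 $((OBJ_COUNT - 1))); do
    rm -f /tmp/chunk
    name=$(printf 'rbd_data.c2c061579478fe.%016x' "$i")
    rados -p rbd get "$name" /tmp/chunk 2>/dev/null || truncate -s 4M /tmp/chunk
    dd if=/tmp/chunk of=image.raw bs=4M seek="$i" conv=notrunc status=none
done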

rbd_data:

rbd_data.c2c061579478fe.
rbd_data.c2c061579478fe.0001
rbd_data.c2c061579478fe.0002
rbd_data.c2c061579478fe.0003
rbd_data.c2c061579478fe.0004
rbd_data.c2c061579478fe.0005
rbd_data.c2c061579478fe.0006
rbd_data.c2c061579478fe.0007
rbd_data.c2c061579478fe.0008
rbd_data.c2c061579478fe.0009
rbd_data.c2c061579478fe.000a
rbd_data.c2c061579478fe.000b
rbd_data.c2c061579478fe.000c
rbd_data.c2c061579478fe.000d
rbd_data.c2c061579478fe.000e
rbd_data.c2c061579478fe.000f
rbd_data.c2c061579478fe.0010
rbd_data.c2c061579478fe.0011
...
rbd_data.c2c061579478fe.0009c3f6
rbd_data.c2c061579478fe.0009c3f7
rbd_data.c2c061579478fe.0009c3fa
rbd_data.c2c061579478fe.0009c3fb
rbd_data.c2c061579478fe.0009c3fc
rbd_data.c2c061579478fe.0009c3fd
rbd_data.c2c061579478fe.0009c3fe
rbd_data.c2c061579478fe.0009c3ff
rbd_header.c2c061579478fe


thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS recovery

2023-04-25 Thread jack
Hi All,

We have a CephFS cluster running Octopus with three control nodes each running 
an MDS, Monitor, and Manager on Ubuntu 20.04. The OS drive on one of these 
nodes failed recently and we had to do a fresh install, but made the mistake of 
installing Ubuntu 22.04 where Octopus is not available. We tried to force apt 
to use the Ubuntu 20.04 repo when installing Ceph so that it would install 
Octopus, but for some reason Quincy was still installed. We re-integrated this 
node and it seemed to work fine for about a week until our cluster reported 
damage to an MDS daemon and placed our filesystem into a degraded state.

cluster:
id: 692905c0-f271-4cd8-9e43-1c32ef8abd13
health: HEALTH_ERR
mons are allowing insecure global_id reclaim
1 filesystem is degraded
1 filesystem is offline
1 mds daemon damaged
noout flag(s) set
161 scrub errors
Possible data damage: 24 pgs inconsistent
8 pgs not deep-scrubbed in time
4 pgs not scrubbed in time
6 daemons have recently crashed

  services:
mon: 3 daemons, quorum database-0,file-server,webhost (age 12d)
mgr: database-0(active, since 4w), standbys: webhost, file-server
mds: cephfs:0/1 3 up:standby, 1 damaged
osd: 91 osds: 90 up (since 32h), 90 in (since 5M)
 flags noout

  task status:

  data:
pools:   7 pools, 633 pgs
objects: 169.18M objects, 640 TiB
usage:   883 TiB used, 251 TiB / 1.1 PiB avail
pgs: 605 active+clean
 23  active+clean+inconsistent
 4   active+clean+scrubbing+deep
 1   active+clean+scrubbing+deep+inconsistent

We are not sure if the Quincy/Octopus version mismatch is the problem, but we 
are in the process of downgrading this node now to ensure all nodes are running 
Octopus. Before doing that, we ran the following commands to try and recover:

$ cephfs-journal-tool --rank=cephfs:all journal export backup.bin

$ sudo cephfs-journal-tool --rank=cephfs:all event recover_dentries summary:

Events by type:
  OPEN: 29589
  PURGED: 1
  SESSION: 16
  SESSIONS: 4
  SUBTREEMAP: 127
  UPDATE: 70438
Errors: 0

$ cephfs-journal-tool --rank=cephfs:0 journal reset:

old journal was 170234219175~232148677
new journal start will be 170469097472 (2729620 bytes past old end)
writing journal head
writing EResetJournal entry
done

$ cephfs-table-tool all reset session

All of our MDS daemons are down and fail to restart with the following errors:

-3> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log [ERR] 
: journal replay alloc 0x153af79 not in free 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2,0x153b561~0x2,0x153b565~0x1de,0x153b938~0x1fd,0x153bd2a~0x4,0x153bf23~0x4,0x153c11c~0x4,0x153cd7b~0x158,0x153ced8~0xac3128]
-2> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 log_channel(cluster) log 
[ERR] : journal replay alloc 
[0x153af7a~0x1eb,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2],
 only 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2]
 is in free 
[0x153af7d~0x1e8,0x153b35c~0x1f7,0x153b555~0x2,0x153b559~0x2,0x153b55d~0x2,0x153b561~0x2,0x153b565~0x1de,0x153b938~0x1fd,0x153bd2a~0x4,0x153bf23~0x4,0x153c11c~0x4,0x153cd7b~0x158,0x153ced8~0xac3128]
-1> 2023-04-20T10:25:15.072-0700 7f0465069700 -1 
/build/ceph-15.2.15/src/mds/journal.cc: In function 'void 
EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7f0465069700 
time 2023-04-20T10:25:15.076784-0700
/build/ceph-15.2.15/src/mds/journal.cc: 1513: FAILED ceph_assert(inotablev == 
mds->inotable->get_version())

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus 
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x155) [0x7f04717a3be1]
 2: (()+0x26ade9) [0x7f04717a3de9]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) 
[0x560feaca36f2]
 4: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 5: (MDLog::_replay_thread()+0x90c) [0x560feac393ac]
 6: (MDLog::ReplayThread::entry()+0x11) [0x560fea920821]
 7: (()+0x8609) [0x7f0471318609]
 8: (clone()+0x43) [0x7f0470ee9163]

 0> 2023-04-20T10:25:15.076-0700 7f0465069700 -1 *** Caught signal 
(Aborted) **
 in thread 7f0465069700 thread_name:md_log_replay

 ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus 
(stable)
 1: (()+0x143c0) [0x7f04713243c0]
 2: (gsignal()+0xcb) [0x7f0470e0d03b]
 3: (abort()+0x12b) [0x7f0470dec859]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x1b0) [0x7f04717a3c3c]
 5: (()+0x26ade9) [0x7f04717a3de9]
 6: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x67e2) 
[0x560feaca36f2]
 7: (EUpdate::replay(MDSRank*)+0x42) [0x560feaca5bd2]
 8: (MDLog::_

[ceph-users] Ovirt integration with Ceph

2023-04-25 Thread kushagra . gupta
Hi Team,

We are trying to integrate ceph with ovirt.
We have deployed ovirt 4.4.
We want to create a storage domain of POSIX compliant type for mounting a ceph 
based infrastructure in ovirt.
We have done SRV based resolution in our DNS server for ceph mon nodes but we 
are unable to create a storage domain using that.

We are able to manually mount the ceph-mon nodes using the following command on 
the deployment hosts:
 
sudo mount -t ceph :/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9 
/rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9 
-o rw,name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA==

[root@deployment-host mnt]# df -kh
df: /run/user/0/gvfs: Transport endpoint is not connected
Filesystem                                                                                                                            Size  Used Avail Use% Mounted on
[abcd:abcd:abcd::51]:6789,[abcd:abcd:abcd::52]:6789,[abcd:abcd:abcd::53]:6789:/volumes/xyz/conf/00593e1d-b674-4b00-a289-20bec06761c9   19G     0   19G   0% /rhev/data-center/mnt/:_volumes_xyz_conf_00593e1d-b674-4b00-a289-20bec06761c9


Query:
1. Could anyone help us out with storage domain creation in oVirt for
SRV-resolved ceph-mon nodes?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Consequence of maintaining hundreds of clones of a single RBD image snapshot

2023-04-25 Thread Perspecti Vus
Hi again,

Is there a limit/best-practice regarding number of clones? I'd like to start 
development, but want to make sure I won't run into scaling issues.

  Perspectivus
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to find the bucket name from Radosgw log?

2023-04-25 Thread viplanghe6
I found a log entry like this, and I thought the bucket name should be "photos":

[2023-04-19 15:48:47.0.5541s] "GET /photos/shares/

But I cannot find it:

radosgw-admin bucket stats --bucket photos
failure: 2023-04-19 15:48:53.969 7f69dce49a80  0 could not get bucket info for 
bucket=photos
(2002) Unknown error 2002

How does this happen? Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cannot add disks back after their OSDs were drained and removed from a cluster

2023-04-25 Thread stillsmil
I find that I cannot re-add a disk to a Ceph cluster after the OSD on the disk 
is removed. Ceph seems to know about the existence of these disks, but not 
about their "host:dev" information:

```
# ceph device ls
DEVICE HOST:DEV DAEMONS  WEAR  LIFE 
EXPECTANCY
SAMSUNG_MZ7L37T6_S6KHNE0T100049 
   0%   <-- should be host01:sda, was osd.0
SAMSUNG_MZ7L37T6_S6KHNE0T100050host01:sdb   osd.1  0%
SAMSUNG_MZ7L37T6_S6KHNE0T100052 
   0%   <-- should be host02:sda
SAMSUNG_MZ7L37T6_S6KHNE0T100053host01:sde   osd.9  0%
SAMSUNG_MZ7L37T6_S6KHNE0T100061host01:sdf   osd.11 0%
SAMSUNG_MZ7L37T6_S6KHNE0T100062 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100063host01:sdc   osd.5  0%
SAMSUNG_MZ7L37T6_S6KHNE0T100064host01:sdg   osd.13 0%
SAMSUNG_MZ7L37T6_S6KHNE0T100065 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100066host01:sdd   osd.7  0%
SAMSUNG_MZ7L37T6_S6KHNE0T100067 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100068 
   0%<-- should be host02:sdb
SAMSUNG_MZ7L37T6_S6KHNE0T100069 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100070 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100071 
   0%
SAMSUNG_MZ7L37T6_S6KHNE0T100072host01:sdh   osd.15 0%
SAMSUNG_MZQL27T6HBLA-00B7C_S6CVNG0T321600  host03:nvme4n1   osd.20 0%
... omitted ...
SAMSUNG_MZQL27T6HBLA-00B7C_S6CVNG0T321608  host03:nvme8n1   osd.22 0%
```

For disk "SAMSUNG_MZ7L37T6_S6KHNE0T100049", the "HOST:DEV" field is empty, 
while I believe it should be "host01:sda", as I have confirmed by running 
`smartctl -i /dev/sda" on host01.

I guess the missing information is the reason that OSDs cannot be created, 
either manually or automatically on these devices. I have tried:
1. `ceph orch daemon add osd host01:/dev/sda` Prints "Created no osd(s) on host 
host01; already created?"
2. `ceph orch apply osd --all-available-devices` adds nothing

I arrived at such situation while testing if draining a host works: I drained 
host02, removed it and added it back via:
```
ceph orch host drain host02
ceph orch host rm host02
ceph orch host add host02  --labels _admin
```

I am running Ceph 17.2.6 on Ubuntu. Ceph was deployed via cephadm. FYI, the 
relevant orchestrator spec for osd:
```
# ceph orch ls osd --export
service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
all: true
  filter_logic: AND
  objectstore: bluestore
```


Any thoughts on what may be wrong here? Is there a way that I can tell Ceph "you 
are wrong about the whereabouts of these disks, forget what you know and fetch 
disk information afresh"?
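
Things I am considering trying next, in case they clear the stale records (I am 
not sure these are the right levers):
```
ceph orch device zap host01 /dev/sda --force   # wipe any LVM/partition leftovers
ceph orch device ls --refresh                  # force cephadm to rescan host devices
```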

Any help much appreciated!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Dead node (watcher) won't timeout on RBD

2023-04-25 Thread max
Hey all,

I recently had a k8s node failure in my homelab, and even though I powered it 
off (and it's done for, so it won't get back up), it still shows up as watcher 
in rbd status.

```
root@node0:~# rbd status kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
Watchers:
watcher=10.0.0.103:0/1520114202 client.1697844 cookie=140289402510784
watcher=10.0.0.103:0/39967552 client.1805496 cookie=140549449430704
root@node0:~# ceph osd blocklist ls
10.0.0.103:0/0 2023-04-15T13:15:39.061379+0200
listed 1 entries
```

Even though the node is down & I have blocked it multiple times for hours, it 
won't disappear. Meaning, ceph-csi-rbd claims the image is mounted already 
(manually binding works fine, and can cleanly unbind as well, but can't unbind 
from a node that doesn't exist anymore).

Is there any possibility to force kick an rbd client / watcher from ceph (e.g. 
switching the mgr / mon) or to see why this is not timing out?

I found some historical mails & issues (related to rook, which I don't use) 
regarding a param `osd_client_watch_timeout` but can't find how that relates to 
the RBD images.
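
For anyone else digging into this: the watch sits on the image's header object, 
so it can be listed directly, and one more thing I might try is blocklisting 
the specific watcher address rather than the whole IP (no idea yet if that 
behaves differently). Sketch, with the image id taken from `rbd info`:

rados -p kubernetes listwatchers rbd_header.<image-id>
ceph osd blocklist add 10.0.0.103:0/1520114202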

Cheers,
Max.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread Marc

Maybe he is limited by the supported OS


> 
> I would create a new cluster with Quincy and would migrate the data from
> the old to the new cluster bucket by bucket. Nautilus is out of support
> and
> I would recommend at least to use a ceph version that is receiving
> Backports.
> 
> huxia...@horebdata.cn  schrieb am Di., 25. Apr.
> 2023, 18:30:
> 
> > Dear Ceph folks,
> >
> > I would like to listen to your advice on the following topic: We have
> a
> > 6-node Ceph cluster (for RGW usage only ) running on Luminous 12.2.12,
> and
> > now will add 10 new nodes. Our plan is to phase out the old 6 nodes,
> and
> > run RGW Ceph cluster with the new 10 nodes on Nautilus version。
> >
> > I can think of two ways to achieve the above goal. The first method
> would
> > be:   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to
> > Nautilus 14.2.22;  2) Expand the cluster with the 10 new nodes, and
> then
> > re-balance;  3) After rebalance completes, remove the 6 old nodes from
> the
> > cluster
> >
> > The second method would get rid of the procedure to upgrade the old 6-
> node
> > from Luminous to Nautilus, because those 6 nodes will be phased out
> anyway,
> > but then we have to deal with a hybrid cluster with 6-node on Luminous
> > 12.2.12, and 10-node on Nautilus, and after re-balancing, we can
> remove the
> > 6 old nodes from the cluster.
> >
> > Any suggestions, advice, or best practice would be highly appreciated.
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread Joachim Kraftmayer
I would create a new cluster with Quincy and would migrate the data from
the old to the new cluster bucket by bucket. Nautilus is out of support and
I would recommend at least to use a ceph version that is receiving
Backports.

huxia...@horebdata.cn  schrieb am Di., 25. Apr.
2023, 18:30:

> Dear Ceph folks,
>
> I would like to listen to your advice on the following topic: We have a
> 6-node Ceph cluster (for RGW usage only ) running on Luminous 12.2.12, and
> now will add 10 new nodes. Our plan is to phase out the old 6 nodes, and
> run RGW Ceph cluster with the new 10 nodes on Nautilus version。
>
> I can think of two ways to achieve the above goal. The first method would
> be:   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to
> Nautilus 14.2.22;  2) Expand the cluster with the 10 new nodes, and then
> re-balance;  3) After rebalance completes, remove the 6 old nodes from the
> cluster
>
> The second method would get rid of the procedure to upgrade the old 6-node
> from Luminous to Nautilus, because those 6 nodes will be phased out anyway,
> but then we have to deal with a hybrid cluster with 6-node on Luminous
> 12.2.12, and 10-node on Nautilus, and after re-balancing, we can remove the
> 6 old nodes from the cluster.
>
> Any suggestions, advice, or best practice would be highly appreciated.
>
> best regards,
>
>
> Samuel
>
>
>
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread huxia...@horebdata.cn
Thanks a lot for the valuable input, Wesley, Josh, and Anthony.

It seems the best practice would be to upgrade first, then expand, and remove 
the old nodes afterwards.

best regards,

Samuel



huxia...@horebdata.cn
 
From: Wesley Dillingham
Date: 2023-04-25 19:55
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] For suggestions and best practices on expanding Ceph 
cluster and removing old nodes
Get on nautilus first and (perhaps even go to pacific) before expansion. 
Primarily for the reason that starting  in nautilus degraded data recovery will 
be prioritized over remapped data recovery. As you phase out old hardware and 
phase in new hardware you will have a very large amount of backfill happening 
and if you get into a degraded state in the middle of this backfill it will 
take a much longer time for the degraded data to become clean again. 

Additionally, you will want to follow the best practice of updating your 
cluster in order. In short monitors then managers then osds then MDS and RGW 
then other clients. More details here: 
https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous

You dont want to run with a mixed software version cluster longer than a well 
coordinated upgrade takes. 

Respectfully,

Wes Dillingham
w...@wesdillingham.com
LinkedIn


On Tue, Apr 25, 2023 at 12:31 PM huxia...@horebdata.cn  
wrote:
Dear Ceph folks,

I would like to listen to your advice on the following topic: We have a 6-node 
Ceph cluster (for RGW usage only ) running on Luminous 12.2.12, and now will 
add 10 new nodes. Our plan is to phase out the old 6 nodes, and run RGW Ceph 
cluster with the new 10 nodes on Nautilus version。 

I can think of two ways to achieve the above goal. The first method would be:   
1) Upgrade the current 6-node cluster from Luminous 12.2.12 to Nautilus 
14.2.22;  2) Expand the cluster with the 10 new nodes, and then re-balance;  3) 
After rebalance completes, remove the 6 old nodes from the cluster

The second method would get rid of the procedure to upgrade the old 6-node from 
Luminous to Nautilus, because those 6 nodes will be phased out anyway, but then 
we have to deal with a hybrid cluster with 6-node on Luminous 12.2.12, and 
10-node on Nautilus, and after re-balancing, we can remove the 6 old nodes from 
the cluster.

Any suggestions, advice, or best practice would be highly appreciated.

best regards,


Samuel 



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Lua scripting in the rados gateway

2023-04-25 Thread Thomas Bennett
Hi ceph users,

I've been trying out the lua scripting for the rados gateway (thanks Yuval).

As I mentioned in my previous email, there is an error when trying to load
the luasocket module. However, I thought it was a good time to report on my
progress.

My 'hello world' example below, called *test.lua*, includes the following
checks:

   1. Can I write to the debug log?
   2. Can I use the lua socket package to do something stupid but
   interesting, like connecting to a web service?

Before you continue reading this, you might need to know that I run all
ceph processes in a *CentOS Stream release 8 *container deployed using ceph
orchestrator running *Ceph v17.2.5*, so please view the information below
in that context.

For anyone looking for a reference, I suggest going to the ceph lua rados
gateway documentation at radosgw/lua-scripting.

There are two new switches you need to know about in the radosgw-admin:

   - *script* -> loading your lua script
   - *script-package* -> loading supporting packages for your script, e.g.
   luasocket in this case.

For a basic setup, you'll need to have a few dependencies in your
containers:

   - cephadm container: requires luarocks (I've checked the code - it runs
   a luarocks search command)
   - radosgw container: requires luarocks, gcc, make,  m4, wget (wget just
   in case).

To achieve the above, I updated the container image for our running system.
I needed to do this because I needed to redeploy the rados gateway
container to inject the lua script packages into the radosgw runtime
process. This will start with a fresh container based on the global config
*container_image* setting on your running system.

For us this is currently captured in *quay.io/tsolo/ceph:v17.2.5-3* and
included the following extra steps (installing lua-devel from an rpm because
there is no CentOS package in yum):
yum install luarocks gcc make wget m4
rpm -i https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm
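
For reference, a minimal sketch of how such an image could be built and pushed
(assuming podman and the upstream quay.io/ceph/ceph:v17.2.5 base image; the
package list is just what is mentioned above):

# Sketch: bake the lua build tooling into a custom ceph image
cat > Containerfile <<'EOF'
FROM quay.io/ceph/ceph:v17.2.5
RUN yum install -y luarocks gcc make wget m4 && \
    rpm -i https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm && \
    yum clean all
EOF
podman build -t quay.io/tsolo/ceph:v17.2.5-3 -f Containerfile .
podman push quay.io/tsolo/ceph:v17.2.5-3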

You will notice that I've included a compiler and compiler support in the
image. This is because luarocks on the radosgw needs to compile luasocket (the
package I want to install). This happens at start time, when the radosgw is
restarted from ceph orch.

In the cephadm container I still need to update our cephadm shell, so I
install luarocks by hand:
yum install luarocks

Then set the updated image to use:
ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3

I now create a file called *test.lua* in the cephadm container. It contains
the following lines to write to the log and then do a GET request to
google.com. This is not practical in production, but it serves the purpose of
testing the infrastructure:

RGWDebugLog("Tsolo start lua script")
local LuaSocket = require("socket")
client = LuaSocket.connect("google.com", 80)
client:send("GET / HTTP/1.0\r\nHost: google.com\r\n\r\n")
while true do
  s, status, partial = client:receive('*a')
  RGWDebugLog(s or partial)
  if status == "closed" then
break
  end
end
client:close()
RGWDebugLog("Tsolo stop lua")

Next I run:
radosgw-admin script-package add --package=luasocket --allow-compilation

And then list the added package to make sure it is there:
radosgw-admin script-package list

Note - at this point the radosgw has not been modified, it must first be
restarted.

Then I put the *test.lua* script into the preRequest context:
radosgw-admin script put --infile=test.lua --context=preRequest

You also need to raise the debug log level on the running rados gateway:
ceph daemon
/var/run/ceph/ceph-client.rgw.xxx.xxx-cms1.x.x.xx.asok
config set debug_rgw 20

Inside the radosgw container I apply my fix (as per previous email):
cp -ru /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib64/*
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib/

Outside, on the host running the radosgw container, I follow the journalctl
for the radosgw container (to get the logs):
journalctl -fu ceph-----@rgw.
xxx.xxx-cms1.x.x.xx.service

Then I run an s3cmd to put data in via the rados gateway and check the
journalctl logs and see:
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo start lua
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: HTTP/1.0 301 Moved
Permanently
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO:
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo stop lua
Apr 25 20:54:47 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo start lua
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO: HTTP/1.0 301 Moved
Permanently
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO:
Apr 25 20:54:48 brp-ceph-cms1 radosgw[60901]: Lua INFO: Tsolo stop lua

So the script worked :)

If you want to see where the luarocks libraries have been installed, look
under /tmp/luarocks/ inside the radosgw container.

[ceph-users] PVE CEPH OSD heartbeat slow

2023-04-25 Thread Peter
Dear all,

We are experiencing issues with Ceph after deploying it via PVE, with the network 
backed by a 10G Cisco switch with the vPC feature on. We are encountering slow OSD 
heartbeats and have not been able to identify any network traffic issues.

Upon checking, we found that the ping is around 0.1ms, and there is occasional 
2% packet loss when using flood ping, but not consistently. We also noticed a 
large number of UDP port 5405 packets and the 'corosync' process utilizing a 
significant amount of CPU.

When running the 'ceph -s' command, we observed a slow OSD heartbeat on the 
back and front, with the longest latency being 2250.54ms. We suspect that this 
may be a network issue, but we are unsure of how Ceph detects such long 
latency. Additionally, we are wondering if a 2% packet loss can significantly 
affect Ceph's performance and even cause the OSD process to fail sometimes.

We have heard about potential issues with RocksDB 6 causing OSD process 
failures, and we are curious about how to check the RocksDB version. 
Furthermore, we are wondering how severe packet loss and latency must be to 
cause OSD process crashes, and how the monitoring system determines that an 
OSD is offline.
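
For anyone triaging similar symptoms, a minimal sketch of checks that can help
(osd.0, the mgr daemon name and the peer IP are placeholders; dump_osd_network
is only available on reasonably recent releases):

# heartbeat pings slower than 1000 ms, as seen by one OSD (front and back nets)
ceph daemon osd.0 dump_osd_network 1000
# aggregated view kept by the mgr (daemon name is a placeholder)
ceph daemon mgr.ceph-mgr-host dump_osd_network 1000
# quantify loss/latency on the cluster network of a peer OSD host
ping -f -c 10000 <peer-cluster-network-ip>
# RocksDB is bundled with the Ceph build; its version may show up in the OSD
# startup log (raising debug_rocksdb can help)
journalctl -u ceph-osd@0 | grep -i rocksdb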

We would greatly appreciate any assistance or insights you could provide on 
these matters.
Thanks,

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Reset a bucket in a zone

2023-04-25 Thread Yixin Jin
Hi folks,
Within a zonegroup, once a bucket is created, its metadata is synced over to 
all zones. With a bucket-level sync policy, however, its content may or may not 
be synced over. To simplify the sync process, sometimes I'd like to pick the 
bucket in one zone as the absolute truth and sync its content over to the bucket 
in another zone, which may have done some local deletions since the last time 
they were synced. I don't want those local deletions to interfere with the 
planned sync. Is it possible to reset the bucket in this zone so it is in a 
"pristine" state and will receive everything from the source?
Thanks,
Yixin
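
Not a definitive answer, but one possible starting point (a sketch, assuming a
recent radosgw-admin; bucket and zone names are placeholders, and whether this
also re-syncs objects deleted locally depends on the sync policy in effect):

# on the zone that should be overwritten, reset the bucket's sync status so
# the next pass starts with a full sync instead of incremental
radosgw-admin bucket sync init --bucket=mybucket --source-zone=source-zone
# then force a sync pass
radosgw-admin bucket sync run --bucket=mybucket --source-zone=source-zone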
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Rados gateway lua script-package error lib64

2023-04-25 Thread Thomas Bennett
Hi,

I've noticed that when my lua script runs I get the following error on my
radosgw container. It looks like the lib64 directory is not included in the
path when looking for shared libraries.

Copying the content of lib64 into the lib directory solves the issue on the
running container.

Here are more details:
Apr 25 20:26:59 xxx-ceph- radosgw[60901]: req 2268223694354647302
0.0s Lua ERROR:
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*share*/lua/5.3/socket.lua:12:
module 'socket.core' not found:
 no field package.preload['socket.core']
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*share*
/lua/5.3/socket/core.lua'
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*lib*
/lua/5.3/socket/core.so'
 no file '/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/*lib*
/lua/5.3/socket.so'

As mentioned, the following command on the running radosgw container solves
the issue:
cp -ru /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib64/*
/tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib/

Cheers,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread Wesley Dillingham
Get on nautilus first and (perhaps even go to pacific) before expansion.
Primarily because, starting in Nautilus, degraded data recovery is prioritized
over remapped data recovery. As you phase out old hardware and phase in new
hardware you will have a very large amount of backfill happening, and if you
get into a degraded state in the middle of this backfill it will take much
longer for the degraded data to become clean again.

Additionally, you will want to follow the best practice of updating your
cluster in order. In short monitors then managers then osds then MDS and
RGW then other clients. More details here:
https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous

You don't want to run with a mixed software version cluster longer than a
well coordinated upgrade takes.
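
A quick sanity check along the way (a sketch; run from any node with an admin
keyring):

# show how many daemons of each type are on each release; everything should
# converge on a single version before the upgrade is considered done
ceph versions
# once every OSD runs Nautilus, require it cluster-wide
ceph osd require-osd-release nautilus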

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Apr 25, 2023 at 12:31 PM huxia...@horebdata.cn <
huxia...@horebdata.cn> wrote:

> Dear Ceph folks,
>
> I would like to ask for your advice on the following: We have a
> 6-node Ceph cluster (for RGW usage only) running on Luminous 12.2.12, and
> will now add 10 new nodes. Our plan is to phase out the old 6 nodes, and
> run the RGW Ceph cluster with the new 10 nodes on Nautilus.
>
> I can think of two ways to achieve the above goal. The first method would
> be:   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to
> Nautilus 14.2.22;  2) Expand the cluster with the 10 new nodes, and then
> re-balance;  3) After rebalance completes, remove the 6 old nodes from the
> cluster
>
> The second method would get rid of the procedure to upgrade the old 6-node
> from Luminous to Nautilus, because those 6 nodes will be phased out anyway,
> but then we have to deal with a hybrid cluster with 6-node on Luminous
> 12.2.12, and 10-node on Nautilus, and after re-balancing, we can remove the
> 6 old nodes from the cluster.
>
> Any suggestions, advice, or best practice would be highly appreciated.
>
> best regards,
>
>
> Samuel
>
>
>
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread Josh Baergen
Hi Samuel,

While the second method would probably work fine in the happy path, if
something goes wrong I think you'll be happier having a uniform
release installed. In general, we've found the backfill experience to
be better on Nautilus than Luminous, so my vote would be for the first
method. Given that your usage is RGW, just note that the OMAP format
change that happens between Luminous and Nautilus can sometimes take a
while.

Josh

On Tue, Apr 25, 2023 at 10:31 AM huxia...@horebdata.cn
 wrote:
>
> Dear Ceph folks,
>
> I would like to ask for your advice on the following: We have a 
> 6-node Ceph cluster (for RGW usage only) running on Luminous 12.2.12, and 
> will now add 10 new nodes. Our plan is to phase out the old 6 nodes, and run 
> the RGW Ceph cluster with the new 10 nodes on Nautilus.
>
> I can think of two ways to achieve the above goal. The first method would be: 
>   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to Nautilus 
> 14.2.22;  2) Expand the cluster with the 10 new nodes, and then re-balance;  
> 3) After rebalance completes, remove the 6 old nodes from the cluster
>
> The second method would get rid of the procedure to upgrade the old 6-node 
> from Luminous to Nautilus, because those 6 nodes will be phased out anyway, 
> but then we have to deal with a hybrid cluster with 6-node on Luminous 
> 12.2.12, and 10-node on Nautilus, and after re-balancing, we can remove the 6 
> old nodes from the cluster.
>
> Any suggestions, advice, or best practice would be highly appreciated.
>
> best regards,
>
>
> Samuel
>
>
>
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread huxia...@horebdata.cn
Dear Ceph folks,

I would like to ask for your advice on the following: We have a 6-node Ceph 
cluster (for RGW usage only) running on Luminous 12.2.12, and will now add 10 
new nodes. Our plan is to phase out the old 6 nodes and run the RGW Ceph 
cluster with the new 10 nodes on Nautilus. 

I can think of two ways to achieve the above goal. The first method would be:   
1) Upgrade the current 6-node cluster from Luminous 12.2.12 to Nautilus 
14.2.22;  2) Expand the cluster with the 10 new nodes, and then re-balance;  3) 
After rebalance completes, remove the 6 old nodes from the cluster

The second method would get rid of the procedure to upgrade the old 6-node from 
Luminous to Nautilus, because those 6 nodes will be phased out anyway, but then 
we have to deal with a hybrid cluster with 6-node on Luminous 12.2.12, and 
10-node on Nautilus, and after re-balancing, we can remove the 6 old nodes from 
the cluster.

Any suggestions, advice, or best practice would be highly appreciated.

best regards,


Samuel 



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Bucket notification

2023-04-25 Thread Szabo, Istvan (Agoda)
Hi,

I'm trying to set up a Kafka endpoint for bucket object-create notifications, 
but no notification is created at the Kafka endpoint.
The settings seem to be fine, because I can upload objects to the bucket when 
these settings are applied:

<NotificationConfiguration>
  <TopicConfiguration>
    <Id>bulknotif</Id>
    <Topic>arn:aws:sns:default::butcen</Topic>
    <Event>s3:ObjectCreated:*</Event>
    <Event>s3:ObjectRemoved:*</Event>
  </TopicConfiguration>
</NotificationConfiguration>

but it simply does not create any message in Kafka.

This is my topic creation POST request:

https://xxx.local/?
Action=CreateTopic&
Name=butcen&
kafka-ack-level=broker&
use-ssl=true&
push-endpoint=kafka://ceph:pw@xxx.local:9093

Am I missing something, or is it definitely a Kafka issue?
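
A few checks that might help narrow it down (a sketch; "butcen" is the topic
name from above, the rest is generic):

# confirm the topic and its push endpoint as RGW sees them
radosgw-admin topic list
radosgw-admin topic get --topic=butcen
# verify the broker is reachable on the SSL port from the RGW host
nc -vz xxx.local 9093
# raise RGW logging and watch for kafka/notification errors while uploading
ceph config set client.rgw debug_rgw 20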

Thank you



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm grafana per host certificate

2023-04-25 Thread Eugen Block

Seems like the per-host config was actually introduced in 16.2.11:
https://github.com/ceph/ceph/pull/48103

So I'm gonna have to wait for 16.2.13. Sorry for the noise.

Zitat von Eugen Block :

I looked a bit deeper and compared to a similar customer cluster  
(16.2.11) where I had to reconfigure grafana after an upgrade  
anyway. There it seems to work as expected with the per-host  
certificate. I only added the host-specific certs and keys and see  
the graphs in the dashboard while on our 16.2.10 cluster this  
doesn't work that way. So I assume there must be a difference
between .10 and .11 regarding grafana. Could anyone confirm this?


Zitat von Eugen Block :


Hi,

thanks for the suggestion, I'm aware of a wildcard certificate  
option (which brings its own issues for other services). But since  
the ceph config seems to support this per-host based certificates I  
would like to get this running.


Thanks,
Eugen

Zitat von Reto Gysi :


Hi Eugen,

I've created a certificate with subject alternative names, so the
certificate is valid on each node of the cluster.
[image: image.png]

Cheers

Reto
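
For anyone wanting to reproduce this, a minimal self-signed sketch (host names
are placeholders; -addext needs OpenSSL 1.1.1 or newer):

# one self-signed certificate valid for every host that may run grafana
openssl req -x509 -newkey rsa:4096 -sha256 -days 730 -nodes \
  -keyout grafana.key -out grafana.crt -subj "/CN=grafana" \
  -addext "subjectAltName=DNS:ceph01,DNS:ceph02,DNS:ceph03"
# load it as the cluster-wide grafana cert/key and redeploy
ceph config-key set mgr/cephadm/grafana_key -i grafana.key
ceph config-key set mgr/cephadm/grafana_crt -i grafana.crt
ceph orch reconfig grafana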

Am Do., 20. Apr. 2023 um 11:42 Uhr schrieb Eugen Block :


Hi *,

I've set up grafana, prometheus and node-exporter on an adopted
cluster (currently running 16.2.10) and was trying to enable ssl for
grafana. As stated in the docs [1] there's a way to configure
individual certs and keys per host:

ceph config-key set mgr/cephadm/{hostname}/grafana_key -i $PWD/key.pem
ceph config-key set mgr/cephadm/{hostname}/grafana_crt -i
$PWD/certificate.pem

So I did that, then ran 'ceph orch reconfig grafana' but I still get a
bad cert error message:

Apr 20 10:21:19 ceph01 conmon[3772491]: server.go:3160: http: TLS
handshake error from :46084: remote error: tls: bad certificate

It seems like the cephadm generated cert/key pair
(mgr/cephadm/grafana_key; mgr/cephadm/grafana_crt) supersedes the
per-host certs, and even after removing the generated cert/key (and
then reconfiguring) cephadm regenerates them and leaves me with the
same problem. Is this a known issue and what would be the fix? I
didn't find anything on tracker, but I might have missed it.
To confirm that my custom certs actually work I replaced the general
cert with my custom cert and the error doesn't appear, I can see the
grafana graphs in the dashboard. I could leave it like this, but if
grafana would failover it wouldn't work anymore, of course.
Any hints are greatly appreciated.

Thanks,
Eugen

[1]

https://docs.ceph.com/en/latest/cephadm/services/monitoring/#configuring-ssl-tls-for-grafana
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm grafana per host certificate

2023-04-25 Thread Eugen Block
I looked a bit deeper and compared to a similar customer cluster  
(16.2.11) where I had to reconfigure grafana after an upgrade anyway.  
There it seems to work as expected with the per-host certificate. I  
only added the host-specific certs and keys and see the graphs in the  
dashboard while on our 16.2.10 cluster this doesn't work that way. So  
I assume there must be a difference between .10 and .11 regarding  
grafana. Could anyone confirm this?


Zitat von Eugen Block :


Hi,

thanks for the suggestion, I'm aware of a wildcard certificate  
option (which brings its own issues for other services). But since  
the ceph config seems to support this per-host based certificates I  
would like to get this running.


Thanks,
Eugen

Zitat von Reto Gysi :


Hi Eugen,

I've created a certificate with subject alternative names, so the
certificate is valid on each node of the cluster.
[image: image.png]

Cheers

Reto

Am Do., 20. Apr. 2023 um 11:42 Uhr schrieb Eugen Block :


Hi *,

I've set up grafana, prometheus and node-exporter on an adopted
cluster (currently running 16.2.10) and was trying to enable ssl for
grafana. As stated in the docs [1] there's a way to configure
individual certs and keys per host:

ceph config-key set mgr/cephadm/{hostname}/grafana_key -i $PWD/key.pem
ceph config-key set mgr/cephadm/{hostname}/grafana_crt -i
$PWD/certificate.pem

So I did that, then ran 'ceph orch reconfig grafana' but I still get a
bad cert error message:

Apr 20 10:21:19 ceph01 conmon[3772491]: server.go:3160: http: TLS
handshake error from :46084: remote error: tls: bad certificate

It seems like the cephadm generated cert/key pair
(mgr/cephadm/grafana_key; mgr/cephadm/grafana_crt) supersedes the
per-host certs, and even after removing the generated cert/key (and
then reconfigure) cephadm regenerates a them and leaves me with the
same problem. Is this a known issue and what would be the fix? I
didn't find anything on tracker, but I might have missed it.
To confirm that my custom certs actually work I replaced the general
cert with my custom cert and the error doesn't appear, I can see the
grafana graphs in the dashboard. I could leave it like this, but if
grafana would failover it wouldn't work anymore, of course.
Any hints are greatly appreciated.

Thanks,
Eugen

[1]

https://docs.ceph.com/en/latest/cephadm/services/monitoring/#configuring-ssl-tls-for-grafana
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Veeam backups to radosgw seem to be very slow

2023-04-25 Thread Anthony D'Atri



>> 
>> 
>> We have a customer that tries to use veeam with our rgw objectstorage and
>> it seems to be blazingly slow.
>> What also seems strange is that veeam sometimes shows "bucket does not
>> exist" or "permission denied".
>> I've tested parallel and everything seems to work fine from the s3cmd/aws
>> cli standpoint.
>> Does anyone here ever experienced veeam problems with rgw?
> 
> Mostly the issue that if you do not set block sizes to a large value,
> it will create REALLY small files.
> If "problems" relates to "poor MB/s perf", then this could be it. TCP
> will not get up to any speed if you keep the S3 objects to very small
> sizes.
> 
> Look at 
> https://community.veeam.com/blogs-and-podcasts-57/sobr-veeam-capacity-tier-calculations-and-considerations-in-v11-2548
> for "extra large blocks" to make them 8M at least.
> We had one Veeam installation vomit millions of files onto our rgw-S3
> at an average size of 180k per object, and at those sizes, you will
> see very poor throughput and the many objs/MB will hurt all other
> kinds of performance like listing the bucket and so on.


Depending on the bucket pool and the size distribution, there could be 
significant space amp too.

A while back I looked at a cluster that was using Ceph's iscsi-gw as a 
destination for Veeam; they were experiencing very high latency, on the order 
of 100ms. I wonder if a variant of this was in play.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Veeam backups to radosgw seem to be very slow

2023-04-25 Thread Ulrich Klein
Hi,

I’ve tested that combination once last year. My experience was similar. It was 
dead-slow.
But if I remember correctly my conclusion was that Veeam was sending very 
slowly lots of rather small objects without any parallelism.
But apart from the cruel slowness I didn’t have problems of the “bucket does 
not exist”/“permission denied” type.

I don’t have the setup anymore. But I think it might be worth checking what 
kind of objects Veeam puts into its buckets.

Ciao, Uli

> On 25. Apr 2023, at 15:01, Boris Behrens  wrote:
> 
> We have a customer that tries to use veeam with our rgw objectstorage and
> it seems to be blazingly slow.
> 
> What also seems strange is that veeam sometimes shows "bucket does not
> exist" or "permission denied".
> 
> I've tested parallel and everything seems to work fine from the s3cmd/aws
> cli standpoint.
> 
> Does anyone here ever experienced veeam problems with rgw?
> 
> Cheers
> Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Veeam backups to radosgw seem to be very slow

2023-04-25 Thread Janne Johansson
Den tis 25 apr. 2023 kl 15:02 skrev Boris Behrens :
>
> We have a customer that tries to use veeam with our rgw objectstorage and
> it seems to be blazingly slow.
> What also seems strange is that veeam sometimes shows "bucket does not
> exist" or "permission denied".
> I've tested parallel and everything seems to work fine from the s3cmd/aws
> cli standpoint.
> Does anyone here ever experienced veeam problems with rgw?

Mostly the issue is that if you do not set block sizes to a large value,
it will create REALLY small files.
If "problems" relates to "poor MB/s perf", then this could be it. TCP
will not get up to any speed if you keep the S3 objects to very small
sizes.

Look at 
https://community.veeam.com/blogs-and-podcasts-57/sobr-veeam-capacity-tier-calculations-and-considerations-in-v11-2548
for "extra large blocks" to make them 8M at least.
We had one Veeam installation vomit millions of files onto our rgw-S3
at an average size of 180k per object, and at those sizes, you will
see very poor throughput and the many objs/MB will hurt all other
kinds of performance like listing the bucket and so on.
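
If it helps anyone check whether they are in the same situation, a rough way to
see the average object size of a bucket (a sketch; the bucket name is a
placeholder and the JSON field names can differ slightly between releases):

# average object size in KiB, from RGW's own bucket accounting
radosgw-admin bucket stats --bucket=veeam-repo \
  | jq '.usage["rgw.main"] | (.size_kb_actual / .num_objects)'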




--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Veeam backups to radosgw seem to be very slow

2023-04-25 Thread Boris Behrens
We have a customer that tries to use veeam with our rgw objectstorage and
it seems to be blazingly slow.

What also seems strange is that veeam sometimes shows "bucket does not
exist" or "permission denied".

I've tested parallel and everything seems to work fine from the s3cmd/aws
cli standpoint.

Does anyone here ever experienced veeam problems with rgw?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket sync policy

2023-04-25 Thread Soumya Koduri

Hi Yixin,

On 4/25/23 00:21, Yixin Jin wrote:

  Actually, "bucket sync run" somehow made it worse since now the destination zone shows 
"bucket is caught up with source" from "bucket sync status" even though it clearly missed 
an object.

 On Monday, April 24, 2023 at 02:37:46 p.m. EDT, Yixin Jin 
 wrote:
  
   An update:

After creating and enabling the bucket sync policy, I ran "bucket sync markers" and saw that each shard had 
the status of "init". Running "bucket sync run" eventually marked the status as 
"incremental-sync", which suggests it went through the full-sync stage. However, the lone object in the source zone 
wasn't synced over to the destination zone.
I actually used gdb to walk through radosgw-admin running "bucket sync run". It seems not to do anything 
for full-sync, and it printed a log line saying "finished iterating over all available prefixes:...", which 
actually broke out of the do-while loop after the call to prefix_handler.revalidate_marker(&list_marker). This 
call returned false because it couldn't find rules from the sync pipe. I haven't drilled deeper to see why it 
didn't get rules, whatever that means. Nevertheless, the workaround with "bucket sync run" doesn't seem 
to work, at least not with Quincy.



 As Matt mentioned, we have been fixing a couple of issues related to 
sync policy lately (e.g., https://tracker.ceph.com/issues/58518). Could 
you please re-test on the current mainline?


I tested this scenario -

1) Create Zonegroup Policy and set it to Allowed

2) Create 'testobject' on the primary zone

3) Create Bucket level policy and set it to Enabled

4) Check if 'testobject' is synced to secondary zone

As expected, this object was not synced, but after running the "bucket sync 
run" command from the secondary zone, the object got synced.



Let me know if I missed any step from your testcase.


Thanks,

Soumya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io