[ceph-users] Re: Disable signature url in ceph rgw

2023-12-08 Thread Robin H. Johnson
On Fri, Dec 08, 2023 at 10:41:59AM +0100, marc@singer.services wrote:
> Hi Ceph users
> 
> We are using Ceph Pacific (16) in this specific deployment.
> 
> In our use case we do not want our users to be able to generate signature v4
> URLs, because they bypass the policies that we set on buckets (e.g. IP
> restrictions).
> Currently we have a sidecar reverse proxy running that filters requests with
> signature-URL-specific request parameters.
> This is obviously not very efficient and we are looking to replace this
> somehow in the future.
> 
> 1. Is there an option in RGW to disable these signed URLs (e.g. returning
> status 403)?
> 2. If not, is this planned, or would it make sense to add it as a
> configuration option?
> 3. Or is the behaviour of not respecting bucket policies in RGW with
> signature v4 URLs a bug, and should they actually be applied?

Trying to clarify your ask:
- you want ALL requests, including presigned URLs, to be subject to the
  IP restrictions encoded in your bucket policy?
  e.g. auth (signature AND IP-list)

That should be possible with bucket policy.

Can you post the current bucket policy that you have? (redact with
distinct values the IPs, userids, bucket name, any paths, but otherwise
keep it complete).
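
For reference, a minimal sketch of the shape such a policy can take (bucket
name, CIDR and RGW endpoint below are placeholders, not your values), applied
here with the aws CLI:

---snip---
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideAllowedNet",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::examplebucket",
        "arn:aws:s3:::examplebucket/*"
      ],
      "Condition": {
        "NotIpAddress": { "aws:SourceIp": ["198.51.100.0/24"] }
      }
    }
  ]
}
EOF
aws --endpoint-url https://rgw.example.com s3api put-bucket-policy \
    --bucket examplebucket --policy file://policy.json
---snip---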

You cannot fundamentally stop anybody from generating presigned URLs,
because that's purely a client-side operation. Generating presigned URLs
requires an access key and secret key, at which point they can do
presigned or regular authenticated requests.
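
To illustrate the client-side part: the following never talks to RGW at all,
it only needs configured credentials plus the endpoint name (endpoint, bucket
and key are made up):

---snip---
# Builds a SigV4 presigned URL locally; no request is sent to the server.
aws --endpoint-url https://rgw.example.com s3 presign \
    s3://examplebucket/some/object --expires-in 3600
# The resulting URL carries the usual SigV4 query parameters
# (X-Amz-Algorithm, X-Amz-Credential, X-Amz-Date, X-Amz-Expires,
#  X-Amz-SignedHeaders, X-Amz-Signature).
---snip---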

P.S. What stops your users from changing the bucket policy?

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


[ceph-users] Re: MDS recovery with existing pools

2023-12-08 Thread Eugen Block
Some more information on the damaged CephFS; apparently the journal is damaged:


---snip---
# cephfs-journal-tool --rank=storage:0 --journal=mdlog journal inspect
2023-12-08T15:35:22.922+0200 7f834d0320c0 -1 Missing object 200.000527c4
2023-12-08T15:35:22.938+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f140067f) at 0x149f1174595
2023-12-08T15:35:22.942+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f1400e66) at 0x149f1174d7c
2023-12-08T15:35:22.954+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f1401642) at 0x149f1175558
2023-12-08T15:35:22.970+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f1401e29) at 0x149f1175d3f
2023-12-08T15:35:22.974+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f1402610) at 0x149f1176526
2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527ca
2023-12-08T15:35:22.978+0200 7f834d0320c0 -1 Missing object 200.000527cb
2023-12-08T15:35:22.994+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f30008f4) at 0x149f2d7480a
2023-12-08T15:35:22.998+0200 7f834d0320c0 -1 Bad entry start ptr (0x149f3000ced) at 0x149f2d74c03

Overall journal integrity: DAMAGED
Objects missing:
  0x527c4
  0x527ca
  0x527cb
Corrupt regions:
  0x149f0d73f16-149f1174595
  0x149f1174595-149f1174d7c
  0x149f1174d7c-149f1175558
  0x149f1175558-149f1175d3f
  0x149f1175d3f-149f1176526
  0x149f1176526-149f2d7480a
  0x149f2d7480a-149f2d74c03
  0x149f2d74c03-

# cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect
2023-12-08T15:35:57.691+0200 7f331621e0c0 -1 Missing object 500.0dc6

Overall journal integrity: DAMAGED
Objects missing:
  0xdc6
Corrupt regions:
  0x3718522e9-
---snip---

A backup isn't possible:

---snip---
# cephfs-journal-tool --rank=storage:0 journal export backup.bin
2023-12-08T15:42:07.643+0200 7fde6a24f0c0 -1 Missing object 200.000527c4
2023-12-08T15:42:07.659+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f140067f) at 0x149f1174595
2023-12-08T15:42:07.667+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f1400e66) at 0x149f1174d7c
2023-12-08T15:42:07.675+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f1401642) at 0x149f1175558
2023-12-08T15:42:07.687+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f1401e29) at 0x149f1175d3f
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f1402610) at 0x149f1176526
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527ca
2023-12-08T15:42:07.699+0200 7fde6a24f0c0 -1 Missing object 200.000527cb
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f30008f4) at 0x149f2d7480a
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 Bad entry start ptr (0x149f3000ced) at 0x149f2d74c03
2023-12-08T15:42:07.707+0200 7fde6a24f0c0 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`

Error ((5) Input/output error)
---snip---
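
If I end up doing the object-by-object dump the error message suggests, I
would sketch it roughly like this (assuming the metadata pool is called
cephfs_metadata; the mdlog objects for rank 0 are 200.*, the purge_queue
objects 500.*):

---snip---
POOL=cephfs_metadata
mkdir -p journal-dump
# copy every readable journal/purge_queue object out of the metadata pool
for obj in $(rados -p "$POOL" ls | grep -E '^(200|500)\.'); do
    rados -p "$POOL" get "$obj" "journal-dump/$obj" || echo "failed: $obj"
done
---snip---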

Does it make sense to continue with the advanced disaster recovery [3] by running (all of) these steps:


cephfs-journal-tool event recover_dentries summary
cephfs-journal-tool [--rank=N] journal reset
cephfs-table-tool all reset session
ceph fs reset <fs_name> --yes-i-really-mean-it
cephfs-table-tool 0 reset session
cephfs-table-tool 0 reset snap
cephfs-table-tool 0 reset inode
cephfs-journal-tool --rank=0 journal reset
cephfs-data-scan init

Fortunately, I haven't had to run through this procedure too often, so I'd appreciate any comments on what the best approach would be here.


Thanks!
Eugen

[3]  
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts



[ceph-users] Re: MDS recovery with existing pools

2023-12-08 Thread Eugen Block
I was able to (almost) reproduce the issue in a (Pacific) test  
cluster. I rebuilt the monmap from the OSDs, brought everything back  
up, started the mds recovery like described in [1]:


ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover

Then I added two mds daemons which went into standby:

---snip---
Started Ceph mds.cephfs.pacific.uexvvq for 1b0afda4-2221-11ee-87be-fa163eed040c.
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.086+ 7ff5f589b900  0 set uid:gid to 167:167 (ceph:ceph)
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.086+ 7ff5f589b900  0 ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable), process ceph-md>
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.086+ 7ff5f589b900  1 main not setting numa affinity
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.086+ 7ff5f589b900  0 pidfile_write: ignore empty --pid-file
Dez 08 12:51:53 pacific conmon[100493]: starting mds.cephfs.pacific.uexvvq at
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.102+ 7ff5e37be700  1 mds.cephfs.pacific.uexvvq Updating MDS map to version 2 from mon.0
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.802+ 7ff5e37be700  1 mds.cephfs.pacific.uexvvq Updating MDS map to version 3 from mon.0
Dez 08 12:51:53 pacific conmon[100493]: debug 2023-12-08T11:51:53.802+ 7ff5e37be700  1 mds.cephfs.pacific.uexvvq Monitors have assigned me to become a standby.
---snip---


But as soon as I ran

pacific:~ # ceph fs set cephfs joinable true
cephfs marked joinable; MDS may join as newly active.

one MDS daemon became active and the FS is available now. So apparently the "Advanced" steps from [2] usually aren't necessary, but are they in this case? I'm still trying to find an explanation for the purge_queue errors.
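
Before touching anything destructive I would probably only look at the
purge_queue read-only first, something like the following (rank/fs name as
used earlier in the thread; the reset is commented out on purpose because it
throws data away):

---snip---
cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal inspect
cephfs-journal-tool --rank=storage:0 --journal=purge_queue header get
# destructive, only as a last resort:
# cephfs-journal-tool --rank=storage:0 --journal=purge_queue journal reset
---snip---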


Zitat von Eugen Block :


Hi,

following up on the previous thread (After hardware failure tried to recover ceph and followed instructions for recovery using OSDS), we were able to get ceph back into a healthy state (including the unfound object). Now the CephFS needs to be recovered, and I'm having trouble fully understanding from the docs [1] what the next steps would be. We ran the following, which according to [1] sets the state to "existing but failed":

ceph fs new <fs_name> <metadata_pool> <data_pool> --force --recover

But how do we continue from here? Should we expect an active MDS at this point or not? The "ceph fs status" output still shows rank 0 as failed. We then tried:

ceph fs set <fs_name> joinable true

But apparently it was already joinable; nothing changed. Before doing anything (destructive) from the advanced options [2] I wanted to ask the community how to proceed from here. I pasted the MDS logs at the bottom; I'm not really sure whether the current state is expected or not. Apparently, the journal recovers but the purge_queue does not:


mds.0.41 Booting: 2: waiting for purge queue recovered
mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512 (header had 14789452521). recovered.

mds.0.purge_queue operator(): open complete
mds.0.purge_queue operator(): recovering write_pos
monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
mds.0.journaler.pq(ro) _finish_read got error -2
mds.0.purge_queue _recover: Error -2 recovering write_pos
mds.0.purge_queue _go_readonly: going readonly because internal IO failed: No such file or directory

mds.0.journaler.pq(ro) set_readonly
mds.0.41 unhandled write error (2) No such file or directory, force readonly...

mds.0.cache force file system read-only
force file system read-only

Is this expected because the "--recover" flag prevents an active MDS, or not? Before running "ceph mds rmfailed ..." and/or "ceph fs reset <fs_name>" with the --yes-i-really-mean-it flag, I'd like to ask for your input. In which cases should we run those commands? The docs are not really clear to me. Any input is highly appreciated!


Thanks!
Eugen

[1] https://docs.ceph.com/en/latest/cephfs/recover-fs-after-mon-store-loss/
[2]  
https://docs.ceph.com/en/latest/cephfs/administration/#advanced-cephfs-admin-settings


---snip---
Dec 07 15:35:48 node02 bash[692598]: debug-90> 2023-12-07T13:35:47.730+ 7f4cd855f700  1 mds.storage.node02.hemalk Updating MDS map to version 41 from mon.0
Dec 07 15:35:48 node02 bash[692598]: debug-89> 2023-12-07T13:35:47.730+ 7f4cd855f700  4 mds.0.purge_queue operator():  data pool 3 not found in OSDMap
Dec 07 15:35:48 node02 bash[692598]: debug-88> 2023-12-07T13:35:47.730+ 7f4cd855f700  5 asok(0x55c27fe86000) register_command objecter_requests hook 0x55c27fe16310
Dec 07 15:35:48 node02 bash[692598]: debug-87> 2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient: _renew_subs
Dec 07 15:35:48 node02 bash[692598]: debug-86> 2023-12-07T13:35:47.730+ 7f4cd855f700 10 monclient: _send

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-08 Thread Frank Schilder
Hi Xiubo,

I will update the case. I'm afraid this will have to wait a little bit though. 
I'm too occupied for a while and also don't have a test cluster that would help 
speed things up. I will update you, please keep the tracker open.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Tuesday, December 5, 2023 1:58 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent

Frank,

Using your script I still couldn't reproduce it. Locally my Python version is 3.9.16, and I don't have other VMs to test other Python versions.

Could you check the tracker and provide the debug logs?

Thanks

- Xiubo

On 12/1/23 21:08, Frank Schilder wrote:
> Hi Xiubo,
>
> I uploaded a test script with session output showing the issue. When I look 
> at your scripts, I can't see the stat-check on the second host anywhere. 
> Hence, I don't really know what you are trying to compare.
>
> If you want me to run your test scripts on our system for comparison, please 
> include the part executed on the second host explicitly in an ssh-command. 
> Running your scripts alone in their current form will not reproduce the issue.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Xiubo Li 
> Sent: Monday, November 27, 2023 3:59 AM
> To: Frank Schilder; Gregory Farnum
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent
>
>
> On 11/24/23 21:37, Frank Schilder wrote:
>> Hi Xiubo,
>>
>> thanks for the update. I will test your scripts in our system next week. 
>> Something important: running both scripts on a single client will not 
>> produce a difference. You need 2 clients. The inconsistency is between 
>> clients, not on the same client. For example:
> Frank,
>
> Yeah, I did this with 2 different kclients.
>
> Thanks
>
>> Setup: host1 and host2 with a kclient mount to a cephfs under /mnt/kcephfs
>>
>> Test 1
>> - on host1: execute shutil.copy2
>> - execute ls -l /mnt/kcephfs/ on host1 and host2: same result
>>
>> Test 2
>> - on host1: shutil.copy
>> - execute ls -l /mnt/kcephfs/ on host1 and host2: file size=0 on host 2 
>> while correct on host 1
>>
>> Your scripts only show output of one host, but the inconsistency requires 
>> two hosts for observation. The stat information is updated on host1, but not 
>> synchronized to host2 in the second test. In case you can't reproduce that, 
>> I will append results from our system to the case.
>>
>> Also it would be important to know the python and libc versions. We observe 
>> this only for newer versions of both.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Xiubo Li 
>> Sent: Thursday, November 23, 2023 3:47 AM
>> To: Frank Schilder; Gregory Farnum
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent
>>
>> I just raised one tracker to follow this:
>> https://tracker.ceph.com/issues/63510
>>
>> Thanks
>>
>> - Xiubo
>>
>>
>
>
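
For reference, a minimal sketch of the two-host check described above
(hostnames, mount path and file names are assumptions; host2 needs its own
kclient mount of the same file system; run on host1):

---snip---
dd if=/dev/urandom of=/mnt/kcephfs/src.bin bs=1M count=4
python3 -c "import shutil; shutil.copy('/mnt/kcephfs/src.bin', '/mnt/kcephfs/dst.bin')"
stat -c '%n %s' /mnt/kcephfs/dst.bin                # size as seen on host1
ssh host2 "stat -c '%n %s' /mnt/kcephfs/dst.bin"    # size as seen on host2; 0 reproduces the issue
---snip---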



[ceph-users] Disable signature url in ceph rgw

2023-12-08 Thread marc
Hi Ceph users

We are using Ceph Pacific (16) in this specific deployment.

In our use case we do not want our users to be able to generate signature v4 URLs, because they bypass the policies that we set on buckets (e.g. IP restrictions).
Currently we have a sidecar reverse proxy running that filters requests with signature-URL-specific request parameters.
This is obviously not very efficient and we are looking to replace this somehow in the future.

1. Is there an option in RGW to disable these signed URLs (e.g. returning status 403)?
2. If not, is this planned, or would it make sense to add it as a configuration option?
3. Or is the behaviour of not respecting bucket policies in RGW with signature v4 URLs a bug, and should they actually be applied?

Thank you for your help, and let me know if you have any questions.

Marc Singer


[ceph-users] Re: How to replace a disk with minimal impact on performance

2023-12-08 Thread Janne Johansson
>
> Based on our observation of the impact of the balancer on the
> performance of the entire cluster, we have drawn conclusions that we
> would like to discuss with you.
>
>  - A newly created pool should be balanced before being handed over
> to the user. This, I believe, is quite evident.
>

I think this question might contain a lot of hidden assumptions, so it's
hard to respond to in a correct manner. Using rgw means you get some
7-10-13 different pools depending on whether you use Swift, S3, or both at
the same time. In this case, only one or a few of those pools need care
before doing bulk work; the rest are quite fine being very small and
"unbalanced".

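For example, something like the following shows which of the rgw pools
actually hold data and are worth balancing at all (pool names differ per
deployment):

---snip---
ceph df
ceph osd pool ls detail | grep rgw
---snip---
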

>  - When replacing a disk, it is advisable to exchange it directly
> for a new one. As soon as the OSD replacement occurs, the balancer
> should be invoked to realign any improperly placed PGs during the disk
> outage and disk recovery.
>

Not that I think the default behaviours are optimal in any way, but the
above text seems to describe what actually does happen: even if the
balancer is not involved, the normal CRUSH "repair" of an imbalanced
cluster will even the data out once the new OSD is in place.

> Perhaps an even better method is to pause recovery and backfilling
> before removing the disk, remove the disk itself, promptly add a new
> one, and then resume recovery and backfilling. It's essential to
> perform all of this as quickly as possible (using a script).
>

Here I would just state "set norebalance (and noout if you must stop the
whole OSD host) before removing the old OSD and adding the new one"; then,
when the new OSD is created and started, you unset the options and let the
data backfill onto the newly added OSD.
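
A rough sketch of that sequence (the actual disk replacement / OSD
recreation step in the middle is deployment-specific):

---snip---
ceph osd set norebalance
ceph osd set noout              # only needed if the whole OSD host goes down
# ... remove the old OSD, swap the disk, recreate the OSD ...
ceph osd unset noout
ceph osd unset norebalance      # backfill now fills the newly added OSD
---snip---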


> Ad. We are using a community balancer developed by Jonas Jelton because
> the built-in one does not meet our requirements.
>

We sometimes use the Python or Go upmap remapper scripts/programs to keep
the cluster less sad by moving only a small number of PGs at a time, but
that is more or less just for convenience and to let scrubs run on the
non-moving PGs if the data movements are expected to take a long calendar
time.

-- 
May the most significant bit of your life be positive.