[ceph-users] Dilemma with PG distribution

2022-12-04 Thread Boris Behrens
Hi, I am just evaluating our cluster configuration again, because we had a very bad incident with laggy OSDs that shut down the entire cluster. We use datacenter SSDs in different sizes (2, 4, 8TB) and someone said that I should not go beyond a specific amount of PGs on certain device classes.
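A quick way to see how close each device class already is to such a limit is the per-OSD PG count from the standard CLI (the class filter assumes the SSDs actually carry the `ssd` device class):

  # PGs per OSD in the PGS column, grouped by the CRUSH tree
  ceph osd df tree
  # newer releases also accept a device-class filter
  ceph osd df tree class ssd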

[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-04 Thread Boris Behrens
.82f60d356b4e4a.a1c2:head [write 1900544~147456 in=147456b] snapc 0=[] ondisk+write+known_if_redirected e945868) Cheers Boris Am So., 4. Dez. 2022 um 03:15 Uhr schrieb Alex Gorbachev < a...@iss-integration.com>: > Boris, I have seen one problematic OSD cause this issue on all OSD wit

[ceph-users] octopus rbd cluster just stopped out of nowhere (>20k slow ops)

2022-12-02 Thread Boris Behrens
hi, maybe someone here can help me to debug an issue we faced today. Today one of our clusters came to a grinding halt with 2/3 of our OSDs reporting slow ops. The only option to get it back to work fast was to restart all OSD daemons. The cluster is an octopus cluster with 150 enterprise SSD

[ceph-users] Re: radosgw octopus - how to cleanup orphan multipart uploads

2022-12-02 Thread Boris Behrens
ozuCYXDKYvhkW5RiZUxuaNfu48C.365_1-- --ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2339856956.63__multipart_8cfd0bdb-05f9-40cd-a50d-83295b416ea9.lz4.CwlAWozuCYXDKYvhkW5RiZUxuaNfu48C.365-- Am Fr., 2. Dez. 2022 um 12:17 Uhr schrieb Boris Behrens : > Hi, > we are currently encountering a lot of broken

[ceph-users] Re: radosgw-octopus latest - NoSuchKey Error - some buckets lose their rados objects, but not the bucket index

2022-12-02 Thread Boris Behrens
like > "c44a7aab-e086-43df-befe-ed8151b3a209.4147.1_obj1”. > > 3. grep through the logs for the head object and see if you find anything. > > Eric > (he/him) > > On Nov 22, 2022, at 10:36 AM, Boris Behrens wrote: > > Does someone have an idea what I can ch

[ceph-users] radosgw octopus - how to cleanup orphan multipart uploads

2022-12-02 Thread Boris Behrens
Hi, we are currently encountering a lot of broken / orphan multipart uploads. When I try to fetch the multipart uploads via s3cmd, it just never finishes. Debug output looks like this and it basically never changes. DEBUG: signature-v4 headers: {'x-amz-date': '20221202T105838Z', 'Authorization':

[ceph-users] Re: radosgw-admin bucket check --fix returns a lot of errors (unable to find head object data)

2022-11-23 Thread Boris Behrens
ere, but now I don't care. (I also have this for a healthy bucket, where I test stuff like this prior, which gets recreated periodically) Am Mi., 23. Nov. 2022 um 12:22 Uhr schrieb Boris Behrens : > Hi, > we have a customer that got some _multipart_ files in his bucket, but the > bucket g

[ceph-users] radosgw-admin bucket check --fix returns a lot of errors (unable to find head object data)

2022-11-23 Thread Boris Behrens
Hi, we have a customer that got some _multipart_ files in his bucket, but the bucket got no unfinished multipart objects. So I tried to remove them via $ radosgw-admin object rm --bucket BUCKET --object=_multipart_OBJECT.qjqyT8bXiWW5jdbxpVqHxXnLWOG3koUi.1 ERROR: object remove returned: (2) No

[ceph-users] radosgw-octopus latest - NoSuchKey Error - some buckets lose their rados objects, but not the bucket index

2022-11-21 Thread Boris Behrens
Good day people, we have a very strange problem with some buckets. A customer informed us that they had issues with objects: they are listed, but on a GET they receive a "NoSuchKey" error. They did not delete anything from the bucket. We checked and `radosgw-admin bucket radoslist --bucket $BUCKET`

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-25 Thread Boris Behrens
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919 Am Fr., 7. Okt. 2022 um 11:30 Uhr schrieb Boris Behrens : > Hi, > I just wanted to reshard a bucket but mistyped the amount of shards. In a > reflex I hit ctrl-c and waited. It looked like the resharding did not

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-24 Thread Boris Behrens
Cheers again. I am still stuck at this. Someone got an idea how to fix it? Am Fr., 7. Okt. 2022 um 11:30 Uhr schrieb Boris Behrens : > Hi, > I just wanted to reshard a bucket but mistyped the amount of shards. In a > reflex I hit ctrl-c and waited. It looked like the resharding did not

[ceph-users] Re: radosgw networking

2022-10-20 Thread Boris
AFAIK radosgw uses the public network to talk to the OSDs. You could ditch the cluster network and have the public network use the high speed cluster network connections? Maybe there is another way, which I don't know. Cheers Boris > Am 20.10.2022 um 18:58 schrieb Wyll Ingers

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-13 Thread Boris
Hi Christian, resharding is not an issue, because we only sync the metadata. Like aws s3. But this looks very broken to me, does anyone got an idea how to fix that? > Am 13.10.2022 um 11:58 schrieb Christian Rohmann > : > > Hey Boris, > >> On 07/10/2022 11:30, Boris Beh

[ceph-users] Re: octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-07 Thread Boris Behrens
Hi Casey, thanks a lot. I added the full stack trace from our ceph-client log. Cheers Boris Am Do., 6. Okt. 2022 um 19:21 Uhr schrieb Casey Bodley : > hey Boris, > > that looks a lot like https://tracker.ceph.com/issues/40018 where an > exception was thrown when trying to rea

[ceph-users] rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-07 Thread Boris Behrens
"mtime": "2022-10-07T07:16:49.231685Z", "data": { "bucket_info": { "bucket": { "name": "bucket", "marker": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.229633393

[ceph-users] Re: octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-06 Thread Boris Behrens
Any ideas on this? Am So., 2. Okt. 2022 um 00:44 Uhr schrieb Boris Behrens : > Hi, > we are experiencing that the rgw daemons crash and I don't understand why, > Maybe someone here can lead me to a point where I can dig further. > > { > "backtrace": [ >

[ceph-users] Re: Convert mon kv backend to rocksdb

2022-10-04 Thread Boris Behrens
Cheers Reed, just saw this and checked on my own. Also had one mon that ran on leveldb. I just removed the mon, pulled the new monmap and deployed it. After that all was fine. Thanks for paging the ML, so I've read it :D Boris # assuming there is only one mon and you are connected to the host
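For reference, a minimal sketch of that mon rebuild, assuming the remaining mons keep quorum and a default non-containerized layout (paths and the hostname-derived mon ID are assumptions):

  systemctl stop ceph-mon@$(hostname -s)
  ceph mon remove $(hostname -s)
  mv /var/lib/ceph/mon/ceph-$(hostname -s) /var/lib/ceph/mon/ceph-$(hostname -s).leveldb-bak

  # a freshly created mon store uses rocksdb by default
  ceph auth get mon. -o /tmp/mon.keyring
  ceph mon getmap -o /tmp/monmap
  sudo -u ceph ceph-mon --mkfs -i $(hostname -s) --monmap /tmp/monmap --keyring /tmp/mon.keyring
  systemctl start ceph-mon@$(hostname -s)
  cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend   # should now print rocksdb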

[ceph-users] octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-01 Thread Boris Behrens
Hi, we are experiencing that the rgw daemons crash and I don't understand why. Maybe someone here can lead me to a point where I can dig further. { "backtrace": [ "(()+0x43090) [0x7f143ca06090]", "(gsignal()+0xcb) [0x7f143ca0600b]", "(abort()+0x12b) [0x7f143c9e5859]",

[ceph-users] Public RGW access without any LB in front?

2022-09-16 Thread Boris Behrens
someone got experience with it and can share some insights? Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
efs_buffered_io to true alleviated that issue. That was on a nautilus > cluster. > > Respectfully, > > *Wes Dillingham* > w...@wesdillingham.com > LinkedIn <http://www.linkedin.com/in/wesleydillingham> > > > On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens wrote: &

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
rieb Frank Schilder : > Hi Boris. > > > 3. wait some time (took around 5-20 minutes) > > Sounds short. Might just have been the compaction that the OSDs do any > ways on startup after upgrade. I don't know how to check for completed > format conversion. What I see in your M

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
JKMI5K/ > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > ____ > From: Boris Behrens > Sent: 13 September 2022 11:43:20 > To: ceph-users@ceph.io > Subject: [ceph-users] laggy OSDs and staling krbd IO afte

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
I checked the cluster for other snaptrim operations and they happen all over the place, so for me it looks like they just happend to be done when the issue occured, but were not the driving factor. Am Di., 13. Sept. 2022 um 12:04 Uhr schrieb Boris Behrens : > Because someone mentio

[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Because someone mentioned that the attachments did not went through I created pastebin links: monlog: https://pastebin.com/jiNPUrtL osdlog: https://pastebin.com/dxqXgqDz Am Di., 13. Sept. 2022 um 11:43 Uhr schrieb Boris Behrens : > Hi, I need you help really bad. > > we are

[ceph-users] laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi, I need your help really badly. We are currently experiencing very bad cluster hangups that happen sporadically (once on 2022-09-08 mid day (48 hrs after the upgrade) and once 2022-09-12 in the evening). We use krbd without cephx for the qemu clients and when the OSDs are getting laggy, the krbd

[ceph-users] Re: Downside of many rgw bucket shards?

2022-08-30 Thread Boris Behrens
rn them to the >> client. And it has to do this in batches of about 1000 at a time. >> > >> > It looks like you’re expecting on the order of 10,000,000 objects in >> these buckets, so I imagine you’re not going to be listing them with any >> regularity. >&g

[ceph-users] Downside of many rgw bucket shards?

2022-08-29 Thread Boris Behrens
Hi there, I have some buckets that would require >100 shards and I would like to ask if there are any downsides to having this many shards on a bucket? Cheers Boris

[ceph-users] large omap object in .rgw.usage pool

2022-08-26 Thread Boris Behrens
have not trimmed below two months). I also tried to increase the PGs on it, but this also did not help. For normal buckets, I just reshard, but I haven't found any resharding options for the usage log. Does anyone have a solution for it? Cheers Boris

[ceph-users] Re: Benefits of dockerized ceph?

2022-08-24 Thread Boris
Ah great. Might have missed it. Will go through the ML archive then. Cheers Boris > Am 24.08.2022 um 22:20 schrieb William Edwards : > > > There was a very long discussion about this on the mailing list not too long > ago…

[ceph-users] Re: radosgw-admin hangs

2022-08-24 Thread Boris
Hi Magdy, maybe this helps. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6J5KZ7ELC7EWUS6YMKOSJ3E3JRNTHKBQ/ Cheers Boris > Am 24.08.2022 um 22:09 schrieb Magdy Tawfik : > > Hi All > > I have a cluster with 5 MON & 3 MGR 12 OSD + RGW nodes > was wo

[ceph-users] Benefits of dockerized ceph?

2022-08-24 Thread Boris
Hi, I was just asked if we can switch to dockerized ceph, because it is easier to update. Last time I tried to use ceph orch I failed really hard to get the rgw daemon running as I would like to (IP/port/zonegroup and so on). Also I never really felt comfortable running production workload in

[ceph-users] Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

2022-08-22 Thread Boris
in in the week Boris > Am 22.08.2022 um 05:12 schrieb Szabo, Istvan (Agoda) : > > Hi, > > So your problem has it been fixed? > > Istvan Szabo > Senior Infrastructure Engineer > --- > Agoda Services Co., Lt

[ceph-users] Re: Reserve OSDs exclusive for pool

2022-08-21 Thread Boris Behrens
Hi Anthony, oh that is cool. Does the OSD overwrite it, after restarts? Anything I would need to know, after doing this to persist it? Cheers Boris Am So., 21. Aug. 2022 um 20:55 Uhr schrieb Anthony D'Atri < anthony.da...@gmail.com>: > Set an arbitrary device class for those
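A sketch of that approach with an arbitrary class name `reserved` and a pool named POOL (both placeholders):

  # move the reserved OSDs into their own device class
  ceph osd crush rm-device-class osd.12 osd.24 osd.36
  ceph osd crush set-device-class reserved osd.12 osd.24 osd.36

  # CRUSH rule that only selects that class, then bind the pool to it
  ceph osd crush rule create-replicated reserved-rule default host reserved
  ceph osd pool set POOL crush_rule reserved-rule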

[ceph-users] Reserve OSDs exclusive for pool

2022-08-21 Thread Boris
Cheers, because of this bug (https://tracker.ceph.com/issues/53585) I‘d like to reserve one SSD OSD per host exclusive for a single pool. All other pools can take whatever OSDs they want. Is this possible? Best wishes Boris

[ceph-users] Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

2022-08-21 Thread Boris Behrens
pressure from the cluster, when the GC goes nuts. Maybe this happens together. Am So., 21. Aug. 2022 um 19:34 Uhr schrieb Boris Behrens : > Cheers everybody, > > I had this issue some time ago, and we though it was fixed, but it seems > to happen again. > We have files, that get

[ceph-users] Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

2022-08-21 Thread Boris Behrens
. Hope someone can tell me what this is and how I can fix it. Cheers Boris Strange errors: 2022-08-18T22:04:29.538+ 7f7ba9fcb700 0 req 9033182355071581504 183.407425780s s3:complete_multipart WARNING: failed to remove object sql-backup-de:_multipart_IM_DIFFERENTIAL_22.bak.2

[ceph-users] creating OSD partition on blockdb ssd

2022-07-23 Thread Boris Behrens
Hi, I would like to use some of the blockdb ssd space for OSDs. We provide some radosgw clusters with 8TB and 16TB rotational OSDs. We added 2TB SSDs and use one SSD per 5 8TB OSDs or 3 16TB OSDs. Now there is still space left on the devices and I thought I could just create another LV of 100GB
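A sketch of what that could look like, assuming the block.db SSD is already an LVM volume group (the VG/LV names are made up):

  # carve a 100GB LV out of the free space on the block.db SSD
  lvcreate -L 100G -n osd-meta ceph-db-0
  # and turn it into a standalone OSD
  ceph-volume lvm create --data ceph-db-0/osd-meta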

[ceph-users] Possible customer impact on resharding radosgw bucket indexes?

2022-07-06 Thread Boris Behrens
time. So my question is: does this somehow affect customer workload, or do I put their data in danger, when I reshard and they upload files? And how do you approach this problem? Do you have a very high default for all buckets, or do you just ignore the large omap objects message? Cheers Boris

[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-14 Thread Boris Behrens
es of said files I am even more clueless. Thank you for your time and your reply. Cheers Boris Am Di., 14. Juni 2022 um 18:38 Uhr schrieb J. Eric Ivancich < ivanc...@redhat.com>: > Hi Boris, > > I’m a little confused. The pastebin seems to show that you can stat " > ff7

[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
Hmm.. I will check what the user is deleting. Maybe this is it. Do you know if this bug is new in 15.2.16? I can't share the data, but I can share the metadata: https://pastebin.com/raw/T1YYLuec For the missing files I have, the multipart file is not available in rados, but the 0 byte file is.

[ceph-users] Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
ormed a month before the files last MTIME. Is there ANY way this could happen in some correlation with the GC, restarting/adding/removing OSDs, sharding bucket indexes, OSD crashes and other? Anything that isn't "rados -p POOL rm OBJECT"? Cheers Boris __

[ceph-users] Re: HDD disk for RGW and CACHE tier for giving beter performance

2022-05-24 Thread Boris
Hi Farhad, you can put the block.db (contains WAL and Metadata) on SSDs, when creating the OSD. Cheers - Boris > Am 24.05.2022 um 11:52 schrieb farhad kh : > > I want to save data pools for rgw on HDD disk drives And use some SSD hard > drive for the cache tier on top of it
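A minimal example of that layout at OSD creation time (device names are placeholders; without a separate --block.wal the WAL ends up on the block.db device):

  ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1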

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Boris
> k > Sent from my iPhone > >> On 26 Apr 2022, at 13:58, Boris Behrens wrote: >> >> The cluster contains 12x8TB OSDs without any SSDs as cache

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Boris Behrens
113, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}} 2022-04-24T06:54:28.689+ 7f7c76c00700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c5d130700' had timed out after 15 2022-04-24T06:54:28.689+ 7f7c75bfe700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-04-26 Thread Boris Behrens
So, I just checked the logs on one of our smaller cluster and it looks like this error happened twice last week. The cluster contains 12x8TB OSDs without any SSDs as cache. And it started with octopus (so no upgrade from nautilus was performed) root@3cecef08a104:~# zgrep -i marked

[ceph-users] calculate rocksdb size

2022-04-25 Thread Boris Behrens
Hi, is there a way to show the utilization of cache.db devices? Cheers Boris -- Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im groüen Saal.

[ceph-users] Re: How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread Boris Behrens
`) Regarding the WAL I can't give any advice. I have never thought about tuning this option/ Cheers Boris Am Mo., 25. Apr. 2022 um 12:59 Uhr schrieb huxia...@horebdata.cn < huxia...@horebdata.cn>: > Thanks a lof, Boris. > > Do you mean that, the best practice would be to create

[ceph-users] Re: How I disable DB and WAL for an OSD for improving 8K performance

2022-04-25 Thread Boris Behrens
Hi Samuel, IIRC at least the DB (I am not sure if flash drives use the 1GB WAL) is always located on the same device as the OSD, when it is not configured somewhere else. On SSDs/NVMEs people tend to not separate the DB/WAL on other devices. Cheers Boris Am Mo., 25. Apr. 2022 um 10:09 Uhr

[ceph-users] Re: latest octopus radosgw missing cors header

2022-04-06 Thread Boris Behrens
ntral-1 Access-Control-Allow-Origin: https://example.com Vary: Origin Access-Control-Allow-Methods: GET Access-Control-Max-Age: 3000 Content-Type: application/octet-stream Date: Wed, 06 Apr 2022 09:49:59 GMT Thank you all :) Am Mi., 6. Apr. 2022 um 11:01 Uhr schrieb Boris Behrens : > Hi, >

[ceph-users] latest octopus radosgw missing cors header

2022-04-06 Thread Boris Behrens
P/1.1 200 OK Content-Length: 12 Accept-Ranges: bytes Last-Modified: Wed, 06 Apr 2022 08:33:55 GMT x-rgw-object-type: Normal ETag: "ed076287532e86365e841e92bfc50d8c" x-amz-request-id: tx0bf7cf6cbcc6c8c93-00624d55e7-895784b1-eu-central-1 Content-Type: application/octet-stream Date:

[ceph-users] radosgw metadata sync does not catch up

2022-03-29 Thread Boris Behrens
Hi, how do you handle situations like this?
root@rgw-1-branch1:~# radosgw-admin sync status
  realm 57009437-b025-4f1a-ae1d-ec1ed75be5ca (central)
  zonegroup f254f1ac-e3e9-4aac-a3df-0a95b3b27cef (branch1)
  zone 2127caf7-07e4-49d8-9bf5-4ac298ce2225 (branch1)
  metadata sync

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-23 Thread Boris Behrens
for s3 payload which is pretty DB > intensive. > > > Thanks, > > Igor > On 3/23/2022 5:03 PM, Boris Behrens wrote: > > Hi Igor, > yes, I've compacted them all. > > So is there a solution for the problem, because I can imagine this happens > when we rem

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-23 Thread Boris Behrens
Hi Igor, yes, I've compacted them all. So is there a solution for the problem, because I can imagine this happens when we remove large files from s3 (we use it as backup storage for lz4 compressed rbd exports). Maybe I missed it. Cheers Boris Am Mi., 23. März 2022 um 13:43 Uhr schrieb Igor

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-23 Thread Boris Behrens
ior Infrastructure Engineer > --- > Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > --- > > > > *From:* Boris Behrens > *Sent:* Wednesday, March 23, 2022 1:29 PM >

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-23 Thread Boris Behrens
> Agoda Services Co., Ltd. > e: istvan.sz...@agoda.com > --- > > On 2022. Mar 22., at 23:34, Boris Behrens wrote: > > Email received from the internet. If in doubt, don't click any link nor > open any attachment ! > ___

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-22 Thread Boris Behrens
on it, and around 50mb/s r/w throughput) I also can not reproduce it via "ceph tell osd.NN compact", so I am not 100% sure it is the compactation. What do you mean with "grep for latency string"? Cheers Boris Am Di., 22. März 2022 um 15:53 Uhr schrieb Konstantin Shaly

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-22 Thread Boris Behrens
Norf, I missed half of the answers... * the 8TB disks hold around 80-90 PGs (16TB around 160-180) * per PG we've around 40k objects 170m objects in 1.2PiB of storage Am Di., 22. März 2022 um 09:29 Uhr schrieb Boris Behrens : > Good morning K, > > the "freshly done" host, whe

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-22 Thread Boris Behrens
m with one of the 21 OSDs, but I expect it to happen randomly some time in the future. The cluster has 212 OSDs and 2-3 of them get marked down at least once per day. Sometimes they get marked down >3 times, so systemd has to restart the OSD process. Cheers Boris Am Di., 22. März 2022 um 07:4

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-20 Thread Boris Behrens
d for 10 osd increase the risk for almost no gain because > the ssd is 10 times faster but has 10 times more access ! > Indeed, we did some benches with nvme for the wal db (1 nvme for ~10 > osds), and the gain was not tremendous, so we decided not use them ! > F. > > > Le

[ceph-users] radosgw-admin zonegroup synced user with colon in name is not working

2022-03-18 Thread Boris Behrens
Hi, we've got user that is called like : (please don't ask me why. I have no clue) and it got some strange behavior in the syncing process. In the master zonegroup the user looks like this: root@s3db1:~# radosgw-admin user info --uid

[ceph-users] Re: empty lines in radosgw-admin bucket radoslist (octopus 15.2.16)

2022-03-10 Thread Boris Behrens
After removing some orphan objects (4million) I pulled the radoslist again and got the exact same files with the empty line between them. Can filenames contain a newline / cr character so the radosgw-admin tool just makes a new line in the output? Am Mi., 9. März 2022 um 17:50 Uhr schrieb Boris

[ceph-users] empty lines in radosgw-admin bucket radoslist (octopus 15.2.16)

2022-03-09 Thread Boris Behrens
8ad.2071065474.2040_38/a1/fd/_450x450_90-0.jpeg Cheers Boris

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Boris Behrens
this is a horrible SPOF) and reuse the SSDs as OSDs for the smaller pools in a RGW (like log and meta). How long ago did you recreate the earliest OSD? Cheers Boris Am Di., 8. März 2022 um 10:03 Uhr schrieb Francois Legrand < f...@lpnhe.in2p3.fr>: > Hi, > We also had this kind of problems aft

[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-08 Thread Boris Behrens
so there are not 24 disks attached to a single controller, but I doubt this will help. Cheers Boris Am Di., 8. März 2022 um 09:10 Uhr schrieb Dan van der Ster < dvand...@gmail.com>: > Here's the reason they exit: > > 7f1605dc9700 -1 osd.97 486896 _committed_osd_maps marked

[ceph-users] octopus (15.2.16) OSDs crash or don't answer heathbeats (and get marked as down)

2022-03-07 Thread Boris Behrens
Hi, we've had the problem with OSDs marked as offline since we updated to octopus and hoped the problem would be fixed with the latest patch. We have this kind of problem only with octopus and there only with the big s3 cluster. * Hosts are all Ubuntu 20.04 and we've set the txqueuelen to 10k *

[ceph-users] OS suggestion for further ceph installations (centos stream, rocky, ubuntu)?

2022-02-01 Thread Boris Behrens
Hi, there was a bit of debate on the list about centos stream and if it's suitable to host something like ceph in production. So as I am now going to update our clusters I would like to know if there are any suggestions on the os. Currently we don't plan to go the docker / ceph orch route,

[ceph-users] Re: How to troubleshoot monitor node

2022-01-10 Thread Boris Behrens
I would go with the ss tool, because netstat shortens IPv6 addresses, so you don't see if it is actually listening on the correct address. Am Mo., 10. Jan. 2022 um 16:14 Uhr schrieb Janne Johansson < icepic...@gmail.com>: > modern clusters use msgr2 communications on port 3300 by default I
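For example (ss is part of iproute2; 3300 is the msgr2 port, 6789 the legacy one):

  ss -tlnp | grep -E ':(3300|6789)'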

[ceph-users] Re: Ceph orch command hangs forever

2022-01-10 Thread Boris Behrens
the host to the monmap:
  ceph mon getmap -o /tmp/monmap
  sudo -u ceph ceph-mon --cluster ceph --mkfs -i `hostname -s` --monmap /tmp/monmap
  systemctl restart ceph-mon.target
  rm /tmp/monmap
1. cleanup ceph.conf
Maybe it helps. Cheers Boris Am Mo., 10. Jan. 2022 um 12:43 Uhr schrieb Boldbayar

[ceph-users] Re: restore failed ceph cluster

2021-12-09 Thread Boris Behrens
Hi Soan, does `ceph status` work? Did you use ceph-volume to initially create the OSDs (we only use this tool and create LVM OSDs)? If yes, you might bring the OSDs back up with `ceph-volume lvm activate --all` Cheers Boris Am Do., 9. Dez. 2021 um 13:48 Uhr schrieb Mini Serve : > Hi, >

[ceph-users] Re: Removing an OSD node the right way

2021-12-03 Thread Boris Behrens
Hi Samuel, I tend to set the crush-weight to 0, but I am not sure if this is the "correct" way. ceph osd crush reweight osd.0 0 After the rebalance I can remove them from crush without rebalancing. Hope that helps Cheers Boris Am Fr., 3. Dez. 2021 um 13:09 Uhr schrieb huxia...@ho
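Expanded into a rough sequence for a whole node (OSD ids and the hostname are placeholders; wait until all PGs are active+clean after the reweight step):

  # drain every OSD on the node
  for id in 10 11 12; do ceph osd crush reweight osd.$id 0; done

  # once the data has moved, take the OSDs out and remove them
  for id in 10 11 12; do
    ceph osd out $id
    systemctl stop ceph-osd@$id        # run on the OSD host itself
    ceph osd purge $id --yes-i-really-mean-it
  done

  # finally drop the empty host bucket from the CRUSH map
  ceph osd crush remove HOSTNAME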

[ceph-users] Re: can't update Nautilus on Ubuntu 18.04 due to cert error

2021-11-27 Thread Boris
Hi David, you need to update your ca-certificates. And maybe you need to disable the expired root CA. Updating the ca-certificates package should do the trick. Kind regards - Boris Behrens > Am 27.11.2021 um 21:39 schrieb David neal : > > Hi, > > >
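A sketch of that on Ubuntu 18.04, assuming the culprit is the Let's Encrypt cross-sign root that expired in late 2021 (DST Root CA X3); the blacklist step is only needed if the package update alone does not help:

  apt-get update
  apt-get install --only-upgrade ca-certificates

  # disable the expired root and rebuild the trust store
  sed -i 's|^mozilla/DST_Root_CA_X3.crt|!mozilla/DST_Root_CA_X3.crt|' /etc/ca-certificates.conf
  update-ca-certificates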

[ceph-users] Re: [rgw multisite] adding lc policy to buckets in non-master zones result in 503 code

2021-11-17 Thread Boris Behrens
Is there an incompatibility between nautilus and octopus? master is octopus, the other one is nautilus. Am Mi., 17. Nov. 2021 um 11:12 Uhr schrieb Boris Behrens : > Hi, > we've set up a non replicated multisite environment. > We have one realm, with multiple zonegroups and one zone

[ceph-users] [rgw multisite] adding lc policy to buckets in non-master zones result in 503 code

2021-11-17 Thread Boris Behrens
Hi, we've set up a non-replicated multisite environment. We have one realm with multiple zonegroups and one zone per group. When I try to add a lifecycle policy to a bucket that is not located in the master zonegroup, I will receive 503 errors from the RGW. s3cmd sometimes just hangs forever or

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-12 Thread Boris Behrens
not just use them. Just add more 8TB disks to the 4RU chassis, and when they are out of slots add another 8RU chassis with 8TB disks. Basically I am just cleaning up old technical debts and emergency decisions and want to do it now in the most optimal way with the res

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-11 Thread Boris Behrens
storing omap and xattr. > > Wed, 10 Nov 2021, 23:51, Boris wrote: > >> Oh. >> How would one recover from that? Sounds like it basically makes no >> difference if 2, 5 or 10 OSDs are in the blast radius. >> >> Can the omap key/values be regenerat

[ceph-users] Re: slow operation observed for _collection_list

2021-11-11 Thread Boris Behrens
Hi, are you sure this can be "solved" via offline compaction? I had a crashed OSD yesterday which was added to the cluster a couple hours ago and was still in the process of syncing in. @Igor, did you manage to fix the problem or find a workaround? Am Do., 11. Nov. 2021 um 09:23 Uhr schrieb

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-10 Thread Boris
device > > Wed, 10 Nov 2021, 11:51, Boris Behrens wrote: >> Hi, >> we use enterprise SSDs like SAMSUNG MZ7KM1T9. >> They work very well for our block storage. Some NVMe would be a lot nicer but >> we have some good experience with them. >>

[ceph-users] Re: slow operation observed for _collection_list

2021-11-10 Thread Boris Behrens
Did someone figure this out? We are currently facing the same issue but the OSDs more often kill themselves and need to be restarted by us. This happens to OSDs that have an SSD-backed block.db and OSDs that have the block.db on the bluestore device. All OSDs are rotating disks of various sizes. We've

[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-10 Thread Boris Behrens
> > > > > Sent from a Galaxy device > > > ---- Original message > From: mhnx > Date: 08.11.21 13:28 (GMT+02:00) > To: Sergey Protsun > Cc: "Szabo, Istvan (Agoda)" , Boris Behrens < > b...@kervyn.de>, Ceph Users &g

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-10 Thread Boris Behrens
-8840-a678-c2e23d38bfd6,... When the SSD fails, can I just remove the tags and restart the OSD with ceph-volume lvm activate --all? And after replacing the failed SSD readd the tags with the correct IDs? Do I need to do anything else to prepare a block.db partition? Cheers Boris Am Di., 9. Nov
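To inspect the tags ceph-volume keeps on the LVs (including the block.db pointers) — illustrative only, the VG/LV names below are made up:

  # shows ceph.db_device / ceph.db_uuid among the stored metadata
  ceph-volume lvm list
  lvs -o lv_name,vg_name,lv_tags --noheadings

  # tags can be rewritten with lvchange, e.g.
  lvchange --deltag "ceph.db_device=/dev/ceph-db-0/db-old" ceph-block-0/block-12
  lvchange --addtag "ceph.db_device=/dev/ceph-db-0/db-new" ceph-block-0/block-12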

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
> That does not seem like a lot. Having SSD based metadata pools might > reduce latency though. > So block.db and block.wal doesn't make sense? I would like to have a consistent cluster. In either case I would need to remove or add SSDs, because we currently have this mixed. It does waste a lot

[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-08 Thread Boris Behrens
>> multisite, >>> No index if it's possible. >>> >>> >>> >>> Szabo, Istvan (Agoda) , 5 Kas 2021 Cum, 12:30 >>> tarihinde şunu yazdı: >>> >>> > You mean prepare or reshard? >>> > Prepare: >>> > I coll

[ceph-users] Re: Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
the rgw.meta pools on it, but it looks like a waste of space. Having a 2TB OSD in every chassis that only handles 23GB of data. Am Mo., 8. Nov. 2021 um 12:30 Uhr schrieb Stefan Kooman : > On 11/8/21 12:07, Boris Behrens wrote: > > Hi, > > we run a larger octopus s3 cluster with only rotating

[ceph-users] Question if WAL/block.db partition will benefit us

2021-11-08 Thread Boris Behrens
into restructuring the cluster and also two other clusters. And does it make a difference to have only a block.db partition or a block.db and a block.wal partition? Cheers Boris

[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-05 Thread Boris Behrens
Cheers Istvan, how do you do this? Am Do., 4. Nov. 2021 um 19:45 Uhr schrieb Szabo, Istvan (Agoda) < istvan.sz...@agoda.com>: > This one you need to prepare, you beed to preshard the bucket which you > know that will hold more than millions of objects. > > I have a bucket where we store 1.2

[ceph-users] Re: large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-05 Thread Boris Behrens
Hi Teoman, I don't sync the bucket content. It's just the metadata that gets synced. But turning off the access to our s3 is not an option, because our customers rely on it (they make backups and serve objects for their web applications through it). Am Do., 4. Nov. 2021 um 18:20 Uhr schrieb

[ceph-users] large bucket index in multisite environement (how to deal with large omap objects warning)?

2021-11-04 Thread Boris Behrens
ch addresses this issue. Cheers Boris

[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-29 Thread Boris Behrens
Hi guys, we just updated the cluster to latest octopus, but we still can not list multipart uploads if there are more than 2k multiparts. Is there any way to show the multiparts and maybe cancel them? Am Mo., 25. Okt. 2021 um 16:23 Uhr schrieb Boris Behrens : > Hi Casey, > > tha
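When s3cmd loops on the listing, the same information can be pulled with the aws CLI as a cross-check (endpoint, bucket, key and upload id are placeholders):

  aws --endpoint-url https://s3.example.com s3api list-multipart-uploads --bucket BUCKET
  aws --endpoint-url https://s3.example.com s3api abort-multipart-upload \
      --bucket BUCKET --key KEY --upload-id UPLOAD_ID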

[ceph-users] Re: upgrade OSDs before mon

2021-10-26 Thread Boris Behrens
., 26. Okt. 2021 um 15:47 Uhr schrieb Yury Kirsanov < y.kirsa...@gmail.com>: > You can downgrade any CEPH packages if you want to. Just specify the > number you'd like to go to. > > On Wed, Oct 27, 2021 at 12:36 AM Boris Behrens wrote: > >> Hi, >> I just added new

[ceph-users] upgrade OSDs before mon

2021-10-26 Thread Boris Behrens
Hi, I just added new storage to our s3 cluster and saw that ubuntu didn't prioritize the nautilus package over the octopus package. Now I have 10 OSDs with octopus in a pure nautilus cluster. Can I leave it this way, or should I remove the OSDs and first upgrade the mons? Cheers Boris -- Die

[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
. Okt. 2021 um 16:19 Uhr schrieb Casey Bodley : > hi Boris, this sounds a lot like > https://tracker.ceph.com/issues/49206, which says "When deleting a > bucket with an incomplete multipart upload that has about 2000 parts > uploaded, we noticed an infinite loop, which stopped s3cm

[ceph-users] s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Good day everybody, I just came across very strange behavior. I have two buckets where s3cmd hangs when I try to show current multipart uploads. When I use --debug I see that it loops over the same response. What I tried to fix it on one bucket: * radosgw-admin bucket check --bucket=BUCKETNAME *

[ceph-users] Re: recreate a period in radosgw

2021-10-14 Thread Boris Behrens
with realm set 7. period update; period update --commit This looks like it is correct, but I am not sure if this is the correct way. Does someone got another way to do this? Am Do., 14. Okt. 2021 um 15:44 Uhr schrieb Boris Behrens : > Hi, > is there a way to restore a deleted

[ceph-users] recreate a period in radosgw

2021-10-14 Thread Boris Behrens
Hi, is there a way to restore a deleted period? The realm, zonegroup and zone are still there, but I can't apply any changes, because the period is missing. Cheers Boris

[ceph-users] shards falling behind on multisite metadata sync

2021-10-01 Thread Boris Behrens
Hi, does someone have a quick fix for shards falling behind in the metadata sync? I can do a radosgw-admin metadata sync init and restart the rgw daemons to get a full sync, but after a day the first shard falls behind, and after two days I also get the message with "oldest incremental change not
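For reference, the re-init cycle described above, run against the zone that is lagging (the systemd unit name may differ per distribution):

  radosgw-admin metadata sync init
  systemctl restart ceph-radosgw.target   # the gateways then start the full sync
  radosgw-admin sync status               # watch the shard counters drain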

[ceph-users] Re: debugging radosgw sync errors

2021-09-20 Thread Boris Behrens
-ae86-4dc1-b432-470b0772fded 284760
[root@s3db16 ~]# radosgw-admin mdlog list | grep name | wc -l
No --period given, using current period=e8fc96f1-ae86-4dc1-b432-470b0772fded
343078
Is it safe to clear the mdlog? Am Mo., 20. Sept. 2021 um 01:00 Uhr schrieb Boris Behrens : > I just deleted the ra

[ceph-users] Re: debugging radosgw sync errors

2021-09-19 Thread Boris Behrens
., 17. Sept. 2021 um 17:54 Uhr schrieb Boris Behrens : > While searching for other things I came across this: > [root ~]# radosgw-admin metadata list bucket | grep www1 > "www1", > [root ~]# radosgw-admin metadata list bucket.instance | grep www1 > "www1:ff7a8b

[ceph-users] Re: debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
While searching for other things I came across this:
[root ~]# radosgw-admin metadata list bucket | grep www1
    "www1",
[root ~]# radosgw-admin metadata list bucket.instance | grep www1
    "www1:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.31103",
    "www1.company.dev",
[root ~]#

[ceph-users] Re: radosgw find buckets which use the s3website feature

2021-09-17 Thread Boris Behrens
Found it:
  for bucket in `radosgw-admin metadata list bucket.instance | jq .[] | cut -f2 -d\"`; do
    if radosgw-admin metadata get --metadata-key=bucket.instance:$bucket | grep --silent website_conf; then
      echo $bucket
    fi
  done
Am Do., 16. Sept. 2021 um 09:49 Uhr schrieb Boris Behrens :

[ceph-users] debugging radosgw sync errors

2021-09-17 Thread Boris Behrens
Hello again, as my tests with some fresh clusters answerd most of my config questions, I now wanted to start with our production cluster and the basic setup looks good, but the sync does not work: [root@3cecef5afb05 ~]# radosgw-admin sync status realm
