Hi,
I am looking for some experience on how people make their RGW public.
Currently we use the following:
3 IP addresses that get distributed via keepalived between three HAproxy
instances, which then balance to three RGWs.
The caveat is that keepalived is a PITA to get working when distributing a set
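For reference, roughly what one of our keepalived VRRP instances looks
like (a minimal sketch; interface, router id, and VIP are placeholders):
vrrp_instance RGW_VIP1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24
    }
}
One such instance per VIP, with a different priority order per host, is
what spreads the three addresses across the three HAproxy nodes.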
> Thanks for your help, I'm very stuck because the data is present but I
> don't know how to add the old osd in the cluster to recover the data.
>
>
>
> On Thu, Nov 2, 2023 at 11:55, Boris Behrens wrote:
>
>> Hi Mohamed,
>> are all mons down, or do you still have at least one that is running?
Hi Mohamed,
are all mons down, or do you still have at least one that is running?
AFAIK: the mons save their DB on the normal OS disks, and not within the
ceph cluster.
So if all mons are dead, which means the disks that contained the mon data
are irrecoverably dead, you might need to bootstrap a new mon.
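If it comes to that, the disaster recovery docs describe rebuilding the
mon store from the OSDs; a rough sketch (paths are placeholders, and the
store has to be gathered from the OSDs of every host):
ms=/tmp/monstore
mkdir -p $ms
# pull the cluster map info out of every OSD on this host:
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path $ms
done
# rebuild the mon store (needs a keyring with mon caps):
ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring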
…log bucket name at level 1.
>
> Cheers, Dan
>
> --
> Dan van der Ster
> CTO
>
> Clyso GmbH
> p: +49 89 215252722 | a: Vancouver, Canada
> w: https://clyso.com | e: dan.vanders...@clyso.com
>
> Try our Ceph Analyzer: https://analyzer.clyso.com
>
> On Thu, Mar 30,
Hi,
does someone have a solution ready to monitor traffic by IP address?
Cheers
Boris
> …bug reports to improve it.
>
> Quoting Boris Behrens:
>
> > Hi,
> > I've just upgraded our object storages to the latest pacific version
> > (16.2.14) and the autoscaler is acting weird.
> > On one cluster it just shows nothing:
> > ~# ceph osd pool autoscale-status
Also found what the 2nd problem was:
When there are pools using the default replicated_ruleset while there are
multiple rulesets with different device classes, the autoscaler does not
produce any output.
Should I open a bug for that?
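A quick way to check which pools still point at the default rule (a
sketch; rule and pool names differ per cluster):
ceph osd crush rule ls
ceph osd pool ls detail | grep crush_rule
# move a pool to a device-class-specific rule:
ceph osd pool set POOL crush_rule RULE_NAME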
On Wed, Oct 4, 2023 at 14:36 Boris Behrens wrote:
Found the bug for the TOO_MANY_PGS: https://tracker.ceph.com/issues/62986
But I am still not sure, why I don't have any output on that one cluster.
On Wed, Oct 4, 2023 at 14:08 Boris Behrens wrote:
> Hi,
> I've just upgraded our object storages to the latest pacific version
>
Hi,
I've just upgraded our object storages to the latest pacific version
(16.2.14) and the autoscaler is acting weird.
On one cluster it just shows nothing:
~# ceph osd pool autoscale-status
~#
On the other clusters it shows this when it is set to warn:
~# ceph health detail
...
[WRN]
Hi,
is it possible to use one cephx key for multiple RGWs running in parallel?
Maybe I could just use the same 'name' and the same key for all of the RGW
instances?
I plan to start RGWs all over the place in containers and let BGP handle
the traffic. But I don't know how to create on-demand keys, that
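What I have in mind is something like this (an untested sketch; the name
is made up and the caps are my guess at the usual RGW caps):
ceph auth get-or-create client.rgw.shared mon 'allow rw' osd 'allow rwx'
# every container would then start with the same identity:
radosgw --name client.rgw.shared --keyring /etc/ceph/ceph.client.rgw.shared.keyring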
I have a use case where I want to only use a small portion of the disk for
the OSD and the documentation states that I can use
data_allocation_fraction [1]
But cephadm cannot use this and throws this error:
/usr/bin/podman: stderr ceph-volume lvm batch: error: unrecognized
arguments:
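In case it helps anyone: if I read ceph-volume correctly the flag itself
is spelled with 'allocate', so running it by hand would look roughly like
this (device path is a placeholder, the spelling is my assumption):
ceph-volume lvm batch --data-allocate-fraction 0.25 /dev/nvme0n1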
> >
> > I don’t have time to look into all the details, but I’m wondering how
> you seem to be able to start mgr services with the orchestrator if all mgr
> daemons are down. The orchestrator is a mgr module, so that’s a bit weird,
> isn’t it?
> >
> > Quoting Boris B
a node where I had to "play around" a bit with removed and
> redeployed osd containers. At some point they didn't react to
> systemctl commands anymore, but a reboot fixed that. But I haven't
> seen that in a production cluster yet, so some more details would be
> useful.
Hi,
is there a way to have the pods start again after reboot?
Currently I need to start them by hand via ceph orch start mon/mgr/osd/...
I imagine this will lead to a lot of headaches when the ceph cluster gets
a power cycle and the mon pods do not start automatically.
I've spun up a test
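cluster to check this. From what I can tell (a sketch; FSID is a
placeholder), cephadm hangs the daemons off a per-cluster target, so
making sure those targets are enabled should be enough:
systemctl is-enabled ceph.target ceph-FSID.target
systemctl enable ceph.target ceph-FSID.target
# the individual daemons are template instances like:
systemctl status ceph-FSID@mon.HOSTNAME.service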
none
*
global advanced auth_service_required
none
On Fri, Sep 15, 2023 at 13:01 Boris Behrens wrote:
> Oh, we found the issue. A very old update was stuck in the pipeline. We
> canceled it and then the correct images got
.0cc47a6df330@-1(probing)
e0 handle_auth_bad_method hmm, they didn't like 2 result (95) Operation not
supported
I added the mon via:
ceph orch daemon add mon FQDN:[IPv6_address]
On Fri, Sep 15, 2023 at 09:21 Boris Behrens wrote:
> Hi Stefan,
>
> the cluster is running 17.6.
…reinstalling the hosts, but as I have to adopt 17 clusters to the
orchestrator, I'd rather get some learnings from the thing that isn't working :)
On Fri, Sep 15, 2023 at 08:26 Stefan Kooman wrote:
> On 14-09-2023 17:49, Boris Behrens wrote:
> > Hi,
> > I currently try to adopt our
Hi,
I currently try to adopt our stage cluster, some hosts just pull strange
images.
root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
> …availability of both old and new networks, until the end of the migration
>
> k
> Sent from my iPhone
>
> > On 22 Aug 2023, at 10:43, Boris Behrens wrote:
> >
> > The OSDs are still only bound to one IP address.
>
>
IP,
> I'm not aware of a way to have them bind to multiple public IPs like
> the MONs can. You'll probably need to route the compute node traffic
> towards the new network. Please correct me if I misunderstood your
> response.
>
> Quoting Boris Behrens:
>
> > The OSDs ar
…to have both old and new networks in there, but I'd try on one
> host first and see if it works.
>
> Quoting Boris Behrens:
>
> > We're working on the migration to cephadm, but it requires some
> > prerequisites that still need planning.
> >
> > root@host:~#
via cephadm /
> > orchestrator.
>
> I just assumed that with Quincy it already would be managed by
> cephadm. So what does the ceph.conf currently look like on an OSD host
> (mask sensitive data)?
>
> Quoting Boris Behrens:
>
> > Hey Eugen,
> > I don't ha
> [1] https://www.spinics.net/lists/ceph-users/msg75162.html
> [2]
>
> https://docs.ceph.com/en/quincy/cephadm/services/mon/#moving-monitors-to-a-different-network
>
> Quoting Boris Behrens:
>
> > Hi,
> > I need to migrate a storage cluster to a new network.
> >
> > I adde
Hi,
I need to migrate a storage cluster to a new network.
I added the new network to the ceph config via:
ceph config set global public_network "old_network/64, new_network/64"
I've added a set of new mon daemons with IP addresses in the new network
and they are added to the quorum and seem to
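To double-check which addresses the mons actually publish (just a quick
look, nothing fancy):
ceph mon dump
ceph quorum_status --format json-pretty | jq .quorum_names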
Hi Goetz,
I've done the same, and went to Octopus and to Ubuntu. It worked like a
charm and with pip, you can get the pecan library working. I think I did it
with this:
yum -y install python36-six.noarch python36-PyYAML.x86_64
pip3 install pecan werkzeug cherrypy
Worked very well, until we got
Are there any ideas how to deal with this?
We disabled the logging so we do not run out of disk space, but the rgw
daemon still requires A LOT of cpu because of this.
On Wed, Jun 21, 2023 at 10:45 Boris Behrens wrote:
> I've updated the dc3 site from octopus to pacific and the prob
>
> The following command extracts all their ids
>
> ceph service dump -f json-pretty | jq '.services.rgw.daemons' | egrep -e
> 'gid' -e '\"id\"'
>>
>
> Best Regards,
> Mahnoosh
>
> On Mon, Jul 3, 2023 at 3:00 PM Boris Behrens wrote:
>
>
Hi,
might be a dumb question, but is there a way to list the rgw instances that
are running in a ceph cluster?
Before pacific it showed up in `ceph status` but now it only tells me how
many daemons are active, not which daemons are active.
ceph orch ls tells me that I need to configure a backend
So basically it does not matter unless I want to have that split up.
Thanks for all the answers.
I am still lobbying to phase out SATA SSDs and replace them with NVME
disks. :)
On Wed, Jun 28, 2023 at 18:14 Anthony D'Atri <a...@dreamsnake.net> wrote:
> Even when you factor in density,
Hi,
is it a problem that the device class for all my disks is SSD even though
all of these disks are NVMe disks? If it is just a classification for ceph,
so I can have pools on SSDs and NVMes separated, I don't care. But maybe
ceph handles NVMe disks differently internally?
I've added them via
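In case reclassifying is ever needed for pool separation, my
understanding is it would be (osd id is a placeholder):
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class nvme osd.12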
I've updated the dc3 site from octopus to pacific and the problem is still
there.
I find it very weird that it only happens from one single zonegroup to the
master and not from the other two.
On Wed, Jun 21, 2023 at 01:59 Boris Behrens wrote:
> I recreated the site and the problem st
I currently think I made a
> mistake in the process.
>
> Kind regards
> - Boris Behrens
>
> > On 20.06.2023 at 18:30, Casey Bodley wrote:
> >
> > hi Boris,
> >
> > we've been investigating reports of excessive polling from metadata
>
Hi,
yesterday I added a new zonegroup and it seems to cycle over
the same requests over and over again.
In the log of the main zone I see these requests:
2023-06-20T09:48:37.979+ 7f8941fb3700 1 beast: 0x7f8a602f3700:
fd00:2380:0:24::136 - - [2023-06-20T09:48:37.979941+]
E:OLD_BUCKET_ID <
bucket.instance:BUCKET_NAME:NEW_BUCKET_ID.json
On Thu, Apr 27, 2023 at 13:32 Boris Behrens wrote:
> To clarify a bit:
> The bucket data is not in the main zonegroup.
> I wanted to start the reshard in the zonegroup where the bucket and the
> data is located, but rgw told me to
…files
On Thu, Apr 27, 2023 at 13:08 Boris Behrens wrote:
> Hi,
> I just resharded a bucket on an octopus multisite environment from 11 to
> 101.
>
> I did it on the master zone and it went through very fast.
> But now the index is empty.
>
> The files are still there
Hi,
I just resharded a bucket on an octopus multisite environment from 11 to
101.
I did it on the master zone and it went through very fast.
But now the index is empty.
The files are still there when doing a radosgw-admin bucket radoslist
--bucket-id
Do I just need to wait or do I need to
Cheers Dan,
would it be an option to enable the ops log? I still haven't figured out
how it actually works.
But I am also thinking about moving to log parsing in HAproxy and disabling
the access log on the RGW instances.
On Wed, Apr 26, 2023 at 18:21 Dan van der Ster <
Thanks Janne, I will hand that to the customer.
> Look at https://community.veeam.com/blogs-and-podcasts-57/sobr-veeam
> -capacity-tier-calculations-and-considerations-in-v11-2548
> for "extra large blocks" to make them 8M at least.
> We had one Veeam installation vomit millions of files onto our
We have a customer that tries to use veeam with our rgw object storage and
it seems to be painfully slow.
What also seems strange: veeam sometimes shows "bucket does not
exist" or "permission denied" errors.
They did not delete anything from the bucket.
I've tested in parallel and everything seems to work fine from the
s3cmd/aws cli
I don't think you can exclude that.
We've built a notification in the customer panel that there are incomplete
multipart uploads which will be added as space to the bill. We also added a
button to create a LC policy for these objects.
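Such a policy looks roughly like this (a sketch via the aws cli;
endpoint, bucket name, and days are placeholders):
cat > lc.json <<'EOF'
{"Rules": [{"ID": "abort-incomplete-mpu", "Status": "Enabled",
  "Filter": {"Prefix": ""},
  "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3}}]}
EOF
aws --endpoint-url https://s3.example.com s3api \
  put-bucket-lifecycle-configuration --bucket BUCKET \
  --lifecycle-configuration file://lc.json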
On Tue, Apr 11, 2023 at 19:07, … wrote:
> The
> "max_buckets": 1000, and those users have the same access_denied issue
> when creating a bucket.
>
> We also tried other bucket names and it is the same issue.
>
> On Thu, Mar 30, 2023 at 6:28 PM Boris Behrens wrote:
>
>> Hi Kamil,
>> is this with all new buckets o
Hi,
you might suffer from the same bug we suffered:
https://tracker.ceph.com/issues/53729
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/KG35GRTN4ZIDWPLJZ5OQOKERUIQT5WQ6/#K45MJ63J37IN2HNAQXVOOT3J6NTXIHCA
Basically there is a bug that prevents the removal of PGlog items. You need
Hi Nicola, can you send the output of
ceph osd df tree
ceph df
?
Cheers
Boris
On Thu, Mar 30, 2023 at 16:36 Nicola Mori wrote:
> Dear Ceph users,
>
> my cluster is made up of 10 old machines, with an uneven number of disks
> and disk sizes. Essentially I have just one big data pool (6+2
Hi Kamil,
is this with all new buckets or only the 'test' bucket? Maybe the name is
already taken?
Can you check s3cmd --debug if you are connecting to the correct endpoint?
Also I see that the user seems to not be allowed to create buckets
...
"max_buckets": 0,
...
Cheers
Boris
On Thu, Mar 30,
…Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> On 2023. Mar 30., at 17:44, Boris Behrens wrote:
>
Bringing up that topic again:
is it possible to log the bucket name in the rgw client logs?
currently I only get to know the bucket name when someone accesses the
bucket via https://TLD/bucket/object instead of https://bucket.TLD/object.
On Tue, Jan 3, 2023 at 10:25 Boris Behrens wrote:
. After idling overnight
it is back up to 120 IOPS
On Thu, Mar 30, 2023 at 09:45 Boris Behrens wrote:
> After some digging in the nautilus cluster I see that the disks with the
> exceptional high IOPS performance are actually SAS attached NVME disks
> (these
(4h resolution) goes up again (2023-03-01 upgrade to pacific,
the dip around 25th was the redeploy and now it seems to go up again)
[image: image.png]
On Mon, Mar 27, 2023 at 17:24 Igor Fedotov <igor.fedo...@croit.io> wrote:
>
> On 3/27/2023 12:19 PM, Boris Behrens wrote:
Hey Igor,
we are currently using these disks - all SATA attached (is it normal to
have some OSDs without wear counter?):
# ceph device ls | awk '{print $1}' | cut -f 1,2 -d _ | sort | uniq -c
18 SAMSUNG_MZ7KH3T8 (4TB)
126 SAMSUNG_MZ7KM1T9 (2TB)
24 SAMSUNG_MZ7L37T6 (8TB)
1
?
@marc
If I interpret the linked bug correctly, you might want to have the
metadata on an SSD, because the write amplification might hit very hard on
HDDs. But maybe someone else from the mailing list can say more about it.
Cheers
Boris
On Wed, Mar 22, 2023 at 22:45 Boris Behrens wrote:
>
warning - I presume this might be caused by newer RocksDB
> version running on top of DB with a legacy format.. Perhaps redeployment
> would fix that...
>
>
> Thanks,
>
> Igor
> On 3/21/2023 5:31 PM, Boris Behrens wrote:
>
> Hi Igor,
> i've offline compacted all t
Might be. Josh also pointed in that direction. I am currently searching
for ways to mitigate it.
On Wed, Mar 22, 2023 at 10:30 Konstantin Shalygin <k0...@k0ste.ru> wrote:
> Hi,
>
>
> Maybe [1] ?
>
>
> [1] https://tracker.ceph.com/issues/58530
> k
>
> On
>=5.
On Tue, Mar 21, 2023 at 10:46 Igor Fedotov <igor.fedo...@croit.io> wrote:
> Hi Boris,
>
> additionally you might want to manually compact RocksDB for every OSD.
>
>
> Thanks,
>
> Igor
> On 3/21/2023 12:22 PM, Boris Behrens
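For anyone following along, the compaction itself can be done online or
offline (osd id and path are placeholders; offline requires the OSD to be
stopped):
# online, per OSD:
ceph tell osd.0 compact
# offline, against the OSD's data dir:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact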
Hi Istvan,
I am currently making the move from centos7 to ubuntu18.04 (we want to jump
directly from nautilus to pacific). When everything in the cluster is on
the same version, and that version is available on the new OS, you can just
reinstall the hosts with the new OS.
With the mons, I remove the
?
Cheers
Boris
On Tue, Feb 28, 2023 at 22:46 Boris Behrens wrote:
> Hi Josh,
> thanks a lot for the breakdown and the links.
> I disabled the write cache but it didn't change anything. Tomorrow I will
> try to disable bluefs_buffered_io.
>
> It doesn't sound that I can mi
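For reference, the two knobs being discussed (device name is a
placeholder; the hdparm setting does not survive a power cycle on all
drives):
# volatile write cache on a SATA SSD:
hdparm -W /dev/sda      # show current state
hdparm -W 0 /dev/sda    # disable
# buffered reads through bluefs:
ceph config set osd bluefs_buffered_io false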
Ha, found the error and now I feel just a tiny bit stupid:
haproxy did not add the X-Forwarded-Proto header.
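For the archives, the missing haproxy bit was along these lines (frontend
name and cert path are placeholders):
frontend rgw_https
    bind :443 ssl crt /etc/haproxy/cert.pem
    http-request set-header X-Forwarded-Proto https if { ssl_fc }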
On Fri, Mar 17, 2023 at 12:03 Boris Behrens wrote:
> Hi,
> I'm trying to evaluate SSE-C (the customer provides the keys) for our object
> storages.
> We do not provide
Hi,
I'm trying to evaluate SSE-C (the customer provides the keys) for our object storages.
We do not provide a KMS server.
I've added "Access-Control-Allow-Headers" to the haproxy frontend.
rspadd Access-Control-Allow-Headers...
x-amz-server-side-encryption-customer-algorithm,\
Maybe worth mentioning, because it caught me by surprise:
Ubuntu creates a swap file (/swap.img) if you do not specify a swap
partition (check /etc/fstab).
Cheers
Boris
On Wed, Mar 15, 2023 at 22:11 Anthony D'Atri <a...@dreamsnake.net> wrote:
>
> With CentOS/Rocky 7-8 I’ve observed
Hi,
we've observed 500 errors when uploading files to a single bucket, but the
problem went away after around 2 hours.
We've checked and saw the following error message:
2023-03-08T17:55:58.778+ 7f8062f15700 0 WARNING: set_req_state_err
err_no=125 resorting to 500 2023-03-08T17:55:58.778+
> …Something else to consider is
>
> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
> ,
> as sometimes disabling these write caches can improve the IOPS
> performance of SSDs.
>
> Josh
>
> On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens wrote:
>
(RBD, etc.)?
>
> Josh
>
> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens wrote:
> >
> > Hi,
> > today I did the first update from octopus to pacific, and it looks like
> the
> > avg apply latency went up from 1ms to 2ms.
> >
> > All 36 OSDs are 4TB SSDs
Hi,
today I did the first update from octopus to pacific, and it looks like the
avg apply latency went up from 1ms to 2ms.
All 36 OSDs are 4TB SSDs and nothing else changed.
Someone knows if this is an issue, or am I just missing a config value?
Cheers
Boris
.
Is there anything I can do with an octopus cluster, or is upgrading the
only way?
And why does it happen?
On Tue, Feb 21, 2023 at 18:31 Boris Behrens wrote:
> Thanks a lot Josh. That really seems like my problem.
> That does not look healthy in the cluster. oof.
> ~# ceph tell osd.* perf d
"osd_pglog_bytes": 541849048,
"osd_pglog_items": 3880437,
...
On Tue, Feb 21, 2023 at 18:21 Josh Baergen <jbaer...@digitalocean.com> wrote:
> Hi Boris,
>
> This sounds a bit like https://tracker.ceph.com/issues/53729.
> https://tracker.c
Hi,
today I wanted to increase the PGs from 2k -> 4k and random OSDs went
offline in the cluster.
After some investigation we saw that the OSDs got OOM killed (I've seen a
host that went from 90GB used memory to 190GB before OOM kills happened).
We have around 24 SSD OSDs per host and
Hi,
we've encountered the same issue after upgrading to octopus on one of our
rbd clusters, and now it reappears after the autoscaler lowered the PGs from
8k to 2k for the RBD pool.
What we've done in the past:
- recreate all OSDs after our 2nd incident with slow OPS in a single week
after the ceph
I've tried it the other way around and let cat print out all escaped chars
and then did the grep:
# cat -A omapkeys_list | grep -aFn '/'
9844:/$
9845:/^@v913^@$
88010:M-^@1000_/^@$
128981:M-^@1001_/$
Did anyone ever see something like this?
On Mon, Feb 13, 2023 at 14:31 Boris Behrens wrote:
…terminal)
<80>1000_//^@
Any idea what this is?
On Mon, Feb 13, 2023 at 13:57 Boris Behrens wrote:
> Hi,
> I have one bucket that showed up with a large omap warning, but the amount
> of objects in the bucket does not align with the amount of omap keys. The
> buck
Hi,
I have one bucket that showed up with a large omap warning, but the amount
of objects in the bucket does not align with the amount of omap keys. The
bucket is sharded to get rid of the "large omapkeys" warning.
I've counted all the omapkeys of one bucket and it came up with 33,383,622
(rados
Hi Casey,
changes to the user's default placement target/storage class don't
> apply to existing buckets, only newly-created ones. a bucket's default
> placement target/storage class can't be changed after creation
>
so I can easily update the placement rules for this user and can migrate
Hi,
we use rgw as our backup storage, and it basically holds only compressed
rbd snapshots.
I would love to move these out of the replicated pool into an EC pool.
I've read that I can set a default placement target for a user (
https://docs.ceph.com/en/octopus/radosgw/placement/). What does happen to
Hmm… I ran into a similar issue.
IMHO there are two ways to work around the problem until the new disk is in
place (rough commands below):
1. change the backfill full threshold (I use these commands:
https://www.suse.com/support/kb/doc/?id=19724)
2. reweight the backfill full OSDs just a little bit, so they move
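A rough sketch of both options (the ratio and weight are examples, not
recommendations):
# 1. temporarily raise the backfillfull threshold:
ceph osd set-backfillfull-ratio 0.92
# 2. nudge a backfillfull OSD down a little so PGs move elsewhere:
ceph osd reweight 123 0.95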
Hi,
since last week the scrubbing results in a large omap warning.
After some digging I've got these results:
# searching for indexes with large omaps:
$ for i in `rados -p eu-central-1.rgw.buckets.index ls`; do
rados -p eu-central-1.rgw.buckets.index listomapkeys $i | wc -l | tr -d
'\n' >>
Hi,
I am just reading through this document (
https://docs.ceph.com/en/octopus/radosgw/config-ref/) and on the top is
states:
The following settings may be added to the Ceph configuration file (i.e.,
> usually ceph.conf) under the [client.radosgw.{instance-name}] section.
>
And my ceph.conf looks
Hi,
I am currently trying to figure out how to resolve the
"large objects found in pool 'rgw.usage'"
error.
In the past I trimmed the usage log, but now I am at the point that I need
to trim it down to two weeks.
I checked the amount of omapkeys and the distribution is quite off:
# for OBJECT
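The trim itself would then be something like (uid and dates are
placeholders):
radosgw-admin usage trim --uid=USER --start-date=2022-01-01 --end-date=2022-12-18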
I actually do not mind if I need to scroll up a line, but I also think
it is a good idea to remove it.
On Mon, Jan 9, 2023 at 11:06 Frank Schilder wrote:
>
> Hi John,
>
> firstly, image attachments are filtered out by the list. How about you upload
> the image somewhere like
Hi Andrei,
happy new year to you too.
The file might already be removed.
You can check if the rados object is there with `rados -p ls ...`
You can also check if the file is still in the bucket with
`radosgw-admin bucket radoslist --bucket BUCKET`
Cheers
Boris
On Tue, Jan 3, 2023 at 13:47
Hi,
I am looking to move our logs from
/var/log/ceph/ceph-client...log to our log aggregator.
Is there a way to have the bucket name in the log file?
Or can I write the rgw_enable_ops_log output into a file? Maybe I could work with that.
Cheers and happy new year
Boris
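What I have found so far, with the caveat that I have not verified all of
it: the ops log can at least go to a socket, and newer releases apparently
can also write it to a file; something like:
ceph config set client.rgw rgw_enable_ops_log true
ceph config set client.rgw rgw_ops_log_rados false
ceph config set client.rgw rgw_ops_log_socket_path /var/run/ceph/rgw-ops.sock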
> On Wed, Dec 7, 2022 at 6:10 PM Boris wrote:
>>
>> Hi Jakub,
>>
>> the problem is in our case that we hit this bug
>> (https://tracker.ceph.com/issues/53585) and the GC leads to this problem.
>>
>> We worked around this by moving the GC to separate d
Hi,
we had an issue with an old cluster, where we put disks from one host
to another.
We destroyed the disks and added them as new OSDs, but since then the
mgr daemon were restarting in 120s intervals.
I tried to debug it a bit, and it looks like the balancer is the problem.
I tried to disable it
> …ceph config help *rgw_multipart_part_upload_limit*
> rgw_multipart_part_upload_limit - Max number of parts in multipart upload
> (int, advanced)
> Default: 10000
> Can update at runtime: true
> Services: [rgw]
>
> *rgw_max_put_size* is set in bytes.
>
> Regards,
> Eric.
>
> On Fr
Hi,
is it possible to somehow limit the maximum file/object size?
I've read that I can limit the size of multipart objects and the amount of
multipart objects, but I would like to limit the size of each object in the
index to 100GB.
I haven't found a config or quota value that would fit.
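The closest I can get, I think, is capping the single-PUT size and the
number of multipart parts; the effective multipart maximum is then parts
times max part size (values below are examples):
# single PUT (and max part size), 5 GiB is the default:
ceph config set client.rgw rgw_max_put_size 5368709120
# max parts per multipart upload (default 10000):
ceph config set client.rgw rgw_multipart_part_upload_limit 20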
Hello everyone,
@Alex: I am not sure what to look for in /sys/block//device
There are a lot of files. Is there anything I should check in particular?
You have sysfs access in /sys/block//device - this will show a lot
> of settings. You can go to this directory on CentOS vs. Ubuntu, and see if
>
<icepic...@gmail.com>:
> Perhaps run "iostat -xtcy 5" on the OSD hosts to
> see if any of the drives have weirdly high utilization despite low
> iops/requests?
>
>
> On Tue, Dec 6, 2022 at 10:02 Boris Behrens wrote:
> >
> > Hi Sven,
> > I am
…Sven Kieske wrote:
> On Sat, 2022-12-03 at 01:54 +0100, Boris Behrens wrote:
> > hi,
> > maybe someone here can help me to debug an issue we faced today.
> >
> > Today one of our clusters came to a grinding halt with 2/3 of our OSDs
> > reporting slow ops.
> &
Something has got to be there,
>> which makes the problem go away.
>> --
>> Alex Gorbachev
>> https://alextelescope.blogspot.com
>>
>>
>>
>> On Sun, Dec 4, 2022 at 6:08 AM Boris Behrens wrote:
>>
>> > Hi Alex,
>> > I am searching for a log line tha
@Alex:
the issue is gone for now, but I fear it might come back sometime. The
cluster was running fine for months.
I'll check if we can restart the switches easily. Host reboots should also be
no problem.
There is no "implicated OSD" message in the logs.
All OSDs were recreated 3 months ago. (sync
Hi,
I am just evaluating our cluster configuration again, because we had a
very bad incident with laggy OSDs that shut down the entire cluster.
We use datacenter SSDs in different sizes (2, 4, 8TB) and someone said
that I should not go beyond a specific amount of PGs on certain device
classes.
Hi Alex,
I am searching for a log line that points me in the right direction. From
what I've seen, I could not find a specific host, OSD, or PG that was
leading to this problem.
But maybe I am looking at the wrong logs.
I have around 150k lines that look like this:
hi,
maybe someone here can help me to debug an issue we faced today.
Today one of our clusters came to a grinding halt with 2/3 of our OSDs
reporting slow ops.
The only option to get it back to work fast was to restart all OSD daemons.
The cluster is an octopus cluster with 150 enterprise SSD
ozuCYXDKYvhkW5RiZUxuaNfu48C.365_1--
--ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2339856956.63__multipart_8cfd0bdb-05f9-40cd-a50d-83295b416ea9.lz4.CwlAWozuCYXDKYvhkW5RiZUxuaNfu48C.365--
On Fri, Dec 2, 2022 at 12:17 Boris Behrens wrote:
> Hi,
> we are currently encountering a lot of broken
like
> "c44a7aab-e086-43df-befe-ed8151b3a209.4147.1_obj1”.
>
> 3. grep through the logs for the head object and see if you find anything.
>
> Eric
> (he/him)
>
> On Nov 22, 2022, at 10:36 AM, Boris Behrens wrote:
>
> Does someone have an idea what I can ch
Hi,
we are currently encountering a lot of broken / orphan multipart uploads.
When I try to fetch the multipart uploads via s3cmd, it just never finishes.
Debug output looks like this and it basically never changes.
DEBUG: signature-v4 headers: {'x-amz-date': '20221202T105838Z',
'Authorization':
…here, but now I don't care. (I also have this for a healthy bucket, where
I test stuff like this first, and which gets recreated periodically.)
On Wed, Nov 23, 2022 at 12:22 Boris Behrens wrote:
> Hi,
> we have a customer that got some _multipart_ files in his bucket, but the
> bucket g
Hi,
we have a customer that got some _multipart_ files in his bucket, but the
bucket has no unfinished multipart uploads.
So I tried to remove them via
$ radosgw-admin object rm --bucket BUCKET
--object=_multipart_OBJECT.qjqyT8bXiWW5jdbxpVqHxXnLWOG3koUi.1
ERROR: object remove returned: (2) No
Good day people,
we have a very strange problem with some bucket.
The customer informed us that they had issues with objects. They are listed,
but on a GET they receive "NoSuchKey" error.
They did not delete anything from the bucket.
We checked and `radosgw-admin bucket radoslist --bucket $BUCKET`
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919
On Fri, Oct 7, 2022 at 11:30 Boris Behrens wrote:
> Hi,
> I just wanted to reshard a bucket but mistyped the amount of shards. In a
> reflex I hit ctrl-c and waited. It looked like the resharding did not
Cheers again.
I am still stuck on this. Does someone have an idea how to fix it?
On Fri, Oct 7, 2022 at 11:30 Boris Behrens wrote:
> Hi,
> I just wanted to reshard a bucket but mistyped the amount of shards. In a
> reflex I hit ctrl-c and waited. It looked like the resharding did not
d a socket's remote_endpoint().
> i didn't think that local_endpoint() could fail the same way, but i've
> opened https://tracker.ceph.com/issues/57784 to track this and the fix
> should look the same
>
> On Thu, Oct 6, 2022 at 12:12 PM Boris Behrens wrote:
> >
> > Any ideas o
Hi,
I just wanted to reshard a bucket but mistyped the amount of shards. In a
reflex I hit ctrl-c and waited. It looked like the resharding did not
finish so I canceled it, and now the bucket is in this state.
How can I fix it? It does not show up in the stale-instances list. It's also
a multisite