Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu
nodes and have a few questions, mostly regarding planning of the infra.
1) Based on the documentation, the OS requirements mention Ubuntu 18.04 LTS. Is
it OK to use Ubuntu 20.04 instead, or should I stick with
Mabi;
We're running Nautilus, and I am not wholly convinced of the "everything in
containers" view of the world, so take this with a small grain of salt...
1) We don't run Ubuntu, sorry. I suspect the documentation highlights 18.04
because it's the current LTS release. Personally, if I had
I've not undertaken such a large data movement,
The pgupmap script may be of use here, but I'm assuming that it's not.
But if I were, I would first take many backups of the current crush map.
I would set the norebalance and norecover flags.
Then I would verify all of the backfill settings are as
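For reference, those preparation steps can be sketched with the standard ceph CLI (a sketch only — verify flags and settings against your own cluster before running; option names are as in Nautilus):

```shell
# Back up the current CRUSH map before any changes
ceph osd getcrushmap -o crushmap.backup.$(date +%Y%m%d)

# Pause rebalancing and recovery while reorganizing
ceph osd set norebalance
ceph osd set norecover

# Inspect the current backfill/recovery throttling settings
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active

# When finished, clear the flags again
ceph osd unset norebalance
ceph osd unset norecover
```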
Thanks for the quick release! \o/
On Tue, 30 Mar 2021, 22:30 David Galloway wrote:
> This is the 19th update to the Ceph Nautilus release series. This is a
> hotfix release to prevent daemons from binding to loopback network
> interfaces. All nautilus users are advised to upgrade to this
Casey,
Many thanks. That did the trick.
Regards
Marcel
Casey Bodley schreef op 2021-03-30 16:48:
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
This is the 19th update to the Ceph Nautilus release series. This is a
hotfix release to prevent daemons from binding to loopback network
interfaces. All nautilus users are advised to upgrade to this release.
Notable Changes
---
* This release fixes a regression introduced in
It's a bug: https://tracker.ceph.com/issues/50060
On Wed, Dec 23, 2020 at 5:53 PM Alex Taylor wrote:
>
> Hi Patrick,
>
> Any updates? Looking forward to your reply :D
>
>
> On Thu, Dec 17, 2020 at 11:39 AM Patrick Donnelly wrote:
> >
> > On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor wrote:
> > >
On 3/30/21 3:02 PM, Boris Behrens wrote:
I reweighted the OSD to 0.0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was
doing this, but it could also be the "backup cleanup job" that removed
images from the buckets.
I don't have any numbers
Despite the examples that can be found on the internet, I have trouble
setting up a static website that serves from an S3 bucket. If anyone could
point me in the right direction, that would be much appreciated.
Marcel
I created an index.html in the bucket sky
gm-rc3-jumphost01@ceph/s3cmd
Ahh, right. I saw it fixed here https://tracker.ceph.com/issues/18749 a long
time ago, but it seems the back-port never happened.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Josh Baergen
Sent: 30 March 2021
I thought that recovery below min_size for EC pools wasn't expected to work
until Octopus. From the Octopus release notes: "Ceph will allow recovery
below min_size for Erasure coded pools, wherever possible."
Josh
On Tue, Mar 30, 2021 at 6:53 AM Frank Schilder wrote:
> Dear Rainer,
>
> hmm,
On 3/30/21 3:00 PM, Thomas Hukkelberg wrote:
Any thoughts or insight on how to achieve this with minimal data movement and
risk of cluster downtime would be welcome!
I would do so with Dan's "upmap-remap" script [1]. See [2] for his
presentation. We have used that quite a few times now
Sorry about the flow of messages.
I forgot to mention this. Looking at other replies, in particular that the PG
in question remained at 4 out of 6 OSDs until you reduced min_size, this might
indicate that peering was blocked for some reason but completed after the
reduction. If this was the order
Hi, this is odd. The problem with recovery when sufficiently many but less than
min_size shards are present should have been resolved with
osd_allow_recovery_below_min_size=true. It is really dangerous to reduce
min_size below k+1 and, in fact, should never be necessary for recovery. Can
you
Hi Robert,
We get a handful of verify_authorizer warnings on some of our clusters
too but they don't seem to pose any problems.
I've tried without success to debug this in the past -- IIRC I started
to suspect it was coming from old cephfs kernel clients but got
distracted and never reached the
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html
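Since the thread is using s3cmd, one way to attach that website configuration is its ws-create command (a sketch; assumes s3cmd is already configured against the RGW endpoint, and the bucket name "sky" is taken from the thread):

```shell
# Attach a website configuration to the bucket, then verify it
s3cmd ws-create --ws-index=index.html --ws-error=error.html s3://sky
s3cmd ws-info s3://sky
```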
On Tue, Mar 30, 2021 at 10:05 AM Marcel Kuiper
I raised the backfillfull_ratio to 0.91 to see what happens; now I am
waiting. Some OSDs were around 89-91%, some are around 50-60%.
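For the record, that ratio can be changed at runtime (a sketch, Nautilus syntax):

```shell
ceph osd set-backfillfull-ratio 0.91
# confirm the new ratio took effect
ceph osd dump | grep ratio
```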
The pgp_num has been at 1946 for one week. I think this will resolve itself
when the cluster becomes a bit tidier.
Am Di., 30. März 2021 um 15:23 Uhr schrieb Dan van
On 3/25/21 1:05 PM, Nico Schottelius wrote:
it seems there is no reference to it in the ceph documentation. Do you
have any pointers to it?
Not anymore with new Ceph documentation.
Out of curiosity, do you have any clue why it's not in there anymore?
It might still be, but I cannot find
You started with 1024 PGs, and are splitting to 2048.
Currently there are 1946 PGs used .. so it is nearly there at the goal.
You need to watch that value 1946 and see if it increases slowly. If
it does not increase, then those backfill_toofull PGs are probably
splitting PGs, and they are blocked
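Extrapolating from those numbers gives a back-of-the-envelope ETA (a sketch only, assuming the split proceeds roughly linearly; the 7-day figure is taken from the "one week later" message in this thread):

```shell
# 1024 -> 2048 split, pgp_num at 1946 after roughly 7 days
awk 'BEGIN {
    start = 1024; target = 2048; now = 1946; days = 7
    rate = (now - start) / days              # PGs split per day
    printf "%.1f days remaining\n", (target - now) / rate
}'
```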
The output from ceph osd pool ls detail tells me nothing, except that the
pgp_num is not where it should be. Can you help me read the output? How
do I estimate how long the split will take?
[root@s3db1 ~]# ceph status
cluster:
id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
health:
Hi Ben,
That was beyond helpful. Thank you so much for the thoughtful and
detailed explanation. That should definitely be added to the
documentation, until/unless the dynamic resharder/sharder handle this
case (if there is even desire to do so) with versioned objects.
Respectfully,
David
On
It would be safe to turn off the balancer, yes go ahead.
To know if adding more hardware will help, we need to see how much
longer this current splitting should take. This will help:
ceph status
ceph osd pool ls detail
-- dan
On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote:
>
> I
I reweighted the OSD to 0.0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was doing
this, but it could also be the "backup cleanup job" that removed images
from the buckets.
Am Di., 30. März 2021 um 14:41 Uhr schrieb Stefan Kooman :
> On
Any time frame on 14.2.19?
On Fri, Mar 26, 2021, 1:43 AM Konstantin Shalygin wrote:
> Finally master is merged now
>
>
> k
>
> Sent from my iPhone
>
> > On 25 Mar 2021, at 23:09, Simon Oosthoek
> wrote:
> >
> > I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19
> will be
I would think it is due to splitting, because the balancer refuses to do its
work when there are too many misplaced objects.
I am also thinking of turning it off for now, so it doesn't begin its work at
5% misplaced objects.
Would adding more hardware help? We wanted to add another OSD node with
7x8TB disks.
Hi all!
We run a 1.5PB cluster with 12 hosts, 192 OSDs (mix of NVMe and HDD) and need
to improve our failure domain by altering the crush rules and moving rack to
pods, which would imply a lot of data movement.
I wonder what would the preferred order of operations be when doing such
changes
Hello,
In the meantime, Ceph is running again normally, except for the two OSDs
that are down because of the failed disks.
What really helped in my situation was to lower min_size from 5 (k+1)
to 4 in my 4+2 erasure code setup. So I am also grateful to the
programmer who put the helpful hint in
Are those PGs backfilling due to splitting or due to balancing?
If it's the former, I don't think there's a way to pause them with
upmap or any other trick.
-- dan
On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote:
>
> One week later the ceph is still balancing.
> What worries me like hell is
One week later the ceph is still balancing.
What worries me like hell is the %USE on a lot of those OSDs. Does Ceph
resolve this on its own? We are currently down to 5TB of space in the cluster.
Rebalancing single OSDs doesn't work well, and it increases the "misplaced
objects".
I thought about
Hi,
On 30.03.21 13:05, Rainer Krienke wrote:
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours ceph repaired most of the damage. One inactive PG
remained and in ceph health detail then told me:
Hello Frank,
the option is actually set. On one of my monitors:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show|grep
osd_allow_recovery_below_min_size
"osd_allow_recovery_below_min_size": "true",
Thank you very much
Rainer
Am 30.03.21 um 13:20 schrieb Frank Schilder:
Hi, this
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours ceph repaired most of the damage. One inactive PG
remained and in ceph health detail then told me:
-
HEALTH_WARN Reduced data availability: 1
I just moved one PG away from the OSD, but the disk space will not get freed.
Do I need to do something to clean obsolete objects from the OSD?
Am Di., 30. März 2021 um 11:47 Uhr schrieb Boris Behrens :
> Hi,
> I have a couple OSDs that currently get a lot of data, and are running
> towards 95%
Hi everyone,
I didn't get enough responses on the previous Doodle to schedule a
meeting. I'm wondering if people are OK with the previous PDF I
released or if there's interest in the community to develop better
survey results?
https://ceph.io/community/ceph-user-survey-2019/
On Mon, Mar 22,
Hi,
I have a couple OSDs that currently get a lot of data, and are running
towards 95% fillrate.
I would like to forcefully remap some PGs (they are around 100GB) to more
empty OSDs and drop them from the full OSDs. I know this would lead to
degraded objects, but I am not sure how long the
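For the record, single PGs can be remapped manually with upmap (a sketch with hypothetical PG and OSD IDs; requires all clients to be Luminous or newer):

```shell
# Move one replica/shard of PG 11.2f off the full osd.12 onto the emptier osd.34
ceph osd pg-upmap-items 11.2f 12 34

# Remove the explicit mapping again later
ceph osd rm-pg-upmap-items 11.2f
```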
Hi,
from what you've sent my conclusion about the stalled I/O would be
indeed the min_size of the EC pool.
There's only one PG reported as incomplete, I assume that is the EC
pool, not the replicated pxa-rbd, right? Both pools are for rbd so I'm
guessing the rbd headers are in pxa-rbd
Hi David,
On Tuesday, March 30th, 2021 at 00:50, David Orman wrote:
> Sure enough, it is more than 200,000, just as the alert indicates.
> However, why did it not reshard further? Here's the kicker - we only
> see this with versioned buckets/objects. I don't see anything in the
> documentation
Hello,
I run a Ceph Nautilus cluster with 9 hosts and 144 OSDs. Last night we
lost two disks, so two OSDs (67, 90) are down. The two disks are on two
different hosts. A third OSD on a third host reports slow ops. Ceph is
repairing at the moment.
Pools affected are eg these ones:
pool 35
I have this deep scrub issue in the index pool's PGs almost every week, which
puts the cluster into a health error, so I always need to repair that PG :/
Have you found any solution so far?
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda