I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last week
I added 2 nodes to the cluster. The backfilling has been ATROCIOUS. I
have OSDs consistently [2] segfaulting during recovery. There's no pattern
of which OSDs are segfaulting, which hosts have segfaulting OSDs, etc...
on the RAID controller
with each drive as its own RAID-0 has positive performance results. This is
something to try and see if you can regain some of the performance, but as
always in storage, YMMV.
David Byte
Sr. Technology Strategist
SCE Enterprise Linux
SCE Enterprise Storage
Alliances and SUSE
attributes are getting
corrupted. All the errors are on shard 0. My testing shows that repair
will fix this scenario.
David
On 3/13/18 3:48 PM, Graham Allan wrote:
Updated cluster now to 12.2.4 and the cycle of
inconsistent->repair->unfound seems to continue, though possibly
slightly differ
Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore
subfolder splitting.
On Wed, Mar 14, 2018 at 2:17 PM, John Spray <jsp...@redhat.com> wrote:
> On Tue, Mar 13, 2018 at 7:17 PM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
On Mon, Feb 26, 2018 at 6:08 PM, David Turner <drakonst...@gmail.com> wrote:
> The slow requests are absolutely expected on filestore subfolder
> splitting. You can however stop an OSD, split its subfolders, and start
> it back up. I perform this maintenance once/month. I chan
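For reference, the offline split is typically done with ceph-objectstore-tool
while the OSD is stopped. A rough sketch only; the OSD id and pool name are
hypothetical and the flags may differ by release:
systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op apply-layout-settings --pool rbd
systemctl start ceph-osd@12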
Thanks for the detailed response, Greg. A few follow ups inline:
On 13 Mar 2018 20:52, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Tue, Mar 13, 2018 at 12:17 PM, David C <dcsysengin...@gmail.com> wrote:
> Hi All
>
> I have a Samba server that is exporting d
ended highest MDS debug setting before performance
starts to be adversely affected (I'm aware log files will get huge)?
4) What's the best way of matching inodes in the MDS log to the file names
in cephfs?
Hardware/Versions:
Luminous 12.1.1
Cephfs client 3.10.0-514.2.2.el7.x86_64
Samba 4.4.4
4 node
kes sense to
change the current behaviour of blocking the TMF ABORT response until
the cluster I/O completes.
Cheers, David
existing abort_request() codepath only cancels the I/O on the
client/gw side. A TMF ABORT successful response should only be sent if
we can guarantee that the I/O is terminated at all layers below, so I
think this would have to be implemented via an additional OSD epoch
barrier or similar.
Chee
I'll have to do some processing to correlate
> the key id with the rest of the request info.
>
>
> Aaron
>
> On Mar 8, 2018, at 8:18 PM, Matt Benjamin <mbenj...@redhat.com> wrote:
>
> Hi Yehuda,
>
> I did add support for logging arbitrary headers, but not a
>
PGs being unevenly distributed is a common occurrence in Ceph. Luminous
started making some steps towards correcting this, but you're in Jewel.
There are a lot of threads in the ML archives about fixing PG
distribution. Generally every method comes down to increasing the weight
on OSDs with too
at makes me sad.
>
> Aaron
>
>
> On Mar 8, 2018, at 12:36 PM, David Turner <drakonst...@gmail.com> wrote:
>
> Setting radosgw debug logging to 10/10 is the only way I've been able to
> get the access key in the logs for requests. It's very unfortunate as it
> DRASTICALLY i
Setting radosgw debug logging to 10/10 is the only way I've been able to
get the access key in the logs for requests. It's very unfortunate as it
DRASTICALLY increases the amount of log per request, but it's what we
needed to do to be able to have the access key in the logs along with the
the same
region via the alternate path.
It's not something that we've observed in the wild, but is nevertheless
a bug that is being worked on, with a resolution that should also be
usable for active/active tcmu-runner.
Cheers, David
if the mon can detect slow/blocked requests from a certain osd,
> why can't the mon mark an osd with blocked requests down if the request is
> blocked for a certain time?
>
> 2018-03-07
> --
> shadow_lin
> --
>
> *From:* David Tu
There are multiple settings that affect this. osd_heartbeat_grace is
probably the most apt. If an OSD is not getting a response from another
OSD for more than the heartbeat_grace period, then it will tell the mons
that the OSD is down. Once mon_osd_min_down_reporters have told the mons
that an
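For reference, a minimal ceph.conf sketch of the options involved (the values
shown are, I believe, the defaults; tune with care):
[osd]
osd heartbeat grace = 20           # seconds without a heartbeat reply before reporting a peer down
[mon]
mon osd min down reporters = 2     # how many OSDs must report a peer down before the mon marks it down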
I'm pretty sure I put up one of those scripts in the past. Basically what
we did was set our scrub cycle to something like 40 days. We then sort
all PGs by the last time they were deep scrubbed, grab the oldest 1/30
of those PGs, and tell them to deep-scrub manually; the next day we do it
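A minimal sketch of that approach, assuming jq is available and the Luminous
JSON field names (pg_stats, last_deep_scrub_stamp, pgid); adjust head -n to
roughly 1/30 of your PG count:
ceph pg dump --format json 2>/dev/null \
  | jq -r '.pg_stats[] | "\(.last_deep_scrub_stamp) \(.pgid)"' \
  | sort | head -n 50 | awk '{print $NF}' \
  | while read pg; do ceph pg deep-scrub "$pg"; done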
I would guess that the higher iops in ceph status are from iops calculated
from replication. fio isn't aware of the backend replication iops, only
what it's doing to the rbd
On Fri, Mar 2, 2018, 11:53 PM shadow_lin wrote:
> Hi list,
> There is a client io section from the
[1] Here is a ceph status on a brand new cluster that has never had any
pools created or data put into it at all. 323GB used out of 2.3PB. That's
0.01% overhead, but we're using 10TB disks for this cluster, and the
overhead is more so per osd than per TB. It is 1.1GB overhead per osd. 34
of the
> Hi David,
>
> Thank you for your reply. As I understand your experience with multiple
> subnets
> suggests sticking to a single device. However, I have a powerful RDMA NIC
> (100Gbps) with two ports and I have seen recommendations from Mellanox to
> separate the
> two
Blocked requests and slow requests are synonyms in ceph. They are 2 names
for the exact same thing.
On Thu, Mar 1, 2018, 10:21 PM Alex Gorbachev <a...@iss-integration.com> wrote:
> On Thu, Mar 1, 2018 at 2:47 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > `ceph he
`ceph health detail` should show you more information about the slow
requests. If the output is too much stuff, you can grep out for blocked or
something. It should tell you which OSDs are involved, how long they've
been slow, etc. The default is for them to show '> 32 sec' but that may
very
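For example, something along these lines:
ceph health detail | grep -Ei 'blocked|slow'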
When dealing with the admin socket you need to be an admin. `sudo` or
`sudo -u ceph` ought to get you around that.
I was able to delete a pool just by using the injectargs that you showed
above.
ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
ceph osd pool rm pool_name pool_name
This aspect of osds has not changed from filestore with SSD journals to
bluestore with DB and WAL on SSDs. If the SSD fails, all osds using it
are lost and need to be removed from the cluster and recreated with a
new drive.
You can never guarantee data integrity on bluestore or filestore if
with vim I notice that it is a bit
> slower while it is updating the repository, but after the update it works as
> fast as before.
>
> It fails even on Jewel so I think that maybe the only way to do it is to
> create a task to remount the FS when I deploy.
>
> Greetings and thanks!!
They added `ceph pg force-backfill ` but there is nothing to force
scrubbing yet aside from the previously mentioned tricks. You should be
able to change osd_max_scrubs around until the PGs you want to scrub are
going.
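For example (the pgid is hypothetical, and injectargs changes do not persist
across restarts):
ceph pg force-backfill 1.2f
ceph tell osd.\* injectargs '--osd_max_scrubs=2'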
On Thu, Mar 1, 2018 at 9:30 AM Kenneth Waegeman
s too intrusive to make it
upstream, so we need to work on a proper upstreamable solution, with
tcmu-runner or otherwise.
Cheers, David
Using CephFS for something like this is about the last thing I would do.
Does it need to be on a networked posix filesystem that can be mounted on
multiple machines at the same time? If so, then you're kinda stuck and we
can start looking at your MDS hardware and see if there are any MDS
settings
With default memory settings, the general rule is 1GB ram/1TB OSD. If you
have a 4TB OSD, you should plan to have at least 4GB ram. This was the
recommendation for filestore OSDs, but it was a bit much memory for the
OSDs. From what I've seen, this rule is a little more appropriate with
`ceph pg stat` might be cleaner to watch than the `ceph status | grep
pgs`. I also like watching `ceph osd pool stats` which breaks down all IO
by pool. You also have the option of the dashboard mgr service which has a
lot of useful information including the pool IO breakdown.
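For example:
watch -n 1 ceph pg stat
watch -n 1 ceph osd pool stats
ceph mgr module enable dashboard    # on Luminous the dashboard listens on the active mgr, port 7000 by default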
On Thu, Mar 1,
There has been some chatter on the ML questioning the need to separate out
the public and private subnets for Ceph. The trend seems to be in
simplifying your configuration which for some is not specifying multiple
subnets here. I haven't heard of anyone complaining about network problems
with
o repo-luminous
> ... got always Jewel.
>
> Only forcing the release in the ceph-deploy command allowed me to install
> luminous.
>
> Probably yum-plugin-priorities should not be installed after ceph-deploy
> even though I hadn't run any command yet.
> But what is so strange is
Which version of ceph-deploy are you using?
On Wed, Feb 28, 2018 at 4:37 AM Massimiliano Cuttini
wrote:
> This worked.
>
> However somebody should investigate why the default is still jewel on CentOS
> 7.4
>
> On 28/02/2018 00:53, jorpilo wrote:
>
> Try using:
> ceph-deploy
A more common search term for this might be Rack failure domain. The
premise is the same for room as it is for rack, both can hold hosts and be
set as the failure domain. There is a fair bit of discussion on how to
achieve multi-rack/room/datacenter setups. Datacenter setups are more
likely to
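As a rough sketch of adding a room level and using it as the failure domain
(bucket, rule, and pool names are hypothetical):
ceph osd crush add-bucket room1 room
ceph osd crush move rack1 room=room1
ceph osd crush rule create-replicated rep_by_room default room
ceph osd pool set mypool crush_rule rep_by_room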
You could probably write an SNMP module for the new ceph-mgr daemon. What
do you want to use to monitor Ceph that requires SNMP?
On Wed, Feb 28, 2018 at 1:13 PM Andre Goree wrote:
> I've looked and haven't found much information besides custom 3rd-party
> plugins so I figured
My thought is that in 4 years you could have migrated to a hypervisor that
will have better performance into ceph than an added iSCSI layer. I won't
deploy VMs for ceph on anything that won't allow librbd to work. Anything
else is added complexity and reduced performance.
On Wed, Feb 28, 2018,
If you run your container in privileged mode you can mount ceph-fuse inside
of the VMs instead of from the shared resource on the host. I used a
configuration like this to run multi-tenancy speed tests of CephFS using
ceph-fuse. The more mount points I used (1 per container), the more bandwidth
I
On 27 Feb 2018 06:46, "Jan Pekař - Imatic" wrote:
I think I hit the same issue.
I have corrupted data on cephfs and I don't remember the same issue before
Luminous (I did the same tests before).
It is on my test 1-node cluster with lower memory than recommended (so
server
Like John says, noout prevents an osd being marked out in the cluster. It
does not impede it from being marked down and back up which is the desired
behavior when restarting a server. What are you seeing with your osds
becoming unusable and needing to rebuild them?
When rebooting a server if it
This is super helpful, thanks for sharing, David. I need to a bit more
reading into this.
On 26 Feb 2018 6:08 p.m., "David Turner" <drakonst...@gmail.com> wrote:
The slow requests are absolutely expected on filestore subfolder
splitting. You can however stop an OSD, split
Smit <caspars...@supernas.eu> wrote:
> David,
>
> Yes i know, i use 20GB partitions for 2TB disks as journal. It was just to
> inform other people that Ceph's default of 1GB is pretty low.
> Now that i read my own sentence it indeed looks as if i was using 1GB
> partitions,
nt
> work.
>
>
>
> 2018-02-27 14:24 GMT+01:00 David Turner <drakonst...@gmail.com>:
>
>> `systemctl list-dependencies ceph.target`
>>
>> I'm guessing that you might need to enable your osds to be managed by
>> systemctl so that they can be stopped when the serve
>>> I believe you can override the setting (I'm not sure how),
> but you really want to correct that flag at the OS layer. Generally when we
> see this there's a RAID card or something between the solid-state device
> and the host which is lying about the s
`systemctl list-dependencies ceph.target`
I'm guessing that you might need to enable your osds to be managed by
systemctl so that they can be stopped when the server goes down.
`systemctl enable ceph-osd@{osd number}`
On Tue, Feb 27, 2018, 4:13 AM Philip Schroth
<caspars...@supernas.eu>
>> wrote:
>>
>>> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>:
>>>
>>>> Caspar, it looks like your idea should work. Worst case scenario seems
>>>> like the osd wouldn't start, y
ure (assuming http endpoint and not https).
>
> Yehuda
>
> On Mon, Feb 26, 2018 at 1:21 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > I set it to that for randomness. I don't have a zonegroup named 'us'
> > either, but that works fine. I don't see why 'cn'
t you set in the config file, I assume that's what passed
> in. Why did you set that in your config file? You don't have a
> zonegroup named 'cn', right?
>
> On Mon, Feb 26, 2018 at 1:10 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > I'm also not certain how to do th
I'm also not certain how to do the tcpdump for this. Do you have any
pointers to how to capture that for you?
On Mon, Feb 26, 2018 at 4:09 PM David Turner <drakonst...@gmail.com> wrote:
> That's what I set it to in the config file. I probably should have
> mentioned that.
>
&
tcpdump, see if that's actually
> what's passed in?
>
> On Mon, Feb 26, 2018 at 12:02 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > I run with `debug rgw = 10` and was able to find these lines at the end
> of a
> > request to create the bucket.
> >
Depending on what your security requirements are, you may not have a
choice. If your OpenStack deployment shouldn't be able to load the
Kubernetes RBDs (or vice versa), then you need to keep them separate and
maintain different keyrings for the 2 services. If that is going to be how
you go about
the specific failed
> request might shed some light (would be interesting to look at the
> generated LocationConstraint).
>
> Yehuda
>
> On Mon, Feb 26, 2018 at 11:29 AM, David Turner <drakonst...@gmail.com>
> wrote:
> > Our problem only appeared to be present in bucke
Our problem only appeared to be present in bucket creation. Listing,
putting, etc objects in a bucket work just fine regardless of the
bucket_location setting. I ran this test on a few different realms to see
what would happen and only 1 of them had a problem. There isn't an obvious
thing that
are being split.
[1] filestore_merge_threshold = -16
filestore_split_multiple = 256
[2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
[3] filestore_merge_threshold = -1
filestore_split_multiple = 1
On Mon, Feb 26, 2018 at 12:18 PM David C <dcsysengin...@gmail.
Deza <ad...@redhat.com> wrote:
> On Mon, Feb 26, 2018 at 11:24 AM, David Turner <drakonst...@gmail.com>
> wrote:
> > If we're asking for documentation updates, the man page for ceph-volume
> is
> > incredibly outdated. In 12.2.3 it still says that bluestore is not
Thanks, David. I think I've probably used the wrong terminology here, I'm
not splitting PGs to create more PGs. This is the PG folder splitting that
happens automatically, I believe it's controlled by the
"filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's
th
picked up these changes here.
On Mon, Feb 26, 2018 at 6:23 AM Caspar Smit <caspars...@supernas.eu> wrote:
> 2018-02-24 7:10 GMT+01:00 David Turner <drakonst...@gmail.com>:
>
>> Caspar, it looks like your idea should work. Worst case scenario seems
>> like the osd wou
).
> > Using that you can extrapolate how much space the data pool needs
> > based on your file system usage. (If all you're doing is filling the
> > file system with empty files, of course you're going to need an
> > unusually large metadata pool.)
> >
> Many th
If we're asking for documentation updates, the man page for ceph-volume is
incredibly outdated. In 12.2.3 it still says that bluestore is not yet
implemented and that it's planned to be supported.
'[--bluestore] filestore objectstore (not yet implemented)'
'using a filestore setup (bluestore
much priority to the recovery operations so that client IO can still
happen.
On Mon, Feb 26, 2018 at 11:10 AM David C <dcsysengin...@gmail.com> wrote:
> Hi All
>
> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
> on NVME. Cluster primarily used for Ce
crash issues which I think could be related.
Is there anything I can do to mitigate the slow requests problem? The rest
of the time the cluster is performing pretty well.
Thanks,
David
Patrick's answer supersedes what I said about RocksDB usage. My knowledge
was more general for actually storing objects, not the metadata inside of
MDS. Thank you for sharing Patrick.
On Mon, Feb 26, 2018 at 11:00 AM Patrick Donnelly
wrote:
> On Sun, Feb 25, 2018 at 10:26
I would recommend continuing from where you are now and running `ceph osd
reweight-by-utilization` again. Your weights might be a little more odd,
but your data distribution should be the same. If you were to reset the
weights for the previous OSDs, you would only incur an additional round of
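For example, dry-run first and then apply; the arguments are the overload
threshold (%), max weight change, and max number of OSDs to adjust, and the
values here are only illustrative:
ceph osd test-reweight-by-utilization 120 0.05 10
ceph osd reweight-by-utilization 120 0.05 10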
When a Ceph system is in recovery, it uses much more RAM than it does while
running healthy. This increase is often on the order of 4x more memory (at
least back in the days of filestore, I'm not 100% certain about bluestore,
but I would assume the same applies). You have another thread on the
In the past I downloaded the packages for a version and configured it as a
local repo on the server. Basically it was a tar.gz that I would extract
that would place the ceph packages in a folder for me and swap out the repo
config file to a version that points to the local folder. I haven't
Mons won't compact and clean up old maps while any PG is in a non-clean
state. What is your `ceph status`? I would guess this isn't your problem,
but thought I'd throw it out there just in case.
Also in Hammer, OSDs started telling each other when they clean up maps and
this caused a map
Thanks for the tips, John. I'll increase the debug level as suggested.
On 25 Feb 2018 20:56, "John Spray" <jsp...@redhat.com> wrote:
> On Sat, Feb 24, 2018 at 10:13 AM, David C <dcsysengin...@gmail.com> wrote:
> > Hi All
> >
> > I had an MDS go down
Hi All
I had an MDS go down on a 12.2.1 cluster, the standby took over but I don't
know what caused the issue. Scrubs are scheduled to start at 23:00 on this
cluster but this appears to have started a minute before.
Can anyone help me with diagnosing this please. Here's the relevant bit
from the
There was another part to my suggestion which was to set the initial crush
weight to 0 in ceph.conf. After you add all of your osds, you could
download the crush map, weight the new osds to what they should be, and
upload the crush map to give them all the ability to take PGs at the same
time.
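A sketch of that workflow (file names are arbitrary; the initial weight
setting must be in ceph.conf on the new hosts before the osds are created):
# in ceph.conf: [osd] osd crush initial weight = 0
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit the weights of the new osds in crush.txt, then:
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new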
Caspar, it looks like your idea should work. Worst case scenario seems like
the osd wouldn't start, you'd put the old SSD back in and go back to the
idea of weighting them to 0, backfilling, then recreating the osds. Definitely
worth a try in my opinion, and I'd love to hear your experience after.
+1 for this. I messed up a cap on a cluster I was configuring doing this
same thing. Luckily it wasn't production and I could fix it quickly.
On Thu, Feb 22, 2018, 8:09 PM Gregory Farnum wrote:
> On Wed, Feb 21, 2018 at 10:54 AM, Enrico Kern
>
Your 6.7GB of DB partition for each 4TB osd is on the very small side of
things. It's been discussed a few times in the ML and the general use case
seems to be about 10GB DB per 1TB of osd. That would be about 40GB DB
partition for each of your osds. This general rule covers most things
except for
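By that guideline a 4TB data disk pairs with roughly a 40GB DB device. A
hedged ceph-volume example; the device and VG/LV names are hypothetical and
the LV would be pre-created at about 40GB:
ceph-volume lvm create --bluestore --data /dev/sdb --block.db nvme_vg/db_sdb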
Here is a [1] link to a ML thread tracking some slow backfilling on
bluestore. It came down to the backfill sleep setting for them. Maybe it
will help.
[1] https://www.mail-archive.com/ceph-users@lists.ceph.com/msg40256.html
On Fri, Feb 23, 2018 at 10:46 AM Reed Dier
The pool will not actually go read only. All read and write requests will
block until both osds are back up. If I were you, I would use min_size=2
and change it to 1 temporarily if needed to do maintenance or
troubleshooting where down time is not an option.
On Thu, Feb 22, 2018, 5:31 PM Georgios
their Data Centre reliability stamp.
I returned the lot and am done with Intel SSDs, will advise as many customers
and peers to do the same…
Regards
David Herselman
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mike
Lovell
Sent: Thursday, 22 February 2018 11:19 PM
Did you remove and recreate the OSDs that used the SSD for their WAL/DB?
Or did you try to do something to not have to do that? That is an integral
part of the OSD and changing the SSD would destroy the OSDs involved unless
you attempted some sort of dd. If you did that, then any corruption for
You could set the flag noin to prevent the new osds from being calculated
by crush until you are ready for all of them in the host to be marked in.
You can also set initial crush weight to 0 for new osds so that they won't
receive any PGs until you're ready for it.
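For example:
ceph osd set noin      # new osds come up but stay out
# add the osds and set their crush weights, then:
ceph osd unset noin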
On Wed, Feb 21, 2018, 5:46 PM
Osds can change their IP every time they start. When they start and check
in with the mons, they tell the mons where they are. Changing your public
network requires restarting every daemon. Likely you will want to schedule
downtime for this. Clients can be routed and on whatever subnet you want,
The WAL is a required part of the osd. If you remove that, then the osd
is missing a crucial part of itself and it will be unable to start until
the WAL is back online. If the SSD were to fail, then all osds using it
would need to be removed and recreated on the cluster.
On Tue, Feb 20, 2018,
I recently migrated several VMs from an HDD pool to an SSD pool without any
downtime with proxmox. It is definitely possible with qemu to do no
downtime migrations between pools.
On Wed, Feb 21, 2018, 8:32 PM Alexandre DERUMIER
wrote:
> Hi,
>
> if you use qemu, it's also
Having all of the daemons in your cluster able to restart themselves at
will sounds terrifying. What's preventing every osd from restarting at the
same time? Also, ceph dot releases have been known to break environments.
It's the nature of such a widely used software. I would recommend pinning
the
r Ceph cluster" from Ceph Days Germany earlier this month for
> other things to watch out for:
>
>
>
> https://ceph.com/cephdays/germany/
>
>
>
> Bryan
>
>
>
> *From: *ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Bryan
> Banister <bbani
Specifically my issue was having problems without this set in the .s3cfg
file. `bucket_location = US`
On Mon, Feb 19, 2018 at 5:04 PM David Turner <drakonst...@gmail.com> wrote:
> I wasn't using the Go SDK. I was using s3cmd when I came across this.
>
> On Mon, Feb 19, 2018 at
I wasn't using the Go SDK. I was using s3cmd when I came across this.
On Mon, Feb 19, 2018 at 4:42 PM Yehuda Sadeh-Weinraub
wrote:
> Sounds like the go sdk adds a location constraint to requests that
> don't go to us-east-1. RGW itself is definitely isn't tied to
>
I recently came across this as well. It is an odd requirement.
On Sun, Feb 18, 2018, 4:54 PM F21 wrote:
> I am using the AWS Go SDK v2 (https://github.com/aws/aws-sdk-go-v2) to
> talk to my RGW instance using the s3 interface. I am running ceph in
> docker using the
hung before due to a bug or if recovery stopped (as designed)
because of the unfound object. The new recovery_unfound and
backfill_unfound states indicates that recovery has stopped due to
unfound objects.
commit 64047e1bac2e775a06423a03cfab69b88462538c
Author: David Zafman <d
20; do ceph osd rm
> ${n}; done
>
> I assume that I did the right steps...
>
>
>
>
>
> On 16.02.2018 21:56, David Turner wrote:
> > What is the output of `ceph osd stat`? My guess is that they are still
> > considered to be part of the cluster and going throug
a couple
OSDs holding everything up.
On Fri, Feb 16, 2018 at 4:15 PM Bryan Banister <bbanis...@jumptrading.com>
wrote:
> Thanks David,
>
>
>
> Taking the list of all OSDs that are stuck reports that a little over 50%
> of all OSDs are in this condition. There isn’t any disc
What is the output of `ceph osd stat`? My guess is that they are still
considered to be part of the cluster and going through the process of
removing OSDs from your cluster is what you need to do. In particular
`ceph osd rm 19`.
On Fri, Feb 16, 2018 at 2:31 PM Karsten Becker
osds
4,8,10,15,16,20,27,29,30,31,34,37,38,42,43,44,47,48,49,51,52,57,66,68,73,81,84,85,87,90,95,97,99,100,102,105,106,107,108,111,112,113,121,124,127,130,132
have stuck requests > 268435 sec
On Fri, Feb 16, 2018 at 2:53 PM Bryan Banister <bbanis...@jumptrading.com>
wrote:
> Thanks David,
>
>
>
> I have set the nobackfill,
Your problem might have been creating too many PGs at once. I generally
increase pg_num and pgp_num by no more than 256 at a time, making sure
that all PGs are created, peered, and healthy (other than backfilling)
before the next step.
To help you get back to a healthy state, let's start off by getting all of
your
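A hedged example of one such step (pool name and numbers are hypothetical):
ceph osd pool set rbd pg_num 2304    # previous value 2048, stepping up by 256
# wait for the new PGs to be created, peered, and healthy, then:
ceph osd pool set rbd pgp_num 2304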
Can you send us a `ceph status` and `ceph health detail`? Something is
still weird. Also can you query the running daemon for its version instead
of asking the cluster? You should also be able to find it in the logs when
it starts.
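For example, on the mon host (the admin socket name may differ from your mon id):
ceph daemon mon.$(hostname -s) version
ceph versions    # Luminous and later: version summary of all running daemons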
On Fri, Feb 16, 2018, 4:24 AM Mark Schouten
Which is more important to you? Deleting the bucket fast or having the
used space become available? If deleting the bucket fast is the priority,
then you can swamp the GC by multithreading object deletion from the bucket
with python or something. If having everything deleted and cleaned up from
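A rough shell equivalent of that idea, using xargs parallelism instead of
python; the bucket name is hypothetical and the freed space is still reclaimed
by GC afterwards:
s3cmd ls --recursive s3://big-bucket | awk '{print $4}' | xargs -P 32 -n 1 s3cmd del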
There are a lot of threads in the ML about rebalancing the data
distribution in a cluster. The CRUSH algorithm is far from perfect when it
comes to evenly distributing PGs, but it's fairly simple to work around and
there are ceph tools that help with it. reweight-by-utilization being one
of
15, 2018, 7:01 AM Egoitz Aurrekoetxea <ego...@sarenet.es> wrote:
> Good morning David!!
>
>
> First all I wanted to hugely thank the mail you sent yesterday. You don't
> receive all the days these kind of advises from an expert in the area. I
> printed the mail and read it
From the mon.0 server run `ceph --version`. If you've restarted the mon
daemon and it is still showing 0.94.5, it is most likely because that is
the version of the packages on that server.
On Wed, Feb 14, 2018 at 10:56 AM Mark Schouten wrote:
> Hi,
>
>
>
> I have a (Proxmox)
.
On Wed, Feb 14, 2018 at 11:08 AM <dhils...@performair.com> wrote:
> All;
>
> This might be a noob type question, but this thread is interesting, and
> there's one thing I would like clarified.
>
> David Turner mentions setting 3 flags on OSDs, Götz has mentioned 5 flags
http://tracker.ceph.com/issues/22754
This is a bug in Luminous for cephfs volumes. This is not anything you're
doing wrong. The mon check for removing a cache tier only checks that it's
EC on CephFS and says no. The above tracker has a PR marked for
backporting into Luminous to respond yes if
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover
Noout will prevent OSDs from being marked out during the maintenance and no
PGs will be able to shift data around with the other 2 flags. After
everything is done, unset the 3 flags and you're good to go.
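That is, once maintenance is complete:
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norecover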
On Wed, Feb 14, 2018 at
that use librbd.
On Tue, Feb 13, 2018 at 6:13 PM Egoitz Aurrekoetxea <ego...@sarenet.es>
wrote:
> Hi David!!
>
> Thanks a lot for your answer. But what happens when you have... imagine
> two monitors or more and one of them becomes unreponsive?. Another one is
> used after
/master/rados/operations/add-or-rm-osds/#removing-osds-manual
>
> On 13/02/18 14:38, David Turner wrote:
> > An out osd still has a crush weight. Removing that osd or weighting it
> > to 0 will change the weight of the host that it's in. That is why data
> > moves again. Th