There are two relevant params:
mon_osd_full_ratio 0.95
osd_backfill_full_ratio 0.85
You are probably hitting both of them.
As a short-term/temporary fix you may increase these values, and adjust the
weights on OSDs if you have to.
However, you really need to fix this by adding more OSDs to your cluster.
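As a rough sketch of that short-term workaround (the values here are examples, not recommendations, and the injectargs syntax assumes a Hammer-era cluster):

```shell
# Temporarily raise the thresholds at runtime; revert once capacity is added.
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
ceph tell mon.* injectargs '--mon-osd-full-ratio 0.97'
```

Note this only buys time: a cluster that keeps filling will eventually hit the raised ratios as well.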
Hi
That means the 'mv' operation should only be done if src and dst
are in the same pool, and the client should have the same permissions
on both src and dst.
Do I have the right understanding?
Marc Roos wrote on Tue, Dec 11, 2018 at 4:53 PM:
> >Moving data between pools when a file is moved to a different directory
Hi everyone. Yesterday I found that on our overcrowded Hammer ceph cluster (83%
used in the HDD pool) several OSDs were in the danger zone - near 95%.
I reweighted them, and after a few moments I got PGs stuck in
backfill_toofull.
After that, I reapplied the reweight to the OSDs - no luck.
Currently, all re
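A sketch of how one might locate and relieve the overfull OSDs (the OSD id and weight below are hypothetical):

```shell
# Show per-OSD utilization to find the OSDs near the 85%/95% thresholds.
ceph osd df tree

# Lower the override reweight of an overfull OSD so data moves off it.
ceph osd reweight 12 0.85

# Watch for PGs stuck in states like backfill_toofull.
ceph pg dump_stuck
```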
AFAIR, there is a feature request in the works to allow rebuild with K
chunks, but not allow normal read/write until min_size is met. Not
that I think running with m=1 is a good idea. I'm not seeing the
tracker issue for it at the moment, though.
--
Adam
On Tue, Dec 11, 2018 at 9:50 PM Ashley Merr
Yes, if you set it back to 5, every time you lose an OSD you will have to set
it to 4 and let the rebuild take place before putting it back to 5.
I guess it all comes down to how important 100% uptime is versus you manually
monitoring the backfill / fixing the OSD / replacing the OSD by dropping to 4
vs letting it do th
(accidentally forgot to reply to the list)
> Thank you, setting min_size to 4 allowed I/O again, and the 39 incomplete PGs
> are now:
>
> 39 active+undersized+degraded+remapped+backfilling
>
> Once backfilling is done, I'll increase min_size to 5 again.
>
> Am I likely to encounter this issue wh
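The workflow described above can be sketched as follows (the pool name matches the one in the PG_AVAILABILITY error quoted elsewhere in the thread):

```shell
# Allow I/O and backfill with only K=4 chunks available.
ceph osd pool set media min_size 4

# Monitor recovery until the PGs are active+clean again.
ceph pg stat

# Restore the safer setting afterwards.
ceph osd pool set media min_size 5
```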
With EC the min_size is set to K + 1.
Generally EC is used with an M of 2 or more. The reason min_size matters here
is that with only 1 extra M you are now in a state where a further OSD loss
will cause some PGs to not have at least K chunks available.
As per the error you can get your pool back online
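The arithmetic behind that can be sketched in shell, using the 4+1 profile from this thread:

```shell
# K data chunks, M coding chunks; the default EC min_size is K + 1.
k=4; m=1
min_size=$((k + 1))
survivors=$((k + m - 1))   # chunks remaining after losing one OSD
echo "min_size=$min_size, chunks after one OSD loss=$survivors"
# With m=1, one OSD loss leaves exactly K chunks, which is below min_size,
# so the affected PGs go inactive (incomplete) until min_size is lowered
# or the lost chunk is rebuilt.
```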
Hi all,
I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1
I lost osd38, and now I have 39 incomplete PGs.
---
PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size
from 5 m
Looks like the 'delaylog' option for xfs is the problem - no longer
supported in later kernels. See
https://github.com/torvalds/linux/commit/444a702231412e82fb1c09679adc159301e9242c
Offhand I'm not sure where that option is being added (whether
ceph-deploy or ceph-volume), but you could just do su
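One possible place to override it (an assumption, not verified in this thread) is the filestore xfs mount options in ceph.conf, which replace the defaults rather than appending to them; the option values shown are examples:

```shell
# Append an [osd] section setting explicit xfs mount options, so the
# removed 'delaylog' flag is never passed to mount.
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
osd mount options xfs = rw,noatime,inode64
EOF
```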
>
> [root] osci-1001.infra.cin1.corp:~/cephdeploy # ceph-deploy osd create
>> --filestore --fs-type xfs --data /dev/sdb2 --journal /dev/sdb1 osci-1001
>
> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /root/.cephdeploy.conf
>
> [ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-d
Now I'm just trying to figure out how to create filestore in Luminous.
I've read every doc and tried every flag but I keep ending up with
either a data LV of 100% on the VG or a bunch of random errors for
unsupported flags...
# ceph-disk prepare --filestore --fs-type xfs --data-dev /dev/sdb1
--jou
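For comparison, the ceph-volume equivalent on Luminous looks roughly like this (device paths are hypothetical, mirroring the ceph-deploy invocation shown earlier in the thread):

```shell
# Create a filestore OSD with a separate journal partition.
ceph-volume lvm create --filestore --data /dev/sdb2 --journal /dev/sdb1
```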
Hello,
On Tue, 11 Dec 2018 23:22:40 +0300 Igor Fedotov wrote:
> Hi Tyler,
>
> I suspect you have BlueStore DB/WAL at these drives as well, don't you?
>
> Then perhaps you have performance issues with f[data]sync requests which
> DB/WAL invoke pretty frequently.
>
Since he explicitly mentioned
> > Assuming everything is on LVM including the root filesystem, only moving
> > the boot partition will have to be done outside of LVM.
>
> Since the OP mentioned MS Exchange, I assume the VM is running windows.
> You can do the same LVM-like trick in Windows Server via Disk Manager
> though; add
Hi Tyler,
I suspect you have BlueStore DB/WAL at these drives as well, don't you?
Then perhaps you have performance issues with f[data]sync requests which
DB/WAL invoke pretty frequently.
See the following links for details:
https://www.percona.com/blog/2018/02/08/fsync-performance-storage-d
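A quick way to check a drive's sync-write behaviour is fio (the parameters are just one reasonable choice; the file path is hypothetical and the run writes data):

```shell
# Measure 4k sync-write IOPS/latency, the pattern DB/WAL fsyncs resemble.
fio --name=fsync-test --filename=/var/lib/ceph/fio-test.bin --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based --group_reporting
```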
On 11.12.2018 12:59, Kevin Olbrich wrote:
Hi!
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The server has access to both local and cluster-storage, I on
On 11.12.2018 17:39, Lionel Bouton wrote:
On 11.12.2018 15:51, Konstantin Shalygin wrote:
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The server
We are using qemu storage migration regularly via proxmox
Works fine, you can go on
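For reference, the Proxmox CLI form of that migration (the VM id, disk name, and target storage are hypothetical):

```shell
# Live-migrate a disk to Ceph-backed storage and delete the source image.
qm move_disk 101 virtio0 ceph-rbd --delete 1
```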
On 12/11/2018 05:39 PM, Lionel Bouton wrote:
>
> I believe OP is trying to use the storage migration feature of QEMU.
> I've never tried it and I wouldn't recommend it (probably not very
> tested and there is a
On 12/11/2018 10:39 AM, Lionel Bouton wrote:
On 11.12.2018 15:51, Konstantin Shalygin wrote:
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The s
Hi Leon,
Are you running with a non-default value of rgw_gc_max_objs? I was able
to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I
can't think of any other way to get a 'Floating point exception' here.
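A way to confirm the running value (the admin socket path is hypothetical):

```shell
# Query the live rgw_gc_max_objs value via the RGW admin socket.
ceph daemon /var/run/ceph/ceph-client.rgw.gateway1.asok \
    config get rgw_gc_max_objs
# The shipped default is 32; a value of 0 would divide by zero when
# sharding GC entries.
```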
On 12/11/18 10:31 AM, Leon Robinson wrote:
Hello, I have found a suref
On 11.12.2018 15:51, Konstantin Shalygin wrote:
>
>> Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
>> and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
>> cluster (which already holds lots of images).
>> The server has access to both local and clu
Hello, I have found a surefire way to bring down our swift gateways.
First, upload a bunch of large files split into segments, e.g.
for i in {1..100}; do swift upload test_container -S 10485760
CentOS-7-x86_64-GenericCloud.qcow2 --object-name
CentOS-7-x86_64-GenericCloud.qcow2-$i; done
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The server has access to both local and cluster-storage, I only need
to live migrate the storage, not machine
Hi Stefan,
Hi Everyone,
I am in a situation similar to the one you were in a year ago. During some
backfilling we removed an old snapshot, and with the next deep-scrub we
ended up with the same log as you did.
> deep-scrub 2.61b
2:d8736536:::rbd_data.e22260238e1f29.0046d527:177f6 : is an
unexpected cl
On Tue, Dec 11, 2018 at 12:54, Caspar Smit wrote:
>
> On a Luminous 12.2.7 cluster these are the defaults:
> ceph daemon osd.x config show
Thank you very much.
--
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list
Hi!
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes
and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous
cluster (which already holds lots of images).
The server has access to both local and cluster-storage, I only need
to live migrate the storage, not mac
On a Luminous 12.2.7 cluster these are the defaults:
ceph daemon osd.x config show
"osd_scrub_max_interval": "604800.00",
"osd_scrub_min_interval": "86400.00",
"osd_scrub_interval_randomize_ratio": "0.50",
"osd_scrub_chunk_max": "25",
"osd_scrub_chunk_min": "5",
"osd
On Tue, Dec 11, 2018 at 12:26, Caspar Smit wrote:
>
> Furthermore, presuming you are running Jewel or Luminous you can change some
> settings in ceph.conf to mitigate the deep-scrub impact:
>
> osd scrub max interval = 4838400
> osd scrub min interval = 2419200
> osd scrub interval randomize ratio
Furthermore, presuming you are running Jewel or Luminous you can change
some settings in ceph.conf to mitigate the deep-scrub impact:
osd scrub max interval = 4838400
osd scrub min interval = 2419200
osd scrub interval randomize ratio = 1.0
osd scrub chunk max = 1
osd scrub chunk min = 1
osd scrub
Hi Vladimir,
While it is advisable to investigate why deep-scrub is killing your
performance (it's enabled for a reason) and to find ways to fix that (separate
block.db SSDs, for instance, might help), here's a way to accommodate your
needs:
For all your 7200RPM Spinner based pools do:
ceph osd pool s
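If the aim is simply to stop deep-scrubs on those pools entirely, Luminous also supports per-pool flags, which is presumably where the truncated command above was heading (the pool name is hypothetical):

```shell
# Disable (deep-)scrubbing on a specific pool.
ceph osd pool set hdd-media noscrub 1
ceph osd pool set hdd-media nodeep-scrub 1
```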
>Moving data between pools when a file is moved to a different directory
>is most likely problematic - for example an inode can be hard linked to
>two different directories that are in two different pools - then what
>happens to the file? Unix/posix semantics don't really specify a parent