Re: [ceph-users] ceph pg backfill_toofull

2018-12-11 Thread Maged Mokhtar
There are 2 relevant params: mon_osd_full_ratio (0.95) and osd_backfill_full_ratio (0.85); you are probably hitting them both. As a short-term/temporary fix you may increase these values, and maybe adjust weights on OSDs if you have to. However, you really need to fix this by adding more OSDs to your clust
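A hedged sketch of that short-term workaround (the ratios and the OSD id below are illustrative, not from the thread; lower the ratios again once backfill has finished):

    # check per-OSD utilisation first
    ceph osd df

    # temporarily raise the backfill-full threshold on all OSDs (runtime only)
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'

    # pre-Luminous syntax for raising the cluster full ratio, if needed
    ceph pg set_full_ratio 0.96

    # optionally nudge the weight of the fullest OSDs down so data moves off them
    ceph osd reweight osd.12 0.90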

Re: [ceph-users] move directories in cephfs

2018-12-11 Thread Zhenshi Zhou
Hi, That means the 'mv' operation should be done if src and dst are in the same pool, and the client should have the same permissions on both src and dst. Do I have the right understanding? Marc Roos wrote on Tue, Dec 11, 2018 at 4:53 PM: > >Moving data between pools when a file is moved to a different directory
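One hedged way to read "the same permissions on both src and dst" in terms of client caps (the client name, paths and pool names below are invented for illustration):

    # give one client MDS access to both directory trees and OSD access to both data pools
    ceph auth caps client.backup \
        mon 'allow r' \
        mds 'allow rw path=/projects, allow rw path=/archive' \
        osd 'allow rw pool=cephfs_data_ssd, allow rw pool=cephfs_data_hdd'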

[ceph-users] ceph pg backfill_toofull

2018-12-11 Thread Klimenko, Roman
Hi everyone. Yesterday I found that on our overcrowded Hammer ceph cluster (83% used in the HDD pool) several OSDs were in the danger zone - near 95%. I reweighted them, and after several moments I got PGs stuck in backfill_toofull. After that, I reapplied reweight to the OSDs - no luck. Currently, all re
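For reference, a hedged sketch of the kind of reweighting workflow described here (OSD id, weight and threshold are illustrative only):

    # see which OSDs are closest to the 95% full ratio
    ceph osd df

    # lower the reweight value of the fullest OSDs so PGs map elsewhere
    ceph osd reweight osd.7 0.85

    # or let Ceph pick the over-utilised OSDs automatically (threshold in percent)
    ceph osd reweight-by-utilization 110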

Re: [ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread Adam Tygart
AFAIR, there is a feature request in the works to allow rebuild with K chunks, but not allow normal read/write until min_size is met. Not that I think running with m=1 is a good idea. I'm not seeing the tracker issue for it at the moment, though. -- Adam On Tue, Dec 11, 2018 at 9:50 PM Ashley Merr

Re: [ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread Ashley Merrick
Yes, if you set it back to 5, every time you lose an OSD you'll have to set it to 4 and let the rebuild take place before putting it back to 5. I guess it all comes down to how important 100% uptime is versus manually monitoring the backfill / fixing the OSD / replacing the OSD by dropping to 4, vs letting it do th

Re: [ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread David Young
(accidentally forgot to reply to the list) > Thank you, setting min_size to 4 allowed I/O again, and the 39 incomplete PGs > are now: > > 39 active+undersized+degraded+remapped+backfilling > > Once backfilling is done, I'll increase min_size to 5 again. > > Am I likely to encounter this issue wh

Re: [ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread Ashley Merrick
With EC the min_size is set to K + 1. Generally EC is used with an M of 2 or more; the reason min_size is set to K + 1 is that you are now in a state where a further OSD loss will cause some PGs to not have at least K chunks available, as you only have 1 extra M. As per the error, you can get your pool back online
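A hedged sketch of the recovery sequence being suggested (the pool name comes from the original report in this thread; confirm your own K/M values before changing anything):

    # confirm the erasure-code profile really is k=4 m=1
    ceph osd pool get media erasure_code_profile
    ceph osd erasure-code-profile get <profile-name>

    # temporarily allow I/O and recovery with only K chunks present
    ceph osd pool set media min_size 4

    # once backfill has finished, restore the safer K+1
    ceph osd pool set media min_size 5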

[ceph-users] Lost 1/40 OSDs at EC 4+1, now PGs are incomplete

2018-12-11 Thread David Young
Hi all, I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1 I lost osd38, and now I have 39 incomplete PGs. --- PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size from 5 m

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Mark Kirkwood
Looks like the 'delaylog' option for xfs is the problem - no longer supported in later kernels. See https://github.com/torvalds/linux/commit/444a702231412e82fb1c09679adc159301e9242c Offhand I'm not sure where that option is being added (whether ceph-deploy or ceph-volume), but you could just do su
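If the option does turn out to come from ceph.conf, one hedged possibility is to set the XFS mount options explicitly without delaylog (the option string below is just an example set, not a recommendation for this particular cluster):

    # /etc/ceph/ceph.conf on the OSD hosts
    [osd]
    # override the XFS mount options so 'delaylog' is no longer passed
    osd mount options xfs = rw,noatime,inode64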

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Tyler Bishop
> > [root] osci-1001.infra.cin1.corp:~/cephdeploy # ceph-deploy osd create >> --filestore --fs-type xfs --data /dev/sdb2 --journal /dev/sdb1 osci-1001 > > [ceph_deploy.conf][DEBUG ] found configuration file at: >> /root/.cephdeploy.conf > > [ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-d

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Tyler Bishop
Now I'm just trying to figure out how to create filestore in Luminous. I've read every doc and tried every flag but I keep ending up with either a data LV of 100% of the VG or a bunch of random errors for unsupported flags... # ceph-disk prepare --filestore --fs-type xfs --data-dev /dev/sdb1 --jou
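For what it's worth, a hedged sketch of the ceph-volume form that replaces ceph-disk for filestore in Luminous (device names are the ones from this thread; double-check them against your own layout before running anything):

    # ceph-volume expects a data device/partition and a separate journal
    ceph-volume lvm create --filestore --data /dev/sdb2 --journal /dev/sdb1

    # roughly the ceph-deploy equivalent shown elsewhere in this thread
    ceph-deploy osd create --filestore --fs-type xfs \
        --data /dev/sdb2 --journal /dev/sdb1 osci-1001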

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Christian Balzer
Hello, On Tue, 11 Dec 2018 23:22:40 +0300 Igor Fedotov wrote: > Hi Tyler, > > I suspect you have BlueStore DB/WAL on these drives as well, don't you? > > Then perhaps you have performance issues with f[data]sync requests, which > DB/WAL invoke pretty frequently. > Since he explicitly mentioned

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
> > Assuming everything is on LVM including the root filesystem, only moving > > the boot partition will have to be done outside of LVM. > > Since the OP mentioned MS Exchange, I assume the VM is running windows. > You can do the same LVM-like trick in Windows Server via Disk Manager > though; add
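A hedged sketch of the LVM side of that trick inside a Linux guest (device and VG names are invented; the Windows Server route is the analogous mirror/remove operation in Disk Manager):

    # after attaching the new Ceph-backed virtual disk to the VM as /dev/vdb
    pvcreate /dev/vdb
    vgextend vg_system /dev/vdb
    pvmove /dev/vda2 /dev/vdb      # migrate extents off the old local disk, online
    vgreduce vg_system /dev/vda2
    pvremove /dev/vda2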

Re: [ceph-users] SLOW SSD's after moving to Bluestore

2018-12-11 Thread Igor Fedotov
Hi Tyler, I suspect you have BlueStore DB/WAL on these drives as well, don't you? Then perhaps you have performance issues with f[data]sync requests, which DB/WAL invoke pretty frequently. See the following links for details: https://www.percona.com/blog/2018/02/08/fsync-performance-storage-d
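A hedged example of the kind of sync-write latency test the linked Percona article describes (the test file path and fio parameters are illustrative; run it against the SSD in question):

    # small sequential writes with an fdatasync after every write,
    # which is roughly the pattern a DB/WAL device sees
    fio --name=fdatasync-test --filename=/var/lib/ceph/test.fio \
        --rw=write --bs=4k --size=1G --ioengine=sync --fdatasync=1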

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Ronny Aasen
On 11.12.2018 12:59, Kevin Olbrich wrote: Hi! Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The server has access to both local and cluster-storage, I on

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Ronny Aasen
On 11.12.2018 17:39, Lionel Bouton wrote: On 11/12/2018 at 15:51, Konstantin Shalygin wrote: Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The server

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Jack
We are using QEMU storage migration regularly via Proxmox. It works fine, you can go on. On 12/11/2018 05:39 PM, Lionel Bouton wrote: > > I believe the OP is trying to use the storage migration feature of QEMU. > I've never tried it and I wouldn't recommend it (probably not very > tested and there is a
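For reference, a hedged example of what that looks like on Proxmox (the VM id, disk name and storage id are placeholders; check the qm man page for your PVE release):

    # live-migrate a single virtual disk to the Ceph-backed storage,
    # removing the old qcow2 copy once the mirror has converged
    qm move_disk 101 virtio0 ceph-rbd --delete 1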

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Graham Allan
On 12/11/2018 10:39 AM, Lionel Bouton wrote: On 11/12/2018 at 15:51, Konstantin Shalygin wrote: Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The s

Re: [ceph-users] civitweb segfaults

2018-12-11 Thread Casey Bodley
Hi Leon, Are you running with a non-default value of rgw_gc_max_objs? I was able to reproduce this exact stack trace by setting rgw_gc_max_objs = 0; I can't think of any other way to get a 'Floating point exception' here. On 12/11/18 10:31 AM, Leon Robinson wrote: Hello, I have found a suref
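A hedged way to check whether the gateway really is running with a non-default value (the daemon name below is a guess; use the admin-socket name from your own rgw host):

    # query the running radosgw through its admin socket
    ceph daemon client.rgw.gateway1 config get rgw_gc_max_objs

    # compare against the shipped default of 32; the thread suggests a value
    # of 0 is what produces the 'Floating point exception'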

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Lionel Bouton
On 11/12/2018 at 15:51, Konstantin Shalygin wrote: > >> Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes >> and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous >> cluster (which already holds lots of images). >> The server has access to both local and clu

[ceph-users] civitweb segfaults

2018-12-11 Thread Leon Robinson
Hello, I have found a surefire way to bring down our swift gateways. First, upload a bunch of large files and split them into segments, e.g. for i in {1..100}; do swift upload test_container -S 10485760 CentOS-7-x86_64-GenericCloud.qcow2 --object-name CentOS-7-x86_64-GenericCloud.qcow2-$i; done

Re: [ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Konstantin Shalygin
Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The server has access to both local and cluster-storage, I only need to live migrate the storage, not machine

Re: [ceph-users] how to fix X is an unexpected clone

2018-12-11 Thread Achim Ledermüller
Hi Stefan, Hi Everyone, I am in a similar situation to the one you were in a year ago. During some backfilling we removed an old snapshot, and with the next deep-scrub we ended up with the same log as you did. > deep-scrub 2.61b 2:d8736536:::rbd_data.e22260238e1f29.0046d527:177f6 : is an unexpected cl
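If it helps anyone trying to locate the offending object offline, a heavily hedged sketch using ceph-objectstore-tool (the OSD id is a placeholder, the PG id is taken from the scrub error above, the OSD must be stopped first, and what to do with the object once found is exactly what this thread is about):

    systemctl stop ceph-osd@<id>

    # list every object in the affected PG and grep for the rbd prefix from the scrub error
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid 2.61b --op list | grep rbd_data.e22260238e1f29.0046d527

    systemctl start ceph-osd@<id>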

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Janne Johansson
On Tue, 11 Dec 2018 at 12:54, Caspar Smit wrote: > > On a Luminous 12.2.7 cluster these are the defaults: > ceph daemon osd.x config show Thank you very much. -- May the most significant bit of your life be positive.

[ceph-users] KVM+Ceph: Live migration of I/O-heavy VM

2018-12-11 Thread Kevin Olbrich
Hi! Currently I plan a migration of a large VM (MS Exchange, 300 Mailboxes and 900GB DB) from qcow2 on ext4 (RAID1) to an all-flash Ceph luminous cluster (which already holds lots of images). The server has access to both local and cluster-storage, I only need to live migrate the storage, not mac

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
On a Luminous 12.2.7 cluster these are the defaults: ceph daemon osd.x config show "osd_scrub_max_interval": "604800.00", "osd_scrub_min_interval": "86400.00", "osd_scrub_interval_randomize_ratio": "0.50", "osd_scrub_chunk_max": "25", "osd_scrub_chunk_min": "5", "osd
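If only a single value is needed rather than the whole dump, the admin socket also supports config get (osd.0 here is just an example daemon id):

    ceph daemon osd.0 config get osd_scrub_max_interval
    ceph daemon osd.0 config get osd_deep_scrub_interval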

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Janne Johansson
On Tue, 11 Dec 2018 at 12:26, Caspar Smit wrote: > > Furthermore, presuming you are running Jewel or Luminous, you can change some > settings in ceph.conf to mitigate the deep-scrub impact: > > osd scrub max interval = 4838400 > osd scrub min interval = 2419200 > osd scrub interval randomize ratio

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
Furthermore, presuming you are running Jewel or Luminous, you can change some settings in ceph.conf to mitigate the deep-scrub impact: osd scrub max interval = 4838400 osd scrub min interval = 2419200 osd scrub interval randomize ratio = 1.0 osd scrub chunk max = 1 osd scrub chunk min = 1 osd scrub

Re: [ceph-users] yet another deep-scrub performance topic

2018-12-11 Thread Caspar Smit
Hi Vladimir, While it is advisable to investigate why deep-scrub is killing your performance (it's enabled for a reason) and find ways to fix that (separate block.db SSDs, for instance, might help), here's a way to accommodate your needs: For all your 7200RPM spinner-based pools do: ceph osd pool s
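For reference, a hedged example of per-pool scrub flags, if your release supports them (the pool name is a placeholder, and deep scrubbing should be re-enabled once the underlying issue is fixed):

    # disable deep scrubbing on one pool only
    ceph osd pool set rbd-hdd nodeep-scrub 1

    # re-enable it later
    ceph osd pool set rbd-hdd nodeep-scrub 0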

Re: [ceph-users] move directories in cephfs

2018-12-11 Thread Marc Roos
>Moving data between pools when a file is moved to a different directory >is most likely problematic - for example an inode can be hard linked to >two different directories that are in two different pools - then what >happens to the file?  Unix/posix semantics don't really specify a parent >
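For context on how a CephFS directory ends up in a different pool at all, a hedged example of the file-layout attribute involved (the mount point and pool name are illustrative, and the pool must already be attached to the filesystem, e.g. via ceph fs add_data_pool):

    # new files created under this directory go to the given data pool;
    # existing file objects are NOT rewritten, which is where the 'mv' question comes from
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/fastdir

    # inspect the layout of a directory
    getfattr -n ceph.dir.layout /mnt/cephfs/fastdir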