Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
me in... > > > > Cheers, Dan > > > > On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade <hitr...@gmail.com> wrote: > > > Thanks for your quick response Dan, but no. All the ceph-mon.*.log files are empty. > > > I did track this down i

[ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Hi Everyone, I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today we've had one monitor crash twice and another one once. We have 3 monitors total and have been running Firefly 0.80.10 for quite some time without any monitor issues. When the monitor crashes it leaves a core file
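
A minimal sketch of pulling a backtrace out of a core file like that, assuming /usr/bin/ceph-mon and the core path below are placeholders and the matching debug symbols (e.g. the distro's ceph-dbg/debuginfo package) are installed:
$ gdb /usr/bin/ceph-mon /path/to/core.ceph-mon
(gdb) bt                     # backtrace of the thread that crashed
(gdb) thread apply all bt    # backtraces of every thread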

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
October 2015 at 00:33, Dan van der Ster <d...@vanderster.com> wrote: > Hi, > Is there a backtrace in /var/log/ceph/ceph-mon.*.log? > Cheers, Dan > > On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade <hitr...@gmail.com> wrote: > > Hi Everyone, > > I upgraded

[ceph-users] XFS and nobarriers on Intel SSD

2015-09-04 Thread Richard Bade
Hi Everyone, We have a Ceph pool that is entirely made up of Intel S3700/S3710 enterprise SSD's. We are seeing some significant I/O delays on the disks causing a “SCSI Task Abort” from the OS. This seems to be triggered by the drive receiving a “Synchronize cache command”. My current thinking
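
For context, disabling barriers is just an XFS mount option. A sketch of what that looks like, with the device and mount point as placeholders; this is only sensible because these drives have full power-loss protection:
# /etc/fstab entry for an OSD filesystem with barriers disabled
/dev/sdb1  /var/lib/ceph/osd/ceph-0  xfs  rw,noatime,nobarrier  0 0
$ sudo mount -o remount,nobarrier /var/lib/ceph/osd/ceph-0   # apply the change to a live mount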

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-07 Thread Richard Bade
Regards, Richard On 5 September 2015 at 07:55, Richard Bade <hitr...@gmail.com> wrote: > Hi Everyone, > > We have a Ceph pool that is entirely made up of Intel S3700/S3710 enterprise SSD's. > > We are seeing some significant I/O delays on the disks causing a “SCSI

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-07 Thread Richard Bade
Hi Christian, On 8 September 2015 at 14:02, Christian Balzer wrote: > > Indeed. But first a word about the setup where I'm seeing this. > These are 2 mailbox server clusters (2 nodes each), replicating via DRBD > over Infiniband (IPoIB at this time), LSI 3008 controller. One

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-07 Thread Richard Bade
seconds to recover), not whatever insignificant delay caused by the SSDs. > > Christian > On Tue, 8 Sep 2015 11:35:38 +1200 Richard Bade wrote: > > > Thanks guys for the pointers to this Intel thread: > > > > https://communities.intel.com/thread/77801 > > > > It

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-04 Thread Richard Bade
not necessary with fast drives (such as S3700). > > Take a look in the mailing list archives, I elaborated on this quite a bit in the past, including my experience with Kingston drives + XFS + LSI (and the effect is present even on Intels, but because they are much faster it shouldn't cau

Re: [ceph-users] XFS and nobarriers on Intel SSD

2015-09-13 Thread Richard Bade
to update the firmware on the remainder of the S3710 drives this week and also set nobarriers. Regards, Richard On 8 September 2015 at 14:27, Richard Bade <hitr...@gmail.com> wrote: > Hi Christian, > > On 8 September 2015 at 14:02, Christian Balzer <ch...@gol.com> wrote: >>
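
A rough sketch of the firmware check/update step, assuming Intel's SSD Data Center Tool (isdct) is the updater being used; /dev/sdb and drive index 0 are placeholders:
$ sudo smartctl -a /dev/sdb | grep -i firmware   # current firmware revision
$ sudo isdct show -intelssd                      # list detected Intel SSDs and their firmware
$ sudo isdct load -intelssd 0                    # push the firmware bundled with the tool to drive 0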

[ceph-users] mark_unfound_lost revert|delete behaviour

2016-06-01 Thread Richard Bade
Hi Everyone, Can anyone tell me how the ceph pg x.x mark_unfound_lost revert|delete command is meant to work? Due to some not fully known strange circumstances I have 1 unfound object in one of my pools. I've read through
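
For reference, the usual sequence looks like this (pg 2.5 is a placeholder): list what is unfound, then choose revert (roll back to the previous version, where one exists) or delete (forget the object entirely):
$ ceph health detail                      # names the pg with the unfound object
$ ceph pg 2.5 list_missing                # show the unfound object(s) and which OSDs were probed
$ ceph pg 2.5 mark_unfound_lost revert    # or: ceph pg 2.5 mark_unfound_lost delete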

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2016-03-13 Thread Richard Bade
update we have not had any Monitor crashes. It's now been over two months and the Mon's have been stable. Thanks again, Richard On 17 October 2015 at 07:26, Richard Bade <hitr...@gmail.com> wrote: > Ok, debugging increased > ceph tell mon.[abc] injectargs --debug-mon 20 > cep
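
The debug bump referred to above looks roughly like this; mon.a is a placeholder, and 1/5 and 0/5 are the usual defaults to return to afterwards:
$ ceph tell mon.a injectargs '--debug-mon 20 --debug-ms 1'     # raise monitor logging at runtime
$ ceph tell mon.a injectargs '--debug-mon 1/5 --debug-ms 0/5'  # put it back once logs are captured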

[ceph-users] Objects Stuck Degraded

2017-01-24 Thread Richard Bade
Hi Everyone, I've got a strange one. After doing a reweight of some osd's the other night our cluster is showing 1 pg stuck unclean.
2017-01-25 09:48:41 : 1 pgs stuck unclean | recovery 140/71532872 objects degraded (0.000%) | recovery 2553/71532872 objects misplaced (0.004%)
When I query the pg
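
A couple of commands that usually narrow this down; pg 3.1a is a placeholder id:
$ ceph pg dump_stuck unclean   # list the stuck pg(s) and their current state
$ ceph pg 3.1a query           # inspect the up/acting sets and recovery_state for that pg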

Re: [ceph-users] Objects Stuck Degraded

2017-01-25 Thread Richard Bade
of the osd's in one pool down to around 0.3. This seems to have caused the crush map not to be able to find a suitable osd for the 2nd copy. Changing the reweight weights back up to near 1 has resolved the issue. Regards, Richard On 25 January 2017 at 10:58, Richard Bade <hitr...@gmail.com>
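
For anyone hitting the same thing: the override weights are the REWEIGHT column in the osd tree and can be raised per osd (osd id 12 is a placeholder):
$ ceph osd tree              # REWEIGHT column shows the override weight per osd
$ ceph osd reweight 12 1.0   # raise osd.12's override weight back towards 1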

Re: [ceph-users] Inconsistent PG won't repair

2017-10-23 Thread Richard Bade
ubb...@redhat.com> wrote: > On Sat, Oct 21, 2017 at 1:59 AM, Richard Bade <hitr...@gmail.com> wrote: >> Hi Lincoln, >> Yes the object is 0-bytes on all OSD's. Has the same filesystem >> date/time too. Before I removed the rbd image (migrated disk to >> different po

Re: [ceph-users] Inconsistent PG won't repair

2017-11-08 Thread Richard Bade
the scrub was finished the inconsistency went away. Note, the object in question was empty (size of zero bytes) before I started this process. I emptied the object by moving the rbd image to another pool. Rich On 24 October 2017 at 14:34, Richard Bade <hitr...@gmail.com> wrote: > What I'm

[ceph-users] Ceph ObjectCacher FAILED assert (qemu/kvm)

2018-05-08 Thread Richard Bade
Hi Everyone, We run some hosts with Proxmox 4.4 connected to our ceph cluster for RBD storage. Occasionally we get a vm suddenly stop with no real explanation. The last time this happened to one particular vm I turned on some qemu logging via Proxmox Monitor tab for the vm and got this dump this

[ceph-users] Inconsistent PG won't repair

2017-10-20 Thread Richard Bade
Hi Everyone, In our cluster running 0.94.10 we had a pg pop up as inconsistent during scrub. Previously when this has happened running ceph pg repair [pg_num] has resolved the problem. This time the repair runs but it remains inconsistent.
~$ ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2
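
On 0.94.x the scrub errors that explain what repair is choking on end up in the OSD logs; a sketch, with 17.1c as a placeholder pg id:
$ grep ERR /var/log/ceph/ceph-osd.*.log   # scrub errors name the object and the shard at fault
$ ceph pg repair 17.1c
$ ceph pg deep-scrub 17.1c                # re-scrub afterwards to confirm the state clears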

Re: [ceph-users] Inconsistent PG won't repair

2017-10-20 Thread Richard Bade
ent metadata. Ultimately it was resolved by doing a "rados get" and then a "rados put" on the object. *However* that was a last ditch effort after I couldn't get any other repair option to work, and I have no idea if that will cause any issues down the road :)
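
The get/put workaround mentioned above would look roughly like this; the pool, object name and pg id are placeholders:
$ rados -p rbd get rbd_data.ab1234.0000000000000000 /tmp/obj   # read the current copy out
$ rados -p rbd put rbd_data.ab1234.0000000000000000 /tmp/obj   # write it straight back
$ ceph pg deep-scrub 17.1c                                     # let scrub re-evaluate the pg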

[ceph-users] Safe to delete data, metadata pools?

2018-01-07 Thread Richard Bade
Hi Everyone, I've got a couple of pools that I don't believe are being used but have a reasonably large number of pg's (approx 50% of our total pg's). I'd like to delete them but as they were pre-existing when I inherited the cluster, I wanted to make sure they aren't needed for anything first.
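
Some quick checks for whether a pool is actually being used; the pool names data and metadata are taken from the subject:
$ ceph df                     # per-pool object counts and space used
$ rados df                    # per-pool read/write op counters
$ ceph osd pool stats data    # any client I/O currently hitting the pool?
$ ceph osd pool stats metadata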

Re: [ceph-users] Safe to delete data, metadata pools?

2018-01-15 Thread Richard Bade
Thanks John, I removed these pools on Friday and as you suspected there was no impact. Regards, Rich On 8 January 2018 at 23:15, John Spray <jsp...@redhat.com> wrote: > On Mon, Jan 8, 2018 at 2:55 AM, Richard Bade <hitr...@gmail.com> wrote: >> Hi Everyone, >> I've g
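
For completeness, the removal itself is just the following; on newer releases the mons must also have mon_allow_pool_delete enabled first:
$ ceph osd pool delete data data --yes-i-really-really-mean-it
$ ceph osd pool delete metadata metadata --yes-i-really-really-mean-it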

Re: [ceph-users] cephfs compression?

2018-06-28 Thread Richard Bade
I'm using compression on a cephfs-data pool in luminous. I didn't do anything special.
$ sudo ceph osd pool get cephfs-data all | grep ^compression
compression_mode: aggressive
compression_algorithm: zlib
You can check how much compression you're getting on the osd's:
$ for osd in `seq 0 11`; do
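
The per-osd loop is cut off above; my guess at how it continues, reading the bluestore compression counters over the admin socket (run on the host that owns osd.0 through osd.11):
$ for osd in `seq 0 11`; do sudo ceph daemon osd.$osd perf dump | grep bluestore_compressed; done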

Re: [ceph-users] cephfs compression?

2018-06-28 Thread Richard Bade
expect for my k4, m2 pool settings. On Fri, 29 Jun 2018 at 17:08, Richard Bade wrote: > > I'm using compression on a cephfs-data pool in luminous. I didn't do > anything special > > $ sudo ceph osd pool get cephfs-data all | grep ^compression > compression_mode: aggressive >

Re: [ceph-users] Luminous Bluestore performance, bcache

2018-06-28 Thread Richard Bade
Hi Andrei, These are good questions. We have another cluster with filestore and bcache but for this particular one I was interested in testing out bluestore. So I have used bluestore both with and without bcache. For my synthetic load on the vm's I'm using this fio command: fio --randrepeat=1
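
The fio command is truncated above, so everything after --randrepeat=1 below is my own filler rather than the thread's actual parameters; a representative 4k random-write job:
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test \
      --filename=/mnt/test/fio.dat --bs=4k --iodepth=64 --size=4G --readwrite=randwrite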

[ceph-users] Luminous Bluestore performance, bcache

2018-06-27 Thread Richard Bade
Hi Everyone, There's been a few threads go past around this but I haven't seen any that pointed me in the right direction. We've recently set up a new luminous (12.2.5) cluster with 5 hosts each with 12 4TB Seagate Constellation ES spinning disks for osd's. We also have 2x 400GB Intel DC P3700's
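
The thread doesn't say how the OSDs were created, but on 12.2.x a bluestore OSD with its DB on one of the P3700s would typically be deployed roughly like this (device paths are placeholders):
$ sudo ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1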

[ceph-users] Upgrade Documentation: Wait for recovery

2019-06-17 Thread Richard Bade
Hi Everyone, Recently we moved a bunch of our servers from one rack to another. In the late stages of this we hit a point when some requests were blocked due to one pg being in "peered" state. This was unexpected to us, but on discussion with Wido we understand why this happened. However it's
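
A crude sketch of the per-host pattern being suggested, waiting for the cluster to go healthy again before touching the next host (the 30-second poll interval is arbitrary):
$ ceph osd set noout          # stop CRUSH from backfilling while the host is down
$ # ...move/reboot the host and bring its OSDs back up...
$ ceph osd unset noout
$ while ! ceph health | grep -q HEALTH_OK; do sleep 30; done   # wait before starting the next host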

Re: [ceph-users] PG Balancer Upmap mode not working

2019-12-10 Thread Richard Bade
> How is that possible? I don't know how much more proof I need to present that there's a bug. I also think there's a bug in the balancer plugin, as it seems to have stopped for me too. I'm on Luminous though, so not sure if that will be the same bug. The balancer used to work flawlessly,
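
For comparison, the standard upmap enablement sequence on Luminous and later; this is just the setup, not a fix for the bug being discussed:
$ ceph osd set-require-min-compat-client luminous   # upmap needs luminous-or-newer clients
$ ceph balancer mode upmap
$ ceph balancer on
$ ceph balancer status                              # shows the mode, active flag and any queued plans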