Re: [ceph-users] Deleting large pools

2017-11-18 Thread Gregory Farnum
On Wed, Nov 15, 2017 at 6:50 AM David Turner wrote: > 2 weeks later and things are still deleting, but getting really close to > being done. I tried to use ceph-objectstore-tool to remove one of the > PGs. I only tested on 1 PG on 1 OSD, but it's doing something really >
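
For reference, removing a PG copy with ceph-objectstore-tool is normally done with the OSD stopped; the paths, OSD id and PG id below are placeholders, and newer point releases may also require --force:

    systemctl stop ceph-osd@12     # the object store must not be in use
    # --journal-path is only needed for FileStore OSDs; paths and ids are placeholders
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 7.1a --op remove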

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
One thing I have just noticed: the abort is always on the next thread along. For example, the last PG 6.84s12 was in thread 7f78721ad700, however the assert was listed for 7f78721ae700. Does this mean PG 6.84s12 crashed, causing the next thread to exit, or was thread 7f78721ae700 what caused

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
Will add to the ticket. But no, the cluster does not see the OSD go up; the OSD just fails on the same assert. ,Ashley

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread David Turner
The OSD shouldn't be able to peer while it's down. I think this is good information to update your ticket with, as it is possibly a different code path than anticipated. Did your cluster see the OSD as up? On Sat, Nov 18, 2017, 9:32 AM Ashley Merrick wrote: > Hello, > >
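
For anyone following along, one way to confirm what the monitors think of the OSD's state (osd.12 is just a placeholder id):

    ceph osd tree | grep -w 'osd.12'    # up/down state as the cluster map sees it
    ceph osd dump | grep -w 'osd.12'    # up_from/up_thru epochs for that OSD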

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
Hello, Added an empty one, and it was fine; I guess it had no peering to do as it had no PGs. I also disabled the nobackfill flag for a while and some PGs moved across fine. I did try to export a PG from an OSD that fails to boot and import the PG onto an OSD, but that then caused the OSD to do the same
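
For anyone wanting to try the same, a PG export/import with ceph-objectstore-tool usually looks roughly like this (both OSDs stopped; paths, OSD ids and the PG id are placeholders):

    # export from the OSD that will not boot
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 6.84s12 --op export --file /root/6.84s12.export
    # import on the (stopped) destination OSD; the PG id is read from the export file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
        --op import --file /root/6.84s12.export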

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Sean Redmond
Hi, Is it possible to add new empty osds to your cluster? Or do these also crash out? Thanks On 18 Nov 2017 14:32, "Ashley Merrick" wrote: > Hello, > > > > So seems noup does not help. > > > > Still have the same error : > > > > 2017-11-18 14:26:40.982827 7fb4446cd700
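
On Luminous a fresh, empty OSD can be brought up with ceph-volume, roughly like this (/dev/sdx is a placeholder device):

    ceph-volume lvm create --data /dev/sdx    # prepares and activates a new OSD in one step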

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
Hello, So it seems noup does not help. Still have the same error: 2017-11-18 14:26:40.982827 7fb4446cd700 -1 *** Caught signal (Aborted) ** in thread 7fb4446cd700 thread_name:tp_peering ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable) 1: (()+0xa0c554)
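
To get a more useful backtrace into the tracker ticket, the usual trick is to raise the OSD debug levels before the next start attempt, e.g. in ceph.conf on the affected host (the values shown are just the common verbose settings, adjust as needed):

    [osd]
        debug osd = 20
        debug ms = 1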

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
Hello, Will try with noup now and see if it makes any difference. It is affecting both BlueStore (BS) and FileStore (FS) OSDs, and affecting different hosts and different PGs; there seems to be no form of pattern. ,Ashley

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread David Turner
Does letting the cluster run with noup for a while, until all down disks are idle, and then letting them come in help at all? I don't know your specific issue and haven't touched BlueStore yet, but that is generally sound advice when OSDs won't start. Also, is there any pattern to the OSDs that are
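
For completeness, the noup flag is set and cleared with:

    ceph osd set noup      # booting OSDs are not marked up, so no peering is triggered
    ceph osd unset noup    # once the down OSDs have gone idle, let them come in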

Re: [ceph-users] I/O stalls when doing fstrim on large RBD

2017-11-18 Thread Jason Dillaman
Can you capture a blktrace while performing fstrim to record the discard operations? A 1TB trim extent would cause a huge impact since it would translate to approximately 262K IO requests to the OSDs (assuming 4MB backing files). On Fri, Nov 17, 2017 at 6:19 PM, Brendan Moloney
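
The 262K figure is simply 1 TiB / 4 MiB = 262,144 object discards. A capture along the lines Jason suggests could look like this (device, mount point and trim size are placeholders):

    # in one terminal: record block-layer events (including discards) on the RBD device
    blktrace -d /dev/rbd0 -o fstrim-trace
    # in another terminal: trim in smaller chunks rather than the whole filesystem at once
    fstrim -v -o 0 -l 16G /mnt/rbd
    # after stopping blktrace, render the trace into readable form
    blkparse -i fstrim-trace | less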

[ceph-users] Rebuild rgw bucket index

2017-11-18 Thread Milanov, Radoslav Nikiforov
Is there a way to rebuild the contents of the .rgw.buckets.index pool, removed by accident? Thanks in advance.
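
If the bucket data pool is intact, the usual starting point is to have radosgw rebuild the index entries per bucket, though I am not certain it can fully recover from the index pool itself having been deleted (the bucket name is a placeholder):

    radosgw-admin bucket check --bucket=mybucket --check-objects --fix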

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
Hello, Any further suggestions or workarounds from anyone? The cluster is hard down now with around 2% of PGs offline. On occasion I am able to get an OSD to start for a bit, but then it seems to do some peering and again crashes with "*** Caught signal (Aborted) ** in thread 7f3471c55700
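
For anyone triaging something similar, the offline PGs and the OSDs they are waiting on can be listed with:

    ceph health detail            # lists the down/peering PGs and the OSDs blocking them
    ceph pg dump_stuck inactive   # PGs that have not gone active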

Re: [ceph-users] Bluestore performance 50% of filestore

2017-11-18 Thread Nick Fisk
Just a couple of points. There is no way you can be writing over 7000 iops to 27x7200rpm disks at a replica level of 3. As Mark has suggested, with a 1GB test file, you are only touching a tiny area on each physical disk and so you are probably getting a combination of short stroking from the
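
As a rough sanity check, assuming roughly 100 random write IOPS per 7,200 RPM spindle:

    27 disks x ~100 write IOPS each ≈ 2,700 raw IOPS
    2,700 / 3 replicas              ≈   900 client write IOPS

So a sustained 7,000+ client write IOPS can only come from caching, coalescing or short-stroking a tiny working set, not from the spindles themselves.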

Re: [ceph-users] bucket cloning/writable snapshots

2017-11-18 Thread Haomai Wang
On Sat, Nov 18, 2017 at 4:49 PM, Fred Gansevles wrote: > Hi, > > Currently our company has +/- 50 apps where every app has its own > data-area on NFS. > We need to switch to S3, using Ceph, as our new data layer with > every app using its own s3-bucket, equivalent to the NFS

[ceph-users] bucket cloning/writable snapshots

2017-11-18 Thread Fred Gansevles
Hi, Currently our company has +/- 50 apps where every app has its own data-area on NFS. We need to switch to S3, using Ceph, as our new data layer, with every app using its own s3-bucket, equivalent to the NFS data-area. The sizes of the data-areas, depending on the app, vary from 1.3 GB to 358