Re: [ceph-users] OSDs crash after deleting unfound object in Luminous 12.2.8

2018-10-18 Thread Mike Lovell
nt > while scrubbing due to the missing object, but I don't think so. > > Anyway, I just wanted to thank you for your help! > > Best wishes, > > Lawrence > > On 10/13/2018 02:00 AM, Mike Lovell wrote: > > what was the object name that you marked lost? was it one of th

Re: [ceph-users] OSDs crash after deleting unfound object in Luminous 12.2.8

2018-10-12 Thread Mike Lovell
what was the object name that you marked lost? was it one of the cache tier hit_sets? the trace you posted does seem to show the OSD failing while it is trying to remove a hit set that is no longer needed. i ran into a similar problem, which might have been why that bug you listed was created. maybe
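
for reference, a rough sketch of the commands involved in that kind of cleanup. the pg id here (17.3) is just a placeholder, not something from the thread:

    # list the objects a pg still considers unfound
    ceph pg 17.3 list_unfound
    # marking them lost is done per pg; "delete" discards the object entirely,
    # "revert" rolls back to a prior version if one exists
    ceph pg 17.3 mark_unfound_lost delete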

Re: [ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-29 Thread Mike Lovell
On Thu, Mar 29, 2018 at 1:17 AM, Jakub Jaszewski wrote: > Many thanks Mike, that explains the stopped IOs. I've just finished adding > new disks to the cluster and am now trying to evenly reweight OSDs by PG. > > May I ask you two more questions? > 1. As I was in a hurry I did not
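
for anyone following along, the reweighting mentioned there is usually done with the built-in reweight-by-pg commands; a rough sketch, with the overload threshold (110) purely illustrative:

    # dry run: shows which osd weights would change and by how much
    ceph osd test-reweight-by-pg 110
    # apply it once the dry run looks reasonable
    ceph osd reweight-by-pg 110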

Re: [ceph-users] PG mapped to OSDs on same host although 'chooseleaf type host'

2018-02-22 Thread Mike Lovell
was the pg-upmap feature used to force a pg to get mapped to a particular osd? mike On Thu, Feb 22, 2018 at 10:28 AM, Wido den Hollander wrote: > Hi, > > I have a situation with a cluster which was recently upgraded to Luminous > and has a PG mapped to OSDs on the same host. > >
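
checking whether an explicit upmap entry is forcing the mapping is quick; a rough sketch, with the pg id (1.2f) purely a placeholder:

    # any pg_upmap / pg_upmap_items entries show up in the osdmap dump
    ceph osd dump | grep upmap
    # drop the explicit mapping for a single pg if one is found
    ceph osd rm-pg-upmap-items 1.2f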

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread Mike Lovell
On Thu, Feb 22, 2018 at 3:58 PM, Hans Chris Jones <chris.jo...@lambdastack.io> wrote: > Interesting. This does not inspire confidence. What SSDs (2TB or 4TB) do > people have good success with in high use production systems with bluestore? > > Thanks > > On Thu, Feb 22, 2018

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread Mike Lovell
the lot and am done with Intel SSDs, will advise as many > customers and peers to do the same… > > > > > > Regards > > David Herselman > > > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *Mike Lovell > *Sent:*

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2018-02-22 Thread Mike Lovell
has anyone tried with the most recent firmwares from intel? i've had a number of s4600 960gb drives that have been waiting for me to get around to adding them to a ceph cluster. this as well as having 2 die almost simultaneously in a different storage box is giving me pause. i noticed that David
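
for anyone checking what firmware their drives are currently on before deciding, a rough sketch (the device path is just an example):

    # prints the model, serial, and firmware version of the drive
    smartctl -i /dev/sdb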

Re: [ceph-users] Removing cache tier for RBD pool

2018-01-19 Thread Mike Lovell
On Tue, Jan 16, 2018 at 9:25 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote: > Hello Mike, > > Quoting Mike Lovell <mike.lov...@endurance.com>: > >> On Mon, Jan 8, 2018 at 6:08 AM, Jens-U. Mozdzen <jmozd...@nde.ag> wrote: >> >>> Hi *, >>>

Re: [ceph-users] Removing cache tier for RBD pool

2018-01-15 Thread Mike Lovell
On Mon, Jan 8, 2018 at 6:08 AM, Jens-U. Mozdzen wrote: > Hi *, > > trying to remove a caching tier from a pool used for RBD / Openstack, we > followed the procedure from http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache and ran > into
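
for context, the documented procedure linked above boils down to roughly the following; a rough sketch assuming a cache pool named cachepool layered on rbdpool (both names are placeholders), and the exact flags may vary a bit by release:

    # stop the cache tier from absorbing new writes
    ceph osd tier cache-mode cachepool forward --yes-i-really-mean-it
    # flush and evict everything still held in the cache pool
    rados -p cachepool cache-flush-evict-all
    # detach the overlay and remove the tier relationship
    ceph osd tier remove-overlay rbdpool
    ceph osd tier remove rbdpool cachepool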

Re: [ceph-users] cephfs cache tiering - hitset

2017-03-20 Thread Mike Lovell
On Mon, Mar 20, 2017 at 4:20 PM, Nick Fisk <n...@fisk.me.uk> wrote: > Just a few corrections, hope you don't mind > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > Mike Lovell > > Sent: 20 Marc

Re: [ceph-users] cephfs cache tiering - hitset

2017-03-20 Thread Mike Lovell
i'm not an expert but here is my understanding of it. a hit_set keeps track of whether or not an object was accessed during the timespan of the hit_set. for example, if you have a hit_set_period of 600, then the hit_set covers a period of 10 minutes. the hit_set_count defines how many of the
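
to make that concrete, these are the pool settings involved; a rough sketch, with cachepool and the values purely illustrative:

    # how long each hit_set covers, in seconds (600 = 10 minutes)
    ceph osd pool set cachepool hit_set_period 600
    # how many of the most recent hit_sets to keep around
    ceph osd pool set cachepool hit_set_count 12
    # hit_sets are stored as bloom filters
    ceph osd pool set cachepool hit_set_type bloom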

[ceph-users] hammer to jewel upgrade experiences? cache tier experience?

2017-03-06 Thread Mike Lovell
has anyone on the list done an upgrade from hammer (something later than 0.94.6) to jewel with a cache tier configured? i tried doing one last week and had a hiccup with it. i'm curious if others have been able to successfully do the upgrade and, if so, did they take any extra steps related to the

[ceph-users] osds crashing during hit_set_trim and hit_set_remove_all

2017-03-03 Thread Mike Lovell
i started an upgrade process to go from 0.94.7 to 10.2.5 on a production cluster that is using cache tiering. this cluster has 3 monitors, 28 storage nodes, around 370 osds. the upgrade of the monitors completed without issue. i then upgraded 2 of the storage nodes, and after the restarts, the

Re: [ceph-users] Issue with upgrade from 0.94.9 to 10.2.5

2017-01-23 Thread Mike Lovell
i was just testing an upgrade of some monitors in a test cluster from hammer (0.94.7) to jewel (10.2.5). after upgrading each of the first two monitors, i stopped and restarted a single osd to cause changes in the maps. the same error messages showed up in ceph -w. i haven't dug into it much but

Re: [ceph-users] Crashing OSDs (suicide timeout, following a single pool)

2016-06-01 Thread Mike Lovell
On Wed, Jun 1, 2016 at 9:13 AM, Adam Tygart wrote: > Hello all, > > I'm running into an issue with ceph osds crashing over the last 4 > days. I'm running Jewel (10.2.1) on CentOS 7.2.1511. > > A little setup information: > 26 hosts > 2x 400GB Intel DC P3700 SSDs > 12x6TB spinning

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
On Fri, Apr 29, 2016 at 9:34 AM, Mike Lovell <mike.lov...@endurance.com> wrote: > On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov < > asheplya...@mirantis.com> wrote: > >> Hi, >> >> > i also wonder if just taking 148 out of the cluster (probably just

Re: [ceph-users] Backfilling caused RBD corruption on Hammer?

2016-04-29 Thread Mike Lovell
are the new osds running 0.94.5 or did they get the latest .6 packages? are you also using cache tiering? we ran into a problem with individual rbd objects getting corrupted when using 0.94.6 with a cache tier and min_read_recency_for_promote was > 1. our only solution to corruption that happened
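
if you are on 0.94.6 with a cache tier, it's worth checking that setting; a rough sketch, with cachepool as a placeholder name:

    # see what the cache pool is currently set to
    ceph osd pool get cachepool min_read_recency_for_promote
    # drop it to 1 if it is higher and the corruption described above is a concern
    ceph osd pool set cachepool min_read_recency_for_promote 1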

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
On Fri, Apr 29, 2016 at 5:54 AM, Alexey Sheplyakov wrote: > Hi, > > > i also wonder if just taking 148 out of the cluster (probably just > marking it out) would help > > As far as I understand this can only harm your data. The acting set of PG > 17.73 is [41, 148], >

Re: [ceph-users] help troubleshooting some osd communication problems

2016-04-29 Thread Mike Lovell
h > sides of the connection when a message is lost. > -Sam > > On Thu, Apr 28, 2016 at 2:38 PM, Mike Lovell <mike.lov...@endurance.com> > wrote: > > there was a problem on one of the clusters i manage a couple weeks ago > where > > pairs of OSDs would w

[ceph-users] help troubleshooting some osd communication problems

2016-04-28 Thread Mike Lovell
there was a problem on one of the clusters i manage a couple weeks ago where pairs of OSDs would wait indefinitely on subops from the other OSD in the pair. we used a liberal dose of "ceph osd down ##" on the osds and eventually things just sorted themselves out a couple days later. it seems to have
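
the "liberal dose" was just repeated use of the osd down command; a rough sketch, using the osd ids from this thread (41 and 148) as examples:

    # marking an osd down forces it to tear down and re-establish its
    # peer connections; it will mark itself back up on its own
    ceph osd down 41
    ceph osd down 148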

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Mike Lovell
than 1. mike On Wed, Mar 16, 2016 at 4:41 PM, Mike Lovell <mike.lov...@endurance.com> wrote: > robert and i have done some further investigation the past couple days on > this. we have a test environment with a hard drive tier and an ssd tier as > a cache. several vms were created wi

Re: [ceph-users] data corruption with hammer

2016-03-15 Thread Mike Lovell
. mike On Mon, Mar 14, 2016 at 9:35 PM, Christian Balzer <ch...@gol.com> wrote: > > Hello, > > On Mon, 14 Mar 2016 20:51:04 -0600 Mike Lovell wrote: > > > something weird happened on one of the ceph clusters that i administer > > tonight which resulted in virtual

[ceph-users] data corruption with hammer

2016-03-14 Thread Mike Lovell
something weird happened on one of the ceph clusters that i administer tonight which resulted in virtual machines using rbd volumes seeing corruption in multiple forms. earlier in the day, when everything was fine, the cluster consisted of a number of storage nodes spread across 3 different roots in the

Re: [ceph-users] osds crashing on Thread::create

2016-03-07 Thread Mike Lovell
. it looks like they're creating over 2500 threads each. i don't know the internals of the code but that seems like a lot. oh well. hopefully this fixes it. mike On Mon, Mar 7, 2016 at 1:55 PM, Gregory Farnum <gfar...@redhat.com> wrote: > On Mon, Mar 7, 2016 at 11:04 AM, Mike Lovell
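
for anyone wanting to check the thread counts on their own osds, a rough sketch (just one way to do it):

    # nlwp is the number of threads in each ceph-osd process
    for pid in $(pidof ceph-osd); do ps -o nlwp= -p "$pid"; done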

[ceph-users] osds crashing on Thread::create

2016-03-07 Thread Mike Lovell
first off, hello all. this is my first time posting to the list. i have seen a recurring problem that started in the past week or so on one of my ceph clusters. osds will crash and it seems to happen whenever backfill or recovery is started. looking at the logs it appears that the osd is
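
one thing commonly checked when Thread::create fails during backfill, not something the post itself confirms, is whether the kernel's thread/pid limits are being hit; a rough sketch, with the value purely illustrative:

    # current system-wide limits on pids/threads
    sysctl kernel.pid_max kernel.threads-max
    # raise pid_max if the osds are bumping into the ceiling
    sysctl -w kernel.pid_max=4194303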