Re: [ceph-users] Consumer-grade SSD in Ceph

2020-01-03 Thread Reed Dier
Also, just for more diversity, Samsung has the 883 DCT and the 860 DCT models as well. Both less than 1 DWPD, but they are enterprise rated. Reed > On Jan 3, 2020, at 2:10 AM, Eneko Lacunza wrote: > > I'm sure you know also the following, but just in case: > - Intel SATA D3-S4610 (I think

Re: [ceph-users] Local Device Health PG inconsistent

2019-10-02 Thread Reed Dier
ow my cluster is happy once more. So, in case anyone else runs into this issue and doesn't think to run pg repair on the PG in question: in this case, go for it. Reed > On Sep 23, 2019, at 9:07 AM, Reed Dier wrote: > > And to come full circle, > > After this whole saga, I now h
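A minimal sketch of the repair sequence being described, with a placeholder PG ID:
> $ ceph health detail          # lists the inconsistent PG behind OSD_SCRUB_ERRORS / PG_DAMAGED
> $ ceph pg repair 30.0         # tell the primary OSD to repair that PG; 30.0 is a placeholder
> $ ceph -w                     # watch for the repair to complete cleanly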

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-23 Thread Reed Dier
1 errors Nothing fancy set for the plugin: > $ ceph config dump | grep device > global basic device_failure_prediction_mode local > mgr advanced mgr/devicehealth/enable_monitoring true Reed > On Sep 18, 2019, at 11:33 AM, Reed Dier wrote: > > And to provide

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-18 Thread Reed Dier
11d6862d55be) > nautilus (stable)": 1 > }, > "overall": { > "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) > nautilus (stable)": 206, > "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > n

Re: [ceph-users] Local Device Health PG inconsistent

2019-09-18 Thread Reed Dier
is down > osd.128 is down > osd.183 is down > osd.190 is down But 190 and 5 were never acting members for that PG, so I have no clue why they are implicated. I re-enabled the module, and that cleared the health error about devicehealth, which doesn't mat

[ceph-users] Local Device Health PG inconsistent

2019-09-12 Thread Reed Dier
Trying to narrow down a strange issue with the single PG for the device_health_metrics pool, which was created when I enabled the 'diskprediction_local' module in ceph-mgr. But I never see any inconsistent objects in the PG. > $ ceph health detail > OSD_SCRUB_ERRORS 1 scrub errors > PG_DAMAGED

Re: [ceph-users] iostat and dashboard freezing

2019-09-12 Thread Reed Dier
also not running, due to some OSDs being marked as nearfull, again, because of poor distribution. Since running with balancer turned off, I have had very few issues with my MGRs. Reed > On Sep 9, 2019, at 11:19 PM, Konstantin Shalygin wrote: > > On 8/29/19 9:56 PM, Reed Dier wrote: &

Re: [ceph-users] iostat and dashboard freezing

2019-08-29 Thread Reed Dier
See responses below. > On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin wrote: >> Just a follow up 24h later, and the mgr's seem to be far more stable, and >> have had no issues or weirdness after disabling the balancer module. >> >> Which isn't great, because the balancer plays an important

Re: [ceph-users] iostat and dashboard freezing

2019-08-28 Thread Reed Dier
the stability. Just wanted to follow up with another 2¢. Reed > On Aug 27, 2019, at 11:53 AM, Reed Dier wrote: > > Just to further piggyback, > > Probably the most "hard" the mgr seems to get pushed is when the balancer is > engaged. > When trying to eval a po

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
d cephfs metadata pool. > > let me know if the balancer is your problem too... > > best, > > Jake > > On 8/27/19 3:57 PM, Jake Grimmett wrote: >> Yes, the problem still occurs with the dashboard disabled... >> >> Possibly relevant, when both the dashboard

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
I'm currently seeing this with the dashboard disabled. My instability decreases, but isn't wholly cured, by disabling prometheus and rbd_support, which I use in tandem, as the only thing I'm using the prom-exporter for is the per-rbd metrics. > ceph mgr module ls > { > "enabled_modules": [

Re: [ceph-users] iostat and dashboard freezing

2019-08-27 Thread Reed Dier
Curious what dist you're running on, as I've been having similar issues with instability in the mgr as well; curious if there are any similar threads to pull at. While the iostat command is running, is the active mgr using 100% CPU in top? Reed > On Aug 27, 2019, at 6:41 AM, Jake Grimmett wrote: > >

Re: [ceph-users] ceph status: pg backfill_toofull, but all OSDs have enough space

2019-08-21 Thread Reed Dier
Just chiming in to say that I too had some issues with backfill_toofull PGs, despite no OSD's being in a backfill_full state, albeit, there were some nearfull OSDs. I was able to get through it by reweighting down the OSD that was the target reported by ceph pg dump | grep 'backfill_toofull'.
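A minimal sketch of that workaround, with placeholder OSD ID and weight:
> $ ceph pg dump | grep 'backfill_toofull'   # note the PG and its up/acting OSDs
> $ ceph osd df                              # confirm which of those OSDs is nearfull
> $ ceph osd reweight 42 0.95                # temporarily nudge data off the nearfull target OSD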

[ceph-users] compat weight reset

2019-08-02 Thread Reed Dier
Hi all, I am trying to find a simple way that might help me better distribute my data, as I wrap up my Nautilus upgrades. Currently rebuilding some OSD's with bigger block.db to prevent BlueFS spillover where it isn't difficult to do so, and I'm once again struggling with unbalanced

Re: [ceph-users] How to add 100 new OSDs...

2019-07-24 Thread Reed Dier
Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs. Set the norebalance nobackfill flags. Create all the OSDs, and verify everything looks good. Make sure my max_backfills, recovery_max_active are as expected. Make sure everything has peered.
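A minimal sketch of that sequence, assuming the OSDs are created with whatever tooling you normally use:
> $ ceph osd set norebalance
> $ ceph osd set nobackfill
> # create all of the new OSDs, then confirm they are up/in and every PG has peered
> $ ceph osd unset nobackfill
> $ ceph osd unset norebalance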

Re: [ceph-users] Need to replace OSD. How do I find physical disk

2019-07-18 Thread Reed Dier
You can use ceph-volume to get the LV ID > # ceph-volume lvm list > > == osd.24 == > > [block] > /dev/ceph-edeb727e-c6d3-4347-bfbb-b9ce7f60514b/osd-block-1da5910e-136a-48a7-8cf1-1c265b7b612a > > type block > osd id 24 >
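One possible way (an assumption, not from the original thread) to carry that LV back to a physical device before pulling it:
> $ lvs -o +devices | grep osd-block-1da5910e   # maps the osd-block LV to its backing PV, e.g. /dev/sdc
> $ smartctl -i /dev/sdc                        # confirm model/serial before physically removing the drive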

Re: [ceph-users] Ubuntu 18.04 - Mimic - Nautilus

2019-07-10 Thread Reed Dier
6.04 and 14.2.1. ? > -Ed > >> On Jul 10, 2019, at 1:46 PM, Reed Dier > <mailto:reed.d...@focusvq.com>> wrote: >> >> It does not appear that that page has been updated in a while. >> >> The official Ceph deb repos only include Mimic and Nautilus p

Re: [ceph-users] Ubuntu 18.04 - Mimic - Nautilus

2019-07-10 Thread Reed Dier
It does not appear that that page has been updated in a while. The official Ceph deb repos only include Mimic and Nautilus packages for 18.04, while the Ubuntu bionic repos include a Luminous build. Hope that helps. Reed > On Jul 10, 2019, at 1:20 PM, Edward Kalk wrote: > > When reviewing:
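For reference, a sketch of the two sources being contrasted (repo names assumed; adjust the release codename as needed):
> # download.ceph.com builds for bionic (Mimic/Nautilus only):
> deb https://download.ceph.com/debian-nautilus/ bionic main
> # whereas Ubuntu's own bionic/bionic-updates repos ship a Luminous 12.2.x ceph build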

[ceph-users] Faux-Jewel Client Features

2019-07-02 Thread Reed Dier
Hi all, Starting to make preparations for Nautilus upgrades from Mimic, and I'm looking over my client/session features and trying to fully grasp the situation. > $ ceph versions > { > "mon": { > "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic > (stable)": 3
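A quick sketch of the commands useful for this kind of audit:
> $ ceph versions   # daemon versions, as shown above
> $ ceph features   # connected mon/osd/client sessions grouped by feature release (e.g. jewel vs luminous)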

Re: [ceph-users] obj_size_info_mismatch error handling

2019-06-06 Thread Reed Dier
> PG Repair doesn't fix the inconsistency, nor does Brad's omap > workaround earlier in the thread. > In our case, we can fix by cp'ing the file to a new inode, deleting > the inconsistent file, then scrubbing the PG. > > -- Dan > > > On Fri, May 3, 2019 at

Re: [ceph-users] performance in a small cluster

2019-05-31 Thread Reed Dier
Is there any other evidence of this? I have 20 5100 MAX (MTFDDAK1T9TCC) and have not experienced any real issues with them. I would pick my Samsung SM863a's or any of my Intels over the Microns, but I haven't seen the Microns cause any issues for me. For what it's worth, they are all FW

Re: [ceph-users] obj_size_info_mismatch error handling

2019-05-03 Thread Reed Dier
rrect* you could try just doing a rados > get followed by a rados put of the object to see if the size is > updated correctly. > > It's more likely the object info size is wrong IMHO. > >> >> On Tue, Apr 30, 2019 at 1:06 AM Reed Dier wrote: >>> >>> H

[ceph-users] obj_size_info_mismatch error handling

2019-04-29 Thread Reed Dier
Hi list, Woke up this morning to two PG's reporting scrub errors, in a way that I haven't seen before. > $ ceph versions > { > "mon": { > "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic > (stable)": 3 > }, > "mgr": { > "ceph version 13.2.5

Re: [ceph-users] SSD Recovery Settings

2019-03-20 Thread Reed Dier
Grafana is the web frontend for creating the graphs. InfluxDB holds the time series data that Grafana pulls from. To collect data, I am using collectd daemons

Re: [ceph-users] SSD Recovery Settings

2019-03-20 Thread Reed Dier
Not sure what your OSD config looks like, When I was moving from Filestore to Bluestore on my SSD OSD's (and NVMe FS journal to NVMe Bluestore block.db), I had an issue where the OSD was incorrectly being reported as rotational in some part of the chain. Once I overcame that, I had a huge boost
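A hedged sketch of how to spot that mis-detection (OSD ID and device name are placeholders):
> $ ceph osd metadata 24 | grep rotational      # the *_rotational fields should be "0" on SSD-backed OSDs
> $ cat /sys/block/sda/queue/rotational         # 0 = non-rotational at the kernel level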

Re: [ceph-users] collectd problems with pools

2019-02-28 Thread Reed Dier
I've been collecting with collectd since Jewel, and experienced the growing pains when moving to Luminous and collectd-ceph needing to be reworked to support Luminous. It is also worth mentioning that in Luminous+ there is an Influx plugin for ceph-mgr that has some per pool statistics. Reed

Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-14 Thread Reed Dier
This is because Luminous is not being built for Bionic for whatever reason. There are some other mailing list entries detailing this. Right now you have ceph installed from the Ubuntu bionic-updates repo, which has 12.2.8, but does not get regular release updates. This is what I ended up having

Re: [ceph-users] Mimic 13.2.3?

2019-01-10 Thread Reed Dier
> Could I suggest building Luminous for Bionic +1 for Luminous on Bionic. Ran into issues with bionic upgrades, and had to eventually revert from the ceph repos to the Ubuntu repos where they have 12.2.8, which isn’t ideal. Reed > On Jan 9, 2019, at 10:27 AM, Matthew Vernon wrote: > > Hi, >

Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Reed Dier
Piggy backing for a +1 on this. Really would love if bad packages would be recalled, and also if packages would follow release announcement, rather than precede it. For anyone wondering, this is the likely changelog for 13.2.3 in case people want to know what is in it.

Re: [ceph-users] Should ceph build against libcurl4 for Ubuntu 18.04 and later?

2018-12-13 Thread Reed Dier
Figured I would chime in as also having this issue. Moving from 16.04 to 18.04 on some OSD nodes. I have been using the ceph apt repo > deb https://download.ceph.com/debian-luminous/ xenial main During the release-upgrade, it can’t find a candidate package, and actually removes the ceph-osd

Re: [ceph-users] Favorite SSD

2018-09-17 Thread Reed Dier
SM863a were always good to me. Micron 5100 MAX are fine, but felt less consistent than the Samsung’s. Haven’t had any issues with Intel S4600. Intel S3710’s obviously not available anymore, but those were a crowd favorite. Micron 5200 line seems to not have a high endurance SKU like the 5100 line

Re: [ceph-users] cephfs kernel client hangs

2018-08-07 Thread Reed Dier
This is the first I am hearing about this as well. Granted, I am using ceph-fuse rather than the kernel client at this point, but that isn’t etched in stone. Curious if there is more to share. Reed > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima > wrote: > > > Yan, Zheng

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Reed Dier
e, a rebalance or two is peanuts > compared to your normal I/O. If you're not, then there's more than > enough write endurance in an SSD to handle daily rebalances for years. > > On 06/08/18 17:05, Reed Dier wrote: >> This has been my modus operandi when replacing drives. >> >

Re: [ceph-users] Best way to replace OSD

2018-08-06 Thread Reed Dier
This has been my modus operandi when replacing drives. Only having ~50 OSD’s for each drive type/pool, rebalancing can be a lengthy process, and in the case of SSD’s, shuffling data adds unnecessary write wear to the disks. When migrating from filestore to bluestore, I would actually forklift

Re: [ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-03 Thread Reed Dier
ght-set reweight-compat 28 1.964446 > ceph osd crush weight-set reweight-compat 29 1.629001 > ceph osd crush weight-set reweight-compat 30 1.961968 > ceph osd crush weight-set reweight-compat 31 1.738253 > ceph osd crush weight-set reweight-compat 32 1.884098 > ceph osd crush weight-

[ceph-users] Ceph Balancer per Pool/Crush Unit

2018-08-01 Thread Reed Dier
Hi Cephers, I’m starting to play with the Ceph Balancer plugin after moving to straw2 and running into something I’m surprised I haven’t seen posted here. My cluster has two crush roots, one for HDD, one for SSD. Right now, HDD’s are a single pool to themselves, SSD’s are a single pool to

Re: [ceph-users] separate monitoring node

2018-06-22 Thread Reed Dier
> On Jun 22, 2018, at 2:14 AM, Stefan Kooman wrote: > > Just checking here: Are you using the telegraf ceph plugin on the nodes? > In that case you _are_ duplicating data. But the good news is that you > don't need to. There is a Ceph mgr telegraf plugin now (mimic) which > also works on

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
Appreciate the input. Wasn’t sure if ceph-volume was the one setting these bits of metadata or something else. Appreciate the help guys. Thanks, Reed > The fix is in core Ceph (the OSD/BlueStore code), not ceph-volume. :) > journal_rotational is still a thing in BlueStore; it represents the

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-06-04 Thread Reed Dier
A HDD's which each have a 100GB SSD based > block.db > > Looking at ceph osd metadata for each of those: > > "bluefs_db_model": "SAMSUNG MZ7KM960", > "bluefs_db_rotational": "0", > "bluefs_db_type":

Re: [ceph-users] Luminous cluster - how to find out which clients are still jewel?

2018-05-29 Thread Reed Dier
Possibly helpful, If you are able to hit your ceph-mgr dashboard in a web browser, I find it possible to see a table of currently connected cephfs clients, hostnames, state, type (userspace/kernel), and ceph version. Assuming that the link is persistent, for me the url is

Re: [ceph-users] Show and Tell: Grafana cluster dashboard

2018-05-07 Thread Reed Dier
I’ll +1 on InfluxDB rather than Prometheus, though I think having a version for each infrastructure path would be best. I’m sure plenty here have existing InfluxDB infrastructure as their TSDB of choice, and moving to Prometheus would be less advantageous. Conversely, I’m sure all of the

[ceph-users] ceph-mgr balancer getting started

2018-04-12 Thread Reed Dier
Hi ceph-users, I am trying to figure out how to go about making ceph balancer do its magic, as I have some pretty unbalanced distribution across osd’s currently, both SSD and HDD. Cluster is 12.2.4 on Ubuntu 16.04. All OSD’s have been migrated to bluestore. Specifically, my HDD’s are the main
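For later readers, a minimal getting-started sketch, assuming crush-compat mode is acceptable for your clients (the plan name is a placeholder):
> $ ceph mgr module enable balancer
> $ ceph balancer mode crush-compat
> $ ceph balancer eval                # current distribution score; lower is better
> $ ceph balancer optimize myplan
> $ ceph balancer eval myplan         # score if the plan were executed
> $ ceph balancer execute myplan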

Re: [ceph-users] Disk write cache - safe?

2018-03-14 Thread Reed Dier
Tim, I can corroborate David’s sentiments as it pertains to being a disaster. In the early days of my Ceph cluster, I had 8TB SAS drives behind an LSI RAID controller as RAID0 volumes (no IT mode), with on-drive write-caching enabled (pdcache=default). I subsequently had the data center

Re: [ceph-users] ceph-mds suicide on upgrade

2018-03-12 Thread Reed Dier
> > See: > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/025092.html > <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/025092.html> > > Might be of interest. > > Dietmar > > Am 12. März 2018 18:19:51 MEZ schrieb Reed Dier <reed.d.

[ceph-users] ceph-mds suicide on upgrade

2018-03-12 Thread Reed Dier
Figured I would see if anyone has seen this or can see something I am doing wrong. Upgrading all of my daemons from 12.2.2 to 12.2.4. Followed the documentation, upgraded mons, mgrs, osds, then mds’s in that order. All was fine, until the MDSs. I have two MDS’s in Active:Standby config. I

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
that I was able to use to figure out the issue. Adding to ceph.conf for future OSD conversions. Thanks, Reed > On Feb 26, 2018, at 4:12 PM, Reed Dier <reed.d...@focusvq.com> wrote: > > For the record, I am not seeing a demonstrative fix by injecting the value of > 0 into the

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
;hb_front_addr": "", > "hostname": “host00", > "journal_rotational": "1", > "kernel_description": "#29~16.04.2-Ubuntu SMP Tue Jan 9 22:00:44 UTC > 2018", > "kernel_version&

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
bluestore OSD’s {0,24} with nvme block.db. Hope this may be helpful in determining the root cause. If it helps, all of the OSD’s were originally deployed with ceph-deploy, but are now being redone with ceph-volume locally on each host. Thanks, Reed > On Feb 26, 2018, at 1:00 PM, Grego

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
vice class is configured correctly as far as I know, it all shows as ssd/hdd correctly in ceph osd tree. So hopefully this may be enough of a smoking gun to help narrow down where this may be stemming from. Thanks, Reed > On Feb 23, 2018, at 10:04 AM, David Turner <drakonst...@gmail.c

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-23 Thread Reed Dier
/s wr Don’t mean to clutter the ML/thread, however it did seem odd, maybe it’s a culprit? Maybe it’s some weird sampling interval issue that’s been solved in 12.2.3? Thanks, Reed > On Feb 23, 2018, at 8:26 AM, Reed Dier <reed.d...@focusvq.com> wrote: > > Below is ceph -s > &g

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-23 Thread Reed Dier
with the > problem you're seeing. (eg, it could be that reading out the omaps is > expensive, so you can get higher recovery op numbers by turning down the > number of entries per request, but not actually see faster backfilling > because you have to issue more requests.) > -Greg >

[ceph-users] SSD Bluestore Backfills Slow

2018-02-21 Thread Reed Dier
Hi all, I am running into an odd situation that I cannot easily explain. I am currently in the midst of destroy and rebuild of OSDs from filestore to bluestore. With my HDDs, I am seeing expected behavior, but with my SSDs I am seeing unexpected behavior. The HDDs and SSDs are set in crush

Re: [ceph-users] Is there a "set pool readonly" command?

2018-02-12 Thread Reed Dier
I do know that there is a pause flag in Ceph. What I do not know is if that also pauses recovery traffic, in addition to client traffic. Also worth mentioning, this is a cluster-wide flag, not a pool level flag. Reed > On Feb 11, 2018, at 11:45 AM, David Turner wrote:

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread Reed Dier
0 0 0 00 0 osd.6 > 7 00 0 0 0 00 0 osd.7 > > I guess I can just remove them from crush,auth and rm them? > > Kind Regards, > > David Majchrzak > >> 26 jan. 2018 kl. 18:09 skrev Reed Dier <reed.d...@fo

Re: [ceph-users] Migrating filestore to bluestore using ceph-volume

2018-01-26 Thread Reed Dier
This is the exact issue that I ran into when starting my bluestore conversion journey. See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html Specifying --osd-id causes it to fail. Below are my steps for OSD
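A rough sketch of the per-OSD conversion sequence being referenced (device paths and OSD ID are placeholders; note that on the 12.2.x release discussed here, the --osd-id flag was the step that failed, so treat it as version dependent):
> $ ceph osd out 6
> $ systemctl stop ceph-osd@6
> $ ceph osd destroy 6 --yes-i-really-mean-it
> $ ceph-volume lvm zap /dev/sdf
> $ ceph-volume lvm create --bluestore --data /dev/sdf --block.db /dev/nvme0n1p3 --osd-id 6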

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
gration. Good luck > > On Jan 11, 2018 12:22 PM, "Reed Dier" <reed.d...@focusvq.com > <mailto:reed.d...@focusvq.com>> wrote: > I am in the process of migrating my OSDs to bluestore finally and thought I > would give you some input on how I am approaching it. > Som

Re: [ceph-users] filestore to bluestore: osdmap epoch problem and is the documentation correct?

2018-01-11 Thread Reed Dier
I am in the process of migrating my OSDs to bluestore finally and thought I would give you some input on how I am approaching it. Some of saga you can find in another ML thread here: https://www.spinics.net/lists/ceph-users/msg41802.html

Re: [ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-11 Thread Reed Dier
ps that create data points and apply it to every point > created in loops through stats. Of course we'll feed that back > upstream when we get to it and assuming it is still an issue in the > current code. > > thanks, > Ben > > On Thu, Jan 11, 2018 at 2:04 AM, Reed Di

[ceph-users] Ceph MGR Influx plugin 12.2.2

2018-01-10 Thread Reed Dier
Hi all, Does anyone have any idea if the influx plugin for ceph-mgr is stable in 12.2.2? Would love to ditch collectd and report directly from ceph if that is the case. Documentation says that it is added in Mimic/13.x, however it looks like from an earlier ML post that it would be coming to
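If it does turn out to be usable, a hedged sketch of enabling it (key names follow the module documentation of that era and should be verified against your release; the InfluxDB host and credentials are placeholders):
> $ ceph mgr module enable influx
> $ ceph config-key set mgr/influx/hostname influxdb.example.com
> $ ceph config-key set mgr/influx/database ceph
> $ ceph config-key set mgr/influx/username ceph
> $ ceph config-key set mgr/influx/password secret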

Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
ore is the default and filestore requires intervention. Thanks, Reed > On Jan 9, 2018, at 2:10 PM, Reed Dier <reed.d...@focusvq.com> wrote: > >> -2 21.81000 host node24 >> 0 hdd 7.26999 osd.0 destroyed

Re: [ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
ng removed from the crush map. Thanks, Reed > On Jan 9, 2018, at 2:05 PM, Alfredo Deza <ad...@redhat.com> wrote: > > On Tue, Jan 9, 2018 at 2:19 PM, Reed Dier <reed.d...@focusvq.com > <mailto:reed.d...@focusvq.com>> wrote: >> Hi ceph-users, >> >

[ceph-users] OSD Bluestore Migration Issues

2018-01-09 Thread Reed Dier
Hi ceph-users, Hoping that this is something small that I am overlooking, but could use the group mind to help. Ceph 12.2.2, Ubuntu 16.04 environment. OSD (0) is an 8TB spinner (/dev/sda) and I am moving from a filestore journal to a block.db and WAL device on an NVMe partition

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2018-01-09 Thread Reed Dier
I would just like to mirror what Dan van der Ster’s sentiments are. As someone attempting to move an OSD to bluestore, with limited/no LVM experience, it is a completely different beast and complexity level compared to the ceph-disk/filestore days. ceph-deploy was a very simple tool that did

Re: [ceph-users] CephFS log jam prevention

2017-12-07 Thread Reed Dier
hanks, Reed > On Dec 5, 2017, at 4:02 PM, Patrick Donnelly <pdonn...@redhat.com> wrote: > > On Tue, Dec 5, 2017 at 8:07 AM, Reed Dier <reed.d...@focusvq.com> wrote: >> Been trying to do a fairly large rsync onto a 3x replicated, filestore HDD >> backed CephFS pool. &

[ceph-users] CephFS log jam prevention

2017-12-05 Thread Reed Dier
Been trying to do a fairly large rsync onto a 3x replicated, filestore HDD backed CephFS pool. Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and clients. > $ ceph versions > { > "mon": { >

Re: [ceph-users] CephFS metadata pool to SSDs

2017-10-13 Thread Reed Dier
23 P/E cycles so far. Thanks again, Reed > On Oct 12, 2017, at 4:18 PM, John Spray <jsp...@redhat.com> wrote: > > On Thu, Oct 12, 2017 at 9:34 PM, Reed Dier <reed.d...@focusvq.com> wrote: >> I found an older ML entry from 2015 and not much else, mostly detailing

[ceph-users] CephFS metadata pool to SSDs

2017-10-12 Thread Reed Dier
I found an older ML entry from 2015 and not much else, mostly detailing the doing performance testing to dispel poor performance numbers presented by OP. Currently have the metadata pool on my slow 24 HDDs, and am curious if I should see any increased performance with CephFS by moving the
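For anyone trying the move, a hedged sketch assuming Luminous device classes (on earlier releases the same effect needs a separate SSD crush root and rule; rule and pool names are placeholders):
> $ ceph osd crush rule create-replicated ssd-rule default host ssd
> $ ceph osd pool set cephfs_metadata crush_rule ssd-rule   # PGs then remap onto the SSD OSDs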

Re: [ceph-users] min_size & hybrid OSD latency

2017-10-11 Thread Reed Dier
Just for the sake of putting this in the public forum, In theory, by placing the primary copy of the object on an SSD medium, and placing replica copies on HDD medium, it should still yield some improvement in writes, compared to an all HDD scenario. My logic here is rooted in the idea that

Re: [ceph-users] Ceph monitoring

2017-10-02 Thread Reed Dier
As someone currently running collectd/influxdb/grafana stack for monitoring, I am curious if anyone has seen issues moving Jewel -> Luminous. I thought I remembered reading that collectd wasn’t working perfectly in Luminous, likely not helped with the MGR daemon. Also thought about trying

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
me or worse (takes very long... 100% blocked for about 5min for 16GB > trimmed), and works just fine with firmware M017 (4s for 32GB trimmed). So > maybe you just need an update. > > Peter > > > > On 07/06/17 18:39, Reed Dier wrote: >> Hi Wido, >>

Re: [ceph-users] Watch for fstrim running on your Ubuntu systems

2017-07-06 Thread Reed Dier
Hi Wido, I came across this ancient ML entry with no responses and wanted to follow up with you to see if you recalled any solution to this. Copying the ceph-users list to preserve any replies that may result for archival. I have a couple of boxes with 10x Micron 5100 SATA SSD’s, journaled on
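For context, a hedged sketch of where Ubuntu schedules trim and one way to pause it while investigating (the cron path applies to older releases, the timer to newer ones):
> $ cat /etc/cron.weekly/fstrim              # older Ubuntu releases run fstrim weekly from cron
> $ systemctl list-timers fstrim.timer       # newer releases use a systemd timer instead
> $ sudo systemctl disable --now fstrim.timer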

Re: [ceph-users] Ideas on the UI/UX improvement of ceph-mgr: Cluster Status Dashboard

2017-06-29 Thread Reed Dier
I’d like to see per pool iops/usage, et al. Being able to see rados vs rbd vs whatever else performance, or pools with different backing mediums and see which workloads result in what performance. Most of this I pretty well cobble together with collectd, but it would still be nice to have out

Re: [ceph-users] Changing SSD Landscape

2017-06-08 Thread Reed Dier
anything > interesting/comparable in the Samsung range... > > > On Wed, May 17, 2017 at 5:03 PM, Reed Dier <reed.d...@focusvq.com> wrote: >> Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably >> more expensive than the P3700 for a roughly equivalent amou

Re: [ceph-users] OSD scrub during recovery

2017-05-30 Thread Reed Dier
. Either way, make sense, and thanks for the insight. And don’t worry Wido, they aren’t SMR drives! Thanks, Reed > On May 30, 2017, at 11:03 AM, Wido den Hollander <w...@42on.com> wrote: > >> >> Op 30 mei 2017 om 17:37 schreef Reed Dier <reed.d...@focusvq.com>: >

[ceph-users] OSD scrub during recovery

2017-05-30 Thread Reed Dier
Lost an OSD and having to rebuild it. 8TB drive, so it has to backfill a ton of data. Been taking a while, so looked at ceph -s and noticed that deep/scrubs were running even though I’m running newest Jewel (10.2.7) and OSD’s have the osd_scrub_during_recovery set to false. > $ cat

Re: [ceph-users] Changing SSD Landscape

2017-05-18 Thread Reed Dier
> BTW, you asked about Samsung parts earlier. We are running these > SM863's in a block storage cluster: > > Model Family: Samsung based SSDs > Device Model: SAMSUNG MZ7KM240HAGR-0E005 > Firmware Version: GXM1003Q > > > 177 Wear_Leveling_Count 0x0013 094 094 005Pre-fail >

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Reed Dier
Agreed, the issue I have seen is that the P4800X (Optane) is demonstrably more expensive than the P3700 for a roughly equivalent amount of storage space (400G v 375G). However, the P4800X is perfectly suited to a Ceph environment, with 30 DWPD, or 12.3 PBW. And on top of that, it seems to

Re: [ceph-users] Sharing SSD journals and SSD drive choice

2017-04-26 Thread Reed Dier
Hi Adam, How did you settle on the P3608 vs say the P3600 or P3700 for journals? And also the 1.6T size? Seems overkill, unless its pulling double duty beyond OSD journals. Only improvement over the P3x00 is the move from x4 lanes to x8 lanes on the PCIe bus, but the P3600/P3700 offer much

Re: [ceph-users] Adding New OSD Problem

2017-04-25 Thread Reed Dier
Others will likely be able to provide some better responses, but I’ll take a shot to see if anything makes sense. With 10.2.6 you should be able to set 'osd scrub during recovery’ to false to prevent any new scrubs from occurring during a recovery event. Current scrubs will complete, but
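A minimal sketch of both the runtime and persistent forms of that setting:
> $ ceph tell osd.* injectargs '--osd_scrub_during_recovery false'   # applies immediately, lost on restart
> # and in ceph.conf to persist it:
> [osd]
> osd scrub during recovery = false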

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
In this case the spinners have their journals on an NVMe drive, 3 OSD : 1 NVMe Journal. Will be trying tomorrow to get some benchmarks and compare some hdd/ssd/hybrid workloads to see performance differences across the three backing layers. Most client traffic is read oriented to begin with,

Re: [ceph-users] SSD Primary Affinity

2017-04-19 Thread Reed Dier
Hi Maxime, This is a very interesting concept. Instead of the primary affinity being used to choose SSD for primary copy, you set crush rule to first choose an osd in the ‘ssd-root’, then the ‘hdd-root’ for the second set. And with 'step chooseleaf first {num}’ > If {num} > 0 && <

[ceph-users] SSD Primary Affinity

2017-04-17 Thread Reed Dier
Hi all, I am looking at a way to scale performance and usable space using something like Primary Affinity to effectively use 3x replication across 1 primary SSD OSD, and 2 replicated HDD OSD’s. Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio, but looking to
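A hedged sketch of the primary-affinity half of that idea (the OSD ID is a placeholder):
> # ceph.conf on the mons (needed on pre-Luminous releases to honor primary affinity):
> [mon]
> mon osd allow primary affinity = true
> # then demote the HDD OSDs so an SSD replica is preferred as primary:
> $ ceph osd primary-affinity osd.12 0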

[ceph-users] Strange crush / ceph-deploy issue

2017-03-31 Thread Reed Dier
Trying to add a batch of OSD’s to my cluster, (Jewel 10.2.6, Ubuntu 16.04) 2 new nodes (ceph01,ceph02), 10 OSD’s per node. I am trying to steer the OSD’s into a different root pool with crush location set in ceph.conf with > [osd.34] > crush_location = "host=ceph01 rack=ssd.rack2 root=ssd" >

Re: [ceph-users] Ceph PG repair

2017-03-08 Thread Reed Dier
IR_A/{object.name} > 55a76349b758d68945e5028784c59f24 > /var/lib/ceph/osd/ceph-22/current/10.2d8_head/DIR_8/DIR_D/DIR_2/DIR_4/DIR_4/DIR_A/{object.name} So is the object actually inconsistent? Is rados somehow behind on something, not showing the third inconsistent PG? Appreciate any help. Reed

[ceph-users] Ceph PG repair

2017-03-02 Thread Reed Dier
Over the weekend, two inconsistent PG’s popped up in my cluster. This being after having scrubs disabled for close to 6 weeks after a very long rebalance after adding 33% more OSD’s, an OSD failing, increasing PG’s, etc. It appears we came out the other end with 2 inconsistent PG’s and I’m

[ceph-users] Backfill/recovery prioritization

2017-02-01 Thread Reed Dier
Have a smallish cluster that has been expanding with almost a 50% increase in the number of OSD (16->24). This has caused some issues with data integrity and cluster performance as we have increased PG count, and added OSDs. 8x nodes with 3x drives, connected over 2x10G. My problem is that I

Re: [ceph-users] OSD create with SSD journal

2017-01-11 Thread Reed Dier
> > On 1/11/17, 10:31 AM, "ceph-users on behalf of Reed Dier" > <ceph-users-boun...@lists.ceph.com on behalf of reed.d...@focusvq.com> > wrote: > >>> 2017-01-03 12:10:23.514577 7f1d821f2800 0 ceph version 10.2.5 >>> (c461ee19ecbc0c5c330aca20f7392c

[ceph-users] OSD create with SSD journal

2017-01-11 Thread Reed Dier
So I was attempting to add an OSD to my ceph-cluster (running Jewel 10.2.5), using ceph-deploy (1.5.35), on Ubuntu. I have 2 OSD’s on this node, attempting to add third. The first two OSD’s I created with on-disk journals, then later moved them to partitions on the NVMe system disk (Intel

Re: [ceph-users] High load on OSD processes

2016-12-09 Thread Reed Dier
is exactly 15 minutes when the load starts to climb. > > So, just like Diego, do you know if there is a fix for this yet and when it > might be available on the repo? Should I try to install the prior minor > release version for now? > > Thank you for the information.

Re: [ceph-users] High load on OSD processes

2016-12-09 Thread Reed Dier
Assuming you deployed within the last 48 hours, I’m going to bet you are using v10.2.4 which has an issue that causes high cpu utilization. Should see a large ramp up in load average after exactly 15 minutes. See mailing list thread here:

Re: [ceph-users] Migrate OSD Journal to SSD

2016-12-02 Thread Reed Dier
> On Dec 1, 2016, at 6:26 PM, Christian Balzer <ch...@gol.com> wrote: > > On Thu, 1 Dec 2016 18:06:38 -0600 Reed Dier wrote: > >> Apologies if this has been asked dozens of times before, but most answers >> are from pre-Jewel days, and want to double check that t

[ceph-users] Migrate OSD Journal to SSD

2016-12-01 Thread Reed Dier
Apologies if this has been asked dozens of times before, but most answers are from pre-Jewel days, and want to double check that the methodology still holds. Currently have 16 OSD’s across 8 machines with on-disk journals, created using ceph-deploy. These machines have NVMe storage (Intel
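A hedged sketch of the filestore journal move being double-checked here (OSD ID and partition UUID are placeholders):
> $ ceph osd set noout
> $ systemctl stop ceph-osd@3
> $ ceph-osd -i 3 --flush-journal
> $ ln -sf /dev/disk/by-partuuid/<uuid> /var/lib/ceph/osd/ceph-3/journal   # point at the new NVMe partition
> $ ceph-osd -i 3 --mkjournal
> $ systemctl start ceph-osd@3
> $ ceph osd unset noout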

[ceph-users] CephFS in existing pool namespace

2016-10-27 Thread Reed Dier
Looking to add CephFS into our Ceph cluster (10.2.3), and trying to plan for that addition. Currently only using RADOS on a single replicated, non-EC, pool, no RBD or RGW, and segmenting logically in namespaces. No auth scoping at this time, but likely something we will be moving to in the

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-21 Thread Reed Dier
> On Oct 19, 2016, at 7:54 PM, Christian Balzer wrote: > > > Hello, > > On Wed, 19 Oct 2016 12:28:28 + Jim Kilborn wrote: > >> I have setup a new linux cluster to allow migration from our old SAN based >> cluster to a new cluster with ceph. >> All systems running centos

Re: [ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
ull 8TB disk. > Filesystem  1K-blocks   Used        Available   Use%  Mounted on > /dev/sda1   7806165996  1953556296  5852609700  26%   /var/lib/ceph/osd/ceph-0 Reed > On Oct 7, 2016, at 7:33 PM, Reed Dier <reed.d...@focusvq.com> wrote: > > Attempting to adju

[ceph-users] OSD won't come back "UP"

2016-10-07 Thread Reed Dier
item name 'osd.0' initial_weight 7.2701 at location > {host=node24,root=default} > 2016-10-07 19:41:59.714517 7fd39b4ee700 0 log_channel(cluster) log [INF] : > osd.0 out (down for 338.148761) Everything running latest Jewel release > ceph --version > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b) Any help with this is extremely appreciated. Hoping someone has dealt with this before. Reed Dier

[ceph-users] Recovery/Backfill Speedup

2016-10-04 Thread Reed Dier
Attempting to expand our small ceph cluster currently. Have 8 nodes, 3 mons, and went from a single 8TB disk per node to 2x 8TB disks per node, and the rebalancing process is excruciatingly slow. Originally at 576 PGs before expansion, and wanted to allow rebalance to finish before expanding
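The knobs usually reached for here, as a hedged sketch (values are illustrative; raise gradually and watch client latency):
> $ ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'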

Re: [ceph-users] Replacing a failed OSD

2016-09-14 Thread Reed Dier
Hi Jim, This is pretty fresh in my mind so hopefully I can help you out here. Firstly, the crush map will backfill any existing holes in the OSD numbering. So assuming only one drive has been removed from the crush map, it will repopulate the same OSD number. My steps for removing an
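A minimal sketch of the removal commands of that era, so the vacated ID gets reused by the replacement (OSD 7 is a placeholder):
> $ ceph osd crush remove osd.7
> $ ceph auth del osd.7
> $ ceph osd rm 7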

[ceph-users] OSD daemon randomly stops

2016-09-02 Thread Reed Dier
OSD has randomly stopped for some reason. Lots of recovery processes currently running on the ceph cluster. OSD log with assert below: > -14> 2016-09-02 11:32:38.672460 7fcf65514700 5 -- op tracker -- seq: 1147, > time: 2016-09-02 11:32:38.672460, event: queued_for_pg, op: >

Re: [ceph-users] Slow Request on OSD

2016-09-02 Thread Reed Dier
t part of this configuration is the hit to write I/O due to less than optimal write scheduling compared to cached writes. Hope to enable write-back at the controller level after BBU installation. Thanks, Reed > On Sep 1, 2016, at 6:21 AM, Cloud List <cloud-l...@sg.or.id> wrote: > >
