[ceph-users] how to map rbd using rbd-nbd on boot?

2017-07-21 Thread Daniel K
Once again my google-fu has failed me and I can't find the 'correct' way to map an rbd using rbd-nbd on boot. Everything takes me to rbdmap, which doesn't use rbd-nbd. If someone could just point me in the right direction I'd appreciate it. Thanks! Dan
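
Since rbdmap drives the kernel client rather than rbd-nbd, one workable pattern is a small systemd unit. A minimal sketch, assuming a hypothetical image rbd/myimage that lands on /dev/nbd0 (the device number is not guaranteed stable, so treat this as a starting point):

    [Unit]
    Description=Map rbd/myimage via rbd-nbd
    After=network-online.target ceph.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    # rbd-nbd prints the device it created; child process keeps serving I/O
    ExecStart=/usr/bin/rbd-nbd map rbd/myimage
    ExecStop=/usr/bin/rbd-nbd unmap /dev/nbd0

    [Install]
    WantedBy=multi-user.target

Enable it once with systemctl enable on the unit file.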

[ceph-users] ceph recovery incomplete PGs on Luminous RC

2017-07-21 Thread Daniel K
Luminous 12.1.0 (RC). I replaced two OSD drives (the old ones were still good, just too small), using: ceph osd out osd.12; ceph osd crush remove osd.12; ceph auth del osd.12; systemctl stop ceph-osd@osd.12; ceph osd rm osd.12. I later found that I also should have unmounted it from /var/lib/ceph/osd/ceph-12
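
For reference, the usual removal sequence puts the unmount before the crush/auth removal. A sketch assuming osd.12 with the default mount path (note the systemd unit usually takes the bare id, not "osd.12"):

    ceph osd out osd.12
    # wait for rebalance to finish (watch ceph -s)
    systemctl stop ceph-osd@12
    umount /var/lib/ceph/osd/ceph-12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm osd.12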

Re: [ceph-users] dealing with incomplete PGs while using bluestore

2017-07-22 Thread Daniel K
I am in the process of doing exactly what you are -- this worked for me: 1. mount the first partition of the bluestore drive that holds the missing PGs (if it's not already mounted) > mkdir /mnt/tmp > mount /dev/sdb1 /mnt/tmp 2. export the pg to a suitable temporary storage location: >
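
The export step the preview cuts off is presumably along these lines. A sketch, assuming the OSD data is mounted at /mnt/tmp and the missing PG is the hypothetical 1.23:

    # export the PG from the dead OSD's store
    ceph-objectstore-tool --data-path /mnt/tmp \
        --pgid 1.23 --op export --file /mnt/backup/pg1.23.export

    # later, import into an OSD that should own the PG (that OSD must be stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
        --op import --file /mnt/backup/pg1.23.export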

[ceph-users] Ceph object recovery

2017-07-25 Thread Daniel K
I did some bad things to my cluster, broke 5 OSDs and wound up with 1 unfound object. I mounted one of the OSD drives and used ceph-objectstore-tool to find and export the object: ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 162.001c0ed4 get-bytes filename.obj What's the
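
Judging by the follow-up on 2017-07-27 (further down), the missing piece is writing the extracted object back through librados. The full roundtrip is roughly as follows (a sketch; the pool name cephfs_data comes from that follow-up):

    # extract the object from the stopped OSD's store
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
        162.001c0ed4 get-bytes 162.001c0ed4.obj

    # write it back into the pool through librados
    rados -p cephfs_data put 162.001c0ed4 162.001c0ed4.obj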

Re: [ceph-users] Can't start bluestore OSDs after successfully moving them 12.1.1 ** ERROR: osd init failed: (2) No such file or directory

2017-07-25 Thread Daniel K
8180 7f25a88af700 1 -- 10.0.15.142:6800/16150 <== mon.1 10.0.15.51:6789/0 9 mon_command_ack([{"prefix": "osd crush set-device-class", "class": "hdd", "ids": ["7"]}]=-2

Re: [ceph-users] Can't start bluestore OSDs after successfully moving them 12.1.1 ** ERROR: osd init failed: (2) No such file or directory

2017-07-25 Thread Daniel K
and learning. On Tue, Jul 25, 2017 at 2:38 PM, Daniel K <satha...@gmail.com> wrote: > Update to this -- I tried building a new host and a new OSD, new disk, and > I am having the same issue. > > > > I set osd debug level to 10 -- the issue looks like it's coming from a mo

Re: [ceph-users] ceph recovery incomplete PGs on Luminous RC

2017-07-24 Thread Daniel K
far...@redhat.com> wrote: > > On Fri, Jul 21, 2017 at 10:23 PM Daniel K <satha...@gmail.com> wrote: > >> Luminous 12.1.0(RC) >> >> I replaced two OSD drives(old ones were still good, just too small), >> using: >> >> ceph osd out osd.1

[ceph-users] Can't start bluestore OSDs after successfully moving them 12.1.1 ** ERROR: osd init failed: (2) No such file or directory

2017-07-24 Thread Daniel K
List -- I have a 4-node cluster running on bare metal and need to use the kernel client on 2 nodes. Since I've read that you should not run the kernel client on a node that runs an OSD daemon, I decided to move the OSD daemons into a VM on the same device. Original host is stor-vm2 (bare metal), new

Re: [ceph-users] Ceph object recovery

2017-07-27 Thread Daniel K
viously) # rados -p cephfs_data put 162.001c0ed4 162.001c0ed4.obj Still have more recovery to do but this seems to have fixed my unfound object problem. On Tue, Jul 25, 2017 at 12:54 PM, Daniel K <satha...@gmail.com> wrote: > I did some bad things to my cluster, broke 5 OSDs a

[ceph-users] Client behavior when OSD is unreachable

2017-07-27 Thread Daniel K
Does the client track which OSDs are reachable? How does it behave if some are not reachable? For example: a cluster network with all OSD hosts on one switch; a public network with OSD hosts split between two switches; failure domain is switch; copies=3, so with a failure of the public switch, 1 copy

Re: [ceph-users] rbd-fuse performance

2017-06-28 Thread Daniel K
thank you! On Wed, Jun 28, 2017 at 11:48 AM, Mykola Golub <mgo...@mirantis.com> wrote: > On Tue, Jun 27, 2017 at 07:17:22PM -0400, Daniel K wrote: > > > rbd-nbd isn't good as it stops at 16 block devices (/dev/nbd0-15) > > modprobe nbd nbds_max=1024 > > Or, if n
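
To make the higher device count survive a reboot, a quick sketch (assuming nbd is built as a loadable module):

    # one-off, for the running system
    modprobe nbd nbds_max=1024

    # persist across reboots
    echo "options nbd nbds_max=1024" > /etc/modprobe.d/nbd.conf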

Re: [ceph-users] osds exist in the crush map but not in the osdmap after kraken > luminous rc1 upgrade

2017-06-27 Thread Daniel K
the extra stuff is and the easiest way to remove > it. > > On Tue, Jun 27, 2017, 12:12 PM Daniel K <satha...@gmail.com> wrote: > >> Hi, >> >> I'm extremely new to ceph and have a small 4-node/20-osd cluster. >> >> I just upgraded from kraken to lumino

[ceph-users] Luminous/Bluestore compression documentation

2017-06-27 Thread Daniel K
Is there anywhere that details the various compression settings for bluestore-backed pools? I can see compression in the list of options when I run ceph osd pool set, but can't find anything that details what the valid settings are. I've tried discovering the options via the command line utilities
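
For the record, the per-pool compression knobs in Luminous look like this (pool name and values here are illustrative):

    ceph osd pool set mypool compression_algorithm snappy    # none|snappy|zlib|zstd|lz4
    ceph osd pool set mypool compression_mode aggressive     # none|passive|aggressive|force
    ceph osd pool set mypool compression_required_ratio 0.875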

[ceph-users] rbd-fuse performance

2017-06-27 Thread Daniel K
Hi, As mentioned in my previous emails, I'm extremely new to ceph, so please forgive my lack of knowledge. I'm trying to find a good way to mount ceph rbd images for export by LIO/targetcli. rbd-nbd isn't good as it stops at 16 block devices (/dev/nbd0-15), and kernel rbd mapping doesn't have

[ceph-users] ceph-monstore-tool missing in 12.1.1 on Xenial?

2017-07-30 Thread Daniel K
All 3 of my mons crashed while I was adding OSDs and now error out with: /build/ceph-12.1.1/src/mon/OSDMonitor.cc: 3018: FAILED assert(osdmap.get_up_osd_features() & CEPH_FEATURE_MON_STATEFUL_SUB) I've resorted to just rebuilding the mon DB and making 3 new mon daemons, using the steps here:

[ceph-users] implications of losing the MDS map

2017-08-07 Thread Daniel K
I finally figured out how to get the ceph-monstore-tool (compiled from source) and am ready to attempt to recover my cluster. I have one question -- in the instructions, http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ under Recovery from OSDs, Known limitations: ->
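
The core of that documented recovery-from-OSDs procedure, sketched for a host whose OSDs live under the default path and a hypothetical keyring location (one known limitation, and the point of this thread, is that MDS maps cannot be recovered this way):

    # gather cluster map info from every local OSD into a scratch store
    for osd in /var/lib/ceph/osd/ceph-*; do
        ceph-objectstore-tool --data-path "$osd" \
            --op update-mon-db --mon-store-path /tmp/mon-store
    done

    # rebuild the monitor store from the gathered maps
    ceph-monstore-tool /tmp/mon-store rebuild -- \
        --keyring /etc/ceph/ceph.client.admin.keyring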

[ceph-users] osds exist in the crush map but not in the osdmap after kraken > luminous rc1 upgrade

2017-06-27 Thread Daniel K
Hi, I'm extremely new to ceph and have a small 4-node/20-osd cluster. I just upgraded from kraken to luminous without much ado, except now when I run ceph status, I get a health_warn because "2 osds exist in the crush map but not in the osdmap" Googling the error message only took me to the
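
If those two entries really are leftovers with no backing daemons, removing them from the crush map should clear the warning. A sketch with hypothetical ids:

    ceph osd tree                 # identify the stray ids
    ceph osd crush remove osd.20
    ceph osd crush remove osd.21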

Re: [ceph-users] design guidance

2017-06-06 Thread Daniel K
I started down that path and got so deep that I couldn't even find where I went in. I couldn't make heads or tails out of what would or wouldn't work. We didn't need multiple hosts accessing a single datastore, so on the client side I just have a single VM guest running on each ESXi hosts, with

Re: [ceph-users] design guidance

2017-06-06 Thread Daniel K
'd rather learn this the easy way than to have to rebuild this 6 times over the next 6 months. On Tue, Jun 6, 2017 at 2:05 AM, Christian Balzer <ch...@gol.com> wrote: > > Hello, > > lots of similar questions in the past, google is your friend. > > On Mon, 5 Jun 2017 23:59

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
production the pieces will be separated. On Wed, May 24, 2017 at 4:55 PM, Gregory Farnum <gfar...@redhat.com> wrote: > On Wed, May 24, 2017 at 3:15 AM, John Spray <jsp...@redhat.com> wrote: > > On Tue, May 23, 2017 at 11:41 PM, Daniel K <satha...@gmail.com> wrote:

Re: [ceph-users] mds slow request, getattr currently failed to rdlock. Kraken with Bluestore

2017-05-24 Thread Daniel K
e anyone's time on a wild goose chase. On Wed, May 24, 2017 at 6:15 AM, John Spray <jsp...@redhat.com> wrote: > On Tue, May 23, 2017 at 11:41 PM, Daniel K <satha...@gmail.com> wrote: > > Have a 20 OSD cluster -"my first ceph cluster" that has another 400 OS

[ceph-users] Kraken bluestore compression

2017-06-05 Thread Daniel K
Hi, I see several mentions that compression is available in Kraken for bluestore OSDs, however, I can find almost nothing in the documentation that indicates how to use it. I've found: - http://docs.ceph.com/docs/master/radosgw/compression/ - http://ceph.com/releases/v11-2-0-kraken-released/
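
Besides the per-pool settings (see the Luminous thread above), there are OSD-wide bluestore defaults that can go in ceph.conf. A sketch with illustrative values:

    [osd]
    bluestore compression algorithm = snappy
    bluestore compression mode = aggressive
    bluestore compression required ratio = .875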

[ceph-users] design guidance

2017-06-05 Thread Daniel K
I've built 'my-first-ceph-cluster' with two of the 4-node, 12-drive Supermicro servers and dual 10Gb interfaces (one cluster, one public). I now have 9x 36-drive Supermicro StorageServers made available to me, each with dual 10Gb and a single Mellanox IB/40G NIC. No 1G interfaces except IPMI. 2x

Re: [ceph-users] Ceph re-ip of OSD node

2017-08-30 Thread Daniel K
Just curious -- why wouldn't it work as long as the IPs were reachable? Is there something going on in layer 2 with Ceph that wouldn't survive a trip across a router? On Wed, Aug 30, 2017 at 1:52 PM, David Turner wrote: > ALL OSDs need to be running the same private

[ceph-users] RBD encryption options?

2017-08-21 Thread Daniel K
Are there any client-side options to encrypt an RBD device? Using the latest Luminous RC, on Ubuntu 16.04 with a 4.10 kernel. I assumed adding client-side encryption would be as simple as using luks/dm-crypt/cryptsetup after adding the RBD device to /etc/ceph/rbdmap and enabling the rbdmap service --
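
The dm-crypt-on-top approach sketched out, assuming a hypothetical image rbd/secure that maps to /dev/rbd0:

    rbd map rbd/secure
    cryptsetup luksFormat /dev/rbd0              # one-time: create the LUKS header
    cryptsetup open /dev/rbd0 secure-crypt       # unlock; creates /dev/mapper/secure-crypt
    mkfs.ext4 /dev/mapper/secure-crypt           # one-time: filesystem on the cleartext device
    mount /dev/mapper/secure-crypt /mnt/secure

The cluster only ever sees ciphertext; the tradeoff is that the key handling and unlock-at-boot wiring is entirely on the client side.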

Re: [ceph-users] RBD encryption options?

2017-08-24 Thread Daniel K
Awesome -- I searched and all I could find was restricting access at the pool level I will investigate the dm-crypt/RBD path also. Thanks again! On Thu, Aug 24, 2017 at 7:40 PM, Alex Gorbachev <a...@iss-integration.com> wrote: > > On Mon, Aug 21, 2017 at 9:03 PM Daniel K <sat

[ceph-users] Added two OSDs, 10% of pgs went inactive

2017-12-19 Thread Daniel K
I'm trying to understand why adding OSDs would cause pgs to go inactive. This cluster has 88 OSDs and had 6 OSDs with device class "hdd_10TB_7.2k". I added two more OSDs, set the device class to "hdd_10TB_7.2k", and 10% of pgs went inactive. I have an EC pool on these OSDs with the profile:
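
Useful first checks in that situation, sketched with a hypothetical profile name and pg id:

    ceph osd erasure-code-profile get myprofile   # check k, m, crush-failure-domain, device class
    ceph pg dump_stuck inactive                   # list the stuck pgs
    ceph pg 6.1a query                            # see why peering stalled for one of them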

Re: [ceph-users] Added two OSDs, 10% of pgs went inactive

2017-12-20 Thread Daniel K
, Dec 19, 2017 at 8:57 PM, Daniel K <satha...@gmail.com> wrote: > I'm trying to understand why adding OSDs would cause pgs to go inactive. > > This cluster has 88 OSDs, and had 6 OSD with device class "hdd_10TB_7.2k" > > I added two more OSDs, set the device class to &

Re: [ceph-users] Added two OSDs, 10% of pgs went inactive

2017-12-21 Thread Daniel K
D's from one chassis to another (there was no data on it > yet). > > I tried restarting OSD's to no avail. > > Couldn't find anything about the stuck in "activating+remapped" state so > in the end i threw away the pool and started over. > > Could this be a bug in 12.2.2 ? &

Re: [ceph-users] Ceph newbie(?) issues

2018-03-05 Thread Daniel K
I had a similar problem with some relatively underpowered servers (2x E5-2603, 6 cores at 1.7GHz, no HT, 12-14 2TB OSDs per server, 32GB RAM). There was a process on a couple of the servers that would hang and chew up all available CPU. When that happened, I started getting scrub errors on those

Re: [ceph-users] Ceph iSCSI is a prank?

2018-03-02 Thread Daniel K
There's been quite a few VMWare/Ceph threads on the mailing list in the past. One setup I've been toying with is a linux guest running on the vmware host on local storage, with the guest mounting a ceph RBD with a filesystem on it, then exporting that via NFS to the VMWare host as a datastore.
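
A bare-bones sketch of that gateway guest, with hypothetical names throughout:

    rbd map rbd/vmware-ds
    mkfs.xfs /dev/rbd0
    mkdir -p /export/vmware-ds && mount /dev/rbd0 /export/vmware-ds

    # export to the ESXi host (named esxi1 here) and reload exports
    echo "/export/vmware-ds esxi1(rw,no_root_squash,sync)" >> /etc/exports
    exportfs -ra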

Re: [ceph-users] Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

2018-12-20 Thread Daniel K
I'm hitting this same issue on 12.2.5. Upgraded one node to 12.2.10 and it didn't clear. 6 OSDs flapping with this error. I know this is an older issue but are traces still needed? I don't see a resolution available. Thanks, Dan On Wed, Sep 6, 2017 at 10:30 PM Brad Hubbard wrote: > These

Re: [ceph-users] Ceph health error (was: Prioritize recovery over backfilling)

2018-12-20 Thread Daniel K
Did you ever get anywhere with this? I have 6 OSDs out of 36 continuously flapping with this error in the logs. Thanks, Dan On Fri, Jun 8, 2018 at 11:10 AM Caspar Smit wrote: > Hi all, > > Maybe this will help: > > The issue is with shards 3,4 and 5 of PG 6.3f: > > LOG's of OSD's 16, 17 & 36

[ceph-users] 12.2.5 multiple OSDs crashing

2018-12-19 Thread Daniel K
12.2.5 on a Proxmox cluster. 6 nodes, about 50 OSDs, bluestore and cache tiering on an EC pool. Mostly spinners with an SSD OSD drive and an SSD WAL/DB drive on each node. PM863 SSDs with ~75%+ endurance remaining. Has been running relatively okay besides some spinner failures until I checked today

[ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-01 Thread Daniel K
56-OSD, 6-node 12.2.5 cluster on Proxmox. We had multiple drives fail (about 30%) within a few days of each other, likely faster than the cluster could recover. After the dust settled, we have 2 out of 896 pgs stuck inactive. The failed drives are completely inaccessible, so I can't mount them and

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Daniel K
I bought the wrong drives trying to be cheap. They were 2TB WD Blue 5400rpm 2.5 inch laptop drives. They've been replaced now with HGST 10K 1.8TB SAS drives. On Sat, Mar 2, 2019, 12:04 AM wrote: > > > Saturday, 2 March 2019, 04.20 +0100 from satha...@gmail.com < > satha...@gmail.com>: > > 56

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-02 Thread Daniel K
t; > Saturday, 2 March 2019, 14.34 +0100 from Daniel K : > > I bought the wrong drives trying to be cheap. They were 2TB WD Blue > 5400rpm 2.5 inch laptop drives. > > They've been replace now with HGST 10K 1.8TB SAS drives. > > > > On Sat, Mar 2, 2019, 12:04 AM wrot

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-04 Thread Daniel K
o reset the PGs instead. > Data will obviously be lost afterwards. > > Paul > > > > > On Sat, Mar 2, 2019 at 6:08 AM Daniel K wrote: > >> > >> They all just started having read errors. Bus resets. Slow reads. Which > is one of the reasons the cluster
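
On Luminous, "resetting" a PG whose data is accepted as lost comes down to something like the following. A sketch only -- the pg id is hypothetical, exact flag requirements vary by point release, and data in the PG is gone afterwards:

    # remove any leftover shards from a surviving OSD (run with that OSD stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
        --pgid 6.13 --op remove --force

    # then tell the cluster to recreate the pg empty
    ceph osd force-create-pg 6.13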

[ceph-users] Ceph PVE cluster help

2019-08-26 Thread Daniel K
I set up a Ceph cluster for use with PVE for some friends a few years ago. It wasn't maintained and is now in bad shape. They've reached out to me for help, but I do not have the time to assist right now. Is there anyone on the list who would be willing to help? As a professional service of