Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-28 Thread David Turner
In your example of EC 5 + 3, your min_size is 5. As long as you have 5 hosts up, you should still be serving content. My home cluster uses 2+1 and has 3 nodes. I can reboot any node (leaving 2 online) as long as the PGs in the cluster are healthy. If I were to actually lose a node, I would have to
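
For reference, a pool's min_size and the erasure-code profile behind it can be checked with commands along these lines (pool and profile names are placeholders):

  ceph osd pool get <poolname> min_size          # current min_size of the pool
  ceph osd erasure-code-profile get <profile>    # shows k, m and the failure domain
  ceph osd pool set <poolname> min_size 5        # only after weighing the availability trade-off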

Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-28 Thread Nick Tan
On Wed, Aug 23, 2017 at 2:28 PM, Christian Balzer wrote: > On Wed, 23 Aug 2017 13:38:25 +0800 Nick Tan wrote: > > > Thanks for the advice Christian. I think I'm leaning more towards the > > 'traditional' storage server with 12 disks - as you say they give a lot > > more flexibility with the perf

[ceph-users] Grafana Dashboard

2017-08-28 Thread Shravana Kumar.S
All, I am looking for a Grafana dashboard to monitor Ceph. I am using Telegraf to collect the metrics and InfluxDB to store the values. Does anyone have the dashboard JSON file? Thanks, Saravans
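
A minimal sketch of the Telegraf side, assuming the stock Ceph input plugin and a local InfluxDB (socket directory and database name are assumptions to adapt):

  # /etc/telegraf/telegraf.d/ceph.conf (sketch)
  [[inputs.ceph]]
    socket_dir = "/var/run/ceph"        # admin sockets of the local mon/osd daemons
    gather_admin_socket_stats = true
    gather_cluster_stats = false        # set true on one mon host for cluster-wide stats

  [[outputs.influxdb]]
    urls = ["http://127.0.0.1:8086"]
    database = "ceph"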

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Nigel Williams
On 29 August 2017 at 00:21, Haomai Wang wrote: > On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote: >> - And more broadly, if a user wants to use the performance benefits of >> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs, >> what are their options? RoCE? > > roce v2 i

Re: [ceph-users] CephFS: mount fs - single point of failure

2017-08-28 Thread LOPEZ Jean-Charles
Hi Oscar, the mount command accepts multiple MON addresses: mount -t ceph monhost1,monhost2,monhost3:/ /mnt/foo If not specified, the default port is 6789. JC > On Aug 28, 2017, at 13:54, Oscar Segarra wrote: > > Hi, > > In Ceph, by design there is no single point of failure in terms of s
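
For a persistent mount, an /etc/fstab entry along these lines should work (hostnames, user name and secret file are placeholders):

  monhost1,monhost2,monhost3:/  /mnt/foo  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0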

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
Rules of thumb with batteries: the closer to their proper temperature you run them, the more life you get out of them; the more the battery is overpowered for your application, the longer it will survive. Get yourself an LSI 94** controller and use it as an HBA and you will be fine. But get MORE DRIVES! … >

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
Thank you Tomasz and Ronny. I'll have to order some HDDs soon and try these out. The car battery idea is nice! I may try that.. =) Do they last longer? The ones that fit the UPS's original battery spec didn't last very long... part of the reason why I gave up on them.. =P My wife probably won't like

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
Sorry for being brutal … anyway: 1. get a battery for the UPS (a car battery will do as well; I've modded a UPS in the past with a truck battery and it was working like a charm :D) 2. get spare drives and put those in, because your cluster CANNOT get out of the error state due to lack of space 3. Follow advi

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
Tomasz, Those machines are behind a surge protector. It doesn't appear to be a good one! I do have a UPS... but it is my fault... no battery. Power was pretty reliable for a while... and the UPS was just beeping every chance it had, disrupting some sleep.. =P So running on surge protector only. I

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Ronny Aasen
> [SNIP - bad drives] Generally, when a disk is displaying bad blocks to the OS, the drive has been remapping blocks for ages in the background and the disk is really on its last legs. It is a bit unlikely that you get so many disks dying at the same time, though, but the problem can have been silent

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
So to decode a few things about your disk: 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 37. 37 read errors and only one sector marked as pending - fun disk :/ 181 Program_Fail_Cnt_Total 0x0022 099 099 000 Old_age Always - 35325174 S
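
The attribute table being decoded above comes from output of roughly this form (substitute your failing device):

  smartctl -A /dev/sdb        # vendor attribute table: pending/reallocated sectors, read errors
  smartctl -l error /dev/sdb  # ATA error log with the most recent failures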

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
I think you are looking at something more like this : https://www.google.co.uk/imgres?imgurl=https%3A%2F%2Fthumbs.dreamstime.com%2Fz%2Fhard-drive-being-destroyed-hammer-16668693.jpg&imgrefurl=https%3A%2F%2Fwww.dreamstime.com%2Fstock-photos-hard-drive-being-destroyed-hammer-image16668693&docid=Ofi7

[ceph-users] CephFS: mount fs - single point of failure

2017-08-28 Thread Oscar Segarra
Hi, In Ceph, by design there is no single point of failure in terms of server roles; nevertheless, from the client's point of view, it might exist. In my environment: Mon1: 192.168.100.101:6789 Mon2: 192.168.100.102:6789 Mon3: 192.168.100.103:6789 Client: 192.168.100.104 I have created a line in

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
So.. would doing something like this potentially bring it back to life? =) Analyzing a Faulty Hard Disk using Smartctl - Thomas-Krenn-Wiki On Monday,
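
A rough version of what that wiki article walks through, assuming smartmontools is installed and /dev/sdb is the suspect disk:

  smartctl -H /dev/sdb           # overall health verdict
  smartctl -t long /dev/sdb      # kick off a long offline self-test (can take hours)
  smartctl -l selftest /dev/sdb  # read the self-test results afterwards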

Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-28 Thread Ali Maredia
Marc, These rpms (and debs) are built with the latest ganesha 2.5 stable release and the latest luminous release on download.ceph.com: http://download.ceph.com/nfs-ganesha/ I just put them up late last week, and I will be maintaining them in the future. -Ali - Original Message - > From

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
I think you've got your answer: 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1 > On 28 Aug 2017, at 21:22, hjcho616 wrote: > > Steve, > > I thought that was odd too.. > > Below is from the log, This captures transition from good to bad. Looks like

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
Steve, I thought that was odd too.. Below is from the log; this captures the transition from good to bad. Looks like there is "Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors". And looks like I did a repair with /dev/sdb1... =P # grep sdb syslog.1 Aug 27 06:27:22 OSD1 smartd[1031]:

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Steve Taylor
I'm jumping in a little late here, but running xfs_repair on your partition can't frag your partition table. The partition table lives outside the partition block device and xfs_repair doesn't have access to it when run against /dev/sdb1. I haven't actually tested it, but it seems unlikely that

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
Tomasz, Looks like when I did xfs_repair -L /dev/sdb1 it did something to the partition table and I don't see /dev/sdb1 anymore... or maybe I missed the 1 in /dev/sdb1? =( Yes.. that extra power outage did pretty good damage... =P I am hoping 0.007% is very small... =P Any recommendations on fix
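
Before anything destructive, it is worth checking what the kernel still sees; read-only commands along these lines (assuming /dev/sdb):

  lsblk /dev/sdb       # does the kernel still list sdb1?
  fdisk -l /dev/sdb    # print whatever partition table survives
  partprobe /dev/sdb   # ask the kernel to re-read the partition table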

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Ronny Aasen
comments inline On 28.08.2017 18:31, hjcho616 wrote: I'll see what I can do on that... Looks like I may have to add another OSD host as I utilized all of the SATA ports on those boards. =P Ronny, I am running with size=2 min_size=1. I created everything with ceph-deploy and didn't touch
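
For reference, the replica settings are per pool and can be inspected and changed roughly like this ("rbd" is just an example pool name; raising size triggers backfill and needs enough hosts):

  ceph osd pool get rbd size        # current replica count
  ceph osd pool get rbd min_size
  ceph osd pool set rbd size 3
  ceph osd pool set rbd min_size 2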

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-28 Thread David Turner
The vast majority of the sync error list is "failed to sync bucket instance: (16) Device or resource busy". I can't find anything on Google about this error message in relation to Ceph. Does anyone have any idea what this means? and/or how to fix it? On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley
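
For anyone following along, the error list in question comes from commands of this form (run on a node with RGW admin credentials):

  radosgw-admin sync status        # overall metadata/data sync state between zones
  radosgw-admin sync error list    # the entries showing "(16) Device or resource busy"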

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
Sorry mate, I've just noticed the "unfound (0.007%)". I think that your main culprit here is osd.0. You need to have all OSDs on one host to get all the data back. Also, for the time being, I would just change size and min_size down to 1 and try to figure out which OSD you actually need to get all the
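
To see which PG and OSD the unfound objects belong to, something like the following is the usual starting point (the PG id is a placeholder taken from the health output):

  ceph health detail | grep unfound   # which PGs report unfound objects
  ceph pg <pgid> list_unfound         # the unfound objects themselves
  ceph pg <pgid> query                # which OSDs were probed and which are still needed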

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread hjcho616
Thank you all for the suggestions! Maged, I'll see what I can do on that... Looks like I may have to add another OSD host as I utilized all of the SATA ports on those boards. =P Ronny, I am running with size=2 min_size=1. I created everything with ceph-deploy and didn't touch much of that pool setti

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Haomai Wang
On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas wrote: > On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang wrote: >> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote: >>> Hello everyone, >>> >>> I'm trying to get a handle on the current state of the async messenger's >>> RDMA transport in Luminous,

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread David Disseldorp
Hi Florian, On Wed, 23 Aug 2017 10:26:45 +0200, Florian Haas wrote: > - In case there is no such support in the kernel yet: What's the current > status of RDMA support (and testing) with regard to > * libcephfs? > * the Samba Ceph VFS? On the client side, the SMB3 added an SMB-Direct protoco

Re: [ceph-users] Ceph on RDMA

2017-08-28 Thread Haomai Wang
Did you follow these instructions (https://community.mellanox.com/docs/DOC-2693)? On Mon, Aug 28, 2017 at 6:40 AM, Jeroen Oldenhof wrote: > Hi All! > > I'm trying to run CEPH over RDMA, using a batch of Infiniband Mellanox > MT25408 20GBit (4x DDR) cards. > > RDMA is running, rping works between all

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Florian Haas
On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang wrote: > On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote: >> Hello everyone, >> >> I'm trying to get a handle on the current state of the async messenger's >> RDMA transport in Luminous, and I've noticed that the information >> available is a littl

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-28 Thread Haomai Wang
On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas wrote: > Hello everyone, > > I'm trying to get a handle on the current state of the async messenger's > RDMA transport in Luminous, and I've noticed that the information > available is a little bit sparse (I've found > https://community.mellanox.com/do

[ceph-users] Ceph on RDMA

2017-08-28 Thread Jeroen Oldenhof
Hi All! I'm trying to run Ceph over RDMA, using a batch of InfiniBand Mellanox MT25408 20GBit (4x DDR) cards. RDMA is running, rping works between all hosts, and I've configured 10.0.0.x addressing on the ib0 interfaces. But when enabling RDMA in ceph.conf:   ms_type = async+rdma   ms_asyn
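
A sketch of the relevant ceph.conf section, assuming the HCA shows up as mlx4_0 (check with ibv_devices); the ms_async_rdma_* option names are worth verifying against your exact Ceph version:

  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx4_0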

Re: [ceph-users] OSD: no data available during snapshot

2017-08-28 Thread Dieter Jablanovsky
I was able to drill it further down. The messages get logged when I create a VM image snapshot like "rbd snap create libvirt/wiki@backup", and while the snapshot gets deleted at the end. Btw, I'm running Ceph 10.2.3. I saw this: http://tracker.ceph.com/issues/18990 and thought this migh
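
For completeness, the snapshot lifecycle around that image looks roughly like this:

  rbd snap create libvirt/wiki@backup   # take the snapshot the backup reads from
  rbd snap ls libvirt/wiki              # list snapshots of the image
  rbd snap rm libvirt/wiki@backup       # remove it once the backup has finished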

Re: [ceph-users] Ceph rbd lock

2017-08-28 Thread Jason Dillaman
The rbd CLI's "lock"-related commands are advisory locks that require an outside process to manage. The exclusive-lock feature replaces the advisory locks (and purposely conflicts with it so you cannot use both concurrently). I'd imagine at some point those CLI commands should be deprecated, but th
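
The two mechanisms side by side, as a sketch (pool, image and lock names are placeholders):

  # advisory locks, managed entirely by an outside process
  rbd lock add rbd/myimage mylockid
  rbd lock ls rbd/myimage
  rbd lock remove rbd/myimage mylockid <locker>   # locker id as printed by "rbd lock ls"

  # exclusive-lock feature, handled automatically by librbd
  rbd feature enable rbd/myimage exclusive-lock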

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Tomasz Kusmierz
Personally I would suggest to: - change the replication failure domain to OSD (from the default host) - remove the OSDs from the host with all those "down OSDs" (note that they are down, not out, which makes it more weird) - let the single-node cluster stabilise; yes, performance will suck, but at least you will h
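
On a pre-Luminous cluster that first step would look roughly like the following (rule and pool names are placeholders; Luminous renames crush_ruleset to crush_rule):

  ceph osd crush rule create-simple replicated-osd default osd   # rule with OSD as the failure domain
  ceph osd crush rule dump replicated-osd                        # note the rule id
  ceph osd pool set <poolname> crush_ruleset <rule-id>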

[ceph-users] PGs in peered state?

2017-08-28 Thread Yuri Gorshkov
Hi. When trying to take down a host for maintenance purposes I encountered an I/O stall along with some PGs marked 'peered' unexpectedly. Cluster stats: 96/96 OSDs, healthy prior to incident, 5120 PGs, 4 hosts consisting of 24 OSDs each. Ceph version 11.2.0, using standard filestore (with LVM jou
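
This does not explain the peered PGs by itself, but for reference the usual sequence for planned host maintenance is roughly:

  ceph osd set noout                # keep CRUSH from rebalancing while the host is down
  systemctl stop ceph-osd.target    # on the host being serviced
  # ... do the maintenance ...
  ceph osd unset noout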

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Ronny Aasen
On 28. aug. 2017 08:01, hjcho616 wrote: Hello! I've been using Ceph for a long time, mostly for network CephFS storage, even before the Argonaut release! It's been working very well for me. Yes, I had some power outages before, asked a few questions on this list, and they got resolved happily!

Re: [ceph-users] Ceph Lock

2017-08-28 Thread Stuart Longland
Hi Marcelo, On 26/08/17 00:05, lis...@marcelofrota.info wrote: > Some days ago, I read about the commands rbd lock add and rbd lock > remove. Will these commands be maintained in Ceph in future versions, or is > the better way to use locks in Ceph going to be exclusive-lock, and will these > commands go

[ceph-users] Any information about ceph daemon metrics ?

2017-08-28 Thread Александр Высочин
I am looking for any materials that can help me track and troubleshoot the performance of my cluster, particularly the RADOS Gateway. I am using the command "ceph daemon 'daemon-name' perf dump" and getting a huge number of various metrics, but where can I find their descriptions?
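
The counter descriptions are exposed over the same admin socket; roughly (osd.0 is just an example daemon):

  ceph daemon osd.0 perf schema   # names, types and description text for each counter
  ceph daemon osd.0 perf dump     # the current values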

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Maged Mokhtar
I would suggest either adding 1 new disk on each of the 2 machines or increasing the osd_backfill_full_ratio to something like 90 or 92 from the default 85. /Maged On 2017-08-28 08:01, hjcho616 wrote: > Hello! > > I've been using ceph for long time mostly for network CephFS storage, even > before
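
As a sketch of how that ratio is changed at runtime on a pre-Luminous cluster (the option takes a fraction, so 90% is 0.90):

  ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
  # and persist it in ceph.conf under [osd]:
  #   osd backfill full ratio = 0.90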

[ceph-users] Check bluestore db content and db partition usage

2017-08-28 Thread TYLin
Hello, We plan to change our filestore OSDs to the bluestore backend and are doing a survey now. Two questions need your help. 1. Is there any way to dump the RocksDB so we can check its contents? 2. How can I get the space usage information for the DB partition? We want to figure out a reasonable size for
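
Two hedged starting points, since the tooling was still settling at the time: the bluefs perf counters for space usage, and ceph-kvstore-tool for the RocksDB contents if your build supports the bluestore-kv backend (OSD id and data path are examples):

  ceph daemon osd.0 perf dump | grep -E 'db_(total|used)_bytes'   # bluefs DB partition usage
  systemctl stop ceph-osd@0                                       # the store must not be in use
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 list    # dump the RocksDB keys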