Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread hjcho616
This is what it looks like today. Seems like the ceph-osds are sitting at 0% CPU, so all the migrations appear to be done. Does this look OK to shut down and continue when I get the HDD on Thursday? # ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 20 pgs
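
A minimal sketch of the usual precaution before powering a node down for hardware work, assuming the standard ceph CLI (clear the flag again once the host is back up):

  ceph osd set noout        # keep CRUSH from re-balancing data while the host is offline
  ceph health detail        # note which PGs are stuck before the shutdown
  # ... power down, swap the HDD, boot ...
  ceph osd unset noout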

Re: [ceph-users] v12.2.0 Luminous released

2017-08-29 Thread kefu chai
On Wed, Aug 30, 2017 at 11:50 AM, Xiaoxi Chen wrote: > The ceph -v for 12.2.0 still goes with RC, which is a little bit confusing > > root@slx03c-5zkd:~# ceph -v > ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc) https://github.com/ceph/ceph/pull/17359

[ceph-users] Ceph Developers Monthly - September

2017-08-29 Thread Leonardo Vaz
Hey Cephers, This is just a friendly reminder that the next Ceph Developer Monthly meeting is coming up: http://wiki.ceph.com/Planning If you have work that you're doing that is a feature, significant backports, or anything you would like to discuss with the core team, please add it to the

[ceph-users] get error when use prometheus plugin of ceph-mgr

2017-08-29 Thread shawn tim
Hello, I just want to try the prometheus plugin of ceph-mgr. Following this doc (http://docs.ceph.com/docs/master/mgr/prometheus/), I get output like [root@ceph01 ~]# curl localhost:9283/metrics/ | head % Total % Received % Xferd Average
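
For reference, a minimal sketch of getting the exporter answering on Luminous, assuming the active mgr runs on ceph01 and the default port 9283:

  ceph mgr module enable prometheus                 # load the plugin on the active mgr
  ss -tlnp | grep 9283                              # confirm the exporter is listening
  curl -s http://localhost:9283/metrics | head      # raw Prometheus text format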

Re: [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Mark Nelson
Hi Bryan, Check out your SCSI device failures, but if that doesn't pan out, Sage and I have been tracking this: http://tracker.ceph.com/issues/21171 There's a fix in place being tested now! Mark On 08/29/2017 05:41 PM, Bryan Banister wrote: Found some bad stuff in the messages file about

Re: [ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Bryan Banister
Found some bad stuff in the messages file about SCSI block device fails... I think I found my smoking gun... -B From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan Banister Sent: Tuesday, August 29, 2017 5:02 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Help
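
A rough sketch of the kind of checks that turn up such a smoking gun (exact error strings vary by kernel and driver, so treat the patterns as examples):

  grep -iE 'scsi|i/o error|medium error' /var/log/messages
  dmesg -T | grep -iE 'blk_update_request|sense key'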

[ceph-users] Help with down OSD with Ceph 12.1.4 on Bluestore back

2017-08-29 Thread Bryan Banister
Hi all, Not sure what to do with this down OSD: -2> 2017-08-29 16:55:34.588339 72d58700 1 -- 7.128.13.57:6979/18818 --> 7.128.13.55:0/52877 -- osd_ping(ping_reply e935 stamp 2017-08-29 16:55:34.587991) v4 -- 0x67397000 con 0 -1> 2017-08-29 16:55:34.588351 72557700 1 --

[ceph-users] Centos7, luminous, cephfs, .snaps

2017-08-29 Thread Marc Roos
Where can I find some examples of creating a snapshot on a directory? Can I just do mkdir .snaps? I tried with the stock kernel and a 4.12.9-1. http://docs.ceph.com/docs/luminous/dev/cephfs-snapshots/
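
For what it's worth, the hidden directory is .snap (not .snaps), and a snapshot is just an mkdir inside it. Depending on the release you may first need to enable snapshots; the exact enabling command has changed between releases, so the following is only a sketch (filesystem, path and snapshot names are placeholders):

  ceph fs set cephfs allow_new_snaps true   # older releases: ceph mds set allow_new_snaps true --yes-i-really-mean-it
  cd /mnt/cephfs/mydir
  mkdir .snap/before-change                 # creates the snapshot
  rmdir .snap/before-change                 # deletes it again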

Re: [ceph-users] OSDs flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread Tomasz Kusmierz
so on IRC I was asked to add this log from the OSD that was marked as missing during scrub: https://pastebin.com/raw/YQj3Drzi

Re: [ceph-users] OSDs flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread David Zafman
Please file a bug in the tracker: http://tracker.ceph.com/projects/ceph When an OSD is marked down, is there a crash (e.g. assert, heartbeat timeout, declared down by another daemon)? Please include relevant log snippets. If there is no obvious information, then bump the OSD debug log levels. Luminous
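
A sketch of bumping debug levels on a running OSD without restarting it (osd.N is a placeholder; revert once the logs are captured):

  ceph tell osd.N injectargs '--debug_osd 20 --debug_ms 1'
  # or, on the OSD host itself, via the admin socket:
  ceph daemon osd.N config set debug_osd 20
  # back to the defaults afterwards:
  ceph tell osd.N injectargs '--debug_osd 1/5 --debug_ms 0/5'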

[ceph-users] Possible way to clean up leaked multipart objects?

2017-08-29 Thread William Schroeder
Hello! Our team finally had a chance to take another look at the problem identified by Brian Felton in http://tracker.ceph.com/issues/16767. Basically, if any multipart objects are retried before an Abort or Complete, they remain on the system, taking up space and leaving their accounting in
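
As an aside, ordinary incomplete multipart uploads (as opposed to the leaked objects the tracker issue describes) can already be listed and aborted with standard S3 tooling; a sketch with the AWS CLI against a hypothetical RGW endpoint and bucket:

  aws --endpoint-url http://rgw.example.com s3api list-multipart-uploads --bucket mybucket
  aws --endpoint-url http://rgw.example.com s3api abort-multipart-upload --bucket mybucket --key bigfile.bin --upload-id <UploadId from the listing>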

Re: [ceph-users] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous

2017-08-29 Thread Phil Schwarz
Hi, back to work, I'm facing my problem again. @Alexandre: AMD Turion for the N54L HP Microserver. This server is OSD and LXC only, no mon running on it. After rebooting the whole cluster and attempting to add the same disk a third time: ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT
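
When a disk that already appears in the tree is being re-added, the usual approach is to remove the stale entry completely first; a sketch, with osd.N standing in for the failed ID:

  ceph osd out osd.N
  systemctl stop ceph-osd@N      # on the OSD host
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm osd.N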

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Tomasz Kusmierz
Maged, on the second host he has 4 out of 5 OSDs failed on him … I think he's past trying to increase the backfill threshold :) of course he could try to degrade the cluster by letting it mirror within the same host :) > On 29 Aug 2017, at 21:26, Maged Mokhtar wrote: > > One of the

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Maged Mokhtar
One of the things to watch out for in small clusters is that OSDs can get full rather unexpectedly in recovery/backfill cases. In your case you have 2 OSD nodes with 5 disks each. Since you have a replica count of 2, each PG will have 1 copy on each host, so if an OSD fails, all its PGs will have to be
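
A quick way to watch per-OSD utilisation while recovery and backfill run:

  ceph osd df                         # %USE and VAR per OSD; watch the fullest OSDs during backfill
  ceph health detail | grep -i full   # lists nearfull/full OSDs and any toofull PGs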

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Tomasz Kusmierz
Just FYI, setting size and min_size to 1 is a last resort in my mind - to get you out of Dodge!! Before setting that you should have made yourself 105% certain that all the OSDs you leave ON have NO bad sectors, no pending sectors, and no errors of any kind. Once you can mount the
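
A sketch of the kind of pre-flight check meant here (sdX is a placeholder; any non-zero reallocated, pending or uncorrectable counts are a bad sign):

  smartctl -a /dev/sdX | grep -iE 'reallocated|current_pending|offline_uncorrectable'
  smartctl -t long /dev/sdX           # long self-test; read the result later with smartctl -a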

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Willem Jan Withagen
On 29-8-2017 19:12, Steve Taylor wrote: > Hong, > > Probably your best chance at recovering any data without special, > expensive, forensic procedures is to perform a dd from /dev/sdb to > somewhere else large enough to hold a full disk image and attempt to > repair that. You'll want to use

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread hjcho616
Nice! Thank you for the explanation! I feel like I can revive that OSD. =) That does sound great. I don't quite have another cluster, so I'm waiting for a drive to arrive! =) After setting size and min_size to 1, it looks like the toofull flag is gone... Maybe when I was making that video copy the OSDs

[ceph-users] v12.2.0 Luminous released

2017-08-29 Thread Abhishek Lekshmanan
We're glad to announce the first release of Luminous v12.2.x long term stable release series. There have been major changes since Kraken (v11.2.z) and Jewel (v10.2.z), and the upgrade process is non-trivial. Please read the release notes carefully. For more details, links & changelog please

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
But it was absolutely awesome to run an osd off of an rbd after the disk failed. On Tue, Aug 29, 2017, 1:42 PM David Turner wrote: > To add to Steve's success story: the rbd was created in a second cluster in the > same datacenter so it didn't run the risk of deadlocking that

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread David Turner
To add to Steve's success story: the rbd was created in a second cluster in the same datacenter, so it didn't run the risk of deadlocking that mapping rbds on machines running osds has. It should still work on the same cluster in theory, but that is inherently more dangerous for a few reasons. On Tue, Aug 29,
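
The general shape of that trick, assuming the second cluster is reachable via /etc/ceph/backup.conf (all names and the size are placeholders, not the commands actually used):

  rbd --cluster backup create rbd/osd-rescue --size 4T
  rbd --cluster backup map rbd/osd-rescue        # shows up as /dev/rbdX on this host
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /var/lib/ceph/osd/ceph-N       # then rebuild and start osd.N on top of it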

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Steve Taylor
Hong, Probably your best chance at recovering any data without special, expensive, forensic procedures is to perform a dd from /dev/sdb to somewhere else large enough to hold a full disk image and attempt to repair that. You'll want to use 'conv=noerror' with your dd command since your disk is
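
A sketch of that rescue copy (paths are placeholders; 'sync' pads unreadable blocks with zeros so offsets stay aligned, and GNU ddrescue is often the better tool because it retries and records the bad areas in a map file):

  dd if=/dev/sdb of=/mnt/backup/sdb.img bs=1M conv=noerror,sync
  # or:
  ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map
  losetup -fP /mnt/backup/sdb.img     # then repair and mount the loop device instead of the dying disk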

Re: [ceph-users] OSDs flapping on ordinary scrub with cluster being static (after upgrade to 12.1.1)

2017-08-29 Thread Tomasz Kusmierz
So nobody has any clue on this one??? Should I take this one to the dev mailing list? > On 27 Aug 2017, at 01:49, Tomasz Kusmierz wrote: > > Hi, > for purposes of experimenting I’m running a home cluster that consists of > a single node and 4 OSDs (weights in crush map

Re: [ceph-users] RGW Multisite metadata sync init

2017-08-29 Thread Orit Wasserman
Hi David, On Mon, Aug 28, 2017 at 8:33 PM, David Turner wrote: > The vast majority of the sync error list is "failed to sync bucket > instance: (16) Device or resource busy". I can't find anything on Google > about this error message in relation to Ceph. Does anyone
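
For anyone following along, the sync state and the per-shard errors can be inspected from the secondary zone with the standard radosgw-admin subcommands (shown here without any site-specific arguments):

  radosgw-admin sync status
  radosgw-admin sync error list
  radosgw-admin metadata sync status
  radosgw-admin metadata sync init    # restarts a full metadata sync, as discussed in this thread
  radosgw-admin metadata sync run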

Re: [ceph-users] Grafana Dasboard

2017-08-29 Thread Félix Barbeira
Hi, You can check the official site: https://grafana.com/dashboards?search=ceph 2017-08-29 3:08 GMT+02:00 Shravana Kumar.S : > All, > I am looking for Grafana dashboard to monitor CEPH. I am using telegraf to > collect the metrics and influxDB to store the value. > >

Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-29 Thread Marc Roos
nfs-ganesha-2.5.2-.el7.x86_64.rpm ^ Is this correct? -Original Message- From: Marc Roos Sent: Tuesday, 29 August 2017 11:40 To: amaredia; wooertim Cc: ceph-users Subject: Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7 Ali, Very, very nice! I was creating

Re: [ceph-users] Cephfs fsal + nfs-ganesha + el7/centos7

2017-08-29 Thread Marc Roos
Ali, Very, very nice! I was creating the RPMs based on an old RPM source spec. It was a hassle to get them to build, and I am not sure I even used the correct compile settings. -Original Message- From: Ali Maredia [mailto:amare...@redhat.com] Sent: Monday, 28 August 2017

Re: [ceph-users] CephFS: mount fs - single posing of failure

2017-08-29 Thread Oscar Segarra
Hi, thanks a lot... I apologize for my simple question! On 29 Aug 2017 at 1:38, "Michael Kuriger" wrote: > I use automount (/etc/auto.auto) > > > > Example: > > ceph-fstype=ceph,name=admin,secretfile=/etc/ceph/admin.secret,noatime >
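
For completeness, the same mount without automount, as a plain kernel-client example (monitor address, mount point and secret path are placeholders):

  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,noatime
  # or the equivalent /etc/fstab line:
  # mon1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0  0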

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-29 Thread Haomai Wang
On Tue, Aug 29, 2017 at 12:01 AM, Florian Haas wrote: > Sorry, I worded my questions poorly in the last email, so I'm asking > for clarification here: > > On Mon, Aug 28, 2017 at 6:04 PM, Haomai Wang wrote: >> On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas

Re: [ceph-users] State of play for RDMA on Luminous

2017-08-29 Thread Florian Haas
Sorry, I worded my questions poorly in the last email, so I'm asking for clarification here: On Mon, Aug 28, 2017 at 6:04 PM, Haomai Wang wrote: > On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas wrote: >> On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang
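
For reference, the experimental async+rdma messenger in Luminous is selected with a ceph.conf fragment roughly like the following (option names are taken from the RDMA messenger work and may differ by release; the device name is a placeholder, and the daemons also need their locked-memory limit raised):

  [global]
  ms_type = async+rdma
  ms_async_rdma_device_name = mlx4_0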