[ceph-users] Ceph Volume Issue

2016-11-15 Thread Mehul1.Jani
Hi All, We have a Ceph storage cluster and it has been integrated with our OpenStack private cloud. We have created a pool for volumes, which allows our OpenStack private cloud users to create a volume from an image and boot from the volume. Additionally, our images (both Ubuntu 14.04 and CentOS 7) are in a
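
The workflow described (create a volume from an image, then boot from that volume) corresponds to something like the following OpenStack client commands; the image name, volume size, flavor, and network below are illustrative placeholders, not values from the original message.

    # Hypothetical boot-from-volume flow; names and the 20 GB size are made up.
    openstack volume create --image ubuntu-14.04 --size 20 --bootable boot-vol
    openstack server create --volume boot-vol --flavor m1.small --network private vm-from-volume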

[ceph-users] Fwd: iSCSI LUN issue after MON Out Of Memory

2016-11-15 Thread Daleep Singh Bais
Dear All, Any suggestion in this regard would be helpful. Thanks, Daleep Singh Bais Forwarded Message Subject: iSCSI LUN issue after MON Out Of Memory Date: Tue, 15 Nov 2016 11:58:07 +0530 From: Daleep Singh Bais To: ceph-users

[ceph-users] stalls caused by scrub on jewel

2016-11-15 Thread Sage Weil
Hi everyone, There was a regression in jewel that can trigger long OSD stalls during scrub. How long the stalls are depends on how many objects are in your PGs, how fast your storage device is, and what is cached, but in at least one case they were long enough that the OSD internal heartbeat

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-15 Thread Peter Maloney
On 11/15/16 22:13, Thomas Danan wrote: > Very interesting ... > > Any idea why optimal tunables would help here? I think there are some versions where it rebalances data a bunch to even things out... I don't know why I think that... where I read it or anything. Maybe it was only argonaut vs newer.

Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
I forgot to mention that we are running 2 of our 3 monitors in VMs on our OSD nodes. It's a small cluster with only two OSD nodes. The third monitor is on a VM on a separate host. It works well, but we made sure the OSDs had plenty of extra resources to accommodate the VMs and the host OS is

Re: [ceph-users] Ceph and container

2016-11-15 Thread Matt Taylor
I think you may need to re-evaluate your situation. If you aren't willing to spend the $ on 3 Dedicated Servers, is your platform big enough to warrant the need for Ceph? On 16/11/16 01:25, Matteo Dacrema wrote: Hi, does anyone ever tried to run ceph monitors in containers? Could it lead

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-15 Thread Peter Maloney
On 11/15/16 14:05, Thomas Danan wrote: > Hi Peter, > > Ceph cluster version is 0.94.5 and we are running with Firefly tunables and > also we have 10K PGs instead of the 30K / 40K we should have. > The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2 > > On our side we have the

Re: [ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-15 Thread Samuel Just
http://tracker.ceph.com/issues/17916 I just pushed a branch wip-17916-jewel based on v10.2.3 with some additional debugging. Once it builds, would you be able to start the afflicted osds with that version of ceph-osd and debug osd = 20 debug ms = 1 debug filestore = 20 and get me the log? -Sam
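
The quoted debug levels can be added to the [osd] section of ceph.conf before restarting the affected daemons with the wip-17916-jewel build, or injected into a running OSD; a minimal sketch (osd.12 is an illustrative id, not one from the thread):

    # Option 1: add to /etc/ceph/ceph.conf on the affected host, then restart the OSD:
    #   [osd]
    #       debug osd = 20
    #       debug ms = 1
    #       debug filestore = 20
    systemctl restart ceph-osd@12
    # Option 2: inject the same levels into an already-running daemon:
    ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1 --debug-filestore 20'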

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
I removed cephfs and its pools, created everything again using the default crush ruleset, which is for the HDD, and now ceph health is OK. I appreciate your help. Thank you very much. On Tue, Nov 15, 2016 at 11:48 AM Webert de Souza Lima wrote: > Right, thank you. > > On
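
For reference, the sequence described (remove the filesystem and its pools, then recreate them on the default ruleset) looks roughly like this on jewel; the pool names and PG counts are taken from earlier messages in the thread, the MDS must be stopped before ceph fs rm will succeed, and the assumption that ruleset 0 is the default HDD rule comes from the thread itself.

    # Sketch of the steps described, assuming the MDS daemon(s) are already stopped.
    ceph fs rm cephfs --yes-i-really-mean-it
    ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
    ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it

    # Recreate the pools (128 PGs, as in the original commands) on ruleset 0 and rebuild the fs.
    ceph osd pool create cephfs_metadata 128 128
    ceph osd pool create cephfs_data 128 128
    ceph osd pool set cephfs_metadata crush_ruleset 0
    ceph osd pool set cephfs_data crush_ruleset 0
    ceph fs new cephfs cephfs_metadata cephfs_data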

[ceph-users] Issues with RGW randomly restarting

2016-11-15 Thread John Rowe
Hello, We have 3 RGW servers set up with 5 OSDs. We have an application that is doing pretty steady writes, as well as a bunch of reads from that and other applications. Over the last week or so we have been seeing the app doing the writing getting blocked connections randomly, and in the RGW

Re: [ceph-users] Ceph and container

2016-11-15 Thread Tomasz Kuzemko
We are running all Ceph services inside LXC containers with XFS bind mounts, and have been for a few years; it works great. Additionally we use macvlan for networking, so each container has its own IP address without any NAT. As for Docker (and specifically aufs/overlay), I would advise testing for data

Re: [ceph-users] Ceph and container

2016-11-15 Thread Daniel Gryniewicz
In addition, Red Hat is shipping a containerized Ceph (all daemons, not just mons) as a tech preview in RHCS, and the plan is to support it going forward. We have not seen performance issues related to being containerized. It's based on the ceph-docker and ceph-ansible projects. Daniel On

Re: [ceph-users] Ceph and container

2016-11-15 Thread John Petrini
I've had lots of success running monitors in VMs. I never tried the container route, but there is a ceph-docker project https://github.com/ceph/ceph-docker if you want to give it a shot. I don't know how highly recommended that is, though; I've got no personal experience with it. No matter what you
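
For anyone who does want to try it, the ceph-docker examples of that era started a monitor roughly as below; the IP address, network, and bind mounts are placeholders, and the exact environment variables may differ between image versions.

    # Sketch based on the ceph-docker (ceph/daemon) examples; values are placeholders.
    docker run -d --net=host \
      -v /etc/ceph:/etc/ceph \
      -v /var/lib/ceph:/var/lib/ceph \
      -e MON_IP=192.168.0.20 \
      -e CEPH_PUBLIC_NETWORK=192.168.0.0/24 \
      ceph/daemon mon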

[ceph-users] Ceph and container

2016-11-15 Thread Matteo Dacrema
Hi, has anyone ever tried to run ceph monitors in containers? Could it lead to performance issues? Can I run monitor containers on the OSD nodes? I don’t want to buy 3 dedicated servers. Is there any other solution? Thanks Best regards Matteo Dacrema

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
Right, thank you. On this particular cluster it would be OK to have everything on the HDDs; there's no big traffic here. In order to do that, do I need to delete this cephfs, delete its pools, and create them again? After that I assume I would run ceph osd pool set cephfs_metadata crush_ruleset 0, as 0 is

[ceph-users] Best practices for using a ceph cluster with directories containing many entries

2016-11-15 Thread Hauke Homburg
Hello, we have set up a ceph cluster with 10.0.2.3 under CentOS 7. We have some directories with more than 100k entries. Unfortunately, we cannot reduce the entry count of those directories. We also don't want to run a ceph cluster with development features. We installed the jewel release

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Burkhard Linke
Hi, On 11/15/2016 01:55 PM, Webert de Souza Lima wrote: sure, as requested: *cephfs* was created using the following commands: ceph osd pool create cephfs_metadata 128 128 ceph osd pool create cephfs_data 128 128 ceph fs new cephfs cephfs_metadata cephfs_data *ceph.conf:*

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-15 Thread Thomas Danan
Hi Peter, Ceph cluster version is 0.94.5 and we are running with Firefly tunables, and also we have 10K PGs instead of the 30K / 40K we should have. The linux kernel version is 3.10.0-327.36.1.el7.x86_64 with RHEL 7.2. On our side we have the following settings: mon_osd_adjust_heartbeat_grace =
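
The list of overridden settings is cut off in the excerpt; the effective values on a running daemon can be read back through the admin socket, for example:

    # Read the effective heartbeat-related settings from running daemons
    # (osd.0 and the mon id here are illustrative).
    ceph daemon osd.0 config show | grep heartbeat
    ceph daemon mon.$(hostname -s) config get mon_osd_adjust_heartbeat_grace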

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
sure, as requested: *cephfs* was created using the following commands: ceph osd pool create cephfs_metadata 128 128 ceph osd pool create cephfs_data 128 128 ceph fs new cephfs cephfs_metadata cephfs_data *ceph.conf:* https://paste.debian.net/895841/ *# ceph osd crush

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-15 Thread Peter Maloney
Which kernel version are you using? I have a similar issue... Ubuntu 14.04, kernel 3.13.0-96-generic, and ceph jewel 10.2.3. I get logs like this: 2016-11-15 13:13:57.295067 osd.9 10.3.0.132:6817/24137 98 : cluster [WRN] 16 slow requests, 5 included below; oldest blocked for > 7.957045 secs I set
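
When an OSD logs slow requests like the line above, the blocked and recently completed operations can be inspected through that OSD's admin socket; a quick sketch using osd.9 from the quoted log line:

    # See what the slow requests on osd.9 are waiting on.
    ceph daemon osd.9 dump_ops_in_flight    # operations currently in flight and their age
    ceph daemon osd.9 dump_historic_ops     # recently completed, slowest operations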

[ceph-users] kernel versions and slow requests - WAS: Re: FW: Kernel 4.7 on OSD nodes

2016-11-15 Thread Peter Maloney
On 11/15/16 12:58, Оралов Алексей wrote: > Hello! > I have a problem with slow requests on kernel 4.4.0-45; rolled back all nodes to 4.4.0-42. > Ubuntu 16.04.1 LTS (Xenial Xerus) > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b) Can you describe

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Burkhard Linke
Hi, On 11/15/2016 01:27 PM, Webert de Souza Lima wrote: Not that I know of. On 5 other clusters it works just fine and the configuration is the same for all. On this cluster I was using only radosgw; cephfs was not in use but had already been created following our procedures. This

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
Not that I know of. On 5 other clusters it works just fine and the configuration is the same for all. On this cluster I was using only radosgw; cephfs was not in use but had already been created following our procedures. This happened right after mounting it. On Tue, Nov 15, 2016 at 10:24 AM

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread John Spray
On Tue, Nov 15, 2016 at 12:14 PM, Webert de Souza Lima wrote: > Hey John. > > Just to be sure; by "deleting the pools" you mean the cephfs_metadata and > cephfs_metadata pools, right? > Does it have any impact over radosgw? Thanks. Yes, I meant the cephfs pools. It

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
I'm sorry, I meant *cephfs_data* and *cephfs_metadata* On Tue, Nov 15, 2016 at 10:15 AM Webert de Souza Lima wrote: > Hey John. > > Just to be sure; by "deleting the pools" you mean the *cephfs_metadata* > and *cephfs_metadata* pools, right? > Does it have any impact

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
Hey John. Just to be sure; by "deleting the pools" you mean the *cephfs_metadata* and *cephfs_metadata* pools, right? Does it have any impact over radosgw? Thanks. On Tue, Nov 15, 2016 at 10:10 AM John Spray wrote: > On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima >

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread John Spray
On Tue, Nov 15, 2016 at 11:58 AM, Webert de Souza Lima wrote: > Hi, > > after running a cephfs on my ceph cluster I got stuck with the following > heath status: > > # ceph status > cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5 > health HEALTH_WARN > 128

Re: [ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
Also, I instructed all unclean PGs to repair and nothing happened. I did it like this: ~# for pg in `ceph pg dump_stuck unclean 2>&1 | grep -Po '[0-9]+\.[A-Za-z0-9]+'`; do ceph pg repair $pg; done On Tue, Nov 15, 2016 at 9:58 AM Webert de Souza Lima wrote: > Hi, > > after
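
The one-liner from the excerpt, expanded for readability; note that ceph pg repair addresses inconsistent objects and will not by itself fix PGs that are undersized or degraded because of a CRUSH rule problem, which is what this thread turned out to be about.

    # Issue a repair for every PG reported as stuck unclean (same logic as the one-liner).
    for pg in $(ceph pg dump_stuck unclean 2>&1 | grep -Po '[0-9]+\.[A-Za-z0-9]+'); do
        ceph pg repair "$pg"
    done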

[ceph-users] FW: Kernel 4.7 on OSD nodes

2016-11-15 Thread Оралов Алексей
Hello! I have a problem with slow requests on kernel 4.4.0-45; rolled back all nodes to 4.4.0-42. Ubuntu 16.04.1 LTS (Xenial Xerus), ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b). Оралов Алексей, Corporate Network Department and

[ceph-users] Can't recover pgs degraded/stuck unclean/undersized

2016-11-15 Thread Webert de Souza Lima
Hi, after running a cephfs on my ceph cluster I got stuck with the following health status: # ceph status cluster ac482f5b-dce7-410d-bcc9-7b8584bd58f5 health HEALTH_WARN 128 pgs degraded 128 pgs stuck unclean 128 pgs undersized recovery

Re: [ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-15 Thread Craig Chi
Hi, You can try to manually fix this by adding the /lib/systemd/system/ceph-mon.target file, which contains: === [Unit] Description=ceph target allowing to start/stop all ceph-mon@.service instances at once PartOf=ceph.target [Install]
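
The unit file is flattened in the excerpt; laid out as it would appear on disk it looks like the sketch below. The body of the [Install] section is truncated in the original mail, so the WantedBy line is an assumption rather than quoted text.

    # /lib/systemd/system/ceph-mon.target -- reconstructed from the quoted content;
    # the WantedBy line is an assumption, the excerpt cuts off after "[Install]".
    [Unit]
    Description=ceph target allowing to start/stop all ceph-mon@.service instances at once
    PartOf=ceph.target

    [Install]
    WantedBy=multi-user.target ceph.target

After creating the file, a systemctl daemon-reload followed by systemctl enable ceph-mon.target should make the monitor units start again at boot.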

Re: [ceph-users] Standby-replay mds: 10.2.2

2016-11-15 Thread John Spray
On Mon, Nov 14, 2016 at 11:35 PM, Goncalo Borges wrote: > Hi John... > > Thanks for replying. > > Some of the requested input is inline. > > Cheers > > Goncalo > > >>> >>> >>> We are currently undergoing an infrastructure migration. One of the first >>> machines to

Re: [ceph-users] ceph cluster having blocked requests very frequently

2016-11-15 Thread Thomas Danan
Hi Chris, We checked memory as well and we have plenty of free memory (12GB used / 125GB available) on each and every DN. We activated some debug logs yesterday and found many messages like: 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff9bdb42700' had timed out after

Re: [ceph-users] ceph-mon not starting on system startup (Ubuntu 16.04 / systemd)

2016-11-15 Thread Matthew Vernon
Hi, On 15/11/16 01:27, Craig Chi wrote: > What's your Ceph version? > I am using Jewel 10.2.3 and systemd seems to work normally. I deployed > Ceph by ansible, too. The version in Ubuntu 16.04, which is 10.2.2-0ubuntu0.16.04.2 > You can check whether you have

[ceph-users] Kernel 4.7 on OSD nodes

2016-11-15 Thread Nick Fisk
Hi All, Just a slight note of caution. I had been running the 4.7 kernel (with Ubuntu 16.04) on the majority of my OSD nodes, as when I installed the cluster there was that outstanding panic bug with the 4.4 kernel. I have been experiencing a lot of flapping OSDs every time the cluster was

[ceph-users] After OSD Flap - FAILED assert(oi.version == i->first)

2016-11-15 Thread Nick Fisk
Hi, I have two OSDs that are failing with an assert that looks related to missing objects. This happened after a large RBD snapshot was deleted, causing several OSDs to start flapping as they experienced high load. The cluster is fully recovered and I don't need any help from a recovery

Re: [ceph-users] Intermittent permission denied using kernel client with mds path cap

2016-11-15 Thread Henrik Korkuc
I filed http://tracker.ceph.com/issues/17858 recently; I am seeing this problem with 10.2.3 ceph-fuse, but maybe the kernel client is affected too. It is easy to replicate: just do a deep "mkdir -p", e.g. "mkdir -p 1/2/3/4/5/6/7/8/9/0/1/2/3/4/5/6/7/8/9" On 16-11-11 10:46, Dan van der Ster wrote: