Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
The system is down again, saying it is missing the same stray7. 2017-10-25 11:24:29.736774 mds.0 [WRN] failed to reconnect caps for missing inodes: 2017-10-25 11:24:29.736779 mds.0 [WRN]  ino 100147160e6 2017-10-25 11:24:29.753665 mds.0 [ERR] dir 607 object missing on disk; some files

Re: [ceph-users] Why size=3

2017-10-25 Thread Brian Andrus
Apologies, corrected second link: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-March/016663.html On Wed, Oct 25, 2017 at 9:44 AM, Brian Andrus wrote: > Please see the following mailing list topics that have covered this topic > in detail: > > "2x

Re: [ceph-users] Why size=3

2017-10-25 Thread Brian Andrus
Please see the following mailing list topics that have covered this topic in detail: "2x replication: A BIG warning": https://www.spinics.net/lists/ceph-users/msg32915.html "replica questions": https://www.spinics.net/lists/ceph-users/msg32915.html On Wed, Oct 25, 2017 at 9:39 AM, Ian Bobbitt

Re: [ceph-users] Deep scrub distribution

2017-10-25 Thread Denes Dolhay
I think you are searching for this: "osd scrub sleep" http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/ Denes. On 10/25/2017 06:06 PM, Alejandro Comisario wrote: any comment on this one? Interesting what to do in this situation. On Wed, Jul 5, 2017 at 10:51 PM,
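
A minimal sketch of turning that knob at runtime (the 0.1 s value is only an illustration):

# sleep between scrub chunks to spread the load out (seconds)
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
# to persist it, set "osd scrub sleep = 0.1" under [osd] in ceph.conf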

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Bryan Stillwell
That helps a little bit, but overall the process would take years at this rate: # for i in {1..3600}; do ceph df -f json-pretty |grep -A7 '".rgw.buckets"' |grep objects; sleep 60; done "objects": 1660775838 "objects": 1660775733 "objects":
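
A slightly cleaner version of that monitoring loop, as a sketch (assumes jq is installed and the pool really is named .rgw.buckets; the JSON field names are from memory):

# sample the .rgw.buckets object count once a minute
while true; do
    ceph df --format json |
        jq '.pools[] | select(.name == ".rgw.buckets") | .stats.objects'
    sleep 60
done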

Re: [ceph-users] Infinite degraded objects

2017-10-25 Thread Christian Wuerdig
Well, there were a few bugs logged around upgrades which hit a similar assert, but those were supposedly fixed 2 years ago. Looks like Ubuntu 15.04 shipped Hammer (0.94.5) so presumably that's what you upgraded from. The current Jewel release is 10.2.10 - I don't know if the problem you're seeing is

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-25 Thread Russell Glaue
Thanks to all. I took the OSDs down in the problem host, without shutting down the machine. As predicted, our MB/s about doubled. Using this bench/atop procedure, I found two other OSDs on another host that are the next bottlenecks. Is this the only good way to really test the performance of the
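
For anyone repeating the procedure, a rough sketch (OSD ids are examples; watch atop/iostat on the OSD host while the bench runs):

# default bench writes 1 GiB in 4 MiB blocks; compare MB/s across OSDs
for id in 12 13 14; do
    echo "osd.$id"
    ceph tell osd.$id bench
done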

[ceph-users] web access breaks while 1 host reboot

2017-10-25 Thread Малков Пётр Викторович
Hi. 3 Jewel hosts. All pools min_size 2, size 3. All 3 RGWs are balanced by nginx. If I restart specific services it's OK, but when I reboot a host the web dashboard through the balancer gives back 502 - Web server received an invalid response while acting as a gateway or proxy server. While 2 hosts still
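
Without seeing the nginx config it is hard to say, but a minimal upstream sketch that fails over instead of returning 502 might look like this (hostnames, port and timeouts are assumptions):

upstream radosgw {
    server ceph01:7480 max_fails=2 fail_timeout=10s;
    server ceph02:7480 max_fails=2 fail_timeout=10s;
    server ceph03:7480 max_fails=2 fail_timeout=10s;
}
server {
    listen 80;
    location / {
        proxy_pass http://radosgw;
        # retry the next backend on connection errors or 502/503/504
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}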

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Yehuda Sadeh-Weinraub
On Wed, Oct 25, 2017 at 2:32 PM, Bryan Stillwell wrote: > That helps a little bit, but overall the process would take years at this > rate: > > > > # for i in {1..3600}; do ceph df -f json-pretty |grep -A7 '".rgw.buckets"' > |grep objects; sleep 60; done > >

Re: [ceph-users] How to increase the size of requests written to a ceph image

2017-10-25 Thread Maged Mokhtar
It depends on what stage you are in: in production, probably the best thing is to set up a monitoring tool (collectd/graphite/prometheus/grafana) to monitor both Ceph stats as well as resource load. This will, among other things, show you if you have slowing disks. Before production you should
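
If the cluster is already on Luminous, one low-effort way to feed Ceph stats into that stack (an assumption about the version; the Prometheus scrape config is not shown):

# expose cluster metrics on port 9283 for Prometheus to scrape
ceph mgr module enable prometheus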

Re: [ceph-users] Infinite degraded objects

2017-10-25 Thread Gonzalo Aguilar Delgado
Hi Christian, I've just upgraded to 10.2.10 and the problem still persists. Both: OSDs not starting (the most problematic now) and the wrong report of degraded objects: 20266198323226120/281736 objects degraded (7193329330730.229%). Any ideas about how to resolve the problem with the

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
Thanks for the information. I did: # ceph daemon mds.ceph-0 scrub_path / repair recursive Saw in the logs it finished # ceph daemon mds.ceph-0 flush journal Saw in the logs it finished #ceph mds fail 0 #ceph mds repaired 0 And it went back to missing stray7 again.  I added that back like we

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread Wido den Hollander
> On 25 October 2017 at 10:39, koukou73gr wrote: > > > On 2017-10-25 11:21, Wido den Hollander wrote: > > >> On 25 October 2017 at 5:58, Christian Sarrasin > >> wrote: > >> > >> The one thing I'm still wondering about is failure domains.

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Jason Dillaman
That log is showing that a snap remove request was made from a client that couldn't acquire the lock to a client that currently owns the lock. The client that currently owns the lock responded w/ an -ENOENT error that the snapshot doesn't exist. Depending on the maintenance operation requested,

Re: [ceph-users] MDS damaged

2017-10-25 Thread John Spray
Commands that start with "ceph daemon" take mds.<name> rather than a rank (notes on terminology here: http://docs.ceph.com/docs/master/cephfs/standby/). The name is how you would refer to the daemon from systemd; it's often set to the hostname where the daemon is running by default. John On Wed, Oct 25,
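
Tying that to the commands earlier in the thread, a hedged example assuming the daemon is named mds.ceph-0 and you run this on the host where it lives:

ceph daemon mds.ceph-0 scrub_path / repair recursive
ceph daemon mds.ceph-0 flush journal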

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
On 17-10-25 02:39 PM, Jason Dillaman wrote: That log is showing that a snap remove request was made from a client that couldn't acquire the lock to a client that currently owns the lock. The client that currently owns the lock responded w/ an -ENOENT error that the snapshot doesn't exist.

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Jason Dillaman
Thanks -- let me know. In the future, you may want to consider having librbd create an admin socket so that you can change (certain) settings or interact w/ the process w/o restarting it. On Wed, Oct 25, 2017 at 9:54 AM, Piotr Dałek wrote: > On 17-10-25 03:30 PM, Jason
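
A minimal sketch of what that could look like (socket path and option names as I recall them; treat this as an assumption rather than the canonical recipe):

# ceph.conf on the client, so librbd exposes an admin socket per process
[client]
    admin socket = /var/run/ceph/$cluster-$name.$pid.asok

# later, against the running process, e.g. to raise rbd debugging:
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config set debug_rbd 20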

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
John, thank you so much.  After doing the initial rados command you mentioned, it is back up and running.  It did complain about a bunch of files having duplicate inodes, which frankly are not important, but I will run those repair and scrub commands you mentioned and get it back clean again.

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Jason Dillaman
Hmm, hard to say off the top of my head. If you could enable "debug librbd = 20" logging on the buggy client that owns the lock, create a new snapshot, and attempt to delete it, it would be interesting to verify that the image is being properly refreshed. On Wed, Oct 25, 2017 at 9:23 AM, Piotr

Re: [ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
On 17-10-25 03:30 PM, Jason Dillaman wrote: Hmm, hard to say off the top of my head. If you could enable "debug librbd = 20" logging on the buggy client that owns the lock, create a new snapshot, and attempt to delete it, it would be interesting to verify that the image is being properly

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
I do have a problem with running the commands you mentioned to repair the mds: # ceph daemon mds.0 scrub_path admin_socket: exception getting command descriptions: [Errno 2] No such file or directory admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread Mark Nelson
On 10/25/2017 03:51 AM, Caspar Smit wrote: Hi, I've asked the exact same question a few days ago, same answer: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021708.html I guess we'll have to bite the bullet on this one and take this into account when designing. This is

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread Marc Roos
Hi Giang, Can I ask if you used the elrepo kernels? Because I tried these, but they are not booting, I think because of the mpt2sas/mpt3sas drivers. Regards, Marc -Original Message- From: GiangCoi Mr [mailto:ltrgian...@gmail.com] Sent: Wednesday, 25 October 2017 16:11 To:

Re: [ceph-users] Deep scrub distribution

2017-10-25 Thread Alejandro Comisario
any comment on this one ? interesting what to do in this situation On Wed, Jul 5, 2017 at 10:51 PM, Adrian Saul wrote: > > > During a recent snafu with a production cluster I disabled scrubbing and > deep scrubbing in order to reduce load on the cluster while

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread Jorge Pinilla López
https://github.com/ceph/ceph-iscsi-cli/issues/36 I have already asked for that; you can remove the check or download the 2.5 release, which doesn't check the OS. On 25/10/2017 at 17:02, Marc Roos wrote: > > > Hi Giang, > > Can I ask you if you used the elrepo kernels? Because I tried

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread GiangCoi Mr
Yes, I used elrepo to upgrade the kernel; I can boot and it shows kernel 4.x. What is the problem? Sent from my iPhone > On Oct 25, 2017, at 10:02 PM, Marc Roos wrote: > > > > Hi Giang, > > Can I ask you if you used the elrepo kernels? Because I tried these, but >

[ceph-users] s3 bucket permishions

2017-10-25 Thread nigel davies
Hey all, is it possible to set permissions on buckets? For example, if I have 2 users (user_a and user_b) and 2 buckets (bk_a and bk_b), I want to set permissions so user_a can only see bk_a and user_b can only see bk_b. I have been looking but can't see what I am after. Any advice would be

[ceph-users] iSCSI gateway for ceph

2017-10-25 Thread GiangCoi Mr
Hi all. I am researching Ceph for storage. I am using 3 VMs: ceph01, ceph02, ceph03. All VMs are using CentOS 7.4 with a 4.x kernel (I upgraded). Now I want to configure high-availability iSCSI with ceph-iscsi-cli. I installed ceph-iscsi-cli on ceph01. But when I create an iSCSI gateway by

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread Jorge Pinilla López
It does support CentOS and other distributions, but there is no way yet of checking for prerequisites, so the only way to automatically detect that they are met is by using Red Hat. Still, that is just a prerequisite check; you can delete it if you manually ensure they are met. :) Check the

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread GiangCoi Mr
So, do you have another solution to configure an iSCSI gateway for Ceph? Please help me. Sent from my iPhone > On Oct 25, 2017, at 10:17 PM, Jorge Pinilla López wrote: > > It does support for CentOS or other distributions but there is not avaible > yet a way of checking for pre

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread GiangCoi Mr
Hi Jorge Pinilla López. So, does that mean ceph-iscsi doesn't support CentOS now? Sent from my iPhone > On Oct 25, 2017, at 10:07 PM, Jorge Pinilla López wrote: > > https://github.com/ceph/ceph-iscsi-cli/issues/36 > > I have already asked for that, you can remove the check or

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Bryan Stillwell
We tried various options like the ones Ben mentioned to speed up the garbage collection process and were unsuccessful. Luckily, we had the ability to create a new cluster and move all the data that wasn't part of the POC which created our problem. One of the things we ran into was the

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread Jason Dillaman
I believe you can append "skipchecks" to the "create ceph01 192.168.101.151" action. The tools still expect to have a kernel that includes queue full timeout handling [1][2] which is awaiting upstream review. That was added to support some low, non-configurable timeouts in ESX environments when
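
In gwcli that would look roughly like this (prompt abbreviated; syntax from memory, so verify it against your ceph-iscsi-cli version):

/iscsi-target.../gateways> create ceph01 192.168.101.151 skipchecks=true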

Re: [ceph-users] s3 bucket permishions

2017-10-25 Thread David Turner
Are you talking about RGW buckets with limited permissions for cephx authentication? Or RGW buckets with limited permissions for RGW users? On Wed, Oct 25, 2017 at 12:16 PM nigel davies wrote: > Hay All > > is it possible to set permissions to buckets > > for example if i

Re: [ceph-users] OSD daemons active in nodes after removal

2017-10-25 Thread Michael Kuriger
When I do this, I reweight all of the OSDs I want to remove to 0 first, wait for the rebalance, then proceed to remove the OSDs. Doing it your way, you have to wait for the rebalance after removing each OSD one by one. Mike Kuriger Sr. Unix Systems Engineer 818-434-6195
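
As a sketch of that order of operations (osd.12 is just an example id):

# drain the OSD first so data only moves once
ceph osd crush reweight osd.12 0
# ...wait for the rebalance to finish, then remove it
ceph osd out 12
systemctl stop ceph-osd@12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12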

[ceph-users] Hammer to Jewel Upgrade - Extreme OSD Boot Time

2017-10-25 Thread Chris Jones
After upgrading from Ceph Hammer to Jewel, we are experiencing extremely long OSD boot durations. This long boot time is a huge concern for us, and we are looking for insight into how we can speed up the boot time. In Hammer, OSD boot time was approx. 3 minutes. After upgrading to Jewel, boot time

[ceph-users] OSD daemons active in nodes after removal

2017-10-25 Thread Karun Josy
Hello everyone! :) I have an interesting problem. For a few weeks, we've been testing Luminous in a cluster made up of 8 servers with about 20 SSD disks almost evenly distributed. It is running erasure coding. Yesterday, we decided to bring the cluster to a minimum of 8 servers and 1 disk

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-25 Thread Yehuda Sadeh-Weinraub
Some of the options there won't do much for you as they'll only affect newer object removals. I think the default number of gc objects is just inadequate for your needs. You can try manually running 'radosgw-admin gc process' concurrently (for the start 2 or 3 processes), see if it makes any dent
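
For example, something along these lines (the process count is arbitrary, and a larger rgw_gc_max_objs only takes effect after a restart and mostly helps future removals):

# run a few GC passes concurrently
for i in 1 2 3; do
    radosgw-admin gc process &
done
wait
# for the future, raise the GC shard count in ceph.conf, e.g.:
#   rgw gc max objs = 97     (default is 32)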

Re: [ceph-users] s3 bucket permishions

2017-10-25 Thread nigel davies
I am following a guide at the moment, but I believe it's RGW users. On 25 Oct 2017 5:29 pm, "David Turner" wrote: > Are you talking about RGW buckets with limited permissions for cephx > authentication? Or RGW buckets with limited permissions for RGW users? > > On Wed, Oct
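
Assuming these are plain RGW (S3) users, buckets are private to their owner by default, so the separation largely comes for free; deliberately sharing a bucket would be an ACL grant from the owner, e.g. with s3cmd (flags from memory; the grantee is the RGW uid):

# run as user_a, the owner of bk_a
s3cmd setacl s3://bk_a --acl-grant=read:user_b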

Re: [ceph-users] announcing ceph-helm (ceph on kubernetes orchestration)

2017-10-25 Thread Hans van den Bogert
Very interesting. I've been toying around with Rook.io [1]. Did you know of this project, and if so can you tell if ceph-helm and Rook.io have similar goals? Regards, Hans [1] https://rook.io/ On 25 Oct 2017 21:09, "Sage Weil" wrote: > There is a new repo under the ceph

[ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-25 Thread Stefan Priebe - Profihost AG
Hello, the Luminous release notes state that zstd is not supported by BlueStore due to performance reasons. I'm wondering about that, since btrfs states that zstd is as fast as lz4 but compresses as well as zlib. Why is zlib then supported by BlueStore? And why does btrfs / Facebook behave

Re: [ceph-users] Infinite degraded objects

2017-10-25 Thread Gonzalo Aguilar Delgado
Hello, I cannot tell what the previous version was since I used the one installed on Ubuntu 15.04 (now 16.04). But what I can tell is that I get errors from the ceph OSDs and mons from time to time. The mon problems are scary since I have to wipe the monitor and then reinstall a new one. I cannot

Re: [ceph-users] iSCSI gateway for ceph

2017-10-25 Thread Marc Roos
I could not get it to boot on CentOS 7 by just installing it. I think it is because it boots from mpt2sas and that driver is replaced with mpt3sas in >4.x kernels. I was even recreating the boot initrds, but could not quickly get it to run. -Original Message- From:

[ceph-users] announcing ceph-helm (ceph on kubernetes orchestration)

2017-10-25 Thread Sage Weil
There is a new repo under the ceph org, ceph-helm, which includes helm charts for deploying ceph on kubernetes. The code is based on the ceph charts from openstack-helm, but we've moved them into their own upstream repo here so that they can be developed more quickly and independently from
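
For the impatient, the flow looks roughly like the openstack-helm one (a sketch based on the repo's docs at the time; the namespace and overrides file are assumptions):

git clone https://github.com/ceph/ceph-helm
cd ceph-helm/ceph
helm serve &
helm repo add local http://localhost:8879/charts
make                      # package the charts into the local repo
helm install --name=ceph local/ceph --namespace=ceph -f ceph-overrides.yaml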

Re: [ceph-users] announcing ceph-helm (ceph on kubernetes orchestration)

2017-10-25 Thread Sage Weil
On Wed, 25 Oct 2017, Hans van den Bogert wrote: > Very interesting. I've been toying around with Rook.io [1]. Did you know of > this project, and if so can you tell if ceph-helm > and Rook.io have similar goals? Similar but a bit different. Probably the main difference is that ceph-helm aims to

Re: [ceph-users] ceph zstd not for bluestor due to performance reasons

2017-10-25 Thread Sage Weil
On Wed, 25 Oct 2017, Stefan Priebe - Profihost AG wrote: > Hello, > > in the lumious release notes is stated that zstd is not supported by > bluestor due to performance reason. I'm wondering why btrfs instead > states that zstd is as fast as lz4 but compresses as good as zlib. > > Why is zlib
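
For context, BlueStore compression is selectable per pool, along these lines (the pool name is an example; which algorithms are actually available depends on how the packages were built):

ceph osd pool set rbd compression_algorithm zlib      # or snappy / lz4 / zstd
ceph osd pool set rbd compression_mode aggressive     # none | passive | aggressive | force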

Re: [ceph-users] MDS damaged

2017-10-25 Thread danield
Hi Ronny, From the documentation, I thought this was the proper way to resolve the issue. Dan > On 24. okt. 2017 19:14, Daniel Davidson wrote: >> Our ceph system is having a problem. >> >> A few days ago we had a pg that was marked as inconsistent, and today I >> fixed it with a: >> >> #ceph

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread Wido den Hollander
> On 25 October 2017 at 5:58, Christian Sarrasin > wrote: > > > I'm planning to migrate an existing Filestore cluster with (SATA) > SSD-based journals fronting multiple HDD-hosted OSDs - should be a > common enough setup. So I've been trying to parse various
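
For reference, the layout being discussed is the one you get when pointing the DB at the shared SSD, e.g. with ceph-disk on Luminous (devices are examples):

# HDD holds the data; a partition on the SSD/NVMe holds RocksDB (and the WAL by default)
ceph-disk prepare --bluestore --block.db /dev/nvme0n1 /dev/sdb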

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread koukou73gr
On 2017-10-25 11:21, Wido den Hollander wrote: > >> On 25 October 2017 at 5:58, Christian Sarrasin >> wrote: >> >> The one thing I'm still wondering about is failure domains. With >> Filestore and SSD-backed journals, an SSD failure would kill writes but >> OSDs

Re: [ceph-users] MDS damaged

2017-10-25 Thread Ronny Aasen
On 24. okt. 2017 19:14, Daniel Davidson wrote: Our ceph system is having a problem. A few days ago we had a pg that was marked as inconsistent, and today I fixed it with a: #ceph pg repair 1.37c then a file was stuck as missing so I did a: #ceph pg 1.37c mark_unfound_lost delete pg has 1

[ceph-users] rbd rm snap on image with exclusive lock

2017-10-25 Thread Piotr Dałek
Hello, I have a 10.2.5 Ceph cluster, there is an image with exclusive lock that is being held by client. Some other client creates a snapshot on that image, then (that client) goes away. Later, third client attempts to remove that snapshot using rbd snap rm, but fails to do so without error:

Re: [ceph-users] Bluestore with SSD-backed DBs; what if the SSD fails?

2017-10-25 Thread Caspar Smit
Hi, I've asked the exact same question a few days ago, same answer: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021708.html I guess we'll have to bite the bullet on this one and take this into account when designing. Kind regards, Caspar 2017-10-25 10:39 GMT+02:00

Re: [ceph-users] UID Restrictions

2017-10-25 Thread Gregory Farnum
On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter wrote: > Hi Gregory, > > I did set the cephx caps for the client to: > > caps: [mds] allow r, allow rw uid=100026 path=/user, allow rw path=/project > So you’ve got three different permission granting clauses here: 1) allows the
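
For completeness, caps like those are applied along these lines (the client name and the mon/osd caps are assumptions for illustration; the mds cap string is the one quoted in the thread):

ceph auth caps client.keane \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_data' \
    mds 'allow r, allow rw uid=100026 path=/user, allow rw path=/project'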

Re: [ceph-users] Reported bucket size incorrect (Luminous)

2017-10-25 Thread Mark Schouten
I'm on Luminous with this cluster. I've seen that the cluster started cleaning up on sunday, which made the bucketsize shrink again. I've changed the garbagecollection settings to: rgw_gc_max_objs = 67 rgw_gc_obj_min_wait = 1800 rgw_gc_processor_max_time = 1800 rgw_gc_processor_period = 1800

Re: [ceph-users] pg inconsistent and repair doesn't work

2017-10-25 Thread Wei Jin
I found it is similar to this bug: http://tracker.ceph.com/issues/21388. And fixed it with a rados command. The pg inconsistent info is like the following; I wish it could be fixed in the future. root@n10-075-019:/var/lib/ceph/osd/ceph-27/current/1.fcd_head# rados list-inconsistent-obj 1.fcd --format=json-pretty
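
The inspection side of that, plus a guess at what the rados workaround looked like (pool and object names are placeholders, not values from the thread):

# see which shard/attribute is flagged as inconsistent
rados list-inconsistent-obj 1.fcd --format=json-pretty
# one manual workaround people have used for object-info mismatches:
# rewrite the object so its metadata is regenerated, then repair again
rados -p <pool> get <object> /tmp/obj
rados -p <pool> put <object> /tmp/obj
ceph pg repair 1.fcd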