Re: [ceph-users] pg remapped+peering forever and MDS trimming behind

2016-10-26 Thread Wido den Hollander
> Op 26 oktober 2016 om 20:44 schreef Brady Deetz : > > > Summary: > This is a production CephFS cluster. I had an OSD node crash. The cluster > rebalanced successfully. I brought the down node back online. Everything > has rebalanced except 1 hung pg and MDS trimming is now behind. No hardware

Re: [ceph-users] Dead pool recovery - Nightmare

2016-10-27 Thread Wido den Hollander
> Op 27 oktober 2016 om 11:23 schreef Ralf Zerres : > > > Hello community, > hello ceph developers, > > My name is Ralf, working as an IT consultant. In this particular case I support > a > German customer running a 2-node CEPH cluster. > > This customer is struggling with a disastrous situa

Re: [ceph-users] Dead pool recovery - Nightmare

2016-10-27 Thread Wido den Hollander
> Op 27 oktober 2016 om 11:46 schreef Ralf Zerres : > > > Here we go ... > > > > Wido den Hollander hat am 27. Oktober 2016 um 11:35 > > geschrieben: > > > > > > > > > Op 27 oktober 2016 om 11:23 schreef Ralf Zerres : >

Re: [ceph-users] Dead pool recovery - Nightmare

2016-10-27 Thread Wido den Hollander
Bringing back to the list > Op 27 oktober 2016 om 12:08 schreef Ralf Zerres : > > > > Wido den Hollander hat am 27. Oktober 2016 um 11:51 > > geschrieben: > > > > > > > > > Op 27 oktober 2016 om 11:46 schreef Ralf Zerres : > > > > >

Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-28 Thread Wido den Hollander
> Op 28 oktober 2016 om 13:18 schreef Kees Meijs : > > > Hi, > > On 28-10-16 12:06, w...@42on.com wrote: > > I don't like this personally. Your cluster should be capable of doing > > a deep scrub at any moment. If not it will also not be able to handle > > a node failure during peak times. > >

Re: [ceph-users] Deep scrubbing causes severe I/O stalling

2016-10-31 Thread Wido den Hollander
> the scheduler? E.g. CFQ for spinners _and_ noop for SSD? > Yes, CFQ for the spinners and noop for the SSD is good. The scrubbing doesn't touch the journal anyway. Wido > K. > > On 28-10-16 14:43, Wido den Hollander wrote: > >

Re: [ceph-users] Is straw2 bucket type working well?

2016-10-31 Thread Wido den Hollander
> Op 31 oktober 2016 om 11:33 schreef 한승진 : > > > Hi all, > > I tested straw / straw 2 bucket type. > > The Ceph document says below > > >- straw2 bucket type fixed several limitations in the original straw >bucket >- *the old straw buckets would change some mapping that should h

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander
> Op 26 oktober 2016 om 11:18 schreef Wido den Hollander : > > > > > Op 26 oktober 2016 om 10:44 schreef Sage Weil : > > > > > > On Wed, 26 Oct 2016, Dan van der Ster wrote: > > > On Tue, Oct 25, 2016 at 7:06 AM, Wido den Hollander wrote: > &g

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander
> Op 2 november 2016 om 14:30 schreef Sage Weil : > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > > > > > Op 26 oktober 2016 om 11:18 schreef Wido den Hollander : > > > > > > > > > > > > > Op 26 oktober 2016 om 10:44

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander
> Op 2 november 2016 om 15:06 schreef Sage Weil : > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > > > > > Op 2 november 2016 om 14:30 schreef Sage Weil : > > > > > > > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > >

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander
> Op 2 november 2016 om 16:00 schreef Sage Weil : > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > > > Op 2 november 2016 om 15:06 schreef Sage Weil : > > > > > > > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > > > &

Re: [ceph-users] All PGs are active+clean, still remapped PGs

2016-11-02 Thread Wido den Hollander
> Op 2 november 2016 om 16:21 schreef Sage Weil : > > > On Wed, 2 Nov 2016, Wido den Hollander wrote: > > > > > I'm pretty sure this is a race condition that got cleaned up as part > > > > > of > > > > > https://github.com/ce

[ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Wido den Hollander
Hi, After finally resolving the remapped PGs [0] I'm running into a problem where the MON stores are not trimming. health HEALTH_WARN noscrub,nodeep-scrub flag(s) set 1 mons down, quorum 0,1 1,2 mon.1 store is getting too big! 37115 MB >= 15360 MB
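
A minimal sketch of checking and compacting a monitor store while debugging this (monitor id and paths are examples; compaction only reclaims space from maps the monitors have already trimmed, so it does not fix stuck trimming itself):

```
# check the on-disk store size on each monitor (default path layout assumed)
du -sh /var/lib/ceph/mon/ceph-*/store.db

# ask one monitor to compact its store
ceph tell mon.1 compact

# or compact automatically at every daemon start by adding to ceph.conf:
#   [mon]
#   mon compact on start = true
```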

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Wido den Hollander
m when all PGs are active+clean. A cluster can go into WARN state for almost any reason, eg old CRUSH tunables. Will give it a try though. Wido > -- Dan > > > On Thu, Nov 3, 2016 at 10:40 AM, Wido den Hollander wrote: > > Hi, > > > > After finally resolving the remappe

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Wido den Hollander
> Op 3 november 2016 om 10:46 schreef Wido den Hollander : > > > > > Op 3 november 2016 om 10:42 schreef Dan van der Ster : > > > > > > Hi Wido, > > > > AFAIK mon's won't trim while a cluster is in HEALTH_WARN. Unset > >

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-03 Thread Wido den Hollander
> Op 3 november 2016 om 13:09 schreef Joao Eduardo Luis : > > > On 11/03/2016 09:40 AM, Wido den Hollander wrote: > > root@mon3:/var/lib/ceph/mon# ceph-monstore-tool ceph-mon3 dump-keys|awk > > '{print $1}'|uniq -c > > 96 auth > >

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-11-07 Thread Wido den Hollander
> Op 4 november 2016 om 2:05 schreef Joao Eduardo Luis : > > > On 11/03/2016 06:18 PM, w...@42on.com wrote: > > > >> Personally, I don't like this solution one bit, but I can't see any other > >> way without a patched monitor, or maybe ceph_monstore_tool. > >> > >> If you are willing to wait ti

Re: [ceph-users] Missing heartbeats, OSD spending time reconnecting - possible bug?

2016-11-11 Thread Wido den Hollander
> Op 11 november 2016 om 14:23 schreef Trygve Vea > : > > > Hi, > > We recently experienced a problem with a single OSD. This occurred twice. > > The problem manifested itself thus: > > - 8 placement groups stuck peering, all of which had the problematic OSD as > one of the acting OSDs in

Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Wido den Hollander
> Op 26 november 2016 om 5:13 schreef "Andrus, Brian Contractor" > : > > > Hmm. Apparently download.ceph.com = us-west.ceph.com > And there is no repomd.xml on us-east.ceph.com > You could check http://us-east.ceph.com/timestamp to see how far behind it is on download.ceph.com For what repo

[ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
Hi, As a Ceph consultant I get numerous calls throughout the year to help people with getting their broken Ceph clusters back online. The causes of downtime vary vastly, but one of the biggest causes is that people use replication 2x. size = 2, min_size = 1. In 2016 the amount of cases I have
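
For reference, a hedged example of checking and raising the replication settings on an existing pool (the pool name 'rbd' is just a placeholder; raising size triggers backfill and needs the extra raw capacity):

```
ceph osd pool get rbd size
ceph osd pool get rbd min_size

# go to 3 copies, and require at least 2 of them for client I/O
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2
```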

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
ts, but it will be mainly useful in a flapping case where an OSD might have outdated data, but that's still better than nothing there. Wido > Cheers, Dan > > P.S. we're going to retry erasure coding for this cluster in 2017, > because clearly 4+2 or similar would be much

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
do not want to lose. Wido > Thanks! > > Regards, > Kees > > On 07-12-16 09:08, Wido den Hollander wrote: > > As a Ceph consultant I get numerous calls throughout the year to help > > people with getting their broken Ceph clusters back online. > > > >

Re: [ceph-users] 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
> Op 7 december 2016 om 15:54 schreef LOIC DEVULDER : > > > Hi Wido, > > > As a Ceph consultant I get numerous calls throughout the year to help people > > with getting their broken Ceph clusters back online. > > > > The causes of downtime vary vastly, but one of the biggest causes is that > >

[ceph-users] CephFS recovery from missing metadata objects questions

2016-12-07 Thread Wido den Hollander
(I think John knows the answer, but sending to ceph-users for archival purposes) Hi John, A Ceph cluster lost a PG with CephFS metadata in there and it is currently doing a CephFS disaster recovery as described here: http://docs.ceph.com/docs/master/cephfs/disaster-recovery/ This data pool has
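
The referenced disaster-recovery document boils down to roughly the sequence below (a sketch only; the pool name 'cephfs_data' is a placeholder, the tools are destructive, and the exact steps should be taken from the doc matching your release):

```
# keep a copy of whatever is left of the metadata journal first
cephfs-journal-tool journal export /root/mds-journal-backup.bin

# rebuild metadata from the contents of the data pool
cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data
```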

Re: [ceph-users] CephFS recovery from missing metadata objects questions

2016-12-07 Thread Wido den Hollander
> Op 7 december 2016 om 16:38 schreef John Spray : > > > On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander wrote: > > (I think John knows the answer, but sending to ceph-users for archival > > purposes) > > > > Hi John, > > > > A Ceph cluster l

Re: [ceph-users] CephFS recovery from missing metadata objects questions

2016-12-07 Thread Wido den Hollander
> Op 7 december 2016 om 16:53 schreef John Spray : > > > On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander wrote: > > > >> Op 7 december 2016 om 16:38 schreef John Spray : > >> > >> > >> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollan

Re: [ceph-users] CephFS recovery from missing metadata objects questions

2016-12-07 Thread Wido den Hollander
> Op 7 december 2016 om 20:54 schreef John Spray : > > > On Wed, Dec 7, 2016 at 7:47 PM, Wido den Hollander wrote: > > > >> Op 7 december 2016 om 16:53 schreef John Spray : > >> > >> > >> On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollan

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
t it doesn't have the changes which #3 had. The result is corrupted data. Does this make sense? Wido > On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER" > wrote: > > > -Message d'origine- > > De : Wido den Hollander [

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
on. Without doing anything the PG will be marked as down+incomplete Wido > > Mit freundlichen Grüßen / best regards, > Kevin Olbrich. > > 2016-12-07 21:10 GMT+01:00 Wido den Hollander : > > > > > > Op 7 december 2016 om 21:04 schreef "Will.Boege" >

Re: [ceph-users] [EXTERNAL] Re: 2x replication: A BIG warning

2016-12-07 Thread Wido den Hollander
oss situations and that is why I started this thread in the first place. min_size is just an additional protection mechanism against data loss. Wido > I guess it’s just where you want to put that needle on the spectrum of > availability vs integrity. > > On 12/7/16, 2:10 PM, &quo

Re: [ceph-users] 2x replication: A BIG warning

2016-12-11 Thread Wido den Hollander
> Op 9 december 2016 om 22:31 schreef Oliver Humpage : > > > > > On 7 Dec 2016, at 15:01, Wido den Hollander wrote: > > > > I would always run with min_size = 2 and manually switch to min_size = 1 if > > the situation really requires it at that moment. &g

Re: [ceph-users] Crush rule check

2016-12-12 Thread Wido den Hollander
> Op 10 december 2016 om 12:45 schreef Adrian Saul > : > > > > Hi Ceph-users, > I just want to double check a new crush ruleset I am creating - the intent > here is that over 2 DCs, it will select one DC, and place two copies on > separate hosts in that DC. The pools created on this will
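
Such a rule can be sanity-checked offline with crushtool before any pool uses it (rule id 1 and 2 replicas are assumptions here):

```
# fetch and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# dry-run the rule and inspect which OSDs it would pick
crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-mappings
crushtool -i crushmap.bin --test --rule 1 --num-rep 2 --show-statistics
```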

Re: [ceph-users] Crush rule check

2016-12-12 Thread Wido den Hollander
> > > -Original Message- > > From: Wido den Hollander [mailto:w...@42on.com] > > Sent: Monday, 12 December 2016 7:07 PM > > To: ceph-users@lists.ceph.com; Adrian Saul > > Subject: Re: [ceph-users] Crush rule check > > > > > > > Op 10

Re: [ceph-users] Upgrading from Hammer

2016-12-13 Thread Wido den Hollander
> Op 13 december 2016 om 9:05 schreef Kees Meijs : > > > Hi guys, > > In the past few months, I've read some posts about upgrading from > Hammer. Maybe I've missed something, but I didn't really read something > on QEMU/KVM behaviour in this context. > > At the moment, we're using: > > > $ qe

[ceph-users] cannot commit period: period does not have a master zone of a master zonegroup

2016-12-15 Thread Wido den Hollander
Hi, On a Ceph cluster running Jewel 10.2.5 I'm running into a problem. I want to change the amount of shards: # radosgw-admin zonegroup-map get > zonegroup.json # nano zonegroup.json # radosgw-admin zonegroup-map set --infile zonegroup.json # radosgw-admin period update --commit Now, the error

Re: [ceph-users] Monitors stores not trimming after upgrade from Dumpling to Hammer

2016-12-15 Thread Wido den Hollander
> Op 7 november 2016 om 13:17 schreef Wido den Hollander : > > > > > Op 4 november 2016 om 2:05 schreef Joao Eduardo Luis : > > > > > > On 11/03/2016 06:18 PM, w...@42on.com wrote: > > > > > >> Personally, I don't like this sol

Re: [ceph-users] ceph and rsync

2016-12-16 Thread Wido den Hollander
> Op 16 december 2016 om 9:26 schreef Alessandro Brega > : > > > Hi guys, > > I'm running a ceph cluster using 0.94.9-1trusty release on XFS for RBD > only. I'd like to replace some SSDs because they are close to their TBW. > > I know I can simply shutdown the OSD, replace the SSD, restart th
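
With FileStore (as on this 0.94 cluster) the journal device can be swapped without rebuilding the OSD; a sketch, assuming OSD id 12, an Ubuntu Trusty/upstart host and a recreated journal partition reachable at the same path or symlink:

```
ceph osd set noout
stop ceph-osd id=12                 # systemctl stop ceph-osd@12 on systemd hosts

ceph-osd -i 12 --flush-journal      # write out everything still in the journal
# ... physically replace the SSD and recreate the journal partition/symlink ...
ceph-osd -i 12 --mkjournal          # initialize the new journal

start ceph-osd id=12
ceph osd unset noout
```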

Re: [ceph-users] CephFS metdata inconsistent PG Repair Problem

2016-12-19 Thread Wido den Hollander
> Op 19 december 2016 om 18:14 schreef Sean Redmond : > > > Hi Ceph-Users, > > I have been running into a few issue with cephFS metadata pool corruption > over the last few weeks, For background please see > tracker.ceph.com/issues/17177 > > # ceph -v > ceph version 10.2.5 (c461ee19ecbc0c5c330

Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-20 Thread Wido den Hollander
> Op 20 december 2016 om 0:52 schreef Francois Lafont > : > > > Hi, > > On 12/19/2016 09:58 PM, Ken Dreyer wrote: > > > I looked into this again on a Trusty VM today. I set up a single > > mon+osd cluster on v10.2.3, with the following: > > > > # status ceph-osd id=0 > > ceph-osd (ceph/0

Re: [ceph-users] How exactly does rgw work?

2016-12-20 Thread Wido den Hollander
> Op 20 december 2016 om 3:24 schreef Gerald Spencer : > > > Hello all, > > We're currently waiting on a delivery of equipment for a small 50TB proof > of concept cluster, and I've been lurking/learning a ton from you. Thanks > for how active everyone is. > > Question(s): > How does the raids

Re: [ceph-users] Upgrading from Hammer

2016-12-20 Thread Wido den Hollander
ammer/Jewel client can talk to a Hammer/Jewel cluster. One thing, don't change any CRUSH tunables if the cluster runs Jewel and the client is still on Hammer. The librados/librbd version is what matters. If you upgrade the cluster to Jewel and leave the client on Hammer it works. Wido

Re: [ceph-users] Upgrading from Hammer

2016-12-20 Thread Wido den Hollander
e we really didn't? > If you didn't touch them nothing happened. You can download the CRUSHMap and check the tunables set on top after decompiling it. See: http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables Wido > Regards, > Kees > > On 20-12-16 10:1

Re: [ceph-users] cannot commit period: period does not have a master zone of a master zonegroup

2016-12-20 Thread Wido den Hollander
0 0 error in read_id for id : (2) No such file or directory 2016-12-20 16:38:07.960035 7f9571697a00 0 error in read_id for id : (2) No such file or directory Brought me to: - http://tracker.ceph.com/issues/15776 - https://github.com/ceph/ceph/pull/8994 Doesn't seem to be backported to 10

Re: [ceph-users] tracker.ceph.com

2016-12-20 Thread Wido den Hollander
> Op 20 december 2016 om 17:31 schreef Nathan Cutler : > > > > Looks like it was trying to send mail over IPv6 and failing. > > > > I switched back to postfix, disabled IPv6, and show a message was > > recently queued for delivery to you. Please confirm you got it. > > Got it. Thanks for the f

Re: [ceph-users] Unwanted automatic restart of daemons during an upgrade since 10.2.5 (on Trusty)

2016-12-20 Thread Wido den Hollander
> Op 20 december 2016 om 17:13 schreef Francois Lafont > : > > > On 12/20/2016 10:02 AM, Wido den Hollander wrote: > > > I think it is commit 0cdf3bc875447c87fdc0fed29831554277a3774b: > > https://github.com/ceph/ceph/commit/0cdf3bc875447c87fdc0fed29831554277

Re: [ceph-users] When Zero isn't 0 (Crush weight mysteries)

2016-12-21 Thread Wido den Hollander
> Op 21 december 2016 om 2:39 schreef Christian Balzer : > > > > Hello, > > I just (manually) added 1 OSD each to my 2 cache-tier nodes. > The plan was/is to actually do the data-migration on the least busy day > in Japan, New Years (the actual holiday is January 2nd this year). > > So I

Re: [ceph-users] cannot commit period: period does not have a master zone of a master zonegroup

2016-12-22 Thread Wido den Hollander
> Op 20 december 2016 om 18:06 schreef Orit Wasserman : > > > On Tue, Dec 20, 2016 at 5:39 PM, Wido den Hollander wrote: > > > >> Op 15 december 2016 om 17:10 schreef Orit Wasserman : > >> > >> > >> Hi Wido, > >> > >&

Re: [ceph-users] What is pauserd and pausewr status?

2016-12-22 Thread Wido den Hollander
> Op 22 december 2016 om 17:55 schreef Stéphane Klein > : > > > Hi, > > I have this status: > > bash-4.2# ceph status > cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac > health HEALTH_WARN > pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set > monmap e1: 3 mons
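
Those two flags are normally set and cleared together through the single 'pause' flag; a short example (this stops or resumes all client I/O, so use with care):

```
ceph osd dump | grep flags      # shows pauserd,pausewr while paused
ceph osd unset pause            # clears both pauserd and pausewr, client I/O resumes
ceph osd set pause              # sets both again if you really want I/O stopped
```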

Re: [ceph-users] What is pauserd and pausewr status?

2016-12-23 Thread Wido den Hollander
> Op 23 december 2016 om 10:31 schreef Stéphane Klein > : > > > 2016-12-22 18:09 GMT+01:00 Wido den Hollander : > > > > > > Op 22 december 2016 om 17:55 schreef Stéphane Klein < > > cont...@stephane-klein.info>: > > > > > > >

Re: [ceph-users] BlueStore with v11.1.0 Kraken

2016-12-23 Thread Wido den Hollander
> Op 22 december 2016 om 14:36 schreef Eugen Leitl : > > > Hi guys, > > I'm building a first test cluster for my homelab, and would like to start > using BlueStore since data loss is not critical. However, there is > obviously no official documentation for basic best usage online yet. > True, s

Re: [ceph-users] rgw leaking data, orphan search loop

2016-12-23 Thread Wido den Hollander
> Op 22 december 2016 om 19:00 schreef Orit Wasserman : > > > HI Maruis, > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas > wrote: > > On Thu, Dec 22, 2016 at 11:58 AM, Marius Vaitiekunas > > wrote: > >> > >> Hi, > >> > >> 1) I've written before into mailing list, but one more time. W

Re: [ceph-users] BlueStore with v11.1.0 Kraken

2016-12-24 Thread Wido den Hollander
> Op 23 december 2016 om 14:34 schreef Eugen Leitl : > > > > Hi Wido, > > thanks for your comments. > > On Fri, Dec 23, 2016 at 02:00:44PM +0100, Wido den Hollander wrote: > > > > My original layout was using 2x single Xeon nodes with 24 GB RAM e

Re: [ceph-users] rgw leaking data, orphan search loop

2016-12-24 Thread Wido den Hollander
> Op 23 december 2016 om 16:05 schreef Wido den Hollander : > > > > > Op 22 december 2016 om 19:00 schreef Orit Wasserman : > > > > > > HI Maruis, > > > > On Thu, Dec 22, 2016 at 12:00 PM, Marius Vaitiekunas > > wrote: > > > On

Re: [ceph-users] Atomic Operations?

2016-12-24 Thread Wido den Hollander
> Op 23 december 2016 om 21:14 schreef Kent Borg : > > > Hello, a newbie here! > > Doing some playing with Python and librados, and it is mostly easy to > use, but I am confused about atomic operations. The documentation isn't > clear to me, and Google isn't giving me obvious answers either..

Re: [ceph-users] Java librados issue

2016-12-27 Thread Wido den Hollander
> Op 26 december 2016 om 19:24 schreef Bogdan SOLGA : > > > Hello, everyone! > > I'm trying to integrate the Java port of librados > into our app, using this > > sample as a reference.

Re: [ceph-users] Java librados issue

2016-12-27 Thread Wido den Hollander
age/storage_backend_rbd.c;h=b1c51ab1b472f6bb599e8ac0c22cf7d0d2e1949a;hb=HEAD#l90 it doesn't read a config file. Wido > Thanks, > Bogdan > > > On Tue, Dec 27, 2016 at 3:11 PM, Wido den Hollander wrote: > > > > > > Op 26 december 2016 om 19:24 schreef Bogd

Re: [ceph-users] How to know if an object is stored in clients?

2016-12-29 Thread Wido den Hollander
> Op 28 december 2016 om 12:58 schreef Jaemyoun Lee : > > > Hello, > > I executed the RADOS tool to store an object as follows: > ``` > user@ClientA:~$ rados put -p=rbd objectA a.txt > ``` > > I wonder how the client knows that storing the object on the > OSDs has completed. > When the primary

Re: [ceph-users] Unbalanced OSD's

2016-12-30 Thread Wido den Hollander
> Op 30 december 2016 om 11:06 schreef Kees Meijs : > > > Hi Asley, > > We experience (using Hammer) a similar issue. Not that I have a perfect > solution to share, but I felt like mentioning a "me too". ;-) > > On a side note: we configured correct weight per drive as well. > Ceph will neve

Re: [ceph-users] linux kernel version for clients

2016-12-31 Thread Wido den Hollander
> Op 31 december 2016 om 6:56 schreef Manuel Sopena Ballesteros > : > > > Hi, > > I have several questions regarding kernel running on client machines: > > > * Why is kernel 3.10 considered an old kernel to run ceph clients? > Development in the Ceph world goes fast and 3.10 is a o

Re: [ceph-users] Migrate cephfs metadata to SSD in running cluster

2017-01-02 Thread Wido den Hollander
> Op 2 januari 2017 om 10:33 schreef Shinobu Kinjo : > > > I've never done migration of cephfs_metadata from spindle disks to > ssds. But logically you could achieve this through 2 phases. > > #1 Configure CRUSH rule including spindle disks and ssds > #2 Configure CRUSH rule for just pointing
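
A sketch of phase #2 with the Jewel-era syntax, assuming an SSD-only root (here called 'ssd-root') already exists in the CRUSH map and the metadata pool is named 'cephfs_metadata' (Luminous and newer use 'crush_rule <name>' instead of 'crush_ruleset <id>'):

```
# create a simple rule that places replicas on separate hosts under the SSD root
ceph osd crush rule create-simple ssd-metadata ssd-root host firstn

# look up the rule id and point the metadata pool at it
ceph osd crush rule dump ssd-metadata
ceph osd pool set cephfs_metadata crush_ruleset <rule-id>
```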

Re: [ceph-users] Cluster pause - possible consequences

2017-01-02 Thread Wido den Hollander
> Op 2 januari 2017 om 15:43 schreef Matteo Dacrema : > > > Increasing pg_num will lead to several slow requests and cluster freeze, but > due to creating pgs operation , for what I’ve seen until now. > During the creation period all the request are frozen , and the creation > period take a l

Re: [ceph-users] Migrate cephfs metadata to SSD in running cluster

2017-01-03 Thread Wido den Hollander
's not that much data thus recovery will go quickly, but don't expect a CephFS performance improvement. Wido > Mike > > On 1/2/17 11:50 AM, Wido den Hollander wrote: > > > >> Op 2 januari 2017 om 10:33 schreef Shinobu Kinjo : > >> > >> >

Re: [ceph-users] Ceph all-possible configuration options

2017-01-03 Thread Wido den Hollander
> Op 3 januari 2017 om 13:05 schreef Rajib Hossen > : > > > Hello, > I am exploring ceph and installed a mini cluster with 1 mon, 3 osd node(3 > osd daemon each node). For that I wrote a ceph.conf file with only needed > configuration options(see below) > > fsid = > mon initial members = host
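
For comparison, a minimal ceph.conf sketch with only the options that usually need setting explicitly (all values are placeholders; everything not listed can normally stay at its default):

```
[global]
fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
mon initial members = host1, host2, host3
mon host = 10.0.0.1, 10.0.0.2, 10.0.0.3
public network = 10.0.0.0/24
cluster network = 10.0.1.0/24          # optional, replication traffic only
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min size = 2
```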

Re: [ceph-users] RGW pool usage is higher that total bucket size

2017-01-05 Thread Wido den Hollander
> Op 5 januari 2017 om 10:08 schreef Luis Periquito : > > > Hi, > > I have a cluster with RGW in which one bucket is really big, so every > so often we delete stuff from it. > > That bucket is now taking 3.3T after we deleted just over 1T from it. > That was done last week. > > The pool (.rgw

Re: [ceph-users] Write back cache removal

2017-01-09 Thread Wido den Hollander
> Op 9 januari 2017 om 13:02 schreef Stuart Harland > : > > > Hi, > > We’ve been operating a ceph storage system storing files using librados > (using a replicated pool on rust disks). We implemented a cache over the top > of this with SSDs, however we now want to turn this off. > > The doc
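
The documented removal sequence for a writeback cache tier looks roughly like this (pool names are placeholders; newer releases may want 'proxy' instead of 'forward' and/or a force flag, so check the docs for your version):

```
# stop new writes from landing in the cache, then flush and evict everything
ceph osd tier cache-mode cachepool forward
rados -p cachepool cache-flush-evict-all

# detach the cache tier from the base pool
ceph osd tier remove-overlay basepool
ceph osd tier remove basepool cachepool
```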

Re: [ceph-users] Write back cache removal

2017-01-10 Thread Wido den Hollander
> Op 10 januari 2017 om 9:52 schreef Nick Fisk : > > > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > > Wido den Hollander > > Sent: 10 January 2017 07:54 > > To: ceph new ; Stuart Harland &

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-06 Thread Wido den Hollander
On 07/05/2018 03:36 PM, John Spray wrote: > On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS) wrote: >> >> Hi list, >> >> I have a serious problem now... I think. >> >> One of my users just informed me that a file he created (.doc file) has >> a different content then before. It looks like the

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-06 Thread Wido den Hollander
On 07/06/2018 01:47 PM, John Spray wrote: > On Fri, Jul 6, 2018 at 12:19 PM Wido den Hollander wrote: >> >> >> >> On 07/05/2018 03:36 PM, John Spray wrote: >>> On Thu, Jul 5, 2018 at 1:42 PM Dennis Kramer (DBS) wrote: >>>> >>>>

[ceph-users] Mimic 13.2.1 release date

2018-07-09 Thread Wido den Hollander
Hi, Is there a release date for Mimic 13.2.1 yet? There are a few issues which currently make deploying with Mimic 13.2.0 a bit difficult, for example: - https://tracker.ceph.com/issues/24423 - https://github.com/ceph/ceph/pull/22393 Especially the first one makes it difficult. 13.2.1 would be

Re: [ceph-users] CephFS - How to handle "loaded dup inode" errors

2018-07-11 Thread Wido den Hollander
is? Wido > > Cheers, > > Linh > > > > > *From:* John Spray > *Sent:* Tuesday, 10 July 2018 7:11 PM > *To:* Linh Vu > *Cc:* Wido den Hollander; ceph-users@lists.ceph.com > *Subject:* Re: [ceph

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:02 AM, ceph.nov...@habmalnefrage.de wrote: > anyone with "mgr Zabbix enabled" migrated from Luminous (12.2.5 or 5) and has > the same problem in Mimic now? > if I disable and re-enable the "zabbix" module, the status is "HEALTH_OK" for > some sec. and changes to "HEALTH_WARN"

Re: [ceph-users] SSDs for data drives

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:10 AM, Robert Stanford wrote: > >  In a recent thread the Samsung SM863a was recommended as a journal > SSD.  Are there any recommendations for data SSDs, for people who want > to use just SSDs in a new Ceph cluster? > Depends on what you are looking for, SATA, SAS3 or NVMe?

Re: [ceph-users] mimic (13.2.0) and "Failed to send data to Zabbix"

2018-07-11 Thread Wido den Hollander
On 07/11/2018 10:22 AM, ceph.nov...@habmalnefrage.de wrote: > at about the same time we also updated the Linux OS via "YUM" to: > > # more /etc/redhat-release > Red Hat Enterprise Linux Server release 7.5 (Maipo) > > > > from the given error message, it seems like there are 32 "measure points

Re: [ceph-users] Safe to use rados -p rbd cleanup?

2018-07-16 Thread Wido den Hollander
On 07/15/2018 11:12 AM, Mehmet wrote: > hello guys, > > in my production cluster i've many objects like this > > "#> rados -p rbd ls | grep 'benchmark'" > ... .. . > benchmark_data_inkscope.example.net_32654_object1918 > benchmark_data_server_26414_object1990 > ... .. . > > Is it safe to run
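
A cautious way to get rid of them is to delete by prefix explicitly; the bulk 'cleanup' subcommand exists for rados bench leftovers as well (the --prefix form is an assumption, check 'rados --help' on your version first):

```
# delete leftover rados bench objects one by one, by prefix
rados -p rbd ls | grep '^benchmark_data' | while read obj; do
    rados -p rbd rm "$obj"
done

# or, if supported by your rados binary
rados -p rbd cleanup --prefix benchmark_data
```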

Re: [ceph-users] Mimic 13.2.1 release date

2018-07-23 Thread Wido den Hollander
Any news on this yet? 13.2.1 would be very welcome! :-) Wido On 07/09/2018 05:11 PM, Wido den Hollander wrote: > Hi, > > Is there a release date for Mimic 13.2.1 yet? > > There are a few issues which currently make deploying with Mimic 13.2.0 > a bit difficult, for exa

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Wido den Hollander
On 07/24/2018 12:51 PM, Mateusz Skala (UST, POL) wrote: > Hello again, > > How can I determine $cctid for specific rbd name? Or is there any good > way to map admin-socket with rbd? > Yes, check the output of 'perf dump', you can fetch the RBD image information from that JSON output. Wido >
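
A sketch of wiring that up: give librbd clients an admin socket whose name contains $pid and $cctid, then query it (paths and the exact JSON section names are assumptions, check your own output):

```
# on the client/hypervisor, in ceph.conf:
#   [client]
#   admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

ls /var/run/ceph/*.asok
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.94857362.asok perf dump \
    | python -m json.tool | grep -A 2 librbd
```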

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
On 07/26/2018 10:12 AM, Benjamin Naber wrote: > Hi together, > > we currently have some problems with monitor quorum after shutting down all > cluster nodes for migration to another location. > > mon_status gives uns the following outputt: > > { > "name": "mon01", > "rank": 0, > "state":

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
o get some more information about the Messenger Traffic. Wido > kind regards > > Ben > >> Wido den Hollander hat am 26. Juli 2018 um 10:18 > geschrieben: >> >> >> >> >> On 07/26/2018 10:12 AM, Benjamin Naber wrote: >> > Hi together, >> >

Re: [ceph-users] Fwd: Mons stucking in election afther 3 Days offline

2018-07-26 Thread Wido den Hollander
_FOOTER_AND_DISPATCH pgs=74 cs=1 l=1). rx > client.? seq 1 0x55aa46be4fc0 auth(proto 0 30 bytes epoch 0) v1 > 2018-07-26 11:46:24.004914 7f819e167700 10 -- 10.111.73.1:6789/0 >> > 10.111.73.3:0/1033315403 conn(0x55aa46bc1000 :6789 s=STATE_OPEN pgs=74 > cs=1 l=1).handle_write >

Re: [ceph-users] Mimi Telegraf plugin on Luminous

2018-07-31 Thread Wido den Hollander
On 07/31/2018 09:38 AM, Denny Fuchs wrote: > hi, > > I try to get the Telegraf plugin from Mimic on Luminous running (Debian > Stretch). I copied the files from the Git into > /usr/lib/ceph/mgr/telegraf; enabled the plugin and get: > > > 2018-07-31 09:25:46.501858 7f496cfc9700 -1 log_channel(c

Re: [ceph-users] Run ceph-rest-api in Mimic

2018-08-01 Thread Wido den Hollander
On 08/01/2018 12:00 PM, Ha, Son Hai wrote: > Hello everybody! > > Because some of my applications depend on the obsolete > ceph-rest-api module, I would like to know if there is a way to run it > in Mimic? If I understood correctly, the new restful plugin > (http://docs.ceph.com/d
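
The mgr restful plugin is enabled per cluster rather than run as a separate daemon; a minimal sketch of turning it on (it listens on port 8003 by default, and its endpoints differ from the old ceph-rest-api, so clients will need adjusting):

```
ceph mgr module enable restful
ceph restful create-self-signed-cert
ceph restful create-key myapp          # prints the API key for user 'myapp'

# quick smoke test (hostname is a placeholder)
curl -k https://mgr-host:8003/
```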

Re: [ceph-users] ceph-mgr dashboard behind reverse proxy

2018-08-04 Thread Wido den Hollander
On 08/04/2018 09:04 AM, Tobias Florek wrote: > Hi! > > Thank you for your reply. > >>> I want to set up the dashboard behind a reverse proxy. How do >>> people determine which ceph-mgr is active? Is there any simple and >>> elegant solution? >> >> You can use haproxy. It supports periodic che

Re: [ceph-users] Beginner's questions regarding Ceph, Deployment with ceph-ansible

2018-08-07 Thread Wido den Hollander
On 08/07/2018 11:23 AM, Jörg Kastning wrote: > Am 06.08.2018 um 22:01 schrieb Pawel S: >> On Mon, Aug 6, 2018 at 3:08 PM Jörg Kastning > wrote: >>> But what are agents, rgws, nfss, restapis, rbdmirrors, clients and >>> iscsi-gws? Where could I find additional information about them? Where >>> f

Re: [ceph-users] failing to respond to cache pressure

2018-08-13 Thread Wido den Hollander
On 08/13/2018 01:22 PM, Zhenshi Zhou wrote: > Hi, > Recently, the cluster runs healthy, but I get warning messages everyday: > Which version of Ceph? Which version of clients? Can you post: $ ceph versions $ ceph features $ ceph fs status Wido > 2018-08-13 17:39:23.682213 [INF]  Cluster is

Re: [ceph-users] BlueStore wal vs. db size

2018-08-14 Thread Wido den Hollander
On 08/15/2018 04:17 AM, Robert Stanford wrote: > I am keeping the wal and db for a ceph cluster on an SSD.  I am using > the masif_bluestore_block_db_size / masif_bluestore_block_wal_size > parameters in ceph.conf to specify how big they should be.  Should these > values be the same, or should on

Re: [ceph-users] limited disk slots - should I ran OS on SD card ?

2018-08-15 Thread Wido den Hollander
On 08/14/2018 09:12 AM, Burkhard Linke wrote: > Hi, > > > AFAIk SD cards (and SATA DOMs) do not have any kind of wear-leveling > support. Even if the crappy write endurance of these storage systems > would be enough to operate a server for several years on average, you > will always have some

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
> partitions for each? > Yes, that is correct. Each OSD needs 10GB/1TB of storage of DB. So size your SSD according to your storage needs. However, it depends on the workload if you need to offload WAL+DB to a SSD. What is the workload? Wido > On Wed, Aug 15, 2018 at 1:59 AM, Wido den H
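
Assuming the intended options are the standard bluestore_block_db_size / bluestore_block_wal_size, a sketch of both ways to express this (sizes are only examples; when DB and WAL share the same fast device a separate WAL partition is usually unnecessary, since the WAL lives inside the DB partition by default):

```
# ceph.conf, used when the OSD partitions are created
[osd]
bluestore_block_db_size  = 53687091200    # 50 GB
bluestore_block_wal_size = 2147483648     # 2 GB

# or point ceph-volume at explicit partitions
ceph-volume lvm create --bluestore --data /dev/sdb \
    --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2
```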

Re: [ceph-users] BlueStore wal vs. db size

2018-08-15 Thread Wido den Hollander
vice can also help. Keep in mind that the 'journal' doesn't apply anymore with BlueStore. That was a FileStore thing. Wido > On Wed, Aug 15, 2018 at 10:58 AM, Wido den Hollander <mailto:w...@42on.com>> wrote: > > > > On 08/15/2018 05:57 PM, Robert Sta

Re: [ceph-users] Removing all rados objects based on a prefix

2018-08-20 Thread Wido den Hollander
On 08/20/2018 05:20 PM, David Turner wrote: > The general talk about the rados cleanup command is to clean things up > after benchmarking.  Could this command also be used for deleting an old > RGW bucket or an RBD.  For instance, a bucket with a prefix of > `25ff9eff-058b-41e3-8724-cfffecb979c0.

Re: [ceph-users] ceph auto repair. What is wrong?

2018-08-23 Thread Wido den Hollander
On 08/24/2018 06:11 AM, Fyodor Ustinov wrote: > Hi! > > I have fresh ceph cluster. 12 host and 3 osd on each host (one - hdd and two > - ssd). Each host located in own rack. > > I make such crush configuration on fresh ceph installation: > >sudo ceph osd crush add-bucket R-26-3-1 rack >

[ceph-users] PG auto repair with BlueStore

2018-08-23 Thread Wido den Hollander
Hi, osd_scrub_auto_repair still defaults to false and I was wondering how we think about enabling this feature by default. Would we say it's safe to enable this with BlueStore? Wido ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.cep
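
For reference, the knobs involved, as a ceph.conf sketch (the num-errors threshold shown is the documented default; whether flipping the first one on by default is safe with BlueStore is exactly the question above):

```
[osd]
osd scrub auto repair = true
osd scrub auto repair num errors = 5   # don't auto-repair if a scrub finds more errors than this
```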

Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Wido den Hollander
Hi, $ ceph-volume lvm list Does that work for you? Wido On 10/08/2018 12:01 PM, Kevin Olbrich wrote: > Hi! > > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id? > Before I migrated from filestore with simple-mode to bluestore with lvm, > I was able to find the raw disk with "df"
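
Besides ceph-volume, the LVM tools themselves can walk the chain from OSD LV to physical disk; a sketch (ceph-volume tags its LVs, so grepping for 'ceph' in the tags is usually enough):

```
ceph-volume lvm list                                  # per-OSD view including the underlying devices

lvs -o lv_name,vg_name,lv_tags,devices | grep ceph    # LV -> VG -> PV/disk in one line
pvs -o pv_name,vg_name                                # PV -> VG mapping
```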

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-08 Thread Wido den Hollander
On 10/08/2018 05:04 PM, Aleksei Zakharov wrote: > Hi all, > > We've upgraded our cluster from jewel to luminous and re-created monitors > using rocksdb. > Now we see, that mon's are using a lot of disk space and used space only > grows. It is about 17GB for now. It was ~13GB when we used level

Re: [ceph-users] Mons are using a lot of disk space and has a lot of old osd maps

2018-10-09 Thread Wido den Hollander
: api2 >> osd: 832 osds: 791 up, 790 in >>  flags noout,nodeep-scrub >> >>   data: >> pools: 10 pools, 52336 pgs >> objects: 47.78M objects, 238TiB >>     usage: 854TiB used, 1.28PiB / 2.12PiB avail >> pgs: 52336 active+clean &g

[ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-10 Thread Wido den Hollander
Hi, On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm seeing OSDs writing heavily to their logfiles spitting out these lines: 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidalloc 0x0x55828ae047d0 dump 0x15cd2078000~34000 2018-10-10 21:52:04.019038 7f90c2f0f700 0 stupidallo

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-10 Thread Wido den Hollander
On 10/11/2018 12:08 AM, Wido den Hollander wrote: > Hi, > > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm > seeing OSDs writing heavily to their logfiles spitting out these lines: > > > 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidall

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-11 Thread Wido den Hollander
e cluster is a mix of SSDs and HDDs. The problem is with the SSD OSDs. So we moved a pool from SSD to HDD and that seems to have fixed the problem for now. But it will probably get back as soon as some OSDs go >80%. Wido > On Wed, Oct 10, 2018 at 6:37 PM Wido den Hollander <mailto:w...

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
On 10/11/2018 12:08 AM, Wido den Hollander wrote: > Hi, > > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm > seeing OSDs writing heavily to their logfiles spitting out these lines: > > > 2018-10-10 21:52:04.019037 7f90c2f0f700 0 stupidall

Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread Wido den Hollander
On 10/15/2018 07:50 PM, solarflow99 wrote: > I think the answer is, yes.  I'm pretty sure only the OSDs require very > long life enterprise grade SSDs > Yes and No. Please use reliable Datacenter Grade SSDs for your MON databases. Something like 200GB is more than enough in your MON servers.

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
s://github.com/ceph/ceph/pull/24543 It will stop the spamming, but that's not the root cause. The OSDs in this case are at max 80% full and they do have a lot of OMAP (RGW indexes) in them, but that's all. I'm however not sure why this is happening suddenly in this cluster. Wido > -G
