[ceph-users] Fwd: Ceph OSD status toggles between active and failed, monitor shows no osd

2018-04-16 Thread Akshita Parekh
Hi all, I configured my cluster two weeks back. The OSD status always shows failed, so I removed the OSDs and added them again, almost in the same process as given here: http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/ My cluster has 2 OSDs, 1 monitor, 1 admin and 1 client. After removing and addin
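For reference, the removal half of the linked add-or-rm-osds procedure boils down to roughly the following sketch (assuming a single OSD with id 0; run the stop command on the host that carries the OSD):

ceph osd out osd.0            # stop placing new data on the OSD and start draining it
systemctl stop ceph-osd@0     # stop the daemon on the OSD host
ceph osd crush remove osd.0   # remove it from the CRUSH map
ceph auth del osd.0           # delete its authentication key
ceph osd rm osd.0             # remove the OSD from the cluster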

[ceph-users] list submissions

2018-04-16 Thread ZHONG
list submissions

Re: [ceph-users] ceph mds memory usage 20GB : is it normal ?

2018-04-16 Thread Yan, Zheng
On Sat, Apr 14, 2018 at 9:23 PM, Alexandre DERUMIER wrote: > Hi, > > Still leaking again after update to 12.2.4, around 17G after 9 days > > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND > ceph 629903 50.7 25.9 17473680 17082432 ? Ssl avril05 6498:21 >
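Alongside ps, the MDS admin socket gives a view of where that memory sits; a sketch, assuming the daemon is called mds.a and the tcmalloc heap profiler is available:

ceph daemon mds.a cache status                         # cache usage vs. the configured limit
ceph daemon mds.a config get mds_cache_memory_limit   # the configured cache memory limit
ceph tell mds.a heap stats                             # tcmalloc view of allocated vs. unmapped memory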

[ceph-users] Ceph Jewel and Ubuntu 16.04

2018-04-16 Thread Shain Miley
Hello, We are currently running Ceph Jewel (10.2.10) on Ubuntu 14.04 in production. We have been running into a kernel panic bug off and on for a while, and I am starting to look into upgrading as a possible solution. We are currently running version 4.4.0-31-generic kernel on these servers a

[ceph-users] osds with different disk sizes may killing performance

2018-04-16 Thread Chad William Seys
You'll find it said time and time again on the ML... avoid disks of different sizes in the same cluster. It's a headache that sucks. It's not impossible, it's not even overly hard to pull off... but it's very easy to cause a mess and a lot of headaches. It will also make it harder to diagnose perf

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
Hi Paul, On 16.04.2018 at 17:51, Paul Emmerich wrote: > Hi, > > can you try to get a stack trace from ganesha (with gdb or from procfs) when > it's stuck? I can try, as soon as it happens again. The problem is that it's not fully stuck - only the other clients are stuck when trying to access

Re: [ceph-users] Fixing bad radosgw index

2018-04-16 Thread Robert Stanford
This doesn't work for me: for i in `radosgw-admin bucket list`; do radosgw-admin bucket unlink --bucket=$i --uid=myuser; done (tried with and without '=') Errors for each bucket: failure: (2) No such file or directory 2018-04-16 15:37:54.022423 7f7c250fbc80 0 could not get bucket info for bu
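One likely cause of those errors is that radosgw-admin bucket list emits a JSON array, so the loop variable carries quotes, commas and brackets instead of bare bucket names. A sketch of the same loop with the JSON stripped first (assumes jq is installed; the uid myuser is taken from the command above):

for b in $(radosgw-admin bucket list | jq -r '.[]'); do
    radosgw-admin bucket unlink --bucket="$b" --uid=myuser   # unlink each bucket by its bare name
done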

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Paul Emmerich
Hi, can you try to get a stack trace from ganesha (with gdb or from procfs) when it's stuck? Also, try to upgrade to ganesha 2.6. I'm running a bigger deployment with ~30 ganesha 2.6 gateways that are quite stable so far. Paul 2018-04-16 17:30 GMT+02:00 Oliver Freyermuth : > Am 16.04.2018 um 0
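For reference, one way to grab those traces when it hangs (a sketch; assumes the process is named ganesha.nfsd, gdb is installed and the commands run as root):

gdb -p $(pidof ganesha.nfsd) -batch -ex 'thread apply all bt' > ganesha-bt.txt      # userspace backtraces of all threads
for t in /proc/$(pidof ganesha.nfsd)/task/*; do echo "== $t"; cat $t/stack; done    # kernel-side stacks via procfs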

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
On 16.04.2018 at 08:58, Oliver Freyermuth wrote: > On 16.04.2018 at 02:43, Oliver Freyermuth wrote: >> On 15.04.2018 at 23:04, John Spray wrote: >>> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth >>> wrote: Dear Cephalopodians, in our cluster (CentOS 7.4, EC Pool, Snappy comp

Re: [ceph-users] Fixing bad radosgw index

2018-04-16 Thread Casey Bodley
On 04/14/2018 12:54 PM, Robert Stanford wrote: I deleted my default.rgw.buckets.data and default.rgw.buckets.index pools in an attempt to clean them out. I brought this up on the list and received replies telling me essentially, "You shouldn't do that." There was, however, no helpful advice

[ceph-users] Big usage of db.slow

2018-04-16 Thread Rafał Wądołowski
Hi, We're using Ceph as object storage. Several days ago we noticed that the listing operation is very slow. The command ceph daemon osd.ID perf dump showed us very high usage of db.slow. Aggregating the output from the servers: SUM DB used: 217.29 GiB, SUM SLOW used: 1.25 TiB, SUM WAL used: 75.14 GiB, SUM DB used
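To pull only the BlueFS usage counters out of that perf dump (a sketch per OSD, here osd.0; assumes jq is available and the Luminous bluefs counter names):

ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, slow_used_bytes, wal_used_bytes}'   # bytes of DB, slow and WAL space in use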

Re: [ceph-users] ceph-users Digest, Vol 63, Issue 15

2018-04-16 Thread ZHONG
subscribe > On 16 April 2018, at 04:01, ceph-users-requ...@lists.ceph.com wrote: > > Send ceph-users mailing list submissions to > ceph-users@lists.ceph.com > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > or, via email, sen

Re: [ceph-users] Best way to remove an OSD node

2018-04-16 Thread John Petrini
There's a gentle reweight python script floating around on the net that does this. It gradually reduces the weight of each OSD one by one, waiting for the rebalance to complete each time. I've never used it and it may not work on all versions, so I'd make sure to test it. That or do it manually, but tha
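The idea behind such a script can be sketched in a few lines of shell (a hypothetical example for a single OSD, here osd.12: step the CRUSH weight down and wait for the cluster to report HEALTH_OK between steps):

for w in 0.8 0.6 0.4 0.2 0.0; do
    ceph osd crush reweight osd.12 $w                             # lower the CRUSH weight one step
    while ! ceph health | grep -q HEALTH_OK; do sleep 60; done    # wait until rebalancing has finished
done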

Re: [ceph-users] Error Creating OSD

2018-04-16 Thread Alfredo Deza
On Sat, Apr 14, 2018 at 5:17 PM, Rhian Resnick wrote: > Afternoon, > > Happily, I resolved this issue. > > Running vgdisplay showed that ceph-volume tried to create a volume on a failed > disk. (We didn't know we had a bad disk, so this is information that was new > to us) and when the command fail
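For anyone hitting the same thing, leftover LVM metadata from a failed ceph-volume run can be inspected and cleaned up roughly like this (a sketch; the VG name is whatever vgs/vgdisplay reports for the bad disk, and vgremove is destructive, so double-check the name first):

vgs                   # list volume groups; ceph-volume names its VGs ceph-<uuid>
pvs                   # map physical volumes (disks) to those VGs
vgremove <vg-name>    # remove the stale VG identified above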

[ceph-users] Best way to remove an OSD node

2018-04-16 Thread Caspar Smit
Hi All, What would be the best way to remove an entire OSD node from a cluster? I've run into problems removing OSDs from that node one by one; eventually the last few OSDs are overloaded with data. Setting the crush weight of all these OSDs to 0 at once seems a bit rigorous. Is there also a gentl

Re: [ceph-users] How much damage have I done to RGW hardcore-wiping a bucket out of its existence?

2018-04-16 Thread Yehuda Sadeh-Weinraub
On Fri, Apr 13, 2018 at 5:09 PM, Katie Holly <8ld3j...@meo.ws> wrote: > Hi everyone, > > I found myself in a situation where dynamic sharding and writing data to a > bucket containing a little more than 5M objects at the same time caused > corruption of the data, rendering the entire bucket unusab

Re: [ceph-users] CephFS MDS stuck (failed to rdlock when getattr / lookup)

2018-04-16 Thread Oliver Freyermuth
On 16.04.2018 at 02:43, Oliver Freyermuth wrote: > On 15.04.2018 at 23:04, John Spray wrote: >> On Fri, Apr 13, 2018 at 5:16 PM, Oliver Freyermuth >> wrote: >>> Dear Cephalopodians, >>> >>> in our cluster (CentOS 7.4, EC Pool, Snappy compression, Luminous 12.2.4), >>> we often have all (~40) cli

Re: [ceph-users] High TCP retransmission rates, only with Ceph

2018-04-16 Thread Paweł Sadowsk
Do those retransmissions happen during the TCP handshake or after it? Did you check the error counters on your NICs? Maybe your radosgw is sending too many requests to the OSDs. On 04/15/2018 10:54 PM, Robert Stanford wrote: > > I should have been more clear. The TCP retransmissions are on the > OSD host.
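For the counters mentioned above, a few standard tools narrow this down (a sketch; eth0 is a placeholder for the actual interface on the OSD host):

ethtool -S eth0 | egrep -i 'err|drop|disc'   # NIC-level error, drop and discard counters
ip -s link show eth0                         # kernel view of RX/TX errors and drops
netstat -s | grep -i retrans                 # protocol-level retransmission statistics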

[ceph-users] (no subject)

2018-04-16 Thread F21
unsub