Re: [ceph-users] jewel - recovery keeps stalling (continues after restarting OSDs)

2017-08-07 Thread Nikola Ciprich
Hi, I tried balancing the number of OSDs per node, set their weights the same, and increased op recovery priority, but it still takes ages to recover. I've got my cluster OK now, so I'll try switching to kraken to see if it behaves better. nik On Mon, Aug 07, 2017 at 11:36:10PM +0800, cgxu wrote:
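For reference, a minimal sketch of the kind of recovery tuning mentioned above, assuming Jewel-era option names; the OSD id and values are illustrative, not taken from the thread:

  # inspect current recovery settings on one OSD via its admin socket
  ceph daemon osd.0 config show | grep -E 'osd_recovery_op_priority|osd_max_backfills|osd_recovery_max_active'
  # raise recovery priority and concurrency cluster-wide (injectargs applies
  # immediately but does not persist across OSD restarts)
  ceph tell osd.* injectargs '--osd-recovery-op-priority 63 --osd-max-backfills 2 --osd-recovery-max-active 5'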

[ceph-users] ceph cluster experiencing major performance issues

2017-08-07 Thread Mclean, Patrick
High CPU utilization and inexplicably slow I/O requests. We have been having similar performance issues across several ceph clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK for a while, but eventually performance worsens and becomes (at first intermittently, but eventually
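A hedged starting point for diagnosing this kind of slowdown (standard commands; the OSD id is a placeholder, not one named in the thread):

  ceph health detail                     # lists slow/blocked requests and the OSDs involved
  ceph osd perf                          # per-OSD commit/apply latency, to spot outliers
  ceph daemon osd.7 dump_historic_ops    # recent slow ops on a suspect OSD, via its admin socket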

[ceph-users] implications of losing the MDS map

2017-08-07 Thread Daniel K
I finally figured out how to get the ceph-monstore-tool (compiled from source) and am ready to attempt to recover my cluster. I have one question -- in the instructions, http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-mon/ under Recovery from OSDs, Known limitations: ->
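For context, a condensed sketch of the "Recovery using OSDs" procedure from the referenced docs page (paths and the keyring location are placeholders; each ceph-objectstore-tool run needs the OSD stopped):

  ms=/tmp/mon-store; mkdir -p $ms
  # collect cluster map data from every OSD on this host
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path $ms
  done
  # after gathering from all hosts, rebuild the monitor store
  ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring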

Re: [ceph-users] hammer (0.94.5) librbd deadlock: how do I resolve it?

2017-08-07 Thread Jason Dillaman
I am not sure what you mean by "I stop ceph" (stopped all the OSDs?) -- and I am not sure how you are seeing ETIMEDOUT errors on a "rbd_write" call since it should just block assuming you are referring to stopping the OSDs. What is your use-case? Are you developing your own application on top of

Re: [ceph-users] FAILED assert(last_e.version.version < e.version.version) - Or: how to use ceph-kvstore-tool?

2017-08-07 Thread Ricardo J. Barberis
Sorry, I forgot to mention it: it's Hammer 0.94.10. But I already marked the OSDs as lost, after rebalancing finished. I saw a bug report at http://tracker.ceph.com/issues/14471. I can post some debug logs there, but I don't know if it'll be useful at this point. Thank you, On Wednesday
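For reference, the usual sequence for writing off an unrecoverable OSD and removing it from the cluster (the id is a placeholder, not from this thread):

  ceph osd lost 12 --yes-i-really-mean-it   # accept data loss on that OSD so its PGs can recover elsewhere
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12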

Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Jason Dillaman
Correct -- deep-flatten can only be enabled at image creation time. If you do still have snapshots on that image and you wish to delete the parent, you will need to delete the snapshots. On Mon, Aug 7, 2017 at 4:52 PM, Shawn Edwards wrote: > Nailed it. Did not have
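A minimal sketch of that cleanup, using the child image named elsewhere in the thread; the parent image/snapshot spec is a placeholder:

  rbd snap ls tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c      # list the clone's snapshots
  rbd snap purge tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c   # remove them
  rbd flatten tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c      # detach the clone from its parent
  # once no children remain, the parent snapshot can be unprotected and deleted
  rbd snap unprotect tyr-p0/parent-image@base-snap
  rbd snap rm tyr-p0/parent-image@base-snap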

Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Shawn Edwards
Nailed it. Did not have deep-flatten feature turned on for that image. Deep-flatten cannot be added to an rbd after creation, correct? What are my options here? On Mon, Aug 7, 2017 at 3:32 PM Jason Dillaman wrote: > Does the image

Re: [ceph-users] broken parent/child relationship

2017-08-07 Thread Jason Dillaman
Does the image "tyr-p0/a56eae5f-fd35-4299-bcdc-65839a00f14c" have snapshots? If the deep-flatten feature isn't enabled, the flatten operation is not able to dissociate child images from parents when those child images have one or more snapshots. On Fri, Aug 4, 2017 at 2:30 PM, Shawn Edwards

Re: [ceph-users] download.ceph.com rsync errors

2017-08-07 Thread David Galloway
Thanks for bringing this to our attention. I've removed the lockfiles from download.ceph.com. On 08/06/2017 11:10 PM, Matthew Taylor wrote: > Hi, > > The rsync target (rsync://download.ceph.com/ceph/) has been throwing the > following errors for a while: > >> rsync: send_files failed to open
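A possible client-side workaround while such files exist on the server, assuming the unreadable entries match a lock-file pattern (the exclude pattern and local path are placeholders):

  rsync -avz --exclude='*.lock' rsync://download.ceph.com/ceph/ /srv/mirror/ceph/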

Re: [ceph-users] expanding cluster with minimal impact

2017-08-07 Thread Bryan Stillwell
Dan, We recently went through an expansion of an RGW cluster and found that we needed 'norebalance' set whenever making CRUSH weight changes to avoid slow requests. We were also increasing the CRUSH weight by 1.0 each time which seemed to reduce the extra data movement we were seeing with
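A sketch of that expansion pattern with standard commands (the OSD id and weight are illustrative):

  ceph osd set norebalance               # hold data movement while weights change
  ceph osd crush reweight osd.42 1.0     # crush reweight sets an absolute weight; step it up per adjustment
  ceph osd unset norebalance             # let backfill start against the final layout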

Re: [ceph-users] 1 pg inconsistent, 1 pg unclean, 1 pg degraded

2017-08-07 Thread Etienne Menguy
Hi, Removing the whole OSD will work but it's overkill (if the inconsistency is not caused by a faulty disk). Which ceph version are you running? If you have a recent version you can check http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent rados
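A short sketch of the procedure on that docs page (the pg id 2.1f is a placeholder; list-inconsistent-obj needs Jewel or newer):

  ceph health detail | grep inconsistent                    # find the affected pg id
  rados list-inconsistent-obj 2.1f --format=json-pretty     # see which object/replica is bad
  ceph pg repair 2.1f                                       # repair once a single bad replica is confirmed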

[ceph-users] 1 pg inconsistent, 1 pg unclean, 1 pg degraded

2017-08-07 Thread Marc Roos
I tried to fix an inconsistent PG by taking osd 12 out, hoping for the data to be copied to a different osd and for that copy to be used as 'active'. - Would deleting the whole image in the rbd pool solve this? (Or would it fail because of this status?) - Should I have done this rather

Re: [ceph-users] CephFS: concurrent access to the same file from multiple nodes

2017-08-07 Thread Andras Pataki
I've filed a tracker bug for this: http://tracker.ceph.com/issues/20938 Andras On 08/01/2017 10:26 AM, Andras Pataki wrote: Hi John, Sorry for the delay, it took a bit of work to set up a luminous test environment. I'm sorry to have to report that the 12.1.1 RC version also suffers from

[ceph-users] how to migrate cached erasure pool to another type of erasure?

2017-08-07 Thread Малков Петр Викторович
Hello! Luminous v12.1.2. RGW SSD tiering over an EC pool works fine, but I want to change the type of erasure coding (now and in the future). The erasure code type cannot be changed on the fly; only a new pool with the new coding is possible. The first idea was to add a second tiering level (SSD - EC - ISA) and to evict all down,
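A hedged sketch of creating the target ISA-coded pool (profile name, pool name, pg counts and k/m are placeholders; the EC profile of an existing pool cannot be changed in place):

  ceph osd erasure-code-profile set isa-k4m2 plugin=isa k=4 m=2
  ceph osd pool create rgw.buckets.data.isa 256 256 erasure isa-k4m2
  # data then has to be copied or evicted from the old pool into this one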

Re: [ceph-users] jewel: bug? forgotten rbd files?

2017-08-07 Thread Stefan Priebe - Profihost AG
ceph-dencoder type object_info_t import /tmp/a decode dump_json results in: error: buffer::malformed_input: void object_info_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding Greets, Stefan On 05.08.2017 at 21:43, Gregory Farnum wrote: > is OSD 20 actually a member of
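For context, one way such a dump is typically produced from a filestore OSD, assuming that layout (paths are placeholders); note that large object_info_t xattrs can be split into a ceph._@1 continuation chunk, and feeding only the first chunk to ceph-dencoder would produce exactly this kind of "past end of struct" error:

  obj='/var/lib/ceph/osd/ceph-20/current/<pgid>_head/<object-file>'
  attr -q -g 'ceph._' "$obj" > /tmp/a
  attr -q -g 'ceph._@1' "$obj" >> /tmp/a 2>/dev/null   # append the continuation chunk if present
  ceph-dencoder type object_info_t import /tmp/a decode dump_json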

Re: [ceph-users] jewel: bug? forgotten rbd files?

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello Greg, if I remove the files manually from the primary, it does not help either. The primary osd then crashes because trim_object can't find the files. Is there any chance that I can manually correct the omap digest so that it just matches the files? Greets, Stefan On 05.08.2017 at 21:43

Re: [ceph-users] All flash ceph witch NVMe and SPDK

2017-08-07 Thread Wido den Hollander
> On 3 August 2017 at 15:28, Mike A wrote: > > > Hello > > Our goal is to make the storage as fast as possible. > Right now our configuration of 6 servers looks like this: > * 2 x CPU Intel Gold 6150 20 core 2.4Ghz > * 2 x 16 Gb NVDIMM DDR4 DIMM > * 6 x 16 Gb RAM DDR4 > * 6
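For reference, a minimal ceph.conf sketch for driving such NVMe devices through SPDK with BlueStore (option names as in Luminous-era docs; the device serial and memory size are placeholders):

  [osd]
  bluestore_block_path = spdk:55cd2e404bd73932   # "spdk:" plus the NVMe serial selects the user-space driver
  bluestore_spdk_mem = 2048                      # hugepage memory (MB) reserved for SPDK per OSD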

Re: [ceph-users] jewel: bug? forgotten rbd files?

2017-08-07 Thread Stefan Priebe - Profihost AG
Hello Greg, even after 48h the files are still there and the PG is still in the active+clean+inconsistent+snaptrim state. Greets, Stefan On 05.08.2017 at 21:43, Gregory Farnum wrote: > is OSD 20 actually a member of the pg right now? It could be stray data > that is slowly getting cleaned up. > >