Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Maged Mokhtar
Hi Robert, 1) Can you specify how many threads were used in the 4k write rados test? I suspect that only 16 threads were used, because that is the default. Also, the average latency was 2.9 ms, giving an average of 344 iops per thread; your average iops were 5512, and dividing this by 344 we get
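
As a quick cross-check of the thread-count guess above: rados bench uses 16 concurrent operations by default (-t 16). A minimal sketch, assuming a pool named "testpool" (the pool name is not given in this thread):

    # 2.9 ms average latency -> 1 / 0.0029 s ~= 344 iops per thread
    # 5512 observed iops     -> 5512 / 344   ~= 16 threads, i.e. the -t default
    # re-run the 4k write test with an explicit, larger queue depth:
    rados bench -p testpool 60 write -b 4096 -t 64 --no-cleanup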

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
On Sat, May 25, 2019 at 12:30 AM Mark Lehrer wrote: > > but only 20MB/s write and 95MB/s read with 4KB objects. > > There is copy-on-write overhead for each block, so 4K performance is > going to be limited no matter what. > no snapshots are involved and he's using rados bench which operates on

Re: [ceph-users] "allow profile rbd" or "profile rbd"

2019-05-24 Thread Jason Dillaman
On Fri, May 24, 2019 at 6:09 PM Marc Roos wrote: > I still have some accounts listing either "allow profile rbd" or just "profile rbd". What should this be? Should this not be kept uniform? What if the profile in the future adds denials? What does "allow profile XYZ" (or "deny profile rbd") mean when it has other

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Mark Lehrer
> but only 20MB/s write and 95MB/s read with 4KB objects. There is copy-on-write overhead for each block, so 4K performance is going to be limited no matter what. However, if your system is like mine the main problem you will run into is that Ceph was designed for spinning disks. Therefore, its

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
On Fri, May 24, 2019 at 3:27 PM Robert Sander wrote: > Am 24.05.19 um 14:43 schrieb Paul Emmerich: > > 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming > > replica 3). > > > > What we usually check in scenarios like these: > > > > * SSD model? Lots of cheap SSDs simply

Re: [ceph-users] Failed Disk simulation question

2019-05-24 Thread solarflow99
I think a deep scrub would eventually catch this, right? On Wed, May 22, 2019 at 2:56 AM Eugen Block wrote: > Hi Alex, > > > The cluster has been idle at the moment being new and all. I > > noticed some disk related errors in dmesg but that was about it. > > It looked to me for the next 20 -
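
A deep scrub can also be requested manually rather than waiting for the schedule; a minimal sketch (the PG and OSD ids are placeholders):

    ceph pg deep-scrub 1.0       # deep-scrub a single placement group
    ceph osd deep-scrub 3        # deep-scrub all PGs with a replica on osd.3
    ceph -w                      # watch for scrub errors / inconsistent PGs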

Re: [ceph-users] inconsistent number of pools

2019-05-24 Thread Michel Raabe
On 20.05.19 13:04, Lars Täuber wrote: Mon, 20 May 2019 10:52:14 + Eugen Block ==> ceph-users@lists.ceph.com : Hi, have you tried 'ceph health detail'? No I hadn't. Thanks for the hint. You can also try $ rados lspools $ ceph osd pool ls and verify that with the pgs $ ceph pg ls
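
Spelled out as commands, that cross-check looks roughly like this:

    rados lspools       # pools as seen by librados
    ceph osd pool ls    # pools according to the OSD map
    ceph pg ls          # PG listing; the part of the PG id before the dot is the pool id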

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Robert LeBlanc
On Fri, May 24, 2019 at 6:26 AM Robert Sander wrote: > Am 24.05.19 um 14:43 schrieb Paul Emmerich: > > 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming > > replica 3). > > > > What we usually check in scenarios like these: > > > > * SSD model? Lots of cheap SSDs simply

Re: [ceph-users] large omap object in usage_log_pool

2019-05-24 Thread Casey Bodley
On 5/24/19 1:15 PM, shubjero wrote: Thanks for chiming in, Konstantin! Wouldn't setting this value to 0 disable the sharding? Reference: http://docs.ceph.com/docs/mimic/radosgw/config-ref/ rgw override bucket index max shards Description: Represents the number of shards for the bucket index

Re: [ceph-users] CephFS object mapping.

2019-05-24 Thread Robert LeBlanc
On Fri, May 24, 2019 at 2:14 AM Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > Hi, > On 5/22/19 5:53 PM, Robert LeBlanc wrote: > > When you say 'some' is it a fixed offset that the file data starts? Is the > first stripe just metadata? > > No, the first stripe contains
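
For background, a sketch of how a file offset maps to a RADOS object, assuming the default layout (4 MiB object size) and a data pool named "cephfs_data" (both assumptions, not stated in this thread):

    # data objects are named <inode-in-hex>.<8-hex-digit stripe index>
    # with 4 MiB objects, byte offset 10 MiB falls into index 10 MiB / 4 MiB = 2
    rados -p cephfs_data stat 10004dfce92.00000002        # object holding that offset (example inode)
    rados -p cephfs_data listxattr 10004dfce92.00000000   # metadata xattrs sit on the first object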

Re: [ceph-users] large omap object in usage_log_pool

2019-05-24 Thread shubjero
Thanks for chiming in, Konstantin! Wouldn't setting this value to 0 disable the sharding? Reference: http://docs.ceph.com/docs/mimic/radosgw/config-ref/ rgw override bucket index max shards Description: Represents the number of shards for the bucket index object, a value of zero indicates there
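
For context, the option is a per-RGW config setting and only affects newly created buckets; a sketch (the section and gateway names are placeholders):

    # in ceph.conf on the RGW host, e.g. under [client.rgw.gateway1]:
    #   rgw_override_bucket_index_max_shards = 16   # 0 (the default) means a single, unsharded index object
    # check what a running RGW actually uses via its admin socket:
    ceph daemon client.rgw.gateway1 config get rgw_override_bucket_index_max_shards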

[ceph-users] "allow profile rbd" or "profile rbd"

2019-05-24 Thread Marc Roos
I still have some accounts listing either "allow profile rbd" or just "profile rbd". What should this be? Should this not be kept uniform? [client.xxx.xx] key = xxx caps mon = "allow profile rbd" caps osd = "profile rbd pool=rbd,profile rbd pool=rbd.ssd" [client.xxx] key =
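
Whichever form you standardize on can be re-applied in place with ceph auth caps; a sketch using the client name from above and the plain "profile rbd" form:

    ceph auth caps client.xxx.xx \
        mon 'profile rbd' \
        osd 'profile rbd pool=rbd, profile rbd pool=rbd.ssd'
    ceph auth get client.xxx.xx    # verify the resulting caps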

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
I'd say that if you can't find that object in Rados, then your assumption may be good. I haven't run into this problem before. Try doing a Rados get for that object and see if you get anything. I've done a Rados list grepping for the hex inode, but it took almost two days on our cluster that had
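
A sketch of that check, using the ec31 pool from this thread (the object name is illustrative, and the listing can take a very long time on large pools):

    rados -p ec31 ls | grep 10004dfce92               # look for objects carrying the hex inode in their name
    rados -p ec31 get 10004dfce92.00000000 /tmp/obj   # try to fetch the first stripe object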

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
ok, this just gives me: error getting xattr ec31/10004dfce92./parent: (2) No such file or directory Does this mean that the lost object isn't even a file that appears in the ceph directory? Maybe it is a leftover of a file that has not been deleted properly? It wouldn't be an issue to mark
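
If it really is an orphaned object, the usual way to clear the unfound state is mark_unfound_lost; shown only as a sketch, since revert/delete permanently gives up on the data:

    ceph pg 1.24c list_missing               # confirm which objects are still unfound
    ceph pg 1.24c mark_unfound_lost delete   # or 'revert'; irreversible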

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
You need to use the first stripe of the object as that is the only one with the metadata. Try "rados -p ec31 getxattr 10004dfce92. parent" instead. Robert LeBlanc Sent from a mobile device, please excuse any typos. On Fri, May 24, 2019, 4:42 AM Kevin Flöh wrote: > Hi, > > we already
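
Background on the naming (standard CephFS behaviour, not something spelled out in this thread): data objects are named <inode-hex>.<8-hex-digit stripe index>, and only the first stripe (suffix .00000000) carries the "parent" backtrace xattr:

    rados -p ec31 getxattr 10004dfce92.00000000 parent > /tmp/parent.bin
    ceph-dencoder type inode_backtrace_t import /tmp/parent.bin decode dump_json   # decode it, if ceph-dencoder is installed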

Re: [ceph-users] Lost OSD - 1000: FAILED assert(r == 0)

2019-05-24 Thread Guillaume Chenuet
Hi, Thanks for your answers. I recreated the OSD and I'll monitor the disk health (currently OK). Thanks a lot, Guillaume On Fri, 24 May 2019 at 15:56, Igor Fedotov wrote: > Hi Guillaume, > > Could you please set debug-bluefs to 20, restart OSD and collect the whole > log. > > > Thanks, > >

Re: [ceph-users] RFC: relicence Ceph LGPL-2.1 code as LGPL-2.1 or LGPL-3.0

2019-05-24 Thread Sage Weil
On Fri, 10 May 2019, Robin H. Johnson wrote: > On Fri, May 10, 2019 at 02:27:11PM +, Sage Weil wrote: > > If you are a Ceph developer who has contributed code to Ceph and object to > > this change of license, please let us know, either by replying to this > > message or by commenting on that

Re: [ceph-users] Lost OSD - 1000: FAILED assert(r == 0)

2019-05-24 Thread Igor Fedotov
Hi Guillaume, Could you please set debug-bluefs to 20, restart the OSD, and collect the whole log. Thanks, Igor On 5/24/2019 4:50 PM, Guillaume Chenuet wrote: Hi, We are running a Ceph cluster with 36 OSDs split across 3 servers (12 OSDs per server) and Ceph version 12.2.11
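
A sketch of one way to do that (osd.12 and the log path are placeholders; under Kolla the log location may differ):

    # either set "debug bluefs = 20" in the [osd] section of ceph.conf and restart the OSD,
    # or raise it at runtime on the OSD's host:
    ceph tell osd.12 injectargs '--debug-bluefs 20'
    # then reproduce the failure and collect /var/log/ceph/ceph-osd.12.log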

Re: [ceph-users] Lost OSD - 1000: FAILED assert(r == 0)

2019-05-24 Thread Paul Emmerich
Disk got corrupted, it might be dead. Check kernel log for errors and SMART reallocated sector count or errors. If the disk is still good: simply re-create the OSD. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h
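
A quick sketch of those checks (sdX is a placeholder for the device backing the failed OSD):

    dmesg -T | grep -i error                                        # kernel I/O errors
    smartctl -a /dev/sdX | grep -iE 'realloc|pending|uncorrect'     # SMART reallocated / pending / uncorrectable sectors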

[ceph-users] Lost OSD - 1000: FAILED assert(r == 0)

2019-05-24 Thread Guillaume Chenuet
Hi, We are running a Ceph cluster with 36 OSDs split across 3 servers (12 OSDs per server) on Ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable). This cluster is used by an OpenStack private cloud and deployed with OpenStack Kolla. Every OSD runs in a Docker

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Robert Sander
Am 24.05.19 um 14:43 schrieb Paul Emmerich: > 20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming > replica 3). > > What we usually check in scenarios like these: > > * SSD model? Lots of cheap SSDs simply can't handle more than that The system has been newly created and is

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Paul Emmerich
20 MB/s at 4K blocks is ~5000 iops, that's 1250 IOPS per SSD (assuming replica 3). What we usually check in scenarios like these: * SSD model? Lots of cheap SSDs simply can't handle more than that * Get some proper statistics such as OSD latencies, disk IO utilization, etc. A benchmark without
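
A few of the usual starting points for those statistics, as a sketch (osd.0 is a placeholder):

    ceph osd perf                  # per-OSD commit/apply latencies
    iostat -x 1                    # per-disk utilisation and await on each OSD host
    ceph daemon osd.0 perf dump    # detailed per-OSD counters, run on the OSD's host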

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
Hi, we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but this is just hanging forever if we are looking for unfound objects. It works fine for all other objects. We also tried scanning the ceph directory with find -inum 1099593404050 (decimal of 10004dfce92) and found
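
The hex/decimal mapping used here can be reproduced with printf (the values are the ones from this mail):

    printf '%x\n' 1099593404050    # -> 10004dfce92
    printf '%d\n' 0x10004dfce92    # -> 1099593404050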

[ceph-users] performance in a small cluster

2019-05-24 Thread Robert Sander
Hi, we have a small cluster at a customer's site with three nodes and 4 SSD-OSDs each. Connected with 10G the system is supposed to perform well. rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB objects but only 20MB/s write and 95MB/s read with 4KB objects. This is a
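
For reference, those two cases can be reproduced with rados bench roughly like this (the pool name and the 60 s duration are placeholders):

    rados bench -p testpool 60 write -b 4194304 --no-cleanup   # 4 MiB objects (~450 MB/s here)
    rados bench -p testpool 60 write -b 4096 --no-cleanup      # 4 KiB objects (~20 MB/s here)
    rados bench -p testpool 60 seq                             # sequential read of the objects written above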

Re: [ceph-users] How to fix this? session lost, hunting for new mon, session established, io error

2019-05-24 Thread Ilya Dryomov
On Tue, May 21, 2019 at 11:41 AM Marc Roos wrote: > > > > I have this on a cephfs client, I had ceph common on 12.2.11, and > upgraded to 12.2.12 while having this error. They are writing here [0] > you need to upgrade kernel and it is fixed in 12.2.2 > > [@~]# uname -a > Linux mail03

[ceph-users] Is there some changes in ceph instructions in latest version(14.2.1)?

2019-05-24 Thread Yuan Minghui
Hello: When I try to install the latest version, ceph-14.2.1, and then try to create a 'mon', something goes wrong. What should I do now? Thanks, kyle

Re: [ceph-users] CephFS object mapping.

2019-05-24 Thread Burkhard Linke
Hi, On 5/22/19 5:53 PM, Robert LeBlanc wrote: On Wed, May 22, 2019 at 12:22 AM Burkhard Linke wrote: Hi, On 5/21/19 9:46 PM, Robert LeBlanc wrote: > I'm at a new job working with Ceph again and am excited to be back in

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Burkhard Linke
Hi, On 5/24/19 9:48 AM, Kevin Flöh wrote: We got the object ids of the missing objects with ceph pg 1.24c list_missing: { "offset": { "oid": "", "key": "", "snapid": 0, "hash": 0, "max": 0, "pool": -9223372036854775808,

Re: [ceph-users] Ceph dovecot

2019-05-24 Thread Danny Al-Gaaf
Hi, you can find the slides here: https://dalgaaf.github.io/Cephalocon-Barcelona-librmb/ And Wido is right, it's not production ready and we have some work ahead to make it work with acceptable performance at the moment, especially at our scale. If you have any questions don't hesitate to contact me.

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
We got the object ids of the missing objects with ceph pg 1.24c list_missing: { "offset": { "oid": "", "key": "", "snapid": 0, "hash": 0, "max": 0, "pool": -9223372036854775808, "namespace": "" }, "num_missing": 1,