Re: [ceph-users] Ceph nautilus upgrade problem

2019-04-02 Thread Stefan Kooman
Quoting Paul Emmerich (paul.emmer...@croit.io): > This also happened sometimes during a Luminous -> Mimic upgrade due to > a bug in Luminous; however I thought it was fixed on the ceph-mgr > side. > Maybe the fix was (also) required in the OSDs and you are seeing this > because the running OSDs

Re: [ceph-users] op_w_latency

2019-04-02 Thread Glen Baars
Thanks for the updated command – much cleaner! The OSD nodes have a single 6-core X5650 @ 2.67GHz, 72GB RAM and around 8 x 10TB HDD OSDs / 4 x 2TB SSD OSDs. CPU usage is around 20% and 22GB of RAM is available. The 3 MON nodes are the same but with no OSDs. The cluster has around 150 drives and only

Re: [ceph-users] op_w_latency

2019-04-02 Thread Konstantin Shalygin
Hello Ceph Users, I am finding that the write latency across my ceph clusters isn't great and I wanted to see what other people are getting for op_w_latency. Generally I am getting 70-110ms latency. I am using: ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | grep -A3
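
For reference, a minimal sketch of the latency check being discussed, assuming the admin socket path and OSD id from the quoted command:

  # Dump one OSD's perf counters and show the op_w_latency block;
  # avgtime (or sum/avgcount on older releases) is the average write latency in seconds.
  ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | grep -A3 op_w_latency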

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-02 Thread Yan, Zheng
Looks like http://tracker.ceph.com/issues/37399. which version of ceph-mds do you use? On Tue, Apr 2, 2019 at 7:47 AM Sergey Malinin wrote: > > These steps pretty well correspond to > http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/ > Were you able to replay journal manually with no
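
For reference, two common ways to answer the version question (not taken from the thread itself):

  # Report the versions of all running daemons, grouped by type (Luminous and later):
  ceph versions
  # Or check the binary directly on the MDS host:
  ceph-mds --version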

Re: [ceph-users] Erasure Coding failure domain (again)

2019-04-02 Thread Christian Balzer
On Tue, 2 Apr 2019 19:04:28 +0900 Hector Martin wrote: > On 02/04/2019 18.27, Christian Balzer wrote: > > I did a quick peek at my test cluster (20 OSDs, 5 hosts) and a replica 2 > > pool with 1024 PGs. > > (20 choose 2) is 190, so you're never going to have more than that many > unique sets

Re: [ceph-users] Ceph nautilus upgrade problem

2019-04-02 Thread Paul Emmerich
This also happened sometimes during a Luminous -> Mimic upgrade due to a bug in Luminous; however I thought it was fixed on the ceph-mgr side. Maybe the fix was (also) required in the OSDs and you are seeing this because the running OSDs have that bug? Anyways, it's harmless and you can ignore

Re: [ceph-users] Ceph nautilus upgrade problem

2019-04-02 Thread Jan-Willem Michels
On 2-4-2019 at 12:16, Stefan Kooman wrote: Quoting Stadsnet (jwil...@stads.net): On 26-3-2019 16:39, Ashley Merrick wrote: Have you upgraded any OSDs? No, didn't go through with the OSDs. Just checking here: are you sure all PGs have been scrubbed while running Luminous? As the release

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:10 PM Paul Emmerich wrote: > > On Tue, Apr 2, 2019 at 3:05 PM Yan, Zheng wrote: > > > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > > > Hi! > > > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > > > There's also some metadata overhead etc. You

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Jonas Jelten
On 02/04/2019 15.05, Yan, Zheng wrote: > I don't use this feature. We don't have a plan to mark this feature > stable. (Probably we will remove this feature in the future.) Oh no! We have activated inline_data since our cluster does have lots of small files (but also big ones), and performance
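
For reference, a rough sketch of how inline_data is checked and toggled per filesystem; "cephfs" is a placeholder filesystem name and the exact confirmation flag may differ between releases:

  # Show whether inline data is enabled in the MDS map:
  ceph fs get cephfs | grep inline_data
  # Enabling it requires an explicit confirmation flag because the feature is experimental:
  ceph fs set cephfs inline_data true --yes-i-really-mean-it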

Re: [ceph-users] CephFS and many small files

2019-04-02 Thread Frédéric Nass
Hello, I haven't had any issues either with a 4k allocation size in a cluster holding 358M objects for 116TB (237TB raw) and 2.264B chunks/replicas. This is an average of 324k per object and 12.6M chunks/replicas per OSD, with RocksDB sizes going from 12.1GB to 21.14GB depending on how much
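
For context, the allocation size being discussed is BlueStore's min_alloc_size, which is fixed when an OSD is created; the config values below only show what newly created OSDs would get ("osd.0" is a placeholder id):

  # Current defaults that would apply to newly created OSDs, in bytes:
  ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
  ceph daemon osd.0 config get bluestore_min_alloc_size_ssd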

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Eugen Block
Sorry -- you need the "<image-spec>" as part of that command. My bad, I only read this from the help page ignoring the <image-spec> (and forgot the pool name): -a [ --all ] list snapshots from all namespaces I figured this would list all existing snapshots, similar to the "rbd -p <pool> ls --long" command.

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Paul Emmerich
On Tue, Apr 2, 2019 at 3:05 PM Yan, Zheng wrote: > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > Hi! > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > > There's also some metadata overhead etc. You might want to consider > > > enabling inline data in cephfs to handle

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 9:05 PM Yan, Zheng wrote: > > On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > > > Hi! > > > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > > There's also some metadata overhead etc. You might want to consider > > > enabling inline data in cephfs to handle

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Yan, Zheng
On Tue, Apr 2, 2019 at 8:23 PM Clausen, Jörn wrote: > > Hi! > > Am 29.03.2019 um 23:56 schrieb Paul Emmerich: > > There's also some metadata overhead etc. You might want to consider > > enabling inline data in cephfs to handle small files in a > > store-efficient way (note that this feature is

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Jason Dillaman
On Tue, Apr 2, 2019 at 8:42 AM Eugen Block wrote: > > Hi, > > > If you run "rbd snap ls --all", you should see a snapshot in > > the "trash" namespace. > > I just tried the command "rbd snap ls --all" on a lab cluster > (nautilus) and get this error: > > ceph-2:~ # rbd snap ls --all > rbd: image
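
For reference, the corrected form of the command being discussed; pool and image names are placeholders:

  # List all snapshots of an image across namespaces, including the "trash" namespace:
  rbd snap ls --all rbd/myimage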

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Eugen Block
Hi, If you run "rbd snap ls --all", you should see a snapshot in the "trash" namespace. I just tried the command "rbd snap ls --all" on a lab cluster (nautilus) and get this error: ceph-2:~ # rbd snap ls --all rbd: image name was not specified Are there any requirements I haven't

Re: [ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Jason Dillaman
On Tue, Apr 2, 2019 at 4:19 AM Nikola Ciprich wrote: > > Hi, > > on one of my clusters, I'm getting error message which is getting > me a bit nervous.. while listing contents of a pool I'm getting > error for one of images: > > [root@node1 ~]# rbd ls -l nvme > /dev/null > rbd: error processing

Re: [ceph-users] inline_data (was: CephFS and many small files)

2019-04-02 Thread Clausen, Jörn
Hi! Am 29.03.2019 um 23:56 schrieb Paul Emmerich: There's also some metadata overhead etc. You might want to consider enabling inline data in cephfs to handle small files in a store-efficient way (note that this feature is officially marked as experimental, though).

Re: [ceph-users] Ceph nautilus upgrade problem

2019-04-02 Thread Stefan Kooman
Quoting Stadsnet (jwil...@stads.net): > On 26-3-2019 16:39, Ashley Merrick wrote: > >Have you upgraded any OSDs? > > > No, didn't go through with the OSDs. Just checking here: are you sure all PGs have been scrubbed while running Luminous? As the release notes [1] mention this: "If you are
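
The check described in the Nautilus release notes referenced above looks roughly like this:

  # A Luminous cluster that has completed a full scrub of all PGs reports
  # both recovery_deletes and purged_snapdirs in the OSDMap flags:
  ceph osd dump | grep ^flags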

Re: [ceph-users] Erasure Coding failure domain (again)

2019-04-02 Thread Hector Martin
On 02/04/2019 18.27, Christian Balzer wrote: I did a quick peek at my test cluster (20 OSDs, 5 hosts) and a replica 2 pool with 1024 PGs. (20 choose 2) is 190, so you're never going to have more than that many unique sets of OSDs. I just looked at the OSD distribution for a replica 3 pool
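
For reference, the arithmetic behind those bounds (ignoring CRUSH constraints such as host failure domains, which only reduce the counts):

  C(20, 2) = (20 * 19) / 2      = 190    possible 2-OSD sets
  C(20, 3) = (20 * 19 * 18) / 6 = 1140   possible 3-OSD sets

so with 1024 PGs a replica 2 pool on 20 OSDs necessarily repeats OSD sets, while a replica 3 pool need not.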

Re: [ceph-users] Moving pools between cluster

2019-04-02 Thread Stefan Kooman
Quoting Burkhard Linke (burkhard.li...@computational.bio.uni-giessen.de): > Hi, > Images: > > A straightforward attempt would be exporting all images with qemu-img from > one cluster and uploading them again on the second cluster. But this will > break snapshots, protections etc. You can use
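
One possible way to preserve snapshot history when copying RBD images between clusters (a sketch only, since the suggestion above is cut off; cluster, pool, image and snapshot names are placeholders, and both clusters' conf/keyring files must be readable from the transfer host):

  # Copy the image contents up to its first snapshot, then recreate that snapshot:
  rbd --cluster old export pool/image@snap1 - | rbd --cluster new import - pool/image
  rbd --cluster new snap create pool/image@snap1
  # Replay later snapshots as incremental diffs; import-diff recreates the end snapshot:
  rbd --cluster old export-diff --from-snap snap1 pool/image@snap2 - | rbd --cluster new import-diff - pool/image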

Re: [ceph-users] Erasure Coding failure domain (again)

2019-04-02 Thread Christian Balzer
Hello Hector, Firstly, I'm so happy somebody actually replied. On Tue, 2 Apr 2019 16:43:10 +0900 Hector Martin wrote: > On 31/03/2019 17.56, Christian Balzer wrote: > > Am I correct that unlike with replication there isn't a maximum size > > of the critical path OSDs? > > As far as I

[ceph-users] Moving pools between cluster

2019-04-02 Thread Burkhard Linke
Hi, we are about to set up a new Ceph cluster for our OpenStack cloud. Ceph is used for images, volumes and object storage. I'm unsure how to handle these cases and how to move the data correctly. Object storage: I consider this the easiest case, since RGW itself provides the necessary means
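
One reading of "RGW itself provides the necessary means" is multisite replication; a very rough sketch, with endpoints, zone names and system keys as placeholders:

  # On the new cluster: join the existing realm as a second zone and let RGW sync the data.
  radosgw-admin realm pull --url=http://old-rgw:8080 --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
  radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=new-zone --endpoints=http://new-rgw:8080 --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
  radosgw-admin period update --commit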

[ceph-users] rbd: error processing image xxx (2) No such file or directory

2019-04-02 Thread Nikola Ciprich
Hi, on one of my clusters I'm getting an error message which is making me a bit nervous. While listing the contents of a pool I get an error for one of the images: [root@node1 ~]# rbd ls -l nvme > /dev/null rbd: error processing image xxx: (2) No such file or directory [root@node1 ~]# rbd info
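
Some hypothetical follow-up checks (pool and image names are taken from the quoted output):

  # Inspect the image directly, and look for deferred-deleted images in the pool:
  rbd info nvme/xxx
  rbd trash ls nvme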

Re: [ceph-users] Erasure Coding failure domain (again)

2019-04-02 Thread Hector Martin
On 31/03/2019 17.56, Christian Balzer wrote: Am I correct that unlike with replication there isn't a maximum size of the critical path OSDs? As far as I know, the math for calculating the probability of data loss wrt placement groups is the same for EC and for replication. Replication

Re: [ceph-users] MDS stuck at replaying status

2019-04-02 Thread Yan, Zheng
please set debug_mds=10, and try again On Tue, Apr 2, 2019 at 1:01 PM Albert Yue wrote: > > Hi, > > This happens after we restart the active MDS, and somehow the standby MDS > daemon cannot take over successfully and is stuck at up:replaying. It is > showing the following log. Any idea on how
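
For reference, two ways to apply that debug setting; the MDS daemon name is a placeholder:

  # Raise the log level on the running MDS:
  ceph tell mds.mds1 injectargs '--debug_mds 10'
  # Or set it persistently (Mimic and later) so it survives the next restart:
  ceph config set mds debug_mds 10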