[ceph-users] Expected IO in luminous Ceph Cluster

2019-06-06 Thread Stolte, Felix
Hello folks, we are running a Ceph cluster on Luminous consisting of 21 OSD nodes, each with 9 x 8TB SATA drives and 3 Intel 3700 SSDs for BlueStore WAL and DB (1:3 ratio). OSDs have 10Gb for the public and cluster networks. The cluster has been running stably for over a year. We didn't have a closer look at IO
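
A quick way to get a baseline number for a cluster like this is rados bench; a minimal sketch (pool name and runtime are placeholders, run against a throwaway pool):

  # 4 MiB sequential writes for 30 seconds, keep the objects for the read test
  rados bench -p testpool 30 write --no-cleanup
  # sequential reads of the objects written above
  rados bench -p testpool 30 seq
  # remove the benchmark objects afterwards
  rados -p testpool cleanup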

[ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Sakirnth Nagarasa
Hello, our Ceph version is Nautilus (14.2.1). We periodically create snapshots of an rbd image (50 TB). In order to restore some data, we have cloned a snapshot. To delete the snapshot we ran: rbd rm ${POOLNAME}/${IMAGE}. But it took very long to delete the image; after half an hour it had
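
If an interrupted rbd rm leaves the state unclear, a short sketch of what to check (pool and image names are placeholders):

  # is the image still listed in the pool?
  rbd ls -l mypool
  # or was it moved to the trash?
  rbd trash ls --all --long mypool
  # if it is still listed, the removal can simply be retried
  rbd rm mypool/myimage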

Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Xiaoxi Chen
We go with upstream releases, mostly Nautilus now, and are probably among the most aggressive serious production users (i.e. tens of PB+). I will vote for November for several reasons: 1. Q4 is the holiday season and production rollouts are usually blocked, especially storage-related changes, which

[ceph-users] OSD hanging on 12.2.12 by message worker

2019-06-06 Thread Max Vernimmen
Hi, we are running VM images on Ceph using RBD. We are seeing a problem where one of our VMs gets into trouble because IO is not completing. iostat on the VM shows IO remaining in the queue, and disk utilisation for Ceph-based vdisks is 100%. Upon investigation the problem seems to be with the

[ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-06 Thread BASSAGET Cédric
Hello, I see messages related to REQUEST_SLOW a few times per day. Here's my ceph -s:
  root@ceph-pa2-1:/etc/ceph# ceph -s
    cluster:
      id:     72d94815-f057-4127-8914-448dfd25f5bc
      health: HEALTH_OK
    services:
      mon: 3 daemons, quorum ceph-pa2-1,ceph-pa2-2,ceph-pa2-3
      mgr:
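
To narrow down which requests are slow and where they spend their time, a sketch of the usual first steps (the OSD id is a placeholder):

  # show which OSDs are implicated and why
  ceph health detail
  # on the node hosting an implicated OSD, dump its slowest recent ops
  ceph daemon osd.0 dump_historic_ops
  # ops currently in flight on that OSD
  ceph daemon osd.0 dump_ops_in_flight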

Re: [ceph-users] How to remove ceph-mgr from a node

2019-06-06 Thread Vandeir Eduardo
Just for the record in case someone gets into this thread: the problem with ceph-mgr being started on a host other than the active mgr one was because the python-routes package was missing. In the log, this was the error message displayed: 2019-06-05 11:04:48.800 7fed60097700 -1 log_channel(cluster)
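
For anyone hitting the same error, a sketch of the fix described here (the package name can vary slightly between distributions):

  # install the missing dependency
  yum install python-routes        # CentOS/RHEL
  apt install python-routes        # Debian/Ubuntu
  # restart the manager on the affected host
  # (the mgr id is usually, but not always, the short hostname)
  systemctl restart ceph-mgr@$(hostname -s)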

Re: [ceph-users] OSD hanging on 12.2.12 by message worker

2019-06-06 Thread Stefan Kooman
Quoting Max Vernimmen (vernim...@textkernel.nl): > > This is happening several times per day after we made several changes at > the same time: > > - add physical RAM to the ceph nodes > - move from fixed 'bluestore cache size hdd|ssd' and 'bluestore cache kv > max' to 'bluestore cache

Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-06-06 Thread Florian Engelmann
On 5/28/19 at 5:37 PM, Casey Bodley wrote: On 5/28/19 11:17 AM, Scheurer François wrote: Hi Casey, I greatly appreciate your quick and helpful answer :-) It's unlikely that we'll do that, but if we do, it would be announced with a long deprecation period and migration strategy. Fine, just

Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-06 Thread Marc Roos
I am also thinking of moving the wal/db of the SATA HDDs to SSD. Did you run tests before and after this change, and do you know what the difference is in IOPS? And is the advantage bigger or smaller when your SATA HDDs are slower? -Original Message- From: Stolte, Felix
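
One quick before/after comparison that does not need a client is the built-in OSD bench; a minimal sketch (OSD id and sizes are placeholders):

  # default: write 1 GiB in 4 MiB blocks through the OSD's object store
  ceph tell osd.0 bench
  # explicit total bytes and block size, e.g. 100 MiB in 4 MiB writes
  ceph tell osd.0 bench 104857600 4194304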

[ceph-users] Upgrading from luminous to nautilus using CentOS storage repos

2019-06-06 Thread Drew Weaver
Hello, I built a tiny test cluster with Luminous using the CentOS storage repos. I saw that they now have a nautilus repo as well but I can't find much information on upgrading from one to the other. Does it make sense to continue using the CentOS storage repos or should I just switch to the
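
The upgrade sequence itself is the same no matter which repository provides the packages; a rough sketch of the usual order (the Nautilus release notes remain the authoritative procedure):

  ceph osd set noout                      # avoid rebalancing while daemons restart
  # on each node, after updating the packages: mons first, then mgrs, then OSDs
  systemctl restart ceph-mon.target
  systemctl restart ceph-mgr.target
  systemctl restart ceph-osd.target
  # once every daemon runs Nautilus
  ceph osd require-osd-release nautilus
  ceph mon enable-msgr2
  ceph osd unset noout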

Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 5:07 AM Sakirnth Nagarasa wrote: > > Hello, > > Our ceph version is ceph nautilus (14.2.1). > We create periodically snapshots from an rbd image (50 TB). In order to > restore some data, we have cloned a snapshot. > To delete the snapshot we ran: rbd rm ${POOLNAME}/${IMAGE}

Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Dietmar Rieder
+1 Operator's view: a 12-month cycle is definitely better than 9. March seems to be a reasonable compromise. Best, Dietmar On 6/6/19 2:31 AM, Linh Vu wrote: > I think a 12-month cycle is much better from the cluster operations > perspective. I also like March as a release month as well. >

Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Daniel Baumann
On 6/6/19 9:26 AM, Xiaoxi Chen wrote: > I will vote for November for several reasons: [...] As an academic institution we're aligned to August-to-July (the school year) instead of January-to-December (the calendar year), so all your reasons (thanks!) are valid for us, just shifted by 6 months,

Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Sakirnth Nagarasa
On 6/6/19 3:46 PM, Jason Dillaman wrote: > Can you run "rbd trash ls --all --long" and see if your image > is listed? No, it is not listed. I did run: rbd trash ls --all --long ${POOLNAME_FROM_IMAGE} Cheers, Sakirnth

[ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Alfredo Rezinovsky
https://ceph.com/geen-categorie/ceph-manually-repair-object/ is a little outdated. After stopping the OSD and flushing the journal, I don't have any clue how to move the object (easy in filestore). I have this in my OSD log: 2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log
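
With BlueStore there is no file to move by hand; a sketch of the commonly suggested workflow instead (the PG id is a placeholder):

  # find the inconsistent placement group
  ceph health detail
  # see which object and which replica failed the scrub
  rados list-inconsistent-obj 2.1ab --format=json-pretty
  # ask the OSDs to rewrite the bad copy from an authoritative replica
  ceph pg repair 2.1ab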

Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-06 Thread Jason Dillaman
On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa wrote: > > On 6/6/19 3:46 PM, Jason Dillaman wrote: > > Can you run "rbd trash ls --all --long" and see if your image > > is listed? > > No, it is not listed. > > I did run: > rbd trash ls --all --long ${POOLNAME_FROM_IMAGE} > > Cheers, > Sakirnth

Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-06 Thread jesper
> Hi, > > On 5/6/19 at 16:53, vita...@yourcmc.ru wrote: >>> Ok, average network latency from VM to OSDs ~0.4ms. >> >> It's rather bad; you can improve the latency by 0.3ms just by >> upgrading the network. >> >>> Single-threaded performance ~500-600 IOPS - or average latency of 1.6ms >>>
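
Single-threaded latency is easiest to measure with a queue depth of 1; a minimal sketch (pool name is a placeholder):

  # one outstanding 4 KiB write at a time for 30 seconds
  rados bench -p testpool 30 write -t 1 -b 4096
  # with queue depth 1, IOPS is roughly 1 / average latency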

Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-06 Thread Jorge Garcia
The MDS has a load of 0.00, and the IO stats basically say "nothing is going on". On 6/5/19 5:33 PM, Yan, Zheng wrote: On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia wrote: We have been testing a new installation of ceph (mimic 13.2.2) mostly using cephfs (for now). The current test is just

Re: [ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Tarek Zegar
Look here: http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent A read error is typically a disk issue. The doc is not clear on how to resolve that.

[ceph-users] dashboard returns 401 on successful auth

2019-06-06 Thread Drew Weaver
Hello, I was able to get Nautilus running on my cluster. When I try to log in to the dashboard with the user I created, even with the correct credentials, I see this in the log: 2019-06-06 12:51:43.738 7f373ec9b700 1 mgr[dashboard] [:::192.168.105.1:56110] [GET] [401] [0.002s] [271B]
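
If the account itself is the suspect, a sketch of (re)creating a dashboard user with an administrator role on Nautilus (username and password are placeholders):

  ceph mgr module enable dashboard
  ceph dashboard ac-user-create admin <password> administrator
  # list users and their roles to verify
  ceph dashboard ac-user-show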

Re: [ceph-users] Fix scrub error in bluestore.

2019-06-06 Thread Oliver Freyermuth
Hi Alfredo, you may want to check the SMART data for the disk. I also had such a case recently (see http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/035117.html for the thread), and the disk had one unreadable sector which was pending reallocation. Triggering "ceph pg repair" for
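
Checking the drive itself takes a minute; a sketch with smartctl (the device path is a placeholder):

  # full SMART report for the disk backing the OSD
  smartctl -a /dev/sdX
  # attributes worth a look: Current_Pending_Sector, Offline_Uncorrectable,
  # Reallocated_Sector_Ct, plus the device error log at the end of the output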

Re: [ceph-users] dashboard returns 401 on successful auth

2019-06-06 Thread Nathan Fish
I have filed this bug: https://tracker.ceph.com/issues/40051 On Thu, Jun 6, 2019 at 12:52 PM Drew Weaver wrote: > > Hello, > > > > I was able to get Nautilus running on my cluster. > > > > When I try to login to dashboard with the user I created if I enter the > correct credentials in the log I

[ceph-users] typical snapmapper size

2019-06-06 Thread Sage Weil
Hello RBD users, Would you mind running this command on a random OSD on your RBD-oriented cluster?
  ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-NNN \
    '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]' \
    list-omap |

Re: [ceph-users] obj_size_info_mismatch error handling

2019-06-06 Thread Reed Dier
Sadly I never discovered anything more. It ended up clearing up on its own, which was disconcerting, but I resigned myself to not making things worse in an attempt to make them better. I assume someone touched the file in CephFS, which triggered the metadata to be updated, and everyone was able to

[ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-06 Thread Tarek Zegar
For testing purposes I set a bunch of OSDs to 0 weight; this correctly forces Ceph to not use those OSDs. I took enough out such that the UP set only had the pool's min_size number of OSDs (i.e. 2 OSDs). Two questions: 1. Why doesn't the acting set eventually match the UP set and simply point to [6,5] only? 2.
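
For reference, a sketch of the commands involved in this kind of test (OSD id and PG id are placeholders):

  # take an OSD out of data placement without marking it down
  ceph osd reweight 6 0
  # compare the up set and the acting set of a given PG
  ceph pg map 2.1ab
  # full per-PG detail, including why old members are still acting
  ceph pg 2.1ab query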

Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-06 Thread Jorge Garcia
Ok, I finally got the cluster back to HEALTH_OK. Since rebooting the whole cluster didn't fix the problem, I ran: ceph osd set noscrub and ceph osd set nodeep-scrub. That made the "slow metadata IOs" and "behind on trimming" warnings go away, replaced by "noscrub, nodeep-scrub flag(s) set".
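
Once the cluster has settled, the flags can be cleared again; a short sketch:

  ceph osd unset noscrub
  ceph osd unset nodeep-scrub
  # verify the warnings are gone
  ceph status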

Re: [ceph-users] typical snapmapper size

2019-06-06 Thread Shawn Iverson
17838
  ID CLASS WEIGHT REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
  24 hdd   1.0    1.0      419GiB 185GiB 234GiB 44.06 1.46 85
light snapshot use On Thu, Jun 6, 2019 at 2:00 PM Sage Weil wrote: > Hello RBD users, > > Would you mind running this command on a random OSD on your