Hello folks,
we are running a ceph cluster on Luminous consisting of 21 OSD nodes, each with 9 8TB
SATA drives and 3 Intel 3700 SSDs for BlueStore WAL and DB (1:3 ratio). OSDs
have 10Gb for the public and cluster networks. The cluster has been running
stable for over a year. We hadn't taken a closer look at IO
Hello,
Our ceph version is ceph nautilus (14.2.1).
We periodically create snapshots of an rbd image (50 TB). In order to
restore some data, we cloned a snapshot.
To delete the clone we ran: rbd rm ${POOLNAME}/${IMAGE}
But it took very long to delete the image; after half an hour it had
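For reference, a hedged sketch of the clone-and-cleanup sequence described above; ${SNAP} and the clone name are placeholders:

  rbd snap protect ${POOLNAME}/${IMAGE}@${SNAP}
  rbd clone ${POOLNAME}/${IMAGE}@${SNAP} ${POOLNAME}/restore-clone
  # ... restore the needed data from the clone ...
  rbd rm ${POOLNAME}/restore-clone        # on a 50 TB parent this can take a long time
  rbd snap unprotect ${POOLNAME}/${IMAGE}@${SNAP}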
We go with upstream releases, mostly Nautilus now, and are probably the most
aggressive among serious production users (i.e. tens of PB+).
I will vote for November for several reasons:
1. Q4 is the holiday season, and production rollouts are usually blocked,
especially storage-related changes, which
Hi,
We are running VM images on ceph using RBD. We are seeing a problem where
one of our VMs runs into trouble due to IO not completing. iostat on the
VM shows IO remaining in the queue, and disk utilisation for ceph-based
vdisks is 100%.
Upon investigation the problem seems to be with the
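Not from the original thread, but a generic triage sketch for this kind of stall; osd.N is a placeholder:

  # on the VM: watch per-device utilisation and queue size
  iostat -x 1 /dev/vda
  # on the cluster: check for slow/blocked requests, then inspect the OSD holding them
  ceph health detail
  ceph daemon osd.N dump_ops_in_flight    # run on the host of the suspect OSD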
Hello,
I see messages related to REQUEST_SLOW a few times per day.
here's my ceph -s :
root@ceph-pa2-1:/etc/ceph# ceph -s
  cluster:
    id:     72d94815-f057-4127-8914-448dfd25f5bc
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph-pa2-1,ceph-pa2-2,ceph-pa2-3
    mgr:
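When REQUEST_SLOW appears, a hedged way to narrow it down to specific OSDs (osd.N is a placeholder):

  ceph health detail                     # names the OSDs with slow requests
  ceph daemon osd.N dump_historic_ops    # slowest recent ops on that OSD, with per-step timings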
Just for the record, in case someone gets into this thread: the problem
with ceph-mgr being started on a host other than the active mgr's
was because the python-routes package was missing. This was the error
message displayed in the log:
2019-06-05 11:04:48.800 7fed60097700 -1 log_channel(cluster)
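A hedged sketch of the fix described above (Debian/Ubuntu package syntax assumed):

  apt-get install python-routes          # or: yum install python-routes
  ceph mgr module disable dashboard
  ceph mgr module enable dashboard       # reload the module so it picks up the dependency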
Quoting Max Vernimmen (vernim...@textkernel.nl):
>
> This is happening several times per day, after we made several changes at
> the same time:
>
> - add physical RAM to the ceph nodes
> - move from fixed 'bluestore cache size hdd|ssd' and 'bluestore cache kv
>   max' to 'bluestore cache
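For context, the two sizing styles being contrasted look roughly like this in ceph.conf; the values here are invented:

  [osd]
  # fixed sizing (the old style)
  bluestore_cache_size_hdd = 1073741824   # 1 GiB per HDD OSD
  bluestore_cache_size_ssd = 3221225472   # 3 GiB per SSD OSD
  bluestore_cache_kv_max = 536870912      # cap on the RocksDB share
  # the autotuned style bounds the whole daemon instead:
  # osd_memory_target = 4294967296        # 4 GiB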
On 5/28/19 at 5:37 PM, Casey Bodley wrote:
On 5/28/19 11:17 AM, Scheurer François wrote:
Hi Casey
I greatly appreciate your quick and helpful answer :-)
It's unlikely that we'll do that, but if we do, it would be announced
with a long deprecation period and a migration strategy.
Fine, just
I am also thinking of moving the WAL/DB of the SATA HDDs to SSD. Did
you run tests before and after this change, and do you know what the
difference in IOPS is? And is the advantage bigger or smaller when your
SATA HDDs are slower?
-----Original Message-----
From: Stolte, Felix
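A hedged sketch of attaching a separate DB device to an existing OSD with ceph-bluestore-tool (Nautilus); the OSD id and target partition are placeholders:

  systemctl stop ceph-osd@12
  ceph-bluestore-tool bluefs-bdev-new-db \
      --path /var/lib/ceph/osd/ceph-12 \
      --dev-target /dev/sdx1
  systemctl start ceph-osd@12

Note the new DB volume starts out empty; existing RocksDB data can be moved over explicitly with bluefs-bdev-migrate.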
Hello,
I built a tiny test cluster with Luminous using the CentOS storage repos.
I saw that they now have a Nautilus repo as well, but I can't find much
information on upgrading from one to the other.
Does it make sense to continue using the CentOS storage repos or should I just
switch to the
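Whichever repo you settle on, the documented daemon order for Luminous -> Nautilus is the same; a hedged sketch:

  ceph osd set noout
  # upgrade packages, then restart mons first, then mgrs, then OSDs, then MDS/RGW
  systemctl restart ceph-mon.target
  systemctl restart ceph-mgr.target
  systemctl restart ceph-osd.target
  ceph osd require-osd-release nautilus
  ceph osd unset noout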
On Thu, Jun 6, 2019 at 5:07 AM Sakirnth Nagarasa wrote:
>
> Hello,
>
> Our ceph version is ceph nautilus (14.2.1).
> We periodically create snapshots of an rbd image (50 TB). In order to
> restore some data, we cloned a snapshot.
> To delete the clone we ran: rbd rm ${POOLNAME}/${IMAGE}
+1
Operator's view: a 12-month cycle is definitely better than 9. March seems
to be a reasonable compromise.
Best
Dietmar
On 6/6/19 2:31 AM, Linh Vu wrote:
> I think a 12-month cycle is much better from the cluster operations
> perspective. I also like March as a release month as well.
>
On 6/6/19 9:26 AM, Xiaoxi Chen wrote:
> I will vote for November for several reasons:
[...]
as an academic institution we're aligned to August-to-July (the school
year) instead of January-to-December (the calendar year), so all your
reasons (thanks!) are valid for us, just shifted by 6 months.
On 6/6/19 3:46 PM, Jason Dillaman wrote:
> Can you run "rbd trash ls --all --long" and see if your image
> is listed?
No, it is not listed.
I did run:
rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}
Cheers,
Sakirnth
https://ceph.com/geen-categorie/ceph-manually-repair-object/
is a little outdated.
After stopping the OSD and flushing the journal, I don't have any clue how
to move the object (which was easy in filestore).
I have this in my OSD log:
2019-06-05 10:46:41.418 7f47d0502700 -1 log_channel(cluster) log
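Not from the original post, but a hedged sketch of the bluestore equivalent using ceph-objectstore-tool on the stopped OSD; the OSD id, pgid, and object are placeholders:

  systemctl stop ceph-osd@7
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --pgid 2.4f '<object>' remove
  systemctl start ceph-osd@7
  ceph pg repair 2.4f                    # rewrite the object from the healthy replicas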
On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa wrote:
>
> On 6/6/19 3:46 PM, Jason Dillaman wrote:
> > Can you run "rbd trash ls --all --long" and see if your image
> > is listed?
>
> No, it is not listed.
>
> I did run:
> rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}
>
> Cheers,
> Sakirnth
> Hi,
>
> On 5/6/19 at 16:53, vita...@yourcmc.ru wrote:
>>> Ok, average network latency from VM to OSD's ~0.4ms.
>>
>> It's rather bad; you could improve the latency by 0.3ms just by
>> upgrading the network.
>>
>>> Single threaded performance ~500-600 IOPS - or average latency of 1.6ms
>>>
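Not in the original mail, but a hedged fio sketch for measuring that single-threaded (iodepth=1) latency against an RBD image; pool, image, and client names are made up:

  fio --name=qd1-lat --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=fio-test \
      --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --runtime=60 --time_based

At ~600 IOPS with a queue depth of 1, that is roughly 1000/600 ≈ 1.7ms per write, consistent with the figure quoted above.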
The MDS has a load of 0.00, and the IO stats basically say "nothing is
going on".
On 6/5/19 5:33 PM, Yan, Zheng wrote:
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia wrote:
We have been testing a new installation of ceph (mimic 13.2.2) mostly
using cephfs (for now). The current test is just
Look here:
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#pgs-inconsistent
A read error is typically a disk issue. The doc is not clear on how to
resolve that.
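A hedged sketch of the usual sequence for an inconsistent PG; the pgid is a placeholder:

  ceph health detail                                   # shows which PG is inconsistent
  rados list-inconsistent-obj 2.4f --format=json-pretty
  ceph pg repair 2.4f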
From: Alfredo Rezinovsky
To: Ceph Users
Date: 06/06/2019 10:58 AM
Subject: [EXTERNAL]
Hello,
I was able to get Nautilus running on my cluster.
When I try to log in to the dashboard with the user I created, even when I
enter the correct credentials, I see this in the log:
2019-06-06 12:51:43.738 7f373ec9b700 1 mgr[dashboard]
[::ffff:192.168.105.1:56110] [GET] [401] [0.002s] [271B]
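A hedged first check, assuming the user may have been created without a role (Nautilus syntax; user name and password are placeholders):

  ceph dashboard ac-user-show myadmin
  ceph dashboard ac-user-set-roles myadmin administrator
  # or recreate the user from scratch:
  ceph dashboard ac-user-create myadmin 'S3cret!' administrator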
Hi Alfredo,
you may want to check the SMART data for the disk.
I also had such a case recently (see
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/035117.html for
the thread),
and the disk had one unreadable sector which was pending reallocation.
Triggering "ceph pg repair" for
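Not from the original mail, but a hedged way to spot such a sector (the device name is a placeholder):

  smartctl -a /dev/sdx | egrep -i 'pending|realloc|uncorrect'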
I have filed this bug:
https://tracker.ceph.com/issues/40051
On Thu, Jun 6, 2019 at 12:52 PM Drew Weaver wrote:
> Hello,
>
> I was able to get Nautilus running on my cluster.
>
> When I try to log in to the dashboard with the user I created, even when
> I enter the correct credentials, in the log I
Hello RBD users,
Would you mind running this command on a random OSD on your RBD-oriented
cluster?
ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-NNN \
    '["meta",{"oid":"snapmapper","key":"","snapid":0,"hash":2758339587,"max":0,"pool":-1,"namespace":"","max":0}]' \
    list-omap |
Sadly I never discovered anything more.
It ended up clearing up on its own, which was disconcerting, but I resigned
myself to not making things worse in an attempt to make them better.
I assume someone touched the file in CephFS, which triggered the metadata to be
updated, and everyone was able to
For testing purposes I set a bunch of OSDs to weight 0; this correctly
forces Ceph not to use those OSDs. I took enough out that the UP set
only had the pool's min_size number of OSDs (i.e. 2 OSDs).
Two questions:
1. Why doesn't the acting set eventually match the UP set and simply point
to [6,5] only?
2.
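Not part of the original question, but a hedged way to watch up vs. acting sets while reweighting (ids are placeholders):

  ceph osd crush reweight osd.3 0
  ceph pg map 1.0          # prints the PG's up set and acting set
  ceph pg ls remapped      # PGs whose acting set differs from their up set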
Ok, I finally got the cluster back to HEALTH_OK. When rebooting the
whole cluster didn't fix the problem, I did:
ceph osd set noscrub
ceph osd set nodeep-scrub
That made the "slow metadata IOs" and "behind on trimming" warnings go
away, replaced by "noscrub, nodeep-scrub flag(s) set".
ID CLASS WEIGHT REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
24 hdd   1.0    1.0      419GiB 185GiB 234GiB 44.06 1.46 85
light snapshot use
On Thu, Jun 6, 2019 at 2:00 PM Sage Weil wrote:
> Hello RBD users,
>
> Would you mind running this command on a random OSD on your