Hi,
During testing (mimicking BGP / port flaps) on our cluster we are able
to trigger a "_committed_osd_maps shutdown OSD via async signal" on the
affected OSD servers in that datacenter (OSDs in that DC become
intermittently isolated from their peers). The result is that all OSD
processes stop. Is
Hi,
Can we supply http://tracker.ceph.com with TLS and make it
https://tracker.ceph.com? Should be trivial with Let's Encrypt for
example.
Thanks!
Gr. Stefan
--
| BIT BV | http://www.bit.nl/ | Kamer van Koophandel 09090351
| GPG: 0xD14839C6 | +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Quoting Gregory Farnum (gfar...@redhat.com):
> That's a feature, but invoking it may indicate the presence of another
> issue. The OSD shuts down if
> 1) it has been deleted from the cluster, or
> 2) it has been incorrectly marked down a bunch of times by the cluster, and
> gives up, or
> 3) it
Quoting Kashif Mumtaz (kashif.mum...@yahoo.com):
>
> Dear User,
> I am striving hard to install the Ceph Luminous version on Ubuntu 16.04.3
> (xenial).
> Its repo is available at https://download.ceph.com/debian-luminous/
> I added it like sudo apt-add-repository 'deb
>
Quoting Christian Balzer (ch...@gol.com):
>
> On Thu, 28 Sep 2017 22:36:22 + Gregory Farnum wrote:
>
> > Also, realize the deep scrub interval is a per-PG thing and (unfortunately)
> > the OSD doesn't use a global view of its PG deep scrub ages to try and
> > schedule them intelligently
Quoting Yoann Moulin (yoann.mou...@epfl.ch):
>
> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96
>
> What is exactly an older kernel client ? 4.4 is old ?
See
http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
If you're on Ubuntu Xenial I would advise to
Quoting Yoann Moulin (yoann.mou...@epfl.ch):
>
> >> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96
> >>
> >> What is exactly an older kernel client ? 4.4 is old ?
> >
> > See
> > http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
> >
> > If you're on
Quoting Kashif Mumtaz (kashif.mum...@yahoo.com):
> Dear, thanks for the help. I am able to install on a single node. Now going
> to install on multiple nodes. Just want to clarify one small thing.
> Do the Ceph key and Ceph repository need to be added on every node, or are
> they required only on the admin node where
Hi,
I noticed the ceph version still gives "rc" although we are using the
latest Ceph packages: 12.2.0-1xenial
(https://download.ceph.com/debian-luminous xenial/main amd64 Packages):
ceph daemon mon.mon5 version
{"version":"12.2.0","release":"luminous","release_type":"rc"}
Why is this important
Hi,
Quoting Alfredo Deza (ad...@redhat.com):
> Hi,
>
> Now that ceph-volume is part of the Luminous release, we've been able
> to provide filestore support for LVM-based OSDs. We are making use of
> LVM's powerful mechanisms to store metadata which allows the process
> to no longer rely on UDEV
Hi,
While implementing (stricter) firewall rules I noticed weird behaviour.
For the monitors only port 6789 was allowed. We currently co-locate the
manager daemon with our monitors. Apparently (at least) port 6800 is
also essential. In the Network Configuration Reference [1] there is no
mention
Hi,
Sorry for the empty mail, that shouldn't have happened. I would like to
address the following. Currently the repository list for Debian
packages contains _only_ the latest package version. In case of an
(urgent) need to downgrade you cannot easily select an older version.
You then need to resort
Quoting Ashley Merrick (ash...@amerrick.co.uk):
> Hello,
>
> Setting up a new test lab, single server 5 disks/OSD.
>
> Want to run an EC pool that has more shards than available OSDs; is
> it possible to force CRUSH to re-use an OSD for another shard?
>
> I know normally this is bad practice
Quoting Fabian Grünbichler (f.gruenbich...@proxmox.com):
> I think the above roadmap is a good compromise for all involved parties,
> and I hope we can use the remainder of Luminous to prepare for a
> seam- and painless transition to ceph-volume in time for the Mimic
> release, and then finally
Quoting tim taler (robur...@gmail.com):
> And I'm still puzzled about the implication of the cluster size on the
> number of OSD failures it can tolerate.
> With size=2 min_size=1 one host could die and (if by chance there is
> NO read error on any bit on the living host) I could (theoretically)
> recover, is
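A back-of-the-envelope sketch of that "NO read error on any bit" risk. Illustrative Python; the 8 TB figure and the 1e-15 URE rate are assumptions for the example, not numbers from this thread:

```python
import math

def p_unrecoverable_read(bytes_to_read, ure_per_bit=1e-15):
    """Chance of at least one unrecoverable read error (URE) while
    re-reading bytes_to_read from the single surviving copy.
    Assumes independent bit errors at the quoted URE rate."""
    bits = bytes_to_read * 8
    # 1 - (1 - p)^n, computed stably via exp/log1p
    return 1 - math.exp(bits * math.log1p(-ure_per_bit))

# Assumed example: 8 TB that has to be read back error-free
print(f"{p_unrecoverable_read(8e12):.1%}")  # a few percent
```

With size=2 min_size=1 that few-percent chance applies to every full re-read of the lone remaining copy, which is why size=3 min_size=2 keeps coming up on this list.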
Hi,
The new style "ceph-volume" LVM way of provisioning OSDs introduces a
little challenge for us. In order to make the OSDs as logical,
consistent and easily recognizable as possible, we try to name the
Volume Groups (VG) and Logical Volumes (LV) the same as the OSD. For
example: OSD no. 12
Quoting Webert de Souza Lima (webert.b...@gmail.com):
> if I may suggest, "ceph osd create" allocates and returns an OSD ID. So you
> could take it by doing:
>
> ID=$(ceph osd create)
>
> then remove it with
>
> ceph osd rm $ID
>
> Now you have the $ID and you can deploy it with ceph-volume
Quoting Willem Jan Withagen (w...@digiware.nl):
> LOG.debug('Allocating OSD id...')
> secrets = Secrets()
> try:
>     wanttobe = read_one_line(path, 'wanttobe')
>     if os.path.exists(os.path.join(path, 'wanttobe')):
>         os.unlink(os.path.join(path, 'wanttobe'))
>
Quoting Burkhard Linke (burkhard.li...@computational.bio.uni-giessen.de):
> Just my 2 cents:
>
> What is happening if ansible runs on multiple hosts in parallel?
We won't do that, to avoid race conditions that might arise. We don't
need parallel deployment of massive amounts of OSDs overnight, so
Quoting Josef Zelenka (josef.zele...@cloudevelops.com):
> Hi everyone,
>
> we have recently deployed a Luminous(12.2.1) cluster on Ubuntu - three osd
> nodes and three monitors, every osd has 3x 2TB SSD + an NVMe drive for a
> blockdb. We use it as a backend for our Openstack cluster, so we store
Quoting 姜洵 (jiang...@100tal.com):
> Hi folks,
>
>
> I am trying to create a BlueStore OSD manually with the ceph-volume tool
> on an Ubuntu 14.04 system, but with no luck. The Ceph version I used is
> Luminous 12.2.2.
>
> I do this manually instead of using the ceph-deploy command, because I want
Hi,
We see the following in the logs after we start a scrub for some osds:
ceph-osd.2.log:2017-12-14 06:50:47.180344 7f0f47db2700 0 log_channel(cluster)
log [DBG] : 1.2d8 scrub starts
ceph-osd.2.log:2017-12-14 06:50:47.180915 7f0f47db2700 -1 osd.2 pg_epoch: 11897
pg[1.2d8( v 11890'165209
Quoting Webert de Souza Lima (webert.b...@gmail.com):
> Cool
>
>
> On Wed, Dec 13, 2017 at 11:04 AM, Stefan Kooman <ste...@bit.nl> wrote:
>
> > So, a "ceph osd ls" should give us a list, and we will pick the smallest
> > available number as
Quoting Nick Fisk (n...@fisk.me.uk):
> Hi All,
>
> Has anyone been testing the bluestore pool compression option?
>
> I have set compression=snappy on a RBD pool. When I add a new bluestore OSD,
> data is not being compressed when backfilling, confirmed by looking at the
> perf dump results. If
Dear list,
Somehow (it might have to do with live migrating the virtual machine) an
rbd image ends up being undeletable. Trying to remove the image results
in *loads* of the same message over and over again:
2017-11-07 11:30:58.431913 7f9ae2ffd700 -1 JournalPlayer: 0x7f9ae400a130
missing prior
Dear list,
In a ceph blog post about the new Luminous release there is a paragraph
on the need for ceph tuning [1]:
"If you are a Ceph power user and believe there is some setting that you
need to change for your environment to get the best performance, please
tell us; we'd like to either adjust
Quoting Alfredo Deza (ad...@redhat.com):
>
> Looks like there is a tag in there that broke it. Let's follow up on a
> tracker issue so that we don't hijack this thread?
>
> http://tracker.ceph.com/projects/ceph-volume/issues/new
Issue 22305 made for this: http://tracker.ceph.com/issues/22305
Hi List,
Will there be, at some point in time, Ceph Luminous packages for Ubuntu
18.04 LTS (bionic)? Or are we supposed to upgrade to "Mimic" / 18.04 LTS
in one go?
Gr. Stefan
Quoting Anthony Verevkin (anth...@verevkin.ca):
> My thoughts on the subject are that even though checksums do allow to
> find which replica is corrupt without having to figure which 2 out of
> 3 copies are the same, this is not the only reason min_size=2 was
> required. Even if you are running
Quoting Dennis Benndorf (dennis.bennd...@googlemail.com):
> Hi,
>
> lets assume we have size=3 min_size=2 and lost some osds and now have some
> placement groups with only one copy left.
>
> Is there a setting to tell ceph to start recovering those pgs first in order
> to reach min_size and so
Quoting Reed Dier (reed.d...@focusvq.com):
>
> > On Jun 22, 2018, at 2:14 AM, Stefan Kooman wrote:
> >
> > Just checking here: Are you using the telegraf ceph plugin on the nodes?
> > In that case you _are_ duplicating data. But the good news is that you
> > d
Quoting Denny Fuchs (linuxm...@4lin.net):
>
> We have also a 2nd cluster which holds the VMs with also 128Gb Ram and 2 x
> Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz. But with only system disks (ZFS
> Raid1).
Storage doesn't matter for the MDS, as it won't use it to store Ceph data
(but instead uses
Quoting John Spray (jsp...@redhat.com):
>
> The general idea with mgr plugins (Telegraf, etc) is that because
> there's only one active mgr daemon, you don't have to worry about
> duplicate feeds going in.
>
> I haven't used the icinga2 check_ceph plugin, but it seems like it's
> intended to run
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> Hi,
>
> We see the following in the logs after we start a scrub for some osds:
>
> ceph-osd.2.log:2017-12-14 06:50:47.180344 7f0f47db2700 0
> log_channel(cluster) log [DBG] : 1.2d8 scrub starts
> ceph-osd.2.log:2017-
Hi,
I know I'm not the only one with this question, as I have seen similar
questions on this list:
How to speed up recovery / backfilling?
Current status:
pgs: 155325434/800312109 objects degraded (19.408%)
1395 active+clean
440
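A quick sanity check of the degraded percentage in the status output above, using the numbers ceph reports (illustrative Python):

```python
# Figures taken from the "ceph status" excerpt above
degraded, total = 155_325_434, 800_312_109
pct = degraded / total * 100
print(f"{pct:.3f}%")  # 19.408%, matching what ceph reports
```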
Hi Ceph fs'ers
I have a question about the "mds_cache_memory_limit" parameter and MDS
memory usage. We currently have set mds_cache_memory_limit=150G.
The MDS server itself (and its active-standby) have 256 GB of RAM.
Eventually the MDS process will consume ~ 87.5% of available memory.
At that
Quoting Patrick Donnelly (pdonn...@redhat.com):
>
> It's expected but not desired: http://tracker.ceph.com/issues/21402
>
> The memory usage tracking is off by a constant factor. I'd suggest
> just lowering the limit so it's about where it should be for your
> system.
Thanks for the info. Yeah,
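To illustrate the "off by a constant factor" point from tracker issue 21402: with an assumed factor of ~1.5 (a hypothetical value for this sketch, not an official number), the 150 GB cache limit lands close to the observed ~87.5% of 256 GB:

```python
cache_limit_gb = 150     # mds_cache_memory_limit from this thread
host_ram_gb = 256
overhead_factor = 1.5    # assumed constant factor (hypothetical)

expected_rss_gb = cache_limit_gb * overhead_factor  # 225 GB
fraction = expected_rss_gb / host_ram_gb            # ~0.88, near the observed ~87.5%

# Solving the other way: a limit that keeps the MDS near 80% of RAM
safe_limit_gb = host_ram_gb * 0.8 / overhead_factor  # ~136 GB
print(fraction, safe_limit_gb)
```

That is the shape of the calculation behind "just lower the limit"; the actual factor on a given system has to be measured, not assumed.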
Quoting Konstantin Shalygin (k0...@k0ste.ru):
> >This is still a pre-production cluster. Most tests have been done
> >using rbd. We did make some rbd clones / snapshots here and there.
>
> What clients you used?
Only luminous clients. Mostly rbd (qemu-kvm) images.
Gr. Stefan
Quoting Konstantin Shalygin (k0...@k0ste.ru):
> On 01/04/2018 11:38 PM, Stefan Kooman wrote:
> >Only luminous clients. Mostly rbd (qemu-kvm) images.
>
> Who manages your images? Maybe OpenStack Cinder?
OpenNebula 5.4.3 (issuing rbd commands to ceph cluster).
Gr. Stefan
Quoting Chris Sarginson (csarg...@gmail.com):
> You probably want to consider increasing osd max backfills
>
> You should be able to inject this online
>
> http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/
>
> You might want to drop your osd recovery max active settings
Quoting Brent Kennedy (bkenn...@cfl.rr.com):
> Unfortunately, this cluster was setup before the calculator was in
> place and when the equation was not well understood. We have the
> storage space to move the pools and recreate them, which was
> apparently the only way to handle the issue( you
Quoting Steven Vacaroaia (ste...@gmail.com):
> Hi,
>
> I have noticed the below error message when creating a new OSD using
> ceph-volume
> deleting the OSD and recreating it does not work - same error message
>
> However, creating a new OSD works
>
> Note
> No firewall /iptables are
Quoting Dan van der Ster (d...@vanderster.com):
>
> So, first question is: why didn't that OSD get detected as failing
> much earlier?
We have noticed that "mon osd adjust heartbeat grace" made the cluster
"realize" OSDs going down _much_ later than the MONs / OSDs themselves.
Setting this
Hi,
While trying to get an OSD back in the test cluster, which had been
dropped out for unknown reason, we see a RocksDB Segmentation fault
during "compaction". I increased debugging to 20/20 for OSD / RocksDB,
see part of the logfile below:
... 49477, 49476, 49475, 49474, 49473, 49472, 49471,
Quoting Sage Weil (s...@newdream.net):
> Hi Stefan, Mehmet,
>
> Are these clusters that were upgraded from prior versions, or fresh
> luminous installs?
Fresh luminous install... The cluster was installed with
12.2.0, and later upgraded to 12.2.1 and 12.2.2.
> This message indicates that there
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Dan van der Ster (d...@vanderster.com):
> > Hi,
> >
> > We've used double the defaults for around 6 months now and haven't had any
> > behind on trimming errors in that time.
> >
> >mds log max segment
Hi,
We have two MDS servers. One active, one active-standby. While doing a
parallel rsync of 10 threads with loads of files, dirs, subdirs we get
the following HEALTH_WARN:
ceph health detail
HEALTH_WARN 2 MDSs behind on trimming
MDS_TRIM 2 MDSs behind on trimming
mdsmds2(mds.0): Behind on
Quoting Dan van der Ster (d...@vanderster.com):
> Hi,
>
> For someone who is not an lvm expert, does anyone have a recipe for
> destroying a ceph-volume lvm osd?
> (I have a failed disk which I want to deactivate / wipe before
> physically removing from the host, and the tooling for this doesn't
Quoting Dan van der Ster (d...@vanderster.com):
> Hi,
>
> We've used double the defaults for around 6 months now and haven't had any
> behind on trimming errors in that time.
>
>mds log max segments = 60
>mds log max expiring = 40
>
> Should be simple to try.
Yup, and works like a
Quoting Dan van der Ster (d...@vanderster.com):
> Thanks Stefan. But isn't there also some vgremove or lvremove magic
> that needs to bring down these /dev/dm-... devices I have?
Ah, you want to clean up properly before that. Sure:
lvremove -f /
vgremove
pvremove /dev/ceph-device (should wipe
Quoting Denny Fuchs (linuxm...@4lin.net):
> hi,
>
> > Am 19.06.2018 um 17:17 schrieb Kevin Hrpcek :
> >
> > # ceph auth get client.icinga
> > exported keyring for client.icinga
> > [client.icinga]
> > key =
> > caps mgr = "allow r"
> > caps mon = "allow r"
>
> thats the point: It's
Hi Gregory,
Quoting Gregory Farnum (gfar...@redhat.com):
> This is quite strange. Given that you have a log, I think what you want to
> do is find one request in the log, trace it through its lifetime, and see
> where the time is elapsed. You may find a bifurcation, where some
> categories of
Hi,
I'm trying to find out why ceph-fuse client(s) are slow. Luminous 12.2.7
Ceph cluster, Mimic 13.2.1 ceph-fuse client. Ubuntu xenial, 4.13.0-38-generic
kernel.
Test case:
25 curl requests directed at a single threaded apache process (apache2
-X).
When the requests are handled by ceph-kernel
Quoting Gregory Farnum (gfar...@redhat.com):
> Hmm, these aren't actually the start and end times to the same operation.
> put_inode() is literally adjusting a refcount, which can happen for reasons
> ranging from the VFS doing something that drops it to an internal operation
> completing to a
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> Could you strace apacha process, check which syscall waits for a long time.
Yes, that's how I did all the tests (strace -t -T apache2 -X). With
debug=20 (ceph-fuse) you see apache waiting for almost 20 seconds before it
starts serving data:
Quoting Abhishek Lekshmanan (abhis...@suse.com):
> *NOTE* The v12.2.5 release has a potential data corruption issue with
> erasure coded pools. If you ran v12.2.5 with erasure coding, please see
> below.
< snip >
> Upgrading from v12.2.5 or v12.2.6
> -
>
> If
Quoting Brett Chancellor (bchancel...@salesforce.com):
> The error will go away once you start storing data in the other pools. Or,
> you could simply silence the message with mon_pg_warn_max_object_skew = 0
Ran into this issue myself (again). Note to self: You need to restart the
_active_ MGR
Quoting Caspar Smit (caspars...@supernas.eu):
> Stefan,
>
> How many OSD's and how much RAM are in each server?
Currently 7 OSDs, 128 GB RAM. Max will be 10 OSDs in these servers. 12
cores (at least one core per OSD).
> bluestore_cache_size=6G will not mean each OSD is using max 6GB RAM right?
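A rough per-host memory estimate for these settings. Illustrative Python; the 2 GB per-OSD overhead beyond the cache is an assumption for the sketch, not a documented figure:

```python
osds = 10            # max OSDs per server, from the thread
cache_gb = 6         # bluestore_cache_size
overhead_gb = 2.0    # assumed per-OSD usage beyond the cache (hypothetical)

# bluestore_cache_size caps the cache, not the whole process, so
# expect each OSD's RSS to sit somewhat above it:
per_osd_gb = cache_gb + overhead_gb
host_total_gb = osds * per_osd_gb
print(host_total_gb)  # 80.0 of the 128 GB in these servers
```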
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> Hi,
>
> TL;DR: we see "used" memory grow indefinitely on our OSD servers,
> until the point that either 1) an OSD process gets killed by the OOM killer,
> or 2) an OSD aborts (probably because malloc cannot provide more RAM).
Quoting Oliver Schulz (oliver.sch...@tu-dortmund.de):
> Dear Ceph Experts,
>
> I'm try to switch an old Ceph cluster from manual administration to
> ceph-deploy, but I'm running into the following error:
>
> # ceph-deploy gatherkeys HOSTNAME
>
> [HOSTNAME][INFO ] Running command: /usr/bin/ceph
doing some 5K writes. For sure this was not the limit. We would
hit max NIC bandwidth pretty soon though.
ceph++
Gr. Stefan
[1]: https://owncloud.kooman.org/s/mvbMCVLFbWjAyOn#pdfviewer
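A rough sketch of where that NIC ceiling sits for ~5K writes. Illustrative Python; the 10 GbE link speed and replication factor 3 are assumptions, not numbers from the thread:

```python
# All figures are assumptions for illustration
nic_bytes_per_s = 10e9 / 8   # assumed 10 GbE link
write_size = 5 * 1024        # the ~5K writes mentioned above
replicas = 3                 # assumed pool size

# Worst case for a single link that also carries replication traffic:
ops_ceiling = nic_bytes_per_s / (write_size * replicas)
print(int(ops_ceiling))      # on the order of 80k writes/s
```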
Quoting Stefan Kooman (ste...@bit.nl):
> Hi,
>
> I know I'm not the only one with this question as I have see similar
&
Hi,
TL;DR: we see "used" memory grow indefinitely on our OSD servers,
until the point that either 1) an OSD process gets killed by the OOM
killer, or 2) an OSD aborts (probably because malloc cannot provide more
RAM). I suspect a memory leak in the OSDs.
We were running 12.2.2. We are now running 12.2.3.
Quoting Dan van der Ster (d...@vanderster.com):
> Hi all,
>
> I'm just updating our test cluster from 12.2.2 to 12.2.4. Mon's and
> OSD's updated fine.
12.2.4? Did you mean 12.2.3? Or did I miss something?
Gr. stefan
Quoting by morphin (morphinwith...@gmail.com):
> After 72 hours I believe we may hit a bug. Any help would be greatly
> appreciated.
Is it feasible for you to stop all client IO to the Ceph cluster? At
least until it stabilizes again. "ceph osd pause" would do the trick
(ceph osd unpause would
Quoting Gregory Farnum (gfar...@redhat.com):
>
> Ah, there's a misunderstanding here — the output isn't terribly clear.
> "is_healthy" is the name of a *function* in the source code. The line
>
> heartbeat_map is_healthy 'MDSRank' had timed out after 15
>
> is telling you that the
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Thanks for the detailed notes. It looks like the MDS is stuck
> somewhere it's not even outputting any log messages. If possible, it'd
> be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> if you're comfortable with gdb, a
Quoting Wido den Hollander (w...@42on.com):
> Hi,
>
> Recently I've seen a Ceph cluster experience a few outages due to memory
> issues.
>
> The machines:
>
> - Intel Xeon E3 CPU
> - 32GB Memory
> - 8x 1.92TB SSD
> - Ubuntu 16.04
> - Ceph 12.2.8
What kernel version is running? What network
Quoting Ilya Dryomov (idryo...@gmail.com):
> On Sat, Nov 3, 2018 at 10:41 AM wrote:
> >
> > Hi.
> >
> > I tried to enable the "new smart balancing" - backend are on RH luminous
> > clients are Ubuntu 4.15 kernel.
[cut]
> > ok, so 4.15 kernel connects as a "hammer" (<1.0) client? Is there a
> >
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Patrick Donnelly (pdonn...@redhat.com):
> > Thanks for the detailed notes. It looks like the MDS is stuck
> > somewhere it's not even outputting any log messages. If possible, it'd
> > be helpful to get a coredump (e.g. by sendi
Quoting Stefan Kooman (ste...@bit.nl):
> I'm pretty sure it isn't. I'm trying to do the same (force luminous
> clients only) but ran into the same issue. Even when running 4.19 kernel
> it's interpreted as a jewel client. Here is the list I made so far:
>
> Ker
Dear list,
Today we hit our first Ceph MDS issue. Out of the blue the active MDS
stopped working:
mon.mon1 [WRN] daemon mds.mds1 is not responding, replacing it as rank 0 with
standby
daemon mds.mds2.
Logging of ceph-mds1:
2018-10-04 10:50:08.524745 7fdd516bf700 1 mds.mds1 asok_command:
Quoting by morphin (morphinwith...@gmail.com):
> Good news... :)
>
> After I tried everything. I decide to re-create my MONs from OSD's and
> I used the script:
> https://paste.ubuntu.com/p/rNMPdMPhT5/
>
> And it worked!!!
Congrats!
> I think when 2 servers crashed and came back at the same time some
Quoting Stefan Kooman (ste...@bit.nl):
> > From what you've described here, it's most likely that the MDS is trying to
> > read something out of RADOS which is taking a long time, and which we
> > didn't expect to cause a slow down. You can check via the admin
Hi Patrick,
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > somewhere it's not even outputting any log
Jay Munsterman wrote on 7 December 2018 at 21:55:25 CET:
>Hey all,
>I hope this is a simple question, but I haven't been able to figure it
>out.
>On one of our clusters there seems to be a disparity between the global
>available space and the space available to pools.
>
>$ ceph df
>GLOBAL:
>
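For replicated pools a gap between GLOBAL AVAIL and the pools' MAX AVAIL is expected: roughly speaking, MAX AVAIL divides usable raw space by the pool's replica count (and extrapolates from the fullest OSD). A minimal sketch with hypothetical numbers, not taken from the `ceph df` output above:

```python
# Hypothetical numbers for illustration
raw_avail_tb = 300        # GLOBAL AVAIL (raw, before replication)
replica_size = 3          # pool "size"

# Roughly what `ceph df` shows as a replicated pool's MAX AVAIL
# (it additionally scales down based on the fullest OSD, ignored here):
pool_max_avail_tb = raw_avail_tb / replica_size
print(pool_max_avail_tb)  # 100.0 TB usable out of 300 TB raw
```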
Quoting Dan van der Ster (d...@vanderster.com):
> Haven't seen that exact issue.
>
> One thing to note though is that if osd_max_backfills is set to 1,
> then it can happen that PGs get into backfill state, taking that
> single reservation on a given OSD, and therefore the recovery_wait PGs
>
Quoting Janne Johansson (icepic...@gmail.com):
> Yes, when you add a drive (or 10), some PGs decide they should have one or
> more
> replicas on the new drives, a new empty PG is created there, and
> _then_ that replica
> will make that PG get into the "degraded" mode, meaning if it had 3
> fine
Quoting Cody (codeology@gmail.com):
> The Ceph OSD part of the cluster uses 3 identical servers with the
> following specifications:
>
> CPU: 2 x E5-2603 @1.8GHz
> RAM: 16GB
> Network: 1G port shared for Ceph public and cluster traffics
This will hamper throughput a lot.
> Journaling
Hi list,
During cluster expansion (adding extra disks to existing hosts) some
OSDs failed (FAILED assert(0 == "unexpected error", _txc_add_transaction
error (39) Directory not empty not handled on operation 21 (op 1,
counting from 0), full details: https://8n1.org/14078/c534). We had
Hi List,
Another interesting and unexpected thing we observed during cluster
expansion is the following. After we added extra disks to the cluster,
while the "norebalance" flag was set, we put the new OSDs "IN". As soon
as we did that, a couple of hundred objects would become degraded. During
that
Quoting Robin H. Johnson (robb...@gentoo.org):
> On Fri, Nov 23, 2018 at 04:03:25AM +0700, Lazuardi Nasution wrote:
> > I'm looking example Ceph configuration and topology on full layer 3
> > networking deployment. Maybe all daemons can use loopback alias address in
> > this case. But how to set
Hi List,
TL;DR: what application types are compatible with each other concerning
Ceph Pools?
I.e. is it safe to mix "RBD" pool with (some) native librados objects?
RBD / RGW / Cephfs all have their own pools. Since luminous release
there is this "application tag" to (somewhere in the future)
Hi John,
Quoting John Spray (jsp...@redhat.com):
> On Wed, Sep 12, 2018 at 2:59 PM Stefan Kooman wrote:
>
> When replaying a journal (either on MDS startup or on a standby-replay
> MDS), the replayed file creation operations are being checked for
> consistency with the state
Quoting John Spray (jsp...@redhat.com):
> On Thu, Sep 13, 2018 at 11:01 AM Stefan Kooman wrote:
> We implement locking, and it's correct that another client can't gain
> the lock until the first client is evicted. Aside from speeding up
> eviction by modifying the timeout, if you
Hi,
Once in a while, today a bit more often, the MDS is logging the
following:
mds.mds1 [WRN] replayed op client.15327973:15585315,15585103 used ino
0x19918de but session next is 0x1873b8b
Nothing of importance is logged in the mds (debug_mds_log": "1/5").
What does this warning
Quoting Yan, Zheng (uker...@gmail.com):
>
>
> please add '-f' option (trace child processes' syscall) to strace,
Good suggestion. We now see all apache child processes doing their thing.
We have been, on and off, stracing / debugging this issue. Nothing
obvious. We are still trying to get
Quoting Robert Sander (r.san...@heinlein-support.de):
> On 07.12.18 18:33, Scharfenberg, Buddy wrote:
>
> > We have 3 nodes set up, 1 with several large drives, 1 with a handful of
> > small ssds, and 1 with several nvme drives.
>
> This is a very unusual setup. Do you really have all your HDDs
Quoting Mike Perez (mipe...@redhat.com):
> Hi Serkan,
>
> I'm currently working on collecting the slides to have them posted to
> the Ceph Day Berlin page as Lenz mentioned they would show up. I will
> notify once the slides are available on mailing list/twitter. Thanks!
FYI: The Ceph Day Berlin
Quoting Matthew Vernon (m...@sanger.ac.uk):
> Hi,
>
> On our Jewel clusters, the mons keep a log of the cluster status e.g.
>
> 2019-01-24 14:00:00.028457 7f7a17bef700 0 log_channel(cluster) log [INF] :
> HEALTH_OK
> 2019-01-24 14:00:00.646719 7f7a46423700 0 log_channel(cluster) log [INF] :
>
Quoting Burkhard Linke (burkhard.li...@computational.bio.uni-giessen.de):
> Hi,
> Images:
>
> Straight-forward attempt would be exporting all images with qemu-img from
> one cluster, and uploading them again on the second cluster. But this will
> break snapshots, protections etc.
You can use
Quoting Stadsnet (jwil...@stads.net):
> On 26-3-2019 16:39, Ashley Merrick wrote:
> >Have you upgraded any OSD's?
>
>
> No didn't go through with the osd's
Just checking here: are you sure all PGs have been scrubbed while
running Luminous? As the release notes [1] mention this:
"If you are
Quoting Paul Emmerich (paul.emmer...@croit.io):
> This also happened sometimes during a Luminous -> Mimic upgrade due to
> a bug in Luminous; however I thought it was fixed on the ceph-mgr
> side.
> Maybe the fix was (also) required in the OSDs and you are seeing this
> because the running OSDs
Dear list,
After upgrading to 12.2.11 the MDSes are reporting slow metadata IOs
(MDS_SLOW_METADATA_IO). The metadata IOs would have been blocked for
more than 5 seconds. We have one active, and one active standby MDS. All
storage on SSD (Samsung PM863a / Intel DC4500). No other (OSD) slow ops
Quoting Wido den Hollander (w...@42on.com):
> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
> OSDs as well. Over time their latency increased until we started to
> notice I/O-wait inside VMs.
On a Luminous 12.2.8 cluster with only SSDs we also hit this issue I
guess.
Quoting Patrick Donnelly (pdonn...@redhat.com):
> On Thu, Feb 28, 2019 at 12:49 PM Stefan Kooman wrote:
> >
> > Dear list,
> >
> > After upgrading to 12.2.11 the MDSes are reporting slow metadata IOs
> > (MDS_SLOW_METADATA_IO). The metadata IOs would have been bl
Quoting Zack Brenton (z...@imposium.com):
> On Tue, Mar 12, 2019 at 6:10 AM Stefan Kooman wrote:
>
> > Hmm, 6 GiB of RAM is not a whole lot. Especially if you are going to
> > increase the amount of OSDs (partitions) like Patrick suggested. By
> > default it will take 4 G
Quoting Zack Brenton (z...@imposium.com):
> Types of devices:
> We run our Ceph pods on 3 AWS i3.2xlarge nodes. We're running 3 OSDs, 3
> Mons, and 2 MDS pods (1 active, 1 standby-replay). Currently, each pod runs
> with the following resources:
> - osds: 2 CPU, 6Gi RAM, 1.7Ti NVMe disk
> - mds:
Quoting Lars Täuber (taeu...@bbaw.de):
> > > This is something i was told to do, because a reconstruction of failed
> > > OSDs/disks would have a heavy impact on the backend network.
> >
> > Opinions vary on running "public" only versus "public" / "backend".
> > Having a separate "backend"
Quoting Lars Täuber (taeu...@bbaw.de):
> > I'd probably only use the 25G network for both networks instead of
> > using both. Splitting the network usually doesn't help.
>
> This is something i was told to do, because a reconstruction of failed
> OSDs/disks would have a heavy impact on the