Wow. Thanks
Not very operations friendly though…
Wouldn't it be OK to just pull the disk that we think is the bad one, check the
serial number, and if it's not the bad one, replug it and let the udev rules do
their job and re-insert the disk into the ceph cluster?
(provided XFS doesn't freeze for good when we
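As a purely hypothetical sketch (the device name and tools are assumptions, not from this thread), the serial can be read with smartmontools or via udev, assuming /dev/sdX is the suspect drive:
  smartctl -i /dev/sdX | grep -i serial
  udevadm info --query=property --name=/dev/sdX | grep ID_SERIAL
The udevadm form reads the serial without touching the drive, so you could confirm the suspect before pulling anything.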
Hello cephers,
I need your help and suggestions on what is going on with my cluster. A few
weeks ago I upgraded from Firefly to Giant. I've previously written about
having issues with Giant where, in a two-week period, the cluster's IO froze three
times after ceph marked two osds down. I have in
Hi Kevin,
Every MDS tunable is (I think) listed on this page with a short
description: http://ceph.com/docs/master/cephfs/mds-config-ref/
Can you tell us how your cluster behaves after the mds-cache-size
change? What is your MDS RAM consumption, before and after?
Thanks !
--
Thomas
Does nobody know where the problem could be?
On Wed, Nov 12, 2014 at 10:41:36PM +0100, houmles wrote:
Hi,
I have 2 hosts with 8 2TB drives in each.
I want to have 2 replicas across the two hosts and then 2 replicas between the osds
on each host. That way, even when I lose one host, I still have 2
What do you mean by OSD level? The pool has size 4 and min_size 1.
On Tue, Nov 18, 2014 at 10:32:11AM +, Anand Bhat wrote:
What are the settings for min_size and size at the OSD level in your Ceph
configuration? It looks like size is set to 2, which halves your total storage,
as two copies of the
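For reference, size and min_size are per-pool settings rather than OSD settings; a minimal sketch of inspecting and changing them (the pool name is a placeholder):
  ceph osd pool get <pool> size
  ceph osd pool get <pool> min_size
  ceph osd pool set <pool> min_size 1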
Has anyone tried applying this fix to see if it makes any difference?
https://github.com/ceph/ceph/pull/2374
I might be in a position in a few days to build a test cluster to test myself,
but was wondering if anyone else has had any luck with it?
Nick
-----Original Message-----
From:
Dear all,
I try to install ceph but I get errors:
#ceph-deploy install node1
[]
[ceph_deploy.install][DEBUG ] Installing stable version *firefly *on
cluster ceph hosts node1
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
[]
That would probably have helped. The XFS deadlocks would only occur when
there was relatively little free memory. Kernel 3.18 is supposed to have a
fix for that, but I haven't tried it yet.
Looking at my actual usage, I don't even need 64k inodes. 64k inodes
should make things a bit faster
You shouldn't let the cluster get so full that losing a few OSDs will make
you go toofull. Letting the cluster get to 100% full is such a bad idea
that you should make sure it doesn't happen.
Ceph is supposed to stop moving data to an OSD once that OSD hits
osd_backfill_full_ratio, which
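For reference, a sketch of the relevant thresholds as they would appear in ceph.conf; the values shown are the usual defaults of that era, so treat them as assumptions rather than recommendations:
  [global]
  mon osd nearfull ratio = 0.85   # monitors start warning at 85% full
  mon osd full ratio = 0.95       # client writes are blocked at 95% full
  [osd]
  osd backfill full ratio = 0.85  # an OSD refuses new backfills once it is this full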
On Tue, Nov 18, 2014 at 10:04 PM, Craig Lewis cle...@centraldesktop.com wrote:
That would probably have helped. The XFS deadlocks would only occur when
there was relatively little free memory. Kernel 3.18 is supposed to have a
fix for that, but I haven't tried it yet.
Looking at my actual
Ok, why is ceph marking osds down? Post your ceph.log from one of the
problematic periods.
-Sam
On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky and...@arhont.com wrote:
Hello cephers,
I need your help and suggestion on what is going on with my cluster. A few
weeks ago i've upgraded from
Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and chatted with
dis on #ceph-devel.
I ran a LOT of tests on a LOT of combinations of kernels (sometimes with
tunables legacy). I haven't found a magical combination in which the following
test does not hang:
fio --name=writefile
Hi people, I have two identical servers (both Sun X2100 M2's) that form
part of a cluster of 3 machines (other machines will be added later). I
want to bond two gigabit Ethernet ports on each, which works perfectly on the
one, but not on the other.
How can this be?
The one machine (named S2)
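A hedged first diagnostic, assuming the bond interface is named bond0 (the interface names are assumptions):
  cat /proc/net/bonding/bond0   # bonding mode, MII status and per-slave state
  ethtool eth0; ethtool eth1    # compare link speed/duplex of the slaves on both hosts
  dmesg | grep -i bond          # driver messages on the machine where bonding fails
Comparing that output between the two machines usually shows whether it is a driver, switch, or config difference.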
Hi David,
Have you tried on a normal replicated pool with no cache? I've seen a number
of threads recently where caching is causing various things to block/hang.
It would be interesting to see if this still happens without the caching
layer, at least it would rule it out.
Also is there any sign
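If it helps, a minimal sketch of standing up a plain replicated pool and an image to repeat the benchmark against (pool/image names and PG count are placeholders):
  ceph osd pool create plaintest 128 128
  rbd create plaintest/test1 --size 102400
  rbd map plaintest/test1    # then point the same fio job at the mapped /dev/rbd device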
Hi David,
Just to let you know I finally managed to get to the bottom of this.
In repo.pp, one of the authors has a non-ASCII character in his name; for
whatever reason this was tripping up my puppet environment. After removing
the following line:-
# Author: François Charlier
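For anyone else hitting this, a quick way to spot the offending bytes (assuming GNU grep with PCRE support):
  grep -nP '[^\x00-\x7F]' repo.pp
which prints every line of repo.pp containing a non-ASCII character.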
Sam, the logs are rather large in size. Where should I post them?
Thanks
- Original Message -
From: Samuel Just sam.j...@inktank.com
To: Andrei Mikhailovsky and...@arhont.com
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, 18 November, 2014 7:54:56 PM
Subject: Re: [ceph-users] Giant
Great find Nick.
I've discussed it on IRC and it does look like a real issue:
https://github.com/enovance/edeploy-roles/blob/master/puppet-master.install#L48-L52
I've pushed the fix for review: https://review.openstack.org/#/c/135421/
--
David Moreau Simard
On Nov 18, 2014, at 3:32 PM, Nick
pastebin or something, probably.
-Sam
On Tue, Nov 18, 2014 at 12:34 PM, Andrei Mikhailovsky and...@arhont.com wrote:
Sam, the logs are rather large in size. Where should I post it to?
Thanks
From: Samuel Just sam.j...@inktank.com
To: Andrei Mikhailovsky
Hi Thomas,
I looked over the mds config reference a bit yesterday, but mds cache size
seems to be the most relevant tunable.
As suggested, I upped mds-cache-size to 1 million yesterday and started the
load generator. During load generation, we’re seeing similar behavior on the
filesystem and
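For reference, a sketch of how that change is typically made persistent (mds cache size counts inodes; the default is 100000, so 1 million is roughly a 10x increase in cache, and in the MDS RAM that goes with it):
  [mds]
  mds cache size = 1000000
in ceph.conf on the MDS host, followed by an MDS restart; the same value can also be injected at runtime.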
Hello everyone,
I'm new to ceph but have been working with proprietary clustered filesystems for
quite some time.
I almost understand how ceph works, but I have a couple of questions which
have been asked here before, though I didn't understand the answers.
In the closed-source world, we use clustered
On Tue, Nov 18, 2014 at 1:26 PM, hp cre hpc...@gmail.com wrote:
Hello everyone,
I'm new to ceph but been working with proprietary clustered filesystem for
quite some time.
I almost understand how ceph works, but have a couple of questions which
have been asked before here, but i didn't
On Tue, Nov 11, 2014 at 11:43 PM, Gauvain Pocentek
gauvain.pocen...@objectif-libre.com wrote:
Hi all,
I'm facing a problem on a ceph deployment. rados mkpool always fails:
# rados -n client.admin mkpool test
error creating pool test: (2) No such file or directory
rados lspool and rmpool
I solved it by installing the EPEL repo via yum. I think somebody should note
in the documentation that EPEL is mandatory.
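On CentOS/RHEL that boils down to something like the sketch below (the package name is what recent CentOS releases ship; older releases need the epel-release RPM from the Fedora mirrors):
  yum install epel-release
  ceph-deploy install node1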
On 18/11/2014 14:29, Massimiliano Cuttini wrote:
Dear all,
i try to install ceph but i get errors:
#ceph-deploy install node1
[]
Ok thanks Greg.
But what OpenStack does, AFAIU, is use rbd devices directly, one for each
VM instance, right? And that's how it supports live migrations on KVM,
etc., right? OpenStack and similar cloud frameworks don't need to create VM
instances on filesystems, am I correct?
On 18 Nov 2014
Hi Massimiliano,
I just recreated this bug myself. Ceph-deploy is supposed to install EPEL
automatically on the platforms that need it. I just confirmed that it is
not doing so, and will be opening up a bug in the Ceph tracker. I'll paste
it here when I do so you can follow it. Thanks for the
I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to access an
RBD directly for each virtual instance deployed, live-migration included (as
each RBD is in and of itself a separate block device, not file system). I would
imagine OpenStack works in a similar fashion.
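For example, with QEMU's rbd driver the image is attached straight from the cluster, with no intermediate filesystem; a minimal sketch, with pool/image names as placeholders:
  qemu-system-x86_64 -m 2048 \
    -drive format=raw,file=rbd:rbd/vm-disk-1:conf=/etc/ceph/ceph.conf,cache=none,if=virtio
Libvirt generates the equivalent disk definition for you, which is why live migration just works: both hosts see the same RBD.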
-
On Tue, Nov 18, 2014 at 1:43 PM, hp cre hpc...@gmail.com wrote:
Ok thanks Greg.
But what openstack does, AFAIU, is use rbd devices directly, one for each
Vm instance, right? And that's how it supports live migrations on KVM,
etc.. Right? Openstack and similar cloud frameworks don't need to
Yes Openstack also uses libvirt/qemu/kvm, thanks.
On 18 Nov 2014 23:50, Campbell, Bill bcampb...@axcess-financial.com
wrote:
I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to
access an RBD directly for each virtual instance deployed, live-migration
included (as each RBD is
On Thu, Nov 13, 2014 at 9:34 AM, Lincoln Bryant linco...@uchicago.edu wrote:
Hi all,
Just providing an update to this -- I started the mds daemon on a new server
and rebooted a box with a hung CephFS mount (from the first crash) and the
problem seems to have gone away.
I'm still not sure
I've captured this at http://tracker.ceph.com/issues/10133
On Tue, Nov 18, 2014 at 4:48 PM, Travis Rhoden trho...@gmail.com wrote:
Hi Massimiliano,
I just recreated this bug myself. Ceph-deploy is supposed to install EPEL
automatically on the platforms that need it. I just confirmed that
Then... very good! :)
Ok, the next bad thing is that I have installed GIANT on the admin node.
However, ceph-deploy ignores the ADMIN node installation and installs FIREFLY.
Now I have ceph-deploy from Giant on my ADMIN node and my first OSD node
with FIREFLY.
It seems odd to me. Is it fine or i
We currently have a 3 node system with 3 monitor nodes. I created them in
the initial setup, and ceph.conf has:
mon initial members = Ceph200, Ceph201, Ceph202
mon host = 10.10.5.31,10.10.5.32,10.10.5.33
We are in the process of expanding and installing dedicated mon servers.
I know I can run:
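(The command is cut off above; purely as an assumption of what the usual procedure looks like, not necessarily what was meant:)
  ceph-deploy mon add ceph203
  ceph-deploy --overwrite-conf config push Ceph200 Ceph201 Ceph202 ceph203
with the new monitor added to mon initial members / mon host in ceph.conf before pushing it out. The hostname ceph203 is a placeholder.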
It's a little strange, but with just the one-sided log it looks as
though the OSD is setting up a bunch of connections and then
deliberately tearing them down again within a second or two (i.e., this
is not a direct messenger bug, but it might be an OSD one, or it might
be something else).
Is it
On Sun, Nov 16, 2014 at 4:17 PM, Anthony Alba ascanio.al...@gmail.com wrote:
The step emit documentation states
Outputs the current value and empties the stack. Typically used at
the end of a rule, but may also be used to pick from different trees
in the same rule.
What use case is there
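The classic use case is a rule that places copies in two different trees, e.g. the ssd-primary example from the CRUSH docs (the roots 'ssd' and 'platter' are assumed to exist in the map):
  rule ssd-primary {
          ruleset 4
          type replicated
          min_size 1
          max_size 10
          step take ssd
          step chooseleaf firstn 1 type host
          step emit
          step take platter
          step chooseleaf firstn -1 type host
          step emit
  }
The first take/emit pair picks the primary from the ssd tree, the second fills the remaining replicas from the platter tree.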
Hmm, last time we saw this it meant that the MDS log had gotten
corrupted somehow and was a little short (in that case due to the OSDs
filling up). What do you mean by rebuilt the OSDs?
-Greg
On Mon, Nov 17, 2014 at 12:52 PM, JIten Shah jshah2...@me.com wrote:
After i rebuilt the OSD’s, the MDS
Sam,
Pastebin or similar will not take tens of megabytes worth of logs. If we are
talking about the debug_ms 10 setting, I've got about 7 GB worth of logs generated
every half hour or so. I'm not really sure what to do with that much data.
Anything more constructive?
Thanks
- Original
I was going to submit this as a bug, but thought I would put it here for
discussion first. I have a feeling that it could be behavior by design.
ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
I'm using a cache pool and was playing around with the size and min_size on
the pool to
I believe the reason we don't allow you to do this right now is that
there was not a good way of coordinating the transition (so that
everybody starts routing traffic through the cache pool at the same
time), which could lead to data inconsistencies. Looks like the OSDs
handle this appropriately
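For context, the switch that routes everybody through the cache pool is the overlay step of the usual tiering sequence, sketched here with placeholder pool names:
  ceph osd tier add cold-pool hot-pool
  ceph osd tier cache-mode hot-pool writeback
  ceph osd tier set-overlay cold-pool hot-pool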
Testing without cache tiering is the next test I want to do when I have
time.
When it's hanging, there is no activity at all on the cluster.
Nothing in ceph -w, nothing in ceph osd pool stats.
I'll provide an update when I have a chance to test without tiering.
--
David Moreau Simard
On Wed, Nov 12, 2014 at 1:41 PM, houmles houm...@gmail.com wrote:
Hi,
I have 2 hosts with 8 2TB drives in each.
I want to have 2 replicas across the two hosts and then 2 replicas between the osds
on each host. That way, even when I lose one host, I still have 2 replicas.
Currently I have this
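A hedged sketch of a CRUSH rule that should give that placement (first pick 2 hosts, then 2 OSDs within each), assuming the default root and a pool size of 4; untested here:
  rule two-hosts-two-osds {
          ruleset 1
          type replicated
          min_size 1
          max_size 4
          step take default
          step choose firstn 2 type host
          step choose firstn 2 type osd
          step emit
  }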
On Nov 18, 2014 4:48 PM, Gregory Farnum g...@gregs42.com wrote:
On Tue, Nov 18, 2014 at 3:38 PM, Robert LeBlanc rob...@leblancnet.us
wrote:
I was going to submit this as a bug, but thought I would put it here for
discussion first. I have a feeling that it could be behavior by design.
Hi Dave
Did you say iSCSI only? The tracker issue does not say, though.
I am on giant, with both the client and ceph on RHEL 7, and it seems to work ok, unless
I am missing something here. RBD on bare metal with kmod-rbd and caching
disabled.
[root@compute4 ~]# time fio --name=writefile --size=100G
Hmm, the problem is I had not modified any config; all the config
is default.
As you said, all the IO should be stopped by the
mon_osd_full_ratio or osd_failsafe_full_ratio settings. In my test, when
the osd was near full, the IO from rest-bench stopped, but the backfill
IO did not stop.
I think I just solved at least part of the problem.
Because of the somewhat peculiar way that I have Docker configured, docker
instances on another system were being assigned my OSD's IP address,
running for a couple seconds, and then failing (for unrelated reasons).
Effectively, there was