Hi,
I have some problems with my ceph monitor nodes in my Cluster.
I had 5 mons in the cluster. On all 5 nodes the leveldb store grew to
about 80-90 GB in size. To get rid of it I triggered a compaction with
the following command on one node.
ceph tell mon.d compact
The Monitor
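Besides the one-off compaction above, a monitor can also be told to compact its store at every daemon start via ceph.conf. A sketch; verify the option name against your release's documentation before relying on it:

```ini
[mon]
# compact the monitor's leveldb store each time ceph-mon starts
mon compact on start = true
```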
Thanks for your time,
I wrote a Python script to re-analyse the client messaging log, listing all
OSD IPs and the time used for read requests to each OSD, and found that the
slow replies all come from node 10.10.11.12. So I ran a network test, and
there is a problem:
# iperf3 -c 10.10.11.15 -t 60 -i 1
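The analysis script described above can be sketched along these lines. Purely illustrative: the log format, field names, and sample lines below are made up, and a real client messaging log will need a different regex:

```python
import re
from collections import defaultdict

# Hypothetical log format: "<time> read osd_ip=<ip> latency_ms=<value>"
LOG_RE = re.compile(r"osd_ip=(\S+)\s+latency_ms=([\d.]+)")

def slowest_osds(log_lines):
    """Return (osd_ip, average_latency_ms) pairs, slowest first."""
    latencies = defaultdict(list)
    for line in log_lines:
        m = LOG_RE.search(line)
        if m:
            latencies[m.group(1)].append(float(m.group(2)))
    averages = {ip: sum(v) / len(v) for ip, v in latencies.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    sample = [
        "10:00:01 read osd_ip=10.10.11.12 latency_ms=480.0",
        "10:00:02 read osd_ip=10.10.11.13 latency_ms=2.5",
        "10:00:03 read osd_ip=10.10.11.12 latency_ms=520.0",
    ]
    for ip, avg in slowest_osds(sample):
        print(ip, avg)
```

A per-OSD average like this makes a single slow node stand out immediately, which is what pointed at 10.10.11.12 here.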
On 13/05/2015, at 11.23, Steffen W Sørensen ste...@me.com wrote:
On 13/05/2015, at 04.08, Gregory Meno gm...@redhat.com wrote:
Ideally I would like everything in /var/log/calamari
be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG
Hello,
On Wed, 13 May 2015 18:09:46 +0800 changqian zuo wrote:
Thanks for your time,
I wrote a Python script to re-analyse the client messaging log, listing all
OSD IPs and the time used for read requests to each OSD, and found that the
slow replies all come from node 10.10.11.12. So I ran a network
On 13/05/2015, at 04.08, Gregory Meno gm...@redhat.com wrote:
Ideally I would like everything in /var/log/calamari
be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG
then
I run my mons as VMs inside of UCS blade compute nodes.
Do you use the fabric interconnects or the standalone blade chassis?
Jake
On Wednesday, May 13, 2015, Götz Reinicke - IT Koordinator
goetz.reini...@filmakademie.de wrote:
Hi Christian,
currently we do get good discounts as an
- Original Message -
From: Patrick McGarry pmcga...@redhat.com
To: 张忠波 zhangzhongbo2...@163.com, Ceph-User ceph-us...@ceph.com
Cc: community commun...@ceph.com
Sent: Tuesday, May 12, 2015 1:23:36 PM
Subject: Re: [ceph-users] Error in sys.exitfunc
Moving this to ceph-user where it
Hi,
Well, I've found that a clean stop of an OSD causes no IO
downtime (/etc/init.d/ceph stop osd). But that cannot be called fault
tolerance, which Ceph is supposed to provide.
However, killall -9 ceph-osd causes IO to stop for about 20 seconds.
I've tried lowering some timeouts but
On Tue, May 12, 2015 at 9:13 PM, Abhishek L
abhishek.lekshma...@gmail.com wrote:
We've had a hammer (0.94.1) (virtual) 3-node/3-OSD cluster with radosgws
failing to start, failing continuously with the following error:
--8<---------------cut here---------------start------------->8---
2015-05-06
If you are seeing ceph -s or ceph osd tree showing everything up, but in
reality it is not, I think the heartbeat traffic is getting lost, maybe due
to the network problem you indicated.
Also, the point that your node's memory is running out probably means it is
trying to do recovery. OSDs are
Awesome! Thanks very much for looking into this. I also added a couple of quick
responses to your comments.
On May 12, 2015, at 7:02 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
I opened issue #11604, and have a fix for the issue. I updated our test suite
to cover the specific issue
Hi Jake,
we have the fabric interconnects.
MONs as VM? What setup do you have? and what cluster size?
Regards, Götz
Am 13.05.15 um 15:20 schrieb Jake Young:
I run my mons as VMs inside of UCS blade compute nodes.
Do you use the fabric interconnects or the standalone blade
I just pushed an update to the rados CLI that allows the setomapval command to
read the data from stdin. In your example below, the command to use would be:
# cat ./rbd_header.9a3ab3d1382f3-parent | rados -p volumes setomapval
rbd_header.9a3ab3d1382f3 parent
The change is currently under
On Wed, May 13, 2015 at 11:20 PM, Daniel Takatori Ohara
dtoh...@mochsl.org.br wrote:
Hello Lincoln,
Thanks for the answer. I will upgrade the kernel on the clients.
But is the kernel requirement the same for version 0.94.1 (hammer)? Is it 3.16?
Pay attention to the "or later" part of "v3.16.3 or
On Wed, May 13, 2015 at 12:08 PM, Daniel Takatori Ohara
dtoh...@mochsl.org.br wrote:
Hi,
We have a small ceph cluster with 4 OSDs and 1 MDS.
I run Ubuntu 14.04 with 3.13.0-52-generic in the clients, and CentOS 6.6
with 2.6.32-504.16.2.el6.x86_64 in Servers.
The version of Ceph is 0.94.1
In short, the drawback is false positives which can cause unnecessary cluster
churn.
-Sam
- Original Message -
From: Robert LeBlanc rob...@leblancnet.us
To: Vasiliy Angapov anga...@gmail.com
Cc: Sage Weil s...@newdream.net, ceph-users ceph-us...@ceph.com
Sent: Wednesday, May 13, 2015
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Don't let me deter you from pointing a gun at your foot. I think you
may be confusing Ceph with a SAN, which it certainly is not. In an
expensive SAN, 5 seconds is quite unacceptable for a failover event
between controllers. However there are many
Thanks, Gregory, for the answer.
I will upgrade the kernel.
Do you know with which kernel CephFS is stable?
Thanks.
Att.
---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone:
Hi Daniel,
There are some kernel recommendations here, although it's unclear if they only
apply to RBD or also to CephFS.
http://ceph.com/docs/master/start/os-recommendations/
--Lincoln
On May 13, 2015, at 3:03 PM, Daniel Takatori Ohara wrote:
Thanks, Gregory, for the answer.
I will
Hello Lincoln,
Thanks for the answer. I will upgrade the kernel on the clients.
But is the kernel requirement the same for version 0.94.1 (hammer)? Is it 3.16?
Thanks,
Att.
---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês
Robert, thank you very much for sharing your wisdom with me! Much
appreciated.
I think I more or less got your point. Ceph is not a SAN, this sounds
logical.
What I'm trying to understand is what Ceph is for and what it is not for...
Is there any article about that? :)
I've heard Ceph is an
With CephFS, it seems to be a safe bet to use the newest kernel available to you.
I believe you will need kernel 4.1+ if you are using Hammer CRUSH tunables
(straw2). There have been some threads on this recently.
--Lincoln
On May 13, 2015, at 3:20 PM, Daniel Takatori Ohara wrote:
Hello
Sorry for the delay. It took me a while to figure out how to do a range request
and append the data to a single file. The good news is that the end file seems
to be 14G in size, which matches the file's manifest size. The bad news is that
the file is completely corrupt and the radosgw log has
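For reference, a range-request reassembly loop can be sketched like this. Only the header-building helper is shown runnable; the commented download loop uses a hypothetical session/url, since the real endpoint and auth are not shown in the thread:

```python
def range_headers(total_size, chunk_size):
    """Yield HTTP Range header values covering an object of total_size bytes."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        yield "bytes=%d-%d" % (start, end)

# Reassembling the object means appending each ranged response body in order:
# for hdr in range_headers(object_size, 8 << 20):
#     resp = session.get(url, headers={"Range": hdr})  # hypothetical session/url
#     out_file.write(resp.content)
```

Note the off-by-one trap: HTTP ranges are inclusive on both ends, and getting the end offset wrong duplicates or drops bytes at every chunk boundary, which is one way to end up with a file of plausible size that is still corrupt.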
Hi,
We have a small ceph cluster with 4 OSDs and 1 MDS.
I run Ubuntu 14.04 with 3.13.0-52-generic in the clients, and CentOS 6.6
with 2.6.32-504.16.2.el6.x86_64 in Servers.
The version of Ceph is 0.94.1
Sometimes CephFS freezes, and dmesg shows me the following messages:
May 13 15:53:10
That's another interesting issue. Note that for part 12_80 the manifest
specifies (I assume, by the messenger log) this part:
default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80
(note the
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Ceph is an Object Storage system that can also do block devices and a
file system. My impressions (and these are mine) is that it excels at
data integrity and providing enterprise class storage in a free and
open source format. It is a highly
In my never-ending saga of calamari with minions on a big-endian architecture
I've brought up another server from a clean install of Ubuntu 14.04. The
calamari master is now essperf13.
I was able to figure out how to rebuild salt-minion with zmq 3.0.5 which got
salt working so the 'salt \*
Possibly my issue as well. The calamari master is salt 0.17.5 but the minions
are running 2015.2.0rc2. I have to build the minions from source (big endian
unsupported architecture). All of my salt issues seemed to get resolved when I
got similar versions of ZMQ running on both master and
Ok, I dug a bit more, and it seems to me that the problem is with the manifest
that was created. I was able to reproduce a similar issue (opened ceph bug
#11622), for which I also have a fix.
I created new tests to cover this issue, and we'll get those recent fixes as
soon as we can, after we
Wow,
That must be a record. I didn’t realize that.
It turns out that you’ll have the best experience if the versions of master and
minion are in sync.
We test and use 2014.1.5 and are still evaluating 2014.7.Z.
Glad to hear things are working better.
regards,
Gregory
On May 13, 2015, at
On Wed, 13 May 2015, Vasiliy Angapov wrote:
Hi,
Well, I've found that a clean stop of an OSD causes no IO
downtime (/etc/init.d/ceph stop osd). But that cannot be called fault
tolerance, which Ceph is supposed to provide. However, killall -9 ceph-osd
causes IO to stop for about 20
Hi!
Thank you for your effort, I think it will be very useful for many people!
Now I have solved the problem using a C program and the rados library; 2 images
out of 3 were completely restored, one was corrupted, but I have rescued all
the important data :)
Pavel.
On 13 May 2015, at 17:20, Jason
Thanks, Sage!
In the meanwhile I asked the same question in #Ceph IRC channel and Be_El
gave me exactly the same answer, which helped.
I also realized that in
http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/ it is
stated: You may change this grace period by adding an osd
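For completeness, the grace period that the linked page talks about looks like this in ceph.conf. A sketch; 20 seconds is the default in this era of Ceph, so only the value needs tuning:

```ini
[osd]
# seconds an OSD may miss heartbeats before peers report it down to the mons
osd heartbeat grace = 20
```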
Hi Everyone!
I am trying to set up replication between two clusters right now.
Please go through my previous steps.
On the slave zone (cluster), when I run the following command, I get some
errors:
radosgw-admin user stats --uid=us-test-east --sync-stats --name
1. No packet drop found in system log.
2. ceph health detail shows:
# ceph health detail
HEALTH_WARN
mon.bj-ceph10 addr 10.10.11.23:6789/0 has 43% avail disk space -- store is
getting too big! 77364 MB >= 40960 MB
mon.bj-ceph12 addr 10.10.11.25:6789/0 has 43% avail disk space -- store is
getting
Thank you so much Yehuda! I look forward to testing these. Is there a way
for me to pull this code in? Is it in master?
On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
Ok, I dug a bit more, and it seems to me that the problem is with the
manifest that was created. I
Hello,
On Thu, 14 May 2015 09:36:14 +0800 changqian zuo wrote:
1. No packet drop found in system log.
Is that storage node with the bad network fixed?
2. ceph health detail shows:
# ceph health detail
HEALTH_WARN
mon.bj-ceph10 addr 10.10.11.23:6789/0 has 43% avail disk space -- store
The code is in wip-11620, and it's currently on top of the next branch. We'll
get it through the tests, then get it into hammer and firefly. I wouldn't
recommend installing it in production without proper testing first.
Yehuda
- Original Message -
From: Sean Sullivan
Christian,
EC pools do not support overwrites/partial writes and are thus not supported
(directly) by the block/file interfaces.
Did you put a cache tier in front for your test with fio?
Thanks &amp; Regards
Somnath
-Original Message-
From: Christian Balzer [mailto:ch...@gol.com]
Sent: Tuesday,
Hello,
On Wed, 13 May 2015 06:11:25 + Somnath Roy wrote:
Christian,
EC pools do not support overwrites/partial writes and are thus not
supported (directly) by the block/file interfaces. Did you put a cache tier
in front for your test with fio?
No, I never used EC and/or cache-tiers.
The
Hi Christian,
currently we do get good discounts as an University and the bundles were
worth it.
The chassis do have multiple PSUs and n 10Gb Ports (40Gb is possible).
The switch connection is redundant.
Currently we think of 10 SATA OSD nodes + x SSD cache pool nodes and 5
MONs. For a start.
Can you give some more insight into the ceph cluster you are running?
It seems IO started and then there was no response; cur MB/s is dropping to 0.
What is 'ceph -s' output?
Hope all the OSDs are up and running.
Thanks &amp; Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
Thanks, Gregory!
My Ceph version is 0.94.1. What I'm trying to test is the worst situation,
when the node is losing network or becomes unresponsive. So what I do is
killall -9 ceph-osd, then reboot.
Well, I also tried to do a clean reboot several times (just a reboot
command), but I saw no
Yes, I want to put myself forward as a candidate for the Ceph User Committee election.
I've been with the Ceph project for over 5 years now and have been
developing various kinds of tools and integrations for Ceph.
Next to that I maintain the EU mirror and I want to expand the mirror
system of Ceph so that
On 12/05/2015, at 19.51, Bruce McFarland bruce.mcfarl...@taec.toshiba.com
wrote:
I am having a similar issue. The cluster is up and salt is running on and has
accepted keys from all nodes, including the monitor. I can issue salt and
salt/ceph.py commands from the Calamari including
Hello,
in addition to what Somnath wrote, if you're seeing these kinds of blocking
reads _and_ have slow write warnings in the logs, your cluster is likely
either unhealthy and/or underpowered for its current load.
If your cluster is healthy, you may want to investigate what's busy, my
guess is
Hello fellow Ceph admins,
I have a need to run some periodic scripts against my Ceph cluster.
For example creating new snapshots or cleaning up old ones. I'd
preferably want to configure this periodic artifact on all my
monitors, but only execute it on the leader.
I've come up with the following
Hi, colleagues!
I'm testing a simple Ceph cluster in order to use it in a production
environment. I have 8 OSDs (1Tb SATA drives) which are evenly distributed
between 4 nodes.
I've mapped an rbd image on the client node and started writing a lot of data
to it. Then I just reboot one node and see
A cache tier will definitely cause a lot of WA (write amplification) whether
you use replication or EC. There are some improvements coming in the
Infernalis timeframe and hopefully they will help with WA.
Sorry, I am yet to use a cache tier, so I don't have the data in terms of
performance/WA.
Will keep the community posted once I
On Tue, May 12, 2015 at 11:39 PM, Vasiliy Angapov anga...@gmail.com wrote:
Hi, colleagues!
I'm testing a simple Ceph cluster in order to use it in a production
environment. I have 8 OSDs (1Tb SATA drives) which are evenly distributed
between 4 nodes.
I've mapped an rbd image on the client node
Hi,
Same use case... We do:
/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-mon.*.asok mon_status |
/usr/bin/json_reformat | /bin/grep state | /bin/grep -q leader ...
Cheers, Dan
On May 13, 2015 09:35, Kai Storbeck k...@xs4all.net wrote:
Hello fellow Ceph admins,
I have a need to run some
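Dan's one-liner above can also be done without json_reformat by parsing mon_status directly. A minimal sketch; the admin socket path in the comment is an assumption based on default naming:

```python
import json

def is_leader(mon_status_output):
    """True if this monitor's mon_status JSON reports state 'leader'."""
    return json.loads(mon_status_output).get("state") == "leader"

# In a cron wrapper you would feed it the admin socket output, e.g.:
#   ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status
if __name__ == "__main__":
    print(is_leader('{"state": "leader", "rank": 0}'))
```

Exiting non-zero when the state is anything else ("peon", "probing", ...) makes it easy to guard the rest of the cron job.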
Hi,
On 13/05/2015 09:35, Kai Storbeck wrote:
Hello fellow Ceph admins,
I have a need to run some periodic scripts against my Ceph cluster.
For example creating new snapshots or cleaning up old ones. I'd
preferably want to configure this periodic artifact on all my
monitors, but only
OK, I finally got mine working. For whatever reason, the latest version of
salt was the issue for me. Leaving the latest version of salt on the calamari
server is working, but I had to downgrade the minions.
Removed:
salt.noarch 0:2014.7.5-1.el6  salt-minion.noarch