Hello, Nick!
Thank you for your reply! I have tested with the replica count set to both
2 and 3, by setting 'osd pool default size = (2|3)' in the
.conf file. Either I'm doing something incorrectly, or both settings seem to produce
the same result.
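For what it's worth, a quick way to confirm which replica count actually took effect (a sketch; 'rbd' stands in for whichever pool you are checking):
ceph osd pool get rbd size
ceph osd dump | grep 'replicated size'
The per-pool size shown there is what actually governs replication; 'osd pool default size' only applies to pools created after the setting.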
Can you give any troubleshooting advice? There was a blackout and one of my
OSDs remains down. I have noticed that the journal partition and data partition
are no longer shown, so the device cannot be mounted…
8 114    5241856 sdh2
8 128 3906249728 sdi
8 129 3901005807 sdi1
8 130    5241856 sdi2
8
Hi,
If the mounted device is not coming up, you can replace it with a new disk and Ceph
will handle rebalancing the data.
Here are the steps if you would like to replace the failed disk with a new
one (see the sketch after the list):
1. ceph osd out osd.110
2. Now remove this failed OSD from the CRUSH map; as soon as it's removed from
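For reference, the usual continuation of those steps (a sketch; osd.110 is the example ID from step 1):
ceph osd out osd.110
ceph osd crush remove osd.110
ceph auth del osd.110
ceph osd rm osd.110
Once the OSD is removed from the CRUSH map, Ceph starts rebalancing its data onto the remaining OSDs.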
Hi Bogdan,
Please paste the output of `ceph osd dump` and `ceph osd tree`.
Thanks
Sahana
On Fri, Mar 20, 2015 at 11:47 AM, Bogdan SOLGA bogdan.so...@gmail.com
wrote:
Hello, Nick!
Thank you for your reply! I have tested with the replica count set to both
2 and 3, by setting 'osd
If I run it from the command prompt, it gives the error below in $piece =
rados_read($ioRados, 'TEMP_object', $pieceSize['psize'], 0);
--
Segmentation fault (core dumped)
--
I have tried a new version of librados too...
--
Hi all, can anybody tell me how I can force-delete OSDs? The thing is that one
node got corrupted because of an outage, so there is no way to get those OSDs up
and back. Is there any way to force the removal from the ceph-deploy node?
Thanks
Jesus Chavez
SYSTEMS
Hello, Sahana!
The output of the requested commands is listed below:
admin@cp-admin:~/safedrive$ ceph osd dump
epoch 26
fsid 7db3cf23-ddcb-40d9-874b-d7434bd8463d
created 2015-03-20 07:53:37.948969
modified 2015-03-20 08:11:18.813790
flags
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
Hi,
On 03/20/2015 01:58 AM, houguanghua wrote:
Dear all,
Ceph 0.72.2 is deployed on three hosts, but the cluster's status is
HEALTH_WARN. The status is as follows:
# ceph -s
cluster e25909ed-25d9-42fd-8c97-0ed31eec6194
health HEALTH_WARN 768 pgs degraded; 768 pgs stuck
On 19 Mar 2015, at 08:17, Christian Balzer ch...@gol.com wrote:
On Wed, 18 Mar 2015 08:59:14 +0100 Josef Johansson wrote:
Hi,
On 18 Mar 2015, at 05:29, Christian Balzer ch...@gol.com wrote:
Hello,
On Wed, 18 Mar 2015 03:52:22 +0100 Josef Johansson wrote:
[snip]
We thought of
Hi Bogdan,
Here is the link for hardware recommendations:
http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives.
As per this link, the minimum recommended size for OSDs is 1 TB.
But as Nick said, Ceph OSDs must be at least 10 GB to get a weight of 0.01.
Here is the snippet
Hi,
On 03/19/2015 10:41 PM, Nick Fisk wrote:
I'm looking at trialling OSDs with a small flashcache device over them, to
hopefully reduce the impact of metadata updates when doing small-block IO.
Inspiration from here:-
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083
One
Hi Ghislain,
You will find more information about tools and methods at
On 20/03/2015 11:47, ghislain.cheval...@orange.com wrote:
Hi all,
I would like to compute the durability of data stored in a Ceph environment
according to the cluster topology (failure domains) and the data
(that's what happens when typing Control-Enter V instead of Control-V enter ;-)
On 20/03/2015 11:50, Loic Dachary wrote:
Hi Ghislain,
You will find more information about tools and methods at
https://wiki.ceph.com/Development/Reliability_model/Final_report
Enjoy !
On 20/03/2015 11:47,
Hello guys,
My Ceph cluster lost data and now it's not recovering. This problem occurred
when Ceph performed recovery while one of the nodes was down.
Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean,
recovering.
I have tried several things to recover them, like scrub,
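For what it's worth, the standard first steps for diagnosing stuck PGs (a sketch):
ceph health detail
ceph pg dump_stuck inactive unclean stale
ceph health detail lists the affected PG IDs; 'ceph pg {pgid} query' on one of them then shows its recovery_state.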
Hi all,
I would like to compute the durability of data stored in a Ceph environment
according to the cluster topology (failure domains) and the data resiliency
(replication/erasure coding).
Does such a tool exist?
Best regards
- - - - - - - - - - - - - - - - -
Ghislain Chevalier ORANGE
Thank you for your suggestion, Nick! I have re-weighted the OSDs and the
status has changed to '256 active+clean'.
Is this information clearly stated in the documentation, and have I missed
it? In case it isn't, I think it would be worth adding, as the
issue might be encountered by
- Original message -
Hi all, can anybody tell me how I can force-delete OSDs? The thing is that
one node got corrupted because of an outage, so there is no way to get those
OSDs up and back. Is there any way to force the removal from the ceph-deploy node?
Hi,
Try the manual procedure:
*
Ah, I was wondering myself if compression could be causing an issue, but I'm
reconsidering now. My latest experiment should hopefully help troubleshoot.
So, I remembered that ZLIB is slower but is more 'safe for old kernels'. I tried
that:
find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray chrismurra...@gmail.com wrote:
Ah, I was wondering myself if compression could be causing an issue, but I'm
reconsidering now. My latest experiment should hopefully help troubleshoot.
So, I remembered that ZLIB is slower, but is more 'safe for old
Hi!
We have experienced several blackouts on our small Ceph cluster.
The most annoying problem is time desync just after a blackout: the mons do not
start working until time is resynced, and even after resync and a manual restart of
the monitors, some PGs can stay stuck in an inactive or peering state for a significant
Yes, at this point, I'd export the CRUSH map, edit it, and import it back in.
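The usual round trip (a sketch; the file names are illustrative):
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
Edit crushmap.txt between the decompile and recompile steps, e.g. to delete the dead OSD entries.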
What version are you running?
Robert LeBlanc
Sent from a mobile device, please excuse any typos.
On Mar 20, 2015 4:28 PM, Jesus Chavez (jeschave) jesch...@cisco.com
wrote:
that's what you said?
[root@capricornio ~]#
Hi Jerry,
We use RGW and RBD in our OpenStack clusters and as standalone clusters.
We have six large clusters and are adding more. Most of the issues we have
faced have been self-inflicted, such as not currently supporting bucket
names that look like host names. Some S3 tools only work that way, which causes
Hi guys,
I'm trying to test rbd snapshots on CentOS 7.
# rbd -p rbd ls
test-a
test-b
test-c
test-d
# rbd snap create rbd/test-b@snap
rbd: failed to create snapshot: (22) Invalid argument
2015-03-20 15:22:56.300731 7f78f7afe880 -1 librbd: failed to create
snap id: (22) Invalid argument
I
Hi,
I'll build my full-SSD production cluster soon, and I wonder which
distribution is best tested by Inktank and the Ceph team.
The ceph.com doc is quite old and has no reference for Giant or Hammer:
http://ceph.com/docs/master/start/os-recommendations/
It seems that in the past only Ubuntu and RHEL were well tested,
Have you tried it from a different node, like the ceph-mon or another
ceph-osd node?
On Fri, Mar 20, 2015 at 11:23 AM, Jesus Chavez (jeschave)
jesch...@cisco.com wrote:
Thanks, Stephane. The thing is that those steps need to be run on the
node where the OSD lives; I don't have that node any
Thanks, Stephane. The thing is that those steps need to be run on the node where
the OSD lives; I don't have that node any more since the operating system got
corrupted, so I couldn't make it work :(
Thanks
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone:
For all intents and purposes, CentOS and RHEL are equivalent, so I'd not be
too concerned about that distinction. I can't comment on which distro is
better tested by the Ceph devs, but assuming that the packages are built
appropriately with similar dependency versions and whatnot, that also
On 20/03/2015 15:23, Jesus Chavez (jeschave) wrote:
Thanks, Stephane. The thing is that those steps need to be run on the
node where the OSD lives; I don't have that node any more since the
operating system got corrupted, so I couldn't make it work :(
Assuming the OSD is already down+out, you
On Fri, Mar 20, 2015 at 12:39 PM, Daniel Takatori Ohara
dtoh...@mochsl.org.br wrote:
Hello,
Can anybody help me, please? Some messages appear in the log of my MDS,
and afterwards the shell of my clients freezes.
2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] :
client.3197487
On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid ridwan...@gmail.com wrote:
Gregory Farnum greg@... writes:
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote:
Hi,
I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with
cephFS. I have installed
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan...@gmail.com wrote:
Hi,
I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with
cephFS. I have installed hadoop-1.1.1 in the nodes and changed the
conf/core-site.xml file according to the ceph documentation
Thank you for the clarifications, Sahana!
I haven't got to that part yet, so these details were unknown to me.
Perhaps some information on the OSD weights should be provided on the 'quick
deployment' page, as this issue might be encountered in the future by other
users, as well.
Kind
Any idea how to force remove? Thanks
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
Begin forwarded message:
From: Stéphane DUGRAVOT
This seems to be a fairly consistent problem for new users.
The create-or-move is adjusting the CRUSH weight, not the OSD weight.
Perhaps the init script should set the default weight to 0.01 if it's <= 0?
It seems like there's a downside to this, but I don't see it.
On Fri, Mar 20, 2015 at
osdmap e261536: 239 osds: 239 up, 238 in
Why is that last OSD not in? The history you need is probably there.
Run 'ceph pg {pgid} query' on some of the stuck PGs. Look for
the recovery_state section. That should tell you what Ceph needs to
complete the recovery.
If you need more help, post
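For reference, the query looks like this (the PG ID is illustrative):
ceph pg 1.2f query
The recovery_state section near the end of the JSON output should show what is blocking the PG.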
Gregory Farnum greg@... writes:
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote:
Hi,
I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with
cephFS. I have installed hadoop-1.1.1 in the nodes and changed the
conf/core-site.xml file according to the
You can create CRUSH rulesets and then assign pools to different rulesets.
http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds
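A minimal sketch of that, assuming a CRUSH root named 'ssd' already exists and using illustrative names (on releases after this thread the pool option is crush_rule rather than crush_ruleset):
ceph osd crush rule create-simple ssd-rule ssd host
ceph osd pool set mypool crush_ruleset 1
The ruleset ID to pass to the pool can be read from 'ceph osd crush rule dump'.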
On Thu, Mar 19, 2015 at 7:28 PM, Garg, Pankaj
pankaj.g...@caviumnetworks.com wrote:
Hi,
I have a Ceph cluster with
Hello,
Can anybody help me, please? Some messages appear in the log of my MDS,
and afterwards the shell of my clients freezes.
2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] :
client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696
pending pAsxLsXsxFcb issued
Removing the OSD from the CRUSH map and deleting the auth key is how you
force remove an OSD. The OSD can no longer participate in the cluster, even
if it does come back to life. All clients forget about the OSD when the new
CRUSH map is distributed.
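In command form (a sketch; the OSD ID is illustrative):
ceph osd crush remove osd.29
ceph auth del osd.29
ceph osd rm osd.29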
On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez
We tested bcache and abandoned it for two reasons.
1. Didn't give us any better performance than journals on SSD.
2. We had lots of corruption of the OSDs and were rebuilding them
frequently.
Since removing them, the OSDs have been much more stable.
On Fri, Mar 20, 2015 at 4:03 AM,
The weight can be based on anything, size, speed, capability, some random
value, etc. The important thing is that it makes sense to you and that you
are consistent.
Ceph by default (ceph-disk and, I believe, ceph-deploy) takes the approach of
using size. So if you use a different weighting scheme,
This isn't a hard limit on the number, but it's recommended that you keep
it around 100 per OSD. Smaller values cause data-distribution evenness problems.
Larger values cause the OSD processes to use more CPU, RAM, and file
descriptors, particularly during recovery. With that many OSDs, you're
going to
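As a rough aside, the sizing rule of thumb from the documentation of that era was: total PGs across all pools ≈ (number of OSDs x 100) / replica count, rounded up to a power of two (my paraphrase; the 239-OSD figure is borrowed from the osdmap quoted earlier in this digest):
echo $(( (239 * 100) / 3 ))   # 7966, so pg_num 8192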
Yes, that's exactly what I did, but ceph osd tree still shows the OSDs.
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
On Mar 20, 2015, at 9:41 AM,
I like this idea. I was under the impression that udev did not call the
init script, but ceph-disk directly. I don't see ceph-disk calling
create-or-move, but I know it does because I see it in the ceph -w when I
boot up OSDs.
/lib/udev/rules.d/95-ceph-osd.rules
# activate ceph-tagged partitions
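From memory, the activation rule in that file looks roughly like the following; treat it as an assumption rather than a verbatim quote (the GUID is Ceph's standard OSD-data partition type code):
ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", RUN+="/usr/sbin/ceph-disk-activate /dev/$name"
So udev hands the partition straight to ceph-disk activate, which is presumably where the create-or-move comes from.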
Maybe I should edit the CRUSH map and delete the OSDs... Is that a way to force them out?
Thanks
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
On Mar 20,
I see the problem: as your OSDs are only 8 GB, they have a zero weight. I think
the minimum size you can get away with is 10 GB in Ceph, as the size is measured
in TB and only has 2 decimal places.
For a workaround, try running:
ceph osd crush reweight osd.X 1
for each OSD; this will
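A loop form of that workaround (a sketch; the OSD IDs are illustrative):
for i in 0 1 2 3; do
    ceph osd crush reweight osd.$i 1
done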
On 19/03/2015, at 17.46, O'Reilly, Dan daniel.orei...@dish.com wrote:
The problem with using the hpsa driver is that I need to install RHEL 7.1 on
a Proliant system using the SmartArray 400 controller. Therefore, I need a
driver that supports it to even install RHEL 7.1. RHEL 7.1 doesn’t
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Burkhard Linke
Sent: 20 March 2015 09:09
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid
Hi,
On 03/19/2015 10:41 PM, Nick Fisk wrote:
I have found a few incompatibilities, but so far they're all on the Ceph
side. One example I remember was having to change the way we delete
objects. The function we originally used fetches a list of object
versions, and deletes all versions. Ceph is implementing object versions
now (I believe
Does it show DNE in the entry? That stands for 'Does Not Exist'. It will
disappear on its own after a while. I don't know what the timeout is, but
they have always gone away within 24 hours. I've edited the CRUSH map
before, and I don't think it removed it when it was already DNE; I just had
to wait
I would say you're a little light on RAM. With 4 TB disks 70% full, I've
seen some ceph-osd processes using 3.5 GB of RAM during recovery. You'll be
fine during normal operation, but you might run into issues at the worst
possible time.
I have 8 OSDs per node and 32 GB of RAM. I've had ceph-osd
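As a rough cross-check against the hardware recommendations' rule of thumb of about 1 GB of RAM per 1 TB of OSD capacity (my paraphrase):
echo $(( 8 * 4 ))   # 8 OSDs x 4 TB -> ~32 GB baseline
which matches the node's 32 GB exactly, leaving little headroom for recovery spikes.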
This is the output. If I try to remove it from the CRUSH map, it says that it is
already out…
[root@capricornio ~]# ceph osd crush remove osd.29
device 'osd.29' does not appear in the crush map
[root@capricornio ~]#
[root@capricornio ~]# ceph osd tree | grep down
# id  weight  type name
that's what you said?
[root@capricornio ~]# ceph auth del osd.9
entity osd.9 does not exist
[root@capricornio ~]# ceph auth del osd.19
entity osd.19 does not exist
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52
I'm not a CephFS user, but I have had a few cluster outages.
Each OSD has a journal, and Ceph ensures that a write is in all of the
journals (primary and replicas) before it acknowledges the write. If an
OSD process crashes, it replays the journal on startup, and recovers the
write.
I've lost