Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Hello, Nick! Thank you for your reply! I have tested with the replica count set to both 2 and 3, by setting 'osd pool default size = (2|3)' in the .conf file. Either I'm doing something incorrectly, or both settings seem to produce the same result. Can you give any troubleshooting advice? I
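
A minimal sketch of the setting Bogdan describes, assuming it lives in the [global] section of ceph.conf (it only affects pools created after the change):

    [global]
    # replica count for newly created pools
    osd pool default size = 2
    # minimum replicas required for a PG to serve I/O
    osd pool default min size = 1

Existing pools keep their old size; changing one in place would look like `ceph osd pool set rbd size 2`.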

[ceph-users] OSD remains down

2015-03-20 Thread Jesus Chavez (jeschave)
there was a blackout and one of my osds remains down. I have noticed that the journal partition and data partition are not shown anymore, so the device cannot be mounted… 8 114 5241856 sdh2 8 128 3906249728 sdi 8 129 3901005807 sdi1 8 130 5241856 sdi2 8

Re: [ceph-users] OSD remains down

2015-03-20 Thread Sahana
Hi, If the mounted device is not coming up, you can replace it with a new disk and Ceph will handle rebalancing the data. Here are the steps if you would like to replace the failed disk with a new one: 1. ceph osd out osd.110 2. Now remove this failed OSD from the CRUSH map; as soon as it's removed from
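
A sketch of the full sequence Sahana is describing, using her osd.110 example (run from any node with an admin keyring):

    ceph osd out osd.110            # stop placing data on it; starts draining
    ceph osd crush remove osd.110   # remove it from the CRUSH map; triggers rebalancing
    ceph auth del osd.110           # delete its authentication key
    ceph osd rm osd.110             # remove it from the OSD map

After the new disk is prepared (e.g. with ceph-disk or ceph-deploy), it joins with a fresh id and Ceph backfills onto it.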

Re: [ceph-users] PGs issue

2015-03-20 Thread Sahana
Hi Bogdan, Please paste the output of `ceph osd dump` and `ceph osd tree`. Thanks, Sahana. On Fri, Mar 20, 2015 at 11:47 AM, Bogdan SOLGA bogdan.so...@gmail.com wrote: Hello, Nick! Thank you for your reply! I have tested with the replica count set to both 2 and 3, by setting the 'osd

Re: [ceph-users] PHP Rados failed in read operation if object size is large (say more than 10 MB )

2015-03-20 Thread Gaurang Vyas
If I run it from the command prompt, it gives the below error in $piece = rados_read($ioRados, 'TEMP_object', $pieceSize['psize'], 0); -- Segmentation fault (core dumped) -- I have tried the new version of librados too... --

[ceph-users] OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Hi all, can anybody tell me how I can force-delete osds? The thing is that one node got corrupted because of an outage, so there is no way to get those osds up and back. Is there any way to force the removal from the ceph-deploy node? Thanks Jesus Chavez SYSTEMS

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Hello, Sahana! The output of the requested commands is listed below:
admin@cp-admin:~/safedrive$ ceph osd dump
epoch 26
fsid 7db3cf23-ddcb-40d9-874b-d7434bd8463d
created 2015-03-20 07:53:37.948969
modified 2015-03-20 08:11:18.813790
flags
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0

Re: [ceph-users] 'pgs stuck unclean ' problem

2015-03-20 Thread Burkhard Linke
Hi, On 03/20/2015 01:58 AM, houguanghua wrote: Dear all, Ceph 0.72.2 is deployed on three hosts, but the cluster's status is HEALTH_WARN. The status is as follows: # ceph -s cluster e25909ed-25d9-42fd-8c97-0ed31eec6194 health HEALTH_WARN 768 pgs degraded; 768 pgs stuck

Re: [ceph-users] SSD Hardware recommendation

2015-03-20 Thread Josef Johansson
On 19 Mar 2015, at 08:17, Christian Balzer ch...@gol.com wrote: On Wed, 18 Mar 2015 08:59:14 +0100 Josef Johansson wrote: Hi, On 18 Mar 2015, at 05:29, Christian Balzer ch...@gol.com wrote: Hello, On Wed, 18 Mar 2015 03:52:22 +0100 Josef Johansson wrote: [snip] We thought of

Re: [ceph-users] PGs issue

2015-03-20 Thread Sahana
Hi Bogdan, Here is the link for hardware recommendations: http://ceph.com/docs/master/start/hardware-recommendations/#hard-disk-drives. As per this link, the minimum size recommended for osds is 1TB. But as Nick said, Ceph OSDs must be at least 10GB to get a weight of 0.01. Here is the snippet

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Burkhard Linke
Hi, On 03/19/2015 10:41 PM, Nick Fisk wrote: I'm looking at trialling OSDs with a small flashcache device over them to hopefully reduce the impact of metadata updates when doing small-block IO. Inspiration from here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12083 One

Re: [ceph-users] how to compute Ceph durability?

2015-03-20 Thread Loic Dachary
Hi Ghislain, You will find more information about tools and methods at On 20/03/2015 11:47, ghislain.cheval...@orange.com wrote: Hi all, I would like to compute the durability of data stored in a ceph environment according to the cluster topology (failure domains) and the data

Re: [ceph-users] how to compute Ceph durability?

2015-03-20 Thread Loic Dachary
(that's what happens when typing Control-Enter V instead of Control-V enter ;-) On 20/03/2015 11:50, Loic Dachary wrote: Hi Ghislain, You will find more information about tools and methods at https://wiki.ceph.com/Development/Reliability_model/Final_report Enjoy ! On 20/03/2015 11:47,

[ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-03-20 Thread Karan Singh
Hello guys, my Ceph cluster lost data and now it's not recovering. This problem occurred when Ceph performed recovery while one of the nodes was down. Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean, recovering. I have tried several things to recover them, like scrub,

[ceph-users] how to compute Ceph durability?

2015-03-20 Thread ghislain.chevalier
Hi all, I would like to compute the durability of data stored in a Ceph environment according to the cluster topology (failure domains) and the data resiliency (replication/erasure coding). Does a tool exist? Best regards - - - - - - - - - - - - - - - - - Ghislain Chevalier ORANGE
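
There is no single built-in tool. As a very crude back-of-envelope only — it assumes independent failures and ignores CRUSH placement, correlated outages, and rebalancing (the Markov model in the report Loic links above is the proper treatment) — the chance of losing some PG per year with replication factor r is often sketched as:

    P(loss/year) ≈ N * AFR * (k * AFR * T / 8760)^(r-1)

    # N = number of OSDs, AFR = annual failure rate of one disk
    # k = OSDs sharing PGs with a failed disk, T = recovery time in hours
    # e.g. N=100, AFR=0.04, k=30, T=24, r=3 -> 4 * (0.00329)^2 ≈ 4.3e-5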

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Thank you for your suggestion, Nick! I have re-weighted the OSDs and the status has changed to '256 active+clean'. Is this information clearly stated in the documentation and I have missed it? If it isn't, I think it would be worth adding, as the issue might be encountered by

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Stéphane DUGRAVOT
----- Original Message ----- Hi all, can anybody tell me how I can force-delete osds? The thing is that one node got corrupted because of an outage, so there is no way to get those osds up and back. Is there any way to force the removal from the ceph-deploy node? Hi, Try the manual procedure: *

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Chris Murray
Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. So, I remembered that ZLIB is slower, but is more 'safe for old kernels'. I try that: find /var/lib/ceph/osd/ceph-1/current -xdev \( -type f
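
Chris's full find invocation is truncated above. For context, one way to recompress existing btrfs data with zlib is an online defragment, along the lines of (an assumption about his setup: the filestore sits on btrfs, and the OSD is stopped first):

    # rewrite (and recompress) everything under the OSD's data dir with zlib
    btrfs filesystem defragment -r -czlib /var/lib/ceph/osd/ceph-1/current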

Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 4:03 PM, Chris Murray chrismurra...@gmail.com wrote: Ah, I was wondering myself if compression could be causing an issue, but I'm reconsidering now. My latest experiment should hopefully help troubleshoot. So, I remembered that ZLIB is slower, but is more 'safe for old

Re: [ceph-users] Question Blackout

2015-03-20 Thread Pavel V. Kaygorodov
Hi! We have experienced several blackouts on our small Ceph cluster. The most annoying problem is time desync just after a blackout: mons do not start to work before time is synced, and after resync and a manual restart of the monitors, some pgs can stay stuck in an inactive or peering state for a significant
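
A minimal mitigation sketch for the time-desync part, assuming ntpd is available (-g accepts an arbitrarily large initial offset, -q steps the clock once and exits):

    ntpd -gq                  # force a one-shot clock sync after power returns
    service ceph start mon    # then start the monitor (adjust to your init system)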

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Robert LeBlanc
Yes, at this point I'd export the CRUSH map, edit it, and import it back in. What version are you running? Robert LeBlanc Sent from a mobile device, please excuse any typos. On Mar 20, 2015 4:28 PM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Is that what you said? [root@capricornio ~]#
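
The export/edit/import round trip Robert mentions, in sketch form (crushtool ships with Ceph):

    ceph osd getcrushmap -o crush.bin     # export the compiled map
    crushtool -d crush.bin -o crush.txt   # decompile to editable text
    $EDITOR crush.txt                     # delete the dead osd entries
    crushtool -c crush.txt -o crush.new   # recompile
    ceph osd setcrushmap -i crush.new     # inject the new map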

Re: [ceph-users] RADOS Gateway Maturity

2015-03-20 Thread Chris Jones
Hi Jerry, We are using RGW and RBD in our OpenStack clusters and as standalone clusters. We have six large clusters and are adding more. Most of the issues we have faced have been self-inflicted, such as not currently supporting bucket names that look like host names. Some S3 tools only work that way, which causes

[ceph-users] Unable to create rbd snapshot on Centos 7

2015-03-20 Thread gian
Hi guys, I'm trying to test rbd snapshots on CentOS 7. # rbd -p rbd ls test-a test-b test-c test-d # rbd snap create rbd/test-b@snap rbd: failed to create snapshot: (22) Invalid argument 2015-03-20 15:22:56.300731 7f78f7afe880 -1 librbd: failed to create snap id: (22) Invalid argument I
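
A first hedged debugging step for an EINVAL like this is to check what the image itself reports, e.g.:

    rbd info rbd/test-b    # shows size, format, and features

Image-format or feature mismatches between the client and the cluster version are a common suspect here.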

[ceph-users] centos vs ubuntu for production ceph cluster ?

2015-03-20 Thread Alexandre DERUMIER
Hi, I'll build my full-SSD production cluster soon, and I wonder which distro is best tested by Inktank and the Ceph team? The ceph.com doc is quite old and doesn't have references for giant or hammer: http://ceph.com/docs/master/start/os-recommendations/ It seems that in the past only Ubuntu and RHEL were well tested,

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Thomas Foster
Have you tried it from a different node, like the ceph-mon or another ceph-osd node? On Fri, Mar 20, 2015 at 11:23 AM, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Thanks Stéphane, the thing is that those steps need to be run on the node where the osd lives; I don't have that node any

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Thanks Stéphane, the thing is that those steps need to be run on the node where the osd lives; I don't have that node any more since the operating system got corrupted, so I couldn't make it work :( Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone:

Re: [ceph-users] centos vs ubuntu for production ceph cluster ?

2015-03-20 Thread Quentin Hartman
For all intents and purposes, CentOS and RHEL are equivalent, so I'd not be too concerned about that distinction. I can't comment on which distro is better tested by the Ceph devs, but assuming that the packages are built appropriately, with similar dependency versions and whatnot, that also

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread John Spray
On 20/03/2015 15:23, Jesus Chavez (jeschave) wrote: Thanks Stéphane, the thing is that those steps need to be run on the node where the osd lives; I don't have that node any more since the operating system got corrupted, so I couldn't make it work :( Assuming the OSD is already down+out, you

Re: [ceph-users] mds log message

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 12:39 PM, Daniel Takatori Ohara dtoh...@mochsl.org.br wrote: Hello, can anybody help me, please? Some messages appear in the log of my mds, and afterwards the shells of my clients freeze. 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Fri, Mar 20, 2015 at 1:05 PM, Ridwan Rashid ridwan...@gmail.com wrote: Gregory Farnum greg@... writes: On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Gregory Farnum
On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan...@gmail.com wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed hadoop-1.1.1 in the nodes and changed the conf/core-site.xml file according to the ceph documentation
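
For context, the CephFS/Hadoop documentation of that era had core-site.xml carry roughly these properties (a sketch from memory; treat the exact names and values as assumptions and check them against the docs):

    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>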

Re: [ceph-users] PGs issue

2015-03-20 Thread Bogdan SOLGA
Thank you for the clarifications, Sahana! I haven't got to that part yet, so these details were still unknown to me. Perhaps some information on PG weights should be added to the 'quick deployment' page, as this issue might be encountered by other users as well. Kind

[ceph-users] Fwd: OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Any idea how to force remove? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 Begin forwarded message: From: Stéphane DUGRAVOT

Re: [ceph-users] PGs issue

2015-03-20 Thread Craig Lewis
This seems to be a fairly consistent problem for new users. The create-or-move is adjusting the crush weight, not the osd weight. Perhaps the init script should set the default weight to 0.01 if it's <= 0? It seems like there's a downside to this, but I don't see it. On Fri, Mar 20, 2015 at
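
Craig's proposed guard, sketched against the sysvinit script of the time (the df arithmetic mirrors how the script derives a weight from the OSD's size in TB; treat the exact details as assumptions):

    # weight defaults to the OSD's capacity in TB, two decimal places
    defaultweight=$(df -P -k "$osd_data/." | awk 'END { printf "%.2f", $2 / 1073741824 }')
    # proposed: don't let small test OSDs round down to a weight of zero
    [ "$(echo "$defaultweight <= 0" | bc)" -eq 1 ] && defaultweight=0.01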

Re: [ceph-users] Production Ceph :: PG data lost : Cluster PG incomplete, inactive, unclean

2015-03-20 Thread Craig Lewis
osdmap e261536: 239 osds: 239 up, 238 in. Why is that last OSD not in? The history you need is probably there. Run ceph pg <pgid> query on some of the stuck PGs. Look for the recovery_state section. That should tell you what Ceph needs to complete the recovery. If you need more help, post
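
A sketch of the query Craig suggests (the pgid is hypothetical; take real ones from ceph health detail):

    ceph health detail | grep -E 'incomplete|unclean'   # list the stuck PGs
    ceph pg 2.5f query                                  # dump state; read the recovery_state
                                                        # section near the end of the JSON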

Re: [ceph-users] hadoop namenode not starting due to bindException while deploying hadoop with cephFS

2015-03-20 Thread Ridwan Rashid
Gregory Farnum greg@... writes: On Thu, Mar 19, 2015 at 5:57 PM, Ridwan Rashid ridwan064@... wrote: Hi, I have a 5 node ceph(v0.87) cluster and am trying to deploy hadoop with cephFS. I have installed hadoop-1.1.1 in the nodes and changed the conf/core-site.xml file according to the

Re: [ceph-users] Server Specific Pools

2015-03-20 Thread Robert LeBlanc
You can create CRUSH rulesets and then assign pools to different rulesets. http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds On Thu, Mar 19, 2015 at 7:28 PM, Garg, Pankaj pankaj.g...@caviumnetworks.com wrote: Hi, I have a Ceph cluster with
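
In sketch form, following the linked doc (the 'ssd' root bucket and the pool name are placeholders for whatever you define for those servers):

    # in the decompiled CRUSH map:
    rule ssd {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }

    # then point a pool at the new ruleset:
    ceph osd pool set mypool crush_ruleset 1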

[ceph-users] mds log message

2015-03-20 Thread Daniel Takatori Ohara
Hello, can anybody help me, please? Some messages appear in the log of my mds, and afterwards the shells of my clients freeze. 2015-03-20 12:23:54.068005 7f1608d49700 0 log_channel(default) log [WRN] : client.3197487 isn't responding to mclientcaps(revoke), ino 11b1696 pending pAsxLsXsxFcb issued

Re: [ceph-users] Fwd: OSD Forece Removal

2015-03-20 Thread Robert LeBlanc
Removing the OSD from the CRUSH map and deleting the auth key is how you force remove an OSD. The OSD can no longer participate in the cluster, even if it does come back to life. All clients forget about the OSD when the new CRUSH map is distributed. On Fri, Mar 20, 2015 at 11:19 AM, Jesus Chavez

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Robert LeBlanc
We tested bcache and abandoned it for two reasons: 1. It didn't give us any better performance than journals on SSD. 2. We had lots of corruption of the OSDs and were rebuilding them frequently. Since removing bcache, the OSDs have been much more stable. On Fri, Mar 20, 2015 at 4:03 AM,

Re: [ceph-users] PGs issue

2015-03-20 Thread Robert LeBlanc
The weight can be based on anything: size, speed, capability, some random value, etc. The important thing is that it makes sense to you and that you are consistent. Ceph by default (ceph-disk, and I believe ceph-deploy) takes the approach of using size. So if you use a different weighting scheme,

Re: [ceph-users] Ceiling on number of PGs in a OSD

2015-03-20 Thread Craig Lewis
This isn't a hard limit on the number, but it's recommended that you keep it around 100 PGs per OSD. Smaller values cause data-distribution evenness problems; larger values cause the OSD processes to use more CPU, RAM, and file descriptors, particularly during recovery. With that many OSDs, you're going to
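
The rule of thumb from the placement-group docs, for reference (round the result up to the nearest power of two):

    total PGs ≈ (number of OSDs × 100) / replica count
    # e.g. 900 OSDs, 3 replicas: 900 * 100 / 3 = 30000 -> use 32768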

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Yes, that's exactly what I did, but ceph osd tree still shows the osds. Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20, 2015, at 9:41 AM,

Re: [ceph-users] PGs issue

2015-03-20 Thread Robert LeBlanc
I like this idea. I was under the impression that udev did not call the init script, but ceph-disk directly. I don't see ceph-disk calling create-or-move, but I know it does because I see it in the ceph -w when I boot up OSDs. /lib/udev/rules.d/95-ceph-osd.rules # activate ceph-tagged partitions

Re: [ceph-users] Fwd: OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Maybe I should edit the crushmap and delete the osds... Is that a way to force them out? Thanks Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52 55 5267 3146 Mobile: +51 1 5538883255 CCIE - 44433 On Mar 20,

Re: [ceph-users] PGs issue

2015-03-20 Thread Nick Fisk
I see the problem: as your OSDs are only 8GB, they have a zero weight. I think the minimum size you can get away with is 10GB in Ceph, as the size is measured in TB and only has 2 decimal places. For a workaround, try running: ceph osd crush reweight osd.X 1 for each osd; this will
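
Nick's workaround as a one-liner sketch (assuming osd ids 0-2; adjust the list to your cluster):

    for i in 0 1 2; do ceph osd crush reweight osd.$i 1; done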

Re: [ceph-users] cciss driver package for RHEL7

2015-03-20 Thread Steffen W Sørensen
On 19/03/2015, at 17.46, O'Reilly, Dan daniel.orei...@dish.com wrote: The problem with using the hpsa driver is that I need to install RHEL 7.1 on a Proliant system using the SmartArray 400 controller. Therefore, I need a driver that supports it to even install RHEL 7.1. RHEL 7.1 doesn’t

Re: [ceph-users] OSD + Flashcache + udev + Partition uuid

2015-03-20 Thread Nick Fisk
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Burkhard Linke Sent: 20 March 2015 09:09 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] OSD + Flashcache + udev + Partition uuid Hi, On 03/19/2015 10:41 PM, Nick Fisk wrote:

Re: [ceph-users] RADOS Gateway Maturity

2015-03-20 Thread Craig Lewis
I have found a few incompatibilities, but so far they're all on the Ceph side. One example I remember was having to change the way we delete objects. The function we originally used fetches a list of object versions and deletes all versions. Ceph is implementing object versioning now (I believe

Re: [ceph-users] Fwd: OSD Forece Removal

2015-03-20 Thread Robert LeBlanc
Does it show DNE in the entry? That stands for Does Not Exist. It will disappear on its own after a while. I don't know what the timeout is, but they have always gone away within 24 hours. I've edited the CRUSH map before, and I don't think it removed it when it was already DNE; I just had to wait

Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-20 Thread Craig Lewis
I would say you're a little light on RAM. With 4TB disks 70% full, I've seen some ceph-osd processes using 3.5GB of RAM during recovery. You'll be fine during normal operation, but you might run into issues at the worst possible time. I have 8 OSDs per node and 32GB of RAM. I've had ceph-osd

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
This is the output if I try to remove it from the crush map; it says that it is already gone… [root@capricornio ~]# ceph osd crush remove osd.29 device 'osd.29' does not appear in the crush map [root@capricornio ~]# [root@capricornio ~]# ceph osd tree | grep down # id weight type name
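
ceph osd tree lists entries from the OSD map as well as the CRUSH map, so once `ceph osd crush remove` reports the device gone, the remaining step is (sketch, using one id from Jesus's output):

    ceph osd rm osd.29    # drop the id from the OSD map; osd tree stops listing it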

Re: [ceph-users] OSD Forece Removal

2015-03-20 Thread Jesus Chavez (jeschave)
Is that what you said? [root@capricornio ~]# ceph auth del osd.9 entity osd.9 does not exist [root@capricornio ~]# ceph auth del osd.19 entity osd.19 does not exist Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com Phone: +52

Re: [ceph-users] Question Blackout

2015-03-20 Thread Craig Lewis
I'm not a CephFS user, but I have had a few cluster outages. Each OSD has a journal, and Ceph ensures that a write is in all of the journals (primary and replicas) before it acknowledges the write. If an OSD process crashes, it replays the journal on startup, and recovers the write. I've lost