[ceph-users] Performance questions (how original, I know)
Hello, new to Ceph, not new to replicated storage.

Simple test cluster with 2 identical nodes running Debian Jessie, thus ceph 0.48. And yes, I very much prefer a distro-supported package. Single mon and osd1 on node a, osd2 on node b. 1GbE direct interlink between the nodes, used exclusively for this setup. Bog standard, minimum configuration, declaring a journal but that's on the same backing storage.

The backing storage can do this locally (bonnie++):

Version 1.97       --Sequential Output-- --Sequential Input- --Random-
Concurrency 1      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size       K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
irt038G                      89267  21 60474  15           267049 37 536.9 12
Latency                      4792ms    245ms               44908us   113ms

And this with a 20GB rbd (formatted the same way, ext4, as the test above) mounted on the node that hosts the mon and osd1:

Version 1.97       --Sequential Output-- --Sequential Input- --Random-
Concurrency 1      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size       K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
irt038G                      11525   2  5562   1            48221  6 167.3  3
Latency                      5073ms    2912ms               321ms     2841ms

I'm looking at Ceph/RBD to store VM volumes with ganeti and these numbers frankly scare me. Watching the traffic with ethstats I never saw anything higher than this during writes (on node a):

eth2: 72.32 Mb/s In 127.99 Mb/s Out - 8035.4 p/s In 11649.5 p/s Out

I assume the traffic coming back in is replica stuff from node b, right? What prevented it from using more than about 13% of the network link capacity?

Aside from that cringeworthy drop to 15% of the backing storage speed (and network link), which I presume might be salvageable by using an SSD journal, I'm more than puzzled by the read speed. For starters I would have assumed that in this 2-replica setup all data is present on the local node a and Ceph would be smart enough to get it all locally. But even if it was talking to both nodes a and b (or just b) I would have expected something in the 100MB/s range.

Any insights would be much appreciated.

Regards,
Christian
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
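To separate the RADOS/network layer from the filesystem-on-RBD layer in a test like this, a rados-level benchmark is useful. A minimal sketch, assuming the default "rbd" pool and a rados tool new enough to support these flags (on a 0.48-era build the exact options may differ, so check "rados bench" usage first); durations and thread count are arbitrary:

rados bench -p rbd 60 write -t 16 --no-cleanup   # 60s of 4MB object writes, 16 in flight
rados bench -p rbd 60 seq -t 16                  # sequential reads of the objects just written

If the rados bench numbers look sane but the ext4-on-RBD numbers do not, the bottleneck is more likely in the filesystem/journal interaction than in the network.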
[ceph-users] is the manual correct?
hi all,

The official manual says:

==
STOPPING W/OUT REBALANCING
Periodically, you may need to perform maintenance on a subset of your cluster, or resolve a problem that affects a failure domain (e.g., a rack). If you do not want CRUSH to automatically rebalance the cluster as you stop OSDs for maintenance, set the cluster to noout first:
ceph osd set noout
Once the cluster is set to noout, you can begin stopping the OSDs within the failure domain that requires maintenance work.
ceph osd stop osd.{num}
Note: Placement groups within the OSDs you stop will become degraded while you are addressing issues within the failure domain.
Once you have completed your maintenance, restart the OSDs.
ceph osd start osd.{num}
==

But I can't run ceph osd start/stop. It's not because of my configuration; obviously, start/stop is simply not a valid command. I installed ceph 0.71 on Ubuntu 12.04. What's the problem? Is my ceph version too new or too old? Any suggestions? Thank you.

===
# ceph osd stop osd.1
no valid command found; 10 closest matches:
osd tier remove-overlay <poolname>
osd tier cache-mode <poolname> none|writeback|invalidate+forward|readonly
osd tier set-overlay <poolname> <poolname>
osd tier remove <poolname> <poolname>
osd tier add <poolname> <poolname>
osd pool stats {<name>}
osd pool set-quota <poolname> max_objects|max_bytes <val>
osd thrash <int[0-]>
osd reweight-by-utilization {<int[100-]>}
osd pool rename <poolname> <poolname>
Error EINVAL: invalid command
==

Best Regards,
David
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] is the manual correct?
The OSD can be stopped from the host directly, sudo stop ceph-osd id=3 I don't know if that's the 'proper' way mind. On 2013-12-16 09:40, david.zhang...@gmail.com wrote: ceph osd start osd.{num} == but I can’t run ceph osd start/stop,it’s not because of my configuration,obviously,start/stop is not a valid command. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] is the manual correct?
On 12/16/2013 10:48 AM, James Pearce wrote: The OSD can be stopped from the host directly, sudo stop ceph-osd id=3 Or use: service ceph stop osd.3 (depends if you use upstart or not). The manual in this case is not correct I think. Wido I don't know if that's the 'proper' way mind. On 2013-12-16 09:40, david.zhang...@gmail.com wrote: ceph osd start osd.{num} == but I can’t run ceph osd start/stop,it’s not because of my configuration,obviously,start/stop is not a valid command. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
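Pulling the replies together: a minimal sketch of the maintenance sequence the manual is really describing, with the daemon stopped through the init system rather than the non-existent "ceph osd stop" command. Which stop/start form applies depends on whether the node uses upstart or sysvinit, as noted above; the OSD id is illustrative:

ceph osd set noout                # keep CRUSH from marking OSDs out and rebalancing
stop ceph-osd id=3                # upstart (Ubuntu)
# or: service ceph stop osd.3     # sysvinit
# ...perform the maintenance...
start ceph-osd id=3               # or: service ceph start osd.3
ceph osd unset noout              # let the cluster manage itself again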
Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients
Hi,

Sorry to revive this old thread, but I wanted to update you on the current pains we're going through related to clients' nproc (and now nofile) ulimits. When I started this thread we were using RBD for Glance images only, but now we're trying to enable RBD-backed Cinder volumes and are not really succeeding at the moment :(

As we had guessed from our earlier experience, librbd and therefore qemu-kvm need increased nproc/nofile limits, otherwise VMs will freeze. In fact we just observed a lockup of a test VM due to the RBD device blocking completely (this appears as blocked flush processes in the VM); we're actually not sure which of the nproc/nofile limits caused the freeze, but it was surely one of those. And the main problem we face now is that it isn't trivial to increase the limits of qemu-kvm on a running OpenStack hypervisor -- the values are set by libvirtd and seem to require a restart of all guest VMs on a host to reload a qemu config file. I'll update this thread when we find the solution to that...

Moving forward, IMHO it would be much better if Ceph clients could gracefully work with large clusters without _requiring_ changes to the ulimits. I understand that such poorly configured clients would necessarily have decreased performance (since librados would need to use a thread pool and also lose some of the persistent client-OSD connections). But client lockups are IMHO worse than slightly lower performance. Have you guys discussed the client ulimit issues recently and is there a plan in the works?

Best Regards, Dan, CERN IT/DSS

On Sep 19, 2013 6:10 PM, Gregory Farnum g...@inktank.com wrote:
On Wed, Sep 18, 2013 at 11:43 PM, Dan Van Der Ster daniel.vanders...@cern.ch wrote:
On Sep 18, 2013, at 11:50 PM, Gregory Farnum g...@inktank.com wrote:
On Wed, Sep 18, 2013 at 6:33 AM, Dan Van Der Ster daniel.vanders...@cern.ch wrote:
Hi, We just finished debugging a problem with RBD-backed Glance image creation failures, and thought our workaround would be useful for others. Basically, we found that during an image upload, librbd on the glance api server was consuming many many processes, eventually hitting the 1024 nproc limit of non-root users in RHEL. The failure occurred when uploading to pools with 2048 PGs, but didn't fail when uploading to pools with 512 PGs (we're guessing that librbd is opening one thread per accessed PG, and not closing those threads until the whole process completes.) If you hit this same problem (and you run RHEL like us), you'll need to modify at least /etc/security/limits.d/90-nproc.conf (adding your non-root user that should be allowed >1024 procs), and then also possibly run ulimit -u in the init script of your client process. Ubuntu should have some similar limits.
Did your pools with 2048 PGs have a significantly larger number of OSDs in them? Or are both pools on a pool with a lot of OSDs relative to the PG counts?
1056 OSDs at the moment. Uploading a 14GB image we observed up to ~1500 threads. We set the glance client to allow 4096 processes for now.
The PG count shouldn't matter for this directly, but RBD (and other clients) will create a couple messenger threads for each OSD it talks to, and while they'll eventually shut down on idle it doesn't proactively close them. I'd expect this to be a problem around 500 OSDs.
A couple, is that the upper limit? Should we be safe with ulimit -u 2*nOSDs +1 ??
The messenger currently generates 2 threads per daemon it communicates with (although they will go away after a long enough idle period).
2*nOSD+1 won't quite be enough as there's the monitor connection and a handful of internal threads (I don't remember the exact numbers off-hand). So far this hasn't been a problem for anybody and I doubt you'll see issues, but at some point we will need to switch the messenger to use epoll instead of a thread per socket. :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
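For reference, a minimal sketch of the limits.d override described at the start of this thread. The user name and the numbers are purely illustrative; as discussed above, the values need to be sized to the number of OSDs the client will actually talk to:

# /etc/security/limits.d/91-ceph-clients.conf (illustrative)
glance    soft    nproc     4096
glance    hard    nproc     4096
glance    soft    nofile    8192
glance    hard    nofile    8192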
Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients
On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, Sorry to revive this old thread, but I wanted to update you on the current pains we're going through related to clients' nproc (and now nofile) ulimits. When I started this thread we were using RBD for Glance images only, but now we're trying to enable RBD-backed Cinder volumes and are not really succeeding at the moment :( As we had guessed from our earlier experience, librbd and therefore qemu-kvm need increased nproc/nofile limits otherwise VMs will freeze. In fact we just observed a lockup of a test VM due to the RBD device blocking completely (this appears as blocked flush processes in the VM); we're actually not sure which of the nproc/nofile limits caused the freeze, but it was surely one of those. And the main problem we face now is that it isn't trivial to increase the limits of qemu-kvm on a running OpenStack hypervisor -- the values are set by libvirtd and seem to require a restart of all guest VMs on a host to reload a qemu config file. I'll update this thread when we find the solution to that... Is there some reason you can't just set it ridiculously high to start with? Moving forward, IMHO it would be much better if Ceph clients could gracefully work with large clusters without _requiring_ changes to the ulimits. I understand that such poorly configured clients would necessarily have decreased performance (since librados would need to use a thread pool and also lose some of the persistent client-OSD connections). But client lockups are IMHO worse that slightly lower performance. Have you guys discussed the client ulimit issues recently and is there a plan in the works? I'm afraid not. It's a plannable but non-trivial amount of work and the Inktank dev team is pretty well booked for a while. Anybody running into this as a serious bottleneck should 1) try and start a community effort 2) try and promote it as a priority with any Inktank business contacts they have. (You are only the second group to report it as an ongoing concern rather than a one-off hiccup, and honestly it sounds like you're just having issues with hitting the arbitrary limits, not with real resource exhaustion issues.) :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] USB pendrive as boot disk
On Mon, Dec 16, 2013 at 4:35 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2013/11/7 Kyle Bader kyle.ba...@gmail.com: Ceph handles it's own logs vs using syslog so I think your going to have to write to tmpfs and have a logger ship it somewhere else quickly. I have a feeling Ceph logs will eat a USB device alive, especially if you have to crank up debugging. I wasn't aware of this. I've assumed that ceph was using syslog like any other daemon. There are log_to_syslog and err_to_syslog config options that will send the ceph log output there. I don't remember all the config stuff you need to set up properly and be aware of, but you should be able to find it by searching the list archives or the docs. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
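A minimal sketch of what those options look like in ceph.conf, assuming you want every daemon to send its log output to syslog (check the docs for the full set of related settings before relying on this verbatim):

[global]
    log to syslog = true
    err to syslog = true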
Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients
On Dec 16, 2013 8:26 PM, Gregory Farnum g...@inktank.com wrote: On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, Sorry to revive this old thread, but I wanted to update you on the current pains we're going through related to clients' nproc (and now nofile) ulimits. When I started this thread we were using RBD for Glance images only, but now we're trying to enable RBD-backed Cinder volumes and are not really succeeding at the moment :( As we had guessed from our earlier experience, librbd and therefore qemu-kvm need increased nproc/nofile limits otherwise VMs will freeze. In fact we just observed a lockup of a test VM due to the RBD device blocking completely (this appears as blocked flush processes in the VM); we're actually not sure which of the nproc/nofile limits caused the freeze, but it was surely one of those. And the main problem we face now is that it isn't trivial to increase the limits of qemu-kvm on a running OpenStack hypervisor -- the values are set by libvirtd and seem to require a restart of all guest VMs on a host to reload a qemu config file. I'll update this thread when we find the solution to that... Is there some reason you can't just set it ridiculously high to start with? As I mentioned, we haven't yet found a way to change the limits without affecting (stopping) the existing running (important) VMs. We thought that /etc/security/limits.conf would do the trick, but alas limits there have no effect on qemu. Cheers, Dan Moving forward, IMHO it would be much better if Ceph clients could gracefully work with large clusters without _requiring_ changes to the ulimits. I understand that such poorly configured clients would necessarily have decreased performance (since librados would need to use a thread pool and also lose some of the persistent client-OSD connections). But client lockups are IMHO worse that slightly lower performance. Have you guys discussed the client ulimit issues recently and is there a plan in the works? I'm afraid not. It's a plannable but non-trivial amount of work and the Inktank dev team is pretty well booked for a while. Anybody running into this as a serious bottleneck should 1) try and start a community effort 2) try and promote it as a priority with any Inktank business contacts they have. (You are only the second group to report it as an ongoing concern rather than a one-off hiccup, and honestly it sounds like you're just having issues with hitting the arbitrary limits, not with real resource exhaustion issues.) :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ulimit max user processes (-u) and non-root ceph clients
On 12/16/2013 2:36 PM, Dan Van Der Ster wrote: On Dec 16, 2013 8:26 PM, Gregory Farnum g...@inktank.com wrote: On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster daniel.vanders...@cern.ch wrote: Hi, Sorry to revive this old thread, but I wanted to update you on the current pains we're going through related to clients' nproc (and now nofile) ulimits. When I started this thread we were using RBD for Glance images only, but now we're trying to enable RBD-backed Cinder volumes and are not really succeeding at the moment :( As we had guessed from our earlier experience, librbd and therefore qemu-kvm need increased nproc/nofile limits otherwise VMs will freeze. In fact we just observed a lockup of a test VM due to the RBD device blocking completely (this appears as blocked flush processes in the VM); we're actually not sure which of the nproc/nofile limits caused the freeze, but it was surely one of those. And the main problem we face now is that it isn't trivial to increase the limits of qemu-kvm on a running OpenStack hypervisor -- the values are set by libvirtd and seem to require a restart of all guest VMs on a host to reload a qemu config file. I'll update this thread when we find the solution to that... Is there some reason you can't just set it ridiculously high to start with? As I mentioned, we haven't yet found a way to change the limits without affecting (stopping) the existing running (important) VMs. We thought that /etc/security/limits.conf would do the trick, but alas limits there have no effect on qemu. I don't know whether qemu (perhaps librbd to be more precise?) is aware of the limits and avoids them or simply gets errors when it exceeds them. If it's the latter then couldn't you just use prlimit to change them? If that's not possible then maybe just change the limit settings, migrate the VM and then migrate it back? Cheers, Dan Moving forward, IMHO it would be much better if Ceph clients could gracefully work with large clusters without _requiring_ changes to the ulimits. I understand that such poorly configured clients would necessarily have decreased performance (since librados would need to use a thread pool and also lose some of the persistent client-OSD connections). But client lockups are IMHO worse that slightly lower performance. Have you guys discussed the client ulimit issues recently and is there a plan in the works? I'm afraid not. It's a plannable but non-trivial amount of work and the Inktank dev team is pretty well booked for a while. Anybody running into this as a serious bottleneck should 1) try and start a community effort 2) try and promote it as a priority with any Inktank business contacts they have. (You are only the second group to report it as an ongoing concern rather than a one-off hiccup, and honestly it sounds like you're just having issues with hitting the arbitrary limits, not with real resource exhaustion issues.) :) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
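Following up on the prlimit idea: a minimal sketch of raising the limits of an already-running qemu-kvm process, assuming a util-linux new enough to ship prlimit. The PID and the values are illustrative only, and the change does not survive a restart of the guest:

pgrep -f qemu-kvm                                              # find the qemu process(es)
prlimit --pid 12345 --nofile=65536:65536 --nproc=32768:32768   # raise soft:hard limits in place
prlimit --pid 12345                                            # verify the new limits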
Re: [ceph-users] Ceph incomplete pg
Are there any docs on how I can repair the inconsistent pgs? Or any thoughts on the crash of the OSD? Thanks!

From: Jeppesen, Nelson
Sent: Thursday, December 12, 2013 10:58 PM
To: 'ceph-users@lists.ceph.com'
Subject: Ceph incomplete pg

I have an issue with incomplete pgs, I've tried repairing it but no such luck. Any ideas what to check?

Output from 'ceph health detail':

HEALTH_ERR 2 pgs inconsistent; 1 pgs recovering; 1 pgs stuck unclean; recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%); 4 scrub errors
pg 22.ee is stuck unclean for 131473.768406, current state active+recovering+inconsistent, last acting [45,16,21]
pg 22.ee is active+recovering+inconsistent, acting [45,16,21], 5 unfound
pg 22.4a is active+clean+inconsistent, acting [2,25,34]
recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%)
4 scrub errors

I tried to remove one of the nodes and now the service crashes on startup:

Dec 12 22:56:32 ceph12 ceph-osd: 0 2013-12-12 22:56:32.000946 7fe4dcd4a700 -1 *** Caught signal (Aborted) **
 in thread 7fe4dcd4a700

 ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)
 1: /usr/bin/ceph-osd() [0x8001ea]
 2: (()+0xfcb0) [0x7fe4f029bcb0]
 3: (gsignal()+0x35) [0x7fe4eea53425]
 4: (abort()+0x17b) [0x7fe4eea56b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fe4ef3a669d]
 6: (()+0xb5846) [0x7fe4ef3a4846]
 7: (()+0xb5873) [0x7fe4ef3a4873]
 8: (()+0xb596e) [0x7fe4ef3a496e]
 9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0x8c7087]
 10: (object_info_t::decode(ceph::buffer::list::iterator&)+0x73) [0x95c163]
 11: (ReplicatedPG::build_push_op(ObjectRecoveryInfo const&, ObjectRecoveryProgress const&, ObjectRecoveryProgress*, PushOp*)+0x87f) [0x5f123f]
 12: (ReplicatedPG::handle_pull(int, PullOp&, PushOp*)+0xc1) [0x5f4611]
 13: (ReplicatedPG::do_pull(std::tr1::shared_ptr<OpRequest>)+0x4f4) [0x5f53b4]
 14: (PG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x348) [0x703e38]
 15: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x330) [0x658620]
 16: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x4a0) [0x66ed10]
 17: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x6aa25c]
 18: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x8b8f96]
 19: (ThreadPool::WorkThread::entry()+0x10) [0x8bada0]
 20: (()+0x7e9a) [0x7fe4f0293e9a]
 21: (clone()+0x6d) [0x7fe4eeb113fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
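For the inconsistent/unfound part, the usual sequence looks roughly like the sketch below (pg IDs taken from the health output above). mark_unfound_lost throws the unfound objects away and is very much a last resort, and none of this addresses the OSD crash itself:

ceph pg repair 22.4a                    # re-run repair on the inconsistent pg
ceph pg 22.ee query                     # see which OSDs are still being probed
ceph pg 22.ee list_missing              # list the unfound objects
ceph pg 22.ee mark_unfound_lost revert  # last resort: give up on the unfound objects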
[ceph-users] mon add problem
Hi,

I am trying to add a mon host using "ceph-deploy mon create kvm2", but it's not working and gives me an error:

[kvm2][DEBUG ] determining if provided host has same hostname in remote
[kvm2][DEBUG ] get remote short hostname
[kvm2][DEBUG ] deploying mon to kvm2
[kvm2][DEBUG ] get remote short hostname
[kvm2][DEBUG ] remote hostname: kvm2
[kvm2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[kvm2][DEBUG ] create the mon path if it does not exist
[kvm2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-kvm2/done
[kvm2][DEBUG ] create a done file to avoid re-doing the mon deployment
[kvm2][DEBUG ] create the init path if it does not exist
[kvm2][DEBUG ] locating the `service` executable...
[kvm2][INFO ] Running command: initctl emit ceph-mon cluster=ceph id=kvm2
[kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status
[kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[kvm2][WARNIN] monitor: mon.kvm2, might not be running yet
[kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status
[kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[kvm2][WARNIN] kvm2 is not defined in `mon initial members`
[kvm2][WARNIN] monitor kvm2 does not exist in monmap
[kvm2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
[kvm2][WARNIN] monitors may not be able to form quorum

root@kvm1:/home/umar/ceph-cluster# ceph-deploy mon create kvm2

Would you please help me solve this problem?

Br.
Umar
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Interest a SF South bay Meetup
So it sounds like there is interest from only two people. FYI, I was looking at sometime in mid-Jan.

Andrew
Mirantis

On Wed, Dec 11, 2013 at 4:59 PM, Andrew Woodward xar...@gmail.com wrote: I'd like to get a pulse on any interest in having a meetup in the SF South bay (Mountain View CA, USA). -- Andrew Mirantis
--
If google has done it, Google did it right!
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mon add problem
[kvm2][WARNIN] kvm2 is not defined in `mon initial members` The above is why. When you run 'ceph-deploy new', pass it all the machines you intend to use as mons, eg 'ceph-deploy new mon1 mon2 mon3' Or alternately, you can modify the ceph.conf file in your bootstrap directory. And the mon and the IP, you'll see where. Do not use the mon's FQDN, only the shortname. From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Umar Draz Sent: Monday, December 16, 2013 2:28 PM To: ceph-us...@ceph.com Subject: [ceph-users] mon add problem Hi, I am try to add mon host using ceph-deploy mon create kvm2, but its not working and giving me an error. [kvm2][DEBUG ] determining if provided host has same hostname in remote [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] deploying mon to kvm2 [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] remote hostname: kvm2 [kvm2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [kvm2][DEBUG ] create the mon path if it does not exist [kvm2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-kvm2/done [kvm2][DEBUG ] create a done file to avoid re-doing the mon deployment [kvm2][DEBUG ] create the init path if it does not exist [kvm2][DEBUG ] locating the `service` executable... [kvm2][INFO ] Running command: initctl emit ceph-mon cluster=ceph id=kvm2 [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] monitor: mon.kvm2, might not be running yet [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] kvm2 is not defined in `mon initial members` [kvm2][WARNIN] monitor kvm2 does not exist in monmap [kvm2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors [kvm2][WARNIN] monitors may not be able to form quorum root@kvm1:/home/umar/ceph-cluster# ceph-deploy mon create kvm2 would you please help me how to solve this problem? Br. Umar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mon add problem
This indicates you have multiple networks on the new mon host, but no definition in your ceph.conf as to which network is public. In your ceph.conf, add: public network = 192.168.1.0/24 cluster network = 192.168.2.0/24 (Fix the subnet definitions for your environment) Then, re-try your new mon deploy. Thanks, Michael J. Kidd Michael J. Kidd Sr. Storage Consultant Inktank Professional Services On Mon, Dec 16, 2013 at 4:27 PM, Umar Draz unix...@gmail.com wrote: Hi, I am try to add mon host using ceph-deploy mon create kvm2, but its not working and giving me an error. [kvm2][DEBUG ] determining if provided host has same hostname in remote [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] deploying mon to kvm2 [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] remote hostname: kvm2 [kvm2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [kvm2][DEBUG ] create the mon path if it does not exist [kvm2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-kvm2/done [kvm2][DEBUG ] create a done file to avoid re-doing the mon deployment [kvm2][DEBUG ] create the init path if it does not exist [kvm2][DEBUG ] locating the `service` executable... [kvm2][INFO ] Running command: initctl emit ceph-mon cluster=ceph id=kvm2 [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] monitor: mon.kvm2, might not be running yet [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] kvm2 is not defined in `mon initial members` [kvm2][WARNIN] monitor kvm2 does not exist in monmap [kvm2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors [kvm2][WARNIN] monitors may not be able to form quorum root@kvm1:/home/umar/ceph-cluster# ceph-deploy mon create kvm2 would you please help me how to solve this problem? Br. Umar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
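Putting both answers together, a minimal sketch of the ceph.conf pieces to have in place before re-running the deploy; the IPs and subnet are illustrative and must match the actual environment:

[global]
    mon initial members = kvm1, kvm2
    mon host = 192.168.1.10, 192.168.1.11
    public network = 192.168.1.0/24

Then push the updated config and retry:

ceph-deploy --overwrite-conf mon create kvm2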
Re: [ceph-users] Interest a SF South bay Meetup
Hi Andrew, That would be motivation enough for me to want to meet these two persons over a beer or a diner :-) It gets more complicated to do that when there are more than ten. Cheers On 16/12/2013 22:28, Andrew Woodward wrote: So it sounds like there is only interest by two people. FYI, was looking for sometime in mid Jan. Andrew Mirantis On Wed, Dec 11, 2013 at 4:59 PM, Andrew Woodward xar...@gmail.com mailto:xar...@gmail.com wrote: I'd like to get a pulse on any interest in having a meetup in the SF South bay (Mountain View CA, USA). -- Andrew Mirantis -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph.com Audit?
Hey Guys, Ross and I were discussing a few pages on Ceph.com that we thought needed an update and I figured it might be a good idea to go through and audit Ceph.com in general, just to get an idea of what we're up against. I started a simple pad in case the Trello board is a bit too daunting. Anyone that has thoughts please weigh in under the appropriate place in the tree. Shout if you have questions. Thanks! http://pad.ceph.com/p/ceph.com-audit Best Regards, Patrick McGarry Director, Community || Inktank http://ceph.com || http://inktank.com @scuttlemonkey || @ceph || @inktank ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] My experience with ceph now documentted
Karan, This all looks great. I'd encourage you to submit some of this information into the ceph docs, some of the openstack integration docs are getting a little dated Andrew On Fri, Dec 6, 2013 at 12:24 PM, Karan Singh ksi...@csc.fi wrote: Hello Cephers I would like to say a BIG THANKS to ceph community for helping me in setting up and learning ceph. I have created a small documentation http://karan-mj.blogspot.fi/ of my experience with ceph till now , i belive it would help beginners in installing ceph and integrating it with openstack. I would keep updating this blog. PS -- i recommend original ceph documentation http://ceph.com/docs/master/ and other original content published by Ceph community , INKTANK and other partners. My attempt http://karan-mj.blogspot.fi/ is just to contribute for a regular online content about ceph. Karan Singh CSC - IT Center for Science Ltd. P.O. Box 405, FI-02101 Espoo, FINLAND http://www.csc.fi/ | +358 (0) 503 812758 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- If google has done it, Google did it right! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mon add problem
HI Don, Well the result is same even after ceph-deploy new kvm2 Br. Umar On Tue, Dec 17, 2013 at 2:35 AM, Don Talton (dotalton) dotal...@cisco.comwrote: [kvm2][WARNIN] kvm2 is not defined in `mon initial members` The above is why. When you run ‘ceph-deploy new’, pass it all the machines you intend to use as mons, eg ‘ceph-deploy new mon1 mon2 mon3’ Or alternately, you can modify the ceph.conf file in your bootstrap directory. And the mon and the IP, you’ll see where. Do not use the mon’s FQDN, only the shortname. *From:* ceph-users-boun...@lists.ceph.com [mailto: ceph-users-boun...@lists.ceph.com] *On Behalf Of *Umar Draz *Sent:* Monday, December 16, 2013 2:28 PM *To:* ceph-us...@ceph.com *Subject:* [ceph-users] mon add problem Hi, I am try to add mon host using ceph-deploy mon create kvm2, but its not working and giving me an error. [kvm2][DEBUG ] determining if provided host has same hostname in remote [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] deploying mon to kvm2 [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] remote hostname: kvm2 [kvm2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [kvm2][DEBUG ] create the mon path if it does not exist [kvm2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-kvm2/done [kvm2][DEBUG ] create a done file to avoid re-doing the mon deployment [kvm2][DEBUG ] create the init path if it does not exist [kvm2][DEBUG ] locating the `service` executable... [kvm2][INFO ] Running command: initctl emit ceph-mon cluster=ceph id=kvm2 [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] monitor: mon.kvm2, might not be running yet [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] kvm2 is not defined in `mon initial members` [kvm2][WARNIN] monitor kvm2 does not exist in monmap [kvm2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors [kvm2][WARNIN] monitors may not be able to form quorum root@kvm1:/home/umar/ceph-cluster# ceph-deploy mon create kvm2 would you please help me how to solve this problem? Br. Umar -- Umar Draz Network Architect ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] mon add problem
Hi Michael, I have only single interface as 192.168.1.x on my ceph hosts. Then what i need to define? Br. Umar On Tue, Dec 17, 2013 at 2:37 AM, Michael Kidd michael.k...@inktank.comwrote: This indicates you have multiple networks on the new mon host, but no definition in your ceph.conf as to which network is public. In your ceph.conf, add: public network = 192.168.1.0/24 cluster network = 192.168.2.0/24 (Fix the subnet definitions for your environment) Then, re-try your new mon deploy. Thanks, Michael J. Kidd Michael J. Kidd Sr. Storage Consultant Inktank Professional Services On Mon, Dec 16, 2013 at 4:27 PM, Umar Draz unix...@gmail.com wrote: Hi, I am try to add mon host using ceph-deploy mon create kvm2, but its not working and giving me an error. [kvm2][DEBUG ] determining if provided host has same hostname in remote [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] deploying mon to kvm2 [kvm2][DEBUG ] get remote short hostname [kvm2][DEBUG ] remote hostname: kvm2 [kvm2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [kvm2][DEBUG ] create the mon path if it does not exist [kvm2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-kvm2/done [kvm2][DEBUG ] create a done file to avoid re-doing the mon deployment [kvm2][DEBUG ] create the init path if it does not exist [kvm2][DEBUG ] locating the `service` executable... [kvm2][INFO ] Running command: initctl emit ceph-mon cluster=ceph id=kvm2 [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] monitor: mon.kvm2, might not be running yet [kvm2][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.kvm2.asok mon_status [kvm2][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory [kvm2][WARNIN] kvm2 is not defined in `mon initial members` [kvm2][WARNIN] monitor kvm2 does not exist in monmap [kvm2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors [kvm2][WARNIN] monitors may not be able to form quorum root@kvm1:/home/umar/ceph-cluster# ceph-deploy mon create kvm2 would you please help me how to solve this problem? Br. Umar ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Umar Draz Network Architect ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw-agent, sync zone_info.us-east: Http error code 500 content
Thanks for your reply. root@rceph0:~# radosgw-admin zone get --name client.radosgw.us-west-1 { domain_root: .us-west.rgw.root, control_pool: .us-west.rgw.control, gc_pool: .us-west.rgw.gc, log_pool: .us-west.log, intent_log_pool: .us-west.intent-log, usage_log_pool: .us-west.usage, user_keys_pool: .us-west.users, user_email_pool: .us-west.users.email, user_swift_pool: .us-west.users.swift, user_uid_pool: .us-west.users.uid, system_key: { access_key: G5DLUXD2HA07LDT10DRU, secret_key: IPgisy2fW7WOX1xFqjtdPFR6fXPfupfDHEM4n4+H}, placement_pools: [ { key: default-placement, val: { index_pool: .us-west.rgw.buckets.index, data_pool: .us-west.rgw.buckets}}]} root pool setting in ceph.conf is below: [client.radosgw.us-west-1] rgw region = us rgw region root pool = .us.rgw.root rgw zone = us-west rgw zone root pool = .us-west.rgw.root or,can I delete this non-bucket metadata info ?? 2013/12/16 Yehuda Sadeh yeh...@inktank.com For some reason your bucket list seem to be returning some non-bucket metadata info. Sounds like there's a mixup in the pools. What does radosgw-admin zone get (for the us-west zone) return? What's your 'rgw zone root pool' and 'rgw region root pool'? Yehuda On Sun, Dec 15, 2013 at 9:03 PM, hnuzhou...@gmail.com wrote: Hi,guys. I am using the character of geo-replication in ceph. I have two ceph clusters,so my plan is one region,in which two zones. Ceph version is ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de) Now I can sync users and buckets from master zone to slave zone. But the object in bucket can not be synced.the error about object is: ERROR:radosgw_agent.worker:failed to sync object gci-replication-copytest1/628.png: state is error The following is the output when I run “radosgw-agent -c /etc/ceph/region-data-sync.conf --sync-scope full”: region map is: {u'us': [u'us-west', u'us-east']} INFO:root:syncing all metadata INFO:radosgw_agent.sync:Starting sync INFO:radosgw_agent.worker:finished syncing shard 33 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:1/19 items processed INFO:radosgw_agent.worker:finished syncing shard 5 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:2/19 items processed INFO:radosgw_agent.worker:finished syncing shard 6 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:3/19 items processed INFO:radosgw_agent.worker:finished syncing shard 1 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:4/19 items processed WARNING:radosgw_agent.worker:error getting metadata for bucket zone_info.us-west: Http error code 500 content {Code:UnknownError} Traceback (most recent call last): File /usr/lib/python2.7/dist-packages/radosgw_agent/worker.py, line 400, in sync_meta metadata = client.get_metadata(self.src_conn, section, name) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 163, in get_metadata params=dict(key=name)) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 155, in request check_result_status(result) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 116, in check_result_status HttpError)(result.status_code, result.content) HttpError: Http error code 500 content {Code:UnknownError} INFO:radosgw_agent.sync:5/19 items processed INFO:radosgw_agent.worker:finished syncing shard 28 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:6/19 items processed 
INFO:radosgw_agent.worker:finished syncing shard 42 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] WARNING:radosgw_agent.worker:error getting metadata for bucket zone_info.us-east: Http error code 500 content {Code:UnknownError} Traceback (most recent call last): File /usr/lib/python2.7/dist-packages/radosgw_agent/worker.py, line 400, in sync_meta metadata = client.get_metadata(self.src_conn, section, name) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 163, in get_metadata params=dict(key=name)) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 155, in request check_result_status(result) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 116, in check_result_status HttpError)(result.status_code, result.content) HttpError: Http error code 500 content {Code:UnknownError} INFO:radosgw_agent.sync:7/19 items processed INFO:radosgw_agent.worker:finished syncing shard 11 INFO:radosgw_agent.worker:incremental sync will need to retry items: []
Re: [ceph-users] radosgw-agent, sync zone_info.us-east: Http error code 500 content
On Mon, Dec 16, 2013 at 8:22 PM, lin zhou 周林 hnuzhou...@gmail.com wrote: Thanks for your reply. root@rceph0:~# radosgw-admin zone get --name client.radosgw.us-west-1 { domain_root: .us-west.rgw.root, control_pool: .us-west.rgw.control, gc_pool: .us-west.rgw.gc, log_pool: .us-west.log, intent_log_pool: .us-west.intent-log, usage_log_pool: .us-west.usage, user_keys_pool: .us-west.users, user_email_pool: .us-west.users.email, user_swift_pool: .us-west.users.swift, user_uid_pool: .us-west.users.uid, system_key: { access_key: G5DLUXD2HA07LDT10DRU, secret_key: IPgisy2fW7WOX1xFqjtdPFR6fXPfupfDHEM4n4+H}, placement_pools: [ { key: default-placement, val: { index_pool: .us-west.rgw.buckets.index, data_pool: .us-west.rgw.buckets}}]} root pool setting in ceph.conf is below: [client.radosgw.us-west-1] rgw region = us rgw region root pool = .us.rgw.root rgw zone = us-west rgw zone root pool = .us-west.rgw.root or,can I delete this non-bucket metadata info ?? If you delete it you'd lose your zone and region configuration. Note that you can use the region root pool for that purpose. So first copy the relevant objects, e.g.,: $ rados -p .us-west.rgw.root --target-pool=.us.rgw.root cp zone_info.us-west and then you can remove them. But please make sure everything else works before you remove them (e.g., you can still acess the zone). Yehuda 2013/12/16 Yehuda Sadeh yeh...@inktank.com For some reason your bucket list seem to be returning some non-bucket metadata info. Sounds like there's a mixup in the pools. What does radosgw-admin zone get (for the us-west zone) return? What's your 'rgw zone root pool' and 'rgw region root pool'? Yehuda On Sun, Dec 15, 2013 at 9:03 PM, hnuzhou...@gmail.com wrote: Hi,guys. I am using the character of geo-replication in ceph. I have two ceph clusters,so my plan is one region,in which two zones. Ceph version is ceph version 0.72.1 (4d923861868f6a15dcb33fef7f50f674997322de) Now I can sync users and buckets from master zone to slave zone. 
But the object in bucket can not be synced.the error about object is: ERROR:radosgw_agent.worker:failed to sync object gci-replication-copytest1/628.png: state is error The following is the output when I run “radosgw-agent -c /etc/ceph/region-data-sync.conf --sync-scope full”: region map is: {u'us': [u'us-west', u'us-east']} INFO:root:syncing all metadata INFO:radosgw_agent.sync:Starting sync INFO:radosgw_agent.worker:finished syncing shard 33 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:1/19 items processed INFO:radosgw_agent.worker:finished syncing shard 5 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:2/19 items processed INFO:radosgw_agent.worker:finished syncing shard 6 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:3/19 items processed INFO:radosgw_agent.worker:finished syncing shard 1 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:4/19 items processed WARNING:radosgw_agent.worker:error getting metadata for bucket zone_info.us-west: Http error code 500 content {Code:UnknownError} Traceback (most recent call last): File /usr/lib/python2.7/dist-packages/radosgw_agent/worker.py, line 400, in sync_meta metadata = client.get_metadata(self.src_conn, section, name) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 163, in get_metadata params=dict(key=name)) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 155, in request check_result_status(result) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 116, in check_result_status HttpError)(result.status_code, result.content) HttpError: Http error code 500 content {Code:UnknownError} INFO:radosgw_agent.sync:5/19 items processed INFO:radosgw_agent.worker:finished syncing shard 28 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] INFO:radosgw_agent.sync:6/19 items processed INFO:radosgw_agent.worker:finished syncing shard 42 INFO:radosgw_agent.worker:incremental sync will need to retry items: [] WARNING:radosgw_agent.worker:error getting metadata for bucket zone_info.us-east: Http error code 500 content {Code:UnknownError} Traceback (most recent call last): File /usr/lib/python2.7/dist-packages/radosgw_agent/worker.py, line 400, in sync_meta metadata = client.get_metadata(self.src_conn, section, name) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line 163, in get_metadata params=dict(key=name)) File /usr/lib/python2.7/dist-packages/radosgw_agent/client.py, line
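A minimal sketch of the copy-then-verify sequence Yehuda describes above, for the us-west side (the us-east zone would need the same treatment; pool and object names are taken from the configuration shown earlier in the thread):

rados -p .us-west.rgw.root ls                                             # see what lives in the zone root pool
rados -p .us-west.rgw.root --target-pool=.us.rgw.root cp zone_info.us-west
radosgw-admin zone get --name client.radosgw.us-west-1                    # confirm the zone config is still readable
rados -p .us-west.rgw.root rm zone_info.us-west                           # only once everything still works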
Re: [ceph-users] Ceph incomplete pg
I am currently trying to figure out how to debug pg issues myself and the debugging documentation I have found has not been that helpful. In my case the underlying problem is probably ZFS, which I am using for my OSDs, but it would be nice to be able to recover what I can.

My health output is:

# ceph health
HEALTH_WARN 39 pgs backfill; 26 pgs backfilling; 297 pgs degraded; 88 pgs down; 89 pgs peering; 19 pgs recovering; 35 pgs recovery_wait; 66 pgs stale; 96 pgs stuck inactive; 66 pgs stuck stale; 690 pgs stuck unclean; 3 requests are blocked > 32 sec; recovery 86428/515041 objects degraded (16.781%); pool iscsi pg_num 250 > pgp_num 100; pool iscsi has too few pgs

Also if I try to do an "rbd -p <pool> ls" on any of my pools, the command hangs. If I figure out anything, I will let you know.

I have an issue with incomplete pgs, I've tried repairing it but no such luck. Any ideas what to check? Output from 'ceph health detail'
HEALTH_ERR 2 pgs inconsistent; 1 pgs recovering; 1 pgs stuck unclean; recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%); 4 scrub errors
pg 22.ee is stuck unclean for 131473.768406, current state active+recovering+inconsistent, last acting [45,16,21]
pg 22.ee is active+recovering+inconsistent, acting [45,16,21], 5 unfound
pg 22.4a is active+clean+inconsistent, acting [2,25,34]
recovery 15/863113 degraded (0.002%); 5/287707 unfound (0.002%)
4 scrub errors

Eric
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
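For digging into stuck pgs like these, the usual inspection commands are sketched below; the pool name comes from the health output above, and note that raising pg_num/pgp_num cannot be undone, so double-check the target value first:

ceph health detail
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
ceph pg <pgid> query                     # why one specific pg is stuck
ceph osd pool set iscsi pg_num 256       # address the "too few pgs" warning...
ceph osd pool set iscsi pgp_num 256      # ...then let data move onto the new pgs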
[ceph-users] After reboot nothing worked
Hello,

I have a 2 node ceph cluster. I just rebooted both of the hosts to test whether the cluster keeps working after a reboot, and the result was that the cluster was unable to start.

Here is the ceph -s output:

 health HEALTH_WARN 704 pgs stale; 704 pgs stuck stale; mds cluster is degraded; 1/1 in osds are down; clock skew detected on mon.kvm2
 monmap e2: 2 mons at {kvm1=192.168.214.10:6789/0,kvm2=192.168.214.11:6789/0}, election epoch 16, quorum 0,1 kvm1,kvm2
 mdsmap e13: 1/1/1 up {0=kvm1=up:replay}
 osdmap e29: 2 osds: 0 up, 1 in
 pgmap v68: 704 pgs, 4 pools, 9603 bytes data, 23 objects
 1062 MB used, 80816 MB / 81879 MB avail
 704 stale+active+clean

According to this useless documentation http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/ I tried ceph osd tree; the output was:

# id    weight   type name       up/down reweight
-1      0.16     root default
-2      0.07999          host kvm1
0       0.07999                  osd.0  down  1
-3      0.07999          host kvm2
1       0.07999                  osd.1  down  0

Then I tried

sudo /etc/init.d/ceph -a start osd.0
sudo /etc/init.d/ceph -a start osd.1

to start the osds on both hosts; the result was:

/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
/etc/init.d/ceph: osd.1 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )

Now the question is: what is this? Is ceph really stable? Can we use this for a production environment? Both of my hosts have ntp running and the time is up to date.

Br.
Umar
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
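The "osd.0 not found" message is the sysvinit script saying it found neither an [osd.N] section in /etc/ceph/ceph.conf nor a sysvinit marker under /var/lib/ceph; clusters built with ceph-deploy are normally started through upstart instead. A minimal sketch of both options follows; the hostnames are from the output above, everything else is an assumption about this setup:

# upstart (typical for ceph-deploy on Ubuntu):
sudo start ceph-osd id=0        # on kvm1
sudo start ceph-osd id=1        # on kvm2
sudo start ceph-osd-all         # or start every OSD on the node

# or teach the sysvinit script about the daemons by adding to ceph.conf:
[osd.0]
    host = kvm1
[osd.1]
    host = kvm2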
[ceph-users] Sanity check of deploying Ceph very unconventionally (on top of RAID6, with very few nodes and OSDs)
Hello,

I've been doing a lot of reading and am looking at the following design for a storage cluster based on Ceph. I will address all the likely knee-jerk reactions and the reasoning below, so hold your guns until you've read it all. I also have a number of questions I've not yet found the answer to or determined by experimentation.

Hardware:
2x 4U (can you say Supermicro? ^.^) servers with 24 3.5" hotswap bays, 2 internal OS (journal?) drives, probably Opteron 4300 CPUs (see below), Areca 1882 controller with 4GB cache, 2 or 3 2-port Infiniband HCAs. 24 3TB HDs (30% of the price of a 4TB one!) in one or two RAID6, 2 of them hotspares, giving us 60TB per node and thus with a replication factor of 2 that's also the usable space. Space for 2 more identical servers if need be.

Network:
Infiniband QDR, 2x 18-port switches (interconnected of course), redundant paths everywhere, including to the clients (compute nodes).

Ceph configuration:
Additional server with a mon, mons also on the 2 storage nodes, at least 2 OSDs per node (see below).

This is for a private cloud with about 500 VMs at most. There will be 2 types of VMs, the majority writing a small amount of log chatter to their volumes, the other type (a few dozen) writing a more substantial data stream. I estimate less than 100MB/s of reads/writes at full build out, which should be well within the abilities of this setup.

Now for the rationale of this design that goes contrary to anything normal Ceph layouts suggest:

1. Idiot (aka NOC monkey) proof hotswap of disks.
This will be deployed in a remote data center, meaning that qualified people will not be available locally and thus would have to travel there each time a disk or two fails. In short, telling somebody to pull the disk tray with the red flashing LED and put a new one from the spare pile in there is a lot more likely to result in success than telling them to pull the 3rd row, 4th column disk in server 2. ^o^

2. Density, TCO.
Ideally I would love to deploy something like this: http://www.mbx.com/60-drive-4u-storage-server/ but they seem to not really have a complete product description, price list, etc. ^o^ With a monster like that, I'd be willing to reconsider local RAIDs and just overspec things in a way that a LOT of disks can fail before somebody (with a clue) needs to visit that DC. Failing that, however, the typical approach of using many smaller servers for OSDs increases the costs and/or reduces density. Replacing the 4U servers with 2U ones (that hold 12 disks) would require some sort of controller (to satisfy my #1 requirement) and similar amounts of HCAs per node, clearly driving the TCO up. 1U servers with typically 4 disks would be even worse.

3. Increased reliability/stability.
Failure of a single disk has no impact on the whole cluster, no need for any CPU/network intensive rebalancing.

Questions/remarks:

Given that there will be redundancy and reliability at the disk level and that there will be only 2 storage nodes initially, I'm planning to disable rebalancing. Or will Ceph realize that making replicas on the same server won't really save the day and refrain from doing so? If more nodes are added later, I will likely set an appropriate full ratio and activate rebalancing on a permanent basis again (except for planned maintenances of course).

My experience tells me that an actual node failure will be due to:
1. Software bugs, kernel or otherwise.
2. Marginal hardware (CPU/memory/mainboard hairline cracks, I've seen it all).

Actual total loss of power in the DC doesn't worry me, because if that happens I'm likely under a ton of rubble, this being Japan. ^_^

Given that a RAID6 with just 7 disks connected to an Areca 1882 controller in a different cluster I'm running here gives me about 800MB/s writes and 1GB/s reads, I have a feeling that putting the journal on SSDs (Intel DC S3700) would be a waste, if not outright harmful. But I guess I shall determine that by testing; maybe the higher IOPS rate will still be beneficial.

Since the expected performance of this RAID will be at least double the bandwidth available on a single IB interface, I'm thinking of splitting it in half and having an OSD for each half, each bound to a different interface. One hopes that nothing in the OSD design stops it from dealing with these speeds/bandwidths.

The plan is to use Ceph only for RBD, so would "filestore xattr use omap" really be needed in case tests determine ext4 to be faster than xfs in my setup?

Given the above configuration, I'm wondering how many CPU cores would be sufficient in the storage nodes. Somewhere in the documentation http://ceph.com/docs/master/start/hardware-recommendations/ is a recommendation for 1GB RAM per 1TB of storage, but later on the same page we see a storage server example with 36TB and 16GB RAM. Ideally I would love to use just one 6 or 8 core Opteron 4300 with 32GB of memory, thus having only one NUMA domain and keeping all the processes dealing with I/O