Re: [ceph-users] can I attach a volume to 2 servers
Mapping a single RBD on multiple servers isn't going to do what you want unless you're putting some kind of clustered filesystem on it. Exporting the filesystem via an NFS server will generally be simpler.

You've already encountered one problem with sharing a block device without a clustered filesystem: one server doesn't know when some other server has changed something, so a given server will only show the changes it has made itself unless you somehow refresh its view of the device (with a remount, for example). A related but much bigger problem comes when multiple servers are writing to the same block device. Because no server is aware of what the other servers are doing, it's essentially guaranteed that one server will partially overwrite things another server just wrote, resulting in lost data and/or a broken filesystem.

- Edward Huyer
School of Interactive Games and Media, RIT
erh...@rit.edu

From: yang sheng
Sent: Monday, May 02, 2016 9:47 AM
To: Sean Redmond <sean.redmo...@gmail.com>
Cc: ceph-users <ceph-users@lists.ceph.com>

hi Sean

thanks for your reply. I think ceph and openstack work fine for me; I can attach a bootable volume to a vm. I am now trying to attach volumes to the physical servers (hypervisor nodes) and share some data among the hypervisors (based on the docs, the nova evacuate function requires all hypervisors to share the instance files). The doc uses an NFS cluster, and I am wondering if I can use a ceph volume instead of NFS. (I have created a volume and attached it to 2 hypervisors, A and B, but when I write something on server A, server B can't see the file until I detach and re-attach the volume on server B.)

On Mon, May 2, 2016 at 9:34 AM, Sean Redmond <sean.redmo...@gmail.com> wrote:

Hi,

You could set the below to create ephemeral disks as RBDs:

    [libvirt]
    libvirt_images_type = rbd

On Mon, May 2, 2016 at 2:28 PM, yang sheng <forsaks...@gmail.com> wrote:

Hi

I am using ceph infernalis, and it works fine with my openstack liberty. I am trying to test nova evacuate. All the vms' volumes are shared among all compute nodes; however, the instance files (/var/lib/nova/instances) live on each compute node's local storage. Based on the Red Hat docs (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/6/html/Administration_Guide/section-evacuation.html), nova evacuate requires the instance files to be shared as well, and they set up an NFS cluster for that. Since ceph is already shared among all nodes, I was thinking of creating a volume in ceph and attaching it to all compute nodes. Just wondering, is this doable? (I have already attached this volume to 2 servers, server A and server B. If I write something on server A, it is not visible to server B; I have to re-attach the volume to server B so that server B can see it.)
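A minimal sketch of the NFS-export approach suggested at the top of this thread: back the share with a single RBD mapped on one "NFS head" node, and export that filesystem to the hypervisors. The pool name, image name, hostname and export path below are made up for illustration:

    # On the NFS head node only (assumes a pool named "volumes" exists)
    rbd create volumes/nova-instances --size 204800    # 200 GB image
    rbd map volumes/nova-instances                     # appears as e.g. /dev/rbd0
    mkfs.xfs /dev/rbd0
    mkdir -p /export/nova-instances
    mount /dev/rbd0 /export/nova-instances

    # Export it, e.g. via a line in /etc/exports, then reload the NFS server:
    #   /export/nova-instances  *(rw,sync,no_root_squash)
    exportfs -ra

    # On each hypervisor
    mount -t nfs nfs-head:/export/nova-instances /var/lib/nova/instances

Only the NFS head writes to the RBD, so the "two writers on one block device" problem described above never arises; the hypervisors all see a consistent view through NFS.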
Re: [ceph-users] Mapping RBD On Ceph Cluster Node
On Apr 29, 2016, at 11:46 PM, Gregory Farnum <gfar...@redhat.com> wrote:

> On Friday, April 29, 2016, Edward Huyer <erh...@rit.edu> wrote:
>
> This is more of a "why" than a "can I/should I" question. The Ceph block device quickstart says (if I interpret it correctly) not to use a physical machine as both a Ceph RBD client and a node for hosting OSDs or other Ceph services. Is this interpretation correct? If so, what is the reasoning? If not, what is it actually saying? Thanks in advance.

It's important not to use the kernel rbd mount on a machine hosting OSDs: if the machine runs low on memory and tries to flush out dirty pages, but the OSD needs to allocate memory to handle the resulting write... you have a problem! Hosting only userspace clients shouldn't be a problem, though, apart from the usual resource contention problems of running hyper-converged.

-Greg
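A quick way to check whether an OSD host is exposed to this kernel-client deadlock is to look for kernel-mapped RBD images on it. A small sketch; the image and device names are just examples:

    # List RBD images mapped through the kernel client on this host
    rbd showmapped

    # If something like rbd/test-volume -> /dev/rbd0 shows up on an OSD node,
    # unmount and unmap it here and map it from a separate client machine instead
    umount /dev/rbd0
    rbd unmap /dev/rbd0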
[ceph-users] Mapping RBD On Ceph Cluster Node
This is more of a "why" than a "can I/should I" question. The Ceph block device quickstart says (if I interpret it correctly) not to use a physical machine as both a Ceph RBD client and a node for hosting OSDs or other Ceph services. Is this interpretation correct? If so, what is the reasoning? If not, what is it actually saying? Thanks in advance. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Weird/normal behavior when creating filesystem on RBD volume
Hi, I'm seeing (have always seen) odd behavior when I first put a filesystem on a newly created RBD volume. I've seen this on two different clusters across multiple major Ceph revisions.

When I create an RBD volume, map it on a client and then run (for instance) mkfs.xfs on it, mkfs.xfs will just sit there and hang for a number of minutes, seemingly doing nothing. During this time, load on the OSDs (both the CPU usage of the daemons and actual IO on the disks) spikes dramatically. After a while, the load subsides and the mkfs proceeds as normal.

Can anyone explain what's going on here? I have a pretty strong notion, but I'm hoping someone can give a definite answer. This behavior appears to be normal, so I'm not actually worried about it. It just makes me and some coworkers go "huh, I wonder what causes that".

- Edward Huyer
School of Interactive Games and Media, RIT
erh...@rit.edu
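For what it's worth, the usual explanation is that mkfs issues a discard/TRIM across the entire (thin-provisioned) image before laying down the filesystem, and the kernel RBD client turns that into a flood of object operations across the cluster. If that is indeed what's happening here, skipping the initial discard makes the mkfs return almost immediately; a sketch (device name is just an example):

    # XFS: -K skips the discard pass at mkfs time
    mkfs.xfs -K /dev/rbd0

    # ext4 equivalent
    mkfs.ext4 -E nodiscard /dev/rbd0

The trade-off is that the unused space isn't trimmed up front, which usually doesn't matter for a freshly created image.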
[ceph-users] ceph-deploy Intended Purpose
I'm working on deploying a multi-machine (possibly as many as 7) ceph (0.61.4) cluster for experimentation. I'm trying to deploy using ceph-deploy on Ubuntu, but it seems...flaky. For instance, I tried to deploy additional monitors and ran into the bug(?) where the additional monitors don't work if you don't have "public network" defined in ceph.conf, but by the time I found that bit of info I had already blown up the cluster.

So my question is: is ceph-deploy the preferred method for deploying larger clusters, particularly in production, or is it a quick-and-dirty get-something-going-to-play-with tool, with manual configuration preferred for real clusters? I've seen documentation suggesting it's not intended for use in real clusters, but a lot of other documentation seems to assume it's the default deploy tool.

- Edward Huyer
School of Interactive Games and Media, RIT
erh...@rit.edu
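For reference, the monitor problem mentioned above is typically avoided by declaring the public network in ceph.conf before the additional monitors are created. A rough sketch of that workflow; hostnames and subnets are made up:

    # Generate the initial ceph.conf and monitor keyring
    ceph-deploy new mon1 mon2 mon3

    # Add the network definitions to ceph.conf before creating the monitors:
    #   [global]
    #   public network = 10.0.0.0/24
    #   cluster network = 10.0.1.0/24   (optional, for OSD replication traffic)

    ceph-deploy mon create mon1 mon2 mon3

This doesn't settle the larger "is ceph-deploy production-grade" question, but it avoids losing the cluster over the missing setting.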
Re: [ceph-users] Resizing filesystem on RBD without unmount/mount cycle
From: John Nielsen <li...@jnielsen.net>
Sent: Monday, June 24, 2013 1:24 PM
To: Edward Huyer
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] Resizing filesystem on RBD without unmount/mount cycle

> On Jun 24, 2013, at 9:13 AM, Edward Huyer <erh...@rit.edu> wrote:
>
> I'm experimenting with ceph 0.61.4 and RBD under Ubuntu 13.0x. I create a RADOS block device (test), map it, format it as ext4 or xfs, and mount it. No problem. I grow the underlying RBD. lsblk on both /dev/rbd/rbd/test and /dev/rbd1 shows the new size, but the filesystem resize commands don't see the new size until I unmount and then mount the block device again. -o remount isn't good enough, nor is partprobe. Is there a way to club the filesystem tools into recognizing that the RBD has changed size without unmounting the filesystem?

I know this is possible with e.g. virtual machines (c.f. virsh blockresize), so I agree it _ought_ to work. I don't know if the RBD kernel module has or needs any special support for online resizing. It may work the same as partprobe, but have you tried blockdev --rereadpt?

[Edward Huyer replied:]

No dice with blockdev. I just now ran across this older thread on ceph-devel that seems to imply what I want isn't possible: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/8013

It looks like the kernel can't update the size of a block device while the block device is in use. Boo. Thanks anyway.
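For completeness, the workflow being attempted looks roughly like the sketch below (image name, size, and mount point are illustrative). At the time of this thread the kernel couldn't grow an in-use block device, but on reasonably recent kernels the mapped device picks up the new size and the filesystem can be grown online:

    # Grow the image (--size is in MB here, i.e. 200 GB)
    rbd resize rbd/test --size 204800

    # Grow the filesystem while it stays mounted
    xfs_growfs /mnt/test        # XFS operates on the mount point
    # resize2fs /dev/rbd1       # ext4 operates on the device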
Re: [ceph-users] placing SSDs and SATAs in same hosts
> Hi, I am thinking about how to set up ceph with 2 pools - fast and slow. The plan is to use SSDs and SATAs (or SAS) in the same hosts and define pools that use the fast and slow disks accordingly. Later it would be easy to grow either pool as needed. I found an example CRUSH map that does something similar by defining 2 root hierarchies and selecting them in the rule steps: http://ceph.com/docs/master/rados/operations/crush-map/ (see "Placing Different Pools on Different OSDS"), but if the 2 types of disks are in the same hosts that example won't work as written. How do I do this? Do I have to define some fake entity in the crushmap to make 2 pools work on the same hardware?

If I'm reading the documentation correctly, you probably have to run two (software) clusters. Ceph is capable of running multiple clusters on the same hardware. http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters
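That said, the "fake entity" idea in the question is also a commonly used approach: split each physical host into two logical host buckets (one for SSDs, one for SATA) under separate roots, then point each pool's rule at the matching root. A sketch for a single host; all names, ids and weights below are made up, and whether this beats running separate clusters is a judgment call:

    host node1-ssd {
            id -11
            alg straw
            hash 0
            item osd.0 weight 1.000
    }
    host node1-sata {
            id -12
            alg straw
            hash 0
            item osd.1 weight 2.000
            item osd.2 weight 2.000
    }

    root ssd {
            id -21
            alg straw
            hash 0
            item node1-ssd weight 1.000
    }
    root sata {
            id -22
            alg straw
            hash 0
            item node1-sata weight 4.000
    }

    rule ssd_rule {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }
    rule sata_rule {
            ruleset 4
            type replicated
            min_size 1
            max_size 10
            step take sata
            step chooseleaf firstn 0 type host
            step emit
    }

The fast and slow pools are then pointed at rulesets 3 and 4 respectively (e.g. "ceph osd pool set fastpool crush_ruleset 3"). The catch is that the OSDs have to be kept in these logical hosts (e.g. "osd crush update on start = false") so they don't hop back under their real hostname on restart.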
[ceph-users] New User Q: General config, massive temporary OSD loss
Hi, I'm an admin for the School of Interactive Games and Media at RIT, and I'm looking into using ceph to reorganize/consolidate the storage my department is using. I've read a lot of documentation and comments/discussion on the web, but I'm not 100% sure what I'm looking at doing is a good use of ceph. I was hoping to get some input on that, as well as an answer to a more specific question about OSDs going offline.

First questions: Are there obvious flaws or concerns with the configuration described below that I should be aware of? Does it even make sense to try to use ceph here? Anything else I should know, think about, or do instead?

My more specific question relates to the two RAID controllers in the MD3200, and my intended 2- or 3-copy replication (also striping): What happens if all OSDs with copies of a piece of data go down for a period of time, but then the OSDs come back intact (e.g. by moving them to a different controller)? I know this can be limited or prevented entirely with good failure domain organization, but it's still a question I haven't found an answer to.

Sorry for the wall o' text, and thanks in advance for any help or advice you can provide.

Proposed Configuration/Architecture:

I'll note that most of the post-implementation utilization would be in the form of RBDs mounted over 10Gbit Ethernet, which will then be used for file storage and/or KVM virtual drives. Once CephFS is stable enough for production I'd like to use it for file storage, but not just yet.

Currently, we have a Dell MD3200 and two attached MD1200s set up as a sort of mini-SAN, which is then exporting chunks of block storage to various servers via SAS. The whole assemblage has a total capacity of 48TB spread across 36 disks (24x 1TB drives and 12x 3TB drives). We also have several TB of storage scattered across 10ish drives in 2-3 actual servers. This is all raw capacity on 7200RPM drives.

There are a few problems with this configuration:
- Expanding storage in a usable way is fairly difficult, especially if the drives don't match existing drive sizes in the MD SAN.
- It is relatively easy to end up with slack in the storage: large chunks of storage that can't be easily allocated or reallocated.
- Limited number of systems able to directly access the storage (due to limited SAS ports).
- Difficult to upgrade (massive manual data migration needed).
- 2 points of failure (exactly 2 RAID controllers in the MD3200; ceph wouldn't solve this problem immediately).

My goals are to have easily expandable and upgradable storage for several servers (mainly, but not entirely, virtual), eliminate the slack in the current configuration, and to be able to relatively easily migrate away from the MD pseudo-SAN in the future. I'm also looking to lay the infrastructure groundwork for an OpenStack cloud implementation or similar.

My notion is to allocate and split the contents of the Dell MD array as individual drives to 3-4 servers (6-9 drives per server), which will then configure each drive as an OSD using XFS. I'd likely set up SSDs as journaling drives, each containing journals for ~6 OSDs (10-20GB journal per OSD). The servers would have appropriate RAM (1+GB per OSD, 1-2GB for the monitors), and would be 4-6 core Core i-generation Xeons. The back-end communication network for ceph would be 10Gbit Ethernet. I may have to have the ceph clients read their data from that back-end network as well; I realize this is probably not ideal, but I'm hoping/thinking it will be good enough.
- Edward Huyer
School of Interactive Games and Media, RIT
erh...@rit.edu
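Not an answer to the questions above, but the journal and replication numbers in the plan translate into a small ceph.conf fragment along these lines (the values are only illustrations of what's described above, not recommendations):

    [global]
    osd pool default size = 3        # 3-copy replication (use 2 for 2-copy)
    osd pool default min size = 2

    [osd]
    osd journal size = 15360         # ~15 GB journal per OSD, on the shared SSD
    osd mkfs type = xfs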
Re: [ceph-users] New User Q: General config, massive temporary OSD loss
> [ Please stay on the list. :) ]

Doh. Was trying to get Outlook to quote properly, and forgot to hit Reply-all. :)

> The specifics of what data will migrate where will depend on how you've set up your CRUSH map, when you're updating the CRUSH locations, etc, but if you move an OSD then it will fully participate in recovery and can be used as the authoritative source for data.

Ok, so if data chunk "bar" lives only on OSDs 3, 4, and 5, and OSDs 3, 4, and 5 suddenly vanish for some reason but then come back later (with their data intact), the cluster will recover more-or-less gracefully? That is, it *won't* go "sorry, your RBD 'foobarbaz' lost 'bar' for a while, all that data is gone"? I would *assume* it has a way to recover more-or-less gracefully, but it's also not something I want to discover the answer to later. :)

> Well, if the data goes away and you try to read it, the request is just going to hang, and presumably eventually the kernel/hypervisor/block device/whatever will time out and throw an error. At that point you have a choice between marking it lost (at which point it will return ENOENT to requests to access it, and RBD will turn that into zero blocks) or getting the data back online. When you do bring it back online, it will peer and then be accessible again without much fuss (if you only bring back one copy it might kick off a bunch of network and disk traffic re-replicating).

Awesome, that's exactly how I would want it to work. If the drives themselves somehow manage to all catch on fire at the same time, I can still recover some of the data on the RBD, but as long as the drives are ok I should be able to bring the data back. Thanks for your help, I really appreciate it! Now to do some testing and fiddling. :)
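The "marking it lost" step mentioned above corresponds roughly to the following commands; the OSD and PG ids are placeholders, and this is very much a last resort:

    # See which PGs are stuck because their OSDs are gone
    ceph health detail
    ceph pg dump_stuck stale

    # Tell the cluster an OSD is never coming back
    ceph osd lost 3 --yes-i-really-mean-it

    # Give up on objects that have no surviving copy
    # ("revert" falls back to an older copy if one exists, "delete" discards)
    ceph pg 2.5 mark_unfound_lost revert

If the OSDs can instead be brought back with their data intact, none of this is needed; the cluster peers and recovers on its own, as described in the reply above.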