Re: [ceph-users] can I attach a volume to 2 servers

2016-05-02 Thread Edward Huyer
Mapping a single RBD on multiple servers isn’t going to do what you want unless 
you’re putting some kind of clustered filesystem on it.  Exporting the 
filesystem via an NFS server will generally be simpler.
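
For example, a minimal sketch of the NFS approach, assuming an image named rbd/shared, a mount point of /srv/shared, and clients on 10.0.0.0/24 (all placeholder names):

# on the single server that owns the filesystem
rbd map rbd/shared                    # shows up as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0                    # an ordinary, non-clustered filesystem is fine here
mkdir -p /srv/shared
mount /dev/rbd0 /srv/shared
echo '/srv/shared 10.0.0.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra

# on every other server
mount -t nfs nfs-server:/srv/shared /srv/shared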

You’ve already encountered one problem with sharing a block device without a 
clustered filesystem:  One server doesn’t know when some other server has 
changed something, so a given server will only show changes it has made unless 
you somehow refresh the server’s knowledge (with a remount, for example).

A related but much bigger problem comes when multiple servers are writing to 
the same block device.  Because no server is aware of what the other servers 
are doing, it’s essentially guaranteed that you’ll have one server partially 
overwriting things another server just wrote, resulting in lost data and/or a 
broken filesystem.

-----
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of yang 
sheng
Sent: Monday, May 02, 2016 9:47 AM
To: Sean Redmond <sean.redmo...@gmail.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] can I attach a volume to 2 servers

Hi Sean,

Thanks for your reply.

I think Ceph and OpenStack work fine for me. I can attach a bootable volume to 
a VM.

I am now trying to attach volumes to physical servers (hypervisor nodes) and 
share some data among the hypervisors (based on the docs, the nova evacuate 
function requires all hypervisors to share instance files).

In the doc they are using an NFS cluster. I am wondering if I can use a Ceph 
volume instead of NFS.

(I have created a volume and attached it to 2 hypervisors (A and B), but when 
I write something on server A, server B can't see the file until I detach and 
re-attach the volume on server B.)



On Mon, May 2, 2016 at 9:34 AM, Sean Redmond 
<sean.redmo...@gmail.com> wrote:
Hi,

You could set the below to create ephemeral disks as RBDs:

[libvirt]

libvirt_images_type = rbd
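
A slightly fuller sketch of the same idea (the pool name, Ceph user, and secret UUID below are placeholders, and the option is named images_type in recent releases while older ones used libvirt_images_type, so check the docs for your version):

[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 00000000-0000-0000-0000-000000000000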

On Mon, May 2, 2016 at 2:28 PM, yang sheng 
<forsaks...@gmail.com> wrote:
Hi

I am using Ceph Infernalis.

It works fine with my OpenStack Liberty.

I am trying to test nova evacuate.

All the VMs' volumes are shared among all compute nodes. However, the instance 
files (/var/lib/nova/instances) are on each compute node's local storage.

Based on the Red Hat docs 
(https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/6/html/Administration_Guide/section-evacuation.html), 
nova evacuate requires syncing the instance files as well, and they created an 
NFS cluster.
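
(For reference, the NFS setup in that doc boils down to mounting one shared export at the instances path on every compute node; the server name and export path here are placeholders:

# /etc/fstab on each compute node
nfs-server:/export/nova_instances  /var/lib/nova/instances  nfs  defaults  0  0

The question in this thread is whether a single RBD mapped on every node can stand in for that export.)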

Since Ceph is shared among all nodes as well, I was thinking of creating a 
volume in Ceph and attaching it to all compute nodes.

Just wondering, is this doable?

(I have already attached this volume to 2 servers, server A and server B. If I 
write something on server A, it seems it is not visible to server B; I have to 
re-attach the volume on server B so that server B can see it.)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-30 Thread Edward Huyer

On Apr 29, 2016 11:46 PM, Gregory Farnum <gfar...@redhat.com> wrote:

On Friday, April 29, 2016, Edward Huyer <erh...@rit.edu> wrote:
This is more of a "why" than a "can I/should I" question.

The Ceph block device quickstart says (if I interpret it correctly) not to use 
a physical machine as both a Ceph RBD client and a node for hosting OSDs or 
other Ceph services.

Is this interpretation correct? If so, what is the reasoning? If not, what is 
it actually saying?

It's important not to use the kernel rbd mount on a machine hosting OSDs, 
because if it runs low on memory and tries to flush out dirty pages, but the 
OSD needs to allocate memory to handle the write... you have a problem!

Hosting all userspace processes shouldn't be a problem, though, apart from the 
usual resource contention problems of running hyper-converged.
-Greg
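
(A quick way to check whether a given host is in that risky combination, using only standard commands:

rbd showmapped        # lists kernel-mapped RBD images on this host
pgrep -a ceph-osd     # lists OSD daemons running on this host

If both return results, the deadlock described above is a possibility.)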


Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mapping RBD On Ceph Cluster Node

2016-04-29 Thread Edward Huyer
This is more of a "why" than a "can I/should I" question.

The Ceph block device quickstart says (if I interpret it correctly) not to use 
a physical machine as both a Ceph RBD client and a node for hosting OSDs or 
other Ceph services.

Is this interpretation correct? If so, what is the reasoning? If not, what is 
it actually saying?

Thanks in advance.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Weird/normal behavior when creating filesystem on RBD volume

2016-04-22 Thread Edward Huyer
Hi, I'm seeing (have always seen) odd behavior when I first put a filesystem on 
a newly created RBD volume.  I've seen this on two different clusters across 
multiple major Ceph revisions.

When I create an RBD volume, map it on a client and then do (for instance) 
mkfs.xfs on it, mkfs.xfs will just sit there and hang for a number of minutes, 
seemingly doing nothing.  During this time, load on the OSDs (both the CPU 
usage of the daemons and actual IO on the disks) will spike dramatically.  
After a while, the load will subside and the mkfs will proceed as normal.

Can anyone explain what's going on here?  I have a pretty strong notion, but 
I'm hoping someone can give a definite answer.

This behavior appears to be normal, so I'm not actually worried about it.  It 
just makes me and some coworkers go "huh, I wonder what causes that".
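
(For anyone who wants to test the usual explanation -- that mkfs issues a discard/TRIM across the whole device, which the OSDs then have to process object by object -- the discard pass can be skipped at mkfs time; the device name below is just an example:

mkfs.xfs -K /dev/rbd0                 # -K skips the initial discard
mkfs.ext4 -E nodiscard /dev/rbd0      # ext4 equivalent

If the long pause disappears, that was it.)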

-----
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy Intended Purpose

2013-07-12 Thread Edward Huyer
I'm working on deploying a multi-machine (possibly as many as 7) ceph (0.61.4) 
cluster for experimentation.  I'm trying to deploy using ceph-deploy on Ubuntu, 
but it seems...flaky.  For instance, I tried to deploy additional monitors and 
ran into the bug(?) where the additional monitors don't work if you don't have 
a public network defined in ceph.conf, but by the time I found that bit of info 
I had already blown up the cluster.
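
For anyone hitting the same monitor issue, the workaround usually suggested is to define the public network in ceph.conf before creating the extra monitors (the subnet and hostnames below are placeholders):

# ceph.conf, [global] section
public network = 192.168.1.0/24

# push the updated conf, then create the additional monitors
ceph-deploy --overwrite-conf config push mon2 mon3
ceph-deploy mon create mon2 mon3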

So my question is: is ceph-deploy the preferred method for deploying larger 
clusters, particularly in production, or is it a quick-and-dirty 
get-something-going-to-play-with tool, with manual configuration preferred for 
real clusters?  I've seen documentation suggesting it's not intended for use 
in real clusters, but a lot of other documentation seems to assume it's the 
default deploy tool.

-----
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resizing filesystem on RBD without unmount/mount cycle

2013-06-24 Thread Edward Huyer
 -----Original Message-----
 From: John Nielsen [mailto:li...@jnielsen.net]
 Sent: Monday, June 24, 2013 1:24 PM
 To: Edward Huyer
 Cc: ceph-us...@ceph.com
 Subject: Re: [ceph-users] Resizing filesystem on RBD without
 unmount/mount cycle
 
 On Jun 24, 2013, at 9:13 AM, Edward Huyer <erh...@rit.edu> wrote:
 
  I'm experimenting with ceph 0.61.4 and RBD under Ubuntu 13.0x.  I create
  a RADOS block device (test), map it, format it as ext4 or xfs, and mount it.
  No problem.  I grow the underlying RBD.  lsblk on both /dev/rbd/rbd/test and
  /dev/rbd1 shows the new size, but the filesystem resize commands don't see
  the new size until I unmount and then mount the block device again.  -o
  remount isn't good enough, nor is partprobe.
 
  Is there a way to club the filesystem tools into recognizing that the RBD
  has changed sizes without unmounting the filesystem?
 
 I know this is possible with e.g. virtual machines (c.f. virsh 
 blockresize), so I
 agree it _ought_ to work. I don't know if the RBD kernel module has or needs
 any special support for online resizing.
 
 It may work the same as partprobe, but have you tried blockdev --rereadpt?

No dice with blockdev.

I just now ran across this older thread in ceph-devel that seems to imply what 
I want isn't possible:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/8013

It looks like the kernel can't update the size of a block device while the 
block device is in use.  Boo.
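
For reference, the flow being attempted was roughly the following (image name, size, and mount point are just examples; rbd resize takes the new size in MB):

rbd resize --size 20480 rbd/test      # grow the image to 20 GB
blockdev --getsize64 /dev/rbd1        # kernel's view of the device size
xfs_growfs /mnt/test                  # grow a mounted XFS filesystem
resize2fs /dev/rbd1                   # ext4 equivalent

On the kernels discussed in this thread, the last two steps only saw the new size after an unmount/mount cycle.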

Thanks anyway.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] placing SSDs and SATAs in same hosts

2013-06-20 Thread Edward Huyer
 Hi,
 
 I am thinking about how to set up ceph with 2 pools - fast and slow.
 The plan is to use SSDs and SATAs (or SAS) in the same hosts and define pools
 that use the fast and slow disks accordingly. Later it would be easy to grow
 either pool as needed.
 
 I found an example CRUSH map that does a similar thing by defining 2 root
 hierarchies whose hosts are used in the rule steps:
 http://ceph.com/docs/master/rados/operations/crush-map/ (see Placing
 Different Pools on Different OSDS)
 
 but if the 2 types of disks are in the same hosts it will not work.
 
 How do I do this? Do I have to define some fake entity in the CRUSH map to
 make 2 pools work on the same hardware?

If I'm reading the documentation correctly, you probably have to run two 
(software) clusters.  Ceph is capable of running multiple clusters on the same 
hardware.
http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters
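
For illustration, the "fake entity" arrangement the question describes would look roughly like this in a decompiled CRUSH map: give each physical host one logical host bucket per disk type and hang those buckets off separate roots. Names, ids, and weights below are placeholders:

host node1-ssd {
        id -10
        alg straw
        hash 0
        item osd.0 weight 1.000
}
host node1-sata {
        id -11
        alg straw
        hash 0
        item osd.1 weight 1.000
}
root ssd {
        id -20
        alg straw
        hash 0
        item node1-ssd weight 1.000
}
rule ssd {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}

# build a matching 'sata' root and rule the same way, then point each pool at
# its rule, e.g.: ceph osd pool set fast crush_ruleset 3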
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New User Q: General config, massive temporary OSD loss

2013-06-18 Thread Edward Huyer
Hi, I'm an admin for the School of Interactive Games and Media at RIT, and 
looking into using ceph to reorganize/consolidate the storage my department is 
using.  I've read a lot of documentation and comments/discussion on the web, 
but I'm not 100% sure what I'm looking at doing is a good use of ceph.  I was 
hoping to get some input on that, as well as an answer to a more specific 
question about OSDs going offline.


First questions:  Are there obvious flaws or concerns with the configuration 
described below that I should be aware of?  Does it even make sense to try to 
use ceph here?  Anything else I should know, think about, or do instead?


My more specific question relates to the two RAID controllers in the MD3200, 
and my intended 2 or 3 copy replication (also striping):  What happens if all 
OSDs with copies of a piece of data go down for a period of time, but then the 
OSDs come back intact (e.g. by moving them to a different controller)?

I know this can be limited or prevented entirely using good failure domain 
organization, but it's still a question I haven't found an answer to.


Sorry  for the wall o' text, and thanks in advance for any help or advice you 
can provide.


Proposed Configuration/Architecture:

I'll note that most of the post-implementation utilization would be in the form 
of RBDs mounted over 10Gbit Ethernet, which will then be used for file storage 
and/or KVM virtual drives.  Once it's stable enough for production I'd like to 
use cephFS for file storage, but not just yet.

Currently, we have a Dell MD3200 and two attached MD1200s set up as a sort of 
mini-SAN, which is exporting chunks of block storage to various servers via SAS.  
The whole assemblage has a total capacity of 48TB spread across 36 disks (24x 
1TB drives and 12x 3TB drives).  We also have several TB of storage scattered 
across 10ish drives in 2-3 actual servers.  This is all raw capacity on 7200RPM 
drives.

There are a few problems with this configuration:

-  Expanding storage in a usable way is fairly difficult, especially if 
the drives don't match existing drive sizes in the MD SAN.

-  It is relatively easy to end up with slack in the storage; large 
chunks of storage that can't be easily allocated or reallocated.

-  Limited number of systems able to directly access the storage (due 
to limited SAS ports)

-  Difficult to upgrade (massive manual data migration needed)

-  2 points of failure (exactly 2 RAID controllers in the MD3200; ceph 
wouldn't solve this problem immediately)

My goals are to have easily expandable and upgradable storage for several 
servers (mainly, but not entirely, virtual), eliminate the slack in the 
current configuration, and to be able to relatively easily migrate away from 
the MD pseudo-SAN in the future.  I'm also looking to lay the infrastructure 
ground work for an OpenStack cloud implementation or similar.

My notion is to allocate and split the contents of the Dell MD array as 
individual drives to 3-4 servers (6-9 drives per server), which will then 
configure each drive as an OSD using XFS.  I'd likely set up SSDs as journaling 
drives, each containing journals for ~6 OSDs (10-20GB journal per OSD).  The 
servers would have appropriate RAM (1+GB per OSD, 1-2GB for the monitors), and 
would be 4-6 core Core i-generation Xeons.
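
As a concrete sketch of the journal arithmetic: 6 OSDs at 15 GB each comes to roughly 90 GB of SSD per journal device. The matching ceph.conf bits would look something like this (the size and device path are placeholders):

[osd]
osd journal size = 15360              ; journal size is given in MB

[osd.0]
osd journal = /dev/disk/by-partlabel/journal-osd0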

The back-end communication network for ceph would be 10Gbit Ethernet.  I may have 
to have the ceph clients read their data from that back-end network as well; I 
realize this is probably not ideal, but I'm hoping/thinking it will be good 
enough.


-----
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New User Q: General config, massive temporary OSD loss

2013-06-18 Thread Edward Huyer
 [ Please stay on the list. :) ]

Doh.  Was trying to get Outlook to quote properly, and forgot to hit Reply-all.  :)

  The specifics of what data will migrate where will depend on how
  you've set up your CRUSH map, when you're updating the CRUSH
  locations, etc, but if you move an OSD then it will fully participate
  in recovery and can be used as the authoritative source for data.
 
  Ok, so if data chunk bar lives only on OSDs 3, 4, and 5, and OSDs 3,
  4, and 5 suddenly vanish for some reason but then come back later
  (with their data intact), the cluster will recover more-or-less
  gracefully?  That is, it *won't* go "sorry, your RBD 'foobarbaz' lost
  'bar' for a while, all that data is gone"?  I would *assume* it has a
  way to recover more-or-less gracefully, but it's also not something I
  want to discover the answer to later.  :)
 
 Well, if the data goes away and you try and read it, the request is just going
 to hang, and presumably eventually the kernel/hypervisor/block device
 whatever will time out and throw an error. At that point you have a choice
 between marking it lost (at which point it will say ENOENT to requests to
 access it, and RBD will turn that into zero blocks) or getting the data back
 online. When you do bring it back online, it will peer and then be accessible
 again without much fuss (if you only bring back one copy it might kick off a
 bunch of network and disk traffic re-replicating).

Awesome, that's exactly how I would want it to work.  If the drives themselves 
somehow manage to all catch on fire at the same time, I can still recover some 
of the data on the RBD, but as long as the drives are ok I should be able to 
bring the data back.
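
(For reference, the "marking it lost" step Greg described maps to commands along these lines; the PG and OSD ids are obviously placeholders:

ceph health detail                        # shows down PGs and unfound objects
ceph pg 2.5 mark_unfound_lost revert      # give up on unfound objects in one PG
ceph osd lost 3 --yes-i-really-mean-it    # declare a dead OSD permanently lost

Hopefully never needed outside of testing.)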

Thanks for your help, I really appreciate it!  Now to do some testing and 
fiddling.  :)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com