On Fri, 7 Sep 2012 17:07:28 -0400 (EDT), Ayal Baron <aba...@redhat.com> wrote:
> 
> 
> ----- Original Message -----
> > As of now the BD xlator supports only linear logical volumes, which
> > are thick provisioned. The gluster CLI command "gluster volume create"
> > with the option "device=lv" allows working with logical volumes as
> > files.
> > 
> > As a POC I have code (not posted to the external list) where the
> > option "device=thin" to the gluster volume create command allows
> > working with thin provisioned targets. But it does not take care of
> > resizing the thin pool when it reaches the low-level threshold.
> > Supporting thin targets is on our TODO list; we depend on the lvm2
> > library to provide APIs to create thin targets.
> 
> I'm definitely missing some background here.
> 1. Can the LV span multiple bricks in Gluster?
>  i. If 'yes' then
>    a. do you use Gluster's replication and distribution schemes to gain 
> performance and redundancy?
>    b. what performance gain is there over normal Gluster with files?
>  ii. If 'no' then you're only exposing single-host local LVM storage? (in 
> which case I don't see why Gluster is used at all, and where).
>

No, as of now the BD xlator works only with one brick. There are some
issues in supporting GlusterFS features such as replication and striping
from the BD xlator; we are still evaluating the BD xlator for such
scenarios.
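
For reference, a single-brick BD volume is created roughly like this (a
sketch based on the "device=lv" option described above; the exact syntax
may vary between releases, and the host/VG names are illustrative):

    # Export volume group 'vg0' on host1 as a single-brick volume; its
    # LVs then show up as files on the Gluster mount.
    gluster volume create bdvol device=lv host1:/vg0
    gluster volume start bdvol
    mount -t glusterfs host1:bdvol /mnt/bdvol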

Advantages of BD Xlator:
 (*) Ease of use and unified management for both file- and block-based
 storage.
 (*) Making block devices available to nodes which don't have direct
 access to the SAN, and supporting migration to nodes without SAN
 access.
 (*) With an FS interface, it becomes easier to support T10 extensions
 like XCOPY and WRITE SAME (currently not supported; a future plan).
 (*) Use of dm-thin logical volumes to provide VM images that are
 inherently thin provisioned, with multi-level snapshot support. Once
 thin-provisioned logical volumes gain 'unmap' support they become
 almost equivalent to sparse files. This is also a future plan; see the
 lvm2 sketch below.
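
As an illustration, the dm-thin building blocks are standard lvm2
commands (names are illustrative; wiring this into the BD xlator is the
future work mentioned above):

    lvcreate -L 100G -T vg0/pool0            # create a thin pool
    lvcreate -V 20G -T vg0/pool0 -n vm1      # thin LV for a VM image
    lvcreate -s vg0/vm1 -n vm1-snap1         # snapshot of the thin LV
    lvcreate -s vg0/vm1-snap1 -n vm1-snap2   # snapshot of the snapshot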
 
> From a different angle, the only benefit I can think of in exposing a fs 
> interface over LVM is for consumers who do not wish to know the details of 
> the underlying storage but want the performance gain of using block storage.
> vdsm is already intimately familiar with LVM and block devices, so adding the 
> FS layer scheme on top doesn't strike me as adding any value. In addition, 
> you require the consumer to know a lot about your interface because it's not 
> truly an FS interface, e.g. the consumer is not allowed to create directories, 
> files are not sparse, not to mention that if you're indeed using LVM then I 
> don't think you're considering the VG MD and extent size limitations:
> 1. LVM currently has severe limitations wrt number of objects it can manage 
> (the limitation is actually the size of the VG metadata, but the distinction 
> is not important just yet).  This means that creating a metadata LV in 
> addition to each data LV is very costly (at around 1000 LVs you'd hit a 
> problem).  vdsm currently creates 2 files per snapshot (the data and a small 
> file with metadata describing it) meaning that you'd reach this limit really 
> fast.
> 2. LVM max LV size is extent size * 65K, this means that if I choose a 4K 
> extent size then my max LV size would be 256MB. This obviously won't do for 
> VM disks, so you'd choose a much larger extent size.  However a larger extent 
> size means that each metadata file vdsm creates wastes a lot of storage 
> space.  So even if LVM could scale, your storage usage plummets and your $/MB 
> ratio increases.
> The way around this is of course not to have a metadata file per volume but 
> have 1 file containing all the metadata, but then that means I'm fully aware 
> of the limitations of the environment and treating my objects as files gains 
> me nothing (but does require a new hybrid domain, a lot more code etc).
>

A GlusterFS + BD xlator domain will be similar to a block-based storage
domain. IIUC, with block-based storage VDSM does not create as many
LVs (files) as it does with POSIX-based storage.

The BD xlator provides a filesystem-like interface to create and
manipulate LVs, whereas in a block-based storage domain commands like
lvcreate and lvextend are used to manipulate them. I.e., the BD xlator
provides an FS interface for a block-based storage domain.
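
For illustration only (the exact file operations the xlator maps to LV
creation and resizing may differ, and the names and mount point below
are hypothetical), the idea is roughly:

    # Block-based storage domain today: lvm2 CLI on the host
    lvcreate -L 10G -n vm1 vg0

    # With the BD xlator volume mounted at /mnt/bdvol, roughly the same
    # operation becomes a file operation on the mount:
    truncate -s 10G /mnt/bdvol/vm1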

In the future, when we have proper support for reflink[1], cp --reflink
can be used to create linked clones. There was also a discussion in the
past about a copyfile[2] interface, which could be used to create full
clones of LVs; see the example below the links.

[1] http://marc.info/?l=linux-fsdevel&m=125296717319013&w=2
[2] http://www.spinics.net/lists/linux-nfs/msg26203.html
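
For example (a sketch only; reflink/copyfile support for the BD xlator is
future work, and the paths are illustrative):

    # Linked clone: shares blocks with the origin (needs reflink support)
    cp --reflink=always /mnt/bdvol/vm1 /mnt/bdvol/vm1-clone

    # Full clone: a plain copy, duplicating all data
    cp /mnt/bdvol/vm1 /mnt/bdvol/vm1-full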

> Also note that without thin provisioning we lose our ability to create 
> snapshots.
> 
Could you please explain it?

