Re: [vdsm] [RFC] GlusterFS domain specific changes

Ayal Baron Fri, 07 Sep 2012 14:07:43 -0700


----- Original Message -----
> On Fri, 07 Sep 2012 13:27:15 +0530, "M. Mohan Kumar"
> <mo...@in.ibm.com> wrote:
> > On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim <ih...@redhat.com>
> > wrote:
> > > On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > > > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron
> > > > <aba...@redhat.com> wrote:
> > > >>
> > > >>
> > >
> > > >
> > > > To start using the LVs we will always truncate them to the required
> > > > size, which resizes the LV. I didn't get what you are mentioning
> > > > about thin provisioning, but I have some dummy code using dm-thin
> > > > targets showing that BD xlators can be extended to use dm-thin
> > > > targets for thin provisioning.
> > > 
> > > So even though this is block storage, it will be extended as needed?
> > > How does that work exactly?
> > > Say I have a VM with a 100GB disk.
> > > Thin provisioning means we only allocate 1GB to it, then as the guest
> > > uses that storage, we allocate more as needed (lvextend, pause guest,
> > > lvrefresh, resume guest).
> > > 
> > > 
> > 
> > When we use device=lv, it means we use only thick-provisioned logical
> > volumes. If such a logical volume runs out of space in the guest, one
> > can resize it from the client by using truncate (which results in an
> > lvresize on the server side) and then run filesystem tools in the guest
> > to use the added space.
> > 
> > But with device=thin, all LVs are thinly provisioned and allocating
> > space to them is taken care of automatically by the device-mapper thin
> > target. The thin pool must have enough space to accommodate the sizing
> > requirements.
> > 
> As of now the BD xlator supports only linear logical volumes, which are
> thick provisioned. The gluster CLI command "gluster volume create" with
> the option "device=lv" allows working with logical volumes as files.
>
> As a POC I have code (not posted to an external list) that adds the
> option "device=thin" to the gluster volume create command, allowing it to
> work with thin-provisioned targets. But it does not take care of resizing
> the thin pool when it reaches its low-water mark. Supporting thin targets
> is on our TODO list. We depend on the lvm2 library to provide APIs to
> create thin targets.
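The extend-on-demand flow Itamar asks about (lvextend, pause guest, lvrefresh, resume guest) can be sketched as a simple watermark policy. This is a minimal Python simulation under stated assumptions, not vdsm code: the function name `extend_if_needed`, the 1 GiB chunk and the 512 MiB watermark are hypothetical, and the LVM/guest steps are only recorded as strings, never executed.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical policy values, for illustration only.
CHUNK = 1 * GIB        # grow the LV by this much per extension
WATERMARK = 512 * MIB  # extend when free allocated space drops below this


def extend_if_needed(allocated, highest_write, virtual_size, log):
    """Return the new LV size after applying the watermark policy.

    allocated:     current LV size in bytes
    highest_write: highest offset the guest has written so far
    virtual_size:  disk size the guest sees (the LV never grows past it)
    log:           list that receives the simulated admin steps
    """
    while allocated - highest_write < WATERMARK and allocated < virtual_size:
        new_size = min(allocated + CHUNK, virtual_size)
        # Simulated only: the steps listed in the thread, in the same order.
        log.append(f"lvextend to {new_size // MIB} MiB")
        log.append("pause guest")
        log.append("lvrefresh")
        log.append("resume guest")
        allocated = new_size
    return allocated
```

For the 100GB example, `extend_if_needed(1 * GIB, 700 * MIB, 100 * GIB, steps)` returns 2 GiB and records one lvextend/pause/lvrefresh/resume cycle. With device=thin this per-LV bookkeeping moves into the kernel: dm-thin allocates pool blocks on first write and raises a dm event when pool free space drops below the pool's low-water mark, so only the pool has to be grown.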


I'm definitely missing some background here.
1. Can the LV span multiple bricks in Gluster?
 i. If 'yes', then
   a. do you use Gluster's replication and distribution schemes to gain
      performance and redundancy?
   b. what performance gain is there over normal Gluster with files?
 ii. If 'no', then you're only exposing single-host local storage LVM? (In
     which case I don't see why Gluster is used at all, or where.)

From a different angle, the only benefit I can think of in exposing an FS
interface over LVM is for consumers who do not wish to know the details of the
underlying storage but want the performance gain of using block storage.
vdsm is already intimately familiar with LVM and block devices, so adding an
FS layer on top doesn't strike me as adding any value. In addition, you
require the consumer to know a lot about your interface because it's not truly
an FS interface: e.g. the consumer is not allowed to create directories, and
files are not sparse. Not to mention that if you're indeed using LVM, then I
don't think you're considering the VG metadata (MD) and extent size
limitations:
1. LVM currently has severe limitations wrt the number of objects it can manage
(the limitation is actually the size of the VG metadata, but the distinction is
not important just yet). This means that creating a metadata LV in addition to
each data LV is very costly (at around 1000 LVs you'd hit a problem). vdsm
currently creates 2 files per snapshot (the data and a small file with metadata
describing it), meaning that you'd reach this limit really fast.
2. LVM's max LV size is extent size * 65K, which means that if I choose a 4K
extent size then my max LV size would be 256MB (4KiB * 65,536 extents). This
obviously won't do for VM disks, so you'd choose a much larger extent size.
However, a larger extent size means that each metadata file vdsm creates wastes
a lot of storage space. So even if LVM could scale, your storage utilization
plummets and your $/MB ratio increases.
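The arithmetic in points 1 and 2 can be checked with a few lines of Python. This is a sketch under stated assumptions: the 65,536-extent cap is the "extent size * 65K" figure above, the 1000-LV ceiling is the rough estimate from point 1, and the 128 MiB extent and 1 MiB metadata-file sizes are hypothetical example values; the function names are illustrative only.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

MAX_EXTENTS_PER_LV = 65536  # the "extent size * 65K" limit from point 2
LV_CEILING = 1000           # rough VG-metadata ceiling from point 1 (estimate)
LVS_PER_VOLUME = 2          # data LV + metadata LV per vdsm volume/snapshot


def max_lv_size(extent_size):
    """Largest LV possible with a given extent size."""
    return extent_size * MAX_EXTENTS_PER_LV


def wasted_per_metadata_lv(extent_size, metadata_bytes):
    """LVs are allocated in whole extents, so a small metadata LV still
    occupies at least one full extent; return the unused bytes."""
    extents = -(-metadata_bytes // extent_size)  # ceiling division
    return extents * extent_size - metadata_bytes


def max_volumes(lv_ceiling=LV_CEILING, lvs_per_volume=LVS_PER_VOLUME):
    """How many volumes fit before hitting the LV ceiling."""
    return lv_ceiling // lvs_per_volume


print(max_lv_size(4 * KIB) // MIB)                        # 256
print(max_lv_size(128 * MIB) // GIB)                      # 8192
print(wasted_per_metadata_lv(128 * MIB, 1 * MIB) // MIB)  # 127
print(max_volumes())                                      # 500
```

With these example numbers, an extent size large enough for 8 TiB disks leaves 127 MiB unused behind every 1 MiB metadata LV, and only about 500 volumes fit before the LV ceiling.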
The way around this is of course not to have a metadata file per volume but
to have 1 file containing all the metadata, but then I have to be fully aware of
the limitations of the environment, and treating my objects as files gains me
nothing (but does require a new hybrid domain, a lot more code, etc.).
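The single-metadata-file alternative can be sketched as one preallocated file (or LV) divided into fixed-size slots, with each volume owning one slot index. The 512-byte slot size, the key=value record encoding, the function names and the record keys are all hypothetical choices for illustration; this is not vdsm's on-disk format.

```python
import os
import tempfile

SLOT_SIZE = 512  # hypothetical fixed record size per volume


def create_metadata_area(path, nslots):
    """Preallocate a zero-filled area holding `nslots` empty slots."""
    with open(path, "wb") as f:
        f.write(b"\0" * (nslots * SLOT_SIZE))


def write_slot(path, slot, record):
    """Serialize `record` (str -> str) into slot number `slot`.

    Keys must not contain '=' and values must not contain newlines.
    """
    data = "".join(f"{k}={v}\n" for k, v in sorted(record.items())).encode()
    if len(data) > SLOT_SIZE:
        raise ValueError("metadata record does not fit in one slot")
    with open(path, "r+b") as f:
        f.seek(slot * SLOT_SIZE)
        f.write(data.ljust(SLOT_SIZE, b"\0"))  # pad so stale bytes are cleared


def read_slot(path, slot):
    """Return the record stored in `slot` ({} for an empty slot)."""
    with open(path, "rb") as f:
        f.seek(slot * SLOT_SIZE)
        raw = f.read(SLOT_SIZE).rstrip(b"\0").decode()
    return dict(line.split("=", 1) for line in raw.splitlines())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "metadata")
    create_metadata_area(path, nslots=100)
    write_slot(path, 3, {"format": "cow", "parent": "vol-2"})  # hypothetical keys
    print(read_slot(path, 3))
```

Adding a volume now costs a slot instead of an extra LV, but the consumer has to manage slot allocation itself, which is the point made above: the objects are no longer really treated as files.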

Also note that without thin provisioning we lose our ability to create
snapshots.

> 
> 
>  
> 
> _______________________________________________
> vdsm-devel mailing list
> vdsm-devel@lists.fedorahosted.org
> https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
> 