Re: [vdsm] [RFC] GlusterFS domain specific changes
On Fri, 7 Sep 2012 17:07:28 -0400 (EDT), Ayal Baron wrote:
> > As of now the BD xlator supports working only with linear logical
> > volumes; they are thick provisioned. The gluster cli command "gluster
> > volume create" with option "device=lv" allows working with logical
> > volumes as files.
> >
> > As a POC I have code (not posted to the external list); with option
> > "device=thin" to the gluster volume create command it allows working
> > with thin provisioned targets. But it does not take care of resizing
> > the thin-pool when it reaches a low-level threshold. Supporting thin
> > targets is on our TODO list. We have a dependency on the lvm2 library
> > to provide APIs to create thin targets.
>
> I'm definitely missing some background here.
> 1. Can the LV span multiple bricks in Gluster?
>    i. If 'yes' then
>       a. do you use Gluster's replication and distribution schemes to
>          gain performance and redundancy?
>       b. what performance gain is there over normal Gluster with files?
>    ii. If 'not' then you're only exposing single host local storage LVM?
>        (in which case I don't see why Gluster is used at all and where).

No, as of now the BD xlator works only with one brick. There are some
issues in supporting GlusterFS features such as replication and stripe
from the BD xlator. We are still evaluating the BD xlator for such
scenarios.

Advantages of the BD xlator:
(*) Ease of use and unified management for both file and block based storage.
(*) Making block devices available to nodes which don't have direct
access to the SAN, and supporting migration to nodes which don't have
SAN access.
(*) With FS interfaces it becomes easier to support T10 extensions like
xcopy and writesame (currently not supported, a future plan).
(*) Use of dm-thin logical volumes to provide VM images that are
inherently thin provisioned. This allows multi-level snapshots. When we
support thin-provisioned logical volumes with 'unmap' support it is
almost equivalent to sparse files. This is also a future plan.
> From a different angle, the only benefit I can think of in exposing a fs
> interface over LVM is for consumers who do not wish to know the details
> of the underlying storage but want the performance gain of using block
> storage. vdsm is already intimately familiar with LVM and block devices,
> so adding the FS layer scheme on top doesn't strike me as adding any
> value. In addition, you require the consumer to know a lot about your
> interface because it's not truly a FS interface, e.g. the consumer is
> not allowed to create directories, files are not sparse, not to mention
> that if you're indeed using LVM then I don't think you're considering
> the VG MD and extent size limitations:
> 1. LVM currently has severe limitations wrt the number of objects it can
> manage (the limitation is actually the size of the VG metadata, but the
> distinction is not important just yet). This means that creating a
> metadata LV in addition to each data LV is very costly (at around 1000
> LVs you'd hit a problem). vdsm currently creates 2 files per snapshot
> (the data and a small file with metadata describing it), meaning that
> you'd reach this limit really fast.
> 2. The LVM max LV size is extent size * 65K, so if I choose a 4K extent
> size then my max LV size would be 256MB. This obviously won't do for VM
> disks, so you'd choose a much larger extent size. However, a larger
> extent size means that each metadata file vdsm creates wastes a lot of
> storage space. So even if LVM could scale, your storage usage plummets
> and your $/MB ratio increases.
> The way around this is of course not to have a metadata file per volume
> but to have 1 file containing all the metadata, but then that means I'm
> fully aware of the limitations of the environment and treating my
> objects as files gains me nothing (but does require a new hybrid domain,
> a lot more code etc).

A GlusterFS + BD xlator domain will be similar to a block based storage
domain. IIUC, in block based storage VDSM will not create as many LVs
(files) as in posix based storage. The BD xlator provides a
filesystem-like interface to create/manipulate LVs, whereas in a block
based storage domain commands like lvcreate and lvextend are used to
manipulate them; i.e. the BD xlator provides an FS interface for a block
based storage domain.

In future, when we have proper support for reflink [1], cp --reflink can
be used for creating a linked clone. There was also a discussion in the
past on a copyfile [2] interface which could be used to create a full
clone of LVs.

[1] http://marc.info/?l=linux-fsdevel&m=125296717319013&w=2
[2] http://www.spinics.net/lists/linux-nfs/msg26203.html

> Also note that without thin provisioning we lose our ability to create
> snapshots.

Could you please explain that?
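A quick back-of-the-envelope check of the extent-size arithmetic quoted above (a sketch; the "65K" figure is the roughly 65,536-extents-per-LV limit mentioned in the thread):

```python
# LVM caps an LV at ~65,536 extents, so max LV size scales with extent size.
KIB, MIB, TIB = 1024, 1024**2, 1024**4

MAX_EXTENTS_PER_LV = 65536  # the "65K" figure from the thread

def max_lv_size(extent_size: int) -> int:
    return MAX_EXTENTS_PER_LV * extent_size

# A 4 KiB extent size caps each LV at 256 MiB -- too small for VM disks.
assert max_lv_size(4 * KIB) == 256 * MIB

# A 128 MiB extent size allows 8 TiB LVs, but then even a 1 KiB metadata
# file stored as its own LV still consumes a full 128 MiB extent.
assert max_lv_size(128 * MIB) == 8 * TIB
wasted_per_metadata_lv = 128 * MIB - 1 * KIB  # almost the whole extent
```

This is exactly the trade-off in the objection: small extents limit LV size, while large extents waste space on every small per-volume metadata file vdsm creates.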
Re: [vdsm] [RFC] GlusterFS domain specific changes
- Original Message -
> On Fri, 07 Sep 2012 13:27:15 +0530, "M. Mohan Kumar" wrote:
> > On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim wrote:
> > > On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > > > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> > > >
> > > > For start using the LVs we will always do a truncate for the
> > > > required size; it will resize the LV. I didn't get what you are
> > > > mentioning about thin-provisioning, but I have dumb code using
> > > > dm-thin targets showing BD xlators can be extended to use dm-thin
> > > > targets for thin-provisioning.
> > >
> > > so even though this is block storage, it will be extended as needed?
> > > how does that work exactly?
> > > say i have a VM with a 100GB disk.
> > > thin provisioning means we only allocated 1GB to it, then as the
> > > guest uses that storage, we allocate more as needed (lvextend, pause
> > > guest, lvrefresh, resume guest)
> >
> > When we use device=lv, it means we use only thick provisioned logical
> > volumes. If this logical volume runs out of space in the guest, one
> > can resize it from the client by using truncate (which results in
> > lvresize at the server side) and run filesystem tools in the guest to
> > get the added space.
> >
> > But with the device=thin type, all LVs are thinly provisioned and
> > allocating space to them is taken care of by the device-mapper thin
> > target automatically. The thin-pool should have enough space to
> > accommodate the sizing requirements.
>
> As of now the BD xlator supports working only with linear logical
> volumes; they are thick provisioned. The gluster cli command "gluster
> volume create" with option "device=lv" allows working with logical
> volumes as files.
>
> As a POC I have code (not posted to the external list); with option
> "device=thin" to the gluster volume create command it allows working
> with thin provisioned targets. But it does not take care of resizing
> the thin-pool when it reaches a low-level threshold. Supporting thin
> targets is on our TODO list. We have a dependency on the lvm2 library
> to provide APIs to create thin targets.

I'm definitely missing some background here.
1. Can the LV span multiple bricks in Gluster?
   i. If 'yes' then
      a. do you use Gluster's replication and distribution schemes to
         gain performance and redundancy?
      b. what performance gain is there over normal Gluster with files?
   ii. If 'not' then you're only exposing single host local storage LVM?
       (in which case I don't see why Gluster is used at all and where).

From a different angle, the only benefit I can think of in exposing a fs
interface over LVM is for consumers who do not wish to know the details
of the underlying storage but want the performance gain of using block
storage. vdsm is already intimately familiar with LVM and block devices,
so adding the FS layer scheme on top doesn't strike me as adding any
value. In addition, you require the consumer to know a lot about your
interface because it's not truly a FS interface, e.g. the consumer is
not allowed to create directories, files are not sparse, not to mention
that if you're indeed using LVM then I don't think you're considering
the VG MD and extent size limitations:
1. LVM currently has severe limitations wrt the number of objects it can
manage (the limitation is actually the size of the VG metadata, but the
distinction is not important just yet). This means that creating a
metadata LV in addition to each data LV is very costly (at around 1000
LVs you'd hit a problem). vdsm currently creates 2 files per snapshot
(the data and a small file with metadata describing it), meaning that
you'd reach this limit really fast.
2. The LVM max LV size is extent size * 65K, so if I choose a 4K extent
size then my max LV size would be 256MB. This obviously won't do for VM
disks, so you'd choose a much larger extent size. However, a larger
extent size means that each metadata file vdsm creates wastes a lot of
storage space. So even if LVM could scale, your storage usage plummets
and your $/MB ratio increases.
The way around this is of course not to have a metadata file per volume
but to have 1 file containing all the metadata, but then that means I'm
fully aware of the limitations of the environment and treating my
objects as files gains me nothing (but does require a new hybrid domain,
a lot more code etc).
Also note that without thin provisioning we lose our ability to create
snapshots.
___
vdsm-devel mailing list
vdsm-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/mailman/listinfo/vdsm-devel
Re: [vdsm] [RFC] GlusterFS domain specific changes
On Fri, 07 Sep 2012 13:27:15 +0530, "M. Mohan Kumar" wrote:
> On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim wrote:
> > On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> > >
> > > For start using the LVs we will always do a truncate for the
> > > required size; it will resize the LV. I didn't get what you are
> > > mentioning about thin-provisioning, but I have dumb code using
> > > dm-thin targets showing BD xlators can be extended to use dm-thin
> > > targets for thin-provisioning.
> >
> > so even though this is block storage, it will be extended as needed?
> > how does that work exactly?
> > say i have a VM with a 100GB disk.
> > thin provisioning means we only allocated 1GB to it, then as the
> > guest uses that storage, we allocate more as needed (lvextend, pause
> > guest, lvrefresh, resume guest)
>
> When we use device=lv, it means we use only thick provisioned logical
> volumes. If this logical volume runs out of space in the guest, one can
> resize it from the client by using truncate (which results in lvresize
> at the server side) and run filesystem tools in the guest to get the
> added space.
>
> But with the device=thin type, all LVs are thinly provisioned and
> allocating space to them is taken care of by the device-mapper thin
> target automatically. The thin-pool should have enough space to
> accommodate the sizing requirements.

As of now the BD xlator supports working only with linear logical
volumes; they are thick provisioned. The gluster cli command "gluster
volume create" with option "device=lv" allows working with logical
volumes as files.

As a POC I have code (not posted to the external list); with option
"device=thin" to the gluster volume create command it allows working
with thin provisioned targets. But it does not take care of resizing the
thin-pool when it reaches a low-level threshold. Supporting thin targets
is on our TODO list. We have a dependency on the lvm2 library to provide
APIs to create thin targets.
Re: [vdsm] [RFC] GlusterFS domain specific changes
On Fri, 07 Sep 2012 14:23:08 +0800, Shu Ming wrote:
> On 2012-9-7 13:21, M. Mohan Kumar wrote:
> > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> > > - Original Message -
> > > > - Original Message -
> > > > > From: "M. Mohan Kumar"
> > > > > To: vdsm-devel@lists.fedorahosted.org
> > > > > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > > > > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> > > > >
> > > > > We are developing a GlusterFS server translator to export block
> > > > > devices as regular files to the client. Using block devices to
> > > > > serve VM images gives performance improvements, since it avoids
> > > > > some file system bottlenecks in the host kernel. The goal is to
> > > > > use one block device (ie a file at the client side) per VM image
> > > > > and feed this file to QEMU to get the performance improvements.
> > > > > QEMU will talk to the glusterfs server directly using libgfapi.
> > > > >
> > > > > Currently we support only exporting Volume Groups and Logical
> > > > > Volumes. Logical volumes are exported as regular files to the
> > > > > client.
> > >
> > > Are you actually using LVM behind the scenes?
> > > If so, why bother with exposing the LVs as files and not raw block
> > > devices?
> >
> > Ayal,
> >
> > The idea is to provide a FS interface for managing block devices. One
> > can mount the Block Device Gluster Volume and create a LV and size it
> > just by
> > $ touch lv1
> > $ truncate -s5G lv1
> > And other file commands can be used to clone LVs, snapshot LVs
> > $ ln lv1 lv2 # clones
> > $ ln -s lv1 lv1.sn # creates snapshot
>
> Do we have a special reason to use "ln"?
> Why not use "cp" as the command to do the snapshot instead of "ln"?

cp involves opening the source file in read-only mode, opening/creating
the destination file in write mode, and issuing a series of reads on the
source file and writes into the destination file until the end of the
source file. But we can't apply this to logical volume copy (or clone),
because when we create a logical volume we have to specify the size, and
that's not possible with the above approach, ie open/create does not
take a size parameter, so we can't create the destination LV with the
required size.

But if I use the link interface to copy LVs: VFS/FUSE/GlusterFS provides
the link() interface, which takes a source file and a destination file
name. In the BD xlator link() code, I get the size of the source LV,
create the destination LV with that size and copy the contents.

This problem could be solved if we had a syscall copyfile(source, dest,
size). There have been discussions in the past on a copyfile() interface
which could be made use of for this copy:
http://www.spinics.net/lists/linux-nfs/msg26203.html

> > By enabling this feature GlusterFS can directly export storage in the
> > SAN. We are planning to add a feature to export LUNs also as regular
> > files in future.
>
> IMO, the major feature of GlusterFS is to export distributed local
> disks to the clients. If we have a SAN in the backend, that means the
> storage block devices should be exported to clients naturally. Why do
> we need GlusterFS to export the block devices in the SAN?

By enabling this feature we are allowing GlusterFS to work with local
storage, NAS storage and SAN storage, ie it allows machines to access
block devices from the SAN which are not directly connected to SAN
storage. Also, providing block devices as VM disk images has some
advantages:
* it does not incur host side filesystem overhead
* if storage arrays provide storage offload features such as flashcopy,
they can be exploited (these offloads are usually at LUN level)
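The link()-based clone described above can be sketched at the plain-file level (an illustration of the semantics only, not the actual BD xlator code; the function name is hypothetical): first learn the source size, create the destination at that size, then copy the contents.

```python
import os
import shutil
import tempfile

def clone_like_bd_link(src: str, dst: str) -> None:
    """File-level analogue of the BD xlator's link()-based clone: size the
    destination from the source first (like creating an LV with the source
    LV's size), then copy the data across."""
    size = os.stat(src).st_size
    with open(src, "rb") as s, open(dst, "wb") as d:
        d.truncate(size)          # create destination at the required size
        shutil.copyfileobj(s, d)  # then copy contents until EOF

d = tempfile.mkdtemp()
src, dst = os.path.join(d, "lv1"), os.path.join(d, "lv2")
with open(src, "wb") as f:
    f.write(b"vm-image-data")
clone_like_bd_link(src, dst)
assert os.stat(dst).st_size == os.stat(src).st_size
```

A copyfile(source, dest, size) syscall would express the same operation in one step, which is why the thread points at the copyfile discussions.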
Re: [vdsm] [RFC] GlusterFS domain specific changes
On Fri, 07 Sep 2012 09:35:10 +0300, Itamar Heim wrote:
> On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> > On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> >
> > For start using the LVs we will always do a truncate for the required
> > size; it will resize the LV. I didn't get what you are mentioning
> > about thin-provisioning, but I have dumb code using dm-thin targets
> > showing BD xlators can be extended to use dm-thin targets for
> > thin-provisioning.
>
> so even though this is block storage, it will be extended as needed?
> how does that work exactly?
> say i have a VM with a 100GB disk.
> thin provisioning means we only allocated 1GB to it, then as the guest
> uses that storage, we allocate more as needed (lvextend, pause guest,
> lvrefresh, resume guest)

When we use device=lv, it means we use only thick provisioned logical
volumes. If this logical volume runs out of space in the guest, one can
resize it from the client by using truncate (which results in lvresize
at the server side) and run filesystem tools in the guest to get the
added space.

But with the device=thin type, all LVs are thinly provisioned and
allocating space to them is taken care of by the device-mapper thin
target automatically. The thin-pool should have enough space to
accommodate the sizing requirements.
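The truncate-driven resize has a direct plain-file analogue: POSIX truncate changes a file's logical size without writing any data, which is what lets the client-side truncate map cleanly onto an lvresize on the server. A minimal sketch with an ordinary sparse file (sizes are arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lv1")
open(path, "wb").close()        # "touch lv1" -> create the (empty) volume
os.truncate(path, 64 * 2**20)   # "truncate -s64M lv1" -> resize to 64 MiB

st = os.stat(path)
assert st.st_size == 64 * 2**20         # logical size is now 64 MiB...
assert st.st_blocks * 512 < st.st_size  # ...with almost no blocks allocated
```

With device=lv the grown size is backed by real, thick-provisioned extents; with device=thin the allocation is deferred to the dm-thin target, much like the sparse file above.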
Re: [vdsm] [RFC] GlusterFS domain specific changes
On 09/07/2012 08:21 AM, M. Mohan Kumar wrote:
> On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> > - Original Message -
> > > - Original Message -
> > > > From: "M. Mohan Kumar"
> > > > To: vdsm-devel@lists.fedorahosted.org
> > > > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > > > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> > > >
> > > > We are developing a GlusterFS server translator to export block
> > > > devices as regular files to the client. Using block devices to
> > > > serve VM images gives performance improvements, since it avoids
> > > > some file system bottlenecks in the host kernel. The goal is to
> > > > use one block device (ie a file at the client side) per VM image
> > > > and feed this file to QEMU to get the performance improvements.
> > > > QEMU will talk to the glusterfs server directly using libgfapi.
> > > >
> > > > Currently we support only exporting Volume Groups and Logical
> > > > Volumes. Logical volumes are exported as regular files to the
> > > > client.
> >
> > Are you actually using LVM behind the scenes?
> > If so, why bother with exposing the LVs as files and not raw block
> > devices?
>
> Ayal,
>
> The idea is to provide a FS interface for managing block devices. One
> can mount the Block Device Gluster Volume and create a LV and size it
> just by
> $ touch lv1
> $ truncate -s5G lv1
> And other file commands can be used to clone LVs, snapshot LVs
> $ ln lv1 lv2 # clones
> $ ln -s lv1 lv1.sn # creates snapshot
>
> By enabling this feature GlusterFS can directly export storage in the
> SAN. We are planning to add a feature to export LUNs also as regular
> files in future.
>
> > > > In GlusterFS terminology a volume capable of exporting block
> > > > devices is created by specifying the 'Volume Group' (ie VG in
> > > > Logical Volume management). The Block Device translator (BD
> > > > xlator) exports this volume group as a directory and LVs under it
> > > > as regular files. In the gluster mount point creating a file
> > > > results in creating a logical volume, removing a file results in
> > > > removing the logical volume, etc.
> > > >
> > > > When a GlusterFS volume enabled with the BD xlator is used,
> > > > directory creation in that gluster mount path is not supported,
> > > > because a directory maps to a Volume Group in the BD xlator. But
> > > > it could be an issue in a VDSM environment: when a new VDSM
> > > > volume is created for a GlusterFS domain, VDSM mounts the storage
> > > > domain, creates directories under it and creates files for the VM
> > > > image and other uses (like metadata).
> > > >
> > > > Is it possible to modify this behavior in VDSM to use a flat
> > > > structure instead of creating directories and VM images and other
> > > > files underneath it? ie for a GlusterFS domain with the BD
> > > > xlator, VDSM would not create any directory and would only create
> > > > all required files under the mount point directory itself.
> > >
> > > From your description I think that the GlusterFS for block devices
> > > is actually more similar to what happens with the regular block
> > > domains. You would probably need to mount the share somewhere in
> > > the system and then use symlinks to point to the volumes.
> > >
> > > Create a regular block domain and look inside
> > > /rhev/data-center/mnt/blockSD; you'll probably get the idea of what
> > > I mean.
> > >
> > > That said, we'd need to come up with a way of extending the LVs on
> > > the gluster server when required (for thin provisioning).
> >
> > Why? if it's exposed as a file that probably means it supports
> > sparseness, i.e. if this becomes a new type of block domain it should
> > only support 'preallocated' images.
>
> For start using the LVs we will always do a truncate for the required
> size; it will resize the LV. I didn't get what you are mentioning
> about thin-provisioning, but I have dumb code using dm-thin targets
> showing BD xlators can be extended to use dm-thin targets for
> thin-provisioning.

so even though this is block storage, it will be extended as needed?
how does that work exactly?
say i have a VM with a 100GB disk.
thin provisioning means we only allocated 1GB to it, then as the guest
uses that storage, we allocate more as needed (lvextend, pause guest,
lvrefresh, resume guest)
Re: [vdsm] [RFC] GlusterFS domain specific changes
On 2012-9-7 13:21, M. Mohan Kumar wrote:
> On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> > - Original Message -
> > > - Original Message -
> > > > From: "M. Mohan Kumar"
> > > > To: vdsm-devel@lists.fedorahosted.org
> > > > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > > > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> > > >
> > > > We are developing a GlusterFS server translator to export block
> > > > devices as regular files to the client. Using block devices to
> > > > serve VM images gives performance improvements, since it avoids
> > > > some file system bottlenecks in the host kernel. The goal is to
> > > > use one block device (ie a file at the client side) per VM image
> > > > and feed this file to QEMU to get the performance improvements.
> > > > QEMU will talk to the glusterfs server directly using libgfapi.
> > > >
> > > > Currently we support only exporting Volume Groups and Logical
> > > > Volumes. Logical volumes are exported as regular files to the
> > > > client.
> >
> > Are you actually using LVM behind the scenes?
> > If so, why bother with exposing the LVs as files and not raw block
> > devices?
>
> Ayal,
>
> The idea is to provide a FS interface for managing block devices. One
> can mount the Block Device Gluster Volume and create a LV and size it
> just by
> $ touch lv1
> $ truncate -s5G lv1
> And other file commands can be used to clone LVs, snapshot LVs
> $ ln lv1 lv2 # clones
> $ ln -s lv1 lv1.sn # creates snapshot

Do we have a special reason to use "ln"?
Why not use "cp" as the command to do the snapshot instead of "ln"?

> By enabling this feature GlusterFS can directly export storage in the
> SAN. We are planning to add a feature to export LUNs also as regular
> files in future.

IMO, the major feature of GlusterFS is to export distributed local disks
to the clients. If we have a SAN in the backend, that means the storage
block devices should be exported to clients naturally. Why do we need
GlusterFS to export the block devices in the SAN?

> > > > In GlusterFS terminology a volume capable of exporting block
> > > > devices is created by specifying the 'Volume Group' (ie VG in
> > > > Logical Volume management). The Block Device translator (BD
> > > > xlator) exports this volume group as a directory and LVs under it
> > > > as regular files. In the gluster mount point creating a file
> > > > results in creating a logical volume, removing a file results in
> > > > removing the logical volume, etc.
> > > >
> > > > When a GlusterFS volume enabled with the BD xlator is used,
> > > > directory creation in that gluster mount path is not supported,
> > > > because a directory maps to a Volume Group in the BD xlator. But
> > > > it could be an issue in a VDSM environment: when a new VDSM
> > > > volume is created for a GlusterFS domain, VDSM mounts the storage
> > > > domain, creates directories under it and creates files for the VM
> > > > image and other uses (like metadata).
> > > >
> > > > Is it possible to modify this behavior in VDSM to use a flat
> > > > structure instead of creating directories and VM images and other
> > > > files underneath it? ie for a GlusterFS domain with the BD
> > > > xlator, VDSM would not create any directory and would only create
> > > > all required files under the mount point directory itself.
> > >
> > > From your description I think that the GlusterFS for block devices
> > > is actually more similar to what happens with the regular block
> > > domains. You would probably need to mount the share somewhere in
> > > the system and then use symlinks to point to the volumes.
> > >
> > > Create a regular block domain and look inside
> > > /rhev/data-center/mnt/blockSD; you'll probably get the idea of what
> > > I mean.
> > >
> > > That said, we'd need to come up with a way of extending the LVs on
> > > the gluster server when required (for thin provisioning).
> >
> > Why? if it's exposed as a file that probably means it supports
> > sparseness, i.e. if this becomes a new type of block domain it should
> > only support 'preallocated' images.
>
> For start using the LVs we will always do a truncate for the required
> size; it will resize the LV. I didn't get what you are mentioning
> about thin-provisioning, but I have dumb code using dm-thin targets
> showing BD xlators can be extended to use dm-thin targets for
> thin-provisioning.

--
舒明 Shu Ming
Open Virtualization Engineering; CSTL, IBM Corp.
Tel: 86-10-82451626 Tieline: 9051626
E-mail: shum...@cn.ibm.com or shum...@linux.vnet.ibm.com
Address: 3/F Ring Building, ZhongGuanCun Software Park, Haidian
District, Beijing 100193, PRC
Re: [vdsm] [RFC] GlusterFS domain specific changes
On Thu, 6 Sep 2012 18:59:19 -0400 (EDT), Ayal Baron wrote:
> - Original Message -
> > - Original Message -
> > > From: "M. Mohan Kumar"
> > > To: vdsm-devel@lists.fedorahosted.org
> > > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> > >
> > > We are developing a GlusterFS server translator to export block
> > > devices as regular files to the client. Using block devices to
> > > serve VM images gives performance improvements, since it avoids
> > > some file system bottlenecks in the host kernel. The goal is to use
> > > one block device (ie a file at the client side) per VM image and
> > > feed this file to QEMU to get the performance improvements. QEMU
> > > will talk to the glusterfs server directly using libgfapi.
> > >
> > > Currently we support only exporting Volume Groups and Logical
> > > Volumes. Logical volumes are exported as regular files to the
> > > client.
>
> Are you actually using LVM behind the scenes?
> If so, why bother with exposing the LVs as files and not raw block
> devices?

Ayal,

The idea is to provide a FS interface for managing block devices. One
can mount the Block Device Gluster Volume and create a LV and size it
just by
$ touch lv1
$ truncate -s5G lv1
And other file commands can be used to clone LVs, snapshot LVs
$ ln lv1 lv2 # clones
$ ln -s lv1 lv1.sn # creates snapshot

By enabling this feature GlusterFS can directly export storage in the
SAN. We are planning to add a feature to export LUNs also as regular
files in future.

> > > In GlusterFS terminology a volume capable of exporting block
> > > devices is created by specifying the 'Volume Group' (ie VG in
> > > Logical Volume management). The Block Device translator (BD xlator)
> > > exports this volume group as a directory and LVs under it as
> > > regular files. In the gluster mount point creating a file results
> > > in creating a logical volume, removing a file results in removing
> > > the logical volume, etc.
> > >
> > > When a GlusterFS volume enabled with the BD xlator is used,
> > > directory creation in that gluster mount path is not supported,
> > > because a directory maps to a Volume Group in the BD xlator. But it
> > > could be an issue in a VDSM environment: when a new VDSM volume is
> > > created for a GlusterFS domain, VDSM mounts the storage domain,
> > > creates directories under it and creates files for the VM image and
> > > other uses (like metadata).
> > >
> > > Is it possible to modify this behavior in VDSM to use a flat
> > > structure instead of creating directories and VM images and other
> > > files underneath it? ie for a GlusterFS domain with the BD xlator,
> > > VDSM would not create any directory and would only create all
> > > required files under the mount point directory itself.
> >
> > From your description I think that the GlusterFS for block devices is
> > actually more similar to what happens with the regular block domains.
> > You would probably need to mount the share somewhere in the system
> > and then use symlinks to point to the volumes.
> >
> > Create a regular block domain and look inside
> > /rhev/data-center/mnt/blockSD; you'll probably get the idea of what I
> > mean.
> >
> > That said, we'd need to come up with a way of extending the LVs on
> > the gluster server when required (for thin provisioning).
>
> Why? if it's exposed as a file that probably means it supports
> sparseness, i.e. if this becomes a new type of block domain it should
> only support 'preallocated' images.

For start using the LVs we will always do a truncate for the required
size; it will resize the LV. I didn't get what you are mentioning about
thin-provisioning, but I have dumb code using dm-thin targets showing
BD xlators can be extended to use dm-thin targets for thin-provisioning.
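One caveat worth noting about the interface above: `ln lv1 lv2` is overloaded here to mean "clone", which is not what POSIX link() normally does. With ordinary files a hard link creates a second name for the same object rather than a copy, as this small sketch shows; the BD xlator intercepts link() and creates an independent LV instead.

```python
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "lv1")
with open(src, "wb") as f:
    f.write(b"data")

os.link(src, os.path.join(d, "lv2"))  # POSIX hard link: same inode, no copy
assert os.stat(src).st_nlink == 2     # both names refer to one object
# The BD xlator repurposes this call: lv2 becomes an independent LV copy.
```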
Re: [vdsm] [RFC] GlusterFS domain specific changes
- Original Message -
> - Original Message -
> > From: "M. Mohan Kumar"
> > To: vdsm-devel@lists.fedorahosted.org
> > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> >
> > We are developing a GlusterFS server translator to export block
> > devices as regular files to the client. Using block devices to serve
> > VM images gives performance improvements, since it avoids some file
> > system bottlenecks in the host kernel. The goal is to use one block
> > device (ie a file at the client side) per VM image and feed this file
> > to QEMU to get the performance improvements. QEMU will talk to the
> > glusterfs server directly using libgfapi.
> >
> > Currently we support only exporting Volume Groups and Logical
> > Volumes. Logical volumes are exported as regular files to the client.

Are you actually using LVM behind the scenes?
If so, why bother with exposing the LVs as files and not raw block
devices?

> > In GlusterFS terminology a volume capable of exporting block devices
> > is created by specifying the 'Volume Group' (ie VG in Logical Volume
> > management). The Block Device translator (BD xlator) exports this
> > volume group as a directory and LVs under it as regular files. In the
> > gluster mount point creating a file results in creating a logical
> > volume, removing a file results in removing the logical volume, etc.
> >
> > When a GlusterFS volume enabled with the BD xlator is used, directory
> > creation in that gluster mount path is not supported, because a
> > directory maps to a Volume Group in the BD xlator. But it could be an
> > issue in a VDSM environment: when a new VDSM volume is created for a
> > GlusterFS domain, VDSM mounts the storage domain, creates directories
> > under it and creates files for the VM image and other uses (like
> > metadata).
> >
> > Is it possible to modify this behavior in VDSM to use a flat
> > structure instead of creating directories and VM images and other
> > files underneath it? ie for a GlusterFS domain with the BD xlator,
> > VDSM would not create any directory and would only create all
> > required files under the mount point directory itself.
>
> From your description I think that the GlusterFS for block devices is
> actually more similar to what happens with the regular block domains.
> You would probably need to mount the share somewhere in the system and
> then use symlinks to point to the volumes.
>
> Create a regular block domain and look inside
> /rhev/data-center/mnt/blockSD; you'll probably get the idea of what I
> mean.
>
> That said, we'd need to come up with a way of extending the LVs on the
> gluster server when required (for thin provisioning).

Why? if it's exposed as a file that probably means it supports
sparseness, i.e. if this becomes a new type of block domain it should
only support 'preallocated' images.

> --
> Federico
> - Original Message -
> > From: "M. Mohan Kumar"
> > To: vdsm-devel@lists.fedorahosted.org
> > Sent: Wednesday, July 25, 2012 1:26:15 PM
> > Subject: [vdsm] [RFC] GlusterFS domain specific changes
> >
> > [original RFC message quoted in full; trimmed]
> Thanks Federico.
> From your description I think that GlusterFS for block devices is actually more similar to what happens with the regular block domains. You would probably need to mount the share somewhere in the system and then use symlinks to point to the volumes.
>
> Create a regular block domain and look inside /rhev/data-center/mnt/blockSD; you'll probably get the idea of what I mean.

GlusterFS with the BD xlator will be a hybrid of the block and posix fs domains. I will try to post more information about this model.
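One way such a hybrid could keep VDSM's per-image metadata without directories is a flat naming scheme. The names below are purely illustrative; note also that on a real BD volume every created file becomes an LV, so small metadata files may need to live outside the BD mount:

```shell
MNT=$(mktemp -d)                          # stand-in for the BD mount
IMG=vm-image-1
touch "$MNT/$IMG"                         # the LV-backed image itself
printf 'FORMAT=RAW\n' > "$MNT/$IMG.meta"  # flat sibling file instead of a subdirectory
                                          # (on a real BD volume this too would be an LV)
ls "$MNT"
```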
- Original Message -
> From: "M. Mohan Kumar"
> To: vdsm-devel@lists.fedorahosted.org
> Sent: Wednesday, July 25, 2012 1:26:15 PM
> Subject: [vdsm] [RFC] GlusterFS domain specific changes
>
> [original RFC message quoted in full; trimmed]
From your description I think that GlusterFS for block devices is actually more similar to what happens with the regular block domains. You would probably need to mount the share somewhere in the system and then use symlinks to point to the volumes.

Create a regular block domain and look inside /rhev/data-center/mnt/blockSD; you'll probably get the idea of what I mean.

That said, we'd need to come up with a way of extending the LVs on the gluster server when required (for thin provisioning).

--
Federico
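The LV-extension problem Federico raises is typically handled with a watermark check. A minimal sketch with made-up numbers (in practice the allocation would be parsed from `lvs` output, and the `lvextend` call is shown only as an echo):

```shell
ALLOC_MB=950          # currently written data, e.g. parsed from 'lvs' (made up here)
SIZE_MB=1024          # current LV size
WATERMARK_PCT=90      # extend once allocation crosses this threshold
CHUNK_MB=1024         # extension step
USED_PCT=$(( ALLOC_MB * 100 / SIZE_MB ))
if [ "$USED_PCT" -ge "$WATERMARK_PCT" ]; then
    NEW_SIZE_MB=$(( SIZE_MB + CHUNK_MB ))
    echo "would run: lvextend -L ${NEW_SIZE_MB}M myvg/vm-image-1"
fi
```

This is the same high-watermark pattern VDSM uses for thin-provisioned images on regular block domains; the open question in the thread is where such a loop would run for a gluster-hosted VG.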
Hello, any suggestions/feedback on this?

On Wed, 25 Jul 2012 16:56:15 +0530, "M. Mohan Kumar" wrote:
> [original RFC message quoted in full; trimmed]
> Note:
> Patches to enable exporting block devices as regular files are available in the Gluster Gerrit system:
> http://review.gluster.com/3551