On Fri, Apr 18, 2014 at 10:52 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>btw, I see, but at the moment we have fixed it in the network interface
>>device driver itself, rather than working around it by limiting network
>>traffic.
> Which kind of driver: in the host kernel, in the guest kernel, or in OpenStack?
>
In the compute host kernel; it is not related to OpenStack.

>>There is some work already done in Glance
>>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>>but I'm sure some work still needs to be done. Some parts are still being
>>drafted, and some dependencies need to be resolved as well.
> I read the blueprint carefully, but still have some doubts.
> Will it store an image as a single volume in cinder? Or store all image
> files in one shared volume (with a file system on the volume, of course)?

Yes, it will store each image as a single volume in Cinder.

> Openstack already has support to convert an image to a volume, and to boot
> from a volume. Are these features similar to this blueprint?

Not similar, but they could be leveraged for this case.

I would prefer to discuss the details on IRC. (I also read through all of the
VMThunder code earlier today (my timezone); I have some questions about it as
well.)

zhiyan

> Huiba Li
>
> National Key Laboratory for Parallel and Distributed
> Processing, College of Computer Science, National University of Defense
> Technology, Changsha, Hunan Province, P.R. China
> 410073
>
> At 2014-04-18 12:14:25, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>On Fri, Apr 18, 2014 at 10:53 AM, lihuiba <magazine.lihu...@163.com> wrote:
>>>>It's not 100% true, at least in my case. We fixed this problem in the
>>>>network interface driver; it actually causes kernel panics and read-only
>>>>filesystems under heavy networking workload.
>>>
>>> Network traffic control could help. The point is to ensure no instance
>>> is starved to death. Traffic control can be done with tc.
>>>
>>
>>btw, I see, but at the moment we have fixed it in the network interface
>>device driver itself, rather than working around it by limiting network
>>traffic.
>>
>>>>btw, we are doing some work to make Glance integrate Cinder as a
>>>>unified block storage backend.
>>> That sounds interesting. Is there some more material?
>>>
>>
>>There is some work already done in Glance
>>(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
>>but I'm sure some work still needs to be done. Some parts are still being
>>drafted, and some dependencies need to be resolved as well.
>>
>>> At 2014-04-18 06:05:23, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>>>Replied as inline comments.
>>>>
>>>>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>>>>>IMO we had better use a backend-storage-optimized approach to access
>>>>>>remote images from the compute node instead of using iSCSI only. And from
>>>>>>my experience, I'm sure iSCSI lacks stability under heavy I/O workload
>>>>>>in production environments; it can cause either the VM filesystem to be
>>>>>>marked as read-only or a VM kernel panic.
>>>>>
>>>>> Yes, in this situation the problem lies in the backend storage, so no
>>>>> other protocol will perform better. However, P2P transferring will
>>>>> greatly reduce the workload on the backend storage, so as to increase
>>>>> responsiveness.
>>>>
>>>>It's not 100% true, at least in my case. We fixed this problem in the
>>>>network interface driver; it actually causes kernel panics and read-only
>>>>filesystems under heavy networking workload.
>>>>
>>>>>>As I said, Nova already has an image caching mechanism, so in this case
>>>>>>P2P is just an approach that could be used for downloading or preheating
>>>>>>images for the cache.
>>>>>
>>>>> Nova's image caching is file-level, while VMThunder's is block-level. And
>>>>> VMThunder is designed to work in conjunction with Cinder, not Glance.
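(As an aside, a minimal sketch in plain Python of what block-level copy-on-read
caching means here; it is only an illustration, not VMThunder's actual
implementation, which works at the device-mapper level via flashcache and
dm-foolcache. The block size, paths and class name below are made up for the
example.)

    # Illustrative copy-on-read (CoR) block cache: serve a read from a local
    # cache file when the block was fetched before, otherwise read the block
    # from the remote/base image once and persist it locally for next time.
    import os

    BLOCK_SIZE = 4096  # placeholder block size

    class CopyOnReadCache(object):
        def __init__(self, base_path, cache_path):
            self.base = open(base_path, 'rb')        # remote or attached base image
            if not os.path.exists(cache_path):       # create the local cache file once
                open(cache_path, 'wb').close()
            self.cache = open(cache_path, 'r+b')     # local cache, filled block by block
            self.present = set()                     # indexes of blocks already cached

        def read_block(self, index):
            offset = index * BLOCK_SIZE
            if index in self.present:                # cache hit: no remote read at all
                self.cache.seek(offset)
                return self.cache.read(BLOCK_SIZE)
            self.base.seek(offset)                   # cache miss: read the block once...
            data = self.base.read(BLOCK_SIZE)
            self.cache.seek(offset)
            self.cache.write(data)                   # ...and copy it on read
            self.present.add(index)
            return data

A real implementation would of course do this below the filesystem, persist the
set of cached blocks, and handle concurrent readers, which is what the
device-mapper based targets discussed in this thread provide.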
>>>>> VMThunder currently uses Facebook's flashcache to realize caching, and
>>>>> dm-cache and bcache are also options in the future.
>>>>
>>>>Hmm, if you mention bcache, dm-cache and flashcache, I'm just wondering
>>>>whether they could be leveraged at the operations/best-practice level.
>>>>
>>>>btw, we are doing some work to make Glance integrate Cinder as a
>>>>unified block storage backend.
>>>>
>>>>>>I think P2P transferring/pre-caching sounds a good way to go, as I
>>>>>>mentioned as well, but for this area I'd actually like to see something
>>>>>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>>>>>downloading image bits on demand with a zero-copy approach; on the other
>>>>>>hand we can avoid reading data from the remote image every time by using
>>>>>>CoR.
>>>>>
>>>>> Yes, on-demand transferring is what you mean by "zero-copy", and caching
>>>>> is something close to CoR. In fact, we are working on a kernel module
>>>>> called foolcache that realizes a true CoR. See
>>>>> https://github.com/lihuiba/dm-foolcache.
>>>>
>>>>Yup. And it's really interesting to me, I will take a look. Thanks for
>>>>sharing.
>>>>
>>>>> National Key Laboratory for Parallel and Distributed
>>>>> Processing, College of Computer Science, National University of Defense
>>>>> Technology, Changsha, Hunan Province, P.R. China
>>>>> 410073
>>>>>
>>>>> At 2014-04-17 17:11:48, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>>>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>>>>>>>IMHO, the zero-copy approach is better
>>>>>>> VMThunder's "on-demand transferring" is the same thing as your
>>>>>>> "zero-copy approach".
>>>>>>> VMThunder uses iSCSI as the transferring protocol, which is option #b
>>>>>>> of yours.
>>>>>>
>>>>>>IMO we had better use a backend-storage-optimized approach to access
>>>>>>remote images from the compute node instead of using iSCSI only. And from
>>>>>>my experience, I'm sure iSCSI lacks stability under heavy I/O workload
>>>>>>in production environments; it can cause either the VM filesystem to be
>>>>>>marked as read-only or a VM kernel panic.
>>>>>>
>>>>>>>>Under the #b approach, my former experience from a previous similar
>>>>>>>>cloud deployment (not OpenStack) was that, with 2 PC-server storage
>>>>>>>>nodes (ordinary *local SAS disks*, without any storage backend) +
>>>>>>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>>>>>VMs in a minute.
>>>>>>> Suppose booting one instance requires reading 300MB of data, so 500 of
>>>>>>> them require 150GB. Each storage server needs to send data at a rate of
>>>>>>> 150GB/2/60 = 1.25GB/s on average. This is absolutely a heavy burden even
>>>>>>> for high-end storage appliances. In production systems, this request
>>>>>>> (booting 500 VMs in one shot) will significantly disturb other running
>>>>>>> instances accessing the same storage nodes.
>>>>
>>>>btw, I believe the case/numbers are not accurate either, since remote
>>>>image bits could be loaded on demand instead of loading them all at boot
>>>>time.
>>>>
>>>>zhiyan
>>>>
>>>>>>> VMThunder eliminates this problem by P2P transferring and
>>>>>>> on-compute-node caching. Even a PC server with one 1Gb NIC (this is a
>>>>>>> true PC server!) can boot 500 VMs in a minute with ease.
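(The back-of-the-envelope numbers quoted above are easy to reproduce; a tiny
sketch using the same assumptions as the quoted message -- 300 MB read per boot,
500 VMs, 2 storage servers, a 60-second window -- all figures taken from the
thread rather than measured:)

    # Back-of-the-envelope check: how much data each storage server must push
    # to boot N full-copy VMs within a given time window.
    image_read_mb = 300.0     # data read per instance boot (thread's assumption)
    vm_count = 500
    storage_servers = 2
    window_s = 60.0

    total_gb = image_read_mb * vm_count / 1024.0            # roughly 146 GB in total
    per_server_gbps = total_gb / storage_servers / window_s
    print("total ~%.0f GB, ~%.2f GB/s per storage server" % (total_gb, per_server_gbps))
    # -> about 1.2 GB/s of sustained reads per server, i.e. ~10 Gbit/s each,
    #    which is what makes booting 500 full-copy VMs at once so expensive.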
>>>>>>> For the first time, VMThunder makes bulk provisioning of VMs practical
>>>>>>> for production cloud systems. This is the essential value of VMThunder.
>>>>>>
>>>>>>As I said, Nova already has an image caching mechanism, so in this case
>>>>>>P2P is just an approach that could be used for downloading or preheating
>>>>>>images for the cache.
>>>>>>
>>>>>>I think P2P transferring/pre-caching sounds a good way to go, as I
>>>>>>mentioned as well, but for this area I'd actually like to see something
>>>>>>like zero-copy + CoR. On the one hand we can leverage the capability of
>>>>>>downloading image bits on demand with a zero-copy approach; on the other
>>>>>>hand we can avoid reading data from the remote image every time by using
>>>>>>CoR.
>>>>>>
>>>>>>zhiyan
>>>>>>
>>>>>>> ===================================================
>>>>>>> From: Zhi Yan Liu <lzy....@gmail.com>
>>>>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>>>>>> process of a number of vms via VMThunder
>>>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>>>> <openstack-dev@lists.openstack.org>
>>>>>>>
>>>>>>> Hello Yongquan Fu,
>>>>>>>
>>>>>>> My thoughts:
>>>>>>>
>>>>>>> 1. Nova already supports an image caching mechanism. It caches the
>>>>>>> image on the compute host that a VM was provisioned from before, so the
>>>>>>> next provisioning (booting the same image) does not need to transfer it
>>>>>>> again, unless the cache manager has cleaned it up.
>>>>>>> 2. P2P transferring and prefetching are still based on a copy
>>>>>>> mechanism. IMHO, a zero-copy approach is better, and even
>>>>>>> transferring/prefetching could be optimized by such an approach. (I have
>>>>>>> not checked VMThunder's "on-demand transferring", but it is a kind of
>>>>>>> transferring as well, at least going by its literal meaning.)
>>>>>>> And btw, IMO, there are two ways we can follow the zero-copy idea:
>>>>>>> a. When Nova and Glance use the same backend storage, we could use the
>>>>>>> storage's own CoW/snapshot capability to prepare the VM disk instead of
>>>>>>> copying/transferring image bits (over HTTP/the network or by local copy).
>>>>>>> b. Without such "unified" storage, we could attach a volume/LUN from the
>>>>>>> backend storage to the compute node as a base image, then do the
>>>>>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This is
>>>>>>> just like boot-from-volume, but the difference is that we do the
>>>>>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
>>>>>>>
>>>>>>> For option #a, we have already made some progress:
>>>>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>>>>
>>>>>>> With the #b approach, my former experience from a previous similar
>>>>>>> cloud deployment (not OpenStack) was that, with 2 PC-server storage
>>>>>>> nodes (ordinary *local SAS disks*, without any storage backend) +
>>>>>>> 2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>>>> VMs in a minute.
>>>>>>>
>>>>>>> On the VMThunder topic, I think it sounds like a good idea; IMO P2P
>>>>>>> prefetching is a valuable optimization for image transferring.
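(To make option #b above concrete, a minimal sketch of the idea under stated
assumptions: attach a base-image LUN to the compute node over iSCSI, then create
a local qcow2 copy-on-write overlay on top of it for the VM's root disk. The
portal address, IQN, device path and file paths below are placeholders, error
handling is omitted, and this is only an illustration of the approach rather
than code from VMThunder or Nova.)

    # Sketch: attach a base image LUN over iSCSI, then create a local qcow2
    # CoW overlay on top of it for the instance's root disk.
    import subprocess

    portal = "192.0.2.10:3260"                         # placeholder storage node
    iqn = "iqn.2014-04.org.example:base-image-lun"     # placeholder target IQN

    # Discover and log in to the iSCSI target exposing the base image.
    subprocess.check_call(["iscsiadm", "-m", "discovery", "-t", "sendtargets",
                           "-p", portal])
    subprocess.check_call(["iscsiadm", "-m", "node", "-T", iqn, "-p", portal,
                           "--login"])

    # The attached LUN shows up as a block device; the exact path depends on
    # the host (shown here via the by-path naming convention).
    base_device = "/dev/disk/by-path/ip-%s-iscsi-%s-lun-0" % (portal, iqn)

    # Create a qcow2 overlay whose backing file is the attached base device,
    # so only blocks written by the guest consume local space -- CoW done on
    # the Nova/compute side, as in option #b.
    subprocess.check_call(["qemu-img", "create", "-f", "qcow2",
                           "-b", base_device,
                           "/var/lib/nova/instances/<uuid>/disk"])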
>>>>>>>
>>>>>>> zhiyan
>>>>>>>
>>>>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> We would like to present an extension to the VM-booting functionality
>>>>>>>> of Nova for the case where a number of homogeneous VMs need to be
>>>>>>>> launched at the same time.
>>>>>>>>
>>>>>>>> The motivation for our work is to increase the speed of provisioning
>>>>>>>> VMs for large-scale scientific computing and big data processing. In
>>>>>>>> such cases, we often need to boot tens or hundreds of virtual machine
>>>>>>>> instances at the same time.
>>>>>>>>
>>>>>>>> Currently, in OpenStack, we found that creating a large number of
>>>>>>>> virtual machine instances is very time-consuming. The reason is that
>>>>>>>> the booting procedure is a centralized operation with performance
>>>>>>>> bottlenecks. Before a virtual machine can actually be started,
>>>>>>>> OpenStack either copies the image file (Swift) or attaches the image
>>>>>>>> volume (Cinder) from the storage server to the compute node over the
>>>>>>>> network. Booting a single VM needs to read a large amount of image
>>>>>>>> data from the image storage server, so creating a large number of
>>>>>>>> virtual machine instances causes a significant workload on the
>>>>>>>> servers. The servers become very busy, or even unavailable, during the
>>>>>>>> deployment phase, and it takes a very long time before the whole
>>>>>>>> virtual machine cluster is usable.
>>>>>>>>
>>>>>>>> Our extension is based on our work on VMThunder, a novel mechanism for
>>>>>>>> accelerating the deployment of a large number of virtual machine
>>>>>>>> instances. It is written in Python and can be integrated with
>>>>>>>> OpenStack easily. VMThunder addresses the problem described above with
>>>>>>>> the following improvements: on-demand transferring (network-attached
>>>>>>>> storage), compute-node caching, P2P transferring and prefetching.
>>>>>>>> VMThunder is a scalable and cost-effective accelerator for bulk
>>>>>>>> provisioning of virtual machines.
>>>>>>>>
>>>>>>>> We hope to receive your feedback. Any comments are extremely welcome.
>>>>>>>> Thanks in advance.
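(For reference on the "attach the image volume (cinder)" path mentioned above,
a minimal sketch of the existing convert-image-to-volume / boot-from-volume
flow, assuming the 2014-era python-cinderclient API; credentials, IDs and sizes
are placeholders, and the exact call signatures and block-device-mapping format
may vary between client versions.)

    # Sketch: create a bootable volume from a Glance image, then boot a server
    # from that volume instead of copying the image to the compute node.
    from cinderclient.v1 import client as cinder_client

    # Placeholder credentials/endpoint; in a real deployment these come from
    # the environment or a config file.
    cinder = cinder_client.Client("admin", "secret", "demo",
                                  "http://controller:5000/v2.0")

    image_id = "11111111-2222-3333-4444-555555555555"   # placeholder Glance image ID
    volume = cinder.volumes.create(size=10,             # GB; must hold the image
                                   imageRef=image_id,   # populate volume from the image
                                   display_name="base-image-vol")

    # Once the volume is available, an instance can be booted from it, e.g.:
    #   nova boot --flavor m1.small \
    #       --block-device-mapping vda=<volume-id>:::0 vm-from-volume
    print("created volume %s from image %s" % (volume.id, image_id))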
>>>>>>>>
>>>>>>>> PS:
>>>>>>>>
>>>>>>>> VMThunder enhanced nova blueprint:
>>>>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>>>> VMThunder standalone project: https://launchpad.net/vmthunder
>>>>>>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>>>> VMThunder portal: http://www.vmthunder.org/
>>>>>>>> VMThunder paper:
>>>>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> vmThunder development group
>>>>>>>> PDL
>>>>>>>> National University of Defense Technology
>>>>>>>
>>>>>>> --
>>>>>>> Yongquan Fu
>>>>>>> PhD, Assistant Professor,
>>>>>>> National Key Laboratory for Parallel and Distributed
>>>>>>> Processing, College of Computer Science, National University of Defense
>>>>>>> Technology, Changsha, Hunan Province, P.R. China
>>>>>>> 410073

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev