Hi Matt, James, any thoughts on the below notes?
Best Regards,
Prema

On 19 Jan 2016 20:47, "Premysl Kouril" <[email protected]> wrote:
> Hi James,
>
> > You still haven't answered Anita's question: when you say "sponsor" do
> > you mean provide resources to existing developers to work on your
> > feature or provide new developers.
>
> I did, I am copy-pasting my response to Anita here again:
>
> Both. We are first trying the "Are you asking for current Nova
> developers to work on this feature?" approach, and if we don't find
> anybody we will move on to "is your company interested in having your
> developers interact with Nova developers".
>
> > Heh, this is history repeating itself from over a decade ago when
> > Oracle would have confidently told you that Linux had to have raw
> > devices because that's the only way a database will perform. Fast
> > forward to today and all Oracle databases use file backends.
> >
> > Simplicity is also in the eye of the beholder. LVM has a very simple
> > naming structure whereas filesystems have complex hierarchical ones.
> > Once you start trying to scale to millions of instances, you'll find
> > there's quite a management penalty for the LVM simplicity.
>
> We definitely won't have millions of instances on our hypervisors, but
> we can certainly have applications demanding a million IOPS (in sum)
> from a hypervisor in the near future.
>
> >> It seems from our benchmarks that LVM behavior when
> >> processing many IOPS (tens of thousands) is more stable than when a
> >> filesystem is used as the backend.
> >
> > It sounds like you haven't enabled directio here ... that was the
> > solution to the oracle issue.
>
> If you mean O_DIRECT mode, then we did have that enabled during our
> benchmarks. Here is our benchmark setup and results:
>
> Testing box configuration:
>
> CPU: 4x E7-8867 v3 (total of 64 physical cores)
> RAM: 1TB
> Storage: 12x enterprise-class SSD disks (each disk 140 000/120 000
> IOPS read/write), connected via 12Gb/s SAS3 lanes
>
> So we are using big boxes which can run quite a lot of VMs.
>
> Out of the disks we create a Linux md RAID array (we tried RAID5 and
> RAID10) and do some fine tuning:
>
> 1) echo 8 > /sys/block/md127/md/group_thread_cnt - this increases
> parallelism for RAID5
> 2) we boot the kernel with scsi_mod.use_blk_mq=Y to activate block IO
> multi-queueing
> 3) we increase the cache size (for RAID5)
>
> On that RAID array we create either an LVM volume group or a
> filesystem, depending on whether we are testing the LVM Nova backend
> or the file-based Nova backend.
>
> On this hypervisor we run Nova/KVM, provision 10-20 VMs, run benchmark
> tests from these VMs, and try to saturate the IO on the hypervisor.
>
> We use the following command, running inside the VMs:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test1 --bs=4k --iodepth=256 --size=20G --numjobs=1
> --readwrite=randwrite
>
> So you can see that in the guest OS we use --direct=1, which causes
> the test file to be opened with O_DIRECT. Actually, I am now not sure,
> but when using the file-based backend I would hope that the virtual
> disk is automatically opened with O_DIRECT and that this is done by
> libvirt/qemu by default, without any explicit configuration.
>
> Anyway, with this we have the following results:
>
> If we use the file-based backend in Nova, an ext4 filesystem and
> RAID5, then with 8 parallel VMs we were able to achieve ~3000 IOPS per
> machine, which means in total about 32000 IOPS.
>
> If we use the LVM-based backend, RAID5 and 8 parallel VMs, we achieve
> ~11000 IOPS per machine, in total about 90000 IOPS.
>
> This is a significant difference.
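
[A side note for anyone who wants to reproduce the comparison: as far
as I know, the two backends discussed above are selected via the Nova
libvirt driver options. A rough sketch of the two nova.conf variants
(the volume group name "nova-vg" is only an illustrative placeholder):

  # file-based backend: virtual disks are files under the instances path
  [libvirt]
  images_type = raw        # or qcow2

  # LVM backend: each virtual disk is a logical volume in the given VG
  [libvirt]
  images_type = lvm
  images_volume_group = nova-vg

Please double-check the option names against the Nova configuration
reference for the release in use.]
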
> This test was done about half a year ago by one of our engineers who
> no longer works for us, but we still have the box and everything, so
> if the community is interested I can re-run the tests, validate the
> results again, do any reconfiguration, etc.
>
> > And this was precisely the Oracle argument. The reason it foundered
> > is that most FS complexity goes into managing the data structures ...
> > the I/O path can still be made short and fast, as DirectIO
> > demonstrates. Then the management penalty you pay (having to manage
> > all the data structures that the filesystem would have managed for
> > you) starts to outweigh any minor performance advantages.
>
> The only thing O_DIRECT does is instruct the kernel to skip the
> filesystem cache for the file opened in this mode. The rest of the
> filesystem complexity remains in the IO datapath. Note, for example,
> that we did a test of the file-based backend with BTRFS - the results
> were absolutely horrible - there is just too much work the filesystem
> has to do when processing IOs, and we believe a lot of it is simply
> not necessary when the storage is only used to store virtual disks.
>
> Anyway, I am really glad that you brought up these views; we are happy
> to reconsider our decisions, so let's have a discussion - I am sure we
> missed many things when we were evaluating both backends.
>
> One more question: what about Cinder? I think they are using LVM for
> storing volumes, right? Why don't they use files?
>
> Thanks,
> Prema
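
One note on the O_DIRECT question raised above (whether the file-backed
virtual disks are opened with O_DIRECT by default): as far as I
understand, this is controlled by the libvirt disk cache mode, where
cache='none' (or 'directsync') makes qemu open the backing file with
O_DIRECT and bypass the host page cache. So rather than assuming the
default, it can be checked on the hypervisor. A rough sketch of what to
look for (the domain name is just a placeholder):

  # inspect the disk definition of a running instance on the hypervisor
  virsh dumpxml instance-00000001 | grep -A 2 '<disk'

  # a file-backed disk opened with O_DIRECT should show something like:
  #   <driver name='qemu' type='raw' cache='none'/>

If my reading of the Nova docs is right, the cache mode can also be set
explicitly via the disk_cachemodes option in the [libvirt] section
(e.g. disk_cachemodes = file=none,block=none), which would make the
comparison above independent of the defaults.
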
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
