Hello! Your scheme is fine, but you can't divide I/O load with cgroup blkio (ioprio/iolimit/iopslimit) between different folders; between different ZVOLs you can (rough sketch below).
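To show what I mean, here is a rough sketch with plain cgroup v1 blkio throttling (the ZVOL device /dev/zd16, the cgroup name ct101 and the numbers are made up for the example; this is not vzctl syntax):

# find the major:minor of the ZVOL backing the container
ls -l /dev/zd16        # e.g. "brw-rw---- 1 root disk 230, 16 ... /dev/zd16"

# cap this one container at ~10 MB/s for reads and writes
echo "230:16 10485760" > /sys/fs/cgroup/blkio/ct101/blkio.throttle.read_bps_device
echo "230:16 10485760" > /sys/fs/cgroup/blkio/ct101/blkio.throttle.write_bps_device

# there is nothing similar for a plain directory on a shared dataset:
# blkio throttling works per block device, and with the per-folder scheme
# all containers sit on the same pool devices.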
I could imagine the following problems for the per-folder scheme:

1) You can't limit the number of inodes in different folders. There is no inode limit in ZFS like in ext4, but a huge amount of files in a container could still break the node (see the small sketch after this list); http://serverfault.com/questions/503658/can-you-set-inode-quotas-in-zfs
2) Problems with the system cache, which is used by all containers on the HWN together.
3) Problems with live migration, because you _should_ change inode numbers on different nodes.
4) ZFS behaviour with Linux software in some cases is very STRANGE (DIRECT_IO).
5) ext4 has good support from vzctl (fsck, resize2fs).

My reasoning is much like the simfs vs ploop comparison: http://openvz.org/images/f/f3/Ct_in_a_file.pdf
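About point 1: with one dataset per CT you can at least cap disk space per container, but as far as I know there is still no inode quota property, you can only watch the numbers. A rough sketch (dataset name is from the example below, the limit is made up):

zfs set refquota=60G vz/private/101        # space cap for the CT's own data
zfs get refquota,used,referenced vz/private/101

# the number of files/inodes can only be monitored, not limited:
df -i /vz/private/101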
On Thu, Jul 10, 2014 at 12:06 PM, Pavel Snajdr <li...@snajpa.net> wrote:
> On 07/09/2014 06:58 PM, Kir Kolyshkin wrote:
>> On 07/08/2014 11:54 PM, Pavel Snajdr wrote:
>>> On 07/08/2014 07:52 PM, Scott Dowdle wrote:
>>>> Greetings,
>>>>
>>>> ----- Original Message -----
>>>>> (offtopic) We can not use ZFS. Unfortunately, NAS with something like
>>>>> Nexenta is to expensive for us.
>>>> From what I've gathered from a few presentations, ZFS on Linux
>>>> (http://zfsonlinux.org/) is as stable but more performant than it is on
>>>> the OpenSolaris forks... so you can build your own if you can spare the
>>>> people to learn the best practices.
>>>>
>>>> I don't have a use for ZFS myself so I'm not really advocating it.
>>>>
>>>> TYL,
>>>>
>>> Hi all,
>>>
>>> we run tens of OpenVZ nodes (bigger boxes, 256G RAM, 12cores+, 90 CTs at
>>> least). We've used to run ext4+flashcache, but ext4 has proven to be a
>>> bottleneck. That was the primary motivation behind ploop as far as I know.
>>>
>>> We've switched to ZFS on Linux around the time Ploop was announced and I
>>> didn't have second thoughts since. ZFS really *is* in my experience the
>>> best filesystem there is at the moment for this kind of deployment -
>>> especially if you use dedicated SSDs for ZIL and L2ARC, although the
>>> latter is less important. You will know what I'm talking about when you
>>> try this on boxes with lots of CTs doing LAMP load - databases and their
>>> synchronous writes are the real problem, which ZFS with dedicated ZIL
>>> device solves.
>>>
>>> Also there is the ARC caching, which is smarter then linux VFS cache -
>>> we're able to achieve about 99% of hitrate at about 99% of the time,
>>> even under high loads.
>>>
>>> Having said all that, I recommend everyone to give ZFS a chance, but I'm
>>> aware this is yet another out-of-mainline code and that doesn't suit
>>> everyone that well.
>>>
>>
>> Are you using per-container ZVOL or something else?
>
> That would mean I'd need to do another filesystem on top of ZFS, which
> would in turn mean I'd add another unnecessary layer of indirection. ZFS
> is a pooled storage like BTRFS is, we're giving one dataset to each
> container.
>
> vzctl tries to move the VE_PRIVATE folder around, so we had to add one
> more directory to put the VE_PRIVATE data into (see the first ls).
>
> Example from production:
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # zpool status vz
>   pool: vz
>  state: ONLINE
>   scan: scrub repaired 0 in 1h24m with 0 errors on Tue Jul 8 16:22:17 2014
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         vz          ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             sda     ONLINE       0     0     0
>             sdb     ONLINE       0     0     0
>           mirror-1  ONLINE       0     0     0
>             sde     ONLINE       0     0     0
>             sdf     ONLINE       0     0     0
>           mirror-2  ONLINE       0     0     0
>             sdg     ONLINE       0     0     0
>             sdh     ONLINE       0     0     0
>         logs
>           mirror-3  ONLINE       0     0     0
>             sdc3    ONLINE       0     0     0
>             sdd3    ONLINE       0     0     0
>         cache
>           sdc5      ONLINE       0     0     0
>           sdd5      ONLINE       0     0     0
>
> errors: No known data errors
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # zfs list
> NAME             USED  AVAIL  REFER  MOUNTPOINT
> vz               432G  2.25T    36K  /vz
> vz/private       427G  2.25T   111K  /vz/private
> vz/private/101  17.7G  42.3G  17.7G  /vz/private/101
> <snip>
> vz/root          104K  2.25T   104K  /vz/root
> vz/template     5.38G  2.25T  5.38G  /vz/template
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # zfs get compressratio vz/private/101
> NAME            PROPERTY       VALUE  SOURCE
> vz/private/101  compressratio  1.38x  -
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # ls /vz/private/101
> private
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # ls /vz/private/101/private/
> aquota.group  aquota.user  b  bin  boot  dev  etc  git  home  lib
> <snip>
>
> [r...@node2.prg.vpsfree.cz]
>  ~ # cat /etc/vz/conf/101.conf | grep -P "PRIVATE|ROOT"
> VE_ROOT="/vz/root/101"
> VE_PRIVATE="/vz/private/101/private"

--
Sincerely yours,
Pavel Odintsov

_______________________________________________
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users