Re: [Users] flashcache

2016-01-15 Thread CoolCold
Let's bump this thread again - has anyone tried dm-cache on OpenVZ/CentOS 6
kernels? It looks like some support is included:
root@mu2:~# fgrep CONFIG_DM_CACHE /boot/config-2.6.32-042stab112.15-el6-openvz
CONFIG_DM_CACHE=m
CONFIG_DM_CACHE_MQ=m
CONFIG_DM_CACHE_CLEANER=m


On the userspace side, I'm using Debian 8 as the host, so the tools should be
there, but I'm not sure whether dm-cache itself is considered stable at all.

On Mon, Nov 16, 2015 at 5:21 PM, Nick Knutov  wrote:
> I've heard this from a large VPS hosting provider.
>
> Anyway, even our internal projects require more than 100 MB/s at peak and more
> than 100 GB of storage (while only 100 GB are free). So local SSDs are
> cheaper for us than a 10G network plus the commercial version of pstorage.
>
> On 16.11.2015 15:22, Corrado Fiore wrote:
>
>> Hi Nick,
>>
>> could you elaborate more on the second point?  As far as I understood,
>> pstorage is in fact targeted towards clusters with hundreds of containers,
>> so I am a bit curious to understand where you got that information.
>>
>> If there's anyone on the list that has used pstorage in clusters > 7 - 9
>> nodes and wishes to share his or her experience, that's more than welcome.
>>
>> Thanks,
>> Corrado
>>
>> 
>> On 16/11/2015, at 4:44 AM, Nick Knutov wrote:
>>
>>> Unfortunately, pstorage has two major disadvantages:
>>>
>>> 1) it's not free
>>> 2) it's not usable for more than 1-4 CTs over a 1-gigabit network in real-world
>>> cases (as far as I know)
>>
>>
>> ___
>> Users mailing list
>> Users@openvz.org
>> https://lists.openvz.org/mailman/listinfo/users
>
>
> --
> Best Regards,
> Nick Knutov
> http://knutov.com
> ICQ: 272873706
> Voice: +7-904-84-23-130
>
> ___
> Users mailing list
> Users@openvz.org
> https://lists.openvz.org/mailman/listinfo/users



-- 
Best regards,
[COOLCOLD-RIPN]

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-16 Thread Alexander Kirov
Hello,

A CT's I/O requirements depend on the user application. A 1 Gbit link is enough
to supply up to 100 MB/s of I/O in PStorage.
Moreover, according to our statistics, HSPs usually see 10-20 MB/s of I/O per node
while running tens of containers, because the I/O is almost entirely random. So a
1G network should be enough for the usual scenarios.
We have many customers running a 1G storage backend in production.
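(For reference, a rough back-of-the-envelope calculation behind the 100 MB/s
figure, with approximate overhead numbers:

  1 Gbit/s = 1000 Mbit/s / 8            = 125 MB/s raw
  minus ~5-10% Ethernet/IP/TCP framing  ~ 112-119 MB/s usable
  => ~100 MB/s of storage traffic is about the practical ceiling of one 1G link.)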

BTW, with PStorage you get an additional benefit: according to our statistics,
20% of the nodes handle 80% of the I/O in a DC. When you unite the disks into
one cluster, the I/O load is balanced much better.

Thanks,
Alexander Kirov
Odin Virtuozzo Storage, PM
Odin


-Original Message-
From: users-boun...@openvz.org [mailto:users-boun...@openvz.org] On Behalf Of 
Corrado Fiore
Sent: Monday, November 16, 2015 1:22 PM
To: OpenVZ users <users@openvz.org>
Subject: Re: [Users] flashcache

Hi Nick,

could you elaborate more on the second point?  As far as I understood, pstorage 
is in fact targeted towards clusters with hundreds of containers, so I am a bit 
curious to understand where you got that information.

If there's anyone on the list that has used pstorage in clusters > 7 - 9 nodes 
and wishes to share his or her experience, that's more than welcome.

Thanks,
Corrado


On 16/11/2015, at 4:44 AM, Nick Knutov wrote:

> Unfortunately, pstorage has two major disadvantages:
> 
> 1) it's not free
> 2) it's not usable for more than 1-4 CTs over a 1-gigabit network in real-world
> cases (as far as I know)


___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-16 Thread Nick Knutov

I've heard this from a large VPS hosting provider.

Anyway, even our internal projects require more than 100 MB/s at peak and
more than 100 GB of storage (while only 100 GB are free). So local SSDs
are cheaper for us than a 10G network plus the commercial version of pstorage.


On 16.11.2015 15:22, Corrado Fiore wrote:

Hi Nick,

could you elaborate more on the second point?  As far as I understood, pstorage 
is in fact targeted towards clusters with hundreds of containers, so I am a bit 
curious to understand where you got that information.

If there's anyone on the list that has used pstorage in clusters > 7 - 9 nodes 
and wishes to share his or her experience, that's more than welcome.

Thanks,
Corrado


On 16/11/2015, at 4:44 AM, Nick Knutov wrote:


Unfortunately, pstorage has two major disadvantages:

1) it's not free
2) it's not usable for more than 1-4 CTs over a 1-gigabit network in real-world
cases (as far as I know)


___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


--
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-16 Thread Konstantin Khorenko

Hi guys,

I'm not sure about flashcache 3.x (whether anybody used it and whether it was
ever compilable against OpenVZ kernels),
but I know for sure that flashcache 2.x compiled fine several months ago, so
if something is broken now it is most probably a simple issue.

So I suggest:
1) file an issue at bugs.openvz.org
2) try to fix it yourself. :)
   At least for flashcache 2.x it should not be a big deal (see the build sketch below).
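A rough sketch of what the 2.x build usually looks like, assuming a flashcache
2.x source tree is already checked out and the matching kernel headers are
installed (the KERNEL_TREE override is how I remember the flashcache Makefile
working - double-check against your tree):

cd flashcache
make KERNEL_TREE=/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz
make install KERNEL_TREE=/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz
modprobe flashcache

If it fails, please attach the exact error to the bugs.openvz.org report.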

In any case, once 1) is done, someone may be able to check it and try to get it
working. Without an issue in Jira, the chances are much lower.

Hope that helps.

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 11/13/2015 03:37 PM, Nick Knutov wrote:


No. Even flashcache 2.x cannot be compiled against recent OpenVZ
RHEL6 kernels.


On 13.11.2015 15:57, CoolCold wrote:

Bumping up - anyone still on flashcache & openvz kernels? Tried to
compile flashcache 3.1.3 dkms against 2.6.32-042stab112.15 , getting
errors:



___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-16 Thread Corrado Fiore
Hi Nick,

could you elaborate more on the second point?  As far as I understood, pstorage 
is in fact targeted towards clusters with hundreds of containers, so I am a bit 
curious to understand where you got that information.

If there's anyone on the list that has used pstorage in clusters > 7 - 9 nodes 
and wishes to share his or her experience, that's more than welcome.

Thanks,
Corrado


On 16/11/2015, at 4:44 AM, Nick Knutov wrote:

> Unfortunately, pstorage has two major disadvantages:
> 
> 1) it's not free
> 2) it's not usable for more than 1-4 CTs over a 1-gigabit network in real-world
> cases (as far as I know)


___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-15 Thread Nick Knutov

Unfortunately, pstorage has two major disadvantages:

1) it's not free
2) it's not usable for more than 1-4 CTs over a 1-gigabit network in
real-world cases (as far as I know)


On 14.11.2015 16:12, Corrado Fiore wrote:

You might want to use Odin Cloud Storage (pstorage) instead, as it goes beyond 
SSD acceleration, i.e. it is distributed and it offers file system corruption 
prevention (background scrubbing).


--
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-14 Thread Corrado Fiore
Hi,

even if FlashCache compiled correctly, I would suggest not using it, as the
performance will most likely be sub-optimal (at least in my experience).

You might want to use Odin Cloud Storage (pstorage) instead, as it goes beyond 
SSD acceleration, i.e. it is distributed and it offers file system corruption 
prevention (background scrubbing).

Another alternative would be to use Btier (www.lessfs.com).  It's been 
extremely stable and very fast in our experience.

Best,
Corrado Fiore


On 13/11/2015, at 8:37 PM, Nick Knutov wrote:

> 
> No. Even flashcache 2.x cannot be compiled against recent OpenVZ RHEL6
> kernels.
> 
> 
> 13.11.2015 15:57, CoolCold пишет:
>> Bumping up - anyone still on flashcache & openvz kernels? Tried to
>> compile flashcache 3.1.3 dkms against 2.6.32-042stab112.15 , getting
>> errors:
> 
> -- 
> Best Regards,
> Nick Knutov
> http://knutov.com
> ICQ: 272873706
> Voice: +7-904-84-23-130
> 
> ___
> Users mailing list
> Users@openvz.org
> https://lists.openvz.org/mailman/listinfo/users


___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-13 Thread CoolCold
Bumping this up - is anyone still running flashcache with OpenVZ kernels? I tried
to compile the flashcache 3.1.3 DKMS module against 2.6.32-042stab112.15 and am
getting errors:

DKMS make.log for flashcache-1.0-227-gc0eeb3d1e539 for kernel
2.6.32-042stab112.15-el6-openvz (x86_64)
Fri Nov 13 13:56:24 MSK 2015
make[1]: Entering directory
'/var/lib/dkms/flashcache/1.0-227-gc0eeb3d1e539/build'
grep: /etc/redhat-release: No such file or directory
make -C /lib/modules/2.6.32-042stab112.15-el6-openvz/build
M=/var/lib/dkms/flashcache/1.0-227-gc0eeb3d1e539/build modules V=0
make[2]: Entering directory
'/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz'
grep: /etc/redhat-release: No such file or directory
  CC [M]  /var/lib/dkms/flashcache/1.0-227-gc0eeb3d1e539/build/flashcache_conf.o
In file included from
/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz/arch/x86/include/asm/timex.h:5:0,
 from include/linux/timex.h:171,
 from include/linux/jiffies.h:8,
 from include/linux/ktime.h:25,
 from include/linux/timer.h:5,
 from include/linux/workqueue.h:8,
 from include/linux/mmzone.h:19,
 from include/linux/gfp.h:4,
 from include/linux/kmod.h:22,
 from include/linux/module.h:13,
 from
/var/lib/dkms/flashcache/1.0-227-gc0eeb3d1e539/build/flashcache_conf.c:26:
/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz/arch/x86/include/asm/tsc.h:
In function ‘vget_cycles’:
/usr/src/linux-headers-2.6.32-042stab112.15-el6-openvz/arch/x86/include/asm/tsc.h:45:2:
error: implicit declaration of function ‘__native_read_tsc’
[-Werror=implicit-function-declaration]
  return (cycles_t)__native_read_tsc();
  ^
In file included from include/linux/sched.h:72:0,
 from include/linux/kmod.h:28,
 from include/linux/module.h:13,
 from
/var/lib/dkms/flashcache/1.0-227-gc0eeb3d1e539/build/flashcache_conf.c:26:
include/linux/signal.h: In function ‘sigaddset’:
include/linux/signal.h:41:6: error: ‘_NSIG_WORDS’ undeclared (first
use in this function)
  if (_NSIG_WORDS == 1)
...

On Fri, Jul 11, 2014 at 4:34 AM, Nick Knutov  wrote:
> I think you are talking about different cases here.
>
> One is making an HA backup node. When we are backing up a full node to
> another node (1:1), zfs send/receive is much better (and the goal is to
> save data, not running processes). Without ZFS, ploop snapshotting plus
> vzmigrate is good enough (over SSD), while rsync with ext4 (simfs inside
> the CT) is a real pain.
>
> The other case is migrating a large number of CTs across a large number of
> nodes for resource-usage balancing [with zero downtime]. There is no
> alternative to vzmigrate here, although zfs send/receive with a
> per-container ZVOL can speed the process up [if it's important to
> transfer between nodes faster with less network usage]
>
> On 10.07.2014 15:35, Pavel Odintsov wrote:
>>> Why? ZFS send/receive is able to do a bit-by-bit identical copy of the FS;
>>> I thought the point of migration is that the CT doesn't notice any
>>> change, so I don't see why the inode numbers should change.
>> Do you have really working zero-downtime vzmigrate on ZFS?
>>
>
> --
> Best Regards,
> Nick Knutov
> http://knutov.com
> ICQ: 272873706
> Voice: +7-904-84-23-130
> ___
> Users mailing list
> Users@openvz.org
> https://lists.openvz.org/mailman/listinfo/users



-- 
Best regards,
[COOLCOLD-RIPN]

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2015-11-13 Thread Nick Knutov


No. Even flashcache 2.x cannot be compiled against recent OpenVZ
RHEL6 kernels.



On 13.11.2015 15:57, CoolCold wrote:

Bumping up - anyone still on flashcache & openvz kernels? Tried to
compile flashcache 3.1.3 dkms against 2.6.32-042stab112.15 , getting
errors:


--
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-10 Thread Pavel Snajdr
On 07/09/2014 06:58 PM, Kir Kolyshkin wrote:
 On 07/08/2014 11:54 PM, Pavel Snajdr wrote:
 On 07/08/2014 07:52 PM, Scott Dowdle wrote:
 Greetings,

 - Original Message -
 (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
 Nexenta is too expensive for us.
 From what I've gathered from a few presentations, ZFS on Linux 
 (http://zfsonlinux.org/) is as stable but more performant than it is on the 
 OpenSolaris forks... so you can build your own if you can spare the people 
 to learn the best practices.

 I don't have a use for ZFS myself so I'm not really advocating it.

 TYL,

 Hi all,

 we run tens of OpenVZ nodes (bigger boxes, 256G RAM, 12cores+, 90 CTs at
 least). We used to run ext4+flashcache, but ext4 proved to be a
 bottleneck. That was the primary motivation behind ploop as far as I know.

 We've switched to ZFS on Linux around the time Ploop was announced and I
 didn't have second thoughts since. ZFS really *is* in my experience the
 best filesystem there is at the moment for this kind of deployment  -
 especially if you use dedicated SSDs for ZIL and L2ARC, although the
 latter is less important. You will know what I'm talking about when you
 try this on boxes with lots of CTs doing LAMP load - databases and their
 synchronous writes are the real problem, which ZFS with dedicated ZIL
 device solves.

 Also there is the ARC caching, which is smarter than the Linux VFS cache -
 we're able to achieve about 99% of hitrate at about 99% of the time,
 even under high loads.

 Having said all that, I recommend everyone to give ZFS a chance, but I'm
 aware this is yet another out-of-mainline code and that doesn't suit
 everyone that well.

 
 Are you using per-container ZVOL or something else?

That would mean I'd need to put another filesystem on top of ZFS, which
would in turn mean adding another unnecessary layer of indirection. ZFS
is pooled storage, like BTRFS; we're giving one dataset to each
container.

vzctl tries to move the VE_PRIVATE folder around, so we had to add one
more directory to put the VE_PRIVATE data into (see the first ls).

Example from production:

[r...@node2.prg.vpsfree.cz]
 ~ # zpool status vz
  pool: vz
 state: ONLINE
  scan: scrub repaired 0 in 1h24m with 0 errors on Tue Jul  8 16:22:17 2014
config:

NAMESTATE READ WRITE CKSUM
vz  ONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
sda ONLINE   0 0 0
sdb ONLINE   0 0 0
  mirror-1  ONLINE   0 0 0
sde ONLINE   0 0 0
sdf ONLINE   0 0 0
  mirror-2  ONLINE   0 0 0
sdg ONLINE   0 0 0
sdh ONLINE   0 0 0
logs
  mirror-3  ONLINE   0 0 0
sdc3ONLINE   0 0 0
sdd3ONLINE   0 0 0
cache
  sdc5  ONLINE   0 0 0
  sdd5  ONLINE   0 0 0

errors: No known data errors

[r...@node2.prg.vpsfree.cz]
 ~ # zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
vz432G  2.25T36K  /vz
vz/private427G  2.25T   111K  /vz/private
vz/private/101   17.7G  42.3G  17.7G  /vz/private/101
snip
vz/root   104K  2.25T   104K  /vz/root
vz/template  5.38G  2.25T  5.38G  /vz/template

[r...@node2.prg.vpsfree.cz]
 ~ # zfs get compressratio vz/private/101
NAMEPROPERTY   VALUE  SOURCE
vz/private/101  compressratio  1.38x  -

[r...@node2.prg.vpsfree.cz]
 ~ # ls /vz/private/101
private

[r...@node2.prg.vpsfree.cz]
 ~ # ls /vz/private/101/private/
aquota.group  aquota.user  b  bin  boot  dev  etc  git  home  lib
snip

[r...@node2.prg.vpsfree.cz]
 ~ # cat /etc/vz/conf/101.conf | grep -P 'PRIVATE|ROOT'
VE_ROOT=/vz/root/101
VE_PRIVATE=/vz/private/101/private
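A rough sketch of how this layout can be reproduced - CT ID 102, the refquota
value and the template name below are made-up examples, and vzctl may want the
private path to be empty before create:

zfs create -o compression=lz4 -o refquota=60G vz/private/102
mkdir /vz/private/102/private
vzctl create 102 --ostemplate centos-6-x86_64 \
    --private /vz/private/102/private --root /vz/root/102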


 ___
 Users mailing list
 Users@openvz.org
 https://lists.openvz.org/mailman/listinfo/users
 

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-10 Thread Aleksandar Ivanisevic
Pavel Odintsov pavel.odint...@gmail.com
writes:

 Hello!

 Yep, Read cache is nice and safe solution but not write cache :)

 No, we do not use ZFS in production yet. We done only very specific
 tests like this: https://github.com/zfsonlinux/zfs/issues/2458 But you
 can do some performance  tests and share :)

Why is everyone insisting on ext4 and even ext4 in individual zvols? I
have done some testing with root and private directly on a zfs file
system and so far everything seems to work just fine.

What am I to expect down the road?


[...]


___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-10 Thread Pavel Odintsov
> Not true, IO limits are working as they should (if we're talking vzctl
> set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO
> accounting support, so it is there.

Can you share tests with us? For standard folders, like simfs, these
limits work badly in a large number of cases.

> How? ZFS doesn't have a limit on number of files (2^48 isn't a limit really)

Is it OK when your customer creates a billion small files on a 10GB VPS
and you then try to archive it for backup? On a slow disk system it's a
real nightmare, because the huge number of disk operations kills your
I/O.

> Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
> I thought the point of migration is to don't have the CT notice any
> change, I don't see why the inode numbers should change.

Do you have really working zero-downtime vzmigrate on ZFS?

> How exactly? I haven't seen a problem with any userspace software, other
> than MySQL default setting to AIO (it fallbacks to older method), which
> ZFS doesn't support (*yet*, they have it in their plans).

I'm speaking about MySQL primarily. I have thousands of containers and I
can't retune MySQL to another mode for all customers; it's impossible.

> L2ARC cache really smart

Yep, fine, I knew that. But can you account for L2ARC cache usage per customer?
OpenVZ can do it via a flag:
sysctl -a | grep pagecache_isola
ubc.pagecache_isolation = 0

But one customer can eat almost all of the L2ARC cache and displace other
customers' data.

I'm not against ZFS, but I am against using ZFS as the underlying system
for containers. We caught ~100 kernel bugs with simfs on ext4 when
customers did strange things.

ext4 has a few thousand developers and they fix such issues ASAP, whereas
ZFS on Linux has only 3-5 developers, which is VERY slow. Because of this
I recommend either ext4 with ploop, because that solution is rock stable,
or ZFS ZVOLs with ext4 on top, because that is more reliable and more
predictable than placing containers directly on ZFS filesystems.
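For reference, a hedged example of the OpenVZ knobs being argued about here
(the CT ID and the numbers are arbitrary):

vzctl set 101 --ioprio 4 --iolimit 20M --iopslimit 1000 --save
# the page-cache isolation flag mentioned above
sysctl -w ubc.pagecache_isolation=1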


On Thu, Jul 10, 2014 at 1:08 PM, Pavel Snajdr li...@snajpa.net wrote:
 On 07/10/2014 10:34 AM, Pavel Odintsov wrote:
 Hello!

 Your scheme is fine, but you can't divide I/O load with cgroup blkio
 (ioprio/iolimit/iopslimit) between different folders, whereas between
 different ZVOLs you can.

 Not true, IO limits are working as they should (if we're talking vzctl
 set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO
 accounting support, so it is there.


 I can imagine the following problems with the per-folder scheme:
 1) Can't limit the number of inodes in different folders (there is no
 inode limit in ZFS like in ext4, but a huge number of files in a
 container could break the node;

 How? ZFS doesn't have a limit on number of files (2^48 isn't a limit really)

 http://serverfault.com/questions/503658/can-you-set-inode-quotas-in-zfs)
 2) Problems with the system cache, which is used by all containers on the HW node together

 This exactly isn't a problem, but a *HUGE* benefit, you'd need to see it
 in practice :) Linux VFS cache is really dumb in comparison to ARC.
 ARC's hitrates just can't be done with what linux currently offers.

 3) Problems with live migration, because you _should_ change inode
 numbers on different nodes

 Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
 I thought the point of migration is to don't have the CT notice any
 change, I don't see why the inode numbers should change.

 4) ZFS behaviour with linux software in some cases is very STRANGE 
 (DIRECT_IO)

 How exactly? I haven't seen a problem with any userspace software, other
 than MySQL default setting to AIO (it fallbacks to older method), which
 ZFS doesn't support (*yet*, they have it in their plans).

 5) ext4 has good support from vzctl (fsck, resize2fs)

 Yeah, but ext4 sucks big time. At least in my use-case.

 We've implemented most of vzctl create/destroy/etc. functionality in our
 vpsAdmin software instead.

 Guys, can I ask you to keep your mind open instead of fighting with
 pointless arguments? :) Give ZFS a try and then decide for yourselves.

 I think the community would benefit greatly if ZFS weren't fought as
 something alien in the Linux world, which in my experience is what
 every Linux zealot I talk to about ZFS does.
 This is just not fair. It's primarily about technology, primarily about
 the best tool for the job. If we can implement something like this in
 Linux but without having ties to CDDL and possibly Oracle patents, that
 would be awesome, yet nobody has done such a thing yet. BTRFS is nowhere
 near ZFS when it comes to running larger scale deployments and in some
 regards I don't think it will ever match ZFS, just looking at the way
 it's been designed.

 I'm not trying to flame here, I'm trying to open you guys to the fact,
 that there really is a better alternative than you're currently seeing.
 And if it has some technological drawbacks like these that you're trying
 to point out, instead of pointing at them as something, which can't be
 

Re: [Users] flashcache

2014-07-10 Thread Pavel Snajdr
On 07/10/2014 11:35 AM, Pavel Odintsov wrote:
 Not true, IO limits are working as they should (if we're talking vzctl
 set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO
 accounting support, so it is there.
 
 Can you share tests with us? For standard folders, like simfs, these
 limits work badly in a large number of cases.

If you can give me concrete tests to run, sure, I'm curious to see if
you're right - then we'd have something concrete to fix :)

 
 How? ZFS doesn't have a limit on number of files (2^48 isn't a limit really)
 
 Is it OK when your customer creates a billion small files on a 10GB VPS
 and you then try to archive it for backup? On a slow disk system it's
 a real nightmare, because the huge number of disk operations kills your
 I/O.

zfs snapshot dataset@snapname
zfs send dataset@snapname > your-file
  (or pipe it: zfs send dataset@snapname | ssh backuper zfs recv backupdataset)

That's done at the block level. No need to run rsync any more; it's a lot
faster this way.

 
 Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
 I thought the point of migration is to don't have the CT notice any
 change, I don't see why the inode numbers should change.
 
 Do you have really working zero downtime vzmigrate on ZFS?

Nope, vzmigrate isn't zero-downtime. Since vzctl/vzmigrate don't
support ZFS, we're implementing this our own way in vpsAdmin, which
in its 2.0 re-implementation will go open source under the GPL.

 
 How exactly? I haven't seen a problem with any userspace software, other
 than MySQL default setting to AIO (it fallbacks to older method), which
 ZFS doesn't support (*yet*, they have it in their plans).
 
 I'm speaking about MySQL primarily. I have thousands of containers and I
 can't retune MySQL to another mode for all customers; it's impossible.

As I said, this is under development and will improve.

 
 L2ARC cache really smart
 
 Yep, fine, I knew. But can you account L2ARC cache usage per customer?
 OpenVZ can it via flag:
 sysctl -a|grep pagecache_isola
 ubc.pagecache_isolation = 0

I can't account for caches per CT, but I didn't have any need to do so.

L2ARC != ARC, ARC is in system RAM, L2ARC is intended to be on SSD for
the content of ARC that is the least significant in case of low memory -
it gets pushed from ARC to L2ARC.

ARC has two primary lists of cached data - most frequently used and most
recently used - and these two lists are divided by a boundary marking
which data can be pushed out in a low-memory situation.

Unlike the Linux VFS cache, copying one big file does not push out all
of the other useful data.

Thanks to this distinction between MRU and MFU, ARC achieves far better hit rates.
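If you want to check the hit rate on a ZoL box yourself, one quick (hedged)
way is to read the kstats directly - this assumes the zfs module is loaded:

awk '/^hits |^misses / { v[$1]=$3 }
     END { printf "ARC hitrate: %.2f%%\n", 100*v["hits"]/(v["hits"]+v["misses"]) }' \
    /proc/spl/kstat/zfs/arcstats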

 
 But one customer can eat almost all L2ARC cache and displace another
 customers data.

Yes, but ZFS keeps track of what's being used, so useful data can't be
pushed out that easily; things naturally balance themselves due to the
way the ARC mechanism works.

 
 I'm not agains ZFS but I'm against of usage ZFS as underlying system
 for containers. We caught ~100 kernel bugs with simfs on EXT4 when
 customers do some strange thinks.

I haven't encountered any problems, especially with vzquota disabled (there is
no need for it: ZFS has its own quotas, which never need to be recalculated
the way vzquota's do).
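Roughly, that means something like this (the dataset name is just an example,
and DISK_QUOTA=no goes into /etc/vz/vz.conf to switch vzquota off globally):

zfs set refquota=60G vz/private/101
zfs get used,refquota vz/private/101

# in /etc/vz/vz.conf
DISK_QUOTA=no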

 
 But ext4 has about few thouasands developers and the fix this issues
 asap but ZFS on Linux has only 3-5 developers which VERY slow.
 Because of this I recommends using ext4 with ploop because this
 solution is rock stable or ZFS with ZVOL's with ext4 because this
 solution if more reliable and more predictable then placing ZFS
 containers on ZFS volumes.

ZFS itself is a stable and mature filesystem; it first shipped as
production code with Solaris in 2006.
It's still being developed upstream as OpenZFS, and that code is shared
between the primary version (Illumos) and the ports (FreeBSD, OS X, Linux).

So what still needs, and is getting, development is the way ZFS is run
under the Linux kernel, but with the recent 0.6.3 release things have
become mature enough to be used in production without fear. Of
course, no software is without bugs, but I can say with absolute
certainty that ZFS will never eat your data; the only problem you may
encounter is with memory management, which is done very
differently in Linux than in ZFS's original habitat, Solaris.

/snajpa

 
 

Re: [Users] flashcache

2014-07-10 Thread Pavel Odintsov
Thank you for your answers! It's really useful information.

On Thu, Jul 10, 2014 at 2:08 PM, Pavel Snajdr li...@snajpa.net wrote:
 On 07/10/2014 11:35 AM, Pavel Odintsov wrote:
 Not true, IO limits are working as they should (if we're talking vzctl
 set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO
 accounting support, so it is there.

 You can share tests with us? For standard folders like simfs this
 limits works bad in big number of cases

 If you can give me concrete tests to run, sure, I'm curious to see if
 you're right - then we'd have something concrete to fix :)


 How? ZFS doesn't have a limit on number of files (2^48 isn't a limit really)

 It's ok when your customer create 1 billion of small files on 10GB VPS
 and you will try to archive it for backup? On slow disk system it's
 really nightmare because a lot of disk operations which kills your
 I/O.

 zfs snapshot dataset@snapname
 zfs send dataset@snapname  your-file or | ssh backuper zfs recv
 backupdataset

 That's done on block level. No need to run rsync anymore, it's a lot
 faster this way.


 Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
 I thought the point of migration is to don't have the CT notice any
 change, I don't see why the inode numbers should change.

 Do you have really working zero downtime vzmigrate on ZFS?

 Nope, vzmigrate isn't zero downtime. Due to vzctl/vzmigrate not
 supporting ZFS, we're implementing this our own way in vpsAdmin, which
 in it's 2.0 re-implementation will go opensource under GPL.


 How exactly? I haven't seen a problem with any userspace software, other
 than MySQL default setting to AIO (it fallbacks to older method), which
 ZFS doesn't support (*yet*, they have it in their plans).

 I speaks about MySQL primarily. I have thousands of containers and I
 can tune MySQL for another mode for all customers, it's impossible.

 As I said, this is under development and will improve.


 L2ARC cache really smart

 Yep, fine, I knew. But can you account L2ARC cache usage per customer?
 OpenVZ can it via flag:
 sysctl -a|grep pagecache_isola
 ubc.pagecache_isolation = 0

 I can't account for caches per CT, but I didn't have any need to do so.

 L2ARC != ARC, ARC is in system RAM, L2ARC is intended to be on SSD for
 the content of ARC that is the least significant in case of low memory -
 it gets pushed from ARC to L2ARC.

 ARC has two primary lists of cached data - most frequently used and most
 recently used and these two lists are divided by a boundary marking
 which data can be pushed away in low mem situation.

 It doesn't happen like with Linux VFS cache that you're copying one big
 file and it pushes out all of the other useful data there.

 Thanks to this distinction of MRU and MFU ARC achieves far better hitrates.


 But one customer can eat almost all L2ARC cache and displace another
 customers data.

 Yes, but ZFS keeps track on what's being used, so useful data can't be
 pushed away that easily, things naturally balance themselves due to the
 way how ARC mechanism works.


 I'm not agains ZFS but I'm against of usage ZFS as underlying system
 for containers. We caught ~100 kernel bugs with simfs on EXT4 when
 customers do some strange thinks.

 I haven't encountered any problems especially with vzquota disabled (no
 need for it, ZFS has its own quotas, which never need to be recalculated
 as with vzquota).


 But ext4 has about few thouasands developers and the fix this issues
 asap but ZFS on Linux has only 3-5 developers which VERY slow.
 Because of this I recommends using ext4 with ploop because this
 solution is rock stable or ZFS with ZVOL's with ext4 because this
 solution if more reliable and more predictable then placing ZFS
 containers on ZFS volumes.

 ZFS itself is a stable and mature filesystem, it first shipped as
 production with Solaris in 2006.
 And it's still being developed upstream as OpenZFS, that code is shared
 between the primary version - Illumos and the ports - FreeBSD, OS X, Linux.

 So what really needs and still is being developed is the way how ZFS is
 run under Linux kernel, but with recent release of 0.6.3, things have
 gotten mature enough to be used in production without any fears. Of
 course, no software is without bugs, but I can say with absolute
 certainty that ZFS will never eat your data, the only problem you can
 encounter is with the memory management, which is done really
 differently in Linux than in ZFS's original habitat - Solaris.

 /snajpa




Re: [Users] flashcache

2014-07-10 Thread Pavel Odintsov
Could you share your patches to vzmigrate and vzctl?

On Thu, Jul 10, 2014 at 2:25 PM, Pavel Odintsov
pavel.odint...@gmail.com wrote:
 Thank you for your answers! It's really useful information.

 On Thu, Jul 10, 2014 at 2:08 PM, Pavel Snajdr li...@snajpa.net wrote:
 On 07/10/2014 11:35 AM, Pavel Odintsov wrote:
 Not true, IO limits are working as they should (if we're talking vzctl
 set --iolimit/--iopslimit). I've kicked the ZoL guys around to add IO
 accounting support, so it is there.

 You can share tests with us? For standard folders like simfs this
 limits works bad in big number of cases

 If you can give me concrete tests to run, sure, I'm curious to see if
 you're right - then we'd have something concrete to fix :)


 How? ZFS doesn't have a limit on number of files (2^48 isn't a limit 
 really)

 It's ok when your customer create 1 billion of small files on 10GB VPS
 and you will try to archive it for backup? On slow disk system it's
 really nightmare because a lot of disk operations which kills your
 I/O.

 zfs snapshot dataset@snapname
 zfs send dataset@snapname  your-file or | ssh backuper zfs recv
 backupdataset

 That's done on block level. No need to run rsync anymore, it's a lot
 faster this way.


 Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
 I thought the point of migration is to don't have the CT notice any
 change, I don't see why the inode numbers should change.

 Do you have really working zero downtime vzmigrate on ZFS?

 Nope, vzmigrate isn't zero downtime. Due to vzctl/vzmigrate not
 supporting ZFS, we're implementing this our own way in vpsAdmin, which
 in it's 2.0 re-implementation will go opensource under GPL.


 How exactly? I haven't seen a problem with any userspace software, other
 than MySQL default setting to AIO (it fallbacks to older method), which
 ZFS doesn't support (*yet*, they have it in their plans).

 I speaks about MySQL primarily. I have thousands of containers and I
 can tune MySQL for another mode for all customers, it's impossible.

 As I said, this is under development and will improve.


 L2ARC cache really smart

 Yep, fine, I knew. But can you account L2ARC cache usage per customer?
 OpenVZ can it via flag:
 sysctl -a|grep pagecache_isola
 ubc.pagecache_isolation = 0

 I can't account for caches per CT, but I didn't have any need to do so.

 L2ARC != ARC, ARC is in system RAM, L2ARC is intended to be on SSD for
 the content of ARC that is the least significant in case of low memory -
 it gets pushed from ARC to L2ARC.

 ARC has two primary lists of cached data - most frequently used and most
 recently used and these two lists are divided by a boundary marking
 which data can be pushed away in low mem situation.

 It doesn't happen like with Linux VFS cache that you're copying one big
 file and it pushes out all of the other useful data there.

 Thanks to this distinction of MRU and MFU ARC achieves far better hitrates.


 But one customer can eat almost all L2ARC cache and displace another
 customers data.

 Yes, but ZFS keeps track on what's being used, so useful data can't be
 pushed away that easily, things naturally balance themselves due to the
 way how ARC mechanism works.


 I'm not agains ZFS but I'm against of usage ZFS as underlying system
 for containers. We caught ~100 kernel bugs with simfs on EXT4 when
 customers do some strange thinks.

 I haven't encountered any problems especially with vzquota disabled (no
 need for it, ZFS has its own quotas, which never need to be recalculated
 as with vzquota).


 But ext4 has about few thouasands developers and the fix this issues
 asap but ZFS on Linux has only 3-5 developers which VERY slow.
 Because of this I recommends using ext4 with ploop because this
 solution is rock stable or ZFS with ZVOL's with ext4 because this
 solution if more reliable and more predictable then placing ZFS
 containers on ZFS volumes.

 ZFS itself is a stable and mature filesystem, it first shipped as
 production with Solaris in 2006.
 And it's still being developed upstream as OpenZFS, that code is shared
 between the primary version - Illumos and the ports - FreeBSD, OS X, Linux.

 So what really needs and still is being developed is the way how ZFS is
 run under Linux kernel, but with recent release of 0.6.3, things have
 gotten mature enough to be used in production without any fears. Of
 course, no software is without bugs, but I can say with absolute
 certainty that ZFS will never eat your data, the only problem you can
 encounter is with the memory management, which is done really
 differently in Linux than in ZFS's original habitat - Solaris.

 /snajpa




Re: [Users] flashcache

2014-07-10 Thread Pavel Snajdr
On 07/10/2014 12:32 PM, Pavel Odintsov wrote:
 Could you share your patches to vzmigrate and vzctl?

We don't have any; where vzctl/vzmigrate didn't satisfy our needs, we
went around these utilities and let vpsAdmin on the HW node
manage things.

You can take a look here:

https://github.com/vpsfreecz/vpsadmind

I wouldn't recommend that anyone outside of our organization use vpsAdmin
yet, as the 2.0 transition to a self-describing RESTful API is still
underway. As soon as it's finished and well documented, I'll post a note
here as well.

The 2.0 version will be controlled primarily via a CLI tool, which
generates itself automatically from the API description.

A running version of the API can be seen here:

https://api.vpsfree.cz/v1/

Github repos:

https://github.com/vpsfreecz/vpsadminapi (the API)
https://github.com/vpsfreecz/vpsadminctl (the CLI tool)

https://github.com/vpsfreecz/vpsadmind (daemon run on the HW node)
https://github.com/vpsfreecz/vpsadmindctl (CLI tool to control the daemon)

https://github.com/vpsfreecz/vpsadmin

The last repo is vpsAdmin 1.x, which all the 2.0 pieces still require in order
to run. It's a pain to get this running yourself, but stay tuned: once we
get rid of 1.x and document 2.0 properly, it's going to be a great thing.

/snajpa

 

Re: [Users] flashcache

2014-07-10 Thread Pavel Snajdr
On 07/10/2014 12:50 PM, Pavel Snajdr wrote:
 On 07/10/2014 12:32 PM, Pavel Odintsov wrote:
 Could you share your patches to vzmigrate and vzctl?
 
 We don't have any, where vzctl/vzmigrate didn't satisfy our needs, we've
 went the way around these utilities and let vpsAdmin on the hwnode
 manage things.
 
 You can take a look here:
 
 https://github.com/vpsfreecz/vpsadmind
 
 I wouldn't recommend anyone outside of our organization to use vpsAdmin
 yet, as the 2.0 transition to self-describing RESTful API is still
 underway. As soon as it's finished and well documented, I'll post a note
 here as well.
 
 The 2.0 version will be primarily controled via a CLI tool, which
 autogenerates itself from the API description.
 
 A running version of the API can be seen here:
 
 https://api.vpsfree.cz/v1/
 
 Github repos:
 
 https://github.com/vpsfreecz/vpsadminapi (the API)
 https://github.com/vpsfreecz/vpsadminctl (the CLI tool)
 
 https://github.com/vpsfreecz/vpsadmind (deamon run on hwnode)
 https://github.com/vpsfreecz/vpsadmindctl (CLI tool to control the daemon)
 
 https://github.com/vpsfreecz/vpsadmin
 
 The last repo is the vpsAdmin 1.x, which all 2.0 things still require to
 run, it's a pain to get this running yourself, but stay tuned, once we
 get rid of 1.x and document 2.0 properly, it's going to be a great thing.
 
 /snajpa
 

Though, if you don't mind managing things via a web interface, vpsAdmin
1.x can be installed through these scripts:

https://github.com/vpsfreecz/vpsadmininstall

/snajpa



Re: [Users] flashcache

2014-07-10 Thread Nick Knutov
There are two important points here:

1) As Pavel wrote, I/O can't be separated easily with one fs (for now, but I
think that may change with cgroups in the future)

2) per-user quota inside a CT is currently supported for ext4 only

These two points may or may not be important to you.

In real life we have never had any I/O issues in production (and we
are migrating to SSD, so I/O is always sufficient now), but most of
our customers' CTs are shared hosting in some way, so having per-user
quota is critical.
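For reference, a hedged example of how second-level (per-UID/GID) quota is
usually switched on for a container - the CT ID and the entry count are
arbitrary, and the container typically needs a restart afterwards:

vzctl set 101 --quotaugidlimit 500 --save
vzctl restart 101
# then the normal quota tools work inside the CT, e.g.:
vzctl exec 101 repquota -a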


On 10.07.2014 14:42, Aleksandar Ivanisevic wrote:
 Why is everyone insisting on ext4 and even ext4 in individual zvols? I
 have done some testing with root and private directly on a zfs file
 system and so far everything seems to work just fine.
 
 What am I to expect down the road?

-- 
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-10 Thread Nick Knutov
I think you are talking about different cases here.

One is making an HA backup node. When we are backing up a full node to
another node (1:1), zfs send/receive is much better (and the goal is to
save data, not running processes). Without ZFS, ploop snapshotting plus
vzmigrate is good enough (over SSD), while rsync with ext4 (simfs inside
the CT) is a real pain.

The other case is migrating a large number of CTs across a large number of
nodes for resource-usage balancing [with zero downtime]. There is no
alternative to vzmigrate here, although zfs send/receive with a
per-container ZVOL can speed the process up [if it's important to
transfer between nodes faster with less network usage].
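For the backup case, a hedged sketch of what that send/receive flow looks
like - the dataset, the snapshot names and the target host "node2" are made-up
examples:

# full copy once
zfs snapshot vz/private/101@base
zfs send vz/private/101@base | ssh node2 zfs recv vz/private/101

# then cheap incrementals on a schedule
zfs snapshot vz/private/101@sync1
zfs send -i @base vz/private/101@sync1 | ssh node2 zfs recv -F vz/private/101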

On 10.07.2014 15:35, Pavel Odintsov wrote:
 Why? ZFS send/receive is able to do bit-by-bit identical copy of the FS,
 I thought the point of migration is to don't have the CT notice any
 change, I don't see why the inode numbers should change.
 Do you have really working zero downtime vzmigrate on ZFS?
 

-- 
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-09 Thread Pavel Snajdr
On 07/08/2014 07:52 PM, Scott Dowdle wrote:
 Greetings,
 
 - Original Message -
 (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
 Nexenta is too expensive for us.
 
 From what I've gathered from a few presentations, ZFS on Linux 
 (http://zfsonlinux.org/) is as stable but more performant than it is on the 
 OpenSolaris forks... so you can build your own if you can spare the people to 
 learn the best practices.
 
 I don't have a use for ZFS myself so I'm not really advocating it.
 
 TYL,
 

Hi all,

we run tens of OpenVZ nodes (bigger boxes: 256G RAM, 12+ cores, at least
90 CTs each). We used to run ext4+flashcache, but ext4 proved to be a
bottleneck. That was the primary motivation behind ploop, as far as I know.

We switched to ZFS on Linux around the time ploop was announced and I
haven't had second thoughts since. ZFS really *is*, in my experience, the
best filesystem there is at the moment for this kind of deployment -
especially if you use dedicated SSDs for the ZIL and L2ARC, although the
latter is less important. You will know what I'm talking about when you
try this on boxes with lots of CTs doing LAMP load - databases and their
synchronous writes are the real problem, which ZFS with a dedicated ZIL
device solves.

Also there is the ARC caching, which is smarter than the Linux VFS cache -
we're able to achieve about a 99% hit rate about 99% of the time,
even under high load.

Having said all that, I recommend everyone give ZFS a chance, but I'm
aware this is yet another piece of out-of-mainline code, and that doesn't
suit everyone that well.

snajpa
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-09 Thread Kir Kolyshkin
On 07/08/2014 11:54 PM, Pavel Snajdr wrote:
 On 07/08/2014 07:52 PM, Scott Dowdle wrote:
 Greetings,

 - Original Message -
 (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
 Nexenta is too expensive for us.
 From what I've gathered from a few presentations, ZFS on Linux 
 (http://zfsonlinux.org/) is as stable but more performant than it is on the 
 OpenSolaris forks... so you can build your own if you can spare the people 
 to learn the best practices.

 I don't have a use for ZFS myself so I'm not really advocating it.

 TYL,

 Hi all,

 we run tens of OpenVZ nodes (bigger boxes, 256G RAM, 12cores+, 90 CTs at
 least). We've used to run ext4+flashcache, but ext4 has proven to be a
 bottleneck. That was the primary motivation behind ploop as far as I know.

 We've switched to ZFS on Linux around the time Ploop was announced and I
 didn't have second thoughts since. ZFS really *is* in my experience the
 best filesystem there is at the moment for this kind of deployment  -
 especially if you use dedicated SSDs for ZIL and L2ARC, although the
 latter is less important. You will know what I'm talking about when you
 try this on boxes with lots of CTs doing LAMP load - databases and their
 synchronous writes are the real problem, which ZFS with dedicated ZIL
 device solves.

 Also there is the ARC caching, which is smarter than the Linux VFS cache -
 we're able to achieve about 99% of hitrate at about 99% of the time,
 even under high loads.

 Having said all that, I recommend everyone to give ZFS a chance, but I'm
 aware this is yet another out-of-mainline code and that doesn't suit
 everyone that well.


Are you using per-container ZVOL or something else?
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Pavel Odintsov
Hi all!

I think it's really not a good idea, because a technology like SSD
caching should be tested _thoroughly_ before production use. But you
could try it with simfs; beware of ploop, though, because it's really not
a standard ext4 - it has custom caches and unexpected behaviour in some
cases.

On Tue, Jul 8, 2014 at 1:59 PM, Aleksandar Ivanisevic
aleksan...@ivanisevic.de wrote:

 Hi,

 is anyone using flashcache with openvz? If so, which version and with
 which kernel? Versions lower than 3 do not compile against the latest
 el6 kernel and version 3.11 and the latest git oopses in
 flashcache_md_write_kickoff with a null pointer.

 I see provisions to detect ovz kernel source in flashcache makefile, so
 someone must be compiling and using it.

 Any other SSD caching software that works with openvz?

 regards,

 ___
 Users mailing list
 Users@openvz.org
 https://lists.openvz.org/mailman/listinfo/users



-- 
Sincerely yours, Pavel Odintsov
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Aleksandar Ivanisevic

I am actually planning on using it only on test systems where I have
commodity SATA disks that are getting a bit overwhelmed. I hope to get
better value from a SATA+SSD combination than I would from SAS disks with
the appropriate controllers and fancy RAID levels, which cost at least
3 times more.

Anyway, it looks like that bug also got fixed in 092.1; at least it doesn't
oops immediately any more.

Pavel Odintsov pavel.odint...@gmail.com
writes:

 Hi all!

 I think it's really not a good idea, because a technology like SSD
 caching should be tested _thoroughly_ before production use. But you
 could try it with simfs; beware of ploop, though, because it's really not
 a standard ext4 - it has custom caches and unexpected behaviour in some
 cases.

 On Tue, Jul 8, 2014 at 1:59 PM, Aleksandar Ivanisevic
 aleksan...@ivanisevic.de wrote:

 Hi,

 is anyone using flashcache with openvz? If so, which version and with
 which kernel? Versions lower than 3 do not compile against the latest
 el6 kernel and version 3.11 and the latest git oopses in
 flashcache_md_write_kickoff with a null pointer.

 I see provisions to detect ovz kernel source in flashcache makefile, so
 someone must be compiling and using it.

 Any other SSD caching software that works with openvz?

 regards,

 ___
 Users mailing list
 Users@openvz.org
 https://lists.openvz.org/mailman/listinfo/users

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Pavel Odintsov
I know of a few incidents of ___FULL___ data loss among users of
flashcache. Beware of it in production.

If you want speed, you can try ZFS with an L2ARC/ZVOL cache, because it's
a native solution.
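For the record, the native way to attach dedicated SSD partitions as ZIL (log)
and L2ARC (cache) devices looks roughly like this - the pool and device names
are just examples:

zpool add vz log mirror /dev/sdc3 /dev/sdd3
zpool add vz cache /dev/sdc5 /dev/sdd5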

On Tue, Jul 8, 2014 at 8:05 PM, Nick Knutov m...@knutov.com wrote:
 We have been using the latest flashcache 2.* with 2.6.32-042stab083.2 in
 production for a long time. We are planning to migrate to 3.0 with the
 latest 090.5, but have not tried it yet.


 On 08.07.2014 15:59, Aleksandar Ivanisevic wrote:

 Hi,

 is anyone using flashcache with openvz? If so, which version and with
 which kernel? Versions lower than 3 do not compile against the latest
 el6 kernel and version 3.11 and the latest git oopses in
 flashcache_md_write_kickoff with a null pointer.

 I see provisions to detect ovz kernel source in flashcache makefile, so
 someone must be compiling and using it.

 Any other SSD caching software that works with openvz?


 --
 Best Regards,
 Nick Knutov
 http://knutov.com
 ICQ: 272873706
 Voice: +7-904-84-23-130
 ___
 Users mailing list
 Users@openvz.org
 https://lists.openvz.org/mailman/listinfo/users



-- 
Sincerely yours, Pavel Odintsov

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Nick Knutov
We are using this only for caching reads (mode thru), not writes.
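For context, a write-through flashcache device is typically created roughly
like this (device names are examples; check the flashcache admin guide for
your version):

flashcache_create -p thru cachedev /dev/ssd /dev/sdb
mount /dev/mapper/cachedev /vz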


(offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
Nexenta is too expensive for us.

Anyway, we are doing a complete migration to SSD. It's just cheaper.


On 08.07.2014 22:23, Pavel Odintsov wrote:
 I knew about few incidents with ___FULL___ data loss from customers of
 flashcache. Beware of it in production.
 
 If you want speed you can try ZFS with l2arc/zvol cache because it's
 native solution.

-- 
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Scott Dowdle
Greetings,

- Original Message -
 (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
 Nexenta is too expensive for us.

From what I've gathered from a few presentations, ZFS on Linux
(http://zfsonlinux.org/) is as stable as, and more performant than, the
OpenSolaris forks... so you can build your own if you can spare the people to
learn the best practices.

I don't have a use for ZFS myself so I'm not really advocating it.

TYL,
-- 
Scott Dowdle
704 Church Street
Belgrade, MT 59714
(406)388-0827 [home]
(406)994-3931 [work]
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Nick Knutov
I know about this project, but what about the stability/compatibility of ZFS
on Linux with the OpenVZ kernel? Has anyone ever tested it?

Also, with ext4 I can always boot into rescue mode at any [of our] datacenters
and, for example, move the data to another/new server. I have no idea how to
get at ZFS data if something goes wrong with the hardware or a recently
installed kernel with the usual

On the other hand, we have been using flashcache in production for about two
years, with zero problems during all that time. It is not as fast as
bcache (which is not compatible with OpenVZ, I think), but it solves the
problem well.
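For what it's worth, reaching ZFS data from a rescue system is usually a matter
of importing the pool, assuming the rescue image can install/load the ZoL
modules (the pool name below is just an example):

modprobe zfs
zpool import          # lists pools found on the attached disks
zpool import -f vz    # -f because the pool was last used by another host
zfs mount -a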


On 08.07.2014 23:52, Scott Dowdle wrote:
 Greetings,
 
 - Original Message -
 (offtopic) We cannot use ZFS. Unfortunately, a NAS with something like
 Nexenta is too expensive for us.
 
 From what I've gathered from a few presentations, ZFS on Linux 
 (http://zfsonlinux.org/) is as stable but more performant than it is on the 
 OpenSolaris forks... so you can build your own if you can spare the people to 
 learn the best practices.
 
 I don't have a use for ZFS myself so I'm not really advocating it.
 
 TYL,
 

-- 
Best Regards,
Nick Knutov
http://knutov.com
ICQ: 272873706
Voice: +7-904-84-23-130
___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users


Re: [Users] flashcache

2014-07-08 Thread Pavel Odintsov
Hello!

Yep, a read cache is a nice and safe solution, but not a write cache :)

No, we do not use ZFS in production yet. We have done only very specific
tests, like this one: https://github.com/zfsonlinux/zfs/issues/2458 But you
can run some performance tests and share the results :)

On Wed, Jul 9, 2014 at 12:55 AM, Nick Knutov m...@knutov.com wrote:
 I read
 http://www.stableit.ru/2014/07/using-zfs-with-openvz-openvzfs.html . Do
 you use it in production? Can you share speed tests or some other
 experience with zfs and openvz?


 On 08.07.2014 22:23, Pavel Odintsov wrote:
 I knew about few incidents with ___FULL___ data loss from customers of
 flashcache. Beware of it in production.

 If you want speed you can try ZFS with l2arc/zvol cache because it's
 native solution.


 --
 Best Regards,
 Nick Knutov
 http://knutov.com
 ICQ: 272873706
 Voice: +7-904-84-23-130
 ___
 Users mailing list
 Users@openvz.org
 https://lists.openvz.org/mailman/listinfo/users



-- 
Sincerely yours, Pavel Odintsov

___
Users mailing list
Users@openvz.org
https://lists.openvz.org/mailman/listinfo/users