Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-24 Thread Ted Miller

On 2/23/2014 3:20 PM, Ayal Baron wrote:

- Original Message -

On Sun, Feb 23, 2014 at 4:27 AM, Ayal Baron aba...@redhat.com wrote:

- Original Message -

I'm looking for some opinions on this configuration in an effort to increase
write performance:

3 storage nodes using glusterfs in replica 3, quorum.

gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd
rely on this.


Glusterfs or RHSS doesn't support rep 3? How could I create a quorum
without 3+ hosts?

glusterfs has the capability but it hasn't been widely tested with oVirt yet 
and we've already found a couple of issues there.
afaiu gluster has the ability to define a tie breaker (a third node which is 
part of the quorum but does not provide a third replica of the data).


I've been researching glusterfs for quite a while, and have had a 3-node 
replica up and running, but I have never heard of a tie-breaker node.  Is this 
documented anywhere?  I could use something like that.


I will be testing a 3-node ovirt + gluster setup, hopefully still this week.



Ovirt storage domain via NFS

why NFS and not gluster?


Gluster via posix SD doesn't have any performance gains over NFS, maybe the
opposite.

gluster via posix SD mounts the volume using the gluster fuse client, which should 
provide better performance + availability than NFS.


Gluster 'native' SDs are broken on EL6.5, so I have been unable to test
performance. I have heard performance can be upwards of 3x NFS for raw
write.

Broken how?


Gluster doesn't have an async write option, so it's doubtful it will ever be
close to NFS async speeds.



Volume set nfs.trusted-sync on
On Ovirt, taking snapshots often enough to recover from a storage crash

Note that this would have negative write performance impact


The difference between NFS sync (50MB/s) and async (300MB/s on 10g) write
speeds should more than compensate for the performance hit of taking
snapshots more often. And that's just raw speed. If we take into
consideration IOPS (guest small writes) async is leaps and bounds ahead.

I would test this, since qemu is already doing async I/O (using threads when 
native AIO is not supported) and oVirt runs it with cache=none (direct I/O), so 
sync ops should not happen that often (it depends on the guest).  You may still be 
enjoying a performance boost, but I've seen UPS systems fail before, bringing down 
multiple nodes at once.
In addition, if you do not guarantee your data is safe when you create a 
snapshot (and it doesn't seem like you do), then I see no reason to think your 
snapshots are any better off than the latest state on disk.



If we assume the site has backup UPS and generator power and we can build a
highly available storage system with 3 nodes in quorum, are there any
potential issues other than a write performance hit?

The issue I thought might be most prevalent is that if an ovirt host goes down
and the VMs are automatically brought back up on another host, they could
incur disk corruption and need to be brought back down and restored to the
last snapshot state. This basically means the HA feature should be disabled.

I'm not sure I understand what your concern is here; what would cause the data 
corruption?  If your node crashed then there is no I/O in flight, so starting 
up the VM should be perfectly safe.
It seems to me that either the VM can start and clean up its own disk, or it 
can't, the same as a bare-metal computer after a crash.  I have not experienced 
any additional corruption opportunities, and I see no reason not to use 
HA.  The worst that happens is that the boot hangs and you have to revert to a 
snapshot.  My experience (in bare metal and other virtualization 
environments) is that about 98% of the time the computer will reboot after an 
unclean shutdown (power failure or its virtual equivalent).  I have done this to 
virtual machines more times than I want to admit.


Maybe you need to tell us more about what you have in mind as far as 
corruption, so that we can either confirm or debunk your concerns.



Even worse, if the gluster node with CTDB NFS IP goes down, it may not have
written out and replicated to its peers.  -- I think I may have just
answered my own question.

If 'trusted-sync' means that the CTDB NFS node acks the I/O before it has reached 
quorum, then I'd say that's a gluster bug.  It can ack the I/O before the data 
hits the disk, but it should not ack it before it has quorum.
However, the configuration we feel comfortable running gluster with is one where 
both server-side and client-side quorum are enabled (gluster has 2 different 
quorum settings and you need to configure both to work safely).
Gluster does not replicate to its peers after the fact.  Gluster writes to all 
peers at the same time, as part of the original write process.  If quorum is on, 
the process either works or it doesn't.


I assume you are keeping in mind that the kernel NFS server does not get 
along with gluster.  You need to run gluster's own NFS server, and turn off 
the kernel NFS server.  Gluster's own NFS server is 
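
On EL6 that amounts to something like this (a rough sketch only; service names
assumed, volume name made up):

  service nfs stop; chkconfig nfs off          # kernel NFS server off
  service rpcbind start                        # gluster NFS registers with rpcbind
  gluster volume set vmstore nfs.disable off   # gluster NFS is on by default anyway
  gluster volume status vmstore                # should list an "NFS Server" process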

Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-24 Thread Ayal Baron


- Original Message -
 On 2/23/2014 3:20 PM, Ayal Baron wrote:
  - Original Message -
  On Sun, Feb 23, 2014 at 4:27 AM, Ayal Baron aba...@redhat.com wrote:
  - Original Message -
  I'm looking for some opinions on this configuration in an effort to
  increase
  write performance:
 
  3 storage nodes using glusterfs in replica 3, quorum.
  gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd
  rely on this.
 
  Glusterfs or RHSS doesn't support rep 3? How could I create a quorum
  without 3+ hosts?
  glusterfs has the capability but it hasn't been widely tested with oVirt
  yet and we've already found a couple of issues there.
  afaiu gluster has the ability to define a tie breaker (a third node which
  is part of the quorum but does not provide a third replica of the data).
 
 I've been researching glusterfs for quite a while, and have had a 3-node
 replica up and running, but I have never heard of a tie-breaker node.  Is this
 documented anywhere?  I could use something like that.

Deferring to Vijay who can give much more qualified answers on these matters.

 
 I will be testing a 3-node ovirt + gluster setup, hopefully still this week.
 
  Ovirt storage domain via NFS
  why NFS and not gluster?
 
  Gluster via posix SD doesn't have any performance gains over NFS, maybe
  the
  opposite.
  gluster via posix SD mounts the volume using the gluster fuse client, which should
  provide better performance + availability than NFS.
 
  Gluster 'native' SDs are broken on EL6.5, so I have been unable to test
  performance. I have heard performance can be upwards of 3x NFS for raw
  write.
  Broken how?
 
  Gluster doesn't have an async write option, so it's doubtful it will ever be
  close to NFS async speeds.
 
 
  Volume set nfs.trusted-sync on
  On Ovirt, taking snapshots often enough to recover from a storage crash
  Note that this would have negative write performance impact
 
  The difference between NFS sync (50MB/s) and async (300MB/s on 10g)
  write
  speeds should more than compensate for the performance hit of taking
  snapshots more often. And that's just raw speed. If we take into
  consideration IOPS (guest small writes) async is leaps and bounds ahead.
  I would test this, since qemu is already doing async I/O (using threads
  when native AIO is not supported) and oVirt runs it with cache=none
  (direct I/O), so sync ops should not happen that often (it depends on the guest).
  You may still be enjoying a performance boost, but I've seen UPS systems
  fail before, bringing down multiple nodes at once.
  In addition, if you do not guarantee your data is safe when you create a
  snapshot (and it doesn't seem like you do), then I see no reason to think
  your snapshots are any better off than the latest state on disk.
 
 
  If we assume the site has backup UPS and generator power and we can build
  a
  highly available storage system with 3 nodes in quorum, are there any
  potential issues other than a write performance hit?
 
  The issue I thought might be most prevalent is that if an ovirt host goes down
  and the VMs are automatically brought back up on another host, they could
  incur disk corruption and need to be brought back down and restored to the
  last snapshot state. This basically means the HA feature should be
  disabled.
  I'm not sure I understand what your concern is here; what would cause the
  data corruption?  If your node crashed then there is no I/O in flight, so
  starting up the VM should be perfectly safe.
 It seems to me that either the VM can start and clean up its own disk, or it
 can't, the same as a bare-metal computer after a crash.  I have not experienced
 any additional corruption opportunities, and I see no reason not to use
 HA.  The worst that happens is that the boot hangs and you have to revert to a
 snapshot.  My experience (in bare metal and other virtualization
 environments) is that about 98% of the time the computer will reboot after an
 unclean shutdown (power failure or its virtual equivalent).  I have done this to
 virtual machines more times than I want to admit.
 
 Maybe you need to tell us more about what you have in mind as far as
 corruption, so that we can either confirm or debunk your concerns.
 
  Even worse, if the gluster node with CTDB NFS IP goes down, it may not
  have
  written out and replicated to its peers.  -- I think I may have just
  answered my own question.
  If 'trusted-sync' means that the CTDB NFS node acks the I/O before it has
  reached quorum, then I'd say that's a gluster bug.  It can ack the I/O
  before the data hits the disk, but it should not ack it before it has quorum.
  However, the configuration we feel comfortable running gluster with is one where
  both server-side and client-side quorum are enabled (gluster has 2 different
  quorum settings and you need to configure both to work safely).
 Gluster does not replicate to its peers after the fact.  Gluster writes to all
 peers at the same time, as part of the original write process.  If quorum is on,
 the process either works or it doesn't.

Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-23 Thread Ayal Baron


- Original Message -
 I'm looking for some opinions on this configuration in an effort to increase
 write performance:
 
 3 storage nodes using glusterfs in replica 3, quorum.

gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd rely on 
this.

 Ovirt storage domain via NFS

why NFS and not gluster?

 Volume set nfs.trusted-sync on
 On Ovirt, taking snapshots often enough to recover from a storage crash

Note that this would have negative write performance impact

 Using CTDB to manage NFS storage domain IP, moving it to another storage node
 when necessary
 
 Something along the lines of EC2's data consistency model, where only
 snapshots can be considered reliable. The added advantage with oVirt would be
 memory consistency at the time of the snapshot as well.
 
 Feedback appreciated, including 'you are insane for thinking this is a good
 idea' (and some supported reasoning would be great).
 
 Thanks,
 
 
 
 Steve Dainard
 IT Infrastructure Manager
 Miovision | Rethink Traffic
 
 Blog | LinkedIn | Twitter | Facebook
 Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON,
 Canada | N2C 1L3
 This e-mail may contain information that is privileged or confidential. If
 you are not the intended recipient, please delete the e-mail and any
 attachments and notify us immediately.
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-23 Thread Steve Dainard
On Sun, Feb 23, 2014 at 4:27 AM, Ayal Baron aba...@redhat.com wrote:



 - Original Message -
  I'm looking for some opinions on this configuration in an effort to
 increase
  write performance:
 
  3 storage nodes using glusterfs in replica 3, quorum.

 gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd
 rely on this.


Glusterfs or RHSS doesn't support rep 3? How could I create a quorum
without 3+ hosts?



  Ovirt storage domain via NFS

 why NFS and not gluster?


Gluster via posix SD doesn't have any performance gains over NFS, maybe the
opposite.

Gluster 'native' SDs are broken on EL6.5, so I have been unable to test
performance. I have heard performance can be upwards of 3x NFS for raw
write.

Gluster doesn't have an async write option, so it's doubtful it will ever be
close to NFS async speeds.



  Volume set nfs.trusted-sync on
  On Ovirt, taking snapshots often enough to recover from a storage crash

 Note that this would have negative write performance impact


The difference between NFS sync (50MB/s) and async (300MB/s on 10g) write
speeds should more than compensate for the performance hit of taking
snapshots more often. And that's just raw speed. If we take into
consideration IOPS (guest small writes) async is leaps and bounds ahead.


If we assume the site has backup UPS and generator power and we can build a
highly available storage system with 3 nodes in quorum, are there any
potential issues other than a write performance hit?

The issue I thought might be most prevalent is that if an ovirt host goes down
and the VMs are automatically brought back up on another host, they could
incur disk corruption and need to be brought back down and restored to the
last snapshot state. This basically means the HA feature should be disabled.

Even worse, if the gluster node with CTDB NFS IP goes down, it may not have
written out and replicated to its peers.  -- I think I may have just
answered my own question.


Thanks,
Steve
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-23 Thread Ayal Baron


- Original Message -
 On Sun, Feb 23, 2014 at 4:27 AM, Ayal Baron aba...@redhat.com wrote:
 
 
 
  - Original Message -
   I'm looking for some opinions on this configuration in an effort to
  increase
   write performance:
  
   3 storage nodes using glusterfs in replica 3, quorum.
 
  gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd
  rely on this.
 
 
 Glusterfs or RHSS doesn't support rep 3? How could I create a quorum
 without 3+ hosts?

glusterfs has the capability but it hasn't been widely tested with oVirt yet 
and we've already found a couple of issues there.
afaiu gluster has the ability to define a tie breaker (a third node which is 
part of the quorum but does not provide a third replica of the data).
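
For illustration only (host and volume names are made up), a setup like that
could look roughly like a replica 2 volume plus a third peer that holds no
bricks but still counts toward server-side quorum:

  gluster peer probe node3                      # node3 holds no bricks, it only votes
  gluster volume create vmstore replica 2 \
      node1:/bricks/vmstore node2:/bricks/vmstore
  gluster volume set vmstore cluster.server-quorum-type server
  gluster volume start vmstore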

 
 
 
   Ovirt storage domain via NFS
 
  why NFS and not gluster?
 
 
 Gluster via posix SD doesn't have any performance gains over NFS, maybe the
 opposite.

gluster via posix SD mounts the volume using the gluster fuse client, which should 
provide better performance + availability than NFS.
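
For example (made-up names, same volume mounted both ways):

  mount -t glusterfs node1:/vmstore /mnt/vmstore-fuse          # fuse client, talks to all bricks
  mount -t nfs -o vers=3,tcp node1:/vmstore /mnt/vmstore-nfs   # gluster NFS, pinned to node1

The fuse mount keeps working if node1 disappears after mounting, while the NFS
mount stays tied to node1, which is why a floating IP (CTDB) is needed at all.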

 
 Gluster 'native' SDs are broken on EL6.5, so I have been unable to test
 performance. I have heard performance can be upwards of 3x NFS for raw
 write.

Broken how?

 
 Gluster doesn't have an async write option, so it's doubtful it will ever be
 close to NFS async speeds.
 
 
 
   Volume set nfs.trusted-sync on
   On Ovirt, taking snapshots often enough to recover from a storage crash
 
  Note that this would have negative write performance impact
 
 
 The difference between NFS sync (50MB/s) and async (300MB/s on 10g) write
 speeds should more than compensate for the performance hit of taking
 snapshots more often. And that's just raw speed. If we take into
 consideration IOPS (guest small writes) async is leaps and bounds ahead.

I would test this, since qemu is already doing async I/O (using threads when 
native AIO is not supported) and oVirt runs it with cache=none (direct I/O), so 
sync ops should not happen that often (it depends on the guest).  You may still be 
enjoying a performance boost, but I've seen UPS systems fail before, bringing down 
multiple nodes at once.
In addition, if you do not guarantee your data is safe when you create a 
snapshot (and it doesn't seem like you do), then I see no reason to think your 
snapshots are any better off than the latest state on disk.
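
For reference, the -drive part of the qemu command line ends up with something
roughly like this (path shortened, purely illustrative):

  -drive file=/rhev/data-center/.../disk.img,if=virtio,cache=none,aio=threads

cache=none opens the image with O_DIRECT, so the host page cache is bypassed;
whether a given write is also forced to stable storage on the server is exactly
where the NFS sync/async distinction comes in.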

 
 
 If we assume the site has backup UPS and generator power and we can build a
 highly available storage system with 3 nodes in quorum, are there any
 potential issues other than a write performance hit?
 
 The issue I thought might be most prevalent is that if an ovirt host goes down
 and the VMs are automatically brought back up on another host, they could
 incur disk corruption and need to be brought back down and restored to the
 last snapshot state. This basically means the HA feature should be disabled.

I'm not sure I understand what your concern is here; what would cause the data 
corruption?  If your node crashed then there is no I/O in flight, so starting 
up the VM should be perfectly safe.

 
 Even worse, if the gluster node with CTDB NFS IP goes down, it may not have
 written out and replicated to its peers.  -- I think I may have just
 answered my own question.

If 'trusted-sync' means that the CTDB NFS node acks the I/O before it has reached 
quorum, then I'd say that's a gluster bug.  It can ack the I/O before the data 
hits the disk, but it should not ack it before it has quorum.
However, the configuration we feel comfortable running gluster with is one where 
both server-side and client-side quorum are enabled (gluster has 2 different 
quorum settings and you need to configure both to work safely).
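
As a rough sketch (volume name made up), the two settings are along the lines of:

  gluster volume set vmstore cluster.quorum-type auto           # client-side quorum
  gluster volume set vmstore cluster.server-quorum-type server  # server-side quorum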


 
 
 Thanks,
 Steve
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-23 Thread Steve Dainard
On Sun, Feb 23, 2014 at 3:20 PM, Ayal Baron aba...@redhat.com wrote:



 - Original Message -
  On Sun, Feb 23, 2014 at 4:27 AM, Ayal Baron aba...@redhat.com wrote:
 
  
  
   - Original Message -
I'm looking for some opinions on this configuration in an effort to
   increase
write performance:
   
3 storage nodes using glusterfs in replica 3, quorum.
  
   gluster doesn't support replica 3 yet, so I'm not sure how heavily I'd
   rely on this.
  
 
  Glusterfs or RHSS doesn't support rep 3? How could I create a quorum
  without 3+ hosts?

 glusterfs has the capability but it hasn't been widely tested with oVirt
 yet and we've already found a couple of issues there.
 afaiu gluster has the ability to define a tie breaker (a third node which
 is part of the quorum but does not provide a third replica of the data).


Good to know, I'll dig into this.



 
 
  
Ovirt storage domain via NFS
  
   why NFS and not gluster?
  
 
  Gluster via posix SD doesn't have any performance gains over NFS, maybe
 the
  opposite.

 gluster via posix SD mounts the volume using the gluster fuse client, which
 should provide better performance + availability than NFS.


Availability for sure, but performance is seriously questionable. I've run
in both scenarios and haven't seen a performance improvement; the general
consensus seems to be that fuse adds overhead and therefore decreases
performance vs. NFS.
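
For reference, the kind of raw sequential-write comparison at issue here can be
as simple as (mount points made up, oflag=direct to bypass the page cache):

  dd if=/dev/zero of=/mnt/vmstore-fuse/testfile bs=1M count=1024 oflag=direct
  dd if=/dev/zero of=/mnt/vmstore-nfs/testfile  bs=1M count=1024 oflag=direct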



 
  Gluster 'native' SDs are broken on EL6.5, so I have been unable to test
  performance. I have heard performance can be upwards of 3x NFS for raw
  write.

 Broken how?


Ongoing issues: libgfapi support wasn't available, and then it was disabled
because snapshot support wasn't built into the kvm packages, which was a
dependency. There are a few threads in reference to this, and some effort
to get CentOS builds to enable snapshot support in kvm.

I have installed rebuilt qemu packages with the RHEV snapshot flag enabled,
and was just able to create a native gluster SD, so maybe I missed something
during a previous attempt. I'll test performance and see if it's close to
what I'm looking for.
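
If it does work, the difference should show up in how the disk is addressed:
with libgfapi, qemu talks to the volume directly via a gluster URI instead of
going through a fuse or NFS mount, roughly (names made up):

  qemu-img create -f qcow2 gluster://node1/vmstore/test.qcow2 10G
  ... -drive file=gluster://node1/vmstore/test.qcow2,if=virtio,cache=none ...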



 
  Gluster doesn't have an async write option, so it's doubtful it will ever be
  close to NFS async speeds.
 
 
  
Volume set nfs.trusted-sync on
On Ovirt, taking snapshots often enough to recover from a storage
 crash
  
   Note that this would have negative write performance impact
  
 
  The difference between NFS sync (50MB/s) and async (300MB/s on 10g)
 write
  speeds should more than compensate for the performance hit of taking
  snapshots more often. And that's just raw speed. If we take into
  consideration IOPS (guest small writes) async is leaps and bounds ahead.

 I would test this, since qemu is already doing async I/O (using threads
 when native AIO is not supported) and oVirt runs it with cache=none (direct
 I/O), so sync ops should not happen that often (it depends on the guest).  You may
 still be enjoying a performance boost, but I've seen UPS systems fail before,
 bringing down multiple nodes at once.
 In addition, if you do not guarantee your data is safe when you create a
 snapshot (and it doesn't seem like you do), then I see no reason to think
 your snapshots are any better off than the latest state on disk.


My logic here was that if a snapshot is run, then the disk and system state
should be consistent at the time of the snapshot, once it's been written to
storage. If the host failed during the snapshot then the snapshot would be
incomplete, and the last complete snapshot would need to be used for recovery.



 
 
  If we assume the site has backup UPS and generator power and we can
 build a
  highly available storage system with 3 nodes in quorum, are there any
  potential issues other than a write performance hit?
 
  The issue I thought might be most prevalent is that if an ovirt host goes down
  and the VMs are automatically brought back up on another host, they could
  incur disk corruption and need to be brought back down and restored to the
  last snapshot state. This basically means the HA feature should be disabled.

 I'm not sure I understand what your concern is here; what would cause the
 data corruption?  If your node crashed then there is no I/O in flight, so
 starting up the VM should be perfectly safe.


Good point, that makes sense.



 
  Even worse, if the gluster node with CTDB NFS IP goes down, it may not
 have
  written out and replicated to its peers.  -- I think I may have just
  answered my own question.

 If 'trusted-sync' means that the CTDB NFS node acks the I/O before it has
 reached quorum, then I'd say that's a gluster bug.


http://gluster.org/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options#nfs.trusted-sync
specifically mentions data won't be guaranteed to be on disk, but doesn't
mention if data would be replicated in memory between gluster nodes.
Technically async breaks the NFS protocol standard anyways but this seems
like a question for the gluster guys, I'll 

[Users] Opinions needed: 3 node gluster replica 3 | NFS async | snapshots for consistency

2014-02-20 Thread Steve Dainard
I'm looking for some opinions on this configuration in an effort to
increase write performance:

3 storage nodes using glusterfs in replica 3, quorum.
Ovirt storage domain via NFS
Volume set nfs.trusted-sync on
On Ovirt, taking snapshots often enough to recover from a storage crash
Using CTDB to manage NFS storage domain IP, moving it to another storage
node when necessary
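
As a concrete sketch of the above (names made up, untested):

  gluster volume create vmstore replica 3 \
      node1:/bricks/vmstore node2:/bricks/vmstore node3:/bricks/vmstore
  gluster volume start vmstore
  gluster volume set vmstore nfs.trusted-sync on   # async NFS acks, the risky part
  # CTDB floats the storage domain IP across the nodes, e.g. in
  # /etc/ctdb/public_addresses on each node:
  #   192.168.1.200/24 eth0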

Something along the lines of EC2's data consistency model, where only
snapshots can be considered reliable. The added advantage with oVirt would be
memory consistency at the time of the snapshot as well.

Feedback appreciated, including 'you are insane for thinking this is a good
idea' (and some supported reasoning would be great).

Thanks,



*Steve Dainard *
IT Infrastructure Manager
Miovision http://miovision.com/ | *Rethink Traffic*

*Blog http://miovision.com/blog  |  **LinkedIn
https://www.linkedin.com/company/miovision-technologies  |  Twitter
https://twitter.com/miovision  |  Facebook
https://www.facebook.com/miovision*
--
 Miovision Technologies Inc. | 148 Manitou Drive, Suite 101, Kitchener, ON,
Canada | N2C 1L3
This e-mail may contain information that is privileged or confidential. If
you are not the intended recipient, please delete the e-mail and any
attachments and notify us immediately.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users