Re: [ovirt-users] Glusterfs HA with Ovirt

2014-07-04 Thread Brad House

I was never able to achieve a stable system that could survive the loss of a
single node with glusterfs.  I attempted to use replica 2 across 3 nodes (which
required 2 bricks per node, as the number of bricks must be a multiple of the
replica count, and you have to order them so the brick pairs span servers).  I enabled
server-side quorum, but found out later that client-side quorum is based on
'sub-volumes', which means that with a single node failure on replica 2, even
though there were 3 nodes, it would go into a read-only state.
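
For reference, the brick ordering described above looks roughly like this (a sketch; the volume name, hostnames, and brick paths are placeholders):

# replica 2 across 3 nodes: 6 bricks, listed so every replica pair spans two servers
gluster volume create vmstore replica 2 \
    node1:/gluster/brick1 node2:/gluster/brick1 \
    node2:/gluster/brick2 node3:/gluster/brick2 \
    node3:/gluster/brick3 node1:/gluster/brick3
gluster volume start vmstore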

After disabling client-side quorum (but keeping server-side quorum) I thought the
issue was fixed, but every once in a while, rebooting one of the nodes (after ensuring
gluster was healed) would lead to I/O errors on the VM guest and essentially leave it
needing a reboot (which was successful, and everything worked afterwards even before
bringing the downed node back up).  My nodes were all combined glusterfs and ovirt
nodes.  I tried using both 'localhost' on the nodes as well as a keepalived VIP.
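
For reference, the quorum settings discussed here map to volume options roughly as follows (a sketch; 'vmstore' is a placeholder volume name):

# server-side quorum, enforced by glusterd
gluster volume set vmstore cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 51%
# client-side quorum, enforced per replica sub-volume; with replica 2 this is
# what forces read-only behavior on a single brick failure
gluster volume set vmstore cluster.quorum-type auto
# disabling client-side quorum again
gluster volume set vmstore cluster.quorum-type none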

It's possible my issues were all due to client-side quorum not being enabled,
but surviving a single node failure with it enabled would require replica 3, and
I never pursued testing that theory.  Also, heal times seemed a bit long: healing
a single idle VM consumed 2 full cores of CPU for about 5 minutes (granted, I was
testing on a 1Gbps network, but that doesn't explain the CPU usage).
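
For reference, heal progress can be watched with something like the following (the volume name is a placeholder):

# lists entries still pending heal on each brick; empty output everywhere means the volume is healed
gluster volume heal vmstore info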

-Brad

On 7/4/14 1:29 AM, Andrew Lau wrote:

As long as all your compute nodes are part of the gluster peer,
localhost will work.
Just remember, gluster will connect to any server, so even if you
mount as localhost:/ it could be accessing the storage from another
host in the gluster peer group.


On Fri, Jul 4, 2014 at 3:26 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Andrew,

Yes.. both on the same node... but I have 4 nodes of this type in the same
cluster... So should it work or not??

1. 4 physical nodes with 12 bricks each (distributed replicated)...
2. The same 4 nodes are also used for compute...

Do I still require the VIP or not?? Because I tested it: even when the mount-point
node goes down, the VMs do not pause and are not affected...


On Fri, Jul 4, 2014 at 1:18 PM, Andrew Lau and...@andrewklau.com wrote:


Or just use localhost, as your compute and storage are on the same box.


On Fri, Jul 4, 2014 at 2:48 PM, Punit Dambiwal hypu...@gmail.com wrote:

Hi Andrew,

Thanks for the update... that means HA cannot work without a VIP in gluster, so
it's better to use glusterfs with a VIP to take over the IP in case of any
storage node failure...


On Fri, Jul 4, 2014 at 12:35 PM, Andrew Lau and...@andrewklau.com wrote:


Don't forget to take quorum into consideration; that's something people often
overlook.

The reason you're seeing the current behaviour is that gluster only uses the
initial IP address to fetch the volume details. After that it'll connect directly
to ONE of the servers, so in your 2-storage-server case there's a 50% chance it
won't go into a paused state.
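
One mitigation for that initial volfile fetch (not a substitute for quorum) is to list fallback servers in the mount options, roughly like the sketch below (hostnames are placeholders; older glusterfs releases spell the option backupvolfile-server):

mount -t glusterfs -o backup-volfile-servers=gluster2:gluster3 gluster1:/vmstore /mnt/vmstore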

For the VIP, you could consider CTDB or keepalived, or even just using
localhost (as your storage and compute are all on the same machine).
For CTDB, check out
http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/

I have a BZ open regarding gluster VMs going into a paused state and not
being resumable, so it's something you should also consider. In my case, the
switch dies, the gluster volume goes away, and the VMs go into a paused state
but can't be resumed. Losing one server out of a cluster is a different story
though.
https://bugzilla.redhat.com/show_bug.cgi?id=1058300

HTH

On Fri, Jul 4, 2014 at 11:48 AM, Punit Dambiwal hypu...@gmail.com wrote:

Hi,

Thanks... can you suggest a good how-to/article for glusterfs with ovirt?

One strange thing: if I try both (compute and storage) on the same node, the
situation quoted below does not happen...

-

Right now, if 10.10.10.2 goes away, all your gluster mounts go away and your
VMs get paused because the hypervisors can’t access the storage. Your gluster
storage is still fine, but ovirt can’t talk to it because 10.10.10.2 isn’t there.
-

Even when 10.10.10.2 goes down, I can still access the gluster mounts and no VM
pauses... I can access the VMs via ssh with no connection failure. The connection
drops only when the SPM goes down and another node is elected as SPM (all the
running VMs pause in this condition).



On Fri, Jul 4, 2014 at 4:12 AM, Darrell Budic darrell.bu...@zenfire.com wrote:


You need to set up a virtual IP to use as the mount point; most people use
keepalived to provide a virtual IP via VRRP for this. Set up something like
10.10.10.10 and use that for your mounts.
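
A minimal keepalived sketch for such a floating mount IP (the interface name, priority, and password are placeholders; 10.10.10.10 follows the example above):

# /etc/keepalived/keepalived.conf, one instance per gluster node
vrrp_instance gluster_vip {
    state BACKUP              # let priority decide who actually holds the VIP
    interface em1
    virtual_router_id 51
    priority 100              # use a different priority on each node
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass glustervip
    }
    virtual_ipaddress {
        10.10.10.10/24
    }
}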

Right now, if 10.10.10.2 goes away, all your gluster mounts go away and your
VMs get paused because the hypervisors can’t access the storage. Your gluster
storage is still fine, but ovirt can’t talk to it because 10.10.10.2 isn’t there.

If 

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-10 Thread Brad House

Ok, I thought I was doing something wrong yesterday and just
tore down my 3-node cluster with the hosted engine and started
rebuilding.  I was seeing essentially the same thing: a score of
0 on the hosts not running the engine, and it wouldn't allow migration of
the hosted engine.  I played with all things related to setting
maintenance and rebooting hosts; nothing brought them up to a
point where I could migrate the hosted engine.

I thought it was related to oVirt messing up when deploying the
other hosts (I told it not to modify the firewall, which I had disabled,
but the deploy process forcibly re-enabled the firewall, which gluster
really didn't like).  Now, after reading this, it appears my assumption
may be false.

Previously, the 2-node cluster I had worked fine, but I wanted to
go to 3 nodes so I could enable quorum on gluster and not risk
split-brain issues.

-Brad


On 6/10/14 1:19 AM, Andrew Lau wrote:

I'm really having a hard time finding out why it's happening..

If I set the cluster to global for a minute or two, the scores will
reset back to 2400. Set maintenance mode to none, and all will be fine
until a migration occurs. It seems it tries to migrate, fails, and sets
the score to 0 permanently rather than for the ~10 minutes mentioned in
one of the oVirt slides.
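
For reference, the toggling described above maps to the hosted-engine CLI roughly as follows (a sketch, assuming the standard ovirt-hosted-engine-ha tooling):

hosted-engine --vm-status                      # show each host's score as the HA agents see it
hosted-engine --set-maintenance --mode=global  # park the agents so the scores can reset
hosted-engine --set-maintenance --mode=none    # release them again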

When I have two hosts, the score is 0 only when a migration occurs
(and just on the host which doesn't have the engine up). The score hits 0
only when it has tried to migrate after I set the host to local
maintenance. Migrating the VM from the UI has worked quite a few
times, but it has recently started to fail.

When I have three hosts, after ~5 minutes of them all being up, the score
will hit 0 on the hosts not running the VMs. It doesn't even have to
attempt to migrate before the score goes to 0. Stopping the ha agent
on one host and resetting it with the global maintenance method
brings it back to the 2-host scenario above.

I may move on and just go back to a standalone engine, as I'm not
having much luck with this..

On Tue, Jun 10, 2014 at 3:11 PM, combuster combus...@archlinux.us wrote:

Nah, I've explicitly allowed the hosted-engine VM to access the NAS device, as
well as the NFS share itself, before the deploy procedure even started.
But I'm puzzled at how you can reproduce the bug; all was well on my setup
before I started manual migration of the engine's VM. Even auto migration
worked before that (I tested it). Does it just happen without any procedure on
the engine itself? Is the score 0 for just one node, or for two of the
three?

On 06/10/2014 01:02 AM, Andrew Lau wrote:


nvm, just as I hit send the error has returned.
Ignore this..

On Tue, Jun 10, 2014 at 9:01 AM, Andrew Lau and...@andrewklau.com wrote:


So after adding the L3 capabilities to my storage network, I'm no
longer seeing this issue anymore. So the engine needs to be able to
access the storage domain it sits on? But that doesn't show up in the
UI?

Ivan, was this also the case with your setup? Engine couldn't access
storage domain?

On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:


Interesting, my storage network is L2 only and doesn't run on
ovirtmgmt (which is the only thing HostedEngine sees), but I've only
seen this issue when running ctdb in front of my NFS server. I
previously was using localhost, as all my hosts had the NFS server on
them (gluster).

On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com wrote:


I just blocked the connection to storage for testing, and as a result I got
this error: Failed to acquire lock error -243, so I added it to the reproduction
steps.
If you know other steps to reproduce this error without blocking the
connection to storage, it would be wonderful if you could provide them.
Thanks

- Original Message -
From: Andrew Lau and...@andrewklau.com
To: combuster combus...@archlinux.us
Cc: users users@ovirt.org
Sent: Monday, June 9, 2014 3:47:00 AM
Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message:
internal error Failed to acquire lock error -243

I just ran a few extra tests. I had a 2-host hosted-engine setup running
for a day; they both had a score of 2400. I migrated the VM through the
UI multiple times, and all worked fine. I then added the third host, and
that's when it all fell to pieces.
The other two hosts have a score of 0 now.

I'm also curious, in the BZ there's a note about:

where engine-vm block connection to storage domain (via iptables -I
INPUT -s sd_ip -j DROP)

What's the purpose of that?

On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:


Ignore that, the issue came back after 10 minutes.

I've even tried a gluster mount + nfs server on top of that, and the
same issue has come back.

On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:


Interesting, I put it all into global maintenance, shut it all down
for ~10 minutes, and it has regained its sanlock control and doesn't
seem to have that issue coming up in the log.

On Fri, Jun 6, 2014 at 4:21 PM, combuster 

Re: [Users] [ANN] oVirt 3.4.0 Release Candidate is now available

2014-02-28 Thread Brad House

You're welcome to join us testing this release candidate in next week's test day [2], scheduled for 2014-03-06!

[1] http://www.ovirt.org/OVirt_3.4.0_release_notes
[2] http://www.ovirt.org/OVirt_3.4_Test_Day


The known issues should list some information about Gluster, I think, such as
the fact that libgfapi is not currently being used even when choosing
GlusterFS instead of POSIXFS; instead, it creates a POSIX mount and uses that.
This was an advertised 3.3 feature, so this would be considered a regression
or known issue, right?

I was told it was due to BZ #1017289

This has been observed in Fedora 19, though that BZ lists RHEL6.

Thanks!
-Brad


[Users] oVirt 3.4 pre-release and GlusterFS support (F19)

2014-02-20 Thread Brad House

I've been testing the 3.4 prerelease on Fedora 19.  When I create a GlusterFS
(not POSIXFS) storage domain and create a VM with a disk image on that storage
domain, I see a POSIX mount created on the host.  Upon further investigation,
when evaluating the executed qemu command line, it doesn't appear qemu is
being told to use libgfapi, but rather the previously observed POSIX mount.
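
For reference, the difference shows up in qemu's -drive argument; a quick check looks roughly like this (paths are illustrative):

# pull the drive spec out of the running qemu process
ps -ef | grep qemu-kvm | grep -o 'file=[^,]*'
# a FUSE/POSIX mount (the behavior described above) shows a filesystem path, e.g.
#   file=/rhev/data-center/mnt/glusterSD/<host>:_<volume>/<sd-uuid>/images/<img-uuid>/<vol-uuid>
# native libgfapi (the advertised 3.3 behavior) shows a gluster:// URI, e.g.
#   file=gluster://<host>/<volume>/<sd-uuid>/images/<img-uuid>/<vol-uuid>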

One other note, I'm specifically testing the hosted engine, and haven't
tested using the non-hosted variant.

The question is: is this expected behavior, and if so, is it because
of the hosted engine?  Or is this some form of regression from the
advertised feature list of oVirt 3.3?  Is there anything I should try or look at?
I'm obviously concerned about the FUSE overhead with Gluster and would
like to avoid it if possible.

Thanks!
-Brad


Re: [Users] oVirt 3.4.0 beta - Hosted Engine Setup -- issues

2014-01-23 Thread Brad House

On 1/22/14 11:42 PM, Andrew Lau wrote:

That sounds exactly like what I did; do you mind putting those log files into
the BZ? It saves me from having to find some hardware and replicate it again.

So what I did was end up just nuking the whole OS and getting a clean start.
First, after doing all your initial prep EXCEPT configuring your 4 NICs, run
the hosted-engine --deploy command twice. Assuming the first run always fails
like in my case; otherwise, once you get to the Configure Storage phase, press
Ctrl+D to exit.

Now configure your NICs (also configure ovirtmgmt manually, as there is another
BZ about it not being able to create the bridge) and rerun hosted-engine
--deploy, and you should be back in action. This should get you to a working
hosted-engine solution.

P.S. Could you add me in the CC when you reply? I would've seen your message
sooner.


Weird, very very weird.  I'll give it a shot and see what happens.

Thanks!
-Brad


Re: [Users] oVirt 3.4.0 beta - Hosted Engine Setup -- issues

2014-01-23 Thread Brad House



On 1/23/14 7:00 AM, Andrew Lau wrote:

Good luck! If you get time, it'd really be great if you could post those logs
(ovirt-hosted-engine-setup.log and vdsm.log) to BZ 1055153 for me. It'd help
them debug the issue and save me from having to find a new spare server.

I spent a good 2 days trying to work through the alpha jungle, so I hope this
helps :)


No problem, I'll do that before I rebuild the server to follow your procedure.
Hopefully I'll be able to do it today since it is test day, but unfortunately
I've got meetings planned for most of the day :/

-Brad


Re: [Users] hosted-engine setup fails on RHEL6

2014-01-23 Thread Brad House

On 01/23/2014 08:24 AM, Frank Wall wrote:

Hi,

I'm currently trying to set up a hosted-engine on a RHEL6 host
with the nightly repository (because the 3.4 BETA didn't work either):


snip


[ ERROR ] Failed to execute stage 'Environment customization': [Errno 111] Connection refused
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination



snip



Any hint?


Thanks
- Frank



Yep, I get the _same_ exact issue.  Please see my e-mail chain from
yesterday with the subject line:

oVirt 3.4.0 beta - Hosted Engine Setup -- issues

and look at Andrew Lau's responses; he experienced the same issue ... with a
potential work-around which I have yet to try.

-Brad


[Users] oVirt 3.4.0 beta - Hosted Engine Setup -- issues

2014-01-22 Thread Brad House

I'm trying to test out the oVirt Hosted Engine, but am experiencing a
failure early on and was hoping someone could point me in the right
direction.   I'm not familiar enough with the architecture of oVirt
to start to debug this situation.

Basically, I run the hosted-engine --deploy command and it outputs:


[ INFO  ] Stage: Initializing
  Continuing will configure this host for serving as hypervisor and create a VM where you have to install oVirt Engine afterwards.
  Are you sure you want to continue? (Yes, No)[Yes]:
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
  Configuration files: []
  Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20140122110741.log
  Version: otopi-1.2.0_beta (otopi-1.2.0-0.1.beta.fc19)
[ INFO  ] Hardware supports virtualization
[ INFO  ] Bridge ovirtmgmt already created
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Waiting for VDSM hardware info
[ INFO  ] Stage: Environment customization

  --== STORAGE CONFIGURATION ==--

  During customization use CTRL-D to abort.
[ ERROR ] Failed to execute stage 'Environment customization': [Errno 111] Connection refused
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination


/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20140122110741.log has:

2014-01-22 11:07:57 DEBUG otopi.plugins.ovirt_hosted_engine_setup.storage.storage storage._check_existing_pools:631 getConnectedStoragePoolsList
2014-01-22 11:07:57 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 729, in _customization
    self._check_existing_pools()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/storage/storage.py", line 632, in _check_existing_pools
    pools = self.serv.s.getConnectedStoragePoolsList()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/lib64/python2.7/site-packages/vdsm/SecureXMLRPCServer.py", line 188, in connect
    sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib64/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
2014-01-22 11:07:57 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Environment customization': [Errno 111] Connection refused


Unfortunately I do not know what service this is trying to connect to, or what hostname or port to try to start debugging that.
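
From the traceback, the setup script appears to be making an XML-RPC call into the local VDSM daemon, so a first check would be something like the following (assuming VDSM's usual service name and default port of 54321):

systemctl status vdsmd        # is VDSM running at all?
ss -tlnp | grep 54321         # is it listening on its XML-RPC port?
vdsClient -s 0 getVdsCaps     # can it answer a simple request?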

Some other useful information about my environment:

- Fedora 19 (64bit), minimal install, selected 'standard' add-on utilities. This was a fresh install just for this test.
  - 512MB /boot ext4
  - 80GB / ext4 in LVM, 220GB free in VG
- yum -y update performed to get all latest updates
- SElinux in permissive mode
- Hardware:
  - Supermicro 1026T-URF barebones
  - single CPU populated (Xeon E5630 4x2.53GHz)
  - 12GB ECC DDR3 RAM
  - H/W Raid with SSDs
- Networking:
  - Network Manager DISABLED
  - 4 GbE ports (p2p1, p2p2, em1, em2)
  - all 4 ports configured in a bond (bond0) using balance-alb
  - ovirtmgmt bridge pre-created with 'bond0' as the only member, assigned a static IP address
  - firewall DISABLED
- /etc/yum.repos.d/ovirt.rep has ONLY the 3.4.0 beta repo enabled:
[ovirt-3.4.0-beta]
name=3.4.0 beta testing repo for the oVirt project

Re: [Users] oVirt 3.4.0 beta - Hosted Engine Setup -- issues

2014-01-22 Thread Brad House



On 01/22/2014 05:42 PM, Andrew Lau wrote:

Hi,

This looks like the BZ which I reported:
https://bugzilla.redhat.com/show_bug.cgi?id=1055153

Did you customize your NICs before you tried running hosted-engine --deploy?

Thanks,
Andrew


Yes, I created all the /etc/sysconfig/network-scripts/ifcfg-* files for my 4
NICs in the bond, then also created ifcfg-bond0, as well as an ifcfg-ovirtmgmt
which uses bond0. So the oVirt management interface should be fully configured
before even running hosted-engine --deploy ... when I was testing the
all-in-one (non-hosted) setup I had to do that, so I figured it was a good idea
for hosted too.
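
For reference, the relevant files look roughly like this (device names and addresses are placeholders):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=balance-alb miimon=100"
BRIDGE=ovirtmgmt
ONBOOT=yes
NM_CONTROLLED=no

# /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
DEVICE=ovirtmgmt
TYPE=Bridge
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no

# each of the four ifcfg-<nic> files simply enslaves the NIC to the bond:
#   MASTER=bond0
#   SLAVE=yes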

I'm trying to understand your BZ 1055153; did you actually get it to work, or
did you get stuck at that point?  I have seen the same issue as your BZ 1055129
as well.

-Brad