Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster

I haven't tried it, but here is a guide on how to add a hook to ovirt-node:

http://www.ovirt.org/develop/developer-guide/vdsm/hook/qemucmdline/
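
For what it's worth, an untested sketch of how I would try to get the hook onto a node without yum: unpack the rpm on a machine that does have the repos, copy the extracted hook script into the node's hook directory, and persist it so it survives a reboot. The exact script name (50_macspoof) and the use of ovirt-node's persist command are assumptions on my part, so double check against the rpm contents:

# on a machine with yum/repo access
yumdownloader vdsm-hook-macspoof
rpm2cpio vdsm-hook-macspoof-*.rpm | cpio -idmv

# copy the extracted before_vm_start script to the node
scp ./usr/libexec/vdsm/hooks/before_vm_start/50_macspoof root@node:/usr/libexec/vdsm/hooks/before_vm_start/

# on the node, mark it persistent so ovirt-node keeps it across reboots
persist /usr/libexec/vdsm/hooks/before_vm_start/50_macspoof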

On 03/05/2016 09:48 PM, Christopher Young wrote:

Question:

There is no yum functionality on ovirt-node/RHEV-H, so how does one go
about this in that scenario?

On Sat, Mar 5, 2016 at 3:32 PM, combuster <combus...@gmail.com> wrote:

It's great to know that it's working.

Best of luck Clint.


On 03/05/2016 09:09 PM, cl...@theboggios.com wrote:

On 2016-03-05 13:34, combuster wrote:

Correct procedure would be:

1. On each of your ovirt nodes run:

yum install vdsm-hook-macspoof

2. On the engine run:

sudo engine-config -s "UserDefinedVMProperties=macspoof=^(true|false)$"

3. Edit the OpenVPN virtual machine's settings, add a custom property
with the macspoof keyword, and set its value to "true".

If you want to remove filtering for a single interface, then replace
steps 2 and 3 as outlined in the README.

Kind regards,

Ivan

On 03/05/2016 08:21 PM, cl...@theboggios.com wrote:

On 2016-03-05 13:13, combuster wrote:

Ignore the link (minor accident while pasting). Yum will download the
appropriate one from the repos.

On 03/05/2016 08:09 PM, combuster wrote:


Just the hook rpm (vdsm-hook-macspoof [1]).

Ivan

On 03/05/2016 08:02 PM, Christopher Young wrote:

I had a related question on this.

When it comes to ovirt-node or rhev-h, is there anything required to
be installed on the hypervisor hosts themselves?

Thanks,

Chris
On Mar 5, 2016 1:47 PM, "combuster" <combus...@gmail.com> wrote:
Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof [2]

This should override arp/spoofing filtering, that might be the
cause of your issues with OpenVPN setup (first guess).

On 03/05/2016 07:30 PM, Clint Boggio wrote:
I am deploying an OpenVPN server in my OVirt environment and I've
come to a dead stop with the developer support on a topic related to
OVirt configuration.

The developer wants me to put the VM's underlying NIC into
promiscuous mode.

I've seen this in a VMware environment and I know what they are
asking me to do, and I'm wondering if there is a clear way to do
this in my OVirt environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]




Links:
--
[1]

http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/noarch/vdsm-hook-macspoof-4.16.10-0.el7.noarch.rpm
[2] https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof
[3] http://lists.ovirt.org/mailman/listinfo/users



Thank you very much. Reading the README, it appears that there is a
series of commands to run on the engine to make the options for removing
filtering from a vNIC, or from the whole VM, available. What purpose do
the two scripts included in the git repo serve, and where do I put them
so that they will be used, if that's even necessary?



Ivan, because of YOU, I get my weekend back! It works and OVPN is up and
running.

Thank you SO MUCH !


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster

It's great to know that it's working.

Best of luck Clint.

On 03/05/2016 09:09 PM, cl...@theboggios.com wrote:

On 2016-03-05 13:34, combuster wrote:

Correct procedure would be:

1. On each of your ovirt nodes run:

yum install vdsm-hook-macspoof

2. On the engine run:

sudo engine-config -s "UserDefinedVMProperties=macspoof=^(true|false)$"

3. Edit the OpenVPN virtual machine's settings, add a custom property
with the macspoof keyword, and set its value to "true".

If you want to remove filtering for a single interface, then replace
steps 2 and 3 as outlined in the README.

Kind regards,

Ivan

On 03/05/2016 08:21 PM, cl...@theboggios.com wrote:

On 2016-03-05 13:13, combuster wrote:

Ignore the link (minor accident while pasting). Yum will download the
appropriate one from the repos.

On 03/05/2016 08:09 PM, combuster wrote:


Just the hook rpm (vdsm-hook-macspoof [1]).

Ivan

On 03/05/2016 08:02 PM, Christopher Young wrote:

I had a related question on this.

When it comes to ovirt-node or rhev-h, is there anything required to
be installed on the hypervisor hosts themselves?

Thanks,

Chris
On Mar 5, 2016 1:47 PM, "combuster" <combus...@gmail.com> wrote:
Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof [2]

This should override arp/spoofing filtering, that might be the
cause of your issues with OpenVPN setup (first guess).

On 03/05/2016 07:30 PM, Clint Boggio wrote:
I am deploying an OpenVPN server in my OVirt environment and I've
come to a dead stop with the developer support on a topic related to
OVirt configuration.

The developer wants me to put the VM's underlying NIC into
promiscuous mode.

I've seen this in a VMware environment and I know what they are
asking me to do, and I'm wondering if there is a clear way to do
this in my OVirt environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]




Links:
--
[1]
http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/noarch/vdsm-hook-macspoof-4.16.10-0.el7.noarch.rpm 
[2] https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof

[3] http://lists.ovirt.org/mailman/listinfo/users



Thank you very much. Reading the README, it appears that there is a
series of commands to run on the engine to make the options for removing
filtering from a vNIC, or from the whole VM, available. What purpose do
the two scripts included in the git repo serve, and where do I put them
so that they will be used, if that's even necessary?



Ivan, because of YOU, I get my weekend back! It works and OVPN is up
and running.


Thank you SO MUCH !


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster

Correct procedure would be:

1. On each of your ovirt nodes run:

yum install vdsm-hook-macspoof

2. On the engine run:

sudo engine-config -s "UserDefinedVMProperties=macspoof=^(true|false)$"

3. Edit the OpenVPN virtual machine's settings, add a custom property 
with the macspoof keyword, and set its value to "true".


If you want to remove filtering for a single interface, then replace 
steps 2 and 3 as outlined in the README.
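
For reference, the per-interface variant, roughly as I remember it from the README (the ifacemacspoof property name is from memory, so verify it there; engine-config changes need an engine restart before the property shows up):

# expose a per-vNIC custom device property instead of the VM-level one
sudo engine-config -s "CustomDeviceProperties={type=interface;prop={ifacemacspoof=^(true|false)$}}"

# restart the engine so the new property appears in the UI
sudo systemctl restart ovirt-engine

# then edit the vNIC of the OpenVPN VM and set ifacemacspoof=true on that interface only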


Kind regards,

Ivan

On 03/05/2016 08:21 PM, cl...@theboggios.com wrote:

On 2016-03-05 13:13, combuster wrote:

Ignore the link (minor accident while pasting). Yum will download the
appropriate one from the repos.

On 03/05/2016 08:09 PM, combuster wrote:


Just the hook rpm (vdsm-hook-macspoof [1]).

Ivan

On 03/05/2016 08:02 PM, Christopher Young wrote:

I had a related question on this.

When it comes to ovirt-node or rhev-h, is there anything required to
be installed on the hypervisor hosts themselves?

Thanks,

Chris
On Mar 5, 2016 1:47 PM, "combuster" <combus...@gmail.com> wrote:
Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof [2]

This should override arp/spoofing filtering, that might be the
cause of your issues with OpenVPN setup (first guess).

On 03/05/2016 07:30 PM, Clint Boggio wrote:
I am deploying an OpenVPN server in my OVirt environment and I've
come to a dead stop with the developer support on a topic related to
OVirt configuration.

The developer wants me to put the VM's underlying NIC into
promiscuous mode.

I've seen this in a VMware environment and I know what they are
asking me to do, and I'm wondering if there is a clear way to do
this in my OVirt environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users [3]




Links:
--
[1]
http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/noarch/vdsm-hook-macspoof-4.16.10-0.el7.noarch.rpm 


[2] https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof
[3] http://lists.ovirt.org/mailman/listinfo/users



Thank you very much. Reading the README, it appears that there is a 
series of commands to run on the engine to make the options for removing 
filtering from a vNIC, or from the whole VM, available. What purpose do 
the two scripts included in the git repo serve, and where do I put them 
so that they will be used, if that's even necessary?


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster
Ignore the link (minor accident while pasting). Yum will download the 
appropriate one from the repos.


On 03/05/2016 08:09 PM, combuster wrote:
Just the hook rpm (vdsm-hook-macspoof 
<http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/noarch/vdsm-hook-macspoof-4.16.10-0.el7.noarch.rpm>).


Ivan

On 03/05/2016 08:02 PM, Christopher Young wrote:


I had a related question on this.

When it comes to ovirt-node or rhev-h, is there anything required to 
be installed on the hypervisor hosts themselves?


Thanks,

Chris

On Mar 5, 2016 1:47 PM, "combuster" <combus...@gmail.com> wrote:

Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof

This should override arp/spoofing filtering, that might be the
cause of your issues with OpenVPN setup (first guess).

On 03/05/2016 07:30 PM, Clint Boggio wrote:

I am deploying an OpenVPN server in my OVirt environment and
I've come to a dead stop with the developer support on a
topic related to OVirt configuration.

The developer wants me to put the VM's underlying NIC into
promiscuous mode.

I've seen this in a VMware environment and I know what they
are asking me to do, and I'm wondering if there is a clear
way to do this in my OVirt environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users





___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster
Just the hook rpm (vdsm-hook-macspoof 
<http://resources.ovirt.org/pub/ovirt-3.5/rpm/el7Server/noarch/vdsm-hook-macspoof-4.16.10-0.el7.noarch.rpm>).


Ivan

On 03/05/2016 08:02 PM, Christopher Young wrote:


I had a related question on this.

When it comes to ovirt-node or rhev-h, is there anything required to 
be installed on the hypervisor hosts themselves?


Thanks,

Chris

On Mar 5, 2016 1:47 PM, "combuster" <combus...@gmail.com 
<mailto:combus...@gmail.com>> wrote:


Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof

This should override arp/spoofing filtering, that might be the
cause of your issues with OpenVPN setup (first guess).

On 03/05/2016 07:30 PM, Clint Boggio wrote:

I am deploying an OpenVPN server in my OVirt environment and
I've come to a dead stop with the developer support on a topic
related to OVirt configuration.

The developer wants me to put the VM's underlying NIC into
promiscuous mode.

I've seen this in a VMware environment and I know what they
are asking me to do, and I'm wondering if there is a clear way
to do this in my OVirt environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org <mailto:Users@ovirt.org>
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Promiscuous Mode

2016-03-05 Thread combuster

Hi Clint, you might want to check the macspoof hook features here:

https://github.com/oVirt/vdsm/tree/master/vdsm_hooks/macspoof

This should override arp/spoofing filtering, that might be the cause of 
your issues with OpenVPN setup (first guess).
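
If you want to confirm first that the no-mac-spoofing filter is really what is getting in the way, a quick read-only check on the host running the VM might be (just a diagnostic sketch, the VM name is a placeholder):

# look for network filter references on the VM's interfaces
virsh -r dumpxml <vm-name> | grep -A2 filterref

# list the nwfilters libvirt knows about
virsh -r nwfilter-list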


On 03/05/2016 07:30 PM, Clint Boggio wrote:

I am deploying an OpenVPN server in my OVirt environment and I've come to a 
dead stop with the developer support on a topic related to OVirt configuration.

The developer wants me to put the VM's underlying NIC into promiscuous mode.

I've seen this in a VMware environment and I know what they are asking me to 
do, and I'm wondering if there is a clear way to do this in my OVirt 
environment.

I found "port mirroring" but no "promiscuous mode"

Cheers and thank you !
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Storage network clarification

2016-01-19 Thread combuster
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 332, in run
return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 77, in wrapper
return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1886, in createVolume
initialSize=initialSize)
  File "/usr/share/vdsm/storage/sd.py", line 488, in createVolume
initialSize=initialSize)
  File "/usr/share/vdsm/storage/volume.py", line 476, in create
initialSize=initialSize)
  File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume: 
(u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,798::task::885::Storage.TaskManager.Task::(_run) 
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::Task._run: 
bf482d82-d8f9-442d-ba93-da5ec225c8c3 () {} failed - stopping task
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,798::task::1246::Storage.TaskManager.Task::(stop) 
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::stopping in state running 
(force False)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,798::task::993::Storage.TaskManager.Task::(_decref) 
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::ref 1 aborting True
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,799::task::919::Storage.TaskManager.Task::(_runJobs) 
Task=`bf482d82-d8f9-442d-ba93-da5ec225c8c3`::aborting: Task is 
aborted: 'Cannot zero out volume' - code 374



On 01/18/2016 04:00 PM, combuster wrote:
oVirt is still managing the cluster via ovirtmgmt network. The same 
rule applies for tagging networks as VM networks, Live Migration 
networks etc. Gluster is no different, except that it involved a 
couple of manual steps for us to configure it.


On 01/18/2016 03:53 PM, Fil Di Noto wrote:

Thanks I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos

In the configuration described, is oVirt able to manage gluster? I am
confused because if oVirt knows the nodes by their ovirtmgmt network
IP/hostname aren't all the VDSM commands going to fail?



On Mon, Jan 18, 2016 at 6:39 AM, combuster <combus...@gmail.com> wrote:

Hi Fil,

this worked for me a couple of months back:

http://lists.ovirt.org/pipermail/users/2015-November/036235.html

I'll try to set this up again, and see if there are any issues. 
Which oVirt

release are you running ?

Ivan

On 01/18/2016 02:56 PM, Fil Di Noto wrote:

I'm having trouble setting up a dedicated storage network.

I have a separate VLAN designated for storage, and configured separate
IP addresses for each host that correspond to that subnet. I have
tested this subnet extensively and it is working as expected.

Prior to adding the hosts, I configured a storage network and
configured the cluster to use that network for storage and not the
ovirtmgmt network. I was hoping that this would be recognized when
the hosts were added, but it was not. I had to actually reconfigure the
storage VLAN interface via oVirt "manage host networks" just to bring
the host networks into compliance. The IP is configured directly on
the bond0.<VLAN ID> interface, not on a bridge interface, which I assume is
correct since it is not a "VM" network.

In this setup I was not able to activate any of the hosts due to VDSM
gluster errors, I think it was because VDSM was trying to use the
hostname/IP of the ovirtmgmt network. I manually set up the peers
using "gluster peer probe" and I was able to activate the hosts but
they were not using the storage network (tcpdump). I also tried adding
DNS records for the storage network interfaces using different
hostnames but gluster seemed to still consider the ovirtmgmt interface
as the primary.

With the hosts active, I couldn't create/activate any volumes until I
changed the cluster network settings to use the ovirtmgmt network for
storage. I ended up abandoning the dedicated storage subnet for the
time being and I'm starting to wonder if running virtualization and
gluster on the same hosts is intended to work this way.

Assuming that it should work, what is the correct way to configure it?
I can't find any docs that go in detail about storage networks. Is
reverse DNS a factor? If I had a better understanding of what oVirt is
expecting to see that would be helpful.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Storage network clarification

2016-01-19 Thread combuster
Increasing the network ping timeout and lowering the number of io threads 
helped. The disk image gets created, but during that time the nodes are pretty 
much unresponsive. I should've expected that on my setup...
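
In case it helps, the knobs I mean are along these lines, assuming gluster volume options were what was involved (volume name is a placeholder and the values are illustrative, not a recommendation):

# raise the client ping timeout so short stalls don't drop the bricks
gluster volume set <volname> network.ping-timeout 50

# lower the io thread count to ease the pressure during heavy writes
gluster volume set <volname> performance.io-thread-count 8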


In any case, I hope this helps...

Ivan

On 01/19/2016 06:43 PM, combuster wrote:
OK, setting up gluster on a dedicated network is easier this time 
around, mostly a point-and-click adventure (setting everything up from 
scratch); a condensed command sketch follows after the list:


- 4 NIC's, 2 bonds, one for ovirtmgmt and the other one for gluster
- Tagged gluster network for gluster traffic
- configured IP addresses without gateways on gluster dedicated bonds 
on both nodes
- allowed_replica_counts=1,2,3 in gluster section within 
/etc/vdsm/vdsm.conf to allow replica 2
- added transport.socket.bind-address to /etc/glusterfs/glusterd.vol 
to force glusterd to listen only from gluster dedicated IP address
- modified /etc/hosts so that the nodes can resolve each other by 
gluster dedicated hostnames (optional)

- probed the peers by their gluster hostnames
- created the volume in the same fashion (I've tried creating another 
one from oVirt webadmin and it works also)
- oVirt picked it up and I was able to create gluster storage domain 
on this volume (+ optimized the volume for virt store)
- tcpdump and iftop shows that replication is going through gluster 
dedicated interfaces
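
A condensed sketch of the manual bits above, with placeholder names and addresses (check the exact option spellings against your vdsm/gluster versions):

# /etc/vdsm/vdsm.conf (every node) -- allow replica 2 volumes
#   [gluster]
#   allowed_replica_counts = 1,2,3

# /etc/glusterfs/glusterd.vol (every node) -- bind glusterd to the storage IP only
#   option transport.socket.bind-address 10.10.10.1

# /etc/hosts (optional) -- resolve the peers by their gluster hostnames
echo "10.10.10.2  node2-gluster" >> /etc/hosts

# probe the peer over the storage network and create the replica 2 volume
gluster peer probe node2-gluster
gluster volume create data replica 2 node1-gluster:/bricks/data/brick node2-gluster:/bricks/data/brick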


One problem so far: creating preallocated disk images fails. It broke 
after zeroing out some 37GB of 40GB in total, but it's an intermittent 
issue (sometimes it fails earlier); I'm still poking around to find 
the culprit. Thin provisioning works. Bricks and volume are fine, as 
are the gluster services. It looks bandwidth related from what I can see 
(a large amount of net traffic during flushes, 
rpc_clnt_ping_timer_expired followed by sanlock renewal errors), but 
I'll report it as soon as I can confirm it's not a 
hardware/configuration related issue.


vdsm.log:

bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,782::utils::716::Storage.Misc.excCmd::(watchCmd) FAILED: 
<err> = ["/usr/bin/dd: error writing 
'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae': 
Transport endpoint is not connected", "/usr/bin/dd: closing output 
file 
'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae': 
Transport endpoint is not connected"]; <rc> = 1
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 
18:03:20,783::fileVolume::133::Storage.Volume::(_create) Unexpected 
error

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/fileVolume.py", line 129, in _create
vars.task.aborting, sizeBytes)
  File "/usr/share/vdsm/storage/misc.py", line 350, in ddWatchCopy
raise se.MiscBlockWriteException(dst, offset, size)
MiscBlockWriteException: Internal block device write failure: 
u'name=/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae, 
offset=0, size=42949672960'
jsonrpc.Executor/7::DEBUG::2016-01-19 
18:03:20,784::__init__::533::jsonrpc.JsonRpcServer::(_serveRequest) 
Return 'GlusterTask.list' in bridge with {'tasks': {}}
bf482d82-d8f9-442d-ba93-da5ec225c8c3::ERROR::2016-01-19 
18:03:20,790::volume::515::Storage.Volume::(create) Unexpected error

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/volume.py", line 476, in create
initialSize=initialSize)
  File "/usr/share/vdsm/storage/fileVolume.py", line 134, in _create
raise se.VolumesZeroingError(volPath)
VolumesZeroingError: Cannot zero out volume: 
(u'/rhev/data-center/90758579-cae7-4fdf-97e5-e8415db68c54/9cbc0f15-119e-4fe7-94ef-8bc84e0c8254/images/283ddfaa-7fc2-4bea-9acc-c8ff601110de/e3d135b2-a7c0-43d4-b3a5-04991cce73ae',)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,795::resourceManager::616::Storage.ResourceManager::(releaseResource) 
Trying to release resource 
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de'
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,796::resourceManager::635::Storage.ResourceManager::(releaseResource) 
Released resource 
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de' 
(0 active users)
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,796::resourceManager::641::Storage.ResourceManager::(releaseResource) 
Resource 
'9cbc0f15-119e-4fe7-94ef-8bc84e0c8254_imageNS.283ddfaa-7fc2-4bea-9acc-c8ff601110de' 
is free, finding out if anyone is waiting for it.
bf482d82-d8f9-442d-ba93-da5ec225c8c3::DEBUG::2016-01-19 
18:03:20,796::resourceManager::649::Storage.ResourceManager::(releaseResource) 
No one is waiting for resource 
'9cb

Re: [ovirt-users] Storage network clarification

2016-01-18 Thread combuster
oVirt is still managing the cluster via ovirtmgmt network. The same rule 
applies for tagging networks as VM networks, Live Migration networks 
etc. Gluster is no different, except that it involved a couple of manual 
steps for us to configure it.


On 01/18/2016 03:53 PM, Fil Di Noto wrote:

Thanks I will try this. I am running ovirt-engine 3.6.1.3-1.el7.centos

In the configuration described, is oVirt able to manage gluster? I am
confused because if oVirt knows the nodes by their ovirtmgmt network
IP/hostname aren't all the VDSM commands going to fail?



On Mon, Jan 18, 2016 at 6:39 AM, combuster <combus...@gmail.com> wrote:

Hi Fil,

this worked for me a couple of months back:

http://lists.ovirt.org/pipermail/users/2015-November/036235.html

I'll try to set this up again, and see if there are any issues. Which oVirt
release are you running ?

Ivan

On 01/18/2016 02:56 PM, Fil Di Noto wrote:

I'm having trouble setting up a dedicated storage network.

I have a separate VLAN designated for storage, and configured separate
IP addresses for each host that correspond to that subnet. I have
tested this subnet extensively and it is working as expected.

Prior to adding the hosts, I configured a storage network and
configured the cluster to use that network for storage and not the
ovirtmgmt network. I was hoping that this would be recognized when
the hosts were added, but it was not. I had to actually reconfigure the
storage VLAN interface via oVirt "manage host networks" just to bring
the host networks into compliance. The IP is configured directly on
the bond0.<VLAN ID> interface, not on a bridge interface, which I assume is
correct since it is not a "VM" network.

In this setup I was not able to activate any of the hosts due to VDSM
gluster errors, I think it was because VDSM was trying to use the
hostname/IP of the ovirtmgmt network. I manually set up the peers
using "gluster peer probe" and I was able to activate the hosts but
they were not using the storage network (tcpdump). I also tried adding
DNS records for the storage network interfaces using different
hostnames but gluster seemed to still consider the ovirtmgmt interface
as the primary.

With the hosts active, I couldn't create/activate any volumes until I
changed the cluster network settings to use the ovirtmgmt network for
storage. I ended up abandoning the dedicated storage subnet for the
time being and I'm starting to wonder if running virtualization and
gluster on the same hosts is intended to work this way.

Assuming that it should work, what is the correct way to configure it?
I can't find any docs that go in detail about storage networks. Is
reverse DNS a factor? If I had a better understanding of what oVirt is
expecting to see that would be helpful.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Storage network clarification

2016-01-18 Thread combuster

Hi Fil,

this worked for me a couple of months back:

http://lists.ovirt.org/pipermail/users/2015-November/036235.html

I'll try to set this up again, and see if there are any issues. Which 
oVirt release are you running ?


Ivan

On 01/18/2016 02:56 PM, Fil Di Noto wrote:

I'm having trouble setting up a dedicated storage network.

I have a separate VLAN designated for storage, and configured separate
IP addresses for each host that correspond to that subnet. I have
tested this subnet extensively and it is working as expected.

Prior to adding the hosts, I configured a storage network and
configured the cluster to use that network for storage and not the
ovirtmgmt network. I was hoping that this would be recognized when
the hosts were added, but it was not. I had to actually reconfigure the
storage VLAN interface via oVirt "manage host networks" just to bring
the host networks into compliance. The IP is configured directly on
the bond0.<VLAN ID> interface, not on a bridge interface, which I assume is
correct since it is not a "VM" network.

In this setup I was not able to activate any of the hosts due to VDSM
gluster errors, I think it was because VDSM was trying to use the
hostname/IP of the ovirtmgmt network. I manually set up the peers
using "gluster peer probe" and I was able to activate the hosts but
they were not using the storage network (tcpdump). I also tried adding
DNS records for the storage network interfaces using different
hostnames but gluster seemed to still consider the ovirtmgmt interface
as the primary.

With the hosts active, I couldn't create/activate any volumes until I
changed the cluster network settings to use the ovirtmgmt network for
storage. I ended up abandoning the dedicated storage subnet for the
time being and I'm starting to wonder if running virtualization and
gluster on the same hosts is intended to work this way.

Assuming that it should work, what is the correct way to configure it?
I can't find any docs that go in detail about storage networks. Is
reverse DNS a factor? If I had a better understanding of what oVirt is
expecting to see that would be helpful.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] power management on ovirt3.6.1

2016-01-18 Thread combuster

Hi,

you need at least two servers in the cluster for the PM test to succeed. If 
you do, make sure that the IP address of the iLO is pingable from all hosts 
in the cluster. The oVirt engine log would also help in troubleshooting the 
issue.
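
A quick way to narrow it down from a shell on one of the other hosts (names and credentials are placeholders; fence_ilo3 ships with fence-agents):

# the iLO IP has to answer from every host in the cluster
ping -c3 <ilo-ip>

# try the fence agent by hand with the same credentials the engine uses
fence_ilo3 -a <ilo-ip> -l <ilo-user> -p <ilo-password> -o status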



On 01/18/2016 02:10 PM, alireza sadeh seighalan wrote:

Hi everyone,

I wanted to configure power management on oVirt 3.6.1 but it failed. I 
attached the configuration image. Thanks in advance.



os: centos7.1
ovirt: 3.6.1
server: DL380 G7  (ilo3)


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] HP ILO2 , fence not working, with SSH port specified, a Bug?

2014-06-30 Thread combuster

Well if it's a bug then it would be resolved by now :)

https://bugzilla.redhat.com/show_bug.cgi?id=1026662

Had the same doubts as you did. I really don't know why it wouldn't 
connect to iLO if the default port is specified, but I'm glad that you 
found a workaround.
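
For anyone hitting this later: the difference shows up when you reproduce the engine's call by hand (this just mirrors the commands quoted below), so the workaround is presumably to leave the port field empty in the power management dialog:

# works: let fence_ilo pick its default port
fence_ilo -a <ilo-ip> -l Administrator -p <password> -o status

# fails here: the same call with the SSH port forced to 22 (what the engine passes as ipport)
fence_ilo -a <ilo-ip> -l Administrator -p <password> -o status -u 22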


Ivan

On 06/30/2014 08:36 AM, mad Engineer wrote:

Hi, I have an old HP server with iLO2.

On the manager I configured power management and set the SSH port to 
use for iLO2.

To check SSH I manually ssh to the iLO and it works fine,

but the power management test always fails with *Unable to connect/login 
to fencing device*.

The log shows it is using fence_ilo instead of fence_ilo2:

Thread-18::DEBUG::2014-06-30 08:23:14,106::API::1133::vds::(fenceNode) 
fenceNode(addr=,port=,*agent=ilo*,user=Administrator,passwd=,action=status,secure=,options=ipport=22

ssl=no)
Thread-18::DEBUG::2014-06-30 08:23:14,741::API::1159::vds::(fenceNode) 
rc 1 in agent=*fence_ilo*

ipaddr=xx
login=Administrator
action=status
passwd=
ipport=22
ssl=no out  err *Unable to connect/login to fencing device*




*Manually testing*

fence_ilo -a xx  -l Administrator -p x -o status
Status: ON

but with the SSH port specified, i.e. *-u *:

fence_ilo -a xx  -l Administrator -p x -o status  -u 22
*Unable to connect/login to fencing device*

So when we specify the SSH port it fails, and without the SSH port it works.

This is the case with iLO2 as well.

For iLO3 and iLO4 it works, since they do not ask for an SSH port.

Is this a bug?

Thanks,



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-10 Thread combuster

/etc/libvirt/libvirtd.conf and /etc/vdsm/logger.conf

, but unfortunately I may have jumped to conclusions: last weekend that 
very same thin-provisioned VM was running a simple export for 3 hrs 
before I killed the process. But I wondered:


1. The process that runs behind the export is qemu-img convert (from raw 
to raw), and iotop shows that every three or four seconds it 
reads 10-13 MB/s and then idles for a few seconds. Run the numbers on 
100GB (why it copies the entire 100GB when only 15GB of the thin volume 
is used, I still don't get) and you get precisely the 3-4 hrs estimated time remaining.
2. When I run the export with the SPM on a node that doesn't have any VMs 
running, the export finishes in approx. 30 min (iotop shows a constant 
40-70 MB/s read speed).
3. Renicing the I/O priority of the qemu-img process as well as its CPU 
priority gave no results; it was still running slow beyond any explanation.


Debug logs showed nothing of interest, so I disabled anything above 
warning and that suddenly accelerated the export, so I had connected the 
wrong dots.
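
Concretely, the changes were along these lines; the exact option names differ a bit between versions, so treat this as a sketch:

# /etc/libvirt/libvirtd.conf -- log warnings and errors only
#   log_level = 3

# /etc/vdsm/logger.conf -- raise the levels in the relevant [logger_*] and [handler_*] sections
#   level=WARNING   (instead of DEBUG)

# restart the services on each node afterwards
service libvirtd restart
service vdsmd restart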


On 06/10/2014 11:18 AM, Andrew Lau wrote:

Interesting, which files did you modify to lower the log levels?

On Tue, Jun 3, 2014 at 12:38 AM,  combus...@archlinux.us wrote:

One word of caution so far: when exporting any VM, the node that acts as SPM
is stressed out to the max. I relieved the stress by a certain margin by
lowering the libvirtd and vdsm log levels to WARNING. That shortened the
export procedure by at least five times. But the vdsm process on the SPM node
still has high CPU usage, so it's best that the SPM node be left with a
decent amount of CPU time to spare. Also, the export of VMs with high vdisk capacity
and thin provisioning enabled (let's say 14GB used of 100GB defined) took
around 50 min over a 10Gb ethernet interface to a 1Gb export NAS device that
was not stressed at all by other processes. When I did that export with
debug log levels it took 5 hrs :(

So lowering log levels is a must in a production environment. I've deleted the
LUN that I exported on the storage (removed it first from oVirt) and for
next weekend I am planning to add a new one, export it again on all the nodes
and start a few fresh VM installations. Things I'm going to look for are
partition alignment and running them from different nodes in the cluster at
the same time. I just hope that not all I/O is going to pass through the SPM;
this is the one thing that bothers me the most.

I'll report back on these results next week, but if anyone has experience with
this kind of thing or can point to some documentation, that would be great.

On Monday, 2. June 2014. 18.51.52 you wrote:

I'm curious to hear what other comments arise, as we're analyzing a
production setup shortly.

On Sun, Jun 1, 2014 at 10:11 PM,  combus...@archlinux.us wrote:

I need to scratch gluster off because setup is based on CentOS 6.5, so
essential prerequisites like qemu 1.3 and libvirt 1.0.1 are not met.

Gluster would still work with EL6, afaik it just won't use libgfapi and
instead use just a standard mount.


Any info regarding FC storage domain would be appreciated though.

Thanks

Ivan

On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:

Hi,

I have a 4 node cluster setup and my storage options right now are an FC
based storage, one partition per node on a local drive (~200GB each) and
an NFS based NAS device. I want to set up the export and ISO domains on the
NAS and there are no issues or questions regarding those two. I wasn't aware
of any other options at the time for utilizing local storage (since this is
a shared based datacenter), so I exported a directory from each partition
via NFS and it works. But I am a little in the dark with the following:

1. Are there any advantages to switching from NFS based local storage to a
Gluster based domain with blocks for each partition? I guess it can only be
performance wise, but maybe I'm wrong. If there are advantages, are there
any tips regarding xfs mount options etc?

2. I've created a volume on the FC based storage and exported it to all of
the nodes in the cluster on the storage itself. I've configured
multipathing correctly and added an alias for the wwid of the LUN so I can
distinguish this one and any other future volumes more easily. At first I
created a partition on it, but since oVirt saw only the whole LUN as a raw
device I erased it before adding it as the FC master storage domain. I've
imported a few VMs and pointed them to the FC storage domain. This setup
works, but:

- All of the nodes see a device with the alias for the wwid of the volume,
but only the node which is currently the SPM for the cluster can see the
logical volumes inside. Also, when I set up high availability for VMs
residing on the FC storage and select to start on any node in the cluster,
they always start on the SPM. Can multiple nodes run different VMs on the
same FC storage at the same time (the logical thing would be that they can,
but I wanted to be sure first). I 

Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-09 Thread combuster

OK, I have good news and bad news :)

The good news is that I can run different VMs on different nodes when all 
of their drives are on the FC storage domain. I don't think that all of the I/O 
is running through the SPM, but I need to test that. Simply put, for every 
virtual disk that you create on the shared FC storage domain, oVirt will 
present that vdisk only to the node which is running the VM itself. They 
can all see the domain infrastructure (inbox, outbox, metadata), but the LV for 
the virtual disk itself for that VM is visible only to the node that is 
running that particular VM. There is no limitation (except for the free 
space on the storage).
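
(If anyone wants to check this on their own setup, a read-only way to see which image LVs are activated on a given host is something like the following; the VG name is the storage domain UUID, just a placeholder here:)

# list the LVs of the FC storage domain VG and their activation state ('a' in the attr column = active)
lvs -o lv_name,lv_attr,lv_size <storage-domain-uuid>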


Bad news!

I can create the virtual disk on the FC storage for a VM, but when I 
start the VM itself, the node which hosts the VM that I'm starting goes 
non-operational, and quickly comes up again (the iLO fencing agent checks if 
the node is OK and brings it back up). During that time, the VM starts on 
another node (the Default Host parameter was ignored - the assigned host was not 
available). I can manually migrate it later to the intended node; that 
works. Lucky me, on two nodes (of the four) in the cluster there were 
no VMs running (I tried this on both, with two different VMs created 
from scratch, and I got the same result).


I've killed everything above WARNING because it was killing the 
performance of the cluster. vdsm.log :


[code]
Thread-305::WARNING::2014-06-09 
12:15:53,236::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,013::utils::129::root::(rmFile) File: 
/rhev/data-center/a0500f5c-e8d9-42f1-8f04-15b23514c8ed/55338570-e537-412b-97a9-635eea1ecb10/images/90659ad8-bd90-4a0a-bb4e-7c6afe90e925/242a1bce-a434-4246-ad24-b62f99c03a05 
already removed
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,074::blockSD::761::Storage.StorageDomain::(_getOccupiedMetadataSlots) 
Could not find mapping for lv 
55338570-e537-412b-97a9-635eea1ecb10/242a1bce-a434-4246-ad24-b62f99c03a05
Thread-305::WARNING::2014-06-09 
12:20:54,341::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:25:55,378::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:30:56,424::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-1857::WARNING::2014-06-09 
12:32:45,639::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-1857::CRITICAL::2014-06-09 
12:32:45,640::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::WARNING::2014-06-09 
12:32:48,009::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-17704::CRITICAL::2014-06-09 
12:32:48,013::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::ERROR::2014-06-09 
12:32:48,018::vm::2285::vm.Vm::(_startUnderlyingVm) 
vmId=`2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9`::The vm start process failed

Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2245, in _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line 3185, in _run
self._connection.createXML(domxml, flags),
  File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, 
line 110, in wrapper

__connections.get(id(target)).pingLibvirt()
  File /usr/lib64/python2.6/site-packages/libvirt.py, line 3389, in 
getLibVersion
if ret == -1: raise libvirtError ('virConnectGetLibVersion() 
failed', conn=self)

libvirtError: internal error client socket is closed
Thread-1857::WARNING::2014-06-09 
12:32:50,673::vm::1963::vm.Vm::(_set_lastStatus) 
vmId=`2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9`::trying to set state to 
Powering down when already Down
Thread-1857::WARNING::2014-06-09 
12:32:50,815::utils::129::root::(rmFile) File: 
/var/lib/libvirt/qemu/channels/2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9.com.redhat.rhevm.vdsm 
already removed
Thread-1857::WARNING::2014-06-09 
12:32:50,816::utils::129::root::(rmFile) File: 
/var/lib/libvirt/qemu/channels/2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9.org.qemu.guest_agent.0 
already removed
MainThread::WARNING::2014-06-09 
12:33:03,770::fileUtils::167::Storage.fileUtils::(createdir) Dir 
/rhev/data-center/mnt already exists
MainThread::WARNING::2014-06-09 
12:33:05,738::clientIF::181::vds::(_prepareBindings) Unable to load the 
json rpc server module. Please make sure it is installed.
storageRefresh::WARNING::2014-06-09 
12:33:06,133::fileUtils::167::Storage.fileUtils::(createdir) Dir 
/rhev/data-center/hsm-tasks already exists
Thread-35::ERROR::2014-06-09 
12:33:08,375::sdc::137::Storage.StorageDomainCache::(_findDomain) 
looking for unfetched domain 55338570-e537-412b-97a9-635eea1ecb10
Thread-35::ERROR::2014-06-09 

Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-09 Thread combuster

Bad news happens only when running a VM for the first time, if it helps...

On 06/09/2014 01:30 PM, combuster wrote:

OK, I have good news and bad news :)

The good news is that I can run different VMs on different nodes when all 
of their drives are on the FC storage domain. I don't think that all of 
the I/O is running through the SPM, but I need to test that. Simply put, for 
every virtual disk that you create on the shared FC storage domain, 
oVirt will present that vdisk only to the node which is running the VM 
itself. They can all see the domain infrastructure (inbox, outbox, metadata), 
but the LV for the virtual disk itself for that VM is visible only to 
the node that is running that particular VM. There is no limitation 
(except for the free space on the storage).


Bad news!

I can create the virtual disk on the FC storage for a VM, but when I 
start the VM itself, the node which hosts the VM that I'm starting goes 
non-operational, and quickly comes up again (the iLO fencing agent checks 
if the node is OK and brings it back up). During that time, the VM starts 
on another node (the Default Host parameter was ignored - the assigned host 
was not available). I can manually migrate it later to the intended 
node; that works. Lucky me, on two nodes (of the four) in the cluster 
there were no VMs running (I tried this on both, with two different 
VMs created from scratch, and I got the same result).


I've killed everything above WARNING because it was killing the 
performance of the cluster. vdsm.log :


[code]
Thread-305::WARNING::2014-06-09 
12:15:53,236::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,013::utils::129::root::(rmFile) File: 
/rhev/data-center/a0500f5c-e8d9-42f1-8f04-15b23514c8ed/55338570-e537-412b-97a9-635eea1ecb10/images/90659ad8-bd90-4a0a-bb4e-7c6afe90e925/242a1bce-a434-4246-ad24-b62f99c03a05 
already removed
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,074::blockSD::761::Storage.StorageDomain::(_getOccupiedMetadataSlots) 
Could not find mapping for lv 
55338570-e537-412b-97a9-635eea1ecb10/242a1bce-a434-4246-ad24-b62f99c03a05
Thread-305::WARNING::2014-06-09 
12:20:54,341::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:25:55,378::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:30:56,424::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-1857::WARNING::2014-06-09 
12:32:45,639::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-1857::CRITICAL::2014-06-09 
12:32:45,640::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::WARNING::2014-06-09 
12:32:48,009::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-17704::CRITICAL::2014-06-09 
12:32:48,013::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::ERROR::2014-06-09 
12:32:48,018::vm::2285::vm.Vm::(_startUnderlyingVm) 
vmId=`2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9`::The vm start process failed

Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2245, in _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line 3185, in _run
self._connection.createXML(domxml, flags),
  File /usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, 
line 110, in wrapper

__connections.get(id(target)).pingLibvirt()
  File /usr/lib64/python2.6/site-packages/libvirt.py, line 3389, in 
getLibVersion
if ret == -1: raise libvirtError ('virConnectGetLibVersion() 
failed', conn=self)

libvirtError: internal error client socket is closed
Thread-1857::WARNING::2014-06-09 
12:32:50,673::vm::1963::vm.Vm::(_set_lastStatus) 
vmId=`2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9`::trying to set state to 
Powering down when already Down
Thread-1857::WARNING::2014-06-09 
12:32:50,815::utils::129::root::(rmFile) File: 
/var/lib/libvirt/qemu/channels/2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9.com.redhat.rhevm.vdsm 
already removed
Thread-1857::WARNING::2014-06-09 
12:32:50,816::utils::129::root::(rmFile) File: 
/var/lib/libvirt/qemu/channels/2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9.org.qemu.guest_agent.0 
already removed
MainThread::WARNING::2014-06-09 
12:33:03,770::fileUtils::167::Storage.fileUtils::(createdir) Dir 
/rhev/data-center/mnt already exists
MainThread::WARNING::2014-06-09 
12:33:05,738::clientIF::181::vds::(_prepareBindings) Unable to load 
the json rpc server module. Please make sure it is installed.
storageRefresh::WARNING::2014-06-09 
12:33:06,133::fileUtils::167::Storage.fileUtils::(createdir) Dir 
/rhev/data-center/hsm-tasks already exists
Thread-35::ERROR::2014-06-09 
12:33:08,375::sdc::137::Storage.StorageDomainCache

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread combuster
Nah, I've explicitly allowed the hosted-engine VM to be able to access the 
NAS device as the NFS share itself, before the deploy procedure even 
started. But I'm puzzled at how you can reproduce the bug; all was well 
on my setup before I started a manual migration of the engine's VM. Even 
auto migration worked before that (tested it). Does it just happen 
without any procedure on the engine itself? Is the score 0 for just one 
node, or for two of the three of them?
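
When it happens again, the quickest thing to capture is probably the agents' own view of things, from any of the HA hosts:

# score and engine state as reported by every host
hosted-engine --vm-status

# global maintenance on/off while poking at it
hosted-engine --set-maintenance --mode=global
hosted-engine --set-maintenance --mode=none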

On 06/10/2014 01:02 AM, Andrew Lau wrote:

nvm, just as I hit send the error has returned.
Ignore this..

On Tue, Jun 10, 2014 at 9:01 AM, Andrew Lau and...@andrewklau.com wrote:

So after adding the L3 capabilities to my storage network, I'm no
longer seeing this issue anymore. So the engine needs to be able to
access the storage domain it sits on? But that doesn't show up in the
UI?

Ivan, was this also the case with your setup? Engine couldn't access
storage domain?

On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:

Interesting, my storage network is a L2 only and doesn't run on the
ovirtmgmt (which is the only thing HostedEngine sees) but I've only
seen this issue when running ctdb in front of my NFS server. I
previously was using localhost as all my hosts had the nfs server on
it (gluster).

On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com wrote:

I just blocked the connection to storage for testing, and as a result I had this error: 
Failed to acquire lock, error -243, so I added it to the reproduce steps.
If you know other steps to reproduce this error without blocking the connection 
to storage, it would be wonderful if you could provide them.
Thanks

- Original Message -
From: Andrew Lau and...@andrewklau.com
To: combuster combus...@archlinux.us
Cc: users users@ovirt.org
Sent: Monday, June 9, 2014 3:47:00 AM
Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message: internal 
error Failed to acquire lock error -243

I just ran a few extra tests, I had a 2 host, hosted-engine running
for a day. They both had a score of 2400. Migrated the VM through the
UI multiple times, all worked fine. I then added the third host, and
that's when it all fell to pieces.
Other two hosts have a score of 0 now.

I'm also curious, in the BZ there's a note about:

where engine-vm block connection to storage domain(via iptables -I
INPUT -s sd_ip -j DROP)

What's the purpose for that?

On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:

Ignore that, the issue came back after 10 minutes.

I've even tried a gluster mount + nfs server on top of that, and the
same issue has come back.

On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:

Interesting, I put it all into global maintenance. Shut it all down
for 10~ minutes, and it's regained it's sanlock control and doesn't
seem to have that issue coming up in the log.

On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:

It was pure NFS on a NAS device. They all had different ids (there were no
redeployments of nodes before the problem occurred).

Thanks Jirka.


On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

I've seen that problem in other threads; the common denominator was NFS
on top of gluster. So if you have this setup, then it's a known problem. Or
you should double check that your hosts have different ids, otherwise they would
be trying to acquire the same lock.

--Jirka

On 06/06/2014 08:03 AM, Andrew Lau wrote:

Hi Ivan,

Thanks for the in depth reply.

I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.

Have you seen this happen on all your installs or only just after your
manual migration? It's a little frustrating this is happening as I was
hoping to get this into a production environment. It was all working
except that log message :(

Thanks,
Andrew


On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

Hi Andrew,

this is something that I saw in my logs too, first on one node and then
on the other three. When that happened on all four of them, the engine was
corrupted beyond repair.

First of all, I think that message is saying that sanlock can't get a
lock on the shared storage that you defined for the hosted engine during
installation. I got this error when I tried to manually migrate the
hosted engine. There is an unresolved bug there and I think it's related
to this one:

[Bug 1093366 - Migration of hosted-engine vm put target host score to
zero]
https://bugzilla.redhat.com/show_bug.cgi?id=1093366

This is a blocker bug (or should be) for the self-hosted engine and, from
my own experience with it, it shouldn't be used in a production environment
(not until it's fixed).

Nothing that I did could fix the fact that the score for the
target node was zero; I tried to reinstall the node, reboot the node, restart
several services, tailed tons of logs etc., but to no avail. When only
one node was left (that was actually running

Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-09 Thread combuster
Hm, another update on this one. If I create another VM with another 
virtual disk on a node that already has a VM running from the FC 
storage, then libvirt doesn't break. I guess it just happens the 
first time on any of the nodes. If this is the case, I would have to 
bring all of the VMs on the other two nodes in this four node cluster 
and start a VM from the FC storage, just to make sure it doesn't break 
during working hours. I guess it would be fine then.


It seems to me that this is some sort of a timeout issue that happens 
when I start the VM for the first time on the FC SD; this could have 
something to do with the FC card driver settings, or libvirt won't wait for 
ovirt-engine to present the new LV to the targeted node. I don't see why 
ovirt-engine waits for the first-time launch of the VM to present the LV 
at all; shouldn't it be doing this at the time of the virtual disk 
creation, in case I have selected to run it from a specific node?


On 06/09/2014 01:49 PM, combuster wrote:

Bad news happens only when running a VM for the first time, if it helps...

On 06/09/2014 01:30 PM, combuster wrote:

OK, I have good news and bad news :)

The good news is that I can run different VMs on different nodes when 
all of their drives are on the FC storage domain. I don't think that all 
of the I/O is running through the SPM, but I need to test that. Simply put, 
for every virtual disk that you create on the shared FC storage 
domain, oVirt will present that vdisk only to the node which is 
running the VM itself. They can all see the domain infrastructure 
(inbox, outbox, metadata), but the LV for the virtual disk itself for 
that VM is visible only to the node that is running that particular 
VM. There is no limitation (except for the free space on the storage).


Bad news!

I can create the virtual disk on the FC storage for a VM, but when I 
start the VM itself, the node which hosts the VM that I'm starting 
goes non-operational, and quickly comes up again (the iLO fencing agent 
checks if the node is OK and brings it back up). During that time, the VM 
starts on another node (the Default Host parameter was ignored - the assigned 
host was not available). I can manually migrate it later to the 
intended node; that works. Lucky me, on two nodes (of the four) in 
the cluster there were no VMs running (I tried this on both, with 
two different VMs created from scratch, and I got the same result).


I've killed everything above WARNING because it was killing the 
performance of the cluster. vdsm.log :


[code]
Thread-305::WARNING::2014-06-09 
12:15:53,236::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,013::utils::129::root::(rmFile) File: 
/rhev/data-center/a0500f5c-e8d9-42f1-8f04-15b23514c8ed/55338570-e537-412b-97a9-635eea1ecb10/images/90659ad8-bd90-4a0a-bb4e-7c6afe90e925/242a1bce-a434-4246-ad24-b62f99c03a05 
already removed
55809e40-ccf3-4f7c-aeec-802bc1c326a7::WARNING::2014-06-09 
12:17:25,074::blockSD::761::Storage.StorageDomain::(_getOccupiedMetadataSlots) 
Could not find mapping for lv 
55338570-e537-412b-97a9-635eea1ecb10/242a1bce-a434-4246-ad24-b62f99c03a05
Thread-305::WARNING::2014-06-09 
12:20:54,341::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:25:55,378::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-305::WARNING::2014-06-09 
12:30:56,424::persistentDict::256::Storage.PersistentDict::(refresh) 
data has no embedded checksum - trust it as it is
Thread-1857::WARNING::2014-06-09 
12:32:45,639::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-1857::CRITICAL::2014-06-09 
12:32:45,640::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::WARNING::2014-06-09 
12:32:48,009::libvirtconnection::116::root::(wrapper) connection to 
libvirt broken. ecode: 1 edom: 7
Thread-17704::CRITICAL::2014-06-09 
12:32:48,013::libvirtconnection::118::root::(wrapper) taking calling 
process down.
Thread-17704::ERROR::2014-06-09 
12:32:48,018::vm::2285::vm.Vm::(_startUnderlyingVm) 
vmId=`2bee9d79-b8d1-4a5a-a4f7-8092d1c803d9`::The vm start process failed

Traceback (most recent call last):
  File /usr/share/vdsm/vm.py, line 2245, in _startUnderlyingVm
self._run()
  File /usr/share/vdsm/vm.py, line 3185, in _run
self._connection.createXML(domxml, flags),
  File 
/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py, line 
110, in wrapper

__connections.get(id(target)).pingLibvirt()
  File /usr/lib64/python2.6/site-packages/libvirt.py, line 3389, in 
getLibVersion
if ret == -1: raise libvirtError ('virConnectGetLibVersion() 
failed', conn=self)

libvirtError: internal error client socket is closed
Thread-1857::WARNING::2014-06-09 
12:32:50,673::vm::1963::vm.Vm::(_set_lastStatus

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread combuster

On 06/10/2014 07:19 AM, Andrew Lau wrote:

I'm really having a hard time finding out why it's happening..

If I set the cluster to global for a minute or two, the scores will
reset back to 2400. Set maintenance mode to none, and all will be fine
until a migration occurs. It seems it tries to migrate, fails and sets
the score to 0 permanently rather than the 10? minutes mentioned in
one of the ovirt slides.
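
For reference, the maintenance states being toggled here come from the hosted-engine CLI on the hosts; a minimal sketch, assuming the standard ovirt-hosted-engine tooling is installed on each HA host:

[code]
# Show each HA host's score and state as seen by the HA agents:
hosted-engine --vm-status

# Put the whole HA cluster into global maintenance (agents stop acting on scores):
hosted-engine --set-maintenance --mode=global

# Leave maintenance / return to normal monitoring:
hosted-engine --set-maintenance --mode=none

# Put only the local host into maintenance (its score drops so the engine VM avoids it):
hosted-engine --set-maintenance --mode=local
[/code]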

When I have two hosts, it's score 0 only when a migration occurs.
(Just on the host which doesn't have engine up). The score 0 only
happens when it's tried to migrate when I set the host to local
maintenance. Migrating the VM from the UI has worked quite a few
times, but it's recently started to fail.

When I have three hosts, after ~5 minutes of them all being up the score
will hit 0 on the hosts not running the VMs. It doesn't even have to
attempt to migrate before the score goes to 0. Stopping the ha agent
on one host, and resetting it with the global maintenance method
brings it back to the 2 host scenario above.

I may move on and just go back to a standalone engine as this is not
getting very much luck..
Well, I've done this already. I can't really afford to have so much 
unplanned downtime on my critical VMs, especially since it would take 
me several hours (even a whole day) to install a dedicated engine, then 
set up the nodes if need be, and then import the VMs from the export domain. I 
would love to help more to resolve this one, but I was pressed for 
time, I already had oVirt 3.3 running (rock solid stable for a year and a half, 
started from 3.1 I think), and I couldn't spare more than a day 
trying to get around this bug (I had to have a setup running by the end 
of the weekend). I wasn't using Gluster at all, so at least we now know 
that Gluster is not a must in the mix. Besides, Artyom already described 
it nicely in the bug report, so I haven't had anything to add.


You were lucky Andrew; when I tried the global maintenance method and 
restarted the VM, I got a corrupted filesystem on the engine VM and it 
wouldn't even start on the one node that had a good score. It was bad 
health or unknown state on all of the nodes. I managed to repair 
the fs on the VM via VNC, and then just barely brought the services online, but 
the postgres db was too badly damaged, so the engine misbehaved.


At the time, I explained it to myself :) that the locking mechanism 
didn't prevent one node from trying to start (or write to) the VM while it 
was already running on another node, because the filesystem was so damaged 
that I couldn't believe it; in 15 years I've never seen an extX fs so 
badly damaged, and the fact that this happened during migration just 
amplified that thought.
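
The lock in question is a sanlock lease; a hedged sketch of how to inspect sanlock's view on a host (run as root):

[code]
# Lockspaces and resources this host currently holds or is acquiring:
sanlock client status

# Host IDs seen on the lockspaces, useful for spotting duplicates
# (depending on the version this may need the lockspace passed with -s):
sanlock client host_status
[/code]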


On Tue, Jun 10, 2014 at 3:11 PM, combuster combus...@archlinux.us wrote:

Nah, I've explicitly allowed the hosted-engine VM to access the NAS
device as well as the NFS share itself, before the deploy procedure even started.
But I'm puzzled at how you can reproduce the bug; all was well on my setup
before I started the manual migration of the engine's VM. Even auto migration
worked before that (I tested it). Does it just happen without any procedure on
the engine itself? Is the score 0 for just one node, or for two of the three of
them?
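
Purely as an illustration of what that kind of explicit allowance can look like on the NAS side; the export path, network and host name below are made up:

[code]
# /etc/exports on the NAS: hypothetical entry allowing both the host network
# and the engine VM itself to reach the hosted-engine share:
/export/hosted_engine  192.0.2.0/24(rw,sync,no_root_squash)  engine.example.org(rw,sync,no_root_squash)

# Re-export after editing:
exportfs -ra
[/code]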

On 06/10/2014 01:02 AM, Andrew Lau wrote:

nvm, just as I hit send the error has returned.
Ignore this..

On Tue, Jun 10, 2014 at 9:01 AM, Andrew Lau and...@andrewklau.com wrote:

So after adding the L3 capabilities to my storage network, I'm no
longer seeing this issue. So the engine needs to be able to
access the storage domain it sits on? But that doesn't show up in the
UI?

Ivan, was this also the case with your setup? Engine couldn't access
storage domain?

On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:

Interesting, my storage network is L2 only and doesn't run on the
ovirtmgmt (which is the only thing HostedEngine sees) but I've only
seen this issue when running ctdb in front of my NFS server. I
previously was using localhost as all my hosts had the nfs server on
it (gluster).

On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com
wrote:

I just blocked the connection to storage for testing, and as a result I got
this error: Failed to acquire lock: error -243, so I added it to the reproduction
steps.
If you know other steps that reproduce this error without blocking the
connection to storage, it would also be wonderful if you could provide them.
Thanks
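
One hedged way to do that kind of blocked-storage test, assuming an NFS storage server (the address below is a placeholder):

[code]
# Drop outgoing NFS traffic to the storage server to simulate losing it:
iptables -A OUTPUT -d 192.0.2.10 -p tcp --dport 2049 -j DROP

# Remove the rule again once the test is done:
iptables -D OUTPUT -d 192.0.2.10 -p tcp --dport 2049 -j DROP
[/code]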

- Original Message -
From: Andrew Lau and...@andrewklau.com
To: combuster combus...@archlinux.us
Cc: users users@ovirt.org
Sent: Monday, June 9, 2014 3:47:00 AM
Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message:
internal error Failed to acquire lock error -243

I just ran a few extra tests, I had a 2 host, hosted-engine running
for a day. They both had a score of 2400. Migrated the VM through the
UI multiple times, all worked fine. I then added the third host, and
that's when it all fell to pieces.
Other two hosts have

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-06 Thread combuster

On 06/06/2014 08:03 AM, Andrew Lau wrote:

Hi Ivan,

Thanks for the in depth reply.

I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.
It shouldn't be, as long as the shared storage that the VM resides on is accessible 
by the third node in the cluster.


Have you seen this happen on all your installs or only just after your
manual migration? It's a little frustrating this is happening as I was
hoping to get this into a production environment. It was all working
except that log message :(
Just after the manual migration; then things went all to ... My strong 
recommendation is not to use the self-hosted engine feature for production 
purposes until the mentioned bug is resolved. But it would really help 
to hear from someone on the dev team on this one.

Thanks,
Andrew


On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

Hi Andrew,

this is something that I saw in my logs too, first on one node and then on
the other three. When that happened on all four of them, the engine was corrupted
beyond repair.

First of all, I think that message is saying that sanlock can't get a lock
on the shared storage that you defined for the hosted engine during
installation. I got this error when I tried to manually migrate the
hosted engine. There is an unresolved bug there and I think it's related to
this one:

[Bug 1093366 - Migration of hosted-engine vm put target host score to zero]
https://bugzilla.redhat.com/show_bug.cgi?id=1093366

This is a blocker bug (or should be) for the self-hosted engine and, from my
own experience with it, it shouldn't be used in a production environment (not
until it's fixed).

Nothing that I did could fix the fact that the score for the target
node was zero; I tried to reinstall the node, reboot the node, restarted
several services, tailed tons of logs etc., but to no avail. When only one
node was left (the one actually running the hosted engine), I brought the
engine's VM down gracefully (hosted-engine --vm-shutdown, I believe) and after
that, when I tried to start the VM, it wouldn't load. VNC showed
that the filesystem inside the VM was corrupted, and when I ran fsck and
finally started it up, it was too badly damaged. I managed to start the
engine itself (after repairing the postgresql service that wouldn't
start), but the database was damaged enough that it acted pretty weird (it showed
that storage domains were down while the VMs were running fine, etc.). Lucky
me, I had already exported all of the VMs at the first sign of trouble, and
then installed ovirt-engine on a dedicated server and attached the export
domain.

So while it's really a useful feature, and it's working (for the most part, i.e.
automatic migration works), manually migrating the VM with hosted-engine
will lead to trouble.

I hope that my experience with it will be of use to you. It happened to me
two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
available.

Regards,

Ivan

On 06/06/2014 05:12 AM, Andrew Lau wrote:

Hi,

I'm seeing this weird message in my engine log

2014-06-06 03:06:09,380 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
2014-06-06 03:06:12,494 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
secondsToWait=0, gracefully=false), log id: 62a9d4c1
2014-06-06 03:06:12,561 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
62a9d4c1
2014-06-06 03:06:12,652 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_
Worker-89) Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
message: internal error Failed to acquire lock: error -243.

It also appears to occur on the other hosts in the cluster, except the
host which is running the hosted-engine. So right now 3 servers, it
shows up twice in the engine UI.

The engine VM continues to run peacefully, without any issues on the
host which doesn't have that error.

Any ideas?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-06 Thread combuster
It was pure NFS on a NAS device. They all had different IDs (there were no 
redeployments of nodes before the problem occurred).
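
A hedged way to double-check those IDs on a hosted-engine setup (the path is the one used by ovirt-hosted-engine-setup; verify it matches your installation):

[code]
# Each HA host should report a different host_id value:
grep host_id /etc/ovirt-hosted-engine/hosted-engine.conf
[/code]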


Thanks Jirka.

On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:
I've seen that problem in other threads; the common denominator was 
NFS on top of Gluster. So if you have this setup, then it's a known 
problem. Or you should double-check that your hosts have different IDs, 
otherwise they would be trying to acquire the same lock.


--Jirka

On 06/06/2014 08:03 AM, Andrew Lau wrote:

Hi Ivan,

Thanks for the in depth reply.

I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.

Have you seen this happen on all your installs or only just after your
manual migration? It's a little frustrating this is happening as I was
hoping to get this into a production environment. It was all working
except that log message :(

Thanks,
Andrew


On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us 
wrote:

Hi Andrew,

this is something that I saw in my logs too, first on one node and 
then on
the other three. When that happend on all four of them, engine was 
corrupted

beyond repair.

First of all, I think that message is saying that sanlock can't get 
a lock

on the shared storage that you defined for the hostedengine during
installation. I got this error when I've tried to manually migrate the
hosted engine. There is an unresolved bug there and I think it's 
related to

this one:

[Bug 1093366 - Migration of hosted-engine vm put target host score 
to zero]

https://bugzilla.redhat.com/show_bug.cgi?id=1093366

This is a blocker bug (or should be) for the selfhostedengine and, 
from my
own experience with it, shouldn't be used in the production 
enviroment (not

untill it's fixed).

Nothing that I've done couldn't fix the fact that the score for the 
target

node was Zero, tried to reinstall the node, reboot the node, restarted
several services, tailed a tons of logs etc but to no avail. When 
only one
node was left (that was actually running the hosted engine), I 
brought the
engine's vm down gracefully (hosted-engine --vm-shutdown I belive) 
and after
that, when I've tried to start the vm - it wouldn't load. Running 
VNC showed

that the filesystem inside the vm was corrupted and when I ran fsck and
finally started up - it was too badly damaged. I succeded to start the
engine itself (after repairing postgresql service that wouldn't want to
start) but the database was damaged enough and acted pretty weird 
(showed
that storage domains were down but the vm's were running fine etc). 
Lucky
me, I had already exported all of the VM's on the first sign of 
trouble and
then installed ovirt-engine on the dedicated server and attached the 
export

domain.

So while really a usefull feature, and it's working (for the most 
part ie,
automatic migration works), manually migrating VM with the 
hosted-engine

will lead to troubles.

I hope that my experience with it, will be of use to you. It 
happened to me

two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
available.

Regards,

Ivan

On 06/06/2014 05:12 AM, Andrew Lau wrote:

Hi,

I'm seeing this weird message in my engine log

2014-06-06 03:06:09,380 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
2014-06-06 03:06:12,494 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
secondsToWait=0, gracefully=false), log id: 62a9d4c1
2014-06-06 03:06:12,561 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
62a9d4c1
2014-06-06 03:06:12,652 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_
Worker-89) Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
message: internal error Failed to acquire lock: error -243.

It also appears to occur on the other hosts in the cluster, except the
host which is running the hosted-engine. So right now 3 servers, it
shows up twice in the engine UI.

The engine VM continues to run peacefully, without any issues on the
host which doesn't have that error.

Any ideas?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users





___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-05 Thread combuster

Hi Andrew,

this is something that I saw in my logs too, first on one node and then 
on the other three. When that happened on all four of them, the engine was 
corrupted beyond repair.

First of all, I think that message is saying that sanlock can't get a 
lock on the shared storage that you defined for the hosted engine during 
installation. I got this error when I tried to manually migrate the 
hosted engine. There is an unresolved bug there and I think it's related 
to this one:

[Bug 1093366 - Migration of hosted-engine vm put target host score to zero]

https://bugzilla.redhat.com/show_bug.cgi?id=1093366

This is a blocker bug (or should be) for the self-hosted engine and, from 
my own experience with it, it shouldn't be used in a production 
environment (not until it's fixed).

Nothing that I did could fix the fact that the score for the 
target node was zero; I tried to reinstall the node, reboot the node, 
restarted several services, tailed tons of logs etc., but to no avail. 
When only one node was left (the one actually running the hosted 
engine), I brought the engine's VM down gracefully (hosted-engine 
--vm-shutdown, I believe) and after that, when I tried to start the VM, 
it wouldn't load. VNC showed that the filesystem inside the VM 
was corrupted, and when I ran fsck and finally started it up, it was too 
badly damaged. I managed to start the engine itself (after repairing the 
postgresql service that wouldn't start), but the database was 
damaged enough that it acted pretty weird (it showed that storage domains were 
down while the VMs were running fine, etc.). Lucky me, I had already 
exported all of the VMs at the first sign of trouble and then installed 
ovirt-engine on a dedicated server and attached the export domain.

So while it's really a useful feature, and it's working (for the most part, 
i.e. automatic migration works), manually migrating the VM with 
hosted-engine will lead to trouble.

I hope that my experience with it will be of use to you. It happened to 
me two weeks ago, ovirt-engine was current (3.4.1) and there was no fix 
available.


Regards,

Ivan
On 06/06/2014 05:12 AM, Andrew Lau wrote:

Hi,

I'm seeing this weird message in my engine log

2014-06-06 03:06:09,380 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
2014-06-06 03:06:12,494 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
secondsToWait=0, gracefully=false), log id: 62a9d4c1
2014-06-06 03:06:12,561 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
62a9d4c1
2014-06-06 03:06:12,652 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-89) Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
message: internal error Failed to acquire lock: error -243.

It also appears to occur on the other hosts in the cluster, except the
host which is running the hosted-engine. So right now 3 servers, it
shows up twice in the engine UI.

The engine VM continues to run peacefully, without any issues on the
host which doesn't have that error.

Any ideas?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-02 Thread combuster
One word of caution so far: when exporting any VM, the node that acts as SPM 
is stressed out to the max. I relieved the stress by a certain margin by 
lowering the libvirtd and vdsm log levels to WARNING. That shortened the 
export procedure by a factor of at least five. But the vdsm process on the SPM node is 
still showing high CPU usage, so it's best that the SPM node be left with a 
decent amount of CPU time to spare. Also, an export of a VM with high vdisk capacity 
and thin provisioning enabled (let's say 14GB used of 100GB defined) took 
around 50min over a 10Gb ethernet interface to a 1Gb export NAS device that 
was not stressed at all by other processes. When I did that export with 
debug log levels it took 5hrs :(
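
For anyone wanting to do the same, a hedged sketch of where those log levels live on an EL6 host; treat the exact option names as assumptions and check the files shipped by your vdsm and libvirt packages:

[code]
# /etc/vdsm/logger.conf is a standard Python logging config; raise the noisy
# loggers from DEBUG/INFO to WARNING, e.g.:
#   [logger_root]
#   level=WARNING
# then restart vdsm so it picks the change up:
service vdsmd restart

# /etc/libvirt/libvirtd.conf uses numeric levels (1=debug, 2=info, 3=warning, 4=error):
#   log_level = 3
service libvirtd restart
[/code]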

So lowering log levels is a must in a production environment. I've deleted the 
LUN that I exported on the storage (removed it first from oVirt) and for the 
next weekend I am planning to add a new one, export it again on all the nodes 
and start a few fresh VM installations. Things I'm going to look for are 
partition alignment and running them from different nodes in the cluster at 
the same time. I just hope that not all I/O is going to pass through the SPM; 
this is the one thing that bothers me the most.

I'll report back on these results next week, but if anyone has experience with 
this kind of thing, or can point to some documentation, that would be great.

On Monday, 2. June 2014. 18.51.52 you wrote:
 I'm curious to hear what other comments arise, as we're analyzing a
 production setup shortly.
 
 On Sun, Jun 1, 2014 at 10:11 PM,  combus...@archlinux.us wrote:
  I need to scratch gluster off because setup is based on CentOS 6.5, so
  essential prerequisites like qemu 1.3 and libvirt 1.0.1 are not met.
 
 Gluster would still work with EL6, afaik it just won't use libgfapi and
 instead use just a standard mount.
 
  Any info regarding FC storage domain would be appreciated though.
  
  Thanks
  
  Ivan
  
  On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:
  Hi,
  
  I have a 4 node cluster setup and my storage options right now are a FC
  based storage, one partition per node on a local drive (~200GB each) and
  a
  NFS based NAS device. I want to setup export and ISO domain on the NAS
  and
  there are no issues or questions regarding those two. I wasn't aware of
  any
  other options at the time for utilizing a local storage (since this is a
  shared based datacenter) so I exported a directory from each partition
  via
  NFS and it works. But I am little in the dark with the following:
  
  1. Are there any advantages for switching from NFS based local storage to
  a
  Gluster based domain with blocks for each partition. I guess it can be
  only
  performance wise but maybe I'm wrong. If there are advantages, are there
  any tips regarding xfs mount options etc ?
  
  2. I've created a volume on the FC based storage and exported it to all
  of
  the nodes in the cluster on the storage itself. I've configured
  multipathing correctly and added an alias for the wwid of the LUN so I
  can
  distinct this one and any other future volumes more easily. At first I
  created a partition on it but since oVirt saw only the whole LUN as raw
  device I erased it before adding it as the FC master storage domain. I've
  imported a few VM's and point them to the FC storage domain. This setup
  works, but:
  
  - All of the nodes see a device with the alias for the wwid of the
  volume,
  but only the node wich is currently the SPM for the cluster can see
  logical
  volumes inside. Also when I setup the high availability for VM's residing
  on the FC storage and select to start on any node on the cluster, they
  always start on the SPM. Can multiple nodes run different VM's on the
  same
  FC storage at the same time (logical thing would be that they can, but I
  wanted to be sure first). I am not familiar with the logic oVirt utilizes
  that locks the vm's logical volume to prevent corruption.
  
  - Fdisk shows that logical volumes on the LUN of the FC volume are
  missaligned (partition doesn't end on cylindar boundary), so I wonder if
  this is becuase I imported the VM's with disks that were created on local
  storage before and that any _new_ VM's with disks on the fc storage would
  be propperly aligned.
  
  This is a new setup with oVirt 3.4 (did an export of all the VM's on 3.3
  and after a fresh installation of the 3.4 imported them back again). I
  have room to experiment a little with 2 of the 4 nodes because currently
  they are free from running any VM's, but I have limited room for
  anything else that would cause an unplanned downtime for four virtual
  machines running on the other two nodes on the cluster (currently highly
  available and their drives are on the FC storage domain). All in all I
  have 12 VM's running and I'm asking on the list for advice and guidance
  before I make any changes.
  
  Just trying to find as much info regarding all of this as possible 

[ovirt-users] Recommended setup for a FC based storage domain

2014-06-01 Thread combuster
Hi,

I have a 4-node cluster setup and my storage options right now are an FC-based 
storage, one partition per node on a local drive (~200GB each), and an NFS-based 
NAS device. I want to set up the export and ISO domains on the NAS and there are no 
issues or questions regarding those two. I wasn't aware of any other options 
at the time for utilizing local storage (since this is a shared 
datacenter), so I exported a directory from each partition via NFS and it 
works. But I am a little in the dark with the following:

1. Are there any advantages to switching from NFS-based local storage to a 
Gluster-based domain with bricks for each partition? I guess it would only be 
performance-wise, but maybe I'm wrong. If there are advantages, are there any 
tips regarding XFS mount options etc.?

2. I've created a volume on the FC-based storage and exported it to all of the 
nodes in the cluster on the storage itself. I've configured multipathing 
correctly and added an alias for the wwid of the LUN so I can distinguish this 
one and any other future volumes more easily (see the multipath.conf sketch 
after the two points below). At first I created a partition on it, but since 
oVirt saw only the whole LUN as a raw device I erased it before adding it as 
the FC master storage domain. I've imported a few VMs and pointed them to the 
FC storage domain. This setup works, but:

- All of the nodes see a device with the alias for the wwid of the volume, but 
only the node which is currently the SPM for the cluster can see the logical 
volumes inside. Also, when I set up high availability for VMs residing on 
the FC storage and select to start on any node in the cluster, they always 
start on the SPM. Can multiple nodes run different VMs on the same FC storage 
at the same time? (The logical thing would be that they can, but I wanted to be 
sure first.) I am not familiar with the logic oVirt uses to lock a 
VM's logical volume to prevent corruption.

- Fdisk shows that logical volumes on the LUN of the FC volume are misaligned 
(the partition doesn't end on a cylinder boundary), so I wonder if this is because I 
imported VMs with disks that were created on local storage before, and whether 
any _new_ VMs with disks on the FC storage would be properly aligned.
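
A hedged sketch of the wwid alias mentioned in point 2 above, as it could look in /etc/multipath.conf (the wwid and alias here are made up):

[code]
multipaths {
    multipath {
        wwid  36005076801810523100000000000007b
        alias fc_domain_01
    }
}
# Reload the multipath maps after editing:
multipath -r
[/code]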

This is a new setup with oVirt 3.4 (I exported all the VMs on 3.3 and 
imported them back again after a fresh installation of 3.4). I have room 
to experiment a little with 2 of the 4 nodes because they are currently not 
running any VMs, but I have limited room for anything else that would 
cause unplanned downtime for the four virtual machines running on the other two 
nodes of the cluster (currently highly available, with their drives on the 
FC storage domain). All in all I have 12 VMs running, and I'm asking on the 
list for advice and guidance before I make any changes.

Just trying to find as much info regarding all of this as possible before 
acting upon it.

Thank you in advance,

Ivan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-01 Thread combuster
I need to scratch Gluster off because the setup is based on CentOS 6.5, so 
essential prerequisites like qemu 1.3 and libvirt 1.0.1 are not met.

Any info regarding FC storage domain would be appreciated though.

Thanks

Ivan

On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:
 Hi,
 
 I have a 4 node cluster setup and my storage options right now are a FC
 based storage, one partition per node on a local drive (~200GB each) and a
 NFS based NAS device. I want to setup export and ISO domain on the NAS and
 there are no issues or questions regarding those two. I wasn't aware of any
 other options at the time for utilizing a local storage (since this is a
 shared based datacenter) so I exported a directory from each partition via
 NFS and it works. But I am little in the dark with the following:
 
 1. Are there any advantages for switching from NFS based local storage to a
 Gluster based domain with blocks for each partition. I guess it can be only
 performance wise but maybe I'm wrong. If there are advantages, are there any
 tips regarding xfs mount options etc ?
 
 2. I've created a volume on the FC based storage and exported it to all of
 the nodes in the cluster on the storage itself. I've configured
 multipathing correctly and added an alias for the wwid of the LUN so I can
 distinct this one and any other future volumes more easily. At first I
 created a partition on it but since oVirt saw only the whole LUN as raw
 device I erased it before adding it as the FC master storage domain. I've
 imported a few VM's and point them to the FC storage domain. This setup
 works, but:
 
 - All of the nodes see a device with the alias for the wwid of the volume,
 but only the node wich is currently the SPM for the cluster can see logical
 volumes inside. Also when I setup the high availability for VM's residing
 on the FC storage and select to start on any node on the cluster, they
 always start on the SPM. Can multiple nodes run different VM's on the same
 FC storage at the same time (logical thing would be that they can, but I
 wanted to be sure first). I am not familiar with the logic oVirt utilizes
 that locks the vm's logical volume to prevent corruption.
 
 - Fdisk shows that logical volumes on the LUN of the FC volume are
 missaligned (partition doesn't end on cylindar boundary), so I wonder if
 this is becuase I imported the VM's with disks that were created on local
 storage before and that any _new_ VM's with disks on the fc storage would
 be propperly aligned.
 
 This is a new setup with oVirt 3.4 (did an export of all the VM's on 3.3 and
 after a fresh installation of the 3.4 imported them back again). I have
 room to experiment a little with 2 of the 4 nodes because currently they
 are free from running any VM's, but I have limited room for anything else
 that would cause an unplanned downtime for four virtual machines running on
 the other two nodes on the cluster (currently highly available and their
 drives are on the FC storage domain). All in all I have 12 VM's running and
 I'm asking on the list for advice and guidance before I make any changes.
 
 Just trying to find as much info regarding all of this as possible before
 acting upon.
 
 Thank you in advance,
 
 Ivan

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users