Re: [ovirt-users] Can't attach storage domain / Unsupported watchdog

2014-07-31 Thread Andrew Lau
I was able to resolve this by disabling the HW watchdog module and relying
on softdog, e.g.

blacklist iTCO_wdt
blacklist iTCO_vendor_support
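
In full, roughly (a sketch assuming EL6 conventions; the file names are
arbitrary and the boot-load step for softdog is just one way to do it):

# keep the broken hardware watchdog driver from loading
cat > /etc/modprobe.d/blacklist-itco.conf <<'EOF'
blacklist iTCO_wdt
blacklist iTCO_vendor_support
EOF

# unload it if it's already loaded, then switch to the software watchdog
rmmod iTCO_wdt iTCO_vendor_support 2>/dev/null
modprobe softdog

# have softdog loaded at boot (EL6 style), then restart wdmd
cat > /etc/sysconfig/modules/softdog.modules <<'EOF'
#!/bin/sh
/sbin/modprobe softdog
EOF
chmod +x /etc/sysconfig/modules/softdog.modules
service wdmd restart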

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=878119
I guess it's more of the kernel not supporting it rather than an oVirt
issue.

What are the implications of this? Based on the oVirt wiki, wdmd is used
as a sort of fencing mechanism to reboot the host on failure. Wouldn't this
be considered redundant, considering power management also exists from the
ovirt-engine? Are there any downsides to using softdog?


On Thu, Jul 31, 2014 at 6:45 PM, Allon Mureinik amure...@redhat.com wrote:

 yikes.
 would you mind opening a bug on this, so we can track the issue properly?

 --

 *From: *Andrew Lau and...@andrewklau.com
 *To: *users users@ovirt.org
 *Sent: *Thursday, July 31, 2014 4:19:56 AM
 *Subject: *[ovirt-users] Can't attach storage domain / Unsupported
 watchdog


 Hi,

 I'm trying out some new boards (Intel Avotons), and it appears wdmd does
 not like the watchdog device it provides.

 Everything in oVirt 3.4.3 seems to run fine until it comes to adding a
 storage device. It gives me an error about Cannot acquire host id

 vdsm logs,
 http://fpaste.org/122164/67695431/

 # service wdmd start
 (logs)
 wdmd[14062]: wdmd started S0 H1 G0
 wdmd[14062]: /dev/watchdog failed to set timeout
 wdmd[14062]: /dev/watchdog disarmed
 wdmd[14062]: no watchdog device, load a watchdog driver

 # service wdmd status
 wdmd dead but subsys locked

 I can't seem to find any documentation on what wdmd does, could someone
 explain?

 Thanks,
 Andrew

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Can't attach storage domain / Unsupported watchdog

2014-07-30 Thread Andrew Lau
Hi,

I'm trying out some new boards (Intel Avotons), and it appears wdmd does
not like the watchdog device it provides.

Everything in oVirt 3.4.3 seems to run fine until it comes to adding a
storage device. It gives me an error about "Cannot acquire host id".

vdsm logs,
http://fpaste.org/122164/67695431/

# service wdmd start
(logs)
wdmd[14062]: wdmd started S0 H1 G0
wdmd[14062]: /dev/watchdog failed to set timeout
wdmd[14062]: /dev/watchdog disarmed
wdmd[14062]: no watchdog device, load a watchdog driver

# service wdmd status
wdmd dead but subsys locked
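
If it helps anyone narrow this down, a couple of quick checks show which
driver is backing /dev/watchdog and what it reported at load time (just a
generic sketch, nothing oVirt-specific):

ls -l /dev/watchdog
lsmod | grep -i wdt                 # which hardware watchdog module is loaded
dmesg | grep -iE 'wdt|watchdog'     # driver init messages, e.g. timeout limits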

I can't seem to find any documentation on what wdmd does. Could someone
explain?

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Odd messages on new node/hosted engine

2014-07-23 Thread Andrew Lau
On Wed, Jul 23, 2014 at 7:06 PM, Sandro Bonazzola sbona...@redhat.com
wrote:

 On 23/07/2014 04:08, Andrew Lau wrote:
  Looks like the multipath rules ovirt seems to setup by default, not sure
 why..
  I just blacklist my (HDD) devices, otherwise my consoles get filled with
 a similar message.

 hosted-engine setup doesn't set power management; it can be done by hand
 from the UI.
 Also, hosted-engine setup doesn't touch the multipath configuration.
 If something changes it, it's not hosted-engine setup.


Even on non-hosted-engine setups, this still happens.
http://lists.ovirt.org/pipermail/users/2013-January/011474.html

I believe that's what he's referring to - nothing to do with
power management.



 
  On Thu, Jul 17, 2014 at 5:41 AM, Chris Adams c...@cmadams.net mailto:
 c...@cmadams.net wrote:
 
  I built a new node+hosted engine setup, using up-to-date 6.5 and
 oVirt
  3.4.3-RC.  I see some odd messages, that I think are probably
 related:
 
  - In the hosted engine UI, I have an Alert that says Failed to
 verify
Power Management configuration for Host hosted_engine_1.
 
  - On the node, I get the following chunk repeated every 10 seconds in
/var/log/messages:
 
  Jul 16 14:34:19 node0 kernel: device-mapper: table: 253:2:
 multipath: error getting device
  Jul 16 14:34:19 node0 kernel: device-mapper: ioctl: error adding
 target to table
  Jul 16 14:34:19 node0 kernel: device-mapper: table: 253:2:
 multipath: error getting device
  Jul 16 14:34:19 node0 kernel: device-mapper: ioctl: error adding
 target to table
  Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
  Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't
 remove
  Jul 16 14:34:19 node0 cpuspeed: Disabling performance cpu frequency
 scaling governor
  Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
  Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't
 remove
  Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
  Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't
 remove
  Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
  Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't
 remove
  Jul 16 14:34:20 node0 cpuspeed: Enabling performance cpu frequency
 scaling governor
 
  There is no dm-2; the system is installed on 2 SAS drives, mirrored
  using Linux md RAID1, using LVM (dm-0 is the root filesystem and
 dm-1 is
  swap).
 
  Here's the corresponding chunk of /var/log/vdsm/vdsm.log:
 
  Thread-128::DEBUG::2014-07-16
 14:34:19,092::task::595::TaskManager.Task::(_updateState)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::moving from
  state init - state preparing
  Thread-128::INFO::2014-07-16
 14:34:19,092::logUtils::44::dispatcher::(wrapper) Run and protect:
 repoStats(options=None)
  Thread-128::INFO::2014-07-16
 14:34:19,093::logUtils::47::dispatcher::(wrapper) Run and protect:
 repoStats, Return response:
  {'74cb6a07-5745-4b21-ba4b-d9012acb5cae': {'code': 0, 'version': 3,
 'acquired': True, 'delay': '0.000485922', 'lastCheck': '9.4', 'valid':
 True}}
  Thread-128::DEBUG::2014-07-16
 14:34:19,093::task::1185::TaskManager.Task::(prepare)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::finished:
  {'74cb6a07-5745-4b21-ba4b-d9012acb5cae': {'code': 0, 'version': 3,
 'acquired': True, 'delay': '0.000485922', 'lastCheck': '9.4', 'valid':
 True}}
  Thread-128::DEBUG::2014-07-16
 14:34:19,094::task::595::TaskManager.Task::(_updateState)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::moving from
  state preparing - state finished
  Thread-128::DEBUG::2014-07-16
 14:34:19,094::resourceManager::940::ResourceManager.Owner::(releaseAll)
 Owner.releaseAll requests {} resources {}
  Thread-128::DEBUG::2014-07-16
 14:34:19,094::resourceManager::977::ResourceManager.Owner::(cancelAll)
 Owner.cancelAll requests {}
  Thread-128::DEBUG::2014-07-16
 14:34:19,095::task::990::TaskManager.Task::(_decref)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::ref 0 aborting False
  Thread-2011::DEBUG::2014-07-16
 14:34:19,369::BindingXMLRPC::251::vds::(wrapper) client [127.0.0.1]
  Thread-2011::DEBUG::2014-07-16
 14:34:19,370::task::595::TaskManager.Task::(_updateState)
 Task=`44641295-7f85-40f9-ba71-6f587a96f387`::moving from
  state init - state preparing
  Thread-2011::INFO::2014-07-16
 14:34:19,371::logUtils::44::dispatcher::(wrapper) Run and protect:
 connectStorageServer(domType=1,
  spUUID='b15478ff-1ae1-4065-8e52-19c808d39597', conList=[{'port': '',
 'connection': 'nfs.c1.api-digital.com:/vmstore/engine', 'iqn': '',
 'portal':
  '', 'user': 'kvm', 'protocol_version': '4', 'password': '**',
 'id': '7fb481a8-f7b2-4cf7-8862-8ff02acde48d'}], options=None)
  Thread-2011::DEBUG::2014-07-16
 14:34:19,376::hsm::2328::Storage.HSM::(__prefetchDomains) nfs local path:
  /rhev

Re: [ovirt-users] Odd messages on new node/hosted engine

2014-07-22 Thread Andrew Lau
Looks like it's the multipath rules oVirt seems to set up by default, not sure
why.
I just blacklist my (HDD) devices; otherwise my consoles get filled with a
similar message.
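
For reference, the blacklisting is just a stanza in /etc/multipath.conf
along these lines (device names are examples - match them to your local
disks - and if I remember right the "# RHEV PRIVATE" marker is needed to
stop vdsm rewriting the file):

# /etc/multipath.conf
# RHEV PRIVATE
blacklist {
    devnode "^sda$"     # example: local system disk(s)
}

# then flush the stale maps and restart multipathd
multipath -F
service multipathd restart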

On Thu, Jul 17, 2014 at 5:41 AM, Chris Adams c...@cmadams.net wrote:

 I built a new node+hosted engine setup, using up-to-date 6.5 and oVirt
 3.4.3-RC.  I see some odd messages, that I think are probably related:

 - In the hosted engine UI, I have an Alert that says Failed to verify
   Power Management configuration for Host hosted_engine_1.

 - On the node, I get the following chunk repeated every 10 seconds in
   /var/log/messages:

 Jul 16 14:34:19 node0 kernel: device-mapper: table: 253:2: multipath:
 error getting device
 Jul 16 14:34:19 node0 kernel: device-mapper: ioctl: error adding target to
 table
 Jul 16 14:34:19 node0 kernel: device-mapper: table: 253:2: multipath:
 error getting device
 Jul 16 14:34:19 node0 kernel: device-mapper: ioctl: error adding target to
 table
 Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
 Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't remove
 Jul 16 14:34:19 node0 cpuspeed: Disabling performance cpu frequency
 scaling governor
 Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
 Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't remove
 Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
 Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't remove
 Jul 16 14:34:19 node0 multipathd: dm-2: remove map (uevent)
 Jul 16 14:34:19 node0 multipathd: dm-2: devmap not registered, can't remove
 Jul 16 14:34:20 node0 cpuspeed: Enabling performance cpu frequency scaling
 governor

 There is no dm-2; the system is installed on 2 SAS drives, mirrored
 using Linux md RAID1, using LVM (dm-0 is the root filesystem and dm-1 is
 swap).

 Here's the corresponding chunk of /var/log/vdsm/vdsm.log:

 Thread-128::DEBUG::2014-07-16
 14:34:19,092::task::595::TaskManager.Task::(_updateState)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::moving from state init -
 state preparing
 Thread-128::INFO::2014-07-16
 14:34:19,092::logUtils::44::dispatcher::(wrapper) Run and protect:
 repoStats(options=None)
 Thread-128::INFO::2014-07-16
 14:34:19,093::logUtils::47::dispatcher::(wrapper) Run and protect:
 repoStats, Return response: {'74cb6a07-5745-4b21-ba4b-d9012acb5cae':
 {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000485922',
 'lastCheck': '9.4', 'valid': True}}
 Thread-128::DEBUG::2014-07-16
 14:34:19,093::task::1185::TaskManager.Task::(prepare)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::finished:
 {'74cb6a07-5745-4b21-ba4b-d9012acb5cae': {'code': 0, 'version': 3,
 'acquired': True, 'delay': '0.000485922', 'lastCheck': '9.4', 'valid':
 True}}
 Thread-128::DEBUG::2014-07-16
 14:34:19,094::task::595::TaskManager.Task::(_updateState)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::moving from state preparing -
 state finished
 Thread-128::DEBUG::2014-07-16
 14:34:19,094::resourceManager::940::ResourceManager.Owner::(releaseAll)
 Owner.releaseAll requests {} resources {}
 Thread-128::DEBUG::2014-07-16
 14:34:19,094::resourceManager::977::ResourceManager.Owner::(cancelAll)
 Owner.cancelAll requests {}
 Thread-128::DEBUG::2014-07-16
 14:34:19,095::task::990::TaskManager.Task::(_decref)
 Task=`6db0f7ed-ec65-4685-ae2d-604560349317`::ref 0 aborting False
 Thread-2011::DEBUG::2014-07-16
 14:34:19,369::BindingXMLRPC::251::vds::(wrapper) client [127.0.0.1]
 Thread-2011::DEBUG::2014-07-16
 14:34:19,370::task::595::TaskManager.Task::(_updateState)
 Task=`44641295-7f85-40f9-ba71-6f587a96f387`::moving from state init -
 state preparing
 Thread-2011::INFO::2014-07-16
 14:34:19,371::logUtils::44::dispatcher::(wrapper) Run and protect:
 connectStorageServer(domType=1,
 spUUID='b15478ff-1ae1-4065-8e52-19c808d39597', conList=[{'port': '',
 'connection': 'nfs.c1.api-digital.com:/vmstore/engine', 'iqn': '',
 'portal': '', 'user': 'kvm', 'protocol_version': '4', 'password': '**',
 'id': '7fb481a8-f7b2-4cf7-8862-8ff02acde48d'}], options=None)
 Thread-2011::DEBUG::2014-07-16
 14:34:19,376::hsm::2328::Storage.HSM::(__prefetchDomains) nfs local path:
 /rhev/data-center/mnt/nfs.c1.api-digital.com:_vmstore_engine
 Thread-2011::DEBUG::2014-07-16
 14:34:19,378::hsm::2352::Storage.HSM::(__prefetchDomains) Found SD uuids:
 ('74cb6a07-5745-4b21-ba4b-d9012acb5cae',)
 Thread-2011::DEBUG::2014-07-16
 14:34:19,379::hsm::2408::Storage.HSM::(connectStorageServer) knownSDs:
 {74cb6a07-5745-4b21-ba4b-d9012acb5cae: storage.nfsSD.findDomain}
 Thread-2011::INFO::2014-07-16
 14:34:19,379::logUtils::47::dispatcher::(wrapper) Run and protect:
 connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id':
 '7fb481a8-f7b2-4cf7-8862-8ff02acde48d'}]}
 Thread-2011::DEBUG::2014-07-16
 14:34:19,379::task::1185::TaskManager.Task::(prepare)
 Task=`44641295-7f85-40f9-ba71-6f587a96f387`::finished: {'statuslist':
 [{'status': 0, 'id': 

Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-22 Thread Andrew Lau
On 23/07/2014 1:45 am, Jason Brooks jbro...@redhat.com wrote:



 - Original Message -
  From: Jason Brooks jbro...@redhat.com
  To: Andrew Lau and...@andrewklau.com
  Cc: users users@ovirt.org
  Sent: Tuesday, July 22, 2014 8:29:46 AM
  Subject: Re: [ovirt-users] Can we debug some truths/myths/facts
about   hosted-engine and gluster?
 
 
 
  - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: users users@ovirt.org
   Sent: Friday, July 18, 2014 4:50:31 AM
   Subject: [ovirt-users] Can we debug some truths/myths/facts about
   hosted-engine and gluster?
  
   Hi all,
  
   As most of you have got hints from previous messages, hosted engine
won't
   work on gluster . A quote from BZ1097639
  
   Using hosted engine with Gluster backed storage is currently
something we
   really warn against.
 
  My current setup is hosted engine, configured w/ gluster storage as
described
  in my
  blog post, but with three hosts and replica 3 volumes.
 
  Only issue I've seen is an errant message about the Hosted Engine being
down
  following an engine migration. The engine does migrate successfully,
 though.
That was fixed in 3.4.3 I believe, although when it happened to me my
engine didn't migrate - it just sat there.


 
  RE your bug, what do you use for a mount point for the nfs storage?

 In the log you attached to your bug, it looks like you're using localhost
as
 the nfs mount point. I use a dns name that resolves to the virtual IP
hosted
 by ctdb. So, you're only ever talking to one nfs server at a time, and
failover
 between the nfs hosts is handled by ctdb.

I also tried your setup, but hit other complications. I used localhost
in an old setup previously, as I was under the assumption that when
accessing anything gluster related, the connection point only provides the
volume info and you then connect to any server in the volume group.


 Anyway, like I said, my main testing rig is now using this configuration,
 help me try and break it. :)

rm -rf /

Jokes aside, are you able to reboot a server without losing the VM?
My experience with ctdb (based on your blog) was that even with the
floating/virtual IP it wasn't fast enough, or something in the gluster
layer delayed the failover. Either way, the VM goes into a paused state and
can't be resumed.


 
  Jason
 
 
  
  
   I think this bug should be closed or re-targeted at documentation,
   because there is nothing we can do here. Hosted engine assumes that
   all writes are atomic and (immediately) available for all hosts in the
   cluster. Gluster violates those assumptions.
  
   ​
  
   ​Until the documentation gets updated, I hope this serves as a useful
   notice at least to save people some of the headaches I hit like
   hosted-engine starting up multiple VMs because of above issue.
   ​
  
   Now my question, does this theory prevent a scenario of perhaps
something
   like a gluster replicated volume being mounted as a glusterfs
filesystem
   and then re-exported as the native kernel NFS share for the
hosted-engine
   to consume? It could then be possible to chuck ctdb in there to
provide a
   last resort failover solution. I have tried myself and suggested it
to two
   people who are running a similar setup. Now using the native kernel
NFS
   server for hosted-engine and they haven't reported as many issues.
Curious,
   could anyone validate my theory on this?
  
   Thanks,
   Andrew
  
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Setup of hosted Engine Fails

2014-07-21 Thread Andrew Lau
Done, https://bugzilla.redhat.com/show_bug.cgi?id=1121561

On Mon, Jul 21, 2014 at 6:32 PM, Jiri Moskovcak jmosk...@redhat.com wrote:

 Hi Andrew,
 thanks for debugging this, please create a bug against vdsm to make sure
 it gets proper attention.

 Thanks,
 Jirka


 On 07/19/2014 12:36 PM, Andrew Lau wrote:

 Quick update, it seems to be related to the latest vdsm package,

 service vdsmd start
 vdsm: Running mkdirs
 vdsm: Running configure_coredump
 vdsm: Running configure_vdsm_logs
 vdsm: Running run_init_hooks
 vdsm: Running gencerts
 vdsm: Running check_is_configured
 libvirt is not configured for vdsm yet
 Modules libvirt are not configured
   Traceback (most recent call last):
File /usr/bin/vdsm-tool, line 145, in module
  sys.exit(main())
File /usr/bin/vdsm-tool, line 142, in main
  return tool_command[cmd][command](*args[1:])
File /usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py,
 line 282, in isconfigured
  raise RuntimeError(msg)
 RuntimeError:

 One of the modules is not configured to work with VDSM.
 To configure the module use the following:
 'vdsm-tool configure [module_name]'.

 If all modules are not configured try to use:
 'vdsm-tool configure --force'
 (The force flag will stop the module's service and start it
 afterwards automatically to load the new configuration.)

 vdsm: stopped during execute check_is_configured task (task returned
 with error code 1).
 vdsm start [FAILED]

 yum downgrade vdsm*

 ​Here's the package changes for reference,

 -- Running transaction check
 --- Package vdsm.x86_64 0:4.14.9-0.el6 will be a downgrade
 --- Package vdsm.x86_64 0:4.14.11-0.el6 will be erased
 --- Package vdsm-cli.noarch 0:4.14.9-0.el6 will be a downgrade
 --- Package vdsm-cli.noarch 0:4.14.11-0.el6 will be erased
 --- Package vdsm-python.x86_64 0:4.14.9-0.el6 will be a downgrade
 --- Package vdsm-python.x86_64 0:4.14.11-0.el6 will be erased
 --- Package vdsm-python-zombiereaper.noarch 0:4.14.9-0.el6 will be a
 downgrade
 --- Package vdsm-python-zombiereaper.noarch 0:4.14.11-0.el6 will be
 erased
 --- Package vdsm-xmlrpc.noarch 0:4.14.9-0.el6 will be a downgrade
 --- Package vdsm-xmlrpc.noarch 0:4.14.11-0.el6 will be erased

 service vdsmd start
 initctl: Job is already running: libvirtd
 vdsm: Running mkdirs
 vdsm: Running configure_coredump
 vdsm: Running configure_vdsm_logs
 vdsm: Running run_init_hooks
 vdsm: Running gencerts
 vdsm: Running check_is_configured
 libvirt is already configured for vdsm
 sanlock service is already configured
 vdsm: Running validate_configuration
 SUCCESS: ssl configured to true. No conflicts
 vdsm: Running prepare_transient_repository
 vdsm: Running syslog_available
 vdsm: Running nwfilter
 vdsm: Running dummybr
 vdsm: Running load_needed_modules
 vdsm: Running tune_system
 vdsm: Running test_space
 vdsm: Running test_lo
 vdsm: Running unified_network_persistence_upgrade
 vdsm: Running restore_nets
 vdsm: Running upgrade_300_nets
 Starting up vdsm daemon:
 vdsm start [  OK  ]
 [root@ov-hv1-2a-08-23 ~]# service vdsmd status
 VDS daemon server is running


 On Sat, Jul 19, 2014 at 6:58 PM, Andrew Lau and...@andrewklau.com
 mailto:and...@andrewklau.com wrote:

 It seems vdsm is not running,

 service vdsmd status
 VDS daemon is not running, and its watchdog is running

 The only logs in /var/log/vdsm/ that appear to have any content is
 /var/log/vdsm/supervdsm.log - everything else is blank

 MainThread::DEBUG::2014-07-19
 18:55:34,793::supervdsmServer::424::SuperVdsm.Server::(main)
 Terminated normally
 MainThread::DEBUG::2014-07-19
 18:55:38,033::netconfpersistence::134::root::(_getConfigs)
 Non-existing config set.
 MainThread::DEBUG::2014-07-19
 18:55:38,034::netconfpersistence::134::root::(_getConfigs)
 Non-existing config set.
 MainThread::DEBUG::2014-07-19
 18:55:38,058::supervdsmServer::384::SuperVdsm.Server::(main) Making
 sure I'm root - SuperVdsm
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::393::SuperVdsm.Server::(main) Parsing
 cmd args
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::396::SuperVdsm.Server::(main)
 Cleaning old socket /var/run/vdsm/svdsm.sock
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::400::SuperVdsm.Server::(main) Setting
 up keep alive thread
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::406::SuperVdsm.Server::(main)
 Creating remote object manager
 MainThread::DEBUG::2014-07-19
 18:55:38,061::supervdsmServer::417::SuperVdsm.Server::(main) Started
 serving super vdsm object
 sourceRoute::DEBUG::2014-07-19
 18:55:38,062::sourceRouteThread::56::root::(_subscribeToInotifyLoop)
 sourceRouteThread.subscribeToInotifyLoop started


 On Sat, Jul 19, 2014 at 6:48 PM, Andrew Lau and...@andrewklau.com

Re: [ovirt-users] Setup of hosted Engine Fails

2014-07-19 Thread Andrew Lau
Here's a snippet from my hosted-engine-setup log

2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:138 Stage
late_setup METHOD
otopi.plugins.ovirt_hosted_engine_setup.vm.configurevm.Plugin._late_setup
2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:152 method
exception
Traceback (most recent call last):
  File /usr/lib/python2.6/site-packages/otopi/context.py, line 142, in
_executeMethod
method['method']()
  File
/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/configurevm.py,
line 99, in _late_setup
response = serv.s.list()
  File /usr/lib64/python2.6/xmlrpclib.py, line 1199, in __call__
return self.__send(self.__name, args)
  File /usr/lib64/python2.6/xmlrpclib.py, line 1489, in __request
verbose=self.__verbose
  File /usr/lib64/python2.6/xmlrpclib.py, line 1235, in request
self.send_content(h, request_body)
  File /usr/lib64/python2.6/xmlrpclib.py, line 1349, in send_content
connection.endheaders()
  File /usr/lib64/python2.6/httplib.py, line 908, in endheaders
self._send_output()
  File /usr/lib64/python2.6/httplib.py, line 780, in _send_output
self.send(msg)
  File /usr/lib64/python2.6/httplib.py, line 739, in send
self.connect()
  File /usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py,
line 188, in connect
sock = socket.create_connection((self.host, self.port), self.timeout)
  File /usr/lib64/python2.6/socket.py, line 567, in create_connection
raise error, msg
error: [Errno 111] Connection refused
2014-07-19 18:45:14 ERROR otopi.context context._executeMethod:161 Failed
to execute stage 'Environment setup': [Errno 111] Connection refused


On Wed, Jul 16, 2014 at 9:05 PM, Sandro Bonazzola sbona...@redhat.com
wrote:

 On 16/07/2014 00:47, Christopher Jaggon wrote:
  Here is a list of packages :
 
  rpm -qa | grep -i vdsm gives :
 
  vdsm-python-4.14.9-0.el6.x86_64
  vdsm-python-zombiereaper-4.14.9-0.el6.noarch
  vdsm-xmlrpc-4.14.9-0.el6.noarch
  vdsm-4.14.9-0.el6.x86_64
  vdsm-cli-4.14.9-0.el6.noarch
 
  When I try to run the hosted engine setup I get this error in the log :
 
  [ INFO  ] Waiting for VDSM hardware info
  [ INFO  ] Waiting for VDSM hardware info
  [ ERROR ] Failed to execute stage 'Environment setup': [Errno 111]
 Connection refused
  [ INFO  ] Stage: Clean up
 
  Any advice and why this maybe so?


 Can you please provide hosted-engine setup, vdsm, supervdsm and libvirt
 logs?



 
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 


 --
 Sandro Bonazzola
 Better technology. Faster innovation. Powered by community collaboration.
 See how it works at redhat.com
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Setup of hosted Engine Fails

2014-07-19 Thread Andrew Lau
It seems vdsm is not running,

service vdsmd status
VDS daemon is not running, and its watchdog is running

The only log in /var/log/vdsm/ that appears to have any content is
/var/log/vdsm/supervdsm.log - everything else is blank

MainThread::DEBUG::2014-07-19
18:55:34,793::supervdsmServer::424::SuperVdsm.Server::(main) Terminated
normally
MainThread::DEBUG::2014-07-19
18:55:38,033::netconfpersistence::134::root::(_getConfigs) Non-existing
config set.
MainThread::DEBUG::2014-07-19
18:55:38,034::netconfpersistence::134::root::(_getConfigs) Non-existing
config set.
MainThread::DEBUG::2014-07-19
18:55:38,058::supervdsmServer::384::SuperVdsm.Server::(main) Making sure
I'm root - SuperVdsm
MainThread::DEBUG::2014-07-19
18:55:38,059::supervdsmServer::393::SuperVdsm.Server::(main) Parsing cmd
args
MainThread::DEBUG::2014-07-19
18:55:38,059::supervdsmServer::396::SuperVdsm.Server::(main) Cleaning old
socket /var/run/vdsm/svdsm.sock
MainThread::DEBUG::2014-07-19
18:55:38,059::supervdsmServer::400::SuperVdsm.Server::(main) Setting up
keep alive thread
MainThread::DEBUG::2014-07-19
18:55:38,059::supervdsmServer::406::SuperVdsm.Server::(main) Creating
remote object manager
MainThread::DEBUG::2014-07-19
18:55:38,061::supervdsmServer::417::SuperVdsm.Server::(main) Started
serving super vdsm object
sourceRoute::DEBUG::2014-07-19
18:55:38,062::sourceRouteThread::56::root::(_subscribeToInotifyLoop)
sourceRouteThread.subscribeToInotifyLoop started


On Sat, Jul 19, 2014 at 6:48 PM, Andrew Lau and...@andrewklau.com wrote:

 Here's a snippet from my hosted-engine-setup log

 2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:138 Stage
 late_setup METHOD
 otopi.plugins.ovirt_hosted_engine_setup.vm.configurevm.Plugin._late_setup
 2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:152 method
 exception
 Traceback (most recent call last):
   File /usr/lib/python2.6/site-packages/otopi/context.py, line 142, in
 _executeMethod
 method['method']()
   File
 /usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/configurevm.py,
 line 99, in _late_setup
 response = serv.s.list()
   File /usr/lib64/python2.6/xmlrpclib.py, line 1199, in __call__
 return self.__send(self.__name, args)
   File /usr/lib64/python2.6/xmlrpclib.py, line 1489, in __request
 verbose=self.__verbose
   File /usr/lib64/python2.6/xmlrpclib.py, line 1235, in request
 self.send_content(h, request_body)
   File /usr/lib64/python2.6/xmlrpclib.py, line 1349, in send_content
 connection.endheaders()
   File /usr/lib64/python2.6/httplib.py, line 908, in endheaders
 self._send_output()
   File /usr/lib64/python2.6/httplib.py, line 780, in _send_output
 self.send(msg)
   File /usr/lib64/python2.6/httplib.py, line 739, in send
 self.connect()
   File /usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py,
 line 188, in connect
 sock = socket.create_connection((self.host, self.port), self.timeout)
   File /usr/lib64/python2.6/socket.py, line 567, in create_connection
 raise error, msg
 error: [Errno 111] Connection refused
 2014-07-19 18:45:14 ERROR otopi.context context._executeMethod:161 Failed
 to execute stage 'Environment setup': [Errno 111] Connection refused


 On Wed, Jul 16, 2014 at 9:05 PM, Sandro Bonazzola sbona...@redhat.com
 wrote:

 On 16/07/2014 00:47, Christopher Jaggon wrote:
  Here is a list of packages :
 
  rpm -qa | grep -i vdsm gives :
 
  vdsm-python-4.14.9-0.el6.x86_64
  vdsm-python-zombiereaper-4.14.9-0.el6.noarch
  vdsm-xmlrpc-4.14.9-0.el6.noarch
  vdsm-4.14.9-0.el6.x86_64
  vdsm-cli-4.14.9-0.el6.noarch
 
  When I try to run the hosted engine setup I get this error in the log :
 
  [ INFO  ] Waiting for VDSM hardware info
  [ INFO  ] Waiting for VDSM hardware info
  [ ERROR ] Failed to execute stage 'Environment setup': [Errno 111]
 Connection refused
  [ INFO  ] Stage: Clean up
 
  Any advice and why this maybe so?


 Can you please provide hosted-engine setup, vdsm, supervdsm and libvirt
 logs?



 
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 


 --
 Sandro Bonazzola
 Better technology. Faster innovation. Powered by community collaboration.
 See how it works at redhat.com
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Setup of hosted Engine Fails

2014-07-19 Thread Andrew Lau
Quick update, it seems to be related to the latest vdsm package,

service vdsmd start
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running check_is_configured
libvirt is not configured for vdsm yet
Modules libvirt are not configured
 Traceback (most recent call last):
  File /usr/bin/vdsm-tool, line 145, in module
sys.exit(main())
  File /usr/bin/vdsm-tool, line 142, in main
return tool_command[cmd][command](*args[1:])
  File /usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py, line
282, in isconfigured
raise RuntimeError(msg)
RuntimeError:

One of the modules is not configured to work with VDSM.
To configure the module use the following:
'vdsm-tool configure [module_name]'.

If all modules are not configured try to use:
'vdsm-tool configure --force'
(The force flag will stop the module's service and start it
afterwards automatically to load the new configuration.)

vdsm: stopped during execute check_is_configured task (task returned with
error code 1).
vdsm start [FAILED]

yum downgrade vdsm*

Here are the package changes for reference,

-- Running transaction check
--- Package vdsm.x86_64 0:4.14.9-0.el6 will be a downgrade
--- Package vdsm.x86_64 0:4.14.11-0.el6 will be erased
--- Package vdsm-cli.noarch 0:4.14.9-0.el6 will be a downgrade
--- Package vdsm-cli.noarch 0:4.14.11-0.el6 will be erased
--- Package vdsm-python.x86_64 0:4.14.9-0.el6 will be a downgrade
--- Package vdsm-python.x86_64 0:4.14.11-0.el6 will be erased
--- Package vdsm-python-zombiereaper.noarch 0:4.14.9-0.el6 will be a
downgrade
--- Package vdsm-python-zombiereaper.noarch 0:4.14.11-0.el6 will be erased
--- Package vdsm-xmlrpc.noarch 0:4.14.9-0.el6 will be a downgrade
--- Package vdsm-xmlrpc.noarch 0:4.14.11-0.el6 will be erased

service vdsmd start
initctl: Job is already running: libvirtd
vdsm: Running mkdirs
vdsm: Running configure_coredump
vdsm: Running configure_vdsm_logs
vdsm: Running run_init_hooks
vdsm: Running gencerts
vdsm: Running check_is_configured
libvirt is already configured for vdsm
sanlock service is already configured
vdsm: Running validate_configuration
SUCCESS: ssl configured to true. No conflicts
vdsm: Running prepare_transient_repository
vdsm: Running syslog_available
vdsm: Running nwfilter
vdsm: Running dummybr
vdsm: Running load_needed_modules
vdsm: Running tune_system
vdsm: Running test_space
vdsm: Running test_lo
vdsm: Running unified_network_persistence_upgrade
vdsm: Running restore_nets
vdsm: Running upgrade_300_nets
Starting up vdsm daemon:
vdsm start [  OK  ]
[root@ov-hv1-2a-08-23 ~]# service vdsmd status
VDS daemon server is running
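
To stop yum pulling 4.14.11 straight back in on the next update, the
packages can also be excluded for now (a sketch - this assumes [main] is
the last section in /etc/yum.conf, and the exclude should be dropped once
a fixed vdsm build is out):

echo "exclude=vdsm*" >> /etc/yum.conf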


On Sat, Jul 19, 2014 at 6:58 PM, Andrew Lau and...@andrewklau.com wrote:

 It seems vdsm is not running,

 service vdsmd status
 VDS daemon is not running, and its watchdog is running

 The only logs in /var/log/vdsm/ that appear to have any content is
 /var/log/vdsm/supervdsm.log - everything else is blank

 MainThread::DEBUG::2014-07-19
 18:55:34,793::supervdsmServer::424::SuperVdsm.Server::(main) Terminated
 normally
 MainThread::DEBUG::2014-07-19
 18:55:38,033::netconfpersistence::134::root::(_getConfigs) Non-existing
 config set.
 MainThread::DEBUG::2014-07-19
 18:55:38,034::netconfpersistence::134::root::(_getConfigs) Non-existing
 config set.
 MainThread::DEBUG::2014-07-19
 18:55:38,058::supervdsmServer::384::SuperVdsm.Server::(main) Making sure
 I'm root - SuperVdsm
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::393::SuperVdsm.Server::(main) Parsing cmd
 args
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::396::SuperVdsm.Server::(main) Cleaning old
 socket /var/run/vdsm/svdsm.sock
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::400::SuperVdsm.Server::(main) Setting up
 keep alive thread
 MainThread::DEBUG::2014-07-19
 18:55:38,059::supervdsmServer::406::SuperVdsm.Server::(main) Creating
 remote object manager
 MainThread::DEBUG::2014-07-19
 18:55:38,061::supervdsmServer::417::SuperVdsm.Server::(main) Started
 serving super vdsm object
 sourceRoute::DEBUG::2014-07-19
 18:55:38,062::sourceRouteThread::56::root::(_subscribeToInotifyLoop)
 sourceRouteThread.subscribeToInotifyLoop started


 On Sat, Jul 19, 2014 at 6:48 PM, Andrew Lau and...@andrewklau.com wrote:

 Here's a snippet from my hosted-engine-setup log

 2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:138 Stage
 late_setup METHOD
 otopi.plugins.ovirt_hosted_engine_setup.vm.configurevm.Plugin._late_setup
 2014-07-19 18:45:14 DEBUG otopi.context context._executeMethod:152 method
 exception
 Traceback (most recent call last):
   File /usr/lib/python2.6/site-packages/otopi/context.py, line 142, in
 _executeMethod
 method['method']()
   File
 /usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm

Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-07-19 Thread Andrew Lau
Hi,

Did anyone find much luck tracking this down? I rebooted one of our servers
and hit this issue again; conveniently, the Dell remote access card has
borked as well... so a 50-minute trip to the DC.


On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle bobddr...@gmail.com wrote:

  Specifically, if I do the following:

- Enter global maintenance (hosted-engine --set-maintenance-mode
--mode=global)
- init 0 the engine
- systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdmsd


 and then run sanlock client status I see:

 # sanlock client status
 daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s 
 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s 
 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
 s 
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


 Waiting a few minutes does not change this state.

 The earlier data I shared which showed HostedEngine was with a different
 test scenario.

 -Bob


 On 06/18/2014 07:53 AM, Bob Doolittle wrote:

 I see I have a very unfortunate typo in my previous mail. As supported by
 the vm-status output I attached, I had set --mode=global (not none) in step
 1.

 I am not the only one experiencing this. I can reproduce it easily. It
 appears that shutting down vdsm causes the HA services to incorrectly think
 the system has come out of Global Maintenance and restart the engine.

 -Bob
 On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com
 wrote:

 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli 
 fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
 
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/
 xion2.smartcity.net
 \:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one)
 and the other
 one by stopMonitoringDomain.

 I quickly looked at the ovirt-hosted-engine* projects and I haven't found
 anything
 related to that.

 --
 Federico
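
For what it's worth, while debugging it is possible to drop the
hosted-engine lockspace by hand with the sanlock client, using the exact
lockspace string from the status output above (a sketch only - I'd treat
this as a last resort on a host that is otherwise idle):

# take the hosted-engine "s ..." line from `sanlock client status`, minus the leading "s "
sanlock client rem_lockspace -s 'hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0'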



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-07-19 Thread Andrew Lau
BTW, this happened on an aborted hosted-engine install, so the HA agents
hadn't even started up - just the VM running.


On Sat, Jul 19, 2014 at 11:24 PM, Andrew Lau and...@andrewklau.com wrote:

 Hi,

 Did anyone find much luck tracking this down? I rebooted one of our
 servers and hit this issue again, conveniently, the dell remote access card
 has borked as well.. so a 50 minute trip to the DC..


 On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle bobddr...@gmail.com
 wrote:

  Specifically, if I do the following:

- Enter global maintenance (hosted-engine --set-maintenance-mode
--mode=global)
- init 0 the engine
- systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdmsd


 and then run sanlock client status I see:

 # sanlock client status
 daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s 
 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s 
 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
 s 
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


 Waiting a few minutes does not change this state.

 The earlier data I shared which showed HostedEngine was with a different
 test scenario.

 -Bob


 On 06/18/2014 07:53 AM, Bob Doolittle wrote:

 I see I have a very unfortunate typo in my previous mail. As supported by
 the vm-status output I attached, I had set --mode=global (not none) in step
 1.

 I am not the only one experiencing this. I can reproduce it easily. It
 appears that shutting down vdsm causes the HA services to incorrectly think
 the system has come out of Global Maintenance and restart the engine.

 -Bob
 On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com
 wrote:

 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli 
 fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
 
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/
 xion2.smartcity.net
 \:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one)
 and the other
 one by stopMonitoringDomain.

 I quickly looked at the ovirt-hosted-engine* projects and I haven't
 found anything
 related to that.

 --
 Federico




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau
Hi all,

As most of you have got hints from previous messages, hosted engine won't
work on gluster. A quote from BZ1097639:

Using hosted engine with Gluster backed storage is currently something we
really warn against.


I think this bug should be closed or re-targeted at documentation,
because there is nothing we can do here. Hosted engine assumes that
all writes are atomic and (immediately) available for all hosts in the
cluster. Gluster violates those assumptions.

Until the documentation gets updated, I hope this serves as a useful
notice, at least to save people some of the headaches I hit, like
hosted-engine starting up multiple VMs because of the above issue.

Now my question: does this theory prevent a scenario of perhaps something
like a gluster replicated volume being mounted as a glusterfs filesystem
and then re-exported as a native kernel NFS share for the hosted-engine
to consume? It could then be possible to chuck ctdb in there to provide a
last-resort failover solution. I have tried this myself and suggested it to
two people who are running a similar setup; they are now using the native
kernel NFS server for hosted-engine and haven't reported as many issues.
Curious, could anyone validate my theory on this?
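
For anyone wanting to try it, the rough shape of what I mean is below
(volume name and paths are only examples; gluster's own NFS server has to
be disabled on the volume so the kernel NFS server can bind port 2049, and
an fsid= option is needed when exporting a FUSE mount):

# disable gluster's built-in NFS on the volume
gluster volume set engine-vol nfs.disable on

# mount the replicated volume locally over FUSE on each storage host
mkdir -p /mnt/engine
mount -t glusterfs localhost:/engine-vol /mnt/engine

# re-export the FUSE mount through the kernel NFS server
echo '/mnt/engine *(rw,sync,no_root_squash,fsid=101)' >> /etc/exports
service nfs restart

# hosted-engine then mounts <floating-ip>:/mnt/engine, with ctdb moving the IP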

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau

On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com wrote:

 [Adding gluster-devel]


 On 07/18/2014 05:20 PM, Andrew Lau wrote:

 Hi all,

 As most of you have got hints from previous messages, hosted engine
 won't work on gluster . A quote from BZ1097639

 Using hosted engine with Gluster backed storage is currently something
 we really warn against.


 I think this bug should be closed or re-targeted at documentation,
 because there is nothing we can do here. Hosted engine assumes that all
 writes are atomic and (immediately) available for all hosts in the cluster.
 Gluster violates those assumptions.
 ​

 I tried going through BZ1097639 but could not find much detail with
 respect to gluster there.

 A few questions around the problem:

 1. Can somebody please explain in detail the scenario that causes the
 problem?

 2. Is hosted engine performing synchronous writes to ensure that writes
 are durable?

 Also, if there is any documentation that details the hosted engine
 architecture that would help in enhancing our understanding of its
 interactions with gluster.


  ​

 Now my question, does this theory prevent a scenario of perhaps
 something like a gluster replicated volume being mounted as a glusterfs
 filesystem and then re-exported as the native kernel NFS share for the
 hosted-engine to consume? It could then be possible to chuck ctdb in
 there to provide a last resort failover solution. I have tried myself
 and suggested it to two people who are running a similar setup. Now
 using the native kernel NFS server for hosted-engine and they haven't
 reported as many issues. Curious, could anyone validate my theory on this?


 If we obtain more details on the use case and obtain gluster logs from the
 failed scenarios, we should be able to understand the problem better. That
 could be the first step in validating your theory or evolving further
 recommendations :).


I'm not sure how useful this is, but Jiri Moskovcak tracked this down in
an off-list message.

Message Quote:

==

We were able to track it down to this (thanks Andrew for providing the
testing setup):

-b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
Traceback (most recent call last):
  File 
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
line 165, in handle
response = success  + self._dispatch(data)
  File 
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
line 261, in _dispatch
.get_all_stats_for_service_type(**options)
  File 
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
line 41, in get_all_stats_for_service_type
d = self.get_raw_stats_for_service_type(storage_dir, service_type)
  File 
/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
line 74, in get_raw_stats_for_service_type
f = os.open(path, direct_flag | os.O_RDONLY)
OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localho
st:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted-
engine.metadata'

It's definitely connected to the storage, which leads us to gluster. I'm
not very familiar with gluster, so I need to check this with our gluster
gurus.

==



 Thanks,
 Vijay

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-devel] Can we debug some truths/myths/facts about hosted-engine and gluster?

2014-07-18 Thread Andrew Lau
On Sat, Jul 19, 2014 at 12:03 AM, Pranith Kumar Karampuri 
pkara...@redhat.com wrote:


 On 07/18/2014 05:43 PM, Andrew Lau wrote:

  ​ ​

  On Fri, Jul 18, 2014 at 10:06 PM, Vijay Bellur vbel...@redhat.com
 wrote:

 [Adding gluster-devel]


 On 07/18/2014 05:20 PM, Andrew Lau wrote:

 Hi all,

 As most of you have got hints from previous messages, hosted engine
 won't work on gluster . A quote from BZ1097639

 Using hosted engine with Gluster backed storage is currently something
 we really warn against.


 I think this bug should be closed or re-targeted at documentation,
 because there is nothing we can do here. Hosted engine assumes that all
 writes are atomic and (immediately) available for all hosts in the cluster.
 Gluster violates those assumptions.
 ​

  I tried going through BZ1097639 but could not find much detail with
 respect to gluster there.

 A few questions around the problem:

 1. Can somebody please explain in detail the scenario that causes the
 problem?

 2. Is hosted engine performing synchronous writes to ensure that writes
 are durable?

 Also, if there is any documentation that details the hosted engine
 architecture that would help in enhancing our understanding of its
 interactions with gluster.


 ​

 Now my question, does this theory prevent a scenario of perhaps
 something like a gluster replicated volume being mounted as a glusterfs
 filesystem and then re-exported as the native kernel NFS share for the
 hosted-engine to consume? It could then be possible to chuck ctdb in
 there to provide a last resort failover solution. I have tried myself
 and suggested it to two people who are running a similar setup. Now
 using the native kernel NFS server for hosted-engine and they haven't
 reported as many issues. Curious, could anyone validate my theory on
 this?


  If we obtain more details on the use case and obtain gluster logs from
 the failed scenarios, we should be able to understand the problem better.
 That could be the first step in validating your theory or evolving further
 recommendations :).


  ​I'm not sure how useful this is, but ​Jiri Moskovcak tracked this down
 in an off list message.

  ​Message Quote:​

  ​==​

   ​We were able to track it down to this (thanks Andrew for providing the
 testing setup):

 -b686-4363-bb7e-dba99e5789b6/ha_agent service_type=hosted-engine'
 Traceback (most recent call last):
   File 
 /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
 line 165, in handle
 response = success  + self._dispatch(data)
   File 
 /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py,
 line 261, in _dispatch
 .get_all_stats_for_service_type(**options)
   File 
 /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
 line 41, in get_all_stats_for_service_type
 d = self.get_raw_stats_for_service_type(storage_dir, service_type)
   File 
 /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py,
 line 74, in get_raw_stats_for_service_type
 f = os.open(path, direct_flag | os.O_RDONLY)
 OSError: [Errno 116] Stale file handle: '/rhev/data-center/mnt/localho
 st:_mnt_hosted-engine/c898fd2a-b686-4363-bb7e-dba99e5789b6/ha_agent/hosted
 -engine.metadata'

 Andrew/Jiri,
 Would it be possible to post gluster logs of both the mount and
 bricks on the bz? I can take a look at it once. If I gather nothing then
 probably I will ask for your help in re-creating the issue.

 Pranith


Unfortunately, I don't have the logs for that setup any more... I'll try to
replicate when I get a chance. If I understand the comment from the BZ, I
don't think it's a gluster bug per se, more just how gluster does its
replication.





 It's definitely connected to the storage which leads us to the gluster,
 I'm not very familiar with the gluster so I need to check this with our
 gluster gurus.​

  ​==​



 Thanks,
 Vijay




 ___
 Gluster-devel mailing 
 listGluster-devel@gluster.orghttp://supercolony.gluster.org/mailman/listinfo/gluster-devel



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Deploying hosted engine on second host with different CPU model

2014-07-17 Thread Andrew Lau
I think you should be able to specify this within the ovirt-engine - just
modify the cluster's CPU compatibility. I hit this too, but I think I
just ended up provisioning the older machine first, then the newer ones
joined with the older model.
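
To see why the supported list comes back empty, it's worth comparing what
libvirt and vdsm detect on each host first (a quick sketch; vdsClient ships
with vdsm-cli):

virsh -r capabilities | grep -i '<model>'   # libvirt's view of the host CPU
vdsClient -s 0 getVdsCaps | grep -i cpu     # vdsm's view: cpuModel, cpuFlags, ...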

On Thu, Jul 17, 2014 at 11:05 PM, George Machitidze
gmachiti...@greennet.ge wrote:
 Hello,

 I am deploying hosted engine (HA) on hosts with different CPU models on one
 of my oVirt labs.

 The hosts have different CPUs, and there is also the problem that the
 virtualization platform cannot detect the CPU at all - "The following CPU
 types are supported by this host:" is empty:


 2014-07-17 16:51:42 DEBUG otopi.plugins.ovirt_hosted_engine_setup.vdsmd.cpu
 cpu._customization:124 Compatible CPU models are: []

 Is there any way to override this setting and use CPU of old machine for
 both hosts?

 ex.
 host1:

 cpu family  : 6
 model   : 15
 model name  : Intel(R) Xeon(R) CPU5160  @ 3.00GHz
 stepping: 11

 host2:

 cpu family  : 6
 model   : 42
 model name  : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
 stepping: 7



 [root@ovirt2 ~]# hosted-engine --deploy
 [ INFO  ] Stage: Initializing
   Continuing will configure this host for serving as hypervisor and
 create a VM where you have to install oVirt Engine afterwards.
   Are you sure you want to continue? (Yes, No)[Yes]:
 [ INFO  ] Generating a temporary VNC password.
 [ INFO  ] Stage: Environment setup
   Configuration files: []
   Log file:
 /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20140717165111-7tg2g7.log
   Version: otopi-1.2.1 (otopi-1.2.1-1.el6)
 [ INFO  ] Hardware supports virtualization
 [ INFO  ] Stage: Environment packages setup
 [ INFO  ] Stage: Programs detection
 [ INFO  ] Stage: Environment setup
 [ INFO  ] Stage: Environment customization

   --== STORAGE CONFIGURATION ==--

   During customization use CTRL-D to abort.
   Please specify the storage you would like to use (nfs3,
 nfs4)[nfs3]:
   Please specify the full shared storage connection path to use
 (example: host:/path): ovirt-hosted:/engine
   The specified storage location already contains a data domain. Is
 this an additional host setup (Yes, No)[Yes]?
 [ INFO  ] Installing on additional host
   Please specify the Host ID [Must be integer, default: 2]:

   --== SYSTEM CONFIGURATION ==--

 [WARNING] A configuration file must be supplied to deploy Hosted Engine on
 an additional host.
   The answer file may be fetched from the first host using scp.
   If you do not want to download it automatically you can abort the
 setup answering no to the following question.
   Do you want to scp the answer file from the first host? (Yes,
 No)[Yes]:
   Please provide the FQDN or IP of the first host: ovirt1.test.ge
   Enter 'root' user password for host ovirt1.test.ge:
 [ INFO  ] Answer file successfully downloaded

   --== NETWORK CONFIGURATION ==--

   The following CPU types are supported by this host:
 [ ERROR ] Failed to execute stage 'Environment customization': Invalid CPU
 type specified: model_Conroe
 [ INFO  ] Stage: Clean up
 [ INFO  ] Stage: Pre-termination
 [ INFO  ] Stage: Termination

 --
 BR

 George Machitidze


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Andrew Lau
On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com wrote:
 On 07/07/2014 15:38, Simone Marchioni wrote:
 Hi,

 I'm trying to install oVirt 3.4 + gluster looking at the following guides:

 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
 http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/

 It went smooth until the hosted engine VM configuration: I can reach it by 
 VNC and the host IP, but I can't configure the VM network in a way it works.
 Probably the problem is the assumption that the three hosts (2 hosts + the 
 hosted engine) are on the same subnet sharing the same default gateway.

 But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs are 
 on the subnet 46.105.224.236/30, and my hosted engine need to use one IP
 of the last ones.

 Anyone installed oVirt in such a configuration and can give me any tip?

 Never tested such configuration.
 Andrew, something similar at your installation with an additional NIC?

If I understand correctly, you have your hosts on the 94.23.2.0/24
subnet but you need your hosted engine to be accessible at an address
within 46.105.224.236/30? If that's true, then the easiest way to do it
is simply to run your hosted-engine install with the hosted-engine first
on 94.23.2.0/24; you then add another NIC to that hosted-engine VM
which will have the IP address 46.105.224.237 (?)... alternatively,
you could just use a NIC alias?

If you want to add an extra NIC to your hosted-engine to do the above
scenario, here's a snippet from my notes:
(storage_network is a bridge; replace that with ovirtmgmt or another
bridge you may have created)

hosted-engine --set-maintenance --mode=global

# On all installed hosts
nano /etc/ovirt-hosted-engine/vm.conf
# insert under earlier nicModel
# replace macaddress and uuid from above
# increment slot
devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
domain:0x, type:pci,function:0x0},device:bridge,type:interface}

hosted-engine --vm-shutdown
hosted-engine --vm-start

hosted-engine --set-maintenance --mode=none


Although, re-reading your question, what do you mean by 'but I can't
configure the VM network in a way it works'? Does the setup fail, or is it
just that when you create a VM you don't have any network connectivity?



 Thanks
 Simone
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 --
 Sandro Bonazzola
 Better technology. Faster innovation. Powered by community collaboration.
 See how it works at redhat.com
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Andrew Lau
Hi Martin,

Is that because of how the replication works? What if you had the
kernel NFS server running on top of the gluster NFS share and a virtual
IP to allow the hosted-engine to only access one of the shares?

Thanks,
Andrew


On Wed, Jul 9, 2014 at 5:15 AM, Martin Sivak msi...@redhat.com wrote:
 Hi,

 I do not recommend running hosted engine on top of GlusterFS. Not even on top
 of the NFS compatibility layer that GlusterFS provides.

 There have been a lot of issues with setups like that. GlusterFS does not 
 ensure that the metadata writes are atomic and visible to all nodes at the 
 same time and that causes serious trouble (the synchronization algorithm 
 relies on the atomicity assumption).

 You can use GlusterFS storage domain for VMs, but the hosted engine storage 
 domain needs something else - NFS or iSCSI (available in 3.5).

 --
 Martin Sivák
 msi...@redhat.com
 Red Hat Czech
 RHEV-M SLA / Brno, CZ

 - Original Message -
 On 08/07/2014 16:47, Andrew Lau wrote:
  On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com
  wrote:
   On 07/07/2014 15:38, Simone Marchioni wrote:
  Hi,
 
  I'm trying to install oVirt 3.4 + gluster looking at the following
  guides:
 
  http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
  http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/
 
  It went smooth until the hosted engine VM configuration: I can reach it
  by VNC and the host IP, but I can't configure the VM network in a way it
  works.
  Probably the problem is the assumption that the three hosts (2 hosts +
  the hosted engine) are on the same subnet sharing the same default
  gateway.
 
  But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs
  are on the subnet 46.105.224.236/30, and my hosted engine need to use
  one IP
  of the last ones.
 
  Anyone installed oVirt in such a configuration and can give me any tip?
  Never tested such configuration.
  Andrew, something similar at your installation with an additional NIC?
  If I understand correctly, you have your hosts on the 94.23.2.0/24
  subnet but you need your hosted engine to be accessible as an address
  within 46.105.224.236?

 Exactly

  If that's true, then the easiest way to do it
  is simply run your hosted-engine install with the hosted-engine first
  on 94.23.2.0/24, you then add another nic to that hosted-engine VM
  which'll have the IP address for 46.105.224.237 (?)...

 I'll try this

  alternatively, you could just use a nic alias?

 We made it work with the following changes (on the host machine in the
 subnet 94.23.2.0/24):
 - commented out and removed from running configuration ip rules in
 /etc/sysconfig/network-scripts/rule-ovirtmgmt
 - commented out and removed from running configuration ip routes in
 /etc/sysconfig/network-scripts/route-ovirtmgmt
 - added /etc/sysconfig/network-scripts/ovirtmgmt:0 with the following
 configuration:
 DEVICE=ovirtmgmt:238
 ONBOOT=yes
 DELAY=0
 IPADDR=46.105.224.238
 NETMASK=255.255.255.252
 BOOTPROTO=static
 NM_CONTROLLED=no
 - enabled ip forwarding in /proc/sys/net/ipv4/ip_forward
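
 (To make that last step persistent across reboots - a small sketch, assuming
 the stock /etc/sysctl.conf is used:)

 echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
 sysctl -p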

 After that installing the hosted-engine VM with the following IP stack:

 NETMASK=255.255.255.252
 IPADDR=46.105.224.237
 GATEWAY=46.105.224.238

 seems to work ok.

  If you want to add an extra NIC to your hosted-engine to do that above
  scenario, here's a snippet from my notes:
  (storage_network is a bridge, replace that with ovirtmgmt or another
  bridge you may have created)
 
  hosted-engine --set-maintenance --mode=global
 
  # On all installed hosts
  nano /etc/ovirt-hosted-engine/vm.conf
  # insert under earlier nicModel
  # replace macaddress and uuid from above
  # increment slot
  devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
  domain:0x, type:pci,function:0x0},device:bridge,type:interface}
 
  hosted-engine --vm-shutdown
  hosted-engine --vm-start
 
  hosted-engine --set-maintenance --mode=none

 Ok: thanks for the advice!

  Although, re-reading your question, what do you mean by 'but I can't
  configure the VM network in a way it works.' ? Does the setup fail, or
  just when you create a VM you don't have any network connectivity..

 The setup works ok: it creates the VM and I can log in to it with VNC on
 the host IP (94.23.2.X).
 I can install CentOS 6.5 as advised. After the reboot I log in again via
 VNC on the host IP (94.23.2.X), and configure the IP stack with the
 other subnet (46.105.224.236/30) and after that the VM is isolated,
 unless I do the steps written above.

 Thanks for your support!
 Simone

 
  Thanks
  Simone
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
  --
  Sandro Bonazzola
  Better technology. Faster innovation. Powered by community collaboration

Re: [ovirt-users] Glusterfs HA with Ovirt

2014-07-03 Thread Andrew Lau
Don't forget to take quorum into consideration, that's something
people often overlook.

The reason you're seeing the current behaviour is that gluster only uses the
initial IP address to fetch the volume details. After that it'll connect
directly to ONE of the servers, so with your 2 storage server case there's a
50% chance it won't go to a paused state.

For the VIP, you could consider CTDB or keepalived, or even just using
localhost (as your storage and compute are all on the same machine).
For CTDB, checkout
http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
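
If you go the keepalived route instead, a minimal sketch of
/etc/keepalived/keepalived.conf would be something like this (the interface
name, password and the 10.10.10.10 VIP are just placeholders for your setup):

vrrp_instance gluster_vip {
    state MASTER                 # BACKUP on the other storage node
    interface eth0
    virtual_router_id 51
    priority 100                 # lower it (e.g. 90) on the other node
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass secret42
    }
    virtual_ipaddress {
        10.10.10.10/24
    }
}

Then point the storage domain mount at 10.10.10.10 instead of a physical
node's address.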

I have a BZ open regarding gluster VMs going into paused state and not
being resumable, so it's something you should also consider. My case,
switch dies, gluster volume goes away, VMs go into paused state but
can't be resumed. Losing one server out of a cluster is a
different story though.
https://bugzilla.redhat.com/show_bug.cgi?id=1058300

HTH

On Fri, Jul 4, 2014 at 11:48 AM, Punit Dambiwal hypu...@gmail.com wrote:
 Hi,

 Thanks...can you suggest me any good how to/article for the glusterfs with
 ovirt...

 One strange thing is if I try both (compute & storage) on the same
 node...the below quote does not happen

 -

 Right now, if 10.10.10.2 goes away, all your gluster mounts go away and your
 VMs get paused because the hypervisors can’t access the storage. Your
 gluster storage is still fine, but ovirt can’t talk to it because 10.10.10.2
 isn’t there.
 -

 Even if the 10.10.10.2 goes down...I can still access the gluster mounts and no
 VM pauses...I can access the VM via ssh...no connection failure. The
 connection drops only in case the SPM goes down and another node is
 elected as SPM (all the running VM's pause in this condition).



 On Fri, Jul 4, 2014 at 4:12 AM, Darrell Budic darrell.bu...@zenfire.com
 wrote:

 You need to setup a virtual IP to use as the mount point, most people use
 keepalived to provide a virtual ip via vrrp for this. Setup something like
 10.10.10.10 and use that for your mounts.

 Right now, if 10.10.10.2 goes away, all your gluster mounts go away and
 your VMs get paused because the hypervisors can’t access the storage. Your
 gluster storage is still fine, but ovirt can’t talk to it because 10.10.10.2
 isn’t there.

 If the SPM goes down, it the other hypervisor hosts will elect a new one
 (under control of the ovirt engine).

 Same scenario if storage & compute are on the same server, you still need
 a vip address for the storage portion to serve as the mount point so it’s
 not dependent on any one server.

 -Darrell

 On Jul 3, 2014, at 1:14 AM, Punit Dambiwal hypu...@gmail.com wrote:

 Hi,

 I have some HA related concerns about glusterfs with oVirt...let's say I have
 4 storage nodes with gluster bricks as below :-

 1. 10.10.10.1 to 10.10.10.4 with 2 bricks each, and I have a distributed
 replicated architecture...
 2. Now attached this gluster storage to ovirt-engine with the following
 mount point 10.10.10.2/vol1
 3. In my cluster I have 3 hypervisor hosts (10.10.10.5 to 10.10.10.7), SPM
 is on 10.10.10.5...
 4. What happens if 10.10.10.2 goes down...can a hypervisor host
 still access the storage ??
 5. What happens if the SPM goes down ???

 Note :- What happens for points 4 & 5, if storage and compute are both working on
 the same server.

 Thanks,
 Punit
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users




 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Glusterfs HA with Ovirt

2014-07-03 Thread Andrew Lau
Or just use localhost, as your compute and storage are on the same box.


On Fri, Jul 4, 2014 at 2:48 PM, Punit Dambiwal hypu...@gmail.com wrote:
 Hi Andrew,

 Thanks for the update...that means HA can not work without a VIP in
 gluster, so better to use glusterfs with a VIP to take over the IP...in
 case of any storage node failure...


 On Fri, Jul 4, 2014 at 12:35 PM, Andrew Lau and...@andrewklau.com wrote:

 Don't forget to take quorum into consideration, that's something
 people often overlook.

 The reason you're seeing the current behaviour is that gluster only uses the
 initial IP address to fetch the volume details. After that it'll connect
 directly to ONE of the servers, so with your 2 storage server case there's a
 50% chance it won't go to a paused state.

 For the VIP, you could consider CTDB or keepalived, or even just using
 localhost (as your storage and compute are all on the same machine).
 For CTDB, checkout
 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/

 I have a BZ open regarding gluster VMs going into paused state and not
 being resumable, so it's something you should also consider. My case,
 switch dies, gluster volume goes away, VMs go into paused state but
 can't be resumed. Losing one server out of a cluster is a
 different story though.
 https://bugzilla.redhat.com/show_bug.cgi?id=1058300

 HTH

 On Fri, Jul 4, 2014 at 11:48 AM, Punit Dambiwal hypu...@gmail.com wrote:
  Hi,
 
  Thanks...can you suggest me any good how to/article for the glusterfs
  with
  ovirt...
 
  One strange thing is if i will try both (compute  storage) on the same
  node...the below quote not happen
 
  -
 
  Right now, if 10.10.10.2 goes away, all your gluster mounts go away and
  your
  VMs get paused because the hypervisors can’t access the storage. Your
  gluster storage is still fine, but ovirt can’t talk to it because
  10.10.10.2
  isn’t there.
  -
 
  Even the 10.10.10.2 goes down...i can still access the gluster mounts
  and no
  VM pausei can access the VM via ssh...no connection failure.the
  connection drop only in case of SPM goes down and the another node will
  elect as SPM(All the running VM's pause in this condition).
 
 
 
  On Fri, Jul 4, 2014 at 4:12 AM, Darrell Budic
  darrell.bu...@zenfire.com
  wrote:
 
  You need to setup a virtual IP to use as the mount point, most people
  use
  keepalived to provide a virtual ip via vrrp for this. Setup something
  like
  10.10.10.10 and use that for your mounts.
 
  Right now, if 10.10.10.2 goes away, all your gluster mounts go away and
  your VMs get paused because the hypervisors can’t access the storage.
  Your
  gluster storage is still fine, but ovirt can’t talk to it because
  10.10.10.2
  isn’t there.
 
  If the SPM goes down, it the other hypervisor hosts will elect a new
  one
  (under control of the ovirt engine).
 
  Same scenarios if storage  compute are on the same server, you still
  need
  a vip address for the storage portion to serve as the mount point so
  it’s
  not dependent on any one server.
 
  -Darrell
 
  On Jul 3, 2014, at 1:14 AM, Punit Dambiwal hypu...@gmail.com wrote:
 
  Hi,
 
  I have some HA related concern about glusterfs with Ovirt...let say i
  have
  4 storage node with gluster bricks as below :-
 
  1. 10.10.10.1 to 10.10.10.4 with 2 bricks each and i have distributed
  replicated architecture...
  2. Now attached this gluster storge to ovrit-engine with the following
  mount point 10.10.10.2/vol1
  3. In my cluster i have 3 hypervisior hosts (10.10.10.5 to 10.10.10.7)
  SPM
  is on 10.10.10.5...
  4. What happen if 10.10.10.2 will goes down.can hypervisior host
  can
  still access the storage ??
  5. What happen if SPM goes down ???
 
  Note :- What happen for point 4 5 ,If storage and Compute both working
  on
  the same server.
 
  Thanks,
  Punit
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] gluster rpms not found

2014-06-19 Thread Andrew Lau
You're missing vdsm-gluster

yum install vdsm-gluster
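
and then give vdsm a restart so the capability report gets refreshed (assuming
the EL6 init scripts):

service vdsmd restart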

On Fri, Jun 20, 2014 at 6:24 AM, Nathan Stratton nat...@robotics.net wrote:
 I am running ovirt 3.4 and have gluster installed:

 [root@virt01a]# yum list installed |grep gluster
 glusterfs.x86_64   3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-api.x86_64   3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-cli.x86_64   3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-fuse.x86_64  3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-libs.x86_64  3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-rdma.x86_64  3.5.0-2.el6  @ovirt-glusterfs-epel
 glusterfs-server.x86_64  3.5.0-2.el6  @ovirt-glusterfs-epel

 However vdsm can't seem to find them:

 Thread-13::DEBUG::2014-06-19
 16:15:57,250::caps::458::root::(_getKeyPackages) rpm package glusterfs-rdma
 not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,250::caps::458::root::(_getKeyPackages) rpm package glusterfs-fuse
 not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,251::caps::458::root::(_getKeyPackages) rpm package gluster-swift
 not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,252::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-object not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,252::caps::458::root::(_getKeyPackages) rpm package glusterfs not
 found
 Thread-13::DEBUG::2014-06-19
 16:15:57,252::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-plugin not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,254::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-account not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,254::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-proxy not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,254::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-doc not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,255::caps::458::root::(_getKeyPackages) rpm package
 glusterfs-server not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,255::caps::458::root::(_getKeyPackages) rpm package
 gluster-swift-container not found
 Thread-13::DEBUG::2014-06-19
 16:15:57,255::caps::458::root::(_getKeyPackages) rpm package
 glusterfs-geo-replication not found

 Any ideas?


 nathan stratton | vp technology | broadsoft, inc | +1-240-404-6580 |
 www.broadsoft.com

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Gluster VMs Paused State then not resumable

2014-06-14 Thread Andrew Lau
Hi all,

I'm wondering if anyone's had any luck with running VMs on top of a gluster volume?
I have a bug open here,
https://bugzilla.redhat.com/show_bug.cgi?id=1058300 about VMs going
into a paused state and not being able to resume.

It's gone a little quiet, so I was wondering has anyone else had any
luck with this style of setup? I have my gluster and virt on the same
boxes, and currently 5 in the cluster (replica 2)

A few cases this can be reproduced:

- switch dies
- cable unplug
- gluster volume stop
- gluster brick dies

I recently tried a suggested volume option:
gluster volume set vm-data network.ping-timeout 10

The qemu logs now report on virsh resume vmname:
block I/O error in device 'drive-virtio-disk0': Transport endpoint is
not connected (107)

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] After Upgrade to 3.4.2 no conole icon activation

2014-06-14 Thread Andrew Lau
Possibly a cache issue? Try a hard reset ctrl+shift+r ?

On Sat, Jun 14, 2014 at 3:51 AM, Markus Stockhausen
stockhau...@collogia.de wrote:
 Hello,

 after starting a VM in webadmin on ovirt engine 3.4.2 we have
 to manually switch to another VM and back to get the console
 icon active.

 Is this behaviour desired?

 Markus

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] localdomain

2014-06-11 Thread Andrew Lau
The cloud-init integration was a little flaky when I was using it,

I ended up not using any of the inbuilt oVirt options (eg. hostname,
root password). Root password never worked for me as it'd force a
reset on first login... defeating the purpose.
Just passing a full cloud-init config into the bottom section worked
for me, so for your case just define the hostname there instead.
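
As a rough sketch of what that custom config could look like (the hostname and
domain below are placeholders, not something oVirt fills in for you):

#cloud-config
hostname: myvm
fqdn: myvm.example.com
manage_etc_hosts: true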


On Tue, May 27, 2014 at 9:33 PM, Koen Vanoppen vanoppen.k...@gmail.com wrote:
 Hi Guys,

 It's been a while :-). Luckily :-).

 I have a quick question. Is there a way to change the default .localdomain
 for the FQDN in ovirt?
 It would be handy if we just had to fill in the hostname of our vm (we are
 using 3.4, with the cloud-init feature) and it automatically added our domain
 instead of .localdomain.

 Kind regards,

 Koen

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-10 Thread Andrew Lau
Interesting, which files did you modify to lower the log levels?
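
For anyone else chasing this, the usual suspects on EL6 are the vdsm and
libvirt configs - a sketch, assuming the stock layout (exact section names may
differ on your version):

nano /etc/vdsm/logger.conf        # change level=DEBUG to level=WARNING in the logger sections
nano /etc/libvirt/libvirtd.conf   # set log_level = 3  (3 = warnings)
service libvirtd restart && service vdsmd restart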

On Tue, Jun 3, 2014 at 12:38 AM,  combus...@archlinux.us wrote:
 One word of caution so far: when exporting any vm, the node that acts as SPM
 is stressed out to the max. I relieved the stress by a certain margin by
 lowering the libvirtd and vdsm log levels to WARNING. That shortened the
 export procedure by at least five times. But the vdsm process on the SPM node is
 still at high cpu usage, so it's best that the SPM node should be left with a
 decent CPU time amount to spare. Also, export of VM's with high vdisk capacity
 and thin provisioning enabled (let's say 14GB used of 100GB defined) took
 around 50min over a 10Gb ethernet interface to a 1Gb export NAS device that
 was not stressed out at all by other processes. When I did that export with
 debug log levels it took 5hrs :(

 So lowering log levels is a must in a production environment. I've deleted the
 lun that I exported on the storage (removed it first from ovirt) and for the
 next weekend I am planning to add a new one, export it again on all the nodes
 and start a few fresh vm installations. Things I'm going to look for are
 partition alignment and running them from different nodes in the cluster at
 the same time. I just hope that not all I/O is going to pass through the SPM,
 this is the one thing that bothers me the most.

 I'll report back on these results next week, but if anyone has experience with
 this kind of things or can point  to some documentation would be great.

 On Monday, 2. June 2014. 18.51.52 you wrote:
 I'm curious to hear what other comments arise, as we're analyzing a
 production setup shortly.

 On Sun, Jun 1, 2014 at 10:11 PM,  combus...@archlinux.us wrote:
  I need to scratch gluster off because setup is based on CentOS 6.5, so
  essential prerequisites like qemu 1.3 and libvirt 1.0.1 are not met.

 Gluster would still work with EL6, afaik it just won't use libgfapi and
 instead use just a standard mount.

  Any info regarding FC storage domain would be appreciated though.
 
  Thanks
 
  Ivan
 
  On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:
  Hi,
 
  I have a 4 node cluster setup and my storage options right now are a FC
  based storage, one partition per node on a local drive (~200GB each) and
  a
  NFS based NAS device. I want to setup export and ISO domain on the NAS
  and
  there are no issues or questions regarding those two. I wasn't aware of
  any
  other options at the time for utilizing a local storage (since this is a
  shared based datacenter) so I exported a directory from each partition
  via
  NFS and it works. But I am little in the dark with the following:
 
  1. Are there any advantages for switching from NFS based local storage to
  a
  Gluster based domain with blocks for each partition. I guess it can be
  only
  performance wise but maybe I'm wrong. If there are advantages, are there
  any tips regarding xfs mount options etc ?
 
  2. I've created a volume on the FC based storage and exported it to all
  of
  the nodes in the cluster on the storage itself. I've configured
  multipathing correctly and added an alias for the wwid of the LUN so I
  can
  distinct this one and any other future volumes more easily. At first I
  created a partition on it but since oVirt saw only the whole LUN as raw
  device I erased it before adding it as the FC master storage domain. I've
  imported a few VM's and point them to the FC storage domain. This setup
  works, but:
 
  - All of the nodes see a device with the alias for the wwid of the
  volume,
  but only the node wich is currently the SPM for the cluster can see
  logical
  volumes inside. Also when I setup the high availability for VM's residing
  on the FC storage and select to start on any node on the cluster, they
  always start on the SPM. Can multiple nodes run different VM's on the
  same
  FC storage at the same time (logical thing would be that they can, but I
  wanted to be sure first). I am not familiar with the logic oVirt utilizes
  that locks the vm's logical volume to prevent corruption.
 
  - Fdisk shows that logical volumes on the LUN of the FC volume are
  misaligned (the partition doesn't end on a cylinder boundary), so I wonder if
  this is because I imported the VM's with disks that were created on local
  storage before, and whether any _new_ VM's with disks on the fc storage would
  be properly aligned.
 
  This is a new setup with oVirt 3.4 (did an export of all the VM's on 3.3
  and after a fresh installation of the 3.4 imported them back again). I
  have room to experiment a little with 2 of the 4 nodes because currently
  they are free from running any VM's, but I have limited room for
  anything else that would cause an unplanned downtime for four virtual
  machines running on the other two nodes on the cluster (currently highly
  available and their drives are on the FC storage domain). All in all I
  have 12 VM's running and I'm asking on the list for 

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread Andrew Lau
Interesting, my storage network is L2 only and doesn't run on the
ovirtmgmt (which is the only thing HostedEngine sees) but I've only
seen this issue when running ctdb in front of my NFS server. I
previously was using localhost as all my hosts had the nfs server on
it (gluster).

On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com wrote:
 I just blocked connection to storage for testing, but on result I had this 
 error: Failed to acquire lock error -243, so I added it in reproduce steps.
 If you know another steps to reproduce this error, without blocking 
 connection to storage it also can be wonderful if you can provide them.
 Thanks

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: combuster combus...@archlinux.us
 Cc: users users@ovirt.org
 Sent: Monday, June 9, 2014 3:47:00 AM
 Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message: internal 
 error Failed to acquire lock error -243

 I just ran a few extra tests, I had a 2 host, hosted-engine running
 for a day. They both had a score of 2400. Migrated the VM through the
 UI multiple times, all worked fine. I then added the third host, and
 that's when it all fell to pieces.
 Other two hosts have a score of 0 now.

 I'm also curious, in the BZ there's a note about:

 where engine-vm block connection to storage domain(via iptables -I
 INPUT -s sd_ip -j DROP)

 What's the purpose for that?

 On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:
 Ignore that, the issue came back after 10 minutes.

 I've even tried a gluster mount + nfs server on top of that, and the
 same issue has come back.

 On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployments of nodes before the problem occurred).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known problem. 
 Or
 you should double check if you hosts have different ids otherwise they 
 would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird
 (showed
 that storage domains were down but the vm's were running fine etc).
 Lucky
 me, I had already exported all of the VM's on the first sign of trouble
 and
 then installed ovirt-engine on the dedicated server and attached the
 export
 domain.

 So while really a usefull feature, and it's working (for the most part
 ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread Andrew Lau
So after adding the L3 capabilities to my storage network, I'm no
longer seeing this issue. So the engine needs to be able to
access the storage domain it sits on? But that doesn't show up in the
UI?

Ivan, was this also the case with your setup? Engine couldn't access
storage domain?

On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, my storage network is a L2 only and doesn't run on the
 ovirtmgmt (which is the only thing HostedEngine sees) but I've only
 seen this issue when running ctdb in front of my NFS server. I
 previously was using localhost as all my hosts had the nfs server on
 it (gluster).

 On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com wrote:
 I just blocked connection to storage for testing, but on result I had this 
 error: Failed to acquire lock error -243, so I added it in reproduce steps.
 If you know another steps to reproduce this error, without blocking 
 connection to storage it also can be wonderful if you can provide them.
 Thanks

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: combuster combus...@archlinux.us
 Cc: users users@ovirt.org
 Sent: Monday, June 9, 2014 3:47:00 AM
 Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message: internal 
 error Failed to acquire lock error -243

 I just ran a few extra tests, I had a 2 host, hosted-engine running
 for a day. They both had a score of 2400. Migrated the VM through the
 UI multiple times, all worked fine. I then added the third host, and
 that's when it all fell to pieces.
 Other two hosts have a score of 0 now.

 I'm also curious, in the BZ there's a note about:

 where engine-vm block connection to storage domain(via iptables -I
 INPUT -s sd_ip -j DROP)

 What's the purpose for that?

 On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:
 Ignore that, the issue came back after 10 minutes.

 I've even tried a gluster mount + nfs server on top of that, and the
 same issue has come back.

 On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known 
 problem. Or
 you should double check if you hosts have different ids otherwise they 
 would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us 
 wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's 
 related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird
 (showed
 that storage domains were down but the vm's were running fine etc

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread Andrew Lau
nvm, just as I hit send the error has returned.
Ignore this..

On Tue, Jun 10, 2014 at 9:01 AM, Andrew Lau and...@andrewklau.com wrote:
 So after adding the L3 capabilities to my storage network, I'm no
 longer seeing this issue anymore. So the engine needs to be able to
 access the storage domain it sits on? But that doesn't show up in the
 UI?

 Ivan, was this also the case with your setup? Engine couldn't access
 storage domain?

 On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, my storage network is a L2 only and doesn't run on the
 ovirtmgmt (which is the only thing HostedEngine sees) but I've only
 seen this issue when running ctdb in front of my NFS server. I
 previously was using localhost as all my hosts had the nfs server on
 it (gluster).

 On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com wrote:
 I just blocked connection to storage for testing, but on result I had this 
 error: Failed to acquire lock error -243, so I added it in reproduce 
 steps.
 If you know another steps to reproduce this error, without blocking 
 connection to storage it also can be wonderful if you can provide them.
 Thanks

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: combuster combus...@archlinux.us
 Cc: users users@ovirt.org
 Sent: Monday, June 9, 2014 3:47:00 AM
 Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message: internal 
 error Failed to acquire lock error -243

 I just ran a few extra tests, I had a 2 host, hosted-engine running
 for a day. They both had a score of 2400. Migrated the VM through the
 UI multiple times, all worked fine. I then added the third host, and
 that's when it all fell to pieces.
 Other two hosts have a score of 0 now.

 I'm also curious, in the BZ there's a note about:

 where engine-vm block connection to storage domain(via iptables -I
 INPUT -s sd_ip -j DROP)

 What's the purpose for that?

 On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:
 Ignore that, the issue came back after 10 minutes.

 I've even tried a gluster mount + nfs server on top of that, and the
 same issue has come back.

 On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known 
 problem. Or
 you should double check if you hosts have different ids otherwise they 
 would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us 
 wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and 
 then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's 
 related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, 
 from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck 
 and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-09 Thread Andrew Lau
I'm really having a hard time finding out why it's happening..

If I set the cluster to global for a minute or two, the scores will
reset back to 2400. Set maintenance mode to none, and all will be fine
until a migration occurs. It seems it tries to migrate, fails and sets
the score to 0 permanently rather than the 10? minutes mentioned in
one of the ovirt slides.

When I have two hosts, the score is 0 only when a migration occurs
(just on the host which doesn't have the engine up). The score of 0 only
happens when it has tried to migrate after I set the host to local
maintenance. Migrating the VM from the UI has worked quite a few
times, but it's recently started to fail.

When I have three hosts, after ~5 minutes of them all being up the score
will hit 0 on the hosts not running the VMs. It doesn't even have to
attempt to migrate before the score goes to 0. Stopping the ha agent
on one host, and resetting it with the global maintenance method,
brings it back to the 2 host scenario above.
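
(For reference, the reset I'm doing is roughly the following, assuming the EL6
service names:)

hosted-engine --set-maintenance --mode=global
service ovirt-ha-agent restart      # on the host stuck at score 0
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status           # the scores should read 2400 again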

I may move on and just go back to a standalone engine, as this isn't
having much luck...

On Tue, Jun 10, 2014 at 3:11 PM, combuster combus...@archlinux.us wrote:
 Nah, I've explicitly allowed hosted-engine vm to be able to access the NAS
 device as the NFS share itself, before the deploy procedure even started.
 But I'm puzzled at how you can reproduce the bug, all was well on my setup
 before I started the manual migration of the engine's vm. Even auto migration
 worked before that (tested it). Does it just happen without any procedure on
 the engine itself? Is the score 0 for just one node, or two of the three of
 them?

 On 06/10/2014 01:02 AM, Andrew Lau wrote:

 nvm, just as I hit send the error has returned.
 Ignore this..

 On Tue, Jun 10, 2014 at 9:01 AM, Andrew Lau and...@andrewklau.com wrote:

 So after adding the L3 capabilities to my storage network, I'm no
 longer seeing this issue anymore. So the engine needs to be able to
 access the storage domain it sits on? But that doesn't show up in the
 UI?

 Ivan, was this also the case with your setup? Engine couldn't access
 storage domain?

 On Mon, Jun 9, 2014 at 9:56 PM, Andrew Lau and...@andrewklau.com wrote:

 Interesting, my storage network is a L2 only and doesn't run on the
 ovirtmgmt (which is the only thing HostedEngine sees) but I've only
 seen this issue when running ctdb in front of my NFS server. I
 previously was using localhost as all my hosts had the nfs server on
 it (gluster).

 On Mon, Jun 9, 2014 at 9:15 PM, Artyom Lukianov aluki...@redhat.com
 wrote:

 I just blocked connection to storage for testing, but on result I had
 this error: Failed to acquire lock error -243, so I added it in 
 reproduce
 steps.
 If you know another steps to reproduce this error, without blocking
 connection to storage it also can be wonderful if you can provide them.
 Thanks

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: combuster combus...@archlinux.us
 Cc: users users@ovirt.org
 Sent: Monday, June 9, 2014 3:47:00 AM
 Subject: Re: [ovirt-users] VM HostedEngie is down. Exist message:
 internal error Failed to acquire lock error -243

 I just ran a few extra tests, I had a 2 host, hosted-engine running
 for a day. They both had a score of 2400. Migrated the VM through the
 UI multiple times, all worked fine. I then added the third host, and
 that's when it all fell to pieces.
 Other two hosts have a score of 0 now.

 I'm also curious, in the BZ there's a note about:

 where engine-vm block connection to storage domain(via iptables -I
 INPUT -s sd_ip -j DROP)

 What's the purpose for that?

 On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com
 wrote:

 Ignore that, the issue came back after 10 minutes.

 I've even tried a gluster mount + nfs server on top of that, and the
 same issue has come back.

 On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com
 wrote:

 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us
 wrote:

 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was
 nfs
 on top of gluster. So if you have this setup, then it's a known
 problem. Or
 you should double check if you hosts have different ids otherwise
 they would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third
 host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after
 your
 manual migration? It's a little frustrating this is happening as I

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-08 Thread Andrew Lau
I just ran a few extra tests, I had a 2 host, hosted-engine running
for a day. They both had a score of 2400. Migrated the VM through the
UI multiple times, all worked fine. I then added the third host, and
that's when it all fell to pieces.
Other two hosts have a score of 0 now.

I'm also curious, in the BZ there's a note about:

where engine-vm block connection to storage domain(via iptables -I
INPUT -s sd_ip -j DROP)

What's the purpose for that?

On Sat, Jun 7, 2014 at 4:16 PM, Andrew Lau and...@andrewklau.com wrote:
 Ignore that, the issue came back after 10 minutes.

 I've even tried a gluster mount + nfs server on top of that, and the
 same issue has come back.

 On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known problem. 
 Or
 you should double check if you hosts have different ids otherwise they 
 would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird
 (showed
 that storage domains were down but the vm's were running fine etc).
 Lucky
 me, I had already exported all of the VM's on the first sign of trouble
 and
 then installed ovirt-engine on the dedicated server and attached the
 export
 domain.

 So while really a usefull feature, and it's working (for the most part
 ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience with it, will be of use to you. It happened to
 me
 two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
 available.

 Regards,

 Ivan

 On 06/06/2014 05:12 AM, Andrew Lau wrote:

 Hi,

 I'm seeing this weird message in my engine log

 2014-06-06 03:06:09,380 INFO
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
 ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
 2014-06-06 03:06:12,494 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
 ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
 vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
 secondsToWait=0, gracefully=false), log id: 62a9d4c1
 2014-06-06 03:06:12,561 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-07 Thread Andrew Lau
Ignore that, the issue came back after 10 minutes.

I've even tried a gluster mount + nfs server on top of that, and the
same issue has come back.

On Fri, Jun 6, 2014 at 6:26 PM, Andrew Lau and...@andrewklau.com wrote:
 Interesting, I put it all into global maintenance. Shut it all down
 for 10~ minutes, and it's regained it's sanlock control and doesn't
 seem to have that issue coming up in the log.

 On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known problem. Or
 you should double check if you hosts have different ids otherwise they would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird
 (showed
 that storage domains were down but the vm's were running fine etc).
 Lucky
 me, I had already exported all of the VM's on the first sign of trouble
 and
 then installed ovirt-engine on the dedicated server and attached the
 export
 domain.

 So while really a usefull feature, and it's working (for the most part
 ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience with it, will be of use to you. It happened to
 me
 two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
 available.

 Regards,

 Ivan

 On 06/06/2014 05:12 AM, Andrew Lau wrote:

 Hi,

 I'm seeing this weird message in my engine log

 2014-06-06 03:06:09,380 INFO
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
 ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
 2014-06-06 03:06:12,494 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
 ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
 vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
 secondsToWait=0, gracefully=false), log id: 62a9d4c1
 2014-06-06 03:06:12,561 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
 62a9d4c1
 2014-06-06 03:06:12,652 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (DefaultQuartzScheduler_
 Worker-89) Correlation ID: null, Call Stack:
 null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
 message: internal error Failed to acquire lock: error -243.

 It also appears to occur on the other hosts in the cluster, except the
 host which is running the hosted-engine. So right now 3 servers, it
 shows up twice

Re: [ovirt-users] VM HostedEngie is down. Exist message: internal error Failed to acquire lock error -243

2014-06-06 Thread Andrew Lau
Hi Ivan,

Thanks for the in depth reply.

I've only seen this happen twice, and only after I added a third host
to the HA cluster. I wonder if that's the root problem.

Have you seen this happen on all your installs or only just after your
manual migration? It's a little frustrating this is happening as I was
hoping to get this into a production environment. It was all working
except that log message :(

Thanks,
Andrew


On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:
 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then on
 the other three. When that happend on all four of them, engine was corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from my
 own experience with it, shouldn't be used in the production enviroment (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only one
 node was left (that was actually running the hosted engine), I brought the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and after
 that, when I've tried to start the vm - it wouldn't load. Running VNC showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird (showed
 that storage domains were down but the vm's were running fine etc). Lucky
 me, I had already exported all of the VM's on the first sign of trouble and
 then installed ovirt-engine on the dedicated server and attached the export
 domain.

 So while really a usefull feature, and it's working (for the most part ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience with it, will be of use to you. It happened to me
 two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
 available.

 Regards,

 Ivan

 On 06/06/2014 05:12 AM, Andrew Lau wrote:

 Hi,

 I'm seeing this weird message in my engine log

 2014-06-06 03:06:09,380 INFO
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
 ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
 2014-06-06 03:06:12,494 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
 ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
 vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
 secondsToWait=0, gracefully=false), log id: 62a9d4c1
 2014-06-06 03:06:12,561 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
 62a9d4c1
 2014-06-06 03:06:12,652 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (DefaultQuartzScheduler_
 Worker-89) Correlation ID: null, Call Stack:
 null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
 message: internal error Failed to acquire lock: error -243.

 It also appears to occur on the other hosts in the cluster, except the
 host which is running the hosted-engine. So right now 3 servers, it
 shows up twice in the engine UI.

 The engine VM continues to run peacefully, without any issues on the
 host which doesn't have that error.

 Any ideas?
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM HostedEngine is down. Exit message: internal error Failed to acquire lock error -243

2014-06-06 Thread Andrew Lau
Is this related to the NFS server which gluster provides, or is it
because of the way gluster does replication?

There are a few posts, e.g.
http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/ which
report success with gluster + hosted engine. So it'd be good to
know, so we could possibly try a workaround.

Cheers.

On Fri, Jun 6, 2014 at 4:19 PM, Jiri Moskovcak jmosk...@redhat.com wrote:
 I've seen that problem in other threads, the common denominator was nfs on
 top of gluster. So if you have this setup, then it's a known problem. Or
 you should double check if you hosts have different ids otherwise they would
 be trying to acquire the same lock.

 --Jirka


 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird (showed
 that storage domains were down but the vm's were running fine etc). Lucky
 me, I had already exported all of the VM's on the first sign of trouble
 and
 then installed ovirt-engine on the dedicated server and attached the
 export
 domain.

 So while really a usefull feature, and it's working (for the most part
 ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience with it, will be of use to you. It happened to
 me
 two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
 available.

 Regards,

 Ivan

 On 06/06/2014 05:12 AM, Andrew Lau wrote:

 Hi,

 I'm seeing this weird message in my engine log

 2014-06-06 03:06:09,380 INFO
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
 ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
 2014-06-06 03:06:12,494 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
 ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
 vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
 secondsToWait=0, gracefully=false), log id: 62a9d4c1
 2014-06-06 03:06:12,561 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
 62a9d4c1
 2014-06-06 03:06:12,652 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (DefaultQuartzScheduler_
 Worker-89) Correlation ID: null, Call Stack:
 null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
 message: internal error Failed to acquire lock: error -243.

 It also appears to occur on the other hosts in the cluster, except the
 host which is running the hosted-engine. So right now 3 servers, it
 shows up twice in the engine UI.

 The engine VM continues to run peacefully, without any issues on the
 host which doesn't have that error.

 Any ideas?
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman

Re: [ovirt-users] VM HostedEngine is down. Exit message: internal error Failed to acquire lock error -243

2014-06-06 Thread Andrew Lau
Interesting, I put it all into global maintenance. Shut it all down
for ~10 minutes, and it regained its sanlock control and doesn't
seem to have that issue coming up in the log.
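
In case it helps anyone else, the sequence I mean is roughly the following - just a sketch using the hosted-engine commands already mentioned in this thread, so adjust to your setup:

hosted-engine --set-maintenance --mode=global   # stop the HA agents from restarting the VM
hosted-engine --vm-shutdown                     # gracefully stop the engine VM
# wait for the qemu process to exit and the sanlock lease to be released, then later:
hosted-engine --vm-start
hosted-engine --set-maintenance --mode=none     # hand control back to the HA agents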

On Fri, Jun 6, 2014 at 4:21 PM, combuster combus...@archlinux.us wrote:
 It was pure NFS on a NAS device. They all had different ids (had no
 redeployements of nodes before problem occured).

 Thanks Jirka.


 On 06/06/2014 08:19 AM, Jiri Moskovcak wrote:

 I've seen that problem in other threads, the common denominator was nfs
 on top of gluster. So if you have this setup, then it's a known problem. Or
 you should double check if you hosts have different ids otherwise they would
 be trying to acquire the same lock.

 --Jirka

 On 06/06/2014 08:03 AM, Andrew Lau wrote:

 Hi Ivan,

 Thanks for the in depth reply.

 I've only seen this happen twice, and only after I added a third host
 to the HA cluster. I wonder if that's the root problem.

 Have you seen this happen on all your installs or only just after your
 manual migration? It's a little frustrating this is happening as I was
 hoping to get this into a production environment. It was all working
 except that log message :(

 Thanks,
 Andrew


 On Fri, Jun 6, 2014 at 3:20 PM, combuster combus...@archlinux.us wrote:

 Hi Andrew,

 this is something that I saw in my logs too, first on one node and then
 on
 the other three. When that happend on all four of them, engine was
 corrupted
 beyond repair.

 First of all, I think that message is saying that sanlock can't get a
 lock
 on the shared storage that you defined for the hostedengine during
 installation. I got this error when I've tried to manually migrate the
 hosted engine. There is an unresolved bug there and I think it's related
 to
 this one:

 [Bug 1093366 - Migration of hosted-engine vm put target host score to
 zero]
 https://bugzilla.redhat.com/show_bug.cgi?id=1093366

 This is a blocker bug (or should be) for the selfhostedengine and, from
 my
 own experience with it, shouldn't be used in the production enviroment
 (not
 untill it's fixed).

 Nothing that I've done couldn't fix the fact that the score for the
 target
 node was Zero, tried to reinstall the node, reboot the node, restarted
 several services, tailed a tons of logs etc but to no avail. When only
 one
 node was left (that was actually running the hosted engine), I brought
 the
 engine's vm down gracefully (hosted-engine --vm-shutdown I belive) and
 after
 that, when I've tried to start the vm - it wouldn't load. Running VNC
 showed
 that the filesystem inside the vm was corrupted and when I ran fsck and
 finally started up - it was too badly damaged. I succeded to start the
 engine itself (after repairing postgresql service that wouldn't want to
 start) but the database was damaged enough and acted pretty weird
 (showed
 that storage domains were down but the vm's were running fine etc).
 Lucky
 me, I had already exported all of the VM's on the first sign of trouble
 and
 then installed ovirt-engine on the dedicated server and attached the
 export
 domain.

 So while really a usefull feature, and it's working (for the most part
 ie,
 automatic migration works), manually migrating VM with the hosted-engine
 will lead to troubles.

 I hope that my experience with it, will be of use to you. It happened to
 me
 two weeks ago, ovirt-engine was current (3.4.1) and there was no fix
 available.

 Regards,

 Ivan

 On 06/06/2014 05:12 AM, Andrew Lau wrote:

 Hi,

 I'm seeing this weird message in my engine log

 2014-06-06 03:06:09,380 INFO
 [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
 (DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
 85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
 ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
 2014-06-06 03:06:12,494 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
 ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
 vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
 secondsToWait=0, gracefully=false), log id: 62a9d4c1
 2014-06-06 03:06:12,561 INFO
 [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
 (DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
 62a9d4c1
 2014-06-06 03:06:12,652 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (DefaultQuartzScheduler_
 Worker-89) Correlation ID: null, Call Stack:
 null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
 message: internal error Failed to acquire lock: error -243.

 It also appears to occur on the other hosts in the cluster, except the
 host which is running the hosted-engine. So right now 3 servers, it
 shows up twice in the engine UI.

 The engine VM continues to run peacefully, without any issues on the
 host which doesn't have that error.

 Any ideas?
 ___
 Users mailing list
 Users

Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
Hi Doron,

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:


 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Monday, May 26, 2014 7:30:41 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 

 For 1. I was wondering if perhaps, we could have an option to specify
 the mount options. If I understand correctly, applying a soft mount
 instead of a hard mount would prevent this from happening. I'm however
 not sure of the implications this would have on the data integrity..

 I would really like to see it happen in the ha-agent, as it's the one
 which connects/mounts the storage it should also unmount it on boot.
 However the stability on it, is flaky at best. I've noticed if `df`
 hangs because of another NFS mount having timed-out the agent will
 die. That's not a good sign.. this was what actually caused my
 hosted-engine to run twice in one case.

  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff
 
  Joop
 

 Great to have your feedback guys!

 So just to clarify some of the issues you mentioned;

 Hosted engine wasn't designed for a 'single node' use case, as we do
 want it to be highly available. This is why it's being restarted
 elsewhere or even on the same server if no better alternative.

 Having said that, it is possible to set global maintenance mode
 as a first step (in the UI: right click engine vm and choose ha-maintenance).
 Then you can ssh into the engine vm and init 0.

 After a short while, the qemu process should gracefully end and release
 its sanlock lease as well as any other resource, which means you can
 reboot your hypervisor peacefully.

Sadly no, I've only been able to reboot my hypervisors if one of the
two conditions is met:

- Lazy unmount of /rhev/mnt/hosted-engine etc.
- killall -9 sanlock wdmd
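
Concretely, that looks something like this (the exact mount point under /rhev depends on your storage path, so treat it as an example):

umount -l <hosted-engine NFS mount point under /rhev>   # option 1: lazy unmount of the hosted-engine storage
killall -9 sanlock wdmd                                 # option 2: kill the locking daemons outright

Neither is pretty, but either one lets the host finish shutting down.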

I notice sanlock and wdmd cannot be stopped with `service wdmd
stop; service sanlock stop`.
These seem to fail during the shutdown/reboot process, which prevents
the unmount and the graceful reboot.

Are there any logs I can look into on how to debug those failed shutdowns?


 Doron
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

 On 05/25/2014 02:51 PM, Joop wrote:

 On 25-5-2014 19:38, Bob Doolittle wrote:


 Also curious is that when I say poweroff it actually reboots and comes
 up again. Could that be due to the timeouts on the way down?

 Ah, that's something my F19 host does too. Some more info: if engine
 hasn't been started on the host then I can shutdown it and it will poweroff.
 IF engine has been run on it then it will reboot.
 Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
  service ovirt-agent-ha stop
  service ovirt-agent-broker stop
  service vdsmd stop
  ssh root@engine01 init 0
 init 0

 I don't use maintenance mode because when I poweron my host (= my desktop)
 I want engine to power on automatically which it does most of the time
 within 10 min.


 For comparison, I see this issue and I *do* use maintenance mode (because
 presumably that's the 'blessed' way to shut things down and I'm scared to
 mess this complex system up by straying off the beaten path ;). My process
 is:

 ssh root@engine init 0
 (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
 hosted-engine --set-maintenance --mode=global
 poweroff

 And then on startup:
 hosted-engine --set-maintenance --mode=none
 hosted-engine --vm-start

 There are two issues here. I am not sure if they are related or not.
 1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
 2. The system reboot instead of poweroff (which messes up remote machine
 management)

 Thanks,
  Bob


 I think wdmd or sanlock are causing the reboot instead of poweroff

While searching for my issue of wdmd/sanlock not shutting down, I
found this which may interest you both:
https://bugzilla.redhat.com/show_bug.cgi?id=888197

Specifically:
To shut down sanlock without causing a wdmd reboot, you can run the
following command: sanlock client shutdown -f 1

This will cause sanlock to kill any pid's that are holding leases,
release those leases, and then exit.



 Joop

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] VM HostedEngine is down. Exit message: internal error Failed to acquire lock error -243

2014-06-05 Thread Andrew Lau
Hi,

I'm seeing this weird message in my engine log

2014-06-06 03:06:09,380 INFO
[org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo]
(DefaultQuartzScheduler_Worker-79) RefreshVmList vm id
85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5 status = WaitForLaunch on vds
ov-hv2-2a-08-23 ignoring it in the refresh until migration is done
2014-06-06 03:06:12,494 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) START, DestroyVDSCommand(HostName =
ov-hv2-2a-08-23, HostId = c04c62be-5d34-4e73-bd26-26f805b2dc60,
vmId=85d4cfb9-f063-4c7c-a9f8-2b74f5f7afa5, force=false,
secondsToWait=0, gracefully=false), log id: 62a9d4c1
2014-06-06 03:06:12,561 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand]
(DefaultQuartzScheduler_Worker-89) FINISH, DestroyVDSCommand, log id:
62a9d4c1
2014-06-06 03:06:12,652 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(DefaultQuartzScheduler_Worker-89) Correlation ID: null, Call Stack:
null, Custom Event ID: -1, Message: VM HostedEngine is down. Exit
message: internal error Failed to acquire lock: error -243.

It also appears to occur on the other hosts in the cluster, except the
host which is running the hosted-engine. So right now 3 servers, it
shows up twice in the engine UI.

The engine VM continues to run peacefully, without any issues on the
host which doesn't have that error.

Any ideas?
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle b...@doolittle.us.com wrote:
 Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
 through that bug report (closed as "not a bug"), it states that the problem should
 only arise if something is not releasing a sanlock lease. So if we've
 entered Global Maintenance and shut down the Engine, the question is what's
 holding the lease?

 How can that be debugged?

For me it's wdmd and sanlock itself failing to shut down properly. I
also noticed that even when in global maintenance with the engine VM powered
off, there is still a sanlock lease for the
/rhev/mnt/hosted-engine/? lease file, or something along those
lines. So global maintenance may not actually be releasing that
lock.

I'm not too familiar with sanlock etc. So it's like stabbing in the dark :(
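
If you want to poke at the same thing on your side, sanlock's client interface at least lets you see what is still held (read-only, so it should be safe to run on a live host):

sanlock client status     # lists lockspaces and the resources/leases currently held
sanlock client log_dump   # dumps sanlock's internal debug log

That should at least show whether the hosted-engine lease is still there after the engine VM is down.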


 -Bob

 On Jun 5, 2014 10:56 PM, Andrew Lau and...@andrewklau.com wrote:

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com
 wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and
  comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19
  host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my
  desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode
  (because
  presumably that's the 'blessed' way to shut things down and I'm scared
  to
  mess this complex system up by straying off the beaten path ;). My
  process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just
  #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 
  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff

 While searching for my issue of wdmd/sanlock not shutting down, I
 found this which may interest you both:
 https://bugzilla.redhat.com/show_bug.cgi?id=888197

 Specifically:
 To shut down sanlock without causing a wdmd reboot, you can run the
 following command: sanlock client shutdown -f 1

 This will cause sanlock to kill any pid's that are holding leases,
 release those leases, and then exit.
 

 
  Joop
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt Mirrors

2014-06-03 Thread Andrew Lau
Hi,


On Wed, May 14, 2014 at 7:55 PM, Sandro Bonazzola sbona...@redhat.com wrote:
 Il 13/05/2014 15:37, Brian Proffitt ha scritto:
 That was one on my list...

 package available for testing here: 
 http://jenkins.ovirt.org/job/ovirt-release_gerrit/80/


I'm doing a mini demo tomorrow for some guys who keep asking me about
oVirt. Is this now available in 3.4.2? It'd help if I were able to spin
up a local mirror beforehand.
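
In case it helps, for a quick local copy something along these lines should work (reposync comes from yum-utils; the repo id matches the one in ovirt.repo, the target directory is just an example, and this only mirrors the rpm repos, not the ISOs):

reposync -r ovirt-3.4-stable -p /var/www/html/mirror/
createrepo /var/www/html/mirror/ovirt-3.4-stable/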



 BKP

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Eyal Edri ee...@redhat.com
 Cc: David Caro Estevez dcaro...@redhat.com, Brian Proffitt 
 bprof...@redhat.com, users users@ovirt.org,
 infra in...@ovirt.org
 Sent: Tuesday, May 13, 2014 8:49:40 AM
 Subject: Re: [ovirt-users] oVirt Mirrors

 You'd have a good chance with AARNET http://mirror.aarnet.edu.au/

 On Tue, May 13, 2014 at 10:41 PM, Eyal Edri ee...@redhat.com wrote:
 Hi,

 We have some mirrors up and running for a few universities, not sure any of
 them are in australia though.
 david/brian - any chance of contacting an australia university for another
 mirror?

 Eyal.

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Tuesday, May 13, 2014 3:13:44 PM
 Subject: [ovirt-users] oVirt Mirrors

 Hi,

 I was wondering if there were any plans on drumming up an oVirt mirror
 down in Australia. It takes me nearly a few hours just to grab all the
 required packages to spin up a new box.

 Alternatively, the mirror does not seem to support rsync. Is there a
 recommended way to sync? (w/o ISOs)

 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



 --
 Sandro Bonazzola
 Better technology. Faster innovation. Powered by community collaboration.
 See how it works at redhat.com
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-02 Thread Andrew Lau
I'm curious to hear what other comments arise, as we're analyzing a
production setup shortly.


On Sun, Jun 1, 2014 at 10:11 PM,  combus...@archlinux.us wrote:
 I need to scratch gluster off because setup is based on CentOS 6.5, so
 essential prerequisites like qemu 1.3 and libvirt 1.0.1 are not met.
Gluster would still work with EL6, afaik it just won't use libgfapi and
instead use just a standard mount.
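
On the multipath alias you mention below: for reference, such a stanza in /etc/multipath.conf looks roughly like this (the wwid here is a placeholder - take the real one from `multipath -ll`), and `multipath -r` reloads the maps afterwards:

multipaths {
    multipath {
        wwid   36000d31000abcd0000000000000000aa
        alias  fc_data_domain
    }
}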


 Any info regarding FC storage domain would be appreciated though.

 Thanks

 Ivan

 On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:
 Hi,

 I have a 4 node cluster setup and my storage options right now are a FC
 based storage, one partition per node on a local drive (~200GB each) and a
 NFS based NAS device. I want to setup export and ISO domain on the NAS and
 there are no issues or questions regarding those two. I wasn't aware of any
 other options at the time for utilizing a local storage (since this is a
 shared based datacenter) so I exported a directory from each partition via
 NFS and it works. But I am little in the dark with the following:

 1. Are there any advantages for switching from NFS based local storage to a
 Gluster based domain with blocks for each partition. I guess it can be only
 performance wise but maybe I'm wrong. If there are advantages, are there any
 tips regarding xfs mount options etc ?

 2. I've created a volume on the FC based storage and exported it to all of
 the nodes in the cluster on the storage itself. I've configured
 multipathing correctly and added an alias for the wwid of the LUN so I can
 distinct this one and any other future volumes more easily. At first I
 created a partition on it but since oVirt saw only the whole LUN as raw
 device I erased it before adding it as the FC master storage domain. I've
 imported a few VM's and point them to the FC storage domain. This setup
 works, but:

 - All of the nodes see a device with the alias for the wwid of the volume,
 but only the node wich is currently the SPM for the cluster can see logical
 volumes inside. Also when I setup the high availability for VM's residing
 on the FC storage and select to start on any node on the cluster, they
 always start on the SPM. Can multiple nodes run different VM's on the same
 FC storage at the same time (logical thing would be that they can, but I
 wanted to be sure first). I am not familiar with the logic oVirt utilizes
 that locks the vm's logical volume to prevent corruption.

 - Fdisk shows that logical volumes on the LUN of the FC volume are
 misaligned (the partition doesn't end on a cylinder boundary), so I wonder if
 this is because I imported the VMs with disks that were created on local
 storage before, and whether any _new_ VMs with disks on the FC storage would
 be properly aligned.

 This is a new setup with oVirt 3.4 (did an export of all the VM's on 3.3 and
 after a fresh installation of the 3.4 imported them back again). I have
 room to experiment a little with 2 of the 4 nodes because currently they
 are free from running any VM's, but I have limited room for anything else
 that would cause an unplanned downtime for four virtual machines running on
 the other two nodes on the cluster (currently highly available and their
 drives are on the FC storage domain). All in all I have 12 VM's running and
 I'm asking on the list for advice and guidance before I make any changes.

 Just trying to find as much info regarding all of this as possible before
 acting upon.

 Thank you in advance,

 Ivan

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] glusterfs resume vm paused state

2014-06-02 Thread Andrew Lau
Hi Humble,

On Mon, Jun 2, 2014 at 8:10 PM, Humble Devassy Chirammal
humble.deva...@gmail.com wrote:
 Hi Andrew,

 Afaict, there should be manual intervention to resume a 'paused vm' in any
 storage domain even if VM is marked as HA..

I had a BZ open about this with some traction, but I forgot to keep up
with the requests and it's fallen behind:
https://bugzilla.redhat.com/show_bug.cgi?id=1058300

Even manually, they won't resume. virsh resume host also has the same
end result.
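
For reference, what I mean by manually is roughly this (a sketch - on a vdsm host virsh may ask for SASL credentials, and the VM name is a placeholder):

virsh list --all        # the affected guests show up as "paused"
virsh resume <vm-name>  # returns, but the guest stays paused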

 Also, I failed to understand the setup you have, that said, you mentioned:


  resuming a VM from a paused state on top
 of NFS share? Even when the VMs are marked as HA, if the gluster
 storage goes down for a few seconds the VMs go to a paused state and
 can never be resumed

 Do you have NFS storage domain configured by specifying gluster server ip
 and volume name  in place of server and export path ?

 can you please detail the setup (wrt storage domain configuration and
 gluster volumes) and version of ovirt and gluster in use ?

We're testing a two host setup with oVirt and gluster on the same
boxes. CentOS 6.5, hosted-engine. Storage domain type as glusterfs,
although when I try a storage domain type of nfs (using the gluster
nfs server) the above issue doesn't seem to occur.


 --Humble


 On Mon, Jun 2, 2014 at 11:17 AM, Andrew Lau and...@andrewklau.com wrote:

 Hi,

 Has anyone had any luck with resuming a VM from a paused state on top
 of NFS share? Even when the VMs are marked as HA, if the gluster
 storage goes down for a few seconds the VMs go to a paused state and
 can never be resumed. They require a hard reset.

 I recall when using NFS to not have this issue.

 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] glusterfs resume vm paused state

2014-06-01 Thread Andrew Lau
Hi,

Has anyone had any luck with resuming a VM from a paused state on top
of NFS share? Even when the VMs are marked as HA, if the gluster
storage goes down for a few seconds the VMs go to a paused state and
can never be resumed. They require a hard reset.

I recall when using NFS to not have this issue.

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-31 Thread Andrew Lau
On Sat, May 31, 2014 at 6:06 AM, Joop jvdw...@xs4all.nl wrote:
 Bob Doolittle wrote:

 Joop,

 On 05/26/2014 02:43 AM, Joop wrote:

 Yesterday evening I have found the service responsible for the reboot
 instead of the powerdown. If I do: service wdmd stop the server will reboot.
 It seems the watchdog is hung up and eventually this will lead to a crash
 and thus a reboot instead of the shutdown.

 Anyone knows how to debug this?


 Did you get anywhere with this?
 Pretty nasty. Is there a bug open?

 We're getting a timeout on an NFS mount during the powerdown (single-node
 hosted, after global maintenance enabled and engine powered off), and that
 makes the machine reboot and try to come back up again instead of powering
 off.

 So two issues:
 - What is the mount that is hanging (probably an oVirt issue)?

 Don´t know what that problem is. I have a local nfs mount but don´t
 experience that problem

I got to the console when mine went for a reboot. I see sanlock and
wdmd failing to shutdown properly which would explain why it doesn't
unmount properly.



 - Why does the system reboot instead of powering down as instructed (?)?


 the reboot is caused by wdmd. Docs says that if the watchdogs aren´t
 responding that a reset will follow. So our init 0 is overruled because of
 the hanging watchdog. Why it is hanging I don´t know. Could be my chipset,
 could be the version of wdmd-kernel. Only thing is I know it didn´t happen
 always in the past but that is as much as I remember, sorry.


 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] hosted-engine configure startup time allowance?

2014-05-28 Thread Andrew Lau
Hi,

I was just wondering if it's possible to configure the startup-time
allowance of the hosted-engine? I sometimes have this issue
where my hosted-engine starts automatically but is sent
a reboot signal 30 seconds before the engine has time to start up. This
is because it fails the 'liveliness check'; just before it reboots, the
engine status would be set to up, but as the reboot signal was already
sent, the VM reboots and then starts up on another host.

This then goes into a loop, until I do a global maintenance, manual
bootup and then maintenance mode none.

Thanks,
Andrew.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-28 Thread Andrew Lau
Hi Doron,

Before the initial thread sways a little more..

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:


 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Monday, May 26, 2014 7:30:41 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 

 For 1. I was wondering if perhaps, we could have an option to specify
 the mount options. If I understand correctly, applying a soft mount
 instead of a hard mount would prevent this from happening. I'm however
 not sure of the implications this would have on the data integrity..

 I would really like to see it happen in the ha-agent, as it's the one
 which connects/mounts the storage it should also unmount it on boot.
 However the stability on it, is flaky at best. I've noticed if `df`
 hangs because of another NFS mount having timed-out the agent will
 die. That's not a good sign.. this was what actually caused my
 hosted-engine to run twice in one case.

  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff
 
  Joop
 

 Great to have your feedback guys!

 So just to clarify some of the issues you mentioned;

 Hosted engine wasn't designed for a 'single node' use case, as we do
 want it to be highly available. This is why it's being restarted
 elsewhere or even on the same server if no better alternative.

 Having said that, it is possible to set global maintenance mode
 as a first step (in the UI: right click engine vm and choose ha-maintenance).
 Then you can ssh into the engine vm and init 0.

 After a short while, the qemu process should gracefully end and release
 its sanlock lease as well as any other resource, which means you can
 reboot your hypervisor peacefully.


What about in a 2-host cluster? Let's say we want to take down 1 host
for maintenance, so there's a 50% chance it could be running the engine. Would
setting maintenance-mode local do the same thing and allow a clean
shutdown/reboot?
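
i.e. something along the lines of (a sketch only, using the maintenance modes the tool already exposes):

hosted-engine --set-maintenance --mode=local   # on the host going down; the other host should take over the engine VM
# ... do the maintenance / reboot ...
hosted-engine --set-maintenance --mode=none    # once the host is back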

 Doron
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hosted-engine configure startup time allowance?

2014-05-28 Thread Andrew Lau
Hi Jiri,

On Wed, May 28, 2014 at 5:10 PM, Jiri Moskovcak jmosk...@redhat.com wrote:
 On 05/28/2014 08:52 AM, Andrew Lau wrote:

 Hi,

 I was just wondering if it's possible to configure the startup-time
 allowance of the hosted-engine? I seem to sometime have this issue
 where my hosted-engine would start automatically but it would be sent
 a reboot signal 30 seconds before the engine has time to startup. This
 is because it fails the 'liveliness check', just before it reboots the
 engine status would be set to up but as the reboot signal was already
 sent the VM will reboot and then startup on another host.

 This then goes into a loop, until I do a global maintenance, manual
 bootup and then maintenance mode none.


 Hi Andrew,
 try to look into [1] and tweak the timeouts there. Don't forget to restart
 the ovirt-ha-agent service when you change it.


Thanks, I'll try that out.

Are all those values there available in /etc/ovirt-hosted-engine-ha/agent.conf ?
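
In the meantime, a quick read-only way to see what the installed agent module actually defines, without guessing at names:

python -c "from ovirt_hosted_engine_ha.agent import constants; print('\n'.join(n for n in dir(constants) if not n.startswith('_')))"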


 --Jirka

 [1]
 /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/constants.py

 Thanks,
 Andrew.
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-25 Thread Andrew Lau
On Sun, May 25, 2014 at 4:04 PM, Artyom Lukianov aluki...@redhat.com wrote:
 I see that I verified it on version 
 ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch, so it must work from this 
 version and above.
 Thanks
I can only seem to get 1.1.2.1 - is the patched version being released soon?

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Artyom Lukianov aluki...@redhat.com
 Cc: users users@ovirt.org, Sandro Bonazzola sbona...@redhat.com
 Sent: Saturday, May 24, 2014 2:51:15 PM
 Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 Simply starting the ha-agents manually seems to bring up the VM
 however it doesn't come up in the chkconfig list.

 The next host that gets configured works fine. What steps get
 configured in that final stage that perhaps I could manually run
 rather than rerolling for a third time?

 On Sat, May 24, 2014 at 9:42 PM, Andrew Lau and...@andrewklau.com wrote:
 Hi,

 Are these patches merged into 3.4.1? I seem to be hitting this issue
 now, twice in a row.
 The second BZ is also marked as private.

 On Fri, May 2, 2014 at 1:21 AM, Artyom Lukianov aluki...@redhat.com wrote:
 It have number of the same bugs:
 https://bugzilla.redhat.com/show_bug.cgi?id=1080513
 https://bugzilla.redhat.com/show_bug.cgi?id=1088572 - fix for this already 
 merged, so if you take the last ovirt it must include it
 The one thing you can do until it, it try to restart host and start 
 deployment process from beginning.
 Thanks

 - Original Message -
 From: Tobias Honacker tob...@honacker.info
 To: users@ovirt.org
 Sent: Thursday, May 1, 2014 6:06:47 PM
 Subject: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 Hi all,

 i hit this bug yesterday.

 Packages:

 ovirt-host-deploy-1.2.0-1.el6.noarch
 ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch
 ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
 ovirt-release-11.2.0-1.noarch
 ovirt-hosted-engine-ha-1.1.2-1.el6.noarch

 After setting up the hosted engine (running great) the setup canceled with 
 this MSG:

 [ INFO  ] The VDSM Host is now operational
 [ ERROR ] Waiting for cluster 'Default' to become operational...
 [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no 
 attribute '__dict__'
 [ INFO  ] Stage: Clean up
 [ INFO  ] Stage: Pre-termination
 [ INFO  ] Stage: Termination

 What is the next step i have to do that t he HA features of the 
 hosted-engine will take care of keeping the VM alive.

 best regards
 tobias

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-25 Thread Andrew Lau
On Sun, May 25, 2014 at 8:52 PM, Yedidyah Bar David d...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Artyom Lukianov aluki...@redhat.com
 Cc: users users@ovirt.org
 Sent: Sunday, May 25, 2014 1:02:18 PM
 Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 On Sun, May 25, 2014 at 4:04 PM, Artyom Lukianov aluki...@redhat.com wrote:
  I see that I verified it on version
  ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch, so it must work from this
  version and above.
  Thanks
 I can only seem to get 1.1.2.1 is the patched version being released soon?

 ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch is an internal version and 
 should
 not be confused with those on ovirt.org.

I wonder why I get 1.1.2.1 when I ran the install only just yesterday...
although I do see 1.1.3.1 in the repo


 [1] contains 1.1.3-1 . The 3.4.1 release notes also mention that BZ 1088572 
 was
 solved by it.

 [1] http://resources.ovirt.org/pub/ovirt-3.4/rpm/fc19/noarch/


  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Artyom Lukianov aluki...@redhat.com
  Cc: users users@ovirt.org, Sandro Bonazzola sbona...@redhat.com
  Sent: Saturday, May 24, 2014 2:51:15 PM
  Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to
  become operational...
 
  Simply starting the ha-agents manually seems to bring up the VM
  however it doesn't come up in the chkconfig list.
 
  The next host that gets configured works fine. What steps get
  configured in that final stage that perhaps I could manually run
  rather than rerolling for a third time?
 
  On Sat, May 24, 2014 at 9:42 PM, Andrew Lau and...@andrewklau.com wrote:
  Hi,
 
  Are these patches merged into 3.4.1? I seem to be hitting this issue
  now, twice in a row.
  The second BZ is also marked as private.
 
  On Fri, May 2, 2014 at 1:21 AM, Artyom Lukianov aluki...@redhat.com
  wrote:
  It have number of the same bugs:
  https://bugzilla.redhat.com/show_bug.cgi?id=1080513
  https://bugzilla.redhat.com/show_bug.cgi?id=1088572 - fix for this
  already merged, so if you take the last ovirt it must include it
  The one thing you can do until it, it try to restart host and start
  deployment process from beginning.
  Thanks
 
  - Original Message -
  From: Tobias Honacker tob...@honacker.info
  To: users@ovirt.org
  Sent: Thursday, May 1, 2014 6:06:47 PM
  Subject: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to
  become operational...
 
  Hi all,
 
  i hit this bug yesterday.
 
  Packages:
 
  ovirt-host-deploy-1.2.0-1.el6.noarch
  ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch
  ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
  ovirt-release-11.2.0-1.noarch
  ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
 
  After setting up the hosted engine (running great) the setup canceled
  with this MSG:
 
  [ INFO  ] The VDSM Host is now operational
  [ ERROR ] Waiting for cluster 'Default' to become operational...
  [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no
  attribute '__dict__'
  [ INFO  ] Stage: Clean up
  [ INFO  ] Stage: Pre-termination
  [ INFO  ] Stage: Termination
 
  What is the next step i have to do that t he HA features of the
  hosted-engine will take care of keeping the VM alive.
 
  best regards
  tobias
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 --
 Didi
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-25 Thread Andrew Lau
On Sun, May 25, 2014 at 10:25 PM, Yedidyah Bar David d...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Yedidyah Bar David d...@redhat.com
 Cc: Artyom Lukianov aluki...@redhat.com, users users@ovirt.org
 Sent: Sunday, May 25, 2014 2:00:24 PM
 Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 On Sun, May 25, 2014 at 8:52 PM, Yedidyah Bar David d...@redhat.com wrote:
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Artyom Lukianov aluki...@redhat.com
  Cc: users users@ovirt.org
  Sent: Sunday, May 25, 2014 1:02:18 PM
  Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default'
  to become operational...
 
  On Sun, May 25, 2014 at 4:04 PM, Artyom Lukianov aluki...@redhat.com
  wrote:
   I see that I verified it on version
   ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch, so it must work from
   this
   version and above.
   Thanks
  I can only seem to get 1.1.2.1 is the patched version being released soon?
 
  ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch is an internal version and
  should
  not be confused with those on ovirt.org.

 I wonder why I get 1.1.2.1 when ran the install only just yesterday..
 although I do see 1.1.3.1 in the repo

 No idea - verified now that it works for me. Perhaps some local caching?
 Did you try 'yum clean all'?

It was a fresh install, I just tried yum clean all and a yum update, nothing.

Are my repos correct?
[root@ov-hv1-2a-08-23 ~]# cat /etc/yum.repos.d/ovirt.repo

[ovirt-stable]
name=Latest oVirt Releases
baseurl=http://ovirt.org/releases/stable/rpm/EL/$releasever/
enabled=1
skip_if_unavailable=1
gpgcheck=0


# Latest oVirt 3.4 releases

[ovirt-3.4-stable]
name=Latest oVirt 3.4.z Releases
baseurl=http://ovirt.org/releases/3.4/rpm/EL/$releasever/
enabled=1
skip_if_unavailable=1
gpgcheck=0


[ovirt-3.4-prerelease]
name=Latest oVirt 3.4 Pre Releases (Beta to Release Candidate)
baseurl=http://resources.ovirt.org/releases/3.4_pre/rpm/EL/$releasever/
enabled=0
skip_if_unavailable=1
gpgcheck=0


# Latest oVirt 3.3 releases

[ovirt-3.3-stable]
name=Latest oVirt 3.3.z Releases
baseurl=http://resources.ovirt.org/releases/3.3/rpm/EL/$releasever/
enabled=1
skip_if_unavailable=1
gpgcheck=0

[ovirt-3.3-prerelease]
name=Latest oVirt 3.3.z Pre Releases (Beta to Release Candidate)
baseurl=http://resources.ovirt.org/releases/3.3_pre/rpm/EL/$releasever/
enabled=0
skip_if_unavailable=1
gpgcheck=0

I still seem to be getting:
[root@ov-hv1-2a-08-23 ~]# rpm -qa | grep ovirt
ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
ovirt-release-11.2.0-1.noarch
ovirt-host-deploy-1.2.0-1.el6.noarch
ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch



 
  [1] contains 1.1.3-1 . The 3.4.1 release notes also mention that BZ 1088572
  was
  solved by it.
 
  [1] http://resources.ovirt.org/pub/ovirt-3.4/rpm/fc19/noarch/
 
 
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: Artyom Lukianov aluki...@redhat.com
   Cc: users users@ovirt.org, Sandro Bonazzola sbona...@redhat.com
   Sent: Saturday, May 24, 2014 2:51:15 PM
   Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default'
   to
   become operational...
  
   Simply starting the ha-agents manually seems to bring up the VM
   however it doesn't come up in the chkconfig list.
  
   The next host that gets configured works fine. What steps get
   configured in that final stage that perhaps I could manually run
   rather than rerolling for a third time?
  
   On Sat, May 24, 2014 at 9:42 PM, Andrew Lau and...@andrewklau.com
   wrote:
   Hi,
  
   Are these patches merged into 3.4.1? I seem to be hitting this issue
   now, twice in a row.
   The second BZ is also marked as private.
  
   On Fri, May 2, 2014 at 1:21 AM, Artyom Lukianov aluki...@redhat.com
   wrote:
   It have number of the same bugs:
   https://bugzilla.redhat.com/show_bug.cgi?id=1080513
   https://bugzilla.redhat.com/show_bug.cgi?id=1088572 - fix for this
   already merged, so if you take the last ovirt it must include it
   The one thing you can do until it, it try to restart host and start
   deployment process from beginning.
   Thanks
  
   - Original Message -
   From: Tobias Honacker tob...@honacker.info
   To: users@ovirt.org
   Sent: Thursday, May 1, 2014 6:06:47 PM
   Subject: [ovirt-users] Hosted Engine - Waiting for cluster 'Default'
   to
   become operational...
  
   Hi all,
  
   i hit this bug yesterday.
  
   Packages:
  
   ovirt-host-deploy-1.2.0-1.el6.noarch
   ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch
   ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
   ovirt-release-11.2.0-1.noarch
   ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
  
   After setting up the hosted engine (running great) the setup canceled
   with this MSG:
  
   [ INFO  ] The VDSM Host is now operational
   [ ERROR ] Waiting for cluster 'Default' to become operational...
   [ ERROR

Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-25 Thread Andrew Lau
On Sun, May 25, 2014 at 10:59 PM, Yedidyah Bar David d...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Yedidyah Bar David d...@redhat.com
 Cc: Artyom Lukianov aluki...@redhat.com, users users@ovirt.org
 Sent: Sunday, May 25, 2014 3:38:07 PM
 Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 On Sun, May 25, 2014 at 10:25 PM, Yedidyah Bar David d...@redhat.com wrote:
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Yedidyah Bar David d...@redhat.com
  Cc: Artyom Lukianov aluki...@redhat.com, users users@ovirt.org
  Sent: Sunday, May 25, 2014 2:00:24 PM
  Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default'
  to become operational...
 
  On Sun, May 25, 2014 at 8:52 PM, Yedidyah Bar David d...@redhat.com
  wrote:
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: Artyom Lukianov aluki...@redhat.com
   Cc: users users@ovirt.org
   Sent: Sunday, May 25, 2014 1:02:18 PM
   Subject: Re: [ovirt-users] Hosted Engine - Waiting for cluster
   'Default'
   to become operational...
  
   On Sun, May 25, 2014 at 4:04 PM, Artyom Lukianov aluki...@redhat.com
   wrote:
I see that I verified it on version
ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch, so it must work from
this
version and above.
Thanks
   I can only seem to get 1.1.2.1 is the patched version being released
   soon?
  
   ovirt-hosted-engine-setup-1.1.2-5.el6ev.noarch is an internal version
   and
   should
   not be confused with those on ovirt.org.
 
  I wonder why I get 1.1.2.1 when ran the install only just yesterday..
  although I do see 1.1.3.1 in the repo
 
  No idea - verified now that it works for me. Perhaps some local caching?
  Did you try 'yum clean all'?

 It was a fresh install, I just tried yum clean all and a yum update, nothing.

 Are my repos correct?
 [root@ov-hv1-2a-08-23 ~]# cat /etc/yum.repos.d/ovirt.repo

 [ovirt-stable]
 name=Latest oVirt Releases
 baseurl=http://ovirt.org/releases/stable/rpm/EL/$releasever/
 enabled=1
 skip_if_unavailable=1
 gpgcheck=0


 # Latest oVirt 3.4 releases

 [ovirt-3.4-stable]
 name=Latest oVirt 3.4.z Releases
 baseurl=http://ovirt.org/releases/3.4/rpm/EL/$releasever/
 enabled=1
 skip_if_unavailable=1
 gpgcheck=0


 [ovirt-3.4-prerelease]
 name=Latest oVirt 3.4 Pre Releases (Beta to Release Candidate)
 baseurl=http://resources.ovirt.org/releases/3.4_pre/rpm/EL/$releasever/
 enabled=0
 skip_if_unavailable=1
 gpgcheck=0


 # Latest oVirt 3.3 releases

 [ovirt-3.3-stable]
 name=Latest oVirt 3.3.z Releases
 baseurl=http://resources.ovirt.org/releases/3.3/rpm/EL/$releasever/
 enabled=1
 skip_if_unavailable=1
 gpgcheck=0

 [ovirt-3.3-prerelease]
 name=Latest oVirt 3.3.z Pre Releases (Beta to Release Candidate)
 baseurl=http://resources.ovirt.org/releases/3.3_pre/rpm/EL/$releasever/
 enabled=0
 skip_if_unavailable=1
 gpgcheck=0

 Seems ok, but note that the 'resources.ovirt.org/releases' URLs are
 obsolete and recent release packages (e.g. the one pointed at from
 the 3.4.1 release notes) point at 'resources.ovirt.org/pub'.

I haven't modified the URLs; they are just what came from ovirt-release


 I still seem to be getting:
 [root@ov-hv1-2a-08-23 ~]# rpm -qa | grep ovirt
 ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
 ovirt-release-11.2.0-1.noarch
 ovirt-host-deploy-1.2.0-1.el6.noarch
 ovirt-hosted-engine-ha-1.1.2-1.el6.noarch
 ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch

 This shows what you have installed. What do you get from
 'yum list ovirt-hosted-engine-setup' ?

[root@ov-hv1-2a-08-23 yum.repos.d]# yum list ovirt-hosted-engine-setup
Loaded plugins: fastestmirror, security
Loading mirror speeds from cached hostfile
ovirt-epel/metalink                          | 3.3 kB     00:00
 * base: mirror.as24220.net
 * epel: mirror.optus.net
 * extras: mirror.as24220.net
 * ovirt-epel: mirror.optus.net
 * ovirt-jpackage-6.0-generic: mirror.ibcp.fr
 * updates: centos.melb.au.glomirror.com.au
ovirt-3.3-stable                             | 2.9 kB     00:00
ovirt-3.4-stable                             | 2.9 kB     00:00
ovirt-glusterfs-epel                         | 2.9 kB     00:00
ovirt-glusterfs-noarch-epel                  | 2.9 kB     00:00
ovirt-jpackage-6.0-generic                   | 1.9 kB     00:00
ovirt-stable                                 | 2.9 kB     00:00
Installed Packages
ovirt-hosted-engine-setup.noarch        1.1.2-1.el6        @ovirt-3.4-stable

 --
 Didi
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-25 Thread Andrew Lau
Thanks, that fixed it.

Cheers

On Sun, May 25, 2014 at 11:24 PM, Joop jvdw...@xs4all.nl wrote:
 Reinstall ovirt.repo using the resources.ovirt.org/pub path. There is a
 ovirt-release.rpm, use that.

 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Andrew Lau
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

 On 05/25/2014 02:51 PM, Joop wrote:

 On 25-5-2014 19:38, Bob Doolittle wrote:


 Also curious is that when I say poweroff it actually reboots and comes
 up again. Could that be due to the timeouts on the way down?

 Ah, that's something my F19 host does too. Some more info: if engine
 hasn't been started on the host then I can shutdown it and it will poweroff.
 IF engine has been run on it then it will reboot.
 Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
  service ovirt-agent-ha stop
  service ovirt-agent-broker stop
  service vdsmd stop
  ssh root@engine01 init 0
 init 0

 I don't use maintenance mode because when I poweron my host (= my desktop)
 I want engine to power on automatically which it does most of the time
 within 10 min.


 For comparison, I see this issue and I *do* use maintenance mode (because
 presumably that's the 'blessed' way to shut things down and I'm scared to
 mess this complex system up by straying off the beaten path ;). My process
 is:

 ssh root@engine init 0
 (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
 hosted-engine --set-maintenance --mode=global
 poweroff

 And then on startup:
 hosted-engine --set-maintenance --mode=none
 hosted-engine --vm-start

 There are two issues here. I am not sure if they are related or not.
 1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
 2. The system reboot instead of poweroff (which messes up remote machine
 management)


For 1. I was wondering if perhaps, we could have an option to specify
the mount options. If I understand correctly, applying a soft mount
instead of a hard mount would prevent this from happening. I'm however
not sure of the implications this would have on the data integrity..
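
Just to illustrate what I mean - this is only a sketch, the server/export/mount point below are made up, and vdsm doesn't currently seem to let you set these options for the hosted-engine mount:

mount -t nfs -o soft,timeo=30,retrans=3 nfs.example.com:/export/hosted-engine /mnt/he-test

With soft, NFS operations eventually return an error instead of blocking forever when the server disappears, which is exactly the data-integrity trade-off I'm unsure about.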

I would really like to see it happen in the ha-agent, as it's the one
which connects/mounts the storage it should also unmount it on boot.
However the stability on it, is flaky at best. I've noticed if `df`
hangs because of another NFS mount having timed-out the agent will
die. That's not a good sign.. this was what actually caused my
hosted-engine to run twice in one case.

 Thanks,
  Bob


 I think wdmd or sanlock are causing the reboot instead of poweroff

 Joop

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Can HA Agent control NFS Mount?

2014-05-24 Thread Andrew Lau
Hi,

I was just wondering, within the whole complexity of hosted-engine:
would it be possible for the hosted-engine ha-agent to control the mount
point?

I'm basing this on a few people I've been talking to who have their
NFS server running on the same hosts that the hosted-engine is
running on, most of them also running that on top of gluster.

The main motive for this is that currently, if the NFS server is running on
localhost and the server goes for a clean shutdown, it will hang:
because the NFS mount is hard mounted and the NFS server has gone
away, we're stuck in an infinite wait for it to cleanly
unmount (which it never will).

If one of the HA components could instead unmount this
NFS mount when it shuts down, that could potentially prevent this.
There are other alternatives, and I know this is not the supported
scenario, but I'm just hoping to bounce around a few ideas.
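As a rough sketch of the idea (this hook doesn't exist today, and the mount
path is just an example), something run at shutdown could do a lazy unmount
so the host doesn't block on the dead NFS server:

umount -l /rhev/data-center/mnt/localhost:_hosted-engine || \
    umount -f /rhev/data-center/mnt/localhost:_hosted-engine

That's obviously a hack; doing it properly inside the ha-agent's shutdown
path would be much nicer.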

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-24 Thread Andrew Lau
Hi,

Are these patches merged into 3.4.1? I seem to be hitting this issue
now, twice in a row.
The second BZ is also marked as private.

On Fri, May 2, 2014 at 1:21 AM, Artyom Lukianov aluki...@redhat.com wrote:
 It have number of the same bugs:
 https://bugzilla.redhat.com/show_bug.cgi?id=1080513
 https://bugzilla.redhat.com/show_bug.cgi?id=1088572 - fix for this already 
 merged, so if you take the last ovirt it must include it
 The one thing you can do until it, it try to restart host and start 
 deployment process from beginning.
 Thanks

 - Original Message -
 From: Tobias Honacker tob...@honacker.info
 To: users@ovirt.org
 Sent: Thursday, May 1, 2014 6:06:47 PM
 Subject: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 Hi all,

 i hit this bug yesterday.

 Packages:

 ovirt-host-deploy-1.2.0-1.el6.noarch
 ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch
 ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
 ovirt-release-11.2.0-1.noarch
 ovirt-hosted-engine-ha-1.1.2-1.el6.noarch

 After setting up the hosted engine (running great) the setup canceled with 
 this MSG:

 [ INFO  ] The VDSM Host is now operational
 [ ERROR ] Waiting for cluster 'Default' to become operational...
 [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no 
 attribute '__dict__'
 [ INFO  ] Stage: Clean up
 [ INFO  ] Stage: Pre-termination
 [ INFO  ] Stage: Termination

 What is the next step i have to do that t he HA features of the hosted-engine 
 will take care of keeping the VM alive.

 best regards
 tobias

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to become operational...

2014-05-24 Thread Andrew Lau
Simply starting the ha-agents manually seems to bring up the VM;
however, they don't come up in the chkconfig list.

The next host that gets configured works fine. What steps are
performed in that final stage that I could perhaps run manually,
rather than rerolling for a third time?

On Sat, May 24, 2014 at 9:42 PM, Andrew Lau and...@andrewklau.com wrote:
 Hi,

 Are these patches merged into 3.4.1? I seem to be hitting this issue
 now, twice in a row.
 The second BZ is also marked as private.

 On Fri, May 2, 2014 at 1:21 AM, Artyom Lukianov aluki...@redhat.com wrote:
 It have number of the same bugs:
 https://bugzilla.redhat.com/show_bug.cgi?id=1080513
 https://bugzilla.redhat.com/show_bug.cgi?id=1088572 - fix for this already 
 merged, so if you take the last ovirt it must include it
 The one thing you can do until it, it try to restart host and start 
 deployment process from beginning.
 Thanks

 - Original Message -
 From: Tobias Honacker tob...@honacker.info
 To: users@ovirt.org
 Sent: Thursday, May 1, 2014 6:06:47 PM
 Subject: [ovirt-users] Hosted Engine - Waiting for cluster 'Default' to 
 become operational...

 Hi all,

 i hit this bug yesterday.

 Packages:

 ovirt-host-deploy-1.2.0-1.el6.noarch
 ovirt-engine-sdk-python-3.4.0.7-1.el6.noarch
 ovirt-hosted-engine-setup-1.1.2-1.el6.noarch
 ovirt-release-11.2.0-1.noarch
 ovirt-hosted-engine-ha-1.1.2-1.el6.noarch

 After setting up the hosted engine (running great) the setup canceled with 
 this MSG:

 [ INFO  ] The VDSM Host is now operational
 [ ERROR ] Waiting for cluster 'Default' to become operational...
 [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no 
 attribute '__dict__'
 [ INFO  ] Stage: Clean up
 [ INFO  ] Stage: Pre-termination
 [ INFO  ] Stage: Termination

 What is the next step i have to do that t he HA features of the 
 hosted-engine will take care of keeping the VM alive.

 best regards
 tobias

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] ovirt image deployment and foreman?

2014-05-17 Thread Andrew Lau
Hi guys,

I was wondering if anyone's had any luck with oVirt and Foreman 1.5
image integration.

Foreman 1.5 seems to boast the new oVirt image deployment feature but
I'm still not getting much luck and no one seems to know why. I've
opened my bug here http://projects.theforeman.org/issues/5581

So far I've only got as far as:
the VM will be created in oVirt, but it will have no disk attached to it,
and its template configurables such as Delete protection are not
kept.

Could anyone shed some light on this?

Cheers,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt Mirrors

2014-05-13 Thread Andrew Lau
Hi,

I was wondering if there were any plans to drum up an oVirt mirror
down in Australia. It takes me a few hours just to grab all the
required packages to spin up a new box.

Alternatively, the mirror does not seem to support rsync. Is there a
recommended way to sync? (w/o ISOs)

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt Mirrors

2014-05-13 Thread Andrew Lau
You'd have a good chance with AARNET http://mirror.aarnet.edu.au/

On Tue, May 13, 2014 at 10:41 PM, Eyal Edri ee...@redhat.com wrote:
 Hi,

 We have some mirrors up and running for a few universities, not sure any of 
 them are in australia though.
 david/brian - any chance of contacting an australia university for another 
 mirror?

 Eyal.

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Tuesday, May 13, 2014 3:13:44 PM
 Subject: [ovirt-users] oVirt Mirrors

 Hi,

 I was wondering if there were any plans on drumming up an oVirt mirror
 down in Australia. It takes me nearly a few hours just to grab all the
 required packages to spin up a new box.

 Alternatively, the mirror does not seem to support rsync. Is there a
 recommended way to sync? (w/o ISOs)

 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we request images for the ovirt image repository?

2014-05-04 Thread Andrew Lau
Awesome - thanks!
Is there a reason why the actual size shows up as significantly
larger than the template itself?

Andrew.

On Sun, May 4, 2014 at 6:05 PM, Oved Ourfalli ov...@redhat.com wrote:
 Hi

 No real form for that.
 E-mail to this mailing list is a good way to request that for now.
 I've uploaded the image. Didn't test/play with it yet, but it is there.

 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Sunday, May 4, 2014 7:11:21 AM
 Subject: [ovirt-users] Can we request images for the ovirt image repository?

 Hi,

 Is there a form where we can request images in the public ovirt image repo?

 Either way, it'd be nice if we could get the project atomic images added
 http://rpm-ostree.cloud.fedoraproject.org/project-atomic/images/f20/qemu/

 Cheers.
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can we request images for the ovirt image repository?

2014-05-04 Thread Andrew Lau
Makes sense, thanks again.

On Sun, May 4, 2014 at 6:18 PM, Oved Ourfalli ov...@redhat.com wrote:
 The image in the link is compressed.
 I decompressed it before uploading.

 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Oved Ourfalli ov...@redhat.com
 Cc: users users@ovirt.org
 Sent: Sunday, May 4, 2014 11:08:37 AM
 Subject: Re: [ovirt-users] Can we request images for the ovirt image  
 repository?

 Awesome - thanks!
 Is there a reason why the actual size shows up as significantly
 larger than the template itself?

 Andrew.

 On Sun, May 4, 2014 at 6:05 PM, Oved Ourfalli ov...@redhat.com wrote:
  Hi
 
  No real form for that.
  E-mail to this mailing list is a good way to request that for now.
  I've uploaded the image. Didn't test/play with it yet, but it is there.
 
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: users users@ovirt.org
  Sent: Sunday, May 4, 2014 7:11:21 AM
  Subject: [ovirt-users] Can we request images for the ovirt image
  repository?
 
  Hi,
 
  Is there a form where we can request images in the public ovirt image
  repo?
 
  Either way, it'd be nice if we could get the project atomic images added
  http://rpm-ostree.cloud.fedoraproject.org/project-atomic/images/f20/qemu/
 
  Cheers.
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Can we request images for the ovirt image repository?

2014-05-03 Thread Andrew Lau
Hi,

Is there a form where we can request images in the public ovirt image repo?

Either way, it'd be nice if we could get the project atomic images added
http://rpm-ostree.cloud.fedoraproject.org/project-atomic/images/f20/qemu/

Cheers.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] iso-uploader with hosted-engine

2014-05-03 Thread Andrew Lau
Hi Garry,

I hope I understand your case here, the terminology is a little different.

Adding the hosted-engine to the gluster network is an interesting
approach I tried. It works to an extent but will have issues with
quorum (depending on your setup).

Anyway try this:

# Generate random UUID and MAC Address - or create your own by hand..
yum -y install python-virtinst
echo  'import virtinst.util ; print
virtinst.util.uuidToString(virtinst.util.randomUUID())' | python
echo  'import virtinst.util ; print virtinst.util.randomMAC()' | python

hosted-engine --set-maintenance --mode=global

# On all installed hosts
nano /etc/ovirt-hosted-engine/vm.conf
# insert under earlier nicModel
# replace macaddress and uuid from above
# increment slot
devices={nicModel:pv,macAddr:00:16:3e:a1:4b:22,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fds24208-a234-z123-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
domain:0x0000, type:pci,function:0x0},device:bridge,type:interface}

hosted-engine --vm-shutdown
hosted-engine --vm-start

hosted-engine --set-maintenance --mode=none

Now your hosted-engine VM should come up with another NIC. This'll
work for modifying your existing hosts, but I'm not sure how to apply
this to the answers file if you were to add a new host to the
configuration.

HTH

On Tue, Apr 29, 2014 at 6:08 PM, Garry Tiedemann
garrytiedem...@networkvideo.com.au wrote:
 Hi guys,

 I built my hosted-engine with three nodes (3.4, CentOS), and built a Gluster
 ISO domain, connected it up, all good.

 Until I come to engine-iso-uploader, it is present on the hosted-engine but
 not on the VMs.
 And my gluster subnet is not routed. So the VM can't see it to upload ISOs
 into.

 How are other people getting around this?

 I can see:
 1. Obvious - make the Gluster LAN routed. It wouldn't hurt, just for
 transferring a few ISOs.
 2. Add a second NIC to the hosted-engine VM, so that it can see the gluster
 LAN
 3. put engine-iso-uploader on one of the nodes, is that possible?

 My preference is to connect hosted-engine to the Gluster LAN.

 Is it possible to add another NIC to the VM, after it's built? (I don't want
 to rebuild if avoidable; I've done that enough times).
 I know that the ovirt GUI isn't aware of the first NIC on the VM, so I
 haven't dared to try adding a NIC there. Is that safe?

 Thanks in advance for your thoughts on this.

 cheers,

 Garry



 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt Hosted Engine and gluster mount - native nfs?

2014-05-02 Thread Andrew Lau
On Sat, May 3, 2014 at 2:43 AM, Jason Brooks jbro...@redhat.com wrote:


 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Thursday, May 1, 2014 10:30:47 PM
 Subject: [ovirt-users] oVirt Hosted Engine and gluster mount - native nfs?

 Hi,

 I'm having many issues with the inbuilt gluster nfs server (crashing
 and disconnecting). I was wondering, if anyone's tried exporting the
 gluster mount through glusterfs and then running a NFS server locally?

 I haven't tried this, but I'm testing gluster nfs w/ ctdb  a virtual IP
 and that seems to be working so far...

 Jason

For some reason my gluster NFS servers were notorious for crashing, and
virtual IPs with keepalived kept interfering with the interfaces in
oVirt, which made it a pain to modify networks within the UI.
I went ahead and tried this anyway, and it appears to be holding up a
lot better than the gluster NFS server. So far so good!
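In case anyone wants to try the same thing, this is roughly what I did
(volume name, paths and export options are just my setup; the fsid is needed
because the kernel NFS server won't export a FUSE mount without one):

mount -t glusterfs localhost:/engine /mnt/engine
echo '/mnt/engine *(rw,sync,no_root_squash,fsid=1)' >> /etc/exports
service nfs restart
exportfs -v

and then the hosted-engine storage is mounted as localhost:/mnt/engine.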



 eg.
 storage servers (gluster) - hypervisor (gluster client - export nfs
 of gluster client)

 Does this pose any issues with sanlock if there are then multiple nfs
 servers? I've been simply mounting the nfs servers as
 localhost:/hosted-engine as it means another host can die without
 effecting the nfs connection. My only concern would be regarding
 sanlock/nfslock as there'd be multiple NFS servers.

 Thoughts?

 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Hosted Engine started VM Multiple Times

2014-04-28 Thread Andrew Lau
Hi,

I added a new node to my hosted cluster today; only one other node was
active in the cluster at that time. It was running the hosted-engine,
but during the install its agent died, and when the second node came
up it started the hosted-engine, causing the hosted-engine to be run on
both nodes.

Has anyone had this happen? This seems very dangerous, as both VMs were
running simultaneously.. data corruption alert!

Andrew.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] cloud-init woes w/ templates

2014-04-17 Thread Andrew Lau
On Thu, Apr 17, 2014 at 7:40 PM, Michal Skrivanek
michal.skriva...@redhat.com wrote:

 On Apr 12, 2014, at 15:39 , Andrew Lau and...@andrewklau.com wrote:

 Hi,

 Has anyone had any success with cloud-init and templates with ovirt
 3.4? So far, it seems to be able to configure things like networks
 etc. But when it goes to do with passwords, it must be set again in
 the run once or in the Initial Run even if Use already configured
 password is set.

 Hi,
 which version you're talking about?
It seems the key won't save unless a password is specified.



 Another thing, why is it only setup to change the root password? By
 default cloud-init will block root, so nearly all images need to be
 modified ie.
 sed -i 's/disable_root: 1/disable_root: 0\nchpasswd: { expire: False
 }/g' /etc/cloud/cloud.cfg

 yeah. this has been fixed recently
That's good news!



 Otherwise, you'll login and it'll ask you to change your password.
 Defeats the purpose of setting it through cloud-init? I'm also not
 being able to just set an SSH key, it insists a password otherwise the
 key won't get uploaded.

 hm, not sure, Shahar?


 Finally, templates seem to be lacking validation. Where Initial Run
 and Run Once will give the red box if you have the wrong syntax
 while templates don't care.

 syntax of what?
e.g. if you put a netmask of 24 instead of the 255.255.255.0 it wants, the
template section won't give that red box warning that it's invalid.
Initial Run/Run Once will..


 Thanks,
 michal


 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] cloud-init woes w/ templates

2014-04-17 Thread Andrew Lau
On Thu, Apr 17, 2014 at 10:05 PM, Michal Skrivanek
michal.skriva...@redhat.com wrote:

 On Apr 17, 2014, at 13:13 , Andrew Lau and...@andrewklau.com wrote:

 On Thu, Apr 17, 2014 at 7:40 PM, Michal Skrivanek
 michal.skriva...@redhat.com wrote:

 On Apr 12, 2014, at 15:39 , Andrew Lau and...@andrewklau.com wrote:

 Hi,

 Has anyone had any success with cloud-init and templates with ovirt
 3.4? So far, it seems to be able to configure things like networks
 etc. But when it goes to do with passwords, it must be set again in
 the run once or in the Initial Run even if Use already configured
 password is set.

 Hi,
 which version you're talking about?
 It seems the key won't save unless a password is specified.

 sorry, I mean which oVirt version you're talking about:)

3.4





 Another thing, why is it only setup to change the root password? By
 default cloud-init will block root, so nearly all images need to be
 modified ie.
 sed -i 's/disable_root: 1/disable_root: 0\nchpasswd: { expire: False
 }/g' /etc/cloud/cloud.cfg

 yeah. this has been fixed recently
 That's good news!



 Otherwise, you'll login and it'll ask you to change your password.
 Defeats the purpose of setting it through cloud-init? I'm also not
 being able to just set an SSH key, it insists a password otherwise the
 key won't get uploaded.

 hm, not sure, Shahar?


 Finally, templates seem to be lacking validation. Where Initial Run
 and Run Once will give the red box if you have the wrong syntax
 while templates don't care.

 syntax of what?
 eg. you put a netmask of 24 instead of the 255.255.255.0 it wants, in
 the template section it won't give that red box warning it's invalid.
 Initial Run/Run Once will..

 ah, ok, good catch, definitely a bug ….(however there was a general 
 validation issue in the Edit VM….Martin? related/same issue?)



 Thanks,
 michal


 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted engine

2014-04-15 Thread Andrew Lau
Technically yes, I believe there's a 10 minute or so delay before
it'll come up on the other host.
Pretty sure iscsi is not available for now, only NFS.

On Wed, Apr 16, 2014 at 4:08 AM, Maurice James mja...@media-node.com wrote:

 Scenario
 I have a hosted engine setup with 2 nodes with shared iscsi storage.

 Question:
 If I yanked the plug on the host that the hosted engine vm is running on,
 will it come back up on the remaining host without any intervention?

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted engine

2014-04-15 Thread Andrew Lau
Not that I know of.. someone else may know. Sorry

On Wed, Apr 16, 2014 at 10:18 AM, Maurice James mja...@media-node.com wrote:
 Is there any way to change that 10 minute delay?

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Maurice James mja...@media-node.com
 Cc: users users@ovirt.org
 Sent: Tuesday, April 15, 2014 7:24:40 PM
 Subject: Re: [ovirt-users] Hosted engine

 Technically yes, I believe there's a 10 minute or so delay before
 it'll come up on the other host.
 Pretty sure iscsi is not available for now, only NFS.

 On Wed, Apr 16, 2014 at 4:08 AM, Maurice James mja...@media-node.com wrote:

 Scenario
 I have a hosted engine setup with 2 nodes with shared iscsi storage.

 Question:
 If I yanked the plug on the host that the hosted engine vm is running on,
 will it come back up on the remaining host without any intervention?

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] hosted engine setup (NFS)

2014-04-14 Thread Andrew Lau
On Mon, Apr 14, 2014 at 10:23 PM, René Koch rk...@linuxland.at wrote:

 On 04/14/2014 12:39 PM, Sandro Bonazzola wrote:

 Il 03/03/2014 12:39, René Koch ha scritto:

 Hi,

 I installed hosted engine and faced an issue with NFS during
 installation.

 First of all, I'm using GlusterFS on my storage and ovirt engine doesn't
 support GlusterFS yet, only NFS.

 But for NFS I can't use mountprotpo=tcp as hosted-engine --setup doesn't
 ask for NFS options.

 So I manually edited the following file:

 /usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/storage/storage.py

 and changed opts.append('vers=3'):

  if domain_type == 'nfs3':
  fstype = 'nfs'
  opts.append('vers=3,mountproto=tcp')

 My question is now: is it possible to ask for NFS options during setup or
 do you think this can lead into problems? NFS via TCP worked fine for me for
 one week until I rebooted the host today (did reboot tests last weeks,
 too which was fine) and can't start hosted engine anymore (see other mail
 thread), but I think the other issue is not NFS mountproto related.



 Well, in hosted-engine setup we don't ask for additional options because
 we don't store them.
 We just ask for nfs3 or nfs4 because we pass that value as protocol
 version to VDSM connectStorageServer verb.
 The above change affects only the temporary mount done for validating the
 domain.


 Thanks a lot for the information.

 Btw, I can mount my GlusterFS 3.4.2 NFS share now without specifying -o
 mountproto=tcp. Is udp now possible or is the protocol determined
 automatically now? I didn't test if hosted-engine-setup is able to mount
 GlusterFS NFS shares without hacks now, too - I only discovered this new
 behavior on my hosts.

Something I noticed is that you have to restart the gluster volume after
it's created to get the NFS server to come back up. It's a little
buggy...
If you check gluster volume status you'll see that before the
restart the NFS server is down. I'm not sure why, though.
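Roughly what I end up doing after creating the volume (the volume name here
is just an example):

gluster volume status engine    # NFS Server on localhost shows as offline
gluster volume stop engine
gluster volume start engine
gluster volume status engine    # NFS server should now be listed as online

It's also worth making sure the host's own nfs/rpcbind services aren't
holding the ports, since gluster's built-in NFS won't start if they are.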


 Slightly off-topic question:
 The storage options are stored in
 /etc/ovirt-hosted-engine/hosted-engine.conf, right? If I want to change the
 ip address of my storage I simply put engine into global maintenance mode,
 change IP in hosted-engine.conf and re-enable hsoted-engine vm again? Or are
 there more steps required?


 Regards,
 René



 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] oVirt 3.4 Templates break Foreman Provisioning/Adding host

2014-04-13 Thread Andrew Lau
Have you had any recent luck?

On Thu, Apr 10, 2014 at 3:40 AM, Jorick Astrego j.astr...@netbulae.eu wrote:
 Sorry, I see you already updated this bug... The name Matt instead of Yama
 Kasi  misled me :-%



 On Wed, 2014-04-09 at 19:33 +0200, Jorick Astrego wrote:

 I think this is the bug you're experiencing:

 http://projects.theforeman.org/issues/4996

 and maybe related to:

 http://projects.theforeman.org/issues/4684

 Kind regards,

 Jorick Astrego
 Netbulae B.V.

 On Tue, 2014-04-08 at 21:05 +0200, Matt . wrote:

 Yeah that is no problem but it seems to hang only on the template part...
 when you have a Default DC it should not happen.


 As I don't have it (and don't want it to avoid confusions) I'm stuck there
 and that needs a workaround :)



 2014-04-08 20:59 GMT+02:00 Darrell Budic darrell.bu...@zenfire.com:

 I didn’t mean to imply they worked perfectly, but you can make it provision
 a new host successfully, and it does seem to do host management fine with
 puppet. Here’s the workaround I’m using at the moment (and need to get off
 my butt and report, as soon as I finish cleaning up openssl versions…):


 0: functioning smart-proxy for dhcp  tftp, ovirt compute resource bound and
 displaying info on currently running vms, compute resources setup with
 custom template (had to uncheck the one pci card option even though there’s
 only one display)
 1: create new host in foreman on ovirt compute resource, save/commit
   : foreman does it’s stuff, actually creates the VM, then hangs waiting for
 ovirt to start the VM, barber pole on foreman screen
   : you can try starting the VM by hand, it will start to kickstart, then
 fail partway through while not finding the kickstart template from foreman
 2: let this barber pole time out (default time 600 sec, might want to
 shorten for testing), then save the host again
   : it will go to host view, and now it will actually access the kickstart
 templates
   : but it deleted the VM! so you have to recreate it by hand
 3: recreate and PXE boot your VM, this time it builds successfully and
 foreman takes over management just fine (including vm power management,
 oddly enough)


 A little headache, but I haven’t had time to followup on it with foreman
 yet.



 On Apr 8, 2014, at 12:52 PM, Matt . yamakasi@gmail.com wrote:

 Thanks for the update!


 The strange thing is that it's not working in any way (new provisioning) on
 1.5. Back those days I installed a new 1.4.2 as I wanted to migrate to
 CentOS anyway because a 1.4.2 stopped working well, even the rbovirt update
 and so on didn't fix it.


 I have ran 1.4.2 very well against 3.3 and if I'm right also for one day to
 3.4, but I didn't need to provision after my tests so did the upgrade which
 went well.


 I need 3.4 as it supports mixed Storage, so I'm bound to that as I don't
 want to do such a major upgrade on a running system for now, so I went and
 well. Only FM doesn't mix with it, also not the nightly's it seems.


 Are there other options ?





 2014-04-08 16:56 GMT+02:00 Darrell Budic darrell.bu...@zenfire.com:

 If this is the same problem I had, this is a known issue in Foreman 1.4.2.
 API update in Ovirt broke the rbovirt integration component:
 http://projects.theforeman.org/issues/4346#change-13781 . I didn't
 investigate mine in as much depth, but your original symptoms look the same
 as what I saw.


 1.5 nightlies mostly work, I'm using them with good success. They appear to
 have some trouble starting a new VM for provisioning, I need to get on
 reproducing and reporting that. I find you can fail the first build attempt
 (no provisioning template until then), then manually start the VM it runs it
 properly from there.


 On Apr 8, 2014, at 7:58 AM, Matt . yamakasi@gmail.com wrote:

 Hi,

 The only thing I see in the engine.log is a bunch of:

 2014-04-08 14:47:51,167 INFO  [org.ovirt.engine.core.bll.LoginUserCommand]
 (ajp--127.0.0.1-8702-3) Running command: LoginUserCommand internal: false.
 2014-04-08 14:47:51,303 INFO  [org.ovirt.engine.core.bll.LogoutUserCommand]
 (ajp--127.0.0.1-8702-3) [621339b] Running command: LogoutUserCommand
 internal: false.
 2014-04-08 14:47:51,321 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (ajp--127.0.0.1-8702-3) [621339b] Correlation ID: 621339b, Call Stack: null,
 Custom Event ID: -1, Message: User admin logged out.
 2014-04-08 14:47:51,352 INFO  [org.ovirt.engine.core.bll.LoginUserCommand]
 (ajp--127.0.0.1-8702-4) Running command: LoginUserCommand internal: false.
 2014-04-08 14:47:51,418 INFO  [org.ovirt.engine.core.bll.LogoutUserCommand]
 (ajp--127.0.0.1-8702-4) [67db2722] Running command: LogoutUserCommand
 internal: false.
 2014-04-08 14:47:51,429 INFO
 [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
 (ajp--127.0.0.1-8702-4) [67db2722] Correlation ID: 67db2722, Call Stack:
 null, Custom Event ID: -1, Message: User admin logged out.



 And in the server.log:

 2014-04-08 

[ovirt-users] cloud-init woes w/ templates

2014-04-12 Thread Andrew Lau
Hi,

Has anyone had any success with cloud-init and templates with ovirt
3.4? So far, it seems to be able to configure things like networks
etc. But when it comes to passwords, they must be set again in
Run Once or in the Initial Run, even if Use already configured
password is set.

Another thing: why is it only set up to change the root password? By
default cloud-init will block root, so nearly all images need to be
modified, e.g.
sed -i 's/disable_root: 1/disable_root: 0\nchpasswd: { expire: False
}/g' /etc/cloud/cloud.cfg
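which should leave the relevant bit of /etc/cloud/cloud.cfg looking roughly
like this (assuming the stock image layout):

disable_root: 0
chpasswd: { expire: False }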

Otherwise, you'll log in and it'll ask you to change your password,
which defeats the purpose of setting it through cloud-init. I'm also not
able to just set an SSH key; it insists on a password, otherwise the
key won't get uploaded.

Finally, templates seem to be lacking validation: Initial Run
and Run Once will give the red box if you have the wrong syntax,
while templates don't care.

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] Importing Image from Glance times out

2014-04-10 Thread Andrew Lau
On Thu, Apr 10, 2014 at 4:53 PM, Oved Ourfalli ov...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Oved Ourfalli ov...@redhat.com
 Cc: users users@ovirt.org
 Sent: Thursday, April 10, 2014 4:52:56 AM
 Subject: Re: [ovirt-users] [Users] Importing Image from Glance times out

 Found the URL - http://glance.ovirt.org:9292/v1/images/imageid

 On Thu, Apr 10, 2014 at 11:42 AM, Andrew Lau and...@andrewklau.com wrote:
  Thanks for the quick fix! Is there a way to patch that manually or
  will we need to wait for the next release?
 

 You can either wait for the next release, or take the 3.4 nightly build from 
 http://resources.ovirt.org/pub/ovirt-3.4-snapshot/rpm/

nice -- keep forgetting about the nightly. What's the difference
between the master nightly and static nightly?




  Is there also any chance that those glance images are available
  through a HTTP method or something, I'd be interested in importing
  that through the export domain to dig around a little.
 

 I see you're already on it :-)


  Thanks,
  Andrew.
 
  On Wed, Apr 9, 2014 at 10:08 PM, Oved Ourfalli ov...@redhat.com wrote:
  Posted a fix in:
  http://gerrit.ovirt.org/#/c/26601/
 
  Being tested and reviewed as we speak.
  Andrew + Elad - thank you for bringing this issue up, and helping diagnose
  it.
 
  Regards,
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Oved Ourfalli ov...@redhat.com
  Cc: users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:52:38 PM
  Subject: Re: [Users] Importing Image from Glance times out
 
  Yeah I imported it as a template, I'll try import it as an image now
  just to verify.
 
  On Wed, Apr 9, 2014 at 8:43 PM, Oved Ourfalli ov...@redhat.com wrote:
   Did you import it as template or just as an image.
   If as template then it can be nice to see if you're getting the same
   NullPointerException that Elad gets (although it was fixed a few weeks
   ago, so perhaps it is another issue).
  
   Thanks,
   Oved
  
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: Elad Ben Aharon ebena...@redhat.com
   Cc: Oved Ourfalli ov...@redhat.com, users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:37:34 PM
   Subject: Re: [Users] Importing Image from Glance times out
  
   Do you still want the log files? Is there anything in specific you're
   looking for, or should I just upload the whole files
  
   I also wonder, could you compare the md5? Out of the two attempts on
   the centos image (not docker) the md5sum gave me
   62bc26a8a07be5adbef63b2eb1a18aeb
  
   If it's different to others, we could assume a failed transfer? I'm
   assuming it's just the timeout of the import process as the smaller
   CirrOS image worked fine.
  
   On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com
   wrote:
Oved, I had the same thing:
https://bugzilla.redhat.com/show_bug.cgi?id=1085712
   
- Original Message -
From: Oved Ourfalli ov...@redhat.com
To: Andrew Lau and...@andrewklau.com
Cc: users users@ovirt.org
Sent: Wednesday, April 9, 2014 1:29:57 PM
Subject: Re: [Users] Importing Image from Glance times out
   
Do you see any failure in the log?
Can you attach both the engine and the vdsm log?
iirc the SPM (Federico?) should be the one importing the image, so
if
you
look for a process with curl (ps -ef | grep -i curl) then you'll
be
able
to see the import process (just to check whether it is running or
not).
   
Thank you,
Oved
   
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: users users@ovirt.org
Sent: Wednesday, April 9, 2014 1:23:59 PM
Subject: [Users] Importing Image from Glance times out
   
Hi,
   
Using the new 3.4 public glance repository, I was able to
successfully
import the tiny 12mb CirrOS image and it appeared in my data store.
   
However when trying the larger CentOS image, it took much longer.
For
some reason I can only push 50Kbps from any of the ovirt
infrastructure so after many hours in the datastore I can see it's
finished downloading the full 1gb image but it'll remain locked in
the
ovirt engine.
   
Any thoughts on why this happens?
   
Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
   
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 ___
 Users mailing list

Re: [ovirt-users] [Users] Importing Image from Glance times out

2014-04-10 Thread Andrew Lau
On Thu, Apr 10, 2014 at 5:07 PM, Andrew Lau and...@andrewklau.com wrote:
 On Thu, Apr 10, 2014 at 4:53 PM, Oved Ourfalli ov...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Oved Ourfalli ov...@redhat.com
 Cc: users users@ovirt.org
 Sent: Thursday, April 10, 2014 4:52:56 AM
 Subject: Re: [ovirt-users] [Users] Importing Image from Glance times out

 Found the URL - http://glance.ovirt.org:9292/v1/images/imageid

 On Thu, Apr 10, 2014 at 11:42 AM, Andrew Lau and...@andrewklau.com wrote:
  Thanks for the quick fix! Is there a way to patch that manually or
  will we need to wait for the next release?
 

 You can either wait for the next release, or take the 3.4 nightly build from 
 http://resources.ovirt.org/pub/ovirt-3.4-snapshot/rpm/

 nice -- keep forgetting about the nightly. What's the difference
 between the master nightly and static nightly?

Ignore that.. I should just stop asking questions :)





  Is there also any chance that those glance images are available
  through a HTTP method or something, I'd be interested in importing
  that through the export domain to dig around a little.
 

 I see you're already on it :-)


  Thanks,
  Andrew.
 
  On Wed, Apr 9, 2014 at 10:08 PM, Oved Ourfalli ov...@redhat.com wrote:
  Posted a fix in:
  http://gerrit.ovirt.org/#/c/26601/
 
  Being tested and reviewed as we speak.
  Andrew + Elad - thank you for bringing this issue up, and helping 
  diagnose
  it.
 
  Regards,
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Oved Ourfalli ov...@redhat.com
  Cc: users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:52:38 PM
  Subject: Re: [Users] Importing Image from Glance times out
 
  Yeah I imported it as a template, I'll try import it as an image now
  just to verify.
 
  On Wed, Apr 9, 2014 at 8:43 PM, Oved Ourfalli ov...@redhat.com wrote:
   Did you import it as template or just as an image.
   If as template then it can be nice to see if you're getting the same
   NullPointerException that Elad gets (although it was fixed a few weeks
   ago, so perhaps it is another issue).
  
   Thanks,
   Oved
  
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: Elad Ben Aharon ebena...@redhat.com
   Cc: Oved Ourfalli ov...@redhat.com, users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:37:34 PM
   Subject: Re: [Users] Importing Image from Glance times out
  
   Do you still want the log files? Is there anything in specific you're
   looking for, or should I just upload the whole files
  
   I also wonder, could you compare the md5? Out of the two attempts on
   the centos image (not docker) the md5sum gave me
   62bc26a8a07be5adbef63b2eb1a18aeb
  
   If it's different to others, we could assume a failed transfer? I'm
   assuming it's just the timeout of the import process as the smaller
   CirrOS image worked fine.
  
   On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com
   wrote:
Oved, I had the same thing:
https://bugzilla.redhat.com/show_bug.cgi?id=1085712
   
- Original Message -
From: Oved Ourfalli ov...@redhat.com
To: Andrew Lau and...@andrewklau.com
Cc: users users@ovirt.org
Sent: Wednesday, April 9, 2014 1:29:57 PM
Subject: Re: [Users] Importing Image from Glance times out
   
Do you see any failure in the log?
Can you attach both the engine and the vdsm log?
iirc the SPM (Federico?) should be the one importing the image, so
if
you
look for a process with curl (ps -ef | grep -i curl) then you'll
be
able
to see the import process (just to check whether it is running or
not).
   
Thank you,
Oved
   
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: users users@ovirt.org
Sent: Wednesday, April 9, 2014 1:23:59 PM
Subject: [Users] Importing Image from Glance times out
   
Hi,
   
Using the new 3.4 public glance repository, I was able to
successfully
import the tiny 12mb CirrOS image and it appeared in my data 
store.
   
However when trying the larger CentOS image, it took much longer.
For
some reason I can only push 50Kbps from any of the ovirt
infrastructure so after many hours in the datastore I can see it's
finished downloading the full 1gb image but it'll remain locked in
the
ovirt engine.
   
Any thoughts on why this happens?
   
Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
   
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
  ___
  Users mailing

[ovirt-users] Etherpad for ovirt issues?

2014-04-10 Thread Andrew Lau
Hi,

Could I propose an option for an etherpad of basic issues people have
run into with ovirt, split into a few categories like:

- engine
- hosted-engine
- gluster

The pad would be used for known workarounds (primarily for setup) and
identifying issues before opening a BZ

The main motive is that over the past few days I've redeployed my
hosted-engine install about 10 times.. I've run into about 5-6 issues,
like bridge creation when the setup is on vlans, or the use of prefix in
ifcfg, etc.

- I've reported some of the main ones where I sort of know what the
problem is, but quite a few I've just worked around by doing things
in a different order.
- Quite a few times I've also opened a few BZs, and by the time
someone's asked for a specific log, I've already reinstalled the host.
(I just can't stand a dirty host :P )
- Sometimes I'm not sure if they should be considered a bug, so I just move on.
- Sometimes people (I mean I) get lazy.
- There are sometimes things which don't work and are reported on the
website, but they only get one line and often get overlooked.

Thoughts?

Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] using cloud-init?

2014-04-10 Thread Andrew Lau
On Wed, Mar 12, 2014 at 5:07 PM, Jason Brooks jbro...@redhat.com wrote:
 On Wed, 2014-03-12 at 01:52 -0400, Oved Ourfalli wrote:
 Did you set the OS type of the VM / template to some linux based OS
 type?

 That was it. Thanks!

I'm curious, did you try the run-once option or through a template? I
can't get the password set properly if it's done through the template
option.


 Jason

 The cloud-init data is passed only to linux VMs.
 A new patch changed that, and passed it to all non-windows VMs, so if
 you left the defaults, and the OS type is Other OS, then it doesn't
 work without the patch.

 See bug https://bugzilla.redhat.com/show_bug.cgi?id=1072764

 Oved

 - Original Message -
  From: Jason Brooks jbro...@redhat.com
  To: users@ovirt.org
  Sent: Tuesday, March 11, 2014 6:21:37 PM
  Subject: [Users] using cloud-init?
 
  Hi all --
 
  I've been trying, without success, to use cloud-init w/ oVirt 3.4 on
 Fedora
  19 hosts. I've had similar failure in the past, but here are the
 steps I'm
  taking currently:
 
  1. Import as template F19 image from ovirt-image-repository glance
 repo
  2. Create new vm based on that template
  3. Choose ovirtmgmt as the nic1 for the VM
  4. Show advanced options, click initial run, expand authentication,
 enter
  a root password, paste my public key into the allowed ssh keys field
  5. Hit OK, and then run the VM
  6. In the VM's console, I see it complain about No instance
 datasource found
  7. Unsurprisingly, I can't log in w/ pw or ssh.
 
  (By the way, are there any default creds for these images? I thought
 they
  might
  be based on the fedora cloud images, but their default uname fedora
 pw
  nothing
  doesn't work)
 
  I've tried some other derivations of this, launching from the Run
 Once menu,
  filling in various different fields, etc.
 
  Any clues?
 
  I don't see many people complaining about this, so I'm assuming it's
 working
  for other people. I don't know, maybe it's something with Fedora?
 
  Thanks, Jason
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Hosted Engine recovery failure of all HA - nodes

2014-04-09 Thread Andrew Lau
Hi,

On Apr 9, 2014 5:43 PM, Martin Sivak msi...@redhat.com wrote:

 Hi,

  I noticed this happens too, I think the issue is after N attempts the
  ovirt-ha-agent process will kill itself if it believes it can't access
  the storage or it fails in some other way.

 If the agent can't access storage or VDSM it waits for 60 seconds and
tries again. After three (iirc) failed attempts it shuts down.

Is there any reason it shuts down? Couldn't it just
sleep for x minutes, and have that sleep time scale exponentially after each
failure?
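Something along these lines, purely as pseudo-shell to illustrate the retry
idea (check_storage_and_vdsm is a made-up placeholder, not the agent's real
code):

delay=60
while ! check_storage_and_vdsm; do
    sleep $delay
    delay=$(( delay * 2 ))              # back off: 60s, 120s, 240s, ...
    [ $delay -gt 3600 ] && delay=3600   # cap the back-off instead of exiting
done

rather than giving up after three failed attempts.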

  The ovirt-ha-broker service
  however still remains and continues to calculate the score.

 The broker acts only as a data link, the score is computed by the agent.
The broker is used to propagate it to storage (and to collect data).

Thanks for clarifying, I remember seeing some reference to score in the
broker log. Assumed incorrectly.

  It'll be
  nice I guess if it could pro-actively restart the ha-agent every now
  and then.

 We actually have a bug that is related to this:
https://bugzilla.redhat.com/show_bug.cgi?id=1030441

 Greg, are you still working on it?

   What is the supposed procedure after a shutdown (graceful /
ungraceful)
   of Hosted-Engine HA nodes? Should the engine recover by itself? Should
   the running VM's be restarted automatically?

 If the agent-broker pair recovers and sanlock is not preventing taking
the lock (which was not released properly) then the engine VM should be
started automatically.

  If all the nodes come up at the same time, in my testing, it took 10
  minutes for the ha-agents to settle and then finally decide which host
  to bring up the engine.

 We set a 10 minute mandatory down time for a host when a VM start is not
successful. That might be because sanlock still thinks somebody is
running the VM. The /var/log/ovirt-hosted-engine-ha/agent.log would help
here.

 Regards
 --
 Martin Sivák
 msi...@redhat.com
 Red Hat Czech
 RHEV-M SLA / Brno, CZ

 - Original Message -
  On Wed, Apr 9, 2014 at 2:09 AM, Daniel Helgenberger
  daniel.helgenber...@m-box.de wrote:
   Hello,
  
   I have an oVirt 3.4 hosted engine lab setup witch I am evaluating for
   production use.
  
   I simulated an ungraceful shutdown of all HA nodes (powercut) while
   the engine was running. After powering up, the system did not recover
   itself (it seemed).
   I had to restart the ovirt-hosted-ha service (witch was in a locked
   state) and then manually run 'hosted-engine --vm-start'.
 
  I noticed this happens too, I think the issue is after N attempts the
  ovirt-ha-agent process will kill itself if it believes it can't access
  the storage or it fails in some other way. The ovirt-ha-broker service
  however still remains and continues to calculate the score. It'll be
  nice I guess if it could pro-actively restart the ha-agent every now
  and then.
 
  
   What is the supposed procedure after a shutdown (graceful /
ungraceful)
   of Hosted-Engine HA nodes? Should the engine recover by itself? Should
   the running VM's be restarted automatically?
 
  I don't think any other VMs get restarted automatically, this is
  because the engine is used to ensure that the VM hasn't been restarted
  on another host. This is where power management etc comes into play.
 
  If all the nodes come up at the same time, in my testing, it took 10
  minutes for the ha-agents to settle and then finally decide which host
  to bring up the engine. Then technically... (untested) any VMs which
  you've marked as HA should be automatically brought back up by the
  engine. This would be 15-20 minutes to recover which feels a little
  slow.. although fairly automatic.
 
  
   Thanks,
   Daniel
  
  
  
  
  
  
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Setting up networks in oVirt

2014-04-09 Thread Andrew Lau
On Wed, Apr 9, 2014 at 6:35 PM, Andy Michielsen
andy.michiel...@gmail.com wrote:
 Hello,

 Can anyone give me some screenshot of what a network setup can look like if
 I wanted to isolate storage and vm traffic from the management traffic.


Something like attached?


 Kind regards.

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

attachment: Screenshot from 2014-04-09 18:37:39.png
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Setting up networks in oVirt

2014-04-09 Thread Andrew Lau
On Wed, Apr 9, 2014 at 6:52 PM, Andy Michielsen
andy.michiel...@gmail.com wrote:
 Hello Andrew,

 Yes that would be a start.

 What are the ranges of the subnets used.
 I'm asking because in my setup I connect to the ovirt engine from my subnet
 192.168.203.x and I also use that subnet to access my virtual machines.
 I suppose I can use completely different subnets for the ovirtmgmt and
 storage network ?


You can make up whatever subnetting your network has available. You
need to use VLANs to separate the network like in that screenshot..
otherwise use separate interfaces with different subnets to isolate
your traffic.

 Kind regards.


 2014-04-09 10:38 GMT+02:00 Andrew Lau and...@andrewklau.com:

 On Wed, Apr 9, 2014 at 6:35 PM, Andy Michielsen
 andy.michiel...@gmail.com wrote:
  Hello,
 
  Can anyone give me some screenshot of what a network setup can look like
  if
  I wanted to isolate storage and vm traffic from the management traffic.
 

 Something like attached?


  Kind regards.
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Setting up networks in oVirt

2014-04-09 Thread Andrew Lau
On Wed, Apr 9, 2014 at 7:05 PM, Andy Michielsen
andy.michiel...@gmail.com wrote:
 Hello,

 Okay. We do not use vlan's at the moment.

 I have my engine setup on a machine with 6 physical nic's available which
 also functions as my NFS server.
 My node only has 2 physical nic's.

 What setup would you propose in my case.


With only 2 NICs and no VLANs it's really quite limited.

It'll probably be easiest to have eth0 share management and
storage, and eth1 for VM data. Using VLANs would be more ideal though..

oVirt won't let you put multiple non-VLAN networks on the same NIC.
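If you do get VLANs later, you just tag the logical networks in the engine
and drag them onto the NIC; under the hood that ends up as the usual ifcfg
VLAN config on the host, something like (interface name, VLAN ID and
addressing are made up):

# /etc/sysconfig/network-scripts/ifcfg-eth0.100  (storage VLAN)
DEVICE=eth0.100
VLAN=yes
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.100.11
NETMASK=255.255.255.0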

 Kind regards.


 2014-04-09 10:55 GMT+02:00 Andrew Lau and...@andrewklau.com:

 On Wed, Apr 9, 2014 at 6:52 PM, Andy Michielsen
 andy.michiel...@gmail.com wrote:
  Hello Andrew,
 
  Yes that would be a start.
 
  What are the ranges of the subnets used.
  I'm asking because in my setup I connect to the ovirt engine from my
  subnet
  192.168.203.x and I also use that subnet to access my virtual machines.
  I suppose I can use completely different subnets for the ovirtmgmt and
  storage network ?
 

 You can make up whatever subnetting your network has available. You
 need to use VLANs to separate the network like in that screenshot..
 otherwise use separate interfaces with different subnets to isolate
 your traffic.

  Kind regards.
 
 
  2014-04-09 10:38 GMT+02:00 Andrew Lau and...@andrewklau.com:
 
  On Wed, Apr 9, 2014 at 6:35 PM, Andy Michielsen
  andy.michiel...@gmail.com wrote:
   Hello,
  
   Can anyone give me some screenshot of what a network setup can look
   like
   if
   I wanted to isolate storage and vm traffic from the management
   traffic.
  
 
  Something like attached?
 
 
   Kind regards.
  
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
 
 


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[Users] Importing Image from Glance times out

2014-04-09 Thread Andrew Lau
Hi,

Using the new 3.4 public glance repository, I was able to successfully
import the tiny 12mb CirrOS image and it appeared in my data store.

However, when trying the larger CentOS image, it took much longer. For
some reason I can only push 50Kbps from any of the oVirt
infrastructure, so it took many hours; in the datastore I can see it has
finished downloading the full 1GB image, but it remains locked in the
oVirt engine.

Any thoughts on why this happens?

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Importing Image from Glance times out

2014-04-09 Thread Andrew Lau
Do you still want the log files? Is there anything specific you're
looking for, or should I just upload the whole files?

I also wonder, could you compare the md5? Out of the two attempts on
the centos image (not docker) the md5sum gave me
62bc26a8a07be5adbef63b2eb1a18aeb

If it's different to others, we could assume a failed transfer? I'm
assuming it's just the timeout of the import process as the smaller
CirrOS image worked fine.

On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com wrote:
 Oved, I had the same thing:
 https://bugzilla.redhat.com/show_bug.cgi?id=1085712

 - Original Message -
 From: Oved Ourfalli ov...@redhat.com
 To: Andrew Lau and...@andrewklau.com
 Cc: users users@ovirt.org
 Sent: Wednesday, April 9, 2014 1:29:57 PM
 Subject: Re: [Users] Importing Image from Glance times out

 Do you see any failure in the log?
 Can you attach both the engine and the vdsm log?
 iirc the SPM (Federico?) should be the one importing the image, so if you 
 look for a process with curl (ps -ef | grep -i curl) then you'll be able to 
 see the import process (just to check whether it is running or not).

 Thank you,
 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Wednesday, April 9, 2014 1:23:59 PM
 Subject: [Users] Importing Image from Glance times out

 Hi,

 Using the new 3.4 public glance repository, I was able to successfully
 import the tiny 12mb CirrOS image and it appeared in my data store.

 However when trying the larger CentOS image, it took much longer. For
 some reason I can only push 50Kbps from any of the ovirt
 infrastructure so after many hours in the datastore I can see it's
 finished downloading the full 1gb image but it'll remain locked in the
 ovirt engine.

 Any thoughts on why this happens?

 Thanks,
 Andrew
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Importing Image from Glance times out

2014-04-09 Thread Andrew Lau
Yeah I imported it as a template, I'll try import it as an image now
just to verify.

On Wed, Apr 9, 2014 at 8:43 PM, Oved Ourfalli ov...@redhat.com wrote:
 Did you import it as template or just as an image.
 If as template then it can be nice to see if you're getting the same 
 NullPointerException that Elad gets (although it was fixed a few weeks ago, 
 so perhaps it is another issue).

 Thanks,
 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Elad Ben Aharon ebena...@redhat.com
 Cc: Oved Ourfalli ov...@redhat.com, users users@ovirt.org
 Sent: Wednesday, April 9, 2014 1:37:34 PM
 Subject: Re: [Users] Importing Image from Glance times out

 Do you still want the log files? Is there anything in specific you're
 looking for, or should I just upload the whole files

 I also wonder, could you compare the md5? Out of the two attempts on
 the centos image (not docker) the md5sum gave me
 62bc26a8a07be5adbef63b2eb1a18aeb

 If it's different to others, we could assume a failed transfer? I'm
 assuming it's just the timeout of the import process as the smaller
 CirrOS image worked fine.

 On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com wrote:
  Oved, I had the same thing:
  https://bugzilla.redhat.com/show_bug.cgi?id=1085712
 
  - Original Message -
  From: Oved Ourfalli ov...@redhat.com
  To: Andrew Lau and...@andrewklau.com
  Cc: users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:29:57 PM
  Subject: Re: [Users] Importing Image from Glance times out
 
  Do you see any failure in the log?
  Can you attach both the engine and the vdsm log?
  iirc the SPM (Federico?) should be the one importing the image, so if you
  look for a process with curl (ps -ef | grep -i curl) then you'll be able
  to see the import process (just to check whether it is running or not).
 
  Thank you,
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:23:59 PM
  Subject: [Users] Importing Image from Glance times out
 
  Hi,
 
  Using the new 3.4 public glance repository, I was able to successfully
  import the tiny 12mb CirrOS image and it appeared in my data store.
 
  However when trying the larger CentOS image, it took much longer. For
  some reason I can only push 50Kbps from any of the ovirt
  infrastructure so after many hours in the datastore I can see it's
  finished downloading the full 1gb image but it'll remain locked in the
  ovirt engine.
 
  Any thoughts on why this happens?
 
  Thanks,
  Andrew
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Setting up networks in oVirt

2014-04-09 Thread Andrew Lau
Install ovirt engine, then go to hosts - edit networks

it's a drag and drop interface, very easy..

No IP address is needed for the vm data nic; it will get created as a bridge.
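
For reference, once a VLAN-tagged management network is pushed to a host, vdsm
ends up writing fairly ordinary initscripts files. A rough illustration (the
device name, VLAN tag 10 and addresses are only examples, not taken from this
thread):

# /etc/sysconfig/network-scripts/ifcfg-eth0.10
DEVICE=eth0.10
VLAN=yes
ONBOOT=yes
BRIDGE=ovirtmgmt

# /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
DEVICE=ovirtmgmt
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.203.10
NETMASK=255.255.255.0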

On Wed, Apr 9, 2014 at 9:44 PM, Andy Michielsen
andy.michiel...@gmail.com wrote:
 Hello Andrew,

 Let's say I'm going to implement it like you suggest.

 The first nic with 2 vlan's for management and storage. The second nic for
 the vm's.

 What would I go about defining this in ovirt ?

 Can you give me an example for the vlan for the management and the storage ?
 Do I need to assing a ip-adress to the nic for the vm's or is this
 transparent ?

 Kind regards.


 2014-04-09 12:16 GMT+02:00 Andrew Lau and...@andrewklau.com:

 On Wed, Apr 9, 2014 at 7:05 PM, Andy Michielsen
 andy.michiel...@gmail.com wrote:
  Hello,
 
  Okay. We do not use vlan's at the moment.
 
  I have my engine setup on a machine with 6 physical nic's available
  which
  also functions as my NFS server.
  My node only has 2 physical nic's.
 
  What setup would you propose in my case.
 

 With only 2 nics, and no vlans it's really quite limited.

 It'll probably be easier just to have like eth0 share management and
 storage and eth1 for vm data. Using vlans would be more ideal though..

 ovirt won't let you put multiple non-vlaned networks on the same NIC.

  Kind regards.
 
 
  2014-04-09 10:55 GMT+02:00 Andrew Lau and...@andrewklau.com:
 
  On Wed, Apr 9, 2014 at 6:52 PM, Andy Michielsen
  andy.michiel...@gmail.com wrote:
   Hello Andrew,
  
   Yes that would be a start.
  
   What are the ranges of the subnets used.
   I'm asking because in my setup I connect to the ovirt engine from my
   subnet
   192.168.203.x and I also use that subnet to access my virtual
   machines.
   I suppose I can use completely different subnets for the ovirtmgmt
   and
   storage network ?
  
 
  You can make up whatever subnetting your network has available. You
  need to use VLANs to separate the network like in that screenshot..
  otherwise use separate interfaces with different subnets to isolate
  your traffic.
 
   Kind regards.
  
  
   2014-04-09 10:38 GMT+02:00 Andrew Lau and...@andrewklau.com:
  
   On Wed, Apr 9, 2014 at 6:35 PM, Andy Michielsen
   andy.michiel...@gmail.com wrote:
Hello,
   
Can anyone give me some screenshot of what a network setup can
look
like
if
I wanted to isolate storage and vm traffic from the management
traffic.
   
  
   Something like attached?
  
  
Kind regards.
   
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
   
  
  
 
 


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] Post-Install Engine VM Changes Feasible?

2014-04-09 Thread Andrew Lau
On Tue, Apr 8, 2014 at 8:52 PM, Andrew Lau and...@andrewklau.com wrote:
 On Mon, Mar 17, 2014 at 8:01 PM, Sandro Bonazzola sbona...@redhat.com wrote:
 Il 15/03/2014 12:44, Giuseppe Ragusa ha scritto:
 Hi Joshua,

 --
 Date: Sat, 15 Mar 2014 02:32:59 -0400
 From: j...@wrale.com
 To: users@ovirt.org
 Subject: [Users] Post-Install Engine VM Changes Feasible?

 Hi,

 I'm in the process of installing 3.4 RC(2?) on Fedora 19.  I'm using hosted 
 engine, introspective GlusterFS+keepalived+NFS ala [1], across six nodes.

 I have a layered networking topology ((V)LANs for public, internal, 
 storage, compute and ipmi).  I am comfortable doing the bridging for each
 interface myself via /etc/sysconfig/network-scripts/ifcfg-*.

 Here's my desired topology: 
 http://www.asciiflow.com/#Draw6325992559863447154

 Here's my keepalived setup: 
 https://gist.github.com/josh-at-knoesis/98618a16418101225726

 I'm writing a lot of documentation of the many steps I'm taking.  I hope to 
 eventually release a distributed introspective all-in-one (including
 distributed storage) guide.

 Looking at vm.conf.in, it looks like I'd by default end
 up with one interface on my engine, probably on my internal VLAN, as
 that's where I'd like the control traffic to flow.  I definitely could do 
 NAT, but I'd be most happy to see the engine have a presence on all of the
 LANs, if for no other reason than because I want to send backups directly 
 over the storage VLAN.

 I'll cut to it:  I believe I could successfully alter the vdsm template 
 (vm.conf.in) to give me the extra interfaces I require.
 It hit me, however, that I could just take the defaults for the initial 
 install.  Later, I think I'll be able to come back with virsh and make my
 changes to the gracefully disabled VM.  Is this true?

 [1] http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

 Thanks,
 Joshua


 I started from the same reference[1] and ended up statically modifying 
 vm.conf.in before launching setup, like this:

 cp -a /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in 
 /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in.orig
 cat << EOM > /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in
 vmId=@VM_UUID@
 memSize=@MEM_SIZE@
 display=@CONSOLE_TYPE@
 devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
 type:drive},specParams:{},readonly:true,deviceId:@CDROM_UUID@,path:@CDROM@,device:cdrom,shared:false,type:disk@BOOT_CDROM@}
 devices={index:0,iface:virtio,format:raw,poolID:@SP_UUID@,volumeID:@VOL_UUID@,imageID:@IMG_UUID@,specParams:{},readonly:false,domainID:@SD_UUID@,optional:false,deviceId:@IMG_UUID@,address:{bus:0x00,
 slot:0x06, domain:0x, type:pci, 
 function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk@BOOT_DISK@}
 devices={device:scsi,model:virtio-scsi,type:controller}
 devices={index:4,nicModel:pv,macAddr:@MAC_ADDR@,linkActive:true,network:@BRIDGE@,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:@NIC_UUID@,address:{bus:0x00,
 slot:0x03, domain:0x, type:pci, 
 function:0x0},device:bridge,type:interface@BOOT_PXE@}
 devices={index:8,nicModel:pv,macAddr:02:16:3e:4f:c4:b0,linkActive:true,network:lan,filter:vdsm-no-mac-spoofing,specParams:{},address:{bus:0x00,
 slot:0x09, domain:0x, type:pci, 
 function:0x0},device:bridge,type:interface@BOOT_PXE@}
 devices={device:console,specParams:{},type:console,deviceId:@CONSOLE_UUID@,alias:console0}
 vmName=@NAME@
 spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
 smp=@VCPUS@
 cpuType=@CPU_TYPE@
 emulatedMachine=@EMULATED_MACHINE@
 EOM


 Note that you should also be able to edit /etc/ovirt-hosted-engine/vm.conf 
 after setup:
 - put the system in global maintenance
 - edit the vm.conf file on all the hosts running the hosted engine
 - shutdown the vm: hosted-engine --vm-shutdown
 - start again the vm: hosted-engine --vm-start
 - exit global maintenance

 Giuseppe, Joshua: can you share your changes in a guide for Hosted engine 
 users on ovirt.org wiki?



 So would you simply just add a new line under the original devices line? ie.
 devices={nicModel:pv,macAddr:00:16:3e:6d:34:78,linkActive:true,network:ovirtmgmt,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:0c8a1710-casd-407a-94e8-5b09e55fa141,address:{bus:0x00,
 slot:0x03, domain:0x, type:pci,
 function:0x0},device:bridge,type:interface}

 Are there any good practices for getting the mac addr so it won't be
 possible to clash with ones vdsm would generate? I assume the same
 applies for deviceid?
 Did you also change the slot?


This worked successfully:

yum -y install python-virtinst

# generate uuid and mac address
echo  'import virtinst.util ; print
virtinst.util.uuidToString(virtinst.util.randomUUID())' | python
echo  'import virtinst.util
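
(The archived message is cut off mid-command here; the second one-liner was
presumably the matching MAC generator. Assuming the stock python-virtinst
helpers of the time, the pair would look roughly like this:)

# generate a uuid (e.g. for deviceId)
echo 'import virtinst.util ; print virtinst.util.uuidToString(virtinst.util.randomUUID())' | python
# generate a mac address for the extra nic (the type argument is a guess;
# a qemu-prefixed MAC should also avoid clashing with the pool vdsm hands out)
echo 'import virtinst.util ; print virtinst.util.randomMAC(type="qemu")' | python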

Re: [ovirt-users] [Users] Importing Image from Glance times out

2014-04-09 Thread Andrew Lau
Thanks for the quick fix! Is there a way to patch that manually or
will we need to wait for the next release?

Is there also any chance that those glance images are available
through an HTTP method or something? I'd be interested in importing
them through the export domain to dig around a little.

Thanks,
Andrew.

On Wed, Apr 9, 2014 at 10:08 PM, Oved Ourfalli ov...@redhat.com wrote:
 Posted a fix in:
 http://gerrit.ovirt.org/#/c/26601/

 Being tested and reviewed as we speak.
 Andrew + Elad - thank you for bringing this issue up, and helping diagnose it.

 Regards,
 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Oved Ourfalli ov...@redhat.com
 Cc: users users@ovirt.org
 Sent: Wednesday, April 9, 2014 1:52:38 PM
 Subject: Re: [Users] Importing Image from Glance times out

 Yeah I imported it as a template, I'll try import it as an image now
 just to verify.

 On Wed, Apr 9, 2014 at 8:43 PM, Oved Ourfalli ov...@redhat.com wrote:
  Did you import it as template or just as an image.
  If as template then it can be nice to see if you're getting the same
  NullPointerException that Elad gets (although it was fixed a few weeks
  ago, so perhaps it is another issue).
 
  Thanks,
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Elad Ben Aharon ebena...@redhat.com
  Cc: Oved Ourfalli ov...@redhat.com, users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:37:34 PM
  Subject: Re: [Users] Importing Image from Glance times out
 
  Do you still want the log files? Is there anything in specific you're
  looking for, or should I just upload the whole files
 
  I also wonder, could you compare the md5? Out of the two attempts on
  the centos image (not docker) the md5sum gave me
  62bc26a8a07be5adbef63b2eb1a18aeb
 
  If it's different to others, we could assume a failed transfer? I'm
  assuming it's just the timeout of the import process as the smaller
  CirrOS image worked fine.
 
  On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com
  wrote:
   Oved, I had the same thing:
   https://bugzilla.redhat.com/show_bug.cgi?id=1085712
  
   - Original Message -
   From: Oved Ourfalli ov...@redhat.com
   To: Andrew Lau and...@andrewklau.com
   Cc: users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:29:57 PM
   Subject: Re: [Users] Importing Image from Glance times out
  
   Do you see any failure in the log?
   Can you attach both the engine and the vdsm log?
   iirc the SPM (Federico?) should be the one importing the image, so if
   you
   look for a process with curl (ps -ef | grep -i curl) then you'll be
   able
   to see the import process (just to check whether it is running or not).
  
   Thank you,
   Oved
  
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:23:59 PM
   Subject: [Users] Importing Image from Glance times out
  
   Hi,
  
   Using the new 3.4 public glance repository, I was able to successfully
   import the tiny 12mb CirrOS image and it appeared in my data store.
  
   However when trying the larger CentOS image, it took much longer. For
   some reason I can only push 50Kbps from any of the ovirt
   infrastructure so after many hours in the datastore I can see it's
   finished downloading the full 1gb image but it'll remain locked in the
   ovirt engine.
  
   Any thoughts on why this happens?
  
   Thanks,
   Andrew
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Users] Importing Image from Glance times out

2014-04-09 Thread Andrew Lau
Found the URL - http://glance.ovirt.org:9292/v1/images/imageid

On Thu, Apr 10, 2014 at 11:42 AM, Andrew Lau and...@andrewklau.com wrote:
 Thanks for the quick fix! Is there a way to patch that manually or
 will we need to wait for the next release?

 Is there also any chance that those glance images are available
 through a HTTP method or something, I'd be interested in importing
 that through the export domain to dig around a little.

 Thanks,
 Andrew.

 On Wed, Apr 9, 2014 at 10:08 PM, Oved Ourfalli ov...@redhat.com wrote:
 Posted a fix in:
 http://gerrit.ovirt.org/#/c/26601/

 Being tested and reviewed as we speak.
 Andrew + Elad - thank you for bringing this issue up, and helping diagnose 
 it.

 Regards,
 Oved

 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Oved Ourfalli ov...@redhat.com
 Cc: users users@ovirt.org
 Sent: Wednesday, April 9, 2014 1:52:38 PM
 Subject: Re: [Users] Importing Image from Glance times out

 Yeah I imported it as a template, I'll try import it as an image now
 just to verify.

 On Wed, Apr 9, 2014 at 8:43 PM, Oved Ourfalli ov...@redhat.com wrote:
  Did you import it as template or just as an image.
  If as template then it can be nice to see if you're getting the same
  NullPointerException that Elad gets (although it was fixed a few weeks
  ago, so perhaps it is another issue).
 
  Thanks,
  Oved
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Elad Ben Aharon ebena...@redhat.com
  Cc: Oved Ourfalli ov...@redhat.com, users users@ovirt.org
  Sent: Wednesday, April 9, 2014 1:37:34 PM
  Subject: Re: [Users] Importing Image from Glance times out
 
  Do you still want the log files? Is there anything in specific you're
  looking for, or should I just upload the whole files
 
  I also wonder, could you compare the md5? Out of the two attempts on
  the centos image (not docker) the md5sum gave me
  62bc26a8a07be5adbef63b2eb1a18aeb
 
  If it's different to others, we could assume a failed transfer? I'm
  assuming it's just the timeout of the import process as the smaller
  CirrOS image worked fine.
 
  On Wed, Apr 9, 2014 at 8:32 PM, Elad Ben Aharon ebena...@redhat.com
  wrote:
   Oved, I had the same thing:
   https://bugzilla.redhat.com/show_bug.cgi?id=1085712
  
   - Original Message -
   From: Oved Ourfalli ov...@redhat.com
   To: Andrew Lau and...@andrewklau.com
   Cc: users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:29:57 PM
   Subject: Re: [Users] Importing Image from Glance times out
  
   Do you see any failure in the log?
   Can you attach both the engine and the vdsm log?
   iirc the SPM (Federico?) should be the one importing the image, so if
   you
   look for a process with curl (ps -ef | grep -i curl) then you'll be
   able
   to see the import process (just to check whether it is running or not).
  
   Thank you,
   Oved
  
   - Original Message -
   From: Andrew Lau and...@andrewklau.com
   To: users users@ovirt.org
   Sent: Wednesday, April 9, 2014 1:23:59 PM
   Subject: [Users] Importing Image from Glance times out
  
   Hi,
  
   Using the new 3.4 public glance repository, I was able to successfully
   import the tiny 12mb CirrOS image and it appeared in my data store.
  
   However when trying the larger CentOS image, it took much longer. For
   some reason I can only push 50Kbps from any of the ovirt
   infrastructure so after many hours in the datastore I can see it's
   finished downloading the full 1gb image but it'll remain locked in the
   ovirt engine.
  
   Any thoughts on why this happens?
  
   Thanks,
   Andrew
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  
   ___
   Users mailing list
   Users@ovirt.org
   http://lists.ovirt.org/mailman/listinfo/users
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] how to import a disk image w/o associated vm

2014-04-09 Thread Andrew Lau
On Wed, Apr 9, 2014 at 11:39 PM, Jeremiah Jahn
jerem...@goodinassociates.com wrote:
 One of the things that I have to do where I work is pull and push disk
 images onto appliances and other physical things.  This is generally
 done with a little dd magic.   We'll start up the appliance in a
 virtual machine and mess with it in one way or another, then when
 we're done we'll take that image and use it to image our appliances
 that we send out.  So we have this collection of  some_system.img
 files laying around.  With virt-manager I can just stick the image
 file in a storage pool somewhere, create a new vm, and attach the
 image.

 So my question is how is this done in ovirt. v2v and
 engine-upload-image all want a pre-configured vm to work with. That is
 not what I have. Is there anyway to do this? I thought it'd be
 something like drop an img onto the export domain, and import it,
 create a vm around it. Once done export the image back to the export
 domain and move it someplace to be useful. This is not the case as far
 as I can tell. Am I missing something?

I was just looking at the same thing a few hours ago.. so far I've
found two possible options:

- Create the VM with your disk, find the uuid of that and dd your
source image onto this newly created image
- You should be able to manually create items in your export domain,
they just need a metadata file.

First is a VM export template and the second is an exported VM

[root@ov-engine1 images]# cat
39efaa5f-394c-4842-b791-8eff831bca83/10824522-7e89-4f39-821d-bd7371b23f76.meta
DOMAIN=a571b1f5-b3c6-45d6-99dc-cdc5bdcdc249
VOLTYPE=LEAF
CTIME=1397108117
FORMAT=RAW
IMAGE=39efaa5f-394c-4842-b791-8eff831bca83
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=1397108118
POOL_UUID=
DESCRIPTION=
TYPE=SPARSE
SIZE=41943040
EOF

[root@ov-engine1 images]# cat
9efe9fea-fb1d-4945-b10a-93d8914638bc/c06a63de-3761-485a-a6e9-92f6ed586254.meta
DOMAIN=a571b1f5-b3c6-45d6-99dc-cdc5bdcdc249
VOLTYPE=SHARED
CTIME=1397108361
FORMAT=RAW
IMAGE=9efe9fea-fb1d-4945-b10a-93d8914638bc
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=1397108361
POOL_UUID=
SIZE=33554432
TYPE=SPARSE
DESCRIPTION=
EOF

I find the first option would probably be easier. Would be nice to see
a better alternative, ie. the glance import method is quite nice.
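
A very rough sketch of the first option on an NFS data domain; every path and
UUID below is a placeholder you would look up yourself (e.g. from the disk's
ID in the webadmin):

# create a blank disk at least as large as the source in the webadmin first,
# then overwrite its volume with the source image
SRC=/tmp/some_system.img                              # your existing image
DOM=/rhev/data-center/mnt/<server:_export>/<sd-uuid>  # the mounted data domain
VOL=$DOM/images/<image-uuid>/<volume-uuid>            # the disk oVirt created
dd if=$SRC of=$VOL bs=4M conv=notrunc,fsync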



 -jj-
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] how to import a disk image w/o associated vm

2014-04-09 Thread Andrew Lau
On Thu, Apr 10, 2014 at 3:45 PM, Andrew Lau and...@andrewklau.com wrote:
 On Wed, Apr 9, 2014 at 11:39 PM, Jeremiah Jahn
 jerem...@goodinassociates.com wrote:
 One of the things that I have to do where I work is pull and push disk
 images onto appliances and other physical things.  This is generally
 done with a little dd magic.   We'll start up the appliance in a
 virtual machine and mess with it in one way or another, then when
 we're done we'll take that image and use it to image our appliances
 that we send out.  So we have this collection of  some_system.img
 files laying around.  With virt-manager I can just stick the image
 file in a storage pool somewhere, create a new vm, and attach the
 image.

 So my question is how is this done in ovirt. v2v and
 engine-upload-image all want a pre-configured vm to work with. That is
 not what I have. Is there anyway to do this? I thought it'd be
 something like drop an img onto the export domain, and import it,
 create a vm around it. Once done export the image back to the export
 domain and move it someplace to be useful. This is not the case as far
 as I can tell. Am I missing something?

 I was just looking at the same thing a few hours ago.. so far I've
 found two possible options:

 - Create the VM with your disk, find the uuid of that and dd your
 source image onto this newly created image
 - You should be able to manually create items in your export domain,
 they just need a metadata file.

 First is a VM export template and the second is an exported VM
Oops, other way round..


 [root@ov-engine1 images]# cat
 39efaa5f-394c-4842-b791-8eff831bca83/10824522-7e89-4f39-821d-bd7371b23f76.meta
 DOMAIN=a571b1f5-b3c6-45d6-99dc-cdc5bdcdc249
 VOLTYPE=LEAF
 CTIME=1397108117
 FORMAT=RAW
 IMAGE=39efaa5f-394c-4842-b791-8eff831bca83
 DISKTYPE=2
 PUUID=----
 LEGALITY=LEGAL
 MTIME=1397108118
 POOL_UUID=
 DESCRIPTION=
 TYPE=SPARSE
 SIZE=41943040
 EOF

 [root@ov-engine1 images]# cat
 9efe9fea-fb1d-4945-b10a-93d8914638bc/c06a63de-3761-485a-a6e9-92f6ed586254.meta
 DOMAIN=a571b1f5-b3c6-45d6-99dc-cdc5bdcdc249
 VOLTYPE=SHARED
 CTIME=1397108361
 FORMAT=RAW
 IMAGE=9efe9fea-fb1d-4945-b10a-93d8914638bc
 DISKTYPE=2
 PUUID=----
 LEGALITY=LEGAL
 MTIME=1397108361
 POOL_UUID=
 SIZE=33554432
 TYPE=SPARSE
 DESCRIPTION=
 EOF

 I find the first option would probably be easier. Would be nice to see
 a better alternative, ie. the glance import method is quite nice.



 -jj-
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Post-Install Engine VM Changes Feasible?

2014-04-08 Thread Andrew Lau
On Mon, Mar 17, 2014 at 8:01 PM, Sandro Bonazzola sbona...@redhat.com wrote:
 Il 15/03/2014 12:44, Giuseppe Ragusa ha scritto:
 Hi Joshua,

 --
 Date: Sat, 15 Mar 2014 02:32:59 -0400
 From: j...@wrale.com
 To: users@ovirt.org
 Subject: [Users] Post-Install Engine VM Changes Feasible?

 Hi,

 I'm in the process of installing 3.4 RC(2?) on Fedora 19.  I'm using hosted 
 engine, introspective GlusterFS+keepalived+NFS ala [1], across six nodes.

 I have a layered networking topology ((V)LANs for public, internal, storage, 
 compute and ipmi).  I am comfortable doing the bridging for each
 interface myself via /etc/sysconfig/network-scripts/ifcfg-*.

 Here's my desired topology: http://www.asciiflow.com/#Draw6325992559863447154

 Here's my keepalived setup: 
 https://gist.github.com/josh-at-knoesis/98618a16418101225726

 I'm writing a lot of documentation of the many steps I'm taking.  I hope to 
 eventually release a distributed introspective all-in-one (including
 distributed storage) guide.

 Looking at vm.conf.in, it looks like I'd by default end 
 up with one interface on my engine, probably on my internal VLAN, as
 that's where I'd like the control traffic to flow.  I definitely could do 
 NAT, but I'd be most happy to see the engine have a presence on all of the
 LANs, if for no other reason than because I want to send backups directly 
 over the storage VLAN.

 I'll cut to it:  I believe I could successfully alter the vdsm template 
 (vm.conf.in) to give me the extra interfaces I require.
 It hit me, however, that I could just take the defaults for the initial 
 install.  Later, I think I'll be able to come back with virsh and make my
 changes to the gracefully disabled VM.  Is this true?

 [1] http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

 Thanks,
 Joshua


 I started from the same reference[1] and ended up statically modifying 
 vm.conf.in before launching setup, like this:

 cp -a /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in 
 /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in.orig
 cat << EOM > /usr/share/ovirt-hosted-engine-setup/templates/vm.conf.in
 vmId=@VM_UUID@
 memSize=@MEM_SIZE@
 display=@CONSOLE_TYPE@
 devices={index:2,iface:ide,address:{ controller:0, target:0,unit:0, bus:1,
 type:drive},specParams:{},readonly:true,deviceId:@CDROM_UUID@,path:@CDROM@,device:cdrom,shared:false,type:disk@BOOT_CDROM@}
 devices={index:0,iface:virtio,format:raw,poolID:@SP_UUID@,volumeID:@VOL_UUID@,imageID:@IMG_UUID@,specParams:{},readonly:false,domainID:@SD_UUID@,optional:false,deviceId:@IMG_UUID@,address:{bus:0x00,
 slot:0x06, domain:0x, type:pci, 
 function:0x0},device:disk,shared:exclusive,propagateErrors:off,type:disk@BOOT_DISK@}
 devices={device:scsi,model:virtio-scsi,type:controller}
 devices={index:4,nicModel:pv,macAddr:@MAC_ADDR@,linkActive:true,network:@BRIDGE@,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:@NIC_UUID@,address:{bus:0x00,
 slot:0x03, domain:0x, type:pci, 
 function:0x0},device:bridge,type:interface@BOOT_PXE@}
 devices={index:8,nicModel:pv,macAddr:02:16:3e:4f:c4:b0,linkActive:true,network:lan,filter:vdsm-no-mac-spoofing,specParams:{},address:{bus:0x00,
 slot:0x09, domain:0x, type:pci, 
 function:0x0},device:bridge,type:interface@BOOT_PXE@}
 devices={device:console,specParams:{},type:console,deviceId:@CONSOLE_UUID@,alias:console0}
 vmName=@NAME@
 spiceSecureChannels=smain,sdisplay,sinputs,scursor,splayback,srecord,ssmartcard,susbredir
 smp=@VCPUS@
 cpuType=@CPU_TYPE@
 emulatedMachine=@EMULATED_MACHINE@
 EOM


 Note that you should also be able to edit /etc/ovirt-hosted-engine/vm.conf 
 after setup:
 - put the system in global maintenance
 - edit the vm.conf file on all the hosts running the hosted engine
 - shutdown the vm: hosted-engine --vm-shutdown
 - start again the vm: hosted-engine --vm-start
 - exit global maintenance
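
 Spelled out as commands, those steps are roughly (a sketch; --set-maintenance
 is the hosted-engine flag for entering and leaving global maintenance):

 hosted-engine --set-maintenance --mode=global
 # edit /etc/ovirt-hosted-engine/vm.conf on every HA host
 hosted-engine --vm-shutdown
 hosted-engine --vm-start
 hosted-engine --set-maintenance --mode=none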

 Giuseppe, Joshua: can you share your changes in a guide for Hosted engine 
 users on ovirt.org wiki?



So would you simply just add a new line under the original devices line? ie.
devices={nicModel:pv,macAddr:00:16:3e:6d:34:78,linkActive:true,network:ovirtmgmt,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:0c8a1710-casd-407a-94e8-5b09e55fa141,address:{bus:0x00,
slot:0x03, domain:0x, type:pci,
function:0x0},device:bridge,type:interface}

Are there any good practices for getting the mac addr so it won't be
possible to clash with ones vdsm would generate? I assume the same
applies for deviceid?
Did you also change the slot?



 I simply added a second nic (with a fixed MAC address from the 
 locally-administered pool, since I didn't know how to auto-generate one) and 
 added an
 index for nics too (mimicking the the storage devices setup already present).

 My network setup is much simpler than yours: ovirtmgmt bridge is on an 

Re: [Users] Hosted Engine recovery failure of all HA - nodes

2014-04-08 Thread Andrew Lau
On Wed, Apr 9, 2014 at 2:09 AM, Daniel Helgenberger
daniel.helgenber...@m-box.de wrote:
 Hello,

 I have an oVirt 3.4 hosted engine lab setup witch I am evaluating for
 production use.

 I simulated an ungraceful shutdown of all HA nodes (powercut) while
 the engine was running. After powering up, the system did not recover
 itself (it seemed).
 I had to restart the ovirt-hosted-ha service (witch was in a locked
 state) and then manually run 'hosted-engine --vm-start'.

I noticed this happens too. I think the issue is that after N attempts the
ovirt-ha-agent process will kill itself if it believes it can't access
the storage, or if it fails in some other way. The ovirt-ha-broker service
however still remains and continues to calculate the score. It would be
nice, I guess, if it could pro-actively restart the ha-agent every now
and then.
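
Until the agent does that itself, a crude workaround would be an external
watchdog; purely an illustration (not an oVirt feature), e.g. a cron entry
that restarts the agent if it has died:

# /etc/cron.d/ovirt-ha-agent-watchdog (hypothetical file)
*/5 * * * * root service ovirt-ha-agent status >/dev/null 2>&1 || service ovirt-ha-agent restart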


 What is the supposed procedure after a shutdown (graceful / ungraceful)
 of Hosted-Engine HA nodes? Should the engine recover by itself? Should
 the running VM's be restarted automatically?

I don't think any other VMs get restarted automatically; this is
because the engine is needed to ensure that the VM hasn't been restarted
on another host. This is where power management etc. comes into play.

If all the nodes come up at the same time, in my testing it took 10
minutes for the ha-agents to settle and finally decide which host
should bring up the engine. Then, technically (untested), any VMs which
you've marked as HA should be automatically brought back up by the
engine. That would be 15-20 minutes to recover, which feels a little
slow, although fairly automatic.


 Thanks,
 Daniel






 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Hosted-Engine purpose for gateway check?

2014-04-06 Thread Andrew Lau
On Sun, Apr 6, 2014 at 4:55 PM, Yedidyah Bar David d...@redhat.com wrote:
 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Sunday, April 6, 2014 3:10:19 AM
 Subject: [Users] Hosted-Engine purpose for gateway check?

 Hi,

 I was recently playing around with the new ovirt 3.4 ga, I'm very
 happy all those issues I reported got fixed :D

 I found a new issue regarding the use of PREFIX vs NETMASK which I've
 uploaded here https://bugzilla.redhat.com/show_bug.cgi?id=1084685

 Anyway -- I'm wondering what is the purpose for the gateway check in
 the hosted-engine setup? In my test case, I had the following NIC
 configuration

 eth0 - public (has gateway)
 eth1 - management
 eth1.1 - storage
 eth2 - vm data (no IP address)

 So during the hosted engine install, it will not let me assign eth2 as
 the NIC because it has no IP address or gateway. So I proceed to use
 eth1 instead as, as it has an IP address but again that would fail
 because no gateway. Luckily I have a L3 switch, so I put up a gateway
 for eth1 and that solved that issue.

 What is the gateway check supposed to achieve? I also tried to put in
 my eth0's IP address as the gateway but it still failed because of
 those config issues. If management/ovirtmgmt/vmdata are all on a L2
 switch environment, effectively there becomes no gateway and it
 prevents the installation.

 It actually does not need to really be a gateway. It's used only as
 part of a calculation trying to assess the liveliness of the host.
 See [1] for details, especially pages 33-34.

 [1] http://www.ovirt.org/images/8/88/Hosted_Engine_Deep_Dive.pdf
 --
 Didi

I recall reading that pdf before - however, my comments were more
aimed at why the setup requires GATEWAY=x to be in the
ifcfg-ethx file when it also asks for the gateway in the otopi setup.
It seems a little redundant and also prevents the ability to proceed
with the setup if you're in an L2 switch environment.

The hosted-engine VM will require a gateway as it only has one nic and
needs to be publicly accessible, so let's say we have:
eth1 - ovirtmgmt (172.16.0.10) - hosted-engine (192.168.100.10 w/
192.168.100.1 as gateway)

Isn't 192.168.100.1 the gateway we want to be checking for?

Although now that I think of it, I'm confused about what scenario the
gateway check is really for: is it just for checking to make sure the
hosted-engine will be externally accessible? Wouldn't it also work to
do something like ethtool and check that the link exists instead?
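
For reference, the broker's liveliness test amounts to pinging that configured
gateway address, whereas the link-only idea above would look more like an
ethtool check; roughly (interface and address are examples):

# what the gateway monitor effectively does
ping -c 1 -W 2 192.168.100.1 >/dev/null && echo gateway reachable

# the link-presence alternative suggested above
ethtool eth1 | grep -q 'Link detected: yes' && echo link up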
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[Users] Hosted-Engine purpose for gateway check?

2014-04-05 Thread Andrew Lau
Hi,

I was recently playing around with the new ovirt 3.4 ga, I'm very
happy all those issues I reported got fixed :D

I found a new issue regarding the use of PREFIX vs NETMASK which I've
uploaded here https://bugzilla.redhat.com/show_bug.cgi?id=1084685

Anyway -- I'm wondering what is the purpose for the gateway check in
the hosted-engine setup? In my test case, I had the following NIC
configuration

eth0 - public (has gateway)
eth1 - management
eth1.1 - storage
eth2 - vm data (no IP address)

So during the hosted engine install, it will not let me assign eth2 as
the NIC because it has no IP address or gateway. So I proceed to use
eth1 instead as, as it has an IP address but again that would fail
because no gateway. Luckily I have a L3 switch, so I put up a gateway
for eth1 and that solved that issue.

What is the gateway check supposed to achieve? I also tried to put in
my eth0's IP address as the gateway but it still failed because of
those config issues. If management/ovirtmgmt/vmdata are all on a L2
switch environment, effectively there becomes no gateway and it
prevents the installation.

Am I looking at this the wrong way?

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[Users] Foreman and oVirt 3.4 fails

2014-02-17 Thread Andrew Lau
Hi,

Is anyone using Foreman with oVirt 3.4?

Post upgrade, I appear to be having some issues with Foreman reporting 
undefined method `text' for nil:NilClass

I'm not sure if this ends up being a Foreman issue or an oVirt one, or it could
just be my install gone bad.. however, last time it was because the oVirt API
had been updated, which caused issues with Foreman, so I thought I'd post here
first.

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Foreman and oVirt 3.4 fails

2014-02-17 Thread Andrew Lau
Hi Frank,

Thanks for the link - much appreciated.

On Mon, Feb 17, 2014 at 9:09 PM, Frank Wall f...@moov.de wrote:

 Hi Andrew,

 On Mon, Feb 17, 2014 at 07:57:33PM +1100, Andrew Lau wrote:
  Is anyone using Foreman with oVirt 3.4?
 
  Post upgrade, I appear to be having some issues with Foreman reporting 
  undefined method `text' for nil:NilClass

 this issue was reported to Foreman:

 http://projects.theforeman.org/issues/4346#change-13781
 oVirt Compute Resource needs to be updated for rbovirt 0.0.21

 There is also an API change in recent oVirt releases which requires
 changes to rbovirt:

 https://github.com/abenari/rbovirt/pull/28
 work around change in Red Hat bugzilla #1038053


 Regards
 - Frank

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] SPICE behind NAT

2014-02-13 Thread Andrew Lau
You just need some proper DST and SRC Nat rules and you should be fine.
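
On a plain iptables firewall, the equivalent rules would look something like
this (interface, port and addresses are examples only):

# DST-NAT: forward the proxy port to the squid box
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 3128 -j DNAT --to-destination 192.168.1.10:3128
# SRC-NAT outbound traffic so replies return via the firewall
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE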

I use MikroTik so it's slightly different, but the same concept applies. For
Windows, I don't know, I never really cared much as no one uses Windows on
our ovirt setup :)

But the client tools you linked are for the client accessing the spice
session.
On Feb 14, 2014 3:20 AM, Alan Murrell a...@murrell.ca wrote:

 Quoting Andrew Lau and...@andrewklau.com:

  Your value for SpiceProxyDefault should be your external IP
 address/hostname, otherwise external users will never know where to connect
 to.


 So the spice proxy traffic would be going out the firewall and then looping
 back in (also known as hairpinning), which in my experience is a behaviour
 many firewalls deny by default, and which is what I believe is
 happening here.

  This then becomes more of a firewall issue as your spice proxy is


 I agree.  Would you be willing to share the current IPTables rules on your
 external firewall so I can confirm this? (sanitised appropriately for
 actual IPs and/or hostnames, of course)  You can contact me off-list if you
 prefer.  This is more for curiousity/confirmation than anything else.

 I know that when I was on the same LAN as the oVirt box, I had to edit my
 local hosts file to point the proxy value to the oVirt box itself for the
 remote-viewer to connect to the Windows desktop.

 If that is indeed what is happening here, I think a better (and more
 universal) solution would be to have a VPN connection from the remote end
 user to the network where the oVirt/RHEV server is (site-to-site if the
 users are in an office and road warrior for remote individuals).  Not
 sure how much of a performance hit that might make, though.  Will need to
 do some testing.

  working. But just to confirm: if you open up the console through Chrome it
 should download a console.vv file rather than opening remote-viewer
 natively; before you run it, open it with a text editor and you'll see the
 proxy settings there.


 I took a look and the proxy settings are correct.

  The Windows issue is probably just related to the proper drivers not being installed.


 On the machine I am connecting from or the virtual machine I am connecting
 to?  I downloaded the client from the link here:

  http://www.spice-space.org/download.html

 Is there a different SPICE client for Windows that is recommended?

 -Alan
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] SPICE behind NAT

2014-02-11 Thread Andrew Lau
Your value for SpiceProxyDefault should be your external IP
address/hostname, otherwise external users will never know where to connect
to.

This then becomes more of a firewall issue, as your spice proxy is
working. But just to confirm: if you open up the console through Chrome it
should download a console.vv file rather than opening remote-viewer
natively; before you run it, open it with a text editor and you'll see the
proxy settings there.

The Windows issue is probably just related to the proper drivers not being installed.
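
For reference, the relevant part of a downloaded console.vv is just a small
ini file, typically something like this (values are illustrative):

[virt-viewer]
type=spice
host=172.16.0.11
port=-1
tls-port=5901
proxy=http://203.0.113.10:3128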

On Wed, Feb 12, 2014 at 1:07 PM, Alan Murrell a...@murrell.ca wrote:

 Looks like I am talking to myself now, but I will post my latest findings,
 as I have had some time today to poke at this a bit.

 It seems that the issues I last posted about may be specific to when using
 the Windows Remote-Viewer client, as that is what I was testing with
 yesterday (and when I was logged in remotely).  I can connect from the
 local network when using the Remote Viewer on my Linux laptop.  I will try
 from remote when I get home, but I still cannot connect from a local
 Windows machine.

 Also, I wanted to confirm what the value for SpiceProxyDefault should
 be when behind NAT.  Should it be:

   * the value of the external IP/hostname, or
   * the value of the internal IP/hostname of the server where the proxy is
 installed (in my case, on the All-In-One setup)

 The reason I ask is for a couple of reasons:

   * If I used the value of the external hostname, I was unable to connect
 from my Linux laptop on the local network (same symptoms as when trying to
 connect from the Windows PC, as detailed in my previous post).  However, if
 I edited my local hosts file to resolve the hostname we use externally to the
 IP of the SpiceProxy server, I was then able to connect to the SPICE
 session.  I believe this is because our firewall does not allow
 hairpinning, so it was denying the return connection
   * If the correct value is indeed the external IP/hostname, then if the
 firewall denies hairpinning connections, will the connection from outside
 be blocked due to that as well?

 I hope the above makes sense.  Let me know if you need clarification on
 the above.  In any event, I will update on my test from outside.


 -Alan
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] SPICE behind NAT

2014-02-07 Thread Andrew Lau
On Sat, Feb 8, 2014 at 9:11 AM, Alan Murrell li...@murrell.ca wrote:

 Hi Andrew,

 Thanks for the reply.


 Quoting Andrew Lau and...@andrewklau.com:

  Just install squid proxy and port forward the 3128 port through your
 firewall you should be all good.


 Is squid installed on your oVirt box or is it on your firewall? Or did you
 set up a separate box as the proxy? What you posted above suggests you
 installed it on the oVirt machine?


Yup, I installed squid on the oVirt engine as it was easier to set up and
configure. No point setting up a dedicated box just for the spice proxy
unless you need some strict policies.




  Here's a quick snippet from my notes:

 [snip]


btw the 172.16.0/24 addresses are the oVirt hosts.
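
Leaving the snipped notes aside, a minimal version of that squid setup can be
sketched as follows (the 172.16.0.0/24 subnet matches the note above;
everything else is illustrative, not the original notes):

yum -y install squid

# add to /etc/squid/squid.conf, above the final "http_access deny all" line:
acl ovirt_hosts dst 172.16.0.0/24
http_access allow ovirt_hosts

service squid restart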





  engine-config -s SpiceProxyDefault=http://public_ip_address:3128/


 Ah, so the IP I put is the *public* IP on the firewall (or at least the
 one I am connecting to), and not the private IP of the machine Squid is
 installed on?


Yup, this is the public IP address on the firewall.





 -Alan

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] SPICE behind NAT

2014-02-07 Thread Andrew Lau
Lots of variables here:

- Can you connect to squid from your browser?
- Did you modify the squid.conf to match your setup? (dst addresses, etc).
- iptables?
- restarted engine?
- If you're using ovirt 3.4 make sure you set the cluster policy too
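
A few quick ways to check those points from the client and the engine side
(hostnames and IPs are placeholders):

# is squid reachable through the port forward?
curl -I -x http://PUBLIC_IP:3128 http://www.ovirt.org/

# is 3128 open locally on the proxy box?
iptables -L -n | grep 3128

# which proxy does the engine hand out to clients?
engine-config -g SpiceProxyDefault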


On Sat, Feb 8, 2014 at 3:15 PM, Alan Murrell li...@murrell.ca wrote:

 I followed your notes, installing Squid on my oVirt server (I have an
 all-in-one installation).  I set a port forward on our firewall for port
 3128 to my oVirt server.

 I logged into the User Portal and tried connecting to the console, but I
 get Could not connect to graphic server (null).  Not sure if I have
 overlooked something?

 -Alan

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Gluster question

2014-02-05 Thread Andrew Lau
There was another recent post about this but a sum up was:

You must have power fencing to support VM HA, otherwise there'll be an issue
with the engine not knowing whether the VM is still running, so it won't bring
it up on a new host, to avoid data corruption. Also make sure you have your
quorum set up properly, based on your replication scenario, so you can
withstand one host being lost.

I don't believe the VMs will keep running as such when the host is
lost, but they would restart on another host. At least that's what I've
noticed in my case.
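
The quorum side is just a couple of volume options; something along these
lines (the volume name is a placeholder, and the right values depend on your
replica count):

gluster volume set VOLNAME cluster.quorum-type auto
gluster volume set VOLNAME cluster.server-quorum-type server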


On Thu, Feb 6, 2014 at 1:04 PM, Maurice James midnightst...@msn.com wrote:

 I currently have a new setup running ovirt 3.3.3. I have a Gluster storage
 domain with roughly 2.5TB of usable space.  Gluster is installed on the
 same systems as the ovirt hosts. The host break down is as follows



 Ovirt DC:

 4 hosts in the cluster. Each host has 4 physical disks in a RAID 5. Each
 disk is 500GB. With the OS installed and configured I end up with 1.2TB of
 usable space left for my data volume



 Gluster volume:

 4 bricks with 1.2TB of space per brick (Distribute Replicate leaves me
 with about 2.5TB in the storage domain)





 Does this setup give me enough fault tolerance to survive losing a host
 and have my HA vm automatically move to an available host and keep running??



 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [Users] Gluster question

2014-02-05 Thread Andrew Lau
I'm not sure what you mean by NFS each host, but you'll need some way to at
least ensure the data is available, be that replicated gluster or a
centralized SAN, etc.

On Thu, Feb 6, 2014 at 1:21 PM, Maurice James midnightst...@msn.com wrote:

 Hmm. So in that case would I be able to drop the Gluster setup and use NFS
 each host and make sure power fencing is enabled? Will that still achieve
 fault tolerance, or is a replicated gluster still required?



 *From:* Andrew Lau [mailto:and...@andrewklau.com]
 *Sent:* Wednesday, February 05, 2014 9:17 PM
 *To:* Maurice James
 *Cc:* users
 *Subject:* Re: [Users] Gluster question



 There was another recent post about this but a sum up was:



 You must have power fencing to support VM HA otherwise they'll be an issue
 with the engine not knowing whether the VM is still running and not bring
 it up on a new host to avoid data corruption. Also make sure you have your
 quorum setup properly based on your replication scenario so you can
 withstand 1 host being lost.



 I don't believe they'll keep running in a sense because of the host
 being lost, but they would restart on another host. At least that's what
 I've noticed in my case.



 On Thu, Feb 6, 2014 at 1:04 PM, Maurice James midnightst...@msn.com
 wrote:

 I currently have a new setup running ovirt 3.3.3. I have a Gluster storage
 domain with roughly 2.5TB of usable space.  Gluster is installed on the
 same systems as the ovirt hosts. The host break down is as follows



 Ovirt DC:

 4 hosts in the cluster. Each host has 4 physical disks in a RAID 5. Each
 disk is 500GB. With the OS installed and configured I end up with 1.2TB of
 usable space left for my data volume



 Gluster volume:

 4 bricks with 1.2TB of space per brick (Distribute Replicate leaves me
 with about 2.5TB in the storage domain)





 Does this setup give me enough fault tolerance to survive losing a host
 and have my HA vm automatically move to an available host and keep running??




 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[Users] Hosted Engine always reports unknown stale-data

2014-02-03 Thread Andrew Lau
Hi,

I was wondering if anyone has this same notice when they run:
hosted-engine --vm-status

The engine status will always be unknown stale-data even when the VM is
powered on and the engine is online. engine-health will actually report the
correct status.

eg.

--== Host 1 status ==--

Status up-to-date  : False
Hostname   : 172.16.0.11
Host ID: 1
Engine status  : unknown stale-data

Is it some sort of blocked port causing this or is this by design?

Thanks,
Andrew
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

