[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-06 Thread Jamie Lawrence
Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch on to. 
I can say that Gluster replica three (no arbiters) on dedicated servers serving a 
couple of oVirt VM clusters here has not had these sorts of issues. 

I suspect your long heal times (and the resultant long periods of high load) 
are at least partly related to 1G networking. That is just a matter of IO - 
heals of VMs involve moving a lot of bits. My cluster uses 10G bonded NICs on 
the gluster and ovirt boxes for storage traffic and separate bonded 1G for 
ovirtmgmt and communication with other machines/people, and we're occasionally 
hitting the bandwidth ceiling on the storage network. I'm starting to think 
about 40/100G, different ways of splitting up intensive systems, and 
considering iSCSI for specific volumes, although I really don't want to go 
there.
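
To put rough numbers on the bandwidth point, here is a back-of-envelope sketch. 
It assumes ideal line rate with no protocol overhead, and the 500 GB heal size 
is a made-up figure for illustration, not something reported in this thread.

```python
def transfer_hours(data_gb: float, link_gbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb gigabytes over a link of the given speed.

    efficiency is the assumed usable fraction of line rate (1.0 = ideal).
    """
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return data_gb * 1e9 / bytes_per_second / 3600


# Hypothetical 500 GB heal; a real heal also competes with live VM traffic.
for speed in (1, 10):
    print(f"{speed:>2} Gbit/s: {transfer_hours(500, speed):.2f} h")
```

Under those assumptions the 1G link needs a bit over an hour against under 
seven minutes on 10G, and lowering `efficiency` to model real-world throughput 
only widens the gap in absolute terms.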

I don't run FreeNAS[1], but I do run FreeBSD as storage servers for their 
excellent ZFS implementation, mostly for backups. ZFS will make your `heal` 
problem go away, but not your bandwidth problems, which become worse (because 
of fewer NICs pushing traffic). 10G hardware is not exactly in the impulse-buy 
territory, but if you can, I'd recommend doing some testing using it. I think 
at least some of your problems are related.

If that's not possible, my next stops would be optimizing everything I could 
about sharding, healing and optimizing for serving the shard size to squeeze as 
much performance out of 1G as I could, but that will only go so far.

-j

[1] FreeNAS is just a storage-tuned FreeBSD with a GUI.

> On Jul 6, 2018, at 1:19 PM, Jim Kusznir  wrote:
> 
> hi all:
> 
> Once again my production ovirt cluster is collapsing in on itself.  My 
> servers are intermittently unavailable or degrading, customers are noticing 
> and calling in.  This seems to be yet another gluster failure that I haven't 
> been able to pin down.
> 
> I posted about this a while ago, but didn't get anywhere (no replies that I 
> found).  The problem started out as a glusterfsd process consuming large 
> amounts of ram (up to the point where ram and swap were exhausted and the 
> kernel OOM killer killed off the glusterfsd process).  For reasons not clear 
> to me at this time, that resulted in any VMs running on that host and that 
> gluster volume being paused with I/O error (the glusterfs process is usually 
> unharmed; why it didn't continue I/O with other servers is confusing to me).
> 
> I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and 
> data-hdd).  The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3.  
> The first 3 are backed by an LVM partition (some thin provisioned) on an SSD; 
> the 4th is on a seagate hybrid disk (hdd + some internal flash for 
> acceleration).  data-hdd is the only thing on the disk.  Servers are Dell 
> R610 with the PERC/6i raid card, with the disks individually passed through 
> to the OS (no raid enabled).
> 
> The above RAM usage issue came from the data-hdd volume.  Yesterday, I caught 
> one glusterfsd process at high RAM usage before the OOM killer had to run.  I was 
> able to migrate the VMs off the machine and for good measure, reboot the 
> entire machine (after taking this opportunity to run the software updates 
> that ovirt said were pending).  Upon booting back up, the necessary volume 
> healing began.  However, this time, the healing caused all three servers to 
> go to very, very high load averages (I saw just under 200 on one server; 
> typically they've been 40-70) with top reporting IO Wait at 7-20%.  Network 
> for this volume is a dedicated gig network.  According to bwm-ng, initially 
> the network bandwidth would hit 50MB/s (yes, bytes), but tailed off to mostly 
> in the kB/s for a while.  All machines' load averages were still 40+ and 
> `gluster volume heal data-hdd info` reported 5 items needing healing.  Servers 
> were intermittently experiencing IO issues, even on the 3 gluster volumes 
> that appeared largely unaffected.  Even the OS activities on the hosts themselves 
> (logging in, running commands) would often be very delayed.  The ovirt engine 
> was seemingly randomly throwing engine down / engine up / engine failed 
> notifications.  Responsiveness on ANY VM was horrific most of the time, with 
> random VMs being inaccessible.
> 
> I let the gluster heal run overnight.  By morning, there were still 5 items 
> needing healing, all three servers were still experiencing high load, and 
> servers were still largely unstable.
> 
> I've noticed that all of my ovirt outages (and I've had a lot, way more than 
> is acceptable for a production cluster) have come from gluster.  I still have 
> 3 VMs whose hard disk images have become corrupted by my last gluster crash 
> that I haven't had time to repair / rebuild yet (I believe this crash was 
> caused by the OOM issue previously mentioned, but I didn't know it at the 
> time).
> 
> Is gluster really ready for production yet?  It seems so 

Re: [ovirt-users] Remote DB: How do you set server_version?

2018-05-03 Thread Jamie Lawrence

> On May 3, 2018, at 12:42 AM, Yaniv Kaul  wrote:

> Patches are welcome to improve the way oVirt uses Postgresql, supports 
> various versions, etc.
> Can you give examples for some of the things you'd do differently?

A little pre-ramble - I was trying not to be offensive in talking about this, 
and hope I didn't bother anyone. For the record, if I were supreme dictator of 
the project, I might well make the same choices. Attention is limited, 
DB-flexibility is nowhere near a top-line feature, DB compatibility issues can 
be complex and subtle, and QA is a limited resource. I don't know that those 
are the concerns responsible for the current stance, but can totally see good 
reasons as to why things are the way they are.

Anyway, what I've been thinking about is an installer mode that treats the DB as 
Someone Else's Problem - it doesn't try to install, configure or monitor it, 
instead leaving all config and responsibility for selecting something that 
works to the administrator. The assumption is that crazy people like me will 
figure out if things won't work against a given version, and over time the list 
will be capable of assuming some of that QA responsibility. That leaves the 
normal path for the bulk of users, and those who want to assume the risk can 
point at their own clusters where the closest running version is almost always 
going to be a point-release or three away from whatever Ovirt tests against and 
configuration is quite different.

What I have not done is written any code. I'd like to, but I'm probably several 
months away from having time.

-j


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Remote DB: How do you set server_version?

2018-05-02 Thread Jamie Lawrence

I've been down this road. Postgres won't lie about its version for you.  If you 
want to do this, you have to patch the Ovirt installer[1]. I stopped trying to 
use my PG cluster at some point - the relationship between the installer and 
the product (combined with the overly restrictive requirements baked into the 
installer[2]) makes doing so an ongoing hassle. So I treat Ovirt's PG as a 
black box; disappointing, considering that we are a very heavy PG shop with a 
lot of expertise and automation I can't use with Ovirt.

If nothing has changed (my notes are from a few versions ago), everything you 
need to correct is in

/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/constants.py

Aside from the version, you'll also have to make the knobs for vacuuming match 
those of your current installation, and I think there was another configurable 
for something else I'm not remembering right now.
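
For anyone wondering why a 9.5.12 server fails a check that wants 9.5.9, the 
sketch below contrasts an exact-match test with a looser same-series test. This 
is not the installer's code; the function names are hypothetical, and whether a 
relaxed check would be safe is a QA question this sketch does not answer.

```python
def parse_version(text: str) -> tuple:
    """Turn '9.5.12' into (9, 5, 12) so components compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def exact_match(server: str, required: str) -> bool:
    """True only when every version component is identical."""
    return parse_version(server) == parse_version(required)


def same_series(server: str, required: str) -> bool:
    """Before PostgreSQL 10, the first two components name the major release."""
    return parse_version(server)[:2] == parse_version(required)[:2]


print(exact_match("9.5.12", "9.5.9"))  # strict comparison
print(same_series("9.5.12", "9.5.9"))  # both are 9.5.x
```

Parsing into integers matters: compared as plain strings, "9.5.12" sorts before 
"9.5.9", which is the wrong way round.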

Be aware that doing so is accepting an ongoing commitment to monkeying with the 
installer a lot. At one time I thought doing so was the right tradeoff, but it 
turns out I was wrong.

-j

[1] Or you could rebuild PG with a fake version. That option was unavailable 
here.
[2] Not criticizing, just stating a technical fact. How folks apportion their 
QA resources is their business.

> On May 2, 2018, at 12:49 PM, ~Stack~  wrote:
> 
> Greetings,
> 
> Exploring hosting my engine and ovirt_engine_history db's on my
> dedicated PostgreSQL server.
> 
> This is a 9.5 install on a beefy box from the postgresql.org yum repos
> that I'm using for other SQL needs too. 9.5.12 to be exact. I set up the
> database just as the documentation says and I'm doing a fresh install of
> my engine-setup.
> 
> During the install, right after I give it the details for the remote I
> get this error:
> [ ERROR ] Please set:
>  server_version = 9.5.9
> in postgresql.conf on 'None'. Its location is usually
> /var/lib/pgsql/data , or somewhere under /etc/postgresql* .
> 
> Huh?
> 
> Um. OK.
> $ grep ^server_version postgresql.conf
> server_version = 9.5.9
> 
> $ systemctl restart postgresql-9.5.service
> 
> LOG:  syntax error in file "/var/lib/pgsql/9.5/data/postgresql.conf"
> line 33, n...n ".9"
> FATAL:  configuration file "/var/lib/pgsql/9.5/data/postgresql.conf"
> contains errors
> 
> 
> Well that didn't work. Let's try something else.
> 
> $ grep ^server_version postgresql.conf
> server_version = 9.5.9
> 
> $ systemctl restart postgresql-9.5.service
> LOG:  parameter "server_version" cannot be changed
> FATAL:  configuration file "/var/lib/pgsql/9.5/data/postgresql.conf"
> contains errors
> 
> Whelp. That didn't work either. I can't seem to find anything in the
> oVirt docs on setting this.
> 
> How am I supposed to do this?
> 
> Thanks!
> ~Stack~
> 


Re: [ovirt-users] Hosted engine VDSM issue with sanlock

2018-03-29 Thread Jamie Lawrence

> On Mar 28, 2018, at 10:59 PM, Artem Tambovskiy  
> wrote:
> 
> Hi,
> 
> How many hosts do you have? Check hosted-engine.conf on all hosts including the 
> one you have problem with and look if all host_id values are unique. It might 
> happen that you have several hosts with host_id=1

Hi Artem,

Thanks.  3 compute hosts, 3 gluster hosts. Checked them all, they're all unique 
(1, 2, 6, 101, 102 and 103), so that isn't the problem.
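
The uniqueness check is easy to script once the config files are collected in 
one place. A minimal sketch, assuming each file carries a plain `host_id=N` 
line; the host names in the example are placeholders, not our real ones.

```python
from collections import Counter


def duplicate_host_ids(configs: dict) -> dict:
    """Map each host_id used more than once to the hosts that claim it.

    configs maps a host name to the text of its hosted-engine.conf.
    """
    ids = {}
    for host, text in configs.items():
        for line in text.splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "host_id":
                ids[host] = int(value.strip())
    counts = Counter(ids.values())
    return {
        hid: sorted(h for h, v in ids.items() if v == hid)
        for hid, n in counts.items()
        if n > 1
    }


# Placeholder host names; an empty result means every host_id is unique.
example = {"host-a": "host_id=1\n", "host-b": "host_id=2\n", "host-c": "host_id=6\n"}
print(duplicate_host_ids(example))
```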

Found another datapoint that I'm not entirely sure what to do with. Tried 
reinstalling the afflicted host - call it host1. Moved the HE off of it, 
removed it from the GUI, reinstalled. At this point, the SPM was on host3. 
After it was back up, we moved the SPM to host1. The problem ceased on host1 
for several hours and then returned. But most notably, the problem started 
happening on host3!

So it seems somehow related to/influenced by the SPM. And I'm deeply confused.

-j


[ovirt-users] Hosted engine VDSM issue with sanlock

2018-03-28 Thread Jamie Lawrence
I still can't resolve this issue.

I have a host that is stuck in a cycle; it will be marked non responsive, then 
come back up, ending with a "finished activation" message in the GUI. Then it 
repeats.

The root cause seems to be sanlock.  I'm just unclear on why it started or how 
to resolve it. The only "approved" knob I'm aware of is 
--reinitialize-lockspace and the manual equivalent, neither of which fixes 
anything.

Anyone have a guess?

-j

- - - vdsm.log - - - -

2018-03-28 10:38:22,207-0700 INFO  (monitor/b41eb20) [storage.SANLock] 
Acquiring host id for domain b41eb20a-eafb-481b-9a50-a135cf42b15e (id=1, 
async=True) (clusterlock:284)
2018-03-28 10:38:22,208-0700 ERROR (monitor/b41eb20) [storage.Monitor] Error 
acquiring host id 1 for domain b41eb20a-eafb-481b-9a50-a135cf42b15e 
(monitor:568)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 565, in _acquireHostId
    self.domain.acquireHostId(self.hostId, async=True)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 828, in acquireHostId
    self._manifest.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 453, in acquireHostId
    self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 315, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'b41eb20a-eafb-481b-9a50-a135cf42b15e', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
2018-03-28 10:38:23,078-0700 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call 
Host.ping2 succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:23,085-0700 INFO  (jsonrpc/6) [vdsm.api] START 
repoStats(domains=[u'b41eb20a-eafb-481b-9a50-a135cf42b15e']) from=::1,54450, 
task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 (api:46)
2018-03-28 10:38:23,085-0700 INFO  (jsonrpc/6) [vdsm.api] FINISH repoStats 
return={u'b41eb20a-eafb-481b-9a50-a135cf42b15e': {'code': 0, 'actual': True, 
'version': 4, 'acquired': False, 'delay': '0.000812547', 'lastCheck': '0.4', 
'valid': True}} from=::1,54450, task_id=186d7e8b-7b4e-485d-a9e0-c0cb46eed621 
(api:52)
2018-03-28 10:38:23,086-0700 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call 
Host.getStorageRepoStats succeeded in 0.00 seconds (__init__:573)
2018-03-28 10:38:23,092-0700 WARN  (vdsm.Scheduler) [Executor] Worker blocked: 
 at 0x1d44150> 
timeout=15, duration=150 at 0x7f076c05fb90> task#=83985 at 0x7f082c08e510>, 
traceback:
File: "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
  self.__bootstrap_inner()
File: "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
  self.run()
File: "/usr/lib64/python2.7/threading.py", line 765, in run
  self.__target(*self.__args, **self.__kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, 
in run
  ret = func(*args, **kwargs)
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 301, in _run
  self._execute_task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in 
_execute_task
  task()
File: "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__
  self._callable()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 213, in 
__call__
  self._func()
File: "/usr/lib/python2.7/site-packages/vdsm/virt/sampling.py", line 578, in 
__call__
  stats = hostapi.get_stats(self._cif, self._samples.stats())
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 77, in get_stats
  ret['haStats'] = _getHaInfo()
File: "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 182, in 
_getHaInfo
  stats = instance.get_all_stats()
File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", 
line 93, in get_all_stats
  stats = broker.get_stats_from_storage()
File: 
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 135, in get_stats_from_storage
  result = self._proxy.get_stats()
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
  return self.__send(self.__name, args)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1587, in __request
  verbose=self.__verbose
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
  return self.single_request(host, handler, request_body, verbose)
File: "/usr/lib64/python2.7/xmlrpclib.py", line 1303, in single_request
  response = h.getresponse(buffering=True)
File: "/usr/lib64/python2.7/httplib.py", line 1089, in getresponse
  response.begin()
File: "/usr/lib64/python2.7/httplib.py", line 444, in begin
  version, status, reason = self._read_status()
File: "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
  line = self.fp.readline(_MAXLINE + 1)
File: "/usr/lib64/python2.7/socket.py", line 476, in readline
  data = self._sock.recv(self._rbufsize) (executor:363)
2018-03-28 10:38:23,274-0700 INFO  (jsonrpc/3) 

[ovirt-users] Host down/activation loop

2018-03-21 Thread Jamie Lawrence
Hello,

Have an issue that feels sanlock related, but that I can't get sorted with our 
installation. This is 4.2.1, hosted engine. One of our hosts is stuck in a 
loop. It:

- gets a VDSM GetStatsVDS timeout and is marked as down,
- throws a warning about not being fenced (because that's not enabled yet, 
because of this problem),
- and is set to Up about a minute later.

This repeats every 4 minutes and 20 seconds.
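
The period can be measured from engine.log rather than eyeballed. A small 
sketch, assuming one log entry per line (unwrapped, unlike the paste below) and 
that the first 23 characters of each entry are the timestamp. The sample lines 
are trimmed, and the second one is invented to show a 4:20 gap.

```python
from datetime import datetime


def not_responding_intervals(log_lines):
    """Seconds between successive VDS_HOST_NOT_RESPONDING events."""
    stamps = [
        # Slice off the timestamp; the trailing "-07" UTC offset is ignored.
        datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        for line in log_lines
        if "VDS_HOST_NOT_RESPONDING" in line
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


sample = [
    "2018-03-21 16:09:26,088-07 WARN  EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027)",
    "2018-03-21 16:09:27,070-07 INFO  Connecting to sc5-ovirt-1/10.181.26.129",
    # Invented entry, one cycle later:
    "2018-03-21 16:13:46,088-07 WARN  EVENT_ID: VDS_HOST_NOT_RESPONDING(9,027)",
]
print(not_responding_intervals(sample))
```

A steady interval would point at a timer or retry loop rather than a random 
network fault, which is why the exact number seems worth recording.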

The hosted engine is running on the host that is stuck like this, and it 
doesn't appear to get in the way of creating new VMs or other operations, but 
obviously I can't use fencing, which is a big part of the point of running 
Ovirt in the first place.

I tried setting global maintenance and running hosted-engine 
--reinitialize-lockspace, which (a) took nearly exactly 2 minutes to run, 
making me think something timed out, (b) exited with rc 0, and (c) didn't fix 
the problem.

Anyone have an idea of how to fix this?

-j



- - details - -

I still can't quite figure out how to interpret what sanlock says, but the -1s 
look like wrongness.

[sc5-ovirt-1]# sanlock client status
daemon bedae69e-03cc-49f8-88f4-9674a85a3185.sc5-ovirt-
p -1 helper
p -1 listener
p 122268 HostedEngine
p -1 status
s 1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd:1:/rhev/data-center/mnt/glusterSD/172.16.0.151\:_sc5-images/1aabcd3a-3fd3-4902-b92e-17beaf8fe3fd/dom_md/ids:0
s b41eb20a-eafb-481b-9a50-a135cf42b15e:1:/rhev/data-center/mnt/glusterSD/sc5-gluster-10g-1\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/dom_md/ids:0
r b41eb20a-eafb-481b-9a50-a135cf42b15e:8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87:/rhev/data-center/mnt/glusterSD/172.16.0.153\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e/images/a9d01d59-f146-47e5-b514-d10f8867678e/8f0c9f7a-ae6a-476e-b6f3-a830dcb79e87.lease:0:5 p 122268
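
For reading the `s` lines in that status output, here is a sketch that splits 
one into its parts. It assumes the lockspace string is laid out as 
name:host_id:path:offset with colons inside the path escaped as `\:`, which is 
my reading of the sanlock man page rather than anything confirmed in this 
thread.

```python
import re


def parse_lockspace(line: str) -> dict:
    """Split an 's' status line into name, host_id, path and offset.

    Colons inside the path are backslash-escaped, so split only on bare ones.
    """
    body = line.split(None, 1)[1]            # drop the leading "s"
    name, host_id, path, offset = re.split(r"(?<!\\):", body)
    return {
        "name": name,
        "host_id": int(host_id),
        "path": path.replace("\\:", ":"),
        "offset": int(offset),
    }


status_line = (
    r"s b41eb20a-eafb-481b-9a50-a135cf42b15e:1:/rhev/data-center/mnt/glusterSD/"
    r"sc5-gluster-10g-1\:_sc5-ovirt__engine/b41eb20a-eafb-481b-9a50-a135cf42b15e"
    r"/dom_md/ids:0"
)
print(parse_lockspace(status_line))
```

If that layout is right, the second field is the host id this machine holds in 
the lockspace, which is the value worth comparing against hosted-engine.conf.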


engine.log:

2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] EVENT_ID: 
VDS_BROKER_COMMAND_FAILURE(10,802), VDSM sc5-ovirt-1 command GetStatsVDS 
failed: Message timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetStatsVDSCommand] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Command 
'GetStatsVDSCommand(HostName = sc5-ovirt-1, 
VdsIdAndVdsVDSCommandParametersBase:{hostId='be3517e0-f79d-464c-8169-f786d13ac287',
 vds='Host[sc5-ovirt-1,be3517e0-f79d-464c-8169-f786d13ac287]'})' execution 
failed: VDSGenericException: VDSNetworkException: Message timeout which can be 
caused by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed getting vds 
stats, host='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287): 
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: 
VDSGenericException: VDSNetworkException: Message timeout which can be caused 
by communication issues
2018-03-21 16:09:26,081-07 ERROR 
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failure to refresh host 
'sc5-ovirt-1' runtime info: VDSGenericException: VDSNetworkException: Message 
timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engineScheduled-Thread-67) [] Failed to refresh VDS, 
network error, continuing, 
vds='sc5-ovirt-1'(be3517e0-f79d-464c-8169-f786d13ac287): VDSGenericException: 
VDSNetworkException: Message timeout which can be caused by communication issues
2018-03-21 16:09:26,081-07 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engine-Thread-102682) [] Host 'sc5-ovirt-1' is not 
responding.
2018-03-21 16:09:26,088-07 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-102682) [] EVENT_ID: 
VDS_HOST_NOT_RESPONDING(9,027), Host sc5-ovirt-1 is not responding. Host cannot 
be fenced automatically because power management for the host is disabled.
2018-03-21 16:09:27,070-07 INFO  
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
Connecting to sc5-ovirt-1/10.181.26.129
2018-03-21 16:09:27,918-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] START, 
GlusterServersListVDSCommand(HostName = sc5-gluster-2, 
VdsIdVDSCommandParametersBase:{hostId='797cbf42-6553-4a75-b8b1-93b2adbbc0db'}), 
log id: 6afccc01
2018-03-21 16:09:28,579-07 INFO  
[org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] 
(DefaultQuartzScheduler4) [493fb316] FINISH, GlusterServersListVDSCommand, 
return: [192.168.122.1/24:CONNECTED, sc5-gluster-3:CONNECTED, 
sc5-gluster-10g-1:CONNECTED], log id: 6afccc01
2018-03-21 16:09:28,606-07 INFO  

Re: [ovirt-users] Iso upload success, no GUI popup option

2018-03-20 Thread Jamie Lawrence

> On Mar 20, 2018, at 5:00 AM, Alexander Wels <aw...@redhat.com> wrote:
> 
> On Monday, March 19, 2018 7:47:00 PM EDT Jamie Lawrence wrote:

>> So, uploading from one of the hosts to an ISO domain claims success, and
>> manually checking shows the ISO uploaded just fine, perms set correctly to
>> 36:36. But it doesn't appear in the GUI popup when creating a new VM.
>> 
> 
> You probably need to refresh the ISO list, assuming 4.2 go to Storage -> 
> Storage Domains -> , click on the name, and go to the images 
> detail tab. This should refresh the list of ISOs in the list and the ISO 
> should be listed, once that is done, it should show up in the drop down when 
> you change the CD.

That did it, thanks so much.

-j


[ovirt-users] Iso upload success, no GUI popup option

2018-03-19 Thread Jamie Lawrence
Hello,

I'm trying to iron out the last few oddities of this setup, and one of them is 
the iso upload process. This worked in the last rebuild, but... well.

So, uploading from one of the hosts to an ISO domain claims success, and 
manually checking shows the ISO uploaded just fine, perms set correctly to 
36:36. But it doesn't appear in the GUI popup when creating a new VM.

Verified that the VDSM user can fully traverse the directory path - presumably 
that was tested by uploading it in the first place, but I double-checked. 
Looked in various logs, but didn't see any action in ovirt-imageio-daemon or 
-proxy. Didn't see anything in engine.log that looked relevant.
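
Since most reports of this trace back to permissions, the ownership check can 
be scripted as a first step. A minimal sketch, assuming the usual oVirt 
convention of uid 36 (vdsm) and gid 36 (kvm); the function name is made up for 
illustration.

```python
import os

VDSM_UID = 36  # vdsm user on oVirt hosts
KVM_GID = 36   # kvm group


def ownership_problems(path: str, uid: int = VDSM_UID, gid: int = KVM_GID) -> list:
    """Return human-readable complaints about path; an empty list looks fine."""
    st = os.stat(path)
    problems = []
    if st.st_uid != uid or st.st_gid != gid:
        problems.append(f"owner is {st.st_uid}:{st.st_gid}, expected {uid}:{gid}")
    if not st.st_mode & 0o400:
        problems.append("owner has no read permission")
    return problems


if __name__ == "__main__":
    import sys
    # Pass one or more ISO paths on the command line.
    for iso in sys.argv[1:]:
        print(iso, ownership_problems(iso) or "ok")
```

It only inspects the file itself; directory traversal for the vdsm user still 
needs a separate check.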

What is the troubleshooting method for this? Googling, it seemed most folks' 
problems were related to permissions. I scanned DB table names for something 
that seemed like it might have ISO-related info in it, but couldn't find 
anything, and am not sure what else to check.

Thanks,

-j


Re: [ovirt-users] 4.2 aaa LDAP setup issue

2018-02-20 Thread Jamie Lawrence

I missed this when you sent it; apologies for the delay.

> On Feb 13, 2018, at 12:11 AM, Ondra Machacek <omach...@redhat.com> wrote:
> 
> Hello,
> 
> On 02/09/2018 08:17 PM, Jamie Lawrence wrote:
>> Hello,
>> I'm bringing up a new 4.2 cluster and would like to use LDAP auth. Our LDAP 
>> servers are fine and function normally for a number of other services, but I 
>> can't get this working.
>> Our LDAP setup requires startTLS and a login. That last bit seems to be 
>> where the trouble is. After ovirt-engine-extension-aaa-ldap-setup asks for 
>> the cert and I pass it the path to the same cert used via nslcd/PAM for 
>> logging in to the host, it replies:
>> [ INFO  ] Connecting to LDAP using 'ldap://x.squaretrade.com:389'
>> [ INFO  ] Executing startTLS
>> [WARNING] Cannot connect using 'ldap://x.squaretrade.com:389': {'info': 
>> 'authentication required', 'desc': 'Server is unwilling to perform'}
>> [ ERROR ] Cannot connect using any of available options
>> "Unwilling to perform" makes me think -aaa-ldap-setup is trying something 
>> the backend doesn't support, but I'm having trouble guessing what that could 
>> be since the tool hasn't gathered sufficient information to connect yet - it 
>> asks for a DN/pass later in the script. And the log isn't much more 
>> forthcoming.
>> I double-checked the cert with openssl; it is a valid, PEM-encoded cert.
>> Before I head in to the code, has anyone seen this?
> 
> Looks like you have disallowed anonymous bind on your LDAP.
> We are trying to establish an anonymous bind to test the connection.

Ah, I think I forgot that anonymous bind was a thing. 

> I would recommend to try to do a manual configuration, the documentation
> is here:
> 
> https://github.com/oVirt/ovirt-engine-extension-aaa-ldap/blob/master/README#L17
> 
> Then in your /etc/ovirt-engine/aaa/profile1.properties add the following
> line:
> 
> pool.default.auth.type = simple

Awesome, thanks so much. I really appreciate the pointer.

-j


[ovirt-users] Reinitializing lockspace

2018-02-20 Thread Jamie Lawrence
Hello,

I have a sanlock problem. I don't fully understand the logs, but from what I 
can gather, messages like these mean it ain't working.

2018-02-16 14:51:46 22123 [15036]: s1 renewal error -107 delta_length 0 
last_success 22046
2018-02-16 14:51:47 22124 [15036]: 53977885 aio collect RD 
0x7fe5040008c0:0x7fe5040008d0:0x7fe518922000 result -107:0 match res
2018-02-16 14:51:47 22124 [15036]: s1 delta_renew read rv -107 offset 0 
/rhev/data-center/mnt/glusterSD/sc5-gluster-10g-1.squaretrade.com:ovirt__images/53977885-0887-48d0-a02c-8d9e3faec93c/dom_md/ids
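
One way to read those numbers is as negated Linux errno values; that is an 
assumption about how sanlock reports I/O failures, not something confirmed 
here. The table in this sketch hard-codes only the two Linux codes that show 
up in these threads.

```python
import re

# Linux errno values (they differ on other platforms).
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def renewal_error(line: str):
    """Return (code, name, description) for a 'renewal error' line, else None."""
    match = re.search(r"renewal error (-?\d+)", line)
    if not match:
        return None
    code = abs(int(match.group(1)))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in this table"))
    return code, name, text


log_line = ("2018-02-16 14:51:46 22123 [15036]: s1 renewal error -107 "
            "delta_length 0 last_success 22046")
print(renewal_error(log_line))
```

If the reading holds, -107 suggests the lease file sits on a mount whose client 
connection has dropped, which would point at the gluster mount rather than at 
sanlock itself.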

I attempted `hosted-engine --reinitialize-lockspace --force`, which didn't 
appear to do anything, but who knows.

I downed everything and tried `sanlock direct init -s `, which caused 
sanlock to dump core.

At this point the only thing I can think of to do is down everything, whack and 
manually recreate the lease files and try again. I'm worried that that will 
lose something that the setup did or will otherwise destroy the installation. 
It looks like this has been done by others[1], but the references I can find 
are a bit old, so I'm unsure if that is still a valid approach.

So, questions:

 - Will that work? 
 - Is there something I should do instead of that?

Thanks,

-j


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1116469


Re: [ovirt-users] hosted engine install fails on useless DHCP lookup

2018-02-14 Thread Jamie Lawrence
> On Feb 14, 2018, at 1:27 AM, Simone Tiraboschi <stira...@redhat.com> wrote:
> On Wed, Feb 14, 2018 at 2:11 AM, Jamie Lawrence <jlawre...@squaretrade.com> 
> wrote:
> Hello,
> 
> I'm seeing the hosted engine install fail on an Ansible playbook step. Log 
> below. I tried looking at the file specified for retry, below 
> (/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.retry); it 
> contains the word, 'localhost'.
> 
> The log below didn't contain anything I could see that was actionable; given 
> that it was an ansible error, I hunted down the config and enabled logging. 
> On this run the error was different - the installer log was the same, but the 
> reported error (from the installer) changed.
> 
> The first time, the installer said:
> 
> [ INFO  ] TASK [Wait for the host to become non operational]
> [ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": 
> []}, "attempts": 150, "changed": false}
> [ ERROR ] Failed to execute stage 'Closing up': Failed executing 
> ansible-playbook
> [ INFO  ] Stage: Clean up
> 
> 'localhost' here is not an issue by itself: the playbook is executed on the 
> host against the same host over a local connection so localhost is absolutely 
> fine there.
> 
> Maybe you hit this one:
> https://bugzilla.redhat.com/show_bug.cgi?id=1540451

That seems likely. 


> It seems NetworkManager related but still not that clear.
> Stopping NetworkManager and starting network before the deployment seems to 
> help.

Tried this, got the same results.

[snip]
> Anyone see what is wrong here?
> 
> This is absolutely fine.
> The new ansible based flow (also called node zero) uses an engine running on 
> a local virtual machine to bootstrap the system.
> The bootstrap local VM runs over libvirt default natted network with its own 
> dhcp instance, that's why we are consuming it.
> The locally running engine will create a target virtual machine on the shared 
> storage and that one will be instead configured as you specified.

Thanks for the context - that's useful, and presumably explains why 192.168 
addresses (which we don't use) are appearing in the logs.
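
A small sketch of telling the two kinds of address apart. It assumes the bootstrap VM sits on libvirt's stock `default` NAT network, 192.168.122.0/24; a host can define that network differently. The function name and the sample lease address are hypothetical, and 10.181.26.150/24 is the static address from the original report's answer file.

```python
import ipaddress

# Assumption: libvirt's stock "default" NAT range; check the host's own definition.
LIBVIRT_DEFAULT_NET = ipaddress.ip_network("192.168.122.0/24")


def on_bootstrap_network(addr, nat_net=LIBVIRT_DEFAULT_NET):
    """True if addr falls inside the NAT range used by the bootstrap VM."""
    return ipaddress.ip_address(addr) in nat_net


# Static address configured for the final engine VM (from the answer file).
static = ipaddress.ip_interface("10.181.26.150/24")

print(on_bootstrap_network(str(static.ip)))    # configured engine address
print(on_bootstrap_network("192.168.122.57"))  # hypothetical DHCP lease
```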

Not being entirely sure where to go from here, I guess I'll spend the evening 
figuring out ansible-ese in order to try to figure out why it is blowing chunks.

Thanks for the note. 

-j


[ovirt-users] hosted engine install fails on useless DHCP lookup

2018-02-13 Thread Jamie Lawrence
Hello,

I'm seeing the hosted engine install fail on an Ansible playbook step. Log 
below. I tried looking at the file specified for retry, below 
(/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.retry); it 
contains the word, 'localhost'. 

The log below didn't contain anything I could see that was actionable; given 
that it was an ansible error, I hunted down the config and enabled logging. On 
this run the error was different - the installer log was the same, but the 
reported error (from the installer) changed. 

The first time, the installer said:

[ INFO  ] TASK [Wait for the host to become non operational]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, 
"attempts": 150, "changed": false}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing 
ansible-playbook
[ INFO  ] Stage: Clean up


Second:

[ INFO  ] TASK [Get local vm ip]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 50, "changed": true, 
"cmd": "virsh -r net-dhcp-leases default | grep -i 00:16:3e:11:e7:bd | awk '{ 
print $5 }' | cut -f1 -d'/'", "delta": "0:00:00.093840", "end": "2018-02-13 
16:53:08.658556", "rc": 0, "start": "2018-02-13 16:53:08.564716", "stderr": "", 
"stderr_lines": [], "stdout": "", "stdout_lines": []}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing 
ansible-playbook
[ INFO  ] Stage: Clean up



Ansible log below; as with that second snippet, it appears that it was trying 
to parse out an IP address from virsh's list of DHCP leases, couldn't, and died. 
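
A rough sketch of what the shell pipeline in that task computes, written out so the empty `stdout` is easier to interpret. The sample lease table is hypothetical (including the address) and only approximates the column layout of `virsh net-dhcp-leases`; the function name is made up for illustration.

```python
def lease_ip(leases_output, mac):
    """Approximate: grep -i MAC | awk '{ print $5 }' | cut -f1 -d'/'."""
    ips = []
    for line in leases_output.splitlines():
        if mac.lower() not in line.lower():   # grep -i
            continue
        fields = line.split()                 # awk splits on whitespace
        if len(fields) >= 5:
            ips.append(fields[4].split("/")[0])  # $5, then strip the /prefix
    return "\n".join(ips)


# Hypothetical output; the real table may differ in spacing and columns.
HEADER = (
    " Expiry Time          MAC address        Protocol  IP address          Hostname  Client ID\n"
    "------------------------------------------------------------------------------------------\n"
)
SAMPLE = HEADER + (
    " 2018-02-13 17:53:08  00:16:3e:11:e7:bd  ipv4      192.168.122.57/24   -         -\n"
)

print(repr(lease_ip(SAMPLE, "00:16:3e:11:e7:bd")))  # a lease exists for the MAC
print(repr(lease_ip(HEADER, "00:16:3e:11:e7:bd")))  # no lease: empty string
```

With no matching lease the result is an empty string while the pipeline still exits 0, which matches the `"rc": 0, "stdout": ""` in the error above.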

Which makes sense: I gave it a static IP, and unless I'm missing something, 
setup should not have been doing that. I verified that the answer file has the 
IP:

OVEHOSTED_VM/cloudinitVMStaticCIDR=str:10.181.26.150/24

Anyone see what is wrong here?

-j


hosted-engine --deploy log:

2018-02-13 16:20:32,138-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 TASK [Force host-deploy in offline mode]
2018-02-13 16:20:33,041-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 changed: [localhost]
2018-02-13 16:20:33,342-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 TASK [include_tasks]
2018-02-13 16:20:33,443-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 ok: [localhost]
2018-02-13 16:20:33,744-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 TASK [Obtain SSO token using 
username/password credentials]
2018-02-13 16:20:35,248-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 ok: [localhost]
2018-02-13 16:20:35,550-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 TASK [Add host]
2018-02-13 16:20:37,053-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 changed: [localhost]
2018-02-13 16:20:37,355-0800 INFO otopi.ovirt_hosted_engine_setup.ansible_utils 
ansible_utils._process_output:100 TASK [Wait for the host to become non 
operational]
2018-02-13 16:27:48,895-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 
{u'_ansible_parsed': True, u'_ansible_no_log': False, u'changed': False, 
u'attempts': 150, u'invocation': {u'module_args': {u'pattern': 
u'name=ovirt-1.squaretrade.com', u'fetch_nested': False, u'nested_attributes': 
[]}}, u'ansible_facts': {u'ovirt_hosts': []}}
2018-02-13 16:27:48,995-0800 ERROR 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 
fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": []}, 
"attempts": 150, "changed": false}
2018-02-13 16:27:49,297-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 
PLAY RECAP [localhost] : ok: 42 changed: 17 unreachable: 0 skipped: 2 failed: 1
2018-02-13 16:27:49,397-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 
PLAY RECAP [ovirt-engine-1.squaretrade.com] : ok: 15 changed: 8 unreachable: 0 
skipped: 4 failed: 0
2018-02-13 16:27:49,498-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:180 
ansible-playbook rc: 2
2018-02-13 16:27:49,498-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:187 
ansible-playbook stdout:
2018-02-13 16:27:49,499-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:189  to retry, 
use: --limit 
@/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.retry

2018-02-13 16:27:49,499-0800 DEBUG 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils.run:190 
ansible-playbook stderr:
2018-02-13 16:27:49,500-0800 DEBUG otopi.context context._executeMethod:143 
method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 133, in 
_executeMethod
method['method']()
  File 

[ovirt-users] 4.2 aaa LDAP setup issue

2018-02-09 Thread Jamie Lawrence
Hello,

I'm bringing up a new 4.2 cluster and would like to use LDAP auth. Our LDAP 
servers are fine and function normally for a number of other services, but I 
can't get this working.

Our LDAP setup requires startTLS and a login. That last bit seems to be where 
the trouble is. After ovirt-engine-extension-aaa-ldap-setup asks for the cert 
and I pass it the path to the same cert used via nslcd/PAM for logging in to 
the host, it replies:

[ INFO  ] Connecting to LDAP using 'ldap://x.squaretrade.com:389'
[ INFO  ] Executing startTLS
[WARNING] Cannot connect using 'ldap://x.squaretrade.com:389': {'info': 
'authentication required', 'desc': 'Server is unwilling to perform'}
[ ERROR ] Cannot connect using any of available options

"Unwilling to perform" makes me think -aaa-ldap-setup is trying something the 
backend doesn't support, but I'm having trouble guessing what that could be 
since the tool hasn't gathered sufficient information to connect yet - it asks 
for a DN/pass later in the script. And the log isn't much more forthcoming.

I double-checked the cert with openssl; it is a valid, PEM-encoded cert.

Before I head in to the code, has anyone seen this? 

Thanks,

-j

- - - - snip - - - - 

Relevant log details:

2018-02-08 15:15:08,625-0800 DEBUG 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common common._getURLs:281 
URLs: ['ldap://x.squaretrade.com:389']
2018-02-08 15:15:08,626-0800 INFO 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._connectLDAP:391 Connecting to LDAP using 'ldap://x.squaretrade.com:389'
2018-02-08 15:15:08,627-0800 INFO 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._connectLDAP:442 Executing startTLS
2018-02-08 15:15:08,640-0800 DEBUG 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._connectLDAP:445 Perform search
2018-02-08 15:15:08,641-0800 DEBUG 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._connectLDAP:459 Exception
Traceback (most recent call last):
  File 
"/usr/share/ovirt-engine-extension-aaa-ldap/setup/bin/../plugins/ovirt-engine-extension-aaa-ldap/ldap/common.py",
 line 451, in _connectLDAP
timeout=60,
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 555, in 
search_st
return 
self.search_ext_s(base,scope,filterstr,attrlist,attrsonly,None,None,timeout)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 546, in 
search_ext_s
return self.result(msgid,all=1,timeout=timeout)[1]
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 458, in 
result
resp_type, resp_data, resp_msgid = self.result2(msgid,all,timeout)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 462, in 
result2
resp_type, resp_data, resp_msgid, resp_ctrls = 
self.result3(msgid,all,timeout)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 469, in 
result3
resp_ctrl_classes=resp_ctrl_classes
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 476, in 
result4
ldap_result = 
self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 99, in 
_ldap_call
result = func(*args,**kwargs)
UNWILLING_TO_PERFORM: {'info': 'authentication required', 'desc': 'Server is 
unwilling to perform'}
2018-02-08 15:15:08,642-0800 WARNING 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._connectLDAP:463 Cannot connect using 'ldap://x.squaretrade.com:389': 
{'info': 'authentication required', 'desc': 'Server is unwilling to perform'}
2018-02-08 15:15:08,643-0800 ERROR 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._customization_late:787 Cannot connect using any of available options
2018-02-08 15:15:08,644-0800 DEBUG 
otopi.plugins.ovirt_engine_extension_aaa_ldap.ldap.common 
common._customization_late:788 Exception
Traceback (most recent call last):
  File 
"/usr/share/ovirt-engine-extension-aaa-ldap/setup/bin/../plugins/ovirt-engine-extension-aaa-ldap/ldap/common.py",
 line 782, in _customization_late
insecure=insecure,
  File 
"/usr/share/ovirt-engine-extension-aaa-ldap/setup/bin/../plugins/ovirt-engine-extension-aaa-ldap/ldap/common.py",
 line 468, in _connectLDAP
_('Cannot connect using any of available options')
SoftRuntimeError: Cannot connect using any of available options


Re: [ovirt-users] Upgrade via reinstall?

2018-02-01 Thread Jamie Lawrence

> On Feb 1, 2018, at 1:21 AM, Yaniv Kaul  wrote:

> The engine-backup utility is your friend and will properly back up for you 
> everything you need.

Thanks, but that is an answer to a different question I didn't ask.

It does, implicitly, seem to indicate that there probably are artifacts on 
hosts that need to be preserved, and hopefully I can find specifics in that 
script.

-j


[ovirt-users] Upgrade via reinstall?

2018-01-31 Thread Jamie Lawrence
Hello,

I currently have an Ovirt 4.1.8 installation with a hosted engine using Gluster 
for storage, with the DBs hosted on a dedicated PG cluster.

For reasons[1], it seems possibly simpler for me to upgrade our installation by 
reinstalling rather than upgrading. In this case, I can happily bring down the 
running VMs/otherwise do things that one normally can't.

Is there any technical reason I can't/shouldn't rebuild from bare-metal, 
including creating a fresh hosted engine, without losing anything? I suppose a 
different way of asking this is, is there anything on the engine/host 
filesystems that I should preserve/restore for this to work? 

Thanks,

-j

[1] If this isn't an option, I'll go in to them in order to figure out a plan 
B; just avoiding a lot of backstory that isn't needed for the question.


Re: [ovirt-users] oVirt home lab hardware

2018-01-18 Thread Jamie Lawrence

I'm a bit out of date on the topic, but unless they have changed, avoid the 
NUCs. I bought a couple but gave up on them because of a firmware bug (? 
limitation, at least) that Intel said it won't fix. It causes a lockup on boot 
if a monitor is not connected. I've been told that different models don't have 
that problem, but have also heard of other weird problems. The impression I got 
is that they were meant for being media servers stuck to the back of TVs, and 
other use cases tend to be precluded by unexpected limitations, bugs and 
general strange behavior. Perhaps this was fixed, but I ended up with a bad 
taste in my mouth.

You can pick up dirt-cheap rackmount kit on Ebay. A few R710s or similar can be 
had for a few $hundred and would be absolutely lovely overkill for a home 
network. The downside of this approach at home is going to be power consumption 
and noise, depending on local energy costs and the nature of your dwelling. 
Also, some people may not be that fond of the datacenter look for home decor.

Unless you live in a cheap energy locale/don't pay for your power/enjoy 
malformed space heaters, don't underestimate the cost of running server kit. 
Getting rid of rack mount machines and switches and moving everything to a 
couple machines built with energy consumption in mind cut my electricity costs 
by over half.

-j


> On Jan 18, 2018, at 12:52 PM, Abdurrahman A. Ibrahim 
>  wrote:
> 
> Hello,
> 
> I am planning to buy home lab hardware to be used by oVirt.
> 
> Any recommendations for used hardware I can buy from eBay for example? 
> Also, have you tried oVirt on Intel NUC or any other SMB servers before? 



Re: [ovirt-users] Postgres errors after updating to 9.5.7 from 9.5.4

2017-06-01 Thread Jamie Lawrence
> On May 24, 2017, at 6:25 AM, supp...@jac-properties.com wrote:
> 
> Which makes sense seeing as that's what Red Hat officially supports.  It just 
> made sense for our infrastructure to put it on our postgres cluster that is 
> running 9.5.x.  Unfortunately things like this happen sometimes when running 
> a slightly unsupported infrastructure.

We have done the same thing. Be aware that it makes upgrades a PITA - the 
installer is picky about the version, as well as various PG config options, and 
refuses to run if you don't satisfy it. I hacked the installer, because that 
was easier than standing up an ancient Postgres just for the upgrade in our 
environment; YMMV.

-j


Re: [ovirt-users] Regenerating SSL keys

2017-05-16 Thread Jamie Lawrence

> On May 14, 2017, at 3:35 AM, Yedidyah Bar David  wrote:

> In addition to Yaniv's explanation below, can you explain why it was
> bad? That is, what software/process was broken by it? Please note that
> this is the CN of the CA's cert, not of the individual certs its signs
> (such as the one for the web server for https) - these have the FQDN
> you supplied to engine-setup as their CN.

You're absolutely right; my apologies for that red herring. I confused myself 
after too long at the keyboard.

>> The 5 random digits are supposed to be OK, and are actually a feature - it
>> ensures uniqueness if you re-generate (most likely reinstall your Engine),
>> as otherwise some browsers fail miserably if a CA cert mismatches what they
>> know.
>> 
>> SAN is being worked on - we are aware of Chrome 58 now requiring it.
>> I sincerely hope to see it in 4.1.2 (see https://bugzilla.redhat.com/1449084
>> ).
> 
> Indeed, and see my comment 5 there for how to add SAN to an existing
> setup, _after_ you upgrade to 4.1.2 when it's out.

Great, that's handy.

> See also:
> 
> https://www.ovirt.org/documentation/how-to/networking/changing-engine-hostname/

Thanks for the pointer! That was the missing piece for me; my Google-fu failed 
to uncover it. I think I have what I need.

Thanks again to both of you,

-j


[ovirt-users] Regenerating SSL keys

2017-05-12 Thread Jamie Lawrence
The key generated by the engine install ended up with a bad CN; it has a 
five-digit number appended to the host name, and no SAN.
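
A minimal sketch of checking a certificate for SAN entries, since some browsers match the host name against the subjectAltName list rather than the CN. It assumes a dict shaped like the one `ssl.SSLSocket.getpeercert()` returns on a verified connection; the sample dicts and host names are hypothetical.

```python
def dns_sans(cert):
    """Return the DNS names listed in the certificate's subjectAltName."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def common_name(cert):
    """Return the subject's commonName, or None if it has none."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Hypothetical certificates in getpeercert() dict form.
cert_with_san = {
    "subject": ((("commonName", "engine.example.com"),),),
    "subjectAltName": (("DNS", "engine.example.com"), ("IP Address", "192.0.2.10")),
}
cert_without_san = {
    "subject": ((("commonName", "engine.example.com"),),),
}

for cert in (cert_with_san, cert_without_san):
    print(common_name(cert), dns_sans(cert))
```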

I've lived with this through setup, but now I'm getting close to prod use, and 
need to clean up so that it is usable for general consumption. And the SPICE 
HTML client is completely busted due to this; that's a problem because we're 
mostly MacOS on the client side, and the Mac Spice client is unusable for 
normal humans. 

I'm wary of attempting to regenerate these manually, as I don't have a handle 
on how the keys are used by the various components.

What is the approved method of regenerating these keys?

-j


Re: [ovirt-users] Hosted engine already imported

2017-05-04 Thread Jamie Lawrence
Does anyone know the answer to the specific question below? I'm not asking for 
workarounds to try; that approach has repeatedly gone nowhere and time is 
becoming an issue for me. If I'm on the wrong track, I'd be happy to hear that, 
too. (Well, not happy, but that would at least be progress.)

- - - - 

I’m wondering if I can do this a different way. Since it is already imported, I 
can create VMs, and the only other warning I’m getting in the logs is 
unrelated, I’m thinking that the problem here is that something didn’t get 
properly set in the DB, and I should be able to fix that manually.

In looking at the code 
(https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/domain/ImportHostedEngineStorageDomainCommand.java),
 it appears, to my (not very deep) read, that the setSucceeded call failed, 
even though AttachStorageDomainToPool worked.

    if (getSucceeded()) {
        AttachStorageDomainToPoolParameters attachSdParams =
                new AttachStorageDomainToPoolParameters(
                        addSdParams.getStorageDomainId(),
                        addSdParams.getStoragePoolId());
        setSucceeded(getBackend().runInternalAction(
                VdcActionType.AttachStorageDomainToPool,
                attachSdParams,
                getContext()).getSucceeded());
    }

    setActionReturnValue(heStorageDomain);

Is there a way to call setSucceeded without hacking together a custom utility? 
Not seeing it in vdsClient --help, which doesn’t surprise me. In looking over 
the stored procedures, I’m also not finding a likely candidate, but that is 
probably because there are so many that I’m just missing it.

Does anyone know what the relevant SP is, or in some other way clue me in on 
the right direction here?  

I realize this is not supposed to be the way to do things. But I’m not finding 
a better solution, and attempting to find a “right” way via questions to this 
list isn’t working either. And of course I take full responsibility when Ovirt kills 
my pets and drinks all my liquor.


Re: [ovirt-users] Hosted engine already imported

2017-05-03 Thread Jamie Lawrence
> 
> On May 2, 2017, at 12:04 AM, Simone Tiraboschi <stira...@redhat.com> wrote:
> 
> 
> 
> On Sat, Apr 29, 2017 at 1:11 AM, Jamie Lawrence <jlawre...@squaretrade.com> 
> wrote:
> I’m wondering if I can do this a different way. Since it is already imported, 
> I can create VMs, and the only other warning I’m getting in the logs is 
> unrelated, I’m thinking that the problem here is that something didn’t get 
> properly set in the DB, and I should be able to fix that manually.

> I realize this is not supposed to be the way to do things. But I’m not 
> finding a better solution, and attempting to find a “right” way via questions 
> to this list isn’t working either. And of course I take full responsibility 
> when Ovirt kills my pets and drinks all my liquor.
> 
> 
> I'd suggest to destroy (but without deleting its content!!!) the 
> hosted-engine storage domain in the engine: the auto-import process should 
> simply trigger again.

OK, I'm in a place where I can try this, but am unsure as to how to do it. 
Selecting the hosted engine SD, it won't allow me to place it in maintenance 
("Error while executing action: Cannot deactivate Storage. The storage selected 
contains the self hosted engine."). Poking around more, the only related action 
I can find that isn't disabled is Gluster stop, which, if it is allowed, would 
seem likely to end badly - if the Gluster volume is stopped, the running 
engine VM won't be accessible, and of course the engine auto-import process 
wouldn't find anything.

So, how does one destroy the hosted engine volume from within the engine?

-j


Re: [ovirt-users] Hosted engine already imported

2017-04-28 Thread Jamie Lawrence
I’m wondering if I can do this a different way. Since it is already imported, I 
can create VMs, and the only other warning I’m getting in the logs is 
unrelated, I’m thinking that the problem here is that something didn’t get 
properly set in the DB, and I should be able to fix that manually.

In looking at the code 
(https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/domain/ImportHostedEngineStorageDomainCommand.java),
 it appears, to my (not very deep) read, that the setSucceeded call failed, 
even though AttachStorageDomainToPool worked.

    if (getSucceeded()) {
        AttachStorageDomainToPoolParameters attachSdParams =
                new AttachStorageDomainToPoolParameters(
                        addSdParams.getStorageDomainId(),
                        addSdParams.getStoragePoolId());
        setSucceeded(getBackend().runInternalAction(
                VdcActionType.AttachStorageDomainToPool,
                attachSdParams,
                getContext()).getSucceeded());
    }

    setActionReturnValue(heStorageDomain);
 
Is there a way to call setSucceeded without hacking together a custom utility? 
Not seeing it in vdsClient --help, which doesn’t surprise me. In looking over 
the stored procedures, I’m also not finding a likely candidate, but that is 
probably because there are so many that I’m just missing it.

Does anyone know what the relevant SP is, or in some other way clue me in on 
the right direction here?  

I realize this is not supposed to be the way to do things. But I’m not finding 
a better solution, and attempting to find a “right” way via questions to this 
list isn’t working either. And of course I take full responsibility when Ovirt 
kills my pets and drinks all my liquor.

-j


Re: [ovirt-users] Hosted engine already imported

2017-04-28 Thread Jamie Lawrence

> On Apr 27, 2017, at 12:12 AM, Simone Tiraboschi <stira...@redhat.com> wrote:
> 
> 
> 
> On Tue, Apr 25, 2017 at 7:53 PM, Jamie Lawrence <jlawre...@squaretrade.com> 
> wrote:
> Hi all,
> 
> In my hopefully-near-complete quest to automate Ovirt configuration in our 
> environment, I’m very close. One outstanding issue that remains is that, even 
> though the hosted engine storage domain actually was imported and shows in 
> the GUI, some part of Ovirt appears to think that hasn’t happened yet.
> 
> Did the engine import it automatically once you had your first regular 
> storage domain, or did you force it somehow?


It was imported automatically. The chain of events was install HE, wait for the 
emails saying it was done cogitating, create the regular Gluster SD, import it. 
Shortly thereafter, the HE domain was imported, and things (storage, hosts, 
clusters) all appear correct in the GUI, but the error persists.

-j

> In the GUI, a periodic error is logged: “The Hosted Engine Storage Domain 
> doesn’t exist. It will be imported automatically…”
> 
> In engine.log, all I’m seeing that appears relevant is:
> 
> 2017-04-25 10:28:57,988-07 INFO  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Lock Acquired to object 
> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
> 2017-04-25 10:28:57,992-07 WARN  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Validation of action 
> 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: 
> VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_ALREADY_EXIST
> 2017-04-25 10:28:57,992-07 INFO  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Lock freed to object 
> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
> 
> Otherwise the log is pretty clean.
> 
> I saw nothing of interest in the Gluster logs or the hosted-engine-ha logs on 
> the host it is running on.
> 
> It appears harmless, but then we aren’t actually using these systems yet, and 
> in any case we don’t want the error spamming the log forever. This is 
> 4.1.1.8-1.el7.centos, hosted engine on Centos 7.6.1311, Gluster for both the 
> hosted engine and general data domains.
> 
> Has anyone seen this before?
> 
> Thanks in advance,
> 
> -j
> 



Re: [ovirt-users] Hosted engine already imported

2017-04-26 Thread Jamie Lawrence

> On Apr 26, 2017, at 1:49 AM, Yaniv Kaul  wrote:

> In my hopefully-near-complete quest to automate Ovirt configuration in our 
> environment, I’m very close. One outstanding issue that remains is that, even 
> though the hosted engine storage domain actually was imported and shows in 
> the GUI, some part of Ovirt appears to think that hasn’t happened yet.
> 
> We'll be happy to hear, once complete, how you've achieved this automation.
> Note that there are several implementations already doing this successfully. 
> See [1] for one.

Thanks, I looked at that. Our implementation is pretty site-specific, but 
briefly, the Ovirt part is just a set of Puppet classes and some scripts for 
glue. The site specific stuff does things like driving our Gluster config, 
dumping the DB/loading into our PG cluster[1] and reconfiguring, hooking in to 
monitoring, replacing the Apache SSL cert, etc.

The basic stuff isn’t all that different than one of the generic Puppetforge 
modules, and if you don’t have our exact environment and requirements, the rest 
of it wouldn’t be useful.

-j

Still hoping for a response to the original question…


> One outstanding issue that remains is that, even though the hosted engine 
> storage domain actually was imported and shows in the GUI, some part of Ovirt 
> appears to think that hasn’t happened yet.
> 
> In the GUI, a periodic error is logged: “The Hosted Engine Storage Domain 
> doesn’t exist. It will be imported automatically…”
> 
> In engine.log, all I’m seeing that appears relevant is:
> 
> 2017-04-25 10:28:57,988-07 INFO  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Lock Acquired to object 
> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
> 2017-04-25 10:28:57,992-07 WARN  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Validation of action 
> 'ImportHostedEngineStorageDomain' failed for user SYSTEM. Reasons: 
> VAR__ACTION__ADD,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_ALREADY_EXIST
> 2017-04-25 10:28:57,992-07 INFO  
> [org.ovirt.engine.core.bll.storage.domain.ImportHostedEngineStorageDomainCommand]
>  (org.ovirt.thread.pool-6-thread-9) [1e44dde0] Lock freed to object 
> 'EngineLock:{exclusiveLocks='[]', sharedLocks='null'}'
> 
> Otherwise the log is pretty clean.
> 
> I saw nothing of interest in the Gluster logs or the hosted-engine-ha logs on 
> the host it is running on.
> 
> It appears harmless, but then we aren’t actually using these systems yet, and 
> in any case we don’t want the error spamming the log forever. This is 
> 4.1.1.8-1.el7.centos, hosted engine on Centos 7.6.1311, Gluster for both the 
> hosted engine and general data domains.


-j

[1] We are a DB heavy shop and want our DBAs managing backups and whatnot, and 
the installer barfs because we aren’t running an ancient PG version and tune 
our DBs differently than the absurdly-mandatory Ovirt defaults, so this dance 
is painful.


Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-21 Thread Jamie Lawrence

> On Apr 21, 2017, at 6:38 AM, knarra <kna...@redhat.com> wrote:
> 
> On 04/21/2017 06:34 PM, Jamie Lawrence wrote:
>>> On Apr 20, 2017, at 10:36 PM, knarra <kna...@redhat.com> wrote:
>>>> The installer claimed it did, but I believe it didn’t. Below the error 
>>>> from my original email, there’s the below (apologies for not including it 
>>>> earlier; I missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the 
>>>> HE - I’m pretty sure it is complaining about itself. (In any case, I 
>>>> verified that there are no other VMs running with both virsh and 
>>>> vdsClient.)
>> ^^^
>> 
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
>>>> late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
>>>> 2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
>>>> runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
>>>> [u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
>>>> 2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
>>>> runvm._late_setup:91 The following VMs have been found: 
>>>> 04ff4cf1-135a-4918-9a1f-8023322f89a3
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
>>>> exception
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
>>>> _executeMethod
>>>> method['method']()
>>>>   File 
>>>> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
>>>>  line 95, in _late_setup
>>>> _('Cannot setup Hosted Engine with other VMs running')
>>>> RuntimeError: Cannot setup Hosted Engine with other VMs running
>>>> 2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed 
>>>> to execute stage 'Environment setup': Cannot setup Hosted Engine with 
>>>> other VMs running
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 
>>>> ENVIRONMENT DUMP - BEGIN
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>>>> BASE/error=bool:'True'
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>>>> BASE/exceptionInfo=list:'[(, 
>>>> RuntimeError('Cannot setup Hosted Engine with other VMs running',), 
>>>> )]'
>>>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 
>>>> ENVIRONMENT DUMP - END
>>> James, generally this issue happens when the setup failed once and you 
>>> tried re running it again.  Can you clean it and deploy it again?  HE 
>>> should come up successfully. Below are the steps for cleaning it up.
>> Knarra,
>> 
>> I realize that. However, that is not the situation in my case. See above, at 
>> the mark - the UUID it is complaining about is the UUID of the hosted-engine 
>> it just installed. From the answers file generated from the run (whole thing 
>> below):
>> 
>>>>>> OVEHOSTED_VM/vmUUID=str:04ff4cf1-135a-4918-9a1f-8023322f89a3
>> Also see the WARNs I mentioned previously, quoted below. Excerpt:
>> 
>>>>>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
>>>>>> File: 
>>>>>> /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.com.redhat.rhevm.vdsm
>>>>>>  already removed
>>>>>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
>>>>>> File: 
>>>>>> /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.org.qemu.guest_agent.0
>>>>>>  already removed
>>>>>> Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
>>>>>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect 
>>>>>> to broker, the number of errors has exceeded the limit (1)
>> I’m not clear on what it is attempting to do there, but it seems relevant.
> I remember that you said HE vm was not started when the installation was 
> successful. Is Local Maintenance enabled on that host?
> 
> can you please check if the services 'ovirt-ha-agent' and 'ovirt-ha-broker' 
> running fine and try to restart them once ?

Agent and broker logs from before are down in the original message quoting.  
They’re running, but not fine.

[root@sc5-ovirt-2 jlawrence]# ps ax|grep ha-
130599 ?        Ssl    3:52 /usr/bin/python /u

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-21 Thread Jamie Lawrence

> On Apr 20, 2017, at 10:36 PM, knarra  wrote:

>> The installer claimed it did, but I believe it didn’t. Below the error from 
>> my original email, there’s the below (apologies for not including it 
>> earlier; I missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE 
>> - I’m pretty sure it is complaining about itself. (In any case, I verified 
>> that there are no other VMs running with both virsh and vdsClient.)

^^^ 

>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
>> late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
>> 2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
>> runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
>> [u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
>> 2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
>> runvm._late_setup:91 The following VMs have been found: 
>> 04ff4cf1-135a-4918-9a1f-8023322f89a3
>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
>> exception
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
>>     method['method']()
>>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py", line 95, in _late_setup
>>     _('Cannot setup Hosted Engine with other VMs running')
>> RuntimeError: Cannot setup Hosted Engine with other VMs running
>> 2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
>> execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
>> running
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 
>> ENVIRONMENT DUMP - BEGIN
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>> BASE/error=bool:'True'
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>> BASE/exceptionInfo=list:'[(, 
>> RuntimeError('Cannot setup Hosted Engine with other VMs running',), 
>> )]'
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 
>> ENVIRONMENT DUMP - END
> James, generally this issue happens when the setup failed once and you tried 
> re running it again.  Can you clean it and deploy it again?  HE should come 
> up successfully. Below are the steps for cleaning it up.

Knarra,

I realize that. However, that is not the situation in my case. See above, at 
the mark - the UUID it is complaining about is the UUID of the hosted-engine it 
just installed. From the answers file generated from the run (whole thing 
below):

>>>> OVEHOSTED_VM/vmUUID=str:04ff4cf1-135a-4918-9a1f-8023322f89a3

Also see the WARNs I mentioned previously, quoted below. Excerpt:

>>>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.com.redhat.rhevm.vdsm already removed
>>>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.org.qemu.guest_agent.0 already removed
>>>> Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)

I’m not clear on what it is attempting to do there, but it seems relevant.

I know there is no failed install left on the gluster volume, because when I 
attempt an install, part of my scripted prep process is deleting and recreating 
the Gluster volume. The below instructions are more or less what I’m doing 
already in a script[1]. (the gluster portion of the script process is: stop the 
volume, delete the volume, remove the mount point directory to avoid Gluster’s 
xattr problem with recycling directories, recreate the directory, change perms, 
create the volume, start the volume, set Ovirt-recc’ed volume options.)
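The volume reset sequence described above can be sketched as an ordered plan. This is only an illustration, not the author's script: the volume name, hosts and brick paths are placeholders, the "mount point directory" is read here as the brick directory, and the ownership value (36:36) and the option list (`group virt`, `storage.owner-uid`, `storage.owner-gid`) are assumptions about what "change perms" and the recommended options mean. The function builds the commands without running them, so they can be reviewed first.

```python
def gluster_reset_plan(volume, bricks, replica=3):
    """Return (host, argv) steps for a stop/delete/recreate cycle, in order.

    host is None for steps that can run on any one Gluster node; otherwise it
    names the node that owns the brick directory. Nothing is executed here.
    """
    gluster = ["gluster", "--mode=script"]  # script mode: no interactive prompts
    plan = [
        (None, gluster + ["volume", "stop", volume]),
        (None, gluster + ["volume", "delete", volume]),
    ]
    for host, path in bricks:
        # Removing the directory also drops the leftover extended attributes
        # that make Gluster refuse a recycled brick path.
        plan.append((host, ["rm", "-rf", path]))
        plan.append((host, ["mkdir", "-p", path]))
        plan.append((host, ["chown", "36:36", path]))  # assumption: vdsm:kvm
    brick_args = ["%s:%s" % (host, path) for host, path in bricks]
    plan.append(
        (None, gluster + ["volume", "create", volume, "replica", str(replica)] + brick_args)
    )
    plan.append((None, gluster + ["volume", "start", volume]))
    # Assumption: the recommended options are the virt group profile plus
    # vdsm ownership; the real script may set a different list.
    plan.append((None, gluster + ["volume", "set", volume, "group", "virt"]))
    plan.append((None, gluster + ["volume", "set", volume, "storage.owner-uid", "36"]))
    plan.append((None, gluster + ["volume", "set", volume, "storage.owner-gid", "36"]))
    return plan
```

Keeping the plan as data makes the ordering explicit (the directory is recreated before `volume create`, and options are set after `volume start`, matching the order in the text) and lets a wrapper decide how to run each step on the right host.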

-j

[1] We have a requirement for automated setup of all production resources, so 
all of this ends up being scripted.

> 1) vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 destroy
> 
> 2) stop the volume and delete all the information inside the bricks from all 
> the hosts
> 
> 3) try to umount storage from /rhev/data-center/mnt/ - umount -f 
> /rhev/data-center/mnt/  if it is mounted
> 
> 4) remove all dirs from /rhev/data-center/mnt/ - rm -rf 
> /rhev/data-center/mnt/*
> 
> 5) start  volume again and start the deployment.
> 
> Thanks
> kasturi
>> 
>> 
>>>> If I start it manually, the default DC is down, the default cluster has 
>>>> the installation host in the cluster,  there is no storage, and the VM 
>>>> doesn’t show up in the GUI. In this install run, I have not yet started 
>>>> the engine manually.
>>> you wont be seeing HE vm until HE storage is imported into the UI. HE 
>>> storage will be automatically imported into the UI (which will import HE vm 
>>> too )once a 

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker (revised)

2017-04-20 Thread Jamie Lawrence

> On Apr 20, 2017, at 9:18 AM, Simone Tiraboschi  wrote:

> Could you please share the output of 
>   sudo -u vdsm sudo service sanlock status

That command line prompts for vdsm’s password, which it doesn’t have. But 
output returned as root is below. Is that ‘operation not permitted’ related?

Thanks,

-j

[root@sc5-ovirt-2 jlawrence]# service sanlock status
Redirecting to /bin/systemctl status  sanlock.service
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor 
preset: disabled)
   Active: active (running) since Wed 2017-04-19 16:56:40 PDT; 17h ago
  Process: 16764 ExecStart=/usr/sbin/sanlock daemon (code=exited, 
status=0/SUCCESS)
 Main PID: 16765 (sanlock)
   CGroup: /system.slice/sanlock.service
   ├─16765 /usr/sbin/sanlock daemon
   └─16766 /usr/sbin/sanlock daemon

Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com systemd[1]: Starting Shared Storage 
Lease Manager...
Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com systemd[1]: Started Shared Storage 
Lease Manager.
Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com sanlock[16765]: 2017-04-19 
16:56:40-0700 482 [16765]: set scheduler RR|RESET_ON_FORK priority 99 failed: 
Operation not permitted



Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-20 Thread Jamie Lawrence

> On Apr 19, 2017, at 11:35 PM, knarra <kna...@redhat.com> wrote:
> 
> On 04/20/2017 03:15 AM, Jamie Lawrence wrote:
>> I trialed installing the hosted engine, following the instructions at  
>> http://www.ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/
>>   . This is using Gluster as the backend storage subsystem.
>> 
>> Answer file at the end.
>> 
>> Per the docs,
>> 
>> "When the hosted-engine deployment script completes successfully, the oVirt 
>> Engine is configured and running on your host. The Engine has already 
>> configured the data center, cluster, host, the Engine virtual machine, and a 
>> shared storage domain dedicated to the Engine virtual machine.”
>> 
>> In my case, this is false. The installation claims success, but  the hosted 
>> engine VM stays stopped, unless I start it manually.
> During the install process there is a step where HE vm is stopped and 
> started. Can you check if this has happened correctly ?

The installer claimed it did, but I believe it didn’t. Below the error from my 
original email, there’s the below (apologies for not including it earlier; I 
missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE - I’m pretty 
sure it is complaining about itself. (In any case, I verified that there are no 
other VMs running with both virsh and vdsClient.)

2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
[u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:91 The following VMs have been found: 
04ff4cf1-135a-4918-9a1f-8023322f89a3
2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py", line 95, in _late_setup
    _('Cannot setup Hosted Engine with other VMs running')
RuntimeError: Cannot setup Hosted Engine with other VMs running
2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
running
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT 
DUMP - BEGIN
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/error=bool:'True'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/exceptionInfo=list:'[(, 
RuntimeError('Cannot setup Hosted Engine with other VMs running',), )]'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT 
DUMP - END
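The reasoning above (the only VM the installer objects to is the hosted engine itself) can be checked mechanically. The following is a minimal sketch, assuming the answers file carries a line of the form `OVEHOSTED_VM/vmUUID=str:<uuid>` and that the setup log has been unwrapped so each record sits on one line; the function names are made up for illustration.

```python
import re

UUID_RE = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"


def he_uuid_from_answers(answers_text):
    """Return the hosted-engine VM UUID from the answers file, or None."""
    m = re.search(
        r"^OVEHOSTED_VM/vmUUID=str:(" + UUID_RE + r")\s*$",
        answers_text,
        re.MULTILINE,
    )
    return m.group(1) if m else None


def vms_reported_running(log_text):
    """Collect UUIDs listed after 'The following VMs have been found:'."""
    found = set()
    pattern = r"The following VMs have been found:\s*((?:" + UUID_RE + r"[,\s]*)+)"
    for m in re.finditer(pattern, log_text):
        found.update(re.findall(UUID_RE, m.group(1)))
    return found


def only_self_detected(answers_text, log_text):
    """True when the only VM the installer found is the hosted engine itself."""
    he = he_uuid_from_answers(answers_text)
    return he is not None and vms_reported_running(log_text) == {he}
```

A `True` result supports the reading that the installer is tripping over its own VM rather than a leftover from an earlier run; it does not explain why the VM is still registered.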


>> If I start it manually, the default DC is down, the default cluster has the 
>> installation host in the cluster,  there is no storage, and the VM doesn’t 
>> show up in the GUI. In this install run, I have not yet started the engine 
>> manually.
> you wont be seeing HE vm until HE storage is imported into the UI. HE storage 
> will be automatically imported into the UI (which will import HE vm too )once 
> a master domain is present .

Sure; I’m just attempting to provide context.

>> I assume this is related to the errors in ovirt-hosted-engine-setup.log, 
>> below. (The timestamps are confusing; it looks like the Python errors are 
>> logged some time after they’re captured or something.) The HA broker and 
>> agent logs just show them looping in the sequence below.
>> 
>> Is there a decent way to pick this up and continue? If not, how do I make 
>> this work?
> Can you please check the following things.
> 
> 1) is glusterd running on all the nodes ? 'systemctl status glusterd'
> 2) Are you able to connect to your storage server which is ovirt_engine in 
> your case.
> 3) Can you check if all the brick process in the volume is up ?


1) Verified that glusterd is running on all three nodes.

2) 
[root@sc5-thing-1]# mount -tglusterfs sc5-gluster-1:/ovirt_engine 
/mnt/ovirt_engine
[root@sc5-thing-1]# df -h
Filesystem  Size  Used Avail Use% Mounted on
[…]
sc5-gluster-1:/ovirt_engine 300G  2.6G  298G   1% /mnt/ovirt_engine


3)
[root@sc5-gluster-1 jlawrence]# gluster volume status
Status of volume: ovirt_engine
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick sc5-gluster-1:/gluster-bricks/ovirt_e
ngine/ovirt_engine-149217 0  Y  

[ovirt-users] Hosted engine install failed; vdsm upset about broker (revised)

2017-04-19 Thread Jamie Lawrence

So, tracing this further, I’m pretty sure this is something about sanlock. 

As best I can tell this[1]  seems to be the failure that is blocking importing 
the pool, creating storage domains, importing the HE, etc. Contrary to the log, 
sanlock is running; I verified it starts on system-boot and restarts just fine.

I found one reference to someone having a similar problem in 3.6, but that 
appeared to have been a permission issue I’m not afflicted with.

How can I move past this? 

TIA, 

-j


[1] agent.log:
MainThread::WARNING::2017-04-19 
17:07:13,537::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
 Restarting agent, attempt '6'
MainThread::INFO::2017-04-19 
17:07:13,567::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
 Found certificate common name: sc5-ovirt-2.squaretrade.com
MainThread::INFO::2017-04-19 
17:07:13,569::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
 Initializing VDSM
MainThread::INFO::2017-04-19 
17:07:16,044::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
 Connecting the storage
MainThread::INFO::2017-04-19 
17:07:16,045::storage_server::219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
 Connecting storage server
MainThread::INFO::2017-04-19 
17:07:20,876::storage_server::226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
 Connecting storage server
MainThread::INFO::2017-04-19 
17:07:20,893::storage_server::233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
 Refreshing the storage domain
MainThread::INFO::2017-04-19 
17:07:21,160::hosted_engine::657::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
 Preparing images
MainThread::INFO::2017-04-19 
17:07:21,160::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images)
 Preparing images
MainThread::INFO::2017-04-19 
17:07:23,954::hosted_engine::660::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images)
 Refreshing vm.conf
MainThread::INFO::2017-04-19 
17:07:23,955::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
 Reloading vm.conf from the shared storage domain
MainThread::INFO::2017-04-19 
17:07:23,955::config::412::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
 Trying to get a fresher copy of vm configuration from the OVF_STORE
MainThread::WARNING::2017-04-19 
17:07:26,741::ovf_store::107::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
 Unable to find OVF_STORE
MainThread::ERROR::2017-04-19 
17:07:26,744::config::450::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
 Unable to identify the OVF_STORE volume, falling back to initial vm.conf. 
Please ensure you already added your first data domain for regular VMs
MainThread::INFO::2017-04-19 
17:07:26,770::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
 Initializing ha-broker connection
MainThread::INFO::2017-04-19 
17:07:26,771::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Starting monitor ping, options {'addr': '10.181.26.1'}
MainThread::INFO::2017-04-19 
17:07:26,774::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Success, id 140621269798096
MainThread::INFO::2017-04-19 
17:07:26,774::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': 
'ovirtmgmt', 'address': '0'}
MainThread::INFO::2017-04-19 
17:07:26,791::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Success, id 140621269798544
MainThread::INFO::2017-04-19 
17:07:26,792::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
MainThread::INFO::2017-04-19 
17:07:26,793::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Success, id 140621269798224
MainThread::INFO::2017-04-19 
17:07:26,794::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid': 
'04ff4cf1-135a-4918-9a1f-8023322f89a3', 'address': '0'}
MainThread::INFO::2017-04-19 
17:07:26,796::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Success, id 140621269796816
MainThread::INFO::2017-04-19 
17:07:26,796::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
 Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 
'04ff4cf1-135a-4918-9a1f-8023322f89a3', 'address': '0'}
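To pull only the warnings and errors out of an agent.log like the one above, a small parser helps. This sketch assumes the record layout seen in the excerpt (`Thread::LEVEL::timestamp::module::line::logger::(function) message`) and that each record is on a single line, as it is in the file itself rather than in this wrapped copy; the names `AgentRecord`, `parse_agent_line` and `problems` are illustrative.

```python
import re
from typing import NamedTuple


class AgentRecord(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    func: str
    message: str


_LINE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)"
    r"::(?P<timestamp>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d+)"
    r"::(?P<module>\w+)::(?P<lineno>\d+)::(?P<logger>[\w.]+)"
    r"::\((?P<func>\w+)\)\s*(?P<message>.*)$"
)


def parse_agent_line(line):
    """Parse one unwrapped agent.log record; return None if it does not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["lineno"] = int(fields["lineno"])
    return AgentRecord(**fields)


def problems(lines, levels=("WARNING", "ERROR")):
    """Return the parsed records whose level is in `levels`."""
    parsed = (parse_agent_line(line) for line in lines)
    return [rec for rec in parsed if rec is not None and rec.level in levels]
```

Run over the excerpt above, this would leave the agent restart warning and the two OVF_STORE records, which are the only non-INFO lines shown.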

[ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-19 Thread Jamie Lawrence
I trialed installing the hosted engine, following the instructions at
http://www.ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/ .
This is using Gluster as the backend storage subsystem.

Answer file at the end.

Per the docs, 

"When the hosted-engine deployment script completes successfully, the oVirt 
Engine is configured and running on your host. The Engine has already 
configured the data center, cluster, host, the Engine virtual machine, and a 
shared storage domain dedicated to the Engine virtual machine."

In my case, this is false. The installation claims success, but  the hosted 
engine VM stays stopped, unless I start it manually. If I start it manually, 
the default DC is down, the default cluster has the installation host in the 
cluster,  there is no storage, and the VM doesn’t show up in the GUI. In this 
install run, I have not yet started the engine manually.

I assume this is related to the errors in ovirt-hosted-engine-setup.log, below. 
(The timestamps are confusing; it looks like the Python errors are logged some 
time after they’re captured or something.) The HA broker and agent logs just 
show them looping in the sequence below.

Is there a decent way to pick this up and continue? If not, how do I make this 
work? 

Thanks,

-j

- - - - ovirt-hosted-engine-setup.log snippet: - - - - 

2017-04-19 12:29:55 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.system.vdsmenv.Plugin._late_setup
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
systemd.status:90 check service vdsmd status
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:813 execute: ('/bin/systemctl', 'status', 'vdsmd.service'), 
executable='None', cwd='None', env=None
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'status', 
'vdsmd.service'), rc=0
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:921 execute-output: ('/bin/systemctl', 'status', 
'vdsmd.service') stdout:
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor 
preset: enabled)
   Active: active (running) since Wed 2017-04-19 12:26:59 PDT; 2min 55s ago
  Process: 67370 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh 
--post-stop (code=exited, status=0/SUCCESS)
  Process: 69995 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh 
--pre-start (code=exited, status=0/SUCCESS)
 Main PID: 70062 (vdsm)
   CGroup: /system.slice/vdsmd.service
   └─70062 /usr/bin/python2 /usr/share/vdsm/vdsm

Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to 
broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed 
to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:15 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to 
broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:15 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed 
to retrieve Hosted Engine HA info
 Traceback (most recent 
call last):
   File 

Re: [ovirt-users] virsh list

2017-04-14 Thread Jamie Lawrence
I forget where I found it, likely somewhere in the code. But:

u: vdsm@ovirt
p: shibboleth

-j

> On Apr 14, 2017, at 3:58 PM, Konstantin Raskoshnyi  wrote:
> 
> Hi guys
> 
> I'm trying to run virsh list (or any other virsh commands)
> 
> virsh list
> Please enter your authentication name: admin
> Please enter your password:
> error: failed to connect to the hypervisor
> error: authentication failed: authentication failed
> 
> But I have no clue about login:password ovirt uses. 
> I tried admin password, also tried to create new account with saslpasswd2
> 
> Which didn't work either.
> 



Re: [ovirt-users] Hosted engine setup shooting dirty pool

2017-04-12 Thread Jamie Lawrence

> On Apr 12, 2017, at 1:31 AM, Evgenia Tokar  wrote:
> 
> Hi Jamie, 
> 
> Are you trying to setup hosted engine using the "hosted-engine --deploy" 
> command, or are you trying to migrate existing he vm? 
>  
> For hosted engine setup you need to provide a clean storage domain, which is 
> not a part of your 4.1 setup, this storage domain will be used for the hosted 
> engine and will be visible in the UI once the deployment of the hosted engine 
> is complete.
> If your storage domain appears in the UI it means that it is already 
> connected to the storage pool and is not "clean”.

Hi Jenny,

Thanks for the response.

I’m using `hosted-engine --deploy`, yes. (Actually, the last few attempts have 
been with an answerfile, but the responses are the same.)

I think I may have been unclear. I understand that it wants an unmolested SD. 
There just doesn’t seem to be a path to provide that with an Ovirt-managed 
Gluster cluster.

Put differently: how do I make Ovirt/VDSM ignore a newly created gluster SD so 
that `hosted-engine` can pick it up? I don’t see any option to tell the Gluster 
cluster not to auto-discover, or anything similar, so as soon as I create it, 
the non-hosted engine picks it up. This happens within seconds; I vainly tried 
to time it with running the installer.

This is why I mentioned dismissing the idea of using another Gluster 
installation, unattached to Ovirt. That’s the only way I could think of to give 
it a clean pool. (I dismissed it because I can’t run this in production with 
that sort of dependency.)

Do I need to take this Gluster cluster out of Ovirt control (delete the Gluster 
cluster from the Ovirt GUI, recreate outside of Ovirt manually), install on to 
that, and then re-associate it in the GUI or something similar?

-j


[ovirt-users] Hosted engine setup shooting dirty pool

2017-04-11 Thread Jamie Lawrence
Or at least, refusing to mount a dirty pool.

I have 4.1 set up, configured and functional, currently wired up with two VM 
hosts and three Gluster hosts. It is configured with a (temporary) NFS data 
storage domain, with the end-goal being two data domains on Gluster; one for 
the hosted engine, one for other VMs.

The issue is that `hosted-engine` sees any gluster volumes offered as dirty. (I 
have been creating them via the command line  right before attempting the 
hosted-engine migration; there is nothing in them at that stage.)  I *think* 
what is happening is that ovirt-engine notices a newly created volume and has 
its way with the volume (visible in the GUI; the volume appears in the list), 
and the hosted-engine installer becomes upset about that. What I don’t know is 
what to do about it. Relevant log lines below. The installer almost sounds like 
it is asking me to remove the UUID-directory and whatnot, but I’m pretty sure 
that’s just going to leave me with two problems instead of fixing the first 
one. I’ve considered attempting to wire this together in the DB, which also 
seems like a great way to break things. I’ve even thought of using a Gluster 
installation that Ovirt knows nothing about, mainly as an experiment to see if 
it would even work, but decided it doesn’t matter, because I can’t deploy in 
that state anyway and it doesn’t actually get me any closer to getting this 
working.

I noticed several bugs in the tracker seemingly related, but the bulk of those 
were for past versions and I saw nothing that seemed actionable from my end in 
the others.

So, can anyone spare a clue as to what is going wrong, and what to do about 
that?

-j

- - - - ovirt-hosted-engine-setup.log - - - - 

2017-04-11 16:14:39 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:408 connectStorageServer
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:475 {'status': {'message': 'Done', 'code': 0}, 
'items': [{u'status': 0, u'id': u'890e82cf-5570-4507-a9bc-c610584dea6e'}]}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:502 {'status': {'message': 'Done', 'code': 0}, 
'items': [{u'status': 0, u'id': u'cd1a1bb6-e607-4e35-b815-1fd88b84fe14'}]}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:794 _check_existing_pools
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:795 getConnectedStoragePoolsList
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:797 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:40 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:956 Creating Storage Domain
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:513 createStorageDomain
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:547 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:549 {'status': {'message': 'Done', 'code': 0}, 
u'mdasize': 0, u'mdathreshold': True, u'mdavalid': True, u'diskfree': 
u'321929216000', u'disktotal': u'321965260800', u'mdafree': 0}
2017-04-11 16:14:40 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:959 Creating Storage Pool
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:553 createFakeStorageDomain
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:570 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:572 {'status': {'message': 'Done', 'code': 0}, 
u'mdasize': 0, u'mdathreshold': True, u'mdavalid': True, u'diskfree': 
u'1933930496', u'disktotal': u'2046640128', u'mdafree': 0}
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:587 createStoragePool
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:627 
createStoragePool(args=[storagepoolID=9e399f0c-7c4b-4131-be79-922dda038383, 
name=hosted_datacenter, masterSdUUID=9a5c302b-2a18-4c7e-b75d-29088299988c, 
masterVersion=1, domainList=['9a5c302b-2a18-4c7e-b75d-29088299988c', 
'f26efe61-a2e1-4a85-a212-269d0a047e07'], lockRenewalIntervalSec=None, 
leaseTimeSec=None, ioOpTimeoutSec=None, leaseRetries=None])
2017-04-11 16:15:29 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:640 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:15:29 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:962 Connecting Storage Pool
2017-04-11 16:15:29 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storagePoolConnection:717 connectStoragePool
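The VDSM replies in a setup log like this one are printed as Python dict literals, so they can be scanned for non-zero status codes. This is a rough sketch, assuming unwrapped log lines and that a reply is everything from the first `{` to the end of the line; `vdsm_reply` and `failed_calls` are illustrative names, not part of otopi or VDSM.

```python
import ast


def vdsm_reply(line):
    """Return the dict printed at the end of a log line, or None if absent."""
    start = line.find("{")
    if start < 0:
        return None
    try:
        value = ast.literal_eval(line[start:].strip())
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None


def failed_calls(lines):
    """Return the log lines whose reply carries a non-zero status code."""
    bad = []
    for line in lines:
        reply = vdsm_reply(line)
        if reply is not None and reply.get("status", {}).get("code", 0) != 0:
            bad.append(line)
    return bad
```

Every reply in the excerpt above reports code 0, so a scan like this would return nothing for it; whatever the installer objects to would have to appear after the `connectStoragePool` call where the excerpt ends.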

[ovirt-users] Hosted engine setup shooting dirty pool

2017-04-11 Thread Jamie Lawrence
Or at least, refusing to mount a dirty pool. I’m having trouble getting the 
hosted engine installed.

I have 4.1 set up, configured and functional, currently wired up with two VM 
hosts and three Gluster hosts. It is configured with a (temporary) NFS data 
storage domain, with the end-goal being two data domains on Gluster; one for 
the hosted engine, one for other VMs.

The issue is that `hosted-engine` sees any gluster volumes offered as dirty. (I 
have been creating them via the command line  right before attempting the 
hosted-engine migration; there is nothing in them at that stage.)  I *think* 
what is happening is that ovirt-engine notices a newly created volume and has 
its way with the volume (visible in the GUI; the volume appears in the list), 
and the hosted-engine installer becomes upset about that. What I don’t know is 
what to do about that. Relevant log lines below. The installer almost sounds 
like it is asking me to remove the UUID-directory and whatnot, but I’m pretty 
sure that’s just going to leave me with two problems instead of fixing the 
first one. I’ve considered attempting to wire this together in the DB, which 
also seems like a great way to break things. I’ve even thought of using a 
Gluster cluster that Ovirt knows nothing about, mainly as an experiment to see 
if it would even work, but decided it doesn’t especially matter, as 
architecturally that would not work for production in our environment and I 
just need to get this up.

So, can anyone spare a clue as to what is going wrong, and what to do about 
that?

-j

- - - - ovirt-hosted-engine-setup.log - - - - 

2017-04-11 16:14:39 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:408 connectStorageServer
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:475 {'status': {'message': 'Done', 'code': 0}, 
'items': [{u'status': 0, u'id': u'890e82cf-5570-4507-a9bc-c610584dea6e'}]}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storageServerConnection:502 {'status': {'message': 'Done', 'code': 0}, 
'items': [{u'status': 0, u'id': u'cd1a1bb6-e607-4e35-b815-1fd88b84fe14'}]}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:794 _check_existing_pools
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:795 getConnectedStoragePoolsList
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._check_existing_pools:797 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:40 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:956 Creating Storage Domain
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:513 createStorageDomain
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:547 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStorageDomain:549 {'status': {'message': 'Done', 'code': 0}, 
u'mdasize': 0, u'mdathreshold': True, u'mdavalid': True, u'diskfree': 
u'321929216000', u'disktotal': u'321965260800', u'mdafree': 0}
2017-04-11 16:14:40 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:959 Creating Storage Pool
2017-04-11 16:14:40 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:553 createFakeStorageDomain
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:570 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createFakeStorageDomain:572 {'status': {'message': 'Done', 'code': 0}, 
u'mdasize': 0, u'mdathreshold': True, u'mdavalid': True, u'diskfree': 
u'1933930496', u'disktotal': u'2046640128', u'mdafree': 0}
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:587 createStoragePool
2017-04-11 16:14:41 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:627 
createStoragePool(args=[storagepoolID=9e399f0c-7c4b-4131-be79-922dda038383, 
name=hosted_datacenter, masterSdUUID=9a5c302b-2a18-4c7e-b75d-29088299988c, 
masterVersion=1, domainList=['9a5c302b-2a18-4c7e-b75d-29088299988c', 
'f26efe61-a2e1-4a85-a212-269d0a047e07'], lockRenewalIntervalSec=None, 
leaseTimeSec=None, ioOpTimeoutSec=None, leaseRetries=None])
2017-04-11 16:15:29 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._createStoragePool:640 {'status': {'message': 'Done', 'code': 0}}
2017-04-11 16:15:29 INFO otopi.plugins.gr_he_setup.storage.storage 
storage._misc:962 Connecting Storage Pool
2017-04-11 16:15:29 DEBUG otopi.plugins.gr_he_setup.storage.storage 
storage._storagePoolConnection:717 connectStoragePool
2017-04-11 16:15:29 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
  

Re: [ovirt-users] Remote PostgreSQL 9.5 (was: Answer file key for "nonlocal postgres")

2017-03-30 Thread Jamie Lawrence

> On Mar 27, 2017, at 10:42 PM, Yedidyah Bar David  wrote:
>> I know what I was doing is unsupported. I was wandering down the wrong 
>> troubleshooting path for a bit there, but I think ultimately what I need is 
>> also unsupported.
>> 
>> It was because I was trying to push this into our extant DB infrastructure, 
>> which is PG 9.5. Which I found doesn’t work with a local-install, either. (I 
>> was thinking it would work due to past experience with things that demand an 
>> old Postgres; IME, PG generally has pretty solid forward-compatibility.)
> 
> In "local-install" you mean on the engine machine?
> 
> From RPMs?
> Instead of the OS-packaged PG (and not in parallel)?
> Were its binaries in /usr/bin (and not some private path)?

Should have provided clearer info. By local install, I mean the machine the 
engine is being installed on. The PG installs I tried were all from RPMs. 9.5 
was from the PGDG archive (the installer refuses to run against it); 9.2, I 
didn’t look, but assume that was either the Ovirt archive or from Centos 
upstream (the installer is happy with it). I just checked, and Postgres client 
binaries (pg_dump, psql, etc.) were installed in /bin; not sure why the package 
would do that.

> If answers for all of the above are 'Yes', then please
> share setup logs, perhaps preferably by opening a bugzilla
> RFE and attaching them there. It's rather likely that
> whatever problems you had are quite easy to solve.
> 
> Otherwise, please try that, or see the bug(s) below for
> a discussion about this.

Sorry, I no longer have that log. I’ve probably run 70-80 iterations of the 
installation, and cleaned up a few times.

Unfortunately, the speed with which I’ve had to do this hasn't matched well 
with asking for help. I’ve solved my DB problems (although I’m well off into an 
entirely manual configuration now), and am currently fighting with LDAP setup.  
(When it works, works fine, in that it authenticates, returns the right data, 
etc. It just times out on connect about 90% of the time, and is the only one of 
a diverse set of LDAP clients to do so.) 

>> So that leads me to my next question: if I install under the supported  
>> version and dump/load/reconfigure to PG9.5.3, is anyone aware of any actual 
>> problems (other than lack of official support)?
> 
> I personally didn't yet reach that point to be able to tell about,
> nor do I know about others that did, but see below.

FTR, as best I can tell so far, there’s no issue with 9.5.3, once you get it 
working. However, at least in my experience, you won’t get it working with the 
installer. I also tried the installer-facilitated migration from local to 
remote at one point; it blew up and died telling me to downgrade our cluster, 
change a bunch of vacuum-related config variables(!), and I think there was 
something else it was fussing about - really ticky operational details the 
installer has no business refusing to run over. (There is simply no way the 
installer has a better idea of what optimum vacuum settings are for our 
hardware, environment and load than our DBAs do.)

What I did, which is working:

- Let the installer do as it pleases with PG9.2 on the engine machine. 
- pg_dump […] | pg_restore […] to the 9.5 cluster for both DBs.
- Revise as needed:
  /etc/ovirt-engine/aaa/internal.properties
  /etc/ovirt-engine/engine.conf.d/10-database-setup.conf
  /etc/ovirt-engine/engine.conf.d/10-setup-dwh-database.conf
  /etc/ovirt-engine-dwh/ovirt-engine-dwhd.conf.d/10-setup-database.conf
- Shut down local PG, verify everything is using the right DB and working.
- Blow away the local DB, leave the ancient PG and related installed for Ovirt 
to do whatever with.
- (Not done yet) code and document this madness for our Puppet system.

One additional manual config requirement I ran in to is that in the places 
where DB URLs are used (.properties, DB-related .conf files), when enabling 
SSL, the URL needs to have ‘?ssl=true’ appended or it fails to attempt SSL on 
connect in our environment. I assume that’s some driver peculiarity but haven’t 
looked (Most of our DB hosts are Debian, a couple are Ubuntu
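
The URL change described above could be sketched roughly as follows. This is only an illustration: the helper name `with_ssl` and the example host are made up here, and it assumes the value in the config file is an ordinary JDBC URL with an optional query string.

```python
def with_ssl(jdbc_url):
    """Return jdbc_url with ssl=true in its query string.

    An existing ssl=... parameter is left untouched, so calling this
    twice gives the same result as calling it once.
    """
    base, sep, query = jdbc_url.partition("?")
    params = [p for p in query.split("&") if p] if sep else []
    # Only add the parameter if no "ssl" key is present yet.
    if not any(p.split("=", 1)[0] == "ssl" for p in params):
        params.append("ssl=true")
    return base + "?" + "&".join(params)


# Hypothetical URL, for illustration only.
url = "jdbc:postgresql://db.example.com:5432/engine"
print(with_ssl(url))
```

Note that the helper deliberately does not overwrite an explicit `ssl=false`; whether that is the right behaviour depends on the environment.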

> Please see this bug, and the ones it depends on:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1324882
> 
> Almost all of it is relevant for a "vanilla" 9.5.

Thank you; it was helpful in piecing together what needed to happen. 

I do fear upgrades now, especially anything touching the DB. But the installer 
as it works now simply doesn’t support our environment.  I had/am having issues 
with:

 - remote PG 9.5 cluster + required SSL, 
 - using our CA, 
 - OpenLDAP + required SSL,
 - Bonded NICs

I know this is being discussed on the dev list, and I’ve resisted jumping in, 
because I’m not going to be contributing code. But please consider this one 
emphatic vote for adding options to the installer to selectively disable/skip 
parts of the ‘non-core’ config: database, authentication, 

Re: [ovirt-users] Answer file key for "nonlocal postgres"

2017-03-27 Thread Jamie Lawrence
> On Mar 25, 2017, at 10:57 PM, Yedidyah Bar David <d...@redhat.com> wrote:
> 
> On Fri, Mar 24, 2017 at 3:08 AM, Jamie Lawrence
> <jlawre...@squaretrade.com> wrote:

[…]

>> Anyone know what I am missing?
> 
> Probably OVESETUP_PROVISIONING/postgresProvisioningEnabled
> and OVESETUP_DWH_PROVISIONING/postgresProvisioningEnabled .

Appreciate the reply - thanks!
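
For reference, a minimal sketch of how the two keys named above might be rendered alongside the existing ones. The `KEY=type:value` layout is taken from the keys in the original post; setting both provisioning keys to `bool:False` is an assumption (the reply names the keys but not their values), and the helper name and example host are hypothetical.

```python
def answer_line(key, value):
    """Format one answer-file entry as KEY=type:value."""
    if value is None:
        return "%s=none:None" % key
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "%s=bool:%s" % (key, value)
    if isinstance(value, int):
        return "%s=int:%d" % (key, value)
    return "%s=str:%s" % (key, value)


remote_db_keys = [
    ("OVESETUP_DB/host", "db.example.com"),  # hypothetical host
    ("OVESETUP_DB/port", 5435),
    ("OVESETUP_DB/secured", False),
    # Assumed values: disable local PostgreSQL provisioning.
    ("OVESETUP_PROVISIONING/postgresProvisioningEnabled", False),
    ("OVESETUP_DWH_PROVISIONING/postgresProvisioningEnabled", False),
]

lines = [answer_line(k, v) for k, v in remote_db_keys]
print("\n".join(lines))
```

This only generates text; the advice quoted below about producing the answer file from an interactive run still applies.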

> That said, I strongly recommend to not try and write the answer file
> by hand. Instead, do an interactive setup with the exact conditions […]

I know what I was doing is unsupported. I was wandering down the wrong 
troubleshooting path for a bit there, but I think ultimately what I need is 
also unsupported.

It was because I was trying to push this into our extant DB infrastructure, 
which is PG 9.5. Which I found doesn’t work with a local-install, either. (I 
was thinking it would work due to past experience with things that demand an 
old Postgres; IME, PG generally has pretty solid forward-compatibility.)

So that leads me to my next question: if I install under the supported  version 
and dump/load/reconfigure to PG9.5.3, is anyone aware of any actual problems 
(other than lack of official support)? In doing answerfile-driven installs 
repeatedly, the point where it now fails is after the DB load, with 
ovirt-aaa-jdbc-tool choking and failing the run.

The reason I’m considering that as my fallback, nothing-else-worked option is 
that the DB needs to live in one of our existing clusters. We are a heavy 
Postgres shop with a lot of hardware, humans and process devoted to maintaining 
it, and the DBAs would hang my corpse up as a deterrent to others if I started 
installing ancient instances in random places for them to take care of.

> https://bugzilla.redhat.com/show_bug.cgi?id=1396925

Was unaware of that; thanks for sharing (and doing!) it.

Thanks for the help,

-j
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Answer file key for "nonlocal postgres"

2017-03-23 Thread Jamie Lawrence
Hello,

I’m working on an answerfile for an unattended Ovirt install. The engine and 
the data warehouse DBs live remotely. I think I have most of the relevant keys 
defined, but appear to be missing one, because the installer is attempting to 
manage a local Postgres.  Log and error below. Keys I’ve defined so far are:

OVESETUP_DB/secured=bool:False
OVESETUP_DB/user=str:engine
OVESETUP_DB/password=str:[SNIP]
OVESETUP_DB/dumper=str:pg_custom
OVESETUP_DB/database=str:ovirt_engine
OVESETUP_DB/fixDbViolations=none:None
OVESETUP_DB/host=str:[SNIP]
OVESETUP_DB/port=int:5435
OVESETUP_DB/filter=none:None
OVESETUP_DB/restoreJobs=int:2
OVESETUP_DB/securedHostValidation=bool:False

And a similar set for the DWH.

Anyone know what I am missing?


Thanks in advance,

-j

- - - - 
The user error is:

[ INFO  ] Creating PostgreSQL 'ovirt_engine' database
[ ERROR ] Failed to execute stage 'Misc configuration': Failed to start service 
'postgresql'
[ INFO  ] Yum Performing yum transaction rollback



From the installer log, it obviously still thinks it is supposed to set up PG 
locally:

2017-03-23 23:57:18 DEBUG otopi.context context._executeMethod:128 Stage misc 
METHOD 
otopi.plugins.ovirt_engine_setup.ovirt_engine.provisioning.postgres.Plugin._misc
2017-03-23 23:57:18 INFO otopi.ovirt_engine_setup.engine_common.postgres 
postgres.provision:485 Creating PostgreSQL 'ovirt_engine' database
2017-03-23 23:57:18 DEBUG otopi.transaction transaction._prepare:61 preparing 
'File transaction for '/var/lib/pgsql/data/pg_hba.conf''
2017-03-23 23:57:18 DEBUG otopi.filetransaction filetransaction.prepare:185 
file '/var/lib/pgsql/data/pg_hba.conf' exists
2017-03-23 23:57:18 DEBUG otopi.filetransaction filetransaction.prepare:219 
backup 
'/var/lib/pgsql/data/pg_hba.conf'->'/var/lib/pgsql/data/pg_hba.conf.20170323235718'
2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
systemd.state:130 stopping service postgresql
2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:813 execute: ('/bin/systemctl', 'stop', 
'postgresql.service'), executable='None', cwd='None', env=None
2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'stop', 
'postgresql.service'), rc=0
2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:921 execute-output: ('/bin/systemctl', 'stop', 
'postgresql.service') stdout:


2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:926 execute-output: ('/bin/systemctl', 'stop', 
'postgresql.service') stderr:


2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
systemd.state:130 starting service postgresql
2017-03-23 23:57:18 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 
'postgresql.service'), executable='None', cwd='None', env=None
2017-03-23 23:57:19 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 
'postgresql.service'), rc=1
2017-03-23 23:57:19 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 
'postgresql.service') stdout:


2017-03-23 23:57:19 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 
'postgresql.service') stderr:
Job for postgresql.service failed because the control process exited with error 
code. See "systemctl status postgresql.service" and "journalctl -xe" for 
details.

2017-03-23 23:57:19 DEBUG otopi.transaction transaction.abort:119 aborting 
'File transaction for '/var/lib/pgsql/data/pg_hba.conf''
2017-03-23 23:57:19 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
_executeMethod
method['method']()
  File 
"/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/provisioning/postgres.py",
 line 201, in _misc
self._provisioning.provision()
  File 
"/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/postgres.py", 
line 496, in provision
self.restartPG()
  File 
"/usr/share/ovirt-engine/setup/ovirt_engine_setup/engine_common/postgres.py", 
line 397, in restartPG
state=state,
  File "/usr/share/otopi/plugins/otopi/services/systemd.py", line 141, in state
service=name,
RuntimeError: Failed to start service 'postgresql'
2017-03-23 23:57:19 ERROR otopi.context context._executeMethod:151 Failed to 
execute stage 'Misc configuration': Failed to start service 'postgresql'
2017-03-23 23:57:19 DEBUG otopi.transaction transaction.abort:119 aborting 'Yum 
Transaction'
2017-03-23 23:57:19 INFO otopi.plugins.otopi.packagers.yumpackager 
yumpackager.info:80 Yum Performing yum transaction rollback
Loaded plugins: fastestmirror, versionlock
2017-03-23 23:57:19 DEBUG otopi.transaction transaction.abort:119 aborting 'DWH 
Engine database 

Re: [ovirt-users] strange API parameters default in python SDK

2016-06-06 Thread Jamie Lawrence

> On Jun 6, 2016, at 1:59 AM, Fabrice Bacchella  
> wrote:

> I'm surprised, because in my mind, the default values are the least useful 
> version of each option. Why not set them to good, useful values and let 
> the user change them to the opposite if there are problems?

I’m not a developer, but it looks to me like the defaults are chosen to be 
safe out of the box.

Defaults are tricky, because everyone’s needs are different. (Otherwise if 
everyone wants the same setting, why make it an option?) So not everyone gets 
what they want out of the box, and when choosing them, there needs to be some 
principle guiding the choice, otherwise it is hard for users to develop a 
“feel” for the software and the lack of consistency causes everyone problems. 

The guiding principle you want seems to be ease of use. That’s valid, but with 
software like this, I think it is likely that a lot of folks would prefer 
safety out of the box. Imagine if you had requirements to install and lock 
Ovirt down to meet some specific criteria. If it shipped with a wide-open 
security policy and you were not yet fluent in using it, you’re going to have 
trouble locking it down and probably continue to wonder if you found every 
relevant knob.

The reverse - opening it up - is generally much easier (especially when you’re 
new to complex software) and at least sometimes less dangerous if you get it 
wrong (if it isn’t as open as you’d like), so at least in my view, defaulting 
to locked-down makes more sense.

My $.02,

-j


Re: [ovirt-users] Looking for Python-SDK documentation

2016-04-11 Thread Jamie Lawrence

> On Apr 11, 2016, at 11:41 AM, Frank Thommen  
> wrote:
> 

> Thanks to all who answered.  Brett brings it to the point:  All sent links so 
> far are indeed helpful - thanks a lot - but not the reference I expected. 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Python_SDK_Guide/chap-Python_Reference_Documentation.html#Python_Reference_Documentation
>  mentions `pydoc`, but this documentation seems to be provided only for some 
> modules or to be incomplete.  Also for me not being a software developer and 
> newish to Python, the `pydoc` information is not very useful.  Where can I 
> e.g. find the documentation for vms.get() and vms.add() (just to name two 
> concrete examples)?

I’m pretty sure the examples are all there is at the moment. I spent a while 
looking for reference material a few months ago and haven’t seen anything new 
mentioned in this thread. Between the examples and a couple of questions to the 
list, I managed to piece together what we needed (command-line-driven machine 
creation specific to our needs). 

I am a bit hazy on the specifics of what I did, but I recall that the main 
problem I had to ask about was with order-of-operations issues - it wasn’t 
obvious to me that setting some things out of the expected (by the API) order 
wouldn’t work. IIRC, this had to do with setting boot options.

Best of luck,

-j 


Re: [ovirt-users] oVirt 3.6.4 / PXE guest boot issues

2016-04-05 Thread Jamie Lawrence
I had the same issue a while back. Never figured it out, and don't have
much else useful to add, other than to say it isn't just you two.

-j

On Tue, Apr 5, 2016 at 3:22 AM, Alan Griffiths 
wrote:

> Hi,
>
>
>
> I’m seeing the same PXE boot issue with 3.6.4 on Centos 7. Booting from
> ISO DHCP works fine. With PXE I can see the offer coming back from the DHCP
> server but the VM just seems to ignore it. I also tried swapping the ROMs
> as per previous post, but had no effect.
>
>
>
> Alan
>


Re: [ovirt-users] Run Once install -> reboot loop

2016-02-19 Thread Jamie Lawrence

> On Feb 18, 2016, at 11:45 PM, René Koch  wrote:
> 
> Hi Jamie,
> 
> The reason for this is that the iso will be mounted as long as you're in 
> run once mode.
> 
> You can fix this by powering off your vm after the installation and run it 
> again in "normal" mode. As you don't want to run the vm in run once mode 
> forever, you have to shut it down anyway.


But it seems that this has worked for me in the past, when I was creating these 
through the GUI. In other words, after the OS installer reboots, it would 
correctly reboot in “normal” mode. Am I hallucinating that? 

In any case, I’m sure there’s a power-off API method, but I’m not sure how to 
reliably detect when to call it. I could do something hacky like call my own 
API endpoint as the last action of the installer somehow (I know how to with 
Debian, and I’m sure the RH family can do it as well), but that seems fragile.

Maybe hack the installer to power off instead of reboot, and detect that? 
I hate the idea of having to fork/maintain my own installer patches...
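
The "power off, then detect that" idea could be sketched as a simple polling loop. This is a sketch under assumptions, not a tested recipe: it assumes the automated installer can be told to power off at the end instead of rebooting, and `get_state` is a hypothetical callable wrapping whatever API call reports the VM's status as a string.

```python
import time


def wait_for_state(get_state, wanted="down", timeout=3600, interval=10,
                   sleep=time.sleep):
    """Poll get_state() until it returns `wanted` or `timeout` seconds pass.

    Returns True if the wanted state was seen, False on timeout.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    waited = 0
    while waited <= timeout:
        if get_state() == wanted:
            return True
        sleep(interval)
        waited += interval
    return False


# Illustration with canned states instead of a real API call.
states = iter(["up", "up", "down"])
finished = wait_for_state(lambda: next(states), sleep=lambda s: None)
print(finished)
```

Once the loop reports the VM as down, the script would start it again normally (not via Run Once), so the iso is no longer attached.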

What do other people do for automating this situation?

-j


[ovirt-users] Run Once install -> reboot loop

2016-02-18 Thread Jamie Lawrence
Hello all,

I’m running Ovirt 3.6.0 on Centos 7.[1]

I’ve been working on getting Ovirt to slot in to our environment, and almost 
have a setup that works. I can now build isos on demand for automated installs 
(I’m working around some local networking choices without modifying them for 
now, which is why this isn’t a PXE boot) and create my guests, pointing them to 
the iso from which to install via Run Once. This all works.

The problem is that after the install, the guest reboots, but (best I can tell) 
Ovirt doesn’t detect it, the iso is still mounted, and the install happens all 
over again. Rinse, repeat.

Has anyone seen this? Or better, does anyone know how to fix this?

Thanks,

-j



[1]
# yum list installed |grep ovirt
ebay-cors-filter.noarch                 1.0.1-0.1.ovirt.el7        @ovirt-3.6
gperftools-libs.x86_64                  2.4-7.el7                  @ovirt-3.6
ipxe-bootimgs.noarch                    20130517-7.gitc4bce43.el7  @ovirt-3.6
ipxe-roms.noarch                        20130517-7.gitc4bce43.el7  @ovirt-3.6
ipxe-roms-qemu.noarch                   20130517-7.gitc4bce43.el7  @ovirt-3.6
jasperreports-server.noarch             6.0.1-1.el7                @ovirt-3.6
libcacard-ev.x86_64                     10:2.3.0-29.1.el7          @ovirt-3.6
libgovirt.x86_64                        0.3.1-3.el7                @base
otopi.noarch                            1.4.0-1.el7.centos         @ovirt-3.6
otopi-java.noarch                       1.4.0-1.el7.centos         @ovirt-3.6
ovirt-engine.noarch                     3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-backend.noarch             3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-cli.noarch                 3.6.0.1-1.el7.centos       @ovirt-3.6
ovirt-engine-dbscripts.noarch           3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-extension-aaa-jdbc.noarch  1.0.1-1.el7                @ovirt-3.6
ovirt-engine-extension-aaa-ldap.noarch  1.1.2-1.el7.centos         @ovirt-3.6
ovirt-engine-extension-aaa-ldap-setup.noarch
                                        1.1.2-1.el7.centos         @ovirt-3.6
ovirt-engine-extensions-api-impl.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-jboss-as.x86_64            7.1.1-1.el7.centos         @ovirt-3.6
ovirt-engine-lib.noarch                 3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-restapi.noarch             3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-sdk-python.noarch          3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup.noarch               3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup-base.noarch          3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup-plugin-ovirt-engine.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup-plugin-ovirt-engine-common.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup-plugin-vmconsole-proxy-helper.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-setup-plugin-websocket-proxy.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-tools.noarch               3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-userportal.noarch          3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-vmconsole-proxy-helper.noarch
                                        3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-webadmin-portal.noarch     3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-websocket-proxy.noarch     3.6.0.3-1.el7.centos       @ovirt-3.6
ovirt-engine-wildfly.x86_64             8.2.0-1.el7                @ovirt-3.6
ovirt-engine-wildfly-overlay.noarch     001-2.el7                  @ovirt-3.6
ovirt-host-deploy.noarch                1.4.0-1.el7.centos         @ovirt-3.6
ovirt-host-deploy-java.noarch           1.4.0-1.el7.centos         @ovirt-3.6
ovirt-host-deploy-offline.x86_64        1.4.0-1.el7.centos         @ovirt-3.6
ovirt-hosted-engine-ha.noarch           1.3.2.1-1.el7.centos       @ovirt-3.6
ovirt-hosted-engine-setup.noarch        1.3.0-1.el7.centos         @ovirt-3.6
ovirt-image-uploader.noarch             3.6.0-1.el7.centos         @ovirt-3.6
ovirt-iso-uploader.noarch               3.6.0-1.el7.centos         @ovirt-3.6
ovirt-setup-lib.noarch                  1.0.0-1.el7.centos         @ovirt-3.6
ovirt-vmconsole.noarch                  1.0.0-1.el7.centos         @ovirt-3.6
ovirt-vmconsole-host.noarch             1.0.0-1.el7.centos         @ovirt-3.6
ovirt-vmconsole-proxy.noarch            1.0.0-1.el7.centos         @ovirt-3.6
patternfly1.noarch                      1.3.0-1.el7.centos         @ovirt-3.6-patternfly1-noarch-epel
python-gluster.noarch                   3.7.6-1.el7

[ovirt-users] aaa-LDAP schema selection

2015-12-23 Thread Jamie Lawrence
Hello all,

I’d like to get the LDAP plugin working. We have a lovely LDAP setup deployed 
(OpenLDAP), and nobody here has a clue how to map what we have to the options 
the installer presents.

Well, a clue, yes. 

We include the core, cosine, nis, inetorgperson and misc schemas in the config.

The RHDS, 389, AD, IPA and Novell options are eliminated because we aren’t 
running any of that. I eliminated ‘RFC-2307 Schema (Generic)’ by finding 
attributes not included in the RFC, but added by OpenLDAP. 

Assuming what we are running maps to any of them, one of the  ‘OpenLDAP 
[RFC-2307|Standard] Schema' seem likely. 

Does anyone know of a test (attribute that should be in one, or not in another, 
or some such) to figure this out? Can it be inferred from my schema includes 
(listed above)? I fear that determining this via process of elimination is 
going to be brutal due to difficult-to-replicate weirdness because of only 
minor differences, and the fact that there are other moving parts at the moment 
with this setup.

And to those who enjoy them, happy holidays.

-j



Re: [ovirt-users] Setting PXE boot via Python API

2015-12-04 Thread Jamie Lawrence

> On Dec 4, 2015, at 12:50 AM, Juan Hernández <jhern...@redhat.com> wrote:
> 
> On 12/03/2015 10:59 PM, Jamie Lawrence wrote:

>>vm.set_os(params.OperatingSystem(cmdline=kernel_cmd))
> 
> This line ^ is overwriting the complete OS configuration that you set
> before, so the change to the boot device is lost. If you enable the
> debug output of the SDK (with debug=True in the constructor of the API
> object) you will see that this is what is sent:

Oh, thank you. That’s what I was missing. (Also, debug=True will make life
nicer.)
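
To make the overwrite concrete, here is a small self-contained illustration. The classes below are plain stand-ins written for this example, not the SDK's own classes, and the kernel command line is a placeholder; the point is only that replacing the whole OS object discards a boot device set on the old one.

```python
class OperatingSystem(object):
    """Stand-in for an OS settings object (NOT the SDK class)."""

    def __init__(self, boot=None, cmdline=None):
        self.boot = boot or []
        self.cmdline = cmdline


class VM(object):
    """Stand-in for a VM whose set_os() replaces the whole OS object."""

    def __init__(self):
        self.os = OperatingSystem()

    def set_os(self, os):
        self.os = os


kernel_cmd = "console=ttyS0"  # placeholder command line

# Pattern from the original script: set boot first, then replace the OS.
vm = VM()
vm.os.boot = ["network"]
vm.set_os(OperatingSystem(cmdline=kernel_cmd))
lost = vm.os.boot  # the boot device set above is gone

# Alternative: build one object that carries both settings.
vm2 = VM()
vm2.set_os(OperatingSystem(boot=["network"], cmdline=kernel_cmd))

print(lost, vm2.os.boot)
```

With the real SDK the equivalent would presumably be a single `params.OperatingSystem(...)` carrying both the boot list and `cmdline`, but that has not been verified here.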

-j

-- 
Jamie Lawrence | jlawre...@squaretrade.com







[ovirt-users] Setting PXE boot via Python API

2015-12-03 Thread Jamie Lawrence
Hello,

I’m currently playing with scripted VM creation and have an issue getting the 
newly created VM to PXE boot. My objective is to port some creation scripts 
from the environment this will eventually replace and worry about making this 
more “Ovirt-ey” later.

Skipping the details, everything is happy through creation, and when I boot it, 
it attempts to boot from a ‘disk' and fails, and I don’t understand why.

Following the creation of the VM, creation/attachment of the disk and net,

boot_dev = params.Boot(dev='network')
vm.os.set_boot([boot_dev])
vm.set_os(params.OperatingSystem(cmdline=kernel_cmd))
vm.update()

kernel_cmd there evaluates to a fairly typical PXEboot string that works with 
our legacy setup - there isn’t really anything exotic going on. The BIOS 
doesn’t show any attempt at a PXE boot - it goes straight to the disk, declares 
it unbootable (because it was just created and is blank), and halts. It feels 
like the set_boot line is wrong or ignored, but this is new to me.

Anyone have a hint?

Thanks,

-j



Re: [ovirt-users] Engine setup: insistent DNS demand

2015-11-10 Thread Jamie Lawrence

[Snippage]

> On Nov 10, 2015, at 12:26 AM, Yedidyah Bar David  wrote:

>>> IIUC engine-setup never fails on missing DNS resolution, only warns.
> 
> Sorry, that was wrong. Let me sketch the flow:

[Nice writeup of the DNS test flow omitted] 

>> I may well be missing something or otherwise being bone-headed, but I am
>> getting [ Error ] messages, which it doesn’t allow me to skip.

Turns out, I was being bone-headed.

The hosts file was the problem; there were actually two errors in it. I can’t 
be sure, but I think I was actually looking at the hosts file on a different 
machine when I declared it correct, because it was correct on the other 
machines involved, and they’re named similarly enough (varying single digit) 
that my typical, default state of too many open shells probably confused me.

> May I suggest that you simply use, everywhere, both in the dns and in
> /etc/hosts,
> different names for the different addresses.

Very good advice. I corrected the errors in the hosts files and life became 
significantly better.
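
For anyone who wants to check a hosts file against that advice mechanically, a minimal sketch follows. The function name, addresses and host names are invented for illustration; it only flags names that appear with more than one address, which is one kind of mistake among several possible ones.

```python
from collections import defaultdict


def names_with_multiple_addresses(hosts_text):
    """Return {name: [addresses]} for names listed under more than one address."""
    addrs_by_name = defaultdict(set)
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        for name in names:
            addrs_by_name[name.lower()].add(addr)
    return {n: sorted(a) for n, a in addrs_by_name.items() if len(a) > 1}


# Hypothetical hosts file with the same name on two networks.
sample = """
10.0.0.11      node1.example.com node1
192.168.50.11  node1.example.com node1-storage   # storage network
"""
print(names_with_multiple_addresses(sample))
```

On a typical system `localhost` is listed for both 127.0.0.1 and ::1, so it would be reported as well; that entry can be ignored.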

> And, if what you want is "Please add a flag or whatever that will allow me
> to override all this name lookup mess and just make engine-setup do what I
> say", please consider that the current behavior actually did find something
> which I personally think is unintended, so it helped you catch it now instead
> of perhaps spending much more time, during a much less comfortable situation,
> when something actually breaks due to this.

As a general rule, I do like to have a --shut-up-obey-the-human switch. But you 
are absolutely right that the installer caught a factual error, and then 
caught me making a likely coffee-related error in checking on the first. I 
don’t see any of this as requiring a feature request.

Thanks, Didi, for the extensive analysis and reply. I do greatly appreciate it.

Cheers,

-j



Re: [ovirt-users] Engine setup: insistent DNS demand

2015-11-09 Thread Jamie Lawrence

Hi Alex,

On 6 Nov 2015, at 20:05, alexmcwhir...@triadic.us wrote:

> I'm using an older version of ovirt, so it may not apply. But I have 
> all of my systems set up with /etc/hosts and no DNS at all. The 
> installer complained, but it still installed and ran just fine. 
> Downing the interface you want ovirt to use during setup is a bad 
> idea.


Clarification: ISTM Ovirt does not necessarily need to know about the 
interface in question - I am mounting storage Ovirt will use over it, 
but not at this time using any Gluster-Ovirt integration.


The interface with DNS service (the one not involved in the error) is 
the primary interface. The other one, which has no DNS service and is 
causing the trouble, is one that (at least it seems to me, but I could 
be very wrong) Ovirt doesn’t need to know about.


-j


[ovirt-users] Engine setup: insistent DNS demand

2015-11-06 Thread Jamie Lawrence

Hi all,

I’m having trouble finding current references to this problem. (I’m 
seeing workarounds from 2013, but, not surprisingly, things have changed 
since then.)


I’m attempting to run engine-setup, and get to the DNS reverse lookup 
of the FQDN. The machine has two (bonded) interfaces, one for storage 
and one for everything else. The “everything else” network has DNS 
service, the storage network doesn’t, and this seems to make 
engine-setup cranky. /etc/hosts is properly set up for the storage 
network, but that apparently doesn’t count. I tried running with the 
--offline flag, but that apparently still expects DNS.
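
To make "reverse lookup of the FQDN" concrete, here is a rough sketch of that kind of check: resolve the name to its addresses, then map each address back and see whether the original name comes out again. This is an illustration only, not a reproduction of engine-setup's own logic. In particular, Python's socket calls go through the system resolver, which normally consults /etc/hosts as well as DNS, so a name can pass here and still fail a DNS-only query. The function name and the injectable `forward`/`reverse` parameters are assumptions made so the logic can be exercised without a network.

```python
import socket


def check_forward_reverse(fqdn, forward=None, reverse=None):
    """Resolve fqdn to addresses, then map each address back to names.

    forward(name) -> list of address strings
    reverse(address) -> list of host names
    Both default to the system resolver. Returns {address: bool}, where the
    bool says whether the reverse lookup yielded the original name.
    """
    if forward is None:
        def forward(name):
            infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string.
            return sorted({info[4][0] for info in infos})
    if reverse is None:
        def reverse(address):
            try:
                primary, aliases, _ = socket.gethostbyaddr(address)
            except (socket.herror, socket.gaierror):
                return []  # no reverse mapping for this address
            return [primary, *aliases]

    wanted = fqdn.rstrip(".").lower()
    results = {}
    for address in forward(fqdn):
        names = [n.rstrip(".").lower() for n in reverse(address)]
        results[address] = wanted in names
    return results


# Example against the live resolver (uncomment to try on a real host):
# print(check_forward_reverse(socket.getfqdn()))
```

An address that maps to False is one whose reverse lookup does not return the name you started with, which is the sort of mismatch a storage-only interface without DNS can produce.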


We do not want/need DNS on the storage network, and I’m hoping someone 
knows a workaround for this not involving DNSMasq.


I considered downing that interface for the setup, but I don’t know 
why engine-setup is so insistent about DNS, and hiding an interface 
seems like a potentially bad idea in any case, so I thought I’d ask 
about it first.


Details:
ovirt-engine.noarch 0:3.6.0.3-1.el7.centos
ovirt-engine-setup-plugin-allinone.noarch 0:3.6.0.3-1.el7.centos

CentOS Linux release 7.1.1503 (Core)

TIA, and happy weekend to all,

-j


Re: [ovirt-users] about LXC and ovirt

2015-10-20 Thread Jamie Lawrence
On Tue, Oct 20, 2015 at 1:51 AM, Dan Kenigsberg wrote:


> Until then, can you share your own use case for running LXC?
>

One use case that recently came up in discussion is in a build farm.
Lightweight containers with the possibility to use AUFS and RAM disks make
for fast, isolated compiles, and they also seem great for automated
testing. Having all of that managed with the same tooling as more long-lived,
traditional VMs is appealing. (Whether doing that in the same cluster as
those longer-lived VMs is wise is a much more context-specific question.)

I can see similar arguments for lots of other typical network services where a
fully virtualized VM is unnecessary overhead (there's no need for full
isolation or different OSes), but the logical isolation is valuable.

-j


-- 
jlawre...@squaretrade.com