[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-10 Thread Doug Ingham
Hi Jim,
 Just to throw my 2 cents in, one of my clusters is very similar to yours,
& I'm not having any of the issues you describe. One thing I would
strongly recommend, however, is bonding your NICs with LACP (802.3ad) -
either 2x 1Gbit for oVirt & 2x 1Gbit for Gluster, or bond all of your NICs
together & separate the storage & management networks with VLANs. Swap
should generally be avoided these days; RAM is cheap.
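
For reference, a rough sketch of what that looks like on an EL7 host - the
interface names, VLAN ID & addresses below are made up for illustration, and
in practice oVirt/VDSM should write the equivalent files for you once the bond
& networks are defined in the UI:

# /etc/sysconfig/network-scripts/ifcfg-bond0  - the LACP bond itself
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-em1  - repeat for each member NIC
DEVICE=em1
MASTER=bond0
SLAVE=yes
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0.100  - tagged storage VLAN on top
DEVICE=bond0.100
VLAN=yes
IPADDR=10.0.100.10
PREFIX=24
ONBOOT=yes

The switch ports obviously need a matching 802.3ad port-channel & the tagged
VLANs on their side.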

4x R710s, each w/...
* 96GB RAM
* 6x 7.2k SATA HDDs in RAID 0 (Gluster dist-rep 2+1 arb)
* 2x USB sticks in RAID 1 (CentOS)
* 4x 1Gbit, bonded with LACP/802.3ad/mode 4

[root@v0 ~]# gluster volume info data

Volume Name: data
Type: Distributed-Replicate
Volume ID: bded65c7-e79e-4bc9-9630-36a69ad2e684
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: s0:/gluster/data/brick
Brick2: s1:/gluster/data/brick
Brick3: s2:/gluster/data/arbiter (arbiter)
Brick4: s2:/gluster/data/brick
Brick5: s3:/gluster/data/brick
Brick6: s0:/gluster/data/arbiter (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
features.shard: on
cluster.data-self-heal-algorithm: full
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
network.ping-timeout: 30
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.entry-self-heal: on
cluster.granular-entry-heal: enable
features.lock-heal: on
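
Most of those options are, as far as I recall, what oVirt's "Optimize for Virt
Store" option (gluster's virt group) applies. As a sketch, individual options
can also be set by hand with the CLI, e.g.:

gluster volume set data features.shard on
gluster volume set data network.ping-timeout 30
# or, if the group file is installed, apply the whole virt group in one go:
gluster volume set data group virt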


I've got about 30 VMs running on this setup without issue. Upgrading to
10Gbit will be the next step, however I/O is generally nicely balanced
across all of the NICs, so it's rarely an issue.
Considering none of this kit is particularly new or high performance, its
limitations haven't been overly noticeable, as the load & I/O are very
evenly distributed.


On 10 July 2018 at 06:03, Alex K  wrote:

>
> I also see that for the last 4 or 5 weeks (after I upgraded from 4.1 to 4.2)
> I have to go and refresh the servers (maintenance, reboot) almost every week
> to release the RAM. If I leave them, the RAM will eventually be depleted by
> the gluster services. I am running gluster 3.12.9-1 with ovirt 4.2.4.5-1.el7.
>
> Alex
>
> On Mon, Jul 9, 2018 at 6:08 PM, Edward Clay 
> wrote:
>
>> Just to add my .02 here.  I've opened a bug on this issue, where HV/hosts
>> connected to glusterfs volumes are running out of RAM.  This seemed to be a
>> bug fixed in gluster 3.13, but that patch no longer seems to be available
>> and 3.12 is what oVirt is using.  For example, I have a host that was
>> showing 72% memory consumption with 3 VMs running on it.  If I migrate
>> those VMs to another host, memory consumption drops to 52%.  If I put this
>> host into maintenance and then activate it, it drops down to 2% or so.
>> Since I ran into this issue I've been manually watching memory consumption
>> on each host and migrating VMs off it to others to keep things from
>> dying.  I'm hoping that with gluster 3.12 reaching end of life and
>> the move to gluster 4.1, this will get fixed, or that the patch from
>> 3.13 can be backported so this problem goes away.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1593826
>>
>> On 07/07/2018 11:49 AM, Jim Kusznir wrote:
>>
>> This host has NO VMs running on it, only 3 running cluster-wide
>> (including the engine, which is on its own storage):
>>
>> top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
>> Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
>> %Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 st
>> KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
>> KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem
>>
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>> 13279 root      20   0 2380708  37628   4396 S  51.7  0.1   3768:03 glusterfsd
>> 13273 root      20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 glusterfsd
>> 13287 root      20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 glusterfsd
>> 16205 vdsm       0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
>> 16300 vdsm      20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python
>>  1109 vdsm      20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 ovirt-ha-broker
>>     2 root      20   0       0      0      0 S   0.7  0.0   0:00.13 kworker/u64:3
>>    10 root      20   0       0      0      0 S   0.3  0.0   4:22.36 rcu_sched
>>   572 root       0 -20       0      0      0 S   0.3  0.0   0:12.02 kworker/1:1H
>>   797 root      20   0  

[ovirt-users] Re: VM interface bonding (LACP)

2018-05-18 Thread Doug Ingham
On 14 May 2018 at 16:25, Christopher Cox <c...@endlessnow.com> wrote:

> In the ideal case, what you'd have:
>
>       | Single virtio virtual interface
>       |
> VM ---+--- Host ---+--- Switch stack
>                    |
>                    |--- 4x 1Gbit interfaces bonded over LACP
>
> The change: virtio instead of "1 Gbit"
>

It's using virtio. My confusion came from the VIF's speed being reported by
the Engine & within the guest.


> You can't get blood from a stone, that is, you can't manufacture bandwidth
> that isn't there.  If you need more than gigabit speed, you need something
> like 10Gbit.  Realize that usually, we're talking about a system created to
> run more than one VM.  If just one, you'll do better with dedicated
> hardware.  If more than one VM, then there's sharing going on, though you
> might be able to use QoS (either in oVirt or outside). Even so, if just one
> VM on 10Gbit, you won't necessarily get full 10Gbit out of virtio.  But at
> the same time, bonding should help in the case of multiple VMs.
>
> Now, back to the suggestion at hand.  Multiple virtual NICs.  If the
> logical networks presented via oVirt are such that each (however many)
> logical network has its own "pipe", then defining a vNIC on each of those
> networks gets you the same sort of "gain" with respect to bonding.  That
> is, no magic bandwidth increase for a particular connection, but more pipes
> available for multiple connections (essentially what you'd expect).
>
> Obviously up to you how you want to do this.  I think you might do better
> to consider a better underlying infrastructure to oVirt rather than trying
> to bond vNICs.  Pretty sure I'm right about that.  Would think the idea of
> bonding at the VM level might be best for simulating something rather than
> something you do because it's right/best.
>

Oh, I'm certain you're right about that! My current budget's focused on
beefing up the resilience of our storage layer, however the network is next on
my list. For the moment though, it's a case of working with what I've got.

Bandwidth has only really become an issue recently, since we've started
streaming live events, and that's simply a case of many (relatively)
low-bandwidth connections. The only place that might get much benefit from
single 10Gbit links would be on our distributed storage layer, although
with 10 nodes, each with 4x1Gbit LAGGs, even that's holding up quite well.

Let's see how the tests go tomorrow...

On 05/14/2018 03:03 PM, Doug Ingham wrote:
>
>> On 14 May 2018 at 15:35, Juan Pablo <pablo.localh...@gmail.com> wrote:
>>
>> so you have lacp on your host, and you want lacp also on your vm...
>> somehow that doesn't sound correct.
>> there are several lacp modes. which one are you using on the host?
>>
>>
>>   Correct!
>>
>>       | Single 1Gbit virtual interface
>>       |
>> VM ---+--- Host ---+--- Switch stack
>>                    |
>>                    |--- 4x 1Gbit interfaces bonded over LACP
>>
>> The traffic for all of the VMs is distributed across the host's 4 bonded
>> links, however each VM is limited to the 1Gbit of its own virtual
>> interface. In the case of my proxy, all web traffic is routed through it,
>> so its single Gbit interface has become a bottleneck.
>>
>> To increase the total bandwidth available to my VM, I presume I will need
>> to add multiple Gbit VIFs & bridge them with a bonding mode.
>> Balance-alb (mode 6) is one option, however I'd prefer to use LACP (mode
>> 4) if possible.
>>
>>
>> 2018-05-14 16:20 GMT-03:00 Doug Ingham:
>>
>> On 14 May 2018 at 15:03, Vinícius Ferrão wrote:
>>
>> You should use better hashing algorithms for LACP.
>>
>> Take a look at this explanation:
>> https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Enhancing_IP_Network_Performance_with_LACP?lang=en
>>
>> In general only L2 hashing is done; you can achieve better
>> throughput with L3 and multiple IPs, or with L4 (ports).
>>
>> Your switch should support those features too, if you’re
>> using one.
>>
>> V.
>>
>>
>> The problem isn't the LACP connection between the host & the
>> switch, but setting up LACP between the VM & the host. For
>> reasons of stability, my 4.1 cluster's switch ty

[ovirt-users] Re: VM interface bonding (LACP)

2018-05-17 Thread Doug Ingham
 Very handy to know! Cheers!

I've been running a couple of tests over the past few days & it seems,
counter to what I said earlier, the proxy's interfering with the LACP
balancing too, as it rewrites the origin. Duh. *facepalm*

It skipped my mind that all our logs use the x-forwarded headers, so I
overlooked that one!

I'm going to test a new config on the reverse proxy to round-robin the
outbound IPs. We'll find out tomorrow if the VIF really isn't limited to
the reported 1Gbit.

Thanks


On 14 May 2018 at 17:45, Yaniv Kaul <yk...@redhat.com> wrote:

>
>
> On Mon, May 14, 2018, 11:33 PM Chris Adams <c...@cmadams.net> wrote:
>
>> Once upon a time, Doug Ingham <dou...@gmail.com> said:
>> >  Correct!
>> >
>> >       | Single 1Gbit virtual interface
>> >       |
>> > VM ---+--- Host ---+--- Switch stack
>> >                    |
>> >                    |--- 4x 1Gbit interfaces bonded over LACP
>> >
>> > The traffic for all of the VMs is distributed across the host's 4 bonded
>> > links, however each VM is limited to the 1Gbit of its own virtual
>> > interface. In the case of my proxy, all web traffic is routed through
>> it,
>> > so its single Gbit interface has become a bottleneck.
>>
>> It was my understanding that the virtual interface showing up as 1 gig
>> was just a reporting thing (something has to be put in the speed field).
>> I don't think the virtual interface is actually limited to 1 gig, the
>> server will just pass packets as fast as it can.
>>
>
> Absolutely right.
> Y.
>
>
>> --
>> Chris Adams <c...@cmadams.net>
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>>
>
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
>
>


-- 
Doug
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org


[ovirt-users] Re: VM interface bonding (LACP)

2018-05-14 Thread Doug Ingham
On 14 May 2018 at 15:35, Juan Pablo <pablo.localh...@gmail.com> wrote:

> so you have lacp on your host, and you want lacp also on your vm...
> somehow that doesn't sound correct.
> there are several lacp modes. which one are you using on the host?
>
>

 Correct!

      | Single 1Gbit virtual interface
      |
VM ---+--- Host ---+--- Switch stack
                   |
                   |--- 4x 1Gbit interfaces bonded over LACP

The traffic for all of the VMs is distributed across the host's 4 bonded
links, however each VM is limited to the 1Gbit of its own virtual
interface. In the case of my proxy, all web traffic is routed through it,
so its single Gbit interface has become a bottleneck.

To increase the total bandwidth available to my VM, I presume I will need
to add multiple Gbit VIFs & bridge them with a bonding mode.
Balance-alb (mode 6) is one option, however I'd prefer to use LACP (mode 4)
if possible.
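
If I do go the multi-VIF route, the guest-side config would be something along
these lines - a sketch only, assuming two virtio VIFs that show up as eth0 &
eth1 inside the guest, with placeholder addressing:

# /etc/sysconfig/network-scripts/ifcfg-bond0  (inside the guest)
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=balance-alb miimon=100"
IPADDR=192.0.2.10
PREFIX=24
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0  (and likewise for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes

Balance-alb needs no special support from the "switch" side - here the host's
Linux bridge - which is exactly the open question with mode 4; whether the
bridge tolerates the MAC juggling is something I'll have to test.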


> 2018-05-14 16:20 GMT-03:00 Doug Ingham:
>
>> On 14 May 2018 at 15:03, Vinícius Ferrão wrote:
>>
>>> You should use better hashing algorithms for LACP.
>>>
>>> Take a look at this explanation: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Enhancing_IP_Network_Performance_with_LACP?lang=en
>>>
>>> In general only L2 hashing is done; you can achieve better throughput
>>> with L3 and multiple IPs, or with L4 (ports).
>>>
>>> Your switch should support those features too, if you’re using one.
>>>
>>> V.
>>>
>>
>> The problem isn't the LACP connection between the host & the switch, but
>> setting up LACP between the VM & the host. For reasons of stability, my 4.1
>> cluster's switch type is currently "Linux Bridge", not "OVS". Ergo my
>> question, is LACP on the VM possible with that, or will I have to use ALB?
>>
>> Regards,
>>  Doug
>>
>>
>>>
>>>
>>> On 14 May 2018, at 15:16, Doug Ingham wrote:
>>>
>>> Hi All,
>>>  My hosts have all of their interfaces bonded via LACP to maximise
>>> throughput, however the VMs are still limited to Gbit virtual interfaces.
>>> Is there a way to configure my VMs to take full advantage of the bonded
>>> physical interfaces?
>>>
>>> One way might be adding several VIFs to each VM & using ALB bonding,
>>> however I'd rather use LACP if possible...
>>>
>>> Cheers,
>>> --
>>> Doug
>>>
>>>
>> --
>> Doug
>>
>>


-- 
Doug
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org


[ovirt-users] Re: VM interface bonding (LACP)

2018-05-14 Thread Doug Ingham
On 14 May 2018 at 15:03, Vinícius Ferrão <fer...@if.ufrj.br> wrote:

> You should use better hashing algorithms for LACP.
>
> Take a look at this explanation: https://www.ibm.com/developerworks/community/blogs/storageneers/entry/Enhancing_IP_Network_Performance_with_LACP?lang=en
>
> In general only L2 hashing is done; you can achieve better throughput with
> L3 and multiple IPs, or with L4 (ports).
>
> Your switch should support those features too, if you’re using one.
>
> V.
>
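
For reference, the hashing Vinícius is referring to is the bond's transmit
hash policy on the host (plus the equivalent setting on the switch). Roughly,
with the bond name assumed:

grep -i "hash policy" /proc/net/bonding/bond0
# and in the bond's options (ifcfg BONDING_OPTS, or oVirt's custom bonding options):
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"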

The problem isn't the LACP connection between the host & the switch, but
setting up LACP between the VM & the host. For reasons of stability, my 4.1
cluster's switch type is currently "Linux Bridge", not "OVS". Ergo my
question, is LACP on the VM possible with that, or will I have to use ALB?

Regards,
 Doug


>
>
> On 14 May 2018, at 15:16, Doug Ingham wrote:
>
> Hi All,
>  My hosts have all of their interfaces bonded via LACP to maximise
> throughput, however the VMs are still limited to Gbit virtual interfaces.
> Is there a way to configure my VMs to take full advantage of the bonded
> physical interfaces?
>
> One way might be adding several VIFs to each VM & using ALB bonding,
> however I'd rather use LACP if possible...
>
> Cheers,
> --
> Doug
>
>
-- 
Doug
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org


[ovirt-users] Re: VM interface bonding (LACP)

2018-05-14 Thread Doug Ingham
On 14 May 2018 at 15:01, Juan Pablo  wrote:

> LACP is not intended for maximizing throughput.
> if you are using iscsi, you should use multipathd instead.
>
> regards,
>

Umm, maximising the total throughput for multiple concurrent connections is
most definitely one of the uses of LACP. In this case, the VM is our main
reverse proxy, and its single Gbit VIF has become a bottleneck.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org


[ovirt-users] VM interface bonding (LACP)

2018-05-14 Thread Doug Ingham
Hi All,
 My hosts have all of their interfaces bonded via LACP to maximise
throughput, however the VMs are still limited to Gbit virtual interfaces.
Is there a way to configure my VMs to take full advantage of the bonded
physical interfaces?

One way might be adding several VIFs to each VM & using ALB bonding,
however I'd rather use LACP if possible...

Cheers,
-- 
Doug
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org


[ovirt-users] Re: Gluster quorum

2018-05-12 Thread Doug Ingham
The key errors I'd investigate are these...

> 2018-05-10 03:24:21,048+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler8) [7715ceda] Could not associate brick '10.104.0.1:/gluster/brick/brick1' of volume 'e0f568fa-987c-4f5c-b853-01bce718ee27' with correct network as no gluster network found in cluster '59c10db3-0324-0320-0120-0339'
>
> 2018-05-10 03:24:20,749+02 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler7) [43f4eaec] Error while refreshing brick statuses for volume 'volume1' of cluster 'C6220': null
>
> 2018-05-10 11:59:26,051+02 ERROR [org.ovirt.engine.core.vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand] (DefaultQuartzScheduler4) [400fa486] Command 'GetGlusterLocalLogicalVolumeListVDSCommand(HostName = n4.itsmart.cloud, VdsIdVDSCommandParametersBase:{hostId='3ddef95f-158d-407c-a7d8-49641e012755'})' execution failed: null
>

I'd start with that first one. Is the network/interface group of your
storage layer actually defined as a Gluster & Migration network within
oVirt?


On 12 May 2018 at 03:44, Demeter Tibor  wrote:

> Hi,
>
> Could someone help me please ? I can't finish my upgrade process.
>
> Thanks
> R
> Tibor
>
>
>
> - On 10 May 2018, at 12:51, Demeter Tibor  wrote:
>
> Hi,
>
> I've attached the vdsm and supervdsm logs. But I don't have engine.log
> here, because that is on hosted engine vm. Should I send that ?
>
> Thank you
>
> Regards,
>
> Tibor
> - On 10 May 2018, at 12:30, Sahina Bose  wrote:
>
> There's a bug here. Can you log one attaching this engine.log and also
> vdsm.log & supervdsm.log from n3.itsmart.cloud
>
> On Thu, May 10, 2018 at 3:35 PM, Demeter Tibor 
> wrote:
>
>> Hi,
>>
>> I found this:
>>
>>
>> 2018-05-10 03:24:19,096+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterVolumeAdvancedDetailsVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] FINISH, 
>> GetGlusterVolumeAdvancedDetailsVDSCommand,
>> return: org.ovirt.engine.core.common.businessentities.gluster.
>> GlusterVolumeAdvancedDetails@ca97448e, log id: 347435ae
>> 2018-05-10 03:24:19,097+02 ERROR 
>> [org.ovirt.engine.core.bll.gluster.GlusterSyncJob]
>> (DefaultQuartzScheduler7) [43f4eaec] Error while refreshing brick statuses
>> for volume 'volume2' of cluster 'C6220': null
>> 2018-05-10 03:24:19,097+02 INFO  
>> [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>> (DefaultQuartzScheduler8) [7715ceda] Failed to acquire lock and wait lock
>> 'EngineLock:{exclusiveLocks='[59c10db3-0324-0320-0120-0339=GLUSTER]',
>> sharedLocks=''}'
>> 2018-05-10 03:24:19,104+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] START, 
>> GetGlusterLocalLogicalVolumeListVDSCommand(HostName
>> = n4.itsmart.cloud, VdsIdVDSCommandParametersBase:
>> {hostId='3ddef95f-158d-407c-a7d8-49641e012755'}), log id: 6908121d
>> 2018-05-10 03:24:19,106+02 ERROR [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] Command '
>> GetGlusterLocalLogicalVolumeListVDSCommand(HostName = n4.itsmart.cloud,
>> VdsIdVDSCommandParametersBase:{hostId='3ddef95f-158d-407c-a7d8-49641e012755'})'
>> execution failed: null
>> 2018-05-10 03:24:19,106+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] FINISH, 
>> GetGlusterLocalLogicalVolumeListVDSCommand,
>> log id: 6908121d
>> 2018-05-10 03:24:19,107+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] START, 
>> GetGlusterLocalLogicalVolumeListVDSCommand(HostName
>> = n1.itsmart.cloud, VdsIdVDSCommandParametersBase:
>> {hostId='8e737bab-e0bb-4f16-ab85-e24e91882f57'}), log id: 735c6a5f
>> 2018-05-10 03:24:19,109+02 ERROR [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] Command '
>> GetGlusterLocalLogicalVolumeListVDSCommand(HostName = n1.itsmart.cloud,
>> VdsIdVDSCommandParametersBase:{hostId='8e737bab-e0bb-4f16-ab85-e24e91882f57'})'
>> execution failed: null
>> 2018-05-10 03:24:19,109+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] FINISH, 
>> GetGlusterLocalLogicalVolumeListVDSCommand,
>> log id: 735c6a5f
>> 2018-05-10 03:24:19,110+02 INFO  [org.ovirt.engine.core.
>> vdsbroker.gluster.GetGlusterLocalLogicalVolumeListVDSCommand]
>> (DefaultQuartzScheduler7) [43f4eaec] START, 
>> GetGlusterLocalLogicalVolumeListVDSCommand(HostName
>> = n2.itsmart.cloud, VdsIdVDSCommandParametersBase:
>> {hostId='06e361ef-3361-4eaa-9923-27fa1a0187a4'}), log id: 6f9e9f58
>> 2018-05-10 03:24:19,112+02 ERROR 

Re: [ovirt-users] Emergency shutdown script

2018-04-23 Thread Doug Ingham
I've plugged this into our monitoring.

When the UPSes are at 50%, it puts the general cluster into global
maintenance & then triggers a shutdown action on all of the VMs in the
cluster's service group via the monitoring agent (you could use an SNMP
trap if you use agentless monitoring). Once all of the VMs are down, it
then continues with the hosts.

At 20%, it does the same with our "core" cluster.

This method means that the HE can be shut down, and it works whether the HE
is up or not.

For emergencies when the monitoring is offline, I've also hacked up a bash
script which parses the output of vdsClient & uses a loop to send a
shutdown signal to all of the VMs on each host.
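
Roughly speaking, that emergency path boils down to something like this - an
untested sketch, and the vdsClient verbs & output columns vary between
versions, so treat the parsing as illustrative only:

# on one host, once the monitoring (or a human) decides it's time
hosted-engine --set-maintenance --mode=global

# on each host: ask every locally-running VM to shut down, then stop the host
for vm in $(vdsClient -s 0 list table | awk '{print $1}'); do
    vdsClient -s 0 shutdown "$vm"
done
shutdown -h now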

Doug

On Mon, 23 Apr 2018, 14:14 Simon Vincent,  wrote:

> Does anyone have a way of shutting down oVirt automatically in the case of
> a power outage?
>
> Ideally I would like a script that can be automatically run when the UPS
> reaches a certain level. I had a look at the python SDK but I could only
> find functions for shutting down VMs and not hosts. Also I suspect this
> won't let me shut down the hosted engine VM.
>
> Any ideas?
>
> Regards
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt KVM Guest Definition: How to read these from within a virtual machine?

2018-04-16 Thread Doug Ingham
Err... by reading the hardware specs in the standard manner? e.g. dmidecode,
etc.
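
For example, from inside the guest (standard tools, nothing oVirt-specific):

dmidecode -t system    # SMBIOS system info - shows the oVirt product name & VM UUID
dmidecode -t memory    # configured memory devices
lscpu                  # vCPU count & topology
lsblk                  # virtual disks & their sizes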

On 15 April 2018 at 01:28, TomK  wrote:

> From within an oVirt (KVM) guest machine, how can I read the guest
> specific definitions such as memory, CPU, disk etc configuration that the
> guest was given?
>
> I would like to do this from within the virtual machine guest.
>
> --
> Cheers,
> Tom K.
> 
> -
>
> Living on earth is expensive, but it includes a free trip around the sun.
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM Portal looking for translators

2017-08-22 Thread Doug Ingham
I sent Marek an email with my username last week, offering to do the pt_br
translation, however I've still had no response?

Rgds,

On 22 August 2017 at 09:38, Gianluca Cecchi 
wrote:

> On Mon, Aug 14, 2017 at 8:37 PM, Jakub Niedermertl 
> wrote:
>
>> Hi all,
>>
>> new VM Portal project [1] - a replacement of oVirt userportal -  is
>> looking for community translators. If you know any of
>>
>> * Chinese (Simplified)
>> * French
>> * German
>> * Italian
>> * Japanese
>> * Korean
>> * Portuguese
>> * Russian
>> * Spanish
>>
>>
> Hello,
> I just completed translation for Italian language.
> Hopefully it will flow into next version (4.2.0?)
>
> Gianluca
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Pass Discard guest OS support?

2017-08-01 Thread Doug Ingham
Hi All,
 Just today I noticed that guests can now pass discards to the underlying
shared filesystem.

http://www.ovirt.org/develop/release-management/features/storage/pass-discard-from-guest-to-underlying-storage/

Is this supported by all of the main Linux guest OS's running the virt
agent?
And what about Windows guests? (specifically, 2012 & 2016)

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Upgrading HC from 4.0 to 4.1

2017-07-01 Thread Doug Ingham
> Only problem I would like to manage is that I have gluster network shared
> with ovirtmgmt one.
> Can I move it now with these updated packages?
>

Are the gluster peers configured with the same hostnames/IPs as your hosts
within oVirt?

Once they're configured on the same network, separating them might be a bit
difficult. Also, the last time I looked, oVirt still doesn't support
managing HCI oVirt/Gluster nodes running each service on a different
interface (see below).

In theory, the procedure would involve stopping all of the Gluster
processes on all of the peers, updating the peer addresses in the gluster
configs on all of the nodes, then restarting glusterd & the bricks. I've
not tested this however, and it's not a "supported" procedure. I've no idea
how oVirt would deal with these changes either.
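
Purely as an untested sketch of that idea (the paths are the usual EL7
locations, it would have to be done on all peers at the same time, with
backups, and oVirt may well be confused by the result):

systemctl stop glusterd && pkill glusterfsd
cp -a /var/lib/glusterd /var/lib/glusterd.bak
grep -rl 'old-hostname' /var/lib/glusterd | xargs sed -i 's/old-hostname/new-hostname/g'
systemctl start glusterd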


To properly separate my own storage & management networks from the
beginning, I configured each host with 2 IPs on different subnets and a
different hostname corresponding to each IP. For example, "v0" points to
the management interface of the first node, and "s0" points to the storage
interface.

oVirt's problem is that, whilst it can see the pre-configured bricks and
volumes on each host, it can't create any new bricks or volumes because it
wants to use the same hostnames it uses to manage the hosts. It also means
that it can't fence the hosts correctly, as it doesn't understand that "v0"
& "s0" are the same host.
This isn't a problem for me though, as I don't need to manage my Gluster
instances via the GUI, and automatic fencing can be done via the IPMI
interfaces.

Last I read, this is a recognised problem, but a fix isn't expected to arrive
any time soon.

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Seamless SAN HA failovers with oVirt?

2017-06-06 Thread Doug Ingham
Hey Matthew,
 I think it's VDSM that handles the pausing & resuming of the VMs.

An analogous small-scale scenario...the Gluster layer for one of our
smaller oVirt clusters temporarily lost quorum the other week, locking all
I/O for about 30 minutes. The VMs all went into pause & then resumed
automatically when quorum was restored.

To my surprise/relief, not a single one of the 10 odd VMs reported any
errors.

YMMV

Doug

On 6 June 2017 at 13:45, Matthew Trent 
wrote:

> Thanks for the replies, all!
>
> Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way
> around that when failing between heads.
>
> Sven: In my experience with iX support, they have directed me to reboot
> the active node to initiate failover. There's "hactl takeover" and "hactl
> giveback" commends, but reboot seems to be their preferred method.
>
> VMs going into a paused state and resuming when storage is back online
> sounds great. As long as oVirt's pause/resume isn't significantly slower
> than the 30-or-so seconds the TrueNAS takes to complete its failover,
> that's a pretty tolerable interruption for my needs. So my next questions
> are:
>
> 1) Assuming the SAN failover DOES work correctly, can anyone comment on
> their experience with oVirt pausing/thawing VMs in an NFS-based
> active/passive SAN failover scenario? Does it work reliably without
> intervention? Is it reasonably fast?
>
> 2) Is there anything else in the oVirt stack that might cause it to "freak
> out" rather than gracefully pause/unpause VMs?
>
> 2a) Particularly: I'm running hosted engine on the same TrueNAS storage.
> Does that change anything WRT to timeouts and oVirt's HA and fencing and
> sanlock and such?
>
> 2b) Is there a limit to how long oVirt will wait for storage before doing
> something more drastic than just pausing VMs?
>
> --
> Matthew Trent
> Network Engineer
> Lewis County IT Services
> 360.740.1247 - Helpdesk
> 360.740.3343 - Direct line
>
> 
> From: users-boun...@ovirt.org  on behalf of
> Chris Adams 
> Sent: Tuesday, June 6, 2017 7:21 AM
> To: users@ovirt.org
> Subject: Re: [ovirt-users] Seamless SAN HA failovers with oVirt?
>
> Once upon a time, Juan Pablo  said:
> > Chris, if you have active-active with multipath: you upgrade one system,
> > reboot it, check it came active again, then upgrade the other.
>
> Yes, but that's still not how a TrueNAS (and most other low- to
> mid-range SANs) works, so is not relevant.  The TrueNAS only has a
> single active node talking to the hard drives at a time, because having
> two nodes talking to the same storage at the same time is a hard problem
> to solve (typically requires custom hardware with active cache coherency
> and such).
>
> You can (and should) use multipath between servers and a TrueNAS, and
> that protects against NIC, cable, and switch failures, but does not help
> with a controller failure/reboot/upgrade.  Multipath is also used to
> provide better bandwidth sharing between links than ethernet LAGs.
>
> --
> Chris Adams 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Setting up hosted engine appliance on a bonded interface

2017-03-28 Thread Doug Ingham
Hi Bryan,

On 23 March 2017 at 23:54, Bryan Sockel  wrote:

>
> Hi,
>
> I am attempting to deploy an appliance to a bonded interface, and i
> getting this error when it attempts to setup the bridge:
>
>
> [ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup
> networks {'ovirtmgmt': {'bonding': 'bond0', 'ipaddr': u'10.20.101.181',
> 'netmask': u'255.255.255.0', 'defaultRoute': True, 'gateway':
> u'10.20.101.1'}}. Error code: "-32603" message: "Attempt to call function:
> >
> with arguments: ({u'ovirtmgmt': {u'bonding': u'bond0', u'ipaddr':
> u'10.20.101.181', u'netmask': u'255.255.255.0', u'defaultRoute': True,
> u'gateway': u'10.20.101.1'}}, {}, {u'connectivityCheck': False}) error:
> 'NoneType' object is not iterable"
>
>
> Is it possible to deploy an appliance to a network bond?  My Network guy
> has configured my ports in the switch so I can connect via bonded links.
> However, if the interfaces are not bonded, it will suspend the links.
>

With LACP & bonding mode 4 (the type of bonding you configure with a
switch), you *can* configure the link to be suspended if one of the member
ports is down, however that would only be desired in very specific edge
cases. I see no reason why you should be using such a configuration.
LACP is usually used to provide both aggregated bandwidth & port
redundancy. If one of the member ports fails, then the link is maintained
with the aggregated bandwidth of the remaining links. You don't even need
the bond configured on both sides; if you only configure it on one side,
then the ports fall back to individual/non-aggregated mode (although some
manufacturers only permit that in passive mode, not active mode).

If your bond is configured as you say, then I suggest you ask your network
guy to reconfigure the bond - you lose link redundancy at absolutely no
benefit.



>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Trouble installing windows 10

2017-03-17 Thread Doug Ingham
Fedora has signed drivers, however whilst I can't speak for Windows 10,
I've still not had any luck getting any of the VirtIO & Spice drivers
working on Windows Server 2016.
The services are *running*, but there doesn't seem to be any actual
communication going on between the hypervisor & guest...

https://fedoraproject.org/wiki/Windows_Virtio_Drivers#Direct_download

On 17 March 2017 at 17:29, Tomáš Golembiovský  wrote:

> Hi,
>
> On Fri, 17 Mar 2017 18:27:41 +
> Jim Fuhr  wrote:
>
> > I'm trying out Ovirt 4.1 and I have it working fine with Linux VMs.
> But, when I try to install Windows 10 using the virtio-win drivers for the
> hard drive, Windows 10 refuses the drivers because they are not signed.  I
> don't know how to tell Windows 10 to ignore the missing signature.
>
> could you tell us where did you get the drivers from? Which package and
> what version of the package is that?
>
> Thanks,
>
> Tomas
>
> --
> Tomáš Golembiovský 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Memory for Engine Machine

2017-02-20 Thread Doug Ingham
16GB is just the recommended amount of memory. The more items your Engine
has to manage, the more memory it will consume, so whilst it might not be
using that amount of memory at the moment, it will do as you expand your
cluster.

On 20 February 2017 at 16:22, FERNANDO FREDIANI 
wrote:

> Hello folks
>
> I have a dedicated Engine machine running with 4GB of memory. It has been
> working fine without any apparent issues.
>
> If I check the system memory usage it rarely goes over 1.5GB.
>
> But when I upgrade oVirt Engine it complains with the following message:
> "[WARNING] Less than 16384MB of memory is available".
>
> Why is all that required if the real usage doesn't show that need ? Or am
> I missing anything ?
>
> Fernando Frediani
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Importing existing (dirty) storage domains

2017-02-16 Thread Doug Ingham
Hi Nir,

On 16 Feb 2017 22:41, "Nir Soffer" <nsof...@redhat.com> wrote:

On Fri, Feb 17, 2017 at 3:16 AM, Doug Ingham <dou...@gmail.com> wrote:
> Well that didn't go so well. I deleted both dom_md/ids & dom_md/leases in
> the cloned volume, and I still can't import the storage domain.

You cannot delete files from dom_md; this will invalidate the storage domain
and you will not be able to use it without restoring these files.

The leases file on a file storage domain is unused, so creating an empty
file is enough.

The ids file must be created and initialized using sanlock; you should find
instructions on how to do it in the archives.
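
For the record, on a file-based domain that re-initialisation looks something
like the sketch below - the UUID & mount path are placeholders, and the
sanlock line should be double-checked against the man page & the archives
before running it against anything you care about:

cd /rhev/data-center/mnt/glusterSD/<server:_volume>/<domain-uuid>/dom_md
touch leases ids
chown vdsm:kvm leases ids && chmod 0660 leases ids
sanlock direct init -s <domain-uuid>:0:$PWD/ids:0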

> The snapshot was also taken some 4 hours before the attempted import, so
I'm
> surprised the locks haven't expired by themselves...

Leases do not expire if vdsm is connected to storage, and sanlock can access
the storage.

I'm not sure what you mean by volume snapshots.


A snapshot is like a save-point, the state of a storage volume from a
specific point in time.

In this case, it means I have created a copy/clone of my active data
volume. It's a completely new, separate volume, and is not attached to any
running services.

I'm using this copy/clone to test the import process, before doing it with
my live volume. If I "break" something in the cloned volume, no worries, I
can just delete it and recreate it from the snapshot.

Hope that clears things up a bit!


To import a storage domain, you should first make sure that no other setup
is using the storage domain.

The best way to do it is to detach the storage domain from the other setup.

If you are using hosted engine, you must also put the hosted engine agent in
global maintenance mode.

If your engine is broken, the best way to disconnect from storage is to
reboot the hosts.


So this is the issue. My current tests emulate exactly this, however I'm
still not able to import the domain into the new Engine. When I try to do
so, I get the resulting logs I copied in my earlier email.


Doug


Nir


>
>
> 2017-02-16 21:58:24,630-03 INFO
> [org.ovirt.engine.core.bll.storage.connection.
AddStorageServerConnectionCommand]
> (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock Acquired to
> object
> 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION,
> ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
> 2017-02-16 21:58:24,645-03 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
> (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] START,
> ConnectStorageServerVDSCommand(HostName = v5.dc0.example.com,
> StorageServerConnectionManagementVDSParameters:{runAsync='true',
> hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
> storagePoolId='----',
> storageType='GLUSTERFS',
> connectionList='[StorageServerConnections:{id='null',
> connection='localhost:data-teste2', iqn='null', vfsType='glusterfs',
> mountOptions='null', nfsVersion='null', nfsRetrans='null',
nfsTimeo='null',
> iface='null', netIfaceName='null'}]'}), log id: 726df65e
> 2017-02-16 21:58:26,046-03 INFO
> [org.ovirt.engine.core.bll.storage.connection.
AddStorageServerConnectionCommand]
> (default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock freed to
> object 'EngineLock:{exclusiveLocks='[localhost:data
> teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]',
> sharedLocks='null'}'
> 2017-02-16 21:58:26,206-03 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCom
mand]
> (default task-52) [85548427-713f-4ffb-a385-a97a7ee4109d] START,
> HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com,
> HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true',
> hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
> storagePoolId='----', storageType='null',
> storageDomainType='Data', path='localhost:data-teste2'}), log id: 79f6cc88
> 2017-02-16 21:58:27,899-03 INFO
> [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCom
mand]
> (default task-50) [38e87311-a7a5-49a8-bf18-857dd969cd5f] START,
> HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com,
> HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true',
> hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
> storagePoolId='----', storageType='null',
> storageDomainType='Data', path='localhost:data-teste2'}), log id: 7280d13
> 2017-02-16 21:58:29,156-03 INFO
> [org.ovirt.engine.core.bll.storage.connection.
RemoveStorageServerConnectionCommand]
> (default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock Acquired to
> object
> 'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION,
> ACTION_TYPE_FAILED_OBJECT_LOCKED>,
> 5e5f6610-c759-448b-a53d-9a456f51368

Re: [ovirt-users] Importing existing (dirty) storage domains

2017-02-16 Thread Doug Ingham
Well that didn't go so well. I deleted both dom_md/ids & dom_md/leases in
the cloned volume, and I still can't import the storage domain.
The snapshot was also taken some 4 hours before the attempted import, so
I'm surprised the locks haven't expired by themselves...


2017-02-16 21:58:24,630-03 INFO
[org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand]
(default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock Acquired to
object
'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION,
ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:24,645-03 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand]
(default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] START,
ConnectStorageServerVDSCommand(HostName = v5.dc0.example.com,
StorageServerConnectionManagementVDSParameters:{runAsync='true',
hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
storagePoolId='----',
storageType='GLUSTERFS',
connectionList='[StorageServerConnections:{id='null',
connection='localhost:data-teste2', iqn='null', vfsType='glusterfs',
mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null',
iface='null', netIfaceName='null'}]'}), log id: 726df65e
2017-02-16 21:58:26,046-03 INFO
[org.ovirt.engine.core.bll.storage.connection.AddStorageServerConnectionCommand]
(default task-45) [d59bc8c0-3c53-4a34-9d7c-8c982ee14e14] Lock freed to
object 'EngineLock:{exclusiveLocks='[localhost:data
teste2=<STORAGE_CONNECTION, ACTION_TYPE_FAILED_OBJECT_LOCKED>]',
sharedLocks='null'}'
2017-02-16 21:58:26,206-03 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand]
(default task-52) [85548427-713f-4ffb-a385-a97a7ee4109d] START,
HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com,
HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true',
hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
storagePoolId='----', storageType='null',
storageDomainType='Data', path='localhost:data-teste2'}), log id: 79f6cc88
2017-02-16 21:58:27,899-03 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainsListVDSCommand]
(default task-50) [38e87311-a7a5-49a8-bf18-857dd969cd5f] START,
HSMGetStorageDomainsListVDSCommand(HostName = v5.dc0.example.com,
HSMGetStorageDomainsListVDSCommandParameters:{runAsync='true',
hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
storagePoolId='----', storageType='null',
storageDomainType='Data', path='localhost:data-teste2'}), log id: 7280d13
2017-02-16 21:58:29,156-03 INFO
[org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand]
(default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock Acquired to
object
'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION,
ACTION_TYPE_FAILED_OBJECT_LOCKED>,
5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION,
ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'
2017-02-16 21:58:29,168-03 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand]
(default task-57) [5e4b20cf-60d2-4ae9-951b-c2693603aa6f] START,
DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com,
StorageServerConnectionManagementVDSParameters:{runAsync='true',
hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
storagePoolId='----',
storageType='GLUSTERFS',
connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681',
connection='localhost:data-teste2', iqn='null', vfsType='glusterfs',
mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null',
iface='null', netIfaceName='null'}]'}), log id: 6042b108
2017-02-16 21:58:29,193-03 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.DisconnectStorageServerVDSCommand]
(default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] START,
DisconnectStorageServerVDSCommand(HostName = v5.dc0.example.com,
StorageServerConnectionManagementVDSParameters:{runAsync='true',
hostId='1a3f10f2-e4ce-44b9-9495-06e445cfa0b0',
storagePoolId='----',
storageType='GLUSTERFS',
connectionList='[StorageServerConnections:{id='5e5f6610-c759-448b-a53d-9a456f513681',
connection='localhost:data-teste2', iqn='null', vfsType='glusterfs',
mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null',
iface='null', netIfaceName='null'}]'}), log id: 4e9421cf
2017-02-16 21:58:31,398-03 INFO
[org.ovirt.engine.core.bll.storage.connection.RemoveStorageServerConnectionCommand]
(default task-56) [1b3826e4-4890-43d4-8854-16f3c573a31f] Lock freed to
object
'EngineLock:{exclusiveLocks='[localhost:data-teste2=<STORAGE_CONNECTION,
ACTION_TYPE_FAILED_OBJECT_LOCKED>,
5e5f6610-c759-448b-a53d-9a456f513681=<STORAGE_CONNECTION,
ACTION_TYPE_FAILED_OBJECT_LOCKED>]', sharedLocks='null'}'

Again, many thanks!
 Doug

On 16 February 2017 at 18:53, Doug Ingham <dou...@gmai

Re: [ovirt-users] Importing existing (dirty) storage domains

2017-02-16 Thread Doug Ingham
Hi Nir,

On 16 February 2017 at 13:55, Nir Soffer <nsof...@redhat.com> wrote:

> On Mon, Feb 13, 2017 at 3:35 PM, Doug Ingham <dou...@gmail.com> wrote:
> > Hi Sahina,
> >
> > On 13 February 2017 at 05:45, Sahina Bose <sab...@redhat.com> wrote:
> >>
> >> Any errors in the gluster mount logs for this gluster volume?
> >>
> >> How about "gluster vol heal  info" - does it list any entries to
> >> heal?
> >
> >
> > After more investigating, I found out that there is a sanlock daemon that
> > runs with VDSM, independently of the HE, so I'd basically have to bring
> the
> > volume down & wait for the leases to expire/delete them* before I can
> import
> > the domain.
> >
> > *I understand removing /dom_md/leases/ should do the job?
>
> No, the issue is probably dom_md/ids, which is accessed by sanlock, but
> removing files accessed by sanlock will not help; an open file will remain
> open until sanlock closes the file.
>

I'm testing this with volume snapshots at the moment, so there are no
processes accessing the new volume.


Did you try to reboot the host before installing it again? If you did and
> you
> still have these issues, you probably need to remove the previous
> installation
> properly before installing again.
>
> Adding Simone to help with uninstalling and reinstalling hosted engine.
>

The Hosted-Engine database had been corrupted and the restore wasn't
running correctly, so I installed a new engine on a new server - no
restores or old data. The aim is to import the old storage domain into the
new Engine & then import the VMs into the new storage domain.
My only problem with this is that there appear to be some file-based leases
somewhere that, unless I manage to locate & delete them, force me to wait
for the leases to time out before I can import the old storage domain.
To minimise downtime, I'm trying to avoid having to wait for the leases to
time out.

Regards,
 Doug


>
> Nir
>
> >
> >
> >>
> >>
> >> On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dou...@gmail.com> wrote:
> >>>
> >>> Some interesting output from the vdsm log...
> >>>
> >>>
> >>> 2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain]
> >>> Resource namespace 01_img_60455567-ad30-42e3-a9df-62fe86c7fd25 already
> >>> registered (sd:731)
> >>> 2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain]
> >>> Resource namespace 02_vol_60455567-ad30-42e3-a9df-62fe86c7fd25 already
> >>> registered (sd:740)
> >>> 2017-02-09 15:16:24,052 INFO  (jsonrpc/1) [storage.SANLock] Acquiring
> >>> Lease(name='SDM',
> >>> path=u'/rhev/data-center/mnt/glusterSD/localhost:data2/
> 60455567-ad30-42e3-a9df-6
> >>> 2fe86c7fd25/dom_md/leases', offset=1048576) for host id 1
> >>> (clusterlock:343)
> >>> 2017-02-09 15:16:24,057 INFO  (jsonrpc/1) [storage.SANLock] Releasing
> >>> host id for domain 60455567-ad30-42e3-a9df-62fe86c7fd25 (id: 1)
> >>> (clusterlock:305)
> >>> 2017-02-09 15:16:25,149 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
> >>> call GlusterHost.list succeeded in 0.17 seconds (__init__:515)
> >>> 2017-02-09 15:16:25,264 INFO  (Reactor thread)
> >>> [ProtocolDetector.AcceptorImpl] Accepted connection from
> >>> :::127.0.0.1:55060 (protocoldetector:72)
> >>> 2017-02-09 15:16:25,270 INFO  (Reactor thread)
> >>> [ProtocolDetector.Detector] Detected protocol stomp from
> >>> :::127.0.0.1:55060 (protocoldetector:127)
> >>> 2017-02-09 15:16:25,271 INFO  (Reactor thread) [Broker.StompAdapter]
> >>> Processing CONNECT request (stompreactor:102)
> >>> 2017-02-09 15:16:25,271 INFO  (JsonRpc (StompReactor))
> >>> [Broker.StompAdapter] Subscribe command received (stompreactor:129)
> >>> 2017-02-09 15:16:25,416 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC
> >>> call Host.getHardwareInfo succeeded in 0.01 seconds (__init__:515)
> >>> 2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
> >>> repoStats(options=None) (logUtils:49)
> >>> 2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
> >>> repoStats, Return response: {u'e8d04da7-ad3d-4227-a45d-b5a29b2f43e5':
> >>> {'code': 0, 'actual': True
> >>> , 'version': 4, 'acquired': True, 'delay': '0.000854128', 'lastCheck':
> >>> '5.1', 'valid': True}, u'a77b8821-ff19-4d17-a3ce-a6c3a69436d5':
> {'code': 0,
> >>> 

Re: [ovirt-users] Scheduling snapshots

2017-02-15 Thread Doug Ingham
https://github.com/wefixit-AT/oVirtBackup

...although I understand the API calls it uses have been deprecated in 4.1.

On 15 February 2017 at 14:38, Pat Riehecky  wrote:

> Has someone got a script to automate scheduling snapshots of a specific
> system (and retaining them for X days)?
>
> Pat
>
> --
> Pat Riehecky
>
> Fermi National Accelerator Laboratory
> www.fnal.gov
> www.scientificlinux.org
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>



-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] How to enable Global Maintenance via the REST-API?

2017-02-15 Thread Doug Ingham
Aha! Cheers
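
For anyone else searching the archives, the call ends up looking something
like this - an untested sketch, with the engine address, password & the HE
VM's ID as placeholders, and the parameter name taken from the linked API
model, so double-check it against your engine version:

curl -s -k -u 'admin@internal:PASSWORD' \
     -H 'Content-Type: application/xml' \
     -d '<action><maintenance_enabled>true</maintenance_enabled></action>' \
     'https://engine.example.com/ovirt-engine/api/vms/<HE-VM-ID>/maintenance'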

On 15 February 2017 at 12:20, Andrej Krejcir <akrej...@redhat.com> wrote:

> Hi,
>
> global maintenance can be set using the 'maintenance' method on the Hosted
> Engine VM.
>
> http://ovirt.github.io/ovirt-engine-api-model/4.0/#services/
> vm/methods/maintenance
>
> Regards,
> Andrej
>
> On 13 February 2017 at 21:17, Doug Ingham <dou...@gmail.com> wrote:
>
>> Hey Guys,
>>  I've gone through both oVirt's & Red Hat's API docs, but I can only find
>> info on getting the global maintenance state & setting local maintenance on
>> specific hosts.
>>
>> Is it not possible to set global maintenance via the API?
>>
>> I'm writing up a new script for our engine-backup routine, but having to
>> set GM by accessing a specific node seems a bit inelegant...
>>
>> https://access.redhat.com/documentation/en/red-hat-virtualiz
>> ation/4.0/single/rest-api-guide/
>>
>> Cheers,
>> --
>> Doug
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] How to enable Global Maintenance via the REST-API?

2017-02-13 Thread Doug Ingham
Hey Guys,
 I've gone through both oVirt's & Red Hat's API docs, but I can only find
info on getting the global maintenance state & setting local maintenance on
specific hosts.

Is it not possible to set global maintenance via the API?

I'm writing up a new script for our engine-backup routine, but having to
set GM by accessing a specific node seems a bit inelegant...

https://access.redhat.com/documentation/en/red-hat-virtualization/4.0/single/rest-api-guide/

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Importing existing (dirty) storage domains

2017-02-13 Thread Doug Ingham
Hi Sahina,

On 13 February 2017 at 05:45, Sahina Bose <sab...@redhat.com> wrote:

> Any errors in the gluster mount logs for this gluster volume?
>
> How about "gluster vol heal  info" - does it list any entries to
> heal?
>

After more investigating, I found out that there is a sanlock daemon that
runs with VDSM, independently of the HE, so I'd basically have to bring the
volume down & wait for the leases to expire/delete them* before I can
import the domain.

*I understand removing /dom_md/leases/ should do the job?



>
> On Thu, Feb 9, 2017 at 11:57 PM, Doug Ingham <dou...@gmail.com> wrote:
>
>> Some interesting output from the vdsm log...
>>
>>
>> 2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain]
>> Resource namespace 01_img_60455567-ad30-42e3-a9df-62fe86c7fd25 already
>> registered (sd:731)
>> 2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain]
>> Resource namespace 02_vol_60455567-ad30-42e3-a9df-62fe86c7fd25 already
>> registered (sd:740)
>> 2017-02-09 15:16:24,052 INFO  (jsonrpc/1) [storage.SANLock] Acquiring
>> Lease(name='SDM', path=u'/rhev/data-center/mnt/g
>> lusterSD/localhost:data2/60455567-ad30-42e3-a9df-6
>> 2fe86c7fd25/dom_md/leases', offset=1048576) for host id 1
>> (clusterlock:343)
>> 2017-02-09 15:16:24,057 INFO  (jsonrpc/1) [storage.SANLock] Releasing
>> host id for domain 60455567-ad30-42e3-a9df-62fe86c7fd25 (id: 1)
>> (clusterlock:305)
>> 2017-02-09 15:16:25,149 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC
>> call GlusterHost.list succeeded in 0.17 seconds (__init__:515)
>> 2017-02-09 15:16:25,264 INFO  (Reactor thread)
>> [ProtocolDetector.AcceptorImpl] Accepted connection from :::
>> 127.0.0.1:55060 (protocoldetector:72)
>> 2017-02-09 15:16:25,270 INFO  (Reactor thread)
>> [ProtocolDetector.Detector] Detected protocol stomp from :::
>> 127.0.0.1:55060 (protocoldetector:127)
>> 2017-02-09 15:16:25,271 INFO  (Reactor thread) [Broker.StompAdapter]
>> Processing CONNECT request (stompreactor:102)
>> 2017-02-09 15:16:25,271 INFO  (JsonRpc (StompReactor))
>> [Broker.StompAdapter] Subscribe command received (stompreactor:129)
>> 2017-02-09 15:16:25,416 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC
>> call Host.getHardwareInfo succeeded in 0.01 seconds (__init__:515)
>> 2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
>> repoStats(options=None) (logUtils:49)
>> 2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
>> repoStats, Return response: {u'e8d04da7-ad3d-4227-a45d-b5a29b2f43e5':
>> {'code': 0, 'actual': True
>> , 'version': 4, 'acquired': True, 'delay': '0.000854128', 'lastCheck':
>> '5.1', 'valid': True}, u'a77b8821-ff19-4d17-a3ce-a6c3a69436d5': {'code':
>> 0, 'actual': True, 'vers
>> ion': 4, 'acquired': True, 'delay': '0.000966556', 'lastCheck': '2.6',
>> 'valid': True}} (logUtils:52)
>> 2017-02-09 15:16:25,447 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC
>> call Host.getStats succeeded in 0.03 seconds (__init__:515)
>> 2017-02-09 15:16:25,450 ERROR (JsonRpc (StompReactor)) [vds.dispatcher]
>> SSL error receiving from > (':::127.0.0.1', 55060, 0, 0) at 0x7f69c0043cf8>: unexpected eof
>> (betterAsyncore:113)
>> 2017-02-09 15:16:25,812 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC
>> call GlusterVolume.list succeeded in 0.10 seconds (__init__:515)
>> 2017-02-09 15:16:25,940 INFO  (Reactor thread)
>> [ProtocolDetector.AcceptorImpl] Accepted connection from :::
>> 127.0.0.1:55062 (protocoldetector:72)
>> 2017-02-09 15:16:25,946 INFO  (Reactor thread)
>> [ProtocolDetector.Detector] Detected protocol stomp from :::
>> 127.0.0.1:55062 (protocoldetector:127)
>> 2017-02-09 15:16:25,947 INFO  (Reactor thread) [Broker.StompAdapter]
>> Processing CONNECT request (stompreactor:102)
>> 2017-02-09 15:16:25,947 INFO  (JsonRpc (StompReactor))
>> [Broker.StompAdapter] Subscribe command received (stompreactor:129)
>> 2017-02-09 15:16:26,058 ERROR (jsonrpc/1) [storage.TaskManager.Task]
>> (Task='02cad901-5fe8-4f2d-895b-14184f67feab') Unexpected error (task:870)
>> Traceback (most recent call last):
>>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>> return fn(*args, **kargs)
>>   File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in
>> wrapper
>> res = f(*args, **kwargs)
>>   File "/usr/share/vdsm/storage/hsm.py", line 812, in
>> forcedDetachStorageDomain
>> self._deatchStorageDomainFromOldPools(sdUUID)
>>   File "/usr/share/vdsm/storage/hsm.py", line 7

Re: [ovirt-users] Gluster storage question

2017-02-11 Thread Doug Ingham
On 11 February 2017 at 15:39, Bartosiak-Jentys, Chris <
chris.bartosiak-jen...@certico.co.uk> wrote:

> Thank you for your reply Doug,
>
> I didn't use localhost as I was preparing to follow instructions (blog
> post: http://community.redhat.com/blog/2014/11/up-and-
> running-with-ovirt-3-5-part-two/)  for setting up CTDB and had already
> created hostnames for the floating IP when I decided to ditch that and go
> with the hosts file hack. I already had the volumes mounted on those
> hostnames but you are absolutely right, simply using localhost would be the
> best option.
>
oVirt 3.5? 2014? That's old. Both oVirt & Gluster have moved on a lot
since then. I would strongly recommend studying Gluster's documentation
before implementing it in production. It's not complicated, but you have to
have a good understanding of what you're doing & why if you want to protect
the integrity of your data & avoid waking up one day to find everything in
meltdown.

https://gluster.readthedocs.io/en/latest/

Red Hat's portal is also very good & full of detailed tips for tuning your
setup, however their "stable" versions (which they have to support) are of
course much older than the project's own latest stable, so keep this in
mind when considering their advice.

https://access.redhat.com/documentation/en/red-hat-storage/

Likewise with their oVirt documentation, although their supported oVirt
versions are much closer to the current stable release. It also features a
lot of very good advice for configuring & tuning an oVirt (RHEV) &
GlusterFS (RHGS) hyperconverged setup.

https://access.redhat.com/documentation/en/red-hat-virtualization/

For any other Gluster specific questions, you can usually get good & timely
responses on their mailing list & IRC channel.

Thank you for your suggested outline of how to power up/down the cluster, I
> hadn't considered the fact that turning on two out of date nodes would
> clobber data on the new node. This is something I will need to be very
> careful to avoid. The setup is mostly for lab work so not really mission
> critical but I do run a few VM's (freeIPA, GitLab and pfSense) that I'd
> like to keep up 24/7. I make regular backups (outside of ovirt) of those
> just in case.
>
> Thanks, I will do some reading on how gluster handles quorum and heal
> operations but your procedure sounds like a sensible way to operate this
> cluster.
>
> Regards,
>
> Chris.
>
>
> On 2017-02-11 18:08, Doug Ingham wrote:
>
>
>
> On 11 February 2017 at 13:32, Bartosiak-Jentys, Chris <
> chris.bartosiak-jen...@certico.co.uk> wrote:
>
>> Hello list,
>>
>> Just wanted to get your opinion on my ovirt home lab setup. While this is
>> not a production setup I would like it to run relatively reliably so please
>> tell me if the following storage configuration is likely to result in
>> corruption or just bat s**t insane.
>>
>> I have a 3 node hosted engine setup, VM data store and engine data store
>> are both replica 3 gluster volumes (one brick on each host).
>> I do not want to run all 3 hosts 24/7 due to electricity costs, I only
>> power up the larger hosts (2 Dell R710's) when I need additional resources
>> for VM's.
>>
>> I read about using CTDB and floating/virtual IP's to allow the storage
>> mount point to transition between available hosts but after some thought
>> decided to go about this another, simpler, way:
>>
>> I created a common hostname for the storage mount points: gfs-data and
>> gfs-engine
>>
>> On each host I edited /etc/hosts file to have these hostnames resolve to
>> each hosts IP i.e. on host1 gfs-data & gfs-engine --> host1 IP
>> on host2 gfs-data & gfs-engine --> host2 IP
>> etc.
>>
>> In ovirt engine each storage domain is mounted as gfs-data:/data and
>> gfs-engine:/engine
>> My thinking is that this way no matter which host is up and acting as SPM
>> it will be able to mount the storage as its only dependent on that host
>> being up.
>>
>> I changed gluster options for server-quorum-ratio so that the volumes
>> remain up even if quorum is not met, I know this is risky but its just a
>> lab setup after all.
>>
>> So, any thoughts on the /etc/hosts method to ensure the storage mount
>> point is always available? Is data corruption more or less inevitable with
>> this setup? Am I insane ;) ?
>
>
> Why not just use localhost? And no need for CTDB with a floating IP, oVirt
> uses libgfapi for Gluster which deals with that all natively.
>
> As for the quorum issue, I would most definitely *not* run with quorum
> disabled when you're running more than one node. As you say y

Re: [ovirt-users] Gluster storage question

2017-02-11 Thread Doug Ingham
On 11 February 2017 at 13:32, Bartosiak-Jentys, Chris <
chris.bartosiak-jen...@certico.co.uk> wrote:

> Hello list,
>
> Just wanted to get your opinion on my ovirt home lab setup. While this is
> not a production setup I would like it to run relatively reliably so please
> tell me if the following storage configuration is likely to result in
> corruption or just bat s**t insane.
>
> I have a 3 node hosted engine setup, VM data store and engine data store
> are both replica 3 gluster volumes (one brick on each host).
> I do not want to run all 3 hosts 24/7 due to electricity costs, I only
> power up the larger hosts (2 Dell R710's) when I need additional resources
> for VM's.
>
> I read about using CTDB and floating/virtual IP's to allow the storage
> mount point to transition between available hosts but after some thought
> decided to go about this another, simpler, way:
>
> I created a common hostname for the storage mount points: gfs-data and
> gfs-engine
>
> On each host I edited /etc/hosts file to have these hostnames resolve to
> each hosts IP i.e. on host1 gfs-data & gfs-engine --> host1 IP
> on host2 gfs-data & gfs-engine --> host2 IP
> etc.
>
> In ovirt engine each storage domain is mounted as gfs-data:/data and
> gfs-engine:/engine
> My thinking is that this way no matter which host is up and acting as SPM
> it will be able to mount the storage as its only dependent on that host
> being up.
>
> I changed gluster options for server-quorum-ratio so that the volumes
> remain up even if quorum is not met, I know this is risky but its just a
> lab setup after all.
>
> So, any thoughts on the /etc/hosts method to ensure the storage mount
> point is always available? Is data corruption more or less inevitable with
> this setup? Am I insane ;) ?
>

Why not just use localhost? And no need for CTDB with a floating IP, oVirt
uses libgfapi for Gluster which deals with that all natively.
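
If you want to sanity-check that a volume is reachable through the local
glusterd, a quick manual test is enough (mount point & volume name below
are just examples):

[root@host1 ~]# mkdir -p /mnt/gluster-test
[root@host1 ~]# mount -t glusterfs localhost:/data /mnt/gluster-test
[root@host1 ~]# df -h /mnt/gluster-test ; umount /mnt/gluster-test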

As for the quorum issue, I would most definitely *not* run with quorum
disabled when you're running more than one node. As you say you
specifically plan for when the other 2 nodes of the replica 3 set will be
active or not, I'd do something along the lines of the following...

Going from 3 nodes to 1 node:
 - Put nodes 2 & 3 in maintenance to offload their virtual load;
 - Once the 2 nodes are free of load, disable quorum on the Gluster volumes;
 - Power down the 2 nodes.

Going from 1 node to 3 nodes:
 - Power on *only* 1 of the pair of nodes (if you power on both & self-heal
is enabled, Gluster will "heal" the files on the main node with the older
files on the 2 nodes which were powered down);
 - Allow Gluster some time to detect that the files are in split-brain;
 - Tell Gluster to heal the files in split-brain based on modification time;
 - Once the 2 nodes are in sync, re-enable quorum & power on the last node,
which will be resynchronised automatically;
 - Take the 2 hosts out of maintenance mode.

If you want to power on the 2nd two nodes at the same time, make absolutely
sure self-heal is disabled first! If you don't, Gluster will see the 2nd
two nodes as in quorum & heal the data on your 1st node with the
out-of-date data.
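
For reference, the Gluster side of those steps might look something like
the following (a rough sketch, applied per volume; the volume name & file
path are placeholders, so double-check against the docs before relying on
it):

# relax quorum before powering down nodes 2 & 3
gluster volume set data cluster.server-quorum-type none
gluster volume set data cluster.quorum-type none

# after powering the 2nd node back on: check for & heal split-brain by mtime
gluster volume heal data info split-brain
gluster volume heal data split-brain latest-mtime <path-to-file>

# once the bricks are back in sync: restore quorum
gluster volume set data cluster.quorum-type auto
gluster volume set data cluster.server-quorum-type server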


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Current state of UI brick management when using dedicated interfaces for ovirtmgmt & gluster?

2017-02-10 Thread Doug Ingham
Hey Guys,
 I currently use dedicated interfaces & hostnames to separate gluster
traffic on my "hyperconverged" hosts.

For example, the first node uses "v0" for its management interface & "s0"
for its gluster interface.

With this setup, I notice that all functions under the "Volumes" tab work,
however I'm unable to import storage domains with "Use managed gluster",
and hosts' bricks aren't listed under the "Hosts" tab.

My engine log is also full of entries such as this...

2017-02-10 03:25:07,155-03 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn]
(DefaultQuartzScheduler3) [78aef637] Could not add brick
's0:/gluster/data-novo/brick' to volume
'bded65c7-e79e-4bc9-9630-36a69ad2e684' - server uuid
'a9d062c6-7d01-404f-ab0c-3ed468e60c91' not found in cluster
'0002-0002-0002-0002-017a'
2017-02-10 03:25:09,157-03 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesHealInfoReturn]
(DefaultQuartzScheduler10) [6828f9a7] Could not fetch heal info for brick
's0:/gluster/data-novo/brick' - server uuid
'a9d062c6-7d01-404f-ab0c-3ed468e60c91' not found

I'm wondering whether the different hostnames used to configure each of the
interfaces are causing the confusion?
So...is there something wrong, or is this still an unsupported
configuration?

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Best way to shutdown and restart hypervisor?

2017-02-10 Thread Doug Ingham
On 9 February 2017 at 10:08, Gianluca Cecchi 
wrote:

> On Wed, Feb 8, 2017 at 10:59 AM, Gianluca Cecchi <
> gianluca.cec...@gmail.com> wrote:
>
>> Hello,
>> what is considered the best way to shutdown and restart an hypervisor,
>> supposing plain CentOS 7 host?
>>
>> For example to cover these scenarios:
>> 1) update host from 4.0 to 4.1
>> 2) planned maintenance to the cabinet where the server is located and
>> take the opportunity to update also OS packages
>>
>> My supposed workflow:
>>
>> - put host into maintenance
>> - yum update on host
>> - shutdown os from inside host (because from power mgmt it is brutal
>> power off / power on)
>> --> should I get any warning from web admin gui in this case, even if the
>> host was in maintenance mode?
>> - power mgmt -> start from webadmin gui
>> (or power on button/virtual button at host side?)
>>
>
I'd add one more step at the beginning: update the cluster's scheduling
policy to InClusterUpgrade. That will automatically migrate your VMs to the
most up-to-date host, freeing your other hosts for a rolling upgrade of the
cluster.


>
>
>> Would be advisable to put inside power mgmt functionality some logic
>> about os mgmt, so for example, if action is restart, first try to shutdown
>> OS and only in case of failure power off/power on?
>>
>> Thanks in advance,
>> Gianluca
>>
>
> I see that one of the new features of 4.1 is Host restart through SSH
> Power Mgmt --> SSH Mgmt --> Restart
>
> How does it fit in question above?
> Can I choose this way after having run "yum update" on host?
>

Features are made available only after you update all of the hosts in the
cluster & the (Hosted-)Engine to the same version. Then you can update the
Compatibility Version of the cluster.

http://www.ovirt.org/documentation/self-hosted/chap-Maintenance_and_Upgrading_Resources/

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Manually starting VMs via vdsClient (HE offline)

2017-02-09 Thread Doug Ingham
On 9 February 2017 at 15:48, Yaniv Kaul <yk...@redhat.com> wrote:

>
>
> On Thu, Feb 9, 2017 at 6:00 PM, Doug Ingham <dou...@gmail.com> wrote:
>
>>
>>
>> On 9 February 2017 at 12:03, Dan Yasny <dya...@gmail.com> wrote:
>>
>>>
>>> On Thu, Feb 9, 2017 at 9:55 AM, Doug Ingham <dou...@gmail.com> wrote:
>>>
>>>> Hi Dan,
>>>>
>>>> On 8 February 2017 at 18:26, Dan Yasny <dya...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> But seriously, above all, I'd recommend you backup the engine (it
>>>>> comes with a utility) often and well. I do it via cron every hour in
>>>>> production, keeping a rotation of hourly and daily backups, just in case.
>>>>> It doesn't take much space or resources, but it's more than just best
>>>>> practice - that database is the summary of the entire setup.
>>>>>
>>>>>
>>>> If you don't mind, may I ask what process you use for backing up your
>>>> engine? If you use HE, do you keep one server dedicated to just that VM?
>>>> I've not had that particular issue in the restore process yet, however
>>>> I read that it's recommended the HE host is free of virtual load before the
>>>> backup takes place. And as they need to be done frequently, I'm reading
>>>> that as a dedicated host...
>>>>
>>>>
>>> If you use a dedicated host, you might as well abandon self hosted. HE
>>> is nice for small setups with the HA built in for extra fun, but once you
>>> scale, it might not be able to cope and you'll need real hardware. You're
>>> running a heavy-ish java engine plus two databases after all.
>>>
>>> So as I said, all I do is add the engine-backup command to cron on the
>>> engine, and then my backup server comes in and pulls out the files via scp,
>>> also through cron. Nothing fancy really, but it lets me sleep at night
>>>
>>
>> This particular project has 10 new maxed out servers to back it, and I
>> don't see it outgrowing that for at least a year or so. It's hardly a full
>> DC.
>> I presume the DB will become the heaviest part of the load, and I'm
>> already planning a separate high I/O environment for dedicated HA DB hosts.
>>
>> See the top section of this page:
>> http://www.ovirt.org/documentation/self-hosted/chap-Backing_
>> up_and_Restoring_an_EL-Based_Self-Hosted_Environment
>>
>> It seems that I'll always have to keep at least one host free to be able
>> to avoid restore problems. If not, and I were to keep hourly backups, then
>> migrating VMs off the host every hour would just be a pain.
>>
>
> I don't see the point in an hourly backup. Of what? The DB? The VM? What
> storage will it be based on?
> I suggest revising the strategy.
>
>

Um, that wasn't my suggestion. To be honest, a fortnight's worth of daily
engine-backups & snapshots of the engine volume should suffice for us. It's
a "Hyperconverged" setup using gluster storage on the compute nodes.


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Importing existing (dirty) storage domains

2017-02-09 Thread Doug Ingham
Some interesting output from the vdsm log...


2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain] Resource
namespace 01_img_60455567-ad30-42e3-a9df-62fe86c7fd25 already registered
(sd:731)
2017-02-09 15:16:24,051 INFO  (jsonrpc/1) [storage.StorageDomain] Resource
namespace 02_vol_60455567-ad30-42e3-a9df-62fe86c7fd25 already registered
(sd:740)
2017-02-09 15:16:24,052 INFO  (jsonrpc/1) [storage.SANLock] Acquiring
Lease(name='SDM',
path=u'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases',
offset=1048576) for host id 1 (clusterlock:343)
2017-02-09 15:16:24,057 INFO  (jsonrpc/1) [storage.SANLock] Releasing host
id for domain 60455567-ad30-42e3-a9df-62fe86c7fd25 (id: 1) (clusterlock:305)
2017-02-09 15:16:25,149 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call
GlusterHost.list succeeded in 0.17 seconds (__init__:515)
2017-02-09 15:16:25,264 INFO  (Reactor thread)
[ProtocolDetector.AcceptorImpl] Accepted connection from :::
127.0.0.1:55060 (protocoldetector:72)
2017-02-09 15:16:25,270 INFO  (Reactor thread) [ProtocolDetector.Detector]
Detected protocol stomp from :::127.0.0.1:55060 (protocoldetector:127)
2017-02-09 15:16:25,271 INFO  (Reactor thread) [Broker.StompAdapter]
Processing CONNECT request (stompreactor:102)
2017-02-09 15:16:25,271 INFO  (JsonRpc (StompReactor))
[Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-09 15:16:25,416 INFO  (jsonrpc/5) [jsonrpc.JsonRpcServer] RPC call
Host.getHardwareInfo succeeded in 0.01 seconds (__init__:515)
2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
repoStats(options=None) (logUtils:49)
2017-02-09 15:16:25,419 INFO  (jsonrpc/6) [dispatcher] Run and protect:
repoStats, Return response: {u'e8d04da7-ad3d-4227-a45d-b5a29b2f43e5':
{'code': 0, 'actual': True, 'version': 4, 'acquired': True,
'delay': '0.000854128', 'lastCheck': '5.1', 'valid': True},
u'a77b8821-ff19-4d17-a3ce-a6c3a69436d5': {'code': 0, 'actual': True,
'version': 4, 'acquired': True, 'delay': '0.000966556', 'lastCheck': '2.6',
'valid': True}} (logUtils:52)
2017-02-09 15:16:25,447 INFO  (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call
Host.getStats succeeded in 0.03 seconds (__init__:515)
2017-02-09 15:16:25,450 ERROR (JsonRpc (StompReactor)) [vds.dispatcher] SSL
error receiving from : unexpected eof
(betterAsyncore:113)
2017-02-09 15:16:25,812 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call
GlusterVolume.list succeeded in 0.10 seconds (__init__:515)
2017-02-09 15:16:25,940 INFO  (Reactor thread)
[ProtocolDetector.AcceptorImpl] Accepted connection from :::
127.0.0.1:55062 (protocoldetector:72)
2017-02-09 15:16:25,946 INFO  (Reactor thread) [ProtocolDetector.Detector]
Detected protocol stomp from :::127.0.0.1:55062 (protocoldetector:127)
2017-02-09 15:16:25,947 INFO  (Reactor thread) [Broker.StompAdapter]
Processing CONNECT request (stompreactor:102)
2017-02-09 15:16:25,947 INFO  (JsonRpc (StompReactor))
[Broker.StompAdapter] Subscribe command received (stompreactor:129)
2017-02-09 15:16:26,058 ERROR (jsonrpc/1) [storage.TaskManager.Task]
(Task='02cad901-5fe8-4f2d-895b-14184f67feab') Unexpected error (task:870)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 877, in _run
return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in
wrapper
res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 812, in
forcedDetachStorageDomain
self._deatchStorageDomainFromOldPools(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 790, in
_deatchStorageDomainFromOldPools
dom.acquireClusterLock(host_id)
  File "/usr/share/vdsm/storage/sd.py", line 810, in acquireClusterLock
self._manifest.acquireDomainLock(hostID)
  File "/usr/share/vdsm/storage/sd.py", line 499, in acquireDomainLock
self._domainLock.acquire(hostID, self.getDomainLease())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
362, in acquire
"Cannot acquire %s" % (lease,), str(e))
AcquireLockFailure: Cannot obtain lock:
u"id=60455567-ad30-42e3-a9df-62fe86c7fd25, rc=5, out=Cannot acquire
Lease(name='SDM',
path=u'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases',
offset=1048576), err=(5, 'Sanlock resource not acquired', 'Input/output
error')"
2017-02-09 15:16:26,058 INFO  (jsonrpc/1) [storage.TaskManager.Task]
(Task='02cad901-5fe8-4f2d-895b-14184f67feab') aborting: Task is aborted:
'Cannot obtain lock' - code 651 (task:1175)
2017-02-09 15:16:26,059 ERROR (jsonrpc/1) [storage.Dispatcher] {'status':
{'message': 'Cannot obtain lock: u"id=60455567-ad30-42e3-a9df-62fe86c7fd25,
rc=5, out=Cannot acquire Lease(name=\'SDM\',
path=u\'/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases\',
offset=1048576), err=(5, \'Sanlock resource not acquired\', \'Input/output
error\')"', 'code': 651}} (dispatcher:77)

[ovirt-users] Importing existing (dirty) storage domains

2017-02-09 Thread Doug Ingham
Hi All,
 My original HE died & was proving too much of a hassle to restore, so I've
set up a new HE on a new host & now want to import my previous data storage
domain with my VMs.

The problem is when I try to attach the new domain to the datacenter, it
hangs for a minute and then comes back with, "Failed to attach Storage
Domain data2 to Data Center Default"

It's a gluster volume, and I'm able to mount & write to it via the CLI on
the host, without issue.
It's also using the same gluster volume options as the initialized master
domain.

I get this in sanlock.log:

2017-02-09 14:54:57-0300 1698758 [9303]: s10:r12 resource
60455567-ad30-42e3-a9df-62fe86c7fd25:SDM:/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases:1048576
for 3,15,12533
2017-02-09 14:54:57-0300 1698758 [9303]: open error -5
/rhev/data-center/mnt/glusterSD/localhost:data2/60455567-ad30-42e3-a9df-62fe86c7fd25/dom_md/leases
2017-02-09 14:54:57-0300 1698758 [9303]: r12 acquire_token open error -5
2017-02-09 14:54:57-0300 1698758 [9303]: r12 cmd_acquire 3,15,12533
acquire_token -5

Any ideas?

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Manually starting VMs via vdsClient (HE offline)

2017-02-09 Thread Doug Ingham
On 9 February 2017 at 12:03, Dan Yasny <dya...@gmail.com> wrote:

>
> On Thu, Feb 9, 2017 at 9:55 AM, Doug Ingham <dou...@gmail.com> wrote:
>
>> Hi Dan,
>>
>> On 8 February 2017 at 18:26, Dan Yasny <dya...@gmail.com> wrote:
>>>
>>>
>>> But seriously, above all, I'd recommend you backup the engine (it comes
>>> with a utility) often and well. I do it via cron every hour in production,
>>> keeping a rotation of hourly and daily backups, just in case. It doesn't
>>> take much space or resources, but it's more than just best practice - that
>>> database is the summary of the entire setup.
>>>
>>>
>> If you don't mind, may I ask what process you use for backing up your
>> engine? If you use HE, do you keep one server dedicated to just that VM?
>> I've not had that particular issue in the restore process yet, however I
>> read that it's recommended the HE host is free of virtual load before the
>> backup takes place. And as they need to be done frequently, I'm reading
>> that as a dedicated host...
>>
>>
> If you use a dedicated host, you might as well abandon self hosted. HE is
> nice for small setups with the HA built in for extra fun, but once you
> scale, it might not be able to cope and you'll need real hardware. You're
> running a heavy-ish java engine plus two databases after all.
>
> So as I said, all I do is add the engine-backup command to cron on the
> engine, and then my backup server comes in and pulls out the files via scp,
> also through cron. Nothing fancy really, but it lets me sleep at night
>

This particular project has 10 new maxed out servers to back it, and I
don't see it outgrowing that for at least a year or so. It's hardly a full
DC.
I presume the DB will become the heaviest part of the load, and I'm already
planning a separate high I/O environment for dedicated HA DB hosts.

See the top section of this page:
http://www.ovirt.org/documentation/self-hosted/chap-Backing_up_and_Restoring_an_EL-Based_Self-Hosted_Environment

It seems that I'll always have to keep at least one host free to be able to
avoid restore problems. If not, and I were to keep hourly backups, then
migrating VMs off the host every hour would just be a pain.
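
For reference, the utility Dan's describing is engine-backup; an hourly
rotation via cron might look something like this (file paths & retention
are purely illustrative):

# /etc/cron.d/engine-backup, on the engine VM
0 * * * * root engine-backup --mode=backup --scope=all --file=/var/backup/engine-$(date +\%Y\%m\%d-\%H00).backup --log=/var/backup/engine-backup.log
30 0 * * * root find /var/backup -name 'engine-*.backup' -mtime +14 -delete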

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Manually starting VMs via vdsClient (HE offline)

2017-02-09 Thread Doug Ingham
Hi Dan,

On 8 February 2017 at 18:26, Dan Yasny  wrote:
>
>
> But seriously, above all, I'd recommend you backup the engine (it comes
> with a utility) often and well. I do it via cron every hour in production,
> keeping a rotation of hourly and daily backups, just in case. It doesn't
> take much space or resources, but it's more than just best practice - that
> database is the summary of the entire setup.
>
>
If you don't mind, may I ask what process you use for backing up your
engine? If you use HE, do you keep one server dedicated to just that VM?
I've not had that particular issue in the restore process yet, however I
read that it's recommended the HE host is free of virtual load before the
backup takes place. And as they need to be done frequently, I'm reading
that as a dedicated host...

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Manually starting VMs via vdsClient (HE offline)

2017-02-08 Thread Doug Ingham
Hi Dan,

On 8 February 2017 at 18:10, Dan Yasny <dya...@gmail.com> wrote:

>
>
> On Wed, Feb 8, 2017 at 4:07 PM, Doug Ingham <dou...@gmail.com> wrote:
>
>> Hi Guys,
>>  My Hosted-Engine has failed & it looks like the easiest solution will be
>> to install a new one. Now before I try to re-add the old hosts (still
>> running the guest VMs) & import the storage domain into the new engine, in
>> case things don't go to plan, I want to make sure I'm able to bring up the
>> guests on the hosts manually.
>>
>> The problem is vdsClient is giving me an "Unexpected exception", without
>> much more info as to why it's failing.
>>
>> Any idea?
>>
>> [root@v0 ~]# vdsClient -s 0 list table | grep georep
>> 9d1c3fef-498e-4c20-b124-01364d4d45a8  30455  georep-proxy Down
>>
>> [root@v0 ~]# vdsClient -s 0 continue 9d1c3fef-498e-4c20-b124-01364d4d45a8
>> Unexpected exception
>>
>> /var/log/vdsm/vdsm.log
>> periodic/1063::WARNING::2017-02-08 17:57:52,532::periodic::276::virt.periodic.VmDispatcher::(__call__)
>> could not run <class 'vdsm.virt.periodic.DriveWatermarkMonitor'> on
>> ['65c9807c-7216-40b3-927c-5fd93bbd42ba', u'9d1c3fef-498e-4c20-b124-01364d4d45a8']
>>
>>
> continue meane un-pause, not "start from a stopped state"
>

I searched the manual for start/init/resume syntax, and "continue" was the
closest thing I found.


> now having said that, if you expect the VMs not to be able to start after
> you rebuild the engine and the VMs exist on the hosts, I'd collect a virsh
> -r dumpxml VMNAME for each - that way you have the disks in use, and all
> the VM configuration in a file, and with some minor LVM manipulation you'll
> be able to start the VM via virsh
>

My main concern is that I might have to halt the VMs or VDSM services for
some reason when trying to migrate to the new engine. I just want to make
sure that no matter what happens, I can still get the VMs back online.

I'm still getting myself acquainted with virsh/vdsClient. Could you provide
any insight into what I'd have to do to restart the guests manually?
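
My first guess at capturing the configs, going by your dumpxml suggestion,
would be something along these lines (just a sketch, and it assumes VM
names without spaces):

[root@v0 ~]# mkdir -p /root/vm-xml
[root@v0 ~]# for vm in $(virsh -r list --all --name); do virsh -r dumpxml "$vm" > "/root/vm-xml/$vm.xml"; done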

Thanks,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Manually starting VMs via vdsClient (HE offline)

2017-02-08 Thread Doug Ingham
Hi Guys,
 My Hosted-Engine has failed & it looks like the easiest solution will be
to install a new one. Now before I try to re-add the old hosts (still
running the guest VMs) & import the storage domain into the new engine, in
case things don't go to plan, I want to make sure I'm able to bring up the
guests on the hosts manually.

The problem is vdsClient is giving me an "Unexpected exception", without
much more info as to why it's failing.

Any idea?

[root@v0 ~]# vdsClient -s 0 list table | grep georep
9d1c3fef-498e-4c20-b124-01364d4d45a8  30455  georep-proxy Down

[root@v0 ~]# vdsClient -s 0 continue 9d1c3fef-498e-4c20-b124-01364d4d45a8
Unexpected exception

/var/log/vdsm/vdsm.log
periodic/1063::WARNING::2017-02-08
17:57:52,532::periodic::276::virt.periodic.VmDispatcher::(__call__) could
not run <class 'vdsm.virt.periodic.DriveWatermarkMonitor'> on
['65c9807c-7216-40b3-927c-5fd93bbd42ba',
u'9d1c3fef-498e-4c20-b124-01364d4d45a8']

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Restoring Hosted-Engine from a stale backup

2017-02-06 Thread Doug Ingham
On 6 February 2017 at 13:30, Simone Tiraboschi  wrote:

>
>
>1. What problems can I expect to have with VMs added/modified
>since the last backup?
>
> Modified VMs will be reverted to the previous configuration;
>>> additional VMs should be seen as external VMs, then you could import.
>>>
>>
>> Given VDSM kept the VMs up whilst the HE's been down, how will the
>> running VMs that were present before & after the backup be affected?
>>
>> Many of the VMs that were present during the last backup are now on
>> different hosts, including the HE VM. Will that cause any issues?
>>
>
> For normal VMs I don't expect any issue: the engine will simply update the
> correspondent record once it will find them on the managed hosts.
> A serious issue could instead happen with HA VMs:
> if the engine finds earlier an HA VM as running on a different host it
> will simply update its record, the issue is if it finds earlier the VM a
> not on the original host since it will try to restart it causing a split
> brain and probably a VM corruption.
> I opened a bug to track it:
> https://bugzilla.redhat.com/show_bug.cgi?id=1419649
>

Ouch. *All* of our VMs are HA by default.

So the simplest current solution would be to shutdown the running VMs in
VDSM, before restoring the backup & running engine-setup?

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Safe to install/use epel repository?

2017-01-31 Thread Doug Ingham
On 31 January 2017 at 13:27, Gianluca Cecchi 
wrote:

> This in CentOS 7.3 plain hosts used as hypervisors and intended for oVirt
> 4.0 and 4.1 hosts.
> In particular for performance related packages such as
> bwm-ng
> iftop
> htop
> nethogs
> and the like.
> Thanks,
> Gianluca
>

We've been running CentOS 7 with the epel, ovirt, gluster & zabbix repos on
all of our hosts for a good few months now, without issue.
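
For what it's worth, on a plain CentOS 7 host those repos come straight
from the release packages, something along these lines (versions here are
only an example; zabbix ships its own release RPM):

yum install epel-release
yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release41.rpm
yum install centos-release-gluster38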

-- 
D
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Location of the gluster client log with libgfapi

2017-01-27 Thread Doug Ingham
Hey guys,
 Would anyone be able to tell me the name/location of the gluster client
log when mounting through libgfapi?

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Actual downtime during migration?

2017-01-27 Thread Doug Ingham
>
> its memory resynchronised one last time
>
Actually, thinking about it, rather than diffing *all* of the memory on the
first host to resync it at the last moment, the hypervisor probably
simultaneously copies the current state of memory & uses copy-on-write
(COW) to write all new transactions to both hosts.
Only a technical difference, but it'd greatly reduce how long the VM needs
to be paused to complete a full sync.

-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Actual downtime during migration?

2017-01-27 Thread Doug Ingham
Hi Gianluca,
 My educated guess...When you live migrate a VM, its state in memory is
copied over to the new host, but the VM still remains online during this
period to minimise downtime. Once its state in memory is fully copied to
the new host, the VM is paused on the original host, its memory
resynchronised one last time & then brought up on the new host.
So whilst the entire process took 39 seconds, actual downtime was only
133ms.

Doug

On 27 January 2017 at 09:34, Gianluca Cecchi 
wrote:

> Hello,
> I was testing put host into maintenance on 4.0.6, with 1 VM running.
> It correctly completes the live migration of the VM and I see this event
> in pane:
>
> Migration completed (VM: ol65, Source: ovmsrv06, Destination: ovmsrv05,
> Duration: 39 seconds, Total: 39 seconds, Actual downtime: 133ms)
>
> What is considered as "Actual downtime"?
>
> Thanks,
> Gianluca
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] safe to reboot ovirt-engine?

2017-01-26 Thread Doug Ingham
Make sure to enable global maintenance mode before doing so!

The Hosted-Engine is just a manager for the underlying hypervisors, which
will keep running the VMs as usual until the engine comes back online.
Disable maintenance mode afterwards.
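
For the record, on any of the hosts that's along the lines of the following
(assuming a hosted-engine deployment; run it from whichever host is
convenient):

[root@host ~]# hosted-engine --set-maintenance --mode=global
   ... reboot the engine VM / do the maintenance ...
[root@host ~]# hosted-engine --set-maintenance --mode=none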

On 26 January 2017 at 13:48, Wout Peeters  wrote:

> Hi,
>
> A simple answer to this I'm sure, but is it safe to reboot the
> ovirt-engine while vms on the vm-hosts connected to it are running?
> Anything in particular to take into account while doing so?
>
> Thanks.
>
> Kind regards,
>
> Wout
>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] get UI working throug ALIAS and real hostname

2017-01-24 Thread Doug Ingham
On 24 January 2017 at 15:15, emanuel.santosvar...@mahle.com wrote:

> If I access the UI via "ALIAS" I get the Error-Page "The client is not
> authorized to request an authorization. It's required to access the system
> using FQDN.
>
> What can I do to get UI working through ALIAS and real hostname?
>

The Hosted-Engine should be installed & configured with the correct FQDN to
begin with. Changing it post install is currently unsupported & can cause a
whole host of problems. There are a couple of documented cases where people
have attempted to do so with varying success, but none have come out
problem-free.

I'm currently in the same situation after a company domain change and have
decided that the risk of unforeseen issues & potential problems down the
line, are far greater than the pain of redeploying & migrating to a new
environment.

A really hacky hack, without interfering with the engine, would be to try &
put a reverse proxy in front, but that'll require a load of dynamic
rewriting filters to work.


On 24 January 2017 at 11:41, Juan Hernández  wrote:

> Create a 99-whatever-you-like.conf file in
> /etc/ovirt-engine/engine.conf.d with the following content:
>
>   SSO_ALTERNATE_ENGINE_FQDNS="thealias"
>
> Then restart the engine:
>
>   systemctl restart ovirt-engine
>
> This setting is documented here:
>
>
> https://github.com/oVirt/ovirt-engine/blob/master/
> packaging/services/ovirt-engine/ovirt-engine.conf.in#L363-L366
>
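
i.e. putting Juan's suggestion together, something like:

# /etc/ovirt-engine/engine.conf.d/99-alternate-fqdns.conf
SSO_ALTERNATE_ENGINE_FQDNS="thealias"

systemctl restart ovirt-engine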

AFAIK, the SSL certificates will still need updating & I've read of people
still having other issues due to differing FQDNs. Being able to update the
HE's FQDN would be of big interest to me, but I've not yet seen one case
where it didn't end with anomalies...


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Restoring Hosted-Engine from a stale backup

2017-01-24 Thread Doug Ingham
Hey guys,
 Just giving this a bump in the hope that someone might be able to advise...

Hi all,
>  One of our engines has had a DB failure* & it seems there was an
> unnoticed problem in its backup routine, meaning the last backup I've got
> is a couple of weeks old.
> Luckily, VDSM has kept the underlying VMs running without any
> interruptions, so my objective is to get the HE back online & get the hosts
> & VMs back under its control with minimal downtime.
>
> So, my questions are the following...
>
>1. What problems can I expect to have with VMs added/modified since
>the last backup?
>2. As it's only the DB that's been affected, can I skip redeploying
>the Engine & jump straight to restoring the DB & rerunning engine-setup?
>3. The original docs I read didn't mention that it's best to leave a
>host in maintenance mode before running the engine backup, so my plan is to
>install a new temporary host on a separate server, re-add the old hosts &
>then once everything's back up, remove the temporary host. Are there any
>faults in this plan?
>4. When it comes to deleting the old HE VM, the docs point to a
>paywalled guide on redhat.com...?
>
>  To add a bit more info to 4), I'm referring to the following...

Note: If the Engine database is restored successfully, but the Engine
> virtual machine appears to be Down and cannot be migrated to another
> self-hosted engine host, you can enable a new Engine virtual machine and
> remove the dead Engine virtual machine from the environment by following
> the steps provided in https://access.redhat.com/solutions/1517683.
>
Source:
http://www.ovirt.org/documentation/self-hosted/chap-Backing_up_and_Restoring_an_EL-Based_Self-Hosted_Environment/

CentOS 7
> oVirt 4.0.4
> Gluster 3.8
>
> * Apparently a write somehow cleared fsync, despite not actually having
> been written to disk?! No idea how that happened...
>
> Many thanks,
> --
> Doug
>

Cheers,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Restoring Hosted-Engine from a stale backup

2017-01-20 Thread Doug Ingham
Hi all,
 One of our engines has had a DB failure* & it seems there was an unnoticed
problem in its backup routine, meaning the last backup I've got is a couple
of weeks old.
Luckily, VDSM has kept the underlying VMs running without any
interruptions, so my objective is to get the HE back online & get the hosts
& VMs back under its control with minimal downtime.

So, my questions are the following...

   1. What problems can I expect to have with VMs added/modified since the
   last backup?
   2. As it's only the DB that's been affected, can I skip redeploying the
   Engine & jump straight to restoring the DB & rerunning engine-setup?
   3. The original docs I read didn't mention that it's best to leave a
   host in maintenance mode before running the engine backup, so my plan is to
   install a new temporary host on a separate server, re-add the old hosts &
   then once everything's back up, remove the temporary host. Are there any
   faults in this plan?
   4. When it comes to deleting the old HE VM, the docs point to a
   paywalled guide on redhat.com...?

CentOS 7
oVirt 4.0.4
Gluster 3.8

* Apparently a write somehow cleared fsync, despite not actually having
been written to disk?! No idea how that happened...

Many thanks,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM listed twice in oVirt UI

2017-01-17 Thread Doug Ingham
Hi Martin,

> My initial fear was that it was trying to execute the VM twice, however
> it turned out to be just a GUI issue.
> > VM management was unaffected.
>
> How many times do you see the VM in the REST API? /ovirt-engine/api/vms
> If you see two of them, do they have different or the same ID?
>

I didn't notice any differing IDs in the logs & it was the same in the
interface (although it wasn't possible to select the 2 listed items
independently, they displayed & acted as a single item).
I've made a note to check the REST API if it occurs in the future, however
my HE has since borked itself & I'm now in the process of
restoring/redeploying it. I've got access to the logs, but the engine & API
are now offline.
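
Once the engine's back, something like this should show whether there are
one or two records for it (engine FQDN, password & VM name below are just
placeholders):

curl -s -k -u 'admin@internal:PASSWORD' \
  'https://engine.example.com/ovirt-engine/api/vms?search=name%3Dmyvm' | grep -c '<vm '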

Doug


> On Tue, Jan 17, 2017 at 1:52 PM, Doug Ingham <dou...@gmail.com> wrote:
>
>> Hi Tomas,
>>
>> Heh, that seems interesting. Any chance you have the engine and vdsm logs
>> from the time the second vm has shown up?
>>
>>
>> I'll see if there's anything shareable in the Engine logs during the
>> window it happened, however VDSM logs will be difficult as the VMs moved
>> around a lot whilst I was trying to get everything back online.
>>
>>
>> BTW if you restart the VM, will one of them disappear? or you will have 2
>> down VMs?
>>
>>
>> It's persisted reboots of the VM, hosts and the Engine.
>>
>> My initial fear was that it was trying to execute the VM twice, however
>> it turned out to be just a GUI issue. VM management was unaffected.
>>
>> Doug
>>
>> On Mon, Jan 9, 2017 at 8:09 PM, Doug Ingham <dou...@gmail.com> wrote:
>>
>>> Hi all,
>>>  We had some hiccups in our datacenter over the new year which caused
>>> some problems with our hosted engine.
>>>
>>> I've managed to get everything back up & running, however now one of the
>>> VMs is listed twice in the UI. When I click on the VM, both items are
>>> highlighted & I'm able to configure & manage the VM as usual.
>>> The FQDN & other things are picked up from the guest agent & displayed
>>> on the UI as usual, however the CPU, RAM & Uptime stats for one of the pair
>>> continue to show the stats from when the cluster died.
>>>
>>> Any ideas?
>>>
>>> oVirt 4.0.3.7
>>> Gluster 3.8.5
>>> CentOS 7
>>>
>>> [image: Inline images 1]
>>>
>>> Many thanks,
>>> --
>>> Doug
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>> .
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>


-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] VM listed twice in oVirt UI

2017-01-17 Thread Doug Ingham
Hi Tomas,

Heh, that seems interesting. Any chance you have the engine and vdsm logs
from the time the second vm has shown up?


I'll see if there's anything shareable in the Engine logs during the window
it happened, however VDSM logs will be difficult as the VMs moved around a
lot whilst I was trying to get everything back online.


BTW if you restart the VM, will one of them disappear? or you will have 2
down VMs?


It's persisted reboots of the VM, hosts and the Engine.

My initial fear was that it was trying to execute the VM twice, however it
turned out to be just a GUI issue. VM management was unaffected.

Doug

On Mon, Jan 9, 2017 at 8:09 PM, Doug Ingham <dou...@gmail.com> wrote:

> Hi all,
>  We had some hiccups in our datacenter over the new year which caused some
> problems with our hosted engine.
>
> I've managed to get everything back up & running, however now one of the
> VMs is listed twice in the UI. When I click on the VM, both items are
> highlighted & I'm able to configure & manage the VM as usual.
> The FQDN & other things are picked up from the guest agent & displayed on
> the UI as usual, however the CPU, RAM & Uptime stats for one of the pair
> continue to show the stats from when the cluster died.
>
> Any ideas?
>
> oVirt 4.0.3.7
> Gluster 3.8.5
> CentOS 7
>
> [image: Inline images 1]
>
> Many thanks,
> --
> Doug
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt getting confused w/ hosts' gluster bricks on different FQDNs/interfaces?

2017-01-10 Thread Doug Ingham
Hey all,
 Each of my hosts/nodes also hosts its own gluster bricks for the storage
domains, and peers over a dedicated FQDN & interface.

For example, the first server is set up like the following...
eth0: v0.dc0.example.com (10.10.10.100)
eth1: s0.dc0.example.com (10.123.123.100)

As it's a self-hosted engine, the gluster volumes & peers were necessarily
set up outside of the oVirt UI, before the HE was deployed. oVirt then
picked up the volume & brick configurations when the storage domains were
added.

However on the UI, the servers & directories for the bricks under the
Volumes tab all show the main FQDNs of the hosts, not the FQDNs used for
the storage layer.
When I click on "Advanced Details" for a brick, it just shows "Error in
fetching the brick details, please try again."

Could this be a sign of a problem, or potentially be causing any problems?

Technically, the volumes can be mounted on both FQDNs as glusterd listens
on all interfaces, however it only peers on the storage interfaces.
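
For anyone comparing notes, a quick way to see which hostnames glusterd has
actually registered for the peers, run from any of the nodes:

[root@v0 ~]# gluster pool list
[root@v0 ~]# gluster peer status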

(The bricks on my data volume, and only my data volume, keep falling out of
sync and I'm trying to locate the exact cause :/ )

Any help much appreciated,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] VM listed twice in oVirt UI

2017-01-09 Thread Doug Ingham
Hi all,
 We had some hiccups in our datacenter over the new year which caused some
problems with our hosted engine.

I've managed to get everything back up & running, however now one of the
VMs is listed twice in the UI. When I click on the VM, both items are
highlighted & I'm able to configure & manage the VM as usual.
The FQDN & other things are picked up from the guest agent & displayed on
the UI as usual, however the CPU, RAM & Uptime stats for one of the pair
continue to show the stats from when the cluster died.

Any ideas?

oVirt 4.0.3.7
Gluster 3.8.5
CentOS 7

[image: Inline images 1]

Many thanks,
-- 
Doug
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users