[ovirt-users] Re: Turn Off Email Alerts

2018-08-28 Thread Johan Bernhardsson
Those alerts are also coming from the hosted-engine HA services that keep the oVirt
manager running.


I would rather have a filter for them in my email client than disable all
of the alerting.


/Johan
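
A hedged sketch of how to redirect these broker mails, assuming oVirt 4.2 where the
HA broker reads its notification settings from the shared storage (key names can
differ between releases; "alerts@example.com" is a placeholder):

hosted-engine --get-shared-config destination-emails --type=broker   # inspect the current target
hosted-engine --set-shared-config destination-emails "alerts@example.com" --type=broker
systemctl restart ovirt-ha-broker

The ovirt-engine-notifier on the engine VM is configured separately, via override
files under /etc/ovirt-engine/notifier/notifier.conf.d/.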

On August 28, 2018 22:36:34 Douglas Duckworth  wrote:

Hi

Can someone please help?  I keep getting oVirt alerts via email despite
turning off postfix and ovirt-engine-notifier.service.


Thanks,

Douglas Duckworth, MSc, LFCS
HPC System Administrator
Scientific Computing Unit
Weill Cornell Medicine
1300 York Avenue
New York, NY 10065
E: d...@med.cornell.edu
O: 212-746-6305
F: 212-746-8690



On Fri, Aug 24, 2018 at 8:59 AM, Douglas Duckworth 
 wrote:


Hi

How do I turn off hosted engine alerts?  We are in a testing phase so these
are not needed.  I have disabled postfix on all hosts as well as stopped
the ovirt notification daemon on the hosted engine.  I also tried keeping it
running while setting the mail server to /dev/null in
/usr/share/ovirt-engine/services/ovirt-engine-notifier/ovirt-engine-notifier.conf.
Yet I still get alerts for everything done, such as
putting hosts in maintenance mode.  Very confusing.


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X6YRFWUAKYFY2HQF56HGUK3BPXJL2HBH/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/L5ZLZ2ENRWHRXXK57YICY2PXWMICMC4J/


[ovirt-users] Is the lists spam filter broken?

2018-08-12 Thread Johan Bernhardsson

Several mails today that are pure spam.


/Johan

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LMCREWRKSQDDJGJKAMR42W6TTZEEWWGF/


[ovirt-users] Re: How to keep VM running when a storage domain is offline

2018-07-24 Thread Johan Bernhardsson
You need replicated gluster storage for that, so that one part of the
storage can go down while the other two keep running. You also need to
set the quorum threshold so that 2 out of 3 is sufficient (if you have replica 3).
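
A minimal sketch of the quorum-related settings, assuming a replica 3 volume
named "data" (option names and defaults should be checked against your gluster version):

gluster volume set data cluster.quorum-type auto           # client-side quorum: writes allowed while a majority of bricks is up
gluster volume set data cluster.server-quorum-type server  # server-side quorum across the trusted pool
gluster volume get data all | grep quorum                  # verify what is currently in effect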


If the storage volume is offline, it is the same as if you pulled the hard
drive from a server.


So if oVirt didn't pause the virtual server, Linux inside the guest would
complain and remount its filesystems read-only.



On July 24, 2018 13:05:13 Matthew B  wrote:

Hello,

I am trying to understand how I can prevent a VM from being paused when one
of its disks is unavailable due to a problem with the storage domain.


The scenario:

A VM with 3 disks: the OS disk on a highly available domain, and two large
disks each on a separate domain (so a total of 3 domains).


The two large disks are mirrored using ZFS - but when one of the storage 
domains goes down, the VM pauses. Is it possible to configure the VM to not
pause when certain storage domains are unavailable? So instead of getting 
Paused due to IO error the disk would just be missing until that domain was 
brought back online?


Thanks,
-Matthew
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICR64SPQO2EVXP65N35GBCEFGAL4DS5U/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BBNFIUOJ6EQMUNXZMMUHEK4QEOZ7QRCT/


[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Johan Bernhardsson
aemon
1890 zabbix20   0   83904   1696   1612 S   0.3  0.0  24:30.63 
/usr/sbin/zabbix_agentd: collector [idle 1 sec]
2722 root  20   0 1298004   6148   2580 S   0.3  0.0  38:10.82 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
iso.ovirt3.nwfiber.com.gluster-brick4-iso -p /var/run/gl+
6340 root  20   0   0  0  0 S   0.3  0.0   0:04.30 
[kworker/7:0]
10652 root  20   0   0  0  0 S   0.3  0.0   0:00.23 
[kworker/u64:2]
14724 root  20   0 1076344  17400   3200 S   0.3  0.1  10:04.13 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/run/gluster/glustershd/glustershd.pid -+
22011 root  20   0   0  0  0 S   0.3  0.0   0:05.04 
[kworker/10:1]


Not sure why the system load dropped other than I was trying to take a 
picture of it :)


In any case, it appears that at this time, I have plenty of swap, ram, and 
network capacity, and yet things are still running very sluggish; I'm still 
getting e-mails from servers complaining about loss of communication with 
something or another; I still get e-mails from the engine about bad engine 
status, then recovery, etc.


1g isn't good enough for Gluster. It doesn't help that you have SSD, 
because network is certainly your bottleneck even for regular performance, 
not to mention when you are healing. Jumbo frames would give you additional 
5% or so - nothing to write home about.
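
For what it's worth, a quick way to check what the link is actually delivering and
whether jumbo frames are in use - a sketch assuming an interface called em1 and
iperf3 installed on two hosts (names are placeholders):

ip link show em1 | grep -o 'mtu [0-9]*'   # 1500 = standard frames, 9000 = jumbo
iperf3 -s                                 # on one host
iperf3 -c <other-host>                    # on the other; ~940 Mbit/s is the practical 1G ceiling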



I've shut down 2/3 of my VMs, too... just trying to keep the critical ones
operating.


At this point, I don't believe the problem is the memory leak, but it seems 
to be triggered by the memory leak, as in all my problems started when I 
got low ram warnings from one of my 3 nodes and began recovery efforts from 
that.


I do really like the idea / concept behind glusterfs, but I really have to 
figure out why it's been performing so poorly from day one, and why it's caused 95%
of my outages (including several large ones lately).  If I can get it 
stable, reliable, and well performing, then I'd love to keep it.  If I 
can't, then perhaps NFS is the way to go?  I don't like the single point of 
failure aspect of it, but my other NAS boxes I run for clients (central 
storage for windows boxes) have been very solid; If I could get that kind 
of reliability for my ovirt stack, it would be a substantial improvement.  
Currently, it seems about every other month I have a gluster-induced outage.


Sometimes I wonder if hyperconverged itself is the issue, but my
infrastructure doesn't justify three servers at the same location...I might 
be able to do two, but even that seems like its pushing it.


We have many happy users running Gluster and hyperconverged. We need to 
understand where's the failure in your setup.



Looks like I can upgrade to 10G for about $900.  I can order a dual-Xeon 
supermicro 12-disk server, loaded with 2TB WD Enterprise disks and a pair 
of SSDs for the os, 32GB ram, 2.67Ghz CPUs for about $720 delivered.  I've 
got to do something to improve my reliability; I can't keep going the way I 
have been


Agreed. Thanks for continuing to look into this; we'll probably need some
Gluster logs to understand what's going on.

Y.


--Jim



On Fri, Jul 6, 2018 at 9:13 PM, Johan Bernhardsson  wrote:

Load like that is mostly I/O bound: either the machine is swapping or the network
is too slow. Check I/O wait in top.


And then there is the problem where the OOM killer kills off gluster. Does that mean
that you don't monitor RAM usage on the servers? Either something is eating all
your RAM, swap gets really I/O intensive and the process is then killed off, or you
have the wrong swap settings in sysctl.conf (there are tons of broken guides that
recommend setting swappiness to 0, but that disables swap on newer kernels; the
proper swappiness for swapping only when necessary is 1, or a sufficiently low
number like 10 - the default is 60).
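
A sketch of both checks, assuming an EL7 host with sysstat installed:

iostat -x 5                                   # sustained high %iowait / %util points at storage, not CPU
grep -r swappiness /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
sysctl -p /etc/sysctl.d/90-swappiness.conf    # apply without a reboot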



Moving to NFS will not improve things. You will get back the memory that
gluster was using, and that is good, but you will have a single node that
can fail with all your storage on it, it would still be on 1 gigabit only,
and your three-node cluster would easily saturate that link.


On July 7, 2018 04:13:13 Jim Kusznir  wrote:
So far it does not appear to be helping much. I'm still getting VMs
locking up and all kinds of notices from the oVirt engine about non-responsive
hosts.  I'm still seeing load averages in the 20-30 range.


Jim

On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir  wrote:
Thank you for the advice and help

I do plan on going 10Gbps networking; haven't quite jumped off that cliff 
yet, though.


I did put my data-hdd (main VM storage volume) onto a dedicated 1Gbps 
network, and I've watched throughput on that and never seen more than 
60GB/s achieved (as reported by bwm-ng).  I have a separate 1Gbps network 
for communication and ovirt migration, but I wanted to break that up 
further (separate out VM traffic from migration/mgmt traffic).  My three
SSD-backed gluster volumes run the main network

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Johan Bernhardsson
.com,debug-threads=on -S 
-object secret,id=masterKey0,format=ra+


The VMs I see here and above together account for most? (5.2+3.6+1.5+1.7 = 
12GB) - still plenty of memory left.


10 root  20   0   0  0  0 S   0.3  0.0 215:54.72 [rcu_sched]
1030 sanlock   rt   0  773804  27908   2744 S   0.3  0.1  35:55.61 
/usr/sbin/sanlock daemon
1890 zabbix20   0   83904   1696   1612 S   0.3  0.0  24:30.63 
/usr/sbin/zabbix_agentd: collector [idle 1 sec]
2722 root  20   0 1298004   6148   2580 S   0.3  0.0  38:10.82 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
iso.ovirt3.nwfiber.com.gluster-brick4-iso -p /var/run/gl+
6340 root  20   0   0  0  0 S   0.3  0.0   0:04.30 
[kworker/7:0]
10652 root  20   0   0  0  0 S   0.3  0.0   0:00.23 
[kworker/u64:2]
14724 root  20   0 1076344  17400   3200 S   0.3  0.1  10:04.13 
/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p 
/var/run/gluster/glustershd/glustershd.pid -+
22011 root  20   0   0  0  0 S   0.3  0.0   0:05.04 
[kworker/10:1]


Not sure why the system load dropped other than I was trying to take a 
picture of it :)


In any case, it appears that at this time, I have plenty of swap, ram, and 
network capacity, and yet things are still running very sluggish; I'm still 
getting e-mails from servers complaining about loss of communication with 
something or another; I still get e-mails from the engine about bad engine 
status, then recovery, etc.


1g isn't good enough for Gluster. It doesn't help that you have SSD, 
because network is certainly your bottleneck even for regular performance, 
not to mention when you are healing. Jumbo frames would give you additional 
5% or so - nothing to write home about.



I've shut down 2/3 of my VMs, too... just trying to keep the critical ones
operating.


At this point, I don't believe the problem is the memory leak, but it seems 
to be triggered by the memory leak, as in all my problems started when I 
got low ram warnings from one of my 3 nodes and began recovery efforts from 
that.


I do really like the idea / concept behind glusterfs, but I really have to 
figure out why it's been performing so poorly from day one, and why it's caused 95%
of my outages (including several large ones lately).  If I can get it 
stable, reliable, and well performing, then I'd love to keep it.  If I 
can't, then perhaps NFS is the way to go?  I don't like the single point of 
failure aspect of it, but my other NAS boxes I run for clients (central 
storage for windows boxes) have been very solid; If I could get that kind 
of reliability for my ovirt stack, it would be a substantial improvement.  
Currently, it seems about every other month I have a gluster-induced outage.


Sometimes I wonder if hyperconverged itself is the issue, but my
infrastructure doesn't justify three servers at the same location...I might 
be able to do two, but even that seems like its pushing it.


We have many happy users running Gluster and hyperconverged. We need to 
understand where's the failure in your setup.



Looks like I can upgrade to 10G for about $900.  I can order a dual-Xeon 
supermicro 12-disk server, loaded with 2TB WD Enterprise disks and a pair 
of SSDs for the os, 32GB ram, 2.67Ghz CPUs for about $720 delivered.  I've 
got to do something to improve my reliability; I can't keep going the way I 
have been


Agreed. Thanks for continuing to look into this; we'll probably need some
Gluster logs to understand what's going on.

Y.


--Jim



On Fri, Jul 6, 2018 at 9:13 PM, Johan Bernhardsson  wrote:

Load like that is mostly I/O bound: either the machine is swapping or the network
is too slow. Check I/O wait in top.


And then there is the problem where the OOM killer kills off gluster. Does that mean
that you don't monitor RAM usage on the servers? Either something is eating all
your RAM, swap gets really I/O intensive and the process is then killed off, or you
have the wrong swap settings in sysctl.conf (there are tons of broken guides that
recommend setting swappiness to 0, but that disables swap on newer kernels; the
proper swappiness for swapping only when necessary is 1, or a sufficiently low
number like 10 - the default is 60).



Moving to NFS will not improve things. You will get back the memory that
gluster was using, and that is good, but you will have a single node that
can fail with all your storage on it, it would still be on 1 gigabit only,
and your three-node cluster would easily saturate that link.


On July 7, 2018 04:13:13 Jim Kusznir  wrote:
So far it does not appear to be helping much. I'm still getting VMs
locking up and all kinds of notices from the oVirt engine about non-responsive
hosts.  I'm still seeing load averages in the 20-30 range.


Jim

On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir  wrote:
Thank you for the advice and help

I do plan on going 10Gbps networking; haven't quite jumped off that cliff 
yet, though.


I did put my data-hdd (main VM storage volume) onto a dedic

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-07 Thread Johan Bernhardsson
0  0 S   0.3  0.0   0:05.04 
[kworker/10:1]


Not sure why the system load dropped other than I was trying to take a 
picture of it :)


In any case, it appears that at this time, I have plenty of swap, ram, and 
network capacity, and yet things are still running very sluggish; I'm still 
getting e-mails from servers complaining about loss of communication with 
something or another; I still get e-mails from the engine about bad engine 
status, then recovery, etc.


I've shut down 2/3 of my VMs, too... just trying to keep the critical ones
operating.


At this point, I don't believe the problem is the memory leak, but it seems 
to be triggered by the memory leak, as in all my problems started when I 
got low ram warnings from one of my 3 nodes and began recovery efforts from 
that.


I do really like the idea / concept behind glusterfs, but I really have to 
figure out why it's been performing so poorly from day one, and why it's caused 95%
of my outages (including several large ones lately).  If I can get it 
stable, reliable, and well performing, then I'd love to keep it.  If I 
can't, then perhaps NFS is the way to go?  I don't like the single point of 
failure aspect of it, but my other NAS boxes I run for clients (central 
storage for windows boxes) have been very solid; If I could get that kind 
of reliability for my ovirt stack, it would be a substantial improvement.  
Currently, it seems about every other month I have a gluster-induced outage.


Sometimes I wonder if hyperconverged itself is the issue, but my
infrastructure doesn't justify three servers at the same location...I might 
be able to do two, but even that seems like its pushing it.


Looks like I can upgrade to 10G for about $900.  I can order a dual-Xeon 
supermicro 12-disk server, loaded with 2TB WD Enterprise disks and a pair 
of SSDs for the os, 32GB ram, 2.67Ghz CPUs for about $720 delivered.  I've 
got to do something to improve my reliability; I can't keep going the way I 
have been


--Jim



On Fri, Jul 6, 2018 at 9:13 PM, Johan Bernhardsson  wrote:

Load like that is mostly I/O bound: either the machine is swapping or the network
is too slow. Check I/O wait in top.


And then there is the problem where the OOM killer kills off gluster. Does that mean
that you don't monitor RAM usage on the servers? Either something is eating all
your RAM, swap gets really I/O intensive and the process is then killed off, or you
have the wrong swap settings in sysctl.conf (there are tons of broken guides that
recommend setting swappiness to 0, but that disables swap on newer kernels; the
proper swappiness for swapping only when necessary is 1, or a sufficiently low
number like 10 - the default is 60).



Moving to NFS will not improve things. You will get back the memory that
gluster was using, and that is good, but you will have a single node that
can fail with all your storage on it, it would still be on 1 gigabit only,
and your three-node cluster would easily saturate that link.


On July 7, 2018 04:13:13 Jim Kusznir  wrote:

So far it does not appear to be helping much. I'm still getting VMs
locking up and all kinds of notices from the oVirt engine about non-responsive
hosts.  I'm still seeing load averages in the 20-30 range.


Jim

On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir  wrote:
Thank you for the advice and help

I do plan on going 10Gbps networking; haven't quite jumped off that cliff 
yet, though.


I did put my data-hdd (main VM storage volume) onto a dedicated 1Gbps 
network, and I've watched throughput on that and never seen more than 
60GB/s achieved (as reported by bwm-ng).  I have a separate 1Gbps network 
for communication and ovirt migration, but I wanted to break that up 
further (separate out VM traffic from migration/mgmt traffic).  My three
SSD-backed gluster volumes run the main network too, as I haven't been able 
to get them to move to the new network (which I was trying to use as all 
gluster).  I tried bonding, but that seemed to reduce performance rather
than improve it.


--Jim

On Fri, Jul 6, 2018 at 2:52 PM, Jamie Lawrence  
wrote:

Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch on 
to. I can say Gluster replica three  (no arbiters) on dedicated servers 
serving a couple Ovirt VM clusters here have not had these sorts of issues.


I suspect your long heal times (and the resultant long periods of high 
load) are at least partly related to 1G networking. That is just a matter 
of IO - heals of VMs involve moving a lot of bits. My cluster uses 10G 
bonded NICs on the gluster and ovirt boxes for storage traffic and separate 
bonded 1G for ovirtmgmt and communication with other machines/people, and 
we're occasionally hitting the bandwidth ceiling on the storage network. 
I'm starting to think about 40/100G, different ways of splitting up 
intensive systems, and considering iSCSI for specific volumes, although I 
really don't want to go there.


I don't run FreeNAS[1], but I do run FreeBSD as storage servers for their 
exc

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-06 Thread Johan Bernhardsson
Load like that is mostly I/O bound: either the machine is swapping or the network
is too slow. Check I/O wait in top.


And then there is the problem where the OOM killer kills off gluster. Does that mean
that you don't monitor RAM usage on the servers? Either something is eating all
your RAM, swap gets really I/O intensive and the process is then killed off, or you
have the wrong swap settings in sysctl.conf (there are tons of broken guides that
recommend setting swappiness to 0, but that disables swap on newer kernels; the
proper swappiness for swapping only when necessary is 1, or a sufficiently low
number like 10 - the default is 60).



Moving to NFS will not improve things. You will get back the memory that
gluster was using, and that is good, but you will have a single node that
can fail with all your storage on it, it would still be on 1 gigabit only,
and your three-node cluster would easily saturate that link.


On July 7, 2018 04:13:13 Jim Kusznir  wrote:
So far it does not appear to be helping much. I'm still getting VMs
locking up and all kinds of notices from the oVirt engine about non-responsive
hosts.  I'm still seeing load averages in the 20-30 range.


Jim

On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir  wrote:
Thank you for the advice and help

I do plan on going 10Gbps networking; haven't quite jumped off that cliff 
yet, though.


I did put my data-hdd (main VM storage volume) onto a dedicated 1Gbps 
network, and I've watched throughput on that and never seen more than 
60GB/s achieved (as reported by bwm-ng).  I have a separate 1Gbps network 
for communication and ovirt migration, but I wanted to break that up 
further (separate out VM traffic from migration/mgmt traffic).  My three
SSD-backed gluster volumes run the main network too, as I haven't been able 
to get them to move to the new network (which I was trying to use as all 
gluster).  I tried bonding, but that seemed to reduce performance rather
than improve it.


--Jim

On Fri, Jul 6, 2018 at 2:52 PM, Jamie Lawrence  
wrote:


Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch on 
to. I can say Gluster replica three  (no arbiters) on dedicated servers 
serving a couple Ovirt VM clusters here have not had these sorts of issues.


I suspect your long heal times (and the resultant long periods of high 
load) are at least partly related to 1G networking. That is just a matter 
of IO - heals of VMs involve moving a lot of bits. My cluster uses 10G 
bonded NICs on the gluster and ovirt boxes for storage traffic and separate 
bonded 1G for ovirtmgmt and communication with other machines/people, and 
we're occasionally hitting the bandwidth ceiling on the storage network. 
I'm starting to think about 40/100G, different ways of splitting up 
intensive systems, and considering iSCSI for specific volumes, although I 
really don't want to go there.


I don't run FreeNAS[1], but I do run FreeBSD as storage servers for their 
excellent ZFS implementation, mostly for backups. ZFS will make your `heal` 
problem go away, but not your bandwidth problems, which become worse 
(because of fewer NICS pushing traffic). 10G hardware is not exactly in the 
impulse-buy territory, but if you can, I'd recommend doing some testing 
using it. I think at least some of your problems are related.


If that's not possible, my next stops would be optimizing everything I 
could about sharding, healing and optimizing for serving the shard size to 
squeeze as much performance out of 1G as I could, but that will only go so far.
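
A hedged example of the kind of tuning meant here, assuming a volume named
"data-hdd"; note that shard settings only apply to files created after the change,
so they are really a decision to make before loading VM images onto the volume:

gluster volume set data-hdd features.shard on
gluster volume set data-hdd features.shard-block-size 64MB   # smaller shards mean smaller heals
gluster volume set data-hdd cluster.shd-max-threads 4        # parallel self-heal workers
gluster volume set data-hdd cluster.data-self-heal-algorithm full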


-j

[1] FreeNAS is just a storage-tuned FreeBSD with a GUI.



On Jul 6, 2018, at 1:19 PM, Jim Kusznir  wrote:

hi all:

Once again my production ovirt cluster is collapsing in on itself.  My 
servers are intermittently unavailable or degrading, customers are noticing 
and calling in.  This seems to be yet another gluster failure that I 
haven't been able to pin down.


I posted about this a while ago, but didn't get anywhere (no replies that I 
found).  The problem started out as a glusterfsd process consuming large 
amounts of ram (up to the point where ram and swap were exhausted and the 
kernel OOM killer killed off the glusterfsd process).  For reasons not 
clear to me at this time, that resulted in any VMs running on that host and
that gluster volume being paused with I/O error (the glusterfs process is
usually unharmed; why it didn't continue I/O with other servers is 
confusing to me).


I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and 
data-hdd).  The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3. 
 The first 3 are backed by an LVM partition (some thin provisioned) on an 
SSD; the 4th is on a seagate hybrid disk (hdd + some internal flash for 
acceleration).  data-hdd is the only thing on the disk.  Servers are Dell 
R610 with the PERC/6i raid card, with the disks individually passed through 
to the OS (no raid enabled).


The above RAM usage issue came from the data-hdd volume.  Yesterday, I 
caught one of the 

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-30 Thread Johan Bernhardsson
Is storage working as it should?  Does the gluster mount point respond as
it should? Can you write files to it?  Do the physical drives say that
they are OK? Can you write to the physical drives (you shouldn't normally bypass
the gluster mount point, but you do need to test the drives)?


For me this sounds like broken or almost broken hardware or broken 
underlying filesystems.


If one of the drives malfunctions and times out, gluster will be slow and
time out. It writes synchronously, so the slowest node will slow down the whole
system.
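
A sketch of the kind of checks meant above, assuming a volume named "data-hdd"
mounted at /mnt/test-gluster and smartmontools installed (paths are placeholders):

gluster volume status data-hdd            # are all bricks online?
gluster volume heal data-hdd info         # anything stuck healing?
dd if=/dev/zero of=/mnt/test-gluster/healthcheck bs=1M count=100 oflag=direct   # write through the mount
smartctl -H /dev/sda                      # per-disk health; repeat for each brick disk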


/Johan


On May 30, 2018 08:29:46 Jim Kusznir  wrote:
hosted-engine --deploy failed (would not come up on my existing gluster 
storage).  However, I realized no changes were written to my existing 
storage.  So, I went back to trying to get my old engine running.


hosted-engine --vm-status is now taking a very long time (5+ minutes) to
return, and it returns stale information everywhere.  I thought perhaps the
lockspace is corrupt, so I tried to clean that and the metadata, but both are
failing (--clean-metadata has hung and I can't even ctrl-c out of it).


How can I reinitialize all the lockspace/metadata safely?  No engine or
VMs are running currently.
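
For reference, a hedged sketch of the commands that exist for this; the exact flags
vary between 4.x releases, and both steps assume global maintenance and no running
engine VM:

hosted-engine --set-maintenance --mode=global
hosted-engine --reinitialize-lockspace --force   # rebuild the sanlock lockspace on shared storage
hosted-engine --clean-metadata --force-clean     # wipe stale agent metadata (per host if needed)
hosted-engine --set-maintenance --mode=none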


--Jim

On Tue, May 29, 2018 at 9:33 PM, Jim Kusznir  wrote:
Well, things went from bad to very, very bad

It appears that during one of the 2 minute lockups, the fencing agents 
decided that another node in the cluster was down.  As a result, 2 of the 3 
nodes were simultaneously reset with fencing agent reboot.  After the nodes 
came back up, the engine would not start.  All running VMs (including VMs 
on the 3rd node that was not rebooted) crashed.


I've now been working for about 3 hours trying to get the engine to come 
up.  I don't know why it won't start.  hosted-engine --vm-start says it's
starting, but it doesn't start (virsh doesn't show any VMs running).  I'm
currently running --deploy, as I had run out of options for anything else I 
can come up with.  I hope this will allow me to re-import all my existing 
VMs and allow me to start them back up after everything comes back up.


I do have an unverified geo-rep backup; I don't know if it is a good backup
(there were several prior messages to this list, but I didn't get replies
to my questions.  It was behaving in what I believe to be a "strange" way, and the
data directories are larger than their source).


I'll see if my --deploy works, and if not, I'll be back with another 
message/help request.


When the dust settles and I'm at least minimally functional again, I really 
want to understand why all these technologies designed to offer redundancy 
conspired to reduce uptime and create failures where there weren't any 
otherwise.  I thought with hosted engine, 3 ovirt servers and glusterfs 
with minimum replica 2+arb or replica 3 should have offered strong 
resilience against server failure or disk failure, and should have 
prevented / recovered from data corruption.  Instead, all of the above 
happened (once I get my cluster back up, I still have to try and recover my 
webserver VM, which won't boot due to XFS corrupt journal issues created 
during the gluster crashes).  I think a lot of these issues were rooted 
from the upgrade from 4.1 to 4.2.


--Jim

On Tue, May 29, 2018 at 6:25 PM, Jim Kusznir  wrote:
I also finally found the following in my system log on one server:

[10679.524491] INFO: task glusterclogro:14933 blocked for more than 120 
seconds.
[10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[10679.527144] glusterclogro   D 97209832bf40 0 14933  1 0x0080
[10679.527150] Call Trace:
[10679.527161]  [] schedule+0x29/0x70
[10679.527218]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
[10679.527225]  [] ? wake_up_state+0x20/0x20
[10679.527254]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
[10679.527260]  [] do_fsync+0x67/0xb0
[10679.527268]  [] ? system_call_after_swapgs+0xbc/0x160
[10679.527271]  [] SyS_fsync+0x10/0x20
[10679.527275]  [] system_call_fastpath+0x1c/0x21
[10679.527279]  [] ? system_call_after_swapgs+0xc8/0x160
[10679.527283] INFO: task glusterposixfsy:14941 blocked for more than 120 
seconds.
[10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[10679.529956] glusterposixfsy D 972495f84f10 0 14941  1 0x0080
[10679.529961] Call Trace:
[10679.529966]  [] schedule+0x29/0x70
[10679.530003]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
[10679.530008]  [] ? wake_up_state+0x20/0x20
[10679.530038]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
[10679.530042]  [] do_fsync+0x67/0xb0
[10679.530046]  [] ? system_call_after_swapgs+0xbc/0x160
[10679.530050]  [] SyS_fdatasync+0x13/0x20
[10679.530054]  [] system_call_fastpath+0x1c/0x21
[10679.530058]  [] ? system_call_after_swapgs+0xc8/0x160
[10679.530062] INFO: task glusteriotwr13:15486 blocked for more than 120 
seconds.
[10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[10679.533732] 

Re: [ovirt-users] oVirt non-self-hosted HA

2018-04-05 Thread Johan Bernhardsson
The norm is to have a cluster with shared storage. So you have 3 to 5
hardware nodes that share storage for the hosted engine. That shared
storage is in sync, so you don't have one engine per physical node.


If one hardware node goes down, the engine is restarted on another node with
the help of the hosted-engine HA services.


/Johan

On April 5, 2018 08:11:50 TomK  wrote:

On 4/4/2018 3:11 AM, Yaniv Kaul wrote:


On Wed, Apr 4, 2018 at 12:39 AM, Tom wrote:



Sent from my iPhone

On Apr 3, 2018, at 9:32 AM, Yaniv Kaul wrote:



On Tue, Apr 3, 2018 at 3:12 PM, TomK wrote:

Hey Guy's,

If I'm looking to set up the oVirt engine in an HA
configuration off the physical servers hosting my VMs (non
self-hosted), what are my options here?

I want to setup two to four active oVirt engine instances
elsewhere and handle the HA via something like haproxy /
keepalived to keep the entire experience seamless to the user.


You will need to set up the oVirt engine service as well as the PG
database (and the ovirt-engine-dwhd service and any other service we
run next to the engine) as highly available resources.
In pacemaker[1], for example.
You'll need to ensure configuration is also sync'ed between nodes,
etc.
Y.
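
A hedged sketch of what the pacemaker route can look like, using the stock
VirtualDomain resource agent and invented names (engine_vm, /etc/libvirt/qemu/engine.xml);
the engine's disk still has to live on storage both nodes can reach:

pcs resource create engine_vm ocf:heartbeat:VirtualDomain \
    hypervisor="qemu:///system" \
    config="/etc/libvirt/qemu/engine.xml" \
    migration_transport=ssh \
    meta allow-migrate=true \
    op monitor interval=30s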

So I already have one oVirt engine set up separately on a VM that
manages two remote physical hosts, and I'm familiar with the single-host
approach, which I would simply replicate.  At least that's the idea
anyway.  Could you please expand a bit on the highly available
module and syncing the config between hosts?


That's a different strategy, which is also legit - you treat this VM as
a highly available resource. Now you do not need to sync the config -
just the VM disk and config.

I think there's a postgres component too, and if the oVirt engine keeps all
its data in the postgres tables, then synchronizing this piece might be
all I need?  I'm not sure how the separate oVirt engines sitting on
various separate physical hosts keep their settings in sync about the
rest of the physicals in an oVirt environment. (Assume we may have 100
oVirt physicals for example.)

Perhaps something like
https://www.unixarena.com/2015/12/rhel-7-pacemaker-configuring-ha-kvm-guest.html
.

But if you are already doing that, I'm not sure why you'd prefer this
over hosted-engine setup.

I'm comparing both options.  I really don't want to ask too many
specific until I have the chance to read into the details of both.

Y.

Cheers,
Tom



Cheers,
Tom


[1] https://clusterlabs.org/quickstart-redhat.html



From what I've seen in oVirt, that seems to be possible
without the two oVirt engines even knowing each other's
existence but is it something anyone has ever done?  Any
recommendations in this case?

Having settings replicated would be a bonus but I would be
comfortable if they weren't and I handle that myself.

--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip
around the sun.

___
Users mailing list
Users@ovirt.org 
http://lists.ovirt.org/mailman/listinfo/users



--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt non-self-hosted HA

2018-04-03 Thread Johan Bernhardsson

Hi,

It's not entirely clear what you want to do.

oVirt is an interface that controls hardware nodes that run virtual
servers. It's similar to VMware's vSphere.


The engine needs to be replicated so that if one goes down the other has
the exact same information.


/Johan


On April 3, 2018 14:16:14 TomK  wrote:

Hey Guy's,

If I'm looking to set up the oVirt engine in an HA configuration off the
physical servers hosting my VMs (non self-hosted), what are my options
here?

I want to setup two to four active oVirt engine instances elsewhere and
handle the HA via something like haproxy / keepalived to keep the entire
experience seamless to the user.


From what I've seen in oVirt, that seems to be possible without the two
oVirt engines even knowing each other's existence but is it something
anyone has ever done?  Any recommendations in this case?

Having settings replicated would be a bonus but I would be comfortable
if they weren't and I handle that myself.

--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Rebuilding my infra..

2018-01-08 Thread Johan Bernhardsson
You can't put the hosted engine storage on anything less than replica 3
without changing the installer scripts manually.


For the third node it can be pretty much anything capable of running as an arbiter.

/Johan
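
A hedged sketch of what adding that third (arbiter) node later looks like, assuming
an existing replica 2 volume named "engine" and a new host called host3 (gluster
3.8+ syntax; names are placeholders):

gluster peer probe host3
gluster volume add-brick engine replica 3 arbiter 1 host3:/gluster/engine/brick
gluster volume info engine   # should now show "Number of Bricks: 1 x (2 + 1) = 3"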


On January 8, 2018 21:33:25 carl langlois  wrote:


I should have said replica 3 + arbiter when adding the 3rd host.

Yes, I know that the documentation specifies 3 hosts. If I understand correctly,
this is to make sure the system is always reliable, and that makes sense. But
my question is mainly: is it possible to start with 2 nodes and add a 3rd
one later, when I have the budget? If I am not mistaken, Gluster can work
with 2 nodes.



On Mon, Jan 8, 2018 at 3:15 PM, Vinícius Ferrão  wrote:


If I’m not wrong GlusterFS in oVirt requires 3 hosts.

Here’s the RHHI guide, it’s pretty much the same for oVirt:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.1/html/deploying_red_hat_hyperconverged_infrastructure/

> On 8 Jan 2018, at 18:10, carl langlois  wrote:
>
> Hi all
>
> After screwing up my infra with the update to 4.2 (probably a bad
manipulation), I am planning a rebuild of the entire infra. First I want to
replace my NFS storage with glusterfs storage. All documentation tells me
that I need 3 hosts... but for the moment I only have 2, though I am planning to add
more later.
>
> So does it make sense to start with 2 hosts and use glusterfs as the
storage domain (let's say with a replica of two, with all its limitations)?
> If it makes sense,
> 1- what is the best way to do it?
> 2- how hard will it be to add the 3rd host when available and make it
replica 2+arbiter?
>
> Also, in a setup where I have 3 hosts (replica 2+arbiter), can all 3
hosts run user VMs?
>
> Thanks for your inputs.
>
> Carl
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users






--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Q: oVirt Node Update Wiped my Updates

2017-12-18 Thread Johan Bernhardsson
The differences are here:
https://www.ovirt.org/documentation/install-guide/chap-Introduction_to_Hypervisor_Hosts/

And also the different guides on how to install them.

/Johan

On Tue, 2017-12-19 at 01:06 +0100, Johan Bernhardsson wrote:
> oVirt Node is a small minimal OS and will wipe your manually
> installed
> packages on an upgrade.
> 
> If you want local packages that are critical for you, you should
> install
> a full CentOS/RHEV server and use that as a virtualization node. (This
> is
> what I did since I wanted more control of the virtualization nodes.)
> 
> 
> /Johan
> On Tue, 2017-12-19 at 01:54 +0200, Andrei V wrote:
> > 
> > Hi !
> > 
> > I have updated today oVirt node 4.1 via yum, installed from oVirt
> > DVD.
> > Update was a minor version change.
> > Now all software I had installed, including mc, Samba, etc. have
> > been
> > lost. Looks like oVirt node is being updated with whole system
> > 600MB
> > image, not just with rpms (please correct if I'm wrong here).
> > 
> > Keeping my manually installed software is crucial, since it has UPS
> > and
> > hardware RAID monitor.
> > 
> > Can anyone suggest if this is normal behavior or a single glitch ?
> > 
> > So far I found only this instruction for node installation:
> > https://www.ovirt.org/documentation/install-guide/chap-oVirt_Nodes/
> > 
> > And it uses pre-made DVD ISO, not manual install with RHEL/CentOS
> > DVD
> > and RPMs from yum repository.
> > 
> > Thanks in advance for any suggestion(s).
> > Andrei
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Q: oVirt Node Update Wiped my Updates

2017-12-18 Thread Johan Bernhardsson
oVirt Node is a small minimal OS and will wipe your manually installed
packages on an upgrade.

If you want local packages that are critical for you, you should install
a full CentOS/RHEV server and use that as a virtualization node. (This is
what I did since I wanted more control of the virtualization nodes.)


/Johan
On Tue, 2017-12-19 at 01:54 +0200, Andrei V wrote:
> Hi !
> 
> I have updated today oVirt node 4.1 via yum, installed from oVirt
> DVD.
> Update was a minor version change.
> Now all software I had installed, including mc, Samba, etc. have been
> lost. Looks like oVirt node is being updated with whole system 600MB
> image, not just with rpms (please correct if I'm wrong here).
> 
> Keeping my manually installed software is crucial, since it has UPS
> and
> hardware RAID monitor.
> 
> Can anyone suggest if this is normal behavior or a single glitch ?
> 
> So far I found only this instruction for node installation:
> https://www.ovirt.org/documentation/install-guide/chap-oVirt_Nodes/
> 
> And it uses pre-made DVD ISO, not manual install with RHEL/CentOS DVD
> and RPMs from yum repository.
> 
> Thanks in advance for any suggestion(s).
> Andrei
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Two node configuration.

2017-12-15 Thread Johan Bernhardsson

No, it is not safe to use only two nodes, as you can end up with split brain.

So two nodes and an arbiter node are needed. The arbiter doesn't need to be
that fancy.


Also, the installer, if installing with gluster as hosted storage (storage for
the engine), will complain if the replica count is less than 3.


/Johan


On December 15, 2017 12:54:14 PM Jarek  wrote:

Yes, I checked it but it seems I still need three nodes - 2 for storage and 
one smaller for arbiter.

Is it safe to deploy it only on two nodes?
Am I wrong?


From: "Sandro Bonazzola" 
To: "Jaroslaw Augustynowicz" 
Cc: "users" 
Sent: Friday, December 15, 2017 12:03:24 PM
Subject: Re: [ovirt-users] Two node configuration.



2017-12-15 8:55 GMT+01:00 Jarek <j...@jaru.eu.org>:




Hello, currently I'm using KVM with pcs on the VMs... is there any oVirt
solution for HA with two nodes (storage on local disks) without pcs & drbd
for storage? I know about gluster storage but it needs a third host :/




Did you check https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/creating_arbitrated_replicated_volumes ?






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users







--


SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R

Red Hat EMEA - https://www.redhat.com/

TRIED. TESTED. TRUSTED. - https://redhat.com/trusted





--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Debugging why hosted engine flips between EngineUp and EngineBadHealth

2017-12-01 Thread Johan Bernhardsson
We have had a similar issue that was resolved by restarting the
engine VM.

Not ideal, but it solves the problem for about a month.

/Johan

On Fri, 2017-12-01 at 10:50 +0100, Luca 'remix_tj' Lorenzetto wrote:
> Hi all,
> 
> For some days now my hosted-engine environments (one RHEV 4.0.7, one
> oVirt 4.1.7) have continued to send mails about changes between EngineUp
> and
> EngineBadHealth.
> 
> This is pretty annoying and I'm not able to find out the root cause.
> 
> The only issue i've seen on hosts is this error appearing sometimes
> randomly about sending mails.
> 
> Thread-1::ERROR::2017-12-01
> 03:05:05,084::notifications::39::ovirt_hosted_engine_ha.broker.notifi
> cations.Notifications::(send_email)
> [Errno -2] Name or service not known
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-
> packages/ovirt_hosted_engine_ha/broker/notifications.py",
> line 26, in send_email
> timeout=float(cfg["smtp-timeout"]))
>   File "/usr/lib64/python2.7/smtplib.py", line 255, in __init__
> (code, msg) = self.connect(host, port)
>   File "/usr/lib64/python2.7/smtplib.py", line 315, in connect
> self.sock = self._get_socket(host, port, self.timeout)
>   File "/usr/lib64/python2.7/smtplib.py", line 290, in _get_socket
> return socket.create_connection((host, port), timeout)
>   File "/usr/lib64/python2.7/socket.py", line 553, in
> create_connection
> for res in getaddrinfo(host, port, 0, SOCK_STREAM):
> gaierror: [Errno -2] Name or service not known
> Thread-6::WARNING::2017-12-01
> 03:05:05,427::engine_health::130::engine_health.CpuLoadNoEngine::(act
> ion)
> bad health status: Hosted Engine is not up!
> 
> There are no errors on engine logs and all the api queries done by
> ovirt-hosted-engine-ha returns HTTP code 200.
> 
> I suspect the switch between EngineUP and EngineBadHealth status
> could
> be due to some dns resolution issues, but there is no clear message
> on
> the log showing this and this doesn't help our netadmins to make some
> traces.
> 
> Is there a way to increase the verbosity of broker.log and agent.log?
> 
> Luca
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] glusterFS is not independent?

2017-11-09 Thread Johan Bernhardsson
Gluster would overcomplicate this if you only want the local storage
on both nodes to be reachable over the network.
The simplest way is to set up an NFS server on both nodes and create a
storage domain for each.
Gluster is a way to secure your data and replicate it over several
nodes, so that if one node goes down or explodes you always have the
data replicated to other nodes. Running a single-brick gluster volume
is not recommended.
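
A minimal sketch of that per-node NFS setup on EL7, assuming an export path of
/data/images (oVirt expects the export to be owned by vdsm:kvm, i.e. 36:36;
path and network are placeholders):

mkdir -p /data/images
chown 36:36 /data/images
echo '/data/images *(rw,sync,no_subtree_check)' >> /etc/exports
systemctl enable --now nfs-server
exportfs -ra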
/Johan

On Thu, 2017-11-09 at 11:57 +0100, Jon bae wrote:
> Thank you for your answer!
> 
> I don't understand why I have to put them in replicate mode. As I
> understand it, replicate means that the files get copied to both nodes,
> but I would like to have them independent, and I move the VM disks to
> whichever node I want.
> 
> Theoretically I only need a solution where I can use local storage from
> the nodes, but have it reachable over the network.
> 
> Jonathan
> 
> 2017-11-09 11:39 GMT+01:00 Johan Bernhardsson <jo...@kafit.se>:
> > For it to work you need to have the bricks in replicate, one brick
> > on each server.
> > 
> > If you only have two nodes, the quorum will be too low, so it will set
> > the gluster to failsafe mode until the other brick comes online.
> > 
> > For it to work properly you need three nodes with one brick each, or two
> > nodes and a third node acting as an arbiter.
> > 
> > /Johan
> > 
> > On Thu, 2017-11-09 at 11:35 +0100, Jon bae wrote:
> > > Hello,
> > > I'm very new to oVirt and glusterFS, so maybe I got something
> > > wrong...
> > > 
> > > I have the oVirt engine installed on a separate server and I also have
> > > two physical nodes. On every node I configure glusterFS; each
> > > volume is in distribute mode and has only one brick, from
> > > one node. Each volume I also add to its own storage domain.
> > > 
> > > The idea was that both storage domains are independent from each
> > > other, so that I can turn off one node and only turn it on when I
> > > need it.
> > > 
> > > But now I have the problem that when I turn off one node, both
> > > storage domains go down, and the volume shows that the brick is
> > > not available.
> > > 
> > > Is there a way to fix this?
> > > 
> > > Regards
> > > Jonathan
> > > 
> > > 
> > > ___
> > > Users mailing list
> > > Users@ovirt.org
> > > http://lists.ovirt.org/mailman/listinfo/users
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> > ___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] glusterFS is not independent?

2017-11-09 Thread Johan Bernhardsson
For it to work you need to have the bricks in replicate, one brick on
each server.
If you only have two nodes, the quorum will be too low, so it will set the
gluster to failsafe mode until the other brick comes online.
For it to work properly you need three nodes with one brick each, or two
nodes and a third node acting as an arbiter.
/Johan

On Thu, 2017-11-09 at 11:35 +0100, Jon bae wrote:
> Hello,
> I'm very new to oVirt and glusterFS, so maybe I got something
> wrong...
> 
> I have the oVirt engine installed on a separate server and I also have
> two physical nodes. On every node I configure glusterFS; each
> volume is in distribute mode and has only one brick, from one
> node. Each volume I also add to its own storage domain.
> 
> The idea was that both storage domains are independent from each
> other, so that I can turn off one node and only turn it on when I need
> it.
> 
> But now I have the problem that when I turn off one node, both storage
> domains go down, and the volume shows that the brick is not
> available.
> 
> Is there a way to fix this?
> 
> Regards
> Jonathan
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers

2017-11-03 Thread Johan Bernhardsson
With drbd you have one master and use iSCSI to share the storage. If the
master fails, it will fail over to the other node.


You have one IP that will act as the iSCSI storage endpoint and jump between the hosts.

If you run drbd master-master... then you might get split brain, and that
is not fun :)
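
A hedged sketch of the two-node DRBD piece (device names, hostnames and addresses
are placeholders; the iSCSI target and the floating IP on top of it are managed
separately, e.g. by pacemaker):

cat > /etc/drbd.d/r0.res <<'EOF'
resource r0 {
  device    /dev/drbd0;
  disk      /dev/sdb1;
  meta-disk internal;
  on node1 { address 10.0.0.1:7789; }
  on node2 { address 10.0.0.2:7789; }
}
EOF
drbdadm create-md r0
drbdadm up r0                  # run on both nodes
drbdadm primary --force r0     # once, on the node that should start as master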



On November 3, 2017 09:04:49 Eduardo Mayoral <emayo...@arsys.es> wrote:


Just genuinely curious, how do you avoid split-brain situations with a
2-node setup (be it drbd, gluster or anything else)?

Eduardo Mayoral Jimeno (emayo...@arsys.es)
Administrador de sistemas. Departamento de Plataformas. Arsys internet.
+34 941 620 145 ext. 5153

On 03/11/17 08:40, Johan Bernhardsson wrote:


Check out drbd. I have used that to build a cluster for two servers. It
needs some more work than a three-node gluster configuration but works well.

I even think they have a white paper on how to do it for virtualization.

/Johan

On November 3, 2017 08:11:04 Artem Tambovskiy
<artem.tambovs...@gmail.com> wrote:


Looking for design advice on oVirt provisioning. I'm running a PoC
lab on a single bare-metal host (suddenly it was set up with just a Local
Storage domain) and
now I'd like to rebuild the setup by making a cluster of 2 physical
servers, with no external storage array available. What are the options
here? Are there any options to build a cheap HA cluster with just 2
servers?

Thanks in advance!

Artem
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Advise needed: building cheap HA oVirt cluster with just 2 physical servers

2017-11-03 Thread Johan Bernhardsson
Check out drbd. I have used that to build a cluster for two servers. It needs
some more work than a three-node gluster configuration but works well.


I even think they have a white paper on how to do it for virtualization.

/Johan


On November 3, 2017 08:11:04 Artem Tambovskiy  
wrote:



Looking for design advice on oVirt provisioning. I'm running a PoC lab on a
single bare-metal host (suddenly it was set up with just a Local Storage
domain) and
now I'd like to rebuild the setup by making a cluster of 2 physical servers,
with no external storage array available. What are the options here? Are there
any options to build a cheap HA cluster with just 2 servers?

Thanks in advance!

Artem



--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Odp: Re: How to upgrade self-hosted engine?

2017-09-21 Thread Johan Bernhardsson

On 2017-09-21 at 13:08, gabriel_skup...@o2.pl wrote:

On 20 September 2017 at 15:54, Kasturi Narra
wrote:


Hi,

You can upgrade HE (Hosted Engine) by doing the steps below.

1) Move HE to global maintenance by running the command
'hosted-engine --set-maintenance --mode=global'

2) Add the required repos which has higher package versions.

3) Run 'yum update ovirt\*setup\*'


What if we upgrade all packages by 'yum update'?



That will most probably break the engine, since engine-setup does more
than just install the packages. To ensure a safe upgrade, follow these
steps.


/Johan
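
The full minor-version sequence, as a sketch (the first and last commands run on a
hosted-engine host, the rest inside the engine VM; repo names vary by release):

hosted-engine --set-maintenance --mode=global   # on a host
yum update ovirt\*setup\*                       # on the engine VM
engine-setup                                    # performs the actual upgrade
yum update                                      # remaining OS/engine packages
hosted-engine --set-maintenance --mode=none     # on a host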

--
Security all the way ...

Linux/CMS/Network/Print/Virtualisation/VoIP Consultant

Kafit AB
Orgnr:  556792-5945
Mobile: +46705111751
Sweden: +46101993005
Seychelles: +2486478105
Uk: +448701821792
Email:  jo...@kafit.se
Web:http://www.kafit.se

Connect with me on LinkedIn: http://www.linkedin.com/in/smallone

About me: http://about.me/smallone/bio
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Odp: Re: How to upgrade self-hosted engine?

2017-09-20 Thread Johan Bernhardsson
If you follow that guide it will update everything on the virtual server
running the engine. You should not run yum directly to upgrade the packages
(anyone correct me if I am wrong about this).


The engine-setup will download and install the packages for you.

And on the nodes, upgrade first via the web interface; after that you
can upgrade with yum on the nodes. Don't forget to have the node in
maintenance mode when you run yum update.


/Johan
On 2017-09-20 at 14:39, gabriel_skup...@o2.pl wrote:

Thanks. What about the system itself?

Is yum update enough?

On 20 September 2017 at 13:25, Johan Bernhardsson <jo...@kafit.se>
wrote:


Follow this guide if it is between minor releases



https://www.ovirt.org/documentation/upgrade-guide/chap-Updates_between_Minor_Releases/



Don't forget to send the hosted-engine to global maintenance

/Johan

On September 20, 2017 13:11:41 gabriel_skup...@o2.pl wrote:


In oVirt Engine Web Administration portal I can see option to
upgrade the nodes but can't see any option to upgrade
hosted-engine itself?

What is the recommended procedure for it?

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Security all the way ...

Linux/CMS/Network/Print/Virtualisation/VoIP Consultant

Kafit AB
Orgnr:  556792-5945
Mobile: +46705111751
Sweden: +46101993005
Seychelles: +2486478105
Uk: +448701821792
Email:  jo...@kafit.se
Web:http://www.kafit.se

Connect with me on LinkedIn: http://www.linkedin.com/in/smallone

About me: http://about.me/smallone/bio
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] How to upgrade self-hosted engine?

2017-09-20 Thread Johan Bernhardsson

Follow this guide if it is between minor releases


https://www.ovirt.org/documentation/upgrade-guide/chap-Updates_between_Minor_Releases/

Don't forget to send the hosted-engine to global maintenance

/Johan



On September 20, 2017 13:11:41 gabriel_skup...@o2.pl wrote:

In the oVirt Engine Web Administration portal I can see an option to upgrade the
nodes but can't see any option to upgrade the hosted-engine itself.   What
is the recommended procedure for it?




--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Hosted-engine --deploy 4.1.6 - Web GUI Down

2017-09-20 Thread Johan Bernhardsson
Why are you stopping firewalld? A better solution is to actually add 
firewall rules and open up what's needed.
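
A sketch of that approach for the web UI specifically, assuming firewalld on the
engine VM (other ports, e.g. for consoles and the websocket proxy, may also be
needed depending on the setup):

firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp
firewall-cmd --reload
firewall-cmd --list-ports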


/Johan


On September 19, 2017 22:17:51 Mat Gomes  wrote:


Hi Guys,

I'm attempting to rebuild my environment for production testing. We have
multiple locations, NY and CH, with a 2+1 arbiter setup; this will run
geo-replication.


All goes smoothly: the VM starts and I'm able to log in to it via SSH. I
usually stop firewalld as it usually causes the web GUI to be unreachable.
This time around the web GUI can't be reached. I've done more than 10
deployments and tests but I'm not sure what's causing this; 4.1.5 was
deployed the same exact way as well. It all seems to be running fine, port 80 is
listening, and I've seen most of the logs but nothing stands out.


Please help.


Steps:
#Puppet Installation/deployment.
class ovirt {

package { 'centos-release-gluster310':
ensure => installed,
}
package { 'ovirt-release41-4.1.6-1.el7.centos.noarch':
ensure  => installed,
source => "http://resources.ovirt.org/pub/yum-repo/ovirt-release41.rpm;,
provider => 'rpm',
install_options => ['--nosignature'],
}
$ovirt_packages = 
['system-storage-manager','vdsm-gluster','ovirt-hosted-engine-setup','ovirt-engine-appliance','glusterfs-server',]


package { $ovirt_packages:
ensure => installed,
require => [ 
Package['centos-release-gluster310','ovirt-release41-4.1.6-1.el7.centos.noarch']],

install_options => ['--disablerepo=epel'],
}
service { 'glusterd':
ensure => running,
enable => true,
require => [ Package['vdsm-gluster'], 
File['/etc/glusterfs/glusterd.vol']],
}
file { '/etc/glusterfs/glusterd.vol':
ensure  => file,
source  => 'puppet:///modules/ovirt/glusterd.vol',
owner   => root,
group   => root,
mode    => '0644',
}
}  <- All works fine.

Once everything is up and running - deployed, peer-probed, SSH keys
created/shared - the volume settings are set and the volume started:
gluster volume create engine replica 3 arbiter 1 
host1:/data/glusterfs/vol1/engine host2:/data/glusterfs/vol1/engine 
host3:/data/glusterfs/vol1/engine

gluster volume set engine cluster.quorum-type auto
gluster volume set engine network.ping-timeout 10
gluster volume set engine auth.allow \*
gluster volume set engine group virt
gluster volume set engine storage.owner-uid 36
gluster volume set engine storage.owner-gid 36
gluster volume set engine server.allow-insecure on
gluster volume start engine

Finally, hosted-engine --deploy is run,   --> https://pastebin.com/QXEwmSwT
<--   answer file


VM starts but no WEB GUI.
Let me know if you need more info

Best Regards,

Mat Gomes | Assistant Vice President, IT
t. 212-531-8594  m. 954-254-1294
e. mgo...@clearpoolgroup.com
w. clearpoolgroup.com






--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



Re: [ovirt-users] ovirt high points

2017-09-07 Thread Johan Bernhardsson



On September 7, 2017 19:01:58 Christopher Cox  wrote:







Any links or ideas appreciated,


oVirt is NOT VMware.  But if you do things "well" oVirt works quite
well.  Follow the list to see folks that didn't necessarily do things
"well" (sad, but true).

I inherited this oVirt... not ideal for blades because it's better to
have lots of networks.  We just have two blade fabrics, one for SAN and
one for the rest, and it would be nice to have ovirtmgmt and migration
networks be isolated.  With that said, with our massively VLAN'd setup,
it does work and has been very reliable.  For performance reasons, I
recommend that you attempt to dedicate a host for SPM, or at least keep
the number of VMs deployed there to a minimum.  There are tweaks in the
setup to keep VMs off the SPM node (talking mainly if you have a
massively combined network like I have currently).

We've survived many bad events with regards to SAN and power, which is a
tribute to oVirt's reliability.  However, you can shoot yourself in the
foot very easily with oVirt... so just be careful.

Is VMware better?  Yes.  Is it more flexible than oVirt?  Yes. Is it
more reliable than oVirt? Yes.  In other words, if money is of no
concern, VMware and VCenter.

We will likely never do VMware here due to cost (noting, that the cost
is in VCenter, and IMHO, it's not horrible, but I do not control the
wallet here, and we tend to prefer FOSS here... and FOSS is my personal
preference as well).

Companies generally speaking just want something that works.  And oVirt
does work.  But if money is of no concern and you need the friendliness
of something VCenter like (noting that not everyone needs VCenter or
RHEV-M or oVirt Manager), then VMware is still better.

If you don't need something VCenter like, I can also say that libvirt
(KVM) and virt-manager are also reasonable, and we use that as well.  But
we also have a (free) ESXi (because we have to, forced requirement).

The ovirtmgmt web ui is gross IMHO.  It's a perfect example of an
overweight UI where a simplified UI would have been cleaner, faster and
better.  Just because you know how to write thousands of lines of
javascript doesn't mean you should.  Not everything needs to act like a
trading floor application or facebook.  The art of efficient UI design
has been lost.  With that said, the RESTful i/f part is nice.  Nice to
the point of not needing the SDK.

Finally, VMware can be expensive.  It's not a "one time" purchase.  It's
HAS TO BE ongoing.  And it can get very expensive if not understood.
With that said, if you have anything Microsoft in the enterprise, you
already understand and are prepared to throw cash for IT infrastructure.
  If you do go VMware, make sure to use a hefty Vcenter host as upgrades
to VCenter involve a lot of bloat and waste.

VMware can be a real "pain" support wise.  They can deprecate your
entire hypervisor HW stack, especially true in a major release.  They
can even deprecate HW in a minor release (I have fallen victim to this).

Thus, again, if you have money to burn and have relatively short HW life
cycles (less than 5 years for sure), AND that includes OS life cycles as
well, then VMware is probably ok.  Not saying there aren't some problems
on the oVirt side as well, just saying VMware has more expensive warts.
And thus "paid support" becomes somewhat humorous (but in a sad sort of
way).

(oVirt community support ROCKS!  Just saying...)


From my work with both VMware and ovirt, I must say that the ovirt 4.1 
installations I have are more reliable than the vsphere/vcenter 
installations I maintain.


But the key is to do it well. That applies to any virtualization solution. 
If you plan wrong and just throw it in you will have problems.


I use ovirt 4.1.* and gluster as a backend. And the many things I thought 
through in advance have made it rock solid, such as a separate vlan for 
storage and migration and one for ovirt management.


And yes this is an awesome community :)

/Johan


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] hyperconverged question

2017-09-01 Thread Johan Bernhardsson
If gluster drops in quorum so that it has fewer votes than it should, it
will stop file operations until quorum is back to normal. If I remember it
right, you need two writable bricks for quorum to be met, and the
arbiter is only a vote to avoid split brain.
Basically what you have is a raid5 solution without a spare. When
one disk dies it will run in degraded mode, and some raid systems will
stop the raid until you have removed the disk or forced it to run
anyway.
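
As a rough illustration (assuming a replica 3 arbiter 1 volume named engine, like the ones discussed here), the relevant options can be inspected and set like this:

gluster volume get engine cluster.quorum-type
gluster volume get engine cluster.server-quorum-type
gluster volume set engine cluster.quorum-type auto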
You can read up on it here: https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

/Johan

On Thu, 2017-08-31 at 22:33 -0700, Jim Kusznir wrote:
> Hi all:  
> 
> Sorry to hijack the thread, but I was about to start essentially the
> same thread.
> 
> I have a 3 node cluster, all three are hosts and gluster nodes
> (replica 2 + arbitrar).  I DO have the mnt_options=backup-volfile-
> servers= set:
> 
> storage=192.168.8.11:/engine
> mnt_options=backup-volfile-servers=192.168.8.12:192.168.8.13
> 
> I had an issue today where 192.168.8.11 went down.  ALL VMs
> immediately paused, including the engine (all VMs were running on
> host2:192.168.8.12).  I couldn't get any gluster stuff working until
> host1 (192.168.8.11) was restored.
> 
> What's wrong / what did I miss?
> 
> (this was set up "manually" through the article on setting up self-
> hosted gluster cluster back when 4.0 was new..I've upgraded it to 4.1
> since).
> 
> Thanks!
> --Jim
> 
> 
> On Thu, Aug 31, 2017 at 12:31 PM, Charles Kozler wrote:
> > Typo..."Set it up and then failed that **HOST**"
> > 
> > And upon that host going down, the storage domain went down. I only
> > have hosted storage domain and this new one - is this why the DC
> > went down and no SPM could be elected?
> > 
> > I dont recall this working this way in early 4.0 or 3.6
> > 
> > On Thu, Aug 31, 2017 at 3:30 PM, Charles Kozler wrote:
> > > So I've tested this today and I failed a node. Specifically, I
> > > setup a glusterfs domain and selected "host to use: node1". Set
> > > it up and then failed that VM
> > > 
> > > However, this did not work and the datacenter went down. My
> > > engine stayed up, however, it seems configuring a domain to pin
> > > to a host to use will obviously cause it to fail
> > > 
> > > This seems counter-intuitive to the point of glusterfs or any
> > > redundant storage. If a single host has to be tied to its
> > > function, this introduces a single point of failure
> > > 
> > > Am I missing something obvious?
> > > 
> > > On Thu, Aug 31, 2017 at 9:43 AM, Kasturi Narra wrote:
> > > > yes, right.  What you can do is edit the hosted-engine.conf
> > > > file and there is a parameter as shown below [1] and replace h2
> > > > and h3 with your second and third storage servers. Then you
> > > > will need to restart ovirt-ha-agent and ovirt-ha-broker
> > > > services in all the nodes .
> > > > 
> > > > [1] 'mnt_options=backup-volfile-servers=<host2>:<host3>' 
> > > > 
> > > > On Thu, Aug 31, 2017 at 5:54 PM, Charles Kozler wrote:
> > > > > Hi Kasturi -
> > > > > 
> > > > > Thanks for feedback
> > > > > 
> > > > > > If the cockpit+gdeploy plugin had been used, then it
> > > > > would have automatically detected the glusterfs replica 3 volume
> > > > > created during Hosted Engine deployment and this question
> > > > > would not have been asked
> > > > >   
> > > > > Actually, doing hosted-engine --deploy also auto-
> > > > > detects glusterfs.  I know the glusterfs fuse client has the
> > > > > ability to failover between all nodes in cluster, but I am
> > > > > still curious given the fact that I see in ovirt config
> > > > > node1:/engine (being node1 I set it to in hosted-engine --
> > > > > deploy). So my concern was to ensure and find out exactly how
> > > > > engine works when one node goes away and the fuse client
> > > > > moves over to the other node in the gluster cluster
> > > > > 
> > > > > But you did somewhat answer my question, the answer seems to
> > > > > be no (as default) and I will have to use hosted-engine.conf
> > > > > and change the parameter as you list
> > > > > 
> > > > > So I need to do something manual to create HA for engine on
> > > > > gluster? Yes?
> > > > > 
> > > > > Thanks so much!
> > > > > 
> > > > > On Thu, Aug 31, 2017 at 3:03 AM, Kasturi Narra wrote:
> > > > > > Hi,
> > > > > > 
> > > > > >    During Hosted Engine setup, the question about the glusterfs
> > > > > > volume is being asked because you have set up the volumes
> > > > > > yourself. If the cockpit+gdeploy plugin had been used,
> > > > > > it would have automatically detected the glusterfs
> > > > > > replica 3 volume created during Hosted Engine deployment
> > > > > > and this question would not have been asked.
> > > > > > 
> > > > > >    During new storage domain creation when glusterfs is
> > > > > > selected there is a feature called 'use managed gluster
> > > > > > volumes' and upon checking 

Re: [ovirt-users] Issues getting agent working on Ubuntu 17.04

2017-08-08 Thread Johan Bernhardsson
And it would have been good if I had read the whole email  :)
On Tue, 2017-08-08 at 22:04 +0200, Johan Bernhardsson wrote:
> It is a bug that is also present in 16.04.  The log directory
> /var/log/ovirt-guest-agent has the wrong user (or permission). It
> should have ovirtagent as user and group.
> 
> /Johan
> 
> On Tue, 2017-08-08 at 15:59 -0400, Wesley Stewart wrote:
> > I am having trouble getting the ovirt agent working on Ubuntu 17.04
> > (perhaps it just isn't there yet)
> > 
> > Currently I have two test machines, a 16.04 and a 17.04 Ubuntu
> > server.
> > 
> > 
> > On the 17.04 server:
> > Currently installed:
> > ovirt-guest-agent (1.0.12.2.dfsg-2), and service --status-all
> > reveals a few virtualization agents:
> >  [ - ]  open-vm-tools
> >  [ - ]  ovirt-guest-agent
> >  [ + ]  qemu-guest-agent
> > 
> > I can't seem to start ovirt-guest-agent
> > sudo service ovirt-guest-agent start/restart does nothing
> > 
> > Running  sudo systemctl status ovirt-guest-agent.service
> > Aug 08 15:31:50 ubuntu-template systemd[1]: Starting oVirt Guest
> > Agent...
> > Aug 08 15:31:50 ubuntu-template systemd[1]: Started oVirt Guest
> > Agent.
> > Aug 08 15:31:51 ubuntu-template python[1219]: *** stack smashing
> > detected ***: /usr/bin/python terminated
> > Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> > agent.service: Main process exited, code=killed, status=6/ABRT
> > Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> > agent.service: Unit entered failed state.
> > Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> > agent.service: Failed with result 'signal'.
> > 
> > sudo systemctl enable ovirt-guest-agent.service
> > Also does not seem to do anything.
> > 
> > Doing more research, I found:
> > http://lists.ovirt.org/pipermail/users/2017-July/083071.html
> > So perhaps the ovirt-guest-agent is broken for Ubuntu 17.04?
> > 
> > 
> > On the 16.04 Server I have:
> > Took some fiddling, but I eventually got it working
> > 
> > 
> > 
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Issues getting agent working on Ubuntu 17.04

2017-08-08 Thread Johan Bernhardsson
It is a bug that is also present in 16.04.  The log directory
/var/log/ovirt-guest-agent has the wrong user (or permission). It
should have ovirtagent as user and group.
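
A minimal workaround on the guest, assuming the ovirtagent user and group exist as they do on 16.04 (this only addresses the permission part of the problem):

chown -R ovirtagent:ovirtagent /var/log/ovirt-guest-agent
systemctl restart ovirt-guest-agent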
/Johan
On Tue, 2017-08-08 at 15:59 -0400, Wesley Stewart wrote:
> I am having trouble getting the ovirt agent working on Ubuntu 17.04
> (perhaps it just isn't there yet)
> 
> Currently I have two test machines, a 16.04 and a 17.04 Ubuntu
> server.
> 
> 
> On the 17.04 server:
> Currently installed:
> ovirt-guest-agent (1.0.12.2.dfsg-2), and service --status-all reveals
> a few virtualization agents:
>  [ - ]  open-vm-tools
>  [ - ]  ovirt-guest-agent
>  [ + ]  qemu-guest-agent
> 
> I can't seem to start ovirt-guest-agent
> sudo service ovirt-guest-agent start/restart does nothing
> 
> Running  sudo systemctl status ovirt-guest-agent.service
> Aug 08 15:31:50 ubuntu-template systemd[1]: Starting oVirt Guest
> Agent...
> Aug 08 15:31:50 ubuntu-template systemd[1]: Started oVirt Guest
> Agent.
> Aug 08 15:31:51 ubuntu-template python[1219]: *** stack smashing
> detected ***: /usr/bin/python terminated
> Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> agent.service: Main process exited, code=killed, status=6/ABRT
> Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> agent.service: Unit entered failed state.
> Aug 08 15:31:51 ubuntu-template systemd[1]: ovirt-guest-
> agent.service: Failed with result 'signal'.
> 
> sudo systemctl enable ovirt-guest-agent.service
> Also does not seem to do anything.
> 
> Doing more research, I found:
> http://lists.ovirt.org/pipermail/users/2017-July/083071.html
> So perhaps the ovirt-guest-agent is broken for Ubuntu 17.04?
> 
> 
> On the 16.04 Server I have:
> Took some fiddling, but I eventually got it working
> 
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Good practices

2017-08-08 Thread Johan Bernhardsson
On ovirt, gluster uses sharding, so all large files are broken up into small 
pieces on the gluster bricks.
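
For reference, sharding is a per-volume option; a quick way to check it on an existing volume (the volume name here is a placeholder) is:

gluster volume get <volname> features.shard
gluster volume get <volname> features.shard-block-size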


/Johan


On August 8, 2017 12:19:39 Moacir Ferreira <moacirferre...@hotmail.com> wrote:

Thanks Johan, you brought "light" into my darkness! I went looking for the 
GlusterFS tiering how-to and it looks quite simple to attach an SSD as a 
hot tier. For those willing to read about it, go here: 
http://blog.gluster.org/2016/03/automated-tiering-in-gluster/



Now, I still have a question: VMs are made of very large .qcow2 files. My 
understanding is that files in Gluster are kept all together in a single 
brick. If so, I will not benefit from tiering as a single SSD will not be 
big enough to fit all my large VM .qcow2 files. This would not be true if 
Gluster can store "blocks" of data that compose a large file spread on 
several bricks. But if I am not wrong, this is one of the key differences 
between GlusterFS and Ceph. Can you comment?



Moacir


____
From: Johan Bernhardsson <jo...@kafit.se>
Sent: Tuesday, August 8, 2017 7:03 AM
To: Moacir Ferreira; Devin Acosta; users@ovirt.org
Subject: Re: [ovirt-users] Good practices


You attach the ssd as a hot tier with a gluster command. I don't think that 
gdeploy or ovirt gui can do it.


The gluster docs and redhat docs explain tiering quite well.

/Johan

On August 8, 2017 07:06:42 Moacir Ferreira <moacirferre...@hotmail.com> wrote:

Hi Devin,


Please consider that for the OS I have a RAID 1. Now, let's say I use RAID 5 
to assemble a single disk on each server. In this case, the SSD will not 
make any difference, right? I guess that for it to be usable, the SSD 
should not be part of the RAID 5. In this case I could create a logical 
volume made of the RAIDed brick and then extend it using the SSD. I.e.: 
Using gdeploy:



[disktype]

jbod



[pv1]

action=create

devices=sdb, sdc

wipefs=yes

ignore_vg_erros=no


[vg1]

action=create

vgname=gluster_vg_jbod

pvname=sdb

ignore_vg_erros=no


[vg2]

action=extend

vgname=gluster_vg_jbod

pvname=sdc

ignore_vg_erros=no


But will Gluster be able to auto-detect and use this SSD brick for tiering? 
Do I have to do some other configuration? Also, as the VM files (.qcow2) 
are quite big, will I benefit from tiering? Or is this wrong and my approach 
should be different?



Thanks,

Moacir



From: Devin Acosta <de...@pabstatencio.com>
Sent: Monday, August 7, 2017 7:46 AM
To: Moacir Ferreira; users@ovirt.org
Subject: Re: [ovirt-users] Good practices


Moacir,

I have recently installed multiple Red Hat Virtualization hosts for several 
different companies, and have dealt with the Red Hat Support Team in depth 
about optimal configuration in regards to setting up GlusterFS most 
efficiently and I wanted to share with you what I learned.


In general the Red Hat Virtualization team frowns upon using each DISK of the 
system as just a JBOD; sure, there is some protection by having the data 
replicated, however, the recommendation is to use RAID 6 (preferred) or 
RAID-5, or RAID-1 at the very least.


Here is the direct quote from Red Hat when I asked about RAID and Bricks:

"A typical Gluster configuration would use RAID underneath the bricks. RAID 
6 is most typical as it gives you 2 disk failure protection, but RAID 5 
could be used too. Once you have the RAIDed bricks, you'd then apply the 
desired replication on top of that. The most popular way of doing this 
would be distributed replicated with 2x replication. In general you'll get 
better performance with larger bricks. 12 drives is often a sweet spot. 
Another option would be to create a separate tier using all SSD’s.”


In order to do SSD tiering, from my understanding you would need 1 x NVMe drive 
in each server, or a 4 x SSD hot tier (it needs to be distributed-replicated 
for the hot tier if not using NVMe). So with you only having 1 SSD drive in 
each server, I’d suggest maybe looking into the NVMe option.


Since you're using only 3 servers, what I’d probably suggest is to do (2 
Replicas + Arbiter Node); this setup actually doesn’t require the 3rd 
server to have big drives at all as it only stores meta-data about the 
files and not actually a full copy.


Please see the attached document that was given to me by Red Hat to get 
more information on this. Hope this information helps you.



--

Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect


On August 6, 2017 at 7:29:29 PM, Moacir Ferreira 
(moacirferre...@hotmail.com<mailto:moacirferre...@hotmail.com>) wrote:


I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU 
sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use 
GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and 
a dual 10Gb NIC. So my intention is to create a loop like a server triangle 
using the 40Gb NICs for virtualization files (VMs .qcow2) 

Re: [ovirt-users] Good practices

2017-08-08 Thread Johan Bernhardsson
You attach the ssd as a hot tier with a gluster command. I don't think that 
gdeploy or ovirt gui can do it.


The gluster docs and redhat docs explain tiering quite well.
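
A rough sketch of what that gluster command looks like (host names and brick paths here are placeholders; check the tiering docs for the exact syntax on your gluster version):

gluster volume tier <volname> attach replica 3 host1:/ssd/brick host2:/ssd/brick host3:/ssd/brick
gluster volume tier <volname> status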

/Johan


On August 8, 2017 07:06:42 Moacir Ferreira  wrote:


Hi Devin,


Please consider that for the OS I have a RAID 1. Now, let's say I use RAID 5 
to assemble a single disk on each server. In this case, the SSD will not 
make any difference, right? I guess that for it to be usable, the SSD 
should not be part of the RAID 5. In this case I could create a logical 
volume made of the RAIDed brick and then extend it using the SSD. I.e.: 
Using gdeploy:



[disktype]

jbod



[pv1]

action=create

devices=sdb, sdc

wipefs=yes

ignore_vg_erros=no


[vg1]

action=create

vgname=gluster_vg_jbod

pvname=sdb

ignore_vg_erros=no


[vg2]

action=extend

vgname=gluster_vg_jbod

pvname=sdc

ignore_vg_erros=no


But will Gluster be able to auto-detect and use this SSD brick for tiering? 
Do I have to do some other configuration? Also, as the VM files (.qcow2) 
are quite big, will I benefit from tiering? Or is this wrong and my approach 
should be different?



Thanks,

Moacir



From: Devin Acosta 
Sent: Monday, August 7, 2017 7:46 AM
To: Moacir Ferreira; users@ovirt.org
Subject: Re: [ovirt-users] Good practices


Moacir,

I have recently installed multiple Red Hat Virtualization hosts for several 
different companies, and have dealt with the Red Hat Support Team in depth 
about optimal configuration in regards to setting up GlusterFS most 
efficiently and I wanted to share with you what I learned.


In general the Red Hat Virtualization team frowns upon using each DISK of the 
system as just a JBOD; sure, there is some protection by having the data 
replicated, however, the recommendation is to use RAID 6 (preferred) or 
RAID-5, or RAID-1 at the very least.


Here is the direct quote from Red Hat when I asked about RAID and Bricks:

"A typical Gluster configuration would use RAID underneath the bricks. RAID 
6 is most typical as it gives you 2 disk failure protection, but RAID 5 
could be used too. Once you have the RAIDed bricks, you'd then apply the 
desired replication on top of that. The most popular way of doing this 
would be distributed replicated with 2x replication. In general you'll get 
better performance with larger bricks. 12 drives is often a sweet spot. 
Another option would be to create a separate tier using all SSD’s.”


In order to do SSD tiering, from my understanding you would need 1 x NVMe drive 
in each server, or a 4 x SSD hot tier (it needs to be distributed-replicated 
for the hot tier if not using NVMe). So with you only having 1 SSD drive in 
each server, I’d suggest maybe looking into the NVMe option.


Since you're using only 3 servers, what I’d probably suggest is to do (2 
Replicas + Arbiter Node); this setup actually doesn’t require the 3rd 
server to have big drives at all as it only stores meta-data about the 
files and not actually a full copy.


Please see the attached document that was given to me by Red Hat to get 
more information on this. Hope this information helps you.



--

Devin Acosta, RHCA, RHVCA
Red Hat Certified Architect


On August 6, 2017 at 7:29:29 PM, Moacir Ferreira 
(moacirferre...@hotmail.com) wrote:


I am willing to assemble an oVirt "pod", made of 3 servers, each with 2 CPU 
sockets of 12 cores, 256GB RAM, 7 HDD 10K, 1 SSD. The idea is to use 
GlusterFS to provide HA for the VMs. The 3 servers have a dual 40Gb NIC and 
a dual 10Gb NIC. So my intention is to create a loop like a server triangle 
using the 40Gb NICs for virtualization files (VMs .qcow2) access and to 
move VMs around the pod (east /west traffic) while using the 10Gb 
interfaces for giving services to the outside world (north/south traffic).



This said, my first question is: how should I deploy GlusterFS in such 
an oVirt scenario? My questions are:



1 - Should I create 3 RAID (i.e.: RAID 5), one on each oVirt node, and then 
create a GlusterFS using them?


2 - Instead, should I create a JBOD array made of all server's disks?

3 - What is the best Gluster configuration to provide for HA while not 
consuming too much disk space?


4 - Does an oVirt hypervisor pod like the one I am planning to build, and the 
virtualization environment, benefit from tiering when using an SSD disk? 
And if yes, will Gluster do it by default or do I have to configure it to do so?



At the bottom line, what is the good practice for using GlusterFS in small 
pods for enterprises?



You opinion/feedback will be really appreciated!

Moacir

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



--
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Install ovirt on Azure

2017-08-07 Thread Johan Bernhardsson
There is no point in doing that, as Azure is a cloud in itself and ovirt
is for building your own virtual environment to deploy on local hardware.

/Johan

On Mon, 2017-08-07 at 12:32 +0200, Grzegorz Szypa wrote:
> Hi.
> 
> Did anyone try to install ovirt on Azure Environment?
> 
> -- 
> G.Sz.
> ___

> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] sanlock ids file broken after server crash (Fixed)

2017-08-01 Thread Johan Bernhardsson
I have two theories. One: it got there when we moved the servers from our
lab desk to the hosting site. We had some problems getting it running.

Or two: a couple of weeks ago two servers rebooted after high load. That
might have caused damage to the file.

I did manage to move all servers off that storage, then removed it,
cleaned it and added it back as a new storage domain.

Not what I wanted, but it solved the problem.

/Johan

On Sun, 2017-07-30 at 16:24 +0300, Maor Lipchuk wrote:
> Hi David,
> 
> I'm not sure how it got to that character in the first place.
> Nir, Is there a safe way to fix that while there are running VMs?
> 
> Regards,
> Maor
> 
> On Sun, Jul 30, 2017 at 11:58 AM, Johan Bernhardsson <jo...@kafit.se>
> wrote:
> > 
> > (First reply did not get to the list)
> > 
> > From sanlock.log:
> > 
> > 2017-07-30 10:49:31+0200 1766275 [1171]: s310751 lockspace
> > 0924ff77-
> > ef51-435b-b90d-50bfbf2e8de7:1:/rhev/data-
> > center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-
> > 50bfbf2e8de7/dom_md/ids:0
> > 2017-07-30 10:49:31+0200 1766275 [10496]: verify_leader 1 wrong
> > space
> > name 0924ff77-ef51-435b-b90d-50bfbf2eke7 0924ff77-ef51-435b-
> > b90d-
> > 50bfbf2e8de7 /rhev/data-
> > center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-
> > ef51-435b-b90d-50bfbf2e8de7/dom_md/ids
> > 2017-07-30 10:49:31+0200 1766275 [10496]: leader1
> > delta_acquire_begin
> > error -226 lockspace 0924ff77-ef51-435b-b90d-50bfbf2e8de7 host_id 1
> > 2017-07-30 10:49:31+0200 1766275 [10496]: leader2 path /rhev/data-
> > center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-
> > 50bfbf2e8de7/dom_md/ids offset 0
> > 2017-07-30 10:49:31+0200 1766275 [10496]: leader3 m 12212010 v
> > 30003 ss
> > 512 nh 0 mh 4076 oi 1 og 2031079063 lv 0
> > 2017-07-30 10:49:31+0200 1766275 [10496]: leader4 sn 0924ff77-ef51-
> > 435b-b90d-50bfbf2eke7 rn <93>7^\afa5-3a91-415b-a04c-
> > 221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
> > 2017-07-30 10:49:32+0200 1766276 [1171]: s310751 add_lockspace fail
> > result -226
> > 
> > 
> > vdsm logs doesnt have any errors and engine.log does not have any
> > errors.
> > 
> > And if i check the ids file manually. I can see that everything in
> > it
> > is correct except for the first host in the cluster where the space
> > name and host id is broken.
> > 
> > 
> > /Johan
> > 
> > On Sun, 2017-07-30 at 11:18 +0300, Maor Lipchuk wrote:
> > > 
> > > Hi Johan,
> > > 
> > > Can you please share the vdsm and engine logs.
> > > 
> > > Also, it won't harm to also get the sanlock logs just in case
> > > sanlock
> > > was configured to save all debugging in a log file (see
> > > http://people.redhat.com/teigland/sanlock-messages.txt)).
> > > Try to share the sanlock ouput by running  'sanlock client
> > > status',
> > > 'sanlock client log_dump'.
> > > 
> > > Regards,
> > > Maor
> > > 
> > > On Thu, Jul 27, 2017 at 6:18 PM, Johan Bernhardsson <johan@kafit.
> > > se>
> > > wrote:
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > The ids file for sanlock is broken on one setup. The first host
> > > > id
> > > > in
> > > > the file is wrong.
> > > > 
> > > > From the logfile i have:
> > > > 
> > > > verify_leader 1 wrong space name 0924ff77-ef51-435b-b90d-
> > > > 50bfbf2e�ke7
> > > > 0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-
> > > > center/mnt/glusterSD/
> > > > 
> > > > 
> > > > 
> > > > Note the broken char in the space name.
> > > > 
> > > > This also apears. And it seams as the hostid too is broken in
> > > > the
> > > > ids
> > > > file:
> > > > 
> > > > leader4 sn 0924ff77-ef51-435b-b90d-50bfbf2e�ke7 rn ��7 afa5-
> > > > 3a91-
> > > > 415b-
> > > > a04c-221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
> > > > 
> > > > Note the broken chars there as well.
> > > > 
> > > > If i check the ids file with less or strings the first row
> > > > where my
> > > > vbgkvm01 host are. That has broken chars.
> > > > 
> > > > Can this be repaired in some way without taking down all the
> > > > virtual
> > > > machines on that storage?
> > > > 
> > > > 
> > > > /Johan
> > > > ___
> > > > Users mailing list
> > > > Users@ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] problem while moving/copying disks: vdsm low level image copy failed

2017-07-31 Thread Johan Bernhardsson
I added log snippets from when it fails to the bug entry, along with how the
volume is set up.
/Johan
On Mon, 2017-07-31 at 10:23 +0300, Benny Zlotnik wrote:
> Forgot to add there is a bug for this issue[1] 
> Please add your gluster mount and brick logs to the bug entry
> 
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1458846
> 
> On Sun, Jul 30, 2017 at 3:02 PM, Johan Bernhardsson <jo...@kafit.se>
> wrote:
> > OS Version:
> > RHEL - 7 - 3.1611.el7.centos
> > OS Description:
> > CentOS Linux 7 (Core)
> > Kernel Version:
> > 3.10.0 - 514.16.1.el7.x86_64
> > KVM Version:
> > 2.6.0 - 28.el7_3.9.1
> > LIBVIRT Version:
> > libvirt-2.0.0-10.el7_3.9
> > VDSM Version:
> > vdsm-4.19.15-1.el7.centos
> > SPICE Version:
> > 0.12.4 - 20.el7_3
> > GlusterFS Version:
> > glusterfs-3.8.11-1.el7
> > CEPH Version:
> > librbd1-0.94.5-1.el7
> > qemu-img version 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.9.1), Copyright
> > (c) 2004-2008 Fabrice Bellard
> > 
> > This is what i have on the hosts.
> > 
> > /Johan
> > 
> > On Sun, 2017-07-30 at 13:56 +0300, Benny Zlotnik wrote:
> > > Hi, 
> > > 
> > > Can please you provide the versions of vdsm, qemu, libvirt?
> > > 
> > > On Sun, Jul 30, 2017 at 1:01 PM, Johan Bernhardsson 
> > > se> wrote:
> > > > Hello,
> > > > 
> > > > We get this error message while moving or copying some of the
> > > > disks on
> > > > our main cluster running 4.1.2 on centos7  
> > > > 
> > > > This is shown in the engine:
> > > > VDSM vbgkvm02 command HSMGetAllTasksStatusesVDS failed: low
> > > > level Image
> > > > copy failed
> > > > 
> > > > I can copy it inside the host. And i can use dd to copy.
> > > > Haven't tried
> > > > to run qemu-img manually yet.
> > > > 
> > > > 
> > > > This is from vdsm.log on the host:
> > > > 2017-07-28 09:07:22,741+0200 ERROR (tasks/6) [root] Job
> > > > u'c82d4c53-
> > > > 3eb4-405e-a2d5-c4c77519360e' failed (jobs:217)
> > > > Traceback (most recent call last):
> > > >   File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line
> > > > 154, in
> > > > run
> > > > self._run()
> > > >   File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 88,
> > > > in _run
> > > > self._operation.wait_for_completion()
> > > >   File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line
> > > > 329, in
> > > > wait_for_completion
> > > > self.poll(timeout)
> > > >   File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line
> > > > 324, in
> > > > poll
> > > > self.error)
> > > > QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-15',
> > > > '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3',
> > > > '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T',
> > > > 'none', '-f',
> > > > 'raw', u'/rhev/data-
> > > > center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-
> > > > 43
> > > > 5b-b90d-50bfbf2e8de7/images/750f4184-b852-4b00-94fc-
> > > > 476f3f5b93c7/3fe43487-3302-4b34-865a-07c5c6aedbf2', '-O',
> > > > 'raw',
> > > > u'/rhev/data-center/mnt/glusterSD/10.137.30.105:_fs03/5d47a297-
> > > > a21f-
> > > > 4587-bb7c-dd00d52010d5/images/750f4184-b852-4b00-94fc-
> > > > 476f3f5b93c7/3fe43487-3302-4b34-865
> > > > a-07c5c6aedbf2'], ecode=1, stdout=, stderr=qemu-img: error
> > > > while
> > > > reading sector 12197886: No data available
> > > > , message=None
> > > > 
> > > > 
> > > > The storage domains are all based on gluster. The storage
> > > > domains that
> > > > we see this on is configured as dispersed volumes. 
> > > > 
> > > > Found a way to "fix" the problem. And that is to run dd
> > > > if=/dev/vda
> > > > of=/dev/null bs=1M  inside the virtual guest. After that we can
> > > > copy an
> > > > image or use storage livemigration.
> > > > 
> > > > Is this a gluster problem or an vdsm problem? Or could it be
> > > > something
> > > > with qemu-img?
> > > > 
> > > > /Johan
> > > > ___
> > > > Users mailing list
> > > > Users@ovirt.org
> > > > http://lists.ovirt.org/mailman/listinfo/users
> > > > ___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] problem while moving/copying disks: vdsm low level image copy failed

2017-07-30 Thread Johan Bernhardsson
OS Version: RHEL - 7 - 3.1611.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 514.16.1.el7.x86_64
KVM Version: 2.6.0 - 28.el7_3.9.1
LIBVIRT Version: libvirt-2.0.0-10.el7_3.9
VDSM Version: vdsm-4.19.15-1.el7.centos
SPICE Version: 0.12.4 - 20.el7_3
GlusterFS Version: glusterfs-3.8.11-1.el7
CEPH Version: librbd1-0.94.5-1.el7

qemu-img version 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.9.1), Copyright (c)
2004-2008 Fabrice Bellard

This is what I have on the hosts.
/Johan
On Sun, 2017-07-30 at 13:56 +0300, Benny Zlotnik wrote:
> Hi, 
> 
> Can please you provide the versions of vdsm, qemu, libvirt?
> 
> On Sun, Jul 30, 2017 at 1:01 PM, Johan Bernhardsson <jo...@kafit.se>
> wrote:
> > Hello,
> > 
> > We get this error message while moving or copying some of the disks
> > on
> > our main cluster running 4.1.2 on centos7  
> > 
> > This is shown in the engine:
> > VDSM vbgkvm02 command HSMGetAllTasksStatusesVDS failed: low level
> > Image
> > copy failed
> > 
> > I can copy it inside the host. And i can use dd to copy. Haven't
> > tried
> > to run qemu-img manually yet.
> > 
> > 
> > This is from vdsm.log on the host:
> > 2017-07-28 09:07:22,741+0200 ERROR (tasks/6) [root] Job u'c82d4c53-
> > 3eb4-405e-a2d5-c4c77519360e' failed (jobs:217)
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154,
> > in
> > run
> > self._run()
> >   File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 88, in
> > _run
> > self._operation.wait_for_completion()
> >   File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line
> > 329, in
> > wait_for_completion
> > self.poll(timeout)
> >   File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line
> > 324, in
> > poll
> > self.error)
> > QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-15',
> > '/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3',
> > '/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none',
> > '-f',
> > 'raw', u'/rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-
> > ef51-
> > 43
> > 5b-b90d-50bfbf2e8de7/images/750f4184-b852-4b00-94fc-
> > 476f3f5b93c7/3fe43487-3302-4b34-865a-07c5c6aedbf2', '-O', 'raw',
> > u'/rhev/data-center/mnt/glusterSD/10.137.30.105:_fs03/5d47a297-
> > a21f-
> > 4587-bb7c-dd00d52010d5/images/750f4184-b852-4b00-94fc-
> > 476f3f5b93c7/3fe43487-3302-4b34-865
> > a-07c5c6aedbf2'], ecode=1, stdout=, stderr=qemu-img: error while
> > reading sector 12197886: No data available
> > , message=None
> > 
> > 
> > The storage domains are all based on gluster. The storage domains
> > that
> > we see this on is configured as dispersed volumes. 
> > 
> > Found a way to "fix" the problem. And that is to run dd if=/dev/vda
> > of=/dev/null bs=1M  inside the virtual guest. After that we can
> > copy an
> > image or use storage livemigration.
> > 
> > Is this a gluster problem or an vdsm problem? Or could it be
> > something
> > with qemu-img?
> > 
> > /Johan
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> > ___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] problem while moving/copying disks: vdsm low level image copy failed

2017-07-30 Thread Johan Bernhardsson
Hello,

We get this error message while moving or copying some of the disks on
our main cluster running 4.1.2 on centos7  

This is shown in the engine:
VDSM vbgkvm02 command HSMGetAllTasksStatusesVDS failed: low level Image
copy failed

I can copy it inside the host, and I can use dd to copy. I haven't tried
to run qemu-img manually yet.


This is from vdsm.log on the host:
2017-07-28 09:07:22,741+0200 ERROR (tasks/6) [root] Job u'c82d4c53-
3eb4-405e-a2d5-c4c77519360e' failed (jobs:217)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/jobs.py", line 154, in
run
self._run()
  File "/usr/share/vdsm/storage/sdm/api/copy_data.py", line 88, in _run
self._operation.wait_for_completion()
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 329, in
wait_for_completion
self.poll(timeout)
  File "/usr/lib/python2.7/site-packages/vdsm/qemuimg.py", line 324, in
poll
self.error)
QImgError: cmd=['/usr/bin/taskset', '--cpu-list', '0-15',
'/usr/bin/nice', '-n', '19', '/usr/bin/ionice', '-c', '3',
'/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-f', 
'raw', u'/rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-
43
5b-b90d-50bfbf2e8de7/images/750f4184-b852-4b00-94fc-
476f3f5b93c7/3fe43487-3302-4b34-865a-07c5c6aedbf2', '-O', 'raw',
u'/rhev/data-center/mnt/glusterSD/10.137.30.105:_fs03/5d47a297-a21f-
4587-bb7c-dd00d52010d5/images/750f4184-b852-4b00-94fc-
476f3f5b93c7/3fe43487-3302-4b34-865
a-07c5c6aedbf2'], ecode=1, stdout=, stderr=qemu-img: error while
reading sector 12197886: No data available
, message=None


The storage domains are all based on gluster. The storage domains that
we see this on is configured as dispersed volumes. 

Found a way to "fix" the problem. And that is to run dd if=/dev/vda
of=/dev/null bs=1M  inside the virtual guest. After that we can copy an
image or use storage livemigration.

Is this a gluster problem or a vdsm problem? Or could it be something
with qemu-img?
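
For reference, a manual test outside vdsm, using the same source path and flags as in the log above and writing to scratch space with enough room, would look something like this (purely a sketch):

qemu-img convert -p -t none -T none -f raw /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/images/750f4184-b852-4b00-94fc-476f3f5b93c7/3fe43487-3302-4b34-865a-07c5c6aedbf2 -O raw /tmp/test-copy.raw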

/Johan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] sanlock ids file broken after server crash

2017-07-30 Thread Johan Bernhardsson
(First reply did not get to the list)

From sanlock.log:

2017-07-30 10:49:31+0200 1766275 [1171]: s310751 lockspace 0924ff77-
ef51-435b-b90d-50bfbf2e8de7:1:/rhev/data-
center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-
50bfbf2e8de7/dom_md/ids:0
2017-07-30 10:49:31+0200 1766275 [10496]: verify_leader 1 wrong space
name 0924ff77-ef51-435b-b90d-50bfbf2eke7 0924ff77-ef51-435b-b90d-
50bfbf2e8de7 /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-
ef51-435b-b90d-50bfbf2e8de7/dom_md/ids
2017-07-30 10:49:31+0200 1766275 [10496]: leader1 delta_acquire_begin
error -226 lockspace 0924ff77-ef51-435b-b90d-50bfbf2e8de7 host_id 1
2017-07-30 10:49:31+0200 1766275 [10496]: leader2 path /rhev/data-
center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-
50bfbf2e8de7/dom_md/ids offset 0
2017-07-30 10:49:31+0200 1766275 [10496]: leader3 m 12212010 v 30003 ss
512 nh 0 mh 4076 oi 1 og 2031079063 lv 0
2017-07-30 10:49:31+0200 1766275 [10496]: leader4 sn 0924ff77-ef51-
435b-b90d-50bfbf2eke7 rn <93>7^\afa5-3a91-415b-a04c-
221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
2017-07-30 10:49:32+0200 1766276 [1171]: s310751 add_lockspace fail
result -226


The vdsm logs don't have any errors and engine.log does not have any
errors.

And if I check the ids file manually, I can see that everything in it
is correct except for the first host in the cluster, where the space
name and host id are broken.
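
For anyone who wants to inspect their own ids file, a read-only way to look at it (the path below is the one from the log above; both commands only read the file):

sanlock direct dump /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/dom_md/ids
hexdump -C /rhev/data-center/mnt/glusterSD/vbgsan02:_fs02/0924ff77-ef51-435b-b90d-50bfbf2e8de7/dom_md/ids | less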


/Johan

On Sun, 2017-07-30 at 11:18 +0300, Maor Lipchuk wrote:
> Hi Johan,
> 
> Can you please share the vdsm and engine logs.
> 
> Also, it won't harm to also get the sanlock logs just in case sanlock
> was configured to save all debugging in a log file (see
> http://people.redhat.com/teigland/sanlock-messages.txt)).
> Try to share the sanlock ouput by running  'sanlock client status',
> 'sanlock client log_dump'.
> 
> Regards,
> Maor
> 
> On Thu, Jul 27, 2017 at 6:18 PM, Johan Bernhardsson <jo...@kafit.se>
> wrote:
> > 
> > Hello,
> > 
> > The ids file for sanlock is broken on one setup. The first host id
> > in
> > the file is wrong.
> > 
> > From the logfile i have:
> > 
> > verify_leader 1 wrong space name 0924ff77-ef51-435b-b90d-
> > 50bfbf2e�ke7
> > 0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-
> > center/mnt/glusterSD/
> > 
> > 
> > 
> > Note the broken char in the space name.
> > 
> > This also apears. And it seams as the hostid too is broken in the
> > ids
> > file:
> > 
> > leader4 sn 0924ff77-ef51-435b-b90d-50bfbf2e�ke7 rn ��7 afa5-3a91-
> > 415b-
> > a04c-221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7
> > 
> > Note the broken chars there as well.
> > 
> > If i check the ids file with less or strings the first row where my
> > vbgkvm01 host are. That has broken chars.
> > 
> > Can this be repaired in some way without taking down all the
> > virtual
> > machines on that storage?
> > 
> > 
> > /Johan
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] sanlock ids file broken after server crash

2017-07-27 Thread Johan Bernhardsson
Hello,

The ids file for sanlock is broken on one setup. The first host id in
the file is wrong.

From the logfile i have:

verify_leader 1 wrong space name 0924ff77-ef51-435b-b90d-50bfbf2e�ke7
0924ff77-ef51-435b-b90d-50bfbf2e8de7 /rhev/data-center/mnt/glusterSD/



Note the broken char in the space name. 

This also appears. And it seems the host id too is broken in the ids
file:

leader4 sn 0924ff77-ef51-435b-b90d-50bfbf2e�ke7 rn ��7afa5-3a91-415b-
a04c-221d3e060163.vbgkvm01.a ts 4351980 cs eefa4dd7

Note the broken chars there as well. 

If I check the ids file with less or strings, the first row, where my
vbgkvm01 host is, has broken chars.

Can this be repaired in some way without taking down all the virtual
machines on that storage?


/Johan
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users