[ovirt-users] Re: i/o wait and slow system

2020-08-28 Thread Darrell Budic
See below:

> On Aug 27, 2020, at 3:19 PM, info--- via Users  wrote:
> 
> Thank you. Reboot of the engine and afterwards the backup server helped :-)

Good deal.

> Should I revert some of my previous changes? Reduce the write window size?
> - gluster volume set vmstore performance.read-ahead on
> - gluster volume set vmstore performance.stat-prefetch on
> - gluster volume set vmstore performance.write-behind-window-size 64MB
> - gluster volume set vmstore performance.flush-behind on
> - gluster volume set vmstore cluster.choose-local on

I find these tend to be workload- and system-dependent; your best bet is to 
benchmark and test them yourself on your own system.

I’m using these on my volumes:
performance.stat-prefetch: on
performance.read-ahead: off
performance.write-behind-window-size: 64MB
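
If it helps, the way I compare settings is to change one option at a time and re-run the same small fio job inside a test VM before and after. This is just a sketch - the volume name and fio parameters below are examples, not anything from this thread:

gluster volume set vmstore performance.read-ahead off

fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --size=1g --numjobs=4 --iodepth=16 --group_reporting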


[ovirt-users] Re: i/o wait and slow system

2020-08-27 Thread Darrell Budic
Looks like you’ve got a posix or NFS mount there. Is your gluster storage 
domain of type GlusterFS? And make sure you restarted ovirt-engine after 
enabling LibgfApiSupported, before stopping and restarting the VM.

An active libgfapi mount looks like:

[XML example not preserved in the archive]
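Roughly, the disk source switches from a file path under /rhev/data-center/... to a network source using the gluster protocol. This is a hand-written sketch of the shape of it - the volume name, UUIDs, target device and host below are placeholders, not values from this thread:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='threads'/>
  <source protocol='gluster' name='vmstore/DOMAIN_UUID/images/IMAGE_UUID/VOLUME_UUID'>
    <host name='10.9.9.101' port='24007'/>
  </source>
  <target dev='sda' bus='scsi'/>
</disk>

A quick check is something like: virsh -r dumpxml VMID | grep -A5 "<disk" - if the disk still shows type='file' with a /rhev/data-center path, it's going through the FUSE mount.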

> On Aug 26, 2020, at 1:12 PM, info--- via Users  wrote:
> 
> I enabled libgfapi and powered off / on the VM.
> 
> - engine-config --all
> - LibgfApiSupported: true version: 4.3
> 
> How can I see that this is active on the VM? The disk looks the same as before.
> 
> - virsh dumpxml 15
>   [disk element excerpt; most XML tags not preserved in the archive]
>   ... io='threads'/>
>   ... file='/rhev/data-center/mnt/glusterSD/10.9.9.101:_vmstore/f2c621de-42bf-4dbf-920c-adf4506b786d/images/1e231e3e-d98c-491a-9236-907814d4837/c755aaa3-7d3d-4c0d-8184-c6aae37229ba'>
>   ...
> 
> Here is the Volume setup:
> 
> Volume Name: vmstore
> Type: Distributed-Replicate
> Volume ID: 195e2a05-9667-4b8b-b0b7-82294631de50
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 3 = 6
> Transport-type: tcp
> Bricks:
> Brick1: 10.9.9.101:/gluster_bricks/vmstore/vmstore
> Brick2: 10.9.9.102:/gluster_bricks/vmstore/vmstore
> Brick3: 10.9.9.103:/gluster_bricks/vmstore/vmstore
> Brick4: 10.9.9.101:/gluster_bricks/S4CYNF0M219849L/S4CYNF0M219849L
> Brick5: 10.9.9.102:/gluster_bricks/S4CYNF0M219836L/S4CYNF0M219836L
> Brick6: 10.9.9.103:/gluster_bricks/S4CYNF0M219801Y/S4CYNF0M219801Y
> Options Reconfigured:
> performance.write-behind-window-size: 64MB
> performance.flush-behind: on
> performance.stat-prefetch: on
> performance.client-io-threads: on
> nfs.disable: on
> transport.address-family: inet
> performance.strict-o-direct: on
> performance.quick-read: off
> performance.read-ahead: on
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: on
> user.cifs: off
> cluster.choose-local: on
> client.event-threads: 4
> server.event-threads: 4
> network.ping-timeout: 30
> storage.owner-uid: 36
> storage.owner-gid: 36
> cluster.granular-entry-heal: enable
> 
> Thank you for your support.



[ovirt-users] Re: broker.log not rotating

2020-07-07 Thread Darrell Budic
They’re just log files, so generally safe to delete. You may want to take a 
look at the huge one though, see what’s up. I had a similar problem that turned 
out to be a broken HA agent install, cleaned and reinstalled and it went back 
to the same volume of logs as the others.

Now I need to check if they added logrotate.d config files for it to clean 
those up, hum.
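
If there isn't one, a simple drop-in along these lines should keep it in check - just a sketch, the path, frequency and retention are assumptions to adjust, e.g. in /etc/logrotate.d/ovirt-hosted-engine-ha:

/var/log/ovirt-hosted-engine-ha/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}

copytruncate avoids having to bounce the broker/agent just to get the log file reopened.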

  -Darrell

> On Jul 2, 2020, at 12:00 AM, Anton Louw via Users  wrote:
> 
> 
> 
> Hi All,
>  
> I had a space alert on one of my nodes this morning, and when I looked 
> around, I saw that /var/log/ovirt-hosted-engine-ha/broker.log was sitting at 
> around 30GB. Does anybody know if it is safe to delete the log file? Or is 
> there another process that I should follow? 
>  
> I had a look at my other nodes, and the broker.log file does not exceed 4GB.
>  
> Thank you
> 
> Anton Louw
> Cloud Engineer: Storage and Virtualization at Vox
> T:  087 805  | D: 087 805 1572
> M: N/A
> E: anton.l...@voxtelecom.co.za 
> A: Rutherford Estate, 1 Scott Street, Waverley, Johannesburg
> www.vox.co.za 


[ovirt-users] Re: Sometimes paused due to unknown storage error on gluster

2020-03-28 Thread Darrell Budic
Nic,

I didn’t see what version of gluster you’re running. There was a leak that 
caused similar behavior for me in early 6.x versions, but it was fixed in 6.6 
(I think, you’d have to find it in the bugzilla to be sure) and I haven’t seen 
this in a while. Not sure it’s exactly your symptoms (mine would pause after a 
while running, not immediately), but it might be worth checking on.
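
A quick way to check what you’re running (package name assumes the usual CentOS/RPM packaging):

gluster --version
rpm -q glusterfs-server
rpm -q --changelog glusterfs-server | head -50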

 -Darrell

> On Mar 28, 2020, at 12:26 PM, Nir Soffer  wrote:
> 
> On Sat, Mar 28, 2020 at 1:59 PM Strahil Nikolov  > wrote:
>> 
>> On March 28, 2020 11:03:54 AM GMT+02:00, Gianluca Cecchi 
>>  wrote:
>>> On Sat, Mar 28, 2020 at 8:39 AM Strahil Nikolov 
>>> wrote:
>>> 
 On March 28, 2020 3:21:45 AM GMT+02:00, Gianluca Cecchi <
 gianluca.cec...@gmail.com> wrote:
 
 
>>> [snip]
>>> 
 Actually it only happened with empty disks (thin provisioned) and sudden high I/O during the initial phase of the OS install; it didn't happen during normal operation (even with 600MB/s of throughput).
 
>>> 
>>> [snip]
>>> 
>>> 
 Hi Gianluca,
 
 Is it happening to machines with preallocated disks or on machines with thin disks?
 
 Best Regards,
 Strahil Nikolov
 
>>> 
>>> thin provisioned. But as I have to create many VMs with 120Gb of disk size, of which probably only a part will be allocated over time, it would be unfeasible to make them all preallocated. I learned that thin is not good for block-based storage domains and heavy I/O, but I would hope that it is not the same with file-based storage domains...
>>> Thanks,
>>> Gianluca
>> 
>> This is normal - gluster cannot allocate the needed shards fast enough (due to high IO), so qemu pauses the VM until storage is available again.
> 
> I don't know glusterfs internals, but I think this is very unlikely.
> 
> For block storage thin provisioning in vdsm, vdsm is responsible for allocating more space, but vdsm is not in the datapath; it monitors the allocation and allocates more space when free space reaches a limit. It has no way to block I/O before more space is available. Gluster is in the datapath and can block I/O until it can process it.
> 
> Can you explain what is the source for this theory?
> 
>> You can think about VDO (with deduplication) as a PV for the thin LVM, and this way you can preallocate your VMs while saving space (deduplication, zero-block elimination and even compression).
>> Of course, VDO will reduce performance (unless you have battery-backed write cache and compression is disabled), but the benefits will be a lot more.
>> 
>> Another approach is to increase the shard size - so gluster will create 
>> fewer  shards,  but allocation on disk will be higher.
>> 
>> Best Regards,
>> Strahil Nikolov


[ovirt-users] Re: Speed Issues

2020-03-24 Thread Darrell Budic
Christian,

Adding on to Strahil’s notes, make sure you’re using jumbo MTUs on servers and 
client host nodes. Making sure you’re using appropriate disk schedulers on 
hosts and VMs is also important; it’s worth double checking that they’re doing 
what you think they are. If you are only HCI, gluster’s choose-local on is a 
good thing, but try

cluster.choose-local: false
cluster.read-hash-mode: 3

if you have separate servers or nodes that are not HCI, to allow it to spread 
reads over multiple nodes.
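
A couple of quick sanity checks for the MTU and scheduler bits - the device name and peer host here are just examples:

# verify jumbo frames actually pass end to end (9000 minus 28 bytes of headers):
ping -M do -s 8972 other-gluster-node

# check/set the disk scheduler on hosts (and inside VMs); 'noop' on older
# non-multiqueue kernels, 'none' with blk-mq:
cat /sys/block/sda/queue/scheduler
echo none > /sys/block/sda/queue/scheduler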

Test out these settings if you have lots of RAM and cores on your servers; they 
work well for me under my load with 20 cores and 64GB of RAM per server:

performance.io-thread-count: 64
performance.low-prio-threads: 32

these are worth testing for your workload.

If you’re running VMs with these, test out libgfapi connections; it’s 
significantly better for IO latency than plain fuse mounts, if you can tolerate 
the issues - the biggest one at the moment being that you can’t take snapshots 
of the VMs with it enabled (as of March).

If you have tuned available, I use throughput-performance on my servers and 
guest-host on my vm nodes, throughput-performance on some HCI ones. 
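
Applying a profile is just something like this - profile names vary a bit by tuned version, so check what is actually installed on your boxes:

tuned-adm list
tuned-adm profile throughput-performance
tuned-adm profile virtual-host    # often a reasonable choice for hypervisor-only nodes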

I’d test without the fips-rchecksum setting; that may be creating extra work 
for your servers.

If you mounted individual bricks, check that you disabled barriers on them at 
mount if appropriate.

Hope it helps,

  -Darrell

> On Mar 24, 2020, at 6:23 AM, Strahil Nikolov  wrote:
> 
> On March 24, 2020 11:20:10 AM GMT+02:00, Christian Reiss 
>  wrote:
>> Hey Strahil,
>> 
>> seems you're the go-to-guy with pretty much all my issues. I thank you 
>> for this and your continued support. Much appreciated.
>> 
>> 
>> 200mb reads, however, seem more like a broken config or malfunctioning gluster 
>> than something requiring performance tweaks. I enabled profiling so I have real 
>> life data available. But seriously, even without tweaks I would like 
>> (need) 4 times those numbers; 800mb write speed is okay'ish, given the 
>> fact that the 10gbit backbone can be the limiting factor.
>> 
>> We are running BigCouch/CouchDB Applications that really really need
>> IO. 
>> Not in throughput but in response times. 200mb/s is just way off.
>> 
>> It feels as gluster can/should do more, natively.
>> 
>> -Chris.
>> 
>> On 24/03/2020 06:17, Strahil Nikolov wrote:
>>> Hey Chris,,
>>> 
>>> You got some options.
>>> 1. To speedup the reads in HCI - you can use the option :
>>> cluster.choose-local: on
>>> 2. You can adjust the server and client event-threads
>>> 3. You can use NFS Ganesha (which connects to all servers via
>> libgfapi)  as a NFS Server.
>>> In such case you have to use some clustering like ctdb or pacemaker.
>>> Note:disable cluster.choose-local if you use this one
>>> 4 You can try the built-in NFS , although it's deprecated (NFS
>> Ganesha is fully supported)
>>> 5.  Create a gluster profile during the tests. I have seen numerous
>> improperly selected tests -> so test with real-world  workload.
>> Synthetic tests are not good.
>>> 
>>> Best Regards,
>>> Strahil Nikolov
> 
> Hey Chris,
> 
> What type is your VM ?
> Try with the 'High Performance' one (there is good RH documentation on that 
> topic).
> 
> If the DB load  was  directly on gluster, you could use the settings in the 
> '/var/lib/gluster/groups/db-workload'  to optimize that, but I'm not sure  if 
> this will bring any performance  on a VM.
> 
> 1. Check the VM disk scheduler. Use 'noop/none' (depending on whether multiqueue is 
> enabled) to allow the hypervisor to aggregate the I/O requests from multiple 
> VMs.
> Next, set the 'noop/none' disk scheduler on the hosts - these 2 are the optimal 
> ones for SSDs and NVME disks (if I recall correctly you are using SSDs)
> 
> 2. Disable cstates on the host and Guest (there are a lot of articles  about 
> that)
> 
> 3. Enable MTU 9000 for Hypervisor (gluster node).
> 
> 4. You can try setting/unsetting the tunables in the db-workload  group and 
> run benchmarks with real workload  .
> 
> 5.  Some users  reported  that enabling  TCP offload  on the hosts gave huge  
> improvement in performance  of gluster  - you can try that.
> Of course  there are mixed  feelings - as others report  that disabling it 
> brings performance. I guess  it is workload  specific.
> 
> 6.  You can try to tune  the 'performance.readahead'  on your  gluster volume.
> 
> Here are some settings  of some users /from an old e-mail/:
> 
> performance.read-ahead: on
> performance.stat-prefetch: on
> performance.flush-behind: on 
> performance.client-io-threads: on
> performance.write-behind-window-size: 64MB (shard  size)
> 
> 
> 
> For a  48 cores / host:
> 
> server.event-threads: 4
> client.event-threads: 8
> 
> Your event-threads seem to be too high. And yes, the documentation explains it, 
> but without an example it becomes more confusing.
> 
> Best Regards,
> Strahil Nikolov
>  
> 

[ovirt-users] Re: paused vm's will not resume

2020-02-18 Thread Darrell Budic
What version of ovirt are you running? What is your storage domain, NFS or 
gluster? Using libgfapi? How full is your storage domain? If it’s gluster, 
what type is it and how full are all the bricks?

Have you tried stopping and restarting them? Not ideal, but may get you a 
running system again. 

Are there other VMs which continue running without trouble? If so, do these two 
do heavy disk IO?
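
A few commands that can help answer the space questions if it is gluster - the volume name and mount path are placeholders for yours:

df -h /rhev/data-center/mnt/glusterSD/yourserver:_yourvolume
gluster volume status yourvolume detail     # free space per brick
gluster volume heal yourvolume info         # pending heals can also hold things up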

good luck,

  -Darrell

> On Feb 17, 2020, at 11:57 PM, Strahil Nikolov  wrote:
> 
> On February 18, 2020 6:52:44 AM GMT+02:00, eev...@digitaldatatechs.com wrote:
>> I have 2 vm's, which are the most important in my world, that paused
>> and will not resume. I have googled this to death but no solution. It
>> stated a lack of space, but none of the drives on my hosts are using more 
>> than 30% of their space, and these 2 have run on a kvm host for several 
>> years and always had at least 50% free space.
>> I like ovirt and want to use it but I cannot tolerate the down time. If
>> I cannot get this resolved, I'm going back to kvm hosts. I am pulling
>> my hair out here.
>> If anyone can help with this issue, please let me know. 
> 
> Anything in libvirt/vdsm logs ?
> 
> Did you update gluster recently ? I had such issues when gluster refused 
> reading of shards.
> Can you read the disk via:
> sudo -u vdsm dd if=/rhev/full/path/to/mountpoint/long_string_representing_file of=/dev/null bs=4M status=progress
> 
> What happens when you run dd as root -> can you read from the image ?
> 
> 
> Any errors in gluster logs ?
> 
> Best Regards,
> Strahil Nikolov


[ovirt-users] Re: glusterfs

2020-02-14 Thread Darrell Budic
Hi Eric-

Glad you got through that part. I don’t use iscsi-backed volumes for my gluster 
storage, so I don’t have much advice for you there. I’ve cc’d the ovirt users list 
back in; someone there may be able to help you further. It’s good practice to 
reply to the list and specific people when conversing here, so you might want 
to watch to be sure you don’t drop the cc: in the future.

Re: the storage master, it’s not related to where the VM disks are stored. Once 
you manage to get a new storage domain set up, you’ll be able to create disks on 
whichever domain you want, and that is how you determine which VM disk is hooked 
up to what. You can even have a VM with disks on multiple storage domains, which 
can be good for high performance needs. The SDM may even move around if a domain 
becomes unavailable. You may want to check the list archives for discussion on 
this, I seem to recall some in the past. You should also confirm where the 
disks for your HA engine are located; they may be on your local raid disk 
instead of the iscsi disks if the SDM is on a local disk…

Good luck,

  -Darrell



> On Feb 14, 2020, at 3:03 PM,  
>  wrote:
> 
> I enabled gluster and reinstalled and all went well. I set it for distributed 
> replication so I need 3 nodes. I migrated the rest of my vm's and I am 
> installing the third node shortly. 
> My biggest concern is getting the storage master on the lun it was previously 
> set to. I get the snapshots on it so I can recover from disaster more easily. 
> I need it to persistently be on the lun I designate. 
> Also, I want the luns to be the gluster replication volumes, but there is no 
> mount point in fstab on the machines. 
> I am new to gluster as well so please be patient with me.
> 
> Eric Evans
> Digital Data Services LLC.
> 304.660.9080
> 
> 
> -Original Message-
> From: Darrell Budic  
> Sent: Friday, February 14, 2020 2:58 PM
> To: eev...@digitaldatatechs.com
> Subject: Re: [ovirt-users] Re: glusterfs
> 
> You don’t even need to clean everything out, unless you need to destroy your 
> old storage to create new gluster backing bricks. Ovirt has a feature to 
> migrate data between storage domains you can use to move an existing VM disk 
> to a different storage facility. Note that “reinstall” is an option on the 
> Installation menu for hosts, you do not need to remove the host first. It 
> will pretty much just add the vdsm-gluster components in this case, safe to 
> use. Just put it in maintenance first.
> 
> You can certainly start fresh in the manner you describe if you want.
> 
>> On Feb 14, 2020, at 11:56 AM,  
>>  wrote:
>> 
>> I have already imported a few vm's to see how the import process would go. 
>> So, I remove vm's and the current storage domains, and the hosts, then add 
>> gluster on the main ovirt node, then add the hosts back, storage back and 
>> reimport vm's? 
>> I want to make sure before I get started. My first go around with Ovirt and 
>> want to make sure before I change anything.
>> 
>> Eric Evans
>> Digital Data Services LLC.
>> 304.660.9080
>> 
>> 
>> -Original Message-
>> From: Darrell Budic 
>> Sent: Friday, February 14, 2020 11:54 AM
>> To: eev...@digitaldatatechs.com
>> Cc: users@ovirt.org
>> Subject: [ovirt-users] Re: glusterfs
>> 
>> You can add it in to a running ovirt cluster, it just isn’t as automatic. 
>> First you need to enable Gluster at the cluster settings level for a new 
>> or existing cluster. Then either install/reinstall your nodes, or install 
>> gluster manually and add the vdsm-gluster packages. You can create a stand-alone 
>> gluster server set this way, you don’t need any vdsm packages, but then you 
>> have to create volumes manually. Once you’ve got that done, you can create 
>> bricks and volumes in the GUI or by hand, and then add a new storage domain 
>> and start using it. There may be ansible for some of this, but I haven’t 
>> done it in a while and am not sure what’s available there.
>> 
>> -Darrell
>> 
>>> On Feb 14, 2020, at 8:22 AM, eev...@digitaldatatechs.com wrote:
>>> 
>>> I currently have 3 nodes, one is the engine node and 2 Centos 7 hosts, and 
>>> I plan to add another Centos 7 KVM host once I get all the vm's migrated. I 
>>> have san storage plus the raid 5 internal disks. All OS are installed on 
>>> mirrored SAS raid 1. I want to use the raid 5 vd's as exports, ISO and use 
>>> the 4TB  iscsi for the vm's to run on. The iscsi has snapshots hourly and 
>>> over write weekly.
>>> So here is my question: I want to add glusterfs, but after further reading, 
>>> that should have been done in t

[ovirt-users] Re: glusterfs

2020-02-14 Thread Darrell Budic
You can add it to a running ovirt cluster, it just isn’t as automatic. First 
you need to enable Gluster at the cluster settings level for a new or 
existing cluster. Then either install/reinstall your nodes, or install gluster 
manually and add the vdsm-gluster packages. You can create a stand-alone gluster 
server set this way, you don’t need any vdsm packages, but then you have to 
create volumes manually. Once you’ve got that done, you can create bricks and 
volumes in the GUI or by hand, and then add a new storage domain and start 
using it. There may be ansible for some of this, but I haven’t done it in a 
while and am not sure what’s available there.
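
For the manual route, the rough sequence is something like the below - package names assume CentOS 7 hosts, and the hostnames, brick paths and volume name are just examples:

yum install vdsm-gluster glusterfs-server
systemctl enable --now glusterd
gluster peer probe host2
gluster peer probe host3
gluster volume create data replica 3 \
    host1:/gluster_bricks/data/data \
    host2:/gluster_bricks/data/data \
    host3:/gluster_bricks/data/data
gluster volume set data group virt
gluster volume set data storage.owner-uid 36
gluster volume set data storage.owner-gid 36
gluster volume start data

Then add it in the engine as a new GlusterFS storage domain pointing at host1:/data.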

  -Darrell

> On Feb 14, 2020, at 8:22 AM, eev...@digitaldatatechs.com wrote:
> 
> I currently have 3 nodes, one is the engine node and 2 Centos 7 hosts, and I 
> plan to add another Centos 7 KVM host once I get all the vm's migrated. I 
> have san storage plus the raid 5 internal disks. All OS are installed on 
> mirrored SAS raid 1. I want to use the raid 5 vd's as exports, ISO and use 
> the 4TB  iscsi for the vm's to run on. The iscsi has snapshots hourly and 
> over write weekly.
> So here is my question: I want to add glusterfs, but after further reading, 
> that should have been done in the initial setup. I am not new to Linux, but 
> new to Ovirt and need to know if  I can implement glusterfs now or if it's a 
> start from scratch situation. I really don't want to start over but would 
> like the redundancy. 
> Any advice is appreciated. 
> Eric


[ovirt-users] Re: Enabling Libgfapi in 4.3.8 - VMs won't start

2020-02-13 Thread Darrell Budic
Well, now that I’ve gone and read through that bug again in detail, I’m not 
sure I’ve worked around it after all. I do seem to recall additional discussion 
on the original bug for HA engine libgfapi and a mention that RR-DNS would work to 
resolve the issue, but I can’t remember the bug ID at the moment. I will test 
thoroughly the next time I update my glusterfs servers. But I firmly believe 
that I’ve never encountered that issue in over 3 years of running gluster with 
libgfapi enabled.

I use round-robin DNS, and in theory, QEMU retries until it gets a working 
server. I also have said DNS set up in host files on all my hosts and gluster 
servers, having discovered the hard way that when your DNS server runs on an 
ovirt-managed VM, you have a bootstrap problem when things break badly :) 
Somewhere around gluster 3.12, I added backup servers to the mount options for 
my gluster storage volumes as well, and haven’t had any issues with that.
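
For reference, the backup servers bit is just a mount option on the gluster storage domain in the engine (Manage Domain -> Mount Options); the hostnames here are placeholders:

backup-volfile-servers=gluster2.example.com:gluster3.example.com

As I understand it, that mainly covers fetching the volfile when the first server is down, but combined with the host file entries it has been enough for me.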

And to be frank, the significant performance bonus from libgfapi is still 
absolutely worth it to me even if it means automatic HA won’t work if one 
particular server is down. I can always intervene in the DNS on my hosts if I 
have to, and it just hasn’t come up yet. 

  -Darrell


> On Feb 13, 2020, at 5:19 PM, Strahil Nikolov  wrote:
> 
> On February 13, 2020 11:51:41 PM GMT+02:00, Stephen Panicho 
> mailto:s.pani...@gmail.com>> wrote:
>> Darrell, would you care to elaborate on your HA workaround?
>> 
>> As far as I understand, only the primary Gluster host is visible to
>> libvirt
>> when using gfapi, so if that host goes down, all VMs break. I imagine
>> you're using a round-robin DNS entry for the primary Gluster host, but
>> I'd
>> like to be sure.
>> 
>> On Wed, Feb 12, 2020 at 11:01 AM Darrell Budic 
>> wrote:
>> 
>>> Yes. I’m using libgfapi access on gluster 6.7 with overt 4.3.8 just
>> fine,
>>> but I don’t use snapshots. You can work around the HA issue with DNS
>> and
>>> backup server entries on the storage domain as well. Worth it to me
>> for the
>>> performance, YMMV.
>>> 
>>> On Feb 12, 2020, at 8:04 AM, Jayme  wrote:
>>> 
>>> From my understanding it's not a default option but many users are
>> still
>>> using libgfapi successfully. I'm not sure about its status in the
>> latest
>>> 4.3.8 release but I know it is/was working for people in previous
>> versions.
>>> The libgfapi bugs affect HA and snapshots (on 3 way replica HCI) but
>> it
>>> should still be working otherwise, unless like I said something
>> changed in
>>> more recent releases of oVirt.
>>> 
>>> On Wed, Feb 12, 2020 at 9:43 AM Guillaume Pavese <
>>> guillaume.pav...@interactiv-group.com> wrote:
>>> 
>>>> Libgfapi is not supported because of an old bug in qemu. That qemu
>> bug is
>>>> slowly getting fixed, but the bugs about Libgfapi support in ovirt
>> have
>>>> since been closed as WONTFIX and DEFERRED
>>>> 
>>>> See :
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1465810
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1484660
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1484227 : "No plans to enable libgfapi in RHHI-V for now. Closing this bug"
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1633642 : "Closing this as no action taken from long back. Please reopen if required."
>>>> 
>>>> Would be nice if someone could reopen the closed bugs so this feature doesn't get forgotten
>>>> 
>>>> Guillaume Pavese
>>>> Ingénieur Système et Réseau
>>>> Interactiv-Group
>>>> 
>>>> 
>>>> On Tue, Feb 11, 2020 at 9:58 AM Stephen Panicho
>> mailto:s.pani...@gmail.com>>
>>>> wrote:
>>>> 
>>>>> I used the cockpit-based hc setup and "option
>> rpc-auth-allow-insecure"
>>>>> is absent from /etc/glusterfs/glusterd.vol.
>>>>> 
>>>>> I'm going to redo the cluster this week and report back. Thanks for
>> the
>>>>> tip!
>>>>> 
>>>>> On

[ovirt-users] Re: Enabling Libgfapi in 4.3.8 - VMs won't start

2020-02-12 Thread Darrell Budic
Yes. I’m using libgfapi access on gluster 6.7 with ovirt 4.3.8 just fine, but I 
don’t use snapshots. You can work around the HA issue with DNS and backup 
server entries on the storage domain as well. Worth it to me for the 
performance, YMMV.

> On Feb 12, 2020, at 8:04 AM, Jayme  wrote:
> 
> From my understanding it's not a default option but many users are still 
> using libgfapi successfully. I'm not sure about its status in the latest 
> 4.3.8 release but I know it is/was working for people in previous versions. 
> The libgfapi bugs affect HA and snapshots (on 3 way replica HCI) but it 
> should still be working otherwise, unless like I said something changed in 
> more recent releases of oVirt.
> 
> On Wed, Feb 12, 2020 at 9:43 AM Guillaume Pavese 
>  <mailto:guillaume.pav...@interactiv-group.com>> wrote:
> Libgfapi is not supported because of an old bug in qemu. That qemu bug is 
> slowly getting fixed, but the bugs about Libgfapi support in ovirt have since 
> been closed as WONTFIX and DEFERRED
> 
> See :
> https://bugzilla.redhat.com/show_bug.cgi?id=1465810
> https://bugzilla.redhat.com/show_bug.cgi?id=1484660
> https://bugzilla.redhat.com/show_bug.cgi?id=1484227 : "No plans to enable libgfapi in RHHI-V for now. Closing this bug"
> https://bugzilla.redhat.com/show_bug.cgi?id=1633642 : "Closing this as no action taken from long back. Please reopen if required."
> 
> Would be nice if someone could reopen the closed bugs so this feature doesn't 
> get forgotten
> 
> Guillaume Pavese
> Ingénieur Système et Réseau
> Interactiv-Group
> 
> 
> On Tue, Feb 11, 2020 at 9:58 AM Stephen Panicho  <mailto:s.pani...@gmail.com>> wrote:
> I used the cockpit-based hc setup and "option rpc-auth-allow-insecure" is 
> absent from /etc/glusterfs/glusterd.vol.
> 
> I'm going to redo the cluster this week and report back. Thanks for the tip!
> 
> On Mon, Feb 10, 2020 at 6:01 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> The hosts will still mount the volume via FUSE, but you might double check 
> you set the storage up as Gluster and not NFS.
> 
> Then gluster used to need some config in glusterd.vol to set 
> 
> option rpc-auth-allow-insecure on
> 
> I’m not sure if that got added to a hyper converged setup or not, but I’d 
> check it.
> 
>> On Feb 10, 2020, at 4:41 PM, Stephen Panicho > <mailto:s.pani...@gmail.com>> wrote:
>> 
>> No, this was a relatively new cluster-- only a couple days old. Just a 
>> handful of VMs including the engine.
>> 
>> On Mon, Feb 10, 2020 at 5:26 PM Jayme > <mailto:jay...@gmail.com>> wrote:
>> Curious do the vms have active snapshots?
>> 
>> On Mon, Feb 10, 2020 at 5:59 PM > <mailto:s.pani...@gmail.com>> wrote:
>> Hello, all. I have a 3-node Hyperconverged oVirt 4.3.8 cluster running on 
>> CentOS 7.7 hosts. I was investigating poor Gluster performance and heard 
>> about libgfapi, so I thought I'd give it a shot. Looking through the 
>> documentation, followed by lots of threads and BZ reports, I've done the 
>> following to enable it:
>> 
>> First, I shut down all VMs except the engine. Then...
>> 
>> On the hosts:
>> 1. setsebool -P virt_use_glusterfs on
>> 2. dynamic_ownership=0 in /etc/libvirt/qemu.conf
>> 
>> On the engine VM:
>> 1. engine-config -s LibgfApiSupported=true --cver=4.3
>> 2. systemctl restart ovirt-engine
>> 
>> VMs now fail to launch. Am I doing this correctly? I should also note that 
>> the hosts still have the Gluster domain mounted via FUSE.
>> 
>> Here's a relevant bit from engine.log:
>> 
>> 2020-02-06T16:38:32.573511Z qemu-kvm: -drive 
>> file=gluster://node1.fs.trashnet.xyz:24007/vmstore/781717e5-1cff-43a1-b586-9941503544e8/images/a1d56b14-6d72-4f46-a0aa-eb0870c36bc4/a2314816-7970-49ce-a80c-ab0d1cf17c78,file.debug=4,format=qcow2,if=none,id=drive-ua-a1d56b14-6d72-4f46-a0aa-eb0870c36bc4,serial=a1d56b14-6d72-4f46-a0aa-eb0870c36bc4,werror=stop,rerror=stop,cache=none,discard=unmap,aio=native

[ovirt-users] Re: Enabling Libgfapi in 4.3.8 - VMs won't start

2020-02-10 Thread Darrell Budic
The hosts will still mount the volume via FUSE, but you might double check you 
set the storage up as Gluster and not NFS.

Then gluster used to need some config in glusterd.vol to set 

option rpc-auth-allow-insecure on

I’m not sure if that got added to a hyper converged setup or not, but I’d check 
it.
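
For reference, the relevant bit of /etc/glusterfs/glusterd.vol usually ends up looking something like this (restart glusterd afterwards), plus the matching per-volume option - the volume name is a placeholder:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    # ... other default options ...
    option rpc-auth-allow-insecure on
end-volume

systemctl restart glusterd
gluster volume set yourvolume server.allow-insecure on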

> On Feb 10, 2020, at 4:41 PM, Stephen Panicho  wrote:
> 
> No, this was a relatively new cluster-- only a couple days old. Just a 
> handful of VMs including the engine.
> 
> On Mon, Feb 10, 2020 at 5:26 PM Jayme  > wrote:
> Curious do the vms have active snapshots?
> 
> On Mon, Feb 10, 2020 at 5:59 PM  > wrote:
> Hello, all. I have a 3-node Hyperconverged oVirt 4.3.8 cluster running on 
> CentOS 7.7 hosts. I was investigating poor Gluster performance and heard 
> about libgfapi, so I thought I'd give it a shot. Looking through the 
> documentation, followed by lots of threads and BZ reports, I've done the 
> following to enable it:
> 
> First, I shut down all VMs except the engine. Then...
> 
> On the hosts:
> 1. setsebool -P virt_use_glusterfs on
> 2. dynamic_ownership=0 in /etc/libvirt/qemu.conf
> 
> On the engine VM:
> 1. engine-config -s LibgfApiSupported=true --cver=4.3
> 2. systemctl restart ovirt-engine
> 
> VMs now fail to launch. Am I doing this correctly? I should also note that 
> the hosts still have the Gluster domain mounted via FUSE.
> 
> Here's a relevant bit from engine.log:
> 
> 2020-02-06T16:38:32.573511Z qemu-kvm: -drive 
> file=gluster://node1.fs.trashnet.xyz:24007/vmstore/781717e5-1cff-43a1-b586-9941503544e8/images/a1d56b14-6d72-4f46-a0aa-eb0870c36bc4/a2314816-7970-49ce-a80c-ab0d1cf17c78,file.debug=4,format=qcow2,if=none,id=drive-ua-a1d56b14-6d72-4f46-a0aa-eb0870c36bc4,serial=a1d56b14-6d72-4f46-a0aa-eb0870c36bc4,werror=stop,rerror=stop,cache=none,discard=unmap,aio=native
> : Could not read qcow2 header: Invalid argument.
> 
> The full engine.log from one of the attempts:
> 
> 2020-02-06 16:38:24,909Z INFO  
> [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
> (ForkJoinPool-1-worker-12) [] add VM 
> 'df9dbac4-35c0-40ee-acd4-a1cfc959aa8b'(yumcache) to rerun treatment
> 2020-02-06 16:38:25,010Z ERROR 
> [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] 
> (ForkJoinPool-1-worker-12) [] Rerun VM 
> 'df9dbac4-35c0-40ee-acd4-a1cfc959aa8b'. Called from VDS 
> 'node2.ovirt.trashnet.xyz '
> 2020-02-06 16:38:25,091Z WARN  
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] EVENT_ID: 
> USER_INITIATED_RUN_VM_FAILED(151), Failed to run VM yumcache on Host 
> node2.ovirt.trashnet.xyz .
> 2020-02-06 16:38:25,166Z INFO  [org.ovirt.engine.core.bll.RunVmCommand] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] Lock Acquired to object 
> 'EngineLock:{exclusiveLocks='[df9dbac4-35c0-40ee-acd4-a1cfc959aa8b=VM]', 
> sharedLocks=''}'
> 2020-02-06 16:38:25,179Z INFO  
> [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] START, 
> IsVmDuringInitiatingVDSCommand( 
> IsVmDuringInitiatingVDSCommandParameters:{vmId='df9dbac4-35c0-40ee-acd4-a1cfc959aa8b'}),
>  log id: 2107f52a
> 2020-02-06 16:38:25,181Z INFO  
> [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] FINISH, 
> IsVmDuringInitiatingVDSCommand, return: false, log id: 2107f52a
> 2020-02-06 16:38:25,298Z INFO  [org.ovirt.engine.core.bll.RunVmCommand] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] Running command: RunVmCommand 
> internal: false. Entities affected :  ID: 
> df9dbac4-35c0-40ee-acd4-a1cfc959aa8b Type: VMAction group RUN_VM with role 
> type USER
> 2020-02-06 16:38:25,313Z INFO  
> [org.ovirt.engine.core.bll.utils.EmulatedMachineUtils] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] Emulated machine 
> 'pc-q35-rhel7.6.0' which is different than that of the cluster is set for 
> 'yumcache'(df9dbac4-35c0-40ee-acd4-a1cfc959aa8b)
> 2020-02-06 16:38:25,382Z INFO  
> [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] 
> (EE-ManagedThreadFactory-engine-Thread-216) [] START, 
> UpdateVmDynamicDataVDSCommand( 
> UpdateVmDynamicDataVDSCommandParameters:{hostId='null', 
> vmId='df9dbac4-35c0-40ee-acd4-a1cfc959aa8b', 
> vmDynamic='org.ovirt.engine.core.common.businessentities.VmDynamic@9774a64'}),
>  log id: 4a83911f
> 2020-02-06 16:38:25,417Z INFO  
> [org.ovirt.engine.core.vdsbroker.UpdateVmDynamicDataVDSCommand] 
> 

[ovirt-users] Re: Emergency :/ No VMs starting

2020-02-02 Thread Darrell Budic
Check the contents of these directories:

[root@node03:/rhev/data-center/mnt/glusterSD/node01.dc-dus.dalason.net:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/images] # l
total 345K
drwxr-xr-x. 46 vdsm kvm 8.0K Feb  2 23:18 .
drwxr-xr-x.  5 vdsm kvm   64 Feb  3 00:31 ..
drwxr-xr-x.  2 vdsm kvm 8.0K Jan 17 15:54 0b21c949-7133-4b34-b909-a6660ae12800
drwxr-xr-x.  2 vdsm kvm  165 Feb  3 01:48 0dde79ab-d773-4d23-b397-7c39371ccc60
drwxr-xr-x.  2 vdsm kvm 8.0K Jan 17 09:49 1347d489-012b-40fc-acb5-d00a9ea133a4
drwxr-xr-x.  2 vdsm kvm 8.0K Jan 22 15:04 1ccc4db6-f47d-4474-b0fa-a0c1eddb0fa7
drwxr-xr-x.  2 vdsm kvm 8.0K Jan 21 16:28 22cab044-a26d-4266-9af7-a6408eaf140c
drwxr-xr-x.  2 vdsm kvm 8.0K Jan 30 06:03 288d061a-6c6c-4536-a594-3bede63c0654
drwxr-xr-x.  2 vdsm kvm 8.0K Jan  9 16:46 40c51753-1533-45ab-b9de-2c51d8a18370

and what version of Ovirt are you running? This looks a bit like a libvirt 
change/bug that changed ownership on the actual disk image to root.root on 
shutdown/migrations, preventing later start attempts. 

This may help if that’s the case:
chown -R vdsm.kvm /rhev/data-center/mnt/glusterSD/node01.dc-dus.dalason.net:_ssd__storage/fec2eb5e-21b5-496b-9ea5-f718b2cb5556/images

> On Feb 2, 2020, at 8:54 PM, Christian Reiss  wrote:
> 
> Hey,
> 
> it was _while_ placing the host _into_ maintenance, to be precise.
> I restarted the volumes and even each machine and the entire cluster to no 
> avail.
> 
> I am currently migrating the disk images out of ovirt into openvz/kvm to get 
> them running. The copied disk images are flawless and working.
> 
> 
> On 03/02/2020 03:28, Jayme wrote:
>> I checked my HCI cluster and those permissions seem to match what I'm 
>> seeing.  Since there's no VMs running currently have you tried restarting 
>> the gluster volumes as well as the glusterd service? I'm not sure what would 
>> have caused this with one host placed in maintenance.
> 
> -- 
> with kind regards,
> mit freundlichen Gruessen,
> 
> Christian Reiss



[ovirt-users] Re: Libgfapi considerations

2019-12-16 Thread Darrell Budic
I use libgfapi in production; the performance is worth a couple of quirks for me.

- watch major version updates; they’ll silently turn it off because the engine 
starts using a new version variable
- a VM/qemu security quirk resets ownership when the VM quits; it was supposedly 
fixed in 4.3.6, but I still have it happen to me, and a cron’d chown keeps it 
under control (see the sketch below)
- some VMs cause a libvirt/vdsmd interaction that results in a failed stats 
query, and the engine thinks my VMs are offline because the stats gathering is 
stuck. I hoped a bug fix in 4.3.6 would take care of this too, but it didn’t. It 
may be my VMs though, still analyzing for specific file issues
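
The cron’d chown is nothing fancy, roughly something like this in /etc/cron.d/ - the server, volume and storage domain UUID in the path are placeholders for your own:

*/5 * * * * root chown -R vdsm:kvm /rhev/data-center/mnt/glusterSD/server:_volume/domain-uuid/images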

I need to spend some time doing a little more research and filing/updating some 
bug reports, but it’s been a busy end of year so far…

  -Darrell

> On Dec 14, 2019, at 5:47 PM, Strahil Nikolov  wrote:
> 
> According to the GlusterFS Storage Domain feature page, the feature is not the 
> default as it is incompatible with Live Storage Migration.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> On Saturday, December 14, 2019 at 17:06:32 GMT+2, Jayme  wrote:
> 
> 
> Are there currently any known issues with using libgfapi in the latest stable 
> version of ovirt in hci deployments?  I have recently enabled it and have 
> noticed a significant (over 4x) increase in io performance on my vms. I’m 
> concerned however since it does not seem to be an ovirt default setting.  Is 
> libgfapi considered safe and stable to use in ovirt 4.3 hci?



[ovirt-users] Re: Ovirt instance having RTO in every 10 minutes

2019-11-15 Thread Darrell Budic
Every 10 minutes & ping loss sounds like your hosted engine may be being 
restarted by the monitoring agents. Check the Hosted Engine uptime, is it < 
10m? Then check the ovirt-HA-agent logs on your hosts and see if you can tell 
why it’s restarting and correct the issue.
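
Roughly, on one of the HA hosts:

hosted-engine --vm-status                          # engine VM state and host scores
less /var/log/ovirt-hosted-engine-ha/agent.log
less /var/log/ovirt-hosted-engine-ha/broker.log

and check uptime inside the engine VM itself to see whether it is actually being bounced.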

> On Nov 15, 2019, at 5:10 AM, Crazy Ayansh  
> wrote:
> 
> Hi Team,
> 
> I am using ovirt hosted engine 4.3.6.6-1, the newest one, but it seems to me 
> it's not stable, as my hosted engine VM stops pinging every 10 minutes and 
> my web console gets disconnected every time.
> 
> See in the above snapshot that every 10 minutes the Data Center status goes Non 
> Responsive. Could anyone help out here?
> 
> 
> Thanks
> Shashank


[ovirt-users] Re: Linux VM keeps crashing on copying files "with lost communication with qemu"

2019-08-02 Thread Darrell Budic
I’ve been seeing similar issues lately (since upgrading to 4.3.4 and later) 
with gluster storage (and libgfapi), but I haven’t pinned it down to anything 
particular yet. libvirt seems to have some issues, and vdsmd stops being able to 
poll disk usage. My VMs don’t crash or stop working, but they do report as ? in 
the management interface and can’t be migrated. The current workaround is to 
restart libvirtd, then vdsmd; occasionally restarting libvirtd restarts a VM 
(presumably the one that had broken some disk something…).
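
The workaround itself is just the two restarts, in that order:

systemctl restart libvirtd
systemctl restart vdsmd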

Sorry I don’t have more info to go on, but I’ll update here if I get more on 
anything similar.

  -Darrell

> On Aug 2, 2019, at 9:23 AM, kevin.do...@manchester.ac.uk wrote:
> 
> HI
> I think it is a bug with qemu and iscsi disks. Has anyone else seen this 
> issue where the VM crashes with lost communication with qemu? Can I update qemu? 
> If so, what is the latest supported version to use on Ovirt 4.3.2.1-1.el7? 
> Current versions of qemu installed:
> 
> Installed Packages
> qemu-img-ev.x86_64 10:2.12.0-18.el7_6.3.1
> qemu-kvm-common-ev.x86_64 10:2.12.0-18.el7_6.3.1
> qemu-kvm-ev.x86_64 10:2.12.0-18.el7_6.3.1
> 
> Many thanks
> Kevin


[ovirt-users] Re: iptables with 4.3+?

2019-07-04 Thread Darrell Budic
I’m in the same boat, puppet managing iptables rules, and was able to continue 
forcing it on my 4.3.x ovirt systems. Engine-setup complains all the time, but 
so far it hasn’t broken anything.

  -Darrell


> On Jul 4, 2019, at 9:38 AM, Jordan Conway  wrote:
> 
> Hello,
> I'm working on migrating an existing ovirt setup to a new hosted-engine setup 
> and I've been seeing messages about iptables support being deprecated and 
> slated to be removed.
> Can I continue using iptables to manage the firewalls on my ovirt hosts if I 
> don't care about allowing ovirt to configure the firewalls?
> We manage all of our machines with puppet and iptables is deeply integrated 
> into this. It would be non-trivial to migrate to firewalld support.
> As it stands I already manage the firewall rules for our ovirt hosts with 
> puppet and iptables and have always ignored the "Automatically Configure 
> Firewall" option when adding new hosts. Will this continue to work?
> 
> Also with hosted engine, I had to cowboy enable firewalld to get the engine 
> installed, but now that I've got a cluster up and running with hosted engine 
> enabled on several hosts, can I just switch back from firewalld to iptables 
> assuming I've got all the correct ports open?
> 
> Thank you,
> Jordan Conway


[ovirt-users] Re: Memory ballon question

2019-06-13 Thread Darrell Budic
Hum… Make sure mom is running on the host then; you should see it using CPU if 
it’s doing its job, and maybe ksmd if you have same-page merging enabled. Note 
that you will still see swap happen on the host; stuff is going to get pushed 
out of memory even if ballooning is happening. You should also see the guest 
agent using more memory on the VMs. You won’t actually see their available 
memory change, just the agent having a larger locked allocation.

> On Jun 13, 2019, at 12:05 AM, Strahil  wrote:
> 
> Hi Darrell,
> 
> Yes , all VMs (both openSUSE and RedHat/CentOS 7) have the ovirt-guest-agent 
> up and running.
> 
> Best Regards,
> Strahil Nikolov
> 
> On Jun 12, 2019 22:07, Darrell Budic  wrote:
> Do you have the ovirt-guest-agent running on your VMs? It’s required for 
> ballooning to control allocations on the guest side.
> 
> On Jun 12, 2019, at 11:32 AM, Strahil  <mailto:hunter86...@yahoo.com>> wrote:
> 
> Hello All,
> 
> as a KVM user I know how usefull is the memory balloon and how you can both 
> increase - and also decrease memory live (both Linux & Windows).
> I have noticed that I cannot decrease the memory in oVirt.
> 
> Does anyone got a clue why the situation is like that ?
> 
> I was expecting that the guaranteed memory is the minimum below which the 
> balloon driver will not go, but when I put my host under pressure, 
> the host just started to swap instead of reducing some of the VM memory (and 
> my VMs had plenty of free space).
> 
> It will be great if oVirt can decrease the memory (if the VM has unallocated 
> memory) when the host is under pressure and the VM cannot be relocated.
> 
> Best Regards,
> Strahil Nikolov
> 



[ovirt-users] Re: Memory ballon question

2019-06-12 Thread Darrell Budic
Do you have the ovirt-guest-agent running on your VMs? It’s required for 
ballooning to control allocations on the guest side.

> On Jun 12, 2019, at 11:32 AM, Strahil  wrote:
> 
> Hello All,
> 
> as a KVM user I know how usefull is the memory balloon and how you can both 
> increase - and also decrease memory live (both Linux & Windows).
> I have noticed that I cannot decrease the memory in oVirt.
> 
> Does anyone got a clue why the situation is like that ?
> 
> I was expecting that the guaranteed memory is the minimum below which the 
> balloon driver will not go, but when I put my host under pressure, 
> the host just started to swap instead of reducing some of the VM memory (and 
> my VMs had plenty of free space).
> 
> It will be great if oVirt can decrease the memory (if the VM has unallocated 
> memory) when the host is under pressure and the VM cannot be relocated.
> 
> Best Regards,
> Strahil Nikolov
> 



[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-20 Thread Darrell Budic
Wow, I think Strahil and I both hit different edge cases on this one. I was 
running that on my test cluster with a ZFS-backed brick, which does not support 
O_DIRECT (in the current version; 0.8 will, when it’s released). I tested on an 
XFS-backed brick with the gluster virt group applied and network.remote-dio 
disabled, and ovirt was able to create the storage volume correctly. So not a 
huge problem for most people, I imagine.

Now I’m curious about the apparent disconnect between gluster and ovirt though. 
Since the gluster virt group sets network.remote-dio on, what’s the reasoning 
behind disabling it for these tests?

> On May 18, 2019, at 11:44 PM, Sahina Bose  wrote:
> 
> 
> 
> On Sun, 19 May 2019 at 12:21 AM, Nir Soffer  <mailto:nsof...@redhat.com>> wrote:
> On Fri, May 17, 2019 at 7:54 AM Gobinda Das  <mailto:go...@redhat.com>> wrote:
> From RHHI side default we are setting below volume options:
> 
> { group: 'virt',
>  storage.owner-uid: '36',
>  storage.owner-gid: '36',
>  network.ping-timeout: '30',
>  performance.strict-o-direct: 'on',
>  network.remote-dio: 'off'
> 
> According to the user reports, this configuration is not compatible with 
> oVirt.
> 
> Was this tested?
> 
> Yes, this is set by default in all test configuration. We’re checking on the 
> bug, but the error is likely when the underlying device does not support 512b 
> writes. 
> With network.remote-dio off gluster will ensure o-direct writes
> 
>}
> 
> 
> On Fri, May 17, 2019 at 2:31 AM Strahil Nikolov  <mailto:hunter86...@yahoo.com>> wrote:
> Ok, setting 'gluster volume set data_fast4 network.remote-dio on' allowed me 
> to create the storage domain without any issues.
> I set it on all 4 new gluster volumes and the storage domains were 
> successfully created.
> 
> I have created bug for that:
> https://bugzilla.redhat.com/show_bug.cgi?id=1711060 
> <https://bugzilla.redhat.com/show_bug.cgi?id=1711060>
> 
> If someone else already opened - please ping me to mark this one as duplicate.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> On Thursday, May 16, 2019, 22:27:01 GMT+3, Darrell Budic 
> mailto:bu...@onholyground.com>> wrote: 
> 
> 
> On May 16, 2019, at 1:41 PM, Nir Soffer  <mailto:nsof...@redhat.com>> wrote:
> 
>> 
>> On Thu, May 16, 2019 at 8:38 PM Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> I tried adding a new storage domain on my hyper converged test cluster 
>> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster 
>> volume fine, but it’s not able to add the gluster storage domain (as either 
>> a managed gluster volume or directly entering values). The created gluster 
>> volume mounts and looks fine from the CLI. Errors in VDSM log:
>> 
>> ... 
>> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying 
>> file system doesn't supportdirect IO (fileSD:110)
>> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH 
>> createStorageDomain error=Storage Domain target is unsupported: () 
>> from=:::10.100.90.5,44732, flow_id=31d993dd, 
>> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>> 
>> The direct I/O check has failed.
>> 
>> 
>> So something is wrong in the files system.
>> 
>> To confirm, you can try to do:
>> 
>> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
>> 
>> This will probably fail with:
>> dd: failed to open '/path/to/mountoint/test': Invalid argument
>> 
>> If it succeeds, but oVirt fail to connect to this domain, file a bug and we 
>> will investigate.
>> 
>> Nir
> 
> Yep, it fails as expected. Just to check, it is working on pre-existing 
> volumes, so I poked around at gluster settings for the new volume. It has 
> network.remote-dio=off set on the new volume, but enabled on old volumes. 
> After enabling it, I’m able to run the dd test:
> 
> [root@boneyard mnt]# gluster vol set test network.remote-dio enable
> volume set: success
> [root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s
> 
> I’m also able to add the storage domain in ovirt now.
> 
> I see network.remote-dio=enable is part of the gluster virt group, so 
> apparently it’s not getting set by ovirt during the volume creation/optimize 
> for storage?
> 
> 
> 
> ___
> Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
> To unsubscribe send an email to users-le...@ovir

[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Darrell Budic
https://bugzilla.redhat.com/show_bug.cgi?id=1711054



> On May 16, 2019, at 2:17 PM, Nir Soffer  wrote:
> 
> On Thu, May 16, 2019 at 10:12 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> On May 16, 2019, at 1:41 PM, Nir Soffer  <mailto:nsof...@redhat.com>> wrote:
>> 
>> On Thu, May 16, 2019 at 8:38 PM Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> I tried adding a new storage domain on my hyper converged test cluster 
>> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster 
>> volume fine, but it’s not able to add the gluster storage domain (as either 
>> a managed gluster volume or directly entering values). The created gluster 
>> volume mounts and looks fine from the CLI. Errors in VDSM log:
>> 
>> ... 
>> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying 
>> file system doesn't supportdirect IO (fileSD:110)
>> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH 
>> createStorageDomain error=Storage Domain target is unsupported: () 
>> from=:::10.100.90.5,44732, flow_id=31d993dd, 
>> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
>> 
>> The direct I/O check has failed.
>> 
>> 
>> So something is wrong in the files system.
>> 
>> To confirm, you can try to do:
>> 
>> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
>> 
>> This will probably fail with:
>> dd: failed to open '/path/to/mountoint/test': Invalid argument
>> 
>> If it succeeds, but oVirt fail to connect to this domain, file a bug and we 
>> will investigate.
>> 
>> Nir
> 
> Yep, it fails as expected. Just to check, it is working on pre-existing 
> volumes, so I poked around at gluster settings for the new volume. It has 
> network.remote-dio=off set on the new volume, but enabled on old volumes. 
> After enabling it, I’m able to run the dd test:
> 
> [root@boneyard mnt]# gluster vol set test network.remote-dio enable
> volume set: success
> [root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1 oflag=direct
> 1+0 records in
> 1+0 records out
> 4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s
> 
> I’m also able to add the storage domain in ovirt now.
> 
> I see network.remote-dio=enable is part of the gluster virt group, so 
> apparently it’s not getting set by ovirt during the volume creation/optimize 
> for storage?
> 
> I'm not sure who is responsible for changing these settings. oVirt always 
> required directio, and we
> never had to change anything in gluster.
> 
> Sahina, maybe gluster changed the defaults?
> 
> Darrell, please file a bug, probably for RHHI.
> 
> Nir

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/66BVZIQJVEP2Q3H5HQ5QAQIGLCMF6XZG/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Darrell Budic
On May 16, 2019, at 1:41 PM, Nir Soffer  wrote:
> 
> On Thu, May 16, 2019 at 8:38 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> I tried adding a new storage domain on my hyper converged test cluster 
> running Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster 
> volume fine, but it’s not able to add the gluster storage domain (as either a 
> managed gluster volume or directly entering values). The created gluster 
> volume mounts and looks fine from the CLI. Errors in VDSM log:
> 
> ... 
> 2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying 
> file system doesn't supportdirect IO (fileSD:110)
> 2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH 
> createStorageDomain error=Storage Domain target is unsupported: () 
> from=:::10.100.90.5,44732, flow_id=31d993dd, 
> task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:52)
> 
> The direct I/O check has failed.
> 
> 
> So something is wrong in the files system.
> 
> To confirm, you can try to do:
> 
> dd if=/dev/zero of=/path/to/mountoint/test bs=4096 count=1 oflag=direct
> 
> This will probably fail with:
> dd: failed to open '/path/to/mountoint/test': Invalid argument
> 
> If it succeeds, but oVirt fail to connect to this domain, file a bug and we 
> will investigate.
> 
> Nir

Yep, it fails as expected. Just to check, it is working on pre-existing 
volumes, so I poked around at gluster settings for the new volume. It has 
network.remote-dio=off set on the new volume, but enabled on old volumes. After 
enabling it, I’m able to run the dd test:

[root@boneyard mnt]# gluster vol set test network.remote-dio enable
volume set: success
[root@boneyard mnt]# dd if=/dev/zero of=testfile bs=4096 count=1 oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.0018285 s, 2.2 MB/s

I’m also able to add the storage domain in ovirt now.

I see network.remote-dio=enable is part of the gluster virt group, so 
apparently it’s not getting set by ovirt during the volume creation/optimize for 
storage?



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPBXHYOHZA4XR5CHU7KMD2ISQWLFRG5N/


[ovirt-users] Re: [ovirt-announce] Re: [ANN] oVirt 4.3.4 First Release Candidate is now available

2019-05-16 Thread Darrell Budic
I tried adding a new storage domain on my hyper converged test cluster running 
Ovirt 4.3.3.7 and gluster 6.1. I was able to create the new gluster volume 
fine, but it’s not able to add the gluster storage domain (as either a managed 
gluster volume or directly entering values). The created gluster volume mounts 
and looks fine from the CLI. Errors in VDSM log:

2019-05-16 10:25:08,158-0500 INFO  (jsonrpc/1) [vdsm.api] START 
connectStorageServer(domType=7, spUUID=u'----', 
conList=[{u'mnt_options': u'backup-volfile-servers=10.50.3.11:10.50.3.10', 
u'id': u'----', u'connection': 
u'10.50.3.12:/test', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': 
u'false', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], 
options=None) from=:::10.100.90.5,44732, 
flow_id=fcde45c4-3b03-4a85-818a-06be560edee4, 
task_id=0582219d-ce68-4951-8fbd-3dce6d102fca (api:48)
2019-05-16 10:25:08,306-0500 INFO  (jsonrpc/1) 
[storage.StorageServer.MountConnection] Creating directory 
u'/rhev/data-center/mnt/glusterSD/10.50.3.12:_test' (storageServer:168)
2019-05-16 10:25:08,306-0500 INFO  (jsonrpc/1) [storage.fileUtils] Creating 
directory: /rhev/data-center/mnt/glusterSD/10.50.3.12:_test mode: None 
(fileUtils:199)
2019-05-16 10:25:08,306-0500 WARN  (jsonrpc/1) 
[storage.StorageServer.MountConnection] Using user specified 
backup-volfile-servers option (storageServer:275)
2019-05-16 10:25:08,306-0500 INFO  (jsonrpc/1) [storage.Mount] mounting 
10.50.3.12:/test at /rhev/data-center/mnt/glusterSD/10.50.3.12:_test (mount:204)
2019-05-16 10:25:08,453-0500 INFO  (jsonrpc/1) [IOProcessClient] (Global) 
Starting client (__init__:308)
2019-05-16 10:25:08,460-0500 INFO  (ioprocess/5389) [IOProcess] (Global) 
Starting ioprocess (__init__:434)
2019-05-16 10:25:08,473-0500 INFO  (itmap/0) [IOProcessClient] 
(/glusterSD/10.50.3.12:_test) Starting client (__init__:308)
2019-05-16 10:25:08,481-0500 INFO  (ioprocess/5401) [IOProcess] 
(/glusterSD/10.50.3.12:_test) Starting ioprocess (__init__:434)
2019-05-16 10:25:08,484-0500 INFO  (jsonrpc/1) [vdsm.api] FINISH 
connectStorageServer return={'statuslist': [{'status': 0, 'id': 
u'----'}]} from=:::10.100.90.5,44732, 
flow_id=fcde45c4-3b03-4a85-818a-06be560edee4, 
task_id=0582219d-ce68-4951-8fbd-3dce6d102fca (api:54)
2019-05-16 10:25:08,484-0500 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call 
StoragePool.connectStorageServer succeeded in 0.33 seconds (__init__:312)

2019-05-16 10:25:09,169-0500 INFO  (jsonrpc/7) [vdsm.api] START 
connectStorageServer(domType=7, spUUID=u'----', 
conList=[{u'mnt_options': u'backup-volfile-servers=10.50.3.11:10.50.3.10', 
u'id': u'd0ab6b05-2486-40f0-9b15-7f150017ec12', u'connection': 
u'10.50.3.12:/test', u'iqn': u'', u'user': u'', u'tpgt': u'1', u'ipv6_enabled': 
u'false', u'vfs_type': u'glusterfs', u'password': '', u'port': u''}], 
options=None) from=:::10.100.90.5,44732, flow_id=31d993dd, 
task_id=9eb2f42c-852d-4af6-ae4e-f65d8283d6e0 (api:48)
2019-05-16 10:25:09,180-0500 INFO  (jsonrpc/7) [vdsm.api] FINISH 
connectStorageServer return={'statuslist': [{'status': 0, 'id': 
u'd0ab6b05-2486-40f0-9b15-7f150017ec12'}]} from=:::10.100.90.5,44732, 
flow_id=31d993dd, task_id=9eb2f42c-852d-4af6-ae4e-f65d8283d6e0 (api:54)
2019-05-16 10:25:09,180-0500 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call 
StoragePool.connectStorageServer succeeded in 0.01 seconds (__init__:312)
2019-05-16 10:25:09,186-0500 INFO  (jsonrpc/5) [vdsm.api] START 
createStorageDomain(storageType=7, 
sdUUID=u'4037f461-2b6d-452f-8156-fcdca820a8a1', domainName=u'gTest', 
typeSpecificArg=u'10.50.3.12:/test', domClass=1, domVersion=u'4', 
block_size=512, max_hosts=250, options=None) from=:::10.100.90.5,44732, 
flow_id=31d993dd, task_id=ecea28f3-60d4-476d-9ba8-b753b7c9940d (api:48)
2019-05-16 10:25:09,492-0500 WARN  (jsonrpc/5) [storage.LVM] Reloading VGs 
failed (vgs=[u'4037f461-2b6d-452f-8156-fcdca820a8a1'] rc=5 out=[] err=['  
Volume group "4037f461-2b6d-452f-8156-fcdca820a8a1" not found', '  Cannot 
process volume group 4037f461-2b6d-452f-8156-fcdca820a8a1']) (lvm:442)
2019-05-16 10:25:09,507-0500 INFO  (jsonrpc/5) [storage.StorageDomain] 
sdUUID=4037f461-2b6d-452f-8156-fcdca820a8a1 domainName=gTest 
remotePath=10.50.3.12:/test domClass=1, block_size=512, alignment=1048576 
(nfsSD:86)
2019-05-16 10:25:09,521-0500 INFO  (jsonrpc/5) [IOProcessClient] 
(4037f461-2b6d-452f-8156-fcdca820a8a1) Starting client (__init__:308)
2019-05-16 10:25:09,528-0500 INFO  (ioprocess/5437) [IOProcess] 
(4037f461-2b6d-452f-8156-fcdca820a8a1) Starting ioprocess (__init__:434)
2019-05-16 10:25:09,584-0500 ERROR (jsonrpc/5) [storage.fileSD] Underlying file 
system doesn't supportdirect IO (fileSD:110)
2019-05-16 10:25:09,584-0500 INFO  (jsonrpc/5) [vdsm.api] FINISH 
createStorageDomain error=Storage Domain target is unsupported: () 

[ovirt-users] Re: Dropped RX Packets

2019-05-16 Thread Darrell Budic
Check your host for dropped packets as well. I had found that some of my older 
10G cards were setting smaller buffers than they could, and using ethtool to 
set tx and rx buffers to their max values significantly improved things for 
those cards. And look at your switch to be sure it/they are not dropping 
packets for some reason. 

If you’re using dual 10g links, how do you have them configured on the host?
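
For reference, what I did on mine was along these lines, assuming an interface 
named eth0 (check what maximums your card actually reports before setting anything):

ethtool -g eth0                    # shows pre-set maximums vs current rx/tx ring sizes
ethtool -G eth0 rx 4096 tx 4096    # example values -- raise toward the reported maximums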

> On May 16, 2019, at 9:38 AM, Oliver Riesener  
> wrote:
> 
> Hi Magnus,
> 
> I've had a bad **virtual** network card three times in the last five years. 
> Yes, it's possible.
> 
> In my case, NFS services didn't work as expected, but other services were ok.
> 
> Today, if this happened again, I would unplug and replug the VM NIC. Like:
> 
> GUI::Compute::VirtualMachines::VMname::Network Interfaces::nicN
> -> Edit CardStatus -> Unplugged :: OK
> -> Edit CardStatus -> Plugged :: OK
> 
> HTH
> 
> Oliver
> 
> On 16.05.19 15:17, Magnus Isaksson wrote:
>> Hello all!
>> 
>> I'm having quite some trouble with VMs that have a large amount of dropped 
>> packets on RX.
>> This, plus customers complain about brief dropped connections; for example, 
>> one customer has a SQL server and another server connecting to it, and it 
>> is randomly dropping connections. Before they moved their VMs to us they 
>> did not have any of these issues.
>> 
>> Does anyone have an idea of what this can be due to? And how can I fix it? 
>> It is starting to be a deal breaker for our customers on whether they will 
>> stay with us or not.
>> 
>> I was thinking of reinstalling the nodes with oVirt Node, instead of the 
>> full CentOS, would this perhaps fix the issue?
>> 
>> The enviroment is:
>> Huawei x6000 with 4 nodes
>> Each node having Intel X722 network card and connecting with 10G (fiber) to 
>> a Juniper EX 4600. Storage via FC to a IBM FS900.
>> Each node is running a full CentOS 7.6 connecting to a Engine 4.2.8.2
>> 
>> Regards
>>  Magnus
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QXGQSKYBUCFPDCBIQVAAZAWFQX54A2BD/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/FQXYN3P2QD727ZKGCNDZCCOOJVJU52DU/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WDQLQVXY3YIPFB4NGK4QVKQU7XXWOV7Z/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-14 Thread Darrell Budic
Yep, so far so good. Feels like 3.12.15 again, stability wise ;)


> On May 14, 2019, at 5:28 AM, Strahil  wrote:
> 
> Hi Darrel,
> Is oVirt dealing OK with Gluster 6.X?
> 
> Best Regards,
> Strahil Nikolov
> 
> On May 13, 2019 18:37, Darrell Budic wrote:
>> 
>> I encountered serious issues with 5.3-5.5 (crashing bricks, multiple brick 
>> processes for the same brick causing disconnects and excessive heals). I had 
>> better luck with 5.6, although it’s not clear to me if the duplicate brick 
>> process issue is still present in that version. I finally jumped to 6 which 
>> has been more stable for me. I’d recommend upgrading at least to 5.6 if not 
>> going right to 6.1. 
>> 
>>> On May 13, 2019, at 10:30 AM, Andreas Elvers 
>>>  wrote: 
>>> 
>>>> What version of gluster are you running at the moment? 
>>> 
>>> I'm running glusterfs-5.5-1.el7.x86_64 the one that comes with oVirt Node 
>>> 4.3.3.1 
>>> ___ 
>>> Users mailing list -- users@ovirt.org 
>>> To unsubscribe send an email to users-le...@ovirt.org 
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
>>> oVirt Code of Conduct: 
>>> https://www.ovirt.org/community/about/community-guidelines/ 
>>> List Archives: 
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPMOUE5FQRYF4Q6ZMPBDFM6EYAXRNM44/
>>>  
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/MDCYZ6JUMJHKNTVCQWCCHG5WMI4DNRAW/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/EAR7HRS3J6NXECYWMMYXVCCXMGADBZEM/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-13 Thread Darrell Budic
Ah, must be more fixed than I thought. I don’t have a NodeNG setup to examine, 
so I’m afraid I won’t have many more suggestions. 

> On May 13, 2019, at 11:29 AM, Jayme  wrote:
> 
> I use node NG as well, I just updated to 4.3.3 two days ago and I'm on 
> Gluster 5.5.  Yum update on host node yields no updates available
> 
> On Mon, May 13, 2019 at 1:03 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> Ovirt just pulls in the gluster5 repos, if you upgrade now you should get 
> gluster 5.6 on your nodes. If you’re running them on centos, you can install 
> centos-release-gluster6 to go to gluster6. Ovirt NodeNG is a different story, 
> as you mention, but I believe you can still run an update on it to get the 
> latest gluster version?
> 
> Those recommendations are based on my personal experience, but see also:
> https://bugzilla.redhat.com/show_bug.cgi?id=1683602 
> <https://bugzilla.redhat.com/show_bug.cgi?id=1683602>
> https://bugzilla.redhat.com/show_bug.cgi?id=1677319 
> <https://bugzilla.redhat.com/show_bug.cgi?id=1677319>
> 
> 
>> On May 13, 2019, at 10:47 AM, Andreas Elvers 
>> > <mailto:andreas.elvers+ovirtfo...@solutions.work>> wrote:
>> 
>> Please note that I am running a hyper-converged NodeNG setup. I understand 
>> that upgrading single components is not really possible with a ovirt Node 
>> NG. And could probably break the datacenter upgrade path.
>> 
>> Could you point out some reference for your suggestions? Docs, Bug reports 
>> or the sorts?
>> 
>>> I encountered serious issues with 5.3-5.5 (crashing bricks, multiple brick 
>>> processes for
>>> the same brick causing disconnects and excessive heals). I had better luck 
>>> with 5.6,
>>> although it’s not clear to me if the duplicate brick process issue is still 
>>> present in
>>> that version. I finally jumped to 6 which has been more stable for me. I’d 
>>> recommend
>>> upgrading at least to 5.6 if not going right to 6.1.
>> ___
>> Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
>> To unsubscribe send an email to users-le...@ovirt.org 
>> <mailto:users-le...@ovirt.org>
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
>> <https://www.ovirt.org/site/privacy-policy/>
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/ 
>> <https://www.ovirt.org/community/about/community-guidelines/>
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BFOB6QEST6EAGEZGSCM4GO7BFRUYCKEI/
>>  
>> <https://lists.ovirt.org/archives/list/users@ovirt.org/message/BFOB6QEST6EAGEZGSCM4GO7BFRUYCKEI/>
> 
> ___
> Users mailing list -- users@ovirt.org <mailto:users@ovirt.org>
> To unsubscribe send an email to users-le...@ovirt.org 
> <mailto:users-le...@ovirt.org>
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
> <https://www.ovirt.org/site/privacy-policy/>
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/ 
> <https://www.ovirt.org/community/about/community-guidelines/>
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5HKFS376DJVADIICLLVJSRJKXE73EBZC/
>  
> <https://lists.ovirt.org/archives/list/users@ovirt.org/message/5HKFS376DJVADIICLLVJSRJKXE73EBZC/>

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/7HE6IFXA2RXKKQDSVAQGZYSC5EXRVLSD/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-13 Thread Darrell Budic
Ovirt just pulls in the gluster5 repos; if you upgrade now you should get 
gluster 5.6 on your nodes. If you’re running them on centos, you can install 
centos-release-gluster6 to go to gluster6. Ovirt NodeNG is a different story, 
as you mention, but I believe you can still run an update on it to get the 
latest gluster version?

Those recommendations are based on my personal experience, but see also:
https://bugzilla.redhat.com/show_bug.cgi?id=1683602 

https://bugzilla.redhat.com/show_bug.cgi?id=1677319 
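
Roughly, on a plain CentOS host (not NodeNG) the jump to gluster 6 looks like 
this -- treat it as a sketch only, and put the node in maintenance first:

yum install centos-release-gluster6
yum update 'glusterfs*'
systemctl restart glusterd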



> On May 13, 2019, at 10:47 AM, Andreas Elvers 
>  wrote:
> 
> Please note that I am running a hyper-converged NodeNG setup. I understand 
> that upgrading single components is not really possible with a ovirt Node NG. 
> And could probably break the datacenter upgrade path.
> 
> Could you point out some reference for your suggestions? Docs, Bug reports or 
> the sorts?
> 
>> I encountered serious issues with 5.3-5.5 (crashing bricks, multiple brick 
>> processes for
>> the same brick causing disconnects and excessive heals). I had better luck 
>> with 5.6,
>> although it’s not clear to me if the duplicate brick process issue is still 
>> present in
>> that version. I finally jumped to 6 which has been more stable for me. I’d 
>> recommend
>> upgrading at least to 5.6 if not going right to 6.1.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BFOB6QEST6EAGEZGSCM4GO7BFRUYCKEI/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5HKFS376DJVADIICLLVJSRJKXE73EBZC/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-13 Thread Darrell Budic
I encountered serious issues with 5.3-5.5 (crashing bricks, multiple brick 
processes for the same brick causing disconnects and excessive heals). I had 
better luck with 5.6, although it’s not clear to me if the duplicate brick 
process issue is still present in that version. I finally jumped to 6 which has 
been more stable for me. I’d recommend upgrading at least to 5.6 if not going 
right to 6.1.

> On May 13, 2019, at 10:30 AM, Andreas Elvers 
>  wrote:
> 
>> What version of gluster are you running at the moment?
> 
> I'm running glusterfs-5.5-1.el7.x86_64 the one that comes with oVirt Node 
> 4.3.3.1
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPMOUE5FQRYF4Q6ZMPBDFM6EYAXRNM44/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MDCYZ6JUMJHKNTVCQWCCHG5WMI4DNRAW/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-13 Thread Darrell Budic
What version of gluster are you running at the moment? 

> On May 13, 2019, at 10:25 AM, Andreas Elvers 
>  wrote:
> 
> Yes. After a reboot you could have a sync issue for up to a few hours. But 
> this issue persists now for 24 days. Additionally I see errors in the 
> glustershd.log of the two hosts that are having heal info for that volume. 
> The first node shows as OK and has no errors in its glustershd.log.
> 
> The errors are like this:
> 
> [2019-05-13 15:18:40.808945] W [MSGID: 114031] 
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-engine-client-1: remote 
> operation failed. Path:  
> (95ba9fb2-b0ae-436c-9c31-2779cf202235) [No such file or directory]
> [2019-05-13 15:18:40.809113] W [MSGID: 114031] 
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-engine-client-2: remote 
> operation failed. Path:  
> (95ba9fb2-b0ae-436c-9c31-2779cf202235) [No such file or directory]
> [root@node02 ~]#
> 
> Looks like the first node is sane and the other two are the masters but are 
> not so sane. :-/ 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4JROBW3YYGM65YNATOF7EHSMMA3H6NFL/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BSL2LHZTWYB2HWO5JZF2P2X72BPLTATI/


[ovirt-users] Re: Gluster volume heals and after 5 seconds has /dom_md/ids dirty again

2019-05-13 Thread Darrell Budic
I see this sometimes after rebooting a server, and it usually stops happening, 
generally within a few hours; I’ve never tracked it down further. I don’t know 
for sure, but I assume it’s related to healing and goes away once everything 
syncs up.

Occasionally it turns out to be a communications problem between servers 
(usually an update to something screws up my firewall), so I always check my 
peer status when I see it and make sure all servers are talking to each other.
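
Concretely, the checks I run when I see it are along these lines (using the 
engine volume from this thread as the example name):

gluster peer status                          # every node should show its peers as Connected
gluster volume status engine                 # all bricks online, one pid per brick
gluster volume heal engine info              # pending heal entries per brick
gluster volume heal engine info split-brain  # should stay empty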

> On May 13, 2019, at 4:13 AM, Andreas Elvers 
>  wrote:
> 
> I restored my engine to a gluster volume named :/engine on a three node 
> hyperconverged oVirt 4.3.3.1 cluster. Before restoring I was checking the 
> status of the volumes. They were clean. No heal entries. All peers connected. 
> gluster volume status looked good. Then I restored. This went well. The 
> engine is up. But the engine gluster volume shows entries on node02 and 
> node03. The engine was installed to node01. I have to deploy the engine to 
> the other two hosts to reach full HA, but I bet maintenance is not possible 
> until the volume is healed. 
> 
> I tried "gluster volume heal engine" also with added "full". The heal entries 
> will disappear for a few seconds and then /dom_md/ids will pop up again. The 
> __DIRECT_IO_TEST__ will join later. The split-brain info has no entries. Is 
> this some kind of hidden split brain? Maybe there is data on node01 brick 
> which got not synced to the other two nodes? I can only speculate. Gluster 
> docs say: this should heal. But it doesn't.  I have two other volumes. Those 
> are fine. One of them containing 3 VMs that are running. I also tried to shut 
> down the engine, so no-one was using the volume. Then heal. Same effect. 
> Those two files will always show up. But none other. Heal can always be 
> started successfully from any of the participating nodes.
> 
> Reset the volume bricks one by one and cross fingers? 
> 
> [root@node03 ~]#  gluster volume heal engine info
> Brick node01.infra.solutions.work:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
> 
> Brick node02.infra.solutions.work:/gluster_bricks/engine/engine
> /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> /__DIRECT_IO_TEST__
> Status: Connected
> Number of entries: 2
> 
> Brick node03.infra.solutions.work:/gluster_bricks/engine/engine
> /9f4d5ae9-e01d-4b73-8b6d-e349279e9782/dom_md/ids
> /__DIRECT_IO_TEST__
> Status: Connected
> Number of entries: 2
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/L3YCRPRAGPUMBZIBFOPT6L4B7H4M6HLS/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6XOCRXRCQOUKE4RVK7PWDZHNU4EUAMQ6/


[ovirt-users] Re: Cluster Un-stable since power outage

2019-05-07 Thread Darrell Budic
Was your setup hyperconverged, and is this storage gluster based?

Your error is DNS related, if a bit odd. Have you checked the resolv.conf 
configs and confirmed the servers listed there are reachable and responsive? 
When your hosts are active, are they able to mount all the storage domains they 
need? You should also make sure each HA node can reliably ping your gateway IP; 
failures there will cause nodes to bounce.
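
Concretely, from each host, something like this is where I’d start (the gateway 
address and FQDN here are only placeholders for your own):

cat /etc/resolv.conf            # are the listed nameservers the ones you expect, and reachable?
ping -c3 10.0.0.1               # your cluster gateway IP, as configured in the engine
getent hosts engine.example.org # hypothetical FQDN -- confirm name resolution actually works from the host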

A starting place rather than a solution, but the first places to look. Good luck!

  -Darrell



> On May 7, 2019, at 5:14 AM, Alan G  wrote:
> 
> Hi,
> 
> We have a dev cluster running 4.2. It had to be powered down as the building 
> was going to loose power. Since we've brought it back up it has been 
> massively un-stable (Hosts constantly switching state, VMs migrating all the 
> time).
> 
> I now have one host running (with HE) and all others in maintenance mode. 
> When I try to activate another host I see storage errors in vdsm.log
> 
> 2019-05-07 09:41:00,114+ ERROR (monitor/a98c0b4) [storage.Monitor] Error 
> checking domain a98c0b42-47b9-4632-8b54-0ff3bd80d4c2 (monitor:424)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 416, 
> in _checkDomainStatus
> masterStats = self.domain.validateMaster()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 941, in 
> validateMaster
> if not self.validateMasterMount():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/blockSD.py", line 1377, 
> in validateMasterMount
> return mount.isMounted(self.getMasterDir())
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 161, in 
> isMounted
> getMountFromTarget(target)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 173, in 
> getMountFromTarget
> for rec in _iterMountRecords():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 143, in 
> _iterMountRecords
> for rec in _iterKnownMounts():
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 139, in 
> _iterKnownMounts
> yield _parseFstabLine(line)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 81, in 
> _parseFstabLine
> fs_spec = fileUtils.normalize_path(_unescape_spaces(fs_spec))
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileUtils.py", line 94, 
> in normalize_path
> host, tail = address.hosttail_split(path)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/network/address.py", 
> line 43, in hosttail_split
> raise HosttailError('%s is not a valid hosttail address:' % hosttail)
> HosttailError: :/ is not a valid hosttail address:
> 
> Not sure if it's related but since the restart the hosted_storage domain has 
> been elected the master domain.
> 
> I'm a bit stuck at the moment. My only idea is to remove HE and switch to a 
> standalone Engine VM running outside the cluster.
> 
> Thanks,
> 
> Alan
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/UDINZK5BQQHXYENSVV3OYFMVLG2YXBNT/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/I6YJQFP43R5NTQN3HG2VWBJW2WFFBGNB/


[ovirt-users] Re: Tuning Gluster Writes

2019-04-15 Thread Darrell Budic
Interesting. Whose 10g cards, and which offload settings did you disable? Did 
you do that on the servers, the VM host clients, or both?
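
For context, the sort of thing I’d try on an interface here would be the 
following (eth0 is just an example name, and which offloads help is very card 
and driver dependent):

ethtool -k eth0                              # list the current offload settings
ethtool -K eth0 tso off gso off gro off      # disable the usual TCP offloads to compare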

> On Apr 15, 2019, at 11:37 AM, Alex McWhirter  wrote:
>> I went in and disabled TCP offload on all the nics, huge performance boost. 
>> went from 110MB/s to 240MB/s seq writes, reads lost a bit of performance 
>> going down to 680MB/s, but that's a decent trade off. Latency is still 
>> really high though, need to work on that. I think some more TCP tuning might 
>> help.
>> 
>>  
> Those changes didn't do a whole lot, but I ended up enabling 
> performance.read-ahead on the gluster volume. My blockdev read-ahead values 
> were already 8192, which seemed good enough. Not sure if ovirt set those, or 
> if it's just the defaults of my raid controller.
> 
> Anyway, I'm up to 350MB/s writes, 700MB/s reads, which so happens to correlate 
> with the saturation of my 10G network. Latency is still a slight issue, but 
> at least now I'm not blocking :)
> 
>  
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5COPHAIVCVK42KMMGWZQVMNGDH6Q32ZC/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/T3QMRYHIDRZPUTW4QMGGVOCJ3S3VHLRY/


[ovirt-users] Re: [Gluster-users] Announcing Gluster release 5.5

2019-03-29 Thread Darrell Budic
I’ve also encountered multiple brick processes (glusterfsd) being spawned per 
brick directory on gluster 5.5 while upgrading from 3.12.15. In my case, it’s 
on a standalone server cluster that doesn’t have ovirt installed, so it seems 
to be gluster itself. 
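
If you want to check for it on your own nodes, something like this shows whether 
any brick directory has more than one glusterfsd attached (the grep is only a 
quick sketch):

pgrep -fa glusterfsd                                                               # full command lines of all brick processes
ps -ef | grep '[g]lusterfsd' | grep -oE -- '--brick-name [^ ]+' | sort | uniq -c   # a count above 1 for a brick path is the problem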

Haven’t had the chance to follow up on some bug reports yet, but hopefully in 
the next day or so...

> On Mar 29, 2019, at 9:39 AM, Olaf Buitelaar  wrote:
> 
> Dear Krutika,
>  
> 1. I’ve made 2 profile runs of around 10 minutes (see files profile_data.txt 
> and profile_data2.txt). Looking at it, most time seems to be spent at the fops 
> fsync and readdirp.
> Unfortunately I don’t have the profile info for the 3.12.15 version, so it’s a 
> bit hard to compare.
> One additional thing I do notice on 1 machine (10.32.9.5) the iowait time 
> increased a lot, from an average below the 1% it’s now around the 12% after 
> the upgrade.
> So the first suspicion would be that lightning struck twice and I now also have 
> a bad disk, but that doesn’t appear to be the case, since all SMART statuses 
> report ok.
> Also dd shows performance I would more or less expect;
> dd if=/dev/zero of=/data/test_file  bs=100M count=1  oflag=dsync
> 1+0 records in
> 1+0 records out
> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
> dd if=/dev/zero of=/data/test_file  bs=1G count=1  oflag=dsync
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
> if=/dev/urandom of=/data/test_file  bs=1024 count=100
> 100+0 records in
> 100+0 records out
> 102400 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
> dd if=/dev/zero of=/data/test_file  bs=1024 count=100
> 100+0 records in
> 100+0 records out
> 102400 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
> When I disable this brick (service glusterd stop; pkill glusterfsd) 
> performance in gluster is better, but not on par with what it was. Also the 
> cpu usages on the “neighbor” nodes which hosts the other bricks in the same 
> subvolume increases quite a lot in this case, which I wouldn’t expect 
> actually since they shouldn't handle much more work, except flagging shards 
> to heal. Iowait  also goes to idle once gluster is stopped, so it’s for sure 
> gluster which waits for io.
>  
> 2. I’ve attached the mnt log and volume info, but I couldn’t find anything 
> relevant in in those logs. I think this is because we run the VM’s with 
> libgfapi;
> [root@ovirt-host-01 ~]# engine-config  -g LibgfApiSupported
> LibgfApiSupported: true version: 4.2
> LibgfApiSupported: true version: 4.1
> LibgfApiSupported: true version: 4.3
> And I can confirm the qemu process is invoked with the gluster:// address for 
> the images.
> The message is logged in the /var/lib/libvirt/qemu/  file, which 
> I’ve also included. For a sample case, see around 2019-03-28 20:20:07
> Which has the error; E [MSGID: 133010] 
> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on 
> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c 
> [Stale file handle]
>  
> 3. yes I see multiple instances for the same brick directory, like;
> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id 
> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p 
> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid
>  -S /var/run/gluster/452591c9165945d9.socket --brick-name 
> /data/gfs/bricks/brick1/ovirt-core -l 
> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log 
> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1 
> --process-name brick --brick-port 49154 --xlator-option 
> ovirt-core-server.listen-port=49154
>  
> I’ve made an export of the output of ps from the time I observed these 
> multiple processes.
> In addition to the brick_mux bug as noted by Atin, I might also have another 
> possible cause: as ovirt moves nodes from non-operational state or 
> maintenance state to active/activating, it also seems to restart gluster; 
> however, I don’t have direct proof for this theory.
>  
> Thanks Olaf
> 
> On Fri, Mar 29, 2019 at 10:03, Sandro Bonazzola wrote:
> 
> 
> On Thu, Mar 28, 2019 at 17:48,  wrote:
> Dear All,
> 
> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While previous 
> upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a different 
> experience. After first trying a test upgrade on a 3 node setup, which went 
> fine. i headed to upgrade the 9 node production platform, unaware of the 
> backward compatibility issues between gluster 3.12.15 -> 5.3. After upgrading 
> 2 nodes, the HA engine stopped and wouldn't start. Vdsm wasn't able to mount 
> the engine storage domain, since /dom_md/metadata was missing or couldn't be 
> accessed. Restoring this file by getting a good copy of the underlying 
> bricks, removing the file from the underlying bricks where the file was 0 
> bytes and mark with the 

[ovirt-users] Re: [Gluster-users] Announcing Gluster release 5.5

2019-03-26 Thread Darrell Budic
Following up on this, my test/dev cluster is now completely upgraded to ovirt 
4.3.2-1 and gluster 5.5, and I’ve bumped the op-version on the gluster volumes. 
It’s behaving normally and gluster is happy, with no excessive healing or 
crashing bricks. 
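
For anyone following along, the op-version bump itself is just the usual gluster 
CLI dance (the number is only an example -- use whatever max-op-version reports 
on your cluster):

gluster volume get all cluster.max-op-version   # highest op-version the installed bits support
gluster volume get all cluster.op-version       # what the cluster is currently running at
gluster volume set all cluster.op-version 50400 # example value, taken from max-op-version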

I did encounter https://bugzilla.redhat.com/show_bug.cgi?id=1677160 
<https://bugzilla.redhat.com/show_bug.cgi?id=1677160> on my production cluster 
(with gluster 5.5 clients and 3.12.15 servers) and am proceeding to upgrade my 
gluster servers to 5.5 now that I’m happy with it on my dev cluster. A little 
quicker than I’d like, but it seems to be behaving, and I was also in the middle 
of adding disks to my servers and have to restart them (or at least gluster), 
so I’m going for it.

After I finish this, I’ll test gluster 6 out.

  -Darrell



> On Mar 25, 2019, at 11:04 AM, Darrell Budic  wrote:
> 
> I’m not quite done with my test upgrade to ovirt 4.3.x with gluster 5.5, but 
> so far it’s looking good. I have NOT encountered the upgrade bugs listed as 
> resolved in the 5.5 release notes. Strahil, I didn’t encounter the brick 
> death issue and don’t have a bug ID handy for it, but so far I haven’t had 
> any bricks die. I’m moving the last node of my hyperconverged test 
> environment over today, and will followup again tomorrow on it.
> 
> Separately, I upgraded my production nodes from ovirt 4.3.1 to 4.3.2 (they 
> have a separate gluster server cluster which is still on 3.12.15), which 
> seems to have moved to the gluster 5.3.2 release. While 5.3.0 clients were 
> not having any trouble talking to my 3.12.15 servers, 5.3.2 hit 
> https://bugzilla.redhat.com/show_bug.cgi?id=1651246 
> <https://bugzilla.redhat.com/show_bug.cgi?id=1651246>, causing disconnects to 
> one of my servers (but only one, oddly enough), raising the load on my other 
> two servers and causing a lot of continuous healing. This led to some 
> stability issues with my hosted engine and general sluggishness of the ovirt 
> UI. I also experienced problems migrating from 4.3.1 nodes, but that seems to 
> have been related to the underlying gluster issues, as it seems to have 
> cleared up once I resolved the gluster problems. Since I was testing gluster 
> 5.5 already, I moved my nodes to gluster 5.5 (instead of rolling them back) 
> as the bug above was resolved in that version. That did the trick, and my 
> cluster is back to normal and behaving properly again.
> 
> So my gluster 5.5 experience has been positive so far, and it looks like 5.3 
> is a version for laying down and avoiding. I’ll update again tomorrow, and 
> then flag the centos maintainers about 5.5 stability so it gets out of the 
> -testing repo if all continues to go well.
> 
>   -Darrell
> 
> 
>> On Mar 21, 2019, at 3:39 PM, Strahil > <mailto:hunter86...@yahoo.com>> wrote:
>> 
>> Hi Darrel,
>> 
>> Will it fix the cluster brick sudden death issue ?
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> On Mar 21, 2019 21:56, Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> This release of Gluster 5.5 appears to fix the gluster 3.12->5.3 migration 
>> problems many ovirt users have encountered. 
>> 
>> I’ll try and test it out this weekend and report back. If anyone else gets a 
>> chance to check it out, let us know how it goes!
>> 
>>   -Darrell
>> 
>> Begin forwarded message:
>> 
>> From: Shyam Ranganathan mailto:srang...@redhat.com>>
>> Subject: [Gluster-users] Announcing Gluster release 5.5
>> Date: March 21, 2019 at 6:06:33 AM CDT
>> To: annou...@gluster.org <mailto:annou...@gluster.org>, gluster-users 
>> Discussion List > <mailto:gluster-us...@gluster.org>>
>> Cc: GlusterFS Maintainers > <mailto:maintain...@gluster.org>>
>> 
>> The Gluster community is pleased to announce the release of Gluster
>> 5.5 (packages available at [1]).
>> 
>> Release notes for the release can be found at [3].
>> 
>> Major changes, features and limitations addressed in this release:
>> 
>> - Release 5.4 introduced an incompatible change that prevented rolling
>> upgrades, and hence was never announced to the lists. As a result we are
>> jumping a release version and going to 5.5 from 5.3, that does not have
>> the problem.
>> 
>> Thanks,
>> Gluster community
>> 
>> [1] Packages for 5.5:
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/ 
>> <https://download.gluster.org/pub/gluster/glusterfs/5/5.5/>
>> 
>> [2] Release notes for 5.5:
>> https://docs.gluster.org/en/latest/release-notes/5.5 
>> <https://docs.gluster.org/en/latest/release-notes/5.5>/
>

[ovirt-users] Re: [Gluster-users] Announcing Gluster release 5.5

2019-03-25 Thread Darrell Budic
I’m not quite done with my test upgrade to ovirt 4.3.x with gluster 5.5, but so 
far it’s looking good. I have NOT encountered the upgrade bugs listed as 
resolved in the 5.5 release notes. Strahil, I didn’t encounter the brick death 
issue and don’t have a bug ID handy for it, but so far I haven’t had any bricks 
die. I’m moving the last node of my hyperconverged test environment over today, 
and will followup again tomorrow on it.

Separately, I upgraded my production nodes from ovirt 4.3.1 to 4.3.2 (they have 
a separate gluster server cluster which is still on 3.12.15), which seems to 
have moved to the gluster 5.3.2 release. While 5.3.0 clients were not having 
any trouble talking to my 3.12.15 servers, 5.3.2 hit 
https://bugzilla.redhat.com/show_bug.cgi?id=1651246 
<https://bugzilla.redhat.com/show_bug.cgi?id=1651246>, causing disconnects to 
one of my servers (but only one, oddly enough), raising the load on my other 
two servers and causing a lot of continuous healing. This led to some 
stability issues with my hosted engine and general sluggishness of the ovirt 
UI. I also experienced problems migrating from 4.3.1 nodes, but that seems to 
have been related to the underlying gluster issues, as it seems to have cleared 
up once I resolved the gluster problems. Since I was testing gluster 5.5 
already, I moved my nodes to gluster 5.5 (instead of rolling them back) as the 
bug above was resolved in that version. That did the trick, and my cluster is 
back to normal and behaving properly again.

So my gluster 5.5 experience has been positive so far, and it looks like 5.3 is 
a version for laying down and avoiding. I’ll update again tomorrow, and then 
flag the centos maintainers about 5.5 stability so it gets out of the -testing 
repo if all continues to go well.

  -Darrell


> On Mar 21, 2019, at 3:39 PM, Strahil  wrote:
> 
> Hi Darrel,
> 
> Will it fix the cluster brick sudden death issue ?
> 
> Best Regards,
> Strahil Nikolov
> 
> On Mar 21, 2019 21:56, Darrell Budic  wrote:
> This release of Gluster 5.5 appears to fix the gluster 3.12->5.3 migration 
> problems many ovirt users have encountered. 
> 
> I’ll try and test it out this weekend and report back. If anyone else gets a 
> chance to check it out, let us know how it goes!
> 
>   -Darrell
> 
> Begin forwarded message:
> 
> From: Shyam Ranganathan mailto:srang...@redhat.com>>
> Subject: [Gluster-users] Announcing Gluster release 5.5
> Date: March 21, 2019 at 6:06:33 AM CDT
> To: annou...@gluster.org <mailto:annou...@gluster.org>, gluster-users 
> Discussion List mailto:gluster-us...@gluster.org>>
> Cc: GlusterFS Maintainers  <mailto:maintain...@gluster.org>>
> 
> The Gluster community is pleased to announce the release of Gluster
> 5.5 (packages available at [1]).
> 
> Release notes for the release can be found at [3].
> 
> Major changes, features and limitations addressed in this release:
> 
> - Release 5.4 introduced an incompatible change that prevented rolling
> upgrades, and hence was never announced to the lists. As a result we are
> jumping a release version and going to 5.5 from 5.3, that does not have
> the problem.
> 
> Thanks,
> Gluster community
> 
> [1] Packages for 5.5:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/ 
> <https://download.gluster.org/pub/gluster/glusterfs/5/5.5/>
> 
> [2] Release notes for 5.5:
> https://docs.gluster.org/en/latest/release-notes/5.5 
> <https://docs.gluster.org/en/latest/release-notes/5.5>/
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users 
> <https://lists.gluster.org/mailman/listinfo/gluster-users>
> 

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5SS24L27QNSR2MZEQEGKCLWAIW5DVTYX/


[ovirt-users] Fwd: [Gluster-users] Announcing Gluster release 5.5

2019-03-21 Thread Darrell Budic
This release of Gluster 5.5 appears to fix the gluster 3.12->5.3 migration 
problems many ovirt users have encountered. 

I’ll try and test it out this weekend and report back. If anyone else gets a 
chance to check it out, let us know how it goes!

  -Darrell

> Begin forwarded message:
> 
> From: Shyam Ranganathan 
> Subject: [Gluster-users] Announcing Gluster release 5.5
> Date: March 21, 2019 at 6:06:33 AM CDT
> To: annou...@gluster.org, gluster-users Discussion List 
> 
> Cc: GlusterFS Maintainers 
> 
> The Gluster community is pleased to announce the release of Gluster
> 5.5 (packages available at [1]).
> 
> Release notes for the release can be found at [3].
> 
> Major changes, features and limitations addressed in this release:
> 
> - Release 5.4 introduced an incompatible change that prevented rolling
> upgrades, and hence was never announced to the lists. As a result we are
> jumping a release version and going to 5.5 from 5.3, that does not have
> the problem.
> 
> Thanks,
> Gluster community
> 
> [1] Packages for 5.5:
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/
> 
> [2] Release notes for 5.5:
> https://docs.gluster.org/en/latest/release-notes/5.5/
> ___
> Gluster-users mailing list
> gluster-us...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LOMZUTJOAFDJVOLTHXNSSONM7A3QPAPX/


[ovirt-users] Re: Hosted Engine I/O scheduler

2019-03-20 Thread Darrell Budic
> On Mar 20, 2019, at 12:42 PM, Ryan Barry  wrote:
> 
> On Wed, Mar 20, 2019, 1:16 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> Inline:
> 
>> On Mar 20, 2019, at 4:25 AM, Roy Golan > <mailto:rgo...@redhat.com>> wrote:
>> 
>> On Mon, 18 Mar 2019 at 22:14, Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> I agree, been checking some of my more disk intensive VMs this morning, 
>> switching them to noop definitely improved responsiveness. All the virtio 
>> ones I’ve found were using deadline (with RHEL/Centos guests), but some of 
>> the virt-scsi were using deadline and some were noop, so I’m not sure of a 
>> definitive answer on that level yet. 
>> 
>> For the hosts, it depends on what your backend is running. With a separate 
>> storage server on my main cluster, it doesn’t matter what the hosts set for 
>> me. You mentioned you run hyper converged, so I’d say it depends on what 
>> your disks are. If you’re using SSDs, go none/noop as they don’t benefit 
>> from the queuing. If they are HDDs, I’d test cfq or deadline and see which 
>> gave better latency and throughput to your vms. I’d guess you’ll find 
>> deadline to offer better performance, but cfq to share better amongst 
>> multiple VMs. Unless you use ZFS underneath, then go noop and let ZFS take 
>> care of it.
>> 
>>> On Mar 18, 2019, at 2:05 PM, Strahil >> <mailto:hunter86...@yahoo.com>> wrote:
>>> 
>>> Hi Darrel,
>>> 
>>> Still, based on my experience we shouldn't queue our I/O in the VM, just to 
>>> do the same in the Host.
>>> 
>>> I'm still considering if I should keep deadline  in my hosts or to switch 
>>> to 'cfq'.
>>> After all, I'm using Hyper-converged oVirt and this needs testing.
>>> What I/O scheduler  are  you using on the  host?
>>> 
>> 
>> 
>> Our internal scale team is testing now 'throughput-performance' tuned 
>> profile and it gives
>> promising results, I suggest you try it as well.
>> We will go over the results of a comparison against the virtual-guest profile
>> , if there will be evidence for improvements we will set it as the default 
>> (if it won't degrade small,medium scale envs). 
> 
> I don’t think that will make a difference in this case. Both virtual-host and 
> virtual-guest include the throughput-performance profile, just with “better” 
> virtual memory tunings for guest and hosts. None of those 3 modify the disk 
> queue schedulers, by default, at least not on my Centos 7.6 systems.
> 
> Re my testing, I have virtual-host on my hosts and virtual-guest on my guests 
> already.
> 
> Unfortunately, the ideal scheduler really depends on storage configuration. 
> Gluster, ZFS, iSCSI, FC, and NFS don't align on a single "best" configuration 
> (to say nothing of direct LUNs on guests), then there's workload 
> considerations.
> 
> The scale team is aiming for a balanced "default" policy rather than one 
> which is best for a specific environment.
> 
> That said, I'm optimistic that the results will let us give better 
> recommendations if your workload/storage benefits from a different scheduler

Agreed, but that wasn’t my point; I was commenting that those tuned profiles do 
not set schedulers, so that won’t make a difference, disk-scheduler-wise. Or 
are they testing changes to the default policy config? Good point on direct 
LUNs too.

And a question: why not virtual-guest, if you’re talking about in-guest/engine 
defaults? Or are they testing host profiles, in which case the question becomes 
why not virtual-host? Or am I missing where they are testing the scheduler?

I’m already using virtual-host on my hosts, which appears to have been set by 
the ovirt node setup process, and virtual-guest in my RHEL based guests, which 
I’ve been setting with puppet for a long time now.
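
For reference, checking and applying these profiles is just the following (shown 
for a host; swap in virtual-guest inside the VMs):

tuned-adm active                  # shows the currently applied profile
tuned-adm profile virtual-host    # on hosts; use virtual-guest in the guests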


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JYGAG3M3GKKP7WF7CEH4DJL6TQIR4PJF/


[ovirt-users] Re: Hosted Engine I/O scheduler

2019-03-20 Thread Darrell Budic
Inline:

> On Mar 20, 2019, at 4:25 AM, Roy Golan  wrote:
> 
> On Mon, 18 Mar 2019 at 22:14, Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> I agree, been checking some of my more disk intensive VMs this morning, 
> switching them to noop definitely improved responsiveness. All the virtio 
> ones I’ve found were using deadline (with RHEL/Centos guests), but some of 
> the virt-scsi were using deadline and some were noop, so I’m not sure of a 
> definitive answer on that level yet. 
> 
> For the hosts, it depends on what your backend is running. With a separate 
> storage server on my main cluster, it doesn’t matter what the hosts set for 
> me. You mentioned you run hyper converged, so I’d say it depends on what your 
> disks are. If you’re using SSDs, go none/noop as they don’t benefit from the 
> queuing. If they are HDDs, I’d test cfq or deadline and see which gave better 
> latency and throughput to your vms. I’d guess you’ll find deadline to offer 
> better performance, but cfq to share better amongst multiple VMs. Unless you 
> use ZFS underneath, then go noop and let ZFS take care of it.
> 
>> On Mar 18, 2019, at 2:05 PM, Strahil > <mailto:hunter86...@yahoo.com>> wrote:
>> 
>> Hi Darrel,
>> 
>> Still, based on my experience we shouldn't queue our I/O in the VM, just to 
>> do the same in the Host.
>> 
>> I'm still considering if I should keep deadline  in my hosts or to switch to 
>> 'cfq'.
>> After all, I'm using Hyper-converged oVirt and this needs testing.
>> What I/O scheduler  are  you using on the  host?
>> 
> 
> 
> Our internal scale team is testing now 'throughput-performance' tuned profile 
> and it gives
> promising results, I suggest you try it as well.
> We will go over the results of a comparison against the virtual-guest profile
> , if there will be evidence for improvements we will set it as the default 
> (if it won't degrade small,medium scale envs). 

I don’t think that will make a difference in this case. Both virtual-host and 
virtual-guest include the throughput-performance profile, just with “better” 
virtual memory tunings for guest and hosts. None of those 3 modify the disk 
queue schedulers, by default, at least not on my Centos 7.6 systems.

Re my testing, I have virtual-host on my hosts and virtual-guest on my guests 
already.


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FH5LLYXSEJKXTVVOAZCSMV6AAU33CNCA/


[ovirt-users] Re: Hosted Engine I/O scheduler

2019-03-18 Thread Darrell Budic
I agree, been checking some of my more disk intensive VMs this morning, 
switching them to noop definitely improved responsiveness. All the virtio ones 
I’ve found were using deadline (with RHEL/Centos guests), but some of the 
virt-scsi were using deadline and some were noop, so I’m not sure of a 
definitive answer on that level yet. 

For the hosts, it depends on what your backend is running. With a separate 
storage server on my main cluster, it doesn’t matter what the hosts set for me. 
You mentioned you run hyper converged, so I’d say it depends on what your disks 
are. If you’re using SSDs, go none/noop as they don’t benefit from the queuing. 
If they are HDDs, I’d test cfq or deadline and see which gave better latency 
and throughput to your vms. I’d guess you’ll find deadline to offer better 
performance, but cfq to share better amongst multiple VMs. Unless you use ZFS 
underneath, then go noop and let ZFS take care of it.
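If you want to experiment, a rough sketch for checking and flipping it at
runtime (sda is a placeholder, use whichever scheduler names your kernel lists,
e.g. noop vs none, and note this does not persist across reboots):

  cat /sys/block/sda/queue/scheduler      # current choice shows in brackets
  echo noop > /sys/block/sda/queue/scheduler

To make it permanent you'd use a udev rule or the elevator= kernel parameter,
depending on how your hosts are built.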

> On Mar 18, 2019, at 2:05 PM, Strahil  wrote:
> 
> Hi Darrel,
> 
> Still, based on my experience we shouldn't queue our I/O in the VM, just to 
> do the same in the Host.
> 
> I'm still considering if I should keep deadline  in my hosts or to switch to 
> 'cfq'.
> After all, I'm using Hyper-converged oVirt and this needs testing.
> What I/O scheduler  are  you using on the  host?
> 
> Best Regards,
> Strahil Nikolov
> 
> On Mar 18, 2019 19:15, Darrell Budic  wrote:
> Checked this on mine, see the same thing. Switching the engine to noop 
> definitely feels more responsive.
> 
> I checked on some VMs as well, it looks like virtio drives (vda, vdb….) get 
> mq-deadline by default, but virtscsi gets noop. I used to think the tuned 
> profile for virtual-guest would set noop, but apparently not…
> 
>   -Darrell
> 
> On Mar 18, 2019, at 1:58 AM, Strahil Nikolov  <mailto:hunter86...@yahoo.com>> wrote:
> 
> Hi All,
> 
> I have changed my I/O scheduler to none and here are the results so far:
> 
> Before (mq-deadline):
> Adding a disk to VM (initial creation) START: 2019-03-17 16:34:46.709
> Adding a disk to VM (initial creation) COMPLETED: 2019-03-17 16:45:17.996
> 
> After (none):
> Adding a disk to VM (initial creation) START: 2019-03-18 08:52:02.xxx
> Adding a disk to VM (initial creation) COMPLETED: 2019-03-18 08:52:20.xxx
> 
> Of course the results are inconclusive, as I have tested only once - but I 
> feel the engine more responsive.
> 
> Best Regards,
> Strahil Nikolov
> 
> В неделя, 17 март 2019 г., 18:30:23 ч. Гринуич+2, Strahil 
> mailto:hunter86...@yahoo.com>> написа:
> 
> 
> Dear All,
> 
> I have just noticed that my Hosted Engine has  a strange I/O scheduler:
> 
> Last login: Sun Mar 17 18:14:26 2019 from 192.168.1.43 <http://192.168.1.43/>
> [root@engine ~]# cat /sys/block/vda/queue/scheduler
> [mq-deadline] kyber none
> [root@engine ~]#
> 
> Based on my experience  anything than noop/none  is useless and performance 
> degrading  for a VM.
> 
> Is there any reason that we have this scheduler ?
> It is quite pointless  to process (and delay) the I/O in the VM and then 
> process (and again delay)  on Host Level .
> 
> If there is no reason to keep the deadline, I will open a bug about it.
> 
> Best Regards,
> Strahil Nikolov
> 
> Dear All,
> 
> I have just noticed that my Hosted Engine has  a strange I/O scheduler:
> 
> Last login: Sun Mar 17 18:14:26 2019 from 192.168.1.43 <http://192.168.1.43/>
> [root@engine <mailto:root@engine> ~]# cat /sys/block/vda/queue/scheduler
> [mq-deadline] kyber none
> [root@engine <mailto:root@engine> ~]#
> 
> Based on my experience  anything than noop/none  is useless and performance 
> degrading  for a VM.
> 
> 

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MMTH6225GKYQEZ26BXDBTB52LNWMWVBH/


[ovirt-users] Re: Hosted Engine I/O scheduler

2019-03-18 Thread Darrell Budic
Checked this on mine, see the same thing. Switching the engine to noop 
definitely feels more responsive.

I checked on some VMs as well, it looks like virtio drives (vda, vdb….) get 
mq-deadline by default, but virtscsi gets noop. I used to think the tuned 
profile for virtual-guest would set noop, but apparently not…
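If you want noop/none on the virtio disks without waiting on tuned, a udev rule
is probably the cleanest route. Untested sketch, assuming a blk-mq kernel where
the choice is called "none" (use "noop" on older setups):

  # /etc/udev/rules.d/60-io-scheduler.rules
  ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/scheduler}="none"

Then "udevadm control --reload && udevadm trigger" or a reboot picks it up.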

  -Darrell

> On Mar 18, 2019, at 1:58 AM, Strahil Nikolov  wrote:
> 
> Hi All,
> 
> I have changed my I/O scheduler to none and here are the results so far:
> 
> Before (mq-deadline):
> Adding a disk to VM (initial creation) START: 2019-03-17 16:34:46.709
> Adding a disk to VM (initial creation) COMPLETED: 2019-03-17 16:45:17.996
> 
> After (none):
> Adding a disk to VM (initial creation) START: 2019-03-18 08:52:02.xxx
> Adding a disk to VM (initial creation) COMPLETED: 2019-03-18 08:52:20.xxx
> 
> Of course the results are inconclusive, as I have tested only once - but I 
> feel the engine more responsive.
> 
> Best Regards,
> Strahil Nikolov
> 
> В неделя, 17 март 2019 г., 18:30:23 ч. Гринуич+2, Strahil 
>  написа:
> 
> 
> Dear All,
> 
> I have just noticed that my Hosted Engine has  a strange I/O scheduler:
> 
> Last login: Sun Mar 17 18:14:26 2019 from 192.168.1.43 
> [root@engine ~]# cat /sys/block/vda/queue/scheduler
> [mq-deadline] kyber none
> [root@engine ~]#
> 
> Based on my experience  anything than noop/none  is useless and performance 
> degrading  for a VM.
> 
> Is there any reason that we have this scheduler ?
> It is quite pointless  to process (and delay) the I/O in the VM and then 
> process (and again delay)  on Host Level .
> 
> If there is no reason to keep the deadline, I will open a bug about it.
> 
> Best Regards,
> Strahil Nikolov
> 
> Dear All,
> 
> I have just noticed that my Hosted Engine has  a strange I/O scheduler:
> 
> Last login: Sun Mar 17 18:14:26 2019 from 192.168.1.43
> [root@engine  ~]# cat /sys/block/vda/queue/scheduler
> [mq-deadline] kyber none
> [root@engine  ~]#
> 
> Based on my experience  anything than noop/none  is useless and performance 
> degrading  for a VM.
> 
> 
> Is there any reason that we have this scheduler ?
> It is quite pointless  to process (and delay) the I/O in the VM and then 
> process (and again delay)  on Host Level .
> 
> If there is no reason to keep the deadline, I will open a bug about it.
> 
> Best Regards,
> Strahil Nikolov
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/YY5ZAPMTD5HUYEBEGD2YYO7EOSTVYIE7/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IXY7RZWFF3B4PJPMO6GCTLTBGWWRMHGB/


[ovirt-users] Re: Libgfapisupport messes disk image ownership

2019-03-15 Thread Darrell Budic
You may have this one instead. I just encountered it last night, still seems to 
be an issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1666795

> On Mar 15, 2019, at 4:25 PM, Hesham Ahmed  wrote:
> 
> I had reported this here: https://bugzilla.redhat.com/show_bug.cgi?id=1687126
> 
> Has anyone else faced this with 4.3.1?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VBIASF6YXLOHVKHYRSEFGSPBKH52OSYX/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KUNUCXOGU6GADJRQALBJPCVVDJ4UUKHF/


[ovirt-users] Re: Are people still experiencing issues with GlusterFS on 4.3x?

2019-03-15 Thread Darrell Budic
Upgrading gluster from version 3.12 or 4.1 (included in ovirt 3.x) to 5.3 (in 
ovirt 4.3) seems to cause this due to a bug in the gluster upgrade process. 
It’s an unfortunate side effect of us upgrading oVirt hyper-converged systems. 
Installing new should be fine, but I’d wait for gluster to get 
https://bugzilla.redhat.com/show_bug.cgi?id=1684385 
 included in the version 
ovirt installs before installing a hyper converged cluster. 

I just upgraded my 4.2.8 cluster to 4.3.1, leaving my separate gluster 3.12.15 
servers alone, and it worked fine, except for a different bug screwing up HA 
engine permissions on launch; it looks like that’s being fixed under a 
separate bug report.

Sandro, it’s unfortunate I can’t take more part in testing days, but they 
haven’t been happening at times when I can participate, and a one-day test 
isn’t really something I can join often. I sometimes try to keep up with the 
RCs on my test cluster, but major version changes wait until I get time to 
consider them, unfortunately. I’m also a little surprised that a major 
upstream issue like that bug hasn’t prompted more warnings; it’s something 
that is going to affect everyone who’s upgrading a converged system. 
Any discussion on why more news wasn’t released about it?

  -Darrell


> On Mar 15, 2019, at 11:50 AM, Jayme  wrote:
> 
> That is essentially the behaviour that I've seen.  I wonder if perhaps it 
> could be related to the increased heal activity that occurs on the volumes 
> during reboots of nodes after updating.
> 
> On Fri, Mar 15, 2019 at 12:43 PM Ron Jerome  > wrote:
> Just FYI, I have observed similar issues where a volume becomes unstable for 
> a period of time after the upgrade, but then seems to settle down after a 
> while.  I've only witnessed this in the 4.3.x versions.  I suspect it's more 
> of a Gluster issue than oVirt, but troubling none the less.  
> 
> On Fri, 15 Mar 2019 at 09:37, Jayme  > wrote:
> Yes that is correct.  I don't know if the upgrade to 4.3.1 itself caused 
> issues or simply related somehow to rebooting all hosts again to apply node 
> updates started causing brick issues for me again. I started having similar 
> brick issues after upgrading to 4.3 originally that seemed to have 
> stabilized, prior to 4.3 I never had a single glusterFS issue or brick 
> offline on 4.2
> 
> On Fri, Mar 15, 2019 at 9:48 AM Sandro Bonazzola  > wrote:
> 
> 
> Il giorno ven 15 mar 2019 alle ore 13:38 Jayme  > ha scritto:
> I along with others had GlusterFS issues after 4.3 upgrades, the failed to 
> dispatch handler issue with bricks going down intermittently.  After some 
> time it seemed to have corrected itself (at least in my enviornment) and I 
> hadn't had any brick problems in a while.  I upgraded my three node HCI 
> cluster to 4.3.1 yesterday and again I'm running in to brick issues.  They 
> will all be up running fine then all of a sudden a brick will randomly drop 
> and I have to force start the volume to get it back up. 
> 
> Just to clarify, you already where on oVirt 4.3.0 + Glusterfs 5.3-1 and 
> upgraded to oVirt 4.3.1 + Glusterfs 5.3-2 right?
> 
> 
>  
> 
> Have any of these Gluster issues been addressed in 4.3.2 or any other 
> releases/patches that may be available to help the problem at this time?
> 
> Thanks!
> ___
> Users mailing list -- users@ovirt.org 
> To unsubscribe send an email to users-le...@ovirt.org 
> 
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
> 
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/ 
> 
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/746CU33TP223CFYS6BFUA2C4FIYZQMGU/
>  
> 
> 
> 
> -- 
> SANDRO BONAZZOLA
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
> Red Hat EMEA 
> sbona...@redhat.com    
> 
>  
> ___
> Users mailing list -- users@ovirt.org 
> To unsubscribe send an email to users-le...@ovirt.org 
> 
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
> 
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/ 
> 
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/RXHP4R5OXAJQ3SOUEKXYGOKTU43LZV3M/
>  
> 

[ovirt-users] Re: HC : JBOD or RAID5/6 for NVME SSD drives?

2019-02-25 Thread Darrell Budic
I do similar with ZFS. In fact, I have a mix of large multi-drive ZFS volumes 
as single bricks, and a few SSDs with xfs as single bricks in other volumes, 
based on use. 

From what I’ve gathered watching the lists for a while, people with lots of 
single bricks (drives) per node tend to see higher heal times, while people 
with large single-volume bricks (mdadm, hardware RAID, ZFS…) get better 
healing but maybe suffer a small performance penalty. Seems people like to 
RAID their spinning disks and use SSDs or NVMes as single-drive bricks in most cases.

Obviously your hardware and use case will drive it, but with NVMes, I’d be 
tempted to use them as single bricks. RAID 1 with them would let you lose a 
drive and not have to heal gluster, so that would be a bonus, and might get 
you more IOPS to boot. I’d do it if I could afford it ;) The ultimate answer 
is to test it in both configs, including testing healing across them, and see 
what works best for you.
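If you do try the mdadm route, the basic shape is simple enough; device names
and mount points below are just examples, not a recommendation:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  mkfs.xfs -i size=512 /dev/md0
  mkdir -p /gluster_bricks/vmstore
  mount /dev/md0 /gluster_bricks/vmstore

Then point the brick at a directory under that mount when you create the
volume, and benchmark it against plain single-drive bricks before committing
either way.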

> On Feb 25, 2019, at 6:35 AM, Guillaume Pavese 
>  wrote:
> 
> Thanks Jayme,
> 
> We currently use H730 PERC cards on our test cluster but we are not set on 
> anything yet for the production cluster.
> We are indeed worried about losing a drive in JBOD mode. Would setting up a 
> RAID1 of NVME drives with mdadm, and then use that as the JBOD drive for the 
> volume, be a *good* idea? Is that even possible/ something that people do?
> 
> 
> Guillaume Pavese
> Ingénieur Système et Réseau
> Interactiv-Group
> 
> 
> On Sat, Feb 23, 2019 at 2:51 AM Jayme  > wrote:
> Personally I feel like raid on top of GlusterFS is too wasteful.  It would 
> give you a few advantages such as being able to replace a failed drive at 
> raid level vs replacing bricks with Gluster.  
> 
> In my production HCI setup I have three Dell hosts each with two 2Tb SSDs in 
> JBOD.  I find this setup works well for me, but I have not yet run in to any 
> drive failure scenarios. 
> 
> What Perc card do you have in the dell machines?   Jbod is tough with most 
> Perc cards, in many cases to do Jbod you have to fake it using individual 
> raid 0 for each drive.  Only some perc controllers allow true jbod 
> passthrough. 
> 
> On Fri, Feb 22, 2019 at 12:30 PM Guillaume Pavese 
>  > wrote:
> Hi,
> 
> We have been evaluating oVirt HyperConverged for 9 month now with a test 
> cluster of 3 DELL Hosts with Hardware RAID5 on PERC card. 
> We were not impressed with the performance...
> No SSD for LV Cache on these hosts but I tried anyway with LV Cache on a ram 
> device. Perf were almost unchanged.
> 
> It seems that LV Cache is its own source of bugs and problems anyway, so we 
> are thinking going for full NVME drives when buying the production cluster.
> 
> What would the recommandation be in that case, JBOD or RAID?
> 
> Thanks
> 
> Guillaume Pavese
> Ingénieur Système et Réseau
> Interactiv-Group
> ___
> Users mailing list -- users@ovirt.org 
> To unsubscribe send an email to users-le...@ovirt.org 
> 
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
> 
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/ 
> 
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/IODRDUEIZBPT2RMEPWCXBTJUU3LV3JUD/
>  
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/KEVWLTZTSKX3AVVUXO46DD3U7DEUNUXE/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ASUDLMRN3GRTUWAUTTJYPKGYLYNF5KPS/


[ovirt-users] Re: Stuck completing last step of 4.3 upgrade

2019-02-20 Thread Darrell Budic
I was just helping Tristam on #ovirt with a similar problem, we found that his 
two upgraded nodes were running multiple glusterfsd processes per brick (but 
not all bricks). His volume & brick files in /var/lib/glusterd looked normal, 
but starting glusterd would often spawn extra fsd processes per brick, seemed 
random. Gluster bug? Maybe related to  
https://bugzilla.redhat.com/show_bug.cgi?id=1651246 
, but I’m helping debug 
this one second hand… Possibly related to the brick crashes? We wound up 
stopping glusterd, killing off all the fsds, restarting glusterd, and repeating 
until it only spawned one fsd per brick. Did that to each updated server, then 
restarted glusterd on the not-yet-updated server to get it talking to the right 
bricks. That seemed to get to a mostly stable gluster environment, but he’s 
still seeing 1-2 files listed as needing healing on the upgraded bricks (but 
not the 3.12 brick). Mainly the DIRECT_IO_TEST and one of the dom/ids files, 
but he can probably update that. Did manage to get his engine going again, 
waiting to see if he’s stable now.

Anyway, figured it was worth posting about so people could check for multiple 
brick processes (glusterfsd) if they hit this stability issue as well, maybe 
find common ground.
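A quick check is just counting glusterfsd processes per brick, something along
these lines (on the versions I’ve looked at, the brick path shows up as
--brick-name in the process args):

  pgrep -af glusterfsd
  ps -C glusterfsd -o args= | grep -o 'brick-name [^ ]*' | sort | uniq -c

Any count above 1 means a duplicate fsd for that brick. The cleanup we used was
roughly "systemctl stop glusterd; pkill glusterfsd; systemctl start glusterd",
repeated until each brick only had one process.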

Note: also encountered https://bugzilla.redhat.com/show_bug.cgi?id=1348434 
 trying to get his engine 
back up, restarting libvirtd let us get it going again. Maybe un-needed if he’d 
been able to complete his third node upgrades, but he got stuck before then, 
so...

  -Darrell

> On Feb 14, 2019, at 1:12 AM, Sahina Bose  wrote:
> 
> On Thu, Feb 14, 2019 at 2:39 AM Ron Jerome  wrote:
>> 
>> 
>>> 
>>> Can you be more specific? What things did you see, and did you report bugs?
>> 
>> I've got this one: https://bugzilla.redhat.com/show_bug.cgi?id=1649054
>> and this one: https://bugzilla.redhat.com/show_bug.cgi?id=1651246
>> and I've got bricks randomly going offline and getting out of sync with the 
>> others at which point I've had to manually stop and start the volume to get 
>> things back in sync.
> 
> Thanks for reporting these. Will follow up on the bugs to ensure
> they're addressed.
> Regarding brciks going offline - are the brick processes crashing? Can
> you provide logs of glusterd and bricks. Or is this to do with
> ovirt-engine and brick status not being in sync?
> 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/3RVMLCRK4BWCSBTWVXU2JTIDBWU7WEOP/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4PKJSVDIH3V4H7Q2RKS2C4ZUMWDODQY6/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DYFZAC4BPJNGZP3PEZ6ZP2AB3C3JVAFM/


[ovirt-users] Re: Q: Is it safe to execute on node "saslpasswd2 -a libvirt username" ?

2019-01-23 Thread Darrell Budic
I’ve done it with no ill effects. Can be useful for troubleshooting or clearing 
a stuck VM if the engine is down, but I don’t recommend doing much with it if 
your engine is up and running.
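For reference, the pattern looks like this (the username is whatever you want
to add, "admin" here is just an example):

  saslpasswd2 -a libvirt admin
  virsh -c qemu:///system list --all

The first line adds a SASL user for libvirt, the second confirms you can
authenticate. Read-only poking around is fine; I’d avoid defining or
undefining domains behind vdsm’s back while the engine is managing them.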


> On Jan 23, 2019, at 11:27 AM, Andrei Verovski  wrote:
> 
> Hi !
> 
> Is it safe to execute on oVirt node this command ?
> saslpasswd2 -a libvirt username
> 
> Its a production environment, screwing up anything is not an option.
> I have no idea how VDSM interacts with libvirt, so not sure about this.
> 
> Thanks in advance
> Andrei
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/YWQUXSQ7F2WOHX3ZUXBQVHOB4KRBFJAO/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VOLTTM5PNIQJRYHYGNG7XH6YCW6MR227/


[ovirt-users] Re: Ovirt hosts running 4.2.3-1.el7 fail to upgrade to CentOS 7.6

2018-12-27 Thread Darrell Budic
Yes, known issue, update your ovirt-release42 first:

> On Dec 13, 2018, at 9:08 AM, Sandro Bonazzola  wrote:
> 
> I would consider "yum update ovirt-release42" as a better option. It will 
> provide the missing nbdkit dependency.
> It has been fixed in oVirt 4.2.7 release.
> 
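In practice that’s just something like this on each host, before the general
update:

  yum update ovirt-release42
  yum clean metadata
  yum update

which should pull in the repo definitions carrying the matching nbdkit and let
the rest of the update resolve.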



> On Dec 27, 2018, at 8:05 AM, Andrea Dell'Amico  
> wrote:
> 
> Hello all,
> I tested an upgrade of one of my hypervisors. The ‘yum update’ fails because 
> of a dependency:
> 
> ——
> ---> Package nbdkit-plugin-vddk.x86_64 0:1.2.6-1.el7_6.2 will be installed
> --> Processing Dependency: nbdkit(x86-64) = 1.2.6-1.el7_6.2 for package: 
> nbdkit-plugin-vddk-1.2.6-1.el7_6.2.x86_64
> --> Finished Dependency Resolution
> Error: Package: nbdkit-plugin-vddk-1.2.6-1.el7_6.2.x86_64 (updates)
>  Requires: nbdkit(x86-64) = 1.2.6-1.el7_6.2
>  Available: nbdkit-1.2.6-1.el7.x86_64 (base)
>  nbdkit(x86-64) = 1.2.6-1.el7
>  Available: nbdkit-1.2.6-1.el7_6.2.x86_64 (updates)
>  nbdkit(x86-64) = 1.2.6-1.el7_6.2
>  Installing: nbdkit-1.2.7-2.el7.x86_64 (ovirt-4.2-epel)
>  nbdkit(x86-64) = 1.2.7-2.el7
> ——
> 
> It seems that the ovirt-epel repository has a nbdkit* version older than the 
> CentOS updates one. Is it a known problem?
> 
> 
> Best,
> Andrea
> --
> Andrea Dell'Amico
> http://adellam.sevenseas.org/
> 
> 
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/L6VWWBT4SLCEBPOPSAPXLBWJTCFECBCY/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PZ4EJ3TZTVPPTAKLUOIODLQX4J7ZPJTO/


[ovirt-users] Re: Running different minor versions in different clusters?

2018-12-18 Thread Darrell Budic
It survives and continues to work fine. I’ve been in this situation during 
upgrades when I haven’t been able to do all my clusters at the same time.


> On Dec 18, 2018, at 12:17 PM, Florian Schmid  wrote:
> 
> Hi,
> 
> does nobody has a clue on this? I would need a clear statement about that. ;)
> 
> LG Florian
> 
> 
> - Ursprüngliche Mail -
> Von: "Florian Schmid" 
> An: "users" 
> Gesendet: Dienstag, 11. Dezember 2018 13:31:35
> Betreff: [ovirt-users] Running different minor versions in different clusters?
> 
> Hi,
> 
> I want to ask, if this is a supported Environment, 
> when I have for example the latest 4.2 version running on engine and all 
> hosts in a cluster have an earlier release of 4.2 running?
> 
> Example:
> - ovirt engine: 4.2.7
> - hosts in cluster A: 4.2.5
> - hosts in cluster B: 4.2.7
> 
> The engine VM would run in cluster A with the older version.
> 
> I ask, because we have a very huge ovirt setup, split into different clusters.
> In one cluster, I would like to have always the latest version and in the 
> others, the upgrade will be done in bigger time-intervals.
> 
> LG Florian Schmid
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TVSWNPPOON5RYDYQVHRXPUDTIFBCWBJO/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C7C2JXVXR3NEF5ZVESMCSGPBYFQSUA5U/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5LHVX6IC2DJ6DXHYZ5DWTAY3FNDGW4GJ/


[ovirt-users] Re: Gluster with two ovirt nodes

2018-12-13 Thread Darrell Budic
I would recommend just putting a gluster Arbiter on the 3rd node, then you can 
use normal ovirt tools more easily.

If you really want to do this, I wouldn’t bother with ctdb. I used to do it, 
but switched to a simpler DNS trick: just put entries in your hosts file with 
the storage IP of both nodes, and use that name in oVirt to access your 
storage. But yes, you can install ctdb by hand and it will work.
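As a sketch of the hosts-file trick (names and addresses made up, the volume
name is a placeholder), each host gets a line pointing the shared name at its
preferred storage IP:

  10.0.0.1  gluster-storage      # on host1
  10.0.0.2  gluster-storage      # on host2

Then use gluster-storage:/<volname> as the storage domain path in oVirt,
optionally with backup-volfile-servers=<other-ips> in the mount options so
clients can still fetch the volfile if their preferred server is down.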


> On Dec 12, 2018, at 2:58 PM, Stefan Wolf  wrote:
> 
> Hello,
> 
> I'd like to set up glusterfs with two oVirt nodes and one more "normal" node. 
> Is this possible?
> I've set up glusterfs on the CLI with two oVirt nodes and a 3rd network 
> storage node; glusterfs is up and running.
> But now I'd like to get something like a VIP with ctdb, for example. Is there 
> any possibility to set this up with oVirt?
> Or do I have to set up oVirt manually on CentOS to install ctdb?
> Or are there any other ideas?
> 
> thank you stefan
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4JKFSI4XIMXCKQQEQ7W4ZPWNASYZ52TL/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/E3QUAWLH7L5YXQ26X7HYKFVF6KYDDRU6/


[ovirt-users] Re: Memory usage inclines

2018-12-13 Thread Darrell Budic
Agree, looks like disk caching, which is considered in use but can be freed as 
things ask for it. What kind of storage are you using here?

I checked my converged group, the GUI memory use looks like it includes the 
cache as well, just not as extreme as yours :) If you wanted to confirm, you 
could clear it with

sync; echo 3 > /proc/sys/vm/drop_caches

and you should get all the cache memory back without rebooting. Presumably, 
it’s all holding filesystem data for quick access, so not a terrible thing, 
just alarming to see 97% utilization. The exception would be if it kicked off 
unneeded MOM runs: if you’re using same-page merging and ballooning, then it’s 
doing unneeded work and maybe slowing your VMs… You’d probably see it in top 
if that were the case, it’s fairly CPU intensive.

Ultimately, you may want to file a bug/feature request to have the GUI 
differentiate cache from “regular” in-use memory for clarity.

> On Dec 13, 2018, at 12:06 AM, Tony Brian Albers  wrote:
> 
> Screenshots attached
> 
> We have 2 hosts and 24 running vms, all vms are pretty small.
> 
> As you can see, top and the dashboard does not agree.
> 
> An interesting thing is that if I view the host itself in the engine,
> it says under "General" that 
> Max free Memory for scheduling new VMs: 413360 MB
> 
> So maybe it's some sort of caching that's using the memory.
> 
> 
> /tony
> 
> On Wed, 2018-12-12 at 09:59 -0600, Darrell Budic wrote:
>> Yeah, you’re right about 400G, I dropped a digit reading it out of
>> your top display. 
>> 
>> So what are you seeing in the dashboard, I’m not sure I understand
>> the disconnect between the top you shared and what you’re seeing
>> there. It shows lots more than 110G in use, I gather? Or are you
>> seeing this on the hosts page per host mem use?
>> 
>>> On Dec 12, 2018, at 12:34 AM, Tony Brian Albers  wrote:
>>> 
>>> I'm not following you on the 42G available, the way I see it
>>> there's
>>> 400+G available:
>>> 
>>> [root@man-001 ~]# free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           503G         96G         19G        205M        387G        405G
>>> Swap:          4.0G        520K        4.0G
>>> 
>>> And here's top sorted by %mem usage:
>>> 
>>> top - 07:29:00 up 104 days, 20:56,  1 user,  load average: 0.59,
>>> 0.68,
>>> 0.67
>>> Tasks: 564 total,   1 running, 563 sleeping,   0 stopped,   0
>>> zombie
>>> %Cpu(s):  1.5 us,  0.2 sy,  0.0 ni, 98.2 id,  0.0 wa,  0.0 hi,  0.0
>>> si,  0.0 st
>>> KiB Mem : 52807689+total, 20085144 free, 10132981+used,
>>> 40666195+buff/cache
>>> KiB Swap:  4194300 total,  4193780 free,  520 used.
>>> 42491062+avail
>>> Mem 
>>> 
>>>PID USER  PR  NIVIRTRESSHR S  %CPU
>>> %MEM TIME+
>>> COMMAND 
>>>   5517 qemu  20   0 9128268   8.1g  14084
>>> S   3.0  1.6   5892:07
>>> qemu-kvm
>>>  14187 qemu  20   0 9210236   8.1g  14072
>>> S   5.3  1.6   6586:00
>>> qemu-kvm
>>>  12791 qemu  20   0 9272448   8.1g  14140
>>> S  14.2  1.6  17452:10
>>> qemu-kvm
>>> 135526 qemu  20   0 9117748   8.1g  13664
>>> S   2.3  1.6   5874:48
>>> qemu-kvm
>>>   7938 qemu  20   0 9129936   8.1g  13744
>>> S   2.3  1.6  22109:28
>>> qemu-kvm
>>>  11764 qemu  20   0 9275520   8.1g  13720
>>> S   3.3  1.6  10679:25
>>> qemu-kvm
>>>  12066 qemu  20   0 9360552   8.1g  13708
>>> S   3.0  1.6  10724:34
>>> qemu-kvm
>>>  11153 qemu  20   0 9113544   8.1g  13700
>>> S  15.6  1.6  19050:12
>>> qemu-kvm
>>>  12436 qemu  20   0 9161800   8.1g  13712
>>> S  16.2  1.6  21268:00
>>> qemu-kvm
>>>   6902 qemu  20   0 9110480   8.0g  13580
>>> S   0.7  1.6   1804:16
>>> qemu-kvm
>>>   7621 qemu  20   0 9203816   4.8g  14264
>>> S   1.7  1.0   3143:35
>>> qemu-kvm
>>>   6587 qemu  20   0 4880980   4.1g  13744
>>> S   0.7  0

[ovirt-users] Re: Memory usage inclines

2018-12-12 Thread Darrell Budic
Yeah, you’re right about 400G, I dropped a digit reading it out of your top 
display. 

So what are you seeing in the dashboard, I’m not sure I understand the 
disconnect between the top you shared and what you’re seeing there. It shows 
lots more than 110G in use, I gather? Or are you seeing this on the hosts page 
per host mem use?

> On Dec 12, 2018, at 12:34 AM, Tony Brian Albers  wrote:
> 
> I'm not following you on the 42G available, the way I see it there's
> 400+G available:
> 
> [root@man-001 ~]# free -h
>               total        used        free      shared  buff/cache   available
> Mem:           503G         96G         19G        205M        387G        405G
> Swap:          4.0G        520K        4.0G
> 
> And here's top sorted by %mem usage:
> 
> top - 07:29:00 up 104 days, 20:56,  1 user,  load average: 0.59, 0.68,
> 0.67
> Tasks: 564 total,   1 running, 563 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.5 us,  0.2 sy,  0.0 ni, 98.2 id,  0.0 wa,  0.0 hi,  0.0
> si,  0.0 st
> KiB Mem : 52807689+total, 20085144 free, 10132981+used,
> 40666195+buff/cache
> KiB Swap:  4194300 total,  4193780 free,  520 used. 42491062+avail
> Mem 
> 
>PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND 
>   5517 qemu  20   0 9128268   8.1g  14084 S   3.0  1.6   5892:07
> qemu-kvm
>  14187 qemu  20   0 9210236   8.1g  14072 S   5.3  1.6   6586:00
> qemu-kvm
>  12791 qemu  20   0 9272448   8.1g  14140 S  14.2  1.6  17452:10
> qemu-kvm
> 135526 qemu  20   0 9117748   8.1g  13664 S   2.3  1.6   5874:48
> qemu-kvm
>   7938 qemu  20   0 9129936   8.1g  13744 S   2.3  1.6  22109:28
> qemu-kvm
>  11764 qemu  20   0 9275520   8.1g  13720 S   3.3  1.6  10679:25
> qemu-kvm
>  12066 qemu  20   0 9360552   8.1g  13708 S   3.0  1.6  10724:34
> qemu-kvm
>  11153 qemu  20   0 9113544   8.1g  13700 S  15.6  1.6  19050:12
> qemu-kvm
>  12436 qemu  20   0 9161800   8.1g  13712 S  16.2  1.6  21268:00
> qemu-kvm
>   6902 qemu  20   0 9110480   8.0g  13580 S   0.7  1.6   1804:16
> qemu-kvm
>   7621 qemu  20   0 9203816   4.8g  14264 S   1.7  1.0   3143:35
> qemu-kvm
>   6587 qemu  20   0 4880980   4.1g  13744 S   0.7  0.8   2354:56
> qemu-kvm
>   7249 qemu  20   0 4913084   1.6g  13712 S   0.7  0.3   1380:38
> qemu-kvm
> 111877 qemu  20   0 19110881.1g  14076 S   0.3  0.2 
> 419:58.70
> qemu-kvm
>   4602 vdsm   0 -20 4803160  114184  13860 S   1.3  0.0 
>   2143:44
> vdsmd   
>   4058 root  15  -5 1154020   38804   9588 S   0.0  0.0   
> 0:00.81
> supervdsmd  
>818 root  20   0   84576  35356  34940 S   0.0  0.0   1:05.60
> systemd-journal 
>   3602 root  20   0 1496796   32536   9232 S   0.0  0.0 
> 123:53.70
> python  
>   2672 root  20   0  358328  30228   7984 S   0.0  0.0   0:14.76
> firewalld   
>   4801 vdsm  20   0 1640996   28904   5484 S   0.0  0.0   
> 1265:14
> python
> 
> 
> Rebooting a host doesn't help, (I've tried that earlier) the only thing
> that works is to stop all vm's, reboot all hosts at the same time and
> start vm's again. Then memory usage shown in the dashboard slowly
> increases over time again.
> 
> /tony
> 
> 
> 
> 
> 
> 
> On Tue, 2018-12-11 at 14:09 -0600, Darrell Budic wrote:
>> That’s only reporting 42G available of your 512, ok but something
>> still using it. Try sorting the top by memory %, should be ‘>’ while
>> it’s running.
>> 
>>> On Dec 11, 2018, at 1:39 AM, Tony Brian Albers  wrote:
>>> 
>>> Looks ok to me:
>>> 
>>> top - 08:38:07 up 103 days, 22:05,  1 user,  load average: 0.68,
>>> 0.62,
>>> 0.57
>>> Tasks: 565 total,   1 running, 564 sleeping,   0 stopped,   0
>>> zombie
>>> %Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0
>>> si,  0.0 st
>>> KiB Mem : 52807689+total, 22355988 free, 10132873+used,
>>> 40439219+buff/cache
>>> KiB Swap:  419

[ovirt-users] Re: Memory usage inclines

2018-12-11 Thread Darrell Budic
That’s only reporting 42G available of your 512, so something is still using 
it. Try sorting the top output by memory %; the key should be ‘>’ while it’s running.

> On Dec 11, 2018, at 1:39 AM, Tony Brian Albers  wrote:
> 
> Looks ok to me:
> 
> top - 08:38:07 up 103 days, 22:05,  1 user,  load average: 0.68, 0.62,
> 0.57
> Tasks: 565 total,   1 running, 564 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0
> si,  0.0 st
> KiB Mem : 52807689+total, 22355988 free, 10132873+used,
> 40439219+buff/cache
> KiB Swap:  4194300 total,  4193780 free,  520 used. 42492028+avail
> Mem 
> 
>PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND  
>  14187 qemu  20   0 9144668   8.1g  14072 S  12.6  1.6   6506:46
> qemu-kvm 
>  11153 qemu  20   0 9244680   8.1g  13700 S   4.3  1.6  18881:11
> qemu-kvm 
>  12436 qemu  20   0 9292936   8.1g  13712 S   3.3  1.6  21071:56
> qemu-kvm 
>   5517 qemu  20   0 9128268   8.1g  14084 S   3.0  1.6   5801:03
> qemu-kvm 
>  11764 qemu  20   0 9185364   8.1g  13720 S   3.0  1.6  10585:14
> qemu-kvm 
>   7938 qemu  20   0 9252876   8.1g  13744 S   2.6  1.6  21912:46
> qemu-kvm 
>  12791 qemu  20   0 9182292   8.1g  14140 S   2.6  1.6  17299:36
> qemu-kvm 
>   4602 vdsm   0 -20 4803160  114132  13860 S   2.3  0.0 
>   2123:45
> vdsmd
>   7621 qemu  20   0 9187424   4.8g  14264 S   2.3  1.0   3114:25
> qemu-kvm 
>  12066 qemu  20   0 9188436   8.1g  13708 S   2.3  1.6  10629:53
> qemu-kvm 
> 135526 qemu  20   0 9298060   8.1g  13664 S   2.0  1.6   5792:05
> qemu-kvm 
>   6587 qemu  20   0 4883036   4.1g  13744 S   1.3  0.8   2334:54
> qemu-kvm 
>   3814 root  20   0 1450200   25096  14208 S   1.0  0.0 
> 368:03.80
> libvirtd 
>   6902 qemu  20   0 9110480   8.0g  13580 S   1.0  1.6   1787:57
> qemu-kvm 
>   7249 qemu  20   0 4913084   1.6g  13712 S   0.7  0.3   1367:32
> qemu-kvm 
> 
> 
> It looks like it's only in oVirt-engine that there's an issue. The host
> seems happy enough.
> 
> /tony
> 
> 
> 
> On Mon, 2018-12-10 at 20:14 -0600, Darrell Budic wrote:
>> Grab a shell on your hosts and check top memory use quick. Could be
>> VDSMD, in which case restarting the process will give you a temp fix.
>> If you’re running hyperconvered, check your gluster version, there
>> was a leak in versions 3.12.7 - 3.1.12 or so, updating ovirt/gluster
>> is the best fix for that.
>> 
>>> On Dec 10, 2018, at 7:36 AM, Tony Brian Albers  wrote:
>>> 
>>> Hi guys,
>>> 
>>> We have a small test installation here running around 30 vms on 2
>>> hosts.
>>> 
>>> oVirt 4.2.5.3
>>> 
>>> The hosts each have 512 GB memory, and the vms are sized with 4-8
>>> GB
>>> each.
>>> 
>>> I have noticed that over the last months, the memory usage in the
>>> dashboard has been increasing and is now showing 946.8 GB used of
>>> 1007.2 GB.
>>> 
>>> What can be causing this?
>>> 
>>> TIA,
>>> 
>>> -- 
>>> -- 
>>> Tony Albers
>>> Systems Architect
>>> Systems Director, National Cultural Heritage Cluster
>>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>>> Tel: +45 2566 2383 / +45 8946 2316
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct: https://www.ovirt.org/community/about/commun
>>> ity-guidelines/
>>> List Archives: https://lists.ovirt.org/archives/list/us...@ovirt.or
>>> g/message/SDDH2OC5RBOVYYCLGPOUF6HO676HWI5U/
>> 
>> 
> -- 
> Tony Albers
> Systems Architect
> Systems Director, National Cultural Heritage Cluster
> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
> Tel: +45 2566 2383  / +45 8946 2316 
> 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V6ARR4MKWQBKWN77DDMOU72W3GPJNYRN/


[ovirt-users] Re: Memory usage inclines

2018-12-10 Thread Darrell Budic
Grab a shell on your hosts and check top memory use quick. It could be VDSMD, 
in which case restarting the process will give you a temp fix. If you’re 
running hyperconverged, check your gluster version; there was a leak in 
versions 3.12.7 - 3.12.12 or so, and updating ovirt/gluster is the best fix for that.
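Quick checks, roughly (the gluster part only matters if you’re hyperconverged):

  top -o %MEM                # see what is actually holding the memory
  systemctl restart vdsmd    # temporary fix if vdsmd is the one growing
  gluster --version          # check whether you are on a leaky 3.12.x build

Do the vdsmd restart one host at a time and keep an eye on the engine while it
reconnects.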

> On Dec 10, 2018, at 7:36 AM, Tony Brian Albers  wrote:
> 
> Hi guys,
> 
> We have a small test installation here running around 30 vms on 2
> hosts.
> 
> oVirt 4.2.5.3
> 
> The hosts each have 512 GB memory, and the vms are sized with 4-8 GB
> each.
> 
> I have noticed that over the last months, the memory usage in the
> dashboard has been increasing and is now showing 946.8 GB used of
> 1007.2 GB.
> 
> What can be causing this?
> 
> TIA,
> 
> -- 
> -- 
> Tony Albers
> Systems Architect
> Systems Director, National Cultural Heritage Cluster
> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
> Tel: +45 2566 2383 / +45 8946 2316
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/SDDH2OC5RBOVYYCLGPOUF6HO676HWI5U/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BG5ZJFV7GHH4AODRATSP2O3NE3DD3L3S/


[ovirt-users] Re: Hyperconverged Ovirt + ZFS

2018-11-16 Thread Darrell Budic
Well, personally I know ZFS and I don’t know VDO. Going to have to check it out 
now that I know it exists, sounds interesting to have it at the dm layer. I 
don’t use it exclusively either, but have been finding it useful for 
compression.

What is your source for the statement that COW filesystems have downsides over 
time for VM workloads?

> On Nov 16, 2018, at 3:11 PM, Donny Davis  wrote:
> 
> Why not just use the built in stuff like VDO. What benefits does ZFS bring 
> for the use case? 
> For most vm based workloads ZFS is the opposite of ideal over the lifecycle 
> of a VM. COW filesystems have downsides over time. 
> 
> 
> 
> 
> On Thu, Nov 15, 2018 at 6:09 PM Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> I did this in the past and didn’t have any trouble with gluster/ZFS, but 
> 4.2.x probably does more validation.
> 
> I recommend these settings on your zfs volumes, I set mine at the root(v0 
> here) and let them inherit:
> 
> required:
> v0  xattr        sa        local
> v0  acltype      posixacl  local
> optional but I recommend them:
> v0  relatime     on        local
> v0  compression  lz4       local
> 
> For gluster, I think it checks that it’s been “optimized for virt storage”. 
> Either apply the virt group or set the options you’ll find in 
> /var/lib/glusterd/groups/virt.
> 
> Note that I don’t recommend the default settings for cluster.shd-max-threads 
> & cluster.shd-wait-qlength. They can swamp your machines during heals unless 
> you have a lot of cores and ram. You get a slightly faster heal, but often 
> have VMs pausing for storage or other odd ball storage related errors. I 
> prefer max-threads = 1 or maybe 2, and wait-qlength=1024 or 2048. These are 
> per volume, so they hit harder than you think they will if you have a lot of 
> volumes running.
> 
> Also make sure the gluster volumes themselves got set to 36.36 for 
> owner.group, doesn’t matter for the bricks. Can do it with volume settings or 
> mount the volume and set it manually.
> 
> Hope it helps!
> 
>   -Darrell
> 
> 
>> On Nov 15, 2018, at 6:28 AM, Thomas Simmons > <mailto:twsn...@gmail.com>> wrote:
>> 
>> Hello All,
>> 
>> I recently took a new job in a RedHat shop and I'd like to move all of my 
>> homelab systems to RedHat upstream products to better align with what I 
>> manage at work. I had a "custom" (aka - hacked together) 3-node 
>> Hyperconverged XenServer cluster and would like to get this moved over to 
>> Ovirt (I'm currently testing with 4.2.7). Unfortunately, my storage is 
>> limited to software RAID with a 128GB SSD for cache. If at all possible, I 
>> would prefer to use ZFS (RAIDZ+ZIL+L2ARC) instead of MD RAID + lvmcache, 
>> however I'm not able to get this working and I'm not sure why. My ZFS and 
>> Gluster configuration is working - at least where I can manually mount all 
>> of my gluster volumes from all of my nodes, however, hosted-engine --deploy 
>> fails. I understand this isn't an out of the box configuration for Ovirt, 
>> however I see no reason why this shouldn't work. I would think this would be 
>> no different than using any other Gluster volume for the engine datastore, 
>> Am I missing something that would prevent this from working?
>> 
>> [ INFO  ] TASK [Add glusterfs storage domain]
>> [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is 
>> "[Storage Domain target is unsupported]". HTTP response code is 400.
>> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "deprecations": 
>> [{"msg": "The 'ovirt_storage_domains' module is being renamed 
>> 'ovirt_storage_domain'", "version": 2.8}], "msg": "Fault reason is 
>> \"Operation Failed\". Fault detail is \"[Storage Domain target is 
>> unsupported]\". HTTP response code is 400."}
>> 
>> Even though it fails, it appears to have mounted and written 
>> __DIRECT_IO_TEST__ to my Gluster volume:
>> 
>> [root@vmh1 ~]# mount -t glusterfs localhost:/engine /mnt/engine/
>> [root@vmh1 ~]# ls /mnt/engine/
>> __DIRECT_IO_TEST__
>> 
>> If I cancel and try to run the deploy again, I get a different failure:
>> 
>> [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Error 
>> creating a storage domain]". HTTP response code is 400.
>> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "deprecations": 
>> [{"msg&qu

[ovirt-users] Re: Hyperconverged Ovirt + ZFS

2018-11-15 Thread Darrell Budic
I did this in the past and didn’t have any trouble with gluster/ZFS, but 4.2.x 
probably does more validation.

I recommend these settings on your zfs volumes, I set mine at the root(v0 here) 
and let them inherit:

required:
v0  xattr        sa        local
v0  acltype      posixacl  local
optional but I recommend them:
v0  relatime     on        local
v0  compression  lz4       local
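To apply those and let the datasets inherit (v0 is just my pool name,
substitute yours):

  zfs set xattr=sa v0
  zfs set acltype=posixacl v0
  zfs set relatime=on v0
  zfs set compression=lz4 v0

A "zfs get xattr,acltype,relatime,compression v0" afterwards confirms they took.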

For gluster, I think it checks that it’s been “optimized for virt storage”. 
Either apply the virt group or set the options you’ll find in 
/var/lib/glusterd/groups/virt.

Note that I don’t recommend the default settings for cluster.shd-max-threads & 
cluster.shd-wait-qlength. They can swamp your machines during heals unless you 
have a lot of cores and ram. You get a slightly faster heal, but often have VMs 
pausing for storage or other odd ball storage related errors. I prefer 
max-threads = 1 or maybe 2, and wait-qlength=1024 or 2048. These are per 
volume, so they hit harder than you think they will if you have a lot of 
volumes running.

Also make sure the gluster volumes themselves got set to 36.36 for owner.group, 
doesn’t matter for the bricks. Can do it with volume settings or mount the 
volume and set it manually.
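Roughly, for a volume named vmstore (substitute your volume name, and size the
shd numbers to your cores/RAM as above):

  gluster volume set vmstore group virt
  gluster volume set vmstore cluster.shd-max-threads 1
  gluster volume set vmstore cluster.shd-wait-qlength 1024
  gluster volume set vmstore storage.owner-uid 36
  gluster volume set vmstore storage.owner-gid 36

The group virt line applies everything from /var/lib/glusterd/groups/virt, the
next two dial the heal settings back down, and the owner options take care of
the 36:36 ownership without touching the bricks directly.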

Hope it helps!

  -Darrell


> On Nov 15, 2018, at 6:28 AM, Thomas Simmons  wrote:
> 
> Hello All,
> 
> I recently took a new job in a RedHat shop and I'd like to move all of my 
> homelab systems to RedHat upstream products to better align with what I 
> manage at work. I had a "custom" (aka - hacked together) 3-node 
> Hyperconverged XenServer cluster and would like to get this moved over to 
> Ovirt (I'm currently testing with 4.2.7). Unfortunately, my storage is 
> limited to software RAID with a 128GB SSD for cache. If at all possible, I 
> would prefer to use ZFS (RAIDZ+ZIL+L2ARC) instead of MD RAID + lvmcache, 
> however I'm not able to get this working and I'm not sure why. My ZFS and 
> Gluster configuration is working - at least where I can manually mount all of 
> my gluster volumes from all of my nodes, however, hosted-engine --deploy 
> fails. I understand this isn't an out of the box configuration for Ovirt, 
> however I see no reason why this shouldn't work. I would think this would be 
> no different than using any other Gluster volume for the engine datastore, Am 
> I missing something that would prevent this from working?
> 
> [ INFO  ] TASK [Add glusterfs storage domain]
> [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is 
> "[Storage Domain target is unsupported]". HTTP response code is 400.
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "deprecations": 
> [{"msg": "The 'ovirt_storage_domains' module is being renamed 
> 'ovirt_storage_domain'", "version": 2.8}], "msg": "Fault reason is 
> \"Operation Failed\". Fault detail is \"[Storage Domain target is 
> unsupported]\". HTTP response code is 400."}
> 
> Even though it fails, it appears to have mounted and written 
> __DIRECT_IO_TEST__ to my Gluster volume:
> 
> [root@vmh1 ~]# mount -t glusterfs localhost:/engine /mnt/engine/
> [root@vmh1 ~]# ls /mnt/engine/
> __DIRECT_IO_TEST__
> 
> If I cancel and try to run the deploy again, I get a different failure:
> 
> [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Error 
> creating a storage domain]". HTTP response code is 400.
> [ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "deprecations": 
> [{"msg": "The 'ovirt_storage_domains' module is being renamed 
> 'ovirt_storage_domain'", "version": 2.8}], "msg": "Fault reason is 
> \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". 
> HTTP response code is 400."}
> 
> Gluster seems ok...
> 
> [root@vmh1 /]# gluster volume info engine
>  
> Volume Name: engine
> Type: Replicate
> Volume ID: 2e34f8f5-0129-4ba5-983f-1eb5178deadc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vmh1-ib:/zpool1/engine
> Brick2: vmh2-ib:/zpool1/engine
> Brick3: vmh3-ib:/zpool1/engine
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
> 
> ZFS looks good too...
> 
> [root@vmh1 ~]# ansible ovirthosts -m shell -a 'zpool status' -b
> vmh1 | CHANGED | rc=0 >>
>   pool: zpool1
>  state: ONLINE
>   scan: none requested
> config:
> 
>   NAMESTATE READ WRITE CKSUM
>   zpool1  ONLINE   0 0 0
> sdc   ONLINE   0 0 0
> sdd   ONLINE   0 0 0
> sde   ONLINE   0 0 0
>   logs
> sdb2  ONLINE   0 0 0
>   cache
> sdb1  ONLINE   0 0 0
> 
> errors: No known data errors
> 
> vmh3 | CHANGED | rc=0 >>
>   pool: zpool1
>  state: ONLINE
>   scan: none requested
> config:
> 
>   NAMESTATE READ WRITE CKSUM
> 

[ovirt-users] Re: [ovirt-announce] [ANN] oVirt 4.2.7 async update is now available

2018-11-13 Thread Darrell Budic
Re 1647032, isn’t server.allow-insecure=on required for libgfapi? Or has that 
been worked around in a different way?

  -Darrell


> On Nov 13, 2018, at 9:46 AM, Sandro Bonazzola  wrote:
> 
> The oVirt Team has just released a new version of the following packages:
> - ovirt-engine
> - ovirt-hosted-engine-setup
> - ovirt-release42
> 
> Corresponding oVirt Node is being built and will be released tomorrow.
> 
> The async release addresses the following bugs:
> 
> - Bug 1647032 - Update gluster volume options set on the volume
> - Bug 1645757 - VMs running on the deployed host are removed from the engine 
> after backup/restore
> - Bug 1620314 - SHE disaster recovery is broken in new 4.2 deployments as 
> hosted_storage is master 
> - Bug 1568841 - Can't restore hosted-engine backup at deployment
> 
> The following enhancements are also included:
> - Bug 1469908 - [RFE] Support managed/automated restore
> - Bug 1406067 - [RFE] have the option to install hosted engine on specific 
> datacenter and cluster. 
> 
> Thanks,
> -- 
> SANDRO BONAZZOLA
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
> Red Hat EMEA 
> sbona...@redhat.com    
> 
>  
> ___
> Announce mailing list -- annou...@ovirt.org
> To unsubscribe send an email to announce-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/annou...@ovirt.org/message/JQJCKQXQIHDV4NRVWHLRN6QFZCKPPIK4/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/APBURUFS52WXJTC63O7ARWMY5HREH2PM/


[ovirt-users] Re: Affinity rules in ovirt

2018-10-14 Thread Darrell Budic
VM to VM affinity will try and run the VMs on the same host if positive, and 
different hosts if negative.

VM to Host affinity will try and run the VM on a specific set of Hosts if 
positive, and not on those hosts if negative.

Enforcing will keep the scheduler from launching a VM if it can’t meet those 
criteria, and it will try to make changes to which VMs are running where if it 
can. If you don’t set it, the scheduler will still try to launch VMs on 
different hosts, but if it can’t, it will launch the VM anyway. It also won’t 
make changes to bring all VMs into compliance with the affinity rules, from 
what I can tell.

> On Oct 14, 2018, at 8:15 AM, Hari Prasanth Loganathan 
>  wrote:
> 
> Hi Team,
> 
> I tried to follow up the affinity rules using this tutorial : 
> https://www.youtube.com/watch?v=rs_5BSqacWE 
>  but I have few clarifications 
> in it.,
> 
> 1) I understand the VM to VM affinity, It means both the individual VMs needs 
> to run on a single common host, Say I have 2 hosts, so what happens in case 
> of failure of host running the VMs? 
> 2)  If I create a affinity group and I get the following data, 
> i) What is host rule / vms rule - enabled, enforcing and 
> positive ? 
> 
>   "enforcing": "true",
> "hosts_rule": {
> "enabled": "true",
> "enforcing": "true",
> "positive": "false"
> },
> "vms_rule": {
> "enabled": "false",
> "enforcing": "true",
> "positive": "false"
> }
> 
> 3) What is the difference between VM to VM affinity and VM to Host affinity? 
> 
> Doc's are not very clear, so please any help is appreciated.
> 
> Thanks,
> Hari 
> 
> 
> 
> DISCLAIMER - MSysTechnologies LLC 
> 
> 
> This email message, contents and its attachments may contain confidential, 
> proprietary or legally privileged information and is intended solely for the 
> use of the individual or entity to whom it is actually intended. If you have 
> erroneously received this message, please permanently delete it immediately 
> and notify the sender. If you are not the intended recipient of the email 
> message,you are notified strictly not to disseminate,distribute or copy this 
> e-mail.E-mail transmission cannot be guaranteed to be secure or error-free as 
> Information could be intercepted, corrupted, lost, destroyed, incomplete or 
> contain viruses and MSysTechnologies LLC accepts no liability for the 
> contents and integrity of this mail or for any damage caused by the 
> limitations of the e-mail transmission.
> 
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ST35SIO45EO647BJKUVCUZQBRSAHAUSA/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GXWG7ICW2YN7HZKWVVMRLO53YAIA5DCI/


[ovirt-users] Re: VM stuck in paused mode with Cluster Compatibility Version 3.6 on 4.2 cluster

2018-09-20 Thread Darrell Budic
I had something similar happen while upgrading. Didn’t find a way to fix the 
configs on the fly, but was able to un-pause the VMs using virsh, then proceed 
to handle the ovirt portions. That’ll probably work for you as well.
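
Roughly what I did, from memory (the domain name is an example, and you may need 
to add a SASL user first since vdsm locks libvirt down):

saslpasswd2 -a libvirt admin
virsh -c qemu:///system list --all
virsh -c qemu:///system resume vmname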

> From: Marco Lorenzo Crociani 
> Subject: [ovirt-users] VM stuck in paused mode with Cluster Compatibility 
> Version 3.6 on 4.2 cluster
> Date: September 20, 2018 at 11:10:48 AM CDT
> To: users
> 
> Hi,
> we upgraded ovirt from version 4.1 to 4.2.6. Rebooted all vms.
> We missed two vms that were at Cluster Compatibility Version 3.6.
> There was a gluster/network IO problem and vms got paused. We were able to 
> recover all the other vms from the paused state but we have two vms that 
> won't run because:
> 
> "Cannot run VM. The Custom Compatibility Version of VM VM_NAME (3.6) is not 
> supported in Data Center compatibility version 4.1."
> 
> Can we force the CCV of the paused vm to 4.1?
> 
> Regards,
> 
> -- 
> Marco Crociani
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5XH3H6ADEY3WFYNEVVREEGCA57NPDAQY/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WJHHBWZTBHYWVPREFWMM32N4M5UJ726K/


[ovirt-users] Re: Upgraded host, engine now won't boot

2018-09-04 Thread Darrell Budic
Glad you got into it and got it working. Not sure why it keeps unpausing; you 
could open a bug if you wanted.

Yep, the engine-setup is on the engine vm itself, not the hosts. You want 
https://www.ovirt.org/documentation/self-hosted/chap-Maintenance_and_Upgrading_Resources/,
 although if you aren’t running the appliance in the first place, I don’t think 
it all applies. Just log in and run yum update or yum update ovirt* and then 
engine-setup again.
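
For reference, the sequence on the engine VM itself is roughly (the package glob 
is just an example, adjust to taste):

yum update 'ovirt*setup*'
engine-setup
yum update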


> From: Jim Kusznir 
> Subject: Re: [ovirt-users] Upgraded host, engine now won't boot
> Date: September 3, 2018 at 10:21:01 PM CDT
> To: Darrell Budic
> Cc: users
> 
> Ok, finally got it...Had to get a terminal ready with the virsh command and 
> guess what the instance number was, and then run suspend right after starting 
> with --vm-start-paused.  Got it to really be paused, got into the console, 
> booted the old kernel, and have now been repairing a bad yum transaction... I 
> *think* I've finished that.
> 
> So, if I understand correctly, after the yum update, I should run 
> engine-setup?  Do I run that inside the engine vm, or on the host its running 
> on?
> 
> BTW: I did look up upgrade procedures on the documentation for the release.  
> It links to two or three levels of other documents, then ends in an error 404.
> 
> --Jim
> 
> On Mon, Sep 3, 2018 at 6:39 PM, Jim Kusznir  <mailto:j...@palousetech.com>> wrote:
> global maintence mode is already on.  hosted-engine --vm-start-paused results 
> in a non-paused VM being started.  Of course, this is executed after 
> hosted-engine --vm-poweroff and suitable time left to let things shut down.
> 
> I just ran another test, and did in fact see the engine was briefly paused, 
> but then was quickly put in the running state.  I don't know by what, though. 
>  Global maintence mode is definitely enabled, every run of the hosted-engine 
> command reminds me!
> 
> 
> 
> 
> 
> On Mon, Sep 3, 2018 at 11:12 AM, Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> Don’t know if there’s anything special, it’s been a while since I’ve needed 
> to start it in paused mode. Try putting it in HA maintenance mode from the 
> CLI and then start it in paused mode maybe?
> 
>> From: Jim Kusznir mailto:j...@palousetech.com>>
>> Subject: Re: [ovirt-users] Upgraded host, engine now won't boot
>> Date: September 3, 2018 at 1:08:27 PM CDT
>> 
>> To: Darrell Budic
>> Cc: users
>> 
>> Unfortunately, I seem unable to get connected to the console early enough to 
>> actually see a kernel list.
>> 
>> I've tried the hosted-engine --start-vm-paused command, but it just starts 
>> it (running mode, not paused).  By the time I can get vnc connected, I have 
>> just that last line.  ctrl-alt-del doesn't do anything with it, either.  
>> sending a reset through virsh seems to just kill the VM (it doesn't respawn).
>> 
>> ha seems to have some trouble with this too...Originally I allowed ha to 
>> start it, and it would take it a good long while before it gave up on the 
>> engine and reset it.  It instantly booted to the same crashed state, and 
>> again waited a "good long while" (sorry, never timed it, but I know it was 
>> >5 min).
>> 
>> My current thought is that I need to get the engine started in paused mode, 
>> connect vnc, then unpause it with virsh to catch what is happening.  Is 
>> there any magic to getting it started in paused mode?
>> 
>> On Mon, Sep 3, 2018 at 11:03 AM, Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> Send it a ctl-alt-delete and see what happens. Possibly try an older kernel 
>> at the grub boot menu. Could also try stopping it with hosted-engine 
>> —vm-stop and let HA reboot it, see if it boots or get onto the console 
>> quickly and try and watch more of the boot.
>> 
>> Ssh and yum upgrade is fine for the OS, although it’s a good idea to enable 
>> Global HA Maintenance first so the HA watchdogs don’t reboot it in the 
>> middle of that. After that, run “engine-setup” again, at least if there are 
>> new ovirt engine updates to be done. Then disable Global HA Maintenance, and 
>> run "shutdown -h now” to stop the Engine VM (rebooting seems to cause it to 
>> exit anyway, HA seems to run it as a single execution VM. Or at least in the 
>> past, it seems to quit anyway on me and shutdown triggered HA faster). Wait 
>> a few minutes, and HA will respawn it on a new instance and you can log into 
>> your engine again.
>>> From: Jim Kusznir mailto:j...@palousetech.com>

[ovirt-users] Re: Upgraded host, engine now won't boot

2018-09-03 Thread Darrell Budic
Don’t know if there’s anything special, it’s been a while since I’ve needed to 
start it in paused mode. Try putting it in HA maintenance mode from the CLI and 
then start it in paused mode maybe?
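
Something like this from one of the HA hosts, from memory (the console password 
is whatever you pick, and HostedEngine is the libvirt domain name on my setup):

hosted-engine --set-maintenance --mode=global
hosted-engine --vm-poweroff
hosted-engine --vm-start-paused
hosted-engine --add-console-password
# connect over VNC, then when you're ready:
virsh -c qemu:///system resume HostedEngine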

> From: Jim Kusznir 
> Subject: Re: [ovirt-users] Upgraded host, engine now won't boot
> Date: September 3, 2018 at 1:08:27 PM CDT
> To: Darrell Budic
> Cc: users
> 
> Unfortunately, I seem unable to get connected to the console early enough to 
> actually see a kernel list.
> 
> I've tried the hosted-engine --start-vm-paused command, but it just starts it 
> (running mode, not paused).  By the time I can get vnc connected, I have just 
> that last line.  ctrl-alt-del doesn't do anything with it, either.  sending a 
> reset through virsh seems to just kill the VM (it doesn't respawn).
> 
> ha seems to have some trouble with this too...Originally I allowed ha to 
> start it, and it would take it a good long while before it gave up on the 
> engine and reset it.  It instantly booted to the same crashed state, and 
> again waited a "good long while" (sorry, never timed it, but I know it was >5 
> min).
> 
> My current thought is that I need to get the engine started in paused mode, 
> connect vnc, then unpause it with virsh to catch what is happening.  Is there 
> any magic to getting it started in paused mode?
> 
> On Mon, Sep 3, 2018 at 11:03 AM, Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> Send it a ctl-alt-delete and see what happens. Possibly try an older kernel 
> at the grub boot menu. Could also try stopping it with hosted-engine —vm-stop 
> and let HA reboot it, see if it boots or get onto the console quickly and try 
> and watch more of the boot.
> 
> Ssh and yum upgrade is fine for the OS, although it’s a good idea to enable 
> Global HA Maintenance first so the HA watchdogs don’t reboot it in the middle 
> of that. After that, run “engine-setup” again, at least if there are new 
> ovirt engine updates to be done. Then disable Global HA Maintenance, and run 
> "shutdown -h now” to stop the Engine VM (rebooting seems to cause it to exit 
> anyway, HA seems to run it as a single execution VM. Or at least in the past, 
> it seems to quit anyway on me and shutdown triggered HA faster). Wait a few 
> minutes, and HA will respawn it on a new instance and you can log into your 
> engine again.
>> From: Jim Kusznir mailto:j...@palousetech.com>>
>> Subject: Re: [ovirt-users] Upgraded host, engine now won't boot
>> Date: September 3, 2018 at 12:45:22 PM CDT
>> To: Darrell Budic
>> Cc: users
>> 
>> 
>> Thanks to Jayme who pointed me to the --add-console-password hosted-engine 
>> command to set a password for vnc.  Using that, I see only the single line:
>> 
>> Probing EDD (edd=off to disable)... ok
>> 
>> --Jim
>> 
>> On Mon, Sep 3, 2018 at 10:26 AM, Jim Kusznir > <mailto:j...@palousetech.com>> wrote:
>> Is there a way to get a graphical console on boot of the engine vm so I can 
>> see what's causing the failure to boot?
>> 
>> On Mon, Sep 3, 2018 at 10:23 AM, Jim Kusznir > <mailto:j...@palousetech.com>> wrote:
>> Thanks; I guess I didn't mention that I started there.
>> 
>> The virsh list shows it in state running, and gluster is showing fully 
>> online and healed.  However, I cannot bring up a console of the engine VM to 
>> see why its not booting, even though it shows in running state.
>> 
>> In any case, the hosts and engine were running happily.  I applied the 
>> latest updates on the host, and the engine went unstable.  I thought, Ok, 
>> maybe there's an update to ovirt that also needs to be applied to the 
>> engine, so I ssh'ed in and ran yum update (never did find clear instructions 
>> on how one is supposed to maintain the engine, but I did see that listed 
>> online).  A while later, it reset and never booted again.
>> 
>> -JIm
>> 
>> On Sun, Sep 2, 2018 at 4:28 PM, Darrell Budic > <mailto:bu...@onholyground.com>> wrote:
>> It’s definitely not starting, you’ll have to see if you can figure out why. 
>> A couple things to try:
>> 
>> - Check "virsh list" and see if it’s running, or paused for storage. (google 
>> "virsh saslpasswd2 
>> <https://www.google.com/search?client=safari=en=virsh+saslpasswd2=UTF-8=UTF-8>”
>>  if you need to add a user to do this with, it’s per host)
>> -  It’s hyper converged, so check your gluster volume for healing and/or 
>> split brains and wait/resolve those.
>> - check “gluster peer status” and on each host and make sure your gluster 
>> hosts are all talking. I’ve seen an u

[ovirt-users] Re: Upgraded host, engine now won't boot

2018-09-03 Thread Darrell Budic
Send it a ctl-alt-delete and see what happens. Possibly try an older kernel at 
the grub boot menu. Could also try stopping it with hosted-engine —vm-stop and 
let HA reboot it, see if it boots or get onto the console quickly and try and 
watch more of the boot.

Ssh and yum upgrade is fine for the OS, although it’s a good idea to enable 
Global HA Maintenance first so the HA watchdogs don’t reboot it in the middle 
of that. After that, run “engine-setup” again, at least if there are new ovirt 
engine updates to be done. Then disable Global HA Maintenance, and run 
"shutdown -h now” to stop the Engine VM (rebooting seems to cause it to exit 
anyway, HA seems to run it as a single execution VM. Or at least in the past, 
it seems to quit anyway on me and shutdown triggered HA faster). Wait a few 
minutes, and HA will respawn it on a new instance and you can log into your 
engine again.
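
For reference, the order I follow, run on one of the hosts, is roughly:

hosted-engine --set-maintenance --mode=global
# ssh to the engine VM, yum update + engine-setup there, then "shutdown -h now"
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-status     # wait for HA to respawn the engine
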
> From: Jim Kusznir 
> Subject: Re: [ovirt-users] Upgraded host, engine now won't boot
> Date: September 3, 2018 at 12:45:22 PM CDT
> To: Darrell Budic
> Cc: users
> 
> Thanks to Jayme who pointed me to the --add-console-password hosted-engine 
> command to set a password for vnc.  Using that, I see only the single line:
> 
> Probing EDD (edd=off to disable)... ok
> 
> --Jim
> 
> On Mon, Sep 3, 2018 at 10:26 AM, Jim Kusznir  <mailto:j...@palousetech.com>> wrote:
> Is there a way to get a graphical console on boot of the engine vm so I can 
> see what's causing the failure to boot?
> 
> On Mon, Sep 3, 2018 at 10:23 AM, Jim Kusznir  <mailto:j...@palousetech.com>> wrote:
> Thanks; I guess I didn't mention that I started there.
> 
> The virsh list shows it in state running, and gluster is showing fully online 
> and healed.  However, I cannot bring up a console of the engine VM to see why 
> its not booting, even though it shows in running state.
> 
> In any case, the hosts and engine were running happily.  I applied the latest 
> updates on the host, and the engine went unstable.  I thought, Ok, maybe 
> there's an update to ovirt that also needs to be applied to the engine, so I 
> ssh'ed in and ran yum update (never did find clear instructions on how one is 
> supposed to maintain the engine, but I did see that listed online).  A while 
> later, it reset and never booted again.
> 
> -JIm
> 
> On Sun, Sep 2, 2018 at 4:28 PM, Darrell Budic  <mailto:bu...@onholyground.com>> wrote:
> It’s definitely not starting, you’ll have to see if you can figure out why. A 
> couple things to try:
> 
> - Check "virsh list" and see if it’s running, or paused for storage. (google 
> "virsh saslpasswd2 
> <https://www.google.com/search?client=safari=en=virsh+saslpasswd2=UTF-8=UTF-8>”
>  if you need to add a user to do this with, it’s per host)
> -  It’s hyper converged, so check your gluster volume for healing and/or 
> split brains and wait/resolve those.
> - check “gluster peer status” and on each host and make sure your gluster 
> hosts are all talking. I’ve seen an upgrade screwup the firewall, easy fix is 
> to add a rule to allow the hosts to talk to each other on your gluster 
> network, no questions asked (-j ACCEPT, no port, etc).
> 
> Good luck!
> 
>> From: Jim Kusznir mailto:j...@palousetech.com>>
>> Subject: [ovirt-users] Upgraded host, engine now won't boot
>> Date: September 1, 2018 at 8:38:12 PM CDT
>> To: users
>> 
>> Hello:
>> 
>> I saw that there were updates to my ovirt-4.2 3 node hyperconverged system, 
>> so I proceeded to apply them the usual way through the UI.
>> 
>> At one point, the hosted engine was migrated to one of the upgraded hosts, 
>> and then went "unstable" on me.  Now, the hosted engine appears to be 
>> crashed:  It gets powered up, but it never boots up to the point where it 
>> responds to pings or allows logins.  After a while, the hosted engine shows 
>> status (via console "hosted-engine --vm-status" command) "Powering Down".  
>> It stays there for a long time.
>> 
>> I tried forcing a poweroff then powering it on, but again, it never gets up 
>> to where it will respond to pings.  --vm-status shows bad health, but up.
>> 
>> I tried running the hosted-engine --console command, but got:
>> 
>> [root@ovirt1 ~]# hosted-engine --console
>> The engine VM is running on this host
>> Connected to domain HostedEngine
>> Escape character is ^]
>> error: internal error: cannot find character device 
>> 
>> [root@ovirt1 ~]# 
>> 
>> 
>> I tried to run the hosted-engine --upgrade-appliance command, but it hangs 
>> at obtaining certificate (understandably, as the hosted-engine is n

[ovirt-users] Re: Upgraded host, engine now won't boot

2018-09-02 Thread Darrell Budic
It’s definitely not starting, you’ll have to see if you can figure out why. A 
couple things to try:

- Check "virsh list" and see if it’s running, or paused for storage. (google 
"virsh saslpasswd2 
”
 if you need to add a user to do this with, it’s per host)
-  It’s hyper converged, so check your gluster volume for healing and/or split 
brains and wait/resolve those.
- check “gluster peer status” and on each host and make sure your gluster hosts 
are all talking. I’ve seen an upgrade screwup the firewall, easy fix is to add 
a rule to allow the hosts to talk to each other on your gluster network, no 
questions asked (-j ACCEPT, no port, etc).

Good luck!
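
A rough sketch of those checks; the volume name and subnet are examples from my 
own setup, yours will differ:

gluster peer status
gluster volume heal engine info
iptables -I INPUT -s 10.10.10.0/24 -j ACCEPT    # blunt rule to let the gluster hosts talk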

> From: Jim Kusznir 
> Subject: [ovirt-users] Upgraded host, engine now won't boot
> Date: September 1, 2018 at 8:38:12 PM CDT
> To: users
> 
> Hello:
> 
> I saw that there were updates to my ovirt-4.2 3 node hyperconverged system, 
> so I proceeded to apply them the usual way through the UI.
> 
> At one point, the hosted engine was migrated to one of the upgraded hosts, 
> and then went "unstable" on me.  Now, the hosted engine appears to be 
> crashed:  It gets powered up, but it never boots up to the point where it 
> responds to pings or allows logins.  After a while, the hosted engine shows 
> status (via console "hosted-engine --vm-status" command) "Powering Down".  It 
> stays there for a long time.
> 
> I tried forcing a poweroff then powering it on, but again, it never gets up 
> to where it will respond to pings.  --vm-status shows bad health, but up.
> 
> I tried running the hosted-engine --console command, but got:
> 
> [root@ovirt1 ~]# hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device 
> 
> [root@ovirt1 ~]# 
> 
> 
> I tried to run the hosted-engine --upgrade-appliance command, but it hangs at 
> obtaining certificate (understandably, as the hosted-engine is not up).
> 
> How do i recover from this?  And what caused this?
> 
> --Jim
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/XBNOOF4OA5C5AFGCT3KGUPUTRSOLIPXX/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FRGKRZF4G4S2HNN5TNN7DOPEODJBAQSD/


[ovirt-users] Re: Next Gluster Updates?

2018-08-29 Thread Darrell Budic
3.12.13 is now showing up in the storage repo.

I can confirm it fixes the leak I’ve been seeing since 3.12.9 (upgraded one of 
my nodes and ran it overnight). Hurray!
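
If you want to check what the repo is offering before pulling the trigger, 
something like:

yum clean metadata
yum --showduplicates list glusterfs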

> From: Sahina Bose 
> Subject: [ovirt-users] Re: Next Gluster Updates?
> Date: August 28, 2018 at 3:28:27 AM CDT
> To: Robert OKane; Gluster Devel
> Cc: users
> 
> 
> 
> On Mon, Aug 27, 2018 at 5:51 PM, Robert O'Kane  > wrote:
> I had a bug request in Bugzilla for Gluster being killed due to a memory 
> leak. The Gluster People say it is fixed in gluster-3.12.13
> 
> When will Ovirt have this update?  I am getting tired of having to restart my 
> hypervisors every week or so...
> 
> I currently have ovirt-release42-4.2.5.1-1.el7.noarch  and yum check-updates 
> shows me no new gluster versions.(still 3.12.11)
> 
> oVirt will pick it up as soon as the gluster release is pushed to CentOS 
> storage repo - http://mirror.centos.org/centos/7/storage/x86_64/gluster-3.12/ 
> 
> 
> Niels, Shyam - any ETA for gluster-3.12.13 in CentOS
> 
> 
> Cheers,
> 
> Robert O'Kane
> 
> -- 
> Robert O'Kane
> Systems Administrator
> Kunsthochschule für Medien Köln
> Peter-Welter-Platz 2
> 50676 Köln
> 
> fon: +49(221)20189-223
> fax: +49(221)20189-49223
> ___
> Users mailing list -- users@ovirt.org 
> To unsubscribe send an email to users-le...@ovirt.org 
> 
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/ 
> 
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/ 
> 
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/L7ZTIQA3TAM7IR4LCTWMXXCSGCLWUJJN/
>  
> 
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/2ZKCL2QUVEQRPPV4I3EUYXMAE6PGZUNM/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NGKDOCJBQWZEDFCHMTJGJLLQGCVSMEWK/


[ovirt-users] Re: Weird Memory Leak Issue

2018-08-29 Thread Darrell Budic
There’s a memory leak in gluster 3.12.9 - 3.12.12 on fuse-mounted volumes, 
which sounds like what you’re seeing.

The fix is in 3.12.13, which should be showing up today or tomorrow in the 
centos repos (currently available from the testing repo). I’ve been running it 
overnight on one host to test, and it looks like they got it.

> From: Cole Johnson 
> Subject: [ovirt-users] Weird Memory Leak Issue
> Date: August 29, 2018 at 9:35:39 AM CDT
> To: users@ovirt.org
> 
> Hello,
> I have a hyperconverged, self hosted ovirt cluster with three hosts,
> running 4 VM's.  The hosts are running the latest ovirt node.  The
> VM's are Linux, Windows server 2016, and Windows Server 2008r2.  The
> problem is with any host running the 2008r2 VM will run out of memory
> after 8-10 hours, causing any VM on the host to be paused, and making
> the host all but unresponsive. This problem seems to only exist with
> this specific VM.  None of the other running VM's have this problem.
> I can resolve the problem by migrating the VM to a different host,
> then putting the host into maintenance mode, the activating it back.
> The leak appears to be in glusterfsd.  Is there anything I can do to
> permanently fix this?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/A223GXBU32TQGGVA2KADYTIBHPEF3EID/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RGP2JDZH3BRZFNMDR5DI3V2U4SXIFSAY/


[ovirt-users] Re: Tuning and testing GlusterFS performance

2018-08-05 Thread Darrell Budic
The defaults are a queue depth of 1000 and 1 thread. Recommended settings are 
going to depend on what kind of hardware you’re running it on, load, and memory 
as much or more than disk type/speed, from my experience.

I’d probably recommend a number of heal threads equal to half my total CPU core 
count, leaving the other half for handling actually serving data. Unless it’s 
hyper-converged, then I’d keep it to one or two, since those CPUs would also be 
serving VMs. For the queue depth, I don’t have any good ideas other than using 
the default 1000. Especially if you don’t have a huge storage system, it won’t 
make a big difference. One other thing I’m not sure of is whether that’s threads 
per SHD; if it is, you get one per volume and might want to limit it even more. 
My reason is that if gluster can max your CPU, it’s got high enough settings 
for those two vars :)

And these numbers are relative: I was testing with 8/1 after a post on 
gluster-users suggested it helped speed up healing time, and I found it took my 
systems about 4 or 5 hours to heal fully after rebooting a server. BUT they 
also starved my VMs for iops and eventually corrupted some disks due to io 
timeouts and failures. VMs would pause all the time as well. I have about 60 
VMs on the main cluster of 3 stand alone servers, several volumes. Ugh. With 
1/1000 it takes about 6 hours to fully heal after a reboot, but no VM thrashing 
on disk and nothing’s been corrupted since. Note that it’s actually healing all 
that time, but at least one node will be maxing its CPU (even with 1/1000) 
comparing files and making sure they are synced with the other servers.

Ovirt Devs, if you’ve made the default optimized setting or cockpit setup 
8/1, I think you’re doing most folks a disservice unless they have massive 
servers.
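
For reference, dialing them back to something conservative is just a couple of 
volume sets (the volume name is an example, substitute your own):

gluster volume set VOLNAME cluster.shd-max-threads 1
gluster volume set VOLNAME cluster.shd-wait-qlength 1024
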
> From: Jayme 
> Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance
> Date: August 5, 2018 at 2:21:00 PM EDT
> To: William Dossett
> Cc: Darrell Budic; users
> 
> I can't imagine too many probs with such a minor update I've been doing 
> updates on Ovirt for a while (non gluster) and haven't had too many problems 
> 
> On Sun, Aug 5, 2018, 2:49 PM William Dossett,  <mailto:william.doss...@gmail.com>> wrote:
> Ah. Ok… mine are the H710s and yes I had to do virtual drives at RAID 0.  
> I’ve got my first templates up and running now anyway, getting ready to demo 
> this to mgmt. late this week or early next.  Hoping to get some budget for 
> flash drives after that.
> 
>  
> 
> They got quotes in for renewing our VMware licensing last week… ½ a million!  
> So I have a fairly interested audience 
> 
>  
> 
> Pretty sure with some cash I can get the performance we need using flash,  
> the other thing will be upgrades…  going to see how the upgrade from 4.2.4 to 
> 4.2.5 goes this week.  Classically this is where open source has failed me in 
> the past, but this is feeling much more like a finished product than it used 
> to.
> 
>  
> 
> Regards
> 
> Bill
> 
>  
> 
>  
> 
> From: Jayme mailto:jay...@gmail.com>> 
> Sent: Sunday, August 5, 2018 10:18 AM
> To: William Dossett  <mailto:william.doss...@gmail.com>>
> Cc: Darrell Budic mailto:bu...@onholyground.com>>; 
> users mailto:users@ovirt.org>>
> Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance
> 
>  
> 
> I'm using h310s which are known to have crap queue depth, I'm using them 
> because they are one of the only percs that allow you to do both raid and 
> passthrough jbod instead of having to jbod using individual raid 0s.  They 
> should be fine but could bottleneck during an intensive brick rebuild in 
> addition to regular volume activity 
> 
>  
> 
> On Sun, Aug 5, 2018, 1:06 PM William Dossett,  <mailto:william.doss...@gmail.com>> wrote:
> 
> I think Percs have queue depth of 31 if that’s of any help… fairly common 
> with that level of controller.
> 
>  
> 
> From: Jayme mailto:jay...@gmail.com>> 
> Sent: Sunday, August 5, 2018 9:50 AM
> To: Darrell Budic mailto:bu...@onholyground.com>>
> Cc: William Dossett  <mailto:william.doss...@gmail.com>>; users  <mailto:users@ovirt.org>>
> Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance
> 
>  
> 
> I would have to assume so because I have not manually modified any gluster 
> volume settings after performing gdeploy via cockpit.  What would you 
> recommend these values be set to and does the fact that I am running SSDs 
> make any difference in this regard?  I've been a bit concerned about how a 
> rebuild might affect performance as the raid controllers in these servers 
> doesn't have a large queue depth 
> 
>  
> 
> On Sun, Aug 5, 2018, 12:07 PM Dar

[ovirt-users] Re: Tuning and testing GlusterFS performance

2018-08-05 Thread Darrell Budic
It set these by default?

cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 8

In my experience, these are WAY too high and will degrade performance to the 
point of causing problems on decently used volumes during a heal. If these are 
being set by the HCI installer, I’d recommend changing them.
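
You can check what the installer actually left you with before changing anything, 
something like (volume name is an example):

gluster volume get VOLNAME cluster.shd-max-threads
gluster volume get VOLNAME cluster.shd-wait-qlength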


> From: Jayme 
> Subject: [ovirt-users] Re: Tuning and testing GlusterFS performance
> Date: August 4, 2018 at 10:31:30 AM EDT
> To: William Dossett
> Cc: users
> 
> Yes the volume options can be changed on the fly post creation no problem.  
> Good luck!
> 
> On Sat, Aug 4, 2018, 11:23 AM William Dossett,  > wrote:
> Hey, thanks!  Good catch!  Going to have to take a look at that, will be 
> working on it this weekend.. hopefully we can do this post creation.
> 
>  
> 
> Thanks again
> 
> Bill
> 
>  
> 
>  
> 
> From: Jayme mailto:jay...@gmail.com>> 
> Sent: Thursday, August 2, 2018 5:56 PM
> To: William Dossett  >
> Cc: users mailto:users@ovirt.org>>
> Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance
> 
>  
> 
> Bill,
> 
>  
> 
> I thought I'd let you (and others know this) as it might save you some 
> headaches.  I found that my performance problem was resolved by clicking 
> "optimize for virt store" option in the volume settings of the hosted engine 
> (for the data volume).  Doing this one change has increased my I/O 
> performance by 10x alone.  I don't know why this would not be set or 
> recommended by default but I'm glad I found it!
> 
>  
> 
> - James
> 
>  
> 
> On Thu, Aug 2, 2018 at 2:32 PM, William Dossett  > wrote:
> 
> Yeah, I am just ramping up here, but this project is mostly on my own time 
> and money, hence no SSDs for Gluster… I’ve already blown close to $500 of my 
> own money on 10Gb ethernet cards and SFPs on ebay as my company frowns on us 
> getting good deals for equipment on ebay and would rather go to their 
> preferred supplier – where $500 wouldn’t even buy half a 10Gb CNA ☹  but I 
> believe in this project and it feels like it is getting ready for showtime – 
> if I can demo this in a few weeks and get some interest I’ll be asking them 
> to reimburse me, that’s for sure!
> 
>  
> 
> Hopefully going to get some of the other work off my plate and work on this 
> later this afternoon, will let you know any findings.
> 
>  
> 
> Regards
> 
> Bill
> 
>  
> 
>  
> 
> From: Jayme mailto:jay...@gmail.com>> 
> Sent: Thursday, August 2, 2018 11:07 AM
> To: William Dossett  >
> Cc: users mailto:users@ovirt.org>>
> Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance
> 
>  
> 
> Bill,
> 
>  
> 
> Appreciate the feedback and would be interested to hear some of your results. 
>  I'm a bit worried about what i'm seeing so far on a very stock 3 node HCI 
> setup.  8mb/sec on that dd test mentioned in the original post from within a 
> VM (which may be explained by bad testing methods or some other configuration 
> considerations).. but what is more worrisome to me is that I tried another dd 
> test to time creating a 32GB file, it was taking a long time so I exited the 
> process and the VM basically locked up on me, I couldn't access it or the 
> console and eventually had to do a hard shutdown of the VM to recover.  
> 
>  
> 
> I don't plan to host many VMs, probably around 15.  They aren't super 
> demanding servers but some do read/write big directories such as working with 
> github repos and large node_module folders, rsyncs of fairly large dirs etc.  
> I'm definitely going to have to do a lot more testing before I can be assured 
> enough to put any important VMs on this cluster.
> 
>  
> 
> - James
> 
>  
> 
> On Thu, Aug 2, 2018 at 1:54 PM, William Dossett  > wrote:
> 
> I usually look at IOPs using IOMeter… you usually want several workers 
> running reads and writes in different threads at the same time.   You can run 
> Dynamo on a Linux instance and then connect it to a window GUI running 
> IOMeter to give you stats.  I was getting around 250 IOPs on JBOD sata 
> 7200rpm drives which isn’t bad for cheap and cheerful sata drives.
> 
>  
> 
> As I said, I’ve worked with HCI in VMware now for a couple of years, 
> intensely this last year when we had some defective Dell hardware and trying 
> to diagnose the problem.  Since then the hardware has been completely 
> replaced with all flash solution.   So when I got the all flash solution I 
> used IOmeter on it and was only getting around 3000 IOPs on enterprise flash 
> disks… not exactly stellar, but OK for one VM.  The trick there was the scale 
> out.  There is a VMware Fling call HCI Bench.  Its very cool in that you spin 
> up one VM and then it spawns 40 more VMs across the cluster.  I  could then 
> use VSAN observer and it showed my hosts were actually doing 30K IOPs on 
> average which is absolutely stellar 

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Darrell Budic
I encountered this after upgrading clients to 3.12.9 as well. It’s not present 
in 3.12.8 or 3.12.6. I’ve added some data I had to that bug, can produce more 
if needed. Forgot to mention my server cluster is at 3.12.9, and is not showing 
any problems, it’s just the clients.

A test cluster on 3.12.11 also shows it, just slower because it’s got fewer 
clients on it.
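
If anyone else wants to add data to the bug, a client-side statedump is roughly 
this (PID is whichever glusterfs fuse client is bloated; dumps land in the 
statedump directory, /var/run/gluster by default):

gluster --print-statedumpdir
pgrep -af glusterfs
kill -USR1 PID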


> From: Sahina Bose 
> Subject: [ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)
> Date: July 9, 2018 at 10:42:15 AM CDT
> To: Edward Clay; Jim Kusznir
> Cc: users
> 
> see response about bug at 
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/WRYEBOLNHJZGKKJUNF77TJ7WMBS66ZYK/
>  
> 
>  which seems to indicate the referenced bug is fixed at 3.12.2 and higher.
> 
> Could you attach the statedump of the process to the bug 
> https://bugzilla.redhat.com/show_bug.cgi?id=1593826 
>  as requested?
> 
> 
> 
> On Mon, Jul 9, 2018 at 8:38 PM, Edward Clay  > wrote:
> Just to add my .02 here.  I've opened a bug on this issue where HV/host 
> connected to glusterfs volumes are running out of ram.  This seemed to be a 
> bug fixed in gluster 3.13 but that patch doesn't seem to be available any 
> longer and 3.12 is what ovirt is using.  For example I have a host that was 
> showing 72% of memory consumption with 3 VMs running on it.  If I migrate 
> those VMs to another Host memory consumption drops to 52%.  If i put this 
> host into maintenance and then activate it it drops down to 2% or so.  Since 
> I ran into this issue I've been manually watching memory consumption on each 
> host and migrating VMs from it to others to keep things from dying.  I'm 
> hoping with the announcement of gluster 3.12 end of life and the move to 
> gluster 4.1 that this will get fixed or that the patch from 3.13 can get 
> backported so this problem will go away.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1593826 
> 
> 
> On 07/07/2018 11:49 AM, Jim Kusznir wrote:
>> **Security Notice - This external email is NOT from The Hut Group** 
>> 
>> This host has NO VMs running on it, only 3 running cluster-wide (including 
>> the engine, which is on its own storage):
>> 
>> top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
>> Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
>> %Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 
>> st
>> KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
>> KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem 
>> 
>>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND  
>>  
>>
>> 13279 root  20   0 2380708  37628   4396 S  51.7  0.1   3768:03 
>> glusterfsd   
>>  
>> 13273 root  20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 
>> glusterfsd   
>> 
>> 13287 root  20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 
>> glusterfsd   
>> 
>> 16205 vdsm   0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
>>  
>>
>> 16300 vdsm  20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python   
>>  
>>
>>  1109 vdsm  20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 
>> ovirt-ha-broker  
>>  
>> 2 root  20   0   0  0  0 S   0.7  0.0   0:00.13 
>> kworker/u64:3
>>  
>>10 root  20   0   0  0  0 S   0.3  0.0   4:22.36 
>> rcu_sched
>>  
>>   572 root   0 -20   0  0  0 S   0.3  0.0   0:12.02 
>> kworker/1:1H 
>>  
>>   797 root  20   0   0  0  0 S   0.3  0.0   1:59.59 
>> kdmwork-253:2
>>  
>>   877 root   0 -20   0  0  0 S   0.3  0.0 

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-06 Thread Darrell Budic
Jim-

In addition to my comments on the gluster-users list (go conservative on your 
cluster-shd settings for all volumes), I have one ovirt-specific one that can 
help you in the situation you’re in, at least if you’re seeing the same 
client-side memory use issue I am on gluster 3.12.9+. Since it’s client side, you can 
(temporarily) recover the RAM by putting a node into maintenance (without 
stopping gluster, and ignoring pending heals if needed), then re-activate it. 
It will unmount the gluster volumes, restarting the glusterfsds that are 
hogging the RAM. Then do it to the next node, and the next. Keeps you from 
having to reboot a node and making your heal situation worse. You may have to 
repeat it occasionally, but it will keep you going, and you can stagger it 
between nodes and/or just redistribute VMs afterward.
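
Something like this makes it easy to spot which gluster processes are ballooning 
before you cycle a node (sorted by resident memory):

ps -C glusterfs,glusterfsd -o pid,rss,etime,args --sort=-rss | head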

  -Darrell
> From: Jim Kusznir 
> Subject: [ovirt-users] Ovirt cluster unstable; gluster to blame (again)
> Date: July 6, 2018 at 3:19:34 PM CDT
> To: users
> 
> hi all:
> 
> Once again my production ovirt cluster is collapsing in on itself.  My 
> servers are intermittently unavailable or degrading, customers are noticing 
> and calling in.  This seems to be yet another gluster failure that I haven't 
> been able to pin down.
> 
> I posted about this a while ago, but didn't get anywhere (no replies that I 
> found).  The problem started out as a glusterfsd process consuming large 
> amounts of ram (up to the point where ram and swap were exhausted and the 
> kernel OOM killer killed off the glusterfsd process).  For reasons not clear 
> to me at this time, that resulted in any VMs running on that host and that 
> gluster volume to be paused with I/O error (the glusterfs process is usually 
> unharmed; why it didn't continue I/O with other servers is confusing to me).
> 
> I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and 
> data-hdd).  The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3.  
> The first 3 are backed by an LVM partition (some thin provisioned) on an SSD; 
> the 4th is on a seagate hybrid disk (hdd + some internal flash for 
> acceleration).  data-hdd is the only thing on the disk.  Servers are Dell 
> R610 with the PERC/6i raid card, with the disks individually passed through 
> to the OS (no raid enabled).
> 
> The above RAM usage issue came from the data-hdd volume.  Yesterday, I cought 
> one of the glusterfsd high ram usage before the OOM-Killer had to run.  I was 
> able to migrate the VMs off the machine and for good measure, reboot the 
> entire machine (after taking this opportunity to run the software updates 
> that ovirt said were pending).  Upon booting back up, the necessary volume 
> healing began.  However, this time, the healing caused all three servers to 
> go to very, very high load averages (I saw just under 200 on one server; 
> typically they've been 40-70) with top reporting IO Wait at 7-20%.  Network 
> for this volume is a dedicated gig network.  According to bwm-ng, initially 
> the network bandwidth would hit 50MB/s (yes, bytes), but tailed off to mostly 
> in the kB/s for a while.  All machines' load averages were still 40+ and 
> gluster volume heal data-hdd info reported 5 items needing healing.  Server's 
> were intermittently experiencing IO issues, even on the 3 gluster volumes 
> that appeared largely unaffected.  Even the OS activities on the hosts itself 
> (logging in, running commands) would often be very delayed.  The ovirt engine 
> was seemingly randomly throwing engine down / engine up / engine failed 
> notifications.  Responsiveness on ANY VM was horrific most of the time, with 
> random VMs being inaccessible.
> 
> I let the gluster heal run overnight.  By morning, there were still 5 items 
> needing healing, all three servers were still experiencing high load, and 
> servers were still largely unstable.
> 
> I've noticed that all of my ovirt outages (and I've had a lot, way more than 
> is acceptable for a production cluster) have come from gluster.  I still have 
> 3 VMs who's hard disk images have become corrupted by my last gluster crash 
> that I haven't had time to repair / rebuild yet (I believe this crash was 
> caused by the OOM issue previously mentioned, but I didn't know it at the 
> time).
> 
> Is gluster really ready for production yet?  It seems so unstable to me  
> I'm looking at replacing gluster with a dedicated NFS server likely FreeNAS.  
> Any suggestions?  What is the "right" way to do production storage on this (3 
> node cluster)?  Can I get this gluster volume stable enough to get my VMs to 
> run reliably again until I can deploy another storage solution?
> 
> --Jim
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> 

[ovirt-users] Re: Installing 3rd party watchdogs?

2018-06-06 Thread Darrell Budic
No, things like zabbix_agent or snmpd are fine. You just don’t want anything 
else to reboot a node that is under the control of an ovirt engine; leave that 
to the engine.

> From: femi adegoke 
> Subject: [ovirt-users] Installing 3rd party watchdogs?
> Date: June 6, 2018 at 3:12:49 AM CDT
> To: users@ovirt.org
> 
> "Important: Third-party watchdogs should not be installed on Red Hat 
> Enterprise Linux hosts".  
> 
> https://www.ovirt.org/documentation/install-guide/chap-Enterprise_Linux_Hosts/
> 
> Does this mean no monitoring agents/packages should be installed?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QOGVARSHMLGMWNYHS3Z2Z77AL7WG5GCU/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AYN24FWX5GSVPOQ7QOKNQEFLSUYEIC6Q/


Re: [ovirt-users] Issues with ZFS volume creation

2018-04-02 Thread Darrell Budic
Try it with -f (force); if the disks have any kind of partition table on them, zfs 
will not allow you to overwrite them by default.

If it’s still complaining about the disks being in use, it’s probably 
multipathd grabbing them. multipath -l or multipath -ll will show them to you. 
You may be able to get the pool creation done by doing ‘multipath -f’ to clear 
the tables and creating the pool before multipathd grabs the disks again, or you 
may want to read up on multipathd and edit your configs to prevent it from 
grabbing the disks you’re trying to use (or configure it for multipath access to 
said disk, if you have the hardware for it).
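
Rough sketch, reusing one WWID from your lsblk output as an example; double-check 
device names before wiping anything:

multipath -ll
multipath -f 35000cca07245c0ec      # flush one map, or "multipath -F" for all unused maps
zpool create -f -m none -o ashift=12 zvol raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm

Or blacklist the disks in /etc/multipath.conf (and restart multipathd) so they 
stay free:

blacklist {
    wwid "35000cca07245c0ec"
}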


> From: Tal Bar-Or 
> Subject: [ovirt-users] Issues with ZFS volume creation
> Date: March 25, 2018 at 9:54:35 AM CDT
> To: users
> 
> 
> Hello All,
> I know this question is might be out of Ovirt scope, but I don't have 
> anywhere else to ask for this issue (ZFS users mailing doesn't work), so I am 
> trying my luck here anyway
> so the issues go as follows :
> 
> Installed ZFS on top of CentOs 7.4 with Ovirt 4.2 , on physical Dell R720 
> with 15 sas  10 k 1.2TB each attached to PERC H310 adapter, disks are 
> configured to non-raid, all went OK, but when I am trying to create new zfs 
> pool using the following command:
>  
> zpool create -m none -o ashift=12 zvol raidz2 sda sdb sdc sdd sde sdf sdg sdh 
> sdi sdj sdk sdl sdm
> I get the following error below:
> /dev/sda is in use and contains a unknown filesystem.
> /dev/sdb is in use and contains a unknown filesystem.
> /dev/sdc is in use and contains a unknown filesystem.
> /dev/sdd is in use and contains a unknown filesystem.
> /dev/sde is in use and contains a unknown filesystem.
> /dev/sdf is in use and contains a unknown filesystem.
> /dev/sdg is in use and contains a unknown filesystem.
> /dev/sdh is in use and contains a unknown filesystem.
> /dev/sdi is in use and contains a unknown filesystem.
> /dev/sdj is in use and contains a unknown filesystem.
> /dev/sdk is in use and contains a unknown filesystem.
> /dev/sdl is in use and contains a unknown filesystem.
> /dev/sdm is in use and contains a unknown filesystem.
> 
> When typing command lsblk I get the following output below, all seems ok, any 
> idea what could be wrong?
> Please advice
> Thanks
> 
> # lsblk
> NAMEMAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
> sda   8:00  1.1T  0 disk
> └─35000cca07245c0ec 253:20  1.1T  0 mpath
> sdb   8:16   0  1.1T  0 disk
> └─35000cca072463898 253:10   0  1.1T  0 mpath
> sdc   8:32   0  1.1T  0 disk
> └─35000cca0724540e8 253:80  1.1T  0 mpath
> sdd   8:48   0  1.1T  0 disk
> └─35000cca072451b68 253:70  1.1T  0 mpath
> sde   8:64   0  1.1T  0 disk
> └─35000cca07245f578 253:30  1.1T  0 mpath
> sdf   8:80   0  1.1T  0 disk
> └─35000cca07246c568 253:11   0  1.1T  0 mpath
> sdg   8:96   0  1.1T  0 disk
> └─35000cca0724620c8 253:12   0  1.1T  0 mpath
> sdh   8:112  0  1.1T  0 disk
> └─35000cca07245d2b8 253:13   0  1.1T  0 mpath
> sdi   8:128  0  1.1T  0 disk
> └─35000cca07245f0e8 253:40  1.1T  0 mpath
> sdj   8:144  0  1.1T  0 disk
> └─35000cca072418958 253:50  1.1T  0 mpath
> sdk   8:160  0  1.1T  0 disk
> └─35000cca072429700 253:10  1.1T  0 mpath
> sdl   8:176  0  1.1T  0 disk
> └─35000cca07245d848 253:90  1.1T  0 mpath
> sdm   8:192  0  1.1T  0 disk
> └─35000cca0724625a8 253:00  1.1T  0 mpath
> sdn   8:208  0  1.1T  0 disk
> └─35000cca07245f5ac 253:60  1.1T  0 mpath
> 
> 
> -- 
> Tal Bar-or
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt vm's paused due to storage error

2018-03-30 Thread Darrell Budic
Found (and caused) my problem. 

I’d been evaluating different settings for (default settings shown):
cluster.shd-max-threads 1   
cluster.shd-wait-qlength 1024

and had forgotten to reset them after testing. I had them at max-threads 8 and 
qlength 1.

It worked in that the cluster healed in approximately half the time, and was a 
total failure in that my cluster experienced IO pauses and at least one VM 
abnormal shutdown. 

I have 6-core processors in these boxes, and it looks like I just overloaded 
them to the point that normal IO wasn’t getting serviced because the self-heal 
was getting too much priority. I’ve reverted to the defaults for these, and 
things are now behaving normally, no pauses during healing at all.
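
For anyone following along, putting an option back to its default is just a 
per-option reset (volume name is an example):

gluster volume reset VOLNAME cluster.shd-max-threads
gluster volume reset VOLNAME cluster.shd-wait-qlength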

Moral of the story is don’t forget to undo testing settings when done, and 
really don’t test extreme settings in production!

Back to upgrading my test cluster so I can properly abuse things like this.

  -Darrell
> From: Darrell Budic <bu...@onholyground.com>
> Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error
> Date: March 22, 2018 at 1:23:29 PM CDT
> To: users
> 
> I’ve also encounter something similar on my setup, ovirt 3.1.9 with a gluster 
> 3.12.3 storage cluster. All the storage domains in question are setup as 
> gluster volumes & sharded, and I’ve enabled libgfapi support in the engine. 
> It’s happened primarily to VMs that haven’t been restarted to switch to gfapi 
> yet (still have fuse mounts for these), but one or two VMs that have been 
> switched to gfapi mounts as well.
> 
> I started updating the storage cluster to gluster 3.12.6 yesterday and got 
> more annoying/bad behavior as well. Many VMs that were “high disk use” VMs 
> experienced hangs, but not as storage related pauses. Instead, they hang and 
> their watchdogs eventually reported CPU hangs. All did eventually resume 
> normal operation, but it was annoying, to be sure. The Ovirt Engine also lost 
> contact with all of my VMs (unknown status, ? in GUI), even though it still 
> had contact with the hosts. My gluster cluster reported no errors, volume 
> status was normal, and all peers and bricks were connected. Didn’t see 
> anything in the gluster logs that indicated problems, but there were reports 
> of failed heals that eventually went away. 
> 
> Seems like something in vdsm and/or libgfapi isn’t handling the gfapi mounts 
> well during healing and the related locks, but I can’t tell what it is. I’ve 
> got two more servers in the cluster to upgrade to 3.12.6 yet, and I’ll keep 
> an eye on more logs while I’m doing it, will report on it after I get more 
> info.
> 
>   -Darrell
>> From: Sahina Bose <sab...@redhat.com <mailto:sab...@redhat.com>>
>> Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error
>> Date: March 22, 2018 at 4:56:13 AM CDT
>> To: Endre Karlson
>> Cc: users
>> 
>> Can you provide "gluster volume info" and  the mount logs of the data volume 
>> (I assume that this hosts the vdisks for the VM's with storage error).
>> 
>> Also vdsm.log at the corresponding time.
>> 
>> On Fri, Mar 16, 2018 at 3:45 AM, Endre Karlson <endre.karl...@gmail.com 
>> <mailto:endre.karl...@gmail.com>> wrote:
>> Hi, this is is here again and we are getting several vm's going into storage 
>> error in our 4 node cluster running on centos 7.4 with gluster and ovirt 
>> 4.2.1.
>> 
>> Gluster version: 3.12.6
>> 
>> volume status
>> [root@ovirt3 ~]# gluster volume status
>> Status of volume: data
>> Gluster process TCP Port  RDMA Port  Online  Pid
>> --
>> Brick ovirt0:/gluster/brick3/data   49152 0  Y   
>> 9102 
>> Brick ovirt2:/gluster/brick3/data   49152 0  Y   
>> 28063
>> Brick ovirt3:/gluster/brick3/data   49152 0  Y   
>> 28379
>> Brick ovirt0:/gluster/brick4/data   49153 0  Y   
>> 9111 
>> Brick ovirt2:/gluster/brick4/data   49153 0  Y   
>> 28069
>> Brick ovirt3:/gluster/brick4/data   49153 0  Y   
>> 28388
>> Brick ovirt0:/gluster/brick5/data   49154 0  Y   
>> 9120 
>> Brick ovirt2:/gluster/brick5/data   49154 0  Y   
>> 28075
>> Brick ovirt3:/gluster/brick5/data   49154 0  Y   
>> 28397
>> Brick ovirt0:/gluster/brick6/data   49155 0  Y 

Re: [ovirt-users] gluster self-heal takes cluster offline

2018-03-23 Thread Darrell Budic
What version of ovirt and gluster? Sounds like something I just saw with 
gluster 3.12.x; are you using libgfapi or just fuse mounts?

> From: Sahina Bose 
> Subject: Re: [ovirt-users] gluster self-heal takes cluster offline
> Date: March 23, 2018 at 1:26:01 AM CDT
> To: Jim Kusznir
> Cc: Ravishankar Narayanankutty; users
> 
> 
> 
> On Fri, Mar 16, 2018 at 2:45 AM, Jim Kusznir  > wrote:
> Hi all:
> 
> I'm trying to understand why/how (and most importantly, how to fix) an 
> substantial issue I had last night.  This happened one other time, but I 
> didn't know/understand all the parts associated with it until last night.
> 
> I have a 3 node hyperconverged (self-hosted engine, Gluster on each node) 
> cluster.  Gluster is Replica 2 + arbitrar.  Current network configuration is 
> 2x GigE on load balance ("LAG Group" on switch), plus one GigE from each 
> server on a separate vlan, intended for Gluster (but not used).  Server 
> hardware is Dell R610's, each server as an SSD in it.  Server 1 and 2 have 
> the full replica, server 3 is the arbitrar.
> 
> I put server 2 into maintence so I can work on the hardware, including turn 
> it off and such.  In the course of the work, I found that I needed to 
> reconfigure the SSD's partitioning somewhat, and it resulted in wiping the 
> data partition (storing VM images).  I figure, its no big deal, gluster will 
> rebuild that in short order.  I did take care of the extended attr settings 
> and the like, and when I booted it up, gluster came up as expected and began 
> rebuilding the disk.
> 
> How big was the data on this partition? What was the shard size set on the 
> gluster volume?
> Out of curiosity, how long did it take to heal and come back to operational?
> 
> 
> The problem is that suddenly my entire cluster got very sluggish.  The entine 
> was marking nodes and VMs failed and unfaling them throughout the system, 
> fairly randomly.  It didn't matter what node the engine or VM was on.  At one 
> point, it power cycled server 1 for "non-responsive" (even though everything 
> was running on it, and the gluster rebuild was working on it).  As a result 
> of this, about 6 VMs were killed and my entire gluster system went down hard 
> (suspending all remaining VMs and the engine), as there were no remaining 
> full copies of the data.  After several minutes (these are Dell servers, 
> after all...), server 1 came back up, and gluster resumed the rebuild, and 
> came online on the cluster.  I had to manually (virtsh command) unpause the 
> engine, and then struggle through trying to get critical VMs back up.  
> Everything was super slow, and load averages on the servers were often seen 
> in excess of 80 (these are 8 core / 16 thread boxes).  Actual CPU usage 
> (reported by top) was rarely above 40% (inclusive of all CPUs) for any one 
> server. Glusterfs was often seen using 180%-350% of a CPU on server 1 and 2.  
> 
> I ended up putting the cluster in global HA maintence mode and disabling 
> power fencing on the nodes until the process finished.  It appeared on at 
> least two occasions a functional node was marked bad and had the fencing not 
> been disabled, a node would have rebooted, just further exacerbating the 
> problem.  
> 
> Its clear that the gluster rebuild overloaded things and caused the problem.  
> I don't know why the load was so high (even IOWait was low), but load 
> averages were definately tied to the glusterfs cpu utilization %.   At no 
> point did I have any problems pinging any machine (host or VM) unless the 
> engine decided it was dead and killed it.
> 
> Why did my system bite it so hard with the rebuild?  I baby'ed it along until 
> the rebuild was complete, after which it returned to normal operation.
> 
> As of this event, all networking (host/engine management, gluster, and VM 
> network) were on the same vlan.  I'd love to move things off, but so far any 
> attempt to do so breaks my cluster.  How can I move my management interfaces 
> to a separate VLAN/IP Space?  I also want to move Gluster to its own private 
> space, but it seems if I change anything in the peers file, the entire 
> gluster cluster goes down.  The dedicated gluster network is listed as a 
> secondary hostname for all peers already.
> 
> Will the above network reconfigurations be enough?  I got the impression that 
> the issue may not have been purely network based, but possibly server IO 
> overload.  Is this likely / right?
> 
> I appreciate input.  I don't think gluster's recovery is supposed to do as 
> much damage as it did the last two or three times any healing was required.
> 
> Thanks!
> --Jim
> 
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users 
> 
> 
> 
> ___
> Users 

Re: [ovirt-users] Ovirt vm's paused due to storage error

2018-03-22 Thread Darrell Budic
I’ve also encounter something similar on my setup, ovirt 3.1.9 with a gluster 
3.12.3 storage cluster. All the storage domains in question are setup as 
gluster volumes & sharded, and I’ve enabled libgfapi support in the engine. 
It’s happened primarily to VMs that haven’t been restarted to switch to gfapi 
yet (still have fuse mounts for these), but one or two VMs that have been 
switched to gfapi mounts as well.

I started updating the storage cluster to gluster 3.12.6 yesterday and got more 
annoying/bad behavior as well. Many VMs that were “high disk use” VMs 
experienced hangs, but not as storage-related pauses. Instead, they hung, and 
their watchdogs eventually reported CPU hangs. All did eventually resume normal 
operation, but it was annoying, to be sure. The Ovirt Engine also lost contact 
with all of my VMs (unknown status, ? in GUI), even though it still had contact 
with the hosts. My gluster cluster reported no errors, volume status was 
normal, and all peers and bricks were connected. Didn’t see anything in the 
gluster logs that indicated problems, but there were reports of failed heals 
that eventually went away. 

Seems like something in vdsm and/or libgfapi isn’t handling the gfapi mounts 
well during healing and the related locks, but I can’t tell what it is. I’ve 
got two more servers in the cluster to upgrade to 3.12.6 yet, and I’ll keep an 
eye on more logs while I’m doing it, will report on it after I get more info.

  -Darrell
> From: Sahina Bose 
> Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error
> Date: March 22, 2018 at 4:56:13 AM CDT
> To: Endre Karlson
> Cc: users
> 
> Can you provide "gluster volume info" and  the mount logs of the data volume 
> (I assume that this hosts the vdisks for the VM's with storage error).
> 
> Also vdsm.log at the corresponding time.
> 
> On Fri, Mar 16, 2018 at 3:45 AM, Endre Karlson  > wrote:
> Hi, this is here again and we are getting several VMs going into storage 
> error in our 4 node cluster running on centos 7.4 with gluster and ovirt 
> 4.2.1.
> 
> Gluster version: 3.12.6
> 
> volume status
> [root@ovirt3 ~]# gluster volume status
> Status of volume: data
> Gluster process TCP Port  RDMA Port  Online  Pid
> --
> Brick ovirt0:/gluster/brick3/data   49152 0  Y   9102 
> Brick ovirt2:/gluster/brick3/data   49152 0  Y   28063
> Brick ovirt3:/gluster/brick3/data   49152 0  Y   28379
> Brick ovirt0:/gluster/brick4/data   49153 0  Y   9111 
> Brick ovirt2:/gluster/brick4/data   49153 0  Y   28069
> Brick ovirt3:/gluster/brick4/data   49153 0  Y   28388
> Brick ovirt0:/gluster/brick5/data   49154 0  Y   9120 
> Brick ovirt2:/gluster/brick5/data   49154 0  Y   28075
> Brick ovirt3:/gluster/brick5/data   49154 0  Y   28397
> Brick ovirt0:/gluster/brick6/data   49155 0  Y   9129 
> Brick ovirt2:/gluster/brick6_1/data 49155 0  Y   28081
> Brick ovirt3:/gluster/brick6/data   49155 0  Y   28404
> Brick ovirt0:/gluster/brick7/data   49156 0  Y   9138 
> Brick ovirt2:/gluster/brick7/data   49156 0  Y   28089
> Brick ovirt3:/gluster/brick7/data   49156 0  Y   28411
> Brick ovirt0:/gluster/brick8/data   49157 0  Y   9145 
> Brick ovirt2:/gluster/brick8/data   49157 0  Y   28095
> Brick ovirt3:/gluster/brick8/data   49157 0  Y   28418
> Brick ovirt1:/gluster/brick3/data   49152 0  Y   23139
> Brick ovirt1:/gluster/brick4/data   49153 0  Y   23145
> Brick ovirt1:/gluster/brick5/data   49154 0  Y   23152
> Brick ovirt1:/gluster/brick6/data   49155 0  Y   23159
> Brick ovirt1:/gluster/brick7/data   49156 0  Y   23166
> Brick ovirt1:/gluster/brick8/data   49157 0  Y   23173
> Self-heal Daemon on localhost   N/A   N/AY   7757 
> Bitrot Daemon on localhost  N/A   N/AY   7766 
> Scrubber Daemon on localhostN/A   N/AY   7785 
> Self-heal Daemon on ovirt2  N/A   N/AY   8205 
> Bitrot Daemon on ovirt2 N/A   N/AY   8216 
> Scrubber Daemon on ovirt2   N/A   N/AY   8227 
> Self-heal Daemon on ovirt0  N/A   N/AY   32665
> Bitrot Daemon on ovirt0 N/A   N/AY  

Re: [ovirt-users] Ovirt with ZFS+ Gluster

2018-03-19 Thread Darrell Budic
Most of this is still valid, if getting a bit long in the tooth: 
https://docs.gluster.org/en/latest/Administrator%20Guide/Gluster%20On%20ZFS/

I’ve got it running on several production clusters. I’m using the zfsol 0.7.6 
kmod installation myself. I use a zvol per brick, and only one brick per 
machine from the zpool per gluster volume. If I had more disks, I might have 
two zvols with a brick each per gluster volume, but not now. My local settings:

# zfs get all v0 | grep local
v0  compression  lz4       local
v0  xattr        sa        local
v0  acltype      posixacl  local
v0  relatime     on        local
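
If you're setting up a new dataset, applying the same properties by hand is just
(a rough sketch, assuming a pool/dataset named v0 like mine; substitute your own
name):

# zfs set compression=lz4 v0
# zfs set xattr=sa v0
# zfs set acltype=posixacl v0
# zfs set relatime=on v0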


> From: Karli Sjöberg 
> Subject: Re: [ovirt-users] Ovirt with ZFS+ Gluster
> Date: March 19, 2018 at 3:36:41 AM CDT
> To: Tal Bar-Or; users
> 
> On Sun, 2018-03-18 at 14:01 +0200, Tal Bar-Or wrote:
>> Hello,
>> 
>> I started to do some new modest system planning; the system will be
>> mounted on top of 3~4 Dell R720s, each with 2x E5-2640 v2, 128GB of
>> memory, 12x 10k SAS 1.2TB disks, and 3x SSDs.
>> My plan is to use ZFS on top of GlusterFS, and my question, since
>> I didn't see any doc on it, is: 
>> has this kind of deployment been done in the past, and is it recommended?
>> Anyway, if yes, is there any doc on how to do it?
>> Thanks 
>> 
>> 
>> -- 
>> Tal Bar-or
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
> 
> There isn't any specific documentation about using ZFS underneath
> Gluster together with oVirt, but there's nothing wrong IMO with using
> ZFS with Gluster. E.g. 45 Drives are using it and posting really funny
> videos about it:
> 
> https://www.youtube.com/watch?v=A0wV4k58RIs
> 
> Are you planning this as a standalone Gluster cluster or do you want to
> use it hyperconverged?
> 
> /K___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt and gateway behavior

2018-02-06 Thread Darrell Budic
I’ve seen this sort of thing happen on my systems: the gateway IP goes down for 
some reason, and the engine restarts repeatedly, rendering it unusable, even 
though it’s on the same IP subnet as all the host boxes and can still talk to the 
VDSMs. In my case, it doesn’t hurt the cluster or DC, but it’s annoying and 
unnecessary in my environment, where the gateway isn’t important for cluster 
communications.

I can understand why pinging the gateway IP became a proxy test for network 
connectivity, but it isn’t always a valid one, and maybe the local admin should 
have a choice of how it’s used. Something like the current fencing option for 
“50% hosts down” as a double check: if you can still reach the vdsm hosts, don’t 
restart the engine VM.

  -Darrell
> From: Yaniv Kaul 
> Subject: Re: [ovirt-users] ovirt and gateway behavior
> Date: February 6, 2018 at 2:40:14 AM CST
> To: Alex
> Cc: Ovirt Users
> 
> 
> 
> On Feb 5, 2018 2:21 PM, "Alex K"  > wrote:
> Hi all, 
> 
> I have a 3 nodes ovirt 4.1 cluster, self hosted on top of glusterfs. The 
> cluster is used to host several VMs. 
> I have observed that when gateway is lost (say the gateway device is down) 
> the ovirt cluster goes down. 
> 
> Is the cluster down, or just the self-hosted engine? 
> 
> 
> It seems a bit extreme behavior especially when one does not care if the 
> hosted VMs have connectivity to Internet or not. 
> 
> Are the VMs down? 
> The hosts? 
> Y. 
> 
> 
> Can this behavior be disabled?
> 
> Thanx, 
> Alex
> 
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users 
> 
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [ANN] oVirt 4.1.9 Release is now available

2018-01-24 Thread Darrell Budic
A couple of questions about a fixed ‘bug’ in the release notes for this:

Does https://bugzilla.redhat.com/show_bug.cgi?id=1517237 mean that ovirt will 
no longer use libgfapi for any VMs, or is it just removing the check box from 
the storage GUI?

If it is removing the capacity, I have several questions:
- what happens to my cluster which has it enabled from CLI?
- what happens to my currently running VMs using it?
- why are you removing a major feature with a bug notice and no further 
information?
- it doesn’t seem to affect my HA VMs; I’ve seen my 4.1.8 system properly 
restart VMs using it (after a node/libvirtd crash that seems to have been related 
to Spectre/Meltdown firmware)
- or does this only affect the hosted-engine VM?

If it doesn’t actually remove the capability, this is ignorable. But this is a 
much wanted feature for me, and I’ve been quite happy to finally get to use it. 
Not going to be pleased if it disappears without notice.

  -Darrell

> From: Arman Khalatyan 
> Subject: Re: [ovirt-users] [ANN] oVirt 4.1.9 Release is now available
> Date: January 24, 2018 at 6:47:38 AM CST
> To: Lev Veyde
> Cc: annou...@ovirt.org, users
> 
> Thanks for the announcement.
> A little comment: could you please fix the line  yum install 
>  
> There is an extra '<' symbol there since 4.0.x :=)
> 
> 
> On Wed, Jan 24, 2018 at 12:00 PM, Lev Veyde  > wrote:
> The oVirt Project is pleased to announce the availability of the oVirt 4.1.9 
> release, as of January 24th, 2017
> 
> This update is the ninth in a series of stabilization updates to the 4.1
> series.
> 
> Please note that no further updates will be issued for the 4.1 series.
> We encourage users to upgrade to 4.2 series to receive new features and 
> updates.
>  
> This release is available now for:
> * Red Hat Enterprise Linux 7.4 or later
> * CentOS Linux (or similar) 7.4 or later
>  
> This release supports Hypervisor Hosts running:
> * Red Hat Enterprise Linux 7.4 or later
> * CentOS Linux (or similar) 7.4 or later
> * oVirt Node 4.1
>  
> See the release notes [1] for installation / upgrade instructions and
> a list of new features and bugs fixed.
>  
> Notes:
> - oVirt Appliance is already available
> - oVirt Live is already available [2]
> - oVirt Node will be available soon [2]
> 
> Additional Resources:
> * Read more about the oVirt 4.1.9 release 
> highlights:http://www.ovirt.org/release/4.1.9/ 
> 
> * Get more oVirt Project updates on Twitter: https://twitter.com/ovirt 
> 
> * Check out the latest project news on the oVirt 
> blog:http://www.ovirt.org/blog/ 
>  
> [1] http://www.ovirt.org/release/4.1.9/ 
> [2] http://resources.ovirt.org/pub/ovirt-4.1/iso/ 
> 
> 
> -- 
> 
> LEV VEYDE
> SOFTWARE ENGINEER, RHCE | RHCVA | MCITP
> Red Hat Israel
> 
>  
> l...@redhat.com  | lve...@redhat.com 
>   
> TRIED. TESTED. TRUSTED. 
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users 
> 
> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] OVS not running / logwatch error after upgrade from 4.0.6 to 4.1.8

2018-01-19 Thread Darrell Budic
OVS is an optional tech preview in 4.1.x; you don’t need it. The logwatch 
errors are annoying, though…

I think I created the directory to avoid the errors, but I forget exactly what it 
was, sorry.
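
If I had to guess, it was something like this (untested, from memory; the path is
taken from the logrotate error above, and the tmpfiles.d entry is just so it
survives a reboot, since /var/run is tmpfs on EL7):

# mkdir -p /var/run/openvswitch
# echo 'd /run/openvswitch 0755 root root -' > /etc/tmpfiles.d/openvswitch.conf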

> From: Derek Atkins 
> Subject: [ovirt-users] OVS not running / logwatch error after upgrade from 
> 4.0.6 to 4.1.8
> Date: January 19, 2018 at 10:44:56 AM CST
> To: users
> 
> Hi,
> I recently upgraded my 1-host ovirt deployment from 4.0.6 to 4.1.8.
> Since then, the host has been reporting a cron.daily error:
> 
> /etc/cron.daily/logrotate:
> 
> logrotate_script: line 4: cd: /var/run/openvswitch: No such file or directory
> 
> This isn't surprising, since:
> 
> # systemctl status openvswitch
> ● openvswitch.service - Open vSwitch
>   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled;
> vendor preset: disabled)
>   Active: inactive (dead)
> 
> The host was just upgraded by "yum update".
> Was there anything special that needed to happen after the update?
> Do I *NEED* OVS running?
> The VMs all seem to be behaving properly.
> 
> Thanks,
> 
> -derek
> 
> -- 
>   Derek Atkins 617-623-3745
>   de...@ihtfp.com www.ihtfp.com
>   Computer and Internet Security Consultant
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problems with some vms

2018-01-14 Thread Darrell Budic
What version of gluster are you running? I’ve seen a few of these since moving 
my storage cluster to 12.3, but still haven’t been able to determine what’s 
causing it. It seems to be happening most often on VMs that haven’t been switched 
over to libgfapi mounts yet, but even one of those has paused once so far. They 
generally restart fine from the GUI, and nothing seems to need healing.

> From: Endre Karlson 
> Subject: [ovirt-users] Problems with some vms
> Date: January 14, 2018 at 12:55:45 PM CST
> To: users
> 
> Hi, we are getting some errors with some of our vms in a 3 node server setup.
> 
> 2018-01-14 15:01:44,015+0100 INFO  (libvirt/events) [virt.vm] 
> (vmId='2c34f52d-140b-4dbe-a4bd-d2cb467b0b7c') abnormal vm stop device 
> virtio-disk0  error eother (vm:4880)
> 
> We are running glusterfs for shared storage.
> 
> I have tried setting global maintenance on the first server and then issuing 
> a 'hosted-engine --vm-start' but that leads to nowhere.
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Some major problems after 4.2 upgrade, could really use some assistance

2018-01-11 Thread Darrell Budic
Were you running gluster under your shared storage? If so, you probably need to 
set up ganesha NFS yourself.

If not, check your ha-agent logs, make sure it’s mounting the storage 
properly, and look for errors. Good luck!
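
If it helps, these are the usual places I look first (paths and unit names are
from my 4.1 hosts, so treat them as a sketch):

# hosted-engine --vm-status
# less /var/log/ovirt-hosted-engine-ha/agent.log
# less /var/log/ovirt-hosted-engine-ha/broker.log
# journalctl -u ovirt-ha-agent -u ovirt-ha-broker -n 200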

> From: Jayme 
> Subject: Re: [ovirt-users] Some major problems after 4.2 upgrade, could 
> really use some assistance
> Date: January 11, 2018 at 12:28:32 PM CST
> To: Martin Sivak; users@ovirt.org
> 
> This is becoming critical for me, does anyone have any ideas or 
> recommendations on what I can do to recover access to hosted VM?  As of right 
> now I have three hosts that are fully updated, they have the 4.2  repo and a 
> full yum update was performed on them, there are no new updates to apply.  
> The hosted engine had updates as well as a full and complete engine-setup, 
> but did not return after being shut down.  There must be some way I can get 
> the engine running again?  Please
> 
> On Thu, Jan 11, 2018 at 8:24 AM, Jayme  > wrote:
> The hosts have all ready been fully updated with 4.2 packages though.
> 
> ex. 
> 
> ovirt-host.x86_64  
> 4.2.0-1.el7.centos   @ovirt-4.2
> ovirt-host-dependencies.x86_64 
> 4.2.0-1.el7.centos   @ovirt-4.2
> ovirt-host-deploy.noarch   
> 1.7.0-1.el7.centos   @ovirt-4.2
> ovirt-hosted-engine-ha.noarch  
> 2.2.2-1.el7.centos   @ovirt-4.2
> ovirt-hosted-engine-setup.noarch   
> 2.2.3-1.el7.centos   @ovirt-4.2
> 
> On Thu, Jan 11, 2018 at 8:16 AM, Martin Sivak  > wrote:
> Hi,
> 
> yes, you need to upgrade the hosts. Just take the
> ovirt-hosted-engine-ha and ovirt-hosted-engine-setup packages from
> ovirt 4.2 repositories.
> 
> Martin
> 
> On Thu, Jan 11, 2018 at 11:40 AM, Jayme  > wrote:
> > How do I upgrade the hosted engine packages when I can't reach it or do you
> > mean upgrade the hosts if so how exactly do I do that. As for the missing VM
> > it appears that the disk image is there but it's missing XML file I have no
> > idea why or how to recreate it.
> >
> > On Jan 11, 2018 4:43 AM, "Martin Sivak"  > > wrote:
> >>
> >> Hi,
> >>
> >> you hit one known issue we already have fixes for (4.1 hosts with 4.2
> >> engine):
> >> https://gerrit.ovirt.org/#/q/status:open+project:ovirt-hosted-engine-ha+branch:v2.1.z+topic:ovf_42_for_41
> >>  
> >> 
> >>
> >> You can try hotfixing it by upgrading hosted engine packages to 4.2 or
> >> applying the patches manually and installing python-lxml.
> >>
> >> I am not sure what happened to your other VM.
> >>
> >> Best regards
> >>
> >> Martin Sivak
> >>
> >> On Thu, Jan 11, 2018 at 6:15 AM, Jayme  >> > wrote:
> >> > I performed Ovirt 4.2 upgrade on a 3 host cluster with NFS shared
> >> > storage.
> >> > The shared storage is mounted from one of the hosts.
> >> >
> >> > I upgraded the hosted engine first, downloading the 4.2 rpm, doing a yum
> >> > update then engine setup which seemed to complete successfully, at the
> >> > end
> >> > it powered down the hosted VM but it never came back up.  I was unable
> >> > to
> >> > start it.
> >> >
> >> > I proceeded to upgrade the three hosts, ovirt 4.2 rpm and a full yum
> >> > update.
> >> > I also rebooted each of the three hosts.
> >> >
> >> > After some time the hosts did come back and almost all of the VMs are
> >> > running again and seem to be working ok with the exception of two:
> >> >
> >> > 1. The hosted VM still will not start, I've tried everything I can think
> >> > of.
> >> >
> >> > 2. A VM that I know existed is not running and does not appear to exist,
> >> > I
> >> > have no idea where it is or how to start it.
> >> >
> >> > 1. Hosted engine
> >> >
> >> > From one of the hosts I get a weird error trying to start it:
> >> >
> >> > # hosted-engine --vm-start
> >> > Command VM.getStats with args {'vmID':
> >> > '4013c829-c9d7-4b72-90d5-6fe58137504c'} failed:
> >> > (code=1, message=Virtual machine does not exist: {'vmId':
> >> > u'4013c829-c9d7-4b72-90d5-6fe58137504c'})
> >> >
> >> > From the two other hosts I do not get the same error as above, sometimes
> >> > it
> >> > appears to start but --vm-status shows errors such as:  Engine status
> >> > : {"reason": "failed liveliness check", "health": "bad", "vm": "up",
> >> > "detail": "Up"}
> >> >
> >> > Seeing these errors in syslog:
> >> >
> >> > Jan 11 01:06:30 host0 libvirtd: 2018-01-11 05:06:30.473+: 

Re: [ovirt-users] Non-responsive host, VM's are still running - how to resolve?

2017-11-14 Thread Darrell Budic
Try restarting vdsmd from the shell, “systemctl restart vdsmd”.
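
If you want to see why it went non-responsive before kicking it, something simple
like this first (nothing fancy, just status and recent logs):

# systemctl status vdsmd -l
# journalctl -u vdsmd -n 200
# systemctl restart vdsmd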


> From: Artem Tambovskiy 
> Subject: [ovirt-users] Non-responsive host, VM's are still running - how to 
> resolve?
> Date: November 14, 2017 at 11:23:32 AM CST
> To: users
> 
> Apparently, I lost the host which was running hosted-engine and another 4 
> VMs exactly during migration of the second host from bare-metal to second host 
> in the cluster. For some reason the first host entered the "Non responsive" state. 
> The interesting thing is that hosted-engine and all the other VMs are up and 
> running, so it's like a communication problem between hosted-engine and the host. 
> 
> The engine.log at hosted-engine is full of following messages:
> 
> 2017-11-14 17:06:43,158Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting to ovirt2/80.239.162.106 
> 2017-11-14 17:06:43,159Z ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] 
> (DefaultQuartzScheduler9) [50938c3] Command 'GetAllVmStatsVDSCommand(HostName 
> = ovirt2.telia.ru , 
> VdsIdVDSCommandParametersBase:{runAsync='true', 
> hostId='3970247c-69eb-4bd8-b263-9100703a8243'})' execution failed: 
> java.net.NoRouteToHostException: No route to host
> 2017-11-14 17:06:43,159Z INFO  
> [org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] 
> (DefaultQuartzScheduler9) [50938c3] Failed to fetch vms info for host 
> 'ovirt2.telia.ru ' - skipping VMs monitoring.
> 2017-11-14 17:06:45,929Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting to ovirt2/80.239.162.106 
> 2017-11-14 17:06:45,930Z ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] 
> (DefaultQuartzScheduler2) [6080f1cc] Command 
> 'GetCapabilitiesVDSCommand(HostName = ovirt2.telia.ru 
> , 
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', 
> hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru 
> ,3970247c-69eb-4bd8-b263-9100703a8243]'})' execution 
> failed: java.net.NoRouteToHostException: No route to host
> 2017-11-14 17:06:45,930Z ERROR 
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
> (DefaultQuartzScheduler2) [6080f1cc] Failure to refresh host 'ovirt2.telia.ru 
> ' runtime info: java.net.NoRouteToHostException: No 
> route to host
> 2017-11-14 17:06:48,933Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting to ovirt2/80.239.162.106 
> 2017-11-14 17:06:48,934Z ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] 
> (DefaultQuartzScheduler6) [1a64dfea] Command 
> 'GetCapabilitiesVDSCommand(HostName = ovirt2.telia.ru 
> , 
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', 
> hostId='3970247c-69eb-4bd8-b263-9100703a8243', vds='Host[ovirt2.telia.ru 
> ,3970247c-69eb-4bd8-b263-9100703a8243]'})' execution 
> failed: java.net.NoRouteToHostException: No route to host
> 2017-11-14 17:06:48,934Z ERROR 
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
> (DefaultQuartzScheduler6) [1a64dfea] Failure to refresh host 'ovirt2.telia.ru 
> ' runtime info: java.net.NoRouteToHostException: No 
> route to host
> 2017-11-14 17:06:50,931Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting to ovirt2/80.239.162.106 
> 2017-11-14 17:06:50,932Z ERROR 
> [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] 
> (DefaultQuartzScheduler4) [6b19d168] Command 'SpmStatusVDSCommand(HostName = 
> ovirt2.telia.ru , 
> SpmStatusVDSCommandParameters:{runAsync='true', 
> hostId='3970247c-69eb-4bd8-b263-9100703a8243', 
> storagePoolId='5a044257-02ec-0382-0243-01f2'})' execution failed: 
> java.net.NoRouteToHostException: No route to host
> 2017-11-14 17:06:50,939Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting to ovirt2/80.239.162.106 
> 2017-11-14 17:06:50,940Z ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] 
> (DefaultQuartzScheduler4) [6b19d168] IrsBroker::Failed::GetStoragePoolInfoVDS
> 2017-11-14 17:06:50,940Z ERROR 
> [org.ovirt.engine.core.vdsbroker.irsbroker.GetStoragePoolInfoVDSCommand] 
> (DefaultQuartzScheduler4) [6b19d168] Command 'GetStoragePoolInfoVDSCommand( 
> GetStoragePoolInfoVDSCommandParameters:{runAsync='true', 
> storagePoolId='5a044257-02ec-0382-0243-01f2', 
> ignoreFailoverLimit='true'})' execution failed: IRSProtocolException: 
> 2017-11-14 17:06:51,937Z INFO  
> [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] 
> Connecting 

Re: [ovirt-users] Enabling libgfapi disk access with oVirt 4.2

2017-11-09 Thread Darrell Budic
You do need to stop the VMs and start them again, not just issue a reboot. I 
haven’t tried under 4.2 yet, but it works that way in 4.1.6 for me.
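
For reference, the rough sequence that worked for me on 4.1 (hedging a bit since
I haven't done it on 4.2 yet; adjust --cver to your cluster level):

# engine-config -s LibgfApiSupported=true --cver=4.1
# systemctl restart ovirt-engine

then power each VM off completely from the engine and start it again (a guest
reboot isn't enough), and check the disk source in virsh dumpxml on the host
afterwards.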

> From: Alessandro De Salvo 
> Subject: Re: [ovirt-users] Enabling libgfapi disk access with oVirt 4.2
> Date: November 9, 2017 at 2:35:01 AM CST
> To: users@ovirt.org
> 
> Hi again,
> 
> OK, tried to stop all the vms, except the engine, set engine-config -s 
> LibgfApiSupported=true (for 4.2 only) and restarted the engine.
> 
> When I tried restarting the VMs they are still not using gfapi, so it does 
> not seem to help.
> 
> Cheers,
> 
> 
> 
> Alessandro
> 
> 
> 
> 
> Il 09/11/17 09:12, Alessandro De Salvo ha scritto:
>> Hi,
>> where should I enable gfapi via the UI?
>> The only command I tried was engine-config -s LibgfApiSupported=true but the 
>> result is what is shown in my output below, so it’s set to true for v4.2. Is 
>> it enough?
>> I’ll try restarting the engine. Is it really needed to stop all the VMs and 
>> restart them all? Of course this is a test setup and I can do it, but for 
>> production clusters in the future it may be a problem.
>> Thanks,
>> 
>>Alessandro
>> 
>> Il giorno 09 nov 2017, alle ore 07:23, Kasturi Narra > > ha scritto:
>> 
>>> Hi ,
>>> 
>>> The procedure to enable gfapi is below.
>>> 
>>> 1) stop all the vms running
>>> 2) Enable gfapi via UI or using engine-config command
>>> 3) Restart ovirt-engine service
>>> 4) start the vms.
>>> 
>>> Hope you have not missed any !!
>>> 
>>> Thanks
>>> kasturi 
>>> 
>>> On Wed, Nov 8, 2017 at 11:58 PM, Alessandro De Salvo 
>>> >> > wrote:
>>> Hi,
>>> 
>>> I'm using the latest 4.2 beta release and want to try the gfapi access, but 
>>> I'm currently failing to use it.
>>> 
>>> My test setup has an external glusterfs cluster v3.12, not managed by oVirt.
>>> 
>>> The compatibility flag is correctly showing gfapi should be enabled with 
>>> 4.2:
>>> 
>>> # engine-config -g LibgfApiSupported
>>> LibgfApiSupported: false version: 3.6
>>> LibgfApiSupported: false version: 4.0
>>> LibgfApiSupported: false version: 4.1
>>> LibgfApiSupported: true version: 4.2
>>> 
>>> The data center and cluster have the 4.2 compatibility flags as well.
>>> 
>>> However, when starting a VM with a disk on gluster I can still see the disk 
>>> is mounted via fuse.
>>> 
>>> Any clue of what I'm still missing?
>>> 
>>> Thanks,
>>> 
>>> 
>>>Alessandro
>>> 
>>> ___
>>> Users mailing list
>>> Users@ovirt.org 
>>> http://lists.ovirt.org/mailman/listinfo/users 
>>> 
>>> 
>> 
>> 
>> ___
>> Users mailing list
>> Users@ovirt.org 
>> http://lists.ovirt.org/mailman/listinfo/users 
>> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] How to best view remote console via macosX

2017-11-03 Thread Darrell Budic
https://www.ovirt.org/develop/release-management/features/virt/novnc-console/

> From: Jayme Fall <ja...@silverorange.com>
> Subject: Re: [ovirt-users] How to best view remote console via macosX
> Date: November 3, 2017 at 5:24:24 PM CDT
> To: Darrell Budic
> 
> How is web-based VNC supported?  Do you need to install any specific 
> components? When I launch a console from the ovirt admin it just prompts me to 
> download a VV file. 
> 
>> On Nov 3, 2017, at 5:54 PM, Darrell Budic <bu...@onholyground.com 
>> <mailto:bu...@onholyground.com>> wrote:
>> 
>> I find using the web based VNC is the simplest from my mac. You can extract 
>> data from a console.vv file and open it with any VNC software on a mac, even 
>> Screen Sharing, but you have to enter the IP & port manually.  I’m not aware 
>> of any spice solutions at this moment, but i haven’t looked for one in a 
>> while.
>> 
>>> From: Jayme Fall <ja...@silverorange.com <mailto:ja...@silverorange.com>>
>>> Subject: [ovirt-users] How to best view remote console via macosX
>>> Date: November 3, 2017 at 3:06:29 PM CDT
>>> To: users@ovirt.org <mailto:users@ovirt.org>
>>> 
>>> I’m wondering what the best method is to get oVirt console support working 
>>> from a macOS device.  I have tried opening console.vv files using a VNC 
>>> client as well as remote viewer and have not had any luck thus far.  oVirt 
>>> is version 4.1.4
>>> 
>>> Thanks!
>>> ___
>>> Users mailing list
>>> Users@ovirt.org <mailto:Users@ovirt.org>
>>> http://lists.ovirt.org/mailman/listinfo/users 
>>> <http://lists.ovirt.org/mailman/listinfo/users>
>> 
> 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] How to best view remote console via macosX

2017-11-03 Thread Darrell Budic
I find the web-based VNC is the simplest from my Mac. You can extract the 
data from a console.vv file and open it with any VNC software on a Mac, even 
Screen Sharing, but you have to enter the IP & port manually.  I’m not aware of 
any SPICE solutions at this moment, but I haven’t looked for one in a while.
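
The .vv file is just an INI-style text file, so something like this pulls out
what you need to type into the VNC client (key names from memory, so treat it as
a sketch):

# grep -E '^(type|host|port|tls-port|password)=' console.vv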

> From: Jayme Fall 
> Subject: [ovirt-users] How to best view remote console via macosX
> Date: November 3, 2017 at 3:06:29 PM CDT
> To: users@ovirt.org
> 
> I’m wondering what the best method is to get oVirt console support working 
> from a macOS device.  I have tried opening console.vv files using a VNC client 
> as well as remote viewer and have not had any luck thus far.  oVirt is 
> version 4.1.4
> 
> Thanks!
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now available for testing

2017-11-01 Thread Darrell Budic
From: Greg Sheremeta 
> Subject: Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now 
> available for testing
> Date: November 1, 2017 at 11:21:52 AM CDT
> To: Robert Story
> Cc: FERNANDO FREDIANI; users
> 
> 
> I'd argue
> that oVirt, particularly the admin portal, is for a much more
> technical audience. I think right-click should stay for admin portal.
> 
> What are people's opinions on an "actions" button on the far right of the 
> tables?
> See #6 here:
> http://www.patternfly.org/pattern-library/content-views/table-view/#/design 
> 
> 
> Would that be an adequate substitute for right-clicking?

Mostly, but I’d prefer the options to be available on the left, by the name of 
the VM, as that’s what I’m likely sorting on and where it will be 
easier to be sure I’ve got the right one when I select an action on a row that 
might not be highlighted yet.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now available for testing

2017-10-31 Thread Darrell Budic
> This is UXD guideline around web applications.
> 
> Some will like it more, and some won't understand why they don't see the 
> browser menu they expect when they right click. 
> 
> There is no "one size fits all here", but that's the guideline we followed. 

That’s very true. I haven’t read the Patternfly guidelines, but is there an 
option for a modifier click? Or I might suggest that the machine icon could be 
overloaded with the “old” right click menu on click, for that little bit of 
extra functionality?







> On Oct 31, 2017 8:01 PM, "FERNANDO FREDIANI" <fernando.fredi...@upx.com 
> <mailto:fernando.fredi...@upx.com>> wrote:
> Question is: who is the user ? There are different types of them for 
> different proposes.
> 
> Fernando
> 
> On 31/10/2017 15:57, Oved Ourfali wrote:
>> As mentioned earlier, this is one motivation but not the only one. You see 
>> right click less and less in web applications, as it isn't considered a good 
>> user experience. This is also the patternfly guideline (patternfly is a 
>> framework we heavily use throughout the application). 
>> 
>> We will however consider bringing this back if there will be high demand. 
>> 
>> Thanks for the feedback!
>> Oved 
>> 
>> On Oct 31, 2017 7:50 PM, "Darrell Budic" <bu...@onholyground.com 
>> <mailto:bu...@onholyground.com>> wrote:
>> Agreed. I use the right click functionality all the time and will miss it. 
>> With 70+ VMs, I may check status in a mobile interface, but I’m never going 
>> to use it for primary work. Please prioritize ease of use on Desktop over 
>> Mobile!
>> 
>> 
>>> From: FERNANDO FREDIANI <fernando.fredi...@upx.com 
>>> <mailto:fernando.fredi...@upx.com>>
>>> Subject: Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now 
>>> available for testing
>>> Date: October 31, 2017 at 11:59:20 AM CDT
>>> To: users@ovirt.org <mailto:users@ovirt.org>
>>> 
>>> 
>>> On 31/10/2017 13:43, Alexander Wels wrote:
>>>>> 
>>>>> Will the right click dialog be available in the final release? Because,
>>>>> currently in 4.2 we need to go at the up right corner to interact with
>>>>> object (migrate, maintenance...)
>>>>> 
>>>> Short answer: No, we removed it on purpose.
>>>> 
>>>> Long answer: No, here are the reasons why:
>>>> - We are attempting to get the UI more mobile friendly, and while its not 
>>>> 100%
>>>> there yet, it is actually quite useable on a mobile device now. Mobile 
>>>> devices
>>>> don't have a right click, so hiding functionality in there would make no
>>>> sense.
>>> Please don't put mobile usage over Desktop usage. While mobile usage is 
>>> nice to have in "certain" situations, in real day-by-day operation nobody 
>>> uses mobile devices to do their deployments and manage their large 
>>> environments. Having both options where you can switch between them is 
>>> nice, but if something should prevail it should always be Desktop. We are not 
>>> talking about a stock trading interface or something that needs that level of 
>>> flexibility and mobility to do static things anytime, anywhere.
>>> 
>>> So I beg you to consider well before removing things which are pretty useful 
>>> for day-to-day, real management usage because of a new trend or buzz 
>>> stuff.
>>> Right click has always been popular on Desktop environments and will be for 
>>> quite a while.
>>>> - You can now right click and get the browsers menu instead of ours and you
>>>> can do things like copy from the menu.
>>>> - We replicated all the functionality from the menu in the buttons/kebab 
>>>> menu
>>>> available on the right. Our goal was to have all the commonly used actions 
>>>> as
>>>> a button, and less often used actions in the kebab to declutter the 
>>>> interface.
>>>> We traded an extra click for some mouse travel
>>>> - Lots of people didn't realize there even was a right click menu because 
>>>> its
>>>> a web interface, and they couldn't find some functionality that was only
>>>> available in the right click menu.
>>>> 
>>>> Now that being said, we are still debating if it was a good move or not. 
>>>> For
>>>> now we want to see how it plays out, if a lot of people want it back, it is
>>>> certainly possi

Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now available for testing

2017-10-31 Thread Darrell Budic
Agreed. I use the right click functionality all the time and will miss it. With 
70+ VMs, I may check status in a mobile interface, but I’m never going to use 
it for primary work. Please prioritize ease of use on Desktop over Mobile!


> From: FERNANDO FREDIANI 
> Subject: Re: [ovirt-users] [ANN] oVirt 4.2.0 First Beta Release is now 
> available for testing
> Date: October 31, 2017 at 11:59:20 AM CDT
> To: users@ovirt.org
> 
> 
> On 31/10/2017 13:43, Alexander Wels wrote:
>>> 
>>> Will the right click dialog be available in the final release? Because,
>>> currently in 4.2 we need to go at the up right corner to interact with
>>> object (migrate, maintenance...)
>>> 
>> Short answer: No, we removed it on purpose.
>> 
>> Long answer: No, here are the reasons why:
>> - We are attempting to get the UI more mobile friendly, and while its not 
>> 100%
>> there yet, it is actually quite useable on a mobile device now. Mobile 
>> devices
>> don't have a right click, so hiding functionality in there would make no
>> sense.
> Please don't put mobile usage over Desktop usage. While mobile usage is nice 
> to have in "certain" situations, in real day-by-day operation nobody uses 
> mobile devices to do their deployments and manage their large environments. 
> Having both options where you can switch between them is nice, but if 
> something should prevail it should always be Desktop. We are not talking about a 
> stock trading interface or something that needs that level of flexibility and 
> mobility to do static things anytime, anywhere.
> 
> So I beg you to consider well before removing things which are pretty useful 
> for day-to-day, real management usage because of a new trend or buzz 
> stuff.
> Right click has always been popular on Desktop environments and will be for quite 
> a while.
>> - You can now right click and get the browsers menu instead of ours and you
>> can do things like copy from the menu.
>> - We replicated all the functionality from the menu in the buttons/kebab menu
>> available on the right. Our goal was to have all the commonly used actions as
>> a button, and less often used actions in the kebab to declutter the 
>> interface.
>> We traded an extra click for some mouse travel
>> - Lots of people didn't realize there even was a right click menu because its
>> a web interface, and they couldn't find some functionality that was only
>> available in the right click menu.
>> 
>> Now that being said, we are still debating if it was a good move or not. For
>> now we want to see how it plays out, if a lot of people want it back, it is
>> certainly possible we will put it back.
>> 
 that is something you are interested in. Its also much faster and better
 than before.
 
> On 31/10/2017 10:13, Sandro Bonazzola wrote:
>> The oVirt Project is pleased to announce the availability of the First
>> Beta Release of oVirt 4.2.0, as of October 31st, 2017
>> 
>> 
>> This is pre-release software. This pre-release should not to be used
>> in production.
>> 
>> Please take a look at our community page[1] to learn how to ask
>> questions and interact with developers and users.All issues or bugs
>> should be reported via oVirt Bugzilla[2].
>> 
>> This update is the first beta release of the 4.2.0 version. This
>> release brings more than 230 enhancements and more than one thousand
>> bug fixes, including more than 380 high or urgent severity fixes, on
>> top of oVirt 4.1 series.
>> 
>> 
>> What's new in oVirt 4.2.0?
>> 
>> * The Administration Portal has been completely redesigned using
>>   Patternfly, a widely adopted standard in web application design.
>>   It now features a cleaner, more intuitive design, for an improved
>>   user experience.
>> * There is an all-new VM Portal for non-admin users.
>> * A new High Performance virtual machine type has been added to the
>>   New VM dialog box in the Administration Portal.
>> * Open Virtual Network (OVN) adds support for Open vSwitch software
>>   defined networking (SDN).
>> * oVirt now supports Nvidia vGPU.
>> * The ovirt-ansible-roles package helps users with common
>>   administration tasks.
>> * Virt-v2v now supports Debian/Ubuntu based VMs.
>> 
>> For more information about these and other features, check out the
>> oVirt 4.2.0 blog post
>> .
>> 
>> 
>> This release is available now on x86_64 architecture for:
>> 
>> * Red Hat Enterprise Linux 7.4 or later
>> 
>> * CentOS Linux (or similar) 7.4 or later
>> 
>> 
>> This release supports Hypervisor Hosts on x86_64 and ppc64le
>> architectures for:
>> 
>> * Red Hat 

Re: [ovirt-users] VM resource allocation and IO Threads

2017-10-30 Thread Darrell Budic
Best explanation I’ve found is 
https://wiki.mikejung.biz/KVM_/_Xen#virtio-blk_iothreads_.28x-data-plane.29  If 
you google a bit, you’ll find some more under QEMU topics, I saw some 
discussion of threads and queues in virtio-scsi, but that seems to be a 
slightly different thing than this setting.

In short, having at least 1 offers advantages for all of a VM’s disks, and if 
you want to be optimal (at the possible expense of extra CPU for IO), use one per 
attached drive. There is (currently) no benefit to having more than 1 thread 
per drive. From what I can tell, if you have more drives than threads, the drives 
share the threads evenly and are statically assigned to a thread. It takes 
effect at QEMU start, so you have to change it with the VM down, or stop and 
start it again.

I currently enable it on all VMs and assign 1 thread per drive on my systems.
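
If you want to double-check what a VM actually got, something like this on the
host it's running on is quick (read-only; replace the VM name):

# virsh -r dumpxml <vm-name> | grep -i iothread

You should see an <iothreads> count for the VM, plus an iothread= attribute on
each disk's driver element once it's in use.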
> From: Gianluca Cecchi 
> Subject: [ovirt-users] VM resource allocation and IO Threads
> Date: October 27, 2017 at 9:26:59 AM CDT
> To: users
> 
> Hello,
> can anyone give any pointer to deeper information about what in subject and 
> the value for "Num Of IO Threads" configuration, best practices and 
> to-be-expected improvements?
> 
> I read also here:
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html-single/virtual_machine_management_guide/#Editing_IO_Threads
>  
> 
> 
> but in some points it seems not so clear to me:
> 
> eg:
> 
> If a virtual machine has more than one disk, you can enable or change the 
> number of IO threads to improve performance.
> 
> but also
> 
> Red Hat recommends using the default number of IO threads, which is 1.
> 
> There is also a note about deactivation and activation of disks: does it mean 
> that even if I poweroff the VM and change its config I have to make this step 
> after?
> 
> Anyone has run benchmarks?
> Does it make sense if my VM has 3 disks to configure 6 IO threads for example?
> Do IO threads map to SCSI controllers inside the guest or what?
> 
> Thanks in advance,
> Gianluca
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Debugging warning messages about bonding mode 4

2017-10-06 Thread Darrell Budic
That looks like the normal state for a LACP bond, but it does record some churn 
(bond renegotiations, I believe). So it probably bounced once or twice coming 
up. Maybe a slow switch, maybe a switch relying on dynamic bonding instead of 
static bonds, and taking longer to establish. 

For the ones with a down link, and this one too, you could ask the network guys 
whether they statically configured the bond, or whether they could; that might 
make it quicker to bring it up.

I don’t think anything updates while the host is in maintenance; you could take 
it out and see what happens :) The bond is lower level though, so it should come 
up if it’s configured properly, and you should be able to see that on the host.

  -Darrell

a bond on one of mine:

# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:0f:53:08:4b:ac
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 14
Partner Mac Address: 64:64:9b:5e:9b:00

Slave Interface: p1p1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0f:53:08:4b:ac
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:0f:53:08:4b:ac
port key: 13
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 64:64:9b:5e:9b:00
oper key: 14
port priority: 127
port number: 8
port state: 63

Slave Interface: p1p2
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0f:53:08:4b:ad
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:0f:53:08:4b:ac
port key: 13
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 64:64:9b:5e:9b:00
oper key: 14
port priority: 127
port number: 7
port state: 63


> From: Gianluca Cecchi 
> Subject: [ovirt-users] Debugging warning messages about bonding mode 4
> Date: October 6, 2017 at 6:28:16 AM CDT
> To: users
> 
> Hello,
> on a 2 nodes cluster in 4.1.6 I have this situation.
> Every node has 3 bonds, each one composed by 2 network adapters and each one 
> of type mode=4
> (actually in setup networks I have configured custom and then the value: 
> "mode=4 miimon=100"
> )
> 
> At this moment only one of the servers has access to FC storage, while the 
> other is currently on maintenance.
> 
> On 2 of the 3 bonds of the active server I get an exclamation point in 
> "Network Interfaces" subtab with this mouseover popup
> 
> Bond is in link aggregation mode (mode 4), but no partner mac has been 
> reported for it
> 
> What is the exact meaning of this message? Do I have to care about (I think 
> so..)?
> What should I report to network guys?
> Eg, one of these two warning bonds status is:
> 
> # cat /proc/net/bonding/bond2
> Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
> 
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2 (0)
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
> 
> 802.3ad info
> LACP rate: slow
> Min links: 0
> Aggregator selection policy (ad_select): stable
> System priority: 65535
> System MAC address: 48:df:37:0c:7f:5a
> Active Aggregator Info:
> Aggregator ID: 5
> Number of ports: 2
> Actor Key: 9
> Partner Key: 6
> Partner Mac Address: b8:38:61:9c:75:80
> 
> Slave Interface: ens2f2
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: 48:df:37:0c:7f:5a
> Slave queue ID: 0
> Aggregator ID: 5
> Actor Churn State: none
> Partner Churn State: none
> Actor Churned Count: 2
> Partner Churned Count: 3
> details actor lacp pdu:
> system priority: 65535
> system mac address: 48:df:37:0c:7f:5a
> port key: 9
> port priority: 255
> port number: 1
> port state: 61
> details partner lacp pdu:
> system priority: 32768
> system mac address: b8:38:61:9c:75:80
> oper key: 6
> port priority: 32768
> port number: 293
> port state: 61
> 
> Slave Interface: ens2f3
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 2
> Permanent HW addr: 48:df:37:0c:7f:5b
> Slave queue ID: 0
> Aggregator ID: 5
> Actor Churn State: none
> Partner 

Re: [ovirt-users] More than one mgmt network possible?

2017-09-11 Thread Darrell Budic
From personal experience, if you want it in the same Cluster as other servers, 
it needs to be on the same mgmt network. If you put it in its own cluster, it 
can have its own mgmt network. The engine needs IP connectivity, obviously.

I have a DC running with 3 clusters, 2 in the same interconnected vlan 100 on 
opposite sides of Chicago, and one with mgmt on vlan 40 in Amsterdam.

 -Darrell

> On Sep 11, 2017, at 4:47 AM, Gianluca Cecchi  
> wrote:
> 
> Hello,
> in site1 I have 2 oVirt hosts with ovirtmgmt configured on vlan167.
> Now I want to add a server that is in site2 where this vlan doesn't arrive.
> I have here a vlan 169 that does routing with the vlan 167 of site1.
> Can I add the host into the same cluster or the only way is to "transport" 
> vlan167 into site2 too?
> 
> Thanks,
> Gianluca
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt-hosted-engine state transition messages

2017-07-23 Thread Darrell Budic
This happened to me again; it started last night, so it was almost a week since 
the last restart. The system was not out of memory, just a bit low, and it may 
have been churning buffers or Java GC; I’m on vacation and didn’t dig into it 
very far. Restarted the engine and it’s happy. DWH was still working, but the 
web interface was a bit slow before the restart. This was on 4.1.3. I added some 
RAM to the Hosted Engine, but it looks like I need to restart it and will 
probably wait until I’m back for that.


> On Jul 18, 2017, at 9:22 AM, Darrell Budic <bu...@onholyground.com> wrote:
> 
> I had some of this going on recently under 4.1.2, started with one or two 
> warning messages, then a flood of them. Did the upgrade to 4.1.3 and haven’t 
> seen it yet, but it’s only been a few days so far. A java process was 
> consuming much CPU, and the DataWarehouse appears to not be collecting data 
> (evidenced by a blank dashboard). My DWH has since recovered as well.
> 
> I forgot to check, but suspect I was low/out of memory on my engine VM, it’s 
> an old one with only 6G allocated currently. Watching for this to happen 
> again, and will confirm RAM utilization and bump up appropriately if it looks 
> like it’s starved for RAM.
> 
> 
>> On Jul 18, 2017, at 5:45 AM, Christophe TREFOIS <christophe.tref...@uni.lu 
>> <mailto:christophe.tref...@uni.lu>> wrote:
>> 
>> I have the same as you on 4.1.0
>> 
>> EngineBadHealth-EngineUp 1 minute later. Sometimes 20 times per day, mostly 
>> on weekends.
>> 
>> Cheers,
>> -- 
>> 
>> Dr Christophe Trefois, Dipl.-Ing.  
>> Technical Specialist / Post-Doc
>> 
>> UNIVERSITÉ DU LUXEMBOURG
>> 
>> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
>> Campus Belval | House of Biomedicine  
>> 6, avenue du Swing 
>> L-4367 Belvaux  
>> T: +352 46 66 44 6124 
>> F: +352 46 66 44 6949  
>> http://www.uni.lu/lcsb <http://www.uni.lu/lcsb>
>>  <https://www.facebook.com/trefex>   <https://twitter.com/Trefex>   
>> <https://plus.google.com/+ChristopheTrefois/>   
>> <https://www.linkedin.com/in/trefoischristophe>   <http://skype:Trefex?call>
>> 
>> 
>> This message is confidential and may contain privileged information. 
>> It is intended for the named recipient only. 
>> If you receive it in error please notify me and permanently delete the 
>> original message and any copies. 
>> 
>>   
>> 
>>> On 17 Jul 2017, at 17:35, Jim Kusznir <j...@palousetech.com 
>>> <mailto:j...@palousetech.com>> wrote:
>>> 
>>> Ok, I've been ignoring this for a long time as the logs were so verbose and 
>>> didn't show anything I could identify as usable debug info.  Recently one 
>>> of my ovirt hosts (currently NOT running the main engine, but a candidate) 
>>> was cycling as much as 40 times a day between "EngineUpBadHealth and 
>>> EngineUp".  Here's the log snippit.  I included some time before and after 
>>> if that's helpful.  In this case, I got an email about bad health at 8:15 
>>> and a restore (engine up) at 8:16.  I see where the messages are sent, but 
>>> I don't see any explanation as to why / what the problem is.
>>> 
>>> BTW: 192.168.8.11 is this computer's physical IP; 192.168.8.12 is the 
>>> computer currently running the engine.  Both are also hosting the gluster 
>>> store (eg, I have 3 hosts, all are participating in the gluster replica 
>>> 2+arbitrator).
>>> 
>>> I'd appreciate it if someone could shed some light on why this keeps 
>>> happening!
>>> 
>>> --Jim
>>> 
>>> 
>>> MainThread::INFO::2017-07-17 
>>> 08:12:06,230::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>>  Reloading vm.conf from the shared storage domain
>>> MainThread::INFO::2017-07-17 
>>> 08:12:06,230::config::412::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>>  Trying to get a fresher copy of vm configuration from the OVF_STORE
>>> MainThread::INFO::2017-07-17 
>>> 08:12:08,877::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>>  Found OVF_STORE: imgUUID:e10c90a5-4d9c-4e18-b6f7-ae8f0cdf4f57, 
>>> volUUID:a9754d40-eda1-44d7-ac92-76a228f9f1ac
>>> MainThread::INFO::2017-07-17 
>>> 08:12:09,432::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>>  Found OVF_STORE: imgUUID:f22829ab-9fd5-415a-9a8f-809d3f

Re: [ovirt-users] ovirt-hosted-engine state transition messages

2017-07-18 Thread Darrell Budic
I had some of this going on recently under 4.1.2; it started with one or two 
warning messages, then a flood of them. I did the upgrade to 4.1.3 and haven’t 
seen it yet, but it’s only been a few days so far. A java process was consuming 
a lot of CPU, and the DataWarehouse appeared not to be collecting data (evidenced 
by a blank dashboard). My DWH has since recovered as well.

I forgot to check, but suspect I was low/out of memory on my engine VM, it’s an 
old one with only 6G allocated currently. Watching for this to happen again, 
and will confirm RAM utilization and bump up appropriately if it looks like 
it’s starved for RAM.
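
For anyone watching for the same thing, I'm not checking anything fancier on the
engine VM than roughly:

# free -m
# ps aux --sort=-%mem | head -n 5

and bumping the VM's memory in the GUI if java/postgres look squeezed.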


> On Jul 18, 2017, at 5:45 AM, Christophe TREFOIS  
> wrote:
> 
> I have the same as you on 4.1.0
> 
> EngineBadHealth-EngineUp 1 minute later. Sometimes 20 times per day, mostly 
> on weekends.
> 
> Cheers,
> -- 
> 
> Dr Christophe Trefois, Dipl.-Ing.  
> Technical Specialist / Post-Doc
> 
> UNIVERSITÉ DU LUXEMBOURG
> 
> LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
> Campus Belval | House of Biomedicine  
> 6, avenue du Swing 
> L-4367 Belvaux  
> T: +352 46 66 44 6124 
> F: +352 46 66 44 6949  
> http://www.uni.lu/lcsb 
>        
>    
>    
> 
> 
> This message is confidential and may contain privileged information. 
> It is intended for the named recipient only. 
> If you receive it in error please notify me and permanently delete the 
> original message and any copies. 
> 
>   
> 
>> On 17 Jul 2017, at 17:35, Jim Kusznir > > wrote:
>> 
>> Ok, I've been ignoring this for a long time as the logs were so verbose and 
>> didn't show anything I could identify as usable debug info.  Recently one of 
>> my ovirt hosts (currently NOT running the main engine, but a candidate) was 
>> cycling as much as 40 times a day between "EngineUpBadHealth and EngineUp".  
>> Here's the log snippit.  I included some time before and after if that's 
>> helpful.  In this case, I got an email about bad health at 8:15 and a 
>> restore (engine up) at 8:16.  I see where the messages are sent, but I don't 
>> see any explanation as to why / what the problem is.
>> 
>> BTW: 192.168.8.11 is this computer's physical IP; 192.168.8.12 is the 
>> computer currently running the engine.  Both are also hosting the gluster 
>> store (eg, I have 3 hosts, all are participating in the gluster replica 
>> 2+arbitrator).
>> 
>> I'd appreciate it if someone could shed some light on why this keeps 
>> happening!
>> 
>> --Jim
>> 
>> 
>> MainThread::INFO::2017-07-17 
>> 08:12:06,230::config::485::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
>>  Reloading vm.conf from the shared storage domain
>> MainThread::INFO::2017-07-17 
>> 08:12:06,230::config::412::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>  Trying to get a fresher copy of vm configuration from the OVF_STORE
>> MainThread::INFO::2017-07-17 
>> 08:12:08,877::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>  Found OVF_STORE: imgUUID:e10c90a5-4d9c-4e18-b6f7-ae8f0cdf4f57, 
>> volUUID:a9754d40-eda1-44d7-ac92-76a228f9f1ac
>> MainThread::INFO::2017-07-17 
>> 08:12:09,432::ovf_store::103::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan)
>>  Found OVF_STORE: imgUUID:f22829ab-9fd5-415a-9a8f-809d3f7887d4, 
>> volUUID:9f4760ee-119c-412a-a1e8-49e73e6ba929
>> MainThread::INFO::2017-07-17 
>> 08:12:09,925::ovf_store::112::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>  Extracting Engine VM OVF from the OVF_STORE
>> MainThread::INFO::2017-07-17 
>> 08:12:10,324::ovf_store::119::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF)
>>  OVF_STORE volume path: 
>> /rhev/data-center/mnt/glusterSD/192.168.8.11:_engine/c0acdefb-7d16-48ec-9d76-659b8fe33e2a/images/f22829ab-9fd5-415a-9a8f-809d3f7887d4/9f4760ee-119c-412a-a1e8-49e73e6ba929
>>  
>> MainThread::INFO::2017-07-17 
>> 08:12:10,696::config::431::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>  Found an OVF for HE VM, trying to convert
>> MainThread::INFO::2017-07-17 
>> 08:12:10,704::config::436::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store)
>>  Got vm.conf from OVF_STORE
>> MainThread::INFO::2017-07-17 
>> 08:12:10,705::states::426::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>>  Engine vm running on localhost
>> MainThread::INFO::2017-07-17 
>> 08:12:10,714::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm)
>>  Initializing VDSM
>> MainThread::INFO::2017-07-17 
>> 

[ovirt-users] vdsm (4.1) restarts glusterd when activating a node, even if it's already running

2017-07-02 Thread Darrell Budic
Upgrading some nodes today, I noticed that vdsmd restarts glusterd on a node 
when it activates it. This is causing a short break in healing when the shd 
gets disconnected, forcing some extra healing when the healing process reports 
“Transport Endpoint Disconnected” (N/A in the oVirt GUI).

This is on a converged cluster (3 nodes, gluster replica volume across all 3, 
ovirt-engine running elsewhere). CentOS 7 install, just upgraded to oVirt 
4.1.2, running gluster 3.10 from the CentOS SIG.

The process I’m observing:

Place a node into maintenance via GUI
Update node from command line
Reboot node (kernel update)
Watch gluster heal itself after reboot
Activate node in GUI
gluster is completely stopped on this node
gluster is started on this node
healing begins again, but isn’t working
“gluster vol heal  info” reports this node’s information not available 
because “Transport endpoint not connected”.
This clears up in 5-10 minutes, then volume heals normally

Would someone with a similar setup want to check this and see if it’s something 
specific to my nodes, or just a general problem with the way it’s restarting 
gluster? I’m looking for a little confirmation before I file a bug report on it.

Or would a dev want to comment on why it stops and starts gluster, instead of a 
restart, which would presumably leave the brick processes and shd running and 
not cause this effect?

Thanks,

  -Darrell
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] vdsm changing disk scheduler when starting, configurable?

2017-07-02 Thread Darrell Budic
It seems vdsmd under 4.1.x (or something under it’s control) changes the disk 
schedulers when it starts or a host node is activated, and I’d like to avoid 
this. Is it preventable? Or configurable anywhere? This was probably happening 
under earlier version, but I just noticed it while upgrading some converged 
boxes today.

It likes to set deadline, which I understand is the RHEL default for centos 7 
on non SATA disks. But I’d rather have NOOP on my SSDs because SSDs, and NOOP 
on my SATA spinning platters because ZFS does it’s own scheduling, and running 
anything other than NOOP can cause increased CPU utilization for no gain. It’s 
also fighting ZFS, which tires to set NOOP on whole disks it controls, and my 
kernel command line setting.

Thanks,

  -Darrell
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Very poor GlusterFS performance

2017-06-19 Thread Darrell Budic
Chris-

You probably need to head over to gluster-us...@gluster.org 
 for help with performance issues.

That said, what kind of performance are you getting, via some form or testing 
like bonnie++ or even dd runs? Raw bricks vs gluster performance is useful to 
determine what kind of performance you’re actually getting.

Beyond that, I’d recommend dropping the arbiter bricks and re-adding them as 
full replicas, they can’t serve distributed data in this configuration and may 
be slowing things down on you. If you’ve got a storage network setup, make sure 
it’s using the largest MTU it can, and consider adding/testing these settings 
that I use on my main storage volume:

performance.io-thread-count: 32
client.event-threads: 8
server.event-threads: 3
performance.stat-prefetch: on

Good luck,

  -Darrell


> On Jun 19, 2017, at 9:46 AM, Chris Boot  wrote:
> 
> Hi folks,
> 
> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
> 6 bricks, which themselves live on two SSDs in each of the servers (one
> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
> SSDs. Connectivity is 10G Ethernet.
> 
> Performance within the VMs is pretty terrible. I experience very low
> throughput and random IO is really bad: it feels like a latency issue.
> On my oVirt nodes the SSDs are not generally very busy. The 10G network
> seems to run without errors (iperf3 gives bandwidth measurements of >=
> 9.20 Gbits/sec between the three servers).
> 
> To put this into perspective: I was getting better behaviour from NFS4
> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
> feel right at all.
> 
> My volume configuration looks like this:
> 
> Volume Name: vmssd
> Type: Distributed-Replicate
> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet6
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard-block-size: 128MB
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
> 
> I would really appreciate some guidance on this to try to improve things
> because at this rate I will need to reconsider using GlusterFS altogether.
> 
> Cheers,
> Chris
> 
> -- 
> Chris Boot
> bo...@bootc.net
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Lost our HostedEngineVM

2017-03-22 Thread Darrell Budic
From a hosted engine host shell, it’s:

hosted-engine —vm-start

hosted-engine —vm-status
 is also useful. If you restored your storage (that include the hosted engine 
storage) after rebooting that host, you should try restarting ovirt-ha-agent & 
ovirt-ha-broker, or just restart the machine and see if it mounts it properly.

 
> On Mar 21, 2017, at 4:58 PM, Matt Emma  wrote:
> 
> We’re in a bit of a panic mode, so excuse any shortness. 
>  
> We had a storage failure. We rebooted a VMHost that had the hostedengine VM - 
> The HostedENgine did not try to move to the other hosts. We’ve since restored 
> storage and we are able to successfully restart the paused VMs. We know the 
> HostedEngine’s VM ID is there a way we can force load it from the mounted 
> storage? 
>  
> -Matt 
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users 
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt 4 and 10GbE NICs

2017-03-14 Thread Darrell Budic

> On Mar 14, 2017, at 7:54 AM, FERNANDO FREDIANI  
> wrote:
> 
> Isn't the traffic shown on the dashboard based in 1Gbps always, even if the 
> hosts have 10Gb interfaces ?
> 

Yep, all dirt interfaces show as 1Gb.

> Is there anywhere in oVirt config files or Database that you can tell to the 
> dashboard to consider 10Gb instead of 1Gb for those cases ?
> 
> 

Not that I know of, but it doesn’t affect that available performance, it’s just 
visible.

I’ve gotten ~3.5Gbps out of iperf with no appreciable tuning to a VM, so it’s 
definitely possible to get more speed out of them.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Darrell Budic
Why are you using an arbitrator if all your HW configs are identical? I’d use a 
true replica 3 in this case.

Also in my experience with gluster and vm hosting, the ZIL/slog degrades write 
performance unless it’s a truly dedicated disk. But I have 8 spinners backing 
my ZFS volumes, so trying to share a sata disk wasn’t a good zil. If yours is 
dedicated SAS, keep it, if it’s SATA, try testing without it.

You don’t have compression enabled on your zfs volume, and I’d recommend 
enabling relatime on it. Depending on the amount of RAM in these boxes, you 
probably want to limit your zfs arc size to 8G or so (1/4 total ram or less). 
Gluster just works volumes hard during a rebuild, what’s the problem you’re 
seeing? If it’s affecting your VMs, using shading and tuning client & server 
threads can help avoid interruptions to your VMs while repairs are running. If 
you really need to limit it, you can use cgroups to keep it from hogging all 
the CPU, but it takes longer to heal, of course. There are a couple older posts 
and blogs about it, if you go back a while.


> On Mar 3, 2017, at 9:02 AM, Arman Khalatyan  wrote:
> 
> The problem itself is not the streaming data performance., and also dd zero 
> does not help much in the production zfs running with compression.
> the main problem comes when the gluster is starting to do something with 
> that, it is using xattrs, probably accessing extended attributes inside the 
> zfs is slower than XFS.
> Also primitive find file or ls -l in the (dot)gluster folders takes ages: 
> 
> now I can see that arbiter host has almost 100% cache miss during the 
> rebuild, which is actually natural while he is reading always the new 
> datasets:
> [root@clei26 ~]# arcstat.py 1
> time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c  
> 15:57:31292910029  100 0029  100   685M   31G  
> 15:57:32   530   476 89   476   89 00   457   89   685M   31G  
> 15:57:33   480   467 97   467   97 00   463   97   685M   31G  
> 15:57:34   452   443 98   443   98 00   435   97   685M   31G  
> 15:57:35   582   547 93   547   93 00   536   94   685M   31G  
> 15:57:36   439   417 94   417   94 00   393   94   685M   31G  
> 15:57:38   435   392 90   392   90 00   374   89   685M   31G  
> 15:57:39   364   352 96   352   96 00   352   96   685M   31G  
> 15:57:40   408   375 91   375   91 00   360   91   685M   31G  
> 15:57:41   552   539 97   539   97 00   539   97   685M   31G  
> 
> It looks like we cannot have in the same system performance and reliability :(
> Simply final conclusion is with the single disk+ssd even zfs doesnot help to 
> speedup the glusterfs healing.
> I will stop here:)
> 
> 
> 
> 
> On Fri, Mar 3, 2017 at 3:35 PM, Juan Pablo  > wrote:
> cd to inside the pool path
> then dd if=/dev/zero of=test.tt  bs=1M 
> leave it runing 5/10 minutes.
> do ctrl+c paste result here.
> etc.
> 
> 2017-03-03 11:30 GMT-03:00 Arman Khalatyan  >:
> No, I have one pool made of the one disk and ssd as a cache and log device.
> I have 3 Glusterfs bricks- separate 3 hosts:Volume type Replicate (Arbiter)= 
> replica 2+1!
> That how much you can push into compute nodes(they have only 3 disk slots).
> 
> 
> On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo  > wrote:
> ok, you have 3 pools, zclei22, logs and cache, thats wrong. you should have 1 
> pool, with zlog+cache if you are looking for performance.
> also, dont mix drives. 
> whats the performance issue you are facing? 
> 
> 
> regards,
> 
> 2017-03-03 11:00 GMT-03:00 Arman Khalatyan  >:
> This is CentOS 7.3 ZoL version 0.6.5.9-1
> 
> [root@clei22 ~]# lsscsi
> 
> [2:0:0:0]diskATA  INTEL SSDSC2CW24 400i  /dev/sda
> 
> [3:0:0:0]diskATA  HGST HUS724040AL AA70  /dev/sdb
> 
> [4:0:0:0]diskATA  WDC WD2002FYPS-0 1G01  /dev/sdc
> 
> 
> 
> [root@clei22 ~]# pvs ;vgs;lvs
> 
>   PV VGFmt  Attr 
> PSize   PFree
> 
>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache  lvm2 a--  
> 223.57g 0
> 
>   /dev/sdc2  centos_clei22 lvm2 a--   
>  1.82t 64.00m
> 
>   VG#PV #LV #SN Attr   VSize   VFree
> 
>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
> 
>   vg_cache1   2   0 wz--n- 223.57g 0
> 
>   LV   VGAttr   LSize   Pool Origin Data%  Meta%  Move 
> Log Cpy%Sync Convert
> 
>   home centos_clei22 -wi-ao   1.74t   
> 
> 
>   root centos_clei22 -wi-ao  50.00g   
>   

Re: [ovirt-users] upgrading from 3.6 -> 4.1, vm restarts at 4.0 mandatory?

2017-02-26 Thread Darrell Budic
Not really, not when upgrading to 4.0 at any rate. Take chapter 4 in that guide:

http://www.ovirt.org/documentation/upgrade-guide/chap-Post-Upgrade_Tasks/ 
<http://www.ovirt.org/documentation/upgrade-guide/chap-Post-Upgrade_Tasks/>

When you perform these steps, you are advised that you now need to restart the 
VMs to enable the new 4.0 compatible configurations.

As there is no 4.1 upgrade guide, I’m left to interpret Chapter 1 to imply that 
you need to be at 4.0 to go to 4.1. But does that mean just having the Cluster 
and DC levels at 4.0, or also having your VMs running at 4.0? This wasn’t 
applicable to 3.6, so there’s no parallel here. Basically, I’m concerned that a 
similar procedure for setting Cluster & DC compatibility to 4.1 will not 
properly handle VMs still running with 3.6 configs, and hoping someone has 
concrete knowledge of this step that can chime in.

FYI, the breadcrumbs on the web site don’t actually work, they give you a page 
with the right number of list items for what you’re reading, but no actual 
links to the chapters (the href is present, but no link text exists).

  -Darrell

> On Feb 26, 2017, at 1:10 AM, Fred Rolland <froll...@redhat.com> wrote:
> 
> Hi,
> 
> Restart the VMs is not part of the upgrade procedure.
> You can check the upgrade guide :
> http://www.ovirt.org/documentation/upgrade-guide/chap-Updating_the_oVirt_Environment/
>  
> <http://www.ovirt.org/documentation/upgrade-guide/chap-Updating_the_oVirt_Environment/>
> 
> Regards,
> Fred
> 
> On Fri, Feb 24, 2017 at 10:44 PM, Darrell Budic <bu...@onholyground.com 
> <mailto:bu...@onholyground.com>> wrote:
> I’m upgrading my main cluster from 3.6 to 4.1, and I’m currently at 4.0. I’ve 
> upgraded the cluster and datacenter compatibility versions to 4.0, and now 
> all my VMs are pending restart to update their configs to 4.0.
> 
> My question is “Do I need to do this here, or can I go ahead and update the 
> engine and host nodes to 4.1, update compatibility to 4.1, and then restart 
> all the VMs to get them on 4.1”? Or is that unsafe, will I screw them up if I 
> go to 4.1 compatibility in this state?
> 
> Thanks,
> 
>   -Darrell
> 
> ___
> Users mailing list
> Users@ovirt.org <mailto:Users@ovirt.org>
> http://lists.ovirt.org/mailman/listinfo/users 
> <http://lists.ovirt.org/mailman/listinfo/users>
> 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] upgrading from 3.6 -> 4.1, vm restarts at 4.0 mandatory?

2017-02-24 Thread Darrell Budic
I’m upgrading my main cluster from 3.6 to 4.1, and I’m currently at 4.0. I’ve 
upgraded the cluster and datacenter compatibility versions to 4.0, and now all 
my VMs are pending restart to update their configs to 4.0.

My question is “Do I need to do this here, or can I go ahead and update the 
engine and host nodes to 4.1, update compatibility to 4.1, and then restart all 
the VMs to get them on 4.1”? Or is that unsafe, will I screw them up if I go to 
4.1 compatibility in this state?

Thanks,

  -Darrell

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] gpu passthrough

2017-02-21 Thread Darrell Budic
I think you need to click the “down arrow” under the top box, should move the 
GPU into the “Host Devices to be attached” box, then hit Ok to make the 
assignment.

> On Feb 21, 2017, at 3:26 AM, qinglong.d...@horebdata.cn wrote:
> 
> Hi, all:
> I want to assign the gpu card of one host to the vm which is running 
> in the host.
> 
> After I click "OK", I got nothing. Anyone can help? Thanks!
> ___
> Users mailing list
> Users@ovirt.org 
> http://lists.ovirt.org/mailman/listinfo/users 
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Request for feedback on your db vacuum status

2016-12-13 Thread Darrell Budic
Whups, bad reply to, no problem copying the list.

It's a self hosted system, currently with two clusters and 9 active hosts.74 
VMs, yes. It’s had a few more clusters over time, and a few more hosts, 
including some removed and re-added when upgrading from centos 6 to 7. But only 
9 at the moment. One cluster of 6 with most of the vms, using external Gluster 
servers not managed by ovirt, and one cluster of 3 which are also gluster 
servers as well as hypervisors. 7 vms currently on the smaller cluster.


> On Dec 13, 2016, at 3:36 AM, Roy Golan <rgo...@redhat.com> wrote:
> On 12 December 2016 at 20:31, Darrell Budic <bu...@onholyground.com 
> <mailto:bu...@onholyground.com>> wrote:
> Here’s mine: http://paste.fedoraproject.org/505443/14815674/ 
> <http://paste.fedoraproject.org/505443/14815674/>
> 
> This engine has been around since at 3.3, maybe 3.2, currently on 3.6 because 
> I haven’t had time to arrange the OS upgrade from centos 6 to 7 for the 
> engine host yet.
> 
> 
> Thank you very much Darrell! your vacuum seems boring (good!) and the db 
> seems healthy.  Can you reply to the list that you sent the feedback so 
> everyone will have a chance to look at it? also how big is your setup? 
> judging by the output is it 74 vms and 42 hosts?  
> 
>> On Dec 8, 2016, at 8:18 AM, Roy Golan <rgo...@redhat.com 
>> <mailto:rgo...@redhat.com>> wrote:
>> 
>> Hi all,
>> 
>> Following the thread about vacuum tool [1] I would like to gather some 
>> feedback about your deployment's db vacuum status The info is completely 
>> anonymous and function running it is a read only reporting one and should 
>> have little or no effect on the db.
>> 
>> The result can be pretty verbose  but again will not disclose sensitive 
>> info. Anyway review it before pasting it. It should look something like 
>> that(a snippet of one table):
>> 
>> INFO:  vacuuming "pg_catalog.pg_ts_template"
>> INFO:  index "pg_ts_template_tmplname_index" now contains 5 row versions in 
>> 2 pages
>> DETAIL:  0 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> 
>> 
>> 1. sudo su - postgres  -c "psql engine -c 'vacuum verbose'" &> 
>> /tmp/vacuum.log
>> 
>> 2. review the /tmp/vacuum.log
>> 
>> 3. paste it to http://paste.fedoraproject.org/ 
>> <http://paste.fedoraproject.org/> and reply with the link here
>> 
>> 
>> [1] http://lists.ovirt.org/pipermail/devel/2016-December/014484.html 
>> <http://lists.ovirt.org/pipermail/devel/2016-December/014484.html>
>> 
>> 
>> Thanks,
>> Roy
>> ___
>> Users mailing list
>> Users@ovirt.org <mailto:Users@ovirt.org>
>> http://lists.ovirt.org/mailman/listinfo/users 
>> <http://lists.ovirt.org/mailman/listinfo/users>
> 
> 

___
Users mailing list
Users@ovirt.org
http://lists.phx.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirtmgmt manual bridge cannot be used in ovirt 4.0

2016-11-28 Thread Darrell Budic
I’m having trouble with that plan and I’m not even trying to put the ovirtmgmt 
bridge on an existing bond. I have a pre-existing gluster setup with gluster 
running on a bonded interface. The ovirtmgmt should go on a different 
interface, but even that fails on the bond with the “torn down manually” 
statement. Haven’t had much time to do more troubleshooting, but it’s rather 
annoying. 

I’d really prefer it if the installer would just deal with existing network 
setups, I (presumably) know what I want there, especially in the case of 
pre-existing gluster setups.

  -Darrell

> On Nov 28, 2016, at 9:11 AM, Charles Kozler  wrote:
> 
> What happens when you configure the bond and then build the bridge manually 
> over the bond? oVirt installer should skip over it and not do anything. Just 
> make sure you have DEFROUTE set or routes configuration file as you expect 
> (this is what used to screw me up)
> 
> On Mon, Nov 28, 2016 at 10:06 AM,  > wrote:
> Thanks for your responses but the ui is not an option for me as i am dealing 
> with loads of systems.
> in 3.5 ovirt used to just accept the bridge as it was and incorporate it, i 
> am just wondering if i am facing a bug or a feature at the moment.
> 
> 
> Charles Kozler schreef op 2016-11-28 15:48:
> Thats what I used to do as well then on oVirt 4 it started screwing
> with the the bond as well so I ended up just dumbing it down and
> figured using the UI after the fact would be OK. I cant remember
> exactly what would happen but it would be stupid little things like
> routing would break or something. 
> 
> On Mon, Nov 28, 2016 at 9:43 AM, Simone Tiraboschi
>  [8]> wrote:
> 
> On Mon, Nov 28, 2016 at 3:42 PM, Charles Kozler
>  [7]> wrote:
> 
> What Ive been doing since oVirt 4 is just configuring one NIC
> manually when I provision the server (eg: eth0, em1, etc) and then
> let oVirt do the bridge setup. Once the engine is up I login to
> the UI and I use it to bond the NICs in whatever fashion I need
> (LACP or active-backup). Any time I tried to configure ovirtmgmt
> manually it seemed to "annoy" the hosted-engine --deploy script
> 
> This is fine.
> Another thing you could do is manually creating the bond and then
> having hosted-engine-setup creating the management bridge over your
> bond.
> 
>  
> 
> On Mon, Nov 28, 2016 at 9:33 AM, Simone Tiraboschi
>  [6]> wrote:
> 
> On Mon, Nov 28, 2016 at 12:24 PM,   [3]>
> wrote:
> 
> Hi All,
> 
> In our ovirt 3.5 setup. i have always setup the ovirtmgmt
> bridge manually .
> The bridge consisted of 2 nics
> 
> Id have /etc/vdsm/vdsm.conf list net_persist = ifcfg
> 
> 
> When i then deployed the host from the ovirt ui or api it
> would install and would display the network setup correctly in
> the ui.
> 
> On ovirt 4. (vdsm-4.18.15.3-1.el7.centos.x86_64)
> I seem unable to follow the same approach.
> 
> In the engine logs i get among other things
> 
> If the interface ovirtmgmt is a bridge, it should be
> torn-down manually.
> 
> the interface is indeed a bridge with two nics which i would
> like to keep this way.
> 
> On the host vdsm.log i get limited info,
> 
> when start a python terminal to obtain netinfo i get this
> 
> from vdsm.tool import unified_persistence
> unified_persistence.netswitch.netinfo()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File
> "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py",
> line 298, in netinfo
> _netinfo = netinfo_get(compatibility=compatibility)
>   File
> 
> 
> "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py",
> line 109, in get
> return _get(vdsmnets)
>   File
> 
> 
> "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/cache.py",
> line 101, in _get
> report_network_qos(networking)
>   File
> 
> "/usr/lib/python2.7/site-packages/vdsm/network/netinfo/qos.py",
> line 46, in report_network_qos
> iface, = host_ports
> ValueError: too many values to unpack
> 
> As it appears the line in question does not like to deal with
> a list of nics i think.
> but either way.
> 
> Is in ovirt 4 the ability to use the ovirtmgmt bridge with
> multiple nics removed?
> 
> But do you need a bridge or a bond?
>  
> 
> If so what can i do to stick to what we have done in the past.
> 
> Thanks.
> 
> ___
> Users mailing list
> Users@ovirt.org  [1]
> http://lists.ovirt.org/mailman/listinfo/users 
>  [2]
> 
> ___
> Users mailing list
> Users@ovirt.org  [4]
> http://lists.ovirt.org/mailman/listinfo/users 
>  [5]
> 
> 
> 
> 

  1   2   >