[ovirt-users] disk description

2019-06-06 Thread Dmitry Filonov
Hi -

 Looks like Friday is right around the corner...
 Have a really stupid question.
 Is there a way to edit disk description (or extend it's size) without
attaching that disk to some VM and editing disk there?

 Thanks!

--
Dmitry Filonov
Linux Administrator
SBGrid Core | Harvard Medical School
250 Longwood Ave, SGM-114
Boston, MA 02115
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/237NULLK4KXIKWR5ZFQJD7EROHH4XYGZ/


[ovirt-users] Re: high number of interface RX errors on ovirtmgmt network

2019-06-06 Thread Jayme
I increased RX ring params on the interface and restarted networking on
each host.  So far the error counts on all three hosts 1gig interfaces are
still at zero.  Will see how it holds up

On Thu, Jun 6, 2019 at 12:20 PM Jayme  wrote:

> I have a three node HCI setup on Dell R720s running the latest stable
> version of 4.3.3
>
> Each hosts has a 1gig link and a 10gig link.  The 1gig is used for ovirt
> management network and 10gig link is used for backend glusterFS traffic.
>
> I haven't noticed before but after installing ovirt metrics store I'm
> seeing that gig interface used for ovirtmgmt on all three hosts are showing
> high RX error rates.  The 10gig interfaces for glusterFS on all three hosts
> appear to be fine.
>
> The 1gig ethernet controllers are: Broadcom Inc. and subsidiaries
> NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
>
> Other physical servers on the same network/switches outside of oVirt have
> zero RX errors.
>
> Here is an example of what I'm seeing:
>
> host0:
>
> # ip -s link show em3
> 4: em3:  mtu 1500 qdisc mq master
> ovirtmgmt state UP mode DEFAULT group default qlen 1000
> link/ether b0:83:fe:cc:9a:2d brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 51777532544474 36233202312 416993  0   0   2062421
> TX: bytes  packets  errors  dropped carrier collsns
> 7284362442704 18685883330 0   0   0   0
>
> host1:
>
> # ip -s link show em3
> 4: em3:  mtu 1500 qdisc mq master
> ovirtmgmt state UP mode DEFAULT group default qlen 1000
> link/ether b0:83:fe:cc:99:31 brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 9518766859330 14424644226 89638   0   0   2056578
> TX: bytes  packets  errors  dropped carrier collsns
> 27866585257227 22323979969 0   0   0   0
>
> host2:
>
> # ip -s link show em3
> 4: em3:  mtu 1500 qdisc mq master
> ovirtmgmt state UP mode DEFAULT group default qlen 1000
> link/ether b0:83:fe:cc:92:50 brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 6409138012195 13045254148 14825   0   0   2040655
> TX: bytes  packets  errors  dropped carrier collsns
> 31577745516683 23466818659 0   0   0   0
>
> Anyone have any ideas why the RX error rate on the ovirtmgmt network could
> be so high?
>
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/CAPDTFBYN7DOHEURK6TC2OATQ3M6HHKR/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread Dmitry Filonov
Can you remove bricks that belong to a fried server? Either from a GUI or
CLI
You should be able to do so and then it should allow you to remove host
from the oVirt setup.



--
Dmitry Filonov
Linux Administrator
SBGrid Core | Harvard Medical School
250 Longwood Ave, SGM-114
Boston, MA 02115


On Thu, Jun 6, 2019 at 4:36 PM  wrote:

> Definitely is a challenge trying to replace a bad host.
>
> So let me tell you what I see and have done so far:
>
> 1.-I have a host that went bad due to HW issues.
> 2.-This bad host is still showing in the compute --> hosts section.
> 3.-This host was part of a hyperconverged setup with Gluster.
> 4.-The gluster bricks for this server show up with a "?" mark inside the
> volumes under Storage ---> Volumes ---> Myvolume ---> bricks
> 5.-Under Compute ---> Hosts --> mybadhost.mydomain.com the host  is in
> maintenance mode.
> 6.-When I try to remove that host (with "Force REmove" ticked) I keep
> getting:
> Operation Canceled
>  Error while executing action:
> mybadhost.mydomain.com
> - Cannot remove Host. Server having Gluster volume.
> Note: I have also confirmed "host has been rebooted"
>
> Since the bad host was not recoverable (it was fried), I took a brand new
> server with the same specs and installed oVirt 4.3.3 on it and have it
> ready to add it to the cluster with the same hostname and IP but I cant do
> this until I remove the old entries on the WEB UI of the Hosted Engine VM.
>
> If this is not possible would I really need to add this new host with a
> different name and IP?
> What would be the correct and best procedure to fix this?
>
> Note that my setup is a 9 node setup with hyperconverged and replica 3
> bricks and  in a  distributed replicated volume scenario.
>
> Thanks
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/N4HFTCWNFTOJJ34VSBHY5NKK5ZQAEDB7/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3ADQBVAZ3RGDIG5SRODOVJBZUOEMAC3Z/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread Edward Berger
I'll presume you didn't fully backup your hosts root file systems on the
host which was fried.

It may be easier to replace with a new hostname/IP.
I would focus on the gluster config first, since it was hyperconverged.

I don't know which way engine UI is using to detect gluster mount on
missing host and decides not to remove the old host.
You probably also have the storage domain "mounted in the data-center" with
backup volume servers pointing at the old host details
The remaining gluster peers also notice the outage, and it could be
detecting that?

I would try to make gluster changes, so maybe the engine UI will allow you
to remove old hyperconverged host entry.
(The Engine UI is really trying to protect your gluster data).
I'd try changing the mount options and there is a way to tell gluster to
only use two hosts and stop trying to connect to the
third, but I don't remember the details.










On Thu, Jun 6, 2019 at 4:32 PM  wrote:

> Definitely is a challenge trying to replace a bad host.
>
> So let me tell you what I see and have done so far:
>
> 1.-I have a host that went bad due to HW issues.
> 2.-This bad host is still showing in the compute --> hosts section.
> 3.-This host was part of a hyperconverged setup with Gluster.
> 4.-The gluster bricks for this server show up with a "?" mark inside the
> volumes under Storage ---> Volumes ---> Myvolume ---> bricks
> 5.-Under Compute ---> Hosts --> mybadhost.mydomain.com the host  is in
> maintenance mode.
> 6.-When I try to remove that host (with "Force REmove" ticked) I keep
> getting:
> Operation Canceled
>  Error while executing action:
> mybadhost.mydomain.com
> - Cannot remove Host. Server having Gluster volume.
> Note: I have also confirmed "host has been rebooted"
>
> Since the bad host was not recoverable (it was fried), I took a brand new
> server with the same specs and installed oVirt 4.3.3 on it and have it
> ready to add it to the cluster with the same hostname and IP but I cant do
> this until I remove the old entries on the WEB UI of the Hosted Engine VM.
>
> If this is not possible would I really need to add this new host with a
> different name and IP?
> What would be the correct and best procedure to fix this?
>
> Note that my setup is a 9 node setup with hyperconverged and replica 3
> bricks and  in a  distributed replicated volume scenario.
>
> Thanks
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/N4HFTCWNFTOJJ34VSBHY5NKK5ZQAEDB7/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BPDZYAPS4LEKVFFNHKEXK4H4LQO5LOL6/


[ovirt-users] Re: [ANN] oVirt 4.3.4 Fourth Release Candidate is now available

2019-06-06 Thread Strahil Nikolov
 Hi Sandro,
thanks for the update.I have noticed in RC3 and now in RC4 that data gluster 
bricks does not provide "Advanced Details", while the arbiter does.
I'm mentioning that , as oVirt is currently being rebased for gluster v6 (my 
setup is using gluster v6.1 from CentOS 7 repos) and you can keep that in 
mind.For details , check  1693998 – [Tracker] Rebase on Gluster 6

| 
| 
|  | 
1693998 – [Tracker] Rebase on Gluster 6


 |

 |

 |


I can't find any other issues in RC4. Maybe someone with gluster v5 can check 
their "Advanced Details" and confirm they are OK.
Best Regards,Strahil Nikolov
В четвъртък, 6 юни 2019 г., 11:02:00 ч. Гринуич+3, Sandro Bonazzola 
 написа:  
 
 The oVirt Project is pleased to announce the availability of the oVirt 4.3.4 
FourthRelease Candidate, as of June 6th, 2019.

This update is a release candidate of the fourth in a series of stabilization 
updates to the 4.3 series.
This is pre-release software. This pre-release should not to be used 
inproduction.

This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.6 or later
* CentOS Linux (or similar) 7.6 or later

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures for:
* Red Hat Enterprise Linux 7.6 or later
* CentOS Linux (or similar) 7.6 or later
* oVirt Node 4.3 (available for x86_64 only)

Experimental tech preview for x86_64 and s390x architectures for Fedora 28 is 
also included.

See the release notes [1] for installation / upgrade instructions and a list of 
new features and bugs fixed.

Notes:
- oVirt Appliance is already available
- oVirt Node is already available[2]
- oVirt Windows Guest Tools iso is already available [2]

Additional Resources:
* Read more about the oVirt 4.3.4 release 
highlights:http://www.ovirt.org/release/4.3.4/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.3.4/
[2] http://resources.ovirt.org/pub/ovirt-4.3-pre/iso/

-- 

Sandro Bonazzola



MANAGER, SOFTWARE ENGINEERING, EMEA R RHV

Red Hat EMEA

sbona...@redhat.com   

|  |  |

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZDVUI3KHHJCFEOYLMHVDIHPWE37TAKTK/
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DKNVUJYQ6GH3T6NES5OT3EETGHXZ7EO6/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread adrianquintero
Definitely is a challenge trying to replace a bad host.

So let me tell you what I see and have done so far:

1.-I have a host that went bad due to HW issues.
2.-This bad host is still showing in the compute --> hosts section.
3.-This host was part of a hyperconverged setup with Gluster.
4.-The gluster bricks for this server show up with a "?" mark inside the 
volumes under Storage ---> Volumes ---> Myvolume ---> bricks
5.-Under Compute ---> Hosts --> mybadhost.mydomain.com the host  is in 
maintenance mode.
6.-When I try to remove that host (with "Force REmove" ticked) I keep getting:
Operation Canceled
 Error while executing action: 
mybadhost.mydomain.com
- Cannot remove Host. Server having Gluster volume.
Note: I have also confirmed "host has been rebooted"

Since the bad host was not recoverable (it was fried), I took a brand new 
server with the same specs and installed oVirt 4.3.3 on it and have it ready to 
add it to the cluster with the same hostname and IP but I cant do this until I 
remove the old entries on the WEB UI of the Hosted Engine VM.

If this is not possible would I really need to add this new host with a 
different name and IP?  
What would be the correct and best procedure to fix this?

Note that my setup is a 9 node setup with hyperconverged and replica 3  bricks 
and  in a  distributed replicated volume scenario.

Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/N4HFTCWNFTOJJ34VSBHY5NKK5ZQAEDB7/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread Strahil Nikolov
 Have you tried with "Force remove" tick ?
Best Regards,Strahil Nikolov
В четвъртък, 6 юни 2019 г., 21:47:20 ч. Гринуич+3, Adrian Quintero 
 написа:  
 
 I tried removing the bad host but running into the following issue , any idea?

Operation Canceled
Error while executing action: 

host1.mydomain.com   
   - Cannot remove Host. Server having Gluster volume.



On Thu, Jun 6, 2019 at 11:18 AM Adrian Quintero  
wrote:

Leo, I forgot to mention that I have 1 SSD disk for caching purposes, wondering 
how that setup should be achieved?
thanks,
Adrian

On Wed, Jun 5, 2019 at 11:25 PM Adrian Quintero  
wrote:

Hi Leo, yes, this helps a lot, this confirms the plan we had in mind.
Will test tomorrow and post the results.
Thanks again
Adrian
On Wed, Jun 5, 2019 at 11:18 PM Leo David  wrote:

Hi Adrian,I think the steps are:- reinstall the host- join it to virtualisation 
clusterAnd if was member of gluster cluster as well:- go to host - storage 
devices- create the bricks on the devices - as they are on the other hosts- go 
to storage - volumes- replace each failed brick with the corresponding new 
one.Hope it helps.Cheers,Leo

On Wed, Jun 5, 2019, 23:09  wrote:

Anybody have had to replace a failed host from a 3, 6, or 9 node hyperconverged 
setup with gluster storage?

One of my hosts is completely dead, I need to do a fresh install using ovirt 
node iso, can anybody point me to the proper steps?

thanks,
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RFBYQKWC2KNZVYTYQF5T256UZBCJHK5F/


-- 
Adrian Quintero



-- 
Adrian Quintero



-- 
Adrian Quintero
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PB2YWWPO2TRJ6EYXAETPUV2DSVQLXDRR/
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6EDIM2TLIFPEKANZ2QIUTXGSIWKYC2ET/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread Adrian Quintero
I tried removing the bad host but running into the following issue , any
idea?
Operation Canceled
Error while executing action:

host1.mydomain.com

   - Cannot remove Host. Server having Gluster volume.




On Thu, Jun 6, 2019 at 11:18 AM Adrian Quintero 
wrote:

> Leo, I forgot to mention that I have 1 SSD disk for caching purposes,
> wondering how that setup should be achieved?
>
> thanks,
>
> Adrian
>
> On Wed, Jun 5, 2019 at 11:25 PM Adrian Quintero 
> wrote:
>
>> Hi Leo, yes, this helps a lot, this confirms the plan we had in mind.
>>
>> Will test tomorrow and post the results.
>>
>> Thanks again
>>
>> Adrian
>>
>> On Wed, Jun 5, 2019 at 11:18 PM Leo David  wrote:
>>
>>> Hi Adrian,
>>> I think the steps are:
>>> - reinstall the host
>>> - join it to virtualisation cluster
>>> And if was member of gluster cluster as well:
>>> - go to host - storage devices
>>> - create the bricks on the devices - as they are on the other hosts
>>> - go to storage - volumes
>>> - replace each failed brick with the corresponding new one.
>>> Hope it helps.
>>> Cheers,
>>> Leo
>>>
>>>
>>> On Wed, Jun 5, 2019, 23:09  wrote:
>>>
 Anybody have had to replace a failed host from a 3, 6, or 9 node
 hyperconverged setup with gluster storage?

 One of my hosts is completely dead, I need to do a fresh install using
 ovirt node iso, can anybody point me to the proper steps?

 thanks,
 ___
 Users mailing list -- users@ovirt.org
 To unsubscribe send an email to users-le...@ovirt.org
 Privacy Statement: https://www.ovirt.org/site/privacy-policy/
 oVirt Code of Conduct:
 https://www.ovirt.org/community/about/community-guidelines/
 List Archives:
 https://lists.ovirt.org/archives/list/users@ovirt.org/message/RFBYQKWC2KNZVYTYQF5T256UZBCJHK5F/

>>> --
>> Adrian Quintero
>>
>
>
> --
> Adrian Quintero
>


-- 
Adrian Quintero
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PB2YWWPO2TRJ6EYXAETPUV2DSVQLXDRR/


[ovirt-users] Re: ETL service sampling has encountered an error. Please consult the service log for more details.

2019-06-06 Thread Shirly Radco
Hi Nicolas,

Please open a bug in bugzilla and attach the ovirt-engine-dwh.log,
engine.log,
The versions of ovirt-engine and ovirt-engine-dwh
and any other relevant information that you can tell about your env.
Did you do any change before this issue started?

Best regards,

--

Shirly Radco

BI Senior Software Engineer

Red Hat 




On Thu, Jun 6, 2019 at 3:44 PM  wrote:

> Hi,
>
> We're running oVirt 4.1.9 (cannot upgrade yet until [1] is released).
> Since a few days ago our event list if full of lines like this:
>
>ETL service sampling has encountered an error. Please consult the
> service log for more details.
>
> Having a look at the log I see events like:
>
> 2019-06-06
> 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java
>
> Exception|tJDBCOutput_7|org.postgresql.util.PSQLException:ERROR: current
> transaction is aborted, commands ignored until end of transaction
> block|1
> 2019-06-06
> 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java
>
> Exception|tJDBCOutput_4|org.postgresql.util.PSQLException:ERROR: current
> transaction is aborted, commands ignored until end of transaction
> block|1
> Exception in component tJDBCOutput_5
> org.postgresql.util.PSQLException: ERROR: current transaction is
> aborted, commands ignored until end of transaction block
>  at
>
> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
>  at
>
> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
>  at
>
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
>  at
>
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
>  at
>
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
>  at
>
> org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
>  at
>
> ovirt_engine_dwh.statisticssync_4_1.StatisticsSync.tJDBCInput_10Process(StatisticsSync.java:9030)
>  at
>
> ovirt_engine_dwh.statisticssync_4_1.StatisticsSync$5.run(StatisticsSync.java:16071)
> 2019-06-06
> 13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java
>
> Exception|tJDBCOutput_5|org.postgresql.util.PSQLException:ERROR: current
> transaction is aborted, commands ignored until end of transaction
> block|1
> Exception in component tRunJob_5
> java.lang.RuntimeException: Child job running failed
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
>  at
>
> ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs$2.run(SampleRunJobs.java:2683)
> 2019-06-06
> 13:37:11|FdlWtU|TOlL8U|KNLNa4|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java
>
> Exception|tRunJob_5|java.lang.RuntimeException:Child job running
> failed|1
> Exception in component tRunJob_1
> java.lang.RuntimeException: Child job running failed
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6067)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5809)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4319)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4188)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3593)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:2977)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2295)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1649)
>  at
>
> ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11363)
>
> Can someone tell me how to fix it? I already restarted 

[ovirt-users] high number of interface RX errors on ovirtmgmt network

2019-06-06 Thread Jayme
I have a three node HCI setup on Dell R720s running the latest stable
version of 4.3.3

Each hosts has a 1gig link and a 10gig link.  The 1gig is used for ovirt
management network and 10gig link is used for backend glusterFS traffic.

I haven't noticed before but after installing ovirt metrics store I'm
seeing that gig interface used for ovirtmgmt on all three hosts are showing
high RX error rates.  The 10gig interfaces for glusterFS on all three hosts
appear to be fine.

The 1gig ethernet controllers are: Broadcom Inc. and subsidiaries NetXtreme
II BCM57800 1/10 Gigabit Ethernet (rev 10)

Other physical servers on the same network/switches outside of oVirt have
zero RX errors.

Here is an example of what I'm seeing:

host0:

# ip -s link show em3
4: em3:  mtu 1500 qdisc mq master
ovirtmgmt state UP mode DEFAULT group default qlen 1000
link/ether b0:83:fe:cc:9a:2d brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
51777532544474 36233202312 416993  0   0   2062421
TX: bytes  packets  errors  dropped carrier collsns
7284362442704 18685883330 0   0   0   0

host1:

# ip -s link show em3
4: em3:  mtu 1500 qdisc mq master
ovirtmgmt state UP mode DEFAULT group default qlen 1000
link/ether b0:83:fe:cc:99:31 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
9518766859330 14424644226 89638   0   0   2056578
TX: bytes  packets  errors  dropped carrier collsns
27866585257227 22323979969 0   0   0   0

host2:

# ip -s link show em3
4: em3:  mtu 1500 qdisc mq master
ovirtmgmt state UP mode DEFAULT group default qlen 1000
link/ether b0:83:fe:cc:92:50 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
6409138012195 13045254148 14825   0   0   2040655
TX: bytes  packets  errors  dropped carrier collsns
31577745516683 23466818659 0   0   0   0

Anyone have any ideas why the RX error rate on the ovirtmgmt network could
be so high?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ENN7QJF5QCHO3FZ5KBVG2VDDSOA3AOGM/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-06 Thread Adrian Quintero
Leo, I forgot to mention that I have 1 SSD disk for caching purposes,
wondering how that setup should be achieved?

thanks,

Adrian

On Wed, Jun 5, 2019 at 11:25 PM Adrian Quintero 
wrote:

> Hi Leo, yes, this helps a lot, this confirms the plan we had in mind.
>
> Will test tomorrow and post the results.
>
> Thanks again
>
> Adrian
>
> On Wed, Jun 5, 2019 at 11:18 PM Leo David  wrote:
>
>> Hi Adrian,
>> I think the steps are:
>> - reinstall the host
>> - join it to virtualisation cluster
>> And if was member of gluster cluster as well:
>> - go to host - storage devices
>> - create the bricks on the devices - as they are on the other hosts
>> - go to storage - volumes
>> - replace each failed brick with the corresponding new one.
>> Hope it helps.
>> Cheers,
>> Leo
>>
>>
>> On Wed, Jun 5, 2019, 23:09  wrote:
>>
>>> Anybody have had to replace a failed host from a 3, 6, or 9 node
>>> hyperconverged setup with gluster storage?
>>>
>>> One of my hosts is completely dead, I need to do a fresh install using
>>> ovirt node iso, can anybody point me to the proper steps?
>>>
>>> thanks,
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/RFBYQKWC2KNZVYTYQF5T256UZBCJHK5F/
>>>
>> --
> Adrian Quintero
>


-- 
Adrian Quintero
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/45ERP2ZTABEPRBV7P2XANRZIEBBCFGX3/


[ovirt-users] Re: Hosted Engine Abruptly Stopped Responding - Unexpected Shutdown

2019-06-06 Thread Strahil

On Jun 6, 2019 12:52, souvaliotima...@mail.com wrote:
>
> Hello, 
>
> I came upon a problem the previous month that I figured it would be good to 
> discuss here. I'm sorry I didn't post here earlier but time slipped me. 
>
> I have set up a glustered, hyperconverged oVirt environment for experimental 
> use as a means to see its  behaviour and get used to its management and 
> performance before setting it up as a production environment for use in our 
> organization. The environment is up and running since 2018 October. The three 
> nodes are HP ProLiant DL380 G7 and have the following characteristics: 
>
> Mem: 22GB 
> CPU: 2x Hexa Core - Intel Xeon Hexa Core E56xx 
> HDD: 5x 300GB 
> Network: BCM5709C with dual-port Gigabit 
> OS: Linux RedHat 7.5.1804(Core 3.10.0-862.3.2.el7.x86_64 x86_64) - Ovirt Node 
> 4.2.3.1 
>
> As I was working on the environment, the engine stopped working. 
> Not long before the time the HE stopped, I was in the web interface managing 
> my VMs, when the browser froze and the HE was also not responding to ICMP 
> requests. 
>
> The first thing I did was to connect via ssh to all nodes and run the command 
> #hosted-engine --vm-status 
> which showed that the HE was down in nodes 1 and 2 and up on the 3rd node. 
>
> After executing 
> #virsh -r list 
> the VM list that was shown contained two of the VMs I had previously created 
> and were up; the HE was nowhere. 
>
> I tried to restart the HE with the 
> #hosted-engine --vm-start 
> but it didn't work. 
>
> I then put all nodes in maintenance mode with the command 
> #hosted-engine --set-maintenance --mode=global 
> (I guess I should have done that earlier) and re-run 
> #hosted-engine --vm-start 
> that had the same result as it previously did. 
>
> After checking the mails the system sent to the root user, I saw there were 
> several mails on the 3rd node (where the HE had been), informing of the HE's 
> state. The messages were changing between EngineDown-EngineStart, 
> EngineStart-EngineStarting, EngineStarting-EngineMaybeAway, 
> EngineMaybeAway-EngineUnexpectedlyDown, EngineUnexpectedlyDown-EngineDown, 
> EngineDown-EngineStart and so forth. 
>
> I continued by searching the following logs in all nodes : 
> /var/log/libvirt/qemu/HostedEngine.log 
> /var/log/libvirt/qemu/win10.log 
> /var/log/libvirt/qemu/DNStest.log 
> /var/log/vdsm/vdsm.log 
> /var/log/ovirt-hosted-engine-ha/agent.log 
>
> After that I spotted and error that had started appearing almost a month ago 
> in node #2: 
> ERROR Internal server error Traceback (most recent call last): File 
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in 
> _handle_request res = method(**params) File 
> "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in 
> _dynamicMethod result = fn(*methodArgs) File 
> "/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 85, in 
> logicalVolumeList return self._gluster.logicalVolumeList() File 
> "/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper 
> rv = func(*args, **kwargs) File 
> "/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 808, in 
> logicalVolumeList status = self.svdsmProxy.glusterLogicalVolumeList() File 
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in 
> __call__ return callMethod() File 
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 52, in 
>  getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, 
> AttributeError: 'AutoProxy[instance]' object has no attribute 
> 'glusterLogicalVolumeList' 
>
>
> The outputs of the following commands were also checked as a way to see if 
> there was a mandatory process missing/killed, a memory problem or even disk 
> space shortage that led to the sudden death of a process 
> #ps -A 
> #top 
> #free -h 
> #df -hT 
>
> Finally, after some time delving in the logs, the output of the 
> #journalctl --dmesg 
> showed the following message 
> "Out of memory: Kill process 5422 (qemu-kvm) score 514 or sacrifice child. 
> Killed process 5422 (qemu-kvm) total-vm:17526548kB, anon-rss:9310396kB, 
> file-rss:2336kB, shmem-rss:12kB" 
> which after that the ovirtmgmt started not responding. 
If you run out of memory, you should take that serious.Droping the cache seems 
like a workaround and not a fix.
Check if KSM is enabled - this will merge your VM's memory pages for an 
exchange for CPU cycles - still better than getting a VM killed.
Also, you can protect the HostedEngine from OOM killer.

> I tried to restart the vhostd by executing 
> #/etc/rc.d/init.d/vhostmd start 
> but it didn't work. 
>
> Finally, I decided to run the HE restart command on the other nodes as well 
> (I'd figured that since the HE was last running on the node #3, that's where 
> I should try to restart it). So, I run 
> #hosted-engine --vm-start 
> and the output was 
> "Command VM.getStats with args {'vmID':'...<το ID της HE>'} failed: 
> (code=1,message=Virtual machine does 

[ovirt-users] Moving (or removing) an HE host should be prevented

2019-06-06 Thread Stefano Stagnaro
I've realized that moving an HE host to another DC/CL is not prevented, leading 
to an awkward situation for which the host is UP in the new CL but still 
retains the "silver crown". Moreover, "hosted-engine --vm-status" on another HE 
host still show the departed host with score 0. The same situation can be 
achieved removing an HE host that was previously set in Maintenance.

I think those operations should be prevented for HE hosts and a warning message 
like "Please undeploy HE first" should be displayed. Otherwise trigger the HE 
undeployment on host move/remove.

B.R.,
Stefano Stagnaro
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H77DRZ6PR2KEIA2NW5IST4SW4ZOQ6A62/


[ovirt-users] Re: Hosted Engine Abruptly Stopped Responding - Unexpected Shutdown

2019-06-06 Thread Edward Berger
When I read your intro, and I hit the memory figure,  I was saying to
myself, what
I'd definitely increase the memory if possible.   As high as you can
affordably fit into the servers.
Engine asks 16GB at installation time, add some for gluster services and
you're at your limits before you add a user VM.

My first non-hyperconverged hosted-engine install used a 32GB and a 24GB
dual xeon machines with only 8GB allocated for the engine VM.
I felt more confident in it when I upgraded the 24GB node to 48GB.  So 48GB
would my minimum, 64 OK, and the more the better..
Later, I was able to find some used 144GB supermicro servers which I
replaced the above nodes with.

Modern 64bit CentOS likes to have around 2GB per core for basic server
functions.
For desktops, I say have at least 8GB because web browsers eat up RAM.



On Thu, Jun 6, 2019 at 5:52 AM  wrote:

> Hello,
>
> I came upon a problem the previous month that I figured it would be good
> to discuss here. I'm sorry I didn't post here earlier but time slipped me.
>
> I have set up a glustered, hyperconverged oVirt environment for
> experimental use as a means to see its  behaviour and get used to its
> management and performance before setting it up as a production environment
> for use in our organization. The environment is up and running since 2018
> October. The three nodes are HP ProLiant DL380 G7 and have the following
> characteristics:
>
> Mem: 22GB
> CPU: 2x Hexa Core - Intel Xeon Hexa Core E56xx
> HDD: 5x 300GB
> Network: BCM5709C with dual-port Gigabit
> OS: Linux RedHat 7.5.1804(Core 3.10.0-862.3.2.el7.x86_64 x86_64) - Ovirt
> Node 4.2.3.1
>
> As I was working on the environment, the engine stopped working.
> Not long before the time the HE stopped, I was in the web interface
> managing my VMs, when the browser froze and the HE was also not responding
> to ICMP requests.
>
> The first thing I did was to connect via ssh to all nodes and run the
> command
> #hosted-engine --vm-status
> which showed that the HE was down in nodes 1 and 2 and up on the 3rd node.
>
> After executing
> #virsh -r list
> the VM list that was shown contained two of the VMs I had previously
> created and were up; the HE was nowhere.
>
> I tried to restart the HE with the
> #hosted-engine --vm-start
> but it didn't work.
>
> I then put all nodes in maintenance mode with the command
> #hosted-engine --set-maintenance --mode=global
> (I guess I should have done that earlier) and re-run
> #hosted-engine --vm-start
> that had the same result as it previously did.
>
> After checking the mails the system sent to the root user, I saw there
> were several mails on the 3rd node (where the HE had been), informing of
> the HE's state. The messages were changing between EngineDown-EngineStart,
> EngineStart-EngineStarting, EngineStarting-EngineMaybeAway,
> EngineMaybeAway-EngineUnexpectedlyDown, EngineUnexpectedlyDown-EngineDown,
> EngineDown-EngineStart and so forth.
>
> I continued by searching the following logs in all nodes :
> /var/log/libvirt/qemu/HostedEngine.log
> /var/log/libvirt/qemu/win10.log
> /var/log/libvirt/qemu/DNStest.log
> /var/log/vdsm/vdsm.log
> /var/log/ovirt-hosted-engine-ha/agent.log
>
> After that I spotted and error that had started appearing almost a month
> ago in node #2:
> ERROR Internal server error Traceback (most recent call last): File
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in
> _handle_request res = method(**params) File
> "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in
> _dynamicMethod result = fn(*methodArgs) File
> "/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 85, in
> logicalVolumeList return self._gluster.logicalVolumeList() File
> "/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper
> rv = func(*args, **kwargs) File
> "/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 808, in
> logicalVolumeList status = self.svdsmProxy.glusterLogicalVolumeList() File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in
> __call__ return callMethod() File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 52, in
>  getattr(self._supervdsmProxy._svdsm, self._funcName)(*args,
> AttributeError: 'AutoProxy[instance]' object has no attribute
> 'glusterLogicalVolumeList'
>
>
> The outputs of the following commands were also checked as a way to see if
> there was a mandatory process missing/killed, a memory problem or even disk
> space shortage that led to the sudden death of a process
> #ps -A
> #top
> #free -h
> #df -hT
>
> Finally, after some time delving in the logs, the output of the
> #journalctl --dmesg
> showed the following message
> "Out of memory: Kill process 5422 (qemu-kvm) score 514 or sacrifice child.
> Killed process 5422 (qemu-kvm) total-vm:17526548kB, anon-rss:9310396kB,
> file-rss:2336kB, shmem-rss:12kB"
> which after that the ovirtmgmt started not responding.
>
> I tried to restart the vhostd by 

[ovirt-users] ETL service sampling has encountered an error. Please consult the service log for more details.

2019-06-06 Thread nicolas

Hi,

We're running oVirt 4.1.9 (cannot upgrade yet until [1] is released). 
Since a few days ago our event list if full of lines like this:


  ETL service sampling has encountered an error. Please consult the 
service log for more details.


Having a look at the log I see events like:

2019-06-06 
13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java 
Exception|tJDBCOutput_7|org.postgresql.util.PSQLException:ERROR: current 
transaction is aborted, commands ignored until end of transaction 
block|1
2019-06-06 
13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java 
Exception|tJDBCOutput_4|org.postgresql.util.PSQLException:ERROR: current 
transaction is aborted, commands ignored until end of transaction 
block|1

Exception in component tJDBCOutput_5
org.postgresql.util.PSQLException: ERROR: current transaction is 
aborted, commands ignored until end of transaction block
at 
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:363)
at 
ovirt_engine_dwh.statisticssync_4_1.StatisticsSync.tJDBCInput_10Process(StatisticsSync.java:9030)
at 
ovirt_engine_dwh.statisticssync_4_1.StatisticsSync$5.run(StatisticsSync.java:16071)
2019-06-06 
13:37:11|NJ4C8T|TOlL8U|FdlWtU|OVIRT_ENGINE_DWH|StatisticsSync|Default|6|Java 
Exception|tJDBCOutput_5|org.postgresql.util.PSQLException:ERROR: current 
transaction is aborted, commands ignored until end of transaction 
block|1

Exception in component tRunJob_5
java.lang.RuntimeException: Child job running failed
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_5Process(SampleRunJobs.java:1654)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_6Process(SampleRunJobs.java:1456)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_1Process(SampleRunJobs.java:1228)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tRunJob_4Process(SampleRunJobs.java:1000)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_2Process(SampleRunJobs.java:767)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs.tJDBCConnection_1Process(SampleRunJobs.java:642)
at 
ovirt_engine_dwh.samplerunjobs_4_1.SampleRunJobs$2.run(SampleRunJobs.java:2683)
2019-06-06 
13:37:11|FdlWtU|TOlL8U|KNLNa4|OVIRT_ENGINE_DWH|SampleRunJobs|Default|6|Java 
Exception|tRunJob_5|java.lang.RuntimeException:Child job running 
failed|1

Exception in component tRunJob_1
java.lang.RuntimeException: Child job running failed
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRunJob_1Process(SampleTimeKeepingJob.java:6067)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_2Process(SampleTimeKeepingJob.java:5809)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_1Process(SampleTimeKeepingJob.java:)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_2Process(SampleTimeKeepingJob.java:4319)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tRowGenerator_2Process(SampleTimeKeepingJob.java:4188)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_3Process(SampleTimeKeepingJob.java:3593)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_5Process(SampleTimeKeepingJob.java:2977)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCInput_4Process(SampleTimeKeepingJob.java:2295)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob.tJDBCConnection_3Process(SampleTimeKeepingJob.java:1649)
at 
ovirt_engine_dwh.sampletimekeepingjob_4_1.SampleTimeKeepingJob$2.run(SampleTimeKeepingJob.java:11363)


Can someone tell me how to fix it? I already restarted ovirt-engine, 
ovirt-engine-dwhd, postgresql and the three at a time and still didn't 
fix the issue. Currently the DWHD data is empty in the Dashboard.


Thanks!

  [1]: https://github.com/oVirt/ovirt-web-ui/issues/490
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PXMHT5K6ZBFMNHEAUSYWWSDZ2GFUIOXQ/


[ovirt-users] Hosted Engine Abruptly Stopped Responding - Unexpected Shutdown

2019-06-06 Thread souvaliotimaria
Hello, 

I came upon a problem the previous month that I figured it would be good to 
discuss here. I'm sorry I didn't post here earlier but time slipped me. 

I have set up a glustered, hyperconverged oVirt environment for experimental 
use as a means to see its  behaviour and get used to its management and 
performance before setting it up as a production environment for use in our 
organization. The environment is up and running since 2018 October. The three 
nodes are HP ProLiant DL380 G7 and have the following characteristics:

Mem: 22GB
CPU: 2x Hexa Core - Intel Xeon Hexa Core E56xx
HDD: 5x 300GB
Network: BCM5709C with dual-port Gigabit
OS: Linux RedHat 7.5.1804(Core 3.10.0-862.3.2.el7.x86_64 x86_64) - Ovirt Node 
4.2.3.1

As I was working on the environment, the engine stopped working.
Not long before the time the HE stopped, I was in the web interface managing my 
VMs, when the browser froze and the HE was also not responding to ICMP 
requests. 

The first thing I did was to connect via ssh to all nodes and run the command
#hosted-engine --vm-status 
which showed that the HE was down in nodes 1 and 2 and up on the 3rd node. 

After executing
#virsh -r list
the VM list that was shown contained two of the VMs I had previously created 
and were up; the HE was nowhere.

I tried to restart the HE with the
#hosted-engine --vm-start
but it didn't work.

I then put all nodes in maintenance mode with the command
#hosted-engine --set-maintenance --mode=global
(I guess I should have done that earlier) and re-run
#hosted-engine --vm-start
that had the same result as it previously did. 

After checking the mails the system sent to the root user, I saw there were 
several mails on the 3rd node (where the HE had been), informing of the HE's 
state. The messages were changing between EngineDown-EngineStart, 
EngineStart-EngineStarting, EngineStarting-EngineMaybeAway, 
EngineMaybeAway-EngineUnexpectedlyDown, EngineUnexpectedlyDown-EngineDown, 
EngineDown-EngineStart and so forth.

I continued by searching the following logs in all nodes :
/var/log/libvirt/qemu/HostedEngine.log
/var/log/libvirt/qemu/win10.log
/var/log/libvirt/qemu/DNStest.log
/var/log/vdsm/vdsm.log
/var/log/ovirt-hosted-engine-ha/agent.log

After that I spotted and error that had started appearing almost a month ago in 
node #2:
ERROR Internal server error Traceback (most recent call last): File 
"/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 606, in 
_handle_request res = method(**params) File 
"/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 197, in 
_dynamicMethod result = fn(*methodArgs) File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/apiwrapper.py", line 85, in 
logicalVolumeList return self._gluster.logicalVolumeList() File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 90, in wrapper rv 
= func(*args, **kwargs) File 
"/usr/lib/python2.7/site-packages/vdsm/gluster/api.py", line 808, in 
logicalVolumeList status = self.svdsmProxy.glusterLogicalVolumeList() File 
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 55, in 
__call__ return callMethod() File 
"/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 52, in 
 getattr(self._supervdsmProxy._svdsm, self._funcName)(*args, 
AttributeError: 'AutoProxy[instance]' object has no attribute 
'glusterLogicalVolumeList'


The outputs of the following commands were also checked as a way to see if 
there was a mandatory process missing/killed, a memory problem or even disk 
space shortage that led to the sudden death of a process
#ps -A
#top
#free -h
#df -hT

Finally, after some time delving in the logs, the output of the 
#journalctl --dmesg
showed the following message
"Out of memory: Kill process 5422 (qemu-kvm) score 514 or sacrifice child.
Killed process 5422 (qemu-kvm) total-vm:17526548kB, anon-rss:9310396kB,
file-rss:2336kB, shmem-rss:12kB"
which after that the ovirtmgmt started not responding.

I tried to restart the vhostd by executing
#/etc/rc.d/init.d/vhostmd start
but it didn't work. 

Finally, I decided to run the HE restart command on the other nodes as well 
(I'd figured that since the HE was last running on the node #3, that's where I 
should try to restart it). So, I run 
#hosted-engine --vm-start
and the output was 
"Command VM.getStats with args {'vmID':'...<το ID της HE>'} failed:
(code=1,message=Virtual machine does not exist: {'vmID':'...<το ID της
HE>'})"
And then I run the command again and the output was
"VM exists and its status is Powering Up."

After that I executed 
#virsh -r list
and the output was the following:
Id Name   State

2  HostedEngine  running

After the HE's restart two mails came that stated: 
ReinitializeFSMEngineStarting and EngineStarting-EngineUp

After that and after checking that we had access to the web interface again, we 
executed
hosted-engine --set-maintenance --mode=none
to get out of the 

[ovirt-users] [ANN] oVirt 4.3.4 Fourth Release Candidate is now available

2019-06-06 Thread Sandro Bonazzola
The oVirt Project is pleased to announce the availability of the oVirt
4.3.4 FourthRelease Candidate, as of June 6th, 2019.

This update is a release candidate of the fourth in a series of
stabilization updates to the 4.3 series.
This is pre-release software. This pre-release should not to be used
inproduction.

This release is available now on x86_64 architecture for:
* Red Hat Enterprise Linux 7.6 or later
* CentOS Linux (or similar) 7.6 or later

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:
* Red Hat Enterprise Linux 7.6 or later
* CentOS Linux (or similar) 7.6 or later
* oVirt Node 4.3 (available for x86_64 only)

Experimental tech preview for x86_64 and s390x architectures for Fedora 28
is also included.

See the release notes [1] for installation / upgrade instructions and a
list of new features and bugs fixed.

Notes:
- oVirt Appliance is already available
- oVirt Node is already available[2]
- oVirt Windows Guest Tools iso is already available [2]

Additional Resources:
* Read more about the oVirt 4.3.4 release highlights:
http://www.ovirt.org/release/4.3.4/
* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt
* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.3.4/
[2] http://resources.ovirt.org/pub/ovirt-4.3-pre/iso/

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R RHV

Red Hat EMEA 

sbona...@redhat.com


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZDVUI3KHHJCFEOYLMHVDIHPWE37TAKTK/


[ovirt-users]Re: Bond Mode 1 (Active-Backup),vm unreachable for minutes when bond link change

2019-06-06 Thread Edward Haas
On Sat, May 25, 2019 at 5:06 AM  wrote:

> Hello,
>
> I've a problem, all my ovirt hosts and vms are linked with a bonding mode
> 1(Active-Backup)2x10Gbps
> ovirt version:4.3
> topology:
>--eno2
> vm--ovirtmgmt--bond0---eno1
>
> ifcfg-bond0:
> # Generated by VDSM version 4.30.9.1
> DEVICE=bond0
> BONDING_OPTIOS='mode=1 miion=100'
> BRIDGE=ovirtmgmt
> MACADDR=a4:be:26:16:e9:b2
> ONBOOT=yes
> MTU=1500
> DEFROUTE=no
> NM_CONTROLLER=no
> IPV6INIT=no
>
> ifcfg-eno1:
> # Generated by VDSM version 4.30.9.1
> DEVICE=eno1
> MASTER=bond0
> SLAVE=yes
> ONBOOT=yes
> MTU=1500
> DEFROUTE=no
> NM_CONTROLLER=no
> IPV6INIT=no
>
> ifcfg-eno2:
> # Generated by VDSM version 4.30.9.1
> DEVICE=eno2
> MASTER=bond0
> SLAVE=yes
> ONBOOT=yes
> MTU=1500
> DEFROUTE=no
> NM_CONTROLLER=no
> IPV6INIT=no
>
> ifcfg-ovirtmgmt:
> # Generated by VDSM version 4.30.9.1
> DEVICE=ovirtmgmt
> TYPE=Brodge
> DELAY=0
> STP=off
> ONBOOT=yes
> IPADDR=x.x.x.x
> NEYMASK=255.255.255.0
> GATEWAY=x.x.x.x
> BOOTPROTO=none
> MTU=1500
> DEFROUTE=yes
> NM_CONTROLLER=no
> IPV6INIT=yes
> IPV6_AUTOCONF=yes
>
>
> cat /proc/net/bonding/bond0
> Ethernet Chanel Bonding Driver:v3.7.1(April 27, 2011)
>
> Bonding Mode:fault-tolerance(active-ackup)
> Primary Slave:none
> Currently Active Slave:eno1
> MII Status:up
> MII Polling Intercal (ms):100
> Up Delay (ms) : 0
> Down Delay (ms) : 0
>
> Slave Interface :eno1
> MII Status:up
> Speed : 1 Mbps
> Link Failure Count : 0
> Permanent HW addr :a4:be:26:16:e9:b2
> Slave queue ID: 0
>
> Slave Interface :eno2
> MII Status:up
> Speed : 1 Mbps
> Link Failure Count : 0
> Permanent HW addr :a4:be:26:16:e9:b2
> Slave queue ID: 0
>
> ping vm from different subnet.
>
> Eveything is okay if I don't change bond link interface。When I unplug
> Currently Active Slave eno1,bond link change to eno2 as expected but vm
> become unreachable until external physical switch MAC Table ageing time
> expired.It seems that vm doesn't sent gratuitous ARP when bond link change.
> How can I fix if?
>

There is no reason for the VM OS to send anything as it is unaware of the
change you have done in the network.
It should work fine if you perform this operation from oVirt Management, as
it will cause the interfaces to be set down and up again (I would expect
the links to go down as a result), causing the switch ports to flush its
mac address table.


> vm os is Centos 7.5
> ovirt version 4.2 also tested.
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/CUC67VZ7WNW5M4L7IBBDIUZKK7SRLMLQ/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FAAET3PUF5OWWH7GUA72G3ICWK3MLSRU/