On Mon, Mar 20, 2017 at 5:16 PM, Logan Kuhn <supp...@jac-properties.com> wrote:
> That is another odd aspect about this. The ovirt-engine service is up, as
> are the httpd and ovirt-engine-dwhd services.
>
> Any ideas on how to fix it?
>

Please start by running

    curl --insecure https://<your.vm.address>/ovirt-engine/services/health

from your hosts and from the engine VM itself, and check the output.

>
> Logan
>
> On Mon, Mar 20, 2017 at 11:02 AM, Simone Tiraboschi <stira...@redhat.com>
> wrote:
>
>> On Mon, Mar 20, 2017 at 4:39 PM, Logan Kuhn <supp...@jac-properties.com>
>> wrote:
>>
>>> So that sounds like the host isn't able to communicate properly with the
>>> HEVM. The cluster is still in global maintenance, but the HEVM still
>>> thinks that it isn't because the database says it isn't in global
>>> maintenance.
>>
>> No, it simply means that the engine is not able to start and, if the
>> engine doesn't start, nothing else will update the host status in your DB.
>>
>>> Logan
>>>
>>> On Mon, Mar 20, 2017 at 10:34 AM, Simone Tiraboschi <stira...@redhat.com>
>>> wrote:
>>>
>>>> On Mon, Mar 20, 2017 at 4:28 PM, Logan Kuhn <supp...@jac-properties.com>
>>>> wrote:
>>>>
>>>>> Yup, ovirttest1 ran out of disk space on Friday, we recovered it and
>>>>> everything seemed completely normal.
>>>>>
>>>>> The postgres service is down on the HEVM, but that is because it's on
>>>>> our postgresql cluster, has been for weeks. I can connect to its
>>>>> database from within the HEVM using the credentials stored at
>>>>> /etc/ovirt-engine/engine.conf.d/10-setup-database.conf. I can tail
>>>>> the logs on the postgres master and ovirt can and does connect to it.
>>>>>
>>>>> However, trying from ovirttest1 I cannot connect to the engine
>>>>> database using those same credentials, should I be able to? It'd make
>>>>> sense to be able to connect to it....
>>>>
>>>> It could depend on how you configured your pg_hba.conf for your DBMS
>>>> instance.
>>>> Only the engine and dwh have to connect to the engine VM, a direct DB
>>>> connection from the hosts is not required.
>>>>
>>>>> Logan
>>>>>
>>>>> On Mon, Mar 20, 2017 at 10:14 AM, Alexander Wels <aw...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> On Monday, March 20, 2017 9:14:51 AM EDT Logan Kuhn wrote:
>>>>>> > Starting at 1:09am on Saturday the Hosted Engine has been rebooting
>>>>>> > because it failed its liveliness check. This is due to the webadmin
>>>>>> > not loading. Nothing changed as far as I can tell on the engine since
>>>>>> > its last successful reboot on Friday afternoon.
>>>>>> >
>>>>>> > The engine, dwhd and httpd are all up and do not seem to be reporting
>>>>>> > anything unusual in their respective logs. The engine can talk to the
>>>>>> > database as I can login using the credentials in
>>>>>> > /etc/ovirt-engine/engine.conf.d/10-setup-database.conf and the logs on
>>>>>> > the postgres server are showing activity.
>>>>>> >
>>>>>> > I tried to run engine-setup but it says it's not in global maintenance
>>>>>> > even though the hosted engine hosts agree that it is. We are on
>>>>>> > version 4.0.6.3
>>>>>> >
>>>>>> > Server, engine and agent logs are attached
>>>>>> >
>>>>>> > Regards,
>>>>>> > Logan
>>>>>>
>>>>>> Looking at your logs, it appears that on Friday one of your hosts ran
>>>>>> out of disk space in its logs or temp directory, at which point
>>>>>> connectivity started to be spotty. I see a bunch of attempts to migrate
>>>>>> VMs away from that host (ovirttest1). All of them fail. That repeats a
>>>>>> ton of times. I forwarded to Saturday, where it appears you had a bunch
>>>>>> of stale locks, which also repeat a bunch of times until the engine VM
>>>>>> gets restarted.
>>>>>>
>>>>>> Then I see nothing but restarts of the engine and no apparent errors
>>>>>> in the engine log.
>>>>>>
>>>>>> The server log does however reveal this:
>>>>>> 2017-03-20 07:04:27,282 ERROR [org.quartz.core.ErrorLogger]
>>>>>> (QuartzOvirtDBScheduler_QuartzSchedulerThread) An error occurred while
>>>>>> scanning for the next triggers to fire.:
>>>>>> org.quartz.JobPersistenceException: Failed to obtain DB connection from
>>>>>> data source 'NMEngineDS': java.sql.SQLException: Could not retrieve
>>>>>> datasource via JNDI url 'java:/ENGINEDataSourceNoJTA'
>>>>>> java.sql.SQLException: javax.resource.ResourceException: IJ000470: You
>>>>>> are trying to use a connection factory that has been shut down:
>>>>>> java:/ENGINEDataSourceNoJTA [See nested exception:
>>>>>> java.sql.SQLException: Could not retrieve datasource via JNDI url
>>>>>> 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
>>>>>> javax.resource.ResourceException: IJ000470: You are trying to use a
>>>>>> connection factory that has been shut down: java:/ENGINEDataSourceNoJTA]
>>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:168) [quartz.jar:]
>>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3807) [quartz.jar:]
>>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2751) [quartz.jar:]
>>>>>>     at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:264) [quartz.jar:]
>>>>>> Caused by: java.sql.SQLException: Could not retrieve datasource via
>>>>>> JNDI url 'java:/ENGINEDataSourceNoJTA' java.sql.SQLException:
>>>>>> javax.resource.ResourceException: IJ000470: You are trying to use a
>>>>>> connection factory that has been shut down: java:/ENGINEDataSourceNoJTA
>>>>>>     at org.quartz.utils.JNDIConnectionProvider.getConnection(JNDIConnectionProvider.java:163) [quartz.jar:]
>>>>>>     at org.quartz.utils.DBConnectionManager.getConnection(DBConnectionManager.java:108) [quartz.jar:]
>>>>>>     at org.quartz.impl.jdbcjobstore.JobStoreCMT.getNonManagedTXConnection(JobStoreCMT.java:165) [quartz.jar:]
>>>>>>     ... 3 more
>>>>>>
>>>>>> Is your postgresql service running? That is the most likely source of
>>>>>> the engine not coming up.
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users@ovirt.org
>>>>> http://lists.ovirt.org/mailman/listinfo/users
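
P.S. If curl is not handy on one of the hosts, the health check suggested
earlier in this message can be approximated with a few lines of Python. This
is only a rough sketch, not an official tool: the function names are made up
for illustration, the host name you pass in is whatever your engine VM
answers to, and the "DB Up!" marker is my assumption about what a healthy
engine prints on that page, so compare it against the real output before
relying on it. Certificate verification is switched off, mirroring
curl --insecure.

```python
import ssl
import urllib.error
import urllib.request

# Path taken from the curl command earlier in this message.
HEALTH_PATH = "/ovirt-engine/services/health"

# ASSUMPTION: a healthy engine includes this text in the response body.
# Verify against what your own engine actually returns.
ASSUMED_OK_MARKER = "DB Up!"


def health_url(host):
    """Build the health-check URL for an engine host name or address."""
    return "https://{}{}".format(host, HEALTH_PATH)


def looks_healthy(status, body):
    """Heuristic only: HTTP 200 plus the assumed marker in the body."""
    return status == 200 and ASSUMED_OK_MARKER in body


def fetch_health(host, timeout=10.0):
    """Return (http_status, body) for the health page.

    Certificate and host name checks are disabled, like curl --insecure.
    Connection failures raise urllib.error.URLError (an OSError subclass).
    """
    context = ssl.create_default_context()
    context.check_hostname = False      # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(
            health_url(host), timeout=timeout, context=context
        ) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status; report it as-is.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling fetch_health("<your.vm.address>") from each host and from the engine
VM and printing the result gives the same information as the curl call. A
connection error from a host but not from the engine VM would point at the
network path rather than at the engine itself.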