I just discovered several problems in the logs:
1) RDMA support was not installed for glusterfs (even though the RDMA check
box was selected).
2) During the resync the connection was going down and up roughly every
second.
3) Because of 2), the hosts were restarting the glusterfs daemon several
times, sometimes with the correct parameters and sometimes with none, so the
instances were conflicting and overriding each other.
Maybe the fault was the glusterfs service being enabled on boot; the checks
sketched below should tell.
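To verify that, I would run something like this on each host (assuming
systemd-based hosts; the unit may be named glusterd.service depending on the
packaging):

  systemctl is-enabled glusterd    # vdsm-managed hosts normally leave this to vdsm
  systemctl status glusterd -l     # last start time and the arguments it was started with
  journalctl -u glusterd -b        # start/stop history since the last boot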

I can try to destroy the whole cluster and reinstall from scratch to see if we
can figure out why the vol config files disappear.
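
Before tearing it down, a minimal way to preserve the evidence would be to keep
a copy of the glusterd state and the command history on every host (these are
the standard glusterd paths; adjust if yours differ):

  # on each gluster host, archive the config and the gluster command history
  tar czf /root/glusterd-backup-$(hostname)-$(date +%F).tar.gz \
      /var/lib/glusterd /var/log/glusterfs/cmd_history.log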

On Thu, Mar 2, 2017 at 5:34 AM, Ramesh Nachimuthu <rnach...@redhat.com>
wrote:

>
>
>
>
> ----- Original Message -----
> > From: "Arman Khalatyan" <arm2...@gmail.com>
> > To: "Ramesh Nachimuthu" <rnach...@redhat.com>
> > Cc: "users" <users@ovirt.org>, "Sahina Bose" <sab...@redhat.com>
> > Sent: Wednesday, March 1, 2017 11:22:32 PM
> > Subject: Re: [ovirt-users] Gluster setup disappears any chance to
> recover?
> >
> > OK, I will answer myself:
> > yes, the gluster daemon is managed by vdsm :)
> > and to recover the lost config one simply has to add the "force" keyword:
> > gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA \
> >     10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu \
> >     10.10.10.41:/zclei26/01/glu force
> >
> > Now everything is up and running!
> > One annoying thing is the EPEL dependency of ZFS conflicting with oVirt:
> > every time one has to enable and then disable EPEL.
> >
> >
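
For the record, after recreating the volume with force, a few standard gluster
CLI checks should confirm that the bricks, the transport and the heal state
look sane again:

  gluster volume info GluReplica      # expect replica 3 arbiter 1, transport tcp,rdma
  gluster volume status GluReplica    # all brick processes should be online
  gluster volume heal GluReplica info # pending self-heals after the recreation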
>
> The glusterd service will be started when you add/activate the host in oVirt.
> It will be configured to start after every reboot.
> Volumes disappearing seems to be a serious issue. We have never seen such
> an issue with the XFS file system. Are you able to reproduce this issue
> consistently?
>
> Regards,
> Ramesh
>
> >
> > On Wed, Mar 1, 2017 at 5:33 PM, Arman Khalatyan <arm2...@gmail.com>
> wrote:
> >
> > > OK, finally I got a single brick up and running, so I can access the data.
> > > Now the question is: do we need to run the glusterd daemon on startup, or
> > > is it managed by vdsmd?
> > >
> > >
> > > On Wed, Mar 1, 2017 at 2:36 PM, Arman Khalatyan <arm2...@gmail.com>
> wrote:
> > >
> > >> All folders under /var/lib/glusterd/vols/ are empty.
> > >> In the shell history of one of the servers I found the command with which
> > >> it was created:
> > >>
> > >> gluster volume create GluReplica replica 3 arbiter 1 transport TCP,RDMA \
> > >>     10.10.10.44:/zclei22/01/glu 10.10.10.42:/zclei21/01/glu \
> > >>     10.10.10.41:/zclei26/01/glu
> > >>
> > >> But executing this command, it complains:
> > >> volume create: GluReplica: failed: /zclei22/01/glu is already part of a volume
> > >>
> > >> Any chance to force it?
> > >>
> > >>
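
A note for the archives: if force had not been acceptable, one commonly
described way to reuse a brick path that still carries old volume markers is to
clear the gluster extended attributes on the brick root before recreating (not
tested here, and it discards the old volume metadata on that brick, so only for
bricks you really intend to reuse):

  setfattr -x trusted.glusterfs.volume-id /zclei22/01/glu
  setfattr -x trusted.gfid /zclei22/01/glu
  rm -rf /zclei22/01/glu/.glusterfs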
> > >>
> > >> On Wed, Mar 1, 2017 at 12:13 PM, Ramesh Nachimuthu <
> rnach...@redhat.com>
> > >> wrote:
> > >>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> ----- Original Message -----
> > >>> > From: "Arman Khalatyan" <arm2...@gmail.com>
> > >>> > To: "users" <users@ovirt.org>
> > >>> > Sent: Wednesday, March 1, 2017 3:10:38 PM
> > >>> > Subject: Re: [ovirt-users] Gluster setup disappears any chance to
> > >>> recover?
> > >>> >
> > >>> > The engine throws the following errors:
> > >>> > 2017-03-01 10:39:59,608+01 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [d7f7d83] EVENT_ID: GLUSTER_VOLUME_DELETED_FROM_CLI(4,027), Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Detected deletion of volume GluReplica on cluster HaGLU, and deleted it from engine DB.
> > >>> > 2017-03-01 10:39:59,610+01 ERROR [org.ovirt.engine.core.bll.gluster.GlusterSyncJob] (DefaultQuartzScheduler6) [d7f7d83] Error while removing volumes from database!: org.springframework.dao.DataIntegrityViolationException: CallableStatementCallback; SQL [{call deleteglustervolumesbyguids(?)}]; ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
> > >>> >   Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
> > >>> >   Where: SQL statement "DELETE
> > >>> >     FROM gluster_volumes
> > >>> >     WHERE id IN (
> > >>> >         SELECT *
> > >>> >         FROM fnSplitterUuid(v_volume_ids)
> > >>> >     )"
> > >>> >   PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
> > >>> >   Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
> > >>> >   Where: SQL statement "DELETE
> > >>> >     FROM gluster_volumes
> > >>> >     WHERE id IN (
> > >>> >         SELECT *
> > >>> >         FROM fnSplitterUuid(v_volume_ids)
> > >>> >     )"
> > >>> >   PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement
> > >>> >     at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:243) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1094) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.JdbcTemplate.call(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.simple.AbstractJdbcCall.executeCallInternal(AbstractJdbcCall.java:405) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.simple.AbstractJdbcCall.doExecute(AbstractJdbcCall.java:365) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.simple.SimpleJdbcCall.execute(SimpleJdbcCall.java:198) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:135) [dal.jar:]
> > >>> >     at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeImpl(SimpleJdbcCallsHandler.java:130) [dal.jar:]
> > >>> >     at org.ovirt.engine.core.dal.dbbroker.SimpleJdbcCallsHandler.executeModification(SimpleJdbcCallsHandler.java:76) [dal.jar:]
> > >>> >     at org.ovirt.engine.core.dao.gluster.GlusterVolumeDaoImpl.removeAll(GlusterVolumeDaoImpl.java:233) [dal.jar:]
> > >>> >     at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.removeDeletedVolumes(GlusterSyncJob.java:521) [bll.jar:]
> > >>> >     at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshVolumeData(GlusterSyncJob.java:465) [bll.jar:]
> > >>> >     at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshClusterData(GlusterSyncJob.java:133) [bll.jar:]
> > >>> >     at org.ovirt.engine.core.bll.gluster.GlusterSyncJob.refreshLightWeightData(GlusterSyncJob.java:111) [bll.jar:]
> > >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
> > >>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
> > >>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
> > >>> >     at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
> > >>> >     at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
> > >>> >     at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
> > >>> >     at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
> > >>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
> > >>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
> > >>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
> > >>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
> > >>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
> > >>> > Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "gluster_volumes" violates foreign key constraint "fk_storage_connection_to_glustervolume" on table "storage_server_connections"
> > >>> >   Detail: Key (id)=(3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d) is still referenced from table "storage_server_connections".
> > >>> >   Where: SQL statement "DELETE
> > >>> >     FROM gluster_volumes
> > >>> >     WHERE id IN (
> > >>> >         SELECT *
> > >>> >         FROM fnSplitterUuid(v_volume_ids)
> > >>> >     )"
> > >>> >   PL/pgSQL function deleteglustervolumesbyguids(character varying) line 3 at SQL statement
> > >>> >     at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157)
> > >>> >     at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
> > >>> >     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
> > >>> >     at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555)
> > >>> >     at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417)
> > >>> >     at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:410)
> > >>> >     at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:303)
> > >>> >     at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:442)
> > >>> >     at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1133) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.JdbcTemplate$6.doInCallableStatement(JdbcTemplate.java:1130) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:1078) [spring-jdbc.jar:4.2.4.RELEASE]
> > >>> >     ... 24 more
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>> This is a side effect of the volume deletion on the gluster side. It looks
> > >>> like you have storage domains created using those volumes.
> > >>>
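
To confirm which connection still references the deleted volume, the row can be
looked up directly in the engine database; a rough sketch, assuming the default
database name "engine" (run on the engine host):

  sudo -u postgres psql engine -c "SELECT * FROM storage_server_connections WHERE id = '3d8bfa9d-1c83-46ac-b4e9-bd317623ed2d';"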
> > >>> > On Wed, Mar 1, 2017 at 9:49 AM, Arman Khalatyan <
> arm2...@gmail.com >
> > >>> wrote:
> > >>> >
> > >>> >
> > >>> >
> > >>> > Hi,
> > >>> > I just tested a power cut on the test system:
> > >>> >
> > >>> > Cluster with 3 hosts; each host has a 4 TB local disk with ZFS on it and
> > >>> > the /zhost/01/glu folder as a brick.
> > >>> >
> > >>> > GlusterFS was replicated to 3 disks with an arbiter. So far so good. A VM
> > >>> > was up and running with a 50 GB OS disk; dd was showing 70-100 MB/s on the
> > >>> > VM disk.
> > >>> > I then simulated a disaster power cut: an IPMI power-cycle of all 3 hosts
> > >>> > at the same time.
> > >>> > The result: all hosts are green, up and running, but the bricks are down.
> > >>> > In the process list I can see:
> > >>> > ps aux | grep gluster
> > >>> > root 16156 0.8 0.0 475360 16964 ? Ssl 08:47 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> > >>> >
> > >>> > What happened to my volume setup?
> > >>> > Is it possible to recover it?
> > >>> > [root@clei21 ~]# gluster peer status
> > >>> > Number of Peers: 2
> > >>> >
> > >>> > Hostname: clei22.cls
> > >>> > Uuid: 96b52c7e-3526-44fd-af80-14a3073ebac2
> > >>> > State: Peer in Cluster (Connected)
> > >>> > Other names:
> > >>> > 192.168.101.40
> > >>> > 10.10.10.44
> > >>> >
> > >>> > Hostname: clei26.cls
> > >>> > Uuid: c9fab907-5053-41a8-a1fa-d069f34e42dc
> > >>> > State: Peer in Cluster (Connected)
> > >>> > Other names:
> > >>> > 10.10.10.41
> > >>> > [root@clei21 ~]# gluster volume info
> > >>> > No volumes present
> > >>> > [root@clei21 ~]#
> > >>>
> > >>> I am not sure why all volumes are getting deleted after a reboot. Do you
> > >>> see any vol files under the directory /var/lib/glusterd/vols/? Also,
> > >>> /var/log/glusterfs/cmd_history.log should have all the gluster commands
> > >>> that were executed.
> > >>>
> > >>> Regards,
> > >>> Ramesh
> > >>>
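
To answer that, and to gather data before the next reproduction attempt, this
is roughly what I plan to run on each host right after a power-cycle (same
paths as mentioned above):

  ls -l /var/lib/glusterd/vols/             # should contain a GluReplica directory
  grep -iE 'volume (create|delete|stop)' /var/log/glusterfs/cmd_history.log
  gluster volume info                       # compare with what glusterd actually loaded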
> > >>> >
> > >>> >
> > >>> >
> > >>> > _______________________________________________
> > >>> > Users mailing list
> > >>> > Users@ovirt.org
> > >>> > http://lists.ovirt.org/mailman/listinfo/users
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
