Re: [ovirt-users] NullPointerException when changing compatibility version to 4.0

2017-07-22 Thread Michal Skrivanek

> On 20 Jul 2017, at 15:56, Marcel Hanke  wrote:
> 
> Hi,
> the log is >400 MB; here's a part with the errors.

OK.
And which exact version do you have?
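If it helps, one quick way to report the exact version (a minimal sketch, assuming the standard RPM install on the engine host and the usual "ovirt-engine" package name):

    # on the engine host: print the installed engine version
    rpm -q ovirt-engine
    # or list every engine-related package for a fuller picture
    rpm -qa | grep ovirt-engine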

> 
> thanks Marcel
> 
> On Thursday, July 20, 2017 02:43:57 PM Eli Mesika wrote:
>> Hi
>> 
>> Please attach full engine.log
>> 
>> On Wed, Jul 19, 2017 at 12:33 PM, Marcel Hanke 
>> 
>> wrote:
>>> Hi,
>>> I currently have a problem changing one of our clusters to
>>> compatibility version 4.0.
>>> The log shows a NullPointerException after several VMs are updated
>>> successfully:
>>> 2017-07-19 11:19:45,886 ERROR [org.ovirt.engine.core.bll.UpdateVmCommand]
>>> (default task-31) [1acd2990] Error during ValidateFailure.:
>>> java.lang.NullPointerException
>>>     at org.ovirt.engine.core.bll.UpdateVmCommand.validate(UpdateVmCommand.java:632) [bll.jar:]
>>>     at org.ovirt.engine.core.bll.CommandBase.internalValidate(CommandBase.java:886) [bll.jar:]
>>>     at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:391) [bll.jar:]
>>>     at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:493) [bll.jar:]
>>>     ...
>>> 
>>> On other clusters with the exact same configuration the change to 4.0 was
>>> successful without a problem.
>>> Turning off the cluster for the change is also not possible because of the
>>> 1200 VMs running on it.
>>> 
>>> Does anyone have an idea what to do, or what to look for?
>>> 
>>> Thanks
>>> Marcel
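For reference, a way to pull the full context of that validate failure out of the engine log (assuming the default log location, /var/log/ovirt-engine/engine.log; adjust the path if your install differs):

    # show surrounding context for each validation failure, including the stack trace
    grep -n -B 5 -A 35 'Error during ValidateFailure' /var/log/ovirt-engine/engine.log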

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-22 Thread Ravishankar N


On 07/21/2017 11:41 PM, yayo (j) wrote:

Hi,

Sorry to follow up again, but checking the oVirt interface I've
found that oVirt reports the "engine" volume as an "arbiter"
configuration and the "data" volume as a fully replicated volume. See
these screenshots:


This is probably some refresh bug in the UI; Sahina might be able to
tell you.


https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are 
full replicated:



Volume Name: data
Type: Replicate
Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/data/brick
Brick2: gdnode02:/gluster/data/brick
Brick3: gdnode04:/gluster/data/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on





Volume Name: engine
Type: Replicate
Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/engine/brick
Brick2: gdnode02:/gluster/engine/brick
Brick3: gdnode04:/gluster/engine/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on
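A quick CLI cross-check for whether a volume is a plain replica 3 or a replica 3 with an arbiter: for arbiter volumes, "gluster volume info" normally prints the brick count as "1 x (2 + 1) = 3" rather than "1 x 3 = 3" (and newer releases may also tag the arbiter brick). A minimal sketch, assuming the volume names above:

    # show only the type and brick layout for both volumes
    for v in data engine; do
        echo "== $v =="
        gluster volume info "$v" | grep -iE '^type|number of bricks|arbiter'
    done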


2017-07-21 19:13 GMT+02:00 yayo (j):


2017-07-20 14:48 GMT+02:00 Ravishankar N:


But it does say something. All these gfids of completed heals
in the log below are for the ones that you have given the
getfattr output of. So what is likely happening is that there is
an intermittent connection problem between your mount and the
brick process, leading to pending heals again after the heal
gets completed, which is why the numbers are varying each
time. You would need to check why that is the case.
Hope this helps,
Ravi




[2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
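As a follow-up to Ravi's suggestion, one way to keep an eye on both the pending-heal counts and possible client-side disconnects is sketched below; the mount-log filename follows the usual oVirt naming for GlusterFS storage domains and may differ on your hosts:

    # entries still pending heal on the engine volume (run on any brick node)
    gluster volume heal engine info

    # look for the fuse client losing its connection to a brick
    # (log path is the typical oVirt pattern; adjust to your mount point)
    grep -iE 'disconnect|connection .* failed' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log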




Hi,

following your suggestion, I've checked the "peer" status and found
that there are too many names for the hosts; I don't know if this can
be the problem or part of it:

*gluster peer status on NODE01:*
Number of Peers: 2

Hostname: dnode02.localdomain.local
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
192.168.10.52
dnode02.localdomain.local