Re: [ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-13 Thread Sahina Bose



On 10/12/2015 06:43 PM, Nico wrote:


On 2015-10-12 14:04, Nir Soffer wrote:



Yes, engine will let you use such a volume in 3.5 - this is a bug.
In 3.6 you will not be able to use such a setup.

replica 2 fails in a very bad way when one brick is down; the
application may get stale data, and this breaks sanlock. You will
get stuck with an SPM that cannot be stopped, and other fun stuff.

You don't want to go in this direction, and we will not be able to
support that.



Here are the last entries of vdsm.log:


We need the whole file.

I suggest you file an ovirt bug and attach the full vdsm log file
showing the timeframe of
the error. Probably from the time you created the glusterfs domain.

Nir


Please find the full logs there:

https://94.23.2.63/log_vdsm/vdsm.log

https://94.23.2.63/log_vdsm/

https://94.23.2.63/log_engine/



The engine log looping with "Volume contains apparently corrupt
bricks" happens when the engine tries to get information about the
volumes from the gluster CLI and update its database. These errors do
not affect the functioning of the storage domain or the running
virtual machines, but they do affect the monitoring/management of the
gluster volume from oVirt.
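
As background, what the engine ultimately polls is the gluster CLI on
one of the hosts - roughly the XML form of the volume listing, along
the lines of:

gluster volume info ovirt --xml    # volume, brick and server-uuid data

In other words, the warning points to a bookkeeping mismatch on the
engine side, not to actual corruption of the bricks.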


Now, to identify the cause of the error - the logs indicate that the
gluster server uuid has either not been updated in the engine, or is
different from what the engine has recorded. It could be one of these
scenarios (a sketch for comparing the uuids follows below):
1. Did you create the cluster with only the virt service enabled and
enable the gluster service later? In this case, the gluster server
uuid may not have been updated. You will need to put the host into
maintenance and then activate it to resolve this.


2. Did you re-install the gluster server nodes after adding them to
oVirt? If this is the case, we need to investigate further how the
mismatch came about.
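
A minimal way to compare the two sides, assuming shell access to the
gluster host and to the engine database (the gluster_server table and
column names are an assumption based on the 3.5-era schema - verify
against your version before running the query):

# on the gluster host: the uuid glusterd identifies itself with
gluster system:: uuid get

# on the engine host: the uuid the engine has recorded for its hosts
sudo -u postgres psql engine \
    -c "SELECT server_id, gluster_server_uuid FROM gluster_server;"

If the values differ for a host, the maintenance/activate cycle from
point 1 is what should get the engine's copy refreshed.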








Re: [ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-12 Thread Nir Soffer
On Sun, Oct 11, 2015 at 6:43 PM, Nico  wrote:
> Recently, I built a small oVirt platform with 2 dedicated servers and
> GlusterFS to sync the VM storage.
> Bricks:
>
> Brick1: ovirt01:/gluster/ovirt
>
> Brick2: ovirt02:/gluster/ovirt

This looks like replica 2 - this is not supported.

You can use either replica 1 (testing) or replica 3 (production).
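
For reference, a replica 3 data volume only needs a third brick - a
minimal sketch, assuming a hypothetical third host "ovirt03" laid out
like the existing two (the host name and paths are illustrative, not
from this thread):

# create and start a 3-way replicated volume for the storage domain
# (volume name reused from this thread purely for illustration)
gluster volume create ovirt replica 3 \
    ovirt01:/gluster/ovirt ovirt02:/gluster/ovirt ovirt03:/gluster/ovirt
gluster volume start ovirt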

> But when I check /var/log/ovirt/engine.log on ovirt01, there are errors in
> a loop every 2 seconds:
To understand such an error we need to see the vdsm log.

Nir


Re: [ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-12 Thread Nico
 

On 2015-10-12 09:59, Nir Soffer wrote:

> On Sun, Oct 11, 2015 at 6:43 PM, Nico  wrote: 
> 
>> Recently, I built a small oVirt platform with 2 dedicated servers and
>> GlusterFS to sync the VM storage.
>> Bricks:
>> 
>> Brick1: ovirt01:/gluster/ovirt
>> 
>> Brick2: ovirt02:/gluster/ovirt
> 
> This looks like replica 2 - this is not supported.
> 
> You can use either replica 1 (testing) or replica 3 (production).
> 
>> But when I check /var/log/ovirt/engine.log on ovirt01, there are errors in
>> a loop every 2 seconds:
> To understand such an error we need to see the vdsm log.
> 
> Nir

Yeah, it is replica 2 as I only have 2 dedicated servers.

Why are you saying it is not supported? Through the oVirt GUI, it is
possible to create a Gluster volume with 2 bricks in replicate mode; I
tried it as well.

Here are the last entries of vdsm.log:

Thread-167405::DEBUG::2015-10-12
10:12:20,132::stompReactor::163::yajsonrpc.StompServer::(send) Sending
response
Thread-55245::DEBUG::2015-10-12
10:12:22,529::task::595::Storage.TaskManager.Task::(_updateState)
Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state init ->
state preparing
Thread-55245::INFO::2015-10-12
10:12:22,530::logUtils::44::dispatcher::(wrapper) Run and protect:
getVolumeSize(sdUUID='d44ee4b0-8d36-467a-9610-c682a618b698',
spUUID='0ae7120a-430d-4534-9a7e-59c53fb2e804',
imgUUID='3454b077-297b-4b89-b8ce-a77f6ec5d22e',
volUUID='933da0b6-6a05-4e64-958a-e1c030cf5ddb', options=None)
Thread-55245::INFO::2015-10-12
10:12:22,535::logUtils::47::dispatcher::(wrapper) Run and protect:
getVolumeSize, Return response: {'truesize': '158983839744',
'apparentsize': '161061273600'}
Thread-55245::DEBUG::2015-10-12
10:12:22,535::task::1191::Storage.TaskManager.Task::(prepare)
Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::finished: {'truesize':
'158983839744', 'apparentsize': '161061273600'}
Thread-55245::DEBUG::2015-10-12
10:12:22,535::task::595::Storage.TaskManager.Task::(_updateState)
Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state preparing
-> state finished
Thread-55245::DEBUG::2015-10-12
10:12:22,535::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-55245::DEBUG::2015-10-12
10:12:22,536::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-55245::DEBUG::2015-10-12
10:12:22,536::task::993::Storage.TaskManager.Task::(_decref)
Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::ref 0 aborting False
Thread-55245::DEBUG::2015-10-12
10:12:22,545::libvirtconnection::143::root::(wrapper) Unknown
libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found:
Requested metadata element is not present
JsonRpc (StompReactor)::DEBUG::2015-10-12
10:12:23,138::stompReactor::98::Broker.StompAdapter::(handle_frame)
Handling message 
JsonRpcServer::DEBUG::2015-10-12
10:12:23,139::__init__::530::jsonrpc.JsonRpcServer::(serve_requests)
Waiting for request
Thread-167406::DEBUG::2015-10-12
10:12:23,142::stompReactor::163::yajsonrpc.StompServer::(send) Sending
response
Thread-37810::DEBUG::2015-10-12
10:12:24,194::fileSD::262::Storage.Misc.excCmd::(getReadDelay)
/usr/bin/dd
if=/rhev/data-center/mnt/ovirt01:_data_iso/5aec30fa-be8b-4f4e-832e-eafb6fa4a8e0/dom_md/metadata
iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-37810::DEBUG::2015-10-12
10:12:24,201::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS:
 = '0+1 records in\n0+1 records out\n317 bytes (317 B) copied,
0.000131729 s, 2.4 MB/s\n';  = 0
JsonRpc (StompReactor)::DEBUG::2015-10-12
10:12:26,148::stompReactor::98::Broker.StompAdapter::(handle_frame)
Handling message 
JsonRpcServer::DEBUG::2015-10-12
10:12:26,149::__init__::530::jsonrpc.JsonRpcServer::(serve_requests)
Waiting for request
Thread-167407::DEBUG::2015-10-12
10:12:26,151::stompReactor::163::yajsonrpc.StompServer::(send) Sending
response
VM Channels Listener::DEBUG::2015-10-12
10:12:26,972::vmchannels::96::vds::(_handle_timeouts) Timeout on fileno
35.
Thread-30::DEBUG::2015-10-12
10:12:28,358::fileSD::262::Storage.Misc.excCmd::(getReadDelay)
/usr/bin/dd
if=/rhev/data-center/mnt/glusterSD/localhost:_ovirt/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/metadata
iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
Thread-30::DEBUG::2015-10-12
10:12:28,451::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS:
 = '0+1 records in\n0+1 records out\n470 bytes (470 B) copied,
0.000152738 s, 3.1 MB/s\n';  = 0
JsonRpc (StompReactor)::DEBUG::2015-10-12
10:12:29,157::stompReactor::98::Broker.StompAdapter::(handle_frame)
Handling message 
JsonRpcServer::DEBUG::2015-10-12
10:12:29,252::__init__::530::jsonrpc.JsonRpcServer::(serve_requests)
Waiting for request
Thread-167408::DEBUG::2015-10-12
10:12:29,254::stompReactor::163::yajsonrpc.StompServer::(send) Sending
response
JsonRpc (StompReactor)::DEBUG::2015-10-12
10:12:32,260::stompReactor::98::Broker.StompAdapter::(handle_frame)
Handling message 

Re: [ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-12 Thread Nir Soffer
On Mon, Oct 12, 2015 at 11:14 AM, Nico  wrote:
>
>
> On 2015-10-12 09:59, Nir Soffer wrote:
>
> On Sun, Oct 11, 2015 at 6:43 PM, Nico  wrote:
>
> Recently, I built a small oVirt platform with 2 dedicated servers and
> GlusterFS to sync the VM storage.
> Bricks:
>
> Brick1: ovirt01:/gluster/ovirt
>
> Brick2: ovirt02:/gluster/ovirt
>
>
> This looks like replica 2 - this is not supported.
>
> You can use either replica 1 (testing) or replica 3 (production).
>
> But when I check /var/log/ovirt/engine.log on ovirt01, there are errors in
> a loop every 2 seconds:
>
> To understand such an error we need to see the vdsm log.
>
> Nir
>
> Yeah, it is replica 2 as I only have 2 dedicated servers.
>
> Why are you saying it is not supported? Through the oVirt GUI, it is possible
> to create a Gluster volume with 2 bricks in replicate mode; I tried it as well.

Yes, engine will let you use such a volume in 3.5 - this is a bug. In 3.6 you
will not be able to use such a setup.

replica 2 fails in a very bad way when one brick is down; the application may
get stale data, and this breaks sanlock. You will get stuck with an SPM that
cannot be stopped, and other fun stuff.

You don't want to go in this direction, and we will not be able to support that.

> Here are the last entries of vdsm.log:

We need the whole file.

I suggest you file an ovirt bug and attach the full vdsm log file
showing the timeframe of
the error. Probably from the time you created the glusterfs domain.

Nir

>
>
>
> Thread-167405::DEBUG::2015-10-12
> 10:12:20,132::stompReactor::163::yajsonrpc.StompServer::(send) Sending
> response
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,529::task::595::Storage.TaskManager.Task::(_updateState)
> Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state init -> state
> preparing
> Thread-55245::INFO::2015-10-12
> 10:12:22,530::logUtils::44::dispatcher::(wrapper) Run and protect:
> getVolumeSize(sdUUID='d44ee4b0-8d36-467a-9610-c682a618b698',
> spUUID='0ae7120a-430d-4534-9a7e-59c53fb2e804',
> imgUUID='3454b077-297b-4b89-b8ce-a77f6ec5d22e',
> volUUID='933da0b6-6a05-4e64-958a-e1c030cf5ddb', options=None)
> Thread-55245::INFO::2015-10-12
> 10:12:22,535::logUtils::47::dispatcher::(wrapper) Run and protect:
> getVolumeSize, Return response: {'truesize': '158983839744', 'apparentsize':
> '161061273600'}
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,535::task::1191::Storage.TaskManager.Task::(prepare)
> Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::finished: {'truesize':
> '158983839744', 'apparentsize': '161061273600'}
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,535::task::595::Storage.TaskManager.Task::(_updateState)
> Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::moving from state preparing ->
> state finished
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,535::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
> Owner.releaseAll requests {} resources {}
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,536::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
> Owner.cancelAll requests {}
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,536::task::993::Storage.TaskManager.Task::(_decref)
> Task=`c887acfa-bd10-4dfb-9374-da607c133e68`::ref 0 aborting False
> Thread-55245::DEBUG::2015-10-12
> 10:12:22,545::libvirtconnection::143::root::(wrapper) Unknown libvirterror:
> ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata
> element is not present
> JsonRpc (StompReactor)::DEBUG::2015-10-12
> 10:12:23,138::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling
> message 
> JsonRpcServer::DEBUG::2015-10-12
> 10:12:23,139::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting
> for request
> Thread-167406::DEBUG::2015-10-12
> 10:12:23,142::stompReactor::163::yajsonrpc.StompServer::(send) Sending
> response
> Thread-37810::DEBUG::2015-10-12
> 10:12:24,194::fileSD::262::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd
> if=/rhev/data-center/mnt/ovirt01:_data_iso/5aec30fa-be8b-4f4e-832e-eafb6fa4a8e0/dom_md/metadata
> iflag=direct of=/dev/null bs=4096 count=1 (cwd None)
> Thread-37810::DEBUG::2015-10-12
> 10:12:24,201::fileSD::262::Storage.Misc.excCmd::(getReadDelay) SUCCESS:
>  = '0+1 records in\n0+1 records out\n317 bytes (317 B) copied,
> 0.000131729 s, 2.4 MB/s\n';  = 0
> JsonRpc (StompReactor)::DEBUG::2015-10-12
> 10:12:26,148::stompReactor::98::Broker.StompAdapter::(handle_frame) Handling
> message 
> JsonRpcServer::DEBUG::2015-10-12
> 10:12:26,149::__init__::530::jsonrpc.JsonRpcServer::(serve_requests) Waiting
> for request
> Thread-167407::DEBUG::2015-10-12
> 10:12:26,151::stompReactor::163::yajsonrpc.StompServer::(send) Sending
> response
> VM Channels Listener::DEBUG::2015-10-12
> 10:12:26,972::vmchannels::96::vds::(_handle_timeouts) Timeout on fileno 35.
> Thread-30::DEBUG::2015-10-12
> 10:12:28,358::fileSD::262::Storage.Misc.excCmd::(getReadDelay) /usr/bin/dd
> 

Re: [ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-12 Thread Nico
 

On 2015-10-12 14:04, Nir Soffer wrote:

>> On Mon, Oct 12, 2015 at 11:14 AM, Nico  wrote:
> 
> Yes, engine will let you use such a volume in 3.5 - this is a bug. In 3.6
> you will not be able to use such a setup.
> 
> replica 2 fails in a very bad way when one brick is down; the application
> may get stale data, and this breaks sanlock. You will get stuck with an SPM
> that cannot be stopped, and other fun stuff.
> 
> You don't want to go in this direction, and we will not be able to support
> that.

For the record, I already rebooted node1, and node2 took over the
existing VMs from node1, and vice versa.

GlusterFS worked fine and the oVirt application kept working; I guess
that is because it was a soft reboot, which stops the services
gracefully.

I had another case where I broke the network on the 2 nodes
simultaneously after a bad manipulation in the oVirt GUI, and I got a
split-brain.

I kept the error from that very moment:

[root@devnix-virt-master02 nets]# gluster volume heal ovirt info
split-brain
Brick devnix-virt-master01:/gluster/ovirt/
/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids
Number of entries in split-brain: 1 

Brick devnix-virt-master02:/gluster/ovirt/
/d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids
Number of entries in split-brain: 1 

The file had the same size on both nodes, so it was hard to select
one. Finally I chose the more recent one, and everything was back
online after the heal.
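
As a side note, GlusterFS 3.7 can resolve a file split-brain from the
CLI instead of picking a copy by hand - a minimal sketch, assuming you
decide the copy on devnix-virt-master01 is the good one (the exact
policy options depend on the gluster release):

# declare which brick holds the good copy of the file, then re-check
gluster volume heal ovirt split-brain source-brick \
    devnix-virt-master01:/gluster/ovirt \
    /d44ee4b0-8d36-467a-9610-c682a618b698/dom_md/ids
gluster volume heal ovirt info split-brain    # should report 0 entries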

Is this the kind of issue you are talking about with 2 nodes?

For now, I don't have the budget for a third server, so I'm a bit
stuck and disappointed.

I do have a third device, but it is used for backup; it has a lot of
storage but low CPU capabilities (no VT-x), so I can't use it as a
hypervisor.

I could maybe use it as a third brick, but is it possible to have this
kind of configuration? 2 active nodes as hypervisors and a third one
only for the gluster replica 3?
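
A storage-only third node is a common way to get to replica 3; gluster
does not require that every peer also run VMs. With GlusterFS 3.7 the
low-spec machine could even hold an arbiter brick, which stores only
metadata and provides the quorum that prevents this kind of
split-brain - a rough sketch, with the hostname "backup01" and the
brick paths invented for illustration:

# probe the storage-only box and create a replica 3 arbiter 1 volume
gluster peer probe backup01
gluster volume create ovirt-r3 replica 3 arbiter 1 \
    ovirt01:/gluster/ovirt-r3 ovirt02:/gluster/ovirt-r3 \
    backup01:/gluster/arbiter
gluster volume start ovirt-r3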

Cheers 

Nico 



[ovirt-users] engine.log is looping with Volume XXX contains a apparently corrupt brick(s).

2015-10-11 Thread Nico
 

Hi 

Recently, I built a small oVirt platform with 2 dedicated servers and
GlusterFS to sync the VM storage.

oVirt Setup is simple: 

ovirt01 : Host Agent (VDSM) + oVirt Engine 

ovirt02 : Host Agent (VDSM) 

Version : 

ovirt-release35-005-1.noarch 

ovirt-engine-3.5.4.2-1.el7.centos.noarch 

vdsm-4.16.26-0.el7.centos.x86_64 

vdsm-gluster-4.16.26-0.el7.centos.noarch 

glusterfs-server-3.7.4-2.el7.x86_64 

GlusterFS Setup is simple, 2 bricks in replicate mode. 

It was done in the shell, not via the oVirt GUI, and then it was added
in STORAGE as a new DOMAIN: Type DATA, GlusterFS V3.

# gluster volume info 

Volume Name: ovirt 

Type: Replicate 

Volume ID: 043d2d36-dc2c-4f75-9d28-96dbac25d07c 

Status: Started 

Number of Bricks: 1 x 2 = 2 

Transport-type: tcp 

Bricks: 

Brick1: ovirt01:/gluster/ovirt 

Brick2: ovirt02:/gluster/ovirt 

Options Reconfigured: 

performance.readdir-ahead: on 

nfs.disable: true 

auth.allow: IP_A, IP_B 

network.ping-timeout: 10 

storage.owner-uid: 36 

storage.owner-gid: 36 

server.allow-insecure: on 
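
For reference, the "Options Reconfigured" values above map to plain
gluster volume set commands - a sketch of how such a volume is
typically prepared from the shell for oVirt (36/36 being the vdsm user
and kvm group ids):

gluster volume set ovirt storage.owner-uid 36     # vdsm user
gluster volume set ovirt storage.owner-gid 36     # kvm group
gluster volume set ovirt network.ping-timeout 10
gluster volume set ovirt server.allow-insecure on
gluster volume set ovirt nfs.disable true
gluster volume set ovirt auth.allow "IP_A,IP_B"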

The data is reachable on the 2 nodes through a mount point that oVirt
created when I configured the storage with the GUI:

localhost:/ovirt 306G 216G 78G 74%
/rhev/data-center/mnt/glusterSD/localhost:_ovirt 

I created 7 VMs on this shared storage and all is working fine. I can
do live migration; everything works.

But when I check /var/log/ovirt/engine.log on ovirt01, there are
errors in a loop every 2 seconds:

2015-10-11 17:29:50,971 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler_Worker-29) [34dbe5cf] START,
GlusterVolumesListVDSCommand(HostName = ovirt02, HostId =
65a5bb5d-721f-4a4b-9e77-c4b9162c0aa6), log id: 41443b77 

2015-10-11 17:29:50,998 WARN
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturnForXmlRpc]
(DefaultQuartzScheduler_Worker-29) [34dbe5cf] Could not add brick
ovirt02:/gluster/ovirt to volume 043d2d36-dc2c-4f75-9d28-96dbac25d07c -
server uuid 3c340e59-334f-4aa6-ad61-af2acaf3cad6 not found in cluster
fb976d4f-de13-449b-93e8-600fcb59d4e6 

2015-10-11 17:29:50,999 INFO
[org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand]
(DefaultQuartzScheduler_Worker-29) [34dbe5cf] FINISH,
GlusterVolumesListVDSCommand, return:
{043d2d36-dc2c-4f75-9d28-96dbac25d07c=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@200ae0d1},
log id: 41443b77 

2015-10-11 17:29:51,001 WARN
[org.ovirt.engine.core.bll.gluster.GlusterSyncJob]
(DefaultQuartzScheduler_Worker-29) [34dbe5cf] Volume ovirt contains a
apparently corrupt brick(s). Hence will not add it to engine at this
point. 
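
A quick way to see which host the uuid in that warning belongs to is
to compare it with what glusterd reports on each node:

gluster peer status                          # Uuid/Hostname of the peers
grep UUID /var/lib/glusterd/glusterd.info    # this node's own uuid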

I played a lot with oVirt: at first it was running on a single node in
a local datacenter; then I added a second node, moved the first host
to a new datacenter, migrated the VM images, etc., with some pain at
certain moments. Now everything looks fine, but I prefer to double
check.

So, I want to know if there is a real issue with my oVirt/gluster
setup that I don't see. Any info is welcome, because I'm a bit worried
to see these messages looping in the log.

Thanks in advance; 

Regards 

Nico 