[ovirt-users] oVirt + Gluster issues

2021-06-08 Thread José Ferradeira via Users
Hello, 

running ovirt 4.4.4.7-1.el8 and gluster 8.3. 
When I perform a restore of Zimbra Collaboration Email with features.shard on, 
the VM pauses with an unknown storage error. 
When I perform a restore of Zimbra Collaboration Email with features.shard 
off, it fills up all the disks of the gluster storage domain. 

The same happens with older versions of gluster and ovirt. If I use an NFS 
storage domain it runs OK. 
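For anyone hitting the same combination, the shard settings on the backing volume are worth checking. A minimal sketch follows; the volume name "data" is a placeholder, the 64MB value is only the gluster default shard size (neither is taken from this report), and sharding should only ever be toggled on a volume that does not yet hold VM images:

# Show how sharding is currently configured on the volume
gluster volume get data features.shard
gluster volume get data features.shard-block-size

# Example of enabling sharding before the storage domain is created
# (do NOT change this on a volume that already holds VM images)
gluster volume set data features.shard on
gluster volume set data features.shard-block-size 64MB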



-- 

Jose Ferradeira 
http://www.logicworks.pt 


[ovirt-users] oVirt - Gluster Node Offline but Bricks Active

2020-09-21 Thread Jeremey Wise
oVirt engine shows one of the gluster servers having an issue. I did a
graceful shutdown of all three nodes over the weekend, as I had to move around
some power connections in preparation for a UPS.

They came back up... but

[image: image.png]

And this is reflected in only 2 bricks showing online (there should be three for each volume)
[image: image.png]

Command line shows gluster should be happy.

[root@thor engine]# gluster peer status
Number of Peers: 2

Hostname: odinst.penguinpages.local
Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
State: Peer in Cluster (Connected)

Hostname: medusast.penguinpages.local
Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
State: Peer in Cluster (Connected)
[root@thor engine]#

# All bricks showing online
[root@thor engine]# gluster volume status
Status of volume: data
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/data/data         49152     0          Y       11001
Brick odinst.penguinpages.local:/gluster_bricks/data/data         49152     0          Y       2970
Brick medusast.penguinpages.local:/gluster_bricks/data/data       49152     0          Y       2646
Self-heal Daemon on localhost                                     N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.local                     N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.local                   N/A       N/A        Y       2475

Task Status of Volume data
----------------------------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: engine
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/engine/engine     49153     0          Y       11012
Brick odinst.penguinpages.local:/gluster_bricks/engine/engine     49153     0          Y       2982
Brick medusast.penguinpages.local:/gluster_bricks/engine/engine   49153     0          Y       2657
Self-heal Daemon on localhost                                     N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.local                     N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.local                   N/A       N/A        Y       2475

Task Status of Volume engine
----------------------------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: iso
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/iso/iso           49156     49157      Y       151426
Brick odinst.penguinpages.local:/gluster_bricks/iso/iso           49156     49157      Y       69225
Brick medusast.penguinpages.local:/gluster_bricks/iso/iso         49156     49157      Y       45018
Self-heal Daemon on localhost                                     N/A       N/A        Y       50560
Self-heal Daemon on odinst.penguinpages.local                     N/A       N/A        Y       3004
Self-heal Daemon on medusast.penguinpages.local                   N/A       N/A        Y       2475

Task Status of Volume iso
----------------------------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: vmstore
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore   49154     0          Y       11023
Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore   49154     0          Y       2993
Brick medusast.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154     0          Y       2668
Self-heal Daemon on localhost                                     N/A       N/A        Y       50560
Self-heal Daemon on medusast.penguinpages.local                   N/A       N/A        Y       2475
Self-heal Daemon on odinst.penguinpages.local                     N/A       N/A        Y       3004

Task Status of Volume vmstore
----------------------------------------------------------------------------------------------------
There are no active volume tasks

[root@thor engine]# gluster volume heal
data engine   iso  vmstore
[root@thor engine]# gluster volume heal data info
Brick thorst.penguinpages.local:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick odinst.penguinpages.local:/gluster_bricks/data/data
Status: Connected
Number of entries: 0

Brick 

[ovirt-users] oVirt Gluster Node Completely Failed Replacement

2019-10-23 Thread Robert Crawford
Hey Guys,

I had a Gluster node fail and I need to replace it; is there a replacement 
guide?
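Short of a full guide, the gluster-side part of a node replacement usually looks something like the sketch below; hostnames and brick paths are placeholders, the replacement host still has to be reinstalled and added back from the engine, and "gluster volume reset-brick" is the alternative when the new server keeps the old hostname.

# From a surviving node, bring the replacement server into the trusted pool
gluster peer probe newhost.example.com

# For each volume, swap the dead brick for a freshly prepared one on the new host
gluster volume replace-brick data \
    deadhost.example.com:/gluster_bricks/data/data \
    newhost.example.com:/gluster_bricks/data/data commit force

# Kick off a full self-heal and watch it drain
gluster volume heal data full
gluster volume heal data info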


[ovirt-users] oVirt Gluster Fails to Failover after node failure

2019-10-03 Thread Robert Crawford
One of my nodes has failed and the storage domain isn't coming online, apparently because the 
primary node isn't up. 
In the mount options there is backup-volfile-servers=192.168.100.2:192.168.100.3. 
Any help?
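One thing worth knowing: backup-volfile-servers only gives the fuse client alternative servers to fetch the volume file from at mount time; once mounted, the client talks to all bricks directly. If the domain never mounts, testing the same option by hand can help isolate the problem (volume name and mount point below are placeholders):

# Manual mount with the same failover option oVirt passes to glusterfs-fuse
mount -t glusterfs \
    -o backup-volfile-servers=192.168.100.2:192.168.100.3 \
    192.168.100.1:/data /mnt/glustertest

# The client log (named after the mount point) shows which server supplied
# the volfile and which bricks the client connected to
less /var/log/glusterfs/mnt-glustertest.log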


[ovirt-users] oVirt Gluster Volume Cannot Find UUID

2019-10-03 Thread Robert Crawford
Hey Guys,

After an update my node goes into emergency mode, saying it can't find the UUID 
associated with one of my physical volumes.
I tried an imgbase rollback and nothing changed.

pvs is showing that the identifier is missing
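Without the exact console messages it is hard to be specific, but the LVM side of a missing PV UUID is usually investigated roughly as sketched below. The VG name "onn" (the usual oVirt Node VG), the UUID and the archive file name are placeholders, and the pvcreate step rewrites on-disk labels, so it is a last resort once the underlying device is known to be healthy.

# What LVM currently sees, and which PV the VG considers missing
pvs -o +pv_uuid
vgs
vgdisplay onn

# Archived metadata kept by LVM for every VG change
vgcfgrestore --list onn
ls /etc/lvm/archive/

# Last resort: recreate the PV label with its old UUID from the archive,
# then restore the VG metadata (UUID and file name here are examples only)
pvcreate --uuid "G1zJrr-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx" \
    --restorefile /etc/lvm/archive/onn_00099-1234567890.vg /dev/sdX2
vgcfgrestore onn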


[ovirt-users] oVirt + Gluster

2019-09-22 Thread TomK

Hey All,

Seeing that GlusterFS is up to version 7, do any of the oVirt versions support 
anything higher than 3.X?


Or is Gluster not the preferred distributed file system choice for oVirt?

--
Thx,
TK.


[ovirt-users] Ovirt gluster arbiter within hosted VM

2019-04-19 Thread Alex K
Hi all,

I have a two-node hyper-converged setup which causes split-brains
when network issues are encountered. Since I cannot add a third hardware
node, I was thinking of adding a dedicated guest VM, hosted in the same
hyper-converged cluster, which would act as the arbiter for the volumes.

What do you think about this setup with regard to stability and performance?
I am running ovirt 4.2.
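On the gluster side, converting the existing replica-2 volumes to replica 2 + arbiter is a single add-brick per volume once the arbiter host (VM or otherwise) is in the trusted pool; names and paths below are placeholders. The usual caveat with an arbiter VM living inside the same cluster is that it depends on the very storage it is meant to arbitrate, so it cannot help while that storage is down.

# Add the arbiter brick, converting replica 2 into replica 3 arbiter 1
gluster peer probe arbitervm.example.com
gluster volume add-brick data replica 3 arbiter 1 \
    arbitervm.example.com:/gluster_bricks/data/arbiter

# Verify the layout and wait for heals to finish
gluster volume info data
gluster volume heal data info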

Thanx,
Alex


[ovirt-users] OVirt Gluster Fail

2019-03-23 Thread commramius
I have an oVirt 4.1 installation on gluster.

During maintenance on one of the machines, the Hosted Engine hung. 
At that point it was no longer possible to manage anything.
The VMs went into pause and could not be managed any more.
I waited for the machine to come back up, but at that point all the bricks were 
no longer reachable either.
Now I am in a situation where the mount for the engine is no longer being loaded.
Gluster sees the peers as connected and the services running for the various bricks, but it 
cannot heal; the messages I find on each machine are the following:

# gluster volume heal engine info
Brick 192.170.254.3:/bricks/engine/brick
 
.
.
.
 

Status: Connected
Number of entries: 190

Brick 192.170.254.4:/bricks/engine/brick
Status: Transport endpoint is not connected
Number of entries: -

Brick 192.170.254.6:/bricks/engine/brick
Status: Transport endpoint is not connected
Number of entries: -

This is the case for all the bricks (some have no heal pending because the 
machines on them had already been shut down).

In practice every brick sees only localhost as connected.
What can I do to bring the machines back?
Is there a way to read the data from the physical machine and export it so 
that it can be reused?
Unfortunately we need access to that data.
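If the volume cannot be healed, the VM disks are still ordinary files on each brick, so a copy can be taken directly. A sketch with placeholder paths and UUIDs follows; pick the brick holding the most recent copy, and note that if features.shard was enabled on the volume only the first shard sits under images/, in which case the copy has to be taken from a mounted gluster client instead.

# Images live under <storage-domain-uuid>/images/<image-uuid>/ on every brick
ls /bricks/data/brick/<sd-uuid>/images/<image-uuid>/

# Inspect and convert the volume file so it can be reused elsewhere
qemu-img info /bricks/data/brick/<sd-uuid>/images/<image-uuid>/<volume-uuid>
qemu-img convert -p -O qcow2 \
    /bricks/data/brick/<sd-uuid>/images/<image-uuid>/<volume-uuid> \
    /backup/vm-disk.qcow2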

Can anyone help me?

Thanks

Andrea


[ovirt-users] oVirt + gluster with JBOD

2018-09-12 Thread Eduardo Mayoral
Hi!

  I am thinking about using some spare servers I have for a
hyperconverged oVirt + gluster setup. The only problem with these servers is that
the RAID card is not especially good, so I am considering using gluster
in a JBOD configuration. JBOD is supported in recent versions of
gluster, but I am not sure whether this configuration is supported by oVirt,
or whether there are any special considerations I should take into account.

    Anyone running oVirt + gluster with JBOD who can comment on this?
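For what it's worth, JBOD in gluster terms simply means one brick per raw disk instead of one brick on top of a hardware RAID set, with replica 3 providing the redundancy the RAID controller would otherwise give. A minimal sketch with placeholder hosts, devices and volume name:

# One filesystem per disk, each disk becoming its own brick
mkfs.xfs -f -i size=512 /dev/sdb
mkdir -p /gluster_bricks/data1
mount /dev/sdb /gluster_bricks/data1

# Replica-3 volume built from one disk per host; add further bricks per extra disk
gluster volume create data replica 3 \
    host1:/gluster_bricks/data1/data \
    host2:/gluster_bricks/data1/data \
    host3:/gluster_bricks/data1/data
gluster volume start data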

    Thanks!

--

Eduardo Mayoral.



[ovirt-users] oVirt + Gluster geo-replication

2018-09-07 Thread femi adegoke
Hi Sahina,
https://ovirt.org/develop/release-management/features/gluster/gluster-geo-replication/

Unfortunately, there isn't any info there on how to set this up.
Can you share any tips or the info needed to do geo-rep between 2 sites?
I also read an article saying that geo-rep between 3 sites is possible; is that right?
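At the gluster level the setup is a handful of commands once a volume of matching size exists at the remote site; a sketch with placeholder volume and host names follows (oVirt should then be able to show and manage the session for volumes it manages). As for three sites, geo-replication is one-way from the primary, but multiple sessions from the same primary volume to different secondaries are possible, which is presumably what that article meant.

# On the primary site: generate the pem keys and create the session
gluster system:: execute gsec_create
gluster volume geo-replication datavol remotehost.example.com::datavol create push-pem

# Start it and keep an eye on the sync status
gluster volume geo-replication datavol remotehost.example.com::datavol start
gluster volume geo-replication datavol remotehost.example.com::datavol status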

Thanks,
Femi



[ovirt-users] Ovirt + Gluster : How do I gain access to the file systems of the VMs

2018-06-18 Thread Hanson Turner

Hi Guys,

My engine has become corrupted, and while waiting for help, I'd like to see if 
I can pull some data off the VMs to repurpose back onto dedicated 
hardware.


Our setup is/was a gluster-based storage system for VMs. The gluster 
data storage, I'm assuming, is okay; I think the hosted engine is hosed 
and needs to be restored, but that's another thread.


I can copy the raw disk file off of the gluster data domain. What's the 
best way to mount it short of importing it into another gluster domain?


With VMware, we can grab the disk file and move it from server to server 
without issue. You can mount and explore its contents with Workstation.


What do we have available to us for oVirt?
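Assuming the copy taken off the data domain is intact, libguestfs gives roughly the Workstation-style access described above without importing the disk anywhere. A sketch, with the file name as a placeholder and everything opened read-only first:

# Check what was actually copied (oVirt file domains hold raw or qcow2 volumes)
qemu-img info vm-disk-copy.img

# Mount the guest's filesystems read-only and copy data out
guestmount -a vm-disk-copy.img --ro -i /mnt/vmdisk
cp -a /mnt/vmdisk/home /backup/
guestunmount /mnt/vmdisk

# Or browse interactively
virt-filesystems -a vm-disk-copy.img --long
guestfish --ro -a vm-disk-copy.img -i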

Thanks,



[ovirt-users] oVirt Gluster storage hardware

2018-05-21 Thread ovirt

For those folks using SSD with Gluster, are you doing RAID or HBA?

Which is recommended for performance?

(I've asked this question before, not sure I got a good answer)


Re: [ovirt-users] oVirt-Gluster Hyperconvergence: Graceful shutdown and startup

2018-01-04 Thread Bernhard Seidl
On 04.01.2018 at 11:56, Sahina Bose wrote:
> 
> 
> On Wed, Jan 3, 2018 at 9:27 PM, Bernhard Seidl wrote:
> 
> Hi all and a happy new year,
> 
> I am just testing oVirt 4.2 using a three node gluster hyperconvergence
> and self hosted engine setup. How should this setup be shutdown and
> started again? Here is what tried an experienced:
> 
> 
> The approach you outlined is correct.
> 
> 
> Shutdown:
> 
> 1. Shutdown all VMs
> 
> 2. Enable global ha maintenance
> 
> 3. Wait for all VM
> 
> 4. Shutdown hosted-engine using "hosted-engine --vm-shutdown"
> 
> 5. Wait for stopped hosted engine
> 
> 6. Shutdown all nodes
> 
> Startup:
> 
> 1. Switch on all nodes
> 
> 2. Start glusterd on all nodes since it does not start by default (is
> this a bug?) using "systemctl start glusterd"
> 
> 
> Can you log a bug?

There it is: https://bugzilla.redhat.com/show_bug.cgi?id=1531052

> 
> 
> 3. Check volume status using "gluster peer status" and "gluster volume
> status all" on one of the nodes
> 
> 4. Wait for ovirt-ha-agent until "hosted-engine --vm-status" does not
> fail anymore printing "The hosted engine configuration has not been
> retrieved from shared storage. Please ensure that ovirt-ha-agent is
> running and the storage server is reachable."
> 
> 4. Start hosted engine "hosted-engine --vm-start" on one of the nodes
> 
> 5. Check status using "hosted-engine --vm-status" and wait until health
> reports good
> 
> 6. Wait until a host got the SPM role
> 

I think that I missed a step here:
6a. Disable global ha maintenance


> 7. Start VMs
> 
> Is this the best/correct way?
> 
> Kind regards,
> Bernhard



Re: [ovirt-users] oVirt-Gluster Hyperconvergence: Graceful shutdown and startup

2018-01-04 Thread Sahina Bose
On Wed, Jan 3, 2018 at 9:27 PM, Bernhard Seidl 
wrote:

> Hi all and a happy new year,
>
> I am just testing oVirt 4.2 using a three node gluster hyperconvergence
> and self hosted engine setup. How should this setup be shutdown and
> started again? Here is what I tried and experienced:
>

The approach you outlined is correct.


> Shutdown:
>
> 1. Shutdown all VMs
>
> 2. Enable global ha maintenance
>
> 3. Wait for all VM
>
> 4. Shutdown hosted-engine using "hosted-engine --vm-shutdown"
>
> 5. Wait for stopped hosted engine
>
> 6. Shutdown all nodes
>
> Startup:
>
> 1. Switch on all nodes
>
> 2. Start glusterd on all nodes since it does not start by default (is
> this a bug?) using "systemctl start glusterd"
>

Can you log a bug?


> 3. Check volume status using "gluster peer status" and "gluster volume
> status all" on one of the nodes
>
> 4. Wait for ovirt-ha-agent until "hosted-engine --vm-status" does not
> fail anymore printing "The hosted engine configuration has not been
> retrieved from shared storage. Please ensure that ovirt-ha-agent is
> running and the storage server is reachable."
>
> 4. Start hosted engine "hosted-engine --vm-start" on one of the nodes
>
> 5. Check status using "hosted-engine --vm-status" and wait until health
> reports good
>
> 6. Wait until a host got the SPM role
>
> 7. Start VMs
>
> Is this the best/correct way?
>
> Kind regards,
> Bernhard
>
>


[ovirt-users] oVirt-Gluster Hyperconvergence: Graceful shutdown and startup

2018-01-03 Thread Bernhard Seidl
Hi all and a happy new year,

I am just testing oVirt 4.2 using a three node gluster hyperconvergence
and self hosted engine setup. How should this setup be shutdown and
started again? Here is what I tried and experienced:

Shutdown:

1. Shutdown all VMs

2. Enable global ha maintenance

3. Wait for all VM

4. Shutdown hosted-engine using "hosted-engine --vm-shutdown"

5. Wait for stopped hosted engine

6. Shutdown all nodes

Startup:

1. Switch on all nodes

2. Start glusterd on all nodes since it does not start by default (is
this a bug?) using "systemctl start glusterd"

3. Check volume status using "gluster peer status" and "gluster volume
status all" on one of the nodes

4. Wait for ovirt-ha-agent until "hosted-engine --vm-status" does not
fail anymore printing "The hosted engine configuration has not been
retrieved from shared storage. Please ensure that ovirt-ha-agent is
running and the storage server is reachable."

4. Start hosted engine "hosted-engine --vm-start" on one of the nodes

5. Check status using "hosted-engine --vm-status" and wait until health
reports good

6. Wait until a host got the SPM role

7. Start VMs

Is this the best/correct way?
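For reference, the sequence above condensed into the commands involved (a sketch; the maintenance-mode switches are the standard hosted-engine ones, and the glusterd start is only needed as long as the service is not enabled to start at boot):

# Shutdown
hosted-engine --set-maintenance --mode=global
hosted-engine --vm-shutdown
hosted-engine --vm-status        # wait until the engine VM is reported down
# ...then shut down the nodes

# Startup, on every node
systemctl start glusterd
gluster peer status && gluster volume status all

# On one node, once ovirt-ha-agent can read the shared storage again
hosted-engine --vm-start
hosted-engine --vm-status        # wait for engine health to report "good"
hosted-engine --set-maintenance --mode=none

# Finally, start the VMs from the engine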

Kind regards,
Bernhard





Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-08 Thread Abi Askushi
Filed the *Bug 1459855*


Alex

On Thu, Jun 8, 2017 at 1:16 PM, Abi Askushi  wrote:

> Hi Denis,
>
> Ok I will file a bug for this.
> I am not sure if I will be able to provide troubleshooting info for much
> long as I already have put forward the replacement of disks with 512 ones.
>
> Alex
>
> On Thu, Jun 8, 2017 at 11:48 AM, Denis Chaplygin 
> wrote:
>
>> Hello Alex,
>>
>>
>> On Wed, Jun 7, 2017 at 11:39 AM, Abi Askushi 
>> wrote:
>>
>>> Hi Sahina,
>>>
>>> Did you have the chance to check the logs and have any idea how this may
>>> be addressed?
>>>
>>
>>
>> It seems to be a VDSM issue, as VDSM uses direct IO (and id actualy calls
>> dd) and assumes that block size is 512. I see in the code, that block size
>> is defined as a constant, so it probably may be adjusted, but i think it
>> would be better if we ask some one who knows that part better.
>>
>> Anyway, could you please file a bug on that issue? Thanks in advance.
>>
>
>


Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-08 Thread Abi Askushi
Hi Denis,

Ok, I will file a bug for this.
I am not sure whether I will be able to provide troubleshooting info for much
longer, as I have already put forward the replacement of the disks with 512-byte ones.

Alex

On Thu, Jun 8, 2017 at 11:48 AM, Denis Chaplygin 
wrote:

> Hello Alex,
>
>
> On Wed, Jun 7, 2017 at 11:39 AM, Abi Askushi 
> wrote:
>
>> Hi Sahina,
>>
>> Did you have the chance to check the logs and have any idea how this may
>> be addressed?
>>
>
>
> It seems to be a VDSM issue, as VDSM uses direct IO (and id actualy calls
> dd) and assumes that block size is 512. I see in the code, that block size
> is defined as a constant, so it probably may be adjusted, but i think it
> would be better if we ask some one who knows that part better.
>
> Anyway, could you please file a bug on that issue? Thanks in advance.
>


Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-08 Thread Denis Chaplygin
Hello Alex,


On Wed, Jun 7, 2017 at 11:39 AM, Abi Askushi 
wrote:

> Hi Sahina,
>
> Did you have the chance to check the logs and have any idea how this may
> be addressed?
>


It seems to be a VDSM issue, as VDSM uses direct I/O (it actually calls
dd) and assumes that the block size is 512. I see in the code that the block size
is defined as a constant, so it can probably be adjusted, but I think it
would be better if we ask someone who knows that part better.

Anyway, could you please file a bug on that issue? Thanks in advance.


Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-07 Thread Abi Askushi
Hi Sahina,

Did you have a chance to check the logs, and do you have any idea how this may be
addressed?


Thanx,
Alex

On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:

> Can we have the gluster mount logs and brick logs to check if it's the
> same issue?
>
> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
> wrote:
>
>> I clean installed everything and ran into the same.
>> I then ran gdeploy and encountered the same issue when deploying engine.
>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if it
>> has to do with alignment. The weird thing is that gluster volumes are all
>> ok, replicating normally and no split brain is reported.
>>
>> The solution to the mentioned bug (1386443
>> ) was to format
>> with 512 sector size, which for my case is not an option:
>>
>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>> illegal sector size 512; hw sector is 4096
>>
>> Is there any workaround to address this?
>>
>> Thanx,
>> Alex
>>
>>
>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
>> wrote:
>>
>>> Hi Maor,
>>>
>>> My disk are of 4K block size and from this bug seems that gluster
>>> replica needs 512B block size.
>>> Is there a way to make gluster function with 4K drives?
>>>
>>> Thank you!
>>>
>>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
>>> wrote:
>>>
 Hi Alex,

 I saw a bug that might be related to the issue you encountered at
 https://bugzilla.redhat.com/show_bug.cgi?id=1386443

 Sahina, maybe you have any advise? Do you think that BZ1386443is
 related?

 Regards,
 Maor

 On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
 wrote:
 > Hi All,
 >
 > I have installed successfully several times oVirt (version 4.1) with
 3 nodes
 > on top glusterfs.
 >
 > This time, when trying to configure the same setup, I am facing the
 > following issue which doesn't seem to go away. During installation i
 get the
 > error:
 >
 > Failed to execute stage 'Misc configuration': Cannot acquire host id:
 > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
 'Sanlock
 > lockspace add failure', 'Invalid argument'))
 >
 > The only different in this setup is that instead of standard
 partitioning i
 > have GPT partitioning and the disks have 4K block size instead of 512.
 >
 > The /var/log/sanlock.log has the following lines:
 >
 > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
 > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
 nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8
 -46e7-b2c8-91e4a5bb2047/dom_md/ids:0
 > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
 > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
 nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b
 8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
 > for 2,9,23040
 > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
 > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/m
 nt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8
 b4d5e5e922/dom_md/ids:0
 > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
 > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
 > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
 offset
 > 127488 rv -22
 > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e
 7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
 > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
 > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
 > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result
 -22
 >
 > And /var/log/vdsm/vdsm.log says:
 >
 > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
 > [storage.StorageServer.MountConnection] Using user specified
 > backup-volfile-servers option (storageServer:253)
 > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
 > available. (throttledlog:105)
 > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
 > available, KSM stats will be missing. (throttledlog:105)
 > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
 > [storage.StorageServer.MountConnection] Using user specified
 > backup-volfile-servers option (storageServer:253)
 > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
 Cannot
 > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
 > (clusterlock:238)
 > Traceback (most recent call last):
 >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
 line
 > 234, in initSANLock
 > sanlock.init_lockspace(sdUUID, idsPath)
 > SanlockException: (107, 'Sanlock lockspace init 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-06 Thread Abi Askushi
Just to note that the logs mentioned below are from the dd with bs=512,
which was failing.
Attached are the full logs from the mount and the brick.

Alex

On Tue, Jun 6, 2017 at 3:18 PM, Abi Askushi  wrote:

> Hi Krutika,
>
> My comments inline.
>
> Also attached the strace of:
> strace -y -ff -o /root/512-trace-on-root.log dd if=/dev/zero
> of=/mnt/test2.img oflag=direct bs=512 count=1
>
> and of:
> strace -y -ff -o /root/4096-trace-on-root.log dd if=/dev/zero
> of=/mnt/test2.img oflag=direct bs=4096 count=16
>
> I have mounted gluster volume at /mnt.
> The dd with bs=4096 is successful.
>
> The gluster mount log gives only the following:
> [2017-06-06 12:04:54.102576] W [MSGID: 114031] 
> [client-rpc-fops.c:854:client3_3_writev_cbk]
> 0-engine-client-0: remote operation failed [Invalid argument]
> [2017-06-06 12:04:54.102591] W [MSGID: 114031] 
> [client-rpc-fops.c:854:client3_3_writev_cbk]
> 0-engine-client-1: remote operation failed [Invalid argument]
> [2017-06-06 12:04:54.103355] W [fuse-bridge.c:2312:fuse_writev_cbk]
> 0-glusterfs-fuse: 205: WRITE => -1 gfid=075ab3a5-0274-4f07-a075-2748c3b4d394
> fd=0x7faf1d08706c (Transport endpoint is not connected)
>
> The gluster brick log gives:
> [2017-06-06 12:07:03.793080] E [MSGID: 113072] [posix.c:3453:posix_writev]
> 0-engine-posix: write failed: offset 0, [Invalid argument]
> [2017-06-06 12:07:03.793172] E [MSGID: 115067] 
> [server-rpc-fops.c:1346:server_writev_cbk]
> 0-engine-server: 291: WRITEV 0 (075ab3a5-0274-4f07-a075-2748c3b4d394) ==>
> (Invalid argument) [Invalid argument]
>
>
>
> On Tue, Jun 6, 2017 at 12:50 PM, Krutika Dhananjay 
> wrote:
>
>> OK.
>>
>> So for the 'Transport endpoint is not connected' issue, could you share
>> the mount and brick logs?
>>
>> Hmmm.. 'Invalid argument' error even on the root partition. What if you
>> change bs to 4096 and run?
>>
> If I use bs=4096 the dd is successful on /root and at gluster mounted
> volume.
>
>>
>> The logs I showed in my earlier mail shows that gluster is merely
>> returning the error it got from the disk file system where the
>> brick is hosted. But you're right about the fact that the offset 127488
>> is not 4K-aligned.
>>
>> If the dd on /root worked for you with bs=4096, could you try the same
>> directly on gluster mount point on a dummy file and capture the strace
>> output of dd?
>> You can perhaps reuse your existing gluster volume by mounting it at
>> another location and doing the dd.
>> Here's what you need to execute:
>>
>> strace -ff -T -p  -o 
>> `
>>
>> FWIW, here's something I found in man(2) open:
>>
>>
>>
>>
>> *Under  Linux  2.4,  transfer  sizes,  and  the alignment of the user
>> buffer and the file offset must all be multiples of the logical block size
>> of the filesystem.  Since Linux 2.6.0, alignment to the logical block size
>> of the   underlying storage (typically 512 bytes) suffices.  The
>> logical block size can be determined using the ioctl(2) BLKSSZGET operation
>> or from the shell using the command:   blockdev --getss*
>>
> Please note also that the physical disks are of 4K sector size (native).
> Thus OS is having 4096/4096 local/physical sector size.
> [root@v0 ~]# blockdev --getss /dev/sda
> 4096
> [root@v0 ~]# blockdev --getpbsz /dev/sda
> 4096
>
>>
>>
>> -Krutika
>>
>>
>> On Tue, Jun 6, 2017 at 1:18 AM, Abi Askushi 
>> wrote:
>>
>>> Also when testing with dd i get the following:
>>>
>>> *Testing on the gluster mount: *
>>> dd if=/dev/zero of=/rhev/data-center/mnt/glusterSD/10.100.100.1:
>>> _engine/test2.img oflag=direct bs=512 count=1
>>> dd: error writing β/rhev/data-center/mnt/glusterSD/10.100.100.1:
>>> _engine/test2.imgβ: *Transport endpoint is not connected*
>>> 1+0 records in
>>> 0+0 records out
>>> 0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s
>>>
>>> *Testing on the /root directory (XFS): *
>>> dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
>>> dd: error writing β/test2.imgβ:* Invalid argument*
>>> 1+0 records in
>>> 0+0 records out
>>> 0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s
>>>
>>> Seems that the gluster is trying to do the same and fails.
>>>
>>>
>>>
>>> On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi 
>>> wrote:
>>>
 The question that rises is what is needed to make gluster aware of the
 4K physical sectors presented to it (the logical sector is also 4K). The
 offset (127488) at the log does not seem aligned at 4K.

 Alex

 On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi 
 wrote:

> Hi Krutika,
>
> I am saying that I am facing this issue with 4k drives. I never
> encountered this issue with 512 drives.
>
> Alex
>
> On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:
>
>> This seems like a case of O_DIRECT reads and writes gone wrong,
>> judging by the 'Invalid argument' errors.
>>
>> The two operations 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-06 Thread Abi Askushi
Hi Krutika,

My comments inline.

Also attached the strace of:
strace -y -ff -o /root/512-trace-on-root.log dd if=/dev/zero
of=/mnt/test2.img oflag=direct bs=512 count=1

and of:
strace -y -ff -o /root/4096-trace-on-root.log dd if=/dev/zero
of=/mnt/test2.img oflag=direct bs=4096 count=16

I have mounted gluster volume at /mnt.
The dd with bs=4096 is successful.

The gluster mount log gives only the following:
[2017-06-06 12:04:54.102576] W [MSGID: 114031]
[client-rpc-fops.c:854:client3_3_writev_cbk] 0-engine-client-0: remote
operation failed [Invalid argument]
[2017-06-06 12:04:54.102591] W [MSGID: 114031]
[client-rpc-fops.c:854:client3_3_writev_cbk] 0-engine-client-1: remote
operation failed [Invalid argument]
[2017-06-06 12:04:54.103355] W [fuse-bridge.c:2312:fuse_writev_cbk]
0-glusterfs-fuse: 205: WRITE => -1
gfid=075ab3a5-0274-4f07-a075-2748c3b4d394 fd=0x7faf1d08706c (Transport
endpoint is not connected)

The gluster brick log gives:
[2017-06-06 12:07:03.793080] E [MSGID: 113072] [posix.c:3453:posix_writev]
0-engine-posix: write failed: offset 0, [Invalid argument]
[2017-06-06 12:07:03.793172] E [MSGID: 115067]
[server-rpc-fops.c:1346:server_writev_cbk] 0-engine-server: 291: WRITEV 0
(075ab3a5-0274-4f07-a075-2748c3b4d394) ==> (Invalid argument) [Invalid
argument]



On Tue, Jun 6, 2017 at 12:50 PM, Krutika Dhananjay 
wrote:

> OK.
>
> So for the 'Transport endpoint is not connected' issue, could you share
> the mount and brick logs?
>
> Hmmm.. 'Invalid argument' error even on the root partition. What if you
> change bs to 4096 and run?
>
If I use bs=4096 the dd is successful on /root and at gluster mounted
volume.

>
> The logs I showed in my earlier mail shows that gluster is merely
> returning the error it got from the disk file system where the
> brick is hosted. But you're right about the fact that the offset 127488 is
> not 4K-aligned.
>
> If the dd on /root worked for you with bs=4096, could you try the same
> directly on gluster mount point on a dummy file and capture the strace
> output of dd?
> You can perhaps reuse your existing gluster volume by mounting it at
> another location and doing the dd.
> Here's what you need to execute:
>
> strace -ff -T -p  -o 
> `
>
> FWIW, here's something I found in man(2) open:
>
>
>
>
> *Under  Linux  2.4,  transfer  sizes,  and  the alignment of the user
> buffer and the file offset must all be multiples of the logical block size
> of the filesystem.  Since Linux 2.6.0, alignment to the logical block size
> of the   underlying storage (typically 512 bytes) suffices.  The
> logical block size can be determined using the ioctl(2) BLKSSZGET operation
> or from the shell using the command:   blockdev --getss*
>
Please note also that the physical disks are of 4K sector size (native).
Thus the OS reports a 4096/4096 logical/physical sector size.
[root@v0 ~]# blockdev --getss /dev/sda
4096
[root@v0 ~]# blockdev --getpbsz /dev/sda
4096

>
>
> -Krutika
>
>
> On Tue, Jun 6, 2017 at 1:18 AM, Abi Askushi 
> wrote:
>
>> Also when testing with dd i get the following:
>>
>> *Testing on the gluster mount: *
>> dd if=/dev/zero 
>> of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img
>> oflag=direct bs=512 count=1
>> dd: error writing β/rhev/data-center/mnt/glusterSD/10.100.100.1:
>> _engine/test2.imgβ: *Transport endpoint is not connected*
>> 1+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s
>>
>> *Testing on the /root directory (XFS): *
>> dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
>> dd: error writing β/test2.imgβ:* Invalid argument*
>> 1+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s
>>
>> Seems that the gluster is trying to do the same and fails.
>>
>>
>>
>> On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi 
>> wrote:
>>
>>> The question that rises is what is needed to make gluster aware of the
>>> 4K physical sectors presented to it (the logical sector is also 4K). The
>>> offset (127488) at the log does not seem aligned at 4K.
>>>
>>> Alex
>>>
>>> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi 
>>> wrote:
>>>
 Hi Krutika,

 I am saying that I am facing this issue with 4k drives. I never
 encountered this issue with 512 drives.

 Alex

 On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:

> This seems like a case of O_DIRECT reads and writes gone wrong,
> judging by the 'Invalid argument' errors.
>
> The two operations that have failed on gluster bricks are:
>
> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
> [Invalid argument]
> [2017-06-05 09:41:00.865760] E [MSGID: 113040]
> [posix.c:3178:posix_readv] 0-engine-posix: read failed on
> gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
> offset=127488 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-06 Thread Krutika Dhananjay
I stand corrected.

Just realised the strace command I gave was wrong.

Here's what you would actually need to execute:

strace -y -ff -o  

-Krutika

On Tue, Jun 6, 2017 at 3:20 PM, Krutika Dhananjay 
wrote:

> OK.
>
> So for the 'Transport endpoint is not connected' issue, could you share
> the mount and brick logs?
>
> Hmmm.. 'Invalid argument' error even on the root partition. What if you
> change bs to 4096 and run?
>
> The logs I showed in my earlier mail shows that gluster is merely
> returning the error it got from the disk file system where the
> brick is hosted. But you're right about the fact that the offset 127488 is
> not 4K-aligned.
>
> If the dd on /root worked for you with bs=4096, could you try the same
> directly on gluster mount point on a dummy file and capture the strace
> output of dd?
> You can perhaps reuse your existing gluster volume by mounting it at
> another location and doing the dd.
> Here's what you need to execute:
>
> strace -ff -T -p  -o 
> `
>
> FWIW, here's something I found in man(2) open:
>
>
>
>
> *Under  Linux  2.4,  transfer  sizes,  and  the alignment of the user
> buffer and the file offset must all be multiples of the logical block size
> of the filesystem.  Since Linux 2.6.0, alignment to the logical block size
> of the   underlying storage (typically 512 bytes) suffices.  The
> logical block size can be determined using the ioctl(2) BLKSSZGET operation
> or from the shell using the command:   blockdev --getss*
>
>
> -Krutika
>
>
> On Tue, Jun 6, 2017 at 1:18 AM, Abi Askushi 
> wrote:
>
>> Also when testing with dd i get the following:
>>
>> *Testing on the gluster mount: *
>> dd if=/dev/zero 
>> of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img
>> oflag=direct bs=512 count=1
>> dd: error writing β/rhev/data-center/mnt/glusterSD/10.100.100.1:
>> _engine/test2.imgβ: *Transport endpoint is not connected*
>> 1+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s
>>
>> *Testing on the /root directory (XFS): *
>> dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
>> dd: error writing β/test2.imgβ:* Invalid argument*
>> 1+0 records in
>> 0+0 records out
>> 0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s
>>
>> Seems that the gluster is trying to do the same and fails.
>>
>>
>>
>> On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi 
>> wrote:
>>
>>> The question that rises is what is needed to make gluster aware of the
>>> 4K physical sectors presented to it (the logical sector is also 4K). The
>>> offset (127488) at the log does not seem aligned at 4K.
>>>
>>> Alex
>>>
>>> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi 
>>> wrote:
>>>
 Hi Krutika,

 I am saying that I am facing this issue with 4k drives. I never
 encountered this issue with 512 drives.

 Alex

 On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:

> This seems like a case of O_DIRECT reads and writes gone wrong,
> judging by the 'Invalid argument' errors.
>
> The two operations that have failed on gluster bricks are:
>
> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
> [Invalid argument]
> [2017-06-05 09:41:00.865760] E [MSGID: 113040]
> [posix.c:3178:posix_readv] 0-engine-posix: read failed on
> gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
> offset=127488 size=512, buf=0x7f4083c0b000 [Invalid argument]
>
> But then, both the write and the read have 512byte-aligned offset,
> size and buf address (which is correct).
>
> Are you saying you don't see this issue with 4K block-size?
>
> -Krutika
>
> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi 
> wrote:
>
>> Hi Sahina,
>>
>> Attached are the logs. Let me know if sth else is needed.
>>
>> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
>> stripe size at the moment.
>> I have prepared the storage as below:
>>
>> pvcreate --dataalignment 256K /dev/sda4
>> vgcreate --physicalextentsize 256K gluster /dev/sda4
>>
>> lvcreate -n engine --size 120G gluster
>> mkfs.xfs -f -i size=512 /dev/gluster/engine
>>
>> Thanx,
>> Alex
>>
>> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose 
>> wrote:
>>
>>> Can we have the gluster mount logs and brick logs to check if it's
>>> the same issue?
>>>
>>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi <
>>> rightkickt...@gmail.com> wrote:
>>>
 I clean installed everything and ran into the same.
 I then ran gdeploy and encountered the same issue when deploying
 engine.
 Seems that gluster (?) doesn't like 4K sector drives. I am not sure
 if it has to do with 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-06 Thread Krutika Dhananjay
OK.

So for the 'Transport endpoint is not connected' issue, could you share the
mount and brick logs?

Hmmm.. 'Invalid argument' error even on the root partition. What if you
change bs to 4096 and run?

The logs I showed in my earlier mail show that gluster is merely returning
the error it got from the disk file system where the
brick is hosted. But you're right about the fact that the offset 127488 is
not 4K-aligned.

If the dd on /root worked for you with bs=4096, could you try the same
directly on gluster mount point on a dummy file and capture the strace
output of dd?
You can perhaps reuse your existing gluster volume by mounting it at
another location and doing the dd.
Here's what you need to execute:

strace -ff -T -p  -o
`

FWIW, here's something I found in man(2) open:




*Under  Linux  2.4,  transfer  sizes,  and  the alignment of the user
buffer and the file offset must all be multiples of the logical block size
of the filesystem.  Since Linux 2.6.0, alignment to the logical block size
of the   underlying storage (typically 512 bytes) suffices.  The
logical block size can be determined using the ioctl(2) BLKSSZGET operation
or from the shell using the command:   blockdev --getss*


-Krutika


On Tue, Jun 6, 2017 at 1:18 AM, Abi Askushi  wrote:

> Also when testing with dd i get the following:
>
> *Testing on the gluster mount: *
> dd if=/dev/zero 
> of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img
> oflag=direct bs=512 count=1
> dd: error writing 
> β/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.imgβ:
> *Transport endpoint is not connected*
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s
>
> *Testing on the /root directory (XFS): *
> dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
> dd: error writing β/test2.imgβ:* Invalid argument*
> 1+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s
>
> Seems that the gluster is trying to do the same and fails.
>
>
>
> On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi 
> wrote:
>
>> The question that rises is what is needed to make gluster aware of the 4K
>> physical sectors presented to it (the logical sector is also 4K). The
>> offset (127488) at the log does not seem aligned at 4K.
>>
>> Alex
>>
>> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi 
>> wrote:
>>
>>> Hi Krutika,
>>>
>>> I am saying that I am facing this issue with 4k drives. I never
>>> encountered this issue with 512 drives.
>>>
>>> Alex
>>>
>>> On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:
>>>
 This seems like a case of O_DIRECT reads and writes gone wrong, judging
 by the 'Invalid argument' errors.

 The two operations that have failed on gluster bricks are:

 [2017-06-05 09:40:39.428979] E [MSGID: 113072]
 [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
 [Invalid argument]
 [2017-06-05 09:41:00.865760] E [MSGID: 113040]
 [posix.c:3178:posix_readv] 0-engine-posix: read failed on
 gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
 offset=127488 size=512, buf=0x7f4083c0b000 [Invalid argument]

 But then, both the write and the read have 512byte-aligned offset, size
 and buf address (which is correct).

 Are you saying you don't see this issue with 4K block-size?

 -Krutika

 On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi 
 wrote:

> Hi Sahina,
>
> Attached are the logs. Let me know if sth else is needed.
>
> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
> stripe size at the moment.
> I have prepared the storage as below:
>
> pvcreate --dataalignment 256K /dev/sda4
> vgcreate --physicalextentsize 256K gluster /dev/sda4
>
> lvcreate -n engine --size 120G gluster
> mkfs.xfs -f -i size=512 /dev/gluster/engine
>
> Thanx,
> Alex
>
> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose 
> wrote:
>
>> Can we have the gluster mount logs and brick logs to check if it's
>> the same issue?
>>
>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi > > wrote:
>>
>>> I clean installed everything and ran into the same.
>>> I then ran gdeploy and encountered the same issue when deploying
>>> engine.
>>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure
>>> if it has to do with alignment. The weird thing is that gluster volumes 
>>> are
>>> all ok, replicating normally and no split brain is reported.
>>>
>>> The solution to the mentioned bug (1386443
>>> ) was to
>>> format with 512 sector size, which for my case is not an option:
>>>
>>> mkfs.xfs -f -i size=512 -s size=512 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Abi Askushi
Also when testing with dd i get the following:

*Testing on the gluster mount: *
dd if=/dev/zero
of=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img
oflag=direct bs=512 count=1
dd: error writing ‘/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/test2.img’:
*Transport endpoint is not connected*
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00336755 s, 0.0 kB/s

*Testing on the /root directory (XFS): *
dd if=/dev/zero of=/test2.img oflag=direct bs=512 count=1
dd: error writing ‘/test2.img’:* Invalid argument*
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000321239 s, 0.0 kB/s

Seems that the gluster is trying to do the same and fails.



On Mon, Jun 5, 2017 at 10:10 PM, Abi Askushi 
wrote:

> The question that rises is what is needed to make gluster aware of the 4K
> physical sectors presented to it (the logical sector is also 4K). The
> offset (127488) at the log does not seem aligned at 4K.
>
> Alex
>
> On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi 
> wrote:
>
>> Hi Krutika,
>>
>> I am saying that I am facing this issue with 4k drives. I never
>> encountered this issue with 512 drives.
>>
>> Alex
>>
>> On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:
>>
>>> This seems like a case of O_DIRECT reads and writes gone wrong, judging
>>> by the 'Invalid argument' errors.
>>>
>>> The two operations that have failed on gluster bricks are:
>>>
>>> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
>>> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
>>> [Invalid argument]
>>> [2017-06-05 09:41:00.865760] E [MSGID: 113040]
>>> [posix.c:3178:posix_readv] 0-engine-posix: read failed on
>>> gfid=8c94f658-ac3c-4e3a-b368-8c038513a914, fd=0x7f408584c06c,
>>> offset=127488 size=512, buf=0x7f4083c0b000 [Invalid argument]
>>>
>>> But then, both the write and the read have 512byte-aligned offset, size
>>> and buf address (which is correct).
>>>
>>> Are you saying you don't see this issue with 4K block-size?
>>>
>>> -Krutika
>>>
>>> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi 
>>> wrote:
>>>
 Hi Sahina,

 Attached are the logs. Let me know if sth else is needed.

 I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
 stripe size at the moment.
 I have prepared the storage as below:

 pvcreate --dataalignment 256K /dev/sda4
 vgcreate --physicalextentsize 256K gluster /dev/sda4

 lvcreate -n engine --size 120G gluster
 mkfs.xfs -f -i size=512 /dev/gluster/engine

 Thanx,
 Alex

 On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:

> Can we have the gluster mount logs and brick logs to check if it's the
> same issue?
>
> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
> wrote:
>
>> I clean installed everything and ran into the same.
>> I then ran gdeploy and encountered the same issue when deploying
>> engine.
>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure
>> if it has to do with alignment. The weird thing is that gluster volumes 
>> are
>> all ok, replicating normally and no split brain is reported.
>>
>> The solution to the mentioned bug (1386443
>> ) was to format
>> with 512 sector size, which for my case is not an option:
>>
>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>> illegal sector size 512; hw sector is 4096
>>
>> Is there any workaround to address this?
>>
>> Thanx,
>> Alex
>>
>>
>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
>> wrote:
>>
>>> Hi Maor,
>>>
>>> My disk are of 4K block size and from this bug seems that gluster
>>> replica needs 512B block size.
>>> Is there a way to make gluster function with 4K drives?
>>>
>>> Thank you!
>>>
>>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
>>> wrote:
>>>
 Hi Alex,

 I saw a bug that might be related to the issue you encountered at
 https://bugzilla.redhat.com/show_bug.cgi?id=1386443

 Sahina, maybe you have any advise? Do you think that BZ1386443is
 related?

 Regards,
 Maor

 On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi <
 rightkickt...@gmail.com> wrote:
 > Hi All,
 >
 > I have installed successfully several times oVirt (version 4.1)
 with 3 nodes
 > on top glusterfs.
 >
 > This time, when trying to configure the same setup, I am facing
 the
 > following issue which doesn't seem to go away. During
 installation i get the
 > error:
 >
 > Failed to execute stage 'Misc configuration': Cannot 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Abi Askushi
The question that arises is what is needed to make gluster aware of the 4K
physical sectors presented to it (the logical sector size is also 4K). The
offset (127488) in the log does not seem to be 4K-aligned.
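For anyone comparing numbers: as the man page excerpt quoted earlier in the thread says, O_DIRECT transfers have to be aligned to the logical block size the device/filesystem advertises, which is why bs=512 fails here while bs=4096 works. A quick way to see all three values (device name and brick mount point are placeholders):

# Logical and physical sector size of the underlying device
blockdev --getss --getpbsz /dev/sda

# Sector size the brick's XFS filesystem was created with
xfs_info /gluster_bricks/engine | grep -i sectsz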

Alex

On Mon, Jun 5, 2017 at 2:47 PM, Abi Askushi  wrote:

> Hi Krutika,
>
> I am saying that I am facing this issue with 4k drives. I never
> encountered this issue with 512 drives.
>
> Alex
>
> On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:
>
>> This seems like a case of O_DIRECT reads and writes gone wrong, judging
>> by the 'Invalid argument' errors.
>>
>> The two operations that have failed on gluster bricks are:
>>
>> [2017-06-05 09:40:39.428979] E [MSGID: 113072]
>> [posix.c:3453:posix_writev] 0-engine-posix: write failed: offset 0,
>> [Invalid argument]
>> [2017-06-05 09:41:00.865760] E [MSGID: 113040] [posix.c:3178:posix_readv]
>> 0-engine-posix: read failed on gfid=8c94f658-ac3c-4e3a-b368-8c038513a914,
>> fd=0x7f408584c06c, offset=127488 size=512, buf=0x7f4083c0b000 [Invalid
>> argument]
>>
>> But then, both the write and the read have 512byte-aligned offset, size
>> and buf address (which is correct).
>>
>> Are you saying you don't see this issue with 4K block-size?
>>
>> -Krutika
>>
>> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi 
>> wrote:
>>
>>> Hi Sahina,
>>>
>>> Attached are the logs. Let me know if sth else is needed.
>>>
>>> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
>>> stripe size at the moment.
>>> I have prepared the storage as below:
>>>
>>> pvcreate --dataalignment 256K /dev/sda4
>>> vgcreate --physicalextentsize 256K gluster /dev/sda4
>>>
>>> lvcreate -n engine --size 120G gluster
>>> mkfs.xfs -f -i size=512 /dev/gluster/engine
>>>
>>> Thanx,
>>> Alex
>>>
>>> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:
>>>
 Can we have the gluster mount logs and brick logs to check if it's the
 same issue?

 On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
 wrote:

> I clean installed everything and ran into the same.
> I then ran gdeploy and encountered the same issue when deploying
> engine.
> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if
> it has to do with alignment. The weird thing is that gluster volumes are
> all ok, replicating normally and no split brain is reported.
>
> The solution to the mentioned bug (1386443
> ) was to format
> with 512 sector size, which for my case is not an option:
>
> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
> illegal sector size 512; hw sector is 4096
>
> Is there any workaround to address this?
>
> Thanx,
> Alex
>
>
> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
> wrote:
>
>> Hi Maor,
>>
>> My disk are of 4K block size and from this bug seems that gluster
>> replica needs 512B block size.
>> Is there a way to make gluster function with 4K drives?
>>
>> Thank you!
>>
>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
>> wrote:
>>
>>> Hi Alex,
>>>
>>> I saw a bug that might be related to the issue you encountered at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>>
>>> Sahina, maybe you have any advise? Do you think that BZ1386443is
>>> related?
>>>
>>> Regards,
>>> Maor
>>>
>>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
>>> wrote:
>>> > Hi All,
>>> >
>>> > I have installed successfully several times oVirt (version 4.1)
>>> with 3 nodes
>>> > on top glusterfs.
>>> >
>>> > This time, when trying to configure the same setup, I am facing the
>>> > following issue which doesn't seem to go away. During installation
>>> i get the
>>> > error:
>>> >
>>> > Failed to execute stage 'Misc configuration': Cannot acquire host
>>> id:
>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>> 'Sanlock
>>> > lockspace add failure', 'Invalid argument'))
>>> >
>>> > The only different in this setup is that instead of standard
>>> partitioning i
>>> > have GPT partitioning and the disks have 4K block size instead of
>>> 512.
>>> >
>>> > The /var/log/sanlock.log has the following lines:
>>> >
>>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
>>> nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8
>>> -46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
>>> nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b
>>> 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Abi Askushi
Hi Krutika,

I am saying that I am facing this issue with 4k drives. I never encountered
this issue with 512 drives.

Alex

On Jun 5, 2017 14:26, "Krutika Dhananjay"  wrote:

> This seems like a case of O_DIRECT reads and writes gone wrong, judging by
> the 'Invalid argument' errors.
>
> The two operations that have failed on gluster bricks are:
>
> [2017-06-05 09:40:39.428979] E [MSGID: 113072] [posix.c:3453:posix_writev]
> 0-engine-posix: write failed: offset 0, [Invalid argument]
> [2017-06-05 09:41:00.865760] E [MSGID: 113040] [posix.c:3178:posix_readv]
> 0-engine-posix: read failed on gfid=8c94f658-ac3c-4e3a-b368-8c038513a914,
> fd=0x7f408584c06c, offset=127488 size=512, buf=0x7f4083c0b000 [Invalid
> argument]
>
> But then, both the write and the read have 512byte-aligned offset, size
> and buf address (which is correct).
>
> Are you saying you don't see this issue with 4K block-size?
>
> -Krutika
>
> On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi 
> wrote:
>
>> Hi Sahina,
>>
>> Attached are the logs. Let me know if sth else is needed.
>>
>> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K
>> stripe size at the moment.
>> I have prepared the storage as below:
>>
>> pvcreate --dataalignment 256K /dev/sda4
>> vgcreate --physicalextentsize 256K gluster /dev/sda4
>>
>> lvcreate -n engine --size 120G gluster
>> mkfs.xfs -f -i size=512 /dev/gluster/engine
>>
>> Thanx,
>> Alex
>>
>> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:
>>
>>> Can we have the gluster mount logs and brick logs to check if it's the
>>> same issue?
>>>
>>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
>>> wrote:
>>>
 I clean installed everything and ran into the same.
 I then ran gdeploy and encountered the same issue when deploying
 engine.
 Seems that gluster (?) doesn't like 4K sector drives. I am not sure if
 it has to do with alignment. The weird thing is that gluster volumes are
 all ok, replicating normally and no split brain is reported.

 The solution to the mentioned bug (1386443
 ) was to format
 with 512 sector size, which for my case is not an option:

 mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
 illegal sector size 512; hw sector is 4096

 Is there any workaround to address this?

 Thanx,
 Alex


 On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
 wrote:

> Hi Maor,
>
> My disk are of 4K block size and from this bug seems that gluster
> replica needs 512B block size.
> Is there a way to make gluster function with 4K drives?
>
> Thank you!
>
> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
> wrote:
>
>> Hi Alex,
>>
>> I saw a bug that might be related to the issue you encountered at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>
>> Sahina, maybe you have any advise? Do you think that BZ1386443is
>> related?
>>
>> Regards,
>> Maor
>>
>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
>> wrote:
>> > Hi All,
>> >
>> > I have installed successfully several times oVirt (version 4.1)
>> with 3 nodes
>> > on top glusterfs.
>> >
>> > This time, when trying to configure the same setup, I am facing the
>> > following issue which doesn't seem to go away. During installation
>> i get the
>> > error:
>> >
>> > Failed to execute stage 'Misc configuration': Cannot acquire host
>> id:
>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>> 'Sanlock
>> > lockspace add failure', 'Invalid argument'))
>> >
>> > The only different in this setup is that instead of standard
>> partitioning i
>> > have GPT partitioning and the disks have 4K block size instead of
>> 512.
>> >
>> > The /var/log/sanlock.log has the following lines:
>> >
>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
>> nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8
>> -46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
>> nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b
>> 8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
>> > for 2,9,23040
>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/m
>> nt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8
>> b4d5e5e922/dom_md/ids:0
>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
>> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Krutika Dhananjay
This seems like a case of O_DIRECT reads and writes gone wrong, judging by
the 'Invalid argument' errors.

The two operations that have failed on gluster bricks are:

[2017-06-05 09:40:39.428979] E [MSGID: 113072] [posix.c:3453:posix_writev]
0-engine-posix: write failed: offset 0, [Invalid argument]
[2017-06-05 09:41:00.865760] E [MSGID: 113040] [posix.c:3178:posix_readv]
0-engine-posix: read failed on gfid=8c94f658-ac3c-4e3a-b368-8c038513a914,
fd=0x7f408584c06c, offset=127488 size=512, buf=0x7f4083c0b000 [Invalid
argument]

But then both the write and the read have a 512-byte-aligned offset, size and
buf address (which is correct).

Are you saying you don't see this issue with 4K block-size?

-Krutika
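
A quick sanity check, assuming the mount path from the sanlock.log lines
quoted in this thread, is a direct-I/O read test against the ids file with
dd; this roughly mimics the 512-byte O_DIRECT reads that sanlock and the
gluster posix translator are reporting failures for:

# 512-byte aligned direct read, similar to sanlock's delta_leader read
dd if=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids \
   of=/dev/null bs=512 count=1 iflag=direct
# the same read with a 4K block size, for comparison
dd if=/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids \
   of=/dev/null bs=4096 count=1 iflag=direct

If the 512-byte read fails with "Invalid argument" while the 4K read
succeeds, the storage stack is enforcing the 4K logical sector size on
O_DIRECT I/O, which would match the posix_readv/posix_writev errors above.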

On Mon, Jun 5, 2017 at 3:21 PM, Abi Askushi  wrote:

> Hi Sahina,
>
> Attached are the logs. Let me know if sth else is needed.
>
> I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K stripe
> size at the moment.
> I have prepared the storage as below:
>
> pvcreate --dataalignment 256K /dev/sda4
> vgcreate --physicalextentsize 256K gluster /dev/sda4
>
> lvcreate -n engine --size 120G gluster
> mkfs.xfs -f -i size=512 /dev/gluster/engine
>
> Thanx,
> Alex
>
> On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:
>
>> Can we have the gluster mount logs and brick logs to check if it's the
>> same issue?
>>
>> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
>> wrote:
>>
>>> I clean installed everything and ran into the same.
>>> I then ran gdeploy and encountered the same issue when deploying engine.
>>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if
>>> it has to do with alignment. The weird thing is that gluster volumes are
>>> all ok, replicating normally and no split brain is reported.
>>>
>>> The solution to the mentioned bug (1386443
>>> ) was to format
>>> with 512 sector size, which for my case is not an option:
>>>
>>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>>> illegal sector size 512; hw sector is 4096
>>>
>>> Is there any workaround to address this?
>>>
>>> Thanx,
>>> Alex
>>>
>>>
>>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
>>> wrote:
>>>
 Hi Maor,

 My disk are of 4K block size and from this bug seems that gluster
 replica needs 512B block size.
 Is there a way to make gluster function with 4K drives?

 Thank you!

 On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
 wrote:

> Hi Alex,
>
> I saw a bug that might be related to the issue you encountered at
> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>
> Sahina, maybe you have any advise? Do you think that BZ1386443is
> related?
>
> Regards,
> Maor
>
> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
> wrote:
> > Hi All,
> >
> > I have installed successfully several times oVirt (version 4.1) with
> 3 nodes
> > on top glusterfs.
> >
> > This time, when trying to configure the same setup, I am facing the
> > following issue which doesn't seem to go away. During installation i
> get the
> > error:
> >
> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
> 'Sanlock
> > lockspace add failure', 'Invalid argument'))
> >
> > The only different in this setup is that instead of standard
> partitioning i
> > have GPT partitioning and the disks have 4K block size instead of
> 512.
> >
> > The /var/log/sanlock.log has the following lines:
> >
> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
> nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8
> -46e7-b2c8-91e4a5bb2047/dom_md/ids:0
> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
> nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b
> 8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
> > for 2,9,23040
> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/m
> nt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8
> b4d5e5e922/dom_md/ids:0
> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
> offset
> > 127488 rv -22
> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e
> 7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
> > 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Abi Askushi
Hi Sahina,

Attached are the logs. Let me know if sth else is needed.

I have 5 disks (with 4K physical sector) in RAID5. The RAID has 64K stripe
size at the moment.
I have prepared the storage as below:

pvcreate --dataalignment 256K /dev/sda4
vgcreate --physicalextentsize 256K gluster /dev/sda4

lvcreate -n engine --size 120G gluster
mkfs.xfs -f -i size=512 /dev/gluster/engine

Thanx,
Alex
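
For reference, a sketch of how the alignment values above map to the RAID
geometry, assuming the 5-disk RAID5 leaves 4 data spindles (4 x 64K = 256K
full stripe); the mkfs.xfs su/sw options are an addition for illustration,
not part of the original commands:

# 64K stripe unit x 4 data disks = 256K full stripe width
pvcreate --dataalignment 256K /dev/sda4
vgcreate --physicalextentsize 256K gluster /dev/sda4
lvcreate -n engine --size 120G gluster
# pass the stripe geometry to XFS too (su = stripe unit, sw = data disks)
mkfs.xfs -f -i size=512 -d su=64k,sw=4 /dev/gluster/engine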

On Mon, Jun 5, 2017 at 12:14 PM, Sahina Bose  wrote:

> Can we have the gluster mount logs and brick logs to check if it's the
> same issue?
>
> On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
> wrote:
>
>> I clean installed everything and ran into the same.
>> I then ran gdeploy and encountered the same issue when deploying engine.
>> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if it
>> has to do with alignment. The weird thing is that gluster volumes are all
>> ok, replicating normally and no split brain is reported.
>>
>> The solution to the mentioned bug (1386443
>> ) was to format
>> with 512 sector size, which for my case is not an option:
>>
>> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
>> illegal sector size 512; hw sector is 4096
>>
>> Is there any workaround to address this?
>>
>> Thanx,
>> Alex
>>
>>
>> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
>> wrote:
>>
>>> Hi Maor,
>>>
>>> My disk are of 4K block size and from this bug seems that gluster
>>> replica needs 512B block size.
>>> Is there a way to make gluster function with 4K drives?
>>>
>>> Thank you!
>>>
>>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk 
>>> wrote:
>>>
 Hi Alex,

 I saw a bug that might be related to the issue you encountered at
 https://bugzilla.redhat.com/show_bug.cgi?id=1386443

 Sahina, maybe you have any advise? Do you think that BZ1386443is
 related?

 Regards,
 Maor

 On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
 wrote:
 > Hi All,
 >
 > I have installed successfully several times oVirt (version 4.1) with
 3 nodes
 > on top glusterfs.
 >
 > This time, when trying to configure the same setup, I am facing the
 > following issue which doesn't seem to go away. During installation i
 get the
 > error:
 >
 > Failed to execute stage 'Misc configuration': Cannot acquire host id:
 > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
 'Sanlock
 > lockspace add failure', 'Invalid argument'))
 >
 > The only different in this setup is that instead of standard
 partitioning i
 > have GPT partitioning and the disks have 4K block size instead of 512.
 >
 > The /var/log/sanlock.log has the following lines:
 >
 > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
 > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
 nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8
 -46e7-b2c8-91e4a5bb2047/dom_md/ids:0
 > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
 > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
 nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b
 8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
 > for 2,9,23040
 > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
 > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/m
 nt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8
 b4d5e5e922/dom_md/ids:0
 > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
 > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
 > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
 offset
 > 127488 rv -22
 > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e
 7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
 > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
 > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
 > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result
 -22
 >
 > And /var/log/vdsm/vdsm.log says:
 >
 > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
 > [storage.StorageServer.MountConnection] Using user specified
 > backup-volfile-servers option (storageServer:253)
 > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
 > available. (throttledlog:105)
 > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
 > available, KSM stats will be missing. (throttledlog:105)
 > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
 > [storage.StorageServer.MountConnection] Using user specified
 > backup-volfile-servers option (storageServer:253)
 > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
 Cannot
 > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
 > 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-05 Thread Sahina Bose
Can we have the gluster mount logs and brick logs to check if it's the same
issue?
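
For anyone gathering these, the client (mount) log normally lives under
/var/log/glusterfs/ on the hypervisor and is named after the mount point,
while each brick writes its own log under /var/log/glusterfs/bricks/ on the
storage node; the exact file name below is an assumption based on the mount
path seen in this thread:

# on the host doing the fuse mount
ls /var/log/glusterfs/rhev-data-center-mnt-glusterSD-10.100.100.1:_engine.log
# on each gluster node, one log per brick
ls /var/log/glusterfs/bricks/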

On Sun, Jun 4, 2017 at 11:21 PM, Abi Askushi 
wrote:

> I clean installed everything and ran into the same.
> I then ran gdeploy and encountered the same issue when deploying engine.
> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if it
> has to do with alignment. The weird thing is that gluster volumes are all
> ok, replicating normally and no split brain is reported.
>
> The solution to the mentioned bug (1386443
> ) was to format with
> 512 sector size, which for my case is not an option:
>
> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
> illegal sector size 512; hw sector is 4096
>
> Is there any workaround to address this?
>
> Thanx,
> Alex
>
>
> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi 
> wrote:
>
>> Hi Maor,
>>
>> My disk are of 4K block size and from this bug seems that gluster replica
>> needs 512B block size.
>> Is there a way to make gluster function with 4K drives?
>>
>> Thank you!
>>
>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk  wrote:
>>
>>> Hi Alex,
>>>
>>> I saw a bug that might be related to the issue you encountered at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>>
>>> Sahina, maybe you have any advise? Do you think that BZ1386443is related?
>>>
>>> Regards,
>>> Maor
>>>
>>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
>>> wrote:
>>> > Hi All,
>>> >
>>> > I have installed successfully several times oVirt (version 4.1) with 3
>>> nodes
>>> > on top glusterfs.
>>> >
>>> > This time, when trying to configure the same setup, I am facing the
>>> > following issue which doesn't seem to go away. During installation i
>>> get the
>>> > error:
>>> >
>>> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22,
>>> 'Sanlock
>>> > lockspace add failure', 'Invalid argument'))
>>> >
>>> > The only different in this setup is that instead of standard
>>> partitioning i
>>> > have GPT partitioning and the disks have 4K block size instead of 512.
>>> >
>>> > The /var/log/sanlock.log has the following lines:
>>> >
>>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/m
>>> nt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-
>>> c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/m
>>> nt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-
>>> c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
>>> > for 2,9,23040
>>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
>>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/m
>>> nt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-
>>> c8b4d5e5e922/dom_md/ids:0
>>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
>>> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
>>> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
>>> offset
>>> > 127488 rv -22
>>> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e
>>> 7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
>>> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
>>> > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
>>> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>>> >
>>> > And /var/log/vdsm/vdsm.log says:
>>> >
>>> > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
>>> > [storage.StorageServer.MountConnection] Using user specified
>>> > backup-volfile-servers option (storageServer:253)
>>> > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
>>> > available. (throttledlog:105)
>>> > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
>>> > available, KSM stats will be missing. (throttledlog:105)
>>> > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
>>> > [storage.StorageServer.MountConnection] Using user specified
>>> > backup-volfile-servers option (storageServer:253)
>>> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
>>> Cannot
>>> > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
>>> > (clusterlock:238)
>>> > Traceback (most recent call last):
>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>> line
>>> > 234, in initSANLock
>>> > sanlock.init_lockspace(sdUUID, idsPath)
>>> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
>>> > endpoint is not connected')
>>> > 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
>>> > [storage.StorageDomainManifest] lease did not initialize successfully
>>> > (sd:557)
>>> > Traceback (most recent call last):
>>> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
>>> > 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-04 Thread Maor Lipchuk
On Sun, Jun 4, 2017 at 8:51 PM, Abi Askushi  wrote:
> I clean installed everything and ran into the same.
> I then ran gdeploy and encountered the same issue when deploying engine.
> Seems that gluster (?) doesn't like 4K sector drives. I am not sure if it
> has to do with alignment. The weird thing is that gluster volumes are all
> ok, replicating normally and no split brain is reported.
>
> The solution to the mentioned bug (1386443) was to format with 512 sector
> size, which for my case is not an option:
>
> mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
> illegal sector size 512; hw sector is 4096
>
> Is there any workaround to address this?
>
> Thanx,
> Alex
>
>
> On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi  wrote:
>>
>> Hi Maor,
>>
>> My disk are of 4K block size and from this bug seems that gluster replica
>> needs 512B block size.
>> Is there a way to make gluster function with 4K drives?
>>
>> Thank you!
>>
>> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk  wrote:
>>>
>>> Hi Alex,
>>>
>>> I saw a bug that might be related to the issue you encountered at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>>
>>> Sahina, maybe you have any advise? Do you think that BZ1386443is related?
>>>
>>> Regards,
>>> Maor
>>>
>>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
>>> wrote:
>>> > Hi All,
>>> >
>>> > I have installed successfully several times oVirt (version 4.1) with 3
>>> > nodes
>>> > on top glusterfs.
>>> >
>>> > This time, when trying to configure the same setup, I am facing the
>>> > following issue which doesn't seem to go away. During installation i
>>> > get the
>>> > error:
>>> >
>>> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
>>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
>>> > lockspace add failure', 'Invalid argument'))
>>> >
>>> > The only different in this setup is that instead of standard
>>> > partitioning i
>>> > have GPT partitioning and the disks have 4K block size instead of 512.
>>> >
>>> > The /var/log/sanlock.log has the following lines:
>>> >
>>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>>> >
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>>> >
>>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
>>> > for 2,9,23040
>>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
>>> >
>>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
>>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
>>> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
>>> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader
>>> > offset
>>> > 127488 rv -22
>>> >
>>> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
>>> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
>>> > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
>>> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>>> >
>>> > And /var/log/vdsm/vdsm.log says:
>>> >
>>> > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
>>> > [storage.StorageServer.MountConnection] Using user specified
>>> > backup-volfile-servers option (storageServer:253)
>>> > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
>>> > available. (throttledlog:105)
>>> > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
>>> > available, KSM stats will be missing. (throttledlog:105)
>>> > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
>>> > [storage.StorageServer.MountConnection] Using user specified
>>> > backup-volfile-servers option (storageServer:253)
>>> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
>>> > Cannot
>>> > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
>>> > (clusterlock:238)
>>> > Traceback (most recent call last):
>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>> > line
>>> > 234, in initSANLock
>>> > sanlock.init_lockspace(sdUUID, idsPath)
>>> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
>>> > endpoint is not connected')
>>> > 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
>>> > [storage.StorageDomainManifest] lease did not initialize successfully
>>> > (sd:557)
>>> > Traceback (most recent call last):
>>> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
>>> > self._domainLock.initLock(self.getDomainLease())
>>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>>> > line
>>> > 271, 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-04 Thread Abi Askushi
I clean-installed everything and ran into the same issue.
I then ran gdeploy and encountered the same issue when deploying engine.
Seems that gluster (?) doesn't like 4K sector drives. I am not sure if it
has to do with alignment. The weird thing is that gluster volumes are all
ok, replicating normally and no split brain is reported.

The solution to the mentioned bug (1386443) was to format with 512 sector
size, which for my case is not an option:

mkfs.xfs -f -i size=512 -s size=512 /dev/gluster/engine
illegal sector size 512; hw sector is 4096

Is there any workaround to address this?

Thanx,
Alex
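
Before hunting for workarounds it is worth confirming what the block layer
reports for the device backing the bricks (/dev/sda here, as in the pvcreate
command earlier in the thread); a 512e drive reports 512 logical / 4096
physical and will accept -s size=512, while a 4K-native device reports 4096
for both and will not:

blockdev --getss --getpbsz /dev/sda        # logical and physical sector size
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size
lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda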


On Sun, Jun 4, 2017 at 5:48 PM, Abi Askushi  wrote:

> Hi Maor,
>
> My disk are of 4K block size and from this bug seems that gluster replica
> needs 512B block size.
> Is there a way to make gluster function with 4K drives?
>
> Thank you!
>
> On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk  wrote:
>
>> Hi Alex,
>>
>> I saw a bug that might be related to the issue you encountered at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>>
>> Sahina, maybe you have any advise? Do you think that BZ1386443is related?
>>
>> Regards,
>> Maor
>>
>> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
>> wrote:
>> > Hi All,
>> >
>> > I have installed successfully several times oVirt (version 4.1) with 3
>> nodes
>> > on top glusterfs.
>> >
>> > This time, when trying to configure the same setup, I am facing the
>> > following issue which doesn't seem to go away. During installation i
>> get the
>> > error:
>> >
>> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
>> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
>> > lockspace add failure', 'Invalid argument'))
>> >
>> > The only different in this setup is that instead of standard
>> partitioning i
>> > have GPT partitioning and the disks have 4K block size instead of 512.
>> >
>> > The /var/log/sanlock.log has the following lines:
>> >
>> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/
>> mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862
>> -c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
>> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
>> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/
>> mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd86
>> 2-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
>> > for 2,9,23040
>> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
>> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/
>> mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-
>> 8e26-c8b4d5e5e922/dom_md/ids:0
>> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
>> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
>> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset
>> > 127488 rv -22
>> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/
>> a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
>> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
>> > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
>> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>> >
>> > And /var/log/vdsm/vdsm.log says:
>> >
>> > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
>> > [storage.StorageServer.MountConnection] Using user specified
>> > backup-volfile-servers option (storageServer:253)
>> > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
>> > available. (throttledlog:105)
>> > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
>> > available, KSM stats will be missing. (throttledlog:105)
>> > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
>> > [storage.StorageServer.MountConnection] Using user specified
>> > backup-volfile-servers option (storageServer:253)
>> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
>> Cannot
>> > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
>> > (clusterlock:238)
>> > Traceback (most recent call last):
>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>> line
>> > 234, in initSANLock
>> > sanlock.init_lockspace(sdUUID, idsPath)
>> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
>> > endpoint is not connected')
>> > 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
>> > [storage.StorageDomainManifest] lease did not initialize successfully
>> > (sd:557)
>> > Traceback (most recent call last):
>> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
>> > self._domainLock.initLock(self.getDomainLease())
>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>> line
>> > 271, in initLock
>> > initSANLock(self._sdUUID, self._idsPath, lease)
>> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
>> line
>> > 239, 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-04 Thread Abi Askushi
Hi Maor,

My disks are of 4K block size, and from this bug it seems that gluster
replica needs a 512B block size.
Is there a way to make gluster function with 4K drives?

Thank you!
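
Since the bricks sit on a RAID5 virtual disk, it can also help to compare
what the physical drives report with what the controller exposes to the OS;
some controllers can present a 512-byte logical sector (512e) even on 4K
drives. A rough check, assuming /dev/sda is the RAID LUN and smartmontools
is installed (hardware RAID may need an extra -d option to reach member
disks):

smartctl -i /dev/sda | grep -i 'sector size'   # what the LUN reports
cat /sys/block/sda/queue/logical_block_size    # what the kernel enforces for O_DIRECT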

On Sun, Jun 4, 2017 at 2:34 PM, Maor Lipchuk  wrote:

> Hi Alex,
>
> I saw a bug that might be related to the issue you encountered at
> https://bugzilla.redhat.com/show_bug.cgi?id=1386443
>
> Sahina, maybe you have any advise? Do you think that BZ1386443is related?
>
> Regards,
> Maor
>
> On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi 
> wrote:
> > Hi All,
> >
> > I have installed successfully several times oVirt (version 4.1) with 3
> nodes
> > on top glusterfs.
> >
> > This time, when trying to configure the same setup, I am facing the
> > following issue which doesn't seem to go away. During installation i get
> the
> > error:
> >
> > Failed to execute stage 'Misc configuration': Cannot acquire host id:
> > (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
> > lockspace add failure', 'Invalid argument'))
> >
> > The only different in this setup is that instead of standard
> partitioning i
> > have GPT partitioning and the disks have 4K block size instead of 512.
> >
> > The /var/log/sanlock.log has the following lines:
> >
> > 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-
> center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/
> ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
> > 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
> > ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-
> center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/
> ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
> > for 2,9,23040
> > 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
> > a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-
> center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-
> 4838-8e26-c8b4d5e5e922/dom_md/ids:0
> > 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
> > 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
> > 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset
> > 127488 rv -22
> > /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-
> 8e26-c8b4d5e5e922/dom_md/ids
> > 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
> > 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
> > 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
> >
> > And /var/log/vdsm/vdsm.log says:
> >
> > 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
> > [storage.StorageServer.MountConnection] Using user specified
> > backup-volfile-servers option (storageServer:253)
> > 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
> > available. (throttledlog:105)
> > 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
> > available, KSM stats will be missing. (throttledlog:105)
> > 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
> > [storage.StorageServer.MountConnection] Using user specified
> > backup-volfile-servers option (storageServer:253)
> > 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock]
> Cannot
> > initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
> > (clusterlock:238)
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
> line
> > 234, in initSANLock
> > sanlock.init_lockspace(sdUUID, idsPath)
> > SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
> > endpoint is not connected')
> > 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
> > [storage.StorageDomainManifest] lease did not initialize successfully
> > (sd:557)
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
> > self._domainLock.initLock(self.getDomainLease())
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
> line
> > 271, in initLock
> > initSANLock(self._sdUUID, self._idsPath, lease)
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py",
> line
> > 239, in initSANLock
> > raise se.ClusterLockInitError()
> > ClusterLockInitError: Could not initialize cluster lock: ()
> > 2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool]
> Create
> > pool hosted_datacenter canceled  (sp:655)
> > Traceback (most recent call last):
> >   File "/usr/share/vdsm/storage/sp.py", line 652, in create
> > self.attachSD(sdUUID)
> >   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py",
> line
> > 79, in wrapper
> > return method(self, *args, **kwargs)
> >   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
> > dom.acquireHostId(self.id)
> >   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
> > self._manifest.acquireHostId(hostId, async)
> >   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
> > 

Re: [ovirt-users] oVirt gluster sanlock issue

2017-06-04 Thread Maor Lipchuk
Hi Alex,

I saw a bug that might be related to the issue you encountered at
https://bugzilla.redhat.com/show_bug.cgi?id=1386443

Sahina, maybe you have some advice? Do you think that BZ1386443 is related?

Regards,
Maor

On Sat, Jun 3, 2017 at 8:45 PM, Abi Askushi  wrote:
> Hi All,
>
> I have installed successfully several times oVirt (version 4.1) with 3 nodes
> on top glusterfs.
>
> This time, when trying to configure the same setup, I am facing the
> following issue which doesn't seem to go away. During installation i get the
> error:
>
> Failed to execute stage 'Misc configuration': Cannot acquire host id:
> (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
> lockspace add failure', 'Invalid argument'))
>
> The only different in this setup is that instead of standard partitioning i
> have GPT partitioning and the disks have 4K block size instead of 512.
>
> The /var/log/sanlock.log has the following lines:
>
> 2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
> ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
> 2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
> ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
> for 2,9,23040
> 2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
> a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
> 2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
> 0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
> 2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset
> 127488 rv -22
> /rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
> 2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
> 88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
> 2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22
>
> And /var/log/vdsm/vdsm.log says:
>
> 2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
> [storage.StorageServer.MountConnection] Using user specified
> backup-volfile-servers option (storageServer:253)
> 2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
> available. (throttledlog:105)
> 2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
> available, KSM stats will be missing. (throttledlog:105)
> 2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
> [storage.StorageServer.MountConnection] Using user specified
> backup-volfile-servers option (storageServer:253)
> 2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock] Cannot
> initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
> (clusterlock:238)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
> 234, in initSANLock
> sanlock.init_lockspace(sdUUID, idsPath)
> SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
> endpoint is not connected')
> 2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
> [storage.StorageDomainManifest] lease did not initialize successfully
> (sd:557)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
> self._domainLock.initLock(self.getDomainLease())
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
> 271, in initLock
> initSANLock(self._sdUUID, self._idsPath, lease)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
> 239, in initSANLock
> raise se.ClusterLockInitError()
> ClusterLockInitError: Could not initialize cluster lock: ()
> 2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool] Create
> pool hosted_datacenter canceled  (sp:655)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sp.py", line 652, in create
> self.attachSD(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
> 79, in wrapper
> return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
> dom.acquireHostId(self.id)
>   File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
> self._manifest.acquireHostId(hostId, async)
>   File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
> self._domainLock.acquireHostId(hostId, async)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
> 297, in acquireHostId
> raise se.AcquireHostIdFailure(self._sdUUID, e)
> AcquireHostIdFailure: Cannot acquire host id:
> (u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
> lockspace add failure', 'Invalid argument'))
> 2017-06-03 19:21:37,870+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain
> ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 

[ovirt-users] oVirt gluster sanlock issue

2017-06-03 Thread Abi Askushi
Hi All,

I have successfully installed oVirt (version 4.1) with 3 nodes on top of
glusterfs several times.

This time, when trying to configure the same setup, I am facing the
following issue, which doesn't seem to go away. During installation I get
the error:

Failed to execute stage 'Misc configuration': Cannot acquire host id:
(u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
lockspace add failure', 'Invalid argument'))

The only difference in this setup is that instead of standard partitioning I
have GPT partitioning, and the disks have a 4K block size instead of 512.

The /var/log/sanlock.log has the following lines:

2017-06-03 19:21:15+0200 23450 [943]: s9 lockspace
ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:250:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engin-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/ids:0
2017-06-03 19:21:36+0200 23471 [944]: s9:r5 resource
ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmptjkIDI/ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047/dom_md/leases:1048576
for 2,9,23040
2017-06-03 19:21:36+0200 23471 [943]: s10 lockspace
a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922:250:/rhev/data-center/mnt/glusterSD/10.100.100.1:
_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids:0
2017-06-03 19:21:36+0200 23471 [23522]: a5a6b0e7 aio collect RD
0x7f59b8c0:0x7f59b8d0:0x7f59b0101000 result -22:0 match res
2017-06-03 19:21:36+0200 23471 [23522]: read_sectors delta_leader offset
127488 rv -22 /rhev/data-center/mnt/glusterSD/10.100.100.1:
_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids
2017-06-03 19:21:37+0200 23472 [930]: s9 host 250 1 23450
88c2244c-a782-40ed-9560-6cfa4d46f853.v0.neptune
2017-06-03 19:21:37+0200 23472 [943]: s10 add_lockspace fail result -22

And /var/log/vdsm/vdsm.log says:

2017-06-03 19:19:38,176+0200 WARN  (jsonrpc/3)
[storage.StorageServer.MountConnection] Using user specified
backup-volfile-servers option (storageServer:253)
2017-06-03 19:21:12,379+0200 WARN  (periodic/1) [throttled] MOM not
available. (throttledlog:105)
2017-06-03 19:21:12,380+0200 WARN  (periodic/1) [throttled] MOM not
available, KSM stats will be missing. (throttledlog:105)
2017-06-03 19:21:14,714+0200 WARN  (jsonrpc/1)
[storage.StorageServer.MountConnection] Using user specified
backup-volfile-servers option (storageServer:253)
2017-06-03 19:21:15,515+0200 ERROR (jsonrpc/4) [storage.initSANLock] Cannot
initialize SANLock for domain a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922
(clusterlock:238)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
234, in initSANLock
sanlock.init_lockspace(sdUUID, idsPath)
SanlockException: (107, 'Sanlock lockspace init failure', 'Transport
endpoint is not connected')
2017-06-03 19:21:15,515+0200 WARN  (jsonrpc/4)
[storage.StorageDomainManifest] lease did not initialize successfully
(sd:557)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sd.py", line 552, in initDomainLock
self._domainLock.initLock(self.getDomainLease())
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
271, in initLock
initSANLock(self._sdUUID, self._idsPath, lease)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
239, in initSANLock
raise se.ClusterLockInitError()
ClusterLockInitError: Could not initialize cluster lock: ()
2017-06-03 19:21:37,867+0200 ERROR (jsonrpc/2) [storage.StoragePool] Create
pool hosted_datacenter canceled  (sp:655)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 652, in create
self.attachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
79, in wrapper
return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 971, in attachSD
dom.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 790, in acquireHostId
self._manifest.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/sd.py", line 449, in acquireHostId
self._domainLock.acquireHostId(hostId, async)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line
297, in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id:
(u'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922', SanlockException(22, 'Sanlock
lockspace add failure', 'Invalid argument'))
2017-06-03 19:21:37,870+0200 ERROR (jsonrpc/2) [storage.StoragePool] Domain
ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 detach from MSD
ba6bd862-c2b8-46e7-b2c8-91e4a5bb2047 Ver 1 failed. (sp:528)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 525, in __cleanupDomains
self.detachSD(sdUUID)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
79, in wrapper
return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1046, in detachSD
raise se.CannotDetachMasterStorageDomain(sdUUID)
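
The failing step can be reproduced outside of hosted-engine-setup with the
same python binding the traceback shows, which makes it quicker to iterate
on gluster volume options; the UUID and path are the ones from the logs
above, and the command will likely need to run as root (or the vdsm user):

python -c "
import sanlock
# same call vdsm makes in initSANLock(); raises SanlockException on failure
sanlock.init_lockspace(
    'a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922',
    '/rhev/data-center/mnt/glusterSD/10.100.100.1:_engine/a5a6b0e7-fc3f-4838-8e26-c8b4d5e5e922/dom_md/ids')
"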

Re: [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

2016-09-29 Thread Ravishankar N

On 09/29/2016 08:03 PM, Davide Ferrari wrote:
It's strange, I've tried to trigger the error again by putting vm04 in 
maintenence and stopping the gluster service (from ovirt gui) and now 
the VM starts correctly. Maybe the arbiter indeed blamed the brick 
that was still up before, but how's that possible?


A write from the client on that file (vm image) could have succeeded 
only on vm04 even before you brought it down.


The only (maybe big) difference with the previous, erroneous 
situation, is that before I did maintenence (+ reboot) of 3 of my 4 
hosts, maybe I should have left more time between one reboot and another?


If you did not do anything from the previous run other than bring the
node up, and things worked, then the file is not in split-brain.
Split-brained files need to be resolved before they can be accessed again,
which apparently did not happen in your case.


-Ravi
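
If a file ever does end up in a real split-brain, newer gluster releases can
resolve it from the CLI without editing xattrs by hand; a sketch using the
volume from this thread, where the file path is relative to the volume root
and is only a placeholder:

gluster volume heal data_ssd info split-brain
# keep the copy with the newest mtime:
gluster volume heal data_ssd split-brain latest-mtime /path/to/file
# or explicitly pick the brick to take the data from:
gluster volume heal data_ssd split-brain source-brick vm01.storage.billy:/gluster/ssd/data/brick /path/to/file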


2016-09-29 14:16 GMT+02:00 Ravishankar N >:


On 09/29/2016 05:18 PM, Sahina Bose wrote:

Yes, this is a GlusterFS problem. Adding gluster users ML

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari
> wrote:

Hello

maybe this is more glustefs then ovirt related but since
OVirt integrates Gluster management and I'm experiencing the
problem in an ovirt cluster, I'm writing here.

The problem is simple: I have a data domain mappend on a
replica 3 arbiter1 Gluster volume with 6 bricks, like this:

Status of volume: data_ssd
Gluster process TCP Port  RDMA Port  Online  Pid

--
Brick vm01.storage.billy:/gluster/ssd/data/
brick 49153 0  Y   19298
Brick vm02.storage.billy:/gluster/ssd/data/
brick 49153 0  Y   6146
Brick vm03.storage.billy:/gluster/ssd/data/
arbiter_brick 49153 0  Y   6552
Brick vm03.storage.billy:/gluster/ssd/data/
brick 49154 0  Y   6559
Brick vm04.storage.billy:/gluster/ssd/data/
brick 49152 0  Y   6077
Brick vm02.storage.billy:/gluster/ssd/data/
arbiter_brick 49154 0  Y   6153
Self-heal Daemon on localhost   N/A N/A   
Y   30746
Self-heal Daemon on vm01.storage.billy  N/A N/A   
Y   196058
Self-heal Daemon on vm03.storage.billy  N/A N/A   
Y   23205
Self-heal Daemon on vm04.storage.billy  N/A N/A   
Y   8246



Now, I've put in maintenance the vm04 host, from ovirt,
ticking the "Stop gluster" checkbox, and Ovirt didn't
complain about anything. But when I tried to run a new VM it
complained about "storage I/O problem", while the storage
data status was always UP.

Looking in the gluster logs I can see this:

[2016-09-29 11:01:01.556908] I
[glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No
change in volfile, continuing
[2016-09-29 11:02:28.124151] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing READ on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]
[2016-09-29 11:02:28.126580] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1:
Unreadable subvolume -1 found with event generation 6 for
gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
[2016-09-29 11:02:28.127374] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing FGETXATTR on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]
[2016-09-29 11:02:28.128130] W [MSGID: 108027]
[afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1:
no read subvols for (null)
[2016-09-29 11:02:28.129890] W
[fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 8201:
READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d
fd=0x7f09b749d210 (Input/output error)
[2016-09-29 11:02:28.130824] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing FSTAT on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]



Does `gluster volume heal data_ssd info split-brain` report that
the file is in split-brain, with vm04 still being down?
If yes, could you provide the extended attributes of this gfid
from all 3 bricks:
getfattr -d -m . -e hex
/path/to/brick/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d

If no, then I'm guessing that it is 

Re: [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

2016-09-29 Thread Davide Ferrari
It's strange: I've tried to trigger the error again by putting vm04 in
maintenance and stopping the gluster service (from the ovirt gui), and now
the VM starts correctly. Maybe the arbiter indeed blamed the brick that was
still up before, but how's that possible?
The only (maybe big) difference with the previous, erroneous situation is
that before, I did maintenance (+ reboot) on 3 of my 4 hosts; maybe I should
have left more time between one reboot and the next?
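
One way to make rolling reboots like this safer is to wait for pending heals
to drain before taking the next node down; a rough sketch with the volume
from this thread:

# wait until every brick reports zero pending entries before the next reboot
gluster volume heal data_ssd statistics heal-count
# or list the entries still needing heal
gluster volume heal data_ssd info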

2016-09-29 14:16 GMT+02:00 Ravishankar N :

> On 09/29/2016 05:18 PM, Sahina Bose wrote:
>
> Yes, this is a GlusterFS problem. Adding gluster users ML
>
> On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari 
> wrote:
>
>> Hello
>>
>> maybe this is more glustefs then ovirt related but since OVirt integrates
>> Gluster management and I'm experiencing the problem in an ovirt cluster,
>> I'm writing here.
>>
>> The problem is simple: I have a data domain mappend on a replica 3
>> arbiter1 Gluster volume with 6 bricks, like this:
>>
>> Status of volume: data_ssd
>> Gluster process TCP Port  RDMA Port  Online
>> Pid
>> 
>> --
>> Brick vm01.storage.billy:/gluster/ssd/data/
>> brick   49153 0  Y
>> 19298
>> Brick vm02.storage.billy:/gluster/ssd/data/
>> brick   49153 0  Y
>> 6146
>> Brick vm03.storage.billy:/gluster/ssd/data/
>> arbiter_brick   49153 0  Y
>> 6552
>> Brick vm03.storage.billy:/gluster/ssd/data/
>> brick   49154 0  Y
>> 6559
>> Brick vm04.storage.billy:/gluster/ssd/data/
>> brick   49152 0  Y
>> 6077
>> Brick vm02.storage.billy:/gluster/ssd/data/
>> arbiter_brick   49154 0  Y
>> 6153
>> Self-heal Daemon on localhost   N/A   N/AY
>> 30746
>> Self-heal Daemon on vm01.storage.billy  N/A   N/AY
>> 196058
>> Self-heal Daemon on vm03.storage.billy  N/A   N/AY
>> 23205
>> Self-heal Daemon on vm04.storage.billy  N/A   N/AY
>> 8246
>>
>>
>> Now, I've put in maintenance the vm04 host, from ovirt, ticking the "Stop
>> gluster" checkbox, and Ovirt didn't complain about anything. But when I
>> tried to run a new VM it complained about "storage I/O problem", while the
>> storage data status was always UP.
>>
>> Looking in the gluster logs I can see this:
>>
>> [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
>> 0-glusterfs: No change in volfile, continuing
>> [2016-09-29 11:02:28.124151] E [MSGID: 108008]
>> [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
>> Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
>> observed. [Input/output error]
>> [2016-09-29 11:02:28.126580] W [MSGID: 108008]
>> [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable
>> subvolume -1 found with event generation 6 for gfid
>> bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
>> [2016-09-29 11:02:28.127374] E [MSGID: 108008]
>> [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
>> Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d:
>> split-brain observed. [Input/output error]
>> [2016-09-29 11:02:28.128130] W [MSGID: 108027]
>> [afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read
>> subvols for (null)
>> [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk]
>> 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d
>> fd=0x7f09b749d210 (Input/output error)
>> [2016-09-29 11:02:28.130824] E [MSGID: 108008]
>> [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
>> Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
>> observed. [Input/output error]
>>
>
> Does `gluster volume heal data_ssd info split-brain` report that the file
> is in split-brain, with vm04 still being down?
> If yes, could you provide the extended attributes of this gfid from all 3
> bricks:
> getfattr -d -m . -e hex /path/to/brick/bf/59/bf5922b7-
> 19f3-4ce3-98df-71e981ecca8d
>
> If no, then I'm guessing that it is not in actual split-brain (hence the
> 'Possible split-brain' message). If the node you brought down contains the
> only good copy of the file (i.e the other data brick and arbiter are up,
> and the arbiter 'blames' this other brick), all I/O is failed with EIO to
> prevent file from getting into actual split-brain. The heals will happen
> when the good node comes up and I/O should be allowed again in that case.
>
> -Ravi
>
>
> [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk]
>> 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-eb652
>> 

Re: [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

2016-09-29 Thread Ravishankar N

On 09/29/2016 05:18 PM, Sahina Bose wrote:

Yes, this is a GlusterFS problem. Adding gluster users ML

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari > wrote:


Hello

maybe this is more glustefs then ovirt related but since OVirt
integrates Gluster management and I'm experiencing the problem in
an ovirt cluster, I'm writing here.

The problem is simple: I have a data domain mappend on a replica 3
arbiter1 Gluster volume with 6 bricks, like this:

Status of volume: data_ssd
Gluster process TCP Port  RDMA Port  Online  Pid

--
Brick vm01.storage.billy:/gluster/ssd/data/
brick 49153 0  Y   19298
Brick vm02.storage.billy:/gluster/ssd/data/
brick 49153 0  Y   6146
Brick vm03.storage.billy:/gluster/ssd/data/
arbiter_brick 49153 0  Y   6552
Brick vm03.storage.billy:/gluster/ssd/data/
brick 49154 0  Y   6559
Brick vm04.storage.billy:/gluster/ssd/data/
brick 49152 0  Y   6077
Brick vm02.storage.billy:/gluster/ssd/data/
arbiter_brick 49154 0  Y   6153
Self-heal Daemon on localhost N/A   N/AY   30746
Self-heal Daemon on vm01.storage.billy N/A   N/A   
Y   196058
Self-heal Daemon on vm03.storage.billy N/A   N/A   
Y   23205
Self-heal Daemon on vm04.storage.billy N/A   N/A   
Y   8246



Now, I've put in maintenance the vm04 host, from ovirt, ticking
the "Stop gluster" checkbox, and Ovirt didn't complain about
anything. But when I tried to run a new VM it complained about
"storage I/O problem", while the storage data status was always UP.

Looking in the gluster logs I can see this:

[2016-09-29 11:01:01.556908] I
[glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 0-glusterfs: No change
in volfile, continuing
[2016-09-29 11:02:28.124151] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing READ on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]
[2016-09-29 11:02:28.126580] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1:
Unreadable subvolume -1 found with event generation 6 for gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
[2016-09-29 11:02:28.127374] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing FGETXATTR on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]
[2016-09-29 11:02:28.128130] W [MSGID: 108027]
[afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no
read subvols for (null)
[2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk]
0-glusterfs-fuse: 8201: READ => -1
gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210
(Input/output error)
[2016-09-29 11:02:28.130824] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing FSTAT on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]



Does `gluster volume heal data_ssd info split-brain` report that the file is
in split-brain, with vm04 still being down?
If yes, could you provide the extended attributes of this gfid from all 3
bricks:
getfattr -d -m . -e hex /path/to/brick/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d


If no, then I'm guessing that it is not in actual split-brain (hence the
'Possible split-brain' message). If the node you brought down contains the
only good copy of the file (i.e. the other data brick and the arbiter are
up, and the arbiter 'blames' this other brick), all I/O is failed with EIO
to prevent the file from getting into actual split-brain. The heals will
happen when the good node comes up, and I/O should be allowed again in that
case.


-Ravi
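
For reference, a rough sketch of the checks being asked for here. The gfid
path on each brick lives under .glusterfs, with the first two pairs of hex
characters of the gfid as subdirectories (the brick path below is taken from
the volume status output in this thread; adjust per node). Each
trusted.afr.data_ssd-client-N attribute packs three 32-bit counters (data,
metadata and entry pending operations); a non-zero data counter on a brick
means that brick holds pending changes for, i.e. 'blames', the brick behind
client-N:

gluster volume heal data_ssd info split-brain
# on each of the three bricks of that replica set
getfattr -d -m . -e hex /gluster/ssd/data/brick/.glusterfs/bf/59/bf5922b7-19f3-4ce3-98df-71e981ecca8d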



[2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 8202: FSTAT()

/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527
=> -1 (Input/output error)
The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn]
0-data_ssd-replicate-1: Unreadable subvolume -1 found with event
generation 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d.
(Possible split-brain)" repeated 11 times between [2016-09-29
11:02:28.126580] and [2016-09-29 11:02:28.517744]
[2016-09-29 11:02:28.518607] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done]
0-data_ssd-replicate-1: Failing STAT on gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain observed.
[Input/output error]

Now, how is it possible to have a split brain if I stopped just

Re: [ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

2016-09-29 Thread Sahina Bose
Yes, this is a GlusterFS problem. Adding gluster users ML

On Thu, Sep 29, 2016 at 5:11 PM, Davide Ferrari  wrote:

> Hello
>
> maybe this is more glustefs then ovirt related but since OVirt integrates
> Gluster management and I'm experiencing the problem in an ovirt cluster,
> I'm writing here.
>
> The problem is simple: I have a data domain mappend on a replica 3
> arbiter1 Gluster volume with 6 bricks, like this:
>
> Status of volume: data_ssd
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick vm01.storage.billy:/gluster/ssd/data/
> brick   49153 0  Y
> 19298
> Brick vm02.storage.billy:/gluster/ssd/data/
> brick   49153 0  Y
> 6146
> Brick vm03.storage.billy:/gluster/ssd/data/
> arbiter_brick   49153 0  Y
> 6552
> Brick vm03.storage.billy:/gluster/ssd/data/
> brick   49154 0  Y
> 6559
> Brick vm04.storage.billy:/gluster/ssd/data/
> brick   49152 0  Y
> 6077
> Brick vm02.storage.billy:/gluster/ssd/data/
> arbiter_brick   49154 0  Y
> 6153
> Self-heal Daemon on localhost   N/A   N/AY
> 30746
> Self-heal Daemon on vm01.storage.billy  N/A   N/AY
> 196058
> Self-heal Daemon on vm03.storage.billy  N/A   N/AY
> 23205
> Self-heal Daemon on vm04.storage.billy  N/A   N/AY
> 8246
>
>
> Now, I've put in maintenance the vm04 host, from ovirt, ticking the "Stop
> gluster" checkbox, and Ovirt didn't complain about anything. But when I
> tried to run a new VM it complained about "storage I/O problem", while the
> storage data status was always UP.
>
> Looking in the gluster logs I can see this:
>
> [2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2016-09-29 11:02:28.124151] E [MSGID: 108008] 
> [afr-read-txn.c:89:afr_read_txn_refresh_done]
> 0-data_ssd-replicate-1: Failing READ on gfid 
> bf5922b7-19f3-4ce3-98df-71e981ecca8d:
> split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.126580] W [MSGID: 108008]
> [afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable
> subvolume -1 found with event generation 6 for gfid 
> bf5922b7-19f3-4ce3-98df-71e981ecca8d.
> (Possible split-brain)
> [2016-09-29 11:02:28.127374] E [MSGID: 108008] 
> [afr-read-txn.c:89:afr_read_txn_refresh_done]
> 0-data_ssd-replicate-1: Failing FGETXATTR on gfid 
> bf5922b7-19f3-4ce3-98df-71e981ecca8d:
> split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.128130] W [MSGID: 108027] 
> [afr-common.c:2403:afr_discover_done]
> 0-data_ssd-replicate-1: no read subvols for (null)
> [2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk]
> 0-glusterfs-fuse: 8201: READ => -1 gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d
> fd=0x7f09b749d210 (Input/output error)
> [2016-09-29 11:02:28.130824] E [MSGID: 108008] 
> [afr-read-txn.c:89:afr_read_txn_refresh_done]
> 0-data_ssd-replicate-1: Failing FSTAT on gfid 
> bf5922b7-19f3-4ce3-98df-71e981ecca8d:
> split-brain observed. [Input/output error]
> [2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk]
> 0-glusterfs-fuse: 8202: FSTAT() /ba2bd397-9222-424d-aecc-
> eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/
> ff4e49c6-3084-4234-80a1-18a67615c527 => -1 (Input/output error)
> The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn]
> 0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation
> 6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)"
> repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29
> 11:02:28.517744]
> [2016-09-29 11:02:28.518607] E [MSGID: 108008] 
> [afr-read-txn.c:89:afr_read_txn_refresh_done]
> 0-data_ssd-replicate-1: Failing STAT on gfid 
> bf5922b7-19f3-4ce3-98df-71e981ecca8d:
> split-brain observed. [Input/output error]
>
> Now, how is it possible to have a split brain if I stopped just ONE server
> which had just ONE of six bricks, and it was cleanly shut down with
> maintenance mode from ovirt?
>
> I created the volume originally this way:
> # gluster volume create data_ssd replica 3 arbiter 1
> vm01.storage.billy:/gluster/ssd/data/brick 
> vm02.storage.billy:/gluster/ssd/data/brick
> vm03.storage.billy:/gluster/ssd/data/arbiter_brick
> vm03.storage.billy:/gluster/ssd/data/brick 
> vm04.storage.billy:/gluster/ssd/data/brick
> vm02.storage.billy:/gluster/ssd/data/arbiter_brick
> # gluster volume set data_ssd group virt
> # gluster volume set data_ssd storage.owner-uid 36 && gluster volume set
> data_ssd storage.owner-gid 36
> # gluster volume start data_ssd
>
>
> --
> Davide Ferrari
> Senior Systems Engineer
>
> 

[ovirt-users] Ovirt/Gluster replica 3 distributed-replicated problem

2016-09-29 Thread Davide Ferrari
Hello

maybe this is more GlusterFS- than oVirt-related, but since oVirt integrates
Gluster management and I'm experiencing the problem in an oVirt cluster,
I'm writing here.

The problem is simple: I have a data domain mapped on a replica 3 arbiter 1
Gluster volume with 6 bricks, like this:

Status of volume: data_ssd
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick vm01.storage.billy:/gluster/ssd/data/
brick   49153 0  Y
19298
Brick vm02.storage.billy:/gluster/ssd/data/
brick   49153 0  Y
6146
Brick vm03.storage.billy:/gluster/ssd/data/
arbiter_brick   49153 0  Y
6552
Brick vm03.storage.billy:/gluster/ssd/data/
brick   49154 0  Y
6559
Brick vm04.storage.billy:/gluster/ssd/data/
brick   49152 0  Y
6077
Brick vm02.storage.billy:/gluster/ssd/data/
arbiter_brick   49154 0  Y
6153
Self-heal Daemon on localhost   N/A   N/AY
30746
Self-heal Daemon on vm01.storage.billy  N/A   N/AY
196058
Self-heal Daemon on vm03.storage.billy  N/A   N/AY
23205
Self-heal Daemon on vm04.storage.billy  N/A   N/AY
8246


Now, I've put the vm04 host into maintenance from oVirt, ticking the "Stop
gluster" checkbox, and oVirt didn't complain about anything. But when I
tried to run a new VM it complained about a "storage I/O problem", while the
status of the data storage domain was always UP.

Looking in the gluster logs I can see this:

[2016-09-29 11:01:01.556908] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing
[2016-09-29 11:02:28.124151] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
Failing READ on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
observed. [Input/output error]
[2016-09-29 11:02:28.126580] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-data_ssd-replicate-1: Unreadable
subvolume -1 found with event generation 6 for gfid
bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)
[2016-09-29 11:02:28.127374] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
Failing FGETXATTR on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
observed. [Input/output error]
[2016-09-29 11:02:28.128130] W [MSGID: 108027]
[afr-common.c:2403:afr_discover_done] 0-data_ssd-replicate-1: no read
subvols for (null)
[2016-09-29 11:02:28.129890] W [fuse-bridge.c:2228:fuse_readv_cbk]
0-glusterfs-fuse: 8201: READ => -1
gfid=bf5922b7-19f3-4ce3-98df-71e981ecca8d fd=0x7f09b749d210 (Input/output
error)
[2016-09-29 11:02:28.130824] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
Failing FSTAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
observed. [Input/output error]
[2016-09-29 11:02:28.133879] W [fuse-bridge.c:767:fuse_attr_cbk]
0-glusterfs-fuse: 8202: FSTAT()
/ba2bd397-9222-424d-aecc-eb652c0169d9/images/f02ac1ce-52cd-4b81-8b29-f8006d0469e0/ff4e49c6-3084-4234-80a1-18a67615c527
=> -1 (Input/output error)
The message "W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn]
0-data_ssd-replicate-1: Unreadable subvolume -1 found with event generation
6 for gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d. (Possible split-brain)"
repeated 11 times between [2016-09-29 11:02:28.126580] and [2016-09-29
11:02:28.517744]
[2016-09-29 11:02:28.518607] E [MSGID: 108008]
[afr-read-txn.c:89:afr_read_txn_refresh_done] 0-data_ssd-replicate-1:
Failing STAT on gfid bf5922b7-19f3-4ce3-98df-71e981ecca8d: split-brain
observed. [Input/output error]

Now, how is it possible to have a split-brain if I stopped just ONE server,
which had just ONE of the six bricks, and it was cleanly shut down with
maintenance mode from oVirt?

I created the volume originally this way:
# gluster volume create data_ssd replica 3 arbiter 1
vm01.storage.billy:/gluster/ssd/data/brick
vm02.storage.billy:/gluster/ssd/data/brick
vm03.storage.billy:/gluster/ssd/data/arbiter_brick
vm03.storage.billy:/gluster/ssd/data/brick
vm04.storage.billy:/gluster/ssd/data/brick
vm02.storage.billy:/gluster/ssd/data/arbiter_brick
# gluster volume set data_ssd group virt
# gluster volume set data_ssd storage.owner-uid 36 && gluster volume set
data_ssd storage.owner-gid 36
# gluster volume start data_ssd
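
(In case it helps, this is what I check first from any of the nodes when this happens - volume name as above; the commands assume gluster 3.7+ with arbiter support, so treat the exact output as a sketch:)

# gluster volume info data_ssd
  (should report the bricks as "2 x (2 + 1) = 6" for replica 3 arbiter 1)
# gluster volume heal data_ssd info
  (pending heals per brick)
# gluster volume heal data_ssd info split-brain
  (files gluster itself flags as split-brain)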


-- 
Davide Ferrari
Senior Systems Engineer
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt Gluster Hyperconverged problem

2016-09-20 Thread knarra

Hello Hanson,

Below is the procedure to replace a host with the same FQDN where the 
existing host OS has to be re-installed. If the oVirt version you are 
running is 4.0, steps 14 and 15 are not required; you can reinstall 
the host from the UI with the HostedEngine->Deploy option. (A shell 
sketch of the gluster-level steps follows the list.)


*

1. Move host (host3) to maintenance in the UI.

2. Re-install the OS, subscribe to channels & install required packages,
   prepare bricks (if needed).

3. Check gluster peer status from a working node to obtain the UUID of the
   host being replaced.

4. Create the brick directories by running the command “mkdir /rhgs/brick{1..3}”.

5. Put the /etc/fstab entries in place on the new node by copying them from the other nodes.

6. Run mount -a so that the bricks are mounted.

7. Edit the gluster UUID in /var/lib/glusterd/glusterd.info.

8. Copy the peer info from a working peer to /var/lib/glusterd/peers
   (without the peer info of the node being replaced, here host3).

9. Create and remove a tmp dir at all volume mount points.

10. Run the command “setfattr -n trusted.non-existent-key -v abc ” to set an extended attribute, and remove it
    by running the command “ setfattr -x trusted.non-existent-key ” at all mount points.

11. Restart glusterd.

12. Ensure heal is in progress and completes.

13. Edit the host and fetch the fingerprint in Advanced details - the
    fingerprint has changed due to the reinstallation.

14. Run “hosted-engine --deploy --config-append=answers.conf” on host3
    (should be seen as an additional host setup; provide the host number as
    known by the other hosts).

15. hosted-engine deploy fails, as the host being installed cannot be
    added to the engine (hostname already known error). Reinstalling
    from the UI and aborting HE setup seems to fix this. The ovirt-ha-agent
    and ovirt-ha-broker services had to be started manually:

    1. Go to the UI and click the reinstall button to reinstall the host.
       Reinstalling the host might fail due to not being able to configure
       the management network.

    2. Go to the Network Interfaces tab, click “Setup Host Networks”
       and assign the ovirtmgmt and glusternw networks to the correct NICs.

    3. Wait for some time for the node to come up, then start the
       ovirt-ha-agent and ovirt-ha-broker services.

*
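
For convenience, steps 3-11 translate roughly to the commands below on the replacement node. This is only a sketch: the /rhgs/brick{1..3} layout and the <mountpoint> and <volname> placeholders are taken from the example above and must be adjusted to your environment.

    # on a surviving node: note the UUID of the host being replaced
    gluster peer status

    # on the replacement node, after installing the OS and gluster packages
    mkdir -p /rhgs/brick{1..3}
    # copy the brick mount entries into /etc/fstab from another node, then
    mount -a

    systemctl stop glusterd
    # set UUID=<old UUID from peer status> in /var/lib/glusterd/glusterd.info
    # copy /var/lib/glusterd/peers/* from a working peer, except the file for this host
    systemctl start glusterd

    # from a mount of each volume, touch metadata so self-heal picks the node up
    mkdir <mountpoint>/tmpdir && rmdir <mountpoint>/tmpdir
    setfattr -n trusted.non-existent-key -v abc <mountpoint>
    setfattr -x trusted.non-existent-key <mountpoint>

    # watch the heal until it completes
    gluster volume heal <volname> info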
Thanks
kasturi.

On 09/20/2016 11:27 AM, knarra wrote:

Hi,

Pad [1] contains the procedure to replace the host with same FQDN 
where existing host OS has to be re-installed.


[1] https://paste.fedoraproject.org/431252/47435076/

Thanks
kasturi.

On 09/20/2016 06:27 AM, Hanson wrote:

Hi Guys,

I encountered an unfortunate circumstance today. Possibly an 
achillies heel.


I have three hypervisors, HV1, HV2, HV3, all running gluster for 
hosted engine support. Individually they all pointed to 
HV1:/hosted_engine with backupvol=HV2,HV3...


HV1 lost its bootsector, which was discovered upon a reboot. This had 
zero impact, as designed, on the VM's.


However, now that HV1 is down, how does one go about replacing the 
original HV? The backup servers point to HV1, and you cannot readd 
the HV through the GUI, and the CLI will not readd it as it's already 
there... you cannot remove it as it is down in the GUI...


Pointing the other HV's to their own storage may make sense for 
multiple instances of the hosted_engine, however it's nice that the 
gluster volumes are replicated and that one VM can be relaunched when 
a HV error is detected. It's also consuming less resources.



What's the procedure to replace the original VM?



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt Gluster Hyperconverged problem

2016-09-19 Thread knarra

Hi,

Pad [1] contains the procedure to replace the host with same FQDN 
where existing host OS has to be re-installed.


[1] https://paste.fedoraproject.org/431252/47435076/

Thanks
kasturi.

On 09/20/2016 06:27 AM, Hanson wrote:

Hi Guys,

I encountered an unfortunate circumstance today. Possibly an achillies 
heel.


I have three hypervisors, HV1, HV2, HV3, all running gluster for 
hosted engine support. Individually they all pointed to 
HV1:/hosted_engine with backupvol=HV2,HV3...


HV1 lost its bootsector, which was discovered upon a reboot. This had 
zero impact, as designed, on the VM's.


However, now that HV1 is down, how does one go about replacing the 
original HV? The backup servers point to HV1, and you cannot readd the 
HV through the GUI, and the CLI will not readd it as it's already 
there... you cannot remove it as it is down in the GUI...


Pointing the other HV's to their own storage may make sense for 
multiple instances of the hosted_engine, however it's nice that the 
gluster volumes are replicated and that one VM can be relaunched when 
a HV error is detected. It's also consuming less resources.



What's the procedure to replace the original VM?



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt Gluster Hyperconverged problem

2016-09-19 Thread Hanson

Hi Guys,

I encountered an unfortunate circumstance today. Possibly an Achilles' 
heel.


I have three hypervisors, HV1, HV2, HV3, all running gluster for hosted 
engine support. Individually they all pointed to HV1:/hosted_engine with 
backupvol=HV2,HV3...


HV1 lost its boot sector, which was discovered upon a reboot. This had 
zero impact, as designed, on the VMs.


However, now that HV1 is down, how does one go about replacing the 
original HV? The backup servers point to HV1, and you cannot re-add the 
HV through the GUI, and the CLI will not re-add it as it's already 
there... and you cannot remove it as it is down in the GUI...


Pointing the other HVs to their own storage may make sense for multiple 
instances of the hosted_engine; however, it's nice that the gluster 
volumes are replicated and that the one VM can be relaunched when an HV error 
is detected. It also consumes fewer resources.



What's the procedure to replace the original VM?



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + Gluster Hyperconverged

2016-07-18 Thread Hanson Turner

Hi Fernando,

Nothing spectacular that I have seen, but I'm using a minimum of 16GB 
in each node.


You'll probably want to set up your hosted-engine with 2 CPUs and 4096MB of RAM. I believe 
those are the minimum requirements.
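
If you want a rough idea of what the gluster side actually uses on a node, something like this gives a ballpark (volume name is a placeholder, and the exact output varies by gluster version):

# gluster volume status <volname> mem
# ps -C glusterfsd -o rss,cmd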


Thanks,

Hanson


On 07/15/2016 09:48 AM, Fernando Frediani wrote:

Hi folks,

I have a few servers with reasonable amount of raw storage but they 
are 3 with only 8GB of memory each.
I wanted to have them with an oVirt Hyperconverged + Gluster mainly to 
take advantage of the amount of the storage spread between them and 
have ability to live migrate VMs.


Question is: Does running Gluster on the same Hypervisor nodes 
consumes any significant memory that won't be much left for running VMs ?


Thanks
Fernando
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
-
- Network Engineer  -
-
-  Andrews Wireless -
- 671 Durham road 21-
-Uxbridge ON, L9P 1R4   -
-P: 905.852.8896-
-F: 905.852.7587-
- Toll free  (877)852.8896  -
-

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt + Gluster Hyperconverged

2016-07-15 Thread Fernando Frediani

Hi folks,

I have a few servers with a reasonable amount of raw storage, but there are 
only 3 of them, with 8GB of memory each.
I wanted to run them as an oVirt Hyperconverged + Gluster setup, mainly to 
take advantage of the storage spread between them and to have the 
ability to live migrate VMs.


The question is: does running Gluster on the same hypervisor nodes consume 
a significant amount of memory, leaving little left for running VMs?


Thanks
Fernando
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt & gluster - which copy is written to?

2016-05-17 Thread Sahina Bose



On 05/12/2016 05:35 AM, Bill Bill wrote:


Hello,

Let’s say I have a 5 node converged cluster of oVirt & glusterFS with 
a replica count of “3”.


Host1 - replica

Host2 - replica

Host3 - replica

Host4

Host5

If I spin a VM up on Host1 – does the first replica get created local 
to that server?


In the event I move the VM to another host, does the replica of the 
disk follow the VM so essentially it’s writing to a local copy first 
or does it “always” write across the network.




Any writes are written to all 3 hosts that form the replica 3 gluster 
volume. Replica 3 indicates that there is a 3-way copy of the data, and 
the same data resides on Host1, Host2 and Host3.


Reads, however, are served from the local host. So a VM running on Host1 will 
read the data from the brick on Host1.
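
(If you want to check or tune that read-locality behaviour, the relevant AFR option can be queried per volume - a sketch, assuming a gluster version recent enough to support "volume get"; the volume name is a placeholder:)

# gluster volume get <volname> cluster.choose-local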


For example, with VMware, the disk will follow the VM to the 
hypervisor you move it to so that the writes are essentially “local”.


I have not been able to find anything on the internet or in the 
documentation that specifies how this process works. Ideally, you 
wouldn’t want a VM residing on Host1 that accesses its disk on Host3. 
Multiple that by hundreds of VM’s and it would create a ton of 
unnecessary network congestion.




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt & gluster - which copy is written to?

2016-05-16 Thread Alexander Wels
On Thursday, May 12, 2016 12:05:22 AM Bill Bill wrote:
> Hello,
> 
> Let’s say I have a 5 node converged cluster of oVirt & glusterFS with a
> replica count of “3”.
 
> Host1 - replica
> Host2 - replica
> Host3 - replica
> Host4
> Host5
> 
> If I spin a VM up on Host1 – does the first replica get created local to
> that server?
 
> In the event I move the VM to another host, does the replica of the disk
> follow the VM so essentially it’s writing to a local copy first or does it
> “always” write across the network.
 
> For example, with VMware, the disk will follow the VM to the hypervisor you
> move it to so that the writes are essentially “local”.
 
> I have not been able to find anything on the internet or in the
> documentation that specifies how this process works. Ideally, you wouldn’t
> want a VM residing on Host1 that accesses its disk on Host3. Multiple that
> by hundreds of VM’s and it would create a ton of unnecessary network
> congestion.
 
While I cannot answer your actual question, as I don't know enough about the 
internal implementation of how oVirt interacts with gluster, I just want to 
point out a major difference between oVirt and VMware. In oVirt, storage domains 
are attached to data centers, not hosts. So there is no question of the disk moving 
from host1 to host2: the disk is on the storage domain, and host1, host2 and 
every other host in the data center have access to that storage domain.


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt & gluster - which copy is written to?

2016-05-16 Thread Bill Bill
Hello,

Let’s say I have a 5 node converged cluster of oVirt & glusterFS with a replica 
count of “3”.

Host1 - replica
Host2 - replica
Host3 - replica
Host4
Host5

If I spin a VM up on Host1 – does the first replica get created local to that 
server?

In the event I move the VM to another host, does the replica of the disk follow 
the VM, so that it's essentially writing to a local copy first, or does it "always" 
write across the network?

For example, with VMware, the disk will follow the VM to the hypervisor you 
move it to so that the writes are essentially “local”.

I have not been able to find anything on the internet or in the documentation 
that specifies how this process works. Ideally, you wouldn't want a VM residing 
on Host1 that accesses its disk on Host3. Multiply that by hundreds of VMs and 
it would create a ton of unnecessary network congestion.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster + selfhosted + bonding

2015-09-16 Thread Simone Tiraboschi
On Tue, Sep 15, 2015 at 9:58 PM, Joachim Tingvold 
wrote:

> Hi,
>
> First-time user of oVirt, so bear with me.
>
> Trying to get redundant oVirt + gluster set up. Have four hosts;
>
>   gluster1 (CentOS7)
>   gluster2 (CentOS7)
>   ovirt1 (CentOS7)
>   ovirt2 (CentOS7)
>
> Using replica 3 volume with arbiter node (new in 3.7.0). Got that part up
> and running (using ovirt1 as the arbiter node), and it works fine.
>
> Initial goal (before reading up on both gluster and oVirt) was to have
> everything v6-only, but found out quickly enough that we had to scratch
> that plan for now (I see that there are some activity on both gluster and
> oVirt on this, which is nice).
>
> Anyways. We wanted to use the "self hosted engine gluster"-feature (which,
> by the looks of it, is only present in 3.6). We installed 3.6b4
> (3.6.0.1-0.1.20150821.gitc8ddcd8.el7.centos).
>
> I already had the network set up (couldn't find any specifics on this in
> the somewhat lacking oVirt-documentation?), something along these lines;
>
>  * eth0 + eth1, bonded in bond0 (LACP)
>  * vlan110 on top of bond0: v6-only for mgmt of host
>  * vlan111 on top of bond0: v4 for gluster + ovirt
>
> We then ran the 'hosted-engine --deploy' command, filling out the
> information as best as we could (some of these options seemed to lack
> documentation, or at least we had trouble finding it). The end-result was
> like this[1].
>
> Accepting this, we suddenly found ourselves without connectivity to the
> host. Logged in via KVM, and this[2] was the last part of the log.
>
> All of the interfaces we had before (bond0, vlan110, vlan111) was "wiped
> clean" for it's configuration, and VDSM seems to have taken control on that
> part (however, since the script failed, we seem to have ended up in some
> kind of "limbo mode"). Rebooting didn't help bring things up again, and
> we're currently looking into manually configuring things via VDSM.
>


Hi Joachim,
unfortunately you hit this one:
https://bugzilla.redhat.com/1263311
The latest VDSM build sometimes fails to set up the management network, hence
the issue you described.
It's not really systematic; we are seeing it on our CI env: some executions
work correctly while others, with the same code and the same parameters,
fail.
So, until we solve it, just clean up the VDSM network configuration and retry deploying
hosted-engine.
Feel free to add any relevant findings on that bug.


> Thought I'd post here meanwhile, seeing if we've missed something obvious,
> or if oVirt should've handled this any different?
>
> If relevant, the content of the answer-file referenced in [2] can be found
> here[3].
>
> [1] 
> [2] 
> [3] 
>
> --
> Joachim
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt + gluster + selfhosted + bonding

2015-09-15 Thread Joachim Tingvold

Hi,

First-time user of oVirt, so bear with me.

Trying to get redundant oVirt + gluster set up. Have four hosts;

  gluster1 (CentOS7)
  gluster2 (CentOS7)
  ovirt1 (CentOS7)
  ovirt2 (CentOS7)

Using replica 3 volume with arbiter node (new in 3.7.0). Got that part 
up and running (using ovirt1 as the arbiter node), and it works fine.


The initial goal (before reading up on both gluster and oVirt) was to have 
everything v6-only, but we found out quickly enough that we had to scratch 
that plan for now (I see that there is some activity on both gluster 
and oVirt on this, which is nice).


Anyways. We wanted to use the "self hosted engine gluster"-feature 
(which, by the looks of it, is only present in 3.6). We installed 3.6b4 
(3.6.0.1-0.1.20150821.gitc8ddcd8.el7.centos).


I already had the network set up (couldn't find any specifics on this in 
the somewhat lacking oVirt-documentation?), something along these lines;


 * eth0 + eth1, bonded in bond0 (LACP)
 * vlan110 on top of bond0: v6-only for mgmt of host
 * vlan111 on top of bond0: v4 for gluster + ovirt
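
(For reference, roughly what that looks like as ifcfg files on CentOS 7 - a sketch with placeholder names and documentation addresses, not our exact config:)

/etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise eth1):
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-bond0:
  DEVICE=bond0
  TYPE=Bond
  BONDING_OPTS="mode=802.3ad miimon=100"
  ONBOOT=yes
  BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-bond0.110 (mgmt, v6-only):
  DEVICE=bond0.110
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=none
  IPV6INIT=yes
  IPV6ADDR=2001:db8:110::10/64

/etc/sysconfig/network-scripts/ifcfg-bond0.111 (gluster + ovirt, v4):
  DEVICE=bond0.111
  VLAN=yes
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.0.2.10
  NETMASK=255.255.255.0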

We then ran the 'hosted-engine --deploy' command, filling out the 
information as best as we could (some of these options seemed to lack 
documentation, or at least we had trouble finding it). The end-result 
was like this[1].


Accepting this, we suddenly found ourselves without connectivity to the 
host. Logged in via KVM, and this[2] was the last part of the log.


All of the interfaces we had before (bond0, vlan110, vlan111) were "wiped 
clean" of their configuration, and VDSM seems to have taken control of 
that part (however, since the script failed, we seem to have ended up in 
some kind of "limbo mode"). Rebooting didn't help bring things up again, 
and we're currently looking into manually configuring things via VDSM.


Thought I'd post here meanwhile, seeing if we've missed something 
obvious, or if oVirt should've handled this any differently?


If relevant, the content of the answer-file referenced in [2] can be 
found here[3].


[1] 
[2] 
[3] 

--
Joachim
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-09-03 Thread Nicolas Ecarnot

On 06/08/2015 16:36, Tim Macy wrote:

Nicolas,  I have the same setup: a dedicated physical system running the engine
on CentOS 6.6, three hosts running CentOS 7.1 with Gluster and KVM, and
the firewall is disabled on all hosts.  I also followed the same documents
to build my environment, so I assume they are very similar.  I have on
occasion had the same errors and have also found that "ctdb rebalanceip
" is the only way to resolve the problem.  I intend to
remove ctdb since it is not needed with the configuration we are
running.  CTDB is only needed for a hosted engine on a floating NFS mount,
so you should be able to change the gluster storage domain mount paths to
"localhost:".  The only thing that has prevented me from making
this change is that my environment is live with running VMs.  Please
let me know if you go this route.

>
> Thank you,
> Tim Macy

This week, I eventually took the time to change this, as this DC is not 
in production.


- Our big NFS storage domain was the master, it contained some VMs
- I wiped all my VMs
- I created a very small temporary NFS master domain, because I did not 
want to bother with any issue related to erasing the last master 
storage domain

- I removed the big NFS SD
- I wiped all that was inside, on a filesystem level
[
- I disabled ctdb, removed the "meta" gluster volume that ctdb used for 
its locks

]
- I added a new storage domain, using your advice :
  - gluster type
  - localhost:
- I removed the temp SD, and all switched correctly on the big glusterFS
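
(For anyone wanting to do the same: the new domain is nothing more exotic than a GlusterFS-type storage domain with the path "localhost:/<volname>" - volume name is a placeholder. A quick manual mount is an easy sanity check on each host before pointing oVirt at it:)

# mount -t glusterfs localhost:/<volname> /mnt/test
# umount /mnt/test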

I then spent some time playing with P2V, and storing new VMs on this new-style 
glusterFS storage domain.
I'm watching the CPU and I/O on the hosts, and yes, they are working, 
but it all stays sane.


On this particular change (NFS to glusterFS), everything was very smooth.

Regards,

--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-28 Thread Sander Hoentjen



On 08/21/2015 06:12 PM, Ravishankar N wrote:



On 08/21/2015 07:57 PM, Sander Hoentjen wrote:

Maybe I should formulate some clear questions:
1) Am I correct in assuming that an issue on of of 3 gluster nodes 
should not cause downtime for VM's on other nodes?


From what I understand, yes. Maybe the ovirt folks can confirm. I can 
tell you this much for sure: If you create a replica 3 volume using 3 
nodes, mount the volume locally on each node, and bring down one node, 
the mounts from the other 2 nodes *must* have read+write access to the 
volume.




2) What can I/we do to fix the issue I am seeing?
3) Can anybody else reproduce my issue?

I'll try and see if I can.


Hi Ravi,

Did you get around to this by any chance? This is a blocker issue for 
us. Apart from that, has anybody else had any success with using 
gluster reliably as an oVirt storage solution?


Regards,
Sander
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Sander Hoentjen



On 08/21/2015 11:30 AM, Ravishankar N wrote:



On 08/21/2015 01:21 PM, Sander Hoentjen wrote:



On 08/21/2015 09:28 AM, Ravishankar N wrote:



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. 
Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it 
and then
starting kicking the tires. First results looked promising, but 
now I

can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. 
Gluster is
setup as replica-3. The gluster export is used as the storage 
domain for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough 
so we

are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some 
network

traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I 
OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm 
guessing that
is when gluster decides host 3 is not allowed to play anymore. No 
big

deal anyway.
After a while 25% of packages just isn't good enough for Ovirt 
anymore,

so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM running 
on host1.
Fencing means (at least in this case) that the IPMI of the server 
does a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can 
not be

unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick 
logs of all 3 nodes when the VM is paused? That should give us 
some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs  (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks  with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
client-quorum is enabled by default for replica 3 volumes. So if the 
mount cannot connect to 2 bricks at least, quorum is lost and the 
gluster volume becomes read-only. That seems to be the reason why 
the VMs are pausing.
I'm not sure if the frequent disconnects are due a flaky network or 
the bricks not responding to the mount's ping timer due to it's 
epoll threads busy with I/O (unlikely). Can you also share the 
output of `gluster volume info volname` ?
The frequent disconnects are probably because I intentionally broke 
the network on hyp03 (dropped 75% of outgoing packets). In my opinion 
this should not affect the VM an hyp02. Am I wrong to think that?



For client-quorum, If a client (mount)  cannot connect to the number 
of bricks to achieve quorum, the client becomes read-only. So if the 
client on hyp02 can see itself and 01, it shouldn't be affected.

But it was, and I only broke hyp03.




[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server


I see that you have enabled server-quorum too. Since you blocked 
hyp03, the if the glusterd on that node cannot  see the other 2 nodes 
due to iptable rules, it would kill all brick processes. See the 7 
How To Test  section in 
http://www.gluster.org/community/documentation/index.php/Features/Server-quorum 
to get a better idea of server-quorum.


Yes but it should only kill the bricks on hyp03, right? So then why does 
the VM on hyp02 die? I don't like the fact that a problem on any one of 
the hosts can bring down any VM on any host.


--
Sander
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Sander Hoentjen



On 08/21/2015 02:21 PM, Ravishankar N wrote:



On 08/21/2015 04:32 PM, Sander Hoentjen wrote:



On 08/21/2015 11:30 AM, Ravishankar N wrote:



On 08/21/2015 01:21 PM, Sander Hoentjen wrote:



On 08/21/2015 09:28 AM, Ravishankar N wrote:



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM 
hosting. Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it 
and then
starting kicking the tires. First results looked promising, but 
now I

can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. 
Gluster is
setup as replica-3. The gluster export is used as the storage 
domain for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough 
so we
are happy. I then start bonnie++ to generate some load. I have 
a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some 
network

traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables 
-I OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm 
guessing that
is when gluster decides host 3 is not allowed to play anymore. 
No big

deal anyway.
After a while 25% of packages just isn't good enough for Ovirt 
anymore,

so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM 
running on host1.
Fencing means (at least in this case) that the IPMI of the server 
does a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can 
not be

unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick 
logs of all 3 nodes when the VM is paused? That should give us 
some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs  (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks  with 
'clnt_ping_timer_expired', 'Client-quorum is not met' and 
'Read-only file system' messages.
client-quorum is enabled by default for replica 3 volumes. So if 
the mount cannot connect to 2 bricks at least, quorum is lost and 
the gluster volume becomes read-only. That seems to be the reason 
why the VMs are pausing.
I'm not sure if the frequent disconnects are due a flaky network 
or the bricks not responding to the mount's ping timer due to it's 
epoll threads busy with I/O (unlikely). Can you also share the 
output of `gluster volume info volname` ?
The frequent disconnects are probably because I intentionally broke 
the network on hyp03 (dropped 75% of outgoing packets). In my 
opinion this should not affect the VM an hyp02. Am I wrong to think 
that?



For client-quorum, If a client (mount)  cannot connect to the number 
of bricks to achieve quorum, the client becomes read-only. So if the 
client on hyp02 can see itself and 01, it shouldn't be affected.

But it was, and I only broke hyp03.


Beats me then. I see [2015-08-18 15:15:27.922998] W [MSGID: 108001] 
[afr-common.c:4043:afr_notify] 0-VMS-replicate-0: Client-quorum is not 
met on hyp02's mount log but the time stamp is earlier than when you 
say you observed the hang (2015-08-20, around 8:15 - 8:20 UTC?).  
(they do occur in that time on hyp03 though).
Yeah that event is from before. For your information: This setup is used 
to test, so I try to break it and hope I don't succeed. Unfortunately I 
succeeded.






[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server


I see that you have enabled server-quorum too. Since you blocked 
hyp03, the if the glusterd on that node cannot  see the other 2 
nodes due to iptable rules, it would kill all brick processes. See 
the 7 How To Test  section

Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Ravishankar N



On 08/21/2015 04:32 PM, Sander Hoentjen wrote:



On 08/21/2015 11:30 AM, Ravishankar N wrote:



On 08/21/2015 01:21 PM, Sander Hoentjen wrote:



On 08/21/2015 09:28 AM, Ravishankar N wrote:



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM 
hosting. Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it 
and then
starting kicking the tires. First results looked promising, but 
now I

can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. 
Gluster is
setup as replica-3. The gluster export is used as the storage 
domain for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough 
so we

are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some 
network

traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables 
-I OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm 
guessing that
is when gluster decides host 3 is not allowed to play anymore. 
No big

deal anyway.
After a while 25% of packages just isn't good enough for Ovirt 
anymore,

so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM 
running on host1.
Fencing means (at least in this case) that the IPMI of the server 
does a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can 
not be

unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick 
logs of all 3 nodes when the VM is paused? That should give us 
some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs  (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks  with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
client-quorum is enabled by default for replica 3 volumes. So if 
the mount cannot connect to 2 bricks at least, quorum is lost and 
the gluster volume becomes read-only. That seems to be the reason 
why the VMs are pausing.
I'm not sure if the frequent disconnects are due a flaky network or 
the bricks not responding to the mount's ping timer due to it's 
epoll threads busy with I/O (unlikely). Can you also share the 
output of `gluster volume info volname` ?
The frequent disconnects are probably because I intentionally broke 
the network on hyp03 (dropped 75% of outgoing packets). In my 
opinion this should not affect the VM an hyp02. Am I wrong to think 
that?



For client-quorum, If a client (mount)  cannot connect to the number 
of bricks to achieve quorum, the client becomes read-only. So if the 
client on hyp02 can see itself and 01, it shouldn't be affected.

But it was, and I only broke hyp03.


Beats me then. I see "[2015-08-18 15:15:27.922998] W [MSGID: 108001] 
[afr-common.c:4043:afr_notify] 0-VMS-replicate-0: Client-quorum is not 
met" in hyp02's mount log, but the time stamp is earlier than when you 
say you observed the hang (2015-08-20, around 8:15 - 8:20 UTC?). (They 
do occur in that time range on hyp03, though.)






[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server


I see that you have enabled server-quorum too. Since you blocked 
hyp03, the if the glusterd on that node cannot  see the other 2 nodes 
due to iptable rules, it would kill all brick processes. See the 7 
How To Test  section in 
http://www.gluster.org/community/documentation/index.php/Features/Server-quorum 
to get a better idea of server-quorum.


Yes but it should only kill the bricks on hyp03, right? So then why 
does the VM

Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Ravishankar N



On 08/21/2015 07:57 PM, Sander Hoentjen wrote:

Maybe I should formulate some clear questions:
1) Am I correct in assuming that an issue on of of 3 gluster nodes 
should not cause downtime for VM's on other nodes?


From what I understand, yes. Maybe the ovirt folks can confirm. I can 
tell you this much for sure: If you create a replica 3 volume using 3 
nodes, mount the volume locally on each node, and bring down one node, 
the mounts from the other 2 nodes *must* have read+write access to the 
volume.




2) What can I/we do to fix the issue I am seeing?
3) Can anybody else reproduce my issue?

I'll try and see if I can.
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Ravishankar N



On 08/21/2015 01:21 PM, Sander Hoentjen wrote:



On 08/21/2015 09:28 AM, Ravishankar N wrote:



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. 
Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it 
and then

starting kicking the tires. First results looked promising, but now I
can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. 
Gluster is
setup as replica-3. The gluster export is used as the storage 
domain for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some 
network

traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I 
OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm guessing 
that

is when gluster decides host 3 is not allowed to play anymore. No big
deal anyway.
After a while 25% of packages just isn't good enough for Ovirt 
anymore,

so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM running 
on host1.
Fencing means (at least in this case) that the IPMI of the server 
does a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can 
not be

unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick logs 
of all 3 nodes when the VM is paused? That should give us some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs  (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks  with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
client-quorum is enabled by default for replica 3 volumes. So if the 
mount cannot connect to 2 bricks at least, quorum is lost and the 
gluster volume becomes read-only. That seems to be the reason why the 
VMs are pausing.
I'm not sure if the frequent disconnects are due a flaky network or 
the bricks not responding to the mount's ping timer due to it's epoll 
threads busy with I/O (unlikely). Can you also share the output of 
`gluster volume info volname` ?
The frequent disconnects are probably because I intentionally broke 
the network on hyp03 (dropped 75% of outgoing packets). In my opinion 
this should not affect the VM an hyp02. Am I wrong to think that?



For client-quorum: if a client (mount) cannot connect to the number of 
bricks needed to achieve quorum, the client becomes read-only. So if the client 
on hyp02 can see itself and 01, it shouldn't be affected.




[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server


I see that you have enabled server-quorum too. Since you blocked hyp03, 
if the glusterd on that node cannot see the other 2 nodes due to the 
iptables rules, it would kill all of its brick processes. See the "How To Test" 
section in 
http://www.gluster.org/community/documentation/index.php/Features/Server-quorum 
to get a better idea of server-quorum.
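
(For what it's worth, both quorum types are per-volume options, so they can be inspected in `gluster volume info` and changed with `gluster volume set`. A sketch - whether relaxing them is wise for a 3-node setup is a separate question:)

# gluster volume set VMS cluster.server-quorum-type none   (disable server-side quorum)
# gluster volume set VMS cluster.quorum-type auto          (client-side quorum; the default for replica 3)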



storage.owner-uid: 36
storage.owner-gid: 36



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Ravishankar N



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it and 
then

starting kicking the tires. First results looked promising, but now I
can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
setup as replica-3. The gluster export is used as the storage domain 
for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some network
traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I 
OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm guessing that
is when gluster decides host 3 is not allowed to play anymore. No big
deal anyway.
After a while 25% of packages just isn't good enough for Ovirt anymore,
so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM running on 
host1.
Fencing means (at least in this case) that the IPMI of the server does 
a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can not be
unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick logs 
of all 3 nodes when the VM is paused? That should give us some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks, with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
Client-quorum is enabled by default for replica 3 volumes. So if the 
mount cannot connect to at least 2 bricks, quorum is lost and the 
gluster volume becomes read-only. That seems to be the reason why the 
VMs are pausing.
I'm not sure if the frequent disconnects are due to a flaky network or the 
bricks not responding to the mount's ping timer due to their epoll 
threads being busy with I/O (unlikely). Can you also share the output of 
`gluster volume info volname` ?


Regards,
Ravi




Regards,
Ravi


Is there anything I can do to prevent the VM from being paused?

Regards,
Sander

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-21 Thread Sander Hoentjen



On 08/21/2015 09:28 AM, Ravishankar N wrote:



On 08/20/2015 02:14 PM, Sander Hoentjen wrote:



On 08/19/2015 09:04 AM, Ravishankar N wrote:



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. 
Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it and 
then

starting kicking the tires. First results looked promising, but now I
can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
setup as replica-3. The gluster export is used as the storage 
domain for

the VM's.


Hi,

What version of gluster and ovirt are you using?

glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch




Now when I start the VM all is good, performance is good enough so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some 
network

traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I 
OUTPUT

-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm guessing 
that

is when gluster decides host 3 is not allowed to play anymore. No big
deal anyway.
After a while 25% of packages just isn't good enough for Ovirt 
anymore,

so the host will be fenced.


I'm not sure what fencing means w.r.t ovirt and what it actually 
fences. As far is gluster is concerned, since only one node is 
blocked, the VM image should still be accessible by the VM running 
on host1.
Fencing means (at least in this case) that the IPMI of the server 
does a power reset.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can not be
unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick logs 
of all 3 nodes when the VM is paused? That should give us some clue.



Logs are attached. Problem was at around 8:15 - 8:20 UTC
This time however the vm stopped even without a reboot of hyp03



The mount logs  (rhev-data-center-mnt-glusterSD*) are indicating 
frequent disconnects to the bricks  with 'clnt_ping_timer_expired', 
'Client-quorum is not met' and 'Read-only file system' messages.
client-quorum is enabled by default for replica 3 volumes. So if the 
mount cannot connect to 2 bricks at least, quorum is lost and the 
gluster volume becomes read-only. That seems to be the reason why the 
VMs are pausing.
I'm not sure if the frequent disconnects are due a flaky network or 
the bricks not responding to the mount's ping timer due to it's epoll 
threads busy with I/O (unlikely). Can you also share the output of 
`gluster volume info volname` ?
The frequent disconnects are probably because I intentionally broke the 
network on hyp03 (dropped 75% of outgoing packets). In my opinion this 
should not affect the VM on hyp02. Am I wrong to think that?


[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36

--
Sander
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-19 Thread Ravishankar N



On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:

+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it and then
starting kicking the tires. First results looked promising, but now I
can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
setup as replica-3. The gluster export is used as the storage domain for
the VM's.


Hi,

What version of gluster and ovirt are you using?



Now when I start the VM all is good, performance is good enough so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM and all 3 VM's are seeing some network
traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I OUTPUT
-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm guessing that
is when gluster decides host 3 is not allowed to play anymore. No big
deal anyway.
After a while 25% of packages just isn't good enough for Ovirt anymore,
so the host will be fenced.


I'm not sure what fencing means w.r.t. oVirt and what it actually fences. 
As far as gluster is concerned, since only one node is blocked, the VM 
image should still be accessible to the VM running on host1.

After a reboot *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it can not be
unpaused, has to be restarted.


Could you provide the gluster mount (fuse?) logs and the brick logs of 
all 3 nodes when the VM is paused? That should give us some clue.


Regards,
Ravi


Is there anything I can do to prevent the VM from being paused?

Regards,
Sander

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt/Gluster

2015-08-18 Thread Ramesh Nachimuthu
+ Ravi from gluster.

Regards,
Ramesh

- Original Message -
From: Sander Hoentjen san...@hoentjen.eu
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage self contained VM hosting. Ovirt 
with GlusterFS seems to fit that bill perfectly. I installed it and then 
starting kicking the tires. First results looked promising, but now I 
can get a VM to pause indefinitely fairly easy:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is 
setup as replica-3. The gluster export is used as the storage domain for 
the VM's.

Now when I start the VM all is good, performance is good enough so we 
are happy. I then start bonnie++ to generate some load. I have a VM 
running on host 1, host 2 is SPM and all 3 VM's are seeing some network 
traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I OUTPUT 
-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hickup, I'm guessing that 
is when gluster decides host 3 is not allowed to play anymore. No big 
deal anyway.
After a while 25% of packages just isn't good enough for Ovirt anymore, 
so the host will be fenced. After a reboot *sometimes* the VM will be 
paused, and even after the gluster self-heal is complete it can not be 
unpaused, has to be restarted.

Is there anything I can do to prevent the VM from being paused?

Regards,
Sander

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Ovirt/Gluster

2015-08-18 Thread Sander Hoentjen

Hi,

We are looking for some easy to manage self contained VM hosting. Ovirt 
with GlusterFS seems to fit that bill perfectly. I installed it and then 
started kicking the tires. First results looked promising, but now I 
can get a VM to pause indefinitely fairly easily:


My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is 
setup as replica-3. The gluster export is used as the storage domain for 
the VM's.


Now when I start the VM all is good, performance is good enough so we 
are happy. I then start bonnie++ to generate some load. I have a VM 
running on host 1, host 2 is SPM and all 3 VM's are seeing some network 
traffic courtesy of gluster.


Now, for fun, suddenly the network on host3 goes bad (iptables -I OUTPUT 
-m statistic --mode random --probability 0.75 -j REJECT).
Some time later I see the guest has a small hiccup; I'm guessing that 
is when gluster decides host 3 is not allowed to play anymore. No big 
deal anyway.
After a while, 25% of packets just isn't good enough for Ovirt anymore, 
so the host will be fenced. After a reboot *sometimes* the VM will be 
paused, and even after the gluster self-heal is complete it cannot be 
unpaused; it has to be restarted.


Is there anything I can do to prevent the VM from being paused?

Regards,
Sander

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-09 Thread Vered Volansky
On Thu, Aug 6, 2015 at 3:24 PM, Nicolas Ecarnot nico...@ecarnot.net wrote:

 Hi Vered,

 Thanks for answering.

 On 06/08/2015 11:08, Vered Volansky wrote:

 But from time to time a severe hiccup seems to appear, which I
 have great difficulty diagnosing.
 The messages in the web gui are not very precise, and not consistent:
 - some tell about some host having network issues, but I can ping it
 from every place it needs to be reached (especially from the SPM and the
 manager)

 Ping doesn't say much as the ssh protocol is the one being used.
 Please try this and report.


 Try what?

ssh instead of ping.
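
For example, from the engine machine (a sketch; the host name is taken from
the messages below and 54321 is vdsm's default port):

ssh root@serv-vm-al01 true                        # should return promptly with exit status 0
echo > /dev/tcp/serv-vm-al01/54321 && echo open   # bash-only check that the vdsm port is reachable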


 Please attach logs (engine+vdsm). Log snippets would be helpful (but more
 important are full logs).


 I guess that what will be most useful is to provide logs at or around the
 precise moment s**t is hitting the fan.
 But this is very difficult to forecast:
 There are times I'm trying hard to break it (see the dumb user tests
 previously described) and oVirt is doing well at coping with these
 situations.
 And at the other extreme, there are times where even zero VMs are running, and I
 see the DC appearing as non operational for some minutes.
 So I'll send logs the next time I see such a situation.

You can send logs and just point us to the time your problems occurred.
They are rotated, so unless you removed them they should be available to
you at any time. Just make sure they have the time in question and we'll
dig in.
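
The default locations, in case it helps (a sketch, assuming stock installs;
rotated copies sit alongside as engine.log.* and vdsm.log.*):

less /var/log/ovirt-engine/engine.log   # on the engine machine
less /var/log/vdsm/vdsm.log             # on each host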



 In general it smells like an ssh/firewall issue.


 On this test setup, I disabled the firewall on my hosts.
 And you're right, it appears I forgot to disable it on one of the three
 hosts.
 On the one I forgot, a brief look at the iptables rules seemed very
 consistent with what I'm used to seeing as managed by oVirt, nothing weird.
 Anyway, it is now completely disabled.

Good :)



 On host serv-vm-al01, Error: Network error during communication with
 the Host


 This host had no firewall activated...

 --
 Nicolas ECARNOT

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-07 Thread Nicolas Ecarnot

On 07/08/2015 02:17, Donny Davis wrote:

I have the same setup, and my only issue is at the switch level with
CTDB. The IP does fail over; however, until I issue a ping from the
interface ctdb is connected to, the storage will not connect.

If I go to the host with the CTDB VIP and issue a ping from the
interface ctdb is on, everything works as described.


I know the problem you're describing, as we faced it in a completely 
different context. But I'm not sure it's oVirt specific.
In our case, what was worse was that our bonding induced similar issues 
when switching over (mode 1), and our ARP cache lifetime was too long. (Do YOU have 
bonding also?)
We're still in the process of correcting that, but as I said, it is in a 
different datacenter, so not related to this thread.
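
A quick way to check both points on a host (a sketch, assuming a bond named
bond0; the sysctl shown is only one of the neighbour-cache timers):

grep -i mode /proc/net/bonding/bond0          # which bonding mode is active
sysctl net.ipv4.neigh.default.gc_stale_time   # one of the ARP/neighbour cache timers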


--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Nicolas Ecarnot

Hi Tim,

Nice to read that someone else is fighting with a similar setup :)

On 06/08/2015 16:36, Tim Macy wrote:

Nicolas, I have the same setup: a dedicated physical system running the engine
on CentOS 6.6, three hosts running CentOS 7.1 with Gluster and KVM, and the
firewall is disabled on all hosts.  I also followed the same documents
to build my environment, so I assume they are very similar.  I have on
occasion had the same errors and have also found that ctdb rebalanceip
floating ip is the only way to resolve the problem.


Indeed, when I'm stopping/continuing my ctdb services, the main action 
is a move of the vIP.

So we agree there is definitely something to dig into there!
Either directly, or as a side effect.

I must admit I'd be glad to search further before following the second 
part of your answer.



 I intend to
remove ctdb since it is not needed with the configuration we are
running.  CTDB is only needed for hosted engine on a floating NFS mount,


And in a less obvious manner, it also allows you to gently remove a host 
from the vIP managers pool before removing it at the gluster layer.

Not a great advantage, but worth mentioning.


so you should be able to change the gluster storage domain mount paths to
localhost:name.  The only thing that has prevented me from making
this change is that my environment is live with running VM's.  Please
let me know if you go this route.
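
For readers wondering what that looks like in practice, a minimal sketch of
the storage domain settings (volume and host names are examples only;
backup-volfile-servers is the usual fuse mount option so the mount does not
depend on a single volfile server):

Storage type:   GlusterFS
Path:           localhost:/data
Mount options:  backup-volfile-servers=host2:host3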


I'm more than interested in going this way, if:
- I find no time to investigate the floating vIP issue
- I can simplify this setup
- This can lead to increased perf

About the master storage domain path, should I use only pure gluster and 
completely forget about NFS?


--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Donny Davis
I have the same setup, and my only issue is at the switch level with CTDB.
The IP does fail over; however, until I issue a ping from the interface ctdb
is connected to, the storage will not connect.

If I go to the host with the CTDB VIP and issue a ping from the interface
ctdb is on, everything works as described.
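
A gratuitous ARP from the node holding the VIP usually has the same effect
as that manual ping (a sketch; the interface name and the VIP placeholder
are assumptions, and CTDB normally sends these itself on takeover, so if
this helps the switch may be dropping or ignoring them):

arping -U -c 3 -I eth0 <floating-ip>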

On Thu, Aug 6, 2015 at 5:18 PM, Nicolas Ecarnot nico...@ecarnot.net wrote:

 Hi Tim,

 Nice to read that someone else is fighting with a similar setup :)

 On 06/08/2015 16:36, Tim Macy wrote:

 Nicolas, I have the same setup: a dedicated physical system running the engine
 on CentOS 6.6, three hosts running CentOS 7.1 with Gluster and KVM, and the
 firewall is disabled on all hosts.  I also followed the same documents
 to build my environment, so I assume they are very similar.  I have on
 occasion had the same errors and have also found that ctdb rebalanceip
 floating ip is the only way to resolve the problem.


 Indeed, when I'm stopping/continuing my ctdb services, the main action is
 a move of the vIP.
 So we agree there is definitely something to dig into there!
 Either directly, or as a side effect.

 I must admit I'd be glad to search further before following the second
 part of your answer.

  I intend to
 remove ctdb since it is not needed with the configuration we are
 running.  CTDB is only needed for hosted engine on a floating NFS mount,


 And in a less obvious manner, it also allows you to gently remove a host from
 the vIP managers pool before removing it at the gluster layer.
 Not a great advantage, but worth mentioning.

 so you should be able to change the gluster storage domain mount paths to
 localhost:name.  The only thing that has prevented me from making
 this change is that my environment is live with running VM's.  Please
 let me know if you go this route.


 I'm more than interested in going this way, if:
 - I find no time to investigate the floating vIP issue
 - I can simplify this setup
 - This can lead to increased perf

 About the master storage domain path, should I use only pure gluster and
 completely forget about NFS?


 --
 Nicolas ECARNOT
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users




-- 
Donny Davis
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Sahina Bose



On 08/06/2015 02:38 PM, Vered Volansky wrote:


- Original Message -

From: Nicolas Ecarnot nico...@ecarnot.net
To: users@ovirt.org Users@ovirt.org
Sent: Wednesday, August 5, 2015 5:32:38 PM
Subject: [ovirt-users] ovirt+gluster+NFS : storage hicups

Hi,

I used the two links below to setup a test DC :

http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
http://community.redhat.com/blog/2014/11/up-and-running-with-ovirt-3-5-part-two/

The only thing I did differently is that I did not use a hosted engine, but I
dedicated a solid server for that.
So I have one engine (CentOS 6.6), and 3 hosts (CentOS 7.0)

As in the doc above, my 3 hosts are publishing 300 GB of replicated
gluster storage, above which ctdb is managing a floating virtual ip that
is used by NFS as the master storage domain.

The last point is that the manager is also presenting a NFS storage I'm
using as an export domain.

It took me some time to plug all this setup as it is a bit more
complicated than my other DC with a real SAN and no gluster, but it is
eventually working (I can run VMs, migrate them...)

I have made many severe tests (from a very dumb user point of view :
unplug/replug the power cable of this server - does ctdb float the vIP?
does gluster self-heal? does the VM restart?...)
When precisely looking each layer one by one, all seems to be correct :
ctdb is fast at managing the ip, NFS is OK, gluster seems to
reconstruct, fencing eventually worked with the lanplus workaround, and
so on...

But from time to time a severe hiccup seems to appear, which I
have great difficulty diagnosing.
The messages in the web gui are not very precise, and not consistent:
- some tell about some host having network issues, but I can ping it
from every place it needs to be reached (especially from the SPM and the
manager)

Ping doesn't say much as the ssh protocol is the one being used.
Please try this and report.
Please attach logs (engine+vdsm). Log snippets would be helpful (but more 
important are full logs).

In general it smells like an ssh/firewall issue.


On host serv-vm-al01, Error: Network error during communication with
the Host

- some tell that some volume is degraded, when it's not (gluster
commands are showing no issue. Even the oVirt tab about the volumes are
all green)

- Host serv-vm-al03 cannot access the Storage Domain(s) UNKNOWN
attached to the Data Center
Just waiting a couple of seconds leads to a self-heal with no action.

- Repeated Detected change in status of brick
serv-vm-al03:/gluster/data/brick of volume data from DOWN to UP.
but absolutely no action is made on this filesystem.


This is coming from the earlier issue where the Host status was marked 
Down, the engine sees these bricks as being Down as well, and hence the 
state change messages




At this time, zero VM is running in this test datacenter, and no action
is made on the hosts. Though, I see some looping errors coming and
going, and I find no way to diagnose.

Amongst the *actions* I thought of using to solve some issues:
- I've found that trying to force the self-healing, and playing with
gluster commands, had no effect
- I've found that playing with the gluster-advised actions (find /gluster
-exec stat {} \; ...) seems to have no effect either
- I've found that forcing ctdb to move the vIP (ctdb stop, ctdb
continue) DID SOLVE most of these issues.
I believe that it's not what ctdb is doing that helps, but maybe one of
its shell hooks that is cleaning up some trouble?

As this setup is complex, I don't ask anyone for a silver bullet, but maybe
you know which layer is the most fragile, and which one I should
look at more closely?

--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users




Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Nicolas Ecarnot

Hi Vered,

Thanks for answering.

On 06/08/2015 11:08, Vered Volansky wrote:


But from time to time a severe hiccup seems to appear, which I
have great difficulty diagnosing.
The messages in the web gui are not very precise, and not consistent:
- some tell about some host having network issues, but I can ping it
from every place it needs to be reached (especially from the SPM and the
manager)

Ping doesn't say much as the ssh protocol is the one being used.
Please try this and report.


Try what?


Please attach logs (engine+vdsm). Log snippets would be helpful (but more 
important are full logs).


I guess that what will be most useful is to provide logs at or around 
the precise moment s**t is hitting the fan.

But this is very difficult to forecast:
There are times I'm trying hard to break it (see the dumb user tests 
previously described) and oVirt is doing well at coping with these 
situations.
And at the other extreme, there are times where even zero VMs are running, and 
I see the DC appearing as non operational for some minutes.

So I'll send logs the next time I see such a situation.



In general it smells like an ssh/firewall issue.


On this test setup, I disabled the firewall on my hosts.
And you're right, it appears I forgot to disable it on one of the three 
hosts.
On the one I forgot, a brief look at the iptables rules seemed very 
consistent with what I'm used to seeing as managed by oVirt, nothing weird. 
Anyway, it is now completely disabled.





On host serv-vm-al01, Error: Network error during communication with
the Host


This host had no firewall activated...

--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Nicolas Ecarnot

On 06/08/2015 14:26, Sahina Bose wrote:


- Host serv-vm-al03 cannot access the Storage Domain(s) UNKNOWN
attached to the Data Center
Just waiting a couple of seconds leads to a self-heal with no action.

- Repeated Detected change in status of brick
serv-vm-al03:/gluster/data/brick of volume data from DOWN to UP.
but absolutely no action is made on this filesystem.


This is coming from the earlier issue where the Host status was marked
Down, the engine sees these bricks as being Down as well, and hence the
state change messages


OK: when I read "Host ... cannot access the Storage Domain ... attached 
to the Data Center", and according to the setup described earlier (NFS 
on ctdb on gluster), is it correct to translate it into

"this host cannot NFS-mount the storage domain"?

If so, it will help me to narrow down my debugging.

--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-06 Thread Vered Volansky


- Original Message -
 From: Nicolas Ecarnot nico...@ecarnot.net
 To: users@ovirt.org Users@ovirt.org
 Sent: Wednesday, August 5, 2015 5:32:38 PM
 Subject: [ovirt-users] ovirt+gluster+NFS : storage hicups
 
 Hi,
 
 I used the two links below to setup a test DC :
 
 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
 http://community.redhat.com/blog/2014/11/up-and-running-with-ovirt-3-5-part-two/
 
 The only thing I did differently is that I did not use a hosted engine, but I
 dedicated a solid server for that.
 So I have one engine (CentOS 6.6), and 3 hosts (CentOS 7.0)
 
 As in the doc above, my 3 hosts are publishing 300 GB of replicated
 gluster storage, above which ctdb is managing a floating virtual ip that
 is used by NFS as the master storage domain.
 
 The last point is that the manager is also presenting a NFS storage I'm
 using as an export domain.
 
 It took me some time to plug all this setup as it is a bit more
 complicated than my other DC with a real SAN and no gluster, but it is
 eventually working (I can run VMs, migrate them...)
 
 I have made many severe tests (from a very dumb user point of view :
 unplug/replug the power cable of this server - does ctdb float the vIP?
 does gluster self-heal? does the VM restart?...)
 When precisely looking each layer one by one, all seems to be correct :
 ctdb is fast at managing the ip, NFS is OK, gluster seems to
 reconstruct, fencing eventually worked with the lanplus workaround, and
 so on...
 
 But from time to time a severe hiccup seems to appear, which I
 have great difficulty diagnosing.
 The messages in the web gui are not very precise, and not consistent:
 - some tell about some host having network issues, but I can ping it
 from every place it needs to be reached (especially from the SPM and the
 manager)
Ping doesn't say much as the ssh protocol is the one being used.
Please try this and report.
Please attach logs (engine+vdsm). Log snippets would be helpful (but more 
important are full logs).

In general it smells like an ssh/firewall issue.

 On host serv-vm-al01, Error: Network error during communication with
 the Host
 
 - some tell that some volume is degraded, when it's not (gluster
 commands are showing no issue. Even the oVirt tab about the volumes are
 all green)
 
 - Host serv-vm-al03 cannot access the Storage Domain(s) UNKNOWN
 attached to the Data Center
 Just waiting a couple of seconds leads to a self-heal with no action.
 
 - Repeated Detected change in status of brick
 serv-vm-al03:/gluster/data/brick of volume data from DOWN to UP.
 but absolutely no action is made on this filesystem.
 
 At this time, zero VM is running in this test datacenter, and no action
 is made on the hosts. Though, I see some looping errors coming and
 going, and I find no way to diagnose.
 
 Amongst the *actions* I thought of using to solve some issues:
 - I've found that trying to force the self-healing, and playing with
 gluster commands, had no effect
 - I've found that playing with the gluster-advised actions (find /gluster
 -exec stat {} \; ...) seems to have no effect either
 - I've found that forcing ctdb to move the vIP (ctdb stop, ctdb
 continue) DID SOLVE most of these issues.
 I believe that it's not what ctdb is doing that helps, but maybe one of
 its shell hooks that is cleaning up some trouble?
 
 As this setup is complex, I don't ask anyone for a silver bullet, but maybe
 you know which layer is the most fragile, and which one I should
 look at more closely?
 
 --
 Nicolas ECARNOT
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] ovirt+gluster+NFS : storage hicups

2015-08-05 Thread Nicolas Ecarnot

Hi,

I used the two links below to setup a test DC :

http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
http://community.redhat.com/blog/2014/11/up-and-running-with-ovirt-3-5-part-two/

The only thing I did differently is that I did not use a hosted engine, but I 
dedicated a solid server for that.

So I have one engine (CentOS 6.6), and 3 hosts (CentOS 7.0)

As in the doc above, my 3 hosts are publishing 300 GB of replicated 
gluster storage, above which ctdb is managing a floating virtual ip that 
is used by NFS as the master storage domain.


The last point is that the manager is also presenting a NFS storage I'm 
using as an export domain.


It took me some time to plug all this setup as it is a bit more 
complicated than my other DC with a real SAN and no gluster, but it is 
eventually working (I can run VMs, migrate them...)


I have made many severe tests (from a very dumb user point of view : 
unplug/replug the power cable of this server - does ctdb float the vIP? 
does gluster self-heal? does the VM restart?...)
When precisely looking each layer one by one, all seems to be correct : 
ctdb is fast at managing the ip, NFS is OK, gluster seems to 
reconstruct, fencing eventually worked with the lanplus workaround, and 
so on...


But from time to time a severe hiccup seems to appear, which I 
have great difficulty diagnosing.

The messages in the web gui are not very precise, and not consistent:
- some tell about some host having network issues, but I can ping it 
from every place it needs to be reached (especially from the SPM and the 
manager)
On host serv-vm-al01, Error: Network error during communication with 
the Host


- some tell that some volume is degraded, when it's not (gluster 
commands are showing no issue. Even the oVirt tab about the volumes are 
all green)


- Host serv-vm-al03 cannot access the Storage Domain(s) UNKNOWN 
attached to the Data Center

Just waiting a couple of seconds leads to a self-heal with no action.

- Repeated Detected change in status of brick 
serv-vm-al03:/gluster/data/brick of volume data from DOWN to UP.

but absolutely no action is made on this filesystem.

At this time, zero VMs are running in this test datacenter, and no action 
is being taken on the hosts. Still, I see some looping errors coming and 
going, and I find no way to diagnose them.


Amongst the *actions* I thought of using to solve some issues:
- I've found that trying to force the self-healing, and playing with 
gluster commands, had no effect
- I've found that playing with the gluster-advised actions (find /gluster 
-exec stat {} \; ...) seems to have no effect either
- I've found that forcing ctdb to move the vIP (ctdb stop, ctdb 
continue) DID SOLVE most of these issues.
I believe that it's not what ctdb is doing that helps, but maybe one of 
its shell hooks that is cleaning up some trouble?
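
For reference, the exact commands involved (a sketch, using the volume named
data from the messages above; adjust to your volume names):

gluster volume heal data full     # kick off a full self-heal
gluster volume heal data info     # list entries still pending heal
ctdb stop                         # on the node holding the vIP: release it
ctdb continue                     # then let that node rejoin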


As this setup is complex, I don't ask anyone for a silver bullet, but maybe 
you know which layer is the most fragile, and which one I should 
look at more closely?


--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-29 Thread David King
Paul,

Thanks for the response.

You mention that the issue is orphaned files during updates when one node
is down.  However I am less concerned about adding and removing files
because the file server will be predominantly VM disks, so the file
structure is fairly static.  Those VM files will be quite active, however -
will gluster be able to keep track of partial updates to a large file when
one out of two bricks is down?

Right now I am leaning towards using SSD for host-local disk - single-brick
gluster volumes intended for VMs which are node specific - and then
3-way replicas for the higher-availability zones, which tend to be more read
oriented.   I presume that read-only access only needs to get data from one
of the 3 replicas, so that should be reasonably performant.
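
For the 3-way replicas, a minimal sketch of the volume creation (the host
names and brick paths here are hypothetical; the virt group applies
gluster's stock virtualization tuning profile):

gluster volume create vmstore replica 3 \
  host1:/gluster/vmstore/brick host2:/gluster/vmstore/brick host3:/gluster/vmstore/brick
gluster volume set vmstore group virt
gluster volume start vmstore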

Thanks,
David



On Thu, Aug 28, 2014 at 6:13 PM, Paul Robert Marino prmari...@gmail.com
wrote:

 I'll try to answer some of these.
 1) It's not a serious problem per se; the issue is that if one node goes down and
 you delete a file while the second node is down, it will be restored when
 the second node comes back, which may cause orphaned files, whereas if you
 use 3 servers they will use quorum to figure out what needs to be restored
 or deleted. Furthermore, your read and write performance may suffer,
 especially in comparison to having 1 replica of the file with striping.

 2) see answer 1 and just create the volume with 1 replica and only include
 the URI for bricks on two of the hosts when you create it.

 3) I think so, but I have never tried it; you just have to define it as a
 local storage domain.

 4) Well, that's a philosophical question. You can in theory have two hosted
 engines on separate VMs on two separate physical boxes, but if for any
 reason they both go down you will be living in interesting times (as in
 the Chinese curse).

 5) YES! And have more than one.

 -- Sent from my HP Pre3

 --
 On Aug 28, 2014 9:39 AM, David King da...@rexden.us wrote:

 Hi,

 I am currently testing oVirt 3.4.3 + gluster 3.5.2 for use in my
 relatively small home office environment on a single host.  I have 2  Intel
 hosts with SSD and magnetic disk and one AMD host with only magnetic disk.
  I have been trying to figure out the best way to configure my environment
 given my previous attempt with oVirt 3.3 encountered storage issues.

 I will be hosting two types of VMs - VMs that can be tied to a particular
 system (such as 3 node FreeIPA domain or some test VMs), and VMs which
 could migrate between systems for improved uptime.

 The processor issue seems straightforward.  Have a single datacenter with
 two clusters - one for the Intel systems and one for the AMD systems.  Put
 VMs which need to live migrate on the Intel cluster.  If necessary VMs can
 be manually switched between the Intel and AMD cluster with a downtime.

 The Gluster side of the storage seems less clear.  The bulk of the gluster
 with oVirt issues I experienced and have seen on the list seem to be two
 node setups with 2 bricks in the Gluster volume.

 So here are my questions:

 1) Should I avoid 2 brick Gluster volumes?

 2) What is the risk in having the SSD volumes with only 2 bricks given
 that there would be 3 gluster servers?  How should I configure them?

 3) Is there a way to use local storage for a host locked VM other than
 creating a gluster volume with one brick?

 4) Should I avoid using the hosted engine configuration?  I do have an
 external VMware ESXi system to host the engine for now but would like to
 phase it out eventually.

 5) If I do the hosted engine should I make the underlying gluster volume 3
 brick replicated?

 Thanks in advance for any help you can provide.

 -David

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-29 Thread Vijay Bellur

On 08/29/2014 07:34 PM, David King wrote:

Paul,

Thanks for the response.

You mention that the issue is orphaned files during updates when one
node is down.  However I am less concerned about adding and removing
files because the file server will be predominately VM disks so the file
structure is fairly static.  Those VM files will be quite active however
- will gluster be able to keep track of partial updates to a large file
when one out of two bricks are down?



Yes, gluster only updates regions of the file that need to be 
synchronized during self-healing. More details on this synchronization 
can be found in the self-healing section of afr's design document [1].
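
If it helps, the pending work can be watched from any of the nodes (a
sketch, assuming a volume named data):

gluster volume heal data info               # entries still waiting to be healed
gluster volume heal data info split-brain   # anything needing manual resolution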



Right now I am leaning towards using SSD for host local disk - single
brick gluster volumes intended for VMs which are node specific and then
3 way replicas for the higher availability zones which tend to be more
read oriented.   I presume that read-only access only needs to get data
from one of the 3 replicas so that should be reasonably performant.


Yes, read operations are directed to only one of the replicas.

Regards,
Vijay

[1] https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-29 Thread Paul Robert Marino
On Fri, Aug 29, 2014 at 12:25 PM, Vijay Bellur vbel...@redhat.com wrote:
 On 08/29/2014 07:34 PM, David King wrote:

 Paul,

 Thanks for the response.

 You mention that the issue is orphaned files during updates when one
 node is down.  However I am less concerned about adding and removing
 files because the file server will be predominately VM disks so the file
 structure is fairly static.  Those VM files will be quite active however
 - will gluster be able to keep track of partial updates to a large file
 when one out of two bricks are down?


 Yes, gluster only updates regions of the file that need to be synchronized
 during self-healing. More details on this synchronization can be found in
 the self-healing section of afr's design document [1].


 Right now I am leaning towards using SSD for host local disk - single
 brick gluster volumes intended for VMs which are node specific and then

I wouldn't use single-brick gluster volumes for local disk; you don't
need it, and it will actually make it more complicated with no real
benefits.

 3 way replicas for the higher availability zones which tend to be more
 read oriented.   I presume that read-only access only needs to get data
 from one of the 3 replicas so that should be reasonably performant.


 Yes, read operations are directed to only one of the replicas.

 Regards,
 Vijay

 [1] https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-29 Thread David King
Hi Paul,

I would prefer to do a direct mount for local disk.  However, I am not certain 
how to configure a single system with both local storage and gluster replicated 
storage.

- The “Configure Local Storage”  option for Hosts wants to make a datacenter 
and cluster for the system.  I presume that’s because oVirt wants to be able to 
mount the storage on all hosts in a datacenter.  

- Configuring a POSIX storage domain with local disk does not work as oVirt 
wants to mount the disk on all systems in the datacenter.  

I suppose my third option would be to put these systems as libvirt VMs and not 
manage them with oVirt.  This is fairly reasonable as I use Foreman for 
provisioning except that I will need to figure out how to make them co-exist.  
Has anyone tried this?

Am I missing other options for local non-replicated disk?  

Thanks,
David

-- 
David King
On August 29, 2014 at 3:01:49 PM, Paul Robert Marino (prmari...@gmail.com) 
wrote:

On Fri, Aug 29, 2014 at 12:25 PM, Vijay Bellur vbel...@redhat.com wrote:  
 On 08/29/2014 07:34 PM, David King wrote:  
  
 Paul,  
  
 Thanks for the response.  
  
 You mention that the issue is orphaned files during updates when one  
 node is down. However I am less concerned about adding and removing  
 files because the file server will be predominately VM disks so the file  
 structure is fairly static. Those VM files will be quite active however  
 - will gluster be able to keep track of partial updates to a large file  
 when one out of two bricks are down?  
  
  
 Yes, gluster only updates regions of the file that need to be synchronized  
 during self-healing. More details on this synchronization can be found in  
 the self-healing section of afr's design document [1].  
  
  
 Right now I am leaning towards using SSD for host local disk - single  
 brick gluster volumes intended for VMs which are node specific and then  

I wouldn't use single-brick gluster volumes for local disk; you don't  
need it, and it will actually make it more complicated with no real  
benefits.  

 3 way replicas for the higher availability zones which tend to be more  
 read oriented. I presume that read-only access only needs to get data  
 from one of the 3 replicas so that should be reasonably performant.  
  
  
 Yes, read operations are directed to only one of the replicas.  
  
 Regards,  
 Vijay  
  
 [1] https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md  
  
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-28 Thread David King
Hi,

I am currently testing oVirt 3.4.3 + gluster 3.5.2 for use in my relatively
small home office environment on a single host.  I have 2  Intel hosts with
SSD and magnetic disk and one AMD host with only magnetic disk.  I have
been trying to figure out the best way to configure my environment given my
previous attempt with oVirt 3.3 encountered storage issues.

I will be hosting two types of VMs - VMs that can be tied to a particular
system (such as 3 node FreeIPA domain or some test VMs), and VMs which
could migrate between systems for improved uptime.

The processor issue seems straightforward.  Have a single datacenter with
two clusters - one for the Intel systems and one for the AMD systems.  Put
VMs which need to live migrate on the Intel cluster.  If necessary VMs can
be manually switched between the Intel and AMD cluster with a downtime.

The Gluster side of the storage seems less clear.  The bulk of the gluster
with oVirt issues I experienced and have seen on the list seem to be two
node setups with 2 bricks in the Gluster volume.

So here are my questions:

1) Should I avoid 2 brick Gluster volumes?

2) What is the risk in having the SSD volumes with only 2 bricks given that
there would be 3 gluster servers?  How should I configure them?

3) Is there a way to use local storage for a host locked VM other than
creating a gluster volume with one brick?

4) Should I avoid using the hosted engine configuration?  I do have an
external VMware ESXi system to host the engine for now but would like to
phase it out eventually.

5) If I do the hosted engine should I make the underlying gluster volume 3
brick replicated?

Thanks in advance for any help you can provide.

-David
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt/gluster storage questions for 2-3 node datacenter

2014-08-28 Thread Paul Robert Marino
I'll try to answer some of these.

1) It's not a serious problem per se. The issue is that if one node goes down and you delete a file while the second node is down, it will be restored when the second node comes back, which may cause orphaned files, whereas if you use 3 servers they will use quorum to figure out what needs to be restored or deleted. Furthermore, your read and write performance may suffer, especially in comparison to having 1 replica of the file with striping.

2) See answer 1, and just create the volume with 1 replica and only include the URI for bricks on two of the hosts when you create it.

3) I think so, but I have never tried it; you just have to define it as a local storage domain.

4) Well, that's a philosophical question. You can in theory have two hosted engines on separate VMs on two separate physical boxes, but if for any reason they both go down you will "be living in interesting times" (as in the Chinese curse).

5) YES! And have more than one.

-- Sent from my HP Pre3

On Aug 28, 2014 9:39 AM, David King da...@rexden.us wrote:

Hi,

I am currently testing oVirt 3.4.3 + gluster 3.5.2 for use in my relatively small home office environment on a single host.  I have 2 Intel hosts with SSD and magnetic disk and one AMD host with only magnetic disk.  I have been trying to figure out the best way to configure my environment given my previous attempt with oVirt 3.3 encountered storage issues.

I will be hosting two types of VMs - VMs that can be tied to a particular system (such as a 3 node FreeIPA domain or some test VMs), and VMs which could migrate between systems for improved uptime.

The processor issue seems straightforward.  Have a single datacenter with two clusters - one for the Intel systems and one for the AMD systems.  Put VMs which need to live migrate on the Intel cluster.  If necessary VMs can be manually switched between the Intel and AMD cluster with a downtime.

The Gluster side of the storage seems less clear.  The bulk of the gluster with oVirt issues I experienced and have seen on the list seem to be two node setups with 2 bricks in the Gluster volume.

So here are my questions:

1) Should I avoid 2 brick Gluster volumes?

2) What is the risk in having the SSD volumes with only 2 bricks given that there would be 3 gluster servers?  How should I configure them?

3) Is there a way to use local storage for a host locked VM other than creating a gluster volume with one brick?

4) Should I avoid using the hosted engine configuration?  I do have an external VMware ESXi system to host the engine for now but would like to phase it out eventually.

5) If I do the hosted engine should I make the underlying gluster volume 3 brick replicated?

Thanks in advance for any help you can provide.

-David

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-09 Thread Sven Kieske


On 08.07.2014 21:15, Martin Sivak wrote:
 Hi,
 
 I do not recommend running hosted engine on top of GlusterFS. Not even on top 
 of NFS compatibility layer the GlusterFS provides.
 
 There have been a lot of issues with setups like that. GlusterFS does not 
 ensure that the metadata writes are atomic and visible to all nodes at the 
 same time and that causes serious trouble (the synchronization algorithm 
 relies on the atomicity assumption).
 
 You can use GlusterFS storage domain for VMs, but the hosted engine storage 
 domain needs something else - NFS or iSCSI (available in 3.5).

You should maybe document this with a big fat warning
sign on the wiki and in the release notes and on the
feature pages.

I think many people will just combine what ovirt offers
if there is just a warning hidden somewhere on a mailing
list.


-- 
Kind regards / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH  Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-09 Thread Jason Brooks


- Original Message -
 From: Nicolas Ecarnot nico...@ecarnot.net
 To: users@ovirt.org
 Sent: Tuesday, July 8, 2014 12:55:10 PM
 Subject: Re: [ovirt-users] oVirt + gluster with host and hosted VM on 
 different subnets
 
 On 08/07/2014 21:15, Martin Sivak wrote:
  Hi,
 
  I do not recommend running hosted engine on top of GlusterFS. Not
  even on top of NFS compatibility layer the GlusterFS provides.
 
 Martin,
 
 It is very disturbing for us end users to read the comment above and
 the web page below, both written by Red Hat employees.
 
 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
 

I did note in that post that this is all new and somewhat in flux, and
that there are split-brain issues in play. FWIW, I'm actively working
w/ a 3 node hosted ovirt + glusterfs cluster in my lab.

I'll post about problems and solutions as I encounter them...

Jason

 We are about to try to set up a 3-node hosted ovirt + glusterfs cluster,
 but this kind of comment may delay our project.
 
 --
 Nicolas Ecarnot
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Sandro Bonazzola
On 07/07/2014 15:38, Simone Marchioni wrote:
 Hi,
 
 I'm trying to install oVirt 3.4 + gluster looking at the following guides:
 
 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
 http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/
 
 It went smoothly until the hosted engine VM configuration: I can reach it by 
 VNC and the host IP, but I can't configure the VM network in a way that works.
 Probably the problem is the assumption that the three hosts (2 hosts + the 
 hosted engine) are on the same subnet sharing the same default gateway.
 
 But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs are 
 on the subnet 46.105.224.236/30, and my hosted engine need to use one IP
 of the last ones.
 
 Anyone installed oVirt in such a configuration and can give me any tip?

Never tested such configuration.
Andrew, something similar at your installation with an additional NIC?

 
 Thanks
 Simone
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


-- 
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Andrew Lau
On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com wrote:
 On 07/07/2014 15:38, Simone Marchioni wrote:
 Hi,

 I'm trying to install oVirt 3.4 + gluster looking at the following guides:

 http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
 http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/

 It went smooth until the hosted engine VM configuration: I can reach it by 
 VNC and the host IP, but I can't configure the VM network in a way it works.
 Probably the problem is the assumption that the three hosts (2 hosts + the 
 hosted engine) are on the same subnet sharing the same default gateway.

 But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs are 
 on the subnet 46.105.224.236/30, and my hosted engine need to use one IP
 of the last ones.

 Anyone installed oVirt in such a configuration and can give me any tip?

 Never tested such configuration.
 Andrew, something similar at your installation with an additional NIC?

If I understand correctly, you have your hosts on the 94.23.2.0/24
subnet but you need your hosted engine to be accessible as an address
within 46.105.224.236? If that's true, then the easiest way to do it
is simply run your hosted-engine install with the hosted-engine first
on 94.23.2.0/24, you then add another nic to that hosted-engine VM
which'll have the IP address for 46.105.224.237 (?)... alternatively,
you could just use a nic alias?

If you want to add an extra NIC to your hosted-engine to do that above
scenario, here's a snippet from my notes:
(storage_network is a bridge, replace that with ovirtmgmt or another
bridge you may have created)

hosted-engine --set-maintenance --mode=global

# On all installed hosts
nano /etc/ovirt-hosted-engine/vm.conf
# insert under earlier nicModel
# replace macaddress and uuid from above
# increment slot
devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
domain:0x, type:pci,function:0x0},device:bridge,type:interface}

hosted-engine --vm-shutdown
hosted-engine --vm-start

hosted-engine --set-maintenance --mode=none


Although, re-reading your question, what do you mean by 'but I can't
configure the VM network in a way it works.' ? Does the setup fail, or
just when you create a VM you don't have any network connectivity..



 Thanks
 Simone
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 --
 Sandro Bonazzola
 Better technology. Faster innovation. Powered by community collaboration.
 See how it works at redhat.com
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Simone Marchioni

On 08/07/2014 16:47, Andrew Lau wrote:

On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com wrote:

On 07/07/2014 15:38, Simone Marchioni wrote:

Hi,

I'm trying to install oVirt 3.4 + gluster looking at the following guides:

http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/

It went smooth until the hosted engine VM configuration: I can reach it by VNC 
and the host IP, but I can't configure the VM network in a way it works.
Probably the problem is the assumption that the three hosts (2 hosts + the 
hosted engine) are on the same subnet sharing the same default gateway.

But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs are on 
the subnet 46.105.224.236/30, and my hosted engine need to use one IP
of the last ones.

Anyone installed oVirt in such a configuration and can give me any tip?

Never tested such configuration.
Andrew, something similar at your installation with an additional NIC?

If I understand correctly, you have your hosts on the 94.23.2.0/24
subnet but you need your hosted engine to be accessible as an address
within 46.105.224.236?


Exactly


If that's true, then the easiest way to do it
is simply run your hosted-engine install with the hosted-engine first
on 94.23.2.0/24, you then add another nic to that hosted-engine VM
which'll have the IP address for 46.105.224.237 (?)...


I'll try this


alternatively, you could just use a nic alias?


We made it work with the following changes (on the host machine in the 
subnet 94.23.2.0/24):
- commented out and removed from running configuration ip rules in 
/etc/sysconfig/network-scripts/rule-ovirtmgmt
- commented out and removed from running configuration ip routes in 
/etc/sysconfig/network-scripts/route-ovirtmgmt
- added /etc/sysconfig/network-scripts/ovirtmgmt:0 with the following 
configuration:

DEVICE=ovirtmgmt:238
ONBOOT=yes
DELAY=0
IPADDR=46.105.224.238
NETMASK=255.255.255.252
BOOTPROTO=static
NM_CONTROLLED=no
- enabled ip forwarding in /proc/sys/net/ipv4/ip_forward
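
For reference, the /proc toggle above is lost on reboot; a persistent
equivalent (a sketch, assuming a stock sysctl setup) is:

echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
sysctl -p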

After that installing the hosted-engine VM with the following IP stack:

NETMASK=255.255.255.252
IPADDR=46.105.224.237
GATEWAY=46.105.224.238

seems to work ok.


If you want to add an extra NIC to your hosted-engine to do that above
scenario, here's a snippet from my notes:
(storage_network is a bridge, replace that with ovirtmgmt or another
bridge you may have created)

hosted-engine --set-maintenance --mode=global

# On all installed hosts
nano /etc/ovirt-hosted-engine/vm.conf
# insert under earlier nicModel
# replace macaddress and uuid from above
# increment slot
devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
domain:0x, type:pci,function:0x0},device:bridge,type:interface}

hosted-engine --vm-shutdown
hosted-engine --vm-start

hosted-engine --set-maintenance --mode=none


Ok: thanks for the advice!


Although, re-reading your question, what do you mean by 'but I can't
configure the VM network in a way it works.' ? Does the setup fail, or
just when you create a VM you don't have any network connectivity..


The setup works ok: it creates the VM and I can log in to it with VNC on 
the host IP (94.23.2.X).
I can install CentOS 6.5 as advised. After the reboot I log in again by 
VNC and the host IP (94.23.2.X), and configure the IP stack with the 
other subnet (46.105.224.236/30), and after that the VM is isolated 
unless I do the steps written above.


Thanks for your support!
Simone




Thanks
Simone
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Martin Sivak
Hi,

I do not recommend running hosted engine on top of GlusterFS. Not even on top 
of the NFS compatibility layer that GlusterFS provides.

There have been a lot of issues with setups like that. GlusterFS does not 
ensure that the metadata writes are atomic and visible to all nodes at the same 
time and that causes serious trouble (the synchronization algorithm relies on 
the atomicity assumption).

You can use GlusterFS storage domain for VMs, but the hosted engine storage 
domain needs something else - NFS or iSCSI (available in 3.5).
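
For example, a plain kernel-NFS export on a separate box could look like
this (a sketch; the path is hypothetical, and the 36:36 ownership matches
the vdsm:kvm user and group that oVirt expects on its storage domains):

mkdir -p /exports/hosted-engine
chown 36:36 /exports/hosted-engine
echo '/exports/hosted-engine *(rw,anonuid=36,anongid=36,all_squash)' >> /etc/exports
exportfs -r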

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
 On 08/07/2014 16:47, Andrew Lau wrote:
  On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com
  wrote:
  On 07/07/2014 15:38, Simone Marchioni wrote:
  Hi,
 
  I'm trying to install oVirt 3.4 + gluster looking at the following
  guides:
 
  http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
  http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/
 
  It went smooth until the hosted engine VM configuration: I can reach it
  by VNC and the host IP, but I can't configure the VM network in a way it
  works.
  Probably the problem is the assumption that the three hosts (2 hosts +
  the hosted engine) are on the same subnet sharing the same default
  gateway.
 
  But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs
  are on the subnet 46.105.224.236/30, and my hosted engine need to use
  one IP
  of the last ones.
 
  Anyone installed oVirt in such a configuration and can give me any tip?
  Never tested such configuration.
  Andrew, something similar at your installation with an additional NIC?
  If I understand correctly, you have your hosts on the 94.23.2.0/24
  subnet but you need your hosted engine to be accessible as an address
  within 46.105.224.236?
 
 Exactly
 
  If that's true, then the easiest way to do it
  is simply run your hosted-engine install with the hosted-engine first
  on 94.23.2.0/24, you then add another nic to that hosted-engine VM
  which'll have the IP address for 46.105.224.237 (?)...
 
 I'll try this
 
  alternatively, you could just use a nic alias?
 
 We made it work with the following changes (on the host machine in the
 subnet 94.23.2.0/24):
 - commented out and removed from running configuration ip rules in
 /etc/sysconfig/network-scripts/rule-ovirtmgmt
 - commented out and removed from running configuration ip routes in
 /etc/sysconfig/network-scripts/route-ovirtmgmt
 - added /etc/sysconfig/network-scripts/ovirtmgmt:0 with the following
 configuration:
 DEVICE=ovirtmgmt:238
 ONBOOT=yes
 DELAY=0
 IPADDR=46.105.224.238
 NETMASK=255.255.255.252
 BOOTPROTO=static
 NM_CONTROLLED=no
 - enabled ip forwarding in /proc/sys/net/ipv4/ip_forward
 
 After that installing the hosted-engine VM with the following IP stack:
 
 NETMASK=255.255.255.252
 IPADDR=46.105.224.237
 GATEWAY=46.105.224.238
 
 seems to work ok.
 
  If you want to add an extra NIC to your hosted-engine to do that above
  scenario, here's a snippet from my notes:
  (storage_network is a bridge, replace that with ovirtmgmt or another
  bridge you may have created)
 
  hosted-engine --set-maintenance --mode=global
 
  # On all installed hosts
  nano /etc/ovirt-hosted-engine/vm.conf
  # insert under earlier nicModel
  # replace macaddress and uuid from above
  # increment slot
  devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
  domain:0x, type:pci,function:0x0},device:bridge,type:interface}
 
  hosted-engine --vm-shutdown
  hosted-engine --vm-start
 
  hosted-engine --set-maintenance --mode=none
 
 Ok: thanks for the advice!
 
  Although, re-reading your question, what do you mean by 'but I can't
  configure the VM network in a way it works.' ? Does the setup fail, or
  just when you create a VM you don't have any network connectivity..
 
 The setup works ok: it creates the VM and i can login to it with VNC on
 the host IP (94.23.2.X).
 I can install CentOS 6.5 as advised. After the reboot I login again by
 VNC and the host IP (94.23.2.X), and configure the IP stack with the
 other subnet (46.105.224.236/30) and after that the VM is isolated,
 unless I do the steps written above.
 
 Thanks for your support!
 Simone
 
 
  Thanks
  Simone
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
  --
  Sandro Bonazzola
  Better technology. Faster innovation. Powered by community collaboration.
  See how it works at redhat.com
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Nicolas Ecarnot

On 08/07/2014 21:15, Martin Sivak wrote:

Hi,

I do not recommend running hosted engine on top of GlusterFS. Not
even on top of NFS compatibility layer the GlusterFS provides.


Martin,

It is very disturbing for us end users to read the comment above and 
the web page below, both written by Red Hat employees.


http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/

We are about to try to set up a 3-node hosted ovirt + glusterfs cluster, 
but this kind of comment may delay our project.


--
Nicolas Ecarnot
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-08 Thread Andrew Lau
Hi Martin,

Is that because of how the replication works? What if you had the
kernel-nfs server running on top of the gluster share, and a virtual
IP to allow the hosted-engine to only access one of the shares?

Thanks,
Andrew


On Wed, Jul 9, 2014 at 5:15 AM, Martin Sivak msi...@redhat.com wrote:
 Hi,

 I do not recommend running hosted engine on top of GlusterFS. Not even on top 
 of NFS compatibility layer the GlusterFS provides.

 There have been a lot of issues with setups like that. GlusterFS does not 
 ensure that the metadata writes are atomic and visible to all nodes at the 
 same time and that causes serious trouble (the synchronization algorithm 
 relies on the atomicity assumption).

 You can use GlusterFS storage domain for VMs, but the hosted engine storage 
 domain needs something else - NFS or iSCSI (available in 3.5).

 --
 Martin Sivák
 msi...@redhat.com
 Red Hat Czech
 RHEV-M SLA / Brno, CZ

 - Original Message -
 On 08/07/2014 16:47, Andrew Lau wrote:
  On Wed, Jul 9, 2014 at 12:23 AM, Sandro Bonazzola sbona...@redhat.com
  wrote:
  On 07/07/2014 15:38, Simone Marchioni wrote:
  Hi,
 
  I'm trying to install oVirt 3.4 + gluster looking at the following
  guides:
 
  http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
  http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/
 
  It went smooth until the hosted engine VM configuration: I can reach it
  by VNC and the host IP, but I can't configure the VM network in a way it
  works.
  Probably the problem is the assumption that the three hosts (2 hosts +
  the hosted engine) are on the same subnet sharing the same default
  gateway.
 
  But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs
  are on the subnet 46.105.224.236/30, and my hosted engine need to use
  one IP
  of the last ones.
 
  Anyone installed oVirt in such a configuration and can give me any tip?
  Never tested such configuration.
  Andrew, something similar at your installation with an additional NIC?
  If I understand correctly, you have your hosts on the 94.23.2.0/24
  subnet but you need your hosted engine to be accessible as an address
  within 46.105.224.236?

 Exactly

  If that's true, then the easiest way to do it
  is simply run your hosted-engine install with the hosted-engine first
  on 94.23.2.0/24, you then add another nic to that hosted-engine VM
  which'll have the IP address for 46.105.224.237 (?)...

 I'll try this

  alternatively, you could just use a nic alias?

 We made it work with the following changes (on the host machine in the
 subnet 94.23.2.0/24):
 - commented out and removed from running configuration ip rules in
 /etc/sysconfig/network-scripts/rule-ovirtmgmt
 - commented out and removed from running configuration ip routes in
 /etc/sysconfig/network-scripts/route-ovirtmgmt
 - added /etc/sysconfig/network-scripts/ovirtmgmt:0 with the following
 configuration:
 DEVICE=ovirtmgmt:238
 ONBOOT=yes
 DELAY=0
 IPADDR=46.105.224.238
 NETMASK=255.255.255.252
 BOOTPROTO=static
 NM_CONTROLLED=no
 - enabled ip forwarding in /proc/sys/net/ipv4/ip_forward

 After that, installing the hosted-engine VM with the following IP stack:

 NETMASK=255.255.255.252
 IPADDR=46.105.224.237
 GATEWAY=46.105.224.238

 seems to work ok.
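
To put those host-side changes in command form, a rough sketch assuming the
stock initscripts layout; the :238 alias label and the file names are just
examples built from the addresses above, and the running ip rules/routes
still have to be flushed by hand (ip rule del / ip route del):

cd /etc/sysconfig/network-scripts
# take the oVirt-generated policy routing files out of the way
mv rule-ovirtmgmt rule-ovirtmgmt.disabled
mv route-ovirtmgmt route-ovirtmgmt.disabled

# alias carrying the gateway address of the failover subnet
cat > ifcfg-ovirtmgmt:238 <<'EOF'
DEVICE=ovirtmgmt:238
ONBOOT=yes
DELAY=0
IPADDR=46.105.224.238
NETMASK=255.255.255.252
BOOTPROTO=static
NM_CONTROLLED=no
EOF
ifup ovirtmgmt:238

# let the host route traffic for the engine VM, and keep it across reboots
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf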

  If you want to add an extra NIC to your hosted-engine to handle the above
  scenario, here's a snippet from my notes:
  (storage_network is a bridge, replace that with ovirtmgmt or another
  bridge you may have created)
 
  hosted-engine --set-maintenance --mode=global
 
  # On all installed hosts
  nano /etc/ovirt-hosted-engine/vm.conf
  # insert under earlier nicModel
  # replace macaddress and uuid from above
  # increment slot
  devices={nicModel:pv,macAddr:00:16:3e:e1:7b:14,linkActive:true,network:storage_network,filter:vdsm-no-mac-spoofing,specParams:{},deviceId:fdb11208-a888-e587-6053-32c9c0361f96,address:{bus:0x00,slot:0x04,
  domain:0x, type:pci,function:0x0},device:bridge,type:interface}
 
  hosted-engine --vm-shutdown
  hosted-engine --vm-start
 
  hosted-engine --set-maintenance --mode=none

 Ok: thanks for the advice!

  Although, re-reading your question, what do you mean by 'I can't
  configure the VM network in a way that works'? Does the setup fail, or is
  it just that when you create a VM you don't have any network connectivity?

 The setup works OK: it creates the VM and I can log in to it with VNC on
 the host IP (94.23.2.X).
 I can install CentOS 6.5 as advised. After the reboot I log in again by
 VNC on the host IP (94.23.2.X) and configure the IP stack with the
 other subnet (46.105.224.236/30); after that the VM is isolated,
 unless I do the steps written above.

 Thanks for your support!
 Simone

 
  Thanks
  Simone
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
  --
  Sandro Bonazzola
  Better technology. Faster innovation. Powered by community 

[ovirt-users] oVirt + gluster with host and hosted VM on different subnets

2014-07-07 Thread Simone Marchioni

Hi,

I'm trying to install oVirt 3.4 + gluster looking at the following guides:

http://community.redhat.com/blog/2014/05/ovirt-3-4-glusterized/
http://community.redhat.com/blog/2014/03/up-and-running-with-ovirt-3-4/

It went smoothly until the hosted engine VM configuration: I can reach it 
by VNC on the host IP, but I can't configure the VM network in a way that 
works.
Probably the problem is the assumption that the three hosts (2 hosts + 
the hosted engine) are on the same subnet sharing the same default gateway.


But my server is on OVH with the subnet 94.23.2.0/24 and my failover IPs 
are on the subnet 46.105.224.236/30, and my hosted engine needs to use 
one IP from the latter.


Has anyone installed oVirt in such a configuration who can give me any tips?

Thanks
Simone
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt + GLUSTER

2014-04-22 Thread Jeremiah Jahn
I am.

On Mon, Apr 21, 2014 at 1:50 PM, Joop jvdw...@xs4all.nl wrote:
 Ovirt User wrote:

 Hello,

 Is anyone using oVirt with GlusterFS as a storage domain in a production
 environment?



 Not directly production but almost. Having problems?

 Regards,

 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt + GLUSTER

2014-04-22 Thread Gabi C
I second that!

3 nodes acting as both VM and gluster hosts, with the engine on a separate VM
(an ESXi machine)


On Tue, Apr 22, 2014 at 3:53 PM, Jeremiah Jahn 
jerem...@goodinassociates.com wrote:

 I am.

 On Mon, Apr 21, 2014 at 1:50 PM, Joop jvdw...@xs4all.nl wrote:
  Ovirt User wrote:
 
  Hello,
 
  Is anyone using oVirt with GlusterFS as a storage domain in a production
  environment?
 
 
 
  Not directly production but almost. Having problems?
 
  Regards,
 
  Joop
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt + GLUSTER

2014-04-22 Thread Jeremiah Jahn
Nothing too complicated.
SL 6.x and 5.
8 VM hosts running on a Hitachi Blade Symphony.
25 server guests.
15+ Windows desktop guests.
3x 12TB storage servers (all RAID 10 over 1TB SSDs), 10Gb/s point-to-point
between 2 of the servers, with one geolocation server offsite.
Most of the server images/LUNs are exported through 4Gbps FC cards
from the servers using LIO, except the desktop machines, which are
attached directly to the gluster storage pool since normal users can
create them. Various servers use the gluster system directly for
storing a large number of documents and providing public access, to
the tune of about 300 to 400 thousand requests per day. We provide
aggregation, public access, e-filing, e-payments, and case
management software to most of the Illinois circuit court system.
Gluster has worked like a champ.
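
For anyone wondering what the gluster side of a layout like this typically
looks like, a very rough sketch with made-up host and volume names (not the
actual commands used in this setup): a 2-way replica over the point-to-point
link plus geo-replication to the offsite box. The geo-replication session also
needs passwordless SSH and the pem key setup done beforehand:

# replicated volume across the two PTP-linked storage servers
gluster peer probe stor2.example.com
gluster volume create vmpool replica 2 \
    stor1.example.com:/bricks/vmpool/brick \
    stor2.example.com:/bricks/vmpool/brick
gluster volume start vmpool

# asynchronous copy to the offsite geolocation server
gluster volume geo-replication vmpool geo1.example.com::vmpool-dr create push-pem
gluster volume geo-replication vmpool geo1.example.com::vmpool-dr start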

On Tue, Apr 22, 2014 at 8:05 AM, Ovirt User ldrt8...@gmail.com wrote:
 What type of configuration and use case?

 On 22 Apr 2014, at 14:53, Jeremiah Jahn 
 jerem...@goodinassociates.com wrote:

 I am.

 On Mon, Apr 21, 2014 at 1:50 PM, Joop jvdw...@xs4all.nl wrote:
 Ovirt User wrote:

Hello,

 Is anyone using oVirt with GlusterFS as a storage domain in a production
 environment?



 Not directly production but almost. Having problems?

 Regards,

 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt + GLUSTER

2014-04-22 Thread Jeremiah Jahn
Oh, and the engine is currently on a separate server, but it will probably
move to one of the storage servers.

On Tue, Apr 22, 2014 at 8:43 AM, Jeremiah Jahn
jerem...@goodinassociates.com wrote:
 Nothing too complicated.
 SL 6.x and 5.
 8 VM hosts running on a Hitachi Blade Symphony.
 25 server guests.
 15+ Windows desktop guests.
 3x 12TB storage servers (all RAID 10 over 1TB SSDs), 10Gb/s point-to-point
 between 2 of the servers, with one geolocation server offsite.
 Most of the server images/LUNs are exported through 4Gbps FC cards
 from the servers using LIO, except the desktop machines, which are
 attached directly to the gluster storage pool since normal users can
 create them. Various servers use the gluster system directly for
 storing a large number of documents and providing public access, to
 the tune of about 300 to 400 thousand requests per day. We provide
 aggregation, public access, e-filing, e-payments, and case
 management software to most of the Illinois circuit court system.
 Gluster has worked like a champ.

 On Tue, Apr 22, 2014 at 8:05 AM, Ovirt User ldrt8...@gmail.com wrote:
 What type of configuration and use case?

 On 22 Apr 2014, at 14:53, Jeremiah Jahn 
 jerem...@goodinassociates.com wrote:

 I am.

 On Mon, Apr 21, 2014 at 1:50 PM, Joop jvdw...@xs4all.nl wrote:
 Ovirt User wrote:

Hello,

 Is anyone using oVirt with GlusterFS as a storage domain in a production
 environment?



 Not directly production but almost. Having problems?

 Regards,

 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Ovirt + GLUSTER

2014-04-21 Thread Ovirt User
Hello,

Is anyone using oVirt with GlusterFS as a storage domain in a production 
environment?

Thanks
Lukas

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Ovirt + GLUSTER

2014-04-21 Thread Joop

Ovirt User wrote:

Hello,

Is anyone using oVirt with GlusterFS as a storage domain in a production 
environment?

  

Not directly production but almost. Having problems?

Regards,

Joop

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users