[ovirt-users] oVirt Hosted Engine Setup fails

2017-03-03 Thread Manuel Luis Aznar
Hello there,

I am having some trouble when deploying an oVirt 4.1 hosted engine
installation.

When I am just about to finish the installation and the hosted engine setup script is
about to start the engine VM (appliance), it fails saying "The VM is not
powering up".

If I double-check the vdsmd service, I get this error all the time:

vdsm root ERROR failed to retrieve Hosted Engine HA info
 Traceback (most recent call last):
 File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in
_getHaInfo
 stats = instance.get_all_stats()
 File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py",
line 102, in get_all_stats
 with broker.connection(self._retries, self._wait):
 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
 return self.gen.next()
 File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 99, in connection
 self.connect(retries, wait)
 File
"/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
line 78, in connect
 raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors
has exceeded the limit (1)

Has anyone experienced the same problem? Any hint on how to solve it? I have
tried several times with clean installations and always get the same result...

The host where I am trying to do the installation runs CentOS 7...
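In case it matters, this is roughly what I am checking on the host while debugging (assuming the default hosted-engine HA service names on CentOS 7; adjust if yours differ):

# check that the HA broker and agent services are actually running
systemctl status ovirt-ha-broker ovirt-ha-agent

# the broker log usually explains why the connection is refused
tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log
journalctl -u ovirt-ha-broker --since "1 hour ago"

# restarting the broker and then the agent sometimes clears the error
systemctl restart ovirt-ha-broker ovirt-ha-agent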


Thanks in advance for everything.
I will be waiting for any hint to see what I am doing wrong...
Manuel Luis Aznar
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Darrell Budic
Why are you using an arbiter if all your HW configs are identical? I'd use a
true replica 3 in this case.
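Roughly what I mean, as a sketch (volume name, hosts and brick paths below are just placeholders):

# true replica 3: three full data copies, one brick per host
gluster volume create myvol replica 3 \
  host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1

# what you have now: replica 2 plus a metadata-only arbiter brick
gluster volume create myvol replica 3 arbiter 1 \
  host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1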

Also, in my experience with gluster and VM hosting, the ZIL/slog degrades write
performance unless it's a truly dedicated disk. But I have 8 spinners backing
my ZFS volumes, so trying to share a SATA disk wasn't a good ZIL. If yours is a
dedicated SAS disk, keep it; if it's SATA, try testing without it.

You don't have compression enabled on your zfs volume, and I'd recommend
enabling relatime on it. Depending on the amount of RAM in these boxes, you
probably want to limit your ZFS ARC size to 8G or so (1/4 of total RAM or less).
Gluster just works volumes hard during a rebuild; what's the problem you're
seeing? If it's affecting your VMs, using sharding and tuning client & server
threads can help avoid interruptions to your VMs while repairs are running. If
you really need to limit it, you can use cgroups to keep it from hogging all
the CPU, but then healing takes longer, of course. There are a couple of older
posts and blogs about it, if you go back a while.
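As a rough sketch of the tuning I mean (dataset name is a placeholder, ARC limit is in bytes):

# enable lz4 compression and relatime on the dataset backing the bricks
zfs set compression=lz4 tank/gluster
zfs set relatime=on tank/gluster

# cap the ARC at ~8G; takes effect immediately
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# and make it persistent across reboots
echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf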


> On Mar 3, 2017, at 9:02 AM, Arman Khalatyan  wrote:
> 
> The problem itself is not the streaming data performance, and dd from zero
> also does not help much on the production zfs running with compression.
> The main problem comes when gluster starts to do something with the data: it
> uses xattrs, and accessing extended attributes inside zfs is probably slower
> than on XFS.
> Also a primitive find or ls -l in the .gluster folders takes ages:
> 
> Now I can see that the arbiter host has almost 100% cache misses during the
> rebuild, which is actually natural while it is always reading new
> datasets:
> [root@clei26 ~]# arcstat.py 1
> time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
> 15:57:31    29    29    100    29  100     0    0    29  100   685M  31G
> 15:57:32   530   476     89   476   89     0    0   457   89   685M  31G
> 15:57:33   480   467     97   467   97     0    0   463   97   685M  31G
> 15:57:34   452   443     98   443   98     0    0   435   97   685M  31G
> 15:57:35   582   547     93   547   93     0    0   536   94   685M  31G
> 15:57:36   439   417     94   417   94     0    0   393   94   685M  31G
> 15:57:38   435   392     90   392   90     0    0   374   89   685M  31G
> 15:57:39   364   352     96   352   96     0    0   352   96   685M  31G
> 15:57:40   408   375     91   375   91     0    0   360   91   685M  31G
> 15:57:41   552   539     97   539   97     0    0   539   97   685M  31G
> 
> It looks like we cannot have both performance and reliability in the same system :(
> The simple final conclusion is that with a single disk + ssd, even zfs does not
> help to speed up the glusterfs healing.
> I will stop here :)
> 
> 
> 
> 
> On Fri, Mar 3, 2017 at 3:35 PM, Juan Pablo  > wrote:
> cd to inside the pool path
> then dd if=/dev/zero of=test.tt  bs=1M 
> leave it running 5/10 minutes.
> do ctrl+c and paste the result here.
> etc.
> 
> 2017-03-03 11:30 GMT-03:00 Arman Khalatyan  >:
> No, I have one pool made of the one disk, with the ssd as a cache and log device.
> I have 3 Glusterfs bricks on 3 separate hosts: volume type Replicate (Arbiter) =
> replica 2+1!
> That is how much you can push into the compute nodes (they have only 3 disk slots).
> 
> 
> On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo  > wrote:
> ok, you have 3 pools, zclei22, logs and cache; that's wrong. you should have 1
> pool, with zlog+cache, if you are looking for performance.
> also, don't mix drives.
> what's the performance issue you are facing?
> 
> 
> regards,
> 
> 2017-03-03 11:00 GMT-03:00 Arman Khalatyan  >:
> This is CentOS 7.3 ZoL version 0.6.5.9-1
> 
> [root@clei22 ~]# lsscsi
> 
> [2:0:0:0]  disk  ATA  INTEL SSDSC2CW24 400i  /dev/sda
> 
> [3:0:0:0]  disk  ATA  HGST HUS724040AL AA70  /dev/sdb
> 
> [4:0:0:0]  disk  ATA  WDC WD2002FYPS-0 1G01  /dev/sdc
> 
> 
> 
> [root@clei22 ~]# pvs ;vgs;lvs
> 
>   PV VGFmt  Attr 
> PSize   PFree
> 
>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache  lvm2 a--  
> 223.57g 0
> 
>   /dev/sdc2  centos_clei22 lvm2 a--   
>  1.82t 64.00m
> 
>   VG#PV #LV #SN Attr   VSize   VFree
> 
>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
> 
>   vg_cache1   2   0 wz--n- 223.57g 0
> 
>   LV   VGAttr   LSize   Pool Origin Data%  Meta%  Move 
> Log Cpy%Sync Convert
> 
>   home centos_clei22 -wi-ao   1.74t   
> 
> 
>   root centos_clei22 -wi-ao  50.00g   
>   

Re: [ovirt-users] best way to remove SAN lun

2017-03-03 Thread Nelson Lameiras
Hello Nir,

I think I was not clear in my explanations, let me try again :

We have an oVirt 4.0.5.5 cluster with multiple hosts (CentOS 7.2).
In this cluster, we added a SAN volume (iSCSI) a few months ago, directly in the GUI.
Later we had to remove a DATA volume (SAN iSCSI). Below are the steps we took:

1- we migrated all disks off the volume (oVirt)
2- we put the volume in maintenance (oVirt)
3- we detached the volume (oVirt)
4- we removed/destroyed the volume (oVirt)

In SAN :
5- we took it offline on the SAN
6- we deleted it from the SAN

We thought this would be enough, but later we had a serious incident when the log
partition filled up (partially our fault):
/var/log/messages was continuously logging that it was still trying to reach
the SAN volumes (we have since taken care of the log space issue =>
more aggressive logrotate, etc.)

The real solution was to add two more steps, using a shell on ALL hosts:
4a - logout from SAN : iscsiadm -m node --logout -T iqn.
4b - remove iscsi targets : rm -fr /var/lib/iscsi/nodes/iqn.X

This effectively solved our problem, but it was tedious since we had to do it
manually on all hosts (imagine if we had hundreds of hosts...)
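For the record, a loop like this over the hosts would have saved us some time; it is only a sketch, and the host names and IQN below are placeholders:

# run the two cleanup steps on every hypervisor (example hosts and IQN)
for h in host1 host2 host3; do
  ssh "root@$h" "iscsiadm -m node --logout -T iqn.2000-01.com.example:mylun"
  ssh "root@$h" "rm -rf /var/lib/iscsi/nodes/iqn.2000-01.com.example:mylun*"
done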

So my question was: shouldn't it be oVirt's job to "logout" and "remove
iscsi targets" automatically when a volume is removed from oVirt? Maybe not,
and I'm missing something?

cordialement, regards, 

Nelson LAMEIRAS 
Ingénieur Systèmes et Réseaux / Systems and Networks engineer 
Tel: +33 5 32 09 09 70 
nelson.lamei...@lyra-network.com 

www.lyra-network.com | www.payzen.eu 





Lyra Network, 109 rue de l'innovation, 31670 Labège, FRANCE

- Original Message -
From: "Nir Soffer" 
To: "Nelson Lameiras" 
Cc: "Gianluca Cecchi" , "Adam Litke" 
, "users" 
Sent: Wednesday, February 22, 2017 8:27:26 AM
Subject: Re: [ovirt-users] best way to remove SAN lun

On Wed, Feb 22, 2017 at 9:03 AM, Nelson Lameiras
 wrote:
> Hello,
>
> Not sure it is the same issue, but we have had a "major" issue recently in 
> our production system when removing a ISCSI volume from oVirt, and then 
> removing it from SAN.

What version? OS version?

The order must be:

1. remove the LUN from the storage domain
   (available in the next 4.1 release; in older versions you have
   to remove the storage domain)

2. unzone the LUN on the server

3. remove the multipath devices and the paths on the nodes
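Roughly, step 3 on each node looks like this; the map name and device names below are only examples, check multipath -ll first:

multipath -ll                           # find the map name / WWID of the removed LUN
multipath -f 360014055example           # flush that multipath map (example WWID)
echo 1 > /sys/block/sdX/device/delete   # then delete each underlying SCSI path device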

> The issue being that each host was still trying to access regularly to the 
> SAN volume in spite of not being completely removed from oVirt.

What do you mean by "not being completely removed"?

Who was accessing the volume?

> This led to an massive increase of error logs, which filled completely 
> /var/log partition,

Which log was full with errors?

> which snowballed into crashing vdsm and other nasty consequences.

You should have big enough /var/log to avoid such issues.

>
> Anyway, the solution was to manually logout from SAN (in each host) with 
> iscsiadm and manually remove iscsi targets (again in each host). It was not 
> difficult once the problem was found because currently we only have 3 hosts 
> in this cluster, but I'm wondering what would happen if we had hundreds of 
> hosts ?
>
> Maybe I'm being naive but shouldn't this be "oVirt job" ? Is there a RFE 
> still waiting to be included on this subject or should I write one ?

We have RFE for this here:
https://bugzilla.redhat.com/1310330

But you must understand that ovirt does not control your storage server,
you are responsible to add devices on the storage server, and remove
them. We are only consuming the devices.

Even if we provide a way to remove devices on all hosts, you will have
to remove the device on the storage server before removing it from the
hosts. If not, ovirt will find the removed devices again in the next
scsi rescan, and we do a lot of these to support automatic discovery of
new or resized devices.

Nir

>
> cordialement, regards,
>
>
> Nelson LAMEIRAS
> Ingénieur Systèmes et Réseaux / Systems and Networks engineer
> Tel: +33 5 32 09 09 70
> nelson.lamei...@lyra-network.com
>
> www.lyra-network.com | www.payzen.eu
>
>
>
>
>
> Lyra Network, 109 rue de l'innovation, 31670 Labège, FRANCE
>
> - Original Message -
> From: "Nir Soffer" 
> To: "Gianluca Cecchi" , "Adam Litke" 
> 
> Cc: "users" 
> Sent: Tuesday, February 21, 2017 6:32:18 PM
> Subject: Re: [ovirt-users] best way to remove SAN lun
>
> On Tue, Feb 21, 2017 at 7:25 PM, Gianluca Cecchi
>  wrote:
>> On Tue, Feb 21, 2017 at 6:10 PM, Nir Soffer  wrote:
>>>
>>> This is caused by active lvs on the remove storage domains that were not
>>> deactivated during the removal. This is a very old known issue.
>>>
>>> 

Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Maton, Brett
Ok thanks, I'll give that another go shortly

On 3 March 2017 at 15:24, Yaniv Kaul  wrote:

>
>
> On Fri, Mar 3, 2017 at 5:21 PM Maton, Brett 
> wrote:
>
>> Hi Simone,
>>
>>   I just tried to install the RPM but got a dependency issue:
>>
>> Error: Package: ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch
>> (/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch)
>>Requires: vdsm-client >= 4.18.6
>>
>> I haven't tried to install vdsm-client as I'm not sure what impact that
>> will have on the other packages
>>
>
> vdsm-client is our 'next-gen' vdsClient - the CLI client to VDSM. You can
> and should install it in 4.1.
> Y.
>
>
>>
>> On 3 March 2017 at 10:59, Maton, Brett  wrote:
>>
>> Thanks Simone,
>>
>>   I'll give that a go this evening, I'm remote at the moment.
>>
>> Regards,
>> Brett
>>
>> On 3 March 2017 at 10:48, Simone Tiraboschi  wrote:
>>
>>
>>
>> On Fri, Mar 3, 2017 at 11:35 AM, Maton, Brett 
>> wrote:
>>
>> VM Up not responding - Yes that seems to be the case.
>>
>> I did actually try hosted-engine --console
>>
>> hosted-engine --console
>> The engine VM is running on this host
>> Connected to domain HostedEngine
>> Escape character is ^]
>> error: internal error: cannot find character device 
>>
>>
>> This was the result of https://bugzilla.redhat.
>> com/show_bug.cgi?id=1364132
>>
>> Could you please install this (it's from 4.1.1 RC)
>> http://resources.ovirt.org/pub/ovirt-4.1-pre/rpm/el7/
>> noarch/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch.rpm to get it
>> fixed.
>> You need then to run:
>> systemctl restart ovirt-ha-agent
>> hosted-engine --vm-shutdown
>> # wait for it
>> hosted-engine --vm-start
>>
>> At this point you should have the serial console.
>>
>>
>>
>>
>>
>> On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:
>>
>>
>>
>> On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
>> wrote:
>>
>> What is the correct way to shut down a cluster ?
>>
>> I shutdown my 4.1 cluster following this guide
>> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-
>> oVirt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs mounts
>> needed rebooting after updates.
>>
>> ( I realise the guide is for 3.6, surely not that different ? )
>>
>>
>> It seems fine.
>>
>>
>>
>> However hosted engine isn't starting now.
>>
>>   failed liveliness check
>>
>>
>> So it seems that the engine VM is up but the engine is not responding.
>>
>>
>>
>> I've tried connecting from the host starting the HE vm to see what's
>> going on with:
>>
>> virsh console HostedEngine
>>
>>I don't know what authentication it requires, tried engine admin
>> details and root os...
>>
>>
>> hosted-engine --console
>>
>>
>>
>> hosted-engine --add-console-password
>> Enter password:
>> no graphics devices configured
>>
>> unsurprisingly, remote-viewer vnc://localhost:5900
>> fails with Unable to connect to the graphic server vnc://localhost:5900
>>
>> What can I do next ?
>>
>>
>> Check the status of the engine.
>>
>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>>
>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Deepak Naidu
>> As you can see from my previous email that the RDMA connection tested with 
>> qperf.
I think you have the wrong command. You're testing TCP, not RDMA. Also check whether you
have the RDMA & IB modules loaded on your hosts.
root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
tcp_bw:
bw  =  475 MB/sec
tcp_lat:
latency  =  52.8 us
[root@clei26 ~]#

Please run below command to test RDMA

[root@storageN2 ~]# qperf storageN1 ud_lat ud_bw
ud_lat:
latency  =  7.51 us
ud_bw:
send_bw  =  9.21 GB/sec
recv_bw  =  9.21 GB/sec
[root@sc-sdgx-202 ~]#

Read qperf man pages for more info.

* To run a TCP bandwidth and latency test:
qperf myserver tcp_bw tcp_lat
* To run a UDP latency test and then cause the server to terminate:
qperf myserver udp_lat quit
* To measure the RDMA UD latency and bandwidth:
qperf myserver ud_lat ud_bw
* To measure RDMA UC bi-directional bandwidth:
qperf myserver rc_bi_bw
* To get a range of TCP latencies with a message size from 1 to 64K
qperf myserver -oo msg_size:1:64K:*2 -vu tcp_lat


Check if you have RDMA & IB modules loaded

lsmod | grep -i ib

lsmod | grep -i rdma
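
If the modules are loaded, the IB link state is worth a quick look too (these tools come from the infiniband-diags and libibverbs-utils packages, if you have them installed):

ibstat          # port State should be Active, physical state LinkUp
ibv_devinfo     # lists the RDMA devices visible to the verbs layer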



--
Deepak



From: Arman Khalatyan [mailto:arm2...@gmail.com]
Sent: Thursday, March 02, 2017 10:57 PM
To: Deepak Naidu
Cc: Rafi Kavungal Chundattu Parambil; gluster-us...@gluster.org; users; Sahina 
Bose
Subject: RE: [Gluster-users] [ovirt-users] Hot to force glusterfs to use RDMA?

Dear Deepak, thank you for the hints; which gluster version are you using?
As you can see from my previous email, the RDMA connection was tested with
qperf and is working as expected. In my case the clients are servers as well;
they are the hosts for the ovirt. Disabling selinux is not recommended by ovirt,
but I will give it a try.

Am 03.03.2017 7:50 vorm. schrieb "Deepak Naidu" 
>:
I have been testing glusterfs over RDMA & below is the command I use. Reading 
up the logs, it looks like your IB(InfiniBand) device is not being initialized. 
I am not sure if u have an issue on the client IB or the storage server IB. 
Also have you configured ur IB devices correctly. I am using IPoIB.
Can you check your firewall, disable selinux, I think, you might have checked 
it already ?

mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0



• The below error seems if you have issue starting your volume. I had 
issue, when my transport was set to tcp,rdma. I had to force start my volume. 
If I had set it only to tcp on the volume, the volume would start easily.

[2017-03-02 11:49:47.829391] E [MSGID: 114022] [client.c:2530:client_init_rpc] 
0-GluReplica-client-2: failed to initialize RPC
[2017-03-02 11:49:47.829413] E [MSGID: 101019] [xlator.c:433:xlator_init] 
0-GluReplica-client-2: Initialization of volume 'GluReplica-client-2' failed, 
review your volfile again
[2017-03-02 11:49:47.829425] E [MSGID: 101066] 
[graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2: initializing 
translator failed
[2017-03-02 11:49:47.829436] E [MSGID: 101176] 
[graph.c:673:glusterfs_graph_activate] 0-graph: init failed


• The below error seems if you have issue with IB device. If not 
configured properly.

[2017-03-02 11:49:47.828996] W [MSGID: 103071] 
[rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel 
creation failed [No such device]
[2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init] 
0-GluReplica-client-2: Failed to initialize IB Device
[2017-03-02 11:49:47.829080] W [rpc-transport.c:354:rpc_transport_load] 
0-rpc-transport: 'rdma' initialization failed


--
Deepak


From: 
gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org]
 On Behalf Of Sahina Bose
Sent: Thursday, March 02, 2017 10:26 PM
To: Arman Khalatyan; 
gluster-us...@gluster.org; Rafi Kavungal 
Chundattu Parambil
Cc: users
Subject: Re: [Gluster-users] [ovirt-users] Hot to force glusterfs to use RDMA?

[Adding gluster users to help with error]

[2017-03-02 11:49:47.828996] W [MSGID: 103071] 
[rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel 
creation failed [No such device]

On Thu, Mar 2, 2017 at 5:36 PM, Arman Khalatyan 
> wrote:
BTW RDMA is working as expected:
root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
tcp_bw:
bw  =  475 MB/sec
tcp_lat:
latency  =  52.8 us
[root@clei26 ~]#
thank you beforehand.
Arman.

On Thu, Mar 2, 2017 at 12:54 PM, Arman Khalatyan 
> wrote:
just for reference:
 gluster volume info

Volume Name: GluReplica
Type: Replicate
Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp,rdma
Bricks:
Brick1: 10.10.10.44:/zclei22/01/glu
Brick2: 10.10.10.42:/zclei21/01/glu
Brick3: 

Re: [ovirt-users] [Gluster-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Mohammed Rafi K C
Hi Arman,


On 03/03/2017 12:27 PM, Arman Khalatyan wrote:
> Dear Deepak, thank you for the hints, which gluster are you using?
> As you can see from my previous email that the RDMA connection tested
> with qperf. It is working as expected. In my case the clients are
> servers as well, they are hosts for the ovirt. Disabling selinux is
> nor recommended by ovirt, but i will give a try.

Gluster uses IPoIB, as mentioned by Deepak. So qperf with default options
may not be a good choice to test it, because it will fall back to any
link available between the mentioned server and client. You can force
the behavior; please refer to the link [1].
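For example, something along these lines, where the address is the server's IPoIB address (just an example address here):

# TCP over the IPoIB interface
qperf 192.168.100.12 tcp_bw tcp_lat
# native RDMA reliable-connected tests over the same link
qperf 192.168.100.12 rc_bw rc_lat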

In addition to that, can you please provide your gluster version,
glusterd logs and brick logs? Since it complains about the absence
of the device, it is most likely a setup issue. Otherwise it could
have been a permission denied error; I'm not completely ruling out the
possibility of selinux preventing the creation of the IB channel. We had
this issue in RHEL, which is fixed in 7.2 [2].


[1] :
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Testing_an_RDMA_network_after_IPoIB_is_configured.html
[2] : https://bugzilla.redhat.com/show_bug.cgi?id=1386620

Regards
Rafi KC


>
> Am 03.03.2017 7:50 vorm. schrieb "Deepak Naidu"  >:
>
> I have been testing glusterfs over RDMA & below is the command I
> use. Reading up the logs, it looks like your IB(InfiniBand) device
> is not being initialized. I am not sure if u have an issue on the
> client IB or the storage server IB. Also have you configured ur IB
> devices correctly. I am using IPoIB.
>
> Can you check your firewall, disable selinux, I think, you might
> have checked it already ?
>
>  
>
> *mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0*
>
>  
>
>  
>
> · *The below error seems if you have issue starting your
> volume. I had issue, when my transport was set to tcp,rdma. I had
> to force start my volume. If I had set it only to tcp on the
> volume, the volume would start easily.*
>
>  
>
> [2017-03-02 11:49:47.829391] E [MSGID: 114022]
> [client.c:2530:client_init_rpc] 0-GluReplica-client-2: failed to
> initialize RPC
> [2017-03-02 11:49:47.829413] E [MSGID: 101019]
> [xlator.c:433:xlator_init] 0-GluReplica-client-2: Initialization
> of volume 'GluReplica-client-2' failed, review your volfile again
> [2017-03-02 11:49:47.829425] E [MSGID: 101066]
> [graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2:
> initializing translator failed
> [2017-03-02 11:49:47.829436] E [MSGID: 101176]
> [graph.c:673:glusterfs_graph_activate] 0-graph: init failed
>
>  
>
> · *The below error seems if you have issue with IB device.
> If not configured properly.*
>
>  
>
> [2017-03-02 11:49:47.828996] W [MSGID: 103071]
> [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm
> event channel creation failed [No such device]
> [2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init]
> 0-GluReplica-client-2: Failed to initialize IB Device
> [2017-03-02 11:49:47.829080] W
> [rpc-transport.c:354:rpc_transport_load] 0-rpc-transport: 'rdma'
> initialization failed
>
>  
>
>  
>
> --
>
> Deepak
>
>  
>
>  
>
> *From:*gluster-users-boun...@gluster.org
> 
> [mailto:gluster-users-boun...@gluster.org
> ] *On Behalf Of *Sahina Bose
> *Sent:* Thursday, March 02, 2017 10:26 PM
> *To:* Arman Khalatyan; gluster-us...@gluster.org
> ; Rafi Kavungal Chundattu Parambil
> *Cc:* users
> *Subject:* Re: [Gluster-users] [ovirt-users] Hot to force
> glusterfs to use RDMA?
>
>  
>
> [Adding gluster users to help with error]
>
> [2017-03-02 11:49:47.828996] W [MSGID: 103071]
> [rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm
> event channel creation failed [No such device]
>
>  
>
> On Thu, Mar 2, 2017 at 5:36 PM, Arman Khalatyan  > wrote:
>
> BTW RDMA is working as expected:
> root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
> tcp_bw:
> bw  =  475 MB/sec
> tcp_lat:
> latency  =  52.8 us
> [root@clei26 ~]#
>
> thank you beforehand.
>
> Arman.
>
>  
>
> On Thu, Mar 2, 2017 at 12:54 PM, Arman Khalatyan
> > wrote:
>
> just for reference:
>  gluster volume info
>  
> Volume Name: GluReplica
> Type: Replicate
> Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp,rdma
> 

Re: [ovirt-users] [Gluster-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Deepak Naidu
I have been testing glusterfs over RDMA & below is the command I use. Reading
through the logs, it looks like your IB (InfiniBand) device is not being initialized.
I am not sure if you have an issue on the client IB or the storage server IB.
Also, have you configured your IB devices correctly? I am using IPoIB.
Can you check your firewall and disable selinux? I think you might have checked
that already.

mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0



· The below error appears if you have an issue starting your volume. I had
this issue when my transport was set to tcp,rdma; I had to force-start my volume.
If I had set the volume to tcp only, it would start easily.

[2017-03-02 11:49:47.829391] E [MSGID: 114022] [client.c:2530:client_init_rpc] 
0-GluReplica-client-2: failed to initialize RPC
[2017-03-02 11:49:47.829413] E [MSGID: 101019] [xlator.c:433:xlator_init] 
0-GluReplica-client-2: Initialization of volume 'GluReplica-client-2' failed, 
review your volfile again
[2017-03-02 11:49:47.829425] E [MSGID: 101066] 
[graph.c:324:glusterfs_graph_init] 0-GluReplica-client-2: initializing 
translator failed
[2017-03-02 11:49:47.829436] E [MSGID: 101176] 
[graph.c:673:glusterfs_graph_activate] 0-graph: init failed



· The below error appears if you have an issue with the IB device, i.e. it is not
configured properly.

[2017-03-02 11:49:47.828996] W [MSGID: 103071] 
[rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel 
creation failed [No such device]
[2017-03-02 11:49:47.829067] W [MSGID: 103055] [rdma.c:4896:init] 
0-GluReplica-client-2: Failed to initialize IB Device
[2017-03-02 11:49:47.829080] W [rpc-transport.c:354:rpc_transport_load] 
0-rpc-transport: 'rdma' initialization failed
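
For reference, this is roughly what I ran in that situation, using your volume name as the example:

# check which transports the volume is configured with
gluster volume info GluReplica

# force-start when the rdma side keeps it from coming up normally
gluster volume start GluReplica force

# confirm the bricks now expose both TCP and RDMA ports
gluster volume status GluReplica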



--
Deepak


From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Sahina Bose
Sent: Thursday, March 02, 2017 10:26 PM
To: Arman Khalatyan; gluster-us...@gluster.org; Rafi Kavungal Chundattu Parambil
Cc: users
Subject: Re: [Gluster-users] [ovirt-users] Hot to force glusterfs to use RDMA?

[Adding gluster users to help with error]

[2017-03-02 11:49:47.828996] W [MSGID: 103071] 
[rdma.c:4589:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel 
creation failed [No such device]

On Thu, Mar 2, 2017 at 5:36 PM, Arman Khalatyan 
> wrote:
BTW RDMA is working as expected:
root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
tcp_bw:
bw  =  475 MB/sec
tcp_lat:
latency  =  52.8 us
[root@clei26 ~]#
thank you beforehand.
Arman.

On Thu, Mar 2, 2017 at 12:54 PM, Arman Khalatyan 
> wrote:
just for reference:
 gluster volume info

Volume Name: GluReplica
Type: Replicate
Volume ID: ee686dfe-203a-4caa-a691-26353460cc48
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp,rdma
Bricks:
Brick1: 10.10.10.44:/zclei22/01/glu
Brick2: 10.10.10.42:/zclei21/01/glu
Brick3: 10.10.10.41:/zclei26/01/glu (arbiter)
Options Reconfigured:
network.ping-timeout: 30
server.allow-insecure: on
storage.owner-gid: 36
storage.owner-uid: 36
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
nfs.disable: on



[root@clei21 ~]# gluster volume status
Status of volume: GluReplica
Gluster process TCP Port  RDMA Port  Online  Pid
--
Brick 10.10.10.44:/zclei22/01/glu   49158 49159  Y   15870
Brick 10.10.10.42:/zclei21/01/glu   49156 49157  Y   17473
Brick 10.10.10.41:/zclei26/01/glu   49153 49154  Y   18897
Self-heal Daemon on localhost   N/A   N/AY   17502
Self-heal Daemon on 10.10.10.41 N/A   N/AY   13353
Self-heal Daemon on 10.10.10.44 N/A   N/AY   32745

Task Status of Volume GluReplica
--
There are no active volume tasks

On Thu, Mar 2, 2017 at 12:52 PM, Arman Khalatyan 
> wrote:
I am not able to mount with RDMA over the cli.
Are there some volfile parameters that need to be tuned?
/usr/bin/mount  -t glusterfs  -o 
backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma 
10.10.10.44:/GluReplica /mnt

[2017-03-02 11:49:47.795511] I [MSGID: 100030] [glusterfsd.c:2454:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.9 (args: 
/usr/sbin/glusterfs --volfile-server=10.10.10.44 --volfile-server=10.10.10.44 
--volfile-server=10.10.10.42 --volfile-server=10.10.10.41 
--volfile-server-transport=rdma 

Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Yaniv Kaul
On Fri, Mar 3, 2017 at 5:21 PM Maton, Brett 
wrote:

> Hi Simone,
>
>   I just tried to install the RPM but got a dependency issue:
>
> Error: Package: ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch
> (/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch)
>Requires: vdsm-client >= 4.18.6
>
> I haven't tried to install vdsm-client as I'm not sure what impact that
> will have on the other packages
>

vdsm-client is our 'next-gen' vdsClient - the CLI client to VDSM. You can
and should install it in 4.1.
Y.


>
> On 3 March 2017 at 10:59, Maton, Brett  wrote:
>
> Thanks Simone,
>
>   I'll give that a go this evening, I'm remote at the moment.
>
> Regards,
> Brett
>
> On 3 March 2017 at 10:48, Simone Tiraboschi  wrote:
>
>
>
> On Fri, Mar 3, 2017 at 11:35 AM, Maton, Brett 
> wrote:
>
> VM Up not responding - Yes that seems to be the case.
>
> I did actually try hosted-engine --console
>
> hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device 
>
>
> This was the result of https://bugzilla.redhat.com/show_bug.cgi?id=1364132
>
> Could you please install this (it's from 4.1.1 RC)
> http://resources.ovirt.org/pub/ovirt-4.1-pre/rpm/el7/noarch/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch.rpm
> to get it fixed.
> You need then to run:
> systemctl restart ovirt-ha-agent
> hosted-engine --vm-shutdown
> # wait for it
> hosted-engine --vm-start
>
> At this point you should have the serial console.
>
>
>
>
>
> On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:
>
>
>
> On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
> wrote:
>
> What is the correct way to shut down a cluster ?
>
> I shutdown my 4.1 cluster following this guide
> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oVirt-3.6-Cluster-Shutdown-and-Startup
> as the NAS hosting the nfs mounts needed rebooting after updates.
>
> ( I realise the guide is for 3.6, surely not that different ? )
>
>
> It seems fine.
>
>
>
> However hosted engine isn't starting now.
>
>   failed liveliness check
>
>
> So it seems that the engine VM is up but the engine is not responding.
>
>
>
> I've tried connecting from the host starting the HE vm to see what's going
> on with:
>
> virsh console HostedEngine
>
>I don't know what authentication it requires, tried engine admin
> details and root os...
>
>
> hosted-engine --console
>
>
>
> hosted-engine --add-console-password
> Enter password:
> no graphics devices configured
>
> unsurprisingly, remote-viewer vnc://localhost:5900
> fails with Unable to connect to the graphic server vnc://localhost:5900
>
> What can I do next ?
>
>
> Check the status of the engine.
>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Maton, Brett
Hi Simone,

  I just tried to install the RPM but got a dependency issue:

Error: Package: ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch
(/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch)
   Requires: vdsm-client >= 4.18.6

I haven't tried to install vdsm-client as I'm not sure what impact that
will have on the other packages

On 3 March 2017 at 10:59, Maton, Brett  wrote:

> Thanks Simone,
>
>   I'll give that a go this evening, I'm remote at the moment.
>
> Regards,
> Brett
>
> On 3 March 2017 at 10:48, Simone Tiraboschi  wrote:
>
>>
>>
>> On Fri, Mar 3, 2017 at 11:35 AM, Maton, Brett 
>> wrote:
>>
>>> VM Up not responding - Yes that seems to be the case.
>>>
>>> I did actually try hosted-engine --console
>>>
>>> hosted-engine --console
>>> The engine VM is running on this host
>>> Connected to domain HostedEngine
>>> Escape character is ^]
>>> error: internal error: cannot find character device 
>>>
>>
>> This was the result of https://bugzilla.redhat.com
>> /show_bug.cgi?id=1364132
>>
>> Could you please install this (it's from 4.1.1 RC)
>> http://resources.ovirt.org/pub/ovirt-4.1-pre/rpm/el7/noarch/
>> ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch.rpm to get it fixed.
>> You need then to run:
>> systemctl restart ovirt-ha-agent
>> hosted-engine --vm-shutdown
>> # wait for it
>> hosted-engine --vm-start
>>
>> At this point you should have the serial console.
>>
>>
>>>
>>>
>>>
>>> On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:
>>>


 On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett  wrote:

> What is the correct way to shut down a cluster ?
>
> I shutdown my 4.1 cluster following this guide
> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oV
> irt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs
> mounts needed rebooting after updates.
>
> ( I realise the guide is for 3.6, surely not that different ? )
>

 It seems fine.


>
> However hosted engine isn't starting now.
>
>   failed liveliness check
>

 So it seems that the engine VM is up but the engine is not responding.


>
> I've tried connecting from the host starting the HE vm to see what's
> going on with:
>
> virsh console HostedEngine
>
>I don't know what authentication it requires, tried engine admin
> details and root os...
>

 hosted-engine --console


>
> hosted-engine --add-console-password
> Enter password:
> no graphics devices configured
>
> unsurprisingly, remote-viewer vnc://localhost:5900
> fails with Unable to connect to the graphic server
> vnc://localhost:5900
>
> What can I do next ?
>

 Check the status of the engine.


>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>

>>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Juan Pablo
cd to inside the pool path
then dd if=/dev/zero of=test.tt bs=1M
leave it running 5/10 minutes.
do ctrl+c and paste the result here.
etc.

2017-03-03 11:30 GMT-03:00 Arman Khalatyan :

> No, I have one pool made of the one disk and ssd as a cache and log device.
> I have 3 Glusterfs bricks- separate 3 hosts:Volume type Replicate
> (Arbiter)= replica 2+1!
> That how much you can push into compute nodes(they have only 3 disk slots).
>
>
> On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo 
> wrote:
>
>> ok, you have 3 pools, zclei22, logs and cache, thats wrong. you should
>> have 1 pool, with zlog+cache if you are looking for performance.
>> also, dont mix drives.
>> whats the performance issue you are facing?
>>
>>
>> regards,
>>
>> 2017-03-03 11:00 GMT-03:00 Arman Khalatyan :
>>
>>> This is CentOS 7.3 ZoL version 0.6.5.9-1
>>>
>>> [root@clei22 ~]# lsscsi
>>>
>>> [2:0:0:0]diskATA  INTEL SSDSC2CW24 400i  /dev/sda
>>>
>>> [3:0:0:0]diskATA  HGST HUS724040AL AA70  /dev/sdb
>>>
>>> [4:0:0:0]diskATA  WDC WD2002FYPS-0 1G01  /dev/sdc
>>>
>>>
>>> [root@clei22 ~]# pvs ;vgs;lvs
>>>
>>>   PV VGFmt
>>> Attr PSize   PFree
>>>
>>>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache  lvm2
>>> a--  223.57g 0
>>>
>>>   /dev/sdc2  centos_clei22 lvm2
>>> a--1.82t 64.00m
>>>
>>>   VG#PV #LV #SN Attr   VSize   VFree
>>>
>>>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
>>>
>>>   vg_cache1   2   0 wz--n- 223.57g 0
>>>
>>>   LV   VGAttr   LSize   Pool Origin Data%  Meta%
>>> Move Log Cpy%Sync Convert
>>>
>>>   home centos_clei22 -wi-ao   1.74t
>>>
>>>
>>>   root centos_clei22 -wi-ao  50.00g
>>>
>>>
>>>   swap centos_clei22 -wi-ao  31.44g
>>>
>>>
>>>   lv_cache vg_cache  -wi-ao 213.57g
>>>
>>>
>>>   lv_slog  vg_cache  -wi-ao  10.00g
>>>
>>>
>>> [root@clei22 ~]# zpool status -v
>>>
>>>   pool: zclei22
>>>
>>>  state: ONLINE
>>>
>>>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07
>>> 2017
>>>
>>> config:
>>>
>>>
>>> NAMESTATE READ WRITE CKSUM
>>>
>>> zclei22 ONLINE   0 0 0
>>>
>>>   HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE   0 0 0
>>>
>>> logs
>>>
>>>   lv_slog   ONLINE   0 0 0
>>>
>>> cache
>>>
>>>   lv_cache  ONLINE   0 0 0
>>>
>>>
>>> errors: No known data errors
>>>
>>>
>>> *ZFS config:*
>>>
>>> [root@clei22 ~]# zfs get all zclei22/01
>>>
>>> NAMEPROPERTY  VALUE  SOURCE
>>>
>>> zclei22/01  type  filesystem -
>>>
>>> zclei22/01  creation  Tue Feb 28 14:06 2017  -
>>>
>>> zclei22/01  used  389G   -
>>>
>>> zclei22/01  available 3.13T  -
>>>
>>> zclei22/01  referenced389G   -
>>>
>>> zclei22/01  compressratio 1.01x  -
>>>
>>> zclei22/01  mounted   yes-
>>>
>>> zclei22/01  quota none   default
>>>
>>> zclei22/01  reservation   none   default
>>>
>>> zclei22/01  recordsize128K   local
>>>
>>> zclei22/01  mountpoint/zclei22/01default
>>>
>>> zclei22/01  sharenfs  offdefault
>>>
>>> zclei22/01  checksum  on default
>>>
>>> zclei22/01  compression   offlocal
>>>
>>> zclei22/01  atime on default
>>>
>>> zclei22/01  devices   on default
>>>
>>> zclei22/01  exec  on default
>>>
>>> zclei22/01  setuidon default
>>>
>>> zclei22/01  readonly  offdefault
>>>
>>> zclei22/01  zoned offdefault
>>>
>>> zclei22/01  snapdir   hidden default
>>>
>>> zclei22/01  aclinheritrestricted default
>>>
>>> zclei22/01  canmount  on default
>>>
>>> zclei22/01  xattr sa local
>>>
>>> zclei22/01  copies1  default
>>>
>>> zclei22/01  version   5  -
>>>
>>> zclei22/01  utf8only  off-
>>>
>>> zclei22/01  normalization none   -
>>>
>>> zclei22/01  casesensitivity   sensitive  -
>>>
>>> zclei22/01  vscan offdefault
>>>
>>> zclei22/01  nbmand   

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Arman Khalatyan
No, I have one pool made of the one disk, with the ssd as a cache and log device.
I have 3 Glusterfs bricks on 3 separate hosts: volume type Replicate
(Arbiter) = replica 2+1!
That is how much you can push into the compute nodes (they have only 3 disk slots).


On Fri, Mar 3, 2017 at 3:19 PM, Juan Pablo 
wrote:

> ok, you have 3 pools, zclei22, logs and cache, thats wrong. you should
> have 1 pool, with zlog+cache if you are looking for performance.
> also, dont mix drives.
> whats the performance issue you are facing?
>
>
> regards,
>
> 2017-03-03 11:00 GMT-03:00 Arman Khalatyan :
>
>> This is CentOS 7.3 ZoL version 0.6.5.9-1
>>
>> [root@clei22 ~]# lsscsi
>>
>> [2:0:0:0]diskATA  INTEL SSDSC2CW24 400i  /dev/sda
>>
>> [3:0:0:0]diskATA  HGST HUS724040AL AA70  /dev/sdb
>>
>> [4:0:0:0]diskATA  WDC WD2002FYPS-0 1G01  /dev/sdc
>>
>>
>> [root@clei22 ~]# pvs ;vgs;lvs
>>
>>   PV VGFmt
>> Attr PSize   PFree
>>
>>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache  lvm2
>> a--  223.57g 0
>>
>>   /dev/sdc2  centos_clei22 lvm2
>> a--1.82t 64.00m
>>
>>   VG#PV #LV #SN Attr   VSize   VFree
>>
>>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
>>
>>   vg_cache1   2   0 wz--n- 223.57g 0
>>
>>   LV   VGAttr   LSize   Pool Origin Data%  Meta%
>> Move Log Cpy%Sync Convert
>>
>>   home centos_clei22 -wi-ao   1.74t
>>
>>
>>   root centos_clei22 -wi-ao  50.00g
>>
>>
>>   swap centos_clei22 -wi-ao  31.44g
>>
>>
>>   lv_cache vg_cache  -wi-ao 213.57g
>>
>>
>>   lv_slog  vg_cache  -wi-ao  10.00g
>>
>>
>> [root@clei22 ~]# zpool status -v
>>
>>   pool: zclei22
>>
>>  state: ONLINE
>>
>>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
>>
>> config:
>>
>>
>> NAMESTATE READ WRITE CKSUM
>>
>> zclei22 ONLINE   0 0 0
>>
>>   HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE   0 0 0
>>
>> logs
>>
>>   lv_slog   ONLINE   0 0 0
>>
>> cache
>>
>>   lv_cache  ONLINE   0 0 0
>>
>>
>> errors: No known data errors
>>
>>
>> *ZFS config:*
>>
>> [root@clei22 ~]# zfs get all zclei22/01
>>
>> NAMEPROPERTY  VALUE  SOURCE
>>
>> zclei22/01  type  filesystem -
>>
>> zclei22/01  creation  Tue Feb 28 14:06 2017  -
>>
>> zclei22/01  used  389G   -
>>
>> zclei22/01  available 3.13T  -
>>
>> zclei22/01  referenced389G   -
>>
>> zclei22/01  compressratio 1.01x  -
>>
>> zclei22/01  mounted   yes-
>>
>> zclei22/01  quota none   default
>>
>> zclei22/01  reservation   none   default
>>
>> zclei22/01  recordsize128K   local
>>
>> zclei22/01  mountpoint/zclei22/01default
>>
>> zclei22/01  sharenfs  offdefault
>>
>> zclei22/01  checksum  on default
>>
>> zclei22/01  compression   offlocal
>>
>> zclei22/01  atime on default
>>
>> zclei22/01  devices   on default
>>
>> zclei22/01  exec  on default
>>
>> zclei22/01  setuidon default
>>
>> zclei22/01  readonly  offdefault
>>
>> zclei22/01  zoned offdefault
>>
>> zclei22/01  snapdir   hidden default
>>
>> zclei22/01  aclinheritrestricted default
>>
>> zclei22/01  canmount  on default
>>
>> zclei22/01  xattr sa local
>>
>> zclei22/01  copies1  default
>>
>> zclei22/01  version   5  -
>>
>> zclei22/01  utf8only  off-
>>
>> zclei22/01  normalization none   -
>>
>> zclei22/01  casesensitivity   sensitive  -
>>
>> zclei22/01  vscan offdefault
>>
>> zclei22/01  nbmandoffdefault
>>
>> zclei22/01  sharesmb  offdefault
>>
>> zclei22/01  refquota  none   default
>>
>> zclei22/01  refreservationnone   default
>>
>> zclei22/01  primarycache  metadata   local
>>
>> zclei22/01  secondarycache

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Juan Pablo
ok, you have 3 pools, zclei22, logs and cache; that's wrong. you should have
1 pool, with zlog+cache, if you are looking for performance.
also, don't mix drives.
what's the performance issue you are facing?


regards,

2017-03-03 11:00 GMT-03:00 Arman Khalatyan :

> This is CentOS 7.3 ZoL version 0.6.5.9-1
>
> [root@clei22 ~]# lsscsi
>
> [2:0:0:0]diskATA  INTEL SSDSC2CW24 400i  /dev/sda
>
> [3:0:0:0]diskATA  HGST HUS724040AL AA70  /dev/sdb
>
> [4:0:0:0]diskATA  WDC WD2002FYPS-0 1G01  /dev/sdc
>
>
> [root@clei22 ~]# pvs ;vgs;lvs
>
>   PV VGFmt
> Attr PSize   PFree
>
>   /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN vg_cache  lvm2
> a--  223.57g 0
>
>   /dev/sdc2  centos_clei22 lvm2
> a--1.82t 64.00m
>
>   VG#PV #LV #SN Attr   VSize   VFree
>
>   centos_clei22   1   3   0 wz--n-   1.82t 64.00m
>
>   vg_cache1   2   0 wz--n- 223.57g 0
>
>   LV   VGAttr   LSize   Pool Origin Data%  Meta%  Move
> Log Cpy%Sync Convert
>
>   home centos_clei22 -wi-ao   1.74t
>
>
>   root centos_clei22 -wi-ao  50.00g
>
>
>   swap centos_clei22 -wi-ao  31.44g
>
>
>   lv_cache vg_cache  -wi-ao 213.57g
>
>
>   lv_slog  vg_cache  -wi-ao  10.00g
>
>
> [root@clei22 ~]# zpool status -v
>
>   pool: zclei22
>
>  state: ONLINE
>
>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017
>
> config:
>
>
> NAMESTATE READ WRITE CKSUM
>
> zclei22 ONLINE   0 0 0
>
>   HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE   0 0 0
>
> logs
>
>   lv_slog   ONLINE   0 0 0
>
> cache
>
>   lv_cache  ONLINE   0 0 0
>
>
> errors: No known data errors
>
>
> *ZFS config:*
>
> [root@clei22 ~]# zfs get all zclei22/01
>
> NAMEPROPERTY  VALUE  SOURCE
>
> zclei22/01  type  filesystem -
>
> zclei22/01  creation  Tue Feb 28 14:06 2017  -
>
> zclei22/01  used  389G   -
>
> zclei22/01  available 3.13T  -
>
> zclei22/01  referenced389G   -
>
> zclei22/01  compressratio 1.01x  -
>
> zclei22/01  mounted   yes-
>
> zclei22/01  quota none   default
>
> zclei22/01  reservation   none   default
>
> zclei22/01  recordsize128K   local
>
> zclei22/01  mountpoint/zclei22/01default
>
> zclei22/01  sharenfs  offdefault
>
> zclei22/01  checksum  on default
>
> zclei22/01  compression   offlocal
>
> zclei22/01  atime on default
>
> zclei22/01  devices   on default
>
> zclei22/01  exec  on default
>
> zclei22/01  setuidon default
>
> zclei22/01  readonly  offdefault
>
> zclei22/01  zoned offdefault
>
> zclei22/01  snapdir   hidden default
>
> zclei22/01  aclinheritrestricted default
>
> zclei22/01  canmount  on default
>
> zclei22/01  xattr sa local
>
> zclei22/01  copies1  default
>
> zclei22/01  version   5  -
>
> zclei22/01  utf8only  off-
>
> zclei22/01  normalization none   -
>
> zclei22/01  casesensitivity   sensitive  -
>
> zclei22/01  vscan offdefault
>
> zclei22/01  nbmandoffdefault
>
> zclei22/01  sharesmb  offdefault
>
> zclei22/01  refquota  none   default
>
> zclei22/01  refreservationnone   default
>
> zclei22/01  primarycache  metadata   local
>
> zclei22/01  secondarycachemetadata   local
>
> zclei22/01  usedbysnapshots   0  -
>
> zclei22/01  usedbydataset 389G   -
>
> zclei22/01  usedbychildren0  -
>
> zclei22/01  usedbyrefreservation  0  -
>
> zclei22/01  logbias   latencydefault
>
> zclei22/01  dedup offdefault
>
> zclei22/01  mlslabel  none   default
>
> 

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Arman Khalatyan
This is CentOS 7.3 ZoL version 0.6.5.9-1

[root@clei22 ~]# lsscsi
[2:0:0:0]  disk  ATA  INTEL SSDSC2CW24 400i  /dev/sda
[3:0:0:0]  disk  ATA  HGST HUS724040AL AA70  /dev/sdb
[4:0:0:0]  disk  ATA  WDC WD2002FYPS-0 1G01  /dev/sdc


[root@clei22 ~]# pvs ;vgs;lvs
  PV                                                   VG             Fmt   Attr  PSize    PFree
  /dev/mapper/INTEL_SSDSC2CW240A3_CVCV306302RP240CGN   vg_cache       lvm2  a--   223.57g  0
  /dev/sdc2                                            centos_clei22  lvm2  a--   1.82t    64.00m

  VG             #PV  #LV  #SN  Attr    VSize    VFree
  centos_clei22    1    3    0  wz--n-  1.82t    64.00m
  vg_cache         1    2    0  wz--n-  223.57g  0

  LV        VG             Attr    LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home      centos_clei22  -wi-ao  1.74t
  root      centos_clei22  -wi-ao  50.00g
  swap      centos_clei22  -wi-ao  31.44g
  lv_cache  vg_cache       -wi-ao  213.57g
  lv_slog   vg_cache       -wi-ao  10.00g


[root@clei22 ~]# zpool status -v

  pool: zclei22

 state: ONLINE

  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07 2017

config:


NAMESTATE READ WRITE CKSUM

zclei22 ONLINE   0 0 0

  HGST_HUS724040ALA640_PN2334PBJ4SV6T1  ONLINE   0 0 0

logs

  lv_slog   ONLINE   0 0 0

cache

  lv_cache  ONLINE   0 0 0


errors: No known data errors


*ZFS config:*

[root@clei22 ~]# zfs get all zclei22/01

NAMEPROPERTY  VALUE  SOURCE

zclei22/01  type  filesystem -

zclei22/01  creation  Tue Feb 28 14:06 2017  -

zclei22/01  used  389G   -

zclei22/01  available 3.13T  -

zclei22/01  referenced389G   -

zclei22/01  compressratio 1.01x  -

zclei22/01  mounted   yes-

zclei22/01  quota none   default

zclei22/01  reservation   none   default

zclei22/01  recordsize128K   local

zclei22/01  mountpoint/zclei22/01default

zclei22/01  sharenfs  offdefault

zclei22/01  checksum  on default

zclei22/01  compression   offlocal

zclei22/01  atime on default

zclei22/01  devices   on default

zclei22/01  exec  on default

zclei22/01  setuidon default

zclei22/01  readonly  offdefault

zclei22/01  zoned offdefault

zclei22/01  snapdir   hidden default

zclei22/01  aclinheritrestricted default

zclei22/01  canmount  on default

zclei22/01  xattr sa local

zclei22/01  copies1  default

zclei22/01  version   5  -

zclei22/01  utf8only  off-

zclei22/01  normalization none   -

zclei22/01  casesensitivity   sensitive  -

zclei22/01  vscan offdefault

zclei22/01  nbmandoffdefault

zclei22/01  sharesmb  offdefault

zclei22/01  refquota  none   default

zclei22/01  refreservationnone   default

zclei22/01  primarycache  metadata   local

zclei22/01  secondarycachemetadata   local

zclei22/01  usedbysnapshots   0  -

zclei22/01  usedbydataset 389G   -

zclei22/01  usedbychildren0  -

zclei22/01  usedbyrefreservation  0  -

zclei22/01  logbias   latencydefault

zclei22/01  dedup offdefault

zclei22/01  mlslabel  none   default

zclei22/01  sync  disabled   local

zclei22/01  refcompressratio  1.01x  -

zclei22/01  written   389G   -

zclei22/01  logicalused   396G   -

zclei22/01  logicalreferenced 396G   -

zclei22/01  filesystem_limit  none   default

zclei22/01  snapshot_limitnone   default

zclei22/01  filesystem_count  none   default

zclei22/01  snapshot_count

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Juan Pablo
Which operating system version are you using for your zfs storage?
do:
zfs get all your-pool-name
use arc_summary.py from freenas git repo if you wish.


2017-03-03 10:33 GMT-03:00 Arman Khalatyan :

> Pool load:
> [root@clei21 ~]# zpool iostat -v 1
>capacity operations
> bandwidth
> poolalloc   free   read  write
> read  write
> --  -  -  -  -
> -  -
> zclei21 10.1G  3.62T  0112
> 823  8.82M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T  0 46
> 626  4.40M
> logs-  -  -  -
> -  -
>   lv_slog225M  9.72G  0 66
> 198  4.45M
> cache   -  -  -  -
> -  -
>   lv_cache  9.81G   204G  0 46
> 56  4.13M
> --  -  -  -  -
> -  -
>
>capacity operations
> bandwidth
> poolalloc   free   read  write
> read  write
> --  -  -  -  -
> -  -
> zclei21 10.1G  3.62T  0191
> 0  12.8M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T  0  0
> 0  0
> logs-  -  -  -
> -  -
>   lv_slog225M  9.72G  0191
> 0  12.8M
> cache   -  -  -  -
> -  -
>   lv_cache  9.83G   204G  0218
> 0  20.0M
> --  -  -  -  -
> -  -
>
>capacity operations
> bandwidth
> poolalloc   free   read  write
> read  write
> --  -  -  -  -
> -  -
> zclei21 10.1G  3.62T  0191
> 0  12.7M
>   HGST_HUS724040ALA640_PN2334PBJ52XWT1  10.1G  3.62T  0  0
> 0  0
> logs-  -  -  -
> -  -
>   lv_slog225M  9.72G  0191
> 0  12.7M
> cache   -  -  -  -
> -  -
>   lv_cache  9.83G   204G  0 72
> 0  7.68M
> --  -  -  -  -
> -  -
>
>
> On Fri, Mar 3, 2017 at 2:32 PM, Arman Khalatyan  wrote:
>
>> Glusterfs now in healing mode:
>> Receiver:
>> [root@clei21 ~]# arcstat.py 1
>> time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz
>> c
>> 13:24:49 0 0  0 00 00 00   4.6G
>> 31G
>> 13:24:50   15480 5180   51 0080   51   4.6G
>> 31G
>> 13:24:51   17962 3462   34 0062   42   4.6G
>> 31G
>> 13:24:52   14868 4568   45 0068   45   4.6G
>> 31G
>> 13:24:53   14064 4564   45 0064   45   4.6G
>> 31G
>> 13:24:54   12448 3848   38 0048   38   4.6G
>> 31G
>> 13:24:55   15780 5080   50 0080   50   4.7G
>> 31G
>> 13:24:56   20268 3368   33 0068   41   4.7G
>> 31G
>> 13:24:57   12754 4254   42 0054   42   4.7G
>> 31G
>> 13:24:58   12650 3950   39 0050   39   4.7G
>> 31G
>> 13:24:59   11640 3440   34 0040   34   4.7G
>> 31G
>>
>>
>> Sender
>> [root@clei22 ~]# arcstat.py 1
>> time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz
>> c
>> 13:28:37 8 2 25 2   25 00 2   25   468M
>> 31G
>> 13:28:38  1.2K   727 62   727   62 00   525   54   469M
>> 31G
>> 13:28:39   815   508 62   508   62 00   376   55   469M
>> 31G
>> 13:28:40   994   624 62   624   62 00   450   54   469M
>> 31G
>> 13:28:41   783   456 58   456   58 00   338   50   470M
>> 31G
>> 13:28:42   916   541 59   541   59 00   390   50   470M
>> 31G
>> 13:28:43   768   437 56   437   57 00   313   48   471M
>> 31G
>> 13:28:44   877   534 60   534   60 00   393   53   470M
>> 31G
>> 13:28:45   957   630 65   630   65 00   450   57   470M
>> 31G
>> 13:28:46   819   479 58   479   58 00   357   51   471M
>> 31G
>>
>>
>> On Thu, Mar 2, 2017 at 7:18 PM, Juan Pablo 
>> wrote:
>>
>>> hey,
>>> what are you using for zfs? get an arc status and show please
>>>
>>>
>>> 2017-03-02 9:57 GMT-03:00 Arman Khalatyan :
>>>
 no,
 ZFS itself is not 

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Arman Khalatyan
Pool load:
[root@clei21 ~]# zpool iostat -v 1
                                          capacity       operations      bandwidth
pool                                      alloc   free   read   write   read   write
----------------------------------------  -----  -----  -----  ------  -----  ------
zclei21                                   10.1G  3.62T      0     112    823   8.82M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1    10.1G  3.62T      0      46    626   4.40M
logs                                          -      -      -       -      -       -
  lv_slog                                  225M  9.72G      0      66    198   4.45M
cache                                         -      -      -       -      -       -
  lv_cache                                9.81G   204G      0      46     56   4.13M
----------------------------------------  -----  -----  -----  ------  -----  ------

                                          capacity       operations      bandwidth
pool                                      alloc   free   read   write   read   write
----------------------------------------  -----  -----  -----  ------  -----  ------
zclei21                                   10.1G  3.62T      0     191      0   12.8M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1    10.1G  3.62T      0       0      0       0
logs                                          -      -      -       -      -       -
  lv_slog                                  225M  9.72G      0     191      0   12.8M
cache                                         -      -      -       -      -       -
  lv_cache                                9.83G   204G      0     218      0   20.0M
----------------------------------------  -----  -----  -----  ------  -----  ------

                                          capacity       operations      bandwidth
pool                                      alloc   free   read   write   read   write
----------------------------------------  -----  -----  -----  ------  -----  ------
zclei21                                   10.1G  3.62T      0     191      0   12.7M
  HGST_HUS724040ALA640_PN2334PBJ52XWT1    10.1G  3.62T      0       0      0       0
logs                                          -      -      -       -      -       -
  lv_slog                                  225M  9.72G      0     191      0   12.7M
cache                                         -      -      -       -      -       -
  lv_cache                                9.83G   204G      0      72      0   7.68M
----------------------------------------  -----  -----  -----  ------  -----  ------


On Fri, Mar 3, 2017 at 2:32 PM, Arman Khalatyan  wrote:

> Glusterfs now in healing mode:
> Receiver:
> [root@clei21 ~]# arcstat.py 1
> time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
> 13:24:49 0 0  0 00 00 00   4.6G   31G
> 13:24:50   15480 5180   51 0080   51   4.6G   31G
> 13:24:51   17962 3462   34 0062   42   4.6G   31G
> 13:24:52   14868 4568   45 0068   45   4.6G   31G
> 13:24:53   14064 4564   45 0064   45   4.6G   31G
> 13:24:54   12448 3848   38 0048   38   4.6G   31G
> 13:24:55   15780 5080   50 0080   50   4.7G   31G
> 13:24:56   20268 3368   33 0068   41   4.7G   31G
> 13:24:57   12754 4254   42 0054   42   4.7G   31G
> 13:24:58   12650 3950   39 0050   39   4.7G   31G
> 13:24:59   11640 3440   34 0040   34   4.7G   31G
>
>
> Sender
> [root@clei22 ~]# arcstat.py 1
> time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
> 13:28:37 8 2 25 2   25 00 2   25   468M   31G
> 13:28:38  1.2K   727 62   727   62 00   525   54   469M   31G
> 13:28:39   815   508 62   508   62 00   376   55   469M   31G
> 13:28:40   994   624 62   624   62 00   450   54   469M   31G
> 13:28:41   783   456 58   456   58 00   338   50   470M   31G
> 13:28:42   916   541 59   541   59 00   390   50   470M   31G
> 13:28:43   768   437 56   437   57 00   313   48   471M   31G
> 13:28:44   877   534 60   534   60 00   393   53   470M   31G
> 13:28:45   957   630 65   630   65 00   450   57   470M   31G
> 13:28:46   819   479 58   479   58 00   357   51   471M   31G
>
>
> On Thu, Mar 2, 2017 at 7:18 PM, Juan Pablo 
> wrote:
>
>> hey,
>> what are you using for zfs? get an arc status and show please
>>
>>
>> 2017-03-02 9:57 GMT-03:00 Arman Khalatyan :
>>
>>> no,
>>> ZFS itself is not on top of lvm. only ssd was spitted by lvm for
>>> slog(10G) and cache (the rest)
>>> but in any-case the ssd does not help much on glusterfs/ovirt  load it
>>> has almost 100% cache misses:( (terrible performance compare with nfs)
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 2, 2017 at 1:47 PM, FERNANDO FREDIANI <
>>> fernando.fredi...@upx.com> wrote:
>>>
 Am I understanding 

Re: [ovirt-users] Replicated Glusterfs on top of ZFS

2017-03-03 Thread Arman Khalatyan
Glusterfs now in healing mode:
Receiver:
[root@clei21 ~]# arcstat.py 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
13:24:49     0     0      0     0    0     0    0     0    0   4.6G  31G
13:24:50   154    80     51    80   51     0    0    80   51   4.6G  31G
13:24:51   179    62     34    62   34     0    0    62   42   4.6G  31G
13:24:52   148    68     45    68   45     0    0    68   45   4.6G  31G
13:24:53   140    64     45    64   45     0    0    64   45   4.6G  31G
13:24:54   124    48     38    48   38     0    0    48   38   4.6G  31G
13:24:55   157    80     50    80   50     0    0    80   50   4.7G  31G
13:24:56   202    68     33    68   33     0    0    68   41   4.7G  31G
13:24:57   127    54     42    54   42     0    0    54   42   4.7G  31G
13:24:58   126    50     39    50   39     0    0    50   39   4.7G  31G
13:24:59   116    40     34    40   34     0    0    40   34   4.7G  31G


Sender
[root@clei22 ~]# arcstat.py 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
13:28:37     8     2     25     2   25     0    0     2   25   468M  31G
13:28:38  1.2K   727     62   727   62     0    0   525   54   469M  31G
13:28:39   815   508     62   508   62     0    0   376   55   469M  31G
13:28:40   994   624     62   624   62     0    0   450   54   469M  31G
13:28:41   783   456     58   456   58     0    0   338   50   470M  31G
13:28:42   916   541     59   541   59     0    0   390   50   470M  31G
13:28:43   768   437     56   437   57     0    0   313   48   471M  31G
13:28:44   877   534     60   534   60     0    0   393   53   470M  31G
13:28:45   957   630     65   630   65     0    0   450   57   470M  31G
13:28:46   819   479     58   479   58     0    0   357   51   471M  31G


On Thu, Mar 2, 2017 at 7:18 PM, Juan Pablo 
wrote:

> hey,
> what are you using for zfs? get an arc status and show please
>
>
> 2017-03-02 9:57 GMT-03:00 Arman Khalatyan :
>
>> no,
>> ZFS itself is not on top of lvm. only the ssd was split by lvm into
>> slog (10G) and cache (the rest),
>> but in any case the ssd does not help much on the glusterfs/ovirt load; it
>> has almost 100% cache misses :( (terrible performance compared with nfs)
>>
>>
>>
>>
>>
>> On Thu, Mar 2, 2017 at 1:47 PM, FERNANDO FREDIANI <
>> fernando.fredi...@upx.com> wrote:
>>
>>> Am I understanding correctly, but you have Gluster on the top of ZFS
>>> which is on the top of LVM ? If so, why the usage of LVM was necessary ? I
>>> have ZFS with any need of LVM.
>>>
>>> Fernando
>>>
>>> On 02/03/2017 06:19, Arman Khalatyan wrote:
>>>
>>> Hi,
>>> I use 3 nodes with zfs and glusterfs.
>>> Are there any suggestions to optimize it?
>>>
>>> host zfs config 4TB-HDD+250GB-SSD:
>>> [root@clei22 ~]# zpool status
>>>   pool: zclei22
>>>  state: ONLINE
>>>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Feb 28 14:16:07
>>> 2017
>>> config:
>>>
>>> NAME                                      STATE     READ WRITE CKSUM
>>> zclei22                                   ONLINE       0     0     0
>>>   HGST_HUS724040ALA640_PN2334PBJ4SV6T1    ONLINE       0     0     0
>>> logs
>>>   lv_slog                                 ONLINE       0     0     0
>>> cache
>>>   lv_cache                                ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> Name:                     GluReplica
>>> Volume ID:                ee686dfe-203a-4caa-a691-26353460cc48
>>> Volume Type:              Replicate (Arbiter)
>>> Replica Count:            2 + 1
>>> Number of Bricks:         3
>>> Transport Types:          TCP, RDMA
>>> Maximum no of snapshots:  256
>>> Capacity:                 3.51 TiB total, 190.56 GiB used, 3.33 TiB free
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] OVirt new cluster host only logins on one iSCSI

2017-03-03 Thread Duarte Fernandes Rocha
Hello,

> > Thanks for your reply. Maybe I did not explain myself correctly.
> > 
> > > > I have a Hosted engine Setup, 2 physical hosts, 1 for virtualization
> > 
> > and
> > 
> > > > another for managing OVirt.
> > > 
> > > The host that runs the engine VM could also run other VMs at the same
> > 
> > time.
> > 
> > > On the other side, only if both the hosts could run the engine VM,
> > > 
> > > ovirt-ha-agent is ensuring HA: HA on a single host is not HA.
> > 
> > The engine is installed on a host with no virtualization capabilities, it
> > is only used for managing the host agents.
>
> Instead, if you directly run engine-setup on bare metal, you are going
> to install the engine directly on bare metal.
> In that case the server where you install the engine has no direct
> access to the LUN used to create the storage domains.

Yes, I run the engine on "bare metal". I do not want this server to see the
storage domain.

I have server-engine, server-host1 and now server-host2.

server-engine has only the OVirt engine
server-host1 is my main server, now running several VMs, with 4 storage domains
via iSCSI
server-host2 is a new server I added to the default cluster. 

All is working well, except that server-host2 has only one iSCSI session
established while server-host1 has two (the storage has two iSCSI interfaces in
active-backup). As a result, server-host1 has two paths to every storage domain
and server-host2 only has one.

I think it could be because, at the time I was adding the server to the cluster
(web UI), I was also allowing it on the storage's iSCSI side, and maybe the
iSCSI target was not available the first time it tried to connect. But I have
already tried removing the host and re-adding it, and it still only creates one
iSCSI session.
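
If it helps, here is a rough sketch of how I would try to bring the second
session up by hand on server-host2 (the portal address below is a placeholder
for the storage's second iSCSI interface, and the IQN is just an example):

# list the sessions currently established
iscsiadm -m session -P 1

# discover targets through the second portal (placeholder address)
iscsiadm -m discovery -t sendtargets -p 192.0.2.12:3260

# log in to the discovered target via that portal (example IQN)
iscsiadm -m node -T iqn.2000-01.com.example:storage.target0 -p 192.0.2.12:3260 --login

# check that multipath now shows two paths per storage domain LUN
multipath -ll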

Regards,

-- 
Duarte Fernandes Rocha 

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Arman Khalatyan
I think there is a bug in the vdsmd checks:

2017-03-03 11:15:42,413 ERROR (jsonrpc/7) [storage.HSM] Could not connect
to storageServer (hsm:2391)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2388, in connectStorageServer
conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 167, in connect
self.getMountObj().getRecord().fs_file)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/mount.py", line 237,
in getRecord
(self.fs_spec, self.fs_file))
OSError: [Errno 2] Mount of `10.10.10.44:/GluReplica` at
`/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica` does not exist
2017-03-03 11:15:42,416 INFO  (jsonrpc/7) [dispatcher] Run and protect:
connectStorageServer, Return response: {'statuslist': [{'status': 100,
'id': u'4b2ea911-ef35-4de0-bd11-c4753e6048d8'}]} (logUtils:52)
2017-03-03 11:15:42,417 INFO  (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call
StoragePool.connectStorageServer succeeded in 2.63 seconds (__init__:515)
2017-03-03 11:15:44,239 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call
Host.getAllVmStats succeeded in 0.00 seconds (__init__:515)

[root@clei21 ~]# df | grep glu
10.10.10.44:/GluReplica.rdma   3770662912 407818240 3362844672  11%
/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica

ls "/rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica"
09f95051-bc93-4cf5-85dc-16960cee74e4  __DIRECT_IO_TEST__
[root@clei21 ~]# touch /rhev/data-center/mnt/glusterSD/10.10.10.44
\:_GluReplica/testme.txt
[root@clei21 ~]# unlink /rhev/data-center/mnt/glusterSD/10.10.10.44
\:_GluReplica/testme.txt
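
If I read the traceback right, vdsm looks up the mount record under the plain
volume spec while the kernel records it with the transport suffix; a quick way
to see the mismatch (just a sketch, nothing oVirt-specific):

# what vdsm expects, taken from the traceback:
#   10.10.10.44:/GluReplica on /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica
# what is actually recorded:
grep GluReplica /proc/mounts
# 10.10.10.44:/GluReplica.rdma /rhev/data-center/mnt/glusterSD/10.10.10.44:_GluReplica fuse.glusterfs ...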



On Fri, Mar 3, 2017 at 11:51 AM, Arman Khalatyan  wrote:

> Thank you all for the nice hints.
> Somehow my host was not able to access the userspace RDMA until I
> installed:
> yum install -y libmlx4.x86_64
>
> I can mount:
> /usr/bin/mount -t glusterfs -o \
>   backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma \
>   10.10.10.44:/GluReplica /mnt
> 10.10.10.44:/GluReplica.rdma   3770662912 407817216 3362845696  11% /mnt
>
> It looks like RDMA and gluster are working, except in the oVirt GUI :(
>
> With  MountOptions:
> backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma
>
> I am not able to activate storage.
>
>
> ---Gluster Status 
> gluster volume status
> Status of volume: GluReplica
> Gluster process                          TCP Port  RDMA Port  Online  Pid
> --------------------------------------------------------------------------
> Brick 10.10.10.44:/zclei22/01/glu        49162     49163      Y       17173
> Brick 10.10.10.42:/zclei21/01/glu        49156     49157      Y       17113
> Brick 10.10.10.41:/zclei26/01/glu        49157     49158      Y       16404
> Self-heal Daemon on localhost            N/A       N/A        Y       16536
> Self-heal Daemon on clei21.vib           N/A       N/A        Y       17134
> Self-heal Daemon on 10.10.10.44          N/A       N/A        Y       17329
>
> Task Status of Volume GluReplica
> 
> --
> There are no active volume tasks
>
>
> -IB status -
>
> ibstat
> CA 'mlx4_0'
> CA type: MT26428
> Number of ports: 1
> Firmware version: 2.7.700
> Hardware version: b0
> Node GUID: 0x002590163758
> System image GUID: 0x00259016375b
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 10
> Base lid: 273
> LMC: 0
> SM lid: 3
> Capability mask: 0x02590868
> Port GUID: 0x002590163759
> Link layer: InfiniBand
>
> Not bad for SDR switch ! :-P
>  qperf clei22.vib  ud_lat ud_bw
> ud_lat:
> latency  =  23.6 us
> ud_bw:
> send_bw  =  981 MB/sec
> recv_bw  =  980 MB/sec
>
>
>
>
> On Fri, Mar 3, 2017 at 9:08 AM, Deepak Naidu  wrote:
>
>> >> As you can see from my previous email that the RDMA connection tested
>> with qperf.
>>
>> I think you have the wrong command. You're testing *TCP, not RDMA*. Also
>> check if you have RDMA & IB modules loaded on your hosts.
>>
>> root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
>> tcp_bw:
>> bw  =  475 MB/sec
>> tcp_lat:
>> latency  =  52.8 us
>> [root@clei26 ~]#
>>
>>
>>
>> *Please run below command to test RDMA*
>>
>>
>>
>> *[root@storageN2 ~]# qperf storageN1 ud_lat ud_bw*
>>
>> *ud_lat**:*
>>
>> *latency  =  7.51 us*
>>
>> *ud_bw**:*
>>
>> *send_bw  =  9.21 GB/sec*
>>
>> *recv_bw  =  9.21 GB/sec*
>>
>> *[root@sc-sdgx-202 ~]#*
>>
>>
>>
>> Read qperf man pages for more info.
>>
>>
>>
>> * To run a TCP bandwidth and latency test:
>>
>> qperf myserver tcp_bw tcp_lat
>>
>> * To run a UDP latency test and then cause the server to terminate:
>>
>> qperf myserver udp_lat quit
>>
>> * To measure the RDMA UD latency and bandwidth:
>>
>> qperf myserver ud_lat ud_bw
>>
>> * To measure RDMA UC bi-directional bandwidth:
>>

Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Maton, Brett
Thanks Simone,

  I'll give that a go this evening, I'm remote at the moment.

Regards,
Brett

On 3 March 2017 at 10:48, Simone Tiraboschi  wrote:

>
>
> On Fri, Mar 3, 2017 at 11:35 AM, Maton, Brett 
> wrote:
>
>> VM Up not responding - Yes that seems to be the case.
>>
>> I did actually try hosted-engine --console
>>
>> hosted-engine --console
>> The engine VM is running on this host
>> Connected to domain HostedEngine
>> Escape character is ^]
>> error: internal error: cannot find character device 
>>
>
> This was the result of https://bugzilla.redhat.com/show_bug.cgi?id=1364132
>
> Could you please install this (it's from 4.1.1 RC)
> http://resources.ovirt.org/pub/ovirt-4.1-pre/rpm/el7/
> noarch/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch.rpm to get it
> fixed.
> You need then to run:
> systemctl restart ovirt-ha-agent
> hosted-engine --vm-shutdown
> # wait for it
> hosted-engine --vm-start
>
> At this point you should have the serial console.
>
>
>>
>>
>>
>> On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:
>>
>>>
>>>
>>> On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
>>> wrote:
>>>
 What is the correct way to shut down a cluster ?

 I shutdown my 4.1 cluster following this guide
 https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oV
 irt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs mounts
 needed rebooting after updates.

 ( I realise the guide is for 3.6, surely not that different ? )

>>>
>>> It seems fine.
>>>
>>>

 However hosted engine isn't starting now.

   failed liveliness check

>>>
>>> So it seems that the engine VM is up but the engine is not responding.
>>>
>>>

 I've tried connecting from the host starting the HE vm to see what's
 going on with:

 virsh console HostedEngine

I don't know what authentication it requires, tried engine admin
 details and root os...

>>>
>>> hosted-engine --console
>>>
>>>

 hosted-engine --add-console-password
 Enter password:
 no graphics devices configured

 unsurprisingly, remote-viewer vnc://localhost:5900
 fails with Unable to connect to the graphic server vnc://localhost:5900

 What can I do next ?

>>>
>>> Check the status of the engine.
>>>
>>>

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


>>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] [Gluster-users] Hot to force glusterfs to use RDMA?

2017-03-03 Thread Arman Khalatyan
Thank you all  for the nice hints.
Somehow my host was not able to access the userspace RDMA until I
installed:
yum install -y libmlx4.x86_64

I can mount:
/usr/bin/mount  -t glusterfs  -o
backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma
10.10.10.44:/GluReplica /mnt
10.10.10.44:/GluReplica.rdma   3770662912 407817216 3362845696  11% /mnt

It looks like RDMA and gluster are working, except in the oVirt GUI :(

With  MountOptions:
backup-volfile-servers=10.10.10.44:10.10.10.42:10.10.10.41,transport=rdma

I am not able to activate storage.
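
For the record, a sketch of what I would check on the gluster side before
blaming the oVirt GUI (this assumes the volume should carry both transports;
changing config.transport requires the volume to be stopped, and the force
start is the same workaround Deepak mentions below):

# confirm which transports the volume actually carries
gluster volume info GluReplica | grep -i transport

# if the transport list has to be changed
gluster volume stop GluReplica
gluster volume set GluReplica config.transport tcp,rdma
gluster volume start GluReplica force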


---Gluster Status 
gluster volume status
Status of volume: GluReplica
Gluster process                          TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------
Brick 10.10.10.44:/zclei22/01/glu        49162     49163      Y       17173
Brick 10.10.10.42:/zclei21/01/glu        49156     49157      Y       17113
Brick 10.10.10.41:/zclei26/01/glu        49157     49158      Y       16404
Self-heal Daemon on localhost            N/A       N/A        Y       16536
Self-heal Daemon on clei21.vib           N/A       N/A        Y       17134
Self-heal Daemon on 10.10.10.44          N/A       N/A        Y       17329

Task Status of Volume GluReplica
--
There are no active volume tasks


-IB status -

ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 1
Firmware version: 2.7.700
Hardware version: b0
Node GUID: 0x002590163758
System image GUID: 0x00259016375b
Port 1:
State: Active
Physical state: LinkUp
Rate: 10
Base lid: 273
LMC: 0
SM lid: 3
Capability mask: 0x02590868
Port GUID: 0x002590163759
Link layer: InfiniBand

Not bad for SDR switch ! :-P
 qperf clei22.vib  ud_lat ud_bw
ud_lat:
latency  =  23.6 us
ud_bw:
send_bw  =  981 MB/sec
recv_bw  =  980 MB/sec
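
In case anyone repeats the test: qperf needs a listener on the far side and,
as far as I know, uses 19765/tcp as its control port by default, so that port
has to be open. A minimal sketch:

# on clei22.vib: start qperf with no arguments, it simply listens
qperf &

# from this host: RDMA UD latency/bandwidth plus an RC streaming test
qperf clei22.vib ud_lat ud_bw rc_bw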




On Fri, Mar 3, 2017 at 9:08 AM, Deepak Naidu  wrote:

> >> As you can see from my previous email that the RDMA connection tested
> with qperf.
>
> I think you have the wrong command. You're testing *TCP, not RDMA*. Also check
> if you have RDMA & IB modules loaded on your hosts.
>
> root@clei26 ~]# qperf clei22.vib  tcp_bw tcp_lat
> tcp_bw:
> bw  =  475 MB/sec
> tcp_lat:
> latency  =  52.8 us
> [root@clei26 ~]#
>
>
>
> *Please run below command to test RDMA*
>
>
>
> *[root@storageN2 ~]# qperf storageN1 ud_lat ud_bw*
>
> *ud_lat**:*
>
> *latency  =  7.51 us*
>
> *ud_bw**:*
>
> *send_bw  =  9.21 GB/sec*
>
> *recv_bw  =  9.21 GB/sec*
>
> *[root@sc-sdgx-202 ~]#*
>
>
>
> Read qperf man pages for more info.
>
>
>
> * To run a TCP bandwidth and latency test:
>
> qperf myserver tcp_bw tcp_lat
>
> * To run a UDP latency test and then cause the server to terminate:
>
> qperf myserver udp_lat quit
>
> * To measure the RDMA UD latency and bandwidth:
>
> qperf myserver ud_lat ud_bw
>
> * To measure RDMA UC bi-directional bandwidth:
>
> qperf myserver rc_bi_bw
>
> * To get a range of TCP latencies with a message size from 1 to 64K
>
> qperf myserver -oo msg_size:1:64K:*2 -vu tcp_lat
>
>
>
>
>
> *Check if you have RDMA & IB modules loaded*
>
>
>
> lsmod | grep -i ib
>
>
>
> lsmod | grep -i rdma
>
>
>
>
>
>
>
> --
>
> Deepak
>
>
>
>
>
>
>
> *From:* Arman Khalatyan [mailto:arm2...@gmail.com]
> *Sent:* Thursday, March 02, 2017 10:57 PM
> *To:* Deepak Naidu
> *Cc:* Rafi Kavungal Chundattu Parambil; gluster-us...@gluster.org; users;
> Sahina Bose
> *Subject:* RE: [Gluster-users] [ovirt-users] Hot to force glusterfs to
> use RDMA?
>
>
>
> Dear Deepak, thank you for the hints. Which gluster version are you using?
>
> As you can see from my previous email, the RDMA connection was tested with
> qperf and is working as expected. In my case the clients are servers as
> well; they are the hosts for oVirt. Disabling SELinux is not recommended by
> oVirt, but I will give it a try.
>
>
>
> On 03.03.2017 at 7:50 AM, "Deepak Naidu" wrote:
>
> I have been testing glusterfs over RDMA & below is the command I use.
> Reading the logs, it looks like your IB (InfiniBand) device is not being
> initialized. I am not sure if you have an issue on the client IB or the
> storage server IB. Also, have you configured your IB devices correctly? I am
> using IPoIB.
>
> Can you check your firewall and disable SELinux? I think you might have
> checked that already.
>
>
>
> *mount -t glusterfs -o transport=rdma storageN1:/vol0 /mnt/vol0*
>
>
>
>
>
> *The below error appears if you have an issue starting your volume.
> I had an issue when my transport was set to tcp,rdma; I had to force-start my
> volume. If I set it only to tcp on the volume, the volume would start
> easily.*
>
>
>
> [2017-03-02 11:49:47.829391] E [MSGID: 114022] [client.c:2530:client_init_rpc]
> 0-GluReplica-client-2: failed to 

Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Simone Tiraboschi
On Fri, Mar 3, 2017 at 11:35 AM, Maton, Brett 
wrote:

> VM Up not responding - Yes that seems to be the case.
>
> I did actually try hosted-engine --console
>
> hosted-engine --console
> The engine VM is running on this host
> Connected to domain HostedEngine
> Escape character is ^]
> error: internal error: cannot find character device 
>

This was the result of https://bugzilla.redhat.com/show_bug.cgi?id=1364132

Could you please install this (it's from 4.1.1 RC)
http://resources.ovirt.org/pub/ovirt-4.1-pre/rpm/el7/noarch/ovirt-hosted-engine-ha-2.1.0.4-1.el7.centos.noarch.rpm
to get it fixed.
You need then to run:
systemctl restart ovirt-ha-agent
hosted-engine --vm-shutdown
# wait for it
hosted-engine --vm-start

At this point you should have the serial console.
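
Once the VM is back up, a quick sanity check could look like this (a sketch;
the last two commands assume you can reach the engine VM over SSH):

# on the host: HA agent view of the engine VM and its liveliness state
hosted-engine --vm-status

# the serial console should attach now
hosted-engine --console

# inside the engine VM, if it boots but the liveliness check still fails
systemctl status ovirt-engine
journalctl -u ovirt-engine -b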


>
>
>
> On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:
>
>>
>>
>> On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
>> wrote:
>>
>>> What is the correct way to shut down a cluster ?
>>>
>>> I shutdown my 4.1 cluster following this guide
>>> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oV
>>> irt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs mounts
>>> needed rebooting after updates.
>>>
>>> ( I realise the guide is for 3.6, surely not that different ? )
>>>
>>
>> It seems fine.
>>
>>
>>>
>>> However hosted engine isn't starting now.
>>>
>>>   failed liveliness check
>>>
>>
>> So it seems that the engine VM is up but the engine is not responding.
>>
>>
>>>
>>> I've tried connecting from the host starting the HE vm to see what's
>>> going on with:
>>>
>>> virsh console HostedEngine
>>>
>>>I don't know what authentication it requires, tried engine admin
>>> details and root os...
>>>
>>
>> hosted-engine --console
>>
>>
>>>
>>> hosted-engine --add-console-password
>>> Enter password:
>>> no graphics devices configured
>>>
>>> unsurprisingly, remote-viewer vnc://localhost:5900
>>> fails with Unable to connect to the graphic server vnc://localhost:5900
>>>
>>> What can I do next ?
>>>
>>
>> Check the status of the engine.
>>
>>
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Maton, Brett
VM Up not responding - Yes that seems to be the case.

I did actually try hosted-engine --console

hosted-engine --console
The engine VM is running on this host
Connected to domain HostedEngine
Escape character is ^]
error: internal error: cannot find character device 



On 3 March 2017 at 08:39, Simone Tiraboschi  wrote:

>
>
> On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
> wrote:
>
>> What is the correct way to shut down a cluster ?
>>
>> I shutdown my 4.1 cluster following this guide
>> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oV
>> irt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs mounts
>> needed rebooting after updates.
>>
>> ( I realise the guide is for 3.6, surely not that different ? )
>>
>
> It seems fine.
>
>
>>
>> However hosted engine isn't starting now.
>>
>>   failed liveliness check
>>
>
> So it seems that the engine VM is up but the engine is not responding.
>
>
>>
>> I've tried connecting from the host starting the HE vm to see what's
>> going on with:
>>
>> virsh console HostedEngine
>>
>>I don't know what authentication it requires, tried engine admin
>> details and root os...
>>
>
> hosted-engine --console
>
>
>>
>> hosted-engine --add-console-password
>> Enter password:
>> no graphics devices configured
>>
>> unsurprisingly, remote-viewer vnc://localhost:5900
>> fails with Unable to connect to the graphic server vnc://localhost:5900
>>
>> What can I do next ?
>>
>
> Check the status of the engine.
>
>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Contributing on ovirt

2017-03-03 Thread shubham dubey
Hello,
I am interested in being part of oVirt for GSoC 2017.
I have looked into the oVirt project ideas, and the project I find most
suitable is "Configuring the backup storage in oVirt".

Since the oVirt online docs have sufficient info for getting started with
development, I don't have questions about that, but I want to clarify one
doubt: are the projects from previous years that are mentioned on the oVirt
GSoC page also available to work on?
I would also appreciate any discussion about the project, or questions from
the mentor side. Even some guidelines for getting started are welcome.

Thanks,
Shubham
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Staring cluster Hosted Engine stuck - failed liveliness check

2017-03-03 Thread Simone Tiraboschi
On Thu, Mar 2, 2017 at 11:42 AM, Maton, Brett 
wrote:

> What is the correct way to shut down a cluster ?
>
> I shutdown my 4.1 cluster following this guide
> https://github.com/rharmonson/richtech/wiki/OSVDC-Series:-oV
> irt-3.6-Cluster-Shutdown-and-Startup as the NAS hosting the nfs mounts
> needed rebooting after updates.
>
> ( I realise the guide is for 3.6, surely not that different ? )
>

It seems fine.


>
> However hosted engine isn't starting now.
>
>   failed liveliness check
>

So it seems that the engine VM is up but the engine is not responding.


>
> I've tried connecting from the host starting the HE vm to see what's going
> on with:
>
> virsh console HostedEngine
>
>I don't know what authentication it requires, tried engine admin
> details and root os...
>

hosted-engine --console


>
> hosted-engine --add-console-password
> Enter password:
> no graphics devices configured
>
> unsurprisingly, remote-viewer vnc://localhost:5900
> fails with Unable to connect to the graphic server vnc://localhost:5900
>
> What can I do next ?
>

Check the status of the engine.


>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users