Re: [ovirt-users] Dedicated NICs for gluster network

2015-11-27 Thread Ivan Bulatovic

Hi Nicolas,

what works for me in 3.6 is creating a new network for gluster within 
oVirt, marking it for gluster use only, optionally setting up a bonded 
interface on the NICs dedicated to gluster traffic, giving it an IP 
address without configuring a gateway, and then modifying /etc/hosts so 
that the hostnames are resolvable between nodes. Every node should have 
two hostnames: one for the ovirtmgmt network that is resolvable via DNS 
(or via /etc/hosts), and another for the gluster network that is 
resolvable purely via /etc/hosts (every node should contain an entry 
for itself and for each gluster node).
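
For illustration, a minimal sketch of what /etc/hosts could look like 
on each node (all hostnames and addresses below are made up, adjust to 
your environment):

  # /etc/hosts on node1 (same entries on every node)
  # ovirtmgmt hostnames, also resolvable via DNS
  10.0.0.1    node1.example.com     node1
  10.0.0.2    node2.example.com     node2
  10.0.0.3    node3.example.com     node3
  # gluster-only hostnames, resolvable purely via /etc/hosts
  10.10.10.1  node1-gl.example.com  node1-gl
  10.10.10.2  node2-gl.example.com  node2-gl
  10.10.10.3  node3-gl.example.com  node3-gl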


Peers should be probed via their gluster hostnames, while ensuring that 
gluster peer status contains only the addresses and hostnames dedicated 
to gluster on each node. The same goes for adding bricks, creating a 
volume, and so on.
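
Something along these lines, with the same made-up gluster hostnames:

  # probe peers via their gluster-only hostnames
  gluster peer probe node2-gl.example.com
  gluster peer probe node3-gl.example.com
  # verify that peer status lists only the gluster hostnames/addresses
  gluster peer status
  # create and start the volume using the gluster hostnames for the bricks
  gluster volume create data replica 3 \
      node1-gl.example.com:/gluster/data/brick1 \
      node2-gl.example.com:/gluster/data/brick1 \
      node3-gl.example.com:/gluster/data/brick1
  gluster volume start data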


This way, no traffic other than gluster's should be allowed through the 
gluster-dedicated VLAN. To be on the safe side, we can also force 
gluster to listen only on the dedicated interfaces via the 
transport.socket.bind-address option (haven't tried this one yet, will 
do).
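
Untested here, as noted above, but the idea would be to add the bind 
address to /etc/glusterfs/glusterd.vol on each node (the address below 
is just an example) and restart glusterd:

  # /etc/glusterfs/glusterd.vol on node1, inside the existing
  # "volume management" block, keeping the shipped options as they are
  option transport.socket.bind-address 10.10.10.1

  systemctl restart glusterd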


Separation of the gluster network (or, in the future, any storage 
network), the live migration network, and the VM and management 
networks is always a good thing. Perhaps we could manage failover of 
those networks within oVirt, i.e. if the live migration network is 
down, use the gluster network for live migration and vice versa. A good 
candidate for an RFE, but first we need this supported within gluster 
itself. This may prove useful when there are not enough NICs available 
to build a bond beneath every defined network; we could still separate 
traffic and provide failover by selecting multiple networks without 
actually doing any load balancing between them.


As Nathanaël mentioned, marking a network for gluster use is only 
available in 3.6. I'm also interested in whether there is a better way 
around this procedure, or perhaps a way to enhance it.


Kind regards,

Ivan

On 11/27/2015 05:47 PM, Nathanaël Blanchet wrote:

Hello Nicolas,

Did you have a look at this: 
http://www.ovirt.org/Features/Select_Network_For_Gluster ?

But it is only available from >=3.6...

Le 27/11/2015 17:02, Nicolas Ecarnot a écrit :

Hello,

[Here: oVirt 3.5.3, 3 x CentOS 7.0 hosts with a replica-3 gluster SD 
on the hosts].


On the switches, I have created a dedicated VLAN to isolate the 
glusterFS traffic, but I'm not using it yet.
I was thinking of creating a dedicated IP for each node's gluster NIC, 
along with a DNS record ("my_nodes_name_GL"), but I fear that using 
this hostname or this IP in the oVirt GUI host network interface tab 
would lead oVirt to think this is a different host.


In case this fear isn't clearly described, let's say:
- On each node, I create a second IP (plus a DNS record in the zone) 
used by gluster, plugged into the correct VLAN
- In the oVirt GUI, in the host network settings tab, the interface 
will be seen with its IP, but its reverse DNS points to a different 
hostname.
Here, I fear oVirt might check this reverse DNS and declare that this 
NIC belongs to another host.
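
A quick way to see exactly what would come back for that interface is 
to compare the forward and reverse lookups (the name below is the one 
from the example above, the domain and address are made up):

  # forward lookup of the dedicated gluster record
  host my_nodes_name_GL.example.com
  # reverse lookup of the gluster IP - the record the concern is about
  host 10.10.10.1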


I would also prefer not to use a reverse record pointing to the name of 
the host's management IP, as this is evil and I'm a good guy.


On your side, how do you cope with a dedicated storage network in the 
case of mixed storage+compute hosts?






___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Problem with Adding Pre-Configured Domain

2015-11-24 Thread Ivan Bulatovic

Hi Roger,

you cannot import the snapshot because you already have an active 
storage domain that contains the same (meta)data. Try detaching the 
storage domain that you take snapshots from and removing it (do not 
select the option to wipe the data), and then try to import the 
snapshot. You will see a warning that the domain is already registered 
within the oVirt engine, and you can force the import to continue. 
After that, you should see the domain registered in the oVirt webadmin. 
Before detaching the domain, make sure you have another domain active 
so it can become the master domain, and create exports of the VMs that 
are part of that domain, just in case.


I had the same problem while trying to import a replicated storage 
domain: you can see that oVirt tries to import the domain, but it just 
returns to the import domain dialog. It actually mounts the domain for 
a few seconds, then disconnects and removes the mount point under 
/rhev/data-center/, and then it tries to unmount it and fails because 
the mount point doesn't exist anymore.
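
If someone wants to reproduce this, the behaviour is easy to observe 
while retrying the import (paths below are the stock vdsm ones):

  # watch the mount point appear and disappear during the import attempt
  watch -n 1 'mount | grep /rhev/data-center/mnt'
  # and follow vdsm's side of the same sequence
  tail -f /var/log/vdsm/vdsm.log | grep -iE 'mount|umount'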


I mentioned it here recently: 
http://lists.ovirt.org/pipermail/users/2015-November/036027.html


Maybe there is a workaround for importing the clone/snapshot of the 
domain while the source is still active (by messing with the metadata), 
but I haven't tried it so far (there are several implications that need 
to be taken into account). However, I'm also interested in whether 
there is a way to do such a thing, especially when you have two 
separate datacenters registered within the same engine; it would be 
great for us to be able to import snapshots/replicated storage domains 
and/or the VMs that reside on them while still having the original VMs 
active and running.


Something similar to the third RFE here (this is only for VM's): 
http://www.ovirt.org/Features/ImportUnregisteredEntities#RFEs


In any case, I'll try this ASAP; it's always an interesting topic. Any 
insight on this is highly appreciated.


Ivan

On 11/24/2015 12:40 PM, Roger Meier wrote:

Hi All,

I don't know if this is a bug or an error on my side.

At the moment, I have an oVirt 3.6 installation with two nodes and two 
storage servers, which are configured as master/slave (a Solaris ZFS 
snapshot is copied from master to slave every 2 hours).


Now I'm trying to run some tests for failure use cases, e.g. the master 
storage isn't available anymore, or one of the virtual machines must be 
restored from the snapshot.

Because the data on the slave is a snapshot copy, all data that is on 
the data domain NFS storage is also on the slave NFS storage.

I tried to add it over the WebUI via the "Import Domain" option (Import 
Pre-Configured Domain) with both domain functions (Data and Export), 
but nothing happens, except for some errors in the vdsm.log logfile.


Something like this:

Thread-253746::ERROR::2015-11-24 11:44:41,758::hsm::2549::Storage.HSM::(disconnectStorageServer) Could not disconnect from storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2545, in disconnectStorageServer
    conObj.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 425, in disconnect
    return self._mountCon.disconnect()
  File "/usr/share/vdsm/storage/storageServer.py", line 254, in disconnect
    self._mount.umount(True, True)
  File "/usr/share/vdsm/storage/mount.py", line 256, in umount
    return self._runcmd(cmd, timeout)
  File "/usr/share/vdsm/storage/mount.py", line 241, in _runcmd
    raise MountError(rc, ";".join((out, err)))
MountError: (32, ';umount: /rhev/data-center/mnt/192.168.1.13:_oi-srv2-sasData1_oi-srv1-sasData1_nfsshare1: mountpoint not found\n')


I checked with nfs-check.py whether all permissions are OK; the tool says this:

[root@lin-ovirt1 contrib]# python ./nfs-check.py 192.168.1.13:/oi-srv2-sasData1/oi-srv1-sasData1/nfsshare1
Current hostname: lin-ovirt1 - IP addr 192.168.1.14
Trying to /bin/mount -t nfs 192.168.1.13:/oi-srv2-sasData1/oi-srv1-sasData1/nfsshare1...
Executing NFS tests..
Removing vdsmTest file..
Status of tests [OK]
Disconnecting from NFS Server..
Done!

Greetings
Roger Meier


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Importing gluster SD from geo-replicated volume

2015-11-19 Thread Ivan Bulatovic
Hi, I have two DCs (both initialized), two nodes each, and on the first 
one I have a replica 2 gluster storage domain that is geo-replicated to 
a replica 2 slave volume on the second DC (managed within the same 
engine). When I stop the replication (the volumes are in sync) and try 
to import the gluster storage domain that resides on the slave, the 
import storage domain dialog throws a general exception.
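
For context, stopping the replication before the import attempt was 
done with the usual geo-replication commands (volume and host names 
below are only examples):

  gluster volume geo-replication mastervol slavenode-gl.example.com::slavevol status
  # once the volumes are in sync, stop the session before importing the slave
  gluster volume geo-replication mastervol slavenode-gl.example.com::slavevol stop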


The exception is raised when vdsm loads the list of backup servers to 
populate the backup-volfile-servers mount option. If I override that in 
storageServer.py so that it always returns an empty value, or when I 
manually enter this option in the import storage domain dialog, then 
everything works as expected.
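
Manually entering the option in the dialog is equivalent to what a 
hand-made mount would look like (hostnames and volume name are 
examples):

  mkdir -p /mnt/slavevol-test
  mount -t glusterfs \
      -o backup-volfile-servers=slavenode2-gl.example.com \
      slavenode1-gl.example.com:/slavevol /mnt/slavevol-test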


Maybe it's worth mentioning that I have a dedicated gluster network and 
hostnames for all nodes in both DCs (the node hostname and the hostname 
I use for gluster on that node are different), and that all attempts to 
import the storage domain were made on the second DC.


Btw, setting up gluster geo-replication from oVirt was a breeze, easy 
and straightforward. Importing a domain based on the slave gluster 
volume works once the gluster storage domain that resides on the master 
volume is removed from the first DC. This is something we could 
improve: if I don't detach and remove the original gluster SD, the 
import storage dialog just shows up again after a short "running 
circle", but it should provide a warning that there is another storage 
domain already active/registered in the engine with the same ID/name 
and that that domain should be removed (or the engine could do it for 
us). I only get this warning once I've already removed the storage 
domain on the master volume from the first DC (which doesn't make sense 
to me).


I can open bug reports for both issues if needed; I just want to check 
whether the rationale behind the process is correct or not.


vdsm-gluster-4.17.10.1-0.el7
ovirt-engine-webadmin-portal-3.6.1-0.0.master.20151117185807.git529d3d2.el7

engine.log

2015-11-19 07:33:15,245 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(default task-23) [34886be8] Correlation ID: null, Call Stack: null, 
Custom Event ID: -1, Message: The error message for connection 
hostname:/volname returned by VDSM was: General Exception
2015-11-19 07:33:15,245 ERROR 
[org.ovirt.engine.core.bll.storage.BaseFsStorageHelper] (default 
task-23) [34886be8] The connection with details 'hostname:/volname' 
failed because of error code '100' and error message is: general exception


vdsm.log

Thread-38::ERROR::2015-11-19 07:33:15,237::hsm::2465::Storage.HSM::(connectStorageServer) Could not connect to storageServer
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in connectStorageServer
    conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 224, in connect
    self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/storageServer.py", line 324, in options
    backup_servers_option = self._get_backup_servers_option()
  File "/usr/share/vdsm/storage/storageServer.py", line 341, in _get_backup_servers_option
    servers.remove(self._volfileserver)
ValueError: list.remove(x): x not in list
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Importing gluster SD from geo-replicated volume

2015-11-19 Thread Ivan Bulatovic



On 11/20/2015 06:33 AM, Sahina Bose wrote:



On 11/19/2015 11:21 PM, Ivan Bulatovic wrote:
Hi, I have two DC's (both initialized), two-node each, and on the 
first one I have a replica 2 gluster storage domain that is 
geo-replicating on a replica 2 slave volume on the second DC (managed 
within the same engine). When I stop the replication (volumes are 
synced) and try to import the gluster storage domain that resides on 
the slave, import storage domain dialog throws a general exception.


Exception is raised when vdsm loads the list of backup servers so 
that the backup-volfile-servers mount option could get populated. If 
I override that in storageServer.py, so that it always return blank, 
or when I manually enter this option in the import storage domain 
dialog, then everything works as expected.


This problem is addressed in Ala's patch - 
https://gerrit.ovirt.org/#/c/48308/
Are there multiple interfaces configured for gluster at the slave 
cluster ?


I've created a separate network and marked it as a gluster network on 
both datacenters, but I haven't used the transport.socket.bind-address 
option to bind gluster to those particular interfaces. Peers were, 
however, probed via hostnames that resolve to the addresses of the 
interfaces that belong to the gluster network, without any aliases.


Nir mentioned this bug a few days back (the one Ala's patch addresses), 
but somehow I managed not to connect the dots. I am using a host in the 
cluster to connect to the volume, but that host is not part of the 
volume info, even though it is the same host, just under a different 
hostname that points to a different network interface on that host.


I'll test the patch and provide feedback.

Thanks, Sahina.
Maybe it's worth mentioning that I have a dedicated gluster network 
and hostnames for all nodes in both DC's (node hostname, and hostname 
I use for gluster on that node are different), and that all attempts 
to import a storage domain were on the second DC.


Btw, setting up gluster geo-replication from oVirt was a breeze, easy 
and straightforward. Importing domain based on slave gluster volume 
works when gluster storage domain that resides on master volume gets 
removed from the first DC. This is something that we could improve, 
if I don't detach and remove original gluster sd, import storage 
dialog just shows up again after a short "running circle", but it 
should provide a warning that there is another storage domain already 
active/registered in the engine with the same ID/name and that the 
domain should be removed (or the engine can do it for us). I get this 
warning only when I've already removed storage domain on a master 
volume from the first DC (which doesn't make sense to me).


Glad to know geo-rep setup was easy.
Regarding the import of storage domain, Nir could help




I can open bug reports for both issues if needed, just want to check 
if the rationale behind the process is correct or not.


vdsm-gluster-4.17.10.1-0.el7
ovirt-engine-webadmin-portal-3.6.1-0.0.master.20151117185807.git529d3d2.el7 



engine.log

2015-11-19 07:33:15,245 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default 
task-23) [34886be8] Correlation ID: null, Call Stack: null, Custom 
Event ID: -1, Message: The error message for connection 
hostname:/volname returned by VDSM was: General Exception
2015-11-19 07:33:15,245 ERROR 
[org.ovirt.engine.core.bll.storage.BaseFsStorageHelper] (default 
task-23) [34886be8] The connection with details 'hostname:/volname' 
failed because of error code '100' and error message is: general 
exception


vdsm.log

Thread-38::ERROR::2015-11-19 
07:33:15,237::hsm::2465::Storage.HSM::(connectStorageServer) Could 
not connect to storageServer

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 2462, in 
connectStorageServer

conObj.connect()
  File "/usr/share/vdsm/storage/storageServer.py", line 224, in connect
self._mount.mount(self.options, self._vfsType, cgroup=self.CGROUP)
  File "/usr/share/vdsm/storage/storageServer.py", line 324, in options
backup_servers_option = self._get_backup_servers_option()
  File "/usr/share/vdsm/storage/storageServer.py", line 341, in 
_get_backup_servers_option

servers.remove(self._volfileserver)
ValueError: list.remove(x): x not in list
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Mix local and shared storage same data center on ovirt3.6 rc?

2015-10-20 Thread Ivan Bulatovic
Hi Jurriën,

I did some tests on ovirt-3.6-snapshot and mixing local and shared 
storage domains is working (with a tweak).

If you opted for the Shared storage type when you created a new data 
center, you can mix NFS/FC/iSCSI shared storage domains with a POSIX 
compliant FS data domain backed by a local storage device (a logical 
drive or partition formatted with ext4/xfs).

The only catch was that I had to chown the device node to vdsm:kvm for 
it to be usable by oVirt (e.g. chown 36:36 /dev/sda2).
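
Roughly what that looked like, assuming a spare partition (the device, 
filesystem and dialog values below only illustrate the point above, 
they are not a recipe):

  mkfs.ext4 /dev/sda2
  # vdsm runs as vdsm:kvm (uid 36, gid 36), so the device node has to be
  # owned by it before oVirt will touch it
  chown 36:36 /dev/sda2
  # then add it in webadmin as a POSIX compliant FS data domain, with
  # Path = /dev/sda2 and VFS type = ext4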

It doesn't work on 3.5.4 though: I changed ownership of the device 
node, then had to change ownership of the ext4 filesystem itself 
residing on the partition, and it still refused to connect, stating 
that not all oVirt nodes can write to it (which was expected).

I hope this helps,

Kind regards,

Ivan

On Monday, October 19, 2015 07:29:06 PM Bloemen, Jurriën wrote:
> Hi,
> 
> I found this message:
> http://lists.ovirt.org/pipermail/users/2013-July/015413.html
 
> Itamar from Red Hat states that they are working on making both
> local and shared storage work on the same host.
 
> Can somebody confirm that this will be the case? Or is there something
> else?
 
> Kind regards,
> 
> Jurriën
> 
> From: Nir Soffer
> Date: Friday 16 October 2015 00:53
> To: Liam Curtis
> Cc: Users@ovirt.org
> Subject: Re: [ovirt-users] Mix local and shared storage same data center on ovirt3.6 rc?
> 
> On 7 Oct 2015 at 8:13 AM, "Liam Curtis" wrote:
> >
> >
> > Hello,
> > Sorry for double-post if the case (not sure if case-sensitive)
> >
> >
> >
> > Loving oVirt... I have reinstalled many a time trying to understand
> > it, and thought I had this working, though now that everything is
> > operating properly it seems this functionality is not possible.
>
> >
> >
> > I am running hosted engine over glusterfs and would also like to use
> > some of the other bricks I have set up on the gluster host, but
> > when I try to create a new gluster cluster in the data center, I get
> > an error message:
>
> >
> >
> > Failed to connect host  to Storage Pool Default. (which
> > makes sense as storage is local, but trying to overcome this
> > limitation)
>
> >
> >
> > I don't want to use just gluster shared storage in the same data
> > center. Any way to work around this?
>
> >
> 
> 
> No, you can not mix local and shared storage.
> 
> For shared storage, the basic assumption is that all hosts see all the
> storage, so you can migrate VMs from any host to any host.

> You can use local storage by exposing it as shared storage, for
> example, by serving local LVs or files as LUNs via targetcli.
 
> Nir
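
For what Nir suggests above, a rough targetcli sketch could look like 
this (the backstore name, LV path and IQNs are all made up):

  # create a block backstore from a local LV
  targetcli /backstores/block create name=vmdata dev=/dev/vg_local/lv_vmdata
  # create an iSCSI target and export the backstore as a LUN
  targetcli /iscsi create iqn.2015-10.com.example:vmdata
  targetcli /iscsi/iqn.2015-10.com.example:vmdata/tpg1/luns create /backstores/block/vmdata
  # allow the initiators of the oVirt hosts
  targetcli /iscsi/iqn.2015-10.com.example:vmdata/tpg1/acls create iqn.1994-05.com.redhat:ovirt-host1
  targetcli saveconfig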
> 
> 
> >
> >
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
> >
> >
> 
> 
> This message (including any attachments) may contain information that
> is privileged or confidential. If you are not the intended recipient,
> please notify the sender and delete this email immediately from your
> systems and destroy all copies of it. You may not, directly or
> indirectly, use, disclose, distribute, print or copy this email or
> any part of it if you are not the intended recipient

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Recommended setup for a FC based storage domain

2014-06-12 Thread Ivan Bulatovic
Thanks, I really hope someone can help, because right now I'm afraid to
thin provision any large volume due to this. I forgot to mention that
when I do the export to the NAS, the node that is SPM at that moment is
running the qemu-img convert process, and the VDSM process on that node
runs wild (400% CPU load from time to time), while qemu-img convert
never occupies more than one thread (100%). There are a few (4-5) nfsd
processes going from 40% to 90%. The CPU load in the engine admin panel
shows that VMs running on the SPM have an abnormally high CPU load
(40-60%), even though they usually run idle at the time. So:

- Running an export when the SPM node does not have any VMs running is
fast (thin or fully provisioned, doesn't matter)
- Running an export when the SPM node has 3-4 VMs running is painfully
slow for thin provisioned VMs
- Running an export when the SPM node has 3-4 VMs running, but the VM
being exported is not thin provisioned, is fast enough, even though the
VMs on the SPM also show an increased CPU load

I have yet to measure I/O utilization in all scenarios, but I'm positive
about what I wrote for the thin provisioned volume when there are VMs
running on the SPM: it goes to 15 MBps max, and only in bursts every
three or four seconds (I measure this on the SPM node because qemu-img
convert runs on that node, even though the VM resides on another node,
with its thin provisioned disk on that node's local storage shared via
NFS).
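
To pin down where the time goes, something like this on the SPM node 
while the export is running should show both the qemu-img read pattern 
and the nfsd load described above (plain iotop/pidstat, nothing oVirt 
specific):

  iotop -obP -n 5 | grep -iE 'qemu-img|nfsd'
  pidstat -u -d -p $(pgrep -f 'qemu-img convert' | head -1) 2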

On Thu, 2014-06-12 at 13:31 +, Sven Kieske wrote:
 CC'ing the devel list, maybe some VDSM and storage people can
 explain this?
 
 On 10.06.2014 12:24, combuster wrote:
  /etc/libvirt/libvirtd.conf and /etc/vdsm/logger.conf
  
  , but unfortunately I may have jumped to conclusions: last weekend,
  that very same thin provisioned VM was running a simple export for
  3 hrs before I killed the process. But I wondered:
  
  1. The process that runs behind the export is qemu-img convert (from
  raw to raw), and running iotop shows that every three or four seconds
  it reads 10-13 MBps and then idles for a few seconds. Run the numbers
  on 100 GB (why it covers the entire 100 GB when only 15 GB of the
  thin volume is used, I still don't get) and you get precisely the
  3-4 hrs estimated time remaining.
  2. When I run the export with the SPM on a node that doesn't have any
  VMs running, the export finishes in approx. 30 min (iotop shows a
  constant 40-70 MBps read speed).
  3. Renicing the I/O priority of the qemu-img process, as well as its
  CPU priority, gave no results; it was still running slow beyond any
  explanation.
  
  Debug logs showed nothing of interest, so I disabled everything more
  verbose than warning, and it suddenly accelerated the export, so I
  had connected the wrong dots.
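
  For reference, the log level changes mentioned above boil down to
  something like this (exact logger/handler names can differ between
  vdsm versions, so treat it as a sketch):

    # /etc/libvirt/libvirtd.conf (3 = WARNING)
    log_level = 3

    # /etc/vdsm/logger.conf - raise the DEBUG levels in the stock file,
    # e.g. for the root logger, leaving the handlers line as shipped
    [logger_root]
    level=WARNING

    # restart the daemons afterwards (EL6 style, matching this setup)
    service libvirtd restart
    service vdsmd restart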
  
  On 06/10/2014 11:18 AM, Andrew Lau wrote:
  Interesting, which files did you modify to lower the log levels?
 
  On Tue, Jun 3, 2014 at 12:38 AM,  combus...@archlinux.us wrote:
   One word of caution so far: when exporting any VM, the node that
   acts as SPM is stressed out to the max. I relieved the stress by a
   certain margin by lowering the libvirtd and vdsm log levels to
   WARNING. That shortened the export procedure by at least five times.
   But the vdsm process on the SPM node still shows high CPU usage, so
   it's best that the SPM node is left with a decent amount of CPU time
   to spare. Also, the export of a VM with a large vdisk capacity and
   thin provisioning enabled (let's say 14 GB used of 100 GB defined)
   took around 50 min over a 10 Gb ethernet interface to a 1 Gb export
   NAS device that was not stressed at all by other processes. When I
   did that export with debug log levels it took 5 hrs :(
 
   So lowering the log levels is a must in a production environment.
   I've deleted the LUN that I exported on the storage (removed it from
   oVirt first), and next weekend I am planning to add a new one,
   export it again on all the nodes and start a few fresh VM
   installations. Things I'm going to look for are partition alignment
   and running them from different nodes in the cluster at the same
   time. I just hope that not all I/O is going to pass through the SPM;
   this is the one thing that bothers me the most.
 
   I'll report back on these results next week, but if anyone has
   experience with this kind of thing or can point me to some
   documentation, that would be great.
 
  On Monday, 2. June 2014. 18.51.52 you wrote:
  I'm curious to hear what other comments arise, as we're analyzing a
  production setup shortly.
 
  On Sun, Jun 1, 2014 at 10:11 PM,  combus...@archlinux.us wrote:
   I need to scratch gluster off because the setup is based on CentOS
   6.5, so essential prerequisites like qemu 1.3 and libvirt 1.0.1 are
   not met. Gluster would still work with EL6; afaik it just won't use
   libgfapi and will instead use a standard mount.
 
  Any info regarding FC storage domain would be appreciated though.
 
  Thanks
 
  Ivan
 
  On Sunday, 1. June 2014. 11.44.33 combus...@archlinux.us wrote:
  Hi,
 
  I have a 4 node cluster setup and my storage options