Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread Ravishankar N


On 07/21/2017 11:41 PM, yayo (j) wrote:

Hi,

Sorry to follow up again, but while checking the oVirt interface I've 
found that oVirt reports the "engine" volume as an "arbiter" 
configuration and the "data" volume as a fully replicated volume. Check 
these screenshots:


This is probably some refresh bug in the UI, Sahina might be able to 
tell you.


https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are 
full replicated:



Volume Name: data
Type: Replicate
Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/data/brick
Brick2: gdnode02:/gluster/data/brick
Brick3: gdnode04:/gluster/data/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on





Volume Name: engine
Type: Replicate
Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/engine/brick
Brick2: gdnode02:/gluster/engine/brick
Brick3: gdnode04:/gluster/engine/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on


2017-07-21 19:13 GMT+02:00 yayo (j):


2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishan...@redhat.com>:


But it does say something. All these gfids of completed heals
in the log below are for the ones that you have given the
getfattr output of. So what is likely happening is there is an
intermittent connection problem between your mount and the
brick process, leading to pending heals again after the heal
gets completed, which is why the numbers are varying each
time. You would need to check why that is the case.
Hope this helps,
Ravi




[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on
e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
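
To see whether the pending-heal count really fluctuates over time, a small
polling sketch like the following can help (illustrative only; it assumes the
gluster CLI is installed on the node and that the volume is named "engine" as
in this thread):

import subprocess
import time

# Poll "gluster volume heal <volume> info" and print how many entries are
# still pending per brick, so fluctuations become visible over time.
VOLUME = "engine"   # assumed volume name, as used in this thread

def pending_entries(volume):
    out = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    ).stdout
    counts = {}
    brick = None
    for line in out.splitlines():
        if line.startswith("Brick "):
            brick = line[len("Brick "):].strip()
        elif line.startswith("Number of entries:") and brick is not None:
            counts[brick] = int(line.split(":", 1)[1])
    return counts

while True:
    print(time.strftime("%H:%M:%S"), pending_entries(VOLUME))
    time.sleep(60)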




Hi,

following your suggestion, I've checked the "peer" status and I
found that there are too many names for the hosts; I don't know if
this can be the problem or part of it:

gluster peer status on NODE01:
Number of Peers: 2

Hostname: dnode02.localdomain.local
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
192.168.10.52
dnode02.localdomain.local
10.10.20.90
10.10.10.20

Re: [ovirt-users] ovirt on sdcard?

2017-07-21 Thread Mahdi Adnan
Hello,


Same here, I'm running multiple servers on SD cards without issues.


--

Respectfully
Mahdi A. Mahdi


From: users-boun...@ovirt.org  on behalf of Arsène 
Gschwind 
Sent: Thursday, July 20, 2017 11:32:11 AM
To: users@ovirt.org
Subject: Re: [ovirt-users] ovirt on sdcard?


Hi Lionel,

I've been running such a setup for about 4 months without any problems so far, on 
Cisco UCS Blades.

rgds,
Arsène

On 07/19/2017 09:16 PM, Lionel Caignec wrote:

Hi,

I'm planning to install some new hypervisors (oVirt) and I'm wondering if it's 
possible to get it installed on an SD card.
I know there are write limitations on this kind of storage device.
Is it a viable solution? Is there a tutorial somewhere about tuning oVirt on 
this kind of storage?

Thanks

--
Lionel
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


--

Arsène Gschwind
Fa. Sapify AG im Auftrag der Universität Basel
IT Services
Klingelbergstr. 70 |  CH-4056 Basel  |  Switzerland
Tel. +41 79 449 25 63  |  http://its.unibas.ch 
ITS-ServiceDesk: support-...@unibas.ch | +41 61 
267 14 11
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVIRT 4.1 / iSCSI Multipathing

2017-07-21 Thread Vinícius Ferrão

On 21 Jul 2017, at 15:12, Yaniv Kaul <yk...@redhat.com> wrote:



On Wed, Jul 19, 2017 at 9:13 PM, Vinícius Ferrão <fer...@if.ufrj.br> wrote:
Hello,

I skipped this message entirely yesterday. So is this by design? Because 
the best practices of iSCSI MPIO, as far as I know, recommend two completely 
separate paths. If this can’t be achieved with oVirt, what’s the point of 
running MPIO?

With regular storage it is quite easy to achieve using 'iSCSI bonding'.
I think the Dell storage is a bit different and requires some more 
investigation - or experience with it.
 Y.

Yaniv, thank you for answering this. I’m really hoping that a solution would be 
found.

Actually I’m not running anything from Dell. My storage system is FreeNAS, which 
is pretty standard, and, as far as I know, iSCSI best practice dictates segregated 
networks for proper operation.

All other major virtualization products support iSCSI this way: vSphere, 
XenServer and Hyper-V. So I was really surprised that oVirt (and even RHV; I 
requested a trial yesterday) does not implement iSCSI with the well-known best 
practices.

There’s a picture of the architecture that I took from Google when searching 
for ”mpio best practices”:
https://image.slidesharecdn.com/2010-12-06-midwest-reg-vmug-101206110506-phpapp01/95/nextgeneration-best-practices-for-vmware-and-storage-15-728.jpg?cb=1296301640

And as you can see it’s segregated networks on a machine reaching the same 
target.

In my case, my datacenter has five hypervisor machines, with two NICs dedicated 
to iSCSI. Both NICs connect to different converged Ethernet switches and the 
storage is connected the same way.

So it really does not make sense that the first NIC can reach the second NIC’s 
target. In case of a switch failure the cluster will go down anyway, so 
what’s the point of running MPIO? Right?

Thanks once again,
V.



May we ask for a bug fix or a feature redesign on this?

MPIO is part of my datacenter, and it was originally built for running 
XenServer, but I’m considering the move to oVirt. MPIO isn’t working right and 
this could be a big no-go for me...

I’m willing to wait and hold my DC project if this can be fixed.

Any answer from the redhat folks?

Thanks,
V.

> On 18 Jul 2017, at 11:09, Uwe Laverenz <u...@laverenz.de> wrote:
>
> Hi,
>
>
> Am 17.07.2017 um 14:11 schrieb Devin Acosta:
>
>> I am still troubleshooting the issue, I haven’t found any resolution to my 
>> issue at this point yet. I need to figure out by this Friday otherwise I 
>> need to look at Xen or another solution. iSCSI and oVIRT seems problematic.
>
> The configuration of iSCSI-Multipathing via OVirt didn't work for me either. 
> IIRC the underlying problem in my case was that I use totally isolated 
> networks for each path.
>
> Workaround: to make round robin work you have to enable it by editing 
> "/etc/multipath.conf". Just add the 3 lines for the round robin setting (see 
> comment in the file) and additionally add the "# VDSM PRIVATE" comment to 
> keep vdsmd from overwriting your settings.
>
> My multipath.conf:
>
>
>> # VDSM REVISION 1.3
>> # VDSM PRIVATE
>> defaults {
>>polling_interval5
>>no_path_retry   fail
>>user_friendly_names no
>>flush_on_last_del   yes
>>fast_io_fail_tmo5
>>dev_loss_tmo30
>>max_fds 4096
>># 3 lines added manually for multipathing:
>>path_selector   "round-robin 0"
>>path_grouping_policymultibus
>>failbackimmediate
>> }
>> # Remove devices entries when overrides section is available.
>> devices {
>>device {
>># These settings overrides built-in devices settings. It does not 
>> apply
>># to devices without built-in settings (these use the settings in the
>># "defaults" section), or to devices defined in the "devices" section.
>># Note: This is not available yet on Fedora 21. For more info see
>># https://bugzilla.redhat.com/1253799
>>all_devsyes
>>no_path_retry   fail
>>}
>> }
>
>
>
> To enable the settings:
>
>  systemctl restart multipathd
>
> See if it works:
>
>  multipath -ll
>
>
> HTH,
> Uwe
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] oVIRT 4.1 / iSCSI Multipathing

2017-07-21 Thread Yaniv Kaul
On Wed, Jul 19, 2017 at 9:13 PM, Vinícius Ferrão  wrote:

> Hello,
>
> I skipped this message entirely yesterday. So is this by design?
> Because the best practices of iSCSI MPIO, as far as I know, recommend two
> completely separate paths. If this can’t be achieved with oVirt, what’s the
> point of running MPIO?
>

With regular storage it is quite easy to achieve using 'iSCSI bonding'.
I think the Dell storage is a bit different and requires some more
investigation - or experience with it.
 Y.


> May we ask for a bug fix or a feature redesign on this?
>
> MPIO is part of my datacenter, and it was originally built for running
> XenServer, but I’m considering the move to oVirt. MPIO isn’t working right
> and this could be a big no-go for me...
>
> I’m willing to wait and hold my DC project if this can be fixed.
>
> Any answer from the redhat folks?
>
> Thanks,
> V.
>
> > On 18 Jul 2017, at 11:09, Uwe Laverenz  wrote:
> >
> > Hi,
> >
> >
> > Am 17.07.2017 um 14:11 schrieb Devin Acosta:
> >
> >> I am still troubleshooting the issue, I haven’t found any resolution to
> my issue at this point yet. I need to figure out by this Friday otherwise I
> need to look at Xen or another solution. iSCSI and oVIRT seems problematic.
> >
> > The configuration of iSCSI-Multipathing via OVirt didn't work for me
> either. IIRC the underlying problem in my case was that I use totally
> isolated networks for each path.
> >
> > Workaround: to make round robin work you have to enable it by editing
> "/etc/multipath.conf". Just add the 3 lines for the round robin setting
> (see comment in the file) and additionally add the "# VDSM PRIVATE" comment
> to keep vdsmd from overwriting your settings.
> >
> > My multipath.conf:
> >
> >
> >> # VDSM REVISION 1.3
> >> # VDSM PRIVATE
> >> defaults {
> >>polling_interval5
> >>no_path_retry   fail
> >>user_friendly_names no
> >>flush_on_last_del   yes
> >>fast_io_fail_tmo5
> >>dev_loss_tmo30
> >>max_fds 4096
> >># 3 lines added manually for multipathing:
> >>path_selector   "round-robin 0"
> >>path_grouping_policymultibus
> >>failbackimmediate
> >> }
> >> # Remove devices entries when overrides section is available.
> >> devices {
> >>device {
> >># These settings overrides built-in devices settings. It does
> not apply
> >># to devices without built-in settings (these use the settings
> in the
> >># "defaults" section), or to devices defined in the "devices"
> section.
> >># Note: This is not available yet on Fedora 21. For more info see
> >># https://bugzilla.redhat.com/1253799
> >>all_devsyes
> >>no_path_retry   fail
> >>}
> >> }
> >
> >
> >
> > To enable the settings:
> >
> >  systemctl restart multipathd
> >
> > See if it works:
> >
> >  multipath -ll
> >
> >
> > HTH,
> > Uwe
> > ___
> > Users mailing list
> > Users@ovirt.org
> > http://lists.ovirt.org/mailman/listinfo/users
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
Hi,

Sorry to follow up again, but while checking the oVirt interface I've found
that oVirt reports the "engine" volume as an "arbiter" configuration and the
"data" volume as a fully replicated volume. Check these screenshots:

https://drive.google.com/drive/folders/0ByUV7xQtP1gCTE8tUTFfVmR5aDQ?usp=sharing

But the "gluster volume info" command report that all 2 volume are full
replicated:


*Volume Name: data*
*Type: Replicate*
*Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d*
*Status: Started*
*Snapshot Count: 0*
*Number of Bricks: 1 x 3 = 3*
*Transport-type: tcp*
*Bricks:*
*Brick1: gdnode01:/gluster/data/brick*
*Brick2: gdnode02:/gluster/data/brick*
*Brick3: gdnode04:/gluster/data/brick*
*Options Reconfigured:*
*nfs.disable: on*
*performance.readdir-ahead: on*
*transport.address-family: inet*
*storage.owner-uid: 36*
*performance.quick-read: off*
*performance.read-ahead: off*
*performance.io-cache: off*
*performance.stat-prefetch: off*
*performance.low-prio-threads: 32*
*network.remote-dio: enable*
*cluster.eager-lock: enable*
*cluster.quorum-type: auto*
*cluster.server-quorum-type: server*
*cluster.data-self-heal-algorithm: full*
*cluster.locking-scheme: granular*
*cluster.shd-max-threads: 8*
*cluster.shd-wait-qlength: 1*
*features.shard: on*
*user.cifs: off*
*storage.owner-gid: 36*
*features.shard-block-size: 512MB*
*network.ping-timeout: 30*
*performance.strict-o-direct: on*
*cluster.granular-entry-heal: on*
*auth.allow: **
*server.allow-insecure: on*





*Volume Name: engine*
*Type: Replicate*
*Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515*
*Status: Started*
*Snapshot Count: 0*
*Number of Bricks: 1 x 3 = 3*
*Transport-type: tcp*
*Bricks:*
*Brick1: gdnode01:/gluster/engine/brick*
*Brick2: gdnode02:/gluster/engine/brick*
*Brick3: gdnode04:/gluster/engine/brick*
*Options Reconfigured:*
*nfs.disable: on*
*performance.readdir-ahead: on*
*transport.address-family: inet*
*storage.owner-uid: 36*
*performance.quick-read: off*
*performance.read-ahead: off*
*performance.io-cache: off*
*performance.stat-prefetch: off*
*performance.low-prio-threads: 32*
*network.remote-dio: off*
*cluster.eager-lock: enable*
*cluster.quorum-type: auto*
*cluster.server-quorum-type: server*
*cluster.data-self-heal-algorithm: full*
*cluster.locking-scheme: granular*
*cluster.shd-max-threads: 8*
*cluster.shd-wait-qlength: 1*
*features.shard: on*
*user.cifs: off*
*storage.owner-gid: 36*
*features.shard-block-size: 512MB*
*network.ping-timeout: 30*
*performance.strict-o-direct: on*
*cluster.granular-entry-heal: on*
*auth.allow: **

  server.allow-insecure: on


2017-07-21 19:13 GMT+02:00 yayo (j) :

> 2017-07-20 14:48 GMT+02:00 Ravishankar N :
>
>>
>> But it does  say something. All these gfids of completed heals in the log
>> below are for the ones that you have given the getfattr output of. So
>> what is likely happening is there is an intermittent connection problem
>> between your mount and the brick process, leading to pending heals again
>> after the heal gets completed, which is why the numbers are varying each
>> time. You would need to check why that is the case.
>> Hope this helps,
>> Ravi
>>
>>
>>
>> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
>> sources=[0] 1  sinks=2*
>> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
>> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
>> 0-engine-replicate-0: performing metadata selfheal on
>> f05b9742-2771-484a-85fc-5b6974bcef81*
>> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
>> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
>> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
>> sources=[0] 1  sinks=2*
>>
>>
>
> Hi,
>
> following your suggestion, I've checked the "peer" status and I found that
> there are too many names for the hosts; I don't know if this can be the
> problem or part of it:
>
> *gluster peer status on NODE01:*
> *Number of Peers: 2*
>
> *Hostname: dnode02.localdomain.local*
> *Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *192.168.10.52*
> *dnode02.localdomain.local*
> *10.10.20.90*
> *10.10.10.20*
>
>
>
>
> *gluster peer status on NODE02:*
> *Number of Peers: 2*
>
> *Hostname: dnode01.localdomain.local*
> *Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *gdnode01*
> *10.10.10.10*
>
> *Hostname: gdnode04*
> *Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *192.168.10.54*
> *10.10.10.40*
>
>
> *gluster peer status on NODE04:*
> *Number of Peers: 2*
>
> *Hostname: dnode02.neridom.dom*
> *Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
> *State: Peer in Cluster (Connected)*
> *Other names:*
> *10.10.20.90*
> *gdnode02*
> *192.168.10.52*
> *10.10.10.20*
>
> *Hostname: dnode01.localdomain.local*
> *Uuid: a568bd60

Re: [ovirt-users] workflow suggestion for the creating and destroying the VMs?

2017-07-21 Thread Yaniv Kaul
On Fri, Jul 21, 2017 at 12:06 PM, Arman Khalatyan  wrote:

> Thanks, the downscaling is important for me.
>

It really depends on the guest OS cooperation. While off-lining a CPU is
relatively easy, hot-unplugging memory is a bigger challenge for the OS.

 i was testing something like:
>  1) clone from actual vm(super slow,even if it is 20GB OS, needs more
> investigation,nfs is bottle neck)
> 2) start it with dhcp,
> 3) somehow find the ip
> 4) sync parameters between running vm and new vm.
>
> looks that everything might be possible with the python sdk...
>
> are there some examples or tutorials with cloudinitscripts?
>

https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/start_vm_with_cloud_init.py
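
A condensed sketch along the lines of that example (hypothetical engine URL,
credentials and VM name; requires the ovirt-engine-sdk-python package):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Hypothetical connection details -- replace with your engine URL/credentials.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myvm')[0]      # assumes a VM called "myvm"
vm_service = vms_service.vm_service(vm.id)

# Start the VM once, passing cloud-init settings (hostname, static NIC, script).
vm_service.start(
    use_cloud_init=True,
    vm=types.Vm(
        initialization=types.Initialization(
            host_name='myvm.example.com',
            nic_configurations=[
                types.NicConfiguration(
                    name='eth0',
                    on_boot=True,
                    boot_protocol=types.BootProtocol.STATIC,
                    ip=types.Ip(
                        address='10.10.10.100',
                        netmask='255.255.255.0',
                        gateway='10.10.10.1',
                    ),
                ),
            ],
            custom_script='write_files:\n- path: /etc/motd\n  content: hello\n',
        ),
    ),
)

connection.close()

Note that with use_cloud_init=True the initialization payload is only applied
on that particular start, which fits the clone-and-boot workflow described above.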

But you could also use Ansible, might be even easier:
http://docs.ansible.com/ansible/latest/ovirt_vms_module.html#examples

Y.


>
> Am 21.07.2017 3:58 nachm. schrieb "Yaniv Kaul" :
>
>>
>>
>> On Fri, Jul 21, 2017 at 6:07 AM, Arman Khalatyan 
>> wrote:
>>
>>> Yes, thanks for mentioning puppet, we have foreman for the bare metal
>>> systems.
>>> I was looking something like preboot hook script, to mount the /dev/sda
>>> and copy some stuff there.
>>> Is it possible to do that with cloud-init/sysprep?
>>>
>>
>> It is.
>>
>> However, I'd like to remind you that we also have some scale-up features
>> you might want to consider - you can hot-add CPU and memory to VMs, which
>> in some workloads (but not all) can be helpful and easier.
>> (Hot-removing though is a bigger challenge.)
>> Y.
>>
>>>
>>> On Thu, Jul 20, 2017 at 1:32 PM, Karli Sjöberg 
>>> wrote:
>>>


 Den 20 juli 2017 13:29 skrev Arman Khalatyan :

 Hi,
 Can some one share an experience with dynamic creating and removing VMs
 based on the load?
 Currently I am just creating with the python SDK a clone of the apache
 worker, are there way to copy some config files to the VM before starting
 it ?


 E.g. Puppet could easily swing that sort of job. If you deploy also
 Foreman, it could automate the entire procedure. Just a suggestion

 /K


 Thanks,
 Arman.

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users



>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
2017-07-20 14:48 GMT+02:00 Ravishankar N :

>
> But it does  say something. All these gfids of completed heals in the log
> below are for the ones that you have given the getfattr output of. So
> what is likely happening is there is an intermittent connection problem
> between your mount and the brick process, leading to pending heals again
> after the heal gets completed, which is why the numbers are varying each
> time. You would need to check why that is the case.
> Hope this helps,
> Ravi
>
>
>
> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
> sources=[0] 1  sinks=2*
> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
> 0-engine-replicate-0: performing metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81*
> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
> sources=[0] 1  sinks=2*
>
>

Hi,

following your suggestion, I've checked the "peer" status and I found that
there are too many names for the hosts; I don't know if this can be the
problem or part of it:

*gluster peer status on NODE01:*
*Number of Peers: 2*

*Hostname: dnode02.localdomain.local*
*Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
*State: Peer in Cluster (Connected)*
*Other names:*
*192.168.10.52*
*dnode02.localdomain.local*
*10.10.20.90*
*10.10.10.20*




*gluster peer status on NODE02:*
*Number of Peers: 2*

*Hostname: dnode01.localdomain.local*
*Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
*State: Peer in Cluster (Connected)*
*Other names:*
*gdnode01*
*10.10.10.10*

*Hostname: gdnode04*
*Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828*
*State: Peer in Cluster (Connected)*
*Other names:*
*192.168.10.54*
*10.10.10.40*


*gluster peer status on NODE04:*
*Number of Peers: 2*

*Hostname: dnode02.neridom.dom*
*Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd*
*State: Peer in Cluster (Connected)*
*Other names:*
*10.10.20.90*
*gdnode02*
*192.168.10.52*
*10.10.10.20*

*Hostname: dnode01.localdomain.local*
*Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12*
*State: Peer in Cluster (Connected)*
*Other names:*
*gdnode01*
*10.10.10.10*



All these IPs are pingable and the hosts are resolvable across all 3 nodes, but only
the 10.10.10.0 network is the dedicated network for gluster (resolved
using the gdnode* host names)... Do you think that removing the other entries can
fix the problem? If so, sorry, but how can I remove the other entries?
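
As a purely read-only check, the names glusterd has recorded for each peer can
be listed from its state directory; a sketch (the /var/lib/glusterd/peers path
and the hostnameN= keys are assumptions based on a typical gluster install, and
actually removing entries is a separate, riskier step to verify against the
gluster documentation first):

import glob
import os

# List every address/name glusterd has stored per peer (read-only).
PEER_DIR = "/var/lib/glusterd/peers"   # assumed default location

for path in sorted(glob.glob(os.path.join(PEER_DIR, "*"))):
    names = []
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            if key.startswith("hostname"):
                names.append(value)
    print(os.path.basename(path), "->", ", ".join(names))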

And what about SELinux?

Thank you
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] workflow suggestion for the creating and destroying the VMs?

2017-07-21 Thread Arman Khalatyan
Thanks, the downscaling is important for me.
I was testing something like:
1) clone from an actual VM (super slow, even if it is a 20GB OS; needs more
investigation, NFS is the bottleneck)
2) start it with DHCP,
3) somehow find the IP (see the sketch below)
4) sync parameters between the running VM and the new VM.

It looks like everything might be possible with the Python SDK...
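
For step 3, a sketch of how the guest-reported addresses can be read back
through the SDK (hypothetical connection details and clone name; it assumes the
guest agent is installed in the clone, since the engine only knows the IPs the
agent reports):

import ovirtsdk4 as sdk

# Hypothetical connection details -- same caveats as the cloud-init sketch above.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    ca_file='ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=myclone')[0]    # hypothetical clone name

# Print every IP the guest agent has reported for this VM's devices.
for device in vms_service.vm_service(vm.id).reported_devices_service().list():
    for ip in (device.ips or []):
        print(device.name, ip.address)

connection.close()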

Are there some examples or tutorials with cloud-init scripts?

Am 21.07.2017 3:58 nachm. schrieb "Yaniv Kaul" :

>
>
> On Fri, Jul 21, 2017 at 6:07 AM, Arman Khalatyan 
> wrote:
>
>> Yes, thanks for mentioning puppet, we have foreman for the bare metal
>> systems.
>> I was looking something like preboot hook script, to mount the /dev/sda
>> and copy some stuff there.
>> Is it possible to do that with cloud-init/sysprep?
>>
>
> It is.
>
> However, I'd like to remind you that we also have some scale-up features
> you might want to consider - you can hot-add CPU and memory to VMs, which
> in some workloads (but not all) can be helpful and easier.
> (Hot-removing though is a bigger challenge.)
> Y.
>
>>
>> On Thu, Jul 20, 2017 at 1:32 PM, Karli Sjöberg 
>> wrote:
>>
>>>
>>>
>>> Den 20 juli 2017 13:29 skrev Arman Khalatyan :
>>>
>>> Hi,
>>> Can some one share an experience with dynamic creating and removing VMs
>>> based on the load?
>>> Currently I am just creating with the python SDK a clone of the apache
>>> worker, are there way to copy some config files to the VM before starting
>>> it ?
>>>
>>>
>>> E.g. Puppet could easily swing that sort of job. If you deploy also
>>> Foreman, it could automate the entire procedure. Just a suggestion
>>>
>>> /K
>>>
>>>
>>> Thanks,
>>> Arman.
>>>
>>> ___
>>> Users mailing list
>>> Users@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>>
>>>
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] Problemas with ovirtmgmt network used to connect VMs

2017-07-21 Thread FERNANDO FREDIANI

Has anyone had problems when using the ovirtmgmt bridge to connect VMs?

I am still facing a bizarre problem where some VMs connected to this 
bridge stop passing traffic. Checking the problem further, I see the VM's MAC 
address stops being learned by the bridge, and the problem is resolved 
only with a VM reboot.


When I last saw the problem I ran "brctl showmacs ovirtmgmt" and it showed 
the VM's MAC address with ageing timer 200.19. After the VM reboot I 
see the same MAC with ageing timer 0.00.
I don't see this in another environment where ovirtmgmt is not used 
for VMs.
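
One way to watch for that symptom is to dump the bridge forwarding table
periodically and flag MACs whose ageing timer keeps climbing; a rough sketch
(assumes bridge-utils/brctl is installed and the bridge is named ovirtmgmt;
the 120-second threshold is arbitrary):

import subprocess
import time

# Dump the ovirtmgmt forwarding table every minute and flag non-local entries
# whose ageing timer is high, i.e. the bridge has stopped re-learning that MAC.
BRIDGE = 'ovirtmgmt'
THRESHOLD = 120.0   # seconds without re-learning before an entry is flagged

def fdb_entries(bridge):
    out = subprocess.run(['brctl', 'showmacs', bridge],
                         capture_output=True, text=True, check=True).stdout
    entries = []
    for line in out.splitlines()[1:]:          # skip the header line
        if not line.strip():
            continue
        port, mac, is_local, ageing = line.split()
        if is_local == 'no':                   # only care about VM/remote MACs
            entries.append((mac, float(ageing)))
    return entries

while True:
    for mac, ageing in fdb_entries(BRIDGE):
        if ageing > THRESHOLD:
            print(time.strftime('%H:%M:%S'), mac, 'not re-learned for', ageing, 's')
    time.sleep(60)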


Does anyone have any clue about this type of behavior ?

Fernando
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] workflow suggestion for the creating and destroying the VMs?

2017-07-21 Thread Yaniv Kaul
On Fri, Jul 21, 2017 at 6:07 AM, Arman Khalatyan  wrote:

> Yes, thanks for mentioning puppet, we have foreman for the bare metal
> systems.
> I was looking something like preboot hook script, to mount the /dev/sda
> and copy some stuff there.
> Is it possible to do that with cloud-init/sysprep?
>

It is.

However, I'd like to remind you that we also have some scale-up features
you might want to consider - you can hot-add CPU and memory to VMs, which
in some workloads (but not all) can be helpful and easier.
(Hot-removing though is a bigger challenge.)
Y.

>
> On Thu, Jul 20, 2017 at 1:32 PM, Karli Sjöberg 
> wrote:
>
>>
>>
>> Den 20 juli 2017 13:29 skrev Arman Khalatyan :
>>
>> Hi,
>> Can some one share an experience with dynamic creating and removing VMs
>> based on the load?
>> Currently I am just creating with the python SDK a clone of the apache
>> worker, are there way to copy some config files to the VM before starting
>> it ?
>>
>>
>> E.g. Puppet could easily swing that sort of job. If you deploy also
>> Foreman, it could automate the entire procedure. Just a suggestion
>>
>> /K
>>
>>
>> Thanks,
>> Arman.
>>
>> ___
>> Users mailing list
>> Users@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/users
>>
>>
>>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread Ravishankar N


On 07/21/2017 02:55 PM, yayo (j) wrote:
2017-07-20 14:48 GMT+02:00 Ravishankar N:



But it does say something. All these gfids of completed heals in
the log below are for the ones that you have given the
getfattr output of. So what is likely happening is there is an
intermittent connection problem between your mount and the brick
process, leading to pending heals again after the heal gets
completed, which is why the numbers are varying each time. You
would need to check why that is the case.
Hope this helps,
Ravi




[2017-07-20 09:58:46.573079] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed data selfheal on
e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026]
[afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
0-engine-replicate-0: performing metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026]
[afr-self-heal-common.c:1254:afr_log_selfheal]
0-engine-replicate-0: Completed metadata selfheal on
f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2




Hi,

But we have 2 gluster volumes on the same network and the other one 
(the "data" volume) doesn't have any problems. Why do you think there is a 
network problem?


Because pending self-heals come into the picture when I/O from the 
clients (mounts) does not succeed on some bricks. They are mostly due to

(a) the client losing connection to some bricks (likely),
(b) the I/O failing on the bricks themselves (unlikely).

If most of the I/O is also going to the 3rd brick (since you say the 
files are already present on all bricks and I/O is successful), then it 
is likely to be (a).



How to check this on a gluster infrastructure?

In the fuse mount logs for the engine volume, check if there are any 
messages for brick disconnects. Something along the lines of 
"disconnected from volname-client-x".
Just guessing here, but maybe even the 'data' volume did experience 
disconnects and self-heals later but you did not observe it when you ran 
heal info. See the glustershd log or mount log for self-heal 
completion messages on 0-data-replicate-0 also.
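
A quick way to do that check is to scan the client (fuse mount) logs and the
self-heal daemon log for those messages; a sketch, assuming the default
/var/log/glusterfs location (the exact mount-log file names depend on the
mount point):

import glob
import re

# Look for client-side brick disconnects and for self-heal completions in all
# gluster logs, including glustershd.log (self-heal daemon) and the fuse mount
# logs named after the mount point.
pattern = re.compile(
    r"disconnected from .*-client-\d+"
    r"|Completed (data|metadata|entry) selfheal"
)

for logfile in sorted(glob.glob("/var/log/glusterfs/*.log")):
    with open(logfile, errors="replace") as f:
        for line in f:
            if pattern.search(line):
                print(logfile + ": " + line.rstrip())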


Regards,
Ravi

Thank you





___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] workflow suggestion for the creating and destroying the VMs?

2017-07-21 Thread Arman Khalatyan
Yes, thanks for mentioning Puppet; we have Foreman for the bare-metal
systems.
I was looking for something like a pre-boot hook script, to mount /dev/sda and
copy some stuff there.
Is it possible to do that with cloud-init/sysprep?

On Thu, Jul 20, 2017 at 1:32 PM, Karli Sjöberg  wrote:

>
>
> Den 20 juli 2017 13:29 skrev Arman Khalatyan :
>
> Hi,
> Can some one share an experience with dynamic creating and removing VMs
> based on the load?
> Currently I am just creating with the python SDK a clone of the apache
> worker, are there way to copy some config files to the VM before starting
> it ?
>
>
> E.g. Puppet could easily swing that sort of job. If you deploy also
> Foreman, it could automate the entire procedure. Just a suggestion
>
> /K
>
>
> Thanks,
> Arman.
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] ovirt 4.1 hosted engine hyper converged on glusterfs 3.8.10 : "engine" storage domain alway complain about "unsynced" elements

2017-07-21 Thread yayo (j)
2017-07-20 14:48 GMT+02:00 Ravishankar N :

>
> But it does  say something. All these gfids of completed heals in the log
> below are for the ones that you have given the getfattr output of. So
> what is likely happening is there is an intermittent connection problem
> between your mount and the brick process, leading to pending heals again
> after the heal gets completed, which is why the numbers are varying each
> time. You would need to check why that is the case.
> Hope this helps,
> Ravi
>
>
>
> *[2017-07-20 09:58:46.573079] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327.
> sources=[0] 1  sinks=2*
> *[2017-07-20 09:59:22.995003] I [MSGID: 108026]
> [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do]
> 0-engine-replicate-0: performing metadata selfheal on
> f05b9742-2771-484a-85fc-5b6974bcef81*
> *[2017-07-20 09:59:22.999372] I [MSGID: 108026]
> [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0:
> Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81.
> sources=[0] 1  sinks=2*
>
>

Hi,

But we have 2 gluster volumes on the same network and the other one (the
"data" volume) doesn't have any problems. Why do you think there is a network
problem? How do I check this on a gluster infrastructure?

Thank you
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users