Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-26 Thread Abhinandan Prateek
It could be a timeout, as in these tickets:
https://issues.apache.org/jira/browse/CLOUDSTACK-9503
https://issues.apache.org/jira/browse/CLOUDSTACK-9569
Either get the patch that increases the timeout, OR add
router.aggregation.command.each.timeout=600 to agent.properties and restart
cloudstack-agent.
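A minimal sketch of that agent.properties edit, in Python. It makes two assumptions. First, the file is a plain key=value properties file; on KVM hosts it usually lives at /etc/cloudstack/agent/agent.properties, but check your installation. Second, set_property is a hypothetical helper name and is not part of CloudStack. The helper replaces an existing uncommented assignment of the key, or appends one if there is none.

```python
import re


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set (hypothetical helper, simple key=value files only)."""
    line = f"{key}={value}"
    # Match an uncommented "key = ..." line; [ \t]* (not \s*) so the
    # pattern never swallows preceding blank lines.
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement stops re.sub from interpreting
        # backslashes in `value` as escape sequences.
        return pattern.sub(lambda _m: line, text)
    # Key not present: append it, terminating the last line first if needed.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

For example: read the file, call set_property(text, "router.aggregation.command.each.timeout", "600"), write the result back, then restart cloudstack-agent.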


From: Linas Žilinskas <li...@host1plus.com>
Reply-To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>
Date: Thursday, 20 October 2016 at 3:29 PM
To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>
Subject: patchviasocket seems to be broken with qemu 2.3(+?)


Hi.

We have made an upgrade to 4.9.

We use custom-built packages with our own patches, which, in my view (I'm the
only one patching them), should not affect the issue I'll describe.

I'm not sure whether we simply didn't notice it before, or whether it's actually
related to something in 4.9.

Basically, our system VMs could not be patched via the qemu socket. The script
simply errored out with a timeout while trying to push the data to the socket.

Executing it manually (with the command line from the logs) gave the same
result. I even tried the old Perl variant, which failed in the same way.
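To illustrate what "pushing the data to the socket" involves: on the host, qemu exposes the VM's channel as a UNIX stream socket, and the script connects to it and writes the boot arguments. The sketch below is not the actual patchviasocket.py. The names push_to_socket, the payload and the timeout value are assumptions made for illustration. If the other end stops draining the socket, connect() or sendall() can block, and with a timeout set that shows up as an OSError (socket.timeout).

```python
import socket


def push_to_socket(sock_path: str, payload: bytes, timeout: float = 5.0) -> bool:
    """Connect to a UNIX stream socket and write `payload` (illustrative sketch)."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)  # bounds both connect() and sendall()
    try:
        s.connect(sock_path)
        s.sendall(payload)
        return True
    except OSError as exc:  # covers socket.timeout, refused or missing socket
        print(f"ERROR: could not write to {sock_path}: {exc}")
        return False
    finally:
        s.close()
```

The socket path to use is whatever appears in the command line logged by the agent. Note that a True result only means the bytes left the host side. As discussed later in the thread, whether the guest ever sees them is a separate question.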

We finally found out that this issue happens only on our HVs that run qemu
2.3.0 from the CentOS 7 Virtualization Special Interest Group repo. The others,
which run qemu 1.5 from the official repos, can patch the system VMs fine.

So I'm wondering: has anyone tested 4.9 on KVM with qemu >= 2.x? Maybe it's
something else specific to our setup, e.g. we run the HVs from a preconfigured
netboot (PXE) image, but that is true of all of them, including those with qemu
1.5, so I have no idea.

Linas Žilinskas
Head of Development

Host1Plus is a division of Digital Energy Technologies Ltd.

26 York Street, London W1U 6PZ, United Kingdom



abhinandan.prat...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue

Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-26 Thread Syahrul Sazli Shaharir

Hi,

Update: after rebooting all hosts over the weekend (which also rebooted all
VMs), the problematic router VM is OK now. Not sure what caused it.


Thanks.

Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-21 Thread Syahrul Sazli Shaharir

On 2016-12-21 23:26, Linas Žilinskas wrote:

At this point I'm not sure what the issue for you could be. Did you
try recreating the failing vrouter?


Yes, multiple times, by destroying it and/or restarting the network; it
failed every time.



Also, just in case, check if there's free disk space on it. We had
some vrouters stuck due to this, and I saw another thread here
discussing it.


Plenty of space in the stuck VM:-

root@r-691-VM:~# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  461M  157M  281M  36% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                    50M  236K   50M   1% /run
/dev/disk/by-uuid/6a0427bc-6052-48de-a4b8-c82d8217ed1d  461M  157M  281M  36% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   207M     0  207M   0% /run/shm
/dev/vda1                                                73M   23M   47M  33% /boot
/dev/vda6                                                92M  5.6M   81M   7% /home
/dev/vda8                                               184M  6.2M  169M   4% /opt
/dev/vda11                                               92M  5.6M   81M   7% /tmp
/dev/vda7                                               751M  493M  219M  70% /usr
/dev/vda9                                               563M  157M  377M  30% /var
/dev/vda10                                              184M  7.2M  168M   5% /var/log


Thanks.



Basically, the /var/log partition fills up because it's relatively small. If
you had issues with that specific router for a while and restarted it
multiple times, the log partition might be full.
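Linas's disk-space check can be scripted. The sketch below parses df output into a map from mount point to Use% and lists the mounts at or above a threshold. The function names and the 90% default are arbitrary choices for illustration. The parser assumes one line per filesystem, which `df -P` guarantees.

```python
def parse_df(output: str) -> dict:
    """Map mount point -> Use% (int) from `df -P` / `df -h` output."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Expect: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # wrapped or otherwise unexpected line
        usage[" ".join(fields[5:])] = int(fields[4].rstrip("%"))
    return usage


def nearly_full(output: str, threshold: int = 90) -> list:
    """Mount points whose Use% is at or above `threshold`."""
    return [mnt for mnt, pct in parse_df(output).items() if pct >= threshold]
```

Run against the r-691-VM output, parse_df reports 5% for /var/log, so a full log partition does not explain this particular router.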

Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-21 Thread Linas Žilinskas
At this point I'm not sure what the issue for you could be. Did you try 
recreating the failing vrouter?


Also, just in case, check if there's free disk space on it. We had some 
vrouters stuck due to this, and I saw another thread here discussing it.


Basically, the /var/log partition fills up because it's relatively small. 
If you had issues with that specific router for a while and restarted it 
multiple times, the log partition might be full.



On 21/12/16 06:35, Syahrul Sazli Shaharir wrote:

On 2016-12-20 17:53, Wei ZHOU wrote:

Hi Syahrul,

Could you upload the /var/log/cloud.log?


Sure:-

Working router VM: http://pastebin.com/hwwk86ve

Non-working router VM: http://pastebin.com/G4nv09ab

Thanks.



-Wei

2016-12-20 3:18 GMT+01:00 Syahrul Sazli Shaharir :


On 2016-12-19 18:10, Syahrul Sazli Shaharir wrote:


On 2016-12-19 17:03, Linas Žilinskas wrote:


From the logs, it doesn't seem that the script times out. "Execution is
successful", so it manages to pass the data over the socket.

I guess the systemvm just doesn't configure itself for some reason.



You are right. I was able to enter the router VM console at some point
during the timeout loops and capture the syslog output during the
loop:-

http://pastebin.com/n37aHeSa



I restarted another network, and that network's router VM could be
recreated, even on the same host as the failed network (both networks have
exactly the same configuration; only the VLAN & subnet differ). Comparing the
two syslog outputs during boot shows that the problematic network's router VM
self-configuration got stuck in vm_dhcp_entry.json.

1. Working network router VM : http://pastebin.com/Y6zpDa6M
2. Non-working network router VM : http://pastebin.com/jzfGMGQB

Thanks.




Also, in my personal tests, I noticed different behaviour with different
kernels. I don't remember the specifics right now, but on some combinations
(qemu / kernel) the socket acted differently. For example, the data was sent
over the socket but wasn't visible inside the VM. Other times the socket would
be stuck on the host side.

So I would suggest testing different kernels (3.x, 4.4.x, 4.8.x), or logging
in to the system VM to see what's happening from the inside.



Will do this next and report back with the results here.

Thanks for your help! :)


On 12/16/16 03:46, Syahrul Sazli Shaharir wrote:


On 2016-12-16 11:27, Syahrul Sazli Shaharir wrote:

On Wed, 26 Oct 2016, Linas Žilinskas wrote:

So after some investigation I've found out that qemu 2.3.0 is indeed broken,
at least in the way CS uses the qemu chardev/socket.

I'm not sure in which specific version it broke, but it was fixed in
2.4.0-rc3, with the commit specifically noting that CloudStack 4.2 was not
working.

qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338

Also attaching the patch from that commit.

For our own purposes I've included the patch in the qemu-kvm-ev package
(2.3.0), and all is well.

Hi,

I am facing the exact same issue on the latest CloudStack 4.9.0.1, on the
latest CentOS 7.3.1611, with the latest qemu-kvm-ev-2.6.0-27.1.el7 package.

The issue initially surfaced following a heartbeat-induced reset of all hosts,
when it was on CS 4.8 @ CentOS 7.0 with the stock qemu-kvm-1.5.3. Since then,
the patchviasocket.pl/py timeouts have persisted for 1 out of 4 router
VMs/networks, even after upgrading to the latest code. (I have checked the
qemu-kvm-ev-2.6.0-27.1.el7 source, and the patched code is pretty much still
intact, as per the 2.4.0-rc3 commit.)

Any help would be greatly appreciated.

Thanks.

(Attached are some debug logs from the host's agent.log)



Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ

Thanks.

--sazli
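The following sketch is based only on what is reported in this thread: qemu 1.5 works, 2.3.0 is broken, and the issue was fixed upstream in 2.4.0-rc3 by commit 4bf1cb03fbc43b0055af60d4ff093d6894aa4338. It is a rough helper that flags an upstream qemu version falling in that known-bad window. The lower bound is an assumption, because nobody in the thread knows where the regression started, so 2.3.0 is used as the default. A version number alone also cannot prove a host is affected, since distribution packages may carry the fix as a backport, as Linas did for 2.3.0. The function names are hypothetical.

```python
import re

FIX_VERSION = "2.4.0-rc3"  # per the thread: fixed upstream in 2.4.0-rc3


def version_key(version: str) -> tuple:
    """Turn '2.3.0', '2.4.0-rc3' or '2.6.0-27.1.el7' into a sortable tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", version)
    if not m:
        raise ValueError(f"unrecognised version: {version!r}")
    major, minor, patch, rc = m.groups()
    # A final release sorts after its release candidates: stage 0 = rc,
    # stage 1 = release. Distro release suffixes like "-27.1.el7" are ignored.
    stage, rc_num = (0, int(rc)) if rc else (1, 0)
    return (int(major), int(minor), int(patch or 0), stage, rc_num)


def in_reported_bad_window(version: str, lower: str = "2.3.0") -> bool:
    """True if lower <= version < 2.4.0-rc3 (the lower bound is an assumption)."""
    return version_key(lower) <= version_key(version) < version_key(FIX_VERSION)
```

With these rules, 2.3.0 is flagged, while 1.5.3 and the 2.6.0-27.1.el7 package are not. That fits Syahrul's finding that the fix is present in 2.6.0 and suggests his stuck router had a different cause.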



Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-20 Thread Syahrul Sazli Shaharir

On 2016-12-20 17:53, Wei ZHOU wrote:

Hi Syahrul,

Could you upload the /var/log/cloud.log?


Sure:-

Working router VM: http://pastebin.com/hwwk86ve

Non-working router VM: http://pastebin.com/G4nv09ab

Thanks.



--
--sazli



Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-20 Thread Wei ZHOU
Hi Linas,

Good to know. It looks like increasing the listening time in
/etc/init.d/cloud-early-config would fix it (the default is 5 tries * 2 seconds,
i.e. about 10 seconds).
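Wei's figures put the guest's listening window at 5 tries with 2 seconds between them, about 10 seconds in total. On a qemu that drops data written while nobody is reading, anything the host sends outside that window is lost. The sketch below shows such a polling loop in Python for illustration only; cloud-early-config itself is a shell script. The names wait_for_boot_args and read_once are hypothetical, and read_once stands in for a non-blocking read of the virtio serial port that returns None when nothing has arrived.

```python
import time


def wait_for_boot_args(read_once, attempts=5, interval=2.0, sleep=time.sleep):
    """Poll read_once() up to `attempts` times, sleeping `interval` s after each miss.

    The total listening window is roughly attempts * interval
    (5 * 2 = 10 s with the defaults). Returns the data, or None.
    """
    for _ in range(attempts):
        data = read_once()
        if data:
            return data
        sleep(interval)
    return None
```

Raising `attempts` or `interval` widens the window, which is the change Wei proposes. The sleep function is injectable so the loop can be exercised without waiting.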

-Wei

2016-12-20 9:23 GMT+01:00 Linas Žilinskas :

> I don't think the issue is the same. As I mentioned in the original report
> and in my findings afterwards, this is specifically a qemu issue, which was
> fixed in 2.4.0-rc3.
>
> The issue was the way qemu exposes the socket used to communicate with the VM.
> It didn't queue data, so unless the VM was listening on /dev/vport.. at the
> time the data was sent, it would never receive it. 2.4.0-rc3 fixed this by
> queueing the sent data, so once sent, it was accessible (only once) when
> the VM checked /dev/vport..
>
> On 19/12/16 10:37, Wei ZHOU wrote:
>
> Hi Linas,
>
> It seems the issue you mentioned has been fixed by the commits for
> https://issues.apache.org/jira/browse/CLOUDSTACK-2823
>
> CloudStack-agent will try to pass the boot args 30 times if the console IP
> is not accessible.
>
> Weird.
> -Wei
>
>
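Wei says the agent retries up to 30 times, which suggests a host-side loop along the lines of the sketch below. This is a hypothetical sketch, not CloudStack's agent code. Here send stands for one attempt to write the boot args, is_reachable for the check against the VM's console/control IP, and the 1-second interval is an assumption.

```python
import time


def send_with_retries(send, is_reachable, max_tries=30, interval=1.0, sleep=time.sleep):
    """Call send() until is_reachable() is true, at most max_tries times.

    Returns the number of sends it took, or None if the VM never became reachable.
    """
    for attempt in range(1, max_tries + 1):
        send()
        if is_reachable():
            return attempt
        sleep(interval)
    return None
```

Against a qemu that does not queue data, each resend is another chance to land while the guest is listening, so retries like this can hide the bug on some hosts without fixing it.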


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-20 Thread Wei ZHOU
Hi Syahrul,

Could you upload the /var/log/cloud.log?

-Wei


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-20 Thread Linas Žilinskas
I don't think the issue is the same. As I mentioned in the original 
report and in my findings afterwards, this is specifically a qemu issue, 
which was fixed in 2.4.0-rc3.

The issue was the way qemu exposes the socket used to communicate with 
the VM. It didn't queue data, so unless the VM was listening on 
/dev/vport.. at the time the data was sent, it would never receive it. 
2.4.0-rc3 fixed this by queueing the sent data, so once sent, it was 
accessible (only once) when the VM checked /dev/vport..
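The toy model below makes the difference Linas describes concrete. It is not qemu code and does not model the real chardev API; ToyChardev and its methods are invented for illustration. With queue_when_closed=False (the pre-2.4.0-rc3 behaviour as described), a host write made before the guest opens the port is lost. With True, the write is delivered exactly once when the guest opens the port.

```python
from collections import deque


class ToyChardev:
    """Toy model of a host-to-guest channel (not qemu's implementation)."""

    def __init__(self, queue_when_closed: bool):
        self.queue_when_closed = queue_when_closed
        self.guest_listening = False
        self._pending = deque()
        self.delivered = []  # what the guest has actually received

    def host_write(self, data: str) -> None:
        if self.guest_listening:
            self.delivered.append(data)
        elif self.queue_when_closed:
            self._pending.append(data)  # kept until the guest opens the port
        # else: silently dropped, the guest never sees it

    def guest_open(self) -> None:
        self.guest_listening = True
        # Hand over any queued data exactly once.
        while self._pending:
            self.delivered.append(self._pending.popleft())
```

In the non-queueing model, the boot args only get through if host_write happens after guest_open. That timing race is why retrying on the host or waiting longer in the guest can mask the problem on some hosts.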



On 19/12/16 10:37, Wei ZHOU wrote:

Hi Linas,

It seems the issue you mentioned has been fixed by the commits for 
https://issues.apache.org/jira/browse/CLOUDSTACK-2823


CloudStack-agent will try to pass the boot args 30 times if the 
console IP is not accessible.


Weird.

-Wei

Linas Žilinskas
Head of Development

Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-19 Thread Syahrul Sazli Shaharir

On 2016-12-19 18:10, Syahrul Sazli Shaharir wrote:

On 2016-12-19 17:03, Linas Žilinskas wrote:

From the logs, it doesn't seem that the script times out. "Execution is
successful", so it manages to pass the data over the socket.

I guess the systemvm just doesn't configure itself for some reason.


You are right. I was able to enter the router VM console at some point
during the timeout loops and capture the syslog output during the
loop:-

http://pastebin.com/n37aHeSa


I restarted another network, and that network's router VM could be 
recreated, even on the same host as the failed network (both networks 
have exactly the same configuration; only the VLAN & subnet differ). 
Comparing the two syslog outputs during boot shows that the problematic 
network's router VM self-configuration got stuck in vm_dhcp_entry.json.


1. Working network router VM : http://pastebin.com/Y6zpDa6M
2. Non-working network router VM : http://pastebin.com/jzfGMGQB

Thanks.




On 2016-10-20 09:59, Linas ?ilinskas wrote:

Hi.

We have made an upgrade to 4.9.

Custom build packages with our own patches, which in my mind (i'm
the only
one patching those) should not affect the issue i'll describe.

I'm not sure whether we didn't notice it before, or it's actually
related
to something in 4.9

Basically our system vm's were unable to be patched via the qemu
socket.
The script simply error'ed out with a timeout while trying to push
the
data to the socket.

Executing it manually (with cmd line from the logs) resulted the
same. I
even tried the old perl variant, which also had same result.

So finally we found out that this issue happens only on our HVs
which run
qemu 2.3.0, from the centos 7 special interest virtualization repo.
Other
ones that run qemu 1.5, from official repos, can patch the system
vms
fine.

So i'm wondering if anyone tested 4.9 with kvm with qemu >= 2.x?
Maybe it
something else special in our setup. e.g. we're running the HVs
from a
preconfigured netboot image (pxe), but all of them, including those
with
qemu 1.5, so i have no idea.

Linas ?ilinskas
Head of Development
website  [1] facebook
 [2] twitter
 [3] linkedin

[4]

Host1Plus is a division of Digital Energy Technologies Ltd.

26 York Street, London W1U 6PZ, United Kingdom



--
--sazli


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-19 Thread Wei ZHOU
Hi Linas,

It seems the issue you mentioned has been fixed by the commits for
https://issues.apache.org/jira/browse/CLOUDSTACK-2823

The CloudStack agent will try to pass the boot args 30 times if the console IP
is not accessible.
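The retry behaviour described above amounts to a bounded retry loop. This is a generic Python sketch, not the agent's actual code; the action callable and the one-second delay are assumptions made for illustration:

```python
import time


def retry(action, attempts=30, delay=1.0, sleep=time.sleep):
    """Call action() until it returns True, at most `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt failed. `sleep` is injectable so the loop can be tested
    without real delays.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try; no sleep after the last one
    return None
```

A loop like this only helps with transient failures: if the host-side push itself succeeds but the system VM never configures itself, repeating the push will not change the outcome.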

Weird.

-Wei


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-19 Thread Syahrul Sazli Shaharir

On 2016-12-19 17:03, Linas Žilinskas wrote:

From the logs it doesn't seem that the script times out. "Execution is
successful", so it manages to pass the data over the socket.

I guess the systemvm just doesn't configure itself for some reason.


You are right. I was able to get into the router VM console at one point
during the timeout loop and capture its syslog output:


http://pastebin.com/n37aHeSa


Also, in my personal tests, I noticed different behaviour with
different kernels. I don't remember the specifics right now, but on some
combinations (qemu / kernel) the socket acted differently. For example,
the data was sent over the socket, but wasn't visible inside the VM.
Other times the socket would be stuck on the host side.

So I would suggest testing different kernels (3.x, 4.4.x, 4.8.x), or
logging in to the system VM to see what's happening from inside.


Will do this next and report the results here.

Thanks for your help! :)



--
--sazli


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-19 Thread Linas Žilinskas
From the logs it doesn't seem that the script times out. "Execution is
successful", so it manages to pass the data over the socket.


I guess the systemvm just doesn't configure itself for some reason.

Also, in my personal tests, I noticed different behaviour with
different kernels. I don't remember the specifics right now, but on some
combinations (qemu / kernel) the socket acted differently. For example,
the data was sent over the socket, but wasn't visible inside the VM.
Other times the socket would be stuck on the host side.


So I would suggest testing different kernels (3.x, 4.4.x, 4.8.x), or
logging in to the system VM to see what's happening from inside.



Linas Žilinskas
Head of Development



Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-15 Thread Syahrul Sazli Shaharir


Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ

Thanks.



--
--sazli


[ HP | Dell | Microsoft | Symantec | Server & Network Infrastructure ]
W : www.modern.com.my


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-12-15 Thread Syahrul Sazli Shaharir

On Wed, 26 Oct 2016, Linas Žilinskas wrote:

So after some investigation I've found out that qemu 2.3.0 is indeed broken, 
at least the way CS uses the qemu chardev/socket.


Not sure in which specific version it happened, but it was fixed in 
2.4.0-rc3, specifically noting that CloudStack 4.2 was not working.


qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338

Also attaching the patch from that commit.


For our own purposes i've included the patch to the qemu-kvm-ev package 
(2.3.0) and all is well.


Hi,

I am facing exactly the same issue on the latest CloudStack 4.9.0.1, on the
latest CentOS 7.3.1611, with the latest qemu-kvm-ev-2.6.0-27.1.el7 package.


The issue first surfaced after a heartbeat-induced reset of all hosts,
while we were still on CS 4.8 with CentOS 7.0 and the stock qemu-kvm-1.5.3.
Since then, the patchviasocket.pl/py timeouts have persisted for 1 out of 4
router VMs/networks, even after upgrading to the latest code. (I have checked
the qemu-kvm-ev-2.6.0-27.1.el7 source, and the patched code is still
essentially intact, as per the 2.4.0-rc3 commit.)


Any help would be greatly appreciated.

Thanks.

(Attached are some debug logs from the host's agent.log)

--sazli








Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-10-28 Thread ilya
Hi Linas

Thank you for posting the solution; I've seen this issue in my lab environment
as well.

Much appreciated.

Regards
ilya

On 10/26/16 4:44 AM, Linas Žilinskas wrote:
> So after some investigation I've found out that qemu 2.3.0 is indeed
> broken, at least the way CS uses the qemu chardev/socket.
> 
> Not sure in which specific version it happened, but it was fixed in
> 2.4.0-rc3, specifically noting that CloudStack 4.2 was not working.
> 
> qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338
> 
> Also attaching the patch from that commit.
> 
> 
> For our own purposes i've included the patch to the qemu-kvm-ev package
> (2.3.0) and all is well.
> 


Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-10-26 Thread Linas Žilinskas
So after some investigation I've found that qemu 2.3.0 is indeed
broken, at least in the way CS uses the qemu chardev/socket.


I'm not sure in which specific version it broke, but it was fixed in
2.4.0-rc3, and the fix specifically notes that CloudStack 4.2 was not working.


qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338

I'm also attaching the patch from that commit.


For our own purposes I've included the patch in the qemu-kvm-ev package
(2.3.0), and all is well.






Linas Žilinskas
Head of Development

From 4bf1cb03fbc43b0055af60d4ff093d6894aa4338 Mon Sep 17 00:00:00 2001
From: Nils Carlson 
Date: Sun, 19 Jul 2015 20:39:56 +
Subject: [PATCH] qemu-char: Fix missed data on unix socket

Commit 812c1057 introduced HUP detection on unix and tcp sockets prior
to a read in tcp_chr_read. This unfortunately broke CloudStack 4.2
which relied on the old behaviour where data on a socket was readable
even if a HUP was present.

A working solution is to properly check the return values from recv,
handling a closed socket once there is no more data to read.

Also enable polling for G_IO_NVAL to ensure the callback is called
for all possible events as these should now be possible to handle
with the improved error detection.

Signed-off-by: Nils Carlson 
Message-Id: <1437338396-22336-1-git-send-email-pyssl...@ludd.ltu.se>
[Do not handle EINTR; use socket_error(). - Paolo]
Signed-off-by: Paolo Bonzini 
---
 qemu-char.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 3200200..d956f8d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -807,7 +807,8 @@ static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_)
     }
 
     if (now_active) {
-        iwp->src = g_io_create_watch(iwp->channel, G_IO_IN | G_IO_ERR | G_IO_HUP);
+        iwp->src = g_io_create_watch(iwp->channel,
+                                     G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL);
         g_source_set_callback(iwp->src, iwp->fd_read, iwp->opaque, NULL);
         g_source_attach(iwp->src, NULL);
     } else {
@@ -2856,12 +2857,6 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     uint8_t buf[READ_BUF_LEN];
     int len, size;
 
-    if (cond & G_IO_HUP) {
-        /* connection closed */
-        tcp_chr_disconnect(chr);
-        return TRUE;
-    }
-
     if (!s->connected || s->max_size <= 0) {
         return TRUE;
     }
@@ -2869,7 +2864,9 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque)
     if (len > s->max_size)
         len = s->max_size;
     size = tcp_chr_recv(chr, (void *)buf, len);
-    if (size == 0) {
+    if (size == 0 ||
+        (size < 0 &&
+         socket_error() != EAGAIN && socket_error() != EWOULDBLOCK)) {
         /* connection closed */
         tcp_chr_disconnect(chr);
     } else if (size > 0) {
-- 
2.9.2
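The rule this patch restores is that data already queued on a socket stays readable after the peer hangs up, and a closed connection is only declared once recv() returns 0 or a real error (EAGAIN/EWOULDBLOCK just means "nothing more right now"). Per the commit message, the 2.3 code checked for HUP before reading, so data from a client that writes and then closes straight away could be dropped. The Python sketch below illustrates the same rule on the client-visible side; it is not qemu's code, and the drain helper is hypothetical:

```python
import socket


def drain(sock):
    """Read everything currently available from a non-blocking socket.

    Returns (data, closed). recv() == b"" means the peer closed the
    connection, BlockingIOError (EAGAIN/EWOULDBLOCK) means no more data
    for now, and any other OSError is treated as a closed connection.
    """
    chunks = []
    while True:
        try:
            buf = sock.recv(4096)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: nothing left to read, peer still connected
            return b"".join(chunks), False
        except OSError:
            # any other socket error: treat the connection as closed
            return b"".join(chunks), True
        if not buf:
            # recv() returned 0 bytes: orderly shutdown, all queued data consumed
            return b"".join(chunks), True
        chunks.append(buf)


if __name__ == "__main__":
    # The writer sends and hangs up before the reader looks at the socket.
    writer, reader = socket.socketpair()
    writer.sendall(b"payload")
    writer.close()
    reader.setblocking(False)
    print(drain(reader))  # -> (b'payload', True)
    reader.close()
```

The demo shows the point of the fix: even though the writer is already gone, reading first still delivers the payload, and the disconnect is only noticed afterwards.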



Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-10-20 Thread Linas Žilinskas

Nope, we're using CentOS 7 with Python 2.7.


And the script itself seems to work; it just gets stuck when trying to
connect to the socket (s.connect(..)).
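Since the hang is in s.connect(..), one defensive option is to give the socket a timeout so that a stuck chardev produces an error instead of blocking indefinitely. This is a minimal sketch; the function name, the 10-second default and the example path are assumptions, and it only makes the failure visible, it does not fix the underlying qemu bug:

```python
import socket


def push_over_unix_socket(path, payload, timeout=10.0):
    """Connect to the UNIX stream socket at `path`, send `payload`, close.

    With a timeout set, a connect or send that blocks for longer than
    `timeout` seconds raises socket.timeout (a subclass of OSError)
    instead of hanging forever.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        s.sendall(payload)
    finally:
        s.close()


# Example with a hypothetical path:
# push_over_unix_socket("/path/to/vm.sock", b"...", timeout=5)
```

A missing socket file also surfaces as an OSError here (FileNotFoundError), so the caller can log it and move on.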




On 2016-10-20 12:38, Rohit Yadav wrote:
If you're using CentOS6 based KVM hosts, can you make sure you've 
argparse installed? The patchviasocket script was changed from perl 
based implementation to a Python based installation that uses a 
library that is not available with Python 2.6.


Try this on kvm host(s):
sudo pip install --upgrade argparse

See, if this fixes the issue.

Regards.

rohit.ya...@shapeblue.com
www.shapeblue.com
@shapeblue





Linas Žilinskas
Head of Development



Re: patchviasocket seems to be broken with qemu 2.3(+?)

2016-10-20 Thread Rohit Yadav
If you're using CentOS 6 based KVM hosts, can you make sure you have argparse
installed? The patchviasocket script was changed from a Perl-based implementation
to a Python-based one that uses a library that is not available with
Python 2.6.

Try this on the KVM host(s):
sudo pip install --upgrade argparse

See if this fixes the issue.
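Before installing anything, it may help to confirm which interpreter runs the script and whether it can import argparse (part of the standard library since Python 2.7). A small sketch; run it with the same python binary that executes patchviasocket.py, since another interpreter on the host may give a different answer:

```python
import importlib
import sys


def missing_modules(names):
    """Return the module names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python %s" % sys.version.split()[0])
    print("missing: %s" % (missing_modules(["argparse"]) or "none"))
```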

Regards.




rohit.ya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue



patchviasocket seems to be broken with qemu 2.3(+?)

2016-10-20 Thread Linas Žilinskas

Hi.

We have made an upgrade to 4.9.

We use custom-built packages with our own patches, which in my mind (I'm
the only one patching them) should not affect the issue I'll describe.


I'm not sure whether we just didn't notice it before, or whether it's
actually related to something in 4.9.


Basically, our system VMs could not be patched via the qemu socket.
The script simply errored out with a timeout while trying to push the
data to the socket.


Executing it manually (with the command line from the logs) gave the same
result. I even tried the old Perl variant, which also had the same result.


So finally we found out that this issue happens only on our HVs which
run qemu 2.3.0, from the CentOS 7 Virtualization SIG repo. The other ones,
which run qemu 1.5 from the official repos, can patch the system VMs
fine.


So I'm wondering whether anyone has tested 4.9 on KVM with qemu >= 2.x.
Maybe it's something else special in our setup, e.g. we're running the
HVs from a preconfigured netboot image (PXE), but that applies to all of
them, including those with qemu 1.5, so I have no idea.



Linas Žilinskas
Head of Development