Re: [update] ACS management unable to connect to xenserver hosts after reboot

2016-02-17 Thread Stephan Seitz
Glenn,

thanks for your reply. Unfortunately the SSVM has been destroyed.

We don't have any firewall in between. ACS and XenServers are located in
the same /22. I've double checked every connection and there's no
iptables or similar in the way.
Instead of the SSVM, I've just verified that the console proxy VM can
connect to port 8250.
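
For anyone who wants to repeat that check without telnet, here is a minimal Python sketch of a TCP reachability test. It is only an illustration: the function name can_connect and the timeout value are my own choices, and it says nothing about whether the agent protocol on top of the socket works, only whether the port accepts a connection.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Run from a system VM, a call such as can_connect("10.97.13.1", 8250) would test the management server address used in this setup; adjust host and port to your environment.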

To me it looks like there's some strange "identity" problem.

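mysql> select * from mshost;
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
| id | msid           | runid         | name             | state | version | service_ip | service_port | last_update         | removed | alert_count |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
|  1 | 57177340185274 | 1455209855143 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-12 16:55:56 | NULL    |           0 |
|  3 | 57177340185273 | 1455639355379 | acs-management-1 | Up    | 4.7.1   | 10.97.13.1 |         9090 | 2016-02-17 11:31:50 | NULL    |           0 |
+----+----------------+---------------+------------------+-------+---------+------------+--------------+---------------------+---------+-------------+
2 rows in set (0.00 sec)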

Indeed, there is (and always has been) only one management host in this
infrastructure.
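
The duplicate can also be spotted programmatically. The following is a minimal sketch in Python, not anything CloudStack ships: it takes rows shaped like the mshost output above (only the columns it needs) and reports every name/service_ip pair that has more than one msid among rows where removed is NULL. The function name find_conflicting_mshosts is hypothetical, and the rows are typed in by hand rather than read from the database.

```python
from collections import defaultdict


def find_conflicting_mshosts(rows):
    """Return {(name, service_ip): [msid, ...]} for active hosts with several msids."""
    groups = defaultdict(set)
    for row in rows:
        # Only consider rows that are not marked as removed (removed IS NULL).
        if row["removed"] is None:
            groups[(row["name"], row["service_ip"])].add(row["msid"])
    # A single management host should map to exactly one msid.
    return {key: sorted(msids) for key, msids in groups.items() if len(msids) > 1}


# The two rows from the mshost table in this mail.
rows = [
    {"id": 1, "msid": 57177340185274, "name": "acs-management-1",
     "service_ip": "10.97.13.1", "removed": None},
    {"id": 3, "msid": 57177340185273, "name": "acs-management-1",
     "service_ip": "10.97.13.1", "removed": None},
]

conflicts = find_conflicting_mshosts(rows)
print(conflicts)
```

With the two rows above, the one name/IP pair is reported with both msid values, which matches the suspicion that the same machine is registered under two identities.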

With SQL dumps at hand, we removed the second row and purged all jobs
related to that id, but after restarting cloudstack-management, the
entry was created again.

Maybe I'm completely wrong, but is it possible that our management host
"thinks" there's another management host responsible for our cluster?

Since we've been fiddling with this for at least two days without any
success, I'm willing to pay for a few consulting hours on it.

cheers,

- Stephan

On Tuesday, 16.02.2016, at 20:39 +, Glenn Wagner wrote:
> Hi Stephan,
> 
> Check that you can telnet to port 8250 on the management server from
> the SSVM, and check that iptables has been set up correctly.
> It looks like a firewall issue on the ACS management server.
> 
> Thanks
> Glenn
> 
> 
> Glenn Wagner
> Senior Consultant, ShapeBlue
> s: +27 21 527 0091 | m: +27 73 917 4111
> e: glenn.wag...@shapeblue.com | w: www.shapeblue.com
> a: 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130
> South Africa
> 
> Shape Blue Ltd is a company incorporated in England & Wales. ShapeBlue
> Services India LLP is a company incorporated in India and is operated
> under license from Shape Blue Ltd. Shape Blue Brasil Consultoria Ltda
> is a company incorporated in Brasil and is operated under license from
> Shape Blue Ltd. ShapeBlue SA Pty Ltd is a company registered by The
> Republic of South Africa and is traded under license from Shape Blue
> Ltd. ShapeBlue is a registered trademark.
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> Any views or opinions expressed are solely those of the author and do
> not necessarily represent those of Shape Blue Ltd or related
> companies. If you are not the intended recipient of this email, you
> must neither take any action based upon its contents, nor copy or show
> it to anyone. Please contact the sender if you believe you have
> received this email in error.
> 

RE: [update] ACS management unable to connect to xenserver hosts after reboot

2016-02-16 Thread Glenn Wagner
Hi Stephan,

Check that you can telnet to port 8250 on the management server from the SSVM,
and check that iptables has been set up correctly.
It looks like a firewall issue on the ACS management server.

Thanks
Glenn





Glenn Wagner
Senior Consultant, ShapeBlue

s: +27 21 527 0091 | m: +27 73 917 4111
e: glenn.wag...@shapeblue.com | w: www.shapeblue.com
a: 2nd Floor, Oudehuis Centre, 122 Main Rd, Somerset West Cape Town 7130
South Africa









[update] ACS management unable to connect to xenserver hosts after reboot

2016-02-16 Thread Stephan Seitz
Hi again!

I think we've found the root cause, but are unable to fix it:

2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-9:null) Seq 6--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId
57177340185273: Req: Resource [Host:1] is unreachable: Host 1: Link is
closed
2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-10:null) Seq 1--1: MgmtId 57177340185273: Req:
Routing to peer
2016-02-16 16:13:22,900 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-11:null) Seq 1--1: MgmtId 57177340185273: Req:
Cancel request received
2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId
57177340185273: Req: Resource [Host:3] is unreachable: Host 3: Link is
closed

Here's a longer excerpt from the logfile during startup:

http://pastebin.com/SftVJCs4

Maybe someone knows how to resolve this? To me it looks like our single
management-host has some kind of identity crisis? 
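
To make the pattern in the excerpt easier to see, here is a small Python sketch that counts which MgmtId values appear in the log and which hosts are reported as unreachable. It is only an illustration: the function name summarize and the regular expressions are my own, derived from the lines quoted above, and the sample lines are the same entries with the mail's line wrapping undone. Other CloudStack log messages may be formatted differently.

```python
import re
from collections import Counter

# Patterns taken from the log lines quoted in this mail.
MGMT_ID = re.compile(r"MgmtId (\d+)")
UNREACHABLE = re.compile(r"Resource \[Host:(\d+)\] is unreachable")


def summarize(log_lines):
    """Return (Counter of MgmtId strings, sorted list of unreachable host ids)."""
    mgmt_ids = Counter()
    unreachable_hosts = set()
    for line in log_lines:
        mgmt = MGMT_ID.search(line)
        if mgmt:
            mgmt_ids[mgmt.group(1)] += 1
        host = UNREACHABLE.search(line)
        if host:
            unreachable_hosts.add(int(host.group(1)))
    return mgmt_ids, sorted(unreachable_hosts)


sample = [
    "2016-02-16 16:13:22,217 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-8:null) Seq 6--1: MgmtId 57177340185273: Req: Routing to peer",
    "2016-02-16 16:13:22,899 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-10:null) Seq 1-4458000681143369786: MgmtId 57177340185273: "
    "Req: Resource [Host:1] is unreachable: Host 1: Link is closed",
    "2016-02-16 16:13:22,905 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] "
    "(AgentManager-Handler-12:null) Seq 3-2144839322535198778: MgmtId 57177340185273: "
    "Req: Resource [Host:3] is unreachable: Host 3: Link is closed",
]

ids, hosts = summarize(sample)
print(ids, hosts)
```

Running the same function over the full management-server log would show whether more than one MgmtId ever appears and which hosts are affected.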


On Tuesday, 16.02.2016, at 15:12 +0100, Stephan Seitz wrote:
> Hi acs gurus!
> 
> We're currently facing a really strange problem after two somewhat
> simple steps.
> 1. Reboot the management node (a second NFS storage is also located
> on it)
> 2. Upgrade 4.7.0 to 4.7.1
> 
> Both steps seemed successful and running, but after a few days I
> noticed the SSVM in "running, not connected" state, so I decided to
> restart the SSVM. That's where all the trouble began...
> 
> I've pasted a somewhat repetitive log excerpt here
> http://pastebin.com/8MM6XUBk
> 
> If I try to (force) reconnect a host, we get huge repetitive log
> entries like those pasted here http://pastebin.com/cNR3TtkG
> 
> CloudMonkey quits with the following response:
> 
> (local)  > reconnect host id=df4182f8-24a0-40ca-9ccc-6489f374cd4c
> Error Connection refused by server: ('Connection aborted.',
> BadStatusLine("''",))
> 
> 
> I've tcpdump'ed the relevant traffic between management and XenServers
> and found simply nothing except some (I assume) unrelated NFS packets.
> 
> Could someone please shed some light on how to fix that?
> 
> Thanks in advance!
> 
> - Stephan