Hi,
I was able to solve this.
Background:
I was restricting access to the web GUI using /etc/default/pveproxy like this:
ALLOW_FROM="172.16.15.128/25"
DENY_FROM="all"
POLICY="allow"
172.16.15.128/25 is the "external" network on eth0, but cluster
communication uses 169.254.42.0/24 on eth2. Because that network was
missing from ALLOW_FROM, the pveproxy processes on the other nodes
(second, third, fourth) reset the connections coming in over the cluster network.
Changing /etc/default/pveproxy to
ALLOW_FROM="172.16.15.128/25,169.254.42.0/24"
DENY_FROM="all"
POLICY="allow"
and restarting the service on all nodes restored full operation of the web GUI.
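For anyone checking their own settings, the effect can be sketched in Python as follows. This is only an illustration, not pveproxy's actual code: the function names parse_nets and is_allowed are made up, the decision rule is an assumption based on the match table in the pveproxy man page (a match in only one list decides, otherwise POLICY decides), and 169.254.42.48 merely stands in for a node's address on the cluster network.

```python
import ipaddress


def parse_nets(spec):
    """Turn a comma-separated ALLOW_FROM/DENY_FROM value into network objects.

    Only CIDR notation and the alias "all" are handled here; address ranges
    such as "10.0.0.1-10.0.0.5" are not covered by this sketch.
    """
    nets = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item == "all":
            # "all" is treated as every IPv4 and every IPv6 address.
            nets.append(ipaddress.ip_network("0.0.0.0/0"))
            nets.append(ipaddress.ip_network("::/0"))
        else:
            nets.append(ipaddress.ip_network(item, strict=False))
    return nets


def is_allowed(client, allow_from, deny_from, policy="allow"):
    """Assumed rule: a match in exactly one list decides; otherwise POLICY does."""
    addr = ipaddress.ip_address(client)
    in_allow = any(addr in net for net in parse_nets(allow_from))
    in_deny = any(addr in net for net in parse_nets(deny_from))
    if in_allow != in_deny:
        return in_allow
    return policy == "allow"


if __name__ == "__main__":
    old = "172.16.15.128/25"
    new = "172.16.15.128/25,169.254.42.0/24"
    # Hypothetical cluster-side address of another node.
    print(is_allowed("169.254.42.48", old, "all"))
    print(is_allowed("169.254.42.48", new, "all"))
```

Under that assumed rule, the cluster address is rejected with the original ALLOW_FROM and accepted once 169.254.42.0/24 is listed, which matches the behaviour described above.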
Perhaps someone from Proxmox could add this piece of knowledge to the wiki?
Regards,
Uwe
On 25.02.2017 at 09:23, Uwe Sauter wrote:
> I'm sorry I forgot to mention that I already switched to "transport: udpu".
>
> I tested multicast before creating the cluster. While the first test
> (omping -c 10000 -i 0.001 -F -q px-a px-b px-c px-d) showed no packet loss,
> the second one, which is mentioned at [1]
> (omping -c 600 -i 1 -q px-a px-b px-c px-d), showed roughly 70% loss for multicast:
>
> root@px-b # omping -c 600 -i 1 -q px-a px-b px-c px-d
> […]
> px-a :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.077/0.250/0.443/0.065
> px-a : multicast, xmt/rcv/%loss = 600/182/69%, min/avg/max/std-dev = 0.157/0.280/0.432/0.062
> px-c :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.084/0.236/0.391/0.062
> px-c : multicast, xmt/rcv/%loss = 600/182/69%, min/avg/max/std-dev = 0.153/0.265/0.407/0.057
> px-d :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.080/0.243/0.400/0.066
> px-d : multicast, xmt/rcv/%loss = 600/180/70%, min/avg/max/std-dev = 0.134/0.265/0.401/0.060
>
> As I have no control over the switch in use, I decided to go with UDPU, since we
> don't plan to grow the cluster beyond ~15 nodes.
>
> This is my corosync.conf (I'm using 169.254.42.0/24 for cluster internal
> communication):
>
> ###############
> logging {
>   debug: off
>   logfile: /var/log/corosync/corosync.log
>   timestamp: on
>   to_logfile: yes
>   to_syslog: yes
> }
>
> nodelist {
>   node {
>     name: px-c
>     nodeid: 3
>     quorum_votes: 1
>     ring0_addr: px-c
>   }
>
>   node {
>     name: px-d
>     nodeid: 4
>     quorum_votes: 1
>     ring0_addr: px-d
>   }
>
>   node {
>     name: px-a
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: px-a
>   }
>
>   node {
>     name: px-b
>     nodeid: 2
>     quorum_votes: 1
>     ring0_addr: px-b
>   }
> }
>
> quorum {
>   provider: corosync_votequorum
> }
>
> totem {
>   cluster_name: px-infra
>   config_version: 5
>   ip_version: ipv4
>   secauth: on
>   transport: udpu
>   version: 2
>   interface {
>     bindnetaddr: 169.254.42.48
>     ringnumber: 0
>   }
> }
> ###############
>
> [1]
> https://pve.proxmox.com/wiki/Multicast_notes#Diagnosis_from_first_principles
>
>
> On 25.02.2017 at 06:54, Yannis Milios wrote:
>> In my opinion this is related to difficulties in cluster communication. Have
>> a look at these notes:
>>
>> https://pve.proxmox.com/wiki/Multicast_notes
>>
>>
>>
>> On Fri, 24 Feb 2017 at 22:45, Uwe Sauter wrote:
>>
>> Hi,
>>
>> No, I hadn't thought of that.
>>
>> I have now tried it and restarted pveproxy afterwards, but to no avail.
>>
>> Can you explain why you thought that this might help?
>>
>>
>> Regards,
>>
>> Uwe
>>
>>
>> On 24.02.2017 at 21:28, Gilberto Nunes wrote:
>> > Hi
>> >
>> > Did you try to execute:
>> >
>> > pvecm updatecerts
>> >
>> > on every node?
>> >
>> > 2017-02-24 15:04 GMT-03:00 Uwe Sauter:
>> >
>> > Hi,
>> >
>> > I have a GUI problem with a four-node cluster that I installed recently. I was able
>> > to trace it as far as ext-all.js, but I'm no web developer, so this is where I got stuck.
>> >
>> > Background:
>> > * four node cluster
>> > * each node has two interfaces in use
>> > ** eth0 is 1Gb, used for management and some VM traffic
>> > ** eth2 is 10Gb, used for cluster synchronization, Ceph and more VM traffic
>> > * host names are resolved via /etc/hosts
>> > * let's call the nodes px-a, px-b, px-c, px-d
>> > * Proxmox version 4.4-12/e71b7a74
>> >
>> >
>> > Problem:
>> > When I access the cluster via the web GUI on px-a, I can view all info regarding px-a
>> > without any problems. If I try to view info regarding the other nodes, I almost always
>> > get "connection reset by peer (596)".
>> > If I access the cluster GUI on px-b, I can view that node's info but not the info of the
>> > other nodes.
>> >
>> > I started to migrate VMs to the cluster today. Before that, when the cluster had no
>> > VMs running, access between nodes worked without problems.
>> >
>> >
>> > Debugging:
>> > I was able to trace this using Chrome's developer tools up to the point where
>> > some method inside ext-all.js fails with said "connection reset by peer".
>> >
>> > Detail using a pretty-formatted version of ext-all.js:
>> >
>> > Object (?) Ext.cmd.derive("Ext.data.request.Ajax", Ext.data.request.Base begins at line 11370
>> >
>> > Method "start" begins at line 11394
>> >
>> > Error occurs at line 11409 "h.send(e);"
>> >
>> >
>> > I don't know what causes h.send(e) to fail. Any suggestions as to what could cause this or
>> > how to debug further are appreciated.
>> >
>> > Regards,
>> >
>> > Uwe
>> > _______________________________________________
>> > pve-user mailing list
>> > [email protected]
>> > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>> >