Update: We seem to have found the issue. Infrastructure told me there is a problem that can pause the VMs for anywhere from nanoseconds to several seconds, possibly hundreds of times in a row with only fractions of a second between the pauses. So if the active resource manager pauses, a standby takes over; when the paused manager comes back, we end up with two active managers.
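For illustration only, here is a minimal sketch of how an active resource manager could notice that another instance took over during such a pause, using a heartbeat record in MongoDB. This is not Pulp's actual implementation; the "active_managers" collection, the field names, the database name, and the 30-second lease window are all assumptions made up for the sketch.

    # Hypothetical sketch, not Pulp's real code: before doing work, the
    # active manager checks for a fresher heartbeat from another manager
    # and steps down if it finds one, instead of running in parallel.
    from datetime import datetime, timedelta
    from pymongo import MongoClient

    HEARTBEAT_WINDOW = timedelta(seconds=30)  # assumed lease length

    def should_stay_active(db, my_name):
        now = datetime.utcnow()
        # Is there another manager whose heartbeat is still fresh?
        other = db.active_managers.find_one({
            'name': {'$ne': my_name},
            'last_heartbeat': {'$gte': now - HEARTBEAT_WINDOW},
        })
        if other is not None:
            # A standby became active while we were paused; step down.
            return False
        # Otherwise refresh our own heartbeat record and carry on.
        db.active_managers.update_one(
            {'name': my_name},
            {'$set': {'last_heartbeat': now}},
            upsert=True,
        )
        return True

    # Example usage (database name is an assumption for this sketch):
    db = MongoClient().pulp_database
    if not should_stay_active(db, 'resource_manager@pulp01'):
        raise SystemExit('another resource manager is active, shutting down')

The point is simply that a manager coming back from a long pause would see the fresher heartbeat of the standby that took over and shut itself down rather than run alongside it.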
By that point, the only bug that is actually Pulp-related is that an active resource manager doesn't check whether another manager has already become active.

Regards,
Sebastian

> On 10.01.2018 at 12:37, Sebastian Sonne <[email protected]> wrote:
>
> Hello everyone,
>
> I have two Pulp clusters, each containing three nodes; all systems are up to
> date (Pulp 2.14.3). However, the cluster behavior differs greatly. Let's call
> the working cluster the external one and the broken one the internal one.
>
> The setup: everything is virtualized. Both clusters are distributed over two
> datacenters, but they're on different ESX clusters. All nodes are allowed to
> migrate between hypervisors.
>
> On the external cluster, "celery status" gives me one resource manager; on
> the internal cluster I get either two or three resource managers. As far as I
> understand, I can run the resource manager on all nodes but should only see
> one in celery, because the other two nodes go into standby.
>
> Running "ps fauxwww | grep resource_manage[r]" on the external cluster gives
> me four processes in the whole cluster. The currently active resource manager
> has two processes, and the other nodes have one process each. On the
> internal cluster, however, I get six processes, two on each node.
>
> From my understanding, the external cluster works correctly: the active
> resource manager has one process to communicate with celery and one to do
> the work, while the other two nodes each have only one process to communicate
> with celery and become active in case the currently active resource manager
> goes down.
>
> Oddly enough, celery also seems to lose contact with its own workers:
>
> "Jan 10 08:52:36 pulp02 pulp[101629]: celery.worker.consumer:INFO: missed
> heartbeat from reserved_resource_worker-1@pulp02"
>
> As such, I think we can eliminate the network.
>
> I'm completely stumped and don't even have a real clue of what logs I could
> provide, or where to start looking into things.
>
> Grateful for any help,
> Sebastian

Sebastian Sonne
Systems & Applications (OSA)
noris network AG
Thomas-Mann-Strasse 16−20
90471 Nürnberg
Deutschland
Tel +49 911 9352 1184
Fax +49 911 9352 100

[email protected]
https://www.noris.de - Mehr Leistung als Standard
Vorstand: Ingo Kraupa (Vorsitzender), Joachim Astel, Jürgen Städing
Vorsitzender des Aufsichtsrats: Stefan Schnabel - AG Nürnberg HRB 17689
