I think you are overthinking this a little bit.  Pulp has a "nodes" concept 
that natively replicates repositories from a parent node to one or more child 
nodes.  Here's how we do it:

* One Pulp server in each datacenter
  * "Parent" pulp server in primary DC
  * "Child" pulp server in secondary DCs
* Content is synced to the parent Pulp server from various repositories
* Content is then automatically replicated per a replication schedule to each 
child node
* Clients point to their nearest Pulp server
  * This is done via intelligent DNS (F5 BIG-IP GTM) that hands out the IP 
address for the nearest Pulp server depending on the source of the DNS query.
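
To make the "nearest Pulp server" idea concrete, here is a minimal Python 
sketch of the decision the DNS layer makes.  It is only an illustration of 
the logic, not BIG-IP GTM configuration; the datacenter names, networks and 
server addresses are all made up for the example.

```python
import ipaddress

# Hypothetical data: which client networks belong to which datacenter,
# and which Pulp server serves each datacenter.
DC_NETWORKS = {
    "dc-primary": [ipaddress.ip_network("10.1.0.0/16")],
    "dc-secondary": [ipaddress.ip_network("10.2.0.0/16")],
}
PULP_SERVERS = {
    "dc-primary": "10.1.0.10",    # parent node (assumed address)
    "dc-secondary": "10.2.0.10",  # child node (assumed address)
}
DEFAULT_DC = "dc-primary"  # fallback when the source matches no known network


def nearest_pulp_server(query_source: str) -> str:
    """Return the address of the Pulp server closest to the DNS query source."""
    addr = ipaddress.ip_address(query_source)
    for dc, networks in DC_NETWORKS.items():
        if any(addr in net for net in networks):
            return PULP_SERVERS[dc]
    return PULP_SERVERS[DEFAULT_DC]
```

A query coming from 10.2.5.7 would be answered with the secondary DC's 
server, and a query from an unknown network falls back to the primary.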

I don't see a need to have more than one Pulp server in any given datacenter.  
One server can easily handle the load for one datacenter.  If it goes down, our 
BIG-IP device notices the failure and starts handing out the address of a 
healthy Pulp server in another datacenter.  Our datacenters are very well 
connected, so bandwidth is not a concern.  This scenario requires no shared 
storage or fancy/complicated clustering.
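
The failover behaviour boils down to "hand out the first healthy server, 
nearest first".  A rough Python sketch of that rule follows.  The plain TCP 
connect used as a health check is an assumption for illustration; a real 
monitor would more likely request a URL from Pulp, and the function names 
are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Treat a server as healthy if it accepts a TCP connection (a simplification)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(candidates: Sequence[str],
                is_healthy: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Return the first healthy server; candidates are ordered nearest first."""
    for host in candidates:
        if is_healthy(host):
            return host
    return None  # no healthy server in any datacenter
```

Passing the probe in as a parameter keeps the selection rule separate from 
how health is measured, so the check can be swapped without touching the 
failover logic.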

Also, as of now, Pulp cannot handle Debian-based repositories.

Thanks,

Josh

-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Arnold Bechtoldt
Sent: Wednesday, November 27, 2013 3:07 PM
To: [email protected]
Subject: [Pulp-list] High Available Pulp Setup

Hey,

We want to set up Pulp with a two-site HA concept.

There will be two servers in each of two DCs: two per DC for fast failover 
within a DC, and another (identically configured) two in the second DC to be 
able to work when the first DC is completely down.

Repositories to be mirrored:

* RHEL server with additional repositories/channels
* EPEL
* Foreman (low prio)
* Puppet Labs (yum.puppetlabs.com)
* rpm repos of some hardware vendors
* rpm repos of some software community projects
* several rpm repos of own software

and the same is required for Ubuntu and maybe SLES (ASAP).


Geo-redundant SAN (both DCs) via NFS is available.

If I understand Pulp correctly, pulp-server mainly requires httpd with 
mod_wsgi, mongodb and storage (/var/lib/pulp/contents), and pulp-admin can run 
on any host. pulp-consumer is currently not planned for use.


Apart from the node feature, there are no docs concerning Pulp HA on the web 
(or PEBKAC). I would add some as soon as I am able to.

We have tested Pulp to mirror the repos mentioned above and cloned some, too.
Some questions remain open:

* do I need 4 x independent storage space?
* do I have to manage 2 or 4 pulp servers with the same 
content/sync-tasks/clone-tasks? note: every server must be able to provide 
current mirrors of upstream in a short time (5-10 min) after a failover
* is it expected behaviour that pulp doesn't re-download missing contents to 
/var/lib/pulp/contents/ of a repo (intentionally removed some)?
* is there a way to import contents of a repo (mirror) in another pulp server 
with the same repo settings/parameters?
* does a mongodb replication (master->3 x slave) make sense?

Note: Pulp needs to run on only one system at a time.
Active/Active over both DCs isn't a must. The release of packages from the most 
important mirrors to the consuming hosts will be staged.


Thank you for developing Pulp and for sharing your ideas on this topic.


Arnold

--
Arnold Bechtoldt
IT Engineering & Operations

inovex GmbH

Zur Gießerei 16
D-76227 Karlsruhe
Tel: 07231 31 91 0
Fax: 07231 31 91 91
Mobil: 0173 3181 117
[email protected]
www.inovex.de

Sitz der Gesellschaft: Pforzheim
AG Mannheim, HRB 502126
Geschäftsführer: Stephan Müller


_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list
