I think you are overthinking this a little bit. Pulp has a "nodes" concept which provides native replication of repositories from a parent node to one or more child nodes. Here's how we do it:
* One Pulp server in each datacenter
* "Parent" Pulp server in the primary DC
* "Child" Pulp servers in the secondary DCs
* Content is synced to the parent Pulp server from various repositories
* Content is then automatically replicated to each child node on a replication schedule
* Clients point to their nearest Pulp server
  * This is done via intelligent DNS (F5 BIG-IP GTM) that hands out the IP address of the nearest Pulp server depending on the source of the DNS query.

I don't see a need to have more than one Pulp server in any given datacenter. One server can easily handle the load for one datacenter. If it goes down, our BIG-IP device notices the failure and starts handing out the address of a healthy Pulp server in another datacenter. Our datacenters are very well connected, so bandwidth is not a concern. This scenario requires no shared storage or fancy/complicated clustering.

Also, as of now, Pulp cannot handle Debian-based repositories.

Thanks,
Josh

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Arnold Bechtoldt
Sent: Wednesday, November 27, 2013 3:07 PM
To: [email protected]
Subject: [Pulp-list] High Available Pulp Setup

Hey,

we want to set up Pulp with a two-side HA concept. There will be two servers in each of two DCs: two per DC for a fast failover within a DC, and another (identically configured) two in the second DC so that we can keep working when the first DC is completely down.

Repositories to be mirrored:

* RHEL server with additional repositories/channels
* EPEL
* Foreman (low prio)
* Puppet Labs (yum.puppetlabs.com)
* rpm repos of some hardware vendors
* rpm repos of some software community projects
* several rpm repos of our own software

and the same is required for Ubuntu and maybe SLES (ASAP).

Geo-redundant SAN (both DCs) via NFS is available.

If I understand Pulp correctly, it mainly requires httpd with mod_wsgi, mongodb and storage (/var/lib/pulp/contents) for pulp-server, and any host for pulp-admin. pulp-consumer is currently not planned for use.

Apart from the node feature, there are no docs concerning Pulp HA on the web (or PEBKAC). I would add some as soon as I am able to.

We have tested Pulp by mirroring the repos mentioned above and cloning some, too. Some questions remain open:

* do I need 4 x independent storage space?
* do I have to manage 2 or 4 Pulp servers with the same content/sync-tasks/clone-tasks? note: every server must be able to provide current mirrors of upstream within a short time (5-10 min) after a failover
* is it expected behaviour that Pulp doesn't re-download missing contents of a repo to /var/lib/pulp/contents/ (I intentionally removed some)?
* is there a way to import the contents of a repo (mirror) into another Pulp server with the same repo settings/parameters?
* does a mongodb replication (master -> 3 x slave) make sense? notice: Pulp needs to run on only one system at a time.

Active/Active over both DCs isn't a must. The release of packages from the most important mirrors to the consuming hosts will be staged.

Thank you for developing Pulp and for sharing your ideas on this topic.

Arnold

--
Arnold Bechtoldt
IT Engineering & Operations

inovex GmbH
Zur Gießerei 16
D-76227 Karlsruhe
Tel: 07231 31 91 0
Fax: 07231 31 91 91
Mobil: 0173 3181 117
[email protected]
www.inovex.de

Sitz der Gesellschaft: Pforzheim
AG Mannheim, HRB 502126
Geschäftsführer: Stephan Müller

_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list
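
For readers who want the failover behaviour from the reply at the top of this thread spelled out: the rough Python sketch below models how a DNS layer might hand out the nearest healthy Pulp server, with one server per datacenter. It is only an illustration of the selection logic, not F5 BIG-IP GTM configuration or any Pulp API. The `PulpServer` class, the `pick_server` function, the datacenter names and the IP addresses (taken from documentation-only ranges) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PulpServer:
    """Hypothetical record for one Pulp server (one per datacenter)."""
    name: str
    datacenter: str
    ip: str
    healthy: bool = True


def pick_server(
    client_dc: str,
    servers: List[PulpServer],
    preference: Dict[str, List[str]],
) -> Optional[PulpServer]:
    """Return the nearest healthy server for a client in client_dc.

    preference maps a client datacenter to datacenters ordered from
    nearest to farthest. Returns None if every candidate is down.
    """
    by_dc = {s.datacenter: s for s in servers}
    for dc in preference.get(client_dc, []):
        server = by_dc.get(dc)
        if server is not None and server.healthy:
            return server
    return None


# Assumed example topology: parent in dc1, child in dc2.
servers = [
    PulpServer("pulp-parent", "dc1", "192.0.2.10"),
    PulpServer("pulp-child", "dc2", "198.51.100.10"),
]
preference = {"dc1": ["dc1", "dc2"], "dc2": ["dc2", "dc1"]}

# Normal operation: a dc1 client is sent to the dc1 server.
normal = pick_server("dc1", servers, preference)

# dc1 server marked unhealthy: the same client is sent to dc2 instead.
degraded = [
    PulpServer("pulp-parent", "dc1", "192.0.2.10", healthy=False),
    PulpServer("pulp-child", "dc2", "198.51.100.10"),
]
failover = pick_server("dc1", degraded, preference)
```

In a real deployment the health flag would come from the load balancer's own monitoring, and the preference order from its topology rules; both are simply hard-coded here.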
