Hello pulp community! I'm investigating using pulp for my company's primary
artifact management system, and am looking for some architectural and
configuration recommendations, given our use case and needs:

   - use the Debian, Python, container, and file content plugins
   - produce and consume private content
   - use remotes to implement mirroring of "upstream" public Debian, PyPI,
   and Docker repos for performance and stability reasons (probably using the
   excellent *on demand* feature, and filtering for our arches and
   distributions of interest; see the sketch after this list)
   - replicate our initial pulp instance to new instances at multiple
   sites, implementing scheduled synchronization of all content
   - our file content will dwarf the other plugins' content, measured in
   TB rather than GB
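
For the mirroring bullet, here's roughly what I picture one of those
on-demand remotes looking like, expressed with squeezer's file modules
(module and parameter names are my best reading of the squeezer docs,
and I'm assuming the deb/python/container equivalents look analogous):

    - name: Mirror an upstream file repo lazily
      pulp.squeezer.file_remote:
        pulp_url: https://pulp.example.com
        username: admin
        password: "{{ pulp_admin_password }}"
        name: upstream-files
        url: https://upstream.example.org/files/PULP_MANIFEST
        policy: on_demand    # fetch artifacts only on first request
        state: present

    - name: Pull the upstream metadata into our repository
      pulp.squeezer.file_sync:
        pulp_url: https://pulp.example.com
        username: admin
        password: "{{ pulp_admin_password }}"
        remote: upstream-files
        repository: upstream-files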

What is your recommendation for configuring a new pulp instance to support
quick crash recovery (backup/restore) and replication to other pulp
instances at multiple sites, with all sites keeping their content
synchronized with each other?

Given that I am using ansible (pulp_installer) for all provisioning and
configuration of my pulp instances (and perhaps also using pulp squeezer
<https://github.com/pulp/squeezer> to manage our repositories), which storage
entities need to be backed up/restored/replicated in a pulp instance?
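
For context, the relevant slice of my playbook vars looks something like
this (variable names as I understand them from the pulp_installer docs;
the vaulted secrets are placeholders):

    pulp_default_admin_password: "{{ vault_pulp_admin_password }}"
    pulp_install_plugins:
      pulp-deb: {}
      pulp-python: {}
      pulp-container: {}
      pulp-file: {}
    pulp_settings:
      secret_key: "{{ vault_pulp_secret_key }}"
      content_origin: "https://{{ ansible_fqdn }}"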

I've read about pulp architecture
<https://docs.pulpproject.org/pulpcore/components.html> and pulp storage
<https://docs.pulpproject.org/pulpcore/installation/storage.html> and also pulp
deployment scenarios
<https://pulpproject.org/2020/07/09/pulp-3.5-installer-roles/>, but it's
unclear to me whether backing up just the django-storages backend (local fs,
s3, azure) is sufficient, or whether the database must also be backed
up/restored/synchronized. Please point me at any other documentation or
discussions that would help shed light on how to achieve my goal of
configuring multiple synchronized pulp instances that can be easily
restored from backup.
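
To make the backup question concrete, below is the naive nightly job I
would write today; my uncertainty is whether these two pieces are
sufficient, and whether they have to be captured together to stay
consistent (database name and paths assume a default local-filesystem
install):

    - name: Dump the pulp PostgreSQL database
      become: true
      become_user: postgres
      community.postgresql.postgresql_db:
        name: pulp
        state: dump
        target: /backups/pulp-db-{{ ansible_date_time.date }}.sql.gz

    - name: Archive the artifact store
      community.general.archive:
        path: /var/lib/pulp
        dest: /backups/pulp-media-{{ ansible_date_time.date }}.tar.gz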

I understand that the pulp content plugins themselves implement
synchronization with remotes, so perhaps the best solution is to configure
a cluster of pulp instances using the same ansible playbooks, defining one
of them as the *primary* and configuring the others to use the *primary*
as a remote? Are there techniques available to set up multi-directional
sync, or does it need to follow the *primary*/*secondary* model? (i.e. can I
also set up the primary to synchronize content from all of the *secondaries*,
so that content added to one secondary becomes present on the primary and
all other secondaries?)
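
In other words, something like this on each secondary, pointed at a
distribution published by the primary (hostnames, base paths, and repo
names here are invented for illustration, and I'm again assuming
squeezer's file modules behave as documented):

    - name: Remote on the secondary that tracks the primary
      pulp.squeezer.file_remote:
        pulp_url: https://pulp-site2.example.com
        username: admin
        password: "{{ pulp_admin_password }}"
        name: from-primary-internal
        url: https://pulp-primary.example.com/pulp/content/internal/PULP_MANIFEST
        policy: immediate    # secondaries should hold full copies
        state: present

    - name: Scheduled sync, driven by cron or a periodic playbook run
      pulp.squeezer.file_sync:
        pulp_url: https://pulp-site2.example.com
        username: admin
        password: "{{ pulp_admin_password }}"
        remote: from-primary-internal
        repository: internal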

It seems that pulp has been designed (and then redesigned) for my use case,
but I'm having trouble putting together all the architectural and
configuration pieces required to paint a complete picture of my goal.
Thanks so much for your recommendations and your time!