Not an expert here, as I only started working with Pulp recently, but I tried to set up a 2-3 server configuration with a master and 1 or 2 child nodes. The aim was to spread the load (active-active, so not a clustered configuration) and to avoid a single point of failure.

Sadly, I got stuck on the fact that nodes need OAuth authentication: it was not working properly for me, and other documentation pages declared OAuth deprecated and soon to be removed from Pulp.

How nodes are supposed to work, then, is a mystery. Since the documentation contradicted itself and I didn't manage to make it work (SSL issues even though I had disabled SSL everywhere), I opted for a totally different approach:

I created a single Pulp server and mounted a NAS volume.

I moved /var/lib/pulp and /var/lib/mongodb to the NAS and replaced each of those paths with an NFS mount. Symbolic links could work for mongodb, but not for Pulp, since some paths need to be served by Apache, which by default does not follow symlinks.
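
For reference, the mounts look roughly like this (the NAS hostname and export names are made up, so adjust them to your environment; permissions and SELinux contexts still have to be handled separately):

    # mount the NAS exports directly over the original paths
    mount -t nfs nas.example.com:/export/pulp    /var/lib/pulp
    mount -t nfs nas.example.com:/export/mongodb /var/lib/mongodb

    # or make them persistent in /etc/fstab:
    # nas.example.com:/export/pulp     /var/lib/pulp     nfs  defaults,_netdev  0 0
    # nas.example.com:/export/mongodb  /var/lib/mongodb  nfs  defaults,_netdev  0 0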

Once the Pulp data is on the NAS, I exported that volume to two more Apache servers and made the same 'published' directory available through them (you can reuse the pulp.conf in /etc/httpd/conf.d, as it needs only minor changes). All the clients connect to the Apache servers, so I can scale horizontally as much as I want, and the Pulp server only does the repo syncs, so its load is actually quite low.
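
On the extra Apache servers the idea is roughly the following (hostname and paths are illustrative; take the actual Alias lines from your own pulp.conf, not from this sketch):

    # mount the same Pulp content read-only on each extra Apache server
    mount -t nfs -o ro nas.example.com:/export/pulp /var/lib/pulp

    # copy pulp.conf from the Pulp server into /etc/httpd/conf.d and keep only
    # the parts that serve static published content (an Alias along the lines of
    #   Alias /pulp/repos /var/lib/pulp/published/...
    # with the exact path taken from the original file), then reload Apache:
    systemctl restart httpd    # or 'service httpd restart' on older systems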

The good:
With this configuration the Pulp server can be restarted, reinstalled, or shut down and the repos will still be available to the hosts, since they connect to the Apache servers. This makes Pulp maintenance easier: having Pulp unavailable only means there are no new syncs to update the repositories, but the repos themselves remain available.

The bad:
This is all fine, but only if you use Pulp as a pure RPM repo manager. If you also use Pulp to register the hosts, this configuration is of no use to you: since the hosts have to register, they have to connect to the Pulp server, and only Pulp can 'push' changes to the hosts, so the single point of failure comes back.

The workaround (no, it's not "ugly" :) )
In my work environment we use Puppet to define the server configuration and the running services, so we can rebuild a server automatically without manual intervention. This includes the repo configuration and the installed packages, so we don't need to register hosts in specific host groups, as Puppet already does all of that (better).
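
Just to illustrate what Puppet ends up dropping on the clients: the repo configuration is nothing more than a plain yum repo file pointing at the Apache front ends instead of at the Pulp server itself (the repo id, hostnames and URLs below are invented):

    # /etc/yum.repos.d/internal-rhel7.repo (managed by Puppet in our case)
    [internal-rhel7]
    name=Internal RHEL7 mirror (served by the Apache front ends)
    baseurl=http://repo1.example.com/pulp/repos/rhel7/
            http://repo2.example.com/pulp/repos/rhel7/
    enabled=1
    gpgcheck=0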

Actually, during my host registration tests I did not like the logic behind it. We host several thousand hosts and we need to be able to reinstall them when needed without manual intervention. Puppet copes with that, so when I looked at how to register a host I was surprised that a host cannot register itself into a specific Pulp host group: you have to do that by hand on the Pulp server (more exactly: using pulp-admin). So every time a machine registers itself there is some manual task to do on Pulp, which is not scalable for us. In the end we skipped this part, used Pulp just as a local RPM repo, and kept using Puppet for the rest.


On 22/06/15 15:11, Sean Waite wrote:
By children, I'm referring to child nodes - the subservers that can sync from a "parent" node.

Looking again at the resources, below is what I have. It does look like the 1.7g proc is actually a worker.

Some statistics on what I have here (resident memory):
2 celery__main__worker procs listed as "resource_manager" - 41m memory each
2 celery__main__worker procs listed as "reserved_resource_worker" - 42m and 1.7g respectively
1 mongo process - 972m
1 celerybeat - 24m
a pile of httpd procs - 14m each
1 qpid -  21m

For disk utilization, the mongo db is around 3.8G and my directory containing all of the rpms etc is around 95G.

We're on a system with only 3.5G available memory, which is probably part of the problem. We're looking at expanding it, I'm just trying to figure out how much to expand it by. From your numbers above, we'd need 6-7G of memory + 2*N gigs for the workers. Should I expect maybe 3-4 workers at any one time? I've got 2 now, but that is at an idle state.


On Mon, Jun 22, 2015 at 9:24 AM, Brian Bouterse <[email protected]> wrote:

    Hi Sean,

    I'm not really sure what you mean by the term 'children'. Maybe you
    mean process or consumer?

    I expect pulp_resource_manager to use less than 1.7G of memory, but
    it's possible. Memory analysis can be a little bit tricky, so more
    details are needed about how this is being measured to be sure.

    The biggest memory consumer within Pulp by far is mongodb. If you can,
    ensure that at least 4G of RAM is available on the machine that you
    are running mongodb on.

    I looked into the docs and we don't talk much about the memory
    requirements. Feel free to file a bug on that if you want. Roughly I
    expect the following amounts of RAM to be available per process:

    pulp_celerybeat, 256MB - 512MB
    pulp_resource_manager, 256MB - 512MB
    pulp_workers. This process spawns N workers. Each worker could use
    256MB - 2GB depending on what it's doing.
    httpd, 1GB
    mongodb, 4GB
    qpidd/rabbitMQ, ???

    Note that all the pulp_* processes have a parent and a child process; for
    the numbers above I consider each parent/child pair together. I usually
    show the inheritance using `sudo ps -awfux`.

    I'm interested to see what others think about these numbers too.

    -Brian


    On 06/22/2015 08:46 AM, Sean Waite wrote:
    > Hi,
    >
    > I've got a pulp server running, and I'd like to add some children.
    > The server itself is a bit hard up on resources, so we're going to
    > rebuild with a larger one. How much resources would the children
    > use? Is it a fairly beefy process/memory hog?
    >
    > We've got a large number of repositories. pulp-resource-manager
    > seems to be using 1.7G of memory, with a .7G of mongodb.
    >
    > Any pointers on how much I might be able to expect?
    >
    > Thanks
    >
    > -- Sean Waite
    > [email protected]
    > Cloud Operations Engineer                GPG 3071E870 TraceLink, Inc.
    >
    > Be Excellent to Each Other
    >
    >





--
Sean Waite [email protected]
Cloud Operations Engineer                GPG 3071E870
TraceLink, Inc.

Be Excellent to Each Other



_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list
