We need to look into fixing this bug, https://pulp.plan.io/issues/8295, to match the behaviour you have described, Matthias.
--------
Regards,

Ina Panova
Senior Software Engineer | Pulp | Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."

On Tue, Mar 30, 2021 at 10:12 AM Matthias Dellweg <mdell...@redhat.com> wrote:

> On Tue, Mar 30, 2021 at 10:06 AM Sayan Das <say...@redhat.com> wrote:
>
>> Hello Matthias,
>>
>> Thanks for your response on this one.
>>
>> By this,
>> ~~
>> Since those stages use python async and asyncio this means, there will be
>> 5 parallel downloads (as long as enough requests flow by that stage). Once
>> an artifact is downloaded, the next stage will transfer it to the final
>> storage location (may be a cloud storage), and so on.
>> ~~
>>
>> Should I assume that, once 5 parallel downloads are completed inside
>> /var/lib/pulp/tmp, they will immediately be transferred to their actual
>> location and only then will the next batch of downloads start?
>>
> As far as I know, the downloads are not batched at all. If one completes,
> the next one can start, so it's always 5 in parallel. And then if it's
> finished, it will be transferred to the storage by one of the following
> stages. However, Pulp does not look at the disk size. So in theory, you
> should be safe, but there's no guarantee.
>
>> This question is being raised based on our old experience with pulp 2,
>> where a 50+ GB openshift repo was being synced, /var/cache/pulp was only
>> 25 GB, and during the content download part alone the filesystem filled
>> up and eventually the task got canceled with a disk-space error. It
>> happened because pulp 2 used to download the data in batches of 5 but
>> never moved the data to its destination until the entire repository was
>> downloaded into the pulp cache. This was only noticed with
>> docker\ISO\file type repos but NOT with yum\rpm type repos.
>>
>> Thanks & Regards,
>>
>> Sayan das
>> Technical Support Engineer, RHCE
>> Red Hat India <https://www.redhat.com/>
>> Red Hat India Pvt. Ltd, Level-5, Tower-10, Cyber City
>> Magarpatta City Hadapsar, Pune-411013, Maharashtra, India.
>> say...@redhat.com M: +91-7890892756 IRC: Sayan
>> <https://red.ht/sig>
>>
>> On Tue, Mar 30, 2021 at 1:25 PM Matthias Dellweg <mdell...@redhat.com>
>> wrote:
>>
>>> I am not quite sure I understand the right notion of the question, but
>>> I'll try to give my view of it.
>>> Pulp 3 has a special asynchronous sync pipeline. That means when syncing
>>> a remote repository (regardless of its type) there is a pipeline with
>>> so-called stages. The first stage is supposed to fetch metadata and
>>> enumerate content units (blobs, manifests, rpms, files, ...) and pass
>>> them into the pipeline. The other stages that run in parallel will each
>>> perform one of: downloading artifacts, saving them, assembling content
>>> units, saving them, adding them to the new repository version.
>>> Since those stages use python async and asyncio this means, there will
>>> be 5 parallel downloads (as long as enough requests flow by that stage).
>>> Once an artifact is downloaded, the next stage will transfer it to the
>>> final storage location (may be a cloud storage), and so on. For
>>> performance reasons however, some stages (doing database saves) will
>>> batch their work into large batches (>= 100).
>>> In short: It's different.
>>> I hope this explains (high level) what's going on there.
>>> Feel free to ask for more detail.
>>>
>>> On Mon, Mar 29, 2021 at 4:48 PM Sayan Das <say...@redhat.com> wrote:
>>>
>>>> Hello Everyone,
>>>>
>>>> I am not sure if my previous email was successfully delivered or not
>>>> and hence I am re-sending it.
>>>>
>>>> I hope someone will be able to help me with some clarification there.
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Sayan das
>>>>
>>>> On Sat, Mar 27, 2021 at 12:17 AM Sayan Das <say...@redhat.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I hope this email finds you all well.
>>>>>
>>>>> My name is Sayan and I work as a support engineer for the Red Hat
>>>>> Satellite 6 product. During a recent interaction with my colleague Ian
>>>>> Ballou, we came across a pulp2-vs-pulp3 question that we are looking
>>>>> for clarification on, and it was suggested that pulp-dev would be a
>>>>> great place to get that clarification.
>>>>>
>>>>> Please allow me to explain the pulp 2 behavior.
>>>>>
>>>>> Some parameters to consider:
>>>>>
>>>>> Repo Type: Docker or Openshift repo [ Assuming it has 200 units to
>>>>> get downloaded ]
>>>>> Download Dir: /var/cache/pulp
>>>>> Data Dir: /var/lib/pulp/content/units/
>>>>> Download concurrency: 5
>>>>>
>>>>> Now,
>>>>> * Sync started for the repo.
>>>>> * Pulp downloaded 5 units into the "Download Dir" but never moved
>>>>>   them into the "Data Dir".
>>>>> * Once those first 5 units were downloaded, Pulp downloads the next
>>>>>   5 units, and the same cycle keeps repeating until all 200 units
>>>>>   have been downloaded.
>>>>> * When all 200 units are downloaded, the entire content is moved
>>>>>   from the "Download Dir" to the respective location inside the
>>>>>   "Data Dir".
>>>>>
>>>>> For pulp 3,
>>>>>
>>>>> Download Dir: /var/lib/pulp/tmp
>>>>> Data Dir: /var/lib/pulp/media
>>>>> Download concurrency: 5 [ I heard it's 10 but let's assume it's 5
>>>>> for now ]
>>>>>
>>>>> So the question is: will pulp 3 behave the same as pulp 2, i.e.
>>>>> download the entire repository into the "Download Dir" in batches of
>>>>> 5 units and then move the entire repository to the "Data Dir", or is
>>>>> the behavior different, i.e. after downloading 5 units into the
>>>>> "Download Dir" the content will be moved to the "Data Dir" and then
>>>>> the next 5 units will be downloaded?
>>>>>
>>>>> Please note, I have specifically mentioned that the repo is a
>>>>> Docker\Openshift type repo as we are concerned about only
>>>>> Docker\ISO\File type repos at this moment.
>>>>>
>>>>> Any clarification that can be provided on this will be really
>>>>> appreciated.
>>>>>
>>>>> Thanks & Regards,
>>>>>
>>>>> Sayan das
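
For readers of the archive, the difference Matthias describes (a fixed number of downloads in flight, with each finished artifact handed to the next stage right away instead of waiting for a batch or for the whole repository) can be sketched with plain asyncio. This is only an illustration under stated assumptions, not Pulp's actual stages API: the names `download_stage`, `store_stage`, `run_pipeline` and `sync` are hypothetical, the download is simulated with a short sleep, and the concurrency of 5 is the value assumed in the thread.

```python
import asyncio

CONCURRENCY = 5  # assumed download concurrency, the value used in the thread


async def download_stage(units, out_q, stats):
    """Simulated download stage: at most CONCURRENCY downloads in flight, no batching."""
    sem = asyncio.Semaphore(CONCURRENCY)

    async def fetch(unit):
        async with sem:
            stats["in_flight"] += 1
            stats["peak_in_flight"] = max(stats["peak_in_flight"], stats["in_flight"])
            # Stand-in for a real HTTP download into the temporary directory.
            await asyncio.sleep(0.001 * (unit % 3 + 1))
            stats["in_flight"] -= 1
        # As soon as one download finishes, its slot is free for the next unit
        # and the artifact is passed on to the following stage.
        await out_q.put(unit)

    await asyncio.gather(*(fetch(u) for u in units))
    await out_q.put(None)  # sentinel: no more artifacts will arrive


async def store_stage(in_q, stored):
    """Simulated storage stage: moves each artifact to its final location on arrival."""
    while True:
        unit = await in_q.get()
        if unit is None:
            break
        stored.append(unit)


async def run_pipeline(n_units):
    stats = {"in_flight": 0, "peak_in_flight": 0}
    stored = []
    queue = asyncio.Queue()
    # Both stages run concurrently, so storing overlaps with downloading.
    await asyncio.gather(
        download_stage(range(n_units), queue, stats),
        store_stage(queue, stored),
    )
    return stats, stored


def sync(n_units=200):
    """Run the hypothetical two-stage pipeline for n_units content units."""
    return asyncio.run(run_pipeline(n_units))
```

Calling `sync(200)` returns the statistics and the list of stored units; `stats["peak_in_flight"]` never exceeds `CONCURRENCY`. The sketch does not model disk usage, and, as noted in the thread, Pulp itself does not check the available disk space either.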
_______________________________________________ Pulp-dev mailing list Pulp-dev@redhat.com https://listman.redhat.com/mailman/listinfo/pulp-dev