@ipanova, +1 to your names; I updated the epic accordingly. FYI, I also updated the epic in several other ways to allow for the "cache_only" option in the design.
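To make the "policy" idea concrete, here is a minimal sketch of what such a field could look like as a Django choices field. This is an illustrative assumption only: the class placement, field name, and descriptions are not the final pulpcore API.

    from django.db import models

    class ContentUnit(models.Model):
        # Illustrative sketch only -- the real model lives in pulpcore.
        IMMEDIATE = 'immediate'
        ON_DEMAND = 'on_demand'
        CACHE_ONLY = 'cache_only'
        POLICY_CHOICES = (
            (IMMEDIATE, 'download now while the task runs (no lazy)'),
            (ON_DEMAND, 'download on first request and save for reuse'),
            (CACHE_ONLY, 'download on first request; cache in squid, never save'),
        )
        # 'immediate' is the default if unspecified, per the policy list below.
        policy = models.TextField(choices=POLICY_CHOICES, default=IMMEDIATE)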
I added a new task to also add "policy" to ContentUnit so the streamer can know what to do: https://pulp.plan.io/issues/3763

Other updates to allow for "cache_only":
https://pulp.plan.io/issues/3695#note-2
https://pulp.plan.io/issues/3699#note-3
https://pulp.plan.io/issues/3693

On Thu, Jun 7, 2018 at 5:10 AM, Ina Panova <ipan...@redhat.com> wrote:
> we could try to go with:
>
> policy=immediate -> downloads now while the task runs (no lazy). Also the default if unspecified.
> policy=on_demand -> All the steps in the diagram. Content that is downloaded is saved so that it's only ever downloaded once.
> policy=cache_only -> All the steps in the diagram except step 14. If squid pushes the bits out of the cache, it will be re-downloaded again to serve to other clients requesting the same bits.
>
> --------
> Regards,
>
> Ina Panova
> Software Engineer| Pulp| Red Hat Inc.
>
> "Do not go where the path may lead,
> go instead where there is no path and leave a trail."
>
> On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel <jor...@redhat.com> wrote:
>>
>> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>>
>> I updated the epic (https://pulp.plan.io/issues/3693) to use this new language.
>>
>> policy=immediate -> downloads now while the task runs (no lazy). Also the default if unspecified.
>> policy=cache-and-save -> All the steps in the diagram. Content that is downloaded is saved so that it's only ever downloaded once.
>> policy=cache -> All the steps in the diagram except step 14. If squid pushes the bits out of the cache, it will be re-downloaded again to serve to other clients requesting the same bits.
>>
>> These policy names strike me as an odd, non-intuitive mixture. I think we need to brainstorm on policy names and/or additional attributes to best capture this. I suggest the epic be updated to describe the "modes" or use cases without the names for now. I'll try to follow up with other suggestions.
>>
>> Also @milan, see inline for answers to your questions.
>>
>> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>> >
>>> > On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmc...@redhat.com> wrote:
>>> >>
>>> >> I think there is a use case for "proxy only" like the one being described here. Several years ago there was a project called thumbslug[1] that was used in a version of katello instead of pulp. Its job was to check entitlements and then proxy content from a cdn. The same functionality could be implemented in pulp. (Perhaps it's even as simple as telling squid not to cache anything, so the content would never make it from cache to pulp in current pulp-2.)
>>> >
>>> > What would you call this policy?
>>> > policy=proxy?
>>> > policy=stream-dont-save?
>>> > policy=stream-no-save?
>>> >
>>> > Are the names 'on-demand' and 'immediate' clear enough? Are there better names?
>>> >>
>>> >> Overall I'm +1 to the idea of an only-squid version, if others think it would be useful.
>>> >
>>> > I understand describing this as an "only-squid" version, but for clarity, the streamer would still be required because it is what requests the bits with the correctly configured downloader (certs, proxy, etc). The streamer streams the bits into squid, which provides caching and client multiplexing.
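Pulling the policy semantics and the streamer/squid relationship above together, here is a minimal sketch of how the streamer could branch on the policy. get_downloader, stream, and save_artifact are hypothetical helper names, not the real streamer API.

    async def stream_and_maybe_save(remote_artifact, response):
        # Stream the bits to the client through squid; only 'on_demand'
        # performs step 14 (saving the bits), 'cache_only' skips it.
        downloader = get_downloader(remote_artifact)  # hypothetical: carries certs, proxy config, etc.
        buffer = [] if remote_artifact.policy == 'on_demand' else None
        async for chunk in downloader.stream():      # hypothetical async iterator
            await response.write(chunk)  # squid sits in front, caching and multiplexing
            if buffer is not None:
                buffer.append(chunk)     # a real implementation would spool to disk instead
        if buffer is not None:
            save_artifact(remote_artifact, b''.join(buffer))  # hypothetical: step 14

Under 'cache_only', a squid cache eviction simply means the next request flows through the streamer again, exactly as described in the policy list above.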
>>>
>>> I have to admit it's just now that I'm reading https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy again because of the SSL termination. So the new plan is to use the streamer to terminate the SSL instead of the Apache reverse proxy?
>>
>> The plan for right now is to not use a reverse proxy and have the client's connection terminate at squid directly, either via http or https depending on how squid is configured. The reverse proxy in pulp2's design served to validate the signed urls and rewrite them for squid. This first implementation won't use signed urls. I believe that means we don't need a reverse proxy here yet.
>>
>>> W.r.t. the construction of the URL of an artifact, I thought it would be stored in the DB, so the Remote would create it during the sync.
>>
>> This is correct. The inbound URL from the client after the redirect will still be a reference that the "Pulp content app" will resolve to a RemoteArtifact. Then the streamer will use that RemoteArtifact data to correctly build the downloader. That's the gist of it at least.
>>
>>> > To confirm my understanding, this "squid-only" policy would be the same as on-demand except that it would *not* perform step 14 from the diagram here (https://pulp.plan.io/issues/3693). Is that right?
>>> yup
>>> >
>>> >>
>>> >> [1] https://github.com/candlepin/thumbslug
>>> >>
>>> >> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>>
>>> >>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> >>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>> >>
>>> >>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> >>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>> >> >>
>>> >>> >> >> Good point!
>>> >>> >> >> More the second; it might be a bit crazy to utilize Squid for that, but first, let's answer the why ;)
>>> >>> >> >> So why does Pulp need to store the content here?
>>> >>> >> >> Why don't we point the users to the Squid all the time (for the lazy repos)?
>>> >>> >> >
>>> >>> >> > Pulp's Streamer needs to fetch and store the content because that's Pulp's primary responsibility.
>>> >>> >>
>>> >>> >> Maybe not that much the storing but rather the content views management? I mean the partitioning into repositories, promoting.
>>> >>> >
>>> >>> > Exactly this. We want Pulp users to be able to reuse content that was brought in using the 'on_demand' download policy in other repositories.
>>> >>>
>>> >>> I see.
>>> >>>
>>> >>> >> > If some of the content lived in Squid and some lived in Pulp, it would be difficult for the user to know what content is actually available in Pulp and what content needs to be fetched from a remote repository.
>>> >>> >>
>>> >>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp, so not that difficult. Maybe Pulp could have a concept of Origin, where folks upload stuff to a Pulp repo, vs. Proxy for its repo storage policy?
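The resolution flow described above could look roughly like this on the content-app side. find_remote_artifact, the squid hostname, and get_downloader are assumptions for illustration, not the actual pulpcore API.

    from aiohttp import web

    async def content_handler(request):
        # Resolve the inbound path to a RemoteArtifact (hypothetical lookup).
        path = request.match_info['path']
        remote_artifact = await find_remote_artifact(path)
        if remote_artifact is None:
            raise web.HTTPNotFound()
        # Redirect the client to squid, which fronts the streamer.
        raise web.HTTPFound('https://squid.example.com/{}'.format(path))

    def build_downloader(remote_artifact):
        # Streamer side: the RemoteArtifact carries the url, certs, and proxy
        # settings needed to build a correctly configured downloader.
        return remote_artifact.remote.get_downloader(url=remote_artifact.url)

    app = web.Application()
    app.router.add_get('/{path:.*}', content_handler)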
>>> >>> >
>>> >>> > Squid removes things from the cache at some point. You can probably configure it to never remove anything from the cache, but then we would need to implement orphan cleanup that would work across two systems: pulp and squid.
>>> >>>
>>> >>> Actually, "remote" units wouldn't need orphan cleaning from the disk; just dropping them from the DB would suffice.
>>> >>>
>>> >>> > Answering that question would still be difficult. Not all content that is in the repository that was synced using the on_demand download policy will be in Squid - only the content that has been requested by clients. So it's still hard to know which of the content units have been downloaded and which have not been.
>>> >>>
>>> >>> But the beauty is exactly in that: we don't have to track whether the content is downloaded if it is reverse-proxied[1][2]. Moreover, this would work both with and without a proxy between Pulp and the Origin of the remote unit. A "remote" content artifact might just need to carry its URL in a DB column for this to work; so the async artifact model, instead of the "policy=on-demand", would have a mandatory remote "URL" attribute. I wouldn't say it's more complex than tracking the "policy" attribute.
>>> >>>
>>> >>> >> > As Pulp downloads an Artifact, it calculates all the checksums and its size. It then performs validation based on information that was provided from the RemoteArtifact. After validation is performed, the Artifact is saved to the database and its final place in /var/lib/content/artifacts/.
>>> >>> >>
>>> >>> >> This could still be achieved by storing the content just temporarily in the Squid proxy, i.e. use Squid as the content source, not the disk.
>>> >>> >>
>>> >>> >> > Once this information is in the database, Pulp's web server can serve the content without having to involve the Streamer or Squid.
>>> >>> >>
>>> >>> >> Pulp might serve just the API and the metadata; the content might be redirected to the Proxy all the time, correct? Doesn't Crane do that btw?
>>> >>> >
>>> >>> > Theoretically we could do this, but in practice we would run into problems when we needed to scale out the Content app. Right now when the Content app needs to be scaled, a user can launch another machine that will run the Content app. Squid does not support that kind of scaling. Squid can only take advantage of additional cores in a single machine.
>>> >>>
>>> >>> I don't think I understand; proxies are actually designed to scale[1] and are used as tools to scale the web too.
>>> >>>
>>> >>> This is all about the How question, but when it comes to my original Why, please correct me if I'm wrong, the answer so far has been: Pulp always downloads the content because that's what it is supposed to do.
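The download-time validation described above maps to a simple streaming pattern; here is a minimal sketch, assuming illustrative RemoteArtifact field names (size, md5, sha1, sha256).

    import hashlib

    def validate_download(file_path, remote_artifact):
        # Compute the size and all checksums in one streaming pass over the file.
        hashers = {name: hashlib.new(name) for name in ('md5', 'sha1', 'sha256')}
        size = 0
        with open(file_path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                size += len(chunk)
                for hasher in hashers.values():
                    hasher.update(chunk)
        # Compare against whatever expectations the RemoteArtifact carries;
        # the attribute names here are assumptions, not the final model fields.
        if remote_artifact.size is not None and size != remote_artifact.size:
            raise ValueError('size mismatch')
        for name, hasher in hashers.items():
            expected = getattr(remote_artifact, name, None)
            if expected is not None and hasher.hexdigest() != expected:
                raise ValueError('{} checksum mismatch'.format(name))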
>>> >>>
>>> >>> Cheers,
>>> >>> milan
>>> >>>
>>> >>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>> >>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>> >>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>> >>>
>>> >>> >>
>>> >>> >> Cheers,
>>> >>> >> milan
>>> >>> >>
>>> >>> >> > -dennis
>>> >>> >>
>>> >>> >> >>
>>> >>> >> >> --
>>> >>> >> >> cheers
>>> >>> >> >> milan
>>> >>> >> >>
>>> >>> >> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>> >>> >> >> >
>>> >>> >> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>> >> >> >>
>>> >>> >> >> >> Hi,
>>> >>> >> >> >>
>>> >>> >> >> >> Looking at the diagram[1], I'm wondering what's the reasoning behind Pulp having to actually fetch the content locally?
>>> >>> >> >> >
>>> >>> >> >> > Is the question "why is Pulp doing the fetching and not Squid?" or "why is Pulp storing the content after fetching it?" or both?
>>> >>> >> >> >
>>> >>> >> >> >> Couldn't Pulp just rely on the proxy with regard to the content streaming?
>>> >>> >> >> >>
>>> >>> >> >> >> Thanks,
>>> >>> >> >> >> milan
>>> >>> >> >> >>
>>> >>> >> >> >> [1] https://pulp.plan.io/attachments/130957
>>> >>> >> >> >>
>>> >>> >> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>> >>> >> >> >> > A mini-team of core devs** met to talk through lazy use cases for Pulp3. It's effectively the same lazy from Pulp2 except:
>>> >>> >> >> >> >
>>> >>> >> >> >> > * it's now built into core (not just RPM)
>>> >>> >> >> >> > * it excludes repo protection use cases because we haven't added repo protection to Pulp3 yet
>>> >>> >> >> >> > * it excludes the "background" policy, which, based on feedback from stakeholders, provided very little value
>>> >>> >> >> >> > * it will no longer depend on Twisted as a dependency. It will use asyncio instead.
>>> >>> >> >> >> >
>>> >>> >> >> >> > While it is being built into core, it will require minimal work by a plugin writer to add support for it. Details are in the epic below.
>>> >>> >> >> >> >
>>> >>> >> >> >> > The current use cases, along with a technical plan, are written on this epic: https://pulp.plan.io/issues/3693
>>> >>> >> >> >> >
>>> >>> >> >> >> > We're putting it out for comment, questions, and feedback before we start into the code. I hope we are able to add this into our next sprint.
>>> >>> >> >> >> >
>>> >>> >> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>> >>> >> >> >> >
>>> >>> >> >> >> > Thanks!
>>> >>> >> >> >> > Brian
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev