We could try to go with:

policy=immediate -> downloads happen now, while the sync task runs (no lazy).
Also the default if unspecified.
policy=on_demand -> all the steps in the diagram. Content that is downloaded
is saved, so that it is only ever downloaded once.
policy=cache_only -> all the steps in the diagram except step 14. If squid
pushes the bits out of its cache, they will be re-downloaded to serve other
clients requesting the same bits.
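To make the proposal concrete, here is a minimal sketch of how the three
policies could hang off a Remote. The field name, choice constants, and model
shape are assumptions for illustration, not the actual pulpcore code:

    # Hypothetical sketch -- names are illustrative, not pulpcore's real models.
    from django.db import models


    class Remote(models.Model):
        IMMEDIATE = 'immediate'    # fetch and save everything while the sync task runs
        ON_DEMAND = 'on_demand'    # fetch on first client request, then save for good
        CACHE_ONLY = 'cache_only'  # fetch on request, keep only in squid's cache (no step 14)

        POLICY_CHOICES = (
            (IMMEDIATE, IMMEDIATE),
            (ON_DEMAND, ON_DEMAND),
            (CACHE_ONLY, CACHE_ONLY),
        )

        url = models.TextField()
        # 'immediate' is the default when the user does not specify a policy.
        policy = models.TextField(choices=POLICY_CHOICES, default=IMMEDIATE)

        class Meta:
            abstract = True

With something like this, choosing a behavior is a single attribute on the
remote at create time, and the sync task and the streamer can branch on it.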
--------
Regards,

Ina Panova
Software Engineer | Pulp | Red Hat Inc.

"Do not go where the path may lead, go instead where there is no path and
leave a trail."


On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel <jor...@redhat.com> wrote:
>
> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>
>> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
>> language.
>>
>> policy=immediate -> downloads now while the task runs (no lazy). Also the
>> default if unspecified.
>> policy=cache-and-save -> All the steps in the diagram. Content that is
>> downloaded is saved so that it's only ever downloaded once.
>> policy=cache -> All the steps in the diagram except step 14. If squid
>> pushes the bits out of the cache, it will be re-downloaded again to serve
>> other clients requesting the same bits.
>
> These policy names strike me as an odd, non-intuitive mixture. I think we
> need to brainstorm on policy names and/or additional attributes to best
> capture this. I suggest the epic be updated to describe the "modes" or use
> cases without the names for now. I'll try to follow up with other
> suggestions.
>
>> Also @milan, see inline for answers to your question.
>>
>> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmc...@redhat.com> wrote:
>>>>>
>>>>> I think there is a use case for "proxy only" like is being described
>>>>> here. Several years ago there was a project called thumbslug[1] that
>>>>> was used in a version of katello instead of pulp. Its job was to check
>>>>> entitlements and then proxy content from a CDN. The same functionality
>>>>> could be implemented in pulp. (Perhaps it's even as simple as telling
>>>>> squid not to cache anything, so the content would never make it from
>>>>> cache to pulp in current pulp-2.)
>>>>
>>>> What would you call this policy?
>>>> policy=proxy?
>>>> policy=stream-dont-save?
>>>> policy=stream-no-save?
>>>>
>>>> Are the names 'on-demand' and 'immediate' clear enough? Are there
>>>> better names?
>>>>
>>>>> Overall I'm +1 to the idea of an only-squid version, if others think
>>>>> it would be useful.
>>>>
>>>> I understand describing this as an "only-squid" version, but for
>>>> clarity, the streamer would still be required because it is what
>>>> requests the bits with the correctly configured downloader (certs,
>>>> proxy, etc.). The streamer streams the bits into squid, which provides
>>>> caching and client multiplexing.
>>>
>>> I have to admit it's only now that I'm reading
>>> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
>>> again, because of the SSL termination. So the new plan is to use the
>>> streamer to terminate the SSL instead of the Apache reverse proxy?
>>
>> The plan for right now is to not use a reverse proxy and to have the
>> client's connection terminate at squid directly, either via http or
>> https depending on how squid is configured. The reverse proxy in pulp2's
>> design served to validate the signed urls and rewrite them for squid.
>> This first implementation won't use signed urls. I believe that means we
>> don't need a reverse proxy here yet.
>>
>>> W/r/t the construction of the URL of an artifact, I thought it would be
>>> stored in the DB, so the Remote would create it during the sync.
>>
>> This is correct. The inbound URL from the client after the redirect will
>> still be a reference that the "Pulp content app" will resolve to a
>> RemoteArtifact. Then the streamer will use that RemoteArtifact data to
>> correctly build the downloader. That's the gist of it at least.
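To picture that flow, here is a rough sketch of the streamer's request
handling. The lookup helper, the get_downloader() factory, and the chunked
stream() interface are assumptions for illustration, not settled pulpcore API:

    # Hypothetical streamer handler -- helper and attribute names are assumptions.
    from aiohttp import web


    async def stream_content(request):
        # The redirected path identifies content recorded during the sync.
        relative_path = request.match_info['path']

        # Resolve the path to the RemoteArtifact row the sync wrote; it
        # carries the upstream URL plus the expected size and checksums.
        remote_artifact = await find_remote_artifact(relative_path)  # assumed helper
        if remote_artifact is None:
            raise web.HTTPNotFound()

        # Build a downloader from the Remote's configuration (certs, proxy,
        # etc.) so the fetch honors the same settings a sync task would use.
        downloader = remote_artifact.remote.get_downloader(remote_artifact.url)

        # Stream the bits through to the client (and into squid's cache) as
        # they arrive, instead of waiting for the whole file.
        response = web.StreamResponse()
        await response.prepare(request)
        async for chunk in downloader.stream():  # assumed chunked interface
            await response.write(chunk)
        await response.write_eof()
        return response


    app = web.Application()
    app.add_routes([web.get('/streamer/{path:.*}', stream_content)])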
>>>> To confirm my understanding, this "squid-only" policy would be the
>>>> same as on-demand except that it would *not* perform step 14 from the
>>>> diagram here (https://pulp.plan.io/issues/3693). Is that right?
>>>
>>> yup
>>>
>>>>> [1] https://github.com/candlepin/thumbslug
>>>>>
>>>>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>>>>>> On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>>>>>>>> On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Good point!
>>>>>>>>>> More the second; it might be a bit crazy to utilize Squid for
>>>>>>>>>> that, but first, let's answer the why ;)
>>>>>>>>>> So why does Pulp need to store the content here?
>>>>>>>>>> Why don't we point the users to the Squid all the time (for the
>>>>>>>>>> lazy repos)?
>>>>>>>>>
>>>>>>>>> Pulp's Streamer needs to fetch and store the content because
>>>>>>>>> that's Pulp's primary responsibility.
>>>>>>>>
>>>>>>>> Maybe not so much the storing but rather the content views
>>>>>>>> management? I mean the partitioning into repositories, promoting.
>>>>>>>
>>>>>>> Exactly this. We want Pulp users to be able to reuse content that
>>>>>>> was brought in using the 'on_demand' download policy in other
>>>>>>> repositories.
>>>>>>
>>>>>> I see.
>>>>>>
>>>>>>>>> If some of the content lived in Squid and some lived in Pulp, it
>>>>>>>>> would be difficult for the user to know what content is actually
>>>>>>>>> available in Pulp and what content needs to be fetched from a
>>>>>>>>> remote repository.
>>>>>>>>
>>>>>>>> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
>>>>>>>> so not that difficult.
>>>>>>>> Maybe Pulp could have a concept of Origin, where folks upload stuff
>>>>>>>> to a Pulp repo, vs. Proxy for its repo storage policy?
>>>>>>>
>>>>>>> Squid removes things from the cache at some point. You can probably
>>>>>>> configure it to never remove anything from the cache, but then we
>>>>>>> would need to implement orphan cleanup that would work across two
>>>>>>> systems: pulp and squid.
>>>>>>
>>>>>> Actually, "remote" units wouldn't need orphan cleaning from the disk;
>>>>>> just dropping them from the DB would suffice.
>>>>>>
>>>>>>> Answering that question would still be difficult. Not all content
>>>>>>> that is in a repository synced using the on_demand download policy
>>>>>>> will be in Squid - only the content that has been requested by
>>>>>>> clients. So it's still hard to know which of the content units have
>>>>>>> been downloaded and which have not been.
>>>>>>
>>>>>> But the beauty is exactly in that: we don't have to track whether the
>>>>>> content is downloaded if it is reverse-proxied[1][2].
>>>>>> Moreover, this would work both with and without a proxy between Pulp
>>>>>> and the Origin of the remote unit.
>>>>>> A "remote" content artifact might just need to carry its URL in a DB
>>>>>> column for this to work; so the async artifact model, instead of the
>>>>>> "policy=on-demand", would have a mandatory remote "URL" attribute. I
>>>>>> wouldn't say it's more complex than tracking the "policy" attribute.
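To make milan's alternative concrete, here is a rough sketch of a "remote"
artifact that carries its upstream URL, plus a content view that always
redirects to the proxy. Every name here, including the squid endpoint, is a
made-up illustration rather than a proposed API:

    # Hypothetical sketch of the always-redirect alternative -- names assumed.
    from django.db import models
    from django.http import HttpResponseRedirect


    class RemoteArtifact(models.Model):
        # A mandatory upstream URL replaces the 'policy' bookkeeping: knowing
        # where the bits live upstream is enough, whether or not squid
        # currently has them cached.
        url = models.TextField()
        relative_path = models.TextField(unique=True)


    SQUID_BASE = 'http://squid.example.com:3128'  # assumed proxy endpoint


    def serve(request, relative_path):
        # Pulp serves the API and metadata itself; content requests are
        # always redirected to the proxy, which caches or re-fetches as
        # needed, so Pulp never tracks per-unit download state.
        artifact = RemoteArtifact.objects.get(relative_path=relative_path)
        return HttpResponseRedirect(f'{SQUID_BASE}/{artifact.relative_path}')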
>>>>>>>>> As Pulp downloads an Artifact, it calculates all the checksums
>>>>>>>>> and its size. It then performs validation based on information
>>>>>>>>> that was provided from the RemoteArtifact. After validation is
>>>>>>>>> performed, the Artifact is saved to the database and to its final
>>>>>>>>> place in /var/lib/content/artifacts/.
>>>>>>>>
>>>>>>>> This could still be achieved by storing the content just
>>>>>>>> temporarily in the Squid proxy, i.e. use Squid as the content
>>>>>>>> source, not the disk.
>>>>>>>>
>>>>>>>>> Once this information is in the database, Pulp's web server can
>>>>>>>>> serve the content without having to involve the Streamer or Squid.
>>>>>>>>
>>>>>>>> Pulp might serve just the API and the metadata; the content might
>>>>>>>> be redirected to the Proxy all the time, correct?
>>>>>>>> Doesn't Crane do that btw?
>>>>>>>
>>>>>>> Theoretically we could do this, but in practice we would run into
>>>>>>> problems when we needed to scale out the Content app. Right now,
>>>>>>> when the Content app needs to be scaled, a user can launch another
>>>>>>> machine that will run the Content app. Squid does not support that
>>>>>>> kind of scaling. Squid can only take advantage of additional cores
>>>>>>> in a single machine.
>>>>>>
>>>>>> I don't think I understand; proxies are actually designed to scale[1]
>>>>>> and are used as tools to scale the web too.
>>>>>>
>>>>>> This is all about the How question, but when it comes to my original
>>>>>> Why, please correct me if I'm wrong, the answer so far has been: Pulp
>>>>>> always downloads the content because that's what it is supposed to do.
>>>>>>
>>>>>> Cheers,
>>>>>> milan
>>>>>>
>>>>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>>>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>>>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>>>>
>>>>>>>> Cheers,
>>>>>>>> milan
>>>>>>>
>>>>>>> -dennis
>>>>>>
>>>>>>>>>> --
>>>>>>>>>> cheers
>>>>>>>>>> milan
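Dennis's download-and-validate step from earlier in this exchange can be
pictured roughly like this; the digest set and the expected-values plumbing
are assumptions for illustration:

    # Hypothetical sketch of download-time validation -- plumbing names assumed.
    import hashlib


    def validate_download(path, expected_digests, expected_size):
        """Compute the size and checksums of a downloaded file and compare
        them to the values recorded on the RemoteArtifact, when known."""
        hashers = {name: hashlib.new(name) for name in ('md5', 'sha1', 'sha256')}
        size = 0
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                size += len(chunk)
                for hasher in hashers.values():
                    hasher.update(chunk)
        if expected_size is not None and size != expected_size:
            raise ValueError(f'size mismatch: got {size}, expected {expected_size}')
        for name, hasher in hashers.items():
            digest = hasher.hexdigest()
            if expected_digests.get(name) and digest != expected_digests[name]:
                raise ValueError(f'{name} mismatch: got {digest}')
        # On success the caller would save the Artifact row and move the file
        # to its final place under /var/lib/content/artifacts/.
        return size, {name: h.hexdigest() for name, h in hashers.items()}

    # Example call with made-up values recorded at sync time:
    # validate_download('/tmp/blob', {'sha256': recorded_sha256}, recorded_size)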
>>>>>>>>>> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>>>>>>>>> On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the diagram[1], I'm wondering what's the reasoning
>>>>>>>>>>>> behind Pulp having to actually fetch the content locally?
>>>>>>>>>>>
>>>>>>>>>>> Is the question "why is Pulp doing the fetching and not Squid?"
>>>>>>>>>>> or "why is Pulp storing the content after fetching it?" or both?
>>>>>>>>>>>
>>>>>>>>>>>> Couldn't Pulp just rely on the proxy with regards to the
>>>>>>>>>>>> content streaming?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> milan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://pulp.plan.io/attachments/130957
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>>>>>>>>>>> A mini-team of core devs** met to talk through lazy use cases
>>>>>>>>>>>>> for Pulp3. It's effectively the same lazy from Pulp2 except:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * it's now built into core (not just RPM)
>>>>>>>>>>>>> * it excludes repo protection use cases, because we haven't
>>>>>>>>>>>>>   added repo protection to Pulp3 yet
>>>>>>>>>>>>> * it excludes the "background" policy, which, based on
>>>>>>>>>>>>>   feedback from stakeholders, provided very little value
>>>>>>>>>>>>> * it will no longer depend on Twisted as a dependency; it will
>>>>>>>>>>>>>   use asyncio instead (a sketch follows below)
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it is being built into core, it will require minimal
>>>>>>>>>>>>> support by a plugin writer to add support for it. Details in
>>>>>>>>>>>>> the epic below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current use cases along with a technical plan are written
>>>>>>>>>>>>> on this epic: https://pulp.plan.io/issues/3693
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're putting it out for comment, questions, and feedback
>>>>>>>>>>>>> before we start into the code. I hope we are able to add this
>>>>>>>>>>>>> into our next sprint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Brian
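For flavor, a minimal sketch of the kind of asyncio-based downloader the
announcement alludes to; the aiohttp calls are real, but the class shape is
an assumption, not the planned pulpcore design:

    # Hypothetical minimal asyncio downloader -- the class shape is illustrative.
    import asyncio

    import aiohttp


    class HttpDownloader:
        def __init__(self, url, chunk_size=1024 * 1024):
            self.url = url
            self.chunk_size = chunk_size

        async def download(self, dest_path):
            # Fetch the URL and write it to disk chunk by chunk, yielding
            # control to the event loop between chunks.
            async with aiohttp.ClientSession() as session:
                async with session.get(self.url) as response:
                    response.raise_for_status()
                    with open(dest_path, 'wb') as f:
                        while True:
                            chunk = await response.content.read(self.chunk_size)
                            if not chunk:
                                break
                            f.write(chunk)


    async def main(urls):
        # Example: fetch several artifacts concurrently on one event loop.
        await asyncio.gather(
            *(HttpDownloader(u).download(f'/tmp/artifact-{i}')
              for i, u in enumerate(urls))
        )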
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev