Actually, what about these as names?

* policy=immediate -> downloads now, while the task runs (no lazy). Also the default if unspecified.
* policy=cache-and-save -> all the steps in the diagram. Content that is downloaded is saved, so it's only ever downloaded once.
* policy=cache -> all the steps in the diagram except step 14. If Squid evicts the bits from its cache, they will be re-downloaded to serve to other clients requesting the same bits.
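As a sketch of the proposed semantics (the policy names come from the proposal above, but the types and helper functions here are hypothetical, not actual Pulp code), the three policies differ only in when content is fetched and whether the streamed bytes are persisted:

```python
from enum import Enum

class DownloadPolicy(Enum):
    # Proposed names from this thread; the identifiers are illustrative only.
    IMMEDIATE = "immediate"            # fetched while the sync task runs (no lazy)
    CACHE_AND_SAVE = "cache-and-save"  # lazy; persisted so each bit is downloaded at most once
    CACHE = "cache"                    # lazy; rely on Squid's cache, skip step 14 (saving)

def fetches_during_sync(policy: DownloadPolicy) -> bool:
    """Only 'immediate' downloads content as part of the sync task itself."""
    return policy is DownloadPolicy.IMMEDIATE

def saves_downloaded_bits(policy: DownloadPolicy) -> bool:
    """'cache' is the one policy that skips persisting what was streamed."""
    return policy is not DownloadPolicy.CACHE
```

Under this framing, a cache eviction by Squid is harmless for 'cache-and-save' (Pulp still has the bits) but triggers a re-download under 'cache'.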
If ^ is better I can update the stories. Other naming ideas and use cases are welcome.

Thanks,
Brian

On Wed, May 30, 2018 at 10:50 AM, Brian Bouterse <bbout...@redhat.com> wrote:

> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmc...@redhat.com> wrote:
>
>> I think there is a use case for "proxy only" like the one being described
>> here. Several years ago there was a project called thumbslug[1] that was
>> used in a version of katello instead of pulp. Its job was to check
>> entitlements and then proxy content from a CDN. The same functionality
>> could be implemented in pulp. (Perhaps it's even as simple as telling
>> squid not to cache anything, so the content would never make it from the
>> cache into pulp in current pulp-2.)
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
> I understand describing this as an "only-squid" version, but for clarity,
> the streamer would still be required because it is what requests the bits
> with the correctly configured downloader (certs, proxy, etc.). The
> streamer streams the bits into squid, which provides caching and client
> multiplexing.
>
> To confirm my understanding, this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
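The "same as on-demand minus step 14" idea can be sketched as a relay that copies bytes from the remote response to the client while only optionally persisting them. This is an illustrative sketch, not the actual streamer; `source`, `sink`, and `save_path` are hypothetical names:

```python
import io

def relay(source, sink, save_path=None, chunk_size=8192):
    """Stream bytes from a remote response (file-like) out to the client.

    Squid sits in front of the streamer and caches the response either
    way; when save_path is None (the proposed "squid-only"/no-save
    policy), step 14 is skipped and Pulp never writes the bits to disk.
    """
    saved = open(save_path, "wb") if save_path is not None else None
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            sink.write(chunk)        # served to the client (and cached by Squid)
            if saved is not None:
                saved.write(chunk)   # step 14: persist for later requests
    finally:
        if saved is not None:
            saved.close()

# Example with in-memory streams instead of real sockets and files:
src, dst = io.BytesIO(b"rpm-bytes"), io.BytesIO()
relay(src, dst)  # no save_path: proxy-only behavior
```

With a real Squid in front, a second request for the same bits would be served from Squid's cache without re-entering this relay at all, until Squid evicts the object.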
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that, but
>>> >> >> first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to Squid all the time (for the lazy
>>> >> >> repos)?
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because that's
>>> >> > Pulp's primary responsibility.
>>> >>
>>> >> Maybe not so much the storing but rather the content views management?
>>> >> I mean the partitioning into repositories, promoting.
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>>
>>> I see.
>>>
>>> >> > If some of the content lived in Squid and some lived in Pulp, it
>>> >> > would be difficult for the user to know what content is actually
>>> >> > available in Pulp and what content needs to be fetched from a remote
>>> >> > repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
>>> >> so not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >
>>> > Squid removes things from the cache at some point.
>>> > You can probably configure it to never remove anything from the cache,
>>> > but then we would need to implement orphan cleanup that would work
>>> > across two systems: pulp and squid.
>>>
>>> Actually, "remote" units wouldn't need orphan cleaning from the disk;
>>> just dropping them from the DB would suffice.
>>>
>>> > Answering that question would still be difficult. Not all content that
>>> > is in a repository that was synced using the on_demand download policy
>>> > will be in Squid - only the content that has been requested by clients.
>>> > So it's still hard to know which of the content units have been
>>> > downloaded and which have not been.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry its URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand" attribute, would have a mandatory remote "URL"
>>> attribute. I wouldn't say it's more complex than tracking the "policy"
>>> attribute.
>>>
>>> >> > As Pulp downloads an Artifact, it calculates all the checksums and
>>> >> > its size. It then performs validation based on information that was
>>> >> > provided by the RemoteArtifact. After validation is performed, the
>>> >> > Artifact is saved to the database and to its final place in
>>> >> > /var/lib/content/artifacts/.
>>> >>
>>> >> This could still be achieved by storing the content just temporarily
>>> >> in the Squid proxy, i.e. use Squid as the content source, not the disk.
>>> >>
>>> >> > Once this information is in the database, Pulp's web server can
>>> >> > serve the content without having to involve the Streamer or Squid.
>>> >> Pulp might serve just the API and the metadata; the content might be
>>> >> redirected to the Proxy all the time, correct?
>>> >> Doesn't Crane do that, btw?
>>> >
>>> > Theoretically we could do this, but in practice we would run into
>>> > problems when we needed to scale out the Content app. Right now, when
>>> > the Content app needs to be scaled, a user can launch another machine
>>> > that will run the Content app. Squid does not support that kind of
>>> > scaling. Squid can only take advantage of additional cores in a single
>>> > machine.
>>>
>>> I don't think I understand; proxies are actually designed to scale[1]
>>> and are used as tools to scale the web too.
>>>
>>> This is all about the How question, but when it comes to my original
>>> Why, please correct me if I'm wrong, the answer so far has been: Pulp
>>> always downloads the content because that's what it is supposed to do.
>>>
>>> Cheers,
>>> milan
>>>
>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>
>>> >>
>>> >> Cheers,
>>> >> milan
>>> >>
>>> >> > -dennis
>>> >> >
>>> >> >> --
>>> >> >> cheers
>>> >> >> milan
>>> >> >>
>>> >> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>> >> >> >
>>> >> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> >> >> >>
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> Looking at the diagram[1] I'm wondering what's the reasoning behind
>>> >> >> >> Pulp having to actually fetch the content locally?
>>> >> >> >
>>> >> >> > Is the question "why is Pulp doing the fetching and not Squid?" or
>>> >> >> > "why is Pulp storing the content after fetching it?"
>>> >> >> > or both?
>>> >> >> >
>>> >> >> >> Couldn't Pulp just rely on the proxy with regards to the content
>>> >> >> >> streaming?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> milan
>>> >> >> >>
>>> >> >> >> [1] https://pulp.plan.io/attachments/130957
>>> >> >> >>
>>> >> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>> >> >> >> > A mini-team of core devs** met to talk through lazy use cases
>>> >> >> >> > for Pulp3. It's effectively the same lazy from Pulp2 except:
>>> >> >> >> >
>>> >> >> >> > * it's now built into core (not just RPM)
>>> >> >> >> > * it excludes repo protection use cases because we haven't
>>> >> >> >> > added repo protection to Pulp3 yet
>>> >> >> >> > * it excludes the "background" policy, which, based on feedback
>>> >> >> >> > from stakeholders, provided very little value
>>> >> >> >> > * it will no longer depend on Twisted as a dependency. It will
>>> >> >> >> > use asyncio instead.
>>> >> >> >> >
>>> >> >> >> > While it is being built into core, it will require minimal
>>> >> >> >> > support from a plugin writer to add support for it. Details in
>>> >> >> >> > the epic below.
>>> >> >> >> >
>>> >> >> >> > The current use cases, along with a technical plan, are written
>>> >> >> >> > on this epic: https://pulp.plan.io/issues/3693
>>> >> >> >> >
>>> >> >> >> > We're putting it out for comment, questions, and feedback
>>> >> >> >> > before we start into the code. I hope we are able to add this
>>> >> >> >> > into our next sprint.
>>> >> >> >> >
>>> >> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>> >> >> >> >
>>> >> >> >> > Thanks!
>>> >> >> >> > Brian
>>> >> >> >> >
>>> >> >> >> > _______________________________________________
>>> >> >> >> > Pulp-dev mailing list
>>> >> >> >> > Pulp-dev@redhat.com
>>> >> >> >> > https://www.redhat.com/mailman/listinfo/pulp-dev
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev