We could try to go with:

policy=immediate -> downloads happen now, while the sync task runs (no lazy).
Also the default if unspecified.
policy=on_demand -> all the steps in the diagram. Content that is downloaded
is saved, so that it is only ever downloaded once.
policy=cache_only -> all the steps in the diagram except step 14. If squid
pushes the bits out of its cache, they will be re-downloaded to serve other
clients requesting the same bits.
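To make the proposal concrete, here is a minimal sketch of how the three
policies could hang off a Remote. The field name, choice constants, and model
shape are assumptions for illustration, not the actual pulpcore code:

    # Hypothetical sketch -- names are illustrative, not pulpcore's real models.
    from django.db import models


    class Remote(models.Model):
        IMMEDIATE = 'immediate'    # fetch and save everything while the sync task runs
        ON_DEMAND = 'on_demand'    # fetch on first client request, then save for good
        CACHE_ONLY = 'cache_only'  # fetch on request, keep only in squid's cache (no step 14)

        POLICY_CHOICES = (
            (IMMEDIATE, IMMEDIATE),
            (ON_DEMAND, ON_DEMAND),
            (CACHE_ONLY, CACHE_ONLY),
        )

        url = models.TextField()
        # 'immediate' is the default when the user does not specify a policy.
        policy = models.TextField(choices=POLICY_CHOICES, default=IMMEDIATE)

        class Meta:
            abstract = True

With something like this, choosing a behavior is a single attribute on the
remote at create time, and the sync task and the streamer can branch on it.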
--------
Regards,

Ina Panova
Software Engineer | Pulp | Red Hat Inc.

"Do not go where the path may lead, go instead where there is no path and
leave a trail."


On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel <jor...@redhat.com> wrote:
>
> On 05/31/2018 04:39 PM, Brian Bouterse wrote:
>
>> I updated the epic (https://pulp.plan.io/issues/3693) to use this new
>> language.
>>
>> policy=immediate -> downloads now while the task runs (no lazy). Also the
>> default if unspecified.
>> policy=cache-and-save -> All the steps in the diagram. Content that is
>> downloaded is saved so that it's only ever downloaded once.
>> policy=cache -> All the steps in the diagram except step 14. If squid
>> pushes the bits out of the cache, it will be re-downloaded again to serve
>> other clients requesting the same bits.
>
> These policy names strike me as an odd, non-intuitive mixture. I think we
> need to brainstorm on policy names and/or additional attributes to best
> capture this. I suggest the epic be updated to describe the "modes" or use
> cases without the names for now. I'll try to follow up with other
> suggestions.
>
>> Also @milan, see inline for answers to your question.
>>
>> On Wed, May 30, 2018 at 3:48 PM, Milan Kovacik <mkova...@redhat.com> wrote:
>>> On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmc...@redhat.com> wrote:
>>>>>
>>>>> I think there is a use case for "proxy only" like is being described
>>>>> here. Several years ago there was a project called thumbslug[1] that
>>>>> was used in a version of katello instead of pulp. Its job was to check
>>>>> entitlements and then proxy content from a CDN. The same functionality
>>>>> could be implemented in pulp. (Perhaps it's even as simple as telling
>>>>> squid not to cache anything, so the content would never make it from
>>>>> cache to pulp in current pulp-2.)
>>>>
>>>> What would you call this policy?
>>>> policy=proxy?
>>>> policy=stream-dont-save?
>>>> policy=stream-no-save?
>>>>
>>>> Are the names 'on-demand' and 'immediate' clear enough? Are there
>>>> better names?
>>>>
>>>>> Overall I'm +1 to the idea of an only-squid version, if others think
>>>>> it would be useful.
>>>>
>>>> I understand describing this as an "only-squid" version, but for
>>>> clarity, the streamer would still be required because it is what
>>>> requests the bits with the correctly configured downloader (certs,
>>>> proxy, etc.). The streamer streams the bits into squid, which provides
>>>> caching and client multiplexing.
>>>
>>> I have to admit it's only now that I'm reading
>>> https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
>>> again, because of the SSL termination. So the new plan is to use the
>>> streamer to terminate the SSL instead of the Apache reverse proxy?
>>
>> The plan for right now is to not use a reverse proxy and to have the
>> client's connection terminate at squid directly, either via http or
>> https depending on how squid is configured. The reverse proxy in pulp2's
>> design served to validate the signed urls and rewrite them for squid.
>> This first implementation won't use signed urls. I believe that means we
>> don't need a reverse proxy here yet.
>>
>>> W/r/t the construction of the URL of an artifact, I thought it would be
>>> stored in the DB, so the Remote would create it during the sync.
>>
>> This is correct. The inbound URL from the client after the redirect will
>> still be a reference that the "Pulp content app" will resolve to a
>> RemoteArtifact. Then the streamer will use that RemoteArtifact data to
>> correctly build the downloader. That's the gist of it at least.
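To picture that flow, here is a rough sketch of the streamer's request
handling. The lookup helper, the get_downloader() factory, and the chunked
stream() interface are assumptions for illustration, not settled pulpcore API:

    # Hypothetical streamer handler -- helper and attribute names are assumptions.
    from aiohttp import web


    async def stream_content(request):
        # The redirected path identifies content recorded during the sync.
        relative_path = request.match_info['path']

        # Resolve the path to the RemoteArtifact row the sync wrote; it
        # carries the upstream URL plus the expected size and checksums.
        remote_artifact = await find_remote_artifact(relative_path)  # assumed helper
        if remote_artifact is None:
            raise web.HTTPNotFound()

        # Build a downloader from the Remote's configuration (certs, proxy,
        # etc.) so the fetch honors the same settings a sync task would use.
        downloader = remote_artifact.remote.get_downloader(remote_artifact.url)

        # Stream the bits through to the client (and into squid's cache) as
        # they arrive, instead of waiting for the whole file.
        response = web.StreamResponse()
        await response.prepare(request)
        async for chunk in downloader.stream():  # assumed chunked interface
            await response.write(chunk)
        await response.write_eof()
        return response


    app = web.Application()
    app.add_routes([web.get('/streamer/{path:.*}', stream_content)])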
>>>> To confirm my understanding, this "squid-only" policy would be the
>>>> same as on-demand except that it would *not* perform step 14 from the
>>>> diagram here (https://pulp.plan.io/issues/3693). Is that right?
>>>
>>> yup
>>>
>>>>> [1] https://github.com/candlepin/thumbslug
>>>>>
>>>>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>>>>>> On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkli...@redhat.com> wrote:
>>>>>>>>> On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Good point!
>>>>>>>>>> More the second; it might be a bit crazy to utilize Squid for
>>>>>>>>>> that, but first, let's answer the why ;)
>>>>>>>>>> So why does Pulp need to store the content here?
>>>>>>>>>> Why don't we point the users to the Squid all the time (for the
>>>>>>>>>> lazy repos)?
>>>>>>>>>
>>>>>>>>> Pulp's Streamer needs to fetch and store the content because
>>>>>>>>> that's Pulp's primary responsibility.
>>>>>>>>
>>>>>>>> Maybe not so much the storing but rather the content views
>>>>>>>> management? I mean the partitioning into repositories, promoting.
>>>>>>>
>>>>>>> Exactly this. We want Pulp users to be able to reuse content that
>>>>>>> was brought in using the 'on_demand' download policy in other
>>>>>>> repositories.
>>>>>>
>>>>>> I see.
>>>>>>
>>>>>>>>> If some of the content lived in Squid and some lived in Pulp, it
>>>>>>>>> would be difficult for the user to know what content is actually
>>>>>>>>> available in Pulp and what content needs to be fetched from a
>>>>>>>>> remote repository.
>>>>>>>>
>>>>>>>> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
>>>>>>>> so not that difficult.
>>>>>>>> Maybe Pulp could have a concept of Origin, where folks upload stuff
>>>>>>>> to a Pulp repo, vs. Proxy for its repo storage policy?
>>>>>>>
>>>>>>> Squid removes things from the cache at some point. You can probably
>>>>>>> configure it to never remove anything from the cache, but then we
>>>>>>> would need to implement orphan cleanup that would work across two
>>>>>>> systems: pulp and squid.
>>>>>>
>>>>>> Actually, "remote" units wouldn't need orphan cleaning from the disk;
>>>>>> just dropping them from the DB would suffice.
>>>>>>
>>>>>>> Answering that question would still be difficult. Not all content
>>>>>>> that is in a repository synced using the on_demand download policy
>>>>>>> will be in Squid - only the content that has been requested by
>>>>>>> clients. So it's still hard to know which of the content units have
>>>>>>> been downloaded and which have not been.
>>>>>>
>>>>>> But the beauty is exactly in that: we don't have to track whether the
>>>>>> content is downloaded if it is reverse-proxied[1][2].
>>>>>> Moreover, this would work both with and without a proxy between Pulp
>>>>>> and the Origin of the remote unit.
>>>>>> A "remote" content artifact might just need to carry its URL in a DB
>>>>>> column for this to work; so the async artifact model, instead of the
>>>>>> "policy=on-demand", would have a mandatory remote "URL" attribute. I
>>>>>> wouldn't say it's more complex than tracking the "policy" attribute.
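To make milan's alternative concrete, here is a rough sketch of a "remote"
artifact that carries its upstream URL, plus a content view that always
redirects to the proxy. Every name here, including the squid endpoint, is a
made-up illustration rather than a proposed API:

    # Hypothetical sketch of the always-redirect alternative -- names assumed.
    from django.db import models
    from django.http import HttpResponseRedirect


    class RemoteArtifact(models.Model):
        # A mandatory upstream URL replaces the 'policy' bookkeeping: knowing
        # where the bits live upstream is enough, whether or not squid
        # currently has them cached.
        url = models.TextField()
        relative_path = models.TextField(unique=True)


    SQUID_BASE = 'http://squid.example.com:3128'  # assumed proxy endpoint


    def serve(request, relative_path):
        # Pulp serves the API and metadata itself; content requests are
        # always redirected to the proxy, which caches or re-fetches as
        # needed, so Pulp never tracks per-unit download state.
        artifact = RemoteArtifact.objects.get(relative_path=relative_path)
        return HttpResponseRedirect(f'{SQUID_BASE}/{artifact.relative_path}')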
>>>>>>>>> As Pulp downloads an Artifact, it calculates all the checksums
>>>>>>>>> and its size. It then performs validation based on information
>>>>>>>>> that was provided from the RemoteArtifact. After validation is
>>>>>>>>> performed, the Artifact is saved to the database and to its final
>>>>>>>>> place in /var/lib/content/artifacts/.
>>>>>>>>
>>>>>>>> This could still be achieved by storing the content just
>>>>>>>> temporarily in the Squid proxy, i.e. use Squid as the content
>>>>>>>> source, not the disk.
>>>>>>>>
>>>>>>>>> Once this information is in the database, Pulp's web server can
>>>>>>>>> serve the content without having to involve the Streamer or Squid.
>>>>>>>>
>>>>>>>> Pulp might serve just the API and the metadata; the content might
>>>>>>>> be redirected to the Proxy all the time, correct?
>>>>>>>> Doesn't Crane do that btw?
>>>>>>>
>>>>>>> Theoretically we could do this, but in practice we would run into
>>>>>>> problems when we needed to scale out the Content app. Right now,
>>>>>>> when the Content app needs to be scaled, a user can launch another
>>>>>>> machine that will run the Content app. Squid does not support that
>>>>>>> kind of scaling. Squid can only take advantage of additional cores
>>>>>>> in a single machine.
>>>>>>
>>>>>> I don't think I understand; proxies are actually designed to scale[1]
>>>>>> and are used as tools to scale the web too.
>>>>>>
>>>>>> This is all about the How question, but when it comes to my original
>>>>>> Why, please correct me if I'm wrong, the answer so far has been: Pulp
>>>>>> always downloads the content because that's what it is supposed to do.
>>>>>>
>>>>>> Cheers,
>>>>>> milan
>>>>>>
>>>>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>>>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>>>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>>>>
>>>>>>>> Cheers,
>>>>>>>> milan
>>>>>>>
>>>>>>> -dennis
>>>>>>
>>>>>>>>>> --
>>>>>>>>>> cheers
>>>>>>>>>> milan
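Dennis's download-and-validate step from earlier in this exchange can be
pictured roughly like this; the digest set and the expected-values plumbing
are assumptions for illustration:

    # Hypothetical sketch of download-time validation -- plumbing names assumed.
    import hashlib


    def validate_download(path, expected_digests, expected_size):
        """Compute the size and checksums of a downloaded file and compare
        them to the values recorded on the RemoteArtifact, when known."""
        hashers = {name: hashlib.new(name) for name in ('md5', 'sha1', 'sha256')}
        size = 0
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                size += len(chunk)
                for hasher in hashers.values():
                    hasher.update(chunk)
        if expected_size is not None and size != expected_size:
            raise ValueError(f'size mismatch: got {size}, expected {expected_size}')
        for name, hasher in hashers.items():
            digest = hasher.hexdigest()
            if expected_digests.get(name) and digest != expected_digests[name]:
                raise ValueError(f'{name} mismatch: got {digest}')
        # On success the caller would save the Artifact row and move the file
        # to its final place under /var/lib/content/artifacts/.
        return size, {name: h.hexdigest() for name, h in hashers.items()}

    # Example call with made-up values recorded at sync time:
    # validate_download('/tmp/blob', {'sha256': recorded_sha256}, recorded_size)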
>>>>>>>>>> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>>>>>>>>> On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkova...@redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the diagram[1], I'm wondering what's the reasoning
>>>>>>>>>>>> behind Pulp having to actually fetch the content locally?
>>>>>>>>>>>
>>>>>>>>>>> Is the question "why is Pulp doing the fetching and not Squid?"
>>>>>>>>>>> or "why is Pulp storing the content after fetching it?" or both?
>>>>>>>>>>>
>>>>>>>>>>>> Couldn't Pulp just rely on the proxy with regards to the
>>>>>>>>>>>> content streaming?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> milan
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://pulp.plan.io/attachments/130957
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <bbout...@redhat.com> wrote:
>>>>>>>>>>>>> A mini-team of core devs** met to talk through lazy use cases
>>>>>>>>>>>>> for Pulp3. It's effectively the same lazy from Pulp2 except:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * it's now built into core (not just RPM)
>>>>>>>>>>>>> * it excludes repo protection use cases, because we haven't
>>>>>>>>>>>>>   added repo protection to Pulp3 yet
>>>>>>>>>>>>> * it excludes the "background" policy, which, based on
>>>>>>>>>>>>>   feedback from stakeholders, provided very little value
>>>>>>>>>>>>> * it will no longer depend on Twisted as a dependency; it will
>>>>>>>>>>>>>   use asyncio instead (a sketch follows below)
>>>>>>>>>>>>>
>>>>>>>>>>>>> While it is being built into core, it will require minimal
>>>>>>>>>>>>> support by a plugin writer to add support for it. Details in
>>>>>>>>>>>>> the epic below.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current use cases along with a technical plan are written
>>>>>>>>>>>>> on this epic: https://pulp.plan.io/issues/3693
>>>>>>>>>>>>>
>>>>>>>>>>>>> We're putting it out for comment, questions, and feedback
>>>>>>>>>>>>> before we start into the code. I hope we are able to add this
>>>>>>>>>>>>> into our next sprint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Brian
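For flavor, a minimal sketch of the kind of asyncio-based downloader the
announcement alludes to; the aiohttp calls are real, but the class shape is
an assumption, not the planned pulpcore design:

    # Hypothetical minimal asyncio downloader -- the class shape is illustrative.
    import asyncio

    import aiohttp


    class HttpDownloader:
        def __init__(self, url, chunk_size=1024 * 1024):
            self.url = url
            self.chunk_size = chunk_size

        async def download(self, dest_path):
            # Fetch the URL and write it to disk chunk by chunk, yielding
            # control to the event loop between chunks.
            async with aiohttp.ClientSession() as session:
                async with session.get(self.url) as response:
                    response.raise_for_status()
                    with open(dest_path, 'wb') as f:
                        while True:
                            chunk = await response.content.read(self.chunk_size)
                            if not chunk:
                                break
                            f.write(chunk)


    async def main(urls):
        # Example: fetch several artifacts concurrently on one event loop.
        await asyncio.gather(
            *(HttpDownloader(u).download(f'/tmp/artifact-{i}')
              for i, u in enumerate(urls))
        )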
_______________________________________________
Pulp-dev mailing list
Pulp-dev@redhat.com
https://www.redhat.com/mailman/listinfo/pulp-dev