On Fri, 2011-09-30 at 06:45 -0400, Zdenek Pavlas wrote:
> >  These are all fine, although I'm not sure we want to move to using
> > MultiFileMeter instead of our current progress meter.
> 
> How could we use the current single-file progress meter?
> Show just one bar for the total size?  Or play with ECMA-48
> CSI sequences to switch rows?

 Well we already have a total progress indicator on the single-file, so
all we really need to do is update the "total" based on how far along
all the current downloads are and then output the single-file progress
for either:

1. The latest file we got data from.

2. The file closest to completion.

...where we have to make sure we call _do_end() for everything.
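 Something like this, maybe (just a sketch of the bookkeeping, none of
these names are urlgrabber's real TextMeter/MultiFileMeter API):

    import sys

    class AggregateProgress:
        def __init__(self, total_files, total_size):
            self.total_files = total_files
            self.total_size = total_size    # sum of all file sizes, if known
            self.read = {}                  # filename -> bytes read so far
            self.sizes = {}                 # filename -> expected size
            self.done = 0

        def update(self, filename, amount_read, size):
            # called whenever any download gets data; 'filename' is the one
            # we just got data from (option 1. above)
            self.read[filename] = amount_read
            self.sizes[filename] = size
            overall = sum(self.read.values())
            frac = overall / float(self.total_size) if self.total_size else 0
            sys.stdout.write("\r(%d/%d) %-30s %3d%% of total" %
                             (self.done + 1, self.total_files,
                              filename[:30], int(frac * 100)))
            sys.stdout.flush()

        def end(self, filename):
            # per-file equivalent of _do_end(): mark one download finished
            self.done += 1
            self.read[filename] = self.sizes.get(filename,
                                                 self.read.get(filename, 0))
            if self.done == self.total_files:
                sys.stdout.write("\n")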

 You could certainly try doing a patch to yum to optionally use
TextMultiFileMeter() (which can be separate from anything else) ... but
my guess is people will complain if we make that the default, plus from
what I can see the UI needs at least some work (which is understandable,
the single-file one has had many hours of tweaking to make it nice).
 Adding some "test cases" to progress.py seems like a good idea too :).

> >  Why do we want to use these two extra functions instead of just
> > triggering off a parameter (like the errors one)?
> 
> I need some 'sync' call that blocks until all downloads are finished,
> and adding the parallel_end() function is IMO better than adding a flag 
> to each request to signal end-of-batch.  Pairing that with parallel_begin()
> eliminated the need to pass a sync/async flag to each request.
>
> I can drop parallel_begin() and the global flag, and add an option
> instead.  Something like 'async = (key, limit)', that reads as:
> "run in parallel with other request, but keep #connections with
> the same key below given limit."

 Yeh, that and a parallel_wait() seems fine. Although I think it would
be nice to be able to do:

 async download A
 async download B
 parallel_wait(A+B)
   async download C (this happens in the cb for A)
   async download D (this happens in the cb for B)
 # Both A+B have finished now, but C+D are still going on.
 parallel_wait(C+D)
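
 In code that'd look something like this (the urlgrab()/parallel_wait()
names and the async=(key, limit) option are just the proposal from this
thread; the stubs below only fake the queue/wait behaviour to show the
flow, they're not urlgrabber code):

    _queue = []

    def urlgrab(url, dest, async=None, checkfunc=None):
        # async=(key, limit): run in parallel with other requests, but
        # keep the number of connections sharing 'key' below 'limit'
        _queue.append((url, dest, async, checkfunc))

    def parallel_wait():
        # "download" everything queued so far; checkfuncs may queue more
        # requests, which get picked up by the *next* parallel_wait()
        batch = _queue[:]
        del _queue[:]
        for url, dest, opts, checkfunc in batch:
            print('downloaded %s -> %s' % (url, dest))
            if checkfunc:
                checkfunc(dest)

    def check_a(dest):
        urlgrab('http://example.com/C', 'C.rpm', async=('repoX', 3))

    def check_b(dest):
        urlgrab('http://example.com/D', 'D.rpm', async=('repoX', 3))

    urlgrab('http://example.com/A', 'A.rpm', async=('repoX', 3),
            checkfunc=check_a)
    urlgrab('http://example.com/B', 'B.rpm', async=('repoX', 3),
            checkfunc=check_b)
    parallel_wait()   # A+B finish here; their checkfuncs queued C+D
    parallel_wait()   # C+D finish here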

> > Even if we can't mix new/old downloads at "0.1" it
> > seems bad to make sure we never can.
> 
> Actually, it kinda works now.  parallel_end() disables parallel
> downloads first, then processes the queue in parallel.  So, if
> there's urlgrab() in checkfunc, it's blocking.  I agree that an
> explicit flag would make the behavior more predictable.

 *nods* ... less magic is good :).

> > objects and weird interactions with NSS etc. ... sticking it in an
> > external process should get rid of all of those bugs, using CurlMulti
> > is more likely to give us 666 more variants of those bugs.
> 
> I'm all for running downloads in an external process, but there is
> more than one way to do it:

 Yeh, I don't think it's the end of the world if we do the simplest
thing and have one fork+exec per. download to start with ... adding to
that so we get one fork+exec per. repo. seems like it shouldn't be too
hard (so we get keepalive).

 The big thing is that we need the parallel code separate from curl,
because curl won't be in proc.
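
 As a simplest-thing-first sketch that's basically this (using the curl
binary here purely as a stand-in for whatever helper we'd actually ship;
the per-repo. version would keep one long-lived helper per repo id
instead, so it can reuse its connection for keepalive):

    import subprocess

    def start_download(url, dest):
        # returns immediately; the parallel code waits on the processes
        # and never touches libcurl/NSS inside the yum process itself
        return subprocess.Popen(['curl', '-s', '-o', dest, url])

    procs = [start_download(u, d) for u, d in
             [('http://example.com/A.rpm', '/tmp/A.rpm'),
              ('http://example.com/B.rpm', '/tmp/B.rpm')]]
    for p in procs:
        p.wait()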

> > 1. Get all the curl/NSS/etc. API usage out of the process, this
> > should close all the weird/annoying bugs we've had with curl* and make sure
> > we don't get any more. It should also fix DNS hang problems. This
> > probably also fixes C-c as well.
> 
> Not sure what you mean..  Using CurlMulti in an external process
> keeps Curl/NSS away from yum as well..  If there are known problems
> with NSS & CurlMulti interaction then that's a different thing.

 I'm not against using CurlMulti in the external proc. ... but each
proc. has to be limited to a single repo. ... so it wouldn't be great if
the only way we could get parallel downloads was for packages from the
same repo.

> > Full level this is SELinux+chroot+drop privs, and we can
> > be a bit lax in implementing this.
> 
> I tried that concept, and chroot() currently seems to be a no-no.
> I assume that it only makes sense to chroot BEFORE connect(),
> since most attacks target bugs in header parsing or SSL handshakes.
> 
> But: chroot() && connect() => Host not found.
> 
> I assume it's because the resolver can't read /etc/resolv.conf
> after the chroot, hmm.  Doing a dummy name lookup before
> chroot helps, but I don't dare say how reliable that is.

 Yeh, chroot() is hard and you'll probably have to bind mount a bunch of
stuff. SELinux is somewhat easier, and someone else's problem ;).
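
 The resolve-before-chroot ordering would look roughly like this (sketch
only ... needs root, needs an empty jail directory, and how reliable the
pre-chroot lookup is stays the open question you mentioned):

    import os
    import socket

    def connect_sandboxed(host, port=80):
        # resolve while /etc/resolv.conf is still visible
        fam, stype, proto, _, addr = socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM)[0]
        os.chroot('/var/empty')     # or any empty directory; requires root
        os.chdir('/')
        os.setgid(65534)            # drop privs after the chroot (nobody)
        os.setuid(65534)
        s = socket.socket(fam, stype, proto)
        s.connect(addr)             # connect by address, no lookup needed
        return s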
