On Thu, Oct 6, 2011 at 3:37 PM, Zdenek Pavlas <zpav...@redhat.com> wrote:
> Hi,
>
> I did some experiments with parallelizing metadata downloads
> using the 'bulk urlgrab' API.  The metadata initialization code
> is pretty complex, so I only got as far as a 'staged' 3-pass solution:
>
> 1st pass: update metalink files.
> 2nd pass: update repomd files.
> 3rd pass: update sqlite files.
>
> Each stage runs downloads in parallel but the preceding
> stage has to (completely) finish first, so if just one
> repo has a stale metalink, all repos have to wait.
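
[Editor's note: the staging described above can be sketched as follows. This is a minimal illustration in modern Python using concurrent.futures; the download_* helpers are hypothetical placeholders, not yum's actual code.]

```python
# Sketch of the 3-pass staged approach: each stage downloads one kind of
# file for every repo in parallel, and stage N+1 starts only after stage N
# has completely finished.  The download_* helpers are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def download_metalink(repo): return ('metalink', repo)
def download_repomd(repo):   return ('repomd', repo)
def download_sqlite(repo):   return ('sqlite', repo)

def run_stage(task, repos):
    # Run the task for every repo in parallel, then block until
    # the whole stage is done.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(task, repos))

def update_metadata(repos):
    # Stages run strictly in order, so a single repo with a stale
    # metalink in stage 1 delays stage 2 for *all* repos.
    return [run_stage(stage, repos)
            for stage in (download_metalink, download_repomd, download_sqlite)]
```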
>
> Yes, it's possible to rewrite the repo initialization code
> to a state machine, and do the staging independently for
> each repo.  But that's IMO ugly and intrusive.  The most
> natural approach is to keep yumRepo.py with minimal changes,
> just init repos in separate threads.
>
> I assume sqlite and rpmdb are the main reasons we don't
> support/allow threading in yum, as these break if thread
> A queries a handle opened by thread B.
>
> But we can wrap such calls, serialize them, and process
> all in the main thread's context [see below].
>
> It's a hack, but rewriting a substantial part of yumRepo
> (and supporting most of the old API at the same time)
> seems worse.  I'm not convinced that's the way to go,
> just an idea to discuss.
>
> === CUT ===
> from thread import allocate_lock, get_ident
> lock = allocate_lock()
> syn1 = allocate_lock(); syn1.acquire()
> syn2 = allocate_lock(); syn2.acquire()
> curr = []
>
> # single threading wrapper
> def singlethreaded(fun):
>    def forward(*arg, **karg):
>        lock.acquire() # serialize access to 'curr'
>        curr[:] = fun, arg, karg, None, None
>        syn1.release() # job ready
>        syn2.acquire() # wait till done
>        ret, exc = curr[3:]
>        lock.release()
>        if exc: raise exc
>        return ret
>    return forward
>
> # An example API that does not like threads
> main = get_ident()
>
> @singlethreaded
> def foo(arg):
>    assert get_ident() == main
>    print 'foo', arg
>
> @singlethreaded
> def bar(arg):
>    assert get_ident() == main
>    print 'bar', arg
>
> # test
> from thread import start_new_thread
> def thread(n):
>    # this runs in worker threads, but foo & bar always run in main.
>    foo(n)
>    bar(n)
> for n in range(3):
>    start_new_thread(thread, (n,))
> # main thread: serve the 6 jobs queued by the 3 test threads, then exit
> for _ in range(6):
>    syn1.acquire() # wait for request
>    try: curr[3] = curr[0](*curr[1], **curr[2])
>    except Exception, curr[4]: pass
>    syn2.release() # signal we're done
> === CUT ===
> _______________________________________________
> Yum-devel mailing list
> Yum-devel@lists.baseurl.org
> http://lists.baseurl.org/mailman/listinfo/yum-devel
>

Have you taken a look at multiprocessing?

http://docs.python.org/library/multiprocessing.html
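
For what it's worth, a minimal sketch of that direction (the fetch() helper is a hypothetical placeholder, not yum code). Since workers are separate processes, each gets its own sqlite/rpmdb handles, so the handle-affinity problem described above never arises:

```python
from multiprocessing import Pool

def fetch(repo):
    # Hypothetical placeholder; real code would download this
    # repo's metadata here, inside a worker process.
    return 'fetched %s' % repo

if __name__ == '__main__':
    pool = Pool(processes=4)
    try:
        # Each worker process opens its own handles, so nothing is
        # ever shared across threads.
        results = pool.map(fetch, ['base', 'updates', 'extras'])
    finally:
        pool.close()
        pool.join()
```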

Tim
