Hi,

I did some experiments with parallelizing metadata downloads
using the 'bulk urlgrab' API.  The metadata initialization code
is pretty complex, so I only got as far as a 'staged' 3-pass solution:

1st pass: update metalink files.
2nd pass: update repomd files.
3rd pass: update sqlite files.

Each stage runs downloads in parallel but the preceding
stage has to (completely) finish first, so if just one
repo has a stale metalink, all repos have to wait.
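
To make the staging concrete, here is a rough sketch of the three
passes (Python 3, not the actual patch).  fetch() and the repo names
are made up for illustration and stand in for the real urlgrab calls;
the point is only that each pass is a barrier for all repos.

```python
# Sketch only: fetch() is a placeholder for a real download and the
# repo names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

STAGES = ["metalink", "repomd", "sqlite"]

def fetch(repo, kind):
    # stand-in for the actual download of one file
    return "%s/%s" % (repo, kind)

def staged_update(repos, log):
    with ThreadPoolExecutor(max_workers=4) as pool:
        for kind in STAGES:
            # list() drains the iterator, so this pass finishes for
            # every repo before the next pass starts
            done = list(pool.map(lambda r: fetch(r, kind), repos))
            log.append((kind, done))

log = []
staged_update(["base", "updates", "extras"], log)
```

Within a pass the downloads overlap, but one slow repo in pass 1
delays pass 2 for everybody.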

Yes, it's possible to rewrite the repo initialization code
as a state machine and do the staging independently for
each repo.  But that's IMO ugly and intrusive.  The most
natural approach is to keep yumRepo.py with minimal changes
and just init repos in separate threads.
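
For comparison, a minimal sketch of the thread-per-repo idea
(Python 3).  init_repo() is a hypothetical stand-in for the existing
sequential init code, not a real yumRepo function; each repo runs its
own three steps in order and waits on nobody else.

```python
import threading

def init_repo(repo, results):
    # hypothetical per-repo init: the three steps still run in order,
    # but only this repo waits on them
    steps = []
    for kind in ("metalink", "repomd", "sqlite"):
        steps.append("%s/%s" % (repo, kind))  # stand-in for a download
    results[repo] = steps

results = {}
threads = [threading.Thread(target=init_repo, args=(r, results))
           for r in ("base", "updates", "extras")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The per-repo code stays sequential and readable, which is the appeal;
the cost is that anything it touches has to tolerate threads.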

I assume sqlite and rpmdb are the main reasons we don't
support/allow threading in yum, as these break if thread
A queries a handle opened by thread B.

But we can wrap such calls, serialize them, and process
them all in the main thread's context [see below].

It's a hack, but rewriting a substantial part of yumRepo
(and supporting most of the old API at the same time)
seems worse.  I'm not convinced this hack is the way to go;
it's just an idea to discuss.

=== CUT ===
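from thread import allocate_lock, get_ident
lock = allocate_lock()                  # serializes callers
syn1 = allocate_lock(); syn1.acquire()  # released when a job is ready
syn2 = allocate_lock(); syn2.acquire()  # released when the job is done
curr = []                               # fun, args, kwargs, result, exception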

# single threading wrapper
def singlethreaded(fun):
    def forward(*arg, **karg):
        lock.acquire() # serialize access to 'curr'
        curr[:] = fun, arg, karg, None, None
        syn1.release() # job ready
        syn2.acquire() # wait till done
        ret, exc = curr[3:]
        lock.release()
        if exc: raise exc
        return ret
    return forward

# An example API that does not like threads
main = get_ident()

@singlethreaded
def foo(arg):
    assert get_ident() == main
    print 'foo', arg

@singlethreaded
def bar(arg):
    assert get_ident() == main
    print 'bar', arg

# test
from thread import start_new
def thread(n):
    # this runs in threads, but foo & bar don't.
    foo(n)
    bar(n)
for n in range(3):
    start_new(thread, (n,))
while 1: # main thread serves requests forever (the demo never exits)
    syn1.acquire() # wait for request
    try: curr[3] = curr[0](*curr[1], **curr[2])
    except Exception, curr[4]: pass
    syn2.release() # signal we're done
=== CUT ===
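
The same idea can also be sketched with queues instead of raw locks.
This is a Python 3 variant for discussion, not a drop-in replacement:
serve() and the worker setup are my own illustrative names, and the
loop exits once all workers are done instead of running forever.

```python
import queue
import threading

jobs = queue.Queue()
main = threading.get_ident()

def singlethreaded(fun):
    def forward(*args, **kwargs):
        if threading.get_ident() == main:
            return fun(*args, **kwargs)  # already in the main thread
        reply = queue.Queue(1)
        jobs.put((fun, args, kwargs, reply))
        ret, exc = reply.get()  # block until the main thread answers
        if exc is not None:
            raise exc
        return ret
    return forward

def serve(workers):
    # runs in the main thread: execute jobs until all workers exit
    while any(w.is_alive() for w in workers) or not jobs.empty():
        try:
            fun, args, kwargs, reply = jobs.get(timeout=0.05)
        except queue.Empty:
            continue
        try:
            reply.put((fun(*args, **kwargs), None))
        except Exception as exc:
            reply.put((None, exc))

seen = []

@singlethreaded
def foo(arg):
    # an example API that does not like threads
    assert threading.get_ident() == main
    seen.append(arg)
    return arg * 2

out = {}
def worker(n):
    # this runs in a thread, but foo doesn't
    out[n] = foo(n)

workers = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
serve(workers)
```

Each call carries its own reply queue, so no global lock is needed,
and a wrapped function called from the main thread runs directly
instead of waiting on itself.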