What release may we see this in?

On 5/27/11 2:47 AM, "Miroslav Suchy" <msu...@redhat.com> wrote:

>I have been playing with Python and parallelism. The result is commit
>2340364e1063b5c38f5d47575c19e53fd4efc516, and I would like to share
>a few notes about it with all of you.
>
>To make a long story short: downloading rpm files in satellite-sync is
>now about 1.5 times faster.
>
>The rpm download happened in a fairly isolated for-loop, with each
>iteration independent of the others, so I thought it was a nice
>candidate for parallelism.
>
>I created a new class, ThreadDownload, which takes the files to be
>downloaded from a queue, one at a time.
>I start several threads, each of them an instance of ThreadDownload.
>When the queue is empty, the thread finishes.
>The original for-loop now just populates the queue, then picks up the
>downloaded packages from out_queue and writes info to the screen and/or
>to the log.
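A minimal sketch of the worker-pool pattern described above, written for Python 3 (on Python 2.4 the module is named `Queue` rather than `queue`, and `except Exception as err` must be spelled `except Exception, err`). The `fetch` callable and the `download_all` helper are hypothetical stand-ins, not the actual satellite-sync code: each ThreadDownload worker pulls names from the input queue until it is empty, and the former for-loop only fills the queue and then drains `out_queue`.

```python
import queue
import threading


class ThreadDownload(threading.Thread):
    """Worker thread: downloads files taken from in_queue until it is empty.

    `fetch` is a hypothetical callable standing in for the real rpm download.
    """

    def __init__(self, in_queue, out_queue, fetch):
        super().__init__()
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.fetch = fetch

    def run(self):
        while True:
            try:
                # The queue is fully populated before the threads start,
                # so an empty queue means there is no work left.
                name = self.in_queue.get_nowait()
            except queue.Empty:
                return
            try:
                result = self.fetch(name)
            except Exception as err:
                # Hand the failure to the collector instead of dying silently,
                # so the collector never waits for a result that won't come.
                result = err
            self.out_queue.put((name, result))


def download_all(names, fetch, num_threads=4):
    """Hypothetical driver replacing the original sequential for-loop."""
    names = list(names)
    in_queue = queue.Queue()
    out_queue = queue.Queue()
    # Step 1: the loop now only fills the input queue.
    for name in names:
        in_queue.put(name)
    workers = [ThreadDownload(in_queue, out_queue, fetch) for _ in range(num_threads)]
    for worker in workers:
        worker.start()
    # Step 2: collect finished downloads and report them.
    results = {}
    for _ in range(len(names)):
        name, data = out_queue.get()
        print("downloaded %s" % name)
        results[name] = data
    for worker in workers:
        worker.join()
    return results
```

Because the queue is filled before the workers start, `get_nowait()` is enough and no sentinel values are needed. Passing exceptions through `out_queue` keeps the collecting loop from blocking forever when one download fails.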
>
>I found that one part of rhnlib is not reentrant, so I put a lock around
>that part of the code. This is a place for future improvement.
>
>I used 4 concurrent threads. The HTTP spec says that a single-user
>client should not maintain more than two persistent connections per
>server (which, on the other hand, is somewhat modest these days;
>mainstream browsers limit it to 6 or so, I think).
>
>I used threading instead of multiprocessing, because the multiprocessing
>module is not available on RHEL5 (Python 2.4).
>
>For the same reason I used a simple Queue, even though I know it is
>suboptimal. If a large file happens to be last, the downloading has to
>finish with only one thread. A better option is to use PriorityQueue and
>download the largest files first, so that all threads are utilized all
>the time. But again, PriorityQueue was introduced in Python 2.6 and
>RHEL5 has 2.4 :(
>Well, in fact the best order is: the smallest file, then the largest,
>then the second largest, and so on. This way we get a time estimate very
>quickly, and then we get the best utilization of threads. If somebody
>wants to implement this and work around the missing PriorityQueue on
>RHEL5, be my guest.
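The suggested ordering can be produced with plain sorting, which sidesteps the missing PriorityQueue: since every file is known before the worker threads start, putting names into an ordinary FIFO Queue in the desired order yields the same dispatch order a PriorityQueue would. This is a sketch assuming the files are available as hypothetical `(name, size)` tuples; `sorted()` with `key=` exists from Python 2.4 on, while `fill_queue` uses Python 3's `queue` module name.

```python
import queue


def download_order(files):
    """Order hypothetical (name, size) tuples: smallest first, then largest down.

    The smallest file finishes quickly and gives an early time estimate;
    the remaining files go largest-first so no thread ends up alone with
    a big download at the end.
    """
    by_size = sorted(files, key=lambda f: f[1])  # stable sort, ascending size
    if not by_size:
        return []
    rest = by_size[1:]
    rest.reverse()  # largest, second largest, ...
    return [by_size[0]] + rest


def fill_queue(files):
    """Fill a plain FIFO queue in the desired dispatch order."""
    q = queue.Queue()
    for name, _size in download_order(files):
        q.put(name)
    return q
```

A `heapq`-based queue (available since Python 2.3) would only be needed if files could be added while the workers are already running.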
>I suppose very similar code could go into repo-sync, but I do not use
>it, so my motivation for that part of the code is very low...
>
>I tested the code quite extensively, but if I broke something... you
>know my IRC nick and email.
>
>Here are some benchmarks I did while syncing the channel
>redhat-rhn-proxy-5.4-server-i386-5. The given times are always the start
>and end of the phase in which satellite-sync downloads rpm files.
>
>threading on
>run 1:
>05:46:29
>05:48:48
>=2:19
>
>run 2:
>06:12:11
>06:14:34
>=2:23
>
>avg. 2:21
>
>threading off
>run 1:
>05:52:16
>05:55:50
>=3:34
>
>run 2:
>08:34:33
>08:38:23
>=3:50
>
>avg. 3:42
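As a quick check of the "factor 1.5" figure, the durations and averages can be recomputed from the timestamps above. This sketch assumes each run starts and ends on the same day, which holds for all four runs.

```python
from datetime import datetime

# Start/end of the rpm download phase, copied from the runs above.
RUNS = {
    "threading on": [("05:46:29", "05:48:48"), ("06:12:11", "06:14:34")],
    "threading off": [("05:52:16", "05:55:50"), ("08:34:33", "08:38:23")],
}


def seconds(start, end):
    """Duration in seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def average(runs):
    durations = [seconds(s, e) for s, e in runs]
    return sum(durations) / len(durations)


on = average(RUNS["threading on"])    # (139 + 143) / 2 = 141 s = 2:21
off = average(RUNS["threading off"])  # (214 + 230) / 2 = 222 s = 3:42
print("speedup: %.2f" % (off / on))   # prints "speedup: 1.57"
```

The ratio 222 s / 141 s is about 1.57, consistent with the roughly 1.5x speedup reported at the top of the message.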
>
>Comments are welcome. If the feedback is positive, I think we can apply
>the same pattern to errata and kickstarts.
>
>Mirek
>
>_______________________________________________
>Spacewalk-devel mailing list
>Spacewalk-devel@redhat.com
>https://www.redhat.com/mailman/listinfo/spacewalk-devel
