What release may we see this in?

On 5/27/11 2:47 AM, "Miroslav Suchy" <msu...@redhat.com> wrote:
>I have played with Python and parallelism. The result is commit
>2340364e1063b5c38f5d47575c19e53fd4efc516, and I would like to share
>something about it with all of you.
>
>To make a long story short: download of rpm files in satellite-sync is
>now faster by a factor of about 1.5.
>
>The download of rpms was in a quite isolated for-loop, with each
>iteration independent, so I thought it was a nice candidate for
>parallelism.
>
>I created a new class, ThreadDownload, which takes one file to be
>downloaded from a queue.
>I start several threads, each consisting of an instance of
>ThreadDownload. When the queue is empty, the thread finishes.
>The original for-loop now just populates the queue and then picks up
>downloaded packages from out_queue and writes info to the screen and/or
>to the log.
>
>I found that one part of rhnlib is not reentrant, so I put a lock
>around that part of the code. This is a place for future improvement.
>
>I used 4 concurrent threads. The HTTP spec says that a single-user
>client should not use more than two persistent connections per server
>(which on the other hand is somewhat modest these days; mainstream
>browsers limit it to 6 or so, I think).
>
>I used threading instead of multiprocessing, because multiprocessing is
>not available in RHEL5 (Python 2.4).
>
>For the same reason I used a simple Queue, even though I know it is
>suboptimal. If a large file happens to be last, then you have to finish
>the download with only one thread. So a better option is to use
>PriorityQueue and download the largest files first, so all threads are
>utilized all the time. But again, PriorityQueue was introduced in
>Python 2.6 and RHEL5 has 2.4 :(
>Well, in fact the best order is SmallestFile, 1-LargestFile,
>2-LargestFile, ... This way we can get the best time estimate very
>quickly and then we get the best utilization of threads. If somebody
>wants to implement this and work around the missing PriorityQueue on
>RHEL5 - be my guest.
>I suppose very similar code can go to repo-sync, but I do not use it,
>so my motivation is very low for that part of the code...
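A minimal sketch of the pattern described above, for readers who want to see its shape. This is not the code from the commit: the names download_order, download_all and fetch are hypothetical, and fetch stands in for whatever actually retrieves one rpm. It also assumes the file sizes are known up front. Under that assumption a plain FIFO queue filled in pre-sorted order gives the "smallest first, then largest to smallest" sequence without needing PriorityQueue.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


def download_order(packages):
    """Order (name, size) pairs: smallest first, then largest to smallest."""
    ordered = sorted(packages, key=lambda p: p[1], reverse=True)
    if len(ordered) > 1:
        # Move the smallest file to the front for a quick first time estimate.
        ordered.insert(0, ordered.pop())
    return ordered


def download_all(packages, fetch, num_threads=4):
    """Run fetch(name) for every package using a small pool of threads."""
    todo = queue.Queue()
    done = queue.Queue()
    for pkg in download_order(packages):
        todo.put(pkg)

    def worker():
        while True:
            try:
                name, _size = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread finishes
            # An exception in fetch would end this worker; a real
            # implementation would catch it and report the failure.
            done.put((name, fetch(name)))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    results = []
    while not done.empty():
        results.append(done.get())
    return results
```

If the fetch callable touches code that is not reentrant, it would need to take a shared threading.Lock around that part, as the original message describes for rhnlib.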
>
>I tested the code quite intensively, but if I broke something ... you
>know my irc nick and email...
>
>Here are some benchmarks I did while syncing channel
>redhat-rhn-proxy-5.4-server-i386-5. The given times are always the
>start and end of the phase where satellite-sync downloads rpm files.
>
>threading on
>run 1:
>05:46:29
>05:48:48
>=2:19
>
>run 2:
>06:12:11
>06:14:34
>=2:23
>
>avg. 2:21
>
>threading off
>run 1:
>05:52:16
>05:55:50
>=3:34
>
>run 2:
>08:34:33
>08:38:23
>=3:50
>
>avg. 3:42
>
>Comments are welcome. If there is positive feedback, I think we can
>apply the same pattern to errata and kickstarts.
>
>Mirek
>
>_______________________________________________
>Spacewalk-devel mailing list
>Spacewalk-devel@redhat.com
>https://www.redhat.com/mailman/listinfo/spacewalk-devel
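For anyone who wants to check the quoted numbers, here is a small sketch that recomputes the averages and the speedup from the start and end timestamps in the benchmark. The helper names seconds and average_duration are made up for illustration, and the sketch assumes each run starts and ends on the same day.

```python
def seconds(hms):
    """Convert an HH:MM:SS timestamp into seconds since midnight."""
    h, m, s = [int(part) for part in hms.split(":")]
    return h * 3600 + m * 60 + s


def average_duration(runs):
    """Average length in seconds of (start, end) timestamp pairs."""
    total = sum(seconds(end) - seconds(start) for start, end in runs)
    return total / float(len(runs))


# Timestamps copied from the benchmark in the quoted message.
threading_on = [("05:46:29", "05:48:48"), ("06:12:11", "06:14:34")]
threading_off = [("05:52:16", "05:55:50"), ("08:34:33", "08:38:23")]

avg_on = average_duration(threading_on)     # 141.0 s, i.e. 2:21
avg_off = average_duration(threading_off)   # 222.0 s, i.e. 3:42
speedup = avg_off / avg_on                  # roughly 1.57
```

With only two runs per configuration, the ratio is a rough indication rather than a precise measurement.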