I'm trying to write a multi-task downloader to download files from a website using multi-threading. I have one thread that analyzes the webpage, gets the addresses of the files to be downloaded, and puts them in a Queue. The main thread then starts some threads that get an address from the queue and download it. To cap the number of files downloaded concurrently, I use a semaphore, e.g. at most 5 downloads at the same time.
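For reference, here is a minimal sketch of that pattern (in Python 3; the names, the sentinel value, and the stubbed-out page parsing and fetching are my own assumptions, not the poster's code):

```python
import queue
import threading

MAX_CONCURRENT = 5
url_queue = queue.Queue()
semaphore = threading.BoundedSemaphore(MAX_CONCURRENT)
results = []
results_lock = threading.Lock()

def parse_page():
    # stand-in for the page-analysis thread: discover URLs and enqueue them
    for i in range(12):
        url_queue.put("http://example.com/file%d" % i)
    url_queue.put(None)  # sentinel: no more URLs

def fetch(url):
    # stand-in for the real download (wget / urlretrieve)
    with results_lock:
        results.append(url)

def main():
    threading.Thread(target=parse_page).start()
    workers = []
    while True:
        url = url_queue.get()
        if url is None:
            break
        semaphore.acquire()          # block if 5 downloads are already running
        def worker(u=url):
            try:
                fetch(u)
            finally:
                semaphore.release()  # always free the slot
        t = threading.Thread(target=worker)
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
```

The main thread blocks on `semaphore.acquire()` before spawning each worker, so at most 5 downloads run at once; the sentinel `None` tells it the parser has no more URLs.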
I tried to use urllib.urlretrieve in the download() thread, but from time to time one download thread seemed to freeze the whole program. So I gave up on that and used subprocess to call wget to do the job instead. My download thread looks like this:

    def download(url):
        subprocess.call(["wget", "-q", url])
        with print_lock:
            print url, 'finished.'
        semaphore.release()

But later I found that after a specific wget job finished downloading, that download() thread never reached the print url statement. So I end up with the file downloaded, but the download() thread never ends and never releases the semaphore, which blocks the whole program. My guess is that by the time the wget process ends, that specific download thread is not active and misses the return of the call. Any comments and suggestions about this problem? Thanks

-- 
http://mail.python.org/mailman/listinfo/python-list
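Whatever the root cause, releasing the semaphore in a `finally` block at least prevents one stuck or crashed worker from wedging the whole pool. A defensive sketch (in Python 3; the `runner` parameter is my addition so the fetch step can be swapped out for testing, it defaults to the post's `subprocess.call`):

```python
import subprocess
import threading

MAX_CONCURRENT = 5  # the post's limit of 5 concurrent downloads
semaphore = threading.BoundedSemaphore(MAX_CONCURRENT)
print_lock = threading.Lock()

def download(url, runner=subprocess.call):
    # The try/finally guarantees the semaphore slot is freed even if wget
    # (or the print) raises, so one failed download no longer blocks the
    # main thread forever. The caller is expected to have acquired the
    # semaphore before starting this thread.
    try:
        runner(["wget", "-q", url])
        with print_lock:
            print(url, 'finished.')
    finally:
        semaphore.release()
```

This does not explain why the thread never reaches the `print` statement, but it turns a silent deadlock into a recoverable failure.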