I'm writing a kind of web scanner that should retrieve and analyze about 
100k URLs as fast as possible. Of course, it will take time anyway, but I'm 
looking for a way to utilize my CPUs and network as much as possible. 

My initial approach was to add worker processes for all available cores, pack 
the URLs into tasks and run these tasks in parallel: 

    
using Requests

# worker processes for all available cores were added beforehand (e.g. julia -p <n>)
urls = ...   # the ~100k URLs to scan

# distribute the loop iterations across the worker processes
@time @sync @parallel for url in urls
    resp = get(url)
    println("Status: $(resp.status)")
end

My assumption was that 100k tasks would be created, each task would execute 
a GET request and, since this is an IO operation, free the current thread for 
other tasks. From the logs, however, I see that each worker executes its tasks 
one by one, waiting for each GET request to finish before starting the next. 
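
For reference, here is roughly the behaviour I expected, sketched with plain 
@async tasks inside a single process (assuming that get yields back to the 
scheduler while it waits on the socket; I haven't verified that this is what 
Requests actually does): 

using Requests

# sketch: spawn one lightweight task per URL and wait for all of them to finish;
# if get() yields while blocked on network IO, many requests should be in
# flight at once even though everything runs in a single process
@time @sync for url in urls
    @async begin
        resp = get(url)
        println("Status: $(resp.status)")
    end
end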

So how do I start 100k requests in parallel? 

(100k here is just an example; I can easily split the URLs into chunks of, 
say, 10k, so system limits and an overloaded CPU/network are not the issue; 
the issue is their *underutilization*.) 
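
Just to be concrete, the chunking I have in mind would look roughly like this 
(the chunk size is arbitrary): 

chunk_size = 10000
# split urls into slices of at most chunk_size elements each
chunks = [urls[i:min(i + chunk_size - 1, length(urls))] for i in 1:chunk_size:length(urls)]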

Thanks
