I am working on a fairly large research project, so I am trying to retrieve the most recent 200 tweets for each of 400,000 users. It didn't seem like a problem because individual queries return in about 1 second. Split among 5 machines, that is 400,000 / 5 = 80,000 requests per machine, which should take about 22.2 hours at 1 request per second.
After 24 hours, however, I have retrieved only 25,000 users. Of course, there is some variance around my 1 user/second estimate, but this seems quite slow: I am seeing between 10 and 80 users per minute. I expected to be blocked by rate limiting every hour, but I am nowhere close to hitting the 20,000/hr whitelist limit. Would it be better to parallelize this process, map/reduce style, and make several requests simultaneously? Or does Twitter block concurrent HTTP requests while waiting for the first to complete?
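For what it's worth, here is a minimal sketch of the parallel approach I have in mind, using a thread pool so that each worker issues its own HTTP request and a slow response for one user no longer stalls the whole queue. The `fetch_timeline` function below is a placeholder standing in for the real timeline call; the pool size of 20 is an arbitrary guess, not a recommendation:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_timeline(user_id):
    # Placeholder: the real version would make an HTTP request to the
    # Twitter API and block on network I/O for ~1 second.
    return (user_id, ["tweet"] * 200)

def fetch_all(user_ids, workers=20):
    # Each thread runs fetch_timeline for one user at a time; with 20
    # workers, up to 20 requests are in flight simultaneously.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch_timeline, user_ids))

results = fetch_all(range(100))
```

Since the bottleneck here is network latency rather than CPU, threads (or an async HTTP client) should overlap the wait time of many requests, assuming the API accepts concurrent connections from the same client.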