----- Reply message -----
From: "Ryan Rosario" <uclamath...@gmail.com>
Date: Fri, Oct 16, 2009 1:43 PM
Subject: [twitter-dev] Serial or Parallel? Does it make a difference for the API?
To: "Twitter Development Talk" <twitter-development-talk@googlegroups.com>

I am working on a fairly large research project, and I am in the process of retrieving the most recent 200 tweets for each of 400,000 users. It didn't seem like a problem, because individual queries took about 1 second to return. Split across 5 machines, this should take about 22.2 hours (80,000 requests per machine at 1 second each).

After 24 hours, I have retrieved only 25,000 users. Of course, I realize there is variance in my 1 user/second estimate, but this seems quite slow: each machine is retrieving between 10 and 80 users per minute. I was expecting to be blocked by rate limiting each hour, but I am nowhere close to hitting the 20,000/hr whitelist limit.

Might it be better to parallelize this process using map/reduce to make several requests simultaneously? Or does the Twitter API block other HTTP requests while waiting for the first to complete?
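
To make the numbers and the question concrete, here is a minimal Python sketch. It reproduces the 22.2 hour estimate and shows one way to issue several requests at once from a single machine with a thread pool. Everything named here is an assumption for illustration: `fetch_timeline` is a hypothetical stand-in that only sleeps to simulate latency, and `estimated_hours` and `fetch_many` are made-up helpers, not part of any Twitter library. Whether the server handles concurrent requests from one client independently is not something this sketch can settle; it only shows the client side.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def estimated_hours(n_users, n_machines, seconds_per_request, concurrent_per_machine=1):
    """Wall-clock hours, assuming every request takes the same time."""
    requests_per_machine = n_users / n_machines
    seconds = requests_per_machine * seconds_per_request / concurrent_per_machine
    return seconds / 3600


def fetch_timeline(user_id):
    """Hypothetical stand-in for one HTTP request for a user's recent tweets."""
    time.sleep(0.05)  # simulated network latency; no real API call is made
    return user_id, []  # (user id, list of tweets)


def fetch_many(user_ids, fetch=fetch_timeline, max_workers=8):
    """Run requests from a thread pool so one slow response does not stall the rest."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; each result is a (key, value) pair
        return dict(pool.map(fetch, user_ids))


if __name__ == "__main__":
    print(round(estimated_hours(400_000, 5, 1.0), 1))  # serial estimate from the post
    start = time.perf_counter()
    timelines = fetch_many(range(16))
    print(len(timelines), "users in", round(time.perf_counter() - start, 2), "s")
```

In a real crawl, `fetch_timeline` would be replaced by the actual HTTP call, and `max_workers` would need to be chosen so the combined request rate across all machines stays under the 20,000/hr whitelist limit.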