----- Reply message -----
From: "Ryan Rosario" <uclamath...@gmail.com>
Date: Fri, Oct 16, 2009 1:43 PM
Subject: [twitter-dev] Serial or Parallel? Does it make a difference for the
To: "Twitter Development Talk" <email@example.com>
I am working on a fairly large research project, so I am trying to
retrieve the most recent 200 tweets for each of 400,000 users. It
didn't seem like a problem because individual queries took about 1
second to return. Split across 5 machines, that is 80,000 requests per
machine, which should take about 22.2 hours assuming each request
takes 1 second.
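The 22.2-hour figure is straightforward arithmetic. A minimal sketch of it follows. It assumes users are split evenly across machines, each machine issues one request at a time, and every request takes a fixed time, with no rate limiting or retries.

```python
def estimated_hours(users: int, machines: int, seconds_per_request: float) -> float:
    """Wall-clock hours if users are split evenly across machines and each
    machine issues its requests serially, one at a time."""
    requests_per_machine = users / machines          # 400,000 / 5 = 80,000
    return requests_per_machine * seconds_per_request / 3600

print(round(estimated_hours(400_000, 5, 1.0), 1))   # prints 22.2
```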
After 24 hours, I have retrieved only 25,000 users. Of course, I
realize there is variance in my 1 user/second estimate, but this seems
quite slow: I am retrieving only between 10 and 80 users per minute.
I was expecting to be blocked by rate limiting each hour, but I am
nowhere near the 20,000 requests/hour whitelist limit.
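The observed numbers can be turned around to estimate what each request is actually costing. The rough sketch below rests on three assumptions, not measurements: one request per user, serial requests on each of the 5 machines, and an even split of work. It also treats the 20,000/hr figure as a single shared limit, which is an assumption.

```python
# Whitelist limit quoted in the message, treated here as one shared limit (assumption).
HOURLY_LIMIT = 20_000

def implied_seconds_per_request(users_done: int, hours: float, machines: int) -> float:
    """Average time per request on each machine, assuming one serial request
    per user and an even split of users across machines."""
    return machines * hours * 3600 / users_done

observed_per_hour = 25_000 / 24                       # about 1,042 requests/hour overall
headroom = HOURLY_LIMIT - observed_per_hour           # requests/hour left unused
latency = implied_seconds_per_request(25_000, 24, 5)  # about 17.3 s per request

print(f"{observed_per_hour:.0f} req/h, {headroom:.0f} unused, {latency:.1f} s/request")
```

Under those assumptions each request averages roughly 17 seconds rather than 1. Per-request latency, not the hourly limit, looks like the bottleneck.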
Might it be better to parallelize this process, for example using
map/reduce, to make several requests simultaneously? Or does the
Twitter API block other HTTP requests while waiting for the first one
to complete?
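On the client side, several requests can be kept in flight at once without a map/reduce framework. The sketch below uses a Python thread pool instead. `fetch_timeline` is a hypothetical stand-in that sleeps rather than calling the Twitter API. The sketch therefore only shows how overlapping network waits shortens wall-clock time. It says nothing about how Twitter's servers handle concurrent requests from one client.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_timeline(user_id: int) -> list[str]:
    """Hypothetical stand-in for one API call fetching a user's recent tweets.
    The sleep simulates network latency; a real version would make an HTTP request."""
    time.sleep(0.1)
    return [f"tweet from {user_id}"]

def fetch_all(user_ids: list[int], workers: int) -> dict[int, list[str]]:
    """Fetch timelines with up to `workers` requests in flight at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so they line up with user_ids.
        results = pool.map(fetch_timeline, user_ids)
        return dict(zip(user_ids, results))

if __name__ == "__main__":
    ids = list(range(10))
    for workers in (1, 10):
        start = time.perf_counter()
        fetch_all(ids, workers)
        print(f"{workers:>2} workers: {time.perf_counter() - start:.2f} s")
```

Threads suit this workload because each worker spends most of its time waiting on the network, and Python releases the GIL during blocking I/O and `time.sleep`. Concurrent requests still count toward the same hourly limit. Staying under 20,000 requests/hour means averaging no more than one request every 0.18 s (3600 / 20,000) across all workers combined.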