[twitter-dev] Re: Serial or Parallel? Does it make a difference for the API?

dbou...@gmail.com Sat, 17 Oct 2009 12:31:25 -0700



----- Reply message -----
From: "Ryan Rosario" <uclamath...@gmail.com>
Date: Fri, Oct 16, 2009 1:43 PM
Subject: [twitter-dev] Serial or Parallel? Does it make a difference for the API?
To: "Twitter Development Talk" <twitter-development-talk@googlegroups.com>

I am working on a fairly large research project and am trying to
retrieve the most recent 200 tweets for each of 400,000 users. It
didn't seem like a problem because individual queries took about 1
second to return. Split across 5 machines, this should take about
22.2 hours, assuming each request takes 1 second.
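
The 22.2-hour figure can be reproduced with a few lines of Python. This
is only a sketch of the arithmetic: the function name is made up for
illustration, and it assumes one request per user, an even split across
machines, and strictly serial requests on each machine.

```python
def estimated_hours(users, machines, seconds_per_request=1.0):
    """Wall-clock hours if users are split evenly across machines
    and each machine issues its requests one at a time."""
    total_seconds = users * seconds_per_request / machines
    return total_seconds / 3600


# 400,000 users, 5 machines, about 1 second per request
print(round(estimated_hours(400_000, 5), 1))  # 22.2
```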

After 24 hours, I have retrieved only 25,000 users. Of course, I
realize there is variance in my 1 user/second estimate, but this seems
quite slow: I am retrieving between 10 and 80 users per minute. I was
expecting to be blocked by rate limiting each hour, but I am nowhere
near the 20,000/hr whitelist limit.
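
To put the observed numbers next to the rate limit, here is a rough
calculation. It assumes one request per user and treats 20,000/hr as a
single shared limit; if each machine has its own limit, the share used
is smaller still. The function names are hypothetical.

```python
def observed_rate(users_done, hours_elapsed):
    """Return (users per hour, users per minute) actually achieved."""
    per_hour = users_done / hours_elapsed
    return per_hour, per_hour / 60


def projected_days(total_users, users_done, hours_elapsed):
    """Days needed for the whole job at the observed average rate."""
    per_hour, _ = observed_rate(users_done, hours_elapsed)
    return total_users / per_hour / 24


per_hour, per_minute = observed_rate(25_000, 24)
print(round(per_minute, 1))                           # 17.4 users per minute
print(round(per_hour / 20_000 * 100, 1))              # percent of the hourly limit used
print(round(projected_days(400_000, 25_000, 24), 1))  # 16.0 days in total
```

At this pace the job is bounded by latency, not by the rate limit.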

Might it be better to parallelize this process using map/reduce to
make several requests simultaneously? Or does the Twitter API block
other HTTP requests while waiting for the first to complete?
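
For illustration, here is one way to issue several requests at once
from a single machine using a thread pool, which is a lighter-weight
alternative to map/reduce. It is a sketch only: fetch_all and
fake_fetch are hypothetical names, fake_fetch stands in for a real
HTTP call, and nothing here shows how the Twitter servers treat
concurrent requests from one client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(user_ids, fetch_one, max_workers=8):
    """Call fetch_one(user_id) for every id on a thread pool.

    Returns a dict mapping each user id to its result. Threads suit
    this workload because each call mostly waits on the network.
    """
    ids = list(user_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input ids
        return dict(zip(ids, pool.map(fetch_one, ids)))


def fake_fetch(user_id):
    """Hypothetical stand-in for one API request (about 50 ms latency)."""
    time.sleep(0.05)
    return f"timeline-for-{user_id}"


timelines = fetch_all(range(20), fake_fetch, max_workers=10)
print(len(timelines))  # 20
```

With 10 workers the 20 simulated calls overlap instead of running back
to back. A real client would also need to handle errors, retries, and
the hourly rate limit.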
