[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-05-16 Thread Corey Ballou
Thanks for the feedback, Brian. Late response here, but I'd be more
than willing to provide you with more details regarding our
application in a private email. You should be receiving that email
shortly.

Regards,

Corey

On Apr 14, 1:12 pm, Brian Sutorius bsutor...@twitter.com wrote:
 While the Streaming API may not provide processed results to you in
 the way that search queries can (logical ORs, etc.), it's a more
 scalable solution for returning a lot of Tweets. Our search system can
 rate limit queries if they become too computationally expensive (in
 addition to the normal query limit), so continuing to add parameters
 to the query up front rather than doing this processing yourself may
 cause you to keep running into limits. Ultimately, circumventing the
 limits put in place by our APIs is not allowed by our API ToS, and
 building your architecture this way just to get around the defaults is
 something we strongly discourage. If you keep being rate limited, you
 should think about re-factoring your prioritization strategy.

 Can you go into a little more detail about what your application does?
 We might be able to guide you towards a mix of Streaming API and
 search queries that gets you what you need but stays within the rate
 limits.

 Brian Sutorius
 Twitter API Policy

 On Apr 13, 10:28 am, Corey Ballou ball...@gmail.com wrote:
  I'm still looking for a community leader answer on this one.

  On Apr 11, 5:50 pm, Corey Ballou ball...@gmail.com wrote:

   Thanks for the reply, I appreciate it.

   I have concerns regarding the streaming APIs, which mainly concern the
   following:

   * usage of logical OR when using locations
   * firehose limitations
   * the user’s location field is not used to filter tweets
   * increased application complexity for parsing the resulting stream of
   data back out into individual searches

   I know that the Search API is not Twitter's preferred choice, but it's
   currently returning the best applicable results for my application.
   It's also worth noting that the API recently received a drastic
   improvement to speed which should theoretically relax the strain on
   the API:

  http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faste...

   I guess I'm mainly interested in knowing whether @twitterapi will
   allow me to use the Search API in the manner I indicated above?
   Essentially I would be willing to guarantee the application worker
   nodes handle 420 rate limiting errors accordingly while still
   supporting multiple Twitter accounts and searches.

   On Apr 11, 1:05 pm, M. Edward (Ed) Borasky zn...@borasky-research.net wrote:
I don't see an answer here, but I'll tell you how *I* would go about
implementing this:

1. Switch to the Streaming API. Using Search in an application puts a strain
on Twitter's servers and makes it difficult for Twitter to manage capacity.
That's why it's rate-limited and why the rate limits aren't publicly
disclosed.

2. If your application is a desktop application, use User Streams. If it is
a server, use User Streams on a desktop or the low-frequency free access to
Streaming on a server to prototype and develop. Your target for a server
will be Site Streams, but that's in closed beta at the moment IIRC.

3. *Concurrently with development*, your business development / sales /
marketing / planning people, or yourself, if it's a one-person shop, should
be negotiating with Twitter for access to Site Streams. I'm assuming an
agile development methodology - customer-in-the-loop - and one of the
parties that needs to be in the loop is Twitter for Site Streams. You simply
*can't* build an at-scale Twitter application without direct business
discussions with Twitter!

On Mon, Apr 11, 2011 at 8:14 AM, Corey Ballou ball...@gmail.com wrote:
 I tried speaking with Ryan Sarver directly, but he's forwarding me
 here to the community advocates to answer. I believe this answer will
 need to come top down from Twitter, as it's your rate limiting that
 I'm most worried about.

 I have a technical question for all of you in regards to the Search
 API as I want to maintain full compliance. Currently, the old Search
 API implementation (albeit slower) provides a fuller result set and
 allows for more flexibility in the types and combinations of searches
 allowed. The manner I have developed my application would allow for a
 number of daemonized worker instances running on different IP
 addresses to make calls to the search API on behalf of the stored
 OAuth credentials to avoid rate limiting issues.

 I had a conversation with the Pluggio developer in which he stated
 Twitter had threatened to shut down his application if he didn't switch
 to a different implementation of the Search API. The problem indicated
 was that he was performing searches for 

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-14 Thread Brian Sutorius
While the Streaming API may not provide processed results to you in
the way that search queries can (logical ORs, etc.), it's a more
scalable solution for returning a lot of Tweets. Our search system can
rate limit queries if they become too computationally expensive (in
addition to the normal query limit), so continuing to add parameters
to the query up front rather than doing this processing yourself may
cause you to keep running into limits. Ultimately, circumventing the
limits put in place by our APIs is not allowed by our API ToS, and
building your architecture this way just to get around the defaults is
something we strongly discourage. If you keep being rate limited, you
should think about re-factoring your prioritization strategy.
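
As an illustration of "doing this processing yourself", here is a minimal Python sketch that takes tweets from one broad source (for example, a single stream connection) and splits them back out into individual saved searches on the client side. Everything in it is an assumption made for the example: the SavedSearch class, the fan_out function, and the tweet dictionary shape with "text" and "coordinates" keys are hypothetical and are not Twitter's payload format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (west, south, east, north) in degrees; a hypothetical representation.
BoundingBox = Tuple[float, float, float, float]


@dataclass
class SavedSearch:
    """One user-defined search, evaluated locally instead of by the API."""
    name: str
    keywords: List[str]                # every keyword must appear (logical AND)
    box: Optional[BoundingBox] = None  # optional geo constraint, ANDed with keywords

    def matches(self, tweet: dict) -> bool:
        text = tweet.get("text", "").lower()
        if not all(k.lower() in text for k in self.keywords):
            return False
        if self.box is None:
            return True
        point = tweet.get("coordinates")  # assumed (lon, lat) pair or None
        if point is None:
            return False
        lon, lat = point
        west, south, east, north = self.box
        return west <= lon <= east and south <= lat <= north


def fan_out(tweets: List[dict], searches: List[SavedSearch]) -> Dict[str, List[dict]]:
    """Route each incoming tweet to every saved search it satisfies."""
    results: Dict[str, List[dict]] = {s.name: [] for s in searches}
    for tweet in tweets:
        for search in searches:
            if search.matches(tweet):
                results[search.name].append(tweet)
    return results
```

The cost of this approach is the extra application complexity raised earlier in the thread; the benefit is that the query sent to Twitter stays simple, and the AND/OR logic runs on your own hardware.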

Can you go into a little more detail about what your application does?
We might be able to guide you towards a mix of Streaming API and
search queries that gets you what you need but stays within the rate
limits.

Brian Sutorius
Twitter API Policy

On Apr 13, 10:28 am, Corey Ballou ball...@gmail.com wrote:
 I'm still looking for a community leader answer on this one.

 On Apr 11, 5:50 pm, Corey Ballou ball...@gmail.com wrote:
  Thanks for the reply, I appreciate it.

  I have concerns regarding the streaming APIs, which mainly concern the
  following:

  * usage of logical OR when using locations
  * firehose limitations
  * the user’s location field is not used to filter tweets
  * increased application complexity for parsing the resulting stream of
  data back out into individual searches

  I know that the Search API is not Twitter's preferred choice, but it's
  currently returning the best applicable results for my application.
  It's also worth noting that the API recently received a drastic
  improvement to speed which should theoretically relax the strain on
  the API:

 http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faste...

  I guess I'm mainly interested in knowing whether @twitterapi will
  allow me to use the Search API in the manner I indicated above?
  Essentially I would be willing to guarantee the application worker
  nodes handle 420 rate limiting errors accordingly while still
  supporting multiple Twitter accounts and searches.

  On Apr 11, 1:05 pm, M. Edward (Ed) Borasky zn...@borasky-research.net wrote:
   I don't see an answer here, but I'll tell you how *I* would go about
   implementing this:

   1. Switch to the Streaming API. Using Search in an application puts a strain
   on Twitter's servers and makes it difficult for Twitter to manage capacity.
   That's why it's rate-limited and why the rate limits aren't publicly
   disclosed.

   2. If your application is a desktop application, use User Streams. If it is
   a server, use User Streams on a desktop or the low-frequency free access to
   Streaming on a server to prototype and develop. Your target for a server
   will be Site Streams, but that's in closed beta at the moment IIRC.

   3. *Concurrently with development*, your business development / sales /
   marketing / planning people, or yourself, if it's a one-person shop, should
   be negotiating with Twitter for access to Site Streams. I'm assuming an
   agile development methodology - customer-in-the-loop - and one of the
   parties that needs to be in the loop is Twitter for Site Streams. You simply
   *can't* build an at-scale Twitter application without direct business
   discussions with Twitter!

   On Mon, Apr 11, 2011 at 8:14 AM, Corey Ballou ball...@gmail.com wrote:
I tried speaking with Ryan Sarver directly, but he's forwarding me
here to the community advocates to answer. I believe this answer will
need to come top down from Twitter, as it's your rate limiting that
I'm most worried about.

I have a technical question for all of you in regards to the Search
API as I want to maintain full compliance. Currently, the old Search
API implementation (albeit slower) provides a fuller result set and
allows for more flexibility in the types and combinations of searches
allowed. The manner I have developed my application would allow for a
number of daemonized worker instances running on different IP
addresses to make calls to the search API on behalf of the stored
OAuth credentials to avoid rate limiting issues.

I had a conversation with the Pluggio developer in which he stated
Twitter had threatened to shut down his application if he didn't switch
to a different implementation of the Search API. The problem indicated
was that he was performing searches for multiple Twitter accounts,
which is exactly my use case. Site streams does not make as much sense
for my application given the search queries I wish to perform and the
necessity for logical AND operations on geo-location.

Do you foresee any problems with my current method of using different
IP addresses to stay under the rate limit? I'm trying to stay in 

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-13 Thread Corey Ballou
I'm still looking for a community leader answer on this one.

On Apr 11, 5:50 pm, Corey Ballou ball...@gmail.com wrote:
 Thanks for the reply, I appreciate it.

 I have concerns regarding the streaming APIs, which mainly concern the
 following:

 * usage of logical OR when using locations
 * firehose limitations
 * the user’s location field is not used to filter tweets
 * increased application complexity for parsing the resulting stream of
 data back out into individual searches

 I know that the Search API is not Twitter's preferred choice, but it's
 currently returning the best applicable results for my application.
 It's also worth noting that the API recently received a drastic
 improvement to speed which should theoretically relax the strain on
 the API:

 http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faste...

 I guess I'm mainly interested in knowing whether @twitterapi will
 allow me to use the Search API in the manner I indicated above?
 Essentially I would be willing to guarantee the application worker
 nodes handle 420 rate limiting errors accordingly while still
 supporting multiple Twitter accounts and searches.

 On Apr 11, 1:05 pm, M. Edward (Ed) Borasky zn...@borasky-research.net wrote:
  I don't see an answer here, but I'll tell you how *I* would go about
  implementing this:

  1. Switch to the Streaming API. Using Search in an application puts a strain
  on Twitter's servers and makes it difficult for Twitter to manage capacity.
  That's why it's rate-limited and why the rate limits aren't publicly
  disclosed.

  2. If your application is a desktop application, use User Streams. If it is
  a server, use User Streams on a desktop or the low-frequency free access to
  Streaming on a server to prototype and develop. Your target for a server
  will be Site Streams, but that's in closed beta at the moment IIRC.

  3. *Concurrently with development*, your business development / sales /
  marketing / planning people, or yourself, if it's a one-person shop, should
  be negotiating with Twitter for access to Site Streams. I'm assuming an
  agile development methodology - customer-in-the-loop - and one of the
  parties that needs to be in the loop is Twitter for Site Streams. You simply
  *can't* build an at-scale Twitter application without direct business
  discussions with Twitter!

  On Mon, Apr 11, 2011 at 8:14 AM, Corey Ballou ball...@gmail.com wrote:
   I tried speaking with Ryan Sarver directly, but he's forwarding me
   here to the community advocates to answer. I believe this answer will
   need to come top down from Twitter, as it's your rate limiting that
   I'm most worried about.

   I have a technical question for all of you in regards to the Search
   API as I want to maintain full compliance. Currently, the old Search
   API implementation (albeit slower) provides a fuller result set and
   allows for more flexibility in the types and combinations of searches
   allowed. The manner I have developed my application would allow for a
   number of daemonized worker instances running on different IP
   addresses to make calls to the search API on behalf of the stored
   OAuth credentials to avoid rate limiting issues.

   I had a conversation with the Pluggio developer in which he stated
   Twitter had threatened to shut down his application if he didn't switch
   to a different implementation of the Search API. The problem indicated
   was that he was performing searches for multiple Twitter accounts,
   which is exactly my use case. Site streams does not make as much sense
   for my application given the search queries I wish to perform and the
   necessity for logical AND operations on geo-location.

   Do you foresee any problems with my current method of using different
   IP addresses to stay under the rate limit? I'm trying to stay in full
   compliance with Twitter's TOS and would love to find the most
   applicable and API friendly solution. I know headway is being made
   with Twitter's new search implementation so I would like to stay ahead
   of the curve and not get myself stuck in a box.

   I still need a method for polling for new search results (say, every
   30 minutes, dependent upon the pricing plan) for non-logged in users.

   Below is a scaled down representation of how I'm currently handling
   searches to help you decide the best plan of action:

   1) Searches are performed on a rolling queue basis, say one search
   every thirty minutes. There can be a finite number of searches per
   Twitter user (say 5 searches per Twitter account). There can be any
   number of Twitter accounts.
   2) Search results are stored locally for retrieval by a javascript
   AJAX long-poller every minute to check for frequent changes.
   3) When a user visits the search results page and filters results, no
   API calls to Twitter are made, only a local query is required

   Due to this process, the queue is constantly searching for the next
   searches 

[twitter-dev] Re: Twitter Search API - Questions Regarding Scaling Out

2011-04-11 Thread Corey Ballou
Thanks for the reply, I appreciate it.

I have concerns regarding the streaming APIs, which mainly concern the
following:

* usage of logical OR when using locations
* firehose limitations
* the user’s location field is not used to filter tweets
* increased application complexity for parsing the resulting stream of
data back out into individual searches

I know that the Search API is not Twitter's preferred choice, but it's
currently returning the best applicable results for my application.
It's also worth noting that the API recently received a drastic
improvement to speed which should theoretically relax the strain on
the API:

http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html

I guess I'm mainly interested in knowing whether @twitterapi will
allow me to use the Search API in the manner I indicated above?
Essentially I would be willing to guarantee the application worker
nodes handle 420 rate limiting errors accordingly while still
supporting multiple Twitter accounts and searches.
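
To make "handle 420 rate limiting errors accordingly" concrete, below is a minimal Python sketch of a worker that backs off when a search call comes back with status 420. It is not tied to any real client library: the fetch callable, the (status, retry_after, body) tuple it returns, and the retry_after field (standing in for whatever wait hint the server may send) are all assumptions made for the example. Any status other than 420 is simply returned to the caller, which a real worker would want to refine.

```python
import time
from typing import Callable, Optional, Tuple

RATE_LIMITED = 420  # the rate limiting status mentioned above

# Hypothetical response shape: (status, retry_after_seconds or None, body)
Response = Tuple[int, Optional[float], str]


def search_with_backoff(
    fetch: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it stops returning 420 or attempts run out.

    Returns the body of the first non-420 response, or None if every
    attempt was rate limited.
    """
    delay = base_delay
    for _ in range(max_attempts):
        status, retry_after, body = fetch()
        if status != RATE_LIMITED:
            return body
        # Prefer the server's wait hint when present; otherwise use the
        # current exponential backoff delay.
        sleep(retry_after if retry_after is not None else delay)
        delay *= 2
    return None
```

A worker that gets None back should put the search back on its queue for a later pass instead of retrying immediately from another node, since moving the request to a different IP address is the circumvention the rate limits exist to prevent.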

On Apr 11, 1:05 pm, M. Edward (Ed) Borasky zn...@borasky-research.net wrote:
 I don't see an answer here, but I'll tell you how *I* would go about
 implementing this:

 1. Switch to the Streaming API. Using Search in an application puts a strain
 on Twitter's servers and makes it difficult for Twitter to manage capacity.
 That's why it's rate-limited and why the rate limits aren't publicly
 disclosed.

 2. If your application is a desktop application, use User Streams. If it is
 a server, use User Streams on a desktop or the low-frequency free access to
 Streaming on a server to prototype and develop. Your target for a server
 will be Site Streams, but that's in closed beta at the moment IIRC.

 3. *Concurrently with development*, your business development / sales /
 marketing / planning people, or yourself, if it's a one-person shop, should
 be negotiating with Twitter for access to Site Streams. I'm assuming an
 agile development methodology - customer-in-the-loop - and one of the
 parties that needs to be in the loop is Twitter for Site Streams. You simply
 *can't* build an at-scale Twitter application without direct business
 discussions with Twitter!

 On Mon, Apr 11, 2011 at 8:14 AM, Corey Ballou ball...@gmail.com wrote:
  I tried speaking with Ryan Sarver directly, but he's forwarding me
  here to the community advocates to answer. I believe this answer will
  need to come top down from Twitter, as it's your rate limiting that
  I'm most worried about.

  I have a technical question for all of you in regards to the Search
  API as I want to maintain full compliance. Currently, the old Search
  API implementation (albeit slower) provides a fuller result set and
  allows for more flexibility in the types and combinations of searches
  allowed. The manner I have developed my application would allow for a
  number of daemonized worker instances running on different IP
  addresses to make calls to the search API on behalf of the stored
  OAuth credentials to avoid rate limiting issues.

  I had a conversation with the Pluggio developer in which he stated
  Twitter had threatened to shut down his application if he didn't switch
  to a different implementation of the Search API. The problem indicated
  was that he was performing searches for multiple Twitter accounts,
  which is exactly my use case. Site streams does not make as much sense
  for my application given the search queries I wish to perform and the
  necessity for logical AND operations on geo-location.

  Do you foresee any problems with my current method of using different
  IP addresses to stay under the rate limit? I'm trying to stay in full
  compliance with Twitter's TOS and would love to find the most
  applicable and API friendly solution. I know headway is being made
  with Twitter's new search implementation so I would like to stay ahead
  of the curve and not get myself stuck in a box.

  I still need a method for polling for new search results (say, every
  30 minutes, dependent upon the pricing plan) for non-logged in users.

  Below is a scaled down representation of how I'm currently handling
  searches to help you decide the best plan of action:

  1) Searches are performed on a rolling queue basis, say one search
  every thirty minutes. There can be a finite number of searches per
  Twitter user (say 5 searches per Twitter account). There can be any
  number of Twitter accounts.
  2) Search results are stored locally for retrieval by a javascript
  AJAX long-poller every minute to check for frequent changes.
  3) When a user visits the search results page and filters results, no
  API calls to Twitter are made, only a local query is required

  Due to this process, the queue is constantly searching for the next
  searches and mentions to perform. I foresee rate limiting concerns
  cropping up with searches being performed for any number of users.
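
The rolling queue described in steps 1 to 3 above could be sketched roughly as follows. This is only an illustration in Python of the scheduling idea, not the actual application code: the RollingQueue class, its method names, and the use of a heap keyed on due time are all assumptions, and the two constants simply reuse the "say" figures from the description.

```python
import heapq
from typing import Dict, List, Tuple

INTERVAL = 30 * 60    # seconds between runs of the same search
MAX_PER_ACCOUNT = 5   # cap on saved searches per Twitter account


class RollingQueue:
    """Schedules each saved search to run once per INTERVAL."""

    def __init__(self) -> None:
        # Heap entries: (due_time, sequence_number, account, query)
        self._heap: List[Tuple[float, int, str, str]] = []
        self._counts: Dict[str, int] = {}
        self._seq = 0  # tie-breaker so equal due times pop in insertion order

    def _push(self, due: float, account: str, query: str) -> None:
        heapq.heappush(self._heap, (due, self._seq, account, query))
        self._seq += 1

    def add(self, account: str, query: str, now: float) -> bool:
        """Register a search; refuse it if the account is at its cap."""
        if self._counts.get(account, 0) >= MAX_PER_ACCOUNT:
            return False
        self._counts[account] = self._counts.get(account, 0) + 1
        self._push(now, account, query)
        return True

    def pop_due(self, now: float) -> List[Tuple[str, str]]:
        """Return every search due at `now` and reschedule each one."""
        due: List[Tuple[str, str]] = []
        while self._heap and self._heap[0][0] <= now:
            _, _, account, query = heapq.heappop(self._heap)
            due.append((account, query))
            self._push(now + INTERVAL, account, query)
        return due
```

With this shape the number of API calls per interval grows linearly with the number of accounts (at most MAX_PER_ACCOUNT calls per account every INTERVAL), which is where the rate limiting concern comes from.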

  Can you steer me in the right direction to avoid shutdown