[twitter-dev] Re: Getting screen_name from id without gazillion API calls?

2009-09-05 Thread owkaye
The ideal solution is for Twitter to change the system and 
allow each account to have only one screen name, all the 
time, forever, with no changes.  Then a separate id value 
would not be required, because all account identification 
would be done by the original screen name.

REST and SEARCH would finally be consistent.  No extra calls 
to figure out who the user really is.  Users would complain 
until they got used to the fact that they could no longer 
change their screen names on a whim, but they would learn to 
deal with it soon enough.

Email doesn't let you change your address whenever you feel 
like it, and I see no reason why Twitter should allow screen 
name changes either ... except that it takes more work to 
standardize the system in this way than to continue with 
what already exists.

But with only the screen name as each unique account 
identifier, things would certainly be much simpler.  Many 
fewer requests to the server.  Less data storage.  And given 
that Twitter is supposed to be simple, this seems like a 
goal worth pursuing, at least from my point of view.

Owkaye





  When I request friends (or followers) from the Twitter
  API I want to get the screen_names based on the ids.
 
  I use users/show for this, inputting the id and
  getting back the screen_name.
  This costs a LOT of API calls and I run into the API
  rate limit fast, especially with many friends.
 
  Is there a better way of getting screen_names for
  friends / followers?
  (Better, meaning in fewer API calls.)
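
For illustration, the imbalance being described looks roughly 
like this (a sketch against the 2009-era endpoints; the 
credentials and the user_id value are placeholders):

  # one call returns every follower id at once ...
  curl -u username:password "http://twitter.com/followers/ids.xml"

  # ... but resolving each id to a screen_name costs one call per user
  curl "http://twitter.com/users/show.xml?user_id=12345"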




[twitter-dev] Re: Followers/screen_names API

2009-09-05 Thread owkaye
You've just made a perfect argument for my suggestion that 
Twitter use ONLY unchangeable screen names (no more ids) for 
the whole system.

:)

Owkaye



 I know there's been a ton of requests for a
 followers/screen_names API, or a friends/screen_names one
 for that matter. Right now the only way of getting all of
 a user's followers is with
 http://twitter.com/followers/ids.xml and that only
 renders the ids. There's no efficient way of getting the
 associated screen_names without doing
 hundreds/thousands/millions of calls or running into API
 rate limits. Twitter has rejected the creation of a
 followers/screen_names API due to performance issues/
 concerns. What if I or you want to present our app users
 with a human-readable list of their followers/friends? I
 believe the alternative is a much more performance-heavy
 approach for Twitter. What's to stop me from creating
 1,000 (or more) unique users that my app/service uses to
 resolve ids into screen_names? That way I would have
 hundreds of thousands of API calls available each hour
 and could easily create a locally cached db of
 id-to-screen_name pairs. And of course I would have to
 recheck all of them every few days or so to account for
 screen_name changes, since there isn't an API for that
 either. All of this would result in millions of API calls
 a day, just to do something that Twitter could enable
 with one simple API... Hell, I could register a hundred
 thousand users, and create a service that maintains an
 id-to-screen_name pair db for Twitter's entire userbase
 and make it available to the dev community as a service
 to work around this issue... What do you think? Wouldn't
 it be much easier and beneficial to Twitter to enable
 this simple API that many of us have been asking for, for
 so long now?

 I look forward to your thoughts...

 Michael




[twitter-dev] A few simple Search API questions

2009-09-04 Thread owkaye
My app needs to retrieve tweets that match a specific phrase:

1- How many searches are allowed from a single IP address 
per hour?  I'm thinking of doing one per minute, is that too 
many?

2- I cannot find examples of phrase-based searches in the API 
docs.  Can someone post a working example of a curl search 
that requires a phrase match?

Owkaye
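
For what it's worth, the Search API of this era accepted an 
exact-phrase query as a URL-encoded quoted string (%22 is the 
double-quote character); a minimal curl sketch, with the phrase 
standing in for whatever the app needs:

  curl "http://search.twitter.com/search.atom?q=%22michael%20jackson%22"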







[twitter-dev] Search API limits

2009-09-04 Thread owkaye
How can I retrieve the maximum number of tweets in a search?

Can rpp be set to more than 100?

If I do not send an rpp value, does Twitter default to 
returning more than 100 per page?

Owkaye
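
For context, the documented ceiling at the time was rpp=100, 
combined with the page parameter for anything beyond that; a 
sketch (the query is a placeholder):

  # rpp tops out at 100; deeper results require paging
  curl "http://search.twitter.com/search.atom?q=PHRASE&rpp=100&page=1"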






[twitter-dev] Re: Search API limits

2009-09-04 Thread owkaye
 The key to max search results isn't in paging or rpp, but
 in max_id.


Hi David,

I do not understand how max_id can help me.

If I want to get the 10,000 most recent tweets that match 
the phrase "michael jackson", changing the max_id value 
doesn't seem like it's going to help at all.  

In fact, it doesn't even make sense to use it when trying to 
retrieve the most recent tweets, does it?


 Be careful what you ask for. Retrieval of everything
 available can take a long time (hours).

My understanding is that every request is limited to 100 
tweets max, and this forces multiple requests when trying to 
retrieve more than 100 tweets.  Am I wrong about this?

Owkaye
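
For readers following along: max_id is normally used to page 
backwards through results, asking each time only for tweets at 
or below a fixed id, so the pages don't drift as new tweets 
arrive.  A sketch (PHRASE and LOWEST_ID_SEEN are placeholders):

  # first request: the newest 100 matches
  curl "http://search.twitter.com/search.atom?q=PHRASE&rpp=100"

  # subsequent requests: only tweets at or below the oldest id retrieved
  curl "http://search.twitter.com/search.atom?q=PHRASE&rpp=100&max_id=LOWEST_ID_SEEN"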






[twitter-dev] Re: All friends and followers of a Twitter user

2009-08-13 Thread owkaye

Hi Peter, 

I got it working already, that was easy ... and FAST thanks 
to your help!

Owkaye




 friends/ids
 http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-friends%C2%A0ids

 followers/ids
 http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-followers%C2%A0ids



[twitter-dev] Re: FW: Twitter is Suing me!!!

2009-08-12 Thread owkaye

 I surely hope people would not judge
 me based on who is following me.

They won't unless they are stupid.  After all, Twitter gives 
you no way to control who follows you, and most people 
understand this.


 Followers do no, zero, nada harm. 
 Just let them be.

Agreed. 


Owkaye






[twitter-dev] Re: FW: Twitter is Suing me!!!

2009-08-12 Thread owkaye

  Perhaps I'm being daft, but how can someone following
  you be spam or wrong, regardless of whether it is
  manual or auto follow?

 It can be spam if you had your account set to
 auto-notify your phone or inbox when someone follows you.

You're wrong.

SPAM only exists when you do NOT ask for commercial messages 
to be sent to you, and in this case you are clearly asking 
for them.

Owkaye






[twitter-dev] Re: FW: Twitter is Suing me!!!

2009-08-12 Thread owkaye

   I surely hope people would not judge
   me based on who is following me.
 
  They won't unless they are stupid.  After all, Twitter
  gives you no way to control who follows you, and most
  people understand this.

 sure they do. it's called blocking. every time a pain
 in the ass porn bot or social media expert following
 100x more people than follow them follows me, i block
 them. then they can't follow me.

I guess I care so little about who is following me that I 
never bothered to learn about this.  Now that I know about 
it I still have no use for it, and I never will ... but it's 
good to know it's available to others I guess.

Owkaye



[twitter-dev] Re: Following Churn: Specific guidance needed

2009-08-11 Thread owkaye

  Would be very helpful to know the definition of "quick"
  as it relates to following churn suspensions.

 As Cameron pointed out earlier, as soon as they do that,
 the following churners will adjust their methods to be
 just inside that definition of OK.

This seems like a really short-sighted reason for NOT 
clarifying what's acceptable and what's not.  

If it's acceptable then who cares if the churners adjust 
their methods?  At least everyone will know how to avoid 
problems for a change, right?



[twitter-dev] Re: Following Churn: Specific guidance needed

2009-08-11 Thread owkaye

 If users paid due diligence to those they follow and only
 followed those people who demonstrate some value to them,
 follower churn would not exist. Period.

Obviously they won't, so maybe it's time to deal with reality 
rather than dreaming of a perfect world.

Owkaye



[twitter-dev] Re: Following Churn: Specific guidance needed

2009-08-11 Thread owkaye



    Would be very helpful to know the definition of
    "quick" as it relates to following churn suspensions.

   As Cameron pointed out earlier, as soon as they do
   that, the following churners will adjust their
   methods to be just inside that definition of OK.

  This seems like a really short-sighted reason for NOT
  clarifying what's acceptable and what's not.

 The alternative is considerably more restrictive limits
 that globally apply so that any value up to the mythical
 X has little repercussion ...

Well, at least it's fair to everyone EQUALLY, instead of 
possibly being prejudiced against certain users.

Owkaye



[twitter-dev] Re: API only shows messages from last 7 days

2009-07-29 Thread owkaye

You're probably correct when you say that throwing more 
programmers at the problem is not the solution.  That's not 
what I was suggesting ...

My thought is that there may be no one at Twitter actually 
planning for historical data access, and if this is true 
then hiring someone with the skills and the desire to 
implement it in a practical manner would go a very long way 
towards providing people like us with a workable solution 
now.

Having said this, I agree that it ends up becoming a 
priority issue in the absence of enough people in the 
company who can be trusted to make wise decisions and 
accomplish a wide variety of projects all at the same time.  
When there are too few people available to actually take 
charge and make progress on projects like the one we've been 
discussing in this thread, it all comes down to priorities 
-- and when those priorities focus on things we do not need, 
the things we really want are set aside and ignored, with no 
progress being made.

In other companies money is a significant limiting factor, 
but I tend to question this at Twitter given all the reports 
of their financial condition, so I really think it's a 
priority issue in Twitter's case.

Now, if only someone at Twitter could see how important 
historical data access can be to real businesses, and how 
these businesses might be willing to pay for this data, then 
all it would take is to hire the right person to implement 
it.  Twitter simply needs the money, the current ability to 
recognize the future value of such a project, and the 
commitment to make it happen ... and then they hire a 
leader who gets it done.

Easier said than done of course, but there are excellent 
people available who can accomplish such goals when given 
the chance -- and the support they need from within the 
company of course.  

Then again, if these people are already working on it (as 
you may have suggested) then it's going to happen one of 
these days anyways ... :)

Owkaye






 I don't think that adding more people to the staff at
 Twitter is the solution. In one startup I saw a thing
 posted on the refrigerator that had the adage, "Adding
 more people to a project already behind schedule will
 only slow it down more."  Surely for support and customer
 service issues having more people on the team to deal
 with growth is good, but I doubt throwing more
 programmers at it will help fix most issues. It just
 never seems to work that way.

 While many startups do tend toward younger employees (I
 personally think because being younger normally means
 that you can work a lot with minimal life impact), I'm
 sure that someone with a strong background would be able
 to get a job at Twitter if they were local to the company
 (or willing to move).

 A lot of this surely comes down to priorities inside the
 company. While Doug and Team want to support us
 developers as much as possible, much of our initial
 'value' that we've offered in helping push twitter to the
 masses has already happened. We aren't the core business
 strategy, and with a fixed amount of resources and focus
 they aren't working to push mainly for developer access,
 but for standard user access. This 100% makes sense.
 Users are what is going to make twitter happen, not 3rd
 party developers. They want to provide a stable
 experience on both fronts, but users come first.

 In my private discussions with some team members, I've
 gotten the sense that they have good stuff in the
 pipeline for us and that they are working hard to make it
 happen. However, we're only a small part of the overall
 strategy of a quickly growing company that is still
 dealing with massive growing pains, which is no fault of
 theirs and something they are dealing with as best they
 can.

 david

 On Jul 28, 1:46 pm, owkaye owk...@gmail.com wrote:
  I'm sure others feel the same way Dave, but it looks
  and feels like Twitter is moving in the opposite
  direction.
 
  The load on a server to extract a big dataset once a
  month would be minimal, and both you and I can see the
  value in this approach. But I'm not sure the folks at
  Twitter do, or if they do maybe they just don't have
  the people who can (and will) get things like this
  implemented.  Is a shortage of competent staff the
  cause of this type of problem?
 
  Even though I have the capabilities I do not have the
  'resume' to get a job there and help them deal with
  some of this stuff, nor do I have the contacts within
  the Twitter organization to put a good word in for me
  and help me get hired so I could do good things for
  them.
 
  I'm 52 years old too, and my age seems to be a negative
  to most of the Web 2.x companies hiring these days.
   This is kind of a shame considering that people like
  me frequently have broader-based experience and
  insights that are sometimes lacking in younger people,
  and because of this we can add a lot more value in the
  areas of planning and structural

[twitter-dev] Re: API only shows messages from last 7 days

2009-07-28 Thread owkaye

I agree with you Dave.  I have several thoughts about new 
services based on searching Twitter's historical data.  
Unfortunately my ideas appear to be getting less and less 
practical.

Twitter claims to have all its data stored in disk-based 
databases, from what I understand ... yet without access to 
this data it is worthless.  

It seems to me they could allow searches of this historical 
data via a new "History API", then let us cache the results 
on our own servers.  Most of the services I've conceived 
would do this infrequently -- never in real time -- and 
would not impact their existing cached server data because 
this historical data would exist on separate data storage 
servers ... theoretically anyways.

Owkaye







 I am a bit concerned. I remember at one point it being
 between 30-45 days. Now it seems to be getting smaller by
 about 1-day per month. Last month it was closer to 10
 days.

 Is it basically going to keep getting smaller and smaller
 until we get V2 of the API, or will we be forced to all
 use only streaming services and then locally cache
 everything that we'd want to search for any time period?

 I know there are a LOT of problems inherent in the
 massive scaling out of Twitter, and this is just a
 symptom of them- but at the same time I can only imagine
 how unusable Google would be if you only had a 7-day
 window to Search in, and couldn't get any content made
 prior to that. Very worried about this soon being a 2-3
 day window.

 dave


[twitter-dev] Re: API only shows messages from last 7 days

2009-07-28 Thread owkaye

I'm sure others feel the same way Dave, but it looks and 
feels like Twitter is moving in the opposite direction.  

The load on a server to extract a big dataset once a month 
would be minimal, and both you and I can see the value in 
this approach. But I'm not sure the folks at Twitter do, or 
if they do maybe they just don't have the people who can 
(and will) get things like this implemented.  Is a shortage 
of competent staff the cause of this type of problem?

Even though I have the capabilities I do not have the 
'resume' to get a job there and help them deal with some of 
this stuff, nor do I have the contacts within the Twitter 
organization to put a good word in for me and help me get 
hired so I could do good things for them. 

I'm 52 years old too, and my age seems to be a negative to 
most of the Web 2.x companies hiring these days.  This is 
kind of a shame considering that people like me frequently 
have broader-based experience and insights that are 
sometimes lacking in younger people, and because of this we 
can add a lot more value in the areas of planning and 
structural development than people half our age.  Our coding 
skills are honed after so many years of experience too, not 
to mention the thousands of code snippets we have collected 
over the years, which make us even faster.

But since jobs like this are basically not open to me and 
many other folks my age, my alternative is to remain self-
employed and try to build something on top of their existing 
available source data and API's ... and then deal with the 
issues and frustrations created when building a service on 
top of a 'moving target' that sometimes seems to be moving 
in funny directions.

I hear about Twitter having lots of money to work with, and 
I'm probably wrong here, but it almost seems like too little 
of this money is being dedicated to paying new talent with 
long-term views of some of these issues, people who will 
implement wise policies to help support and encourage rapid 
growth in the areas that are lacking.  But once again this 
might just be due to a shortage of the right staff.

Obviously we cannot do anything from the outside except 
point out these issues and ask questions, or beg and plead 
for changes, but it sure would be great if a few of us could 
actually get in there as employees and implement a couple of 
the new features we really need -- such as a new Historical 
Search API for example.  Then developers like you and I 
could proceed with some of our plans now, instead of months 
or years from now ... or possibly never.  I would love to 
lead a team on a project like this, or even be one of its 
members, but until it happens I'll focus on building my own 
little space in the Twitter universe and continue to hope 
for the best.

:)

Owkaye




 I would do anything (including paying good amounts of
 money) to be able to purchase access to older datasets
 that I could transfer to my database through non-rest-api
 methods. I'm envisioning being able to download a CSV or
 SQL file that I could merge with my database easily, but
 only have to make a single request to the server to get a
 month of data. I'd sign agreements and pay money for
 such.

 dave

 On Jul 28, 12:03 pm, owkaye owk...@gmail.com wrote:
  I agree with you Dave.  I have several thoughts about
  new services based on searching Twitter's historical
  data. Unfortunately my ideas appear to be getting less
  and less practical.
 
  Twitter claims to have all its data stored in
  disk-based databases from what I understand ... yet
  without access to this data it is worthless.
 
  It seems to me they could allow searches of this
  historical data via a new History API then let us
  cache the results on our own servers.  Most of the
  services I've conceived would do this infrequently --
  never in real time -- and would not impact their
  existing cached server data because this historical
  data would exist on separate data storage servers ...
  theoretically anyways.
 
  Owkaye
 
   I am a bit concerned. I remember at one point it
   being between 30-45 days. Now it seems to be getting
   smaller by about 1-day per month. Last month it was
   closer to 10 days.
  
   Is it basically going to keep getting smaller and
   smaller until we get V2 of the API, or will we be
   forced to all use only streaming services and then
   locally cache everything that we'd want to search for
   any time period?
  
   I know there are a LOT of problems inherent in the
   massive scaling out of Twitter, and this is just a
   symptom of them- but at the same time I can only
   imagine how unusable Google would be if you only had
   a 7-day window to Search in, and couldn't get any
   content made prior to that. Very worried about this
   soon being a 2-3 day window.
  
   dave




[twitter-dev] Re: Updating the APIs authentication limiting policy

2009-07-22 Thread owkaye

 One solution to this problem is to add to each twitter
 account another private ID.

Jim,

Wouldn't it make more sense to implement this "private id" 
thing on your own server?

My thought here is that your service should maintain its own 
database of users, and issue a unique "private id" for each 
of these users.

Then when the visitor tries to login, your code can check to 
see if the private id the visitor has entered is in your own 
database.  If so the person is allowed to login, and if not 
they get an error.

Would this work to solve the problem, or am I missing 
something here?

Owkaye
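
A minimal sketch of that check, assuming the issued private ids 
live in a flat file on the service's own server (the path and 
variable name are hypothetical):

  # allow login only if the submitted private id appears in our own list
  if grep -qx "$SUBMITTED_ID" /var/myapp/private_ids.txt; then
    echo "login ok"
  else
    echo "unknown private id" >&2
  fi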






[twitter-dev] Re: Is it okay to close a connection by opening a new one?

2009-07-15 Thread owkaye

  The Streaming API docs say we should avoid opening new
  connections with the same user:pass when that user
  already has a connection open.  But I'm hoping it is
  okay to do this every hour or so ...

 If you're only doing this every hour, that's fine by us.

Great, thanks for the confirmation Alex!

:)


[twitter-dev] Re: Is it okay to close a connection by opening a new one?

2009-07-15 Thread owkaye

 Why can't you do this entirely in your code?  Why do you
 need to close the connection and reconnect?

My software keeps the local data file open as long as the 
connection is open, so the connection must be closed before 
the file can be moved or deleted.


 Closing a file, moving it, and then creating a new file
 should be able to be done extremely fast ...

I know, but these cannot be done while the connection is 
open, thus the need to close it.  And since a new connection 
will need to be opened almost immediately anyways, the 
natural way for me to close it is to open a new one.


 JSON is a much better format to use.  

Not for me it isn't.  My software has built-in XML parsing 
capabilities, but it doesn't know how to deal with JSON data, 
so XML is clearly the best way for me to go.  


Owkaye






[twitter-dev] Re: Safe url shorteners

2009-07-15 Thread owkaye

 Just wanted to let you guys know about a free service
 we're prototyping for shortening URL's that overcomes a
 few of the limitations of other shorteners.

Only one problem with all these URL shorteners: when the 
companies creating them disappear, all their shortened URLs 
become orphans and are therefore useless.  

Not a major problem on Twitter because of the typical 
transience of data, but when you run a company like mine 
that needs to reference historic data, it will definitely 
create future problems when these companies fail.

Just something for folks to consider ...

Owkaye






[twitter-dev] How to track a phrase in Streaming API?

2009-07-14 Thread owkaye

How do I track a phrase like "harry potter"?  

The docs only show how to track individual words, not 
phrases ... and this curl command doesn't work properly 
because it finds tweets with "harry" and not "potter":

curl -o /home/ken/twitterStreamJSON.txt 
http://stream.twitter.com/track.json -u username:password -d 
track=harry potter,


Owkaye






[twitter-dev] Re: How to track a phrase in Streaming API?

2009-07-14 Thread owkaye

  How do I track a phrase like "harry potter"?
 
  The docs only show how to track individual words, not
  phrases ... and this curl command doesn't work properly
  because it finds tweets with "harry" and not "potter":
 
  curl -o /home/ken/twitterStreamJSON.txt
  http://stream.twitter.com/track.json -u
  username:password -d track=harry potter,

 I think the problem is missing quotes and URL
 encoding. Try curl … -d track=harry+potter

Thanks for the suggestion Matt but that doesn't work either.  
Any other ideas?


Owkaye






[twitter-dev] Re: How to track a phrase in Streaming API?

2009-07-14 Thread owkaye

 Currently track works only on keywords, not phrases. 

This answers my question very clearly, thanks John!

I'm storing the data in a local database anyways, so I can 
just do a phrase search of my data and delete the records I 
don't need.  

More data than necessary gets transmitted from Twitter this 
way, but I guess there's no way around it -- and for me the 
end result is the same anyways -- so it looks like I can 
proceed successfully now.

Thanks again for everyone's help, I'll be back when I have 
new questions ... :)

Owkaye
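
A sketch of that workaround -- track the individual keywords, 
which the API supports, then filter for the exact phrase locally 
(the credentials and output path are placeholders):

  curl http://stream.twitter.com/track.json -u username:password \
       -d "track=harry,potter" \
    | grep -i "harry potter" >> /home/ken/harry_potter_phrase.json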






[twitter-dev] Is it okay to close a connection by opening a new one?

2009-07-14 Thread owkaye

The Streaming API docs say we should avoid opening new 
connections with the same user:pass when that user already 
has a connection open.  But I'm hoping it is okay to do this 
every hour or so, here's why:

My plan is to write the streaming XML data to a text file 
during each connection -- but I don't want this file to get 
so big that I have trouble processing it on the back end.  
Therefore I want to rotate these files every hour ...

This means I have to stop writing to the file, close it, move 
it somewhere else, and create a new file so I can use the new 
file to continue storing new streaming XML data.

The obvious way for me to close these files is to close the 
connection -- by opening a new connection -- because from 
what I've read it seems that opening a new connection forces 
the previous connection to close.

Can I do this without running into any blacklisting or 
denial-of-service issues?  I mean, is this an acceptable way 
to close a connection ... by opening a new one in order to 
force the old connection to close?

Any info you can provide that will clarify this issue is 
greatly appreciated, thanks!

Owkaye
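
One way to sketch that hourly rotation without depending on the 
server to drop the old connection: let curl itself quit after an 
hour via --max-time, then move the closed file and reconnect 
(KEYWORDS and the paths are placeholders, not a recommendation):

  # stream for roughly an hour; when curl exits, the file is closed
  curl --max-time 3600 -o /home/ken/stream.current.txt \
       http://stream.twitter.com/track.json -u username:password \
       -d "track=KEYWORDS"

  # with the connection gone, the file can be moved safely
  mv /home/ken/stream.current.txt /home/ken/archive/stream.$(date +%Y%m%d%H).txt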







[twitter-dev] Re: How to insure that all tweets are retrieved in a search?

2009-07-13 Thread owkaye

 First, I wouldn't expect that thousands are going to post
 your promo code per minute. That doesn't seem realistic.

Hi John,

It's more than just a promo code.  There are other aspects 
of this promotion that might create an issue with thousands 
of tweets per minute.  If it happens and I haven't planned 
ahead to deal with it, then I'm screwed because some data 
will be missing that I really should be retrieving, and 
apparently I won't have any way to retrieve it later.


 Second, you can use the /track method on the Streaming
 API, which will return all keyword matches up to a certain
 limit with no other rate limiting. 

I guess this is what I need ... unless you or someone can 
reduce or eliminate the Search API limits.  It really seems 
inappropriate to tie up a connection for streaming data 24 
hours a day when I do not need streaming data.  

All I really need is a search that doesn't restrict me so 
much.  If I had this capability I could easily minimize my 
promotion's impact on Twitter by 2-3 orders of magnitude.  
From my perspective this seems like something Twitter might 
want to support, but then again I do not work at Twitter so 
I'm not as familiar with their priorities as you are.


 Contact us if the default limits are an issue.

I'm only guessing that they will become a problem, but it is 
very clear to me how easily they might become a problem.  

The unfortunate situation here is that *IF* these limits 
become a problem it's already too late to do anything about 
it -- because by then I've permanently lost access to some 
of the data I need -- and even though the data is still in 
your database there's no way for me to get it out because 
the search restrictions get in the way again.

It's just that the API is so limited that the techniques I 
might use with any other service are simply not available at 
Twitter.  For example, imagine this far better scenario for 
my needs:

I run ONE search every day for my search terms, and Twitter 
responds with ALL the matching records no matter how many 
there are -- not just 100 per page or 1500 results per 
search but ALL matches, even if there are hundreds of 
thousands of them.  

If this were possible I could easily do only one search per 
day and store the results in a local database.  Then the 
next day I could run the same search again -- and limit this 
new search to the last 24 hours so I don't have to retrieve 
any of the same records I retrieved the previous day.

Can you imagine how much LESS this would impact Twitter's 
servers when I do not have to keep a connection open 24 
hours a day as with the Streaming API ... and I do not have 
to run repetitive searches every few seconds all day long as 
with the Search API?  The load savings on your servers would 
be huge, not to mention the bandwidth savings!!!
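
The incremental half of this proposal was already expressible 
with since_id, which keeps each run from re-fetching what the 
previous one stored; a sketch (PHRASE and LAST_SEEN_ID are 
placeholders for values stored locally):

  # once a day: ask only for tweets newer than the last id already saved
  curl "http://search.twitter.com/search.atom?q=PHRASE&rpp=100&since_id=LAST_SEEN_ID"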

-

The bottom line here is that I hope you have people who 
understand this situation and are working to improve it, but 
in the meantime my only options appear to be:

1- Use the Streaming API, which is clearly an inferior 
method for me because a broken connection will cause me to 
lose important data without warning.

2- Hope that someone at Twitter can raise the limits for 
me on their Search API so I can achieve my goals without 
running thousands of searches every day.

-

As you can see I'm trying to find the best way to get the 
data I need while minimizing the impact on Twitter, that's 
why I'm making comments / suggestions like the ones in this 
email.

So who should I contact at Twitter to see if they can raise 
the search limits for me?  Are you the man?  If not, please 
let me know who I should contact and how.

Thanks!

Owkaye




[twitter-dev] Re: How to insure that all tweets are retrieved in a search?

2009-07-13 Thread owkaye

 We tried allowing access to follower information in a
 one-query method like this and it failed. The main reason
 is that when there are tens of thousands of matches
 things start timing out. While all matches sounds like a
 perfect solution, in practice staying connected for
 minutes at a time and pulling down an unbounded size
 result set has not proved to be a scalable solution.

Maybe a different data system would allow this capability.  
But you have the system you have, so I understand why you've 
done what you've done.


 There is no way for anyone at Twitter to change the
 pagination limits without changing them across the board.

This is too bad.  Are you working on changing this in the 
future or is this going to be a limitation that persists for 
years to come?


 As a side note: The pagination limits exist as a
 technical limit and not something meant to stifle
 creativity/usefulness. When you go back in time we have
 to read data from disk and replace recent data in memory
 with that older data. The pagination limit is there to
 prevent too much of our memory space being taken up by
 old data that a very small percentage of requests need.

Okay, this makes sense.  It sounds like the original system 
designers never gave much consideration to the value of 
historical data search and retrieval.  Too bad there's 
nothing that can be done about this right now, but maybe in 
the future ... ?


 The streaming API really is the most scalable solution.

No doubt.  It's disappointing that my software probably 
cannot handle streaming data, but that's my problem, not 
yours.  

Does anyone have sample PHP code that successfully uses the 
Twitter Streaming API to retrieve the stream and write it to 
a file or database?  I hate PHP but if it works then that's 
what I'll use, especially if some helpful soul can post some 
code to help me get started.  Thanks.


Owkaye




[twitter-dev] Re: How to insure that all tweets are retrieved in a search?

2009-07-13 Thread owkaye

 I concur with Matt.

 Track in the Streaming API is, in part, intended for
 applications just like yours. Hit the Search API and use
 track together to get the highest proportion of statuses
 possible. The default track limit is intended for human
 readable scale applications. Email me about elevated
 track access for services.

I would use the Streaming API if I could, but now the 
problem is that my server-side scripting language probably 
won't be able to use the Streaming API successfully ...

My software hasn't been significantly upgraded in years, and 
when it was first coded, streaming data via HTTP didn't even 
exist.  The software has been upgraded once in a while over 
the past decade or so, but the last significant upgrade was 
more than 5 years ago and it didn't add anything to allow 
streaming data access, so I doubt it can handle this task 
now.  

I have an email request in to the current owners but I doubt 
they know how it works either.  They never coded the 
original software or any of the upgrades.  They just bought 
the software without possessing the expertise to understand 
the code, so they really don't know how it works internally 
either.

My best guess is that it cannot write streaming data to a 
database as that data is transmitted, and that's what it 
would need to do if I'm to have any chance of using the 
Streaming API instead of a search.   So I'll probably have 
to use some other software to accomplish this task.  

Any suggestions which software I should use to make this as 
fast and easy to code as possible?


 It's possible that you are worrying about an unlikely
 event. Sustained single topic statuses in the thousands
 per minute are usually limited to things like massive
 social upheaval, big political events, celebrity death,
 etc.

You may be correct, but to plan for the possibility that 
this may be bigger than expected is simply the way I do 
business.  It doesn't make sense for me to launch a promo 
like this until I'm prepared for the possibilities, right?


Owkaye






[twitter-dev] How to insure that all tweets are retrieved in a search?

2009-07-09 Thread owkaye

I'm building an app that uses the atom search API to retrieve recent
posts which contain a specific keyword.  The API docs say:

"Clients may request up to 1,500 statuses via the page and rpp
parameters for the search method."

But this 1500 hits per search cannot be done in a single request
because of the rpp limit.  Instead I have to perform 15 sequential
requests in order to get only 100 items returned on each page ... for
a total of 1500 items.

This is certainly a good way to increase the server load, since 15
connections at 100 results each takes far more server resources than 1
connection returning all 1500 results.  Therefore I'm wondering if I'm
misunderstanding something here, or if this is really the only way I
can get the maximum of 1500 items via atom search?
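
For concreteness, the 15-request sequence being described looks 
roughly like this (a sketch; the query is a placeholder):

  # walk the documented maximum of 1,500 statuses, 100 at a time
  for page in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
    curl -s "http://search.twitter.com/search.atom?q=PHRASE&rpp=100&page=$page" >> results.atom
  done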


[twitter-dev] Re: How to insure that all tweets are retrieved in a search?

2009-07-09 Thread owkaye

Thanks Chad, that's what I was afraid of.  I wonder if you
know about this next question:

Twitter API docs say search is rate limited to something
more than REST, which is 150 requests per hour, but for the
sake of argument let's say the search rate limit is actually
150 hits per hour ...

Since I have to do 15 consecutive searches to make sure I've
retrieved the last 1500 matching items, does this mean I can
only do 10 sets of 15 searches per hour = 150 requests per
hour?

If so, this is only one set of searches every 6 minutes, and
it seems to me that on a trending topic there might be lots
more than 1500 new tweets every 6 minutes.

How can I get around this limit?

I'm not trying to hurt Twitter, but business applications that
require ALL tweets to be recorded cannot deal with these
types of limitations on a practical basis.  If Twitter doesn't
come up with a better way, I can see this hindering its future
revenue streams from businesses like mine that want to build
on a solid and easy-to-use foundation.

So getting back to my question of "What do I do now?" ...

Do I have to put my automated search code on a bunch of
separate servers so the IP's are spread around -- such that
none of them hit the limit of 150 searches per hour?

Seems to me that this is the only realistic way to ensure
that I can always retrieve all the matching results I need
without hitting the API limits ... but if you or others have
a better suggestion please let me know, thanks.


On Jul 9, 5:52 pm, Chad Etzel jazzyc...@gmail.com wrote:
 Yep, you gotta do 15 requests at 100 rpp each.
 -Chad

 On Thu, Jul 9, 2009 at 5:45 PM, owkayeowk...@gmail.com wrote:

  I'm building an app that uses the atom search API to retrieve recent
  posts which contain a specific keyword.  The API docs say:

  "Clients may request up to 1,500 statuses via the page and rpp
  parameters for the search method."

  But this 1500 hits per search cannot be done in a single request
  because of the rpp limit.  Instead I have to perform 15 sequential
  requests in order to get only 100 items returned on each page ... for
  a total of 1500 items.

  This is certainly a good way to increase the server load, since 15
  connections at 100 results each takes far more server resources than 1
  connection returning all 1500 results.  Therefore I'm wondering if I'm
  misunderstanding something here, or if this is really the only way I
  can get the maximum of 1500 items via atom search?


[twitter-dev] Re: How to insure that all tweets are retrieved in a search?

2009-07-09 Thread owkaye

 You are correct, you have to do 15 requests.  However,
 you can cache the results in your end, so when you come
 back, you are only getting the new stuff.

Thanks Scott.  I'm storing the results in a database on my server but
that doesn't stop the search from retrieving the same results
repetitively, because the search string/terms are still the same.

My problem is going to occur when thousands of people start tweeting
my promo codes every minute and I'm not able to retrieve all those
tweets because of the search API limitations.

If I'm limited to retrieving 1500 tweets every 6 minutes and people
post 1000 tweets every minute, I need some way of retrieving the
missing 4500 tweets -- but apparently Twitter doesn't offer anything
even remotely close to this capability -- so I can see where it has a
long way to go before it's ready to support the kind of search
capabilities I need.


 Twitter has pretty good date handling, so you specify
 your last date, and pull forward from there.  You may
 even be able to get the last id of the last tweet you
 pulled, and just tell it to get you all the new ones.

Yep, that's what I'm doing ... pulling from the records I haven't
already retrieved based on the since_id value.

But when the new tweets total more than 1500 in a short time, the
excess tweets will get lost and there's no way to retrieve them --
unless I run my searches from multiple servers to avoid Twitter's IP
address limits -- and doing this would be a real kludge that I'm not
tempted to bother with.


  I'm building an app that uses the atom search API to retrieve recent
  posts which contain a specific keyword.  The API docs say:

  "Clients may request up to 1,500 statuses via the page and rpp
  parameters for the search method."

  But this 1500 hits per search cannot be done in a single request
  because of the rpp limit.  Instead I have to perform 15 sequential
  requests in order to get only 100 items returned on each page ... for
  a total of 1500 items.

  This is certainly a good way to increase the server load, since 15
  connections at 100 results each takes far more server resources than 1
  connection returning all 1500 results.  Therefore I'm wondering if I'm
  misunderstanding something here, or if this is really the only way I
  can get the maximum of 1500 items via atom search?

 --
 Scott * If you contact me off list replace talklists@ with scott@ *