Thanks for the support and tips! The app is actually live but very limited at the moment because I'm relying only on the basic API query. I'll be implementing this major upgrade tonight and then I will follow up here. I don't want my thunder to be stolen on what I think is a brilliant use of Twitter data.
On Apr 7, 11:37 am, Doug Williams <[email protected]> wrote:
> Matt,
> You can easily combine steps 3 and 4 by using the tweet's status_id as the
> primary key in the table you are using to store the updates. Any duplicates
> will be implicitly rejected by the database upon insertion.
>
> And as Chad said, please share when you can :)
>
> Thanks,
> Doug Williams
> Twitter API Support
> http://twitter.com/dougw
>
> On Tue, Apr 7, 2009 at 8:04 AM, Chad Etzel <[email protected]> wrote:
>
> > Hi Matt,
> >
> > Yep, that sounds like a very reasonable plan of action. 20 calls/hr is
> > way below the Search rate-limit threshold, so you're fine there.
> >
> > If there is some niche overlap with some of your keywords, you might
> > be able to preemptively filter out some tweets by using the "not"
> > filter w/ the Search API.
> >
> > example:
> > I want tweets about Apple (computers), but not people talking about
> > "apples and oranges".
> >
> > I can search for "apple -orange -oranges" which will eliminate tweets
> > with "orange" or "oranges" from the results.
> >
> > Can you share your app? I'm interested to see what it's doing :)
> >
> > -Chad
> >
> > On Tue, Apr 7, 2009 at 9:50 AM, Matt <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > My app has quickly outgrown the basic functionality of the search API.
> > > I am searching for a few specific hash tags along with some keywords
> > > to pull out a niche group of tweets. This results in about a 35%
> > > return of non-related tweets that are cluttering up my results. I am
> > > now looking into a method to cache these results to a database so I
> > > can do some advanced filtering and not hammer the API with requests.
> > >
> > > I'm kinda "talking out loud", but here is what I'm thinking. (PHP
> > > MySQL)
> > >
> > > 1. I have 4 main queries that I need to run to get all results.
> > > 2. Set up a cron job that'll query the API hourly.
> > > 3. Parse through the results and store into temp database
> > > 4. Check for duplicate tweets since more than 1 query may return the
> > > same result
> > > 5. Since we can only return 100 results at a time I'll have to loop
> > > through pages and make additional queries. I don't think this will
> > > happen much as my search is very niche. Count tweets returned vs.
> > > expected results to detect end of results.
> > > 6. I'll store since_id in a config table so as not to return redundant
> > > tweets.
> > >
> > > Once the data is in the database I should easily be able to filter out
> > > most of the spam using other methods not available through the search
> > > API. This should also make Twitter happy as I am cutting down on API
> > > requests drastically. Even if I bump my cron up to 15mins I would only
> > > be making < 20 calls an hour. Does this sound like a reasonable basic
> > > plan? Is there anything I am overlooking?
> > >
> > > Thanks for reading!
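
For anyone following along, here is a rough sketch of how steps 3, 4 and 6 could fit together with Doug's primary-key suggestion. It is only an illustration under some assumptions: it uses Python with SQLite rather than the PHP/MySQL setup mentioned in the thread, the table names (tweets, config) and function names (open_store, store_results, get_since_id) are made up for the example, and it does not call the Search API at all. Results are passed in as plain (status_id, text) pairs.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the cache database and create the two (hypothetical) tables."""
    conn = sqlite3.connect(path)
    # status_id is the primary key, so the same tweet cannot be stored twice.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "status_id INTEGER PRIMARY KEY, query TEXT NOT NULL, text TEXT NOT NULL)"
    )
    # One since_id per search query, as in step 6 of the plan.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config ("
        "query TEXT PRIMARY KEY, since_id INTEGER NOT NULL)"
    )
    return conn


def get_since_id(conn, query):
    """Return the stored since_id for a query, or 0 if none is stored yet."""
    row = conn.execute(
        "SELECT since_id FROM config WHERE query = ?", (query,)
    ).fetchone()
    return row[0] if row else 0


def store_results(conn, query, results):
    """Store (status_id, text) pairs; return how many rows were actually new."""
    results = list(results)
    before = conn.total_changes
    for status_id, text in results:
        # OR IGNORE silently skips rows whose status_id is already present.
        conn.execute(
            "INSERT OR IGNORE INTO tweets (status_id, query, text) VALUES (?, ?, ?)",
            (status_id, query, text),
        )
    new_rows = conn.total_changes - before

    if results:
        # Remember the highest id seen so far; never move since_id backwards.
        newest = max(status_id for status_id, _ in results)
        since_id = max(newest, get_since_id(conn, query))
        conn.execute(
            "INSERT OR REPLACE INTO config (query, since_id) VALUES (?, ?)",
            (query, since_id),
        )
    conn.commit()
    return new_rows
```

If two of the four queries return the same tweet, the second insert is simply skipped, so no separate duplicate check is needed. In MySQL the closest equivalent of SQLite's INSERT OR IGNORE is INSERT IGNORE; a plain INSERT would instead fail with a duplicate-key error that the script would have to catch.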
