Looks like I am ready to show off my project, even though it is a work in progress. I love open development, and since this project is for fun, not money, I figured I'd open it up and get some feedback. I'm running a few broad searches for tweets that appear to be personal for-sale or classified ads. I know a few other sites are doing this, but I'm taking a different approach: I don't look only for certain hashtags, although they are weighted highly. All tweets are stored locally so I can run them through a few spam killers, which still need some work. I also just started gathering location data, although the search doesn't use it yet.
http://www.twitshop.com

On Apr 7, 11:37 am, Doug Williams <[email protected]> wrote:
> Matt,
> You can easily combine steps 3 and 4 by using the tweet's status_id as the
> primary key in the table you are using to store the updates. Any duplicates
> will be implicitly rejected by the database upon insertion.
>
> And as Chad said, please share when you can :)
>
> Thanks,
> Doug Williams
> Twitter API Support
> http://twitter.com/dougw
>
> On Tue, Apr 7, 2009 at 8:04 AM, Chad Etzel <[email protected]> wrote:
>
> > Hi Matt,
> >
> > Yep, that sounds like a very reasonable plan of action. 20 calls/hr is
> > way below the Search rate-limit threshold, so you're fine there.
> >
> > If there is some niche overlap with some of your keywords, you might
> > be able to preemptively filter out some tweets by using the "not"
> > filter w/ the Search API.
> >
> > example:
> > I want tweets about Apple (computers), but not people talking about
> > "apples and oranges".
> >
> > I can search for "apple -orange -oranges" which will eliminate tweets
> > with "orange" or "oranges" from the results.
> >
> > Can you share your app? I'm interested to see what it's doing :)
> >
> > -Chad
> >
> > On Tue, Apr 7, 2009 at 9:50 AM, Matt <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > My app has quickly outgrown the basic functionality of the search API.
> > > I am searching for a few specific hash tags along with some keywords
> > > to pull out a niche group of tweets. This results in about a 35%
> > > return of non-related tweets that are cluttering up my results. I am
> > > now looking into a method to cache these results to a database so I
> > > can do some advanced filtering and not hammer the API with requests.
> > >
> > > I'm kinda "talking out loud", but here is what I'm thinking. (PHP
> > > MySQL)
> > >
> > > 1. I have 4 main queries that I need to run to get all results.
> > > 2. Set up a cron job that'll query the API hourly.
> > > 3. Parse through the results and store them in a temp database.
> > > 4. Check for duplicate tweets, since more than 1 query may return the
> > > same result.
> > > 5. Since we can only return 100 results at a time, I'll have to loop
> > > through pages and make additional queries. I don't think this will
> > > happen much as my search is very niche. Count tweets returned vs
> > > expected results to detect end of results.
> > > 6. I'll store since_id in a config table so as not to return redundant
> > > tweets.
> > >
> > > Once the data is in the database I should easily be able to filter out
> > > most of the spam using other methods not available through the search
> > > API. This should also make Twitter happy, as I am cutting down on API
> > > requests drastically. Even if I bump my cron up to 15 mins I would only
> > > be making < 20 calls an hour. Does this sound like a reasonable basic
> > > plan? Is there anything I am overlooking?
> > >
> > > Thanks for reading!
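
For anyone following the thread, here is a rough sketch of how steps 3, 4 and 6 could fit together once Doug's primary-key suggestion is applied. It is not the code behind the site: it uses Python with sqlite3 as a stand-in for PHP/MySQL, the table and function names (`tweets`, `config`, `open_store`, `store_results`, `get_since_id`) are made up for illustration, and the search results are passed in as plain `(status_id, text)` pairs rather than fetched from the Search API.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the two hypothetical tables."""
    db = sqlite3.connect(path)
    # status_id as primary key: a duplicate tweet cannot be stored twice.
    db.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "status_id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    # One row per search query, remembering the newest id seen so far.
    db.execute(
        "CREATE TABLE IF NOT EXISTS config ("
        "query TEXT PRIMARY KEY, since_id INTEGER NOT NULL)"
    )
    return db


def get_since_id(db, query):
    """Return the stored since_id for a query, or 0 if it has never run."""
    row = db.execute(
        "SELECT since_id FROM config WHERE query = ?", (query,)
    ).fetchone()
    return row[0] if row else 0


def store_results(db, query, results):
    """Store a list of (status_id, text) pairs; return how many were new."""
    before = db.total_changes
    # OR IGNORE skips rows whose status_id already exists, so overlapping
    # queries need no separate duplicate check.
    db.executemany(
        "INSERT OR IGNORE INTO tweets (status_id, text) VALUES (?, ?)",
        results,
    )
    new_rows = db.total_changes - before
    if results:
        newest = max(sid for sid, _ in results)
        newest = max(newest, get_since_id(db, query))
        db.execute(
            "INSERT OR REPLACE INTO config (query, since_id) VALUES (?, ?)",
            (query, newest),
        )
    db.commit()
    return new_rows


if __name__ == "__main__":
    db = open_store()
    print(store_results(db, "#forsale", [(101, "bike"), (102, "desk")]))
    # Tweet 102 also matches the second query and is skipped on insert.
    print(store_results(db, "#classifieds", [(102, "desk"), (103, "lamp")]))
    print(get_since_id(db, "#classifieds"))
```

In MySQL the equivalent of `INSERT OR IGNORE` would be `INSERT IGNORE`. The cron job would read `get_since_id` before each query and pass it along as `since_id`, so only newer tweets come back.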
