[twitter-dev] Re: Cache search results and advanced search

Matt Mon, 20 Apr 2009 13:06:55 -0700

Looks like I'm ready to show off my project, even though it is still a work
in progress. I love open development, and since this project is for
fun, not money, I figured I'd open it up and get some feedback. I'm
running a few broad searches for tweets that appear to be personal
for-sale or classified ads. I know a few other sites are doing this,
but I'm taking a different approach by not relying only on certain
hashtags, although those are weighted highly. All tweets are stored
locally so I can run them through a few spam killers, which still need
some work. I also just started gathering location data, although it
hasn't been integrated into the search yet.


http://www.twitshop.com


On Apr 7, 11:37 am, Doug Williams <[email protected]> wrote:
> Matt,
> You can easily combine steps 3 and 4 by using the tweet's status_id as the
> primary key in the table you are using to store the updates. Any duplicates
> will be implicitly rejected by the database upon insertion.
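A minimal sketch of Doug's suggestion, assuming Python's built-in sqlite3 as a stand-in for MySQL. The table name `tweets`, its columns, and the `store_tweets` helper are hypothetical. SQLite's `INSERT OR IGNORE` skips rows whose primary key already exists. MySQL's rough equivalent is `INSERT IGNORE`. A plain `INSERT` instead raises a duplicate-key error that the calling code would need to catch.

```python
import sqlite3


def store_tweets(conn, tweets):
    """Insert tweets keyed by status id and return how many rows were new.

    `tweets` is a list of dicts with the (hypothetical) keys
    'id', 'from_user' and 'text'. Rows whose id is already stored are
    skipped by INSERT OR IGNORE instead of raising an error.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (status_id, from_user, tweet_text) "
        "VALUES (:id, :from_user, :text)",
        tweets,
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
# status_id is the primary key, so the database itself rejects duplicates.
conn.execute(
    "CREATE TABLE tweets ("
    "status_id INTEGER PRIMARY KEY, from_user TEXT, tweet_text TEXT)"
)

# Two search queries whose results overlap on tweet 101.
query_a = [{"id": 101, "from_user": "alice", "text": "#forsale bike"},
           {"id": 102, "from_user": "bob", "text": "selling a couch"}]
query_b = [{"id": 101, "from_user": "alice", "text": "#forsale bike"},
           {"id": 103, "from_user": "carol", "text": "#forsale desk"}]

print(store_tweets(conn, query_a))  # 2
print(store_tweets(conn, query_b))  # 1 (tweet 101 was already stored)
print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])  # 3
```

In MySQL, a BIGINT column is the safer type for the status id, because tweet ids are large and keep growing.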
>
> And as Chad said, please share when you can :)
>
> Thanks,
> Doug Williams
> Twitter API Support
> http://twitter.com/dougw
>
> On Tue, Apr 7, 2009 at 8:04 AM, Chad Etzel <[email protected]> wrote:
>
> > Hi Matt,
>
> > Yep, that sounds like a very reasonable plan of action. 20 calls/hr is
> > way below the Search rate-limit threshold, so you're fine there.
>
> > If there is some niche overlap with some of your keywords, you might
> > be able to preemptively filter out some tweets by using the "not"
> > filter w/ the Search API.
>
> > example:
> > I want tweets about Apple (computers), but not people talking about
> > "apples and oranges".
>
> > I can search for "apple -orange -oranges" which will eliminate tweets
> > with "orange" or "oranges" from the results.
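Here is a small sketch of building Chad's exclusion query programmatically in Python. The `build_query` and `search_url` helpers are hypothetical. The base URL and the `q` parameter name are assumptions about the Search API endpoint of the time. Only the `-term` exclusion syntax comes from the thread.

```python
from urllib.parse import urlencode


def build_query(include, exclude):
    """Join the wanted terms, then each unwanted term prefixed with '-'."""
    return " ".join(list(include) + ["-" + word for word in exclude])


def search_url(query, base="http://search.twitter.com/search.json"):
    # Assumed endpoint and parameter name; urlencode escapes spaces as '+'.
    return base + "?" + urlencode({"q": query})


q = build_query(["apple"], ["orange", "oranges"])
print(q)              # apple -orange -oranges
print(search_url(q))  # http://search.twitter.com/search.json?q=apple+-orange+-oranges
```

`urlencode` also escapes `#` as `%23`. This matters for hashtag searches, because an unescaped `#` would be read as a URL fragment.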
>
> > Can you share your app?  I'm interested to see what it's doing :)
>
> > -Chad
>
> > On Tue, Apr 7, 2009 at 9:50 AM, Matt <[email protected]> wrote:
>
> > > Hello all,
>
> > > My app has quickly outgrown the basic functionality of the Search API.
> > > I am searching for a few specific hashtags along with some keywords
> > > to pull out a niche group of tweets. About 35% of the tweets returned
> > > are unrelated and clutter up my results. I am
> > > now looking into a method to cache these results in a database so I
> > > can do some advanced filtering and not hammer the API with requests.
>
> > > I'm kinda "talking out loud", but here is what I'm thinking
> > > (PHP/MySQL):
>
> > > 1. I have 4 main queries that I need to run to get all results.
> > > 2. Set up a cron job that'll query the API hourly.
> > > 3. Parse the results and store them in a temp database.
> > > 4. Check for duplicate tweets, since more than one query may return the
> > > same result.
> > > 5. Since only 100 results can be returned at a time, I'll have to loop
> > > through pages and make additional queries. I don't think this will
> > > happen much, as my search is very niche. Compare the number of tweets
> > > returned against the expected count to detect the end of the results.
> > > 6. I'll store since_id in a config table so as not to fetch redundant
> > > tweets.
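Steps 5 and 6 could be sketched roughly as follows in Python, with sqlite3 standing in for MySQL. Everything here is hypothetical:

- `fetch_page` stands for whatever wraps the real HTTP call.
- The `config` table layout is made up.
- `max_pages` is an arbitrary safety cap, not a documented API limit.

The loop uses Matt's end-of-results heuristic: a page with fewer than 100 tweets is taken to be the last one.

```python
import sqlite3

RPP = 100  # results per page, the per-request maximum mentioned in step 5


def get_since_id(conn, query):
    row = conn.execute(
        "SELECT since_id FROM config WHERE query = ?", (query,)
    ).fetchone()
    return row[0] if row else 0  # 0 means "no tweets seen yet"


def set_since_id(conn, query, since_id):
    conn.execute(
        "INSERT OR REPLACE INTO config (query, since_id) VALUES (?, ?)",
        (query, since_id),
    )
    conn.commit()


def fetch_all_new(conn, query, fetch_page, max_pages=10):
    """Collect every tweet newer than the stored since_id for `query`.

    fetch_page(query, page, since_id, rpp) is a hypothetical callable that
    returns a list of dicts, each with an integer 'id'.
    """
    since_id = get_since_id(conn, query)
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(query, page, since_id, RPP)
        results.extend(batch)
        if len(batch) < RPP:  # short page: assume we reached the end
            break
    if results:
        # The newest id becomes the since_id for the next cron run.
        set_since_id(conn, query, max(t["id"] for t in results))
    return results


def make_fake_fetch(ids):
    """Stand-in for the real API: newest first, paged, filtered by since_id."""
    def fake_fetch(query, page, since_id, rpp):
        newer = sorted((i for i in ids if i > since_id), reverse=True)
        start = (page - 1) * rpp
        return [{"id": i} for i in newer[start:start + rpp]]
    return fake_fetch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (query TEXT PRIMARY KEY, since_id INTEGER)")
fetch = make_fake_fetch(range(1, 251))  # 250 matching tweets

print(len(fetch_all_new(conn, "#forsale", fetch)))  # 250 (pages of 100, 100, 50)
print(get_since_id(conn, "#forsale"))               # 250
print(len(fetch_all_new(conn, "#forsale", fetch)))  # 0 (nothing newer)
```

The short-page heuristic has two quirks:

- If the number of new tweets is an exact multiple of 100, the loop makes one extra request that comes back empty.
- If `max_pages` is reached, the older tweets in that window are never fetched, because since_id jumps straight to the newest id.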
>
> > > Once the data is in the database, I should easily be able to filter out
> > > most of the spam using other methods not available through the Search
> > > API. This should also make Twitter happy, as I am cutting down on API
> > > requests drastically. Even if I bump my cron up to every 15 mins, I would
> > > only be making < 20 calls an hour. Does this sound like a reasonable basic
> > > plan? Is there anything I am overlooking?
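Matt's estimate is simple arithmetic: queries × runs per hour × pages per query. A tiny Python helper (hypothetical, for illustration only) shows that paging, not the cron interval alone, decides whether the total stays under 20.

```python
def calls_per_hour(queries, interval_minutes, pages_per_query=1):
    """API calls per hour for a cron job running every `interval_minutes`.

    Assumes interval_minutes divides 60 evenly (e.g. 15, 20, 30, 60).
    """
    if 60 % interval_minutes != 0:
        raise ValueError("interval_minutes should divide 60")
    runs_per_hour = 60 // interval_minutes
    return queries * runs_per_hour * pages_per_query


print(calls_per_hour(4, 60))     # 4  (hourly, one page per query)
print(calls_per_hour(4, 15))     # 16 (every 15 minutes, one page per query)
print(calls_per_hour(4, 15, 2))  # 32 (every 15 minutes, two pages per query)
```

So the < 20 figure holds as long as most runs fit in a single page. That matches Matt's expectation that paging will rarely be needed.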
>
> > > Thanks for reading!
