[twitter-dev] Re: Best way to implement caching of searches (with multithreading)?

2009-10-02 Thread David Fisher

Really, a database is the way to go. Any modern database lets you
check whether a value already exists before inserting, so the same
tweet won't go in there twice. Additionally, not every user search has
to use up-to-the-minute results. Results can lag a little (30 seconds
or so), which lets you batch requests when multiple people are
searching for the same thing.
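
As a rough illustration of the "don't insert it twice" idea, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the function names (open_cache, store_tweets) are made up for the example; the tweet id is assumed to be usable as the primary key, and INSERT OR IGNORE is SQLite's spelling (other databases have their own equivalents).

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open the local cache; the schema here is an assumption for the sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"  # tweet id doubles as the primary key
        " text TEXT NOT NULL)"
    )
    return conn


def store_tweets(conn, tweets):
    """Store (id, text) pairs; rows whose id is already present are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)",
            tweets,
        )


def count_tweets(conn):
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

Because the database enforces uniqueness, two threads storing overlapping result sets should not produce duplicate rows. With sqlite3, each thread would normally open its own connection.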

On Oct 2, 7:36 am, Bjoern bjoer...@googlemail.com wrote:
 Hi,

 just wondering about a best practice thing. Suppose I show results of
 specific Twitter searches on a web site. How would I go about caching
 the searches?

 The naive approach seems to be to first check in my own database, then
 do a twitter search with the since_id parameter to only get results I
 don't already have. Then store the results from twitter in the
 database, too, and return the merged results to the web site.
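
A minimal sketch of that flow, assuming an in-memory dict in place of the database. refresh_search and fetch_since are hypothetical names; fetch_since stands in for whatever code performs the Twitter search with the since_id parameter, and passing 0 for "nothing cached yet" is a convention of this sketch only.

```python
def refresh_search(cache, query, fetch_since):
    """Fetch only tweets newer than what is cached, then return the merged set.

    cache: dict mapping query -> {tweet_id: text}
    fetch_since: hypothetical callable (query, since_id) -> list of (id, text)
    """
    known = cache.setdefault(query, {})
    # Highest id already stored; 0 means nothing is cached for this query.
    since_id = max(known) if known else 0
    for tweet_id, text in fetch_since(query, since_id):
        known[tweet_id] = text  # keyed by id, so a repeat overwrites itself
    # Merged results, newest (highest id) first.
    return sorted(known.items(), reverse=True)
```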

 The problem I see is that if multiple users run the same search on my
 web site, threading issues might occur (as each user starts a separate
 thread on my server). Not only could multiple twitter searches with
 the same since_id be executed (maybe forgivable), but trouble starts
 when said results are to be inserted in my local database. Different
 threads could attempt to insert the same messages into my database.

 One simple solution I could imagine: just use the message ids from
 twitter as the primary key in my local database. That way, multiple
 threads saving the same message would just overwrite the message with
 itself. I actually wonder if that is a common solution - to use the
 twitter ids as primary keys (also for users, direct messages...). I
 have kind of arrived at the opinion that this would be the path of
 least resistance, although I feel a bit uneasy about it.

 An alternative that came to my mind might be to have single-threaded
 background jobs do the copying of the search results from twitter to
 my database, and only show the results from my cache to the web site.
 This would cause some lag in the time the search results would appear,
 but it would not be too bad. However, if I have a lot of different
 searches, it would become infeasible to update all of them
 periodically. It would become necessary to only trigger an update when
 a user does the search. At that point things might get overly
 complicated: presumably I would need some kind of Ajax solution to
 trigger the caching with the first request, show a spinner while the
 updating of my local db is going on, and then show the results from my
 local cache/db. The trickiest part would be preventing multiple
 update tasks from starting for the same search.
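
One possible way to keep a single update per search within one server process is sketched below. SearchRefresher and the update callable are hypothetical names for this example. The first thread to ask for a query runs the update; threads arriving while it is running wait for it to finish instead of starting their own. This only coordinates threads inside one process, not several servers.

```python
import threading


class SearchRefresher:
    def __init__(self, update):
        self._update = update            # hypothetical callable: update(query)
        self._guard = threading.Lock()   # protects _in_flight
        self._in_flight = {}             # query -> Event set when update ends

    def refresh(self, query):
        """Return True if this call ran the update, False if it only waited."""
        with self._guard:
            event = self._in_flight.get(query)
            leader = event is None
            if leader:
                event = threading.Event()
                self._in_flight[query] = event
        if not leader:
            event.wait()                 # another thread is already updating
            return False
        try:
            self._update(query)
        finally:
            with self._guard:
                del self._in_flight[query]
            event.set()                  # wake up the waiting threads
        return True
```

After refresh returns, every caller can read the results from the local cache, whether or not it was the one that ran the update.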

 All in all the simple solution might be the better way to go?

 Would be interested in hearing your opinions, experiences and
 solutions!

 Thanks!

 Björn


[twitter-dev] Re: Best way to implement caching of searches (with multithreading)?

2009-10-02 Thread Nelu Lazar

Try memcached.
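
memcached needs a running server and a client library, so the sketch below uses an in-process dictionary with expiry as a stand-in to show the pattern: cache the results under the search query and let them expire after a short time, such as the 30 seconds mentioned earlier in the thread. TTLCache and cached_search are hypothetical names, not part of any memcached client, and this stand-in is not thread-safe as written.

```python
import time


class TTLCache:
    """Tiny stand-in for a cache with per-entry expiry (not memcached itself)."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)


def cached_search(cache, query, run_search):
    """Return cached results if still fresh, otherwise search and cache them."""
    results = cache.get(query)
    if results is None:
        results = run_search(query)  # hypothetical callable doing the search
        cache.set(query, results)
    return results
```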

- @NeluLazar

