Just wondering about a best-practice thing. Suppose I show results of
specific Twitter searches on a web site. How would I go about caching
them?
The naive approach seems to be to first check in my own database, then
do a Twitter search with the since_id parameter to only get results I
don't already have. Then store the results from Twitter in the
database, too, and return the merged results to the web site.
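To make the naive approach concrete, here is a minimal sketch of that flow. It assumes a local SQLite table called messages and uses a made-up fetch_from_twitter function as a stand-in for the real search call; both names are hypothetical and only there for illustration.

```python
import sqlite3

def open_cache(path=":memory:"):
    # Hypothetical local cache table; no uniqueness constraint yet.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages (id INTEGER, query TEXT, text TEXT)"
    )
    return conn

def fetch_from_twitter(query, since_id):
    # Stand-in for a real Twitter search (assumption): returns canned
    # (id, text) pairs with an id greater than since_id.
    sample = [(101, "first"), (102, "second"), (103, "third")]
    return [(i, t) for i, t in sample if i > since_id]

def cached_search(conn, query, fetch=fetch_from_twitter):
    # 1. Find the newest message already cached for this search.
    row = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM messages WHERE query = ?", (query,)
    ).fetchone()
    since_id = row[0]
    # 2. Ask the remote side only for messages newer than that.
    new_rows = fetch(query, since_id)
    # 3. Store the new messages locally.
    conn.executemany(
        "INSERT INTO messages (id, query, text) VALUES (?, ?, ?)",
        [(i, query, t) for i, t in new_rows],
    )
    conn.commit()
    # 4. Return cached plus new results, newest first.
    return conn.execute(
        "SELECT id, text FROM messages WHERE query = ? ORDER BY id DESC", (query,)
    ).fetchall()
```

Called from a single thread this behaves as intended; nothing in it guards against two callers fetching and inserting the same messages at the same time.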
The problem I see is that if multiple users run the same search on my
web site, threading issues might occur (as each user starts a separate
thread on my server). Not only could multiple Twitter searches with
the same since_id be executed (maybe forgivable), but trouble starts
when said results are to be inserted into my local database. Different
threads could attempt to insert the same messages into my database.
One simple solution I could imagine: just use the message ids from
Twitter as the primary key in my local database. That way, multiple
threads saving the same message would just overwrite the message with
itself. I actually wonder if that is a common solution - to use the
Twitter ids as primary keys (also for users, direct messages...). I
have kind of arrived at the opinion that this would be the path of
least resistance, although I feel a bit uneasy about it.
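A rough sketch of the primary-key idea, again assuming SQLite and a hypothetical messages table. One caveat: a plain INSERT of an existing primary key raises an error rather than overwriting, so the sketch uses SQLite's INSERT OR IGNORE to skip duplicates (INSERT OR REPLACE would overwrite instead). Other databases spell this differently.

```python
import sqlite3

def open_cache(path=":memory:"):
    # The remote message id doubles as the local primary key.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn

def save_messages(conn, messages):
    # messages is an iterable of (id, text) pairs. Rows whose id is
    # already present are silently skipped, so saving the same batch
    # twice leaves the table unchanged.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO messages (id, text) VALUES (?, ?)", messages
        )

def count_messages(conn):
    return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

With this in place the insert is idempotent, so it no longer matters which thread saves a given message first.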
An alternative that came to my mind might be to have single-threaded
background jobs do the copying of the search results from Twitter to
my database, and only show the results from my cache on the web site.
This would cause some lag in the time the search results would appear,
but it would not be too bad. However, if I have a lot of different
searches, it would become infeasible to update all of them
periodically. It would become necessary to only trigger an update when
a user does the search. At that point things might get overly
complicated: presumably I would need some kind of Ajax solution to
trigger the "caching" with the first request, show a spinner while the
updating of my local db is going on, and then show the results from my
local cache/db. The trickiest part would be preventing multiple
update tasks from being started for the same search.
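For the "only one update task per search" part, one possible sketch is a small in-memory registry of searches that are currently being refreshed. UpdateCoordinator and refresh_if_idle are hypothetical names, and this only works if all request threads live in the same server process; with several processes or machines the marker would have to live somewhere shared, such as the database.

```python
import threading

class UpdateCoordinator:
    """Tracks which searches currently have an update in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, query):
        # Atomically claim the search; False means someone else has it.
        with self._lock:
            if query in self._in_flight:
                return False
            self._in_flight.add(query)
            return True

    def finish(self, query):
        with self._lock:
            self._in_flight.discard(query)

def refresh_if_idle(coordinator, query, update):
    # Runs update(query) unless an update for this search is already
    # running. Returns True if this call did the update, False if it
    # was skipped.
    if not coordinator.try_start(query):
        return False
    try:
        update(query)
    finally:
        # Release the claim even if the update raised.
        coordinator.finish(query)
    return True
```

A request that gets False back could simply show the spinner and poll the local cache until the running update has finished.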
All in all the simple solution might be the better way to go?
Would be interested in hearing your opinions, experiences and