On Mon, Apr 20, 2009 at 5:59 PM, Joseph <[email protected]> wrote:
> This may not be the best thing to do in the case of statuses.
> Optimization implies that you have two tables (minimum), one for the
> user info, and one for the tweets. Doing a batch update means that
> you're skipping the step of checking to see if the user is already in
> the database, so for every tweet, you will add the same user again.
> That will slow you down much more than the batch advantage,
> and will create confusion (unless you store all in one table, and
> that's even more burdensome).

There are a couple of ways to deal with this. Given sufficient memory,
keep a hash of userIDs in memory and only insert the new ones. If
memory consumption is a problem, and assuming the userID is the primary
key in the user table, do an INSERT IGNORE for all of the users. With
userID indexed, that will be quite fast.

It won't be that simple if you have foreign key constraints, but I
can't imagine referential integrity is critical for this sort of
application. My system is far more constrained by things other than
the insert speeds.

Nick
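
For illustration, here is a rough sketch of both approaches described above. It is not taken from anyone's actual system: the table layout, the column names, and the helpers create_schema and store_tweets are all made up for the example. It uses Python's sqlite3 module so that it runs standalone, which means the statement is SQLite's INSERT OR IGNORE; on MySQL the corresponding statement would be INSERT IGNORE.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical two-table layout: one table for users, one for tweets.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "user_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "tweet_id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)"
    )


def store_tweets(conn, tweets, seen_user_ids=None):
    """Batch-insert tweets without adding the same user twice.

    tweets is a list of dicts with the assumed keys
    tweet_id, user_id, user_name and text.
    """
    # Collapse the batch to one row per user.
    users = {t["user_id"]: t["user_name"] for t in tweets}

    if seen_user_ids is not None:
        # Approach 1: keep the known userIDs in memory and only
        # send the ones we have not seen before.
        new_users = [
            (uid, name) for uid, name in users.items()
            if uid not in seen_user_ids
        ]
        seen_user_ids.update(users)
    else:
        new_users = list(users.items())

    # Approach 2: let the primary key reject duplicates.
    # SQLite spells this INSERT OR IGNORE; MySQL spells it INSERT IGNORE.
    conn.executemany(
        "INSERT OR IGNORE INTO users (user_id, name) VALUES (?, ?)",
        new_users,
    )
    conn.executemany(
        "INSERT INTO tweets (tweet_id, user_id, text) VALUES (?, ?, ?)",
        [(t["tweet_id"], t["user_id"], t["text"]) for t in tweets],
    )
    conn.commit()
```

Passing a set as seen_user_ids enables the in-memory filter; leaving it out relies on the primary key alone. The two can be combined, in which case the ignore clause only has to catch users that were stored before the process started.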
