Thank you very much for your answers guys!
On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <jeff.ja...@gmail.com> wrote: > On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <casimiro.lis...@gmail.com> > wrote: > >> Hello all, >> >> I have one query running at ~ 7 seconds and I would like to know if it's >> possible to make it run faster, once this query runs lots of time in my >> experiment. >> > > > Do you mean you want it to be fast because it runs many times, or that you > want it to become fast after it runs many times (i.e. once the data is > fully cached)? The plan you show takes 24 seconds, not 7 seconds. > I want it to be fast because it runs many times. I have an experiment that evaluates recommendation algorithms for a set of twitter users. This query returns recommendation candidates so it is called a lot of times for different users and time intervals. > > >> >> Basically the query return the topics of tweets published by users that >> the user N follows and that are published between D1 and D2. >> >> *Query*: >> >> SELECT tt.tweet_id, tt.topic, tt.topic_value >> FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id = >> t.id >> WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in >> (SELECT followed_id FROM relationship WHERE follower_id = N) >> ORDER BY tt.tweet_id; >> > > > I don't know if this affects the plan at all, but it is silly to do a left > join to "tweet" when the WHERE clause has conditions that can't be > satisfied with a null row. Also, you could try changing the IN-list to an > EXISTS subquery. > I'm sorry the ignorance, but I don't understand the issue with the left join, could you explain more? > Is there some patterns to D1 and D2 that could help the caching? For > example, are they both usually in the just-recent past? > The only pattern is that it is always a one day interval, e.g. D1 = '2013-05-01' and D2 = '2013-05-02'. > > > Indexes: >> "tweet_plk" PRIMARY KEY, btree (id) CLUSTER >> "tweet_creation_time_index" btree (creation_time) >> "tweet_id_index" hash (id) >> "tweet_ios_index" btree (id, user_id, creation_time) >> "tweet_retweeted_idx" hash (retweeted) >> "tweet_user_id_creation_time_index" btree (creation_time, user_id) >> "tweet_user_id_index" hash (user_id) >> > > > Are all of those indexes important? If your table is heavily > updated/inserted, which I assume it is, maintaining those indexes is going > to take up precious RAM that could probably be better used elsewhere. > Probably not. But once this database is read only, the quantity of index grew following my desperation. =) > > Cheers, > > Jeff > Thank you very much again! Caio