Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Caio Casimiro Mon, 04 Nov 2013 12:45:28 -0800

Thank you very much for your answers guys!


On Mon, Nov 4, 2013 at 5:15 PM, Jeff Janes <[email protected]> wrote:

> On Sun, Nov 3, 2013 at 2:05 PM, Caio Casimiro <[email protected]>
>  wrote:
>
>> Hello all,
>>
>> I have one query running at ~ 7 seconds and I would like to know if it's
>> possible to make it run faster, once this query runs lots of time in my
>> experiment.
>>
>
>
> Do you mean you want it to be fast because it runs many times, or that you
> want it to become fast after it runs many times (i.e. once the data is
> fully cached)?  The plan you show takes 24 seconds, not 7 seconds.
>

I want it to be fast because it runs many times. I have an experiment that
evaluates recommendation algorithms  for a set of twitter users. This query
returns recommendation candidates so it is called a lot of times for
different users and time intervals.


>
>
>>
>> Basically the query return the topics of tweets published by users that
>> the user N follows and that are published between D1 and D2.
>>
>> *Query*:
>>
>> SELECT tt.tweet_id, tt.topic, tt.topic_value
>>             FROM tweet_topic AS tt LEFT JOIN tweet AS t ON tt.tweet_id =
>> t.id
>>             WHERE creation_time BETWEEN 'D1' AND 'D2' AND user_id in
>>             (SELECT followed_id FROM relationship WHERE follower_id = N)
>> ORDER BY tt.tweet_id;
>>
>
>
> I don't know if this affects the plan at all, but it is silly to do a left
> join to "tweet" when the WHERE clause has conditions that can't be
> satisfied with a null row.  Also, you could try changing the IN-list to an
> EXISTS subquery.
>

I'm sorry the ignorance, but I don't understand the issue with the left
join, could you explain more?


> Is there some patterns to D1 and D2 that could help the caching?  For
> example, are they both usually in the just-recent past?
>
The only pattern is that it is always a one day interval, e.g. D1 =
'2013-05-01' and  D2 = '2013-05-02'.

>
>
> Indexes:
>>     "tweet_plk" PRIMARY KEY, btree (id) CLUSTER
>>     "tweet_creation_time_index" btree (creation_time)
>>     "tweet_id_index" hash (id)
>>     "tweet_ios_index" btree (id, user_id, creation_time)
>>     "tweet_retweeted_idx" hash (retweeted)
>>     "tweet_user_id_creation_time_index" btree (creation_time, user_id)
>>     "tweet_user_id_index" hash (user_id)
>>
>
>
> Are all of those indexes important?  If your table is heavily
> updated/inserted, which I assume it is, maintaining those indexes is going
> to take up precious RAM that could probably be better used elsewhere.
>

Probably not. But once this database is read only, the quantity of index
grew following my desperation. =)

>
> Cheers,
>
> Jeff
>

Thank you very much again!
Caio

Re: [PERFORM] Slow index scan on B-Tree index over timestamp field

Reply via email to