Re: [PERFORM] Sequential scan instead of index scan

Ioannis Anagnostopoulos Mon, 06 Aug 2012 08:25:32 -0700

They are random as the data are coming from multiple threads that areinserting in the database. I see what you say about "linking them", andI may give it a try with the date. The other think that "links" themtogether is the 4 georef fields, however at that stage I am trying tocollect statistics on the georefs population of "msg_id" so I don't knowbefore hand the values to limit my query on them... Do you think anindex on "date, msg_id" might do something?


Yiannis


On 06/08/2012 16:16, David Barton wrote:

Hi Yiannis,
Is there anything linking these ids together, or are the relativelyrandom? If they are relatively random, the rows are likely to besprinkled amongst many blocks and so a seq scan is the fastest. I'veseen similar problems with indexed queries in a multi-tennant databasewhere the data is so fragmented that once the record volume hits acertain threshold, Postgres decides to table scan rather than use anindex.
The query optimiser is unlikely to be able to determine the disklocality of 300k rows and so it just takes a punt on a seq scan.
If you added another filter condition on something indexed e.g. lastweek or last month or location or something, you might do better ifthe data does exhibit disk locality. If the data really is scattered,then a seq scan really will be quicker.
Regards, David

On 06/08/12 23:08, Ioannis Anagnostopoulos wrote:
Hi, my query is very simple:

select
            msg_id,
            msg_type,
            ship_pos_messages.pos_georef1,
            ship_pos_messages.pos_georef2,
            ship_pos_messages.pos_georef3,
            ship_pos_messages.pos_georef4,
            obj_id,
            ship_speed,
            ship_heading,
            ship_course,
            pos_point
        from
            feed_all_y2012m08.ship_pos_messages
        where
            extract('day' from msg_date_rec) = 1
            AND msg_id = any(ARRAY[7294724,14174174,22254408]);
The msg_id is the pkey on the ship_pos_messages table and in thisexample it is working fast as it uses the pkey (primary key index) tomake the selection. The expplain anayze follows:"Result (cost=0.00..86.16 rows=5 width=117) (actualtime=128.734..163.319 rows=3 loops=1)"" -> Append (cost=0.00..86.16 rows=5 width=117) (actualtime=128.732..163.315 rows=3 loops=1)"" -> Seq Scan on ship_pos_messages (cost=0.00..0.00 rows=1width=100) (actual time=0.001..0.001 rows=0 loops=1)"" Filter: ((msg_id = ANY('{7294724,14174174,22254408}'::integer[])) AND(date_part('day'::text, msg_date_rec) = 1::double precision))"" -> Seq Scan on ship_a_pos_messages ship_pos_messages(cost=0.00..0.00 rows=1 width=100) (actual time=0.000..0.000 rows=0loops=1)"" Filter: ((msg_id = ANY('{7294724,14174174,22254408}'::integer[])) AND(date_part('day'::text, msg_date_rec) = 1::double precision))"" -> Bitmap Heap Scan on ship_b_std_pos_messagesship_pos_messages (cost=13.41..25.42 rows=1 width=128) (actualtime=49.127..49.127 rows=0 loops=1)"" Recheck Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"" Filter: (date_part('day'::text, msg_date_rec) =1::double precision)"" -> Bitmap Index Scan on ship_b_std_pos_messages_pkey(cost=0.00..13.41 rows=3 width=0) (actual time=49.125..49.125 rows=0loops=1)"" Index Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"" -> Bitmap Heap Scan on ship_b_ext_pos_messagesship_pos_messages (cost=12.80..24.62 rows=1 width=128) (actualtime=0.029..0.029 rows=0 loops=1)"" Recheck Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"" Filter: (date_part('day'::text, msg_date_rec) =1::double precision)"" -> Bitmap Index Scan on ship_b_ext_pos_messages_pkey(cost=0.00..12.80 rows=3 width=0) (actual time=0.027..0.027 rows=0loops=1)"" Index Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"" -> Bitmap Heap Scan on ship_a_pos_messages_wk0ship_pos_messages (cost=24.08..36.12 rows=1 width=128) (actualtime=79.572..114.152 rows=3 loops=1)"" Recheck Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"" Filter: (date_part('day'::text, msg_date_rec) =1::double precision)"" -> Bitmap Index Scan on ship_a_pos_messages_wk0_pkey(cost=0.00..24.08 rows=3 width=0) (actual time=67.441..67.441 rows=3loops=1)"" Index Cond: (msg_id = ANY('{7294724,14174174,22254408}'::integer[]))"
"Total runtime: 180.146 ms"
I think this is a pretty good plan and quite quick given the size ofthe table (88Million rows at present). However in real life theparameter where I search for msg_id is not an array of 3 ids but of300.000 or more. It is then that the query forgets the plan and goesto sequential scan. Is there any way around? Or is this the best Ican have?
Kind Regards
Yiannis

Re: [PERFORM] Sequential scan instead of index scan

Reply via email to