[PERFORM] PostgreSQL using the wrong Index
We have two index's like so l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column |Type +- time | timestamp without time zone btree l1_historical=# \d N_intra_pkey Index N_intra_pkey Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL using the wrong Index
Oh, we are running 7.4.2 btw. And our random_page_cost = 1 On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two index's like so l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column |Type +- time | timestamp without time zone btree l1_historical=# \d N_intra_pkey Index N_intra_pkey Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL using the wrong Index
Alex Stapleton [EMAIL PROTECTED] writes: l1_historical=# \d N_intra_pkey Index N_intra_pkey Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; This was just covered in excruciating detail yesterday ... You need to write order by symbol desc, time desc limit 1 to get the planner to recognize the connection to the sort order of this index. Since you're only selecting one value of symbol, the actual output doesn't change. Oh, we are running 7.4.2 btw. And our random_page_cost = 1 I'll bet lunch that that is a bad selection of random_page_cost, unless your database is so small that it all fits in RAM. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL using the wrong Index
Alex Stapleton wrote: Oh, we are running 7.4.2 btw. And our random_page_cost = 1 Which is only correct if your entire db fits into memory. Also, try updating to a later 7.4 version if at all possible. On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two index's like so l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column |Type +- time | timestamp without time zone btree Just so you are aware, writing this as: We have an index on N_intra(time) and one on N_Intra(symbol, time) is a lot more succinct. l1_historical=# \d N_intra_pkey Index N_intra_pkey Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. What happens if you do: SELECT * FROM N_intra WHERE symbol='doesnt exist' ORDER BY symbol, time DESC LIMIT 1; Yes, symbol is constant, but it frequently helps the planner realize it can use an index scan if you include all terms in the index in the ORDER BY clause. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? Try the above first. You could also create a new index on symbol CREATE INDEX N_intra_symbol_idx ON N_intra(symbol); Then the WHERE clause should use the symbol index, which means it can know quickly that an entry doesn't exist. I'm not sure how many entries you have per symbol, though, so this might cause problems in the ORDER BY time portion. I'm guessing what you really want is to just do the ORDER BY symbol, time. John =:- signature.asc Description: OpenPGP digital signature
Re: [PERFORM] PostgreSQL using the wrong Index
On 13 Jun 2005, at 15:47, John A Meinel wrote: Alex Stapleton wrote: Oh, we are running 7.4.2 btw. And our random_page_cost = 1 Which is only correct if your entire db fits into memory. Also, try updating to a later 7.4 version if at all possible. I am aware of this, I didn't configure this machine though unfortuantely. On 13 Jun 2005, at 14:02, Alex Stapleton wrote: We have two index's like so l1_historical=# \d N_intra_time_idx Index N_intra_time_idx Column |Type +- time | timestamp without time zone btree Just so you are aware, writing this as: We have an index on N_intra(time) and one on N_Intra(symbol, time) is a lot more succinct. Sorry, I happened to have them there in my clipboard at the time so I just blindly pasted them in. l1_historical=# \d N_intra_pkey Index N_intra_pkey Column |Type +- symbol | text time | timestamp without time zone unique btree (primary key) and on queries like this select * from N_intra where symbol='SOMETHING WHICH DOESNT EXIST' order by time desc limit 1; PostgreSQL takes a very long time to complete, as it effectively scans the entire table, backwards. And the table is huge, about 450 million rows. (btw, there are no triggers or any other exciting things like that on our tables in this db.) but on things where the symbol does exist in the table, it's more or less fine, and nice and fast. What happens if you do: SELECT * FROM N_intra WHERE symbol='doesnt exist' ORDER BY symbol, time DESC LIMIT 1; Hurrah! I should of thought of this, considering i've done it in the past :) Thanks a lot, that's great. Yes, symbol is constant, but it frequently helps the planner realize it can use an index scan if you include all terms in the index in the ORDER BY clause. Whilst the option the planner has taken might be faster most of the time, the worst case scenario is unacceptable for obvious reasons. I've googled for trying to force the use of a specific index, but can't find anything relevant. Does anyone have any suggestions on getting it to use an index which hopefully will have better worst case performance? Try the above first. You could also create a new index on symbol CREATE INDEX N_intra_symbol_idx ON N_intra(symbol); Then the WHERE clause should use the symbol index, which means it can know quickly that an entry doesn't exist. I'm not sure how many entries you have per symbol, though, so this might cause problems in the ORDER BY time portion. I'm guessing what you really want is to just do the ORDER BY symbol, time. John =:- ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] PostgreSQL using the wrong Index
Tom Lane wrote: This was just covered in excruciating detail yesterday ... You need to write order by symbol desc, time desc limit 1 to get the planner to recognize the connection to the sort order of this index. Since you're only selecting one value of symbol, the actual output doesn't change. Is this the right behavior (not a bug)? Is postgresql planning on changing this soon? Thanks Wei ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings