[PERFORM] Large # of rows in query extremely slow, not using index
Does postgres cache the entire result set before it begins returning data to the client?

I have a table with ~8 million rows and I am executing a query which should return about ~800,000 rows. The problem is that as soon as I execute the query it absolutely kills my machine and begins swapping for 5 or 6 minutes before it begins returning results. Is postgres trying to load the whole result into memory before returning anything? Also, why would it choose not to use the index? It is properly estimating the # of rows returned. If I set enable_seqscan to off it is just as slow.

Running postgres 8.0 beta2 dev2.

explain select * from island_history where date='2004-09-07' and stock='QQQ';

                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on island_history  (cost=0.00..266711.23 rows=896150 width=83)
   Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'QQQ'::text))
(2 rows)

Any help would be appreciated.

--Stephen

             Table "public.island_history"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 date             | date                   | not null
 stock            | character varying(6)   |
 time             | time without time zone | not null
 reference_number | numeric(9,0)           | not null
 message_type     | character(1)           | not null
 buy_sell_ind     | character(1)           |
 shares           | numeric(6,0)           |
 remaining_shares | numeric(6,0)           |
 price            | numeric(10,4)          |
 display          | character(1)           |
 match_number     | numeric(9,0)           | not null
Indexes:
    island_history_pkey PRIMARY KEY, btree (date, reference_number, message_type, time, match_number)
    island_history_date_stock_time btree (date, stock, time)
    island_history_oid btree (oid)

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] Large # of rows in query extremely slow, not using index
On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
> Stephen Crowley [EMAIL PROTECTED] writes:
>> Does postgres cache the entire result set before it begins returning
>> data to the client?
>
> The backend doesn't, but libpq does, and I think JDBC does too.
>
> I'd recommend using a cursor so you can FETCH a reasonable number of
> rows at a time.

That is incredible. Why would libpq do such a thing? JDBC as well? I know Oracle doesn't do anything like that; I'm not sure about MySQL. Is there any way to turn it off? In this case I was just using psql, but I will be using JDBC for the app. About cursors: I thought a JDBC ResultSet WAS a cursor. Am I mistaken?

Thanks,
Stephen
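Tom's cursor suggestion can be tried directly from psql. A minimal sketch, using the table and query from earlier in this thread (the cursor name is made up; cursors only live inside a transaction block):

```sql
BEGIN;

DECLARE hist_cur CURSOR FOR
    SELECT * FROM island_history
    WHERE date = '2004-09-07' AND stock = 'QQQ';

-- Pull rows in manageable batches instead of materializing all ~800k at once
FETCH 1000 FROM hist_cur;
FETCH 1000 FROM hist_cur;
-- ...repeat until FETCH returns no rows...

CLOSE hist_cur;
COMMIT;
```

Each FETCH returns as soon as its batch is ready, so the client never has to buffer the whole result set.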
Re: [PERFORM] Large # of rows in query extremely slow, not using index
Problem solved. I set the fetchSize to a reasonable value instead of the default of unlimited in the PreparedStatement and now the query is fast. After some searching it seems this is a common problem; would it make sense to change the default value to something other than 0 in the JDBC driver? If I get some extra time I'll look into libpq and see what is required to fix the API. Most third-party programs and existing JDBC apps won't work with the current paradigm when returning large result sets.

Thanks,
Stephen

On Mon, 13 Sep 2004 21:49:14 -0400, Tom Lane [EMAIL PROTECTED] wrote:
> Stephen Crowley [EMAIL PROTECTED] writes:
>> On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
>>> Stephen Crowley [EMAIL PROTECTED] writes:
>>>> Does postgres cache the entire result set before it begins returning
>>>> data to the client?
>>> The backend doesn't, but libpq does, and I think JDBC does too.
>> That is incredible. Why would libpq do such a thing?
>
> Because the API it presents doesn't allow for the possibility of query
> failure after having given you back a PGresult: either you have the
> whole result available with no further worries, or you don't. If you
> think it's incredible, let's see you design an equally easy-to-use API
> that doesn't make this assumption.
>
> (Now having said that, I would have no objection to someone extending
> libpq to offer an alternative streaming API for query results. It
> hasn't got to the top of anyone's to-do list though ... and I'm
> unconvinced that psql could use it if it did exist.)
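For reference, the fetchSize fix looks roughly like this in JDBC. This is a sketch, not Stephen's actual code: the connection URL, credentials, and batch size of 1000 are placeholders. Note that the PostgreSQL JDBC driver only uses a cursor behind the scenes when autocommit is off and a non-zero fetch size is set; with the defaults it buffers the entire result set in the client.

```java
import java.sql.*;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        // URL, user, and password are placeholders
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/trading", "user", "secret");

        // Cursor-based fetching requires a transaction (autocommit off)
        conn.setAutoCommit(false);

        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM island_history WHERE date = ? AND stock = ?");
        ps.setDate(1, Date.valueOf("2004-09-07"));
        ps.setString(2, "QQQ");
        ps.setFetchSize(1000);   // stream in batches instead of all ~800k rows

        ResultSet rs = ps.executeQuery();   // returns after the first batch
        while (rs.next()) {
            // process one row at a time; the driver refills the batch as needed
        }
        rs.close();
        ps.close();
        conn.commit();
        conn.close();
    }
}
```

This requires a live database, so it is shown for illustration only.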
Re: [PERFORM] Large # of rows in query extremely slow, not using index
Ok, now I ran VACUUM FULL and things seem to be working as they should.

explain analyze select * from history where date='2004-09-07' and stock='MSFT';

 Seq Scan on island_history  (cost=0.00..275359.13 rows=292274 width=83) (actual time=50.000..411683.000 rows=265632 loops=1)
   Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'MSFT'::text))
 Total runtime: 412703.000 ms

random_page_cost and effective_cache_size are both default, 8 and 1000.

explain analyze select * from history where date='2004-09-07' and stock='ORCL';

 Index Scan using island_history_date_stock_time on island_history  (cost=0.00..181540.07 rows=102166 width=83) (actual time=551.000..200268.000 rows=159618 loops=1)
   Index Cond: ((date = '2004-09-07'::date) AND ((stock)::text = 'ORCL'::text))
 Total runtime: 201009.000 ms

So now this is all in proportion and works as expected. The question is: why would the fact that it needed to be vacuumed cause such a huge hit in performance? When I vacuumed, it freed up nearly 25% of the space.

--Stephen

On Fri, 17 Sep 2004 22:44:05 +0200, Manfred Koizar [EMAIL PROTECTED] wrote:
> On Thu, 16 Sep 2004 20:51:11 -0500, Stephen Crowley [EMAIL PROTECTED] wrote:
>> explain analyze select * from history where date='2004-09-07' and
>> stock='ORCL' LIMIT 10;
>>
>>  Index Scan using island_history_date_stock_time on island_history
>>    (cost=0.00..183099.72 rows=102166 width=83)
>>    (actual time=1612.000..1702.000 rows=10 loops=1)
>
> LIMIT 10 hides what would be the most interesting info here. I don't
> believe that EXPLAIN ANALYSE SELECT * FROM history WHERE ... consumes
> lots of memory. Please try it.
>
> And when you post the results please include your Postgres version,
> some info about hardware and OS, and your non-default settings,
> especially random_page_cost and effective_cache_size.
> May I guess that the correlation of the physical order of tuples in
> your table to the contents of the date column is pretty good (examine
> correlation in pg_stats) and that island_history_date_stock_time is a
> 3-column index?
>
> It is well known that the optimizer overestimates the cost of index
> scans in those situations. This can be compensated to a certain degree
> by increasing effective_cache_size and/or decreasing random_page_cost
> (which might harm other planner decisions).
>
> You could also try
>
>   CREATE INDEX history_date_stock ON history(date, stock);
>
> This will slow down INSERTs and UPDATEs, though.
>
> Servus
>  Manfred
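Manfred's pg_stats check can be run directly in psql. A minimal sketch against the table from this thread (the column filter is illustrative):

```sql
-- A correlation near 1.0 (or -1.0) for "date" means the table's rows are
-- stored in nearly date order, so a sequential scan over one date reads
-- mostly contiguous pages and the planner's index-scan cost estimate is
-- pessimistic by comparison.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'island_history'
  AND attname IN ('date', 'stock');
```

If the date correlation is high, nudging effective_cache_size up or random_page_cost down, as Manfred suggests, makes the index scan look cheaper to the planner.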
Re: [PERFORM] Large # of rows in query extremely slow, not using index
Thanks for the explanation. So what sort of changes would need to be made to the client/server protocol to fix this problem?

On Thu, 23 Sep 2004 18:22:15 -0500 (EST), Kris Jurka [EMAIL PROTECTED] wrote:
> On Tue, 14 Sep 2004, Stephen Crowley wrote:
>> Problem solved. I set the fetchSize to a reasonable value instead of
>> the default of unlimited in the PreparedStatement and now the query
>> is fast. After some searching it seems this is a common problem;
>> would it make sense to change the default value to something other
>> than 0 in the JDBC driver?
>
> In the JDBC driver, setting the fetch size to a non-zero value means
> that the query will be run using what the frontend/backend protocol
> calls a named statement. What this means on the backend is that the
> planner will not be able to use the values from the query parameters
> to generate the optimum query plan and must use generic placeholders
> and create a generic plan. For this reason we have decided not to
> default to a non-zero fetch size. This is something whose default
> value could be set by a URL parameter if you think that is something
> that is really required.
>
> Kris Jurka
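Kris's point about generic plans can be reproduced from psql with PREPARE, which plans the statement once without knowing the parameter values — the same situation a named statement puts the backend in. A sketch (the statement name hist_q is made up):

```sql
-- Planned once, before any values are supplied, so the planner must use
-- generic selectivity estimates rather than the statistics for a
-- specific date/stock pair.
PREPARE hist_q (date, text) AS
    SELECT * FROM island_history WHERE date = $1 AND stock = $2;

-- The plan shown here is the generic one, whatever values are passed
EXPLAIN EXECUTE hist_q ('2004-09-07', 'QQQ');

DEALLOCATE hist_q;
```

Comparing this plan against a plain EXPLAIN of the same query with literal values shows what the driver gives up by using a named statement.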