[PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
Does postgres cache the entire result set before it begins returning
data to the client?

I have a table with ~8 million rows and I am executing a query which
should return about ~800,000 rows. The problem is that as soon as I
execute the query it absolutely kills my machine and begins swapping
for 5 or 6 minutes before it begins returning results. Is postgres
trying to load the whole query into memory before returning anything?
Also, why would it choose not to use the index? It is properly
estimating the # of rows returned. If I set enable_seqscan to off it
is just as slow.

Running postgres 8.0 beta2 dev2

explain select * from island_history where date='2004-09-07' and stock='QQQ';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on island_history  (cost=0.00..266711.23 rows=896150 width=83)
   Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'QQQ'::text))
(2 rows)

Any help would be appreciated

--Stephen

              Table "public.island_history"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 date             | date                   | not null
 stock            | character varying(6)   |
 time             | time without time zone | not null
 reference_number | numeric(9,0)           | not null
 message_type     | character(1)           | not null
 buy_sell_ind     | character(1)           |
 shares           | numeric(6,0)           |
 remaining_shares | numeric(6,0)           |
 price            | numeric(10,4)          |
 display          | character(1)           |
 match_number     | numeric(9,0)           | not null
Indexes:
    "island_history_pkey" PRIMARY KEY, btree (date, reference_number, message_type, time, match_number)
    "island_history_date_stock_time" btree (date, stock, time)
    "island_history_oid" btree (oid)
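
For reference, the enable_seqscan experiment mentioned above can be
reproduced per-session like this (a sketch; note EXPLAIN ANALYZE reports
actual times, which plain EXPLAIN does not):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM island_history
        WHERE date = '2004-09-07' AND stock = 'QQQ';
    SET enable_seqscan = on;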



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
> Stephen Crowley [EMAIL PROTECTED] writes:
>> Does postgres cache the entire result set before it begins returning
>> data to the client?
>
> The backend doesn't, but libpq does, and I think JDBC does too.
>
> I'd recommend using a cursor so you can FETCH a reasonable number of
> rows at a time.

That is incredible. Why would libpq do such a thing? JDBC as well? I
know Oracle doesn't do anything like that, not sure about MySQL. Is
there any way to turn it off? In this case I was just using psql but
will be using JDBC for the app.  About cursors, I thought a JDBC
ResultSet WAS a cursor, am I mistaken?

Thanks,
Stephen
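
For reference, Tom's cursor suggestion looks roughly like this from psql
(a sketch; the cursor name and the batch size of 1000 are illustrative).
Without WITH HOLD, a cursor only lives inside a transaction block:

    BEGIN;
    DECLARE hist_cur CURSOR FOR
        SELECT * FROM island_history
        WHERE date = '2004-09-07' AND stock = 'QQQ';
    FETCH 1000 FROM hist_cur;   -- repeat until no rows come back
    CLOSE hist_cur;
    COMMIT;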



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-14 Thread Stephen Crowley
Problem solved.. I set the fetchSize to a reasonable value instead of
the default of unlimited in the PreparedStatement, and now the query
performs as expected. After some searching it seems this is a common
problem; would it make sense to change the default value to something
other than 0 in the JDBC driver?
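
For anyone hitting the same problem, the fix amounts to something like
this (a sketch; the connection URL and the batch size of 1000 are
illustrative, not from the thread). Note the PostgreSQL JDBC driver only
honors fetchSize when autocommit is off:

    import java.sql.*;

    public class FetchDemo {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/market", "user", "password");
            conn.setAutoCommit(false);  // required for fetchSize to stream

            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM island_history WHERE date = ? AND stock = ?");
            ps.setDate(1, java.sql.Date.valueOf("2004-09-07"));
            ps.setString(2, "QQQ");
            ps.setFetchSize(1000);  // fetch 1000 rows at a time, not all ~800,000

            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // process one row at a time; memory use stays bounded
            }
            rs.close();
            ps.close();
            conn.commit();
            conn.close();
        }
    }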

If I get some extra time I'll look into libpq and see what is required
to fix the API. Most third-party programs and existing JDBC apps won't
work with the current paradigm when returning large result sets.

Thanks,
Stephen



On Mon, 13 Sep 2004 21:49:14 -0400, Tom Lane [EMAIL PROTECTED] wrote:
> Stephen Crowley [EMAIL PROTECTED] writes:
>> On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane [EMAIL PROTECTED] wrote:
>>> Stephen Crowley [EMAIL PROTECTED] writes:
>>>> Does postgres cache the entire result set before it begins returning
>>>> data to the client?
>>>
>>> The backend doesn't, but libpq does, and I think JDBC does too.
>>
>> That is incredible. Why would libpq do such a thing?
>
> Because the API it presents doesn't allow for the possibility of query
> failure after having given you back a PGresult: either you have the
> whole result available with no further worries, or you don't.
> If you think it's incredible, let's see you design an equally
> easy-to-use API that doesn't make this assumption.
>
> (Now having said that, I would have no objection to someone extending
> libpq to offer an alternative streaming API for query results.  It
> hasn't got to the top of anyone's to-do list though ... and I'm
> unconvinced that psql could use it if it did exist.)



Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-17 Thread Stephen Crowley
Ok.. now I ran 'VACUUM FULL' and things seem to be working as they should..

explain analyze select * from history where date='2004-09-07' and stock='MSFT';

 Seq Scan on island_history  (cost=0.00..275359.13 rows=292274 width=83) (actual time=50.000..411683.000 rows=265632 loops=1)
   Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'MSFT'::text))
 Total runtime: 412703.000 ms

random_page_cost and effective_cache_size are both at the defaults, 8 and 1000

explain analyze select * from history where date='2004-09-07' and stock='ORCL';

 Index Scan using island_history_date_stock_time on island_history  (cost=0.00..181540.07 rows=102166 width=83) (actual time=551.000..200268.000 rows=159618 loops=1)
   Index Cond: ((date = '2004-09-07'::date) AND ((stock)::text = 'ORCL'::text))
 Total runtime: 201009.000 ms

So now this is all in proportion and works as expected.. the question
is, why would the fact that the table needed to be vacuumed cause such
a huge hit in performance? When I vacuumed, it did free up nearly 25%
of the space.

--Stephen

On Fri, 17 Sep 2004 22:44:05 +0200, Manfred Koizar [EMAIL PROTECTED] wrote:
> On Thu, 16 Sep 2004 20:51:11 -0500, Stephen Crowley
> [EMAIL PROTECTED] wrote:
>> explain analyze select * from history where date='2004-09-07' and
>> stock='ORCL' LIMIT 10;
>>
>>   ->  Index Scan using island_history_date_stock_time on
>> island_history  (cost=0.00..183099.72 rows=102166 width=83) (actual
>> time=1612.000..1702.000 rows=10 loops=1)
>                                          ^^
> LIMIT 10 hides what would be the most interesting info here.  I don't
> believe that
>     EXPLAIN ANALYSE SELECT * FROM history WHERE ...
> consumes lots of memory.  Please try it.
>
> And when you post the results please include your Postgres version, some
> info about hardware and OS, and your non-default settings, especially
> random_page_cost and effective_cache_size.
>
> May I guess that the correlation of the physical order of tuples in your
> table to the contents of the date column is pretty good (examine
> correlation in pg_stats) and that island_history_date_stock_time is a
> 3-column index?
>
> It is well known that the optimizer overestimates the cost of index
> scans in those situations.  This can be compensated to a certain degree
> by increasing effective_cache_size and/or decreasing random_page_cost
> (which might harm other planner decisions).
>
> You could also try
>     CREATE INDEX history_date_stock ON history(date, stock);
> This will slow down INSERTs and UPDATEs, though.
>
> Servus
>  Manfred
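
For reference, Manfred's correlation check and the planner knobs he
mentions can be tried per-session like this (a sketch; the values shown
are illustrative, not recommendations from the thread):

    -- correlation near 1.0 or -1.0 means the index scan touches mostly
    -- sequential heap pages, so the default cost estimate is too high
    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'island_history' AND attname = 'date';

    SET effective_cache_size = 50000;  -- measured in 8 kB pages
    SET random_page_cost = 2;          -- lower values favor index scans
    EXPLAIN ANALYZE SELECT * FROM island_history
        WHERE date = '2004-09-07' AND stock = 'ORCL';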




Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-23 Thread Stephen Crowley
Thanks for the explanation. So what sort of changes need to be made to
the client/server protocol to fix this problem?



On Thu, 23 Sep 2004 18:22:15 -0500 (EST), Kris Jurka [EMAIL PROTECTED] wrote:
> On Tue, 14 Sep 2004, Stephen Crowley wrote:
>
>> Problem solved.. I set the fetchSize to a reasonable value instead of
>> the default of unlimited in the PreparedStatement, and now the query
>> performs as expected. After some searching it seems this is a common
>> problem; would it make sense to change the default value to something
>> other than 0 in the JDBC driver?
>
> In the JDBC driver, setting the fetch size to a non-zero value means that
> the query will be run using what the frontend/backend protocol calls a
> named statement.  What this means on the backend is that the planner will
> not be able to use the values from the query parameters to generate the
> optimum query plan and must use generic placeholders and create a generic
> plan.  For this reason we have decided not to default to a non-zero
> fetch size.  This is something whose default value could be set by a URL
> parameter if you think that is something that is really required.
>
> Kris Jurka
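The named-statement effect Kris describes can be seen from psql with a
prepared statement (a sketch; PREPARE plans the query once with
parameter placeholders, much as the driver's named statements do):

    -- planned once without knowing the parameter values, so the planner
    -- cannot use the per-value statistics for date and stock
    PREPARE hist_q(date, varchar) AS
        SELECT * FROM island_history WHERE date = $1 AND stock = $2;
    EXPLAIN EXECUTE hist_q('2004-09-07', 'QQQ');  -- shows the generic plan
    DEALLOCATE hist_q;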

