[PERFORM] Insert performance vs Table size

2005-06-27 Thread Praveen Raja
Hi all

I'm wondering if and how the size of a table affects speed of inserts
into it? What if the table has indexes, does that alter the answer?

Thanks





Re: [PERFORM] Insert performance vs Table size

2005-06-27 Thread 李江华
Praveen Raja:

  I think the size of a table doesn't affect the speed of inserts
into it, because PostgreSQL just does something like an append on the data files.
  But indexes do slow down inserts, because PostgreSQL has to
maintain each index when doing inserts.

  Hope this is useful for your question.





= = = = = = = = = = = = = = = = = = = =


   Best regards!
  

   李江华
   Seamus Dean
   Alibaba.com
   TEL:0571-85022088-2287
[EMAIL PROTECTED]
2005-06-27



Re: [PERFORM] Insert performance vs Table size

2005-06-27 Thread Jacques Caron

Hi,

At 13:24 27/06/2005, Praveen Raja wrote:

> I'm wondering if and how the size of a table affects speed of inserts
> into it? What if the table has indexes, does that alter the answer?


Many parameters will affect the result:
- whether there are any indexes (including the primary key, unique 
constraints...) to update or not

- whether there are any foreign keys from or to that table
- the size of the rows
- whether the table (or at least the bits being updated) fits in RAM or not
- whether the table has holes (due to former updates/deletes and vacuum) 
and how they are placed

- and probably a bunch of other things...

Obviously, if you have an append-only (no updates, no deletes) table with 
no indexes and no foreign keys, the size of the table should not matter 
much. As soon as one of those conditions is not met, table size will have 
an impact: probably small as long as whatever is needed can be held in 
RAM, and a lot bigger once that is no longer the case.


Hope that helps,

Jacques.





Re: [PERFORM] Insert performance vs Table size

2005-06-27 Thread Praveen Raja
Just to clear things up a bit, the scenario that I'm interested in is a
table with a large number of indexes on it (maybe 7-8). In this scenario,
other than the overhead of having to maintain the indexes (which I'm
guessing is the same regardless of the size of the table), does the size
of the table play a role in determining insert performance (and I mean
only insert performance)?



Re: [PERFORM] Insert performance vs Table size

2005-06-27 Thread Jacques Caron

Hi,

At 13:50 27/06/2005, Praveen Raja wrote:

> Just to clear things up a bit, the scenario that I'm interested in is a
> table with a large number of indexes on it (maybe 7-8).


If you're after performance you'll want to carefully consider which indexes 
are really useful and/or redesign your schema so that you can have fewer 
indexes on that table. 7 or 8 indexes is quite a lot, and that really has a 
cost.



> In this scenario,
> other than the overhead of having to maintain the indexes (which I'm
> guessing is the same regardless of the size of the table)


Definitely not: indexes grow with the size of the table. Depending on what 
columns you index (and their types), the indexes may be a fraction of the 
size of the table, or they may be very close in size (in extreme cases they 
may even be larger). With 7 or 8 indexes, that can be quite a large volume 
of data to manipulate, especially if the values of the columns inserted can 
span the whole range of the index (rather than being solely id- or 
time-based, for instance, in which case index updates are concentrated in a 
small area of each of the indexes). In that case you'll need to keep a 
majority of the indexes in RAM if you want to maintain decent performance.


> does the size of the table play a role in determining insert performance
> (and I mean only insert performance)?


In this case, it's really the indexes that'll cause you trouble, though 
heavily fragmented tables (due to lots of deletes or updates) will also 
incur a penalty just for the data part of the inserts.


Also, don't forget the usual hints if you are going to do lots of inserts
(a rough sketch follows the list):
- batch them in large transactions, don't do them one at a time
- better yet, use COPY rather than INSERT
- in some situations, you might be better off dropping the indexes, doing 
large batch inserts, then re-creating the indexes. YMMV depending on the 
existing/new ratio, whether you need to maintain indexed access to the 
tables, etc.
- pay attention to foreign keys
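
For instance (table, column and file names are invented for illustration):

-- batch many inserts in one transaction instead of autocommitting each:
BEGIN;
INSERT INTO log_entries (id, payload) VALUES (1, 'first');
INSERT INTO log_entries (id, payload) VALUES (2, 'second');
-- ... thousands more ...
COMMIT;

-- better yet, COPY from a tab-delimited file:
COPY log_entries (id, payload) FROM '/tmp/log_entries.dat';

-- for very large batches, drop and re-create the indexes around the load:
DROP INDEX log_entries_payload_idx;
COPY log_entries (id, payload) FROM '/tmp/log_entries.dat';
CREATE INDEX log_entries_payload_idx ON log_entries (payload);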

Jacques.





Re: [PERFORM] Performance - moving from oracle to postgresql

2005-06-27 Thread Merlin Moncure
> There are some immediate questions from our engineers about performance
>
> - Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.  But...  If we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks.

I agree with Rod: you are correct but this is a very odd objection.  You
are declaring a set but are only interested in a tiny subset of that
based on arbitrary criteria.  You can do this with cursors or with clever
querying (not without materializing the full set however), but why?
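
For the record, both routes look like this (table and column invented;
either way the first 99100 rows still get generated and skipped):

SELECT * FROM orders ORDER BY id LIMIT 100 OFFSET 99100;

-- or with a cursor, skipping forward before fetching:
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM orders ORDER BY id;
MOVE FORWARD 99100 IN c;
FETCH 100 FROM c;
CLOSE c;
COMMIT;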

Merlin




[PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread grupos

Hi !

My company is evaluating making our system (developed in C++) compatible 
with PostgreSQL.

Our programmer made a lot of tests and he informed me that the 
performance using ODBC is very similar to using libpq, even with a big 
number of simultaneous connections/queries. Of course, for us it is 
simpler to use ODBC because it will be easier to maintain, as we already 
support a lot of other databases using ODBC (MySQL, DB2, etc).

Has someone already had this experience? What are the key benefits of 
using libpq instead of ODBC?

Our application has a heavy load and around 150 concurrent users.

Regards,

Rodrigo Carvalhaes





Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread Merlin Moncure
> Hi !
>
> My company is evaluating making our system (developed in C++)
> compatible with PostgreSQL.
>
> Our programmer made a lot of tests and he informed me that the
> performance using ODBC is very similar to using libpq, even with a big
> number of simultaneous connections/queries. Of course, for us it is
> simpler to use ODBC because it will be easier to maintain, as we
> already support a lot of other databases using ODBC (MySQL, DB2, etc).
>
> Has someone already had this experience? What are the key benefits of
> using libpq instead of ODBC?
>
> Our application has a heavy load and around 150 concurrent users.

The ODBC driver for postgresql implements its own protocol stack.
Unfortunately, it is still on protocol revision 2 (out of 3).  Also, IMO
libpq is a little better tested and durable than the odbc driver.  This
naturally follows from the fact that libpq is more widely used and more
actively developed than odbc.

If you are heavily C++ invested you can consider wrapping libpq yourself
if you want absolute maximum performance.  If you happen to be
developing on Borland platform give strong consideration to Zeos
connection library which is very well designed (it wraps libpq).

You might want to consider posting your question to the odbc list.

Merlin 




Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread Eric Lauzon
I would take a peek at the psqlodbc-8.0 drivers; I wouldn't battle with
other versions you might find, such as the unixODBC ones.


-elz





Re: [PERFORM] [HACKERS] How two perform TPC-H test on postgresql-8.0.2

2005-06-27 Thread Merlin Moncure
[moved to pgsql-performance]
>> Currently I want to take a TPC-H test on postgresql-8.0.2. I have
>> downloaded the DBGEN and QGEN from the homepage of TPC. But I
>> encountered many problems which forced me to request some help.
>> 1. How to load the data from the flat files generated by the dbgen
>> tool? To the best of my knowledge, there is a SQL*Loader in Oracle.
>> 2. How to simulate the concurrency environment? Where can I download
>> a client which connects to the DB server through ODBC?
>
> Get DBT3 from Sourceforge (search on osdldbt).  This is OSDL's
> TPCH-like test.
>
> However, given your knowledge of PostgreSQL you're unlikely to get any
> kind of result you can use -- TPCH requires significant database tuning
> knowledge.

I don't necessarily agree.  In fact, I remember reading the standards
for one of the TPC benchmarks and it said you were not supposed to
specifically tune for the test.  Any submission, including one with
stock settings, should be given consideration (and the .conf settings
should be submitted along with the benchmark results).  This can only
help to increase the body of knowledge on configuring the database.
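
As for the data-loading question above: dbgen writes '|'-delimited flat
files, which COPY can read directly (path and table name are examples;
dbgen's trailing delimiter at end of line may need stripping first):

COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';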

Merlin



Re: [PERFORM] ETL optimization

2005-06-27 Thread Bricklen Anderson
Dennis Bjorklund wrote:
> On Thu, 23 Jun 2005, Bricklen Anderson wrote:
>
>> iii. UNIQUE constraint on table t1. This didn't seem to perform too
>> badly with fewer rows (preliminary tests), but as you'd expect, on error
>> the whole transaction would roll back. Is it possible to skip a row if
>> it causes an error, as opposed to aborting the transaction altogether?
>
> You don't need to roll back the whole transaction if you use savepoints or
> the exception features in pl/pgsql.
>
> Take a look at this example:
>
> http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
 
Hmmm... forgot about savepoints. That's an interesting idea that I'll have to
check out. I earlier mentioned that I was going to test the delete + insert
version, and it works pretty well. I got it down to about 3 minutes using that
method. I'll test the savepoint and the exception version that you listed as 
well.
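
Something along the lines of the docs example, sketched for this case
(names invented; relies on the plpgsql exception support added in 8.0):

CREATE OR REPLACE FUNCTION merge_row(k integer, v text) RETURNS void AS
$$
BEGIN
    INSERT INTO t1 (key, val) VALUES (k, v);
EXCEPTION WHEN unique_violation THEN
    -- duplicate key: update instead, or do nothing here to skip the row
    UPDATE t1 SET val = v WHERE key = k;
END;
$$ LANGUAGE plpgsql;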

Thanks!




Re: [PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Michael Fuhr
On Tue, Jun 28, 2005 at 01:54:08AM +, Karl O. Pinc wrote:
> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>> See timeofday().
>
> That only gives you the time at the start of the transaction,
> so you get no indication of how long anything in the
> transaction takes.

Did you read the documentation or try it?  Perhaps you're thinking
of now(), current_timestamp, and friends, which don't advance during
a transaction; but as the documentation states, timeofday() returns
the wall-clock time and does advance during transactions.
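
Easy to check in psql:

BEGIN;
SELECT now() AS txn_start, timeofday() AS wall_clock;
-- run anything else, then repeat:
SELECT now() AS txn_start, timeofday() AS wall_clock;
-- now() returns the same value both times; timeofday() has advanced.
COMMIT;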

I just ran tests on versions of PostgreSQL going back to 7.2.8 and
in all of them timeofday() advanced during a transaction.  Does it
not work on your system?  If not then something's broken -- what
OS and version of PostgreSQL are you using?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] Poor index choice -- multiple indexes of the same

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 05:37:41 PM, Josh Berkus wrote:

> Karl,
>
>> Seems to me that when there's a constant value in the query
>> and an = comparison it will always be faster to use the (b-tree)
>> index that's ordered first by the constant value, as then all further
>> blocks are guaranteed to have a higher relevant information
>> density.  At least when compared with another index that has the
>> same columns in it.
>
> That really depends on the stats.   Such a choice would *not* be
> appropriate if the < comparison was expected to return 1- rows while
> the = condition applied to 15% of the table.


We're talking internals here so I don't know what I'm talking
about, but when the = comparison returns 15% of the table
you can find your way straight to the 1- (sic) relevant rows
because that 15% is further sorted by the second column of the
index.  So that's one disk read, and after that, when you scan
the rest of the blocks, every datum is relevant/returned.
So your scan will pass through fewer disk blocks.  The only
case where it would make sense to consider using the other
index is if the planner knew it could
get the answer in 1 disk read, in which case it should be
able to get the answer out of either index in one disk read,
as both indexes are on the same columns.


> What is your STATISTICS_TARGET for the relevant columns set to?


STATISTICS_TARGET is the default, which I read as 10 in the docs.
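
If finer-grained stats turn out to be needed on those columns, something
like this should do it (untested suggestion):

ALTER TABLE census ALTER COLUMN sname SET STATISTICS 100;
ALTER TABLE census ALTER COLUMN date SET STATISTICS 100;
ANALYZE census;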


> When's the last time you ran analyze?


I'm doing this in a torture test script, loading data.
Every Fibonacci number of rows * 100 I VACUUM ANALYZE.
So, 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use
and run VACUUM ANALYZE and shown the EXPLAIN ANALYZE below.


> If this is all updated, you want to post
> the pg_stats rows for the relevant columns?


Pg_stats rows below.  (I've tried to wrap the lines short
so as not to mess up anybody's mailer.)

# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date < '1975-9-21'
  and sname = 'RAD' and status != 'A' ;
                              QUERY PLAN
------------------------------------------------------------------
 Index Scan using census_date_sname on census  (cost=0.00..2169.51
rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
   Index Cond: ((date < '1975-09-21'::date) AND (sname = 'RAD'::bpchar))
   Filter: (status <> 'A'::bpchar)
 Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD'
  and date < '1975-9-21' and status != 'A' limit 1;
                              QUERY PLAN
------------------------------------------------------------------
 Limit  (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097
rows=0 loops=1)
   ->  Index Scan using census_sname_date on census
(cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094
rows=0 loops=1)
         Index Cond: ((sname = 'RAD'::bpchar) AND (date <
'1975-09-21'::date))
         Filter: (status <> 'A'::bpchar)
 Total runtime: 0.133 ms
(5 rows)




# select * from pg_stats where tablename = 'census' and (attname =
'sname' or attname = 'date');

schemaname        | babase
tablename         | census
attname           | date
null_frac         | 0
avg_width         | 4
n_distinct        | 4687
most_common_vals  | {1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
most_common_freqs | {0.002,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013,0.0013,0.0013}
histogram_bounds  | {1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
correlation       | 1

schemaname        | babase
tablename         | census
attname           | sname
null_frac         | 0
avg_width         | 7
n_distinct        | 177
most_common_vals  | {MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL}
most_common_freqs | {0.017,0.015,0.015,0.0146667,0.014,0.014,0.0136667,0.0136667,0.013,0.013}
histogram_bounds  | {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN}
correlation       | 0.0446897
(2 rows)

Thanks.


Karl [EMAIL PROTECTED]
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein




Re: [PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 08:34:19 PM, Michael Fuhr wrote:

> On Tue, Jun 28, 2005 at 01:54:08AM +, Karl O. Pinc wrote:
>> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>>> See timeofday().
>>
>> That only gives you the time at the start of the transaction,
>> so you get no indication of how long anything in the
>> transaction takes.
>
> Did you read the documentation or try it?  Perhaps you're thinking
> of now(), current_timestamp, and friends, which don't advance during
> a transaction; but as the documentation states, timeofday() returns
> the wall-clock time and does advance during transactions.


Very sorry.  I did not read through the complete documentation.


> I just ran tests on versions of PostgreSQL going back to 7.2.8 and
> in all of them timeofday() advanced during a transaction.


For all your work, a documentation patch is appended that
I think is easier to read and might avoid this problem
in the future.  If you don't read all the way through the
current cvs version then you might think, as I did,
that timeofday() is a CURRENT_TIMESTAMP-related function.

Sorry, but 3 lines wrap in the patch
in my email client.  :(


Karl [EMAIL PROTECTED]
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein


--- func.sgml   2005-06-26 17:05:35.0 -0500
+++ func.sgml.new   2005-06-27 21:51:05.301097896 -0500
@@ -5787,15 +5787,6 @@
    </para>
 
    <para>
-    There is also the function <function>timeofday()</function>, which for historical
-    reasons returns a <type>text</type> string rather than a <type>timestamp</type> value:
-<screen>
-SELECT timeofday();
-<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
-</screen>
-   </para>
-
-   <para>
     It is important to know that
     <function>CURRENT_TIMESTAMP</function> and related functions return
     the start time of the current transaction; their values do not
@@ -5803,8 +5794,7 @@
     the intent is to allow a single transaction to have a consistent
     notion of the <quote>current</quote> time, so that multiple
     modifications within the same transaction bear the same
-    time stamp. <function>timeofday()</function>
-    returns the wall-clock time and does advance during transactions.
+    time stamp.
    </para>
 
    <note>
@@ -5815,6 +5805,18 @@
    </note>
 
    <para>
+    There is also the function <function>timeofday()</function> which
+    returns the wall-clock time and advances during transactions.  For
+    historical reasons <function>timeofday()</function> returns a
+    <type>text</type> string rather than a <type>timestamp</type>
+    value:
+<screen>
+SELECT timeofday();
+<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
+</screen>
+   </para>
+
+   <para>
     All the date/time data types also accept the special literal value
     <literal>now</literal> to specify the current date and time.  Thus,
     the following three all return the same result:




Re: [PERFORM] Poor index choice -- multiple indexes of the same

2005-06-27 Thread Karl O. Pinc


On 06/27/2005 09:36:51 PM, Karl O. Pinc wrote:


> I'm doing this in a torture test script, loading data.
> Every Fibonacci number of rows * 100 I VACUUM ANALYZE.
> So, 100, 200, 300, 500, 800, etc.


(And of course disconnect my client and re-connect so
as to use the new statistics.  sure would be nice if
I didn't have to do this.)

Karl [EMAIL PROTECTED]
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein




[PERFORM] How can I speed up this function?

2005-06-27 Thread David Mitchell
We have the following function in our home grown mirroring package, but 
it isn't running as fast as we would like. We need to select statements 
from the pending_statement table, and we want to select all the 
statements for a single transaction (pending_trans) in one go (that is, 
we either select all the statements for a transaction, or none of them). 
We select as many blocks of statements as it takes to top the 100 
statement limit (so if the last transaction we pull has enough 
statements to put our count at 110, we'll still take it, but then we're 
done).


Here is our function:

CREATE OR REPLACE FUNCTION dbmirror.get_pending()
  RETURNS SETOF dbmirror.pending_statement AS
$BODY$

DECLARE
count INT4;
transaction RECORD;
statement dbmirror.pending_statement;
BEGIN
    count := 0;

    -- grab up to 50 unfetched transactions, oldest first
    FOR transaction IN SELECT t.trans_id as ID
        FROM pending_trans AS t WHERE fetched = false
        ORDER BY trans_id LIMIT 50
    LOOP
        update pending_trans set fetched = true where trans_id =
transaction.id;

        -- emit every statement of that transaction, in order
        FOR statement IN SELECT s.id, s.transaction_id, s.table_name, s.op,
s.data
            FROM dbmirror.pending_statement AS s
            WHERE s.transaction_id = transaction.id
            ORDER BY s.id ASC
        LOOP
            count := count + 1;

            RETURN NEXT statement;
        END LOOP;

        -- stop once the 100-statement limit has been topped
        IF count > 100 THEN
            EXIT;
        END IF;
    END LOOP;

RETURN;
END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

Table Schemas:

CREATE TABLE dbmirror.pending_trans
(
  trans_id oid NOT NULL,
  fetched bool DEFAULT false,
  CONSTRAINT pending_trans_pkey PRIMARY KEY (trans_id)
)
WITHOUT OIDS;

CREATE TABLE dbmirror.pending_statement
(
  id oid NOT NULL DEFAULT nextval('dbmirror.statement_id_seq'::text),
  transaction_id oid NOT NULL,
  table_name text NOT NULL,
  op char NOT NULL,
  data text NOT NULL,
  CONSTRAINT pending_statement_pkey PRIMARY KEY (id)
)
WITHOUT OIDS;

CREATE UNIQUE INDEX idx_stmt_tran_id_id
  ON dbmirror.pending_statement
  USING btree
  (transaction_id, id);

Postgres 8.0.1 on Linux.

Any Help would be greatly appreciated.

Regards

--
David Mitchell
Software Engineer
Telogis



Re: [PERFORM] Performance analysis of plpgsql code

2005-06-27 Thread Michael Fuhr
On Tue, Jun 28, 2005 at 03:03:06AM +, Karl O. Pinc wrote:
 
 For all your work a documentation patch is appended that
 I think is easier to read and might avoid this problem
 in the future.

Patches should go to the pgsql-patches list -- the people who review
and apply patches might not be following this thread.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread Christopher Kings-Lynne

What's wrong with Slony?




Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread David Mitchell


Christopher Kings-Lynne wrote:

What's wrong with Slony?


Because it's not multi-master. Our mirroring package is.

--
David Mitchell
Software Engineer
Telogis



Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread Christopher Kings-Lynne

>> What's wrong with Slony?
>
> Because it's not multi-master. Our mirroring package is.


I'm curious - how did you write a multi-master replication package in 
pgsql, when pgsql doesn't have 2 phase commits or any kind of 
distributed syncing or conflict resolution in a release version?


Chris




Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread Keith Worthington


David,

I'm still a newbie and it may not affect performance but why are you 
aliasing the tables?  Can you not simply use


FOR transaction IN SELECT trans_id
 FROM pending_trans
WHERE fetched = false
ORDER BY trans_id
LIMIT 50

and

FOR statement IN SELECT id,
transaction_id,
table_name,
op,
data
   FROM dbmirror.pending_statement
  WHERE pending_statement.transaction_id =
transaction.trans_id
  ORDER BY pending_statement.id

I am pretty sure that the ORDER BY is slowing down both of these 
queries.  Since you are going to go through the whole table eventually 
do you really need to sort the data at this point?


--
Kind Regards,
Keith



Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread David Mitchell

Christopher Kings-Lynne wrote:


> I'm curious - how did you write a multi-master replication package in
> pgsql, when pgsql doesn't have 2 phase commits or any kind of
> distributed syncing or conflict resolution in a release version?


We didn't write it entirely in pgsql; there is a worker process that 
takes care of actually committing to the database.


Cheers

--
David Mitchell
Software Engineer
Telogis



Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread David Mitchell

Hi Keith,

Unfortunately, we must have those sorts. The statements within a 
transaction must be executed on the slave in the same order as they were 
on the master, and similarly, transactions must also go in the same 
order. As for aliasing the tables, that is just a remnant from previous 
versions of the code.


Thanks

David






--
David Mitchell
Software Engineer
Telogis




Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread Gnanavel Shanmugam
Merge the two select statements like this and try,

SELECT t.trans_id AS id, s.id, s.transaction_id, s.table_name, s.op, s.data
   FROM pending_trans AS t JOIN dbmirror.pending_statement AS s
     ON (s.transaction_id = t.trans_id)
  WHERE t.fetched = false ORDER BY t.trans_id, s.id LIMIT 100;

If the above query works the way you want, then you can also do the
update using the same.

with regards,
S.Gnanavel




[PERFORM] LIKE search with ending % not optimized in v8

2005-06-27 Thread Aditya Damle
Hello. I believe in earlier versions, a query of the form

select attrib from ttt where attrib like 'foo%'

would be able to take advantage of an index. I have seen
this in the past. Currently I am using v8.0.3. From
what I can see, the execution plan seems to
use a seq scan and to totally ignore the index. Is
this the case?
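
One likely cause, assuming the cluster was initialized with a non-C
locale: in that case a plain b-tree index cannot serve LIKE 'foo%', but
an index built with the pattern operator class can (index name invented):

CREATE INDEX ttt_attrib_like_idx ON ttt (attrib varchar_pattern_ops);
EXPLAIN SELECT attrib FROM ttt WHERE attrib LIKE 'foo%';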

-Aditya




Re: [PERFORM] How can I speed up this function?

2005-06-27 Thread David Mitchell

Hi Gnanavel,

Thanks, but that will only return at most 100 statements. If there is a 
transaction with 110 statements then this will not return all the 
statements for that transaction. We need to make sure that the function 
returns all the statements for a transaction.
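
One alternative that keeps transactions whole (untested sketch; the
100-statement cutoff would have to move into the caller): pick the
transaction ids first, then pull every statement belonging to them:

SELECT s.id, s.transaction_id, s.table_name, s.op, s.data
  FROM dbmirror.pending_statement AS s
 WHERE s.transaction_id IN (SELECT trans_id
                              FROM pending_trans
                             WHERE fetched = false
                             ORDER BY trans_id
                             LIMIT 50)
 ORDER BY s.transaction_id, s.id;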


Cheers

David








--
David Mitchell
Software Engineer
Telogis



[PERFORM] select distinct on varchar -- wild performance differences!

2005-06-27 Thread Elliott Bennett
Hey, all.  I've bounced this around in #postgres for an hour or so, and 
it was suggested that I post it here as well.  Hopefully someone can 
help me out.


I have three machines.  All have 512MB of ram.
Machine A is a 2.0ghz celeron, running debian, pg version 7.4.6.
Machine B is a 1.8ghz celeron, running centos 3.4, pg version 8.0.3.
(7.3.9 also exhibited the behaviour below, by the way)
Machine C is a 1.0ghz athlon, running centos 4.0, pg version 7.4.7.


The SAME data and schema is loaded (from a pg_dump, default parameters) 
onto all three machines.   With the same query: select distinct model 
from exif_common, machines A and C return results quickly (1/4 
second).  Machine B chews on it for 30ish seconds!  Note, this column is 
a VARCHAR(40).


Here's an explain analyze for it.

Machine A (fast): 
photos=# explain analyze select distinct model from exif_common;

 QUERY PLAN
--------------------------------------------------------------------------
 Unique  (cost=2629.74..2732.11 rows=5 width=15) (actual time=211.358..265.049 rows=6 loops=1)
   ->  Sort  (cost=2629.74..2680.93 rows=20473 width=15) (actual time=211.351..242.296 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473 width=15) (actual time=0.022..58.635 rows=20473 loops=1)
 Total runtime: 265.928 ms
(5 rows)



Machine B (slow): 
photos=# explain analyze select distinct model from exif_common;

 QUERY PLAN
--------------------------------------------------------------------------
 Unique  (cost=2640.74..2743.11 rows=6 width=15) (actual time=27939.231..32914.134 rows=6 loops=1)
   ->  Sort  (cost=2640.74..2691.93 rows=20473 width=15) (actual time=27939.222..27983.784 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473 width=15) (actual time=0.071..97.772 rows=20473 loops=1)
 Total runtime: 32915.031 ms
(5 rows)


( yes, i know, six distinct rows out of 20,000   But holy moly!  1/4 
sec vs 32.9 sec?!?! )



Now, if I do a similar query against an INT column, the speeds are more 
in line with each other:


Machine A:
photos=# explain analyze select distinct imagewidth from exif_common;

 QUERY PLAN
--------------------------------------------------------------------------
 Unique  (cost=2629.74..2732.11 rows=36 width=4) (actual time=179.899..225.934 rows=107 loops=1)
   ->  Sort  (cost=2629.74..2680.93 rows=20473 width=4) (actual time=179.891..207.632 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common  (cost=0.00..1163.73 rows=20473 width=4) (actual time=0.024..62.946 rows=20473 loops=1)
 Total runtime: 226.707 ms
(5 rows)



Machine B:
photos=# explain analyze select distinct imagewidth from exif_common;
 QUERY PLAN
--------------------------------------------------------------------------
 Unique  (cost=2640.74..2743.11 rows=24 width=4) (actual time=209.394..287.131 rows=107 loops=1)
   ->  Sort  (cost=2640.74..2691.93 rows=20473 width=4) (actual time=209.384..251.693 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common  (cost=0.00..1174.73 rows=20473 width=4) (actual time=0.074..94.574 rows=20473 loops=1)
 Total runtime: 288.411 ms
(5 rows)




Machine C exhibits the same behaviour as A for all queries.

This weird slow behaviour on machine B also appeared in 7.3.9.  
Upgrading didn't seem to help.


neilc from irc thought it may be a qsort(2) quirk, but a sample C 
program I whipped up testing different sized data sets with a similar 
distribution gave very similar sort timings between the three 
machines.  Therefore, I don't think qsort(2) is to blame...
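
One more thing that may be worth comparing across the three machines:
varchar sorting goes through the cluster's locale (strcoll), and a non-C
locale can be dramatically slower on some libc builds.  These settings
are fixed at initdb time:

SHOW lc_collate;
SHOW lc_ctype;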


Anyone have any ideas as to what may be up with machine B?   


Thanks,
-Elliott



[PERFORM] parameterized LIKE does not use index

2005-06-27 Thread Kurt De Grave

Hi,

Consider the where-clauses:

WHERE lower(col) LIKE 'abc';
WHERE lower(col) LIKE 'abc%';

these will both use a b-tree functional index on lower(col) if one exists.
The clause

WHERE lower(col) LIKE '%abc';

can't use the index as you would expect, because of the wildcard at the
front (as mentioned in the manual).  Thus, it has to do a seqscan, on what
in my case is a very large table.  But still that's not too bad, because I
expect an overwhelming amount of the simple cases, and only very few that
start with a percentage sign. Now, what's problematic is if I replace the
literal with a parameter, like this:

WHERE lower(col) LIKE ?

It seems that the parameterized query gets compiled once, and because the
parameter is not yet known, one cannot be sure it doesn't start with a
percentage sign.  Using the parameterized version causes ALL cases to use
a seqscan.

Of course, I could modify the application and send different SQL depending
on which case we're in, or just construct a query with a literal each
time, but is there a way to add a hint to the SQL that would cause the
query to be re-planned if it's a case that could use the index?  Or can I
convince the (Perl) driver to do so?
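
The effect is easy to reproduce in psql (table name invented; PREPARE
behaves like the driver's parameterized query, on versions that support
EXPLAIN EXECUTE):

PREPARE q(text) AS SELECT col FROM tbl WHERE lower(col) LIKE $1;
EXPLAIN EXECUTE q('abc%');  -- planned once with $1 unknown: seq scan
EXPLAIN SELECT col FROM tbl WHERE lower(col) LIKE 'abc%';  -- literal: can use the index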



kurt.



[PERFORM] Postgresql7.4.5 running slow on plpgsql function

2005-06-27 Thread Chun Yit(Chronos)



hi, need some help from some experts here.

Currently we have a function that is used together with a temp table.  It
calls a search-result function; every time this function is called, it
goes through some filters before coming out with a result.

Now we have a major problem: the first time the function executes, it
takes about 13 seconds; the second time, about 17 seconds.  Every time
you execute the function, the time taken grows by about 4 seconds.
May I know what is going on here?

Since we use the function with a temp table, every statement related to
the temp table uses the EXECUTE command.

regards
ivan


[PERFORM] index selection by query planner

2005-06-27 Thread Rohit Gaddi




Hi,

I have a table with two indices on the same column, one of which is a
partial index.  I would like the query planner to use the partial index
whenever the query condition lies in the range of the partial index, as
it would yield better performance.  Is there any way to enforce the
ordering of the indices?  How does the query planner decide which index
to use when a particular query is fired?  'Explain Analyze' showed the
total index being used in a situation that could be fulfilled by the
partial index.
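
There is no knob to prefer one index over another; the planner costs each
candidate from the statistics.  A sketch of the situation (names and
predicate invented), where the practical levers are keeping ANALYZE
current and comparing the plans:

CREATE INDEX t_col_idx ON t (col);
CREATE INDEX t_col_hot_idx ON t (col) WHERE col > 1000;
ANALYZE t;
EXPLAIN SELECT * FROM t WHERE col > 2000;  -- should favor the partial index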

Thanks,
Rohit

[PERFORM] Too slow querying a table of 15 million records

2005-06-27 Thread kjelle
Hello!

I use FreeBSD 4.11 with PostgreSQL 7.3.8.

I got a huge database with roughly 15 million records. There is just one
table, with a time field, a few ints and a few strings.


table test
fields time (timestamp), source (string), destination (string), p1 (int),
p2 (int)


I have run VACUUM ANALYZE;

I have created indexes on every field, but for some reason my PostgreSQL
server wants to use a seqscan, even though I know an index scan would be
much faster.


create index test_time_idx on test (time) ;
create index test_source_idx on test (source) ;
create index test_destination_idx on test (destination) ;
create index test_p1_idx on test (p1) ;
create index test_p2_idx on test (p2) ;



What is really strange is that when I query a count(*) on one of the int
fields (p1) with a very low count, PostgreSQL uses an index scan. In another
count on the same int field (p1), where I know it returns about 2.2 million
hits, it suddenly uses a seqscan instead of an indexed one. Isn't
the whole idea of indexing to increase performance in large queries? To
make sort of a phonebook for the values, to make it faster to look up
whatever you need... This just seems opposite..

Here is a EXPLAIN of my query

database=> explain select date_trunc('hour', time),count(*) as total from
test where p1=53 and time > now() - interval '24 hours' group by
date_trunc order by date_trunc ;
                            QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=727622.61..733143.23 rows=73608 width=8)
   ->  Group  (cost=727622.61..731303.02 rows=736083 width=8)
         ->  Sort  (cost=727622.61..729462.81 rows=736083 width=8)
               Sort Key: date_trunc('hour'::text, time)
               ->  Seq Scan on test  (cost=0.00..631133.12 rows=736083 width=8)
                     Filter: ((p1 = 53) AND (time > (now() - '1 day'::interval)))
(6 rows)
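
An untested suggestion: this query filters on p1 and a time range
together, so a two-column index matching that shape usually helps far
more than separate single-column indexes:

create index test_p1_time_idx on test (p1, time);
explain select date_trunc('hour', time), count(*) as total from test
 where p1 = 53 and time > now() - interval '24 hours'
 group by 1 order by 1;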



database=> drop INDEX test_<TAB>
test_source_idx       test_destination_idx      test_p1_idx
test_p2_idx           test_time_idx


After all this, I tried to set enable_seqscan to off and
enable_nestloop to on. This didn't help much either; the time to run the
query is still in minutes. My results are the number of elements for each
hour, and there are about 1000-2000 hits per hour. I have read somewhere
that PostgreSQL can easily handle 100-200 million records, and,
with the right tuned system, have great performance. I would like to
learn how :)

I also found an article on a page
(http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

Tip #11:  Don't bother indexing columns with huge numbers of records and a
small range of values, such as BOOLEAN columns.

This tip, regretfully, is perhaps the only tip where I cannot provide a
good, real-world example from my work.  So I'll give you a hypothetical
situation instead:

Imagine that you have a database table with a list of every establishment
vending ice cream in the US.  A simple example might look like:

Where there were almost 1 million rows, but due to simplistic data entry,
only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and
3-OTHER) which are relatively evenly distributed.  In this hypothetical
situation, you might find (with testing using EXPLAIN) that an index on
type is ignored and the parser uses a seq scan (or table scan) instead. 
This is because a table scan can actually be faster than an index scan in
this situation.  Thus, any index on type should be dropped.

Certainly, the boolean column (active) requires no indexing as it has only
two possible values and no index will be faster than a table scan.


Then I ask: what use is indexing, when I can't use it on a VERY
large database? On my 15 million record database it takes forever
to do seqscans over and over again... This is probably, as I mentioned
earlier, the reason (read the quote) why it chooses a full scan and not an
indexed one...

So what do I do? :confused:

I've used SQL for years, but never at such a big scale, and thus have not
had to learn how to deal with large numbers of records -- usually a
maximum of 1000 records. Now, with millions, I need to learn a way to make
my sucky queries better.

I'm trying to learn more about tuning my system, making better queries and
such. I've found some documents on the Internet, but far from the best.

Feedback most appreciated!

Regards,
a learning PostGreSQL user




