[PERFORM] Insert performance vs Table size
Hi all, I'm wondering if and how the size of a table affects the speed of inserts into it. If the table has indexes, does that alter the answer? Thanks
Re: [PERFORM] Insert performance vs Table size
Praveen Raja: I don't think the size of a table affects the speed of inserts into it, because PostgreSQL just appends to the data files. But indexes do slow inserts down, because PostgreSQL has to maintain each index on every insert. Hope this is useful for your question.

=== 2005-06-27 19:24:06 you wrote: ===
> Hi all, I'm wondering if and how the size of a table affects the speed of inserts into it? If the table has indexes, does that alter the answer? Thanks

Best regards!
李江华 (Seamus Dean)
Alibaba.com
TEL: 0571-85022088-2287
Re: [PERFORM] Insert performance vs Table size
Hi,

At 13:24 27/06/2005, Praveen Raja wrote:
> I'm wondering if and how the size of a table affects speed of inserts into it? What if the table has indexes, does that alter the answer?

Many parameters will affect the result:
- whether there are any indexes (including the primary key, unique constraints...) to update or not
- whether there are any foreign keys from or to that table
- the size of the rows
- whether the table (or at least the bits being updated) fits in RAM or not
- whether the table has holes (due to former updates/deletes and vacuum) and how they are placed
- and probably a bunch of other things...

Obviously, if you have an append-only (no updates, no deletes) table with no indexes and no foreign keys, the size of the table should not matter much. As soon as one of those conditions is not met, table size will have an impact: probably small as long as whatever is needed can be held in RAM, a lot bigger once that's not the case.

Hope that helps,

Jacques.
Re: [PERFORM] Insert performance vs Table size
Just to clear things up a bit, the scenario I'm interested in is a table with a large number of indexes on it (maybe 7-8). In this scenario, other than the overhead of having to maintain the indexes (which I'm guessing is the same regardless of the size of the table), does the size of the table play a role in determining insert performance (and I mean only insert performance)?
Re: [PERFORM] Insert performance vs Table size
Hi,

At 13:50 27/06/2005, Praveen Raja wrote:
> Just to clear things up a bit, the scenario that I'm interested in is a table with a large number of indexes on it (maybe 7-8).

If you're after performance you'll want to carefully consider which indexes are really useful, and/or redesign your schema so that you can have fewer indexes on that table. 7 or 8 indexes is quite a lot, and that really has a cost.

> In this scenario other than the overhead of having to maintain the indexes (which I'm guessing is the same regardless of the size of the table)

Definitely not: indexes grow with the size of the table. Depending on what columns you index (and their types), the indexes may be a fraction of the size of the table, or they may be very close in size (in extreme cases they may even be larger). With 7 or 8 indexes, that can be quite a large volume of data to manipulate, especially if the values of the inserted columns span the whole range of each index (rather than being solely id- or time-based, for instance, in which case index updates are concentrated in a small area of each index): it means you'll need to keep a majority of the indexes in RAM if you want to maintain decent performance.

> does the size of the table play a role in determining insert performance (and I mean only insert performance)?

In this case it's really the indexes that'll cause you trouble, though heavily fragmented tables (due to lots of deletes or updates) will also incur a penalty just for the data part of the inserts.

Also, don't forget the usual hints if you are going to do lots of inserts:
- batch them in large transactions, don't do them one at a time
- better yet, use COPY rather than INSERT
- in some situations, you might be better off dropping the indexes, doing large batch inserts, then re-creating the indexes. YMMV depending on the existing/new ratio, whether you need to maintain indexed access to the tables, etc.
- pay attention to foreign keys

Jacques.
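A minimal sketch of the batch-load pattern described in the hints above (table, index, and file names are invented for illustration; in this era, COPY FROM a server-side file requires superuser rights):

    BEGIN;
    -- drop nonessential indexes so the bulk load doesn't have to maintain them
    DROP INDEX idx_orders_customer;
    -- one COPY beats thousands of individual INSERTs
    COPY orders FROM '/tmp/orders.dat';
    -- rebuild the index once, against the full data set
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    COMMIT;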
Re: [PERFORM] Performance - moving from oracle to postgresql
> There are some immediate questions from our engineers about performance: Oracle has one particular performance enhancement that Postgres is missing. If you do a select that returns 100,000 rows in a given order, and all you want are rows 99101 to 99200, then Oracle can do that very efficiently. With Postgres, it has to read the first 99200 rows and then discard the first 99100. But... if we really want to look at performance, then we ought to put together a set of benchmarks of some typical tasks.

I agree with Rod: you are correct, but this is a very odd objection. You are declaring a set but are only interested in a tiny subset of it, based on arbitrary criteria. You can do this with cursors or with clever querying (not without materializing the full set, however), but why?

Merlin
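For readers wondering what the cursor approach Merlin mentions looks like, here is a hedged sketch (table and column names are invented); note that internally PostgreSQL still walks past the skipped rows:

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM big_table ORDER BY sort_col;
    MOVE FORWARD 99100 IN c;   -- skip rows 1..99100 without returning them
    FETCH 100 FROM c;          -- rows 99101..99200
    COMMIT;

The same result can be had in one statement with OFFSET 99100 LIMIT 100, with the same skip-and-discard cost.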
[PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
Hi!

My company is evaluating making our system (developed in C++) compatible with PostgreSQL. Our programmer ran a lot of tests and told me that performance using ODBC is very similar to using libpq, even with a big number of simultaneous connections/queries. Of course, for us it is simpler to use ODBC, because it will be easier to maintain: we already support a lot of other databases using ODBC (MySQL, DB2, etc).

Has anyone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

Regards,

Rodrigo Carvalhaes
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
> Has anyone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

The ODBC driver for PostgreSQL implements its own protocol stack. Unfortunately, it is still on protocol revision 2 (out of 3). Also, IMO libpq is a little better tested and more durable than the ODBC driver. This naturally follows from the fact that libpq is more widely used and more actively developed than ODBC.

If you are heavily invested in C++ you can consider wrapping libpq yourself if you want absolute maximum performance. If you happen to be developing on the Borland platform, give strong consideration to the Zeos connection library, which is very well designed (it wraps libpq).

You might want to consider posting your question to the odbc list.

Merlin
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
I would take a peek at the psqlodbc-8.0 drivers; I wouldn't battle with other versions you might find, such as the unixODBC ones.

-elz
Re: [PERFORM] [HACKERS] How to perform TPC-H test on postgresql-8.0.2
[moved to pgsql-performance]

> Currently I want to run a TPC-H test on postgresql-8.0.2. I have downloaded DBGEN and QGEN from the TPC homepage, but I encountered many problems which forced me to request some help.
> 1. How do I load the data from the flat files generated by the dbgen tool? To the best of my knowledge, there is a SQL*Loader in Oracle.
> 2. How do I simulate the concurrency environment? Where can I download a client which connects to the DB server through ODBC?

>> Get DBT3 from SourceForge (search on osdldbt). This is OSDL's TPCH-like test. However, given your knowledge of PostgreSQL you're unlikely to get any kind of result you can use -- TPCH requires significant database tuning knowledge.

I don't necessarily agree. In fact, I remember reading the standards for one of the TPC benchmarks and it said you were not supposed to tune specifically for the test. Any submission, including one with stock settings, should be given consideration (and the .conf settings should be submitted along with the benchmark results). This can only help to increase the body of knowledge on configuring the database.

Merlin
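On the data-loading question: PostgreSQL's COPY command is the rough equivalent of Oracle's SQL*Loader for flat files. A hedged sketch (path is illustrative; note that dbgen emits '|'-delimited lines with a trailing delimiter, which may need stripping before COPY will accept them):

    COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';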
Re: [PERFORM] ETL optimization
Dennis Bjorklund wrote:
> On Thu, 23 Jun 2005, Bricklen Anderson wrote:
>> iii. UNIQUE constraint on table t1. This didn't seem to perform too badly with fewer rows (preliminary tests), but as you'd expect, on error the whole transaction would roll back. Is it possible to skip a row if it causes an error, as opposed to aborting the transaction altogether?
> You don't need to roll back the whole transaction if you use savepoints or the exception features in pl/pgsql. Take a look at this example:
> http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Hmmm... forgot about savepoints. That's an interesting idea that I'll have to check out. I earlier mentioned that I was going to test the delete + insert version, and it works pretty well: I got it down to about 3 minutes using that method. I'll test the savepoint and exception versions that you listed as well. Thanks!
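For reference, the cited documentation example boils down to an exception-based merge loop like this sketch (table and column names here are illustrative, not from the thread):

    CREATE FUNCTION merge_row(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE t1 SET val = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- not there: try to insert; a concurrent insert raises
            -- unique_violation, in which case we loop and retry the UPDATE
            BEGIN
                INSERT INTO t1 (key, val) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, loop again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Each failed INSERT rolls back only its inner block, not the whole transaction.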
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 01:54:08AM +0000, Karl O. Pinc wrote:
> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>> See timeofday().
> That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes.

Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions.

I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction. Does it not work on your system? If not then something's broken -- what OS and version of PostgreSQL are you using?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
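A quick way to see the difference for yourself: run the two SELECTs a moment apart inside one transaction.

    BEGIN;
    SELECT now(), timeofday();
    -- wait a moment, then run it again inside the same transaction:
    SELECT now(), timeofday();
    -- now() returns the same value both times; timeofday() has advanced
    COMMIT;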
Re: [PERFORM] Poor index choice -- multiple indexes of the same
On 06/27/2005 05:37:41 PM, Josh Berkus wrote:
> Karl,
>> Seems to me that when there's a constant value in the query and an = comparison it will always be faster to use the (b-tree) index that's ordered first by the constant value, as then all further blocks are guaranteed to have a higher relevant information density. At least when compared with another index that has the same columns in it.
> That really depends on the stats. Such a choice would *not* be appropriate if the comparison was expected to return 1- rows while the = condition applied to 15% of the table.

We're talking internals here, so I don't know what I'm talking about, but: when the = comparison returns 15% of the table you can find your way straight to the 1- (sic) relevant rows, because that 15% is further sorted by the second column of the index. So that's one disk read, and after that, when you scan the rest of the blocks, every datum is relevant/returned, so your scan will pass through fewer disk blocks. The only case where it would make sense to consider the other index is if the planner knew it could get the answer in 1 disk read; but in that case it should be able to get the answer out of either index in one disk read, as both indexes are on the same columns.

> What is your STATISTICS_TARGET for the relevant columns set to?

STATISTICS_TARGET is the default, which I read in the docs as 10.

> When's the last time you ran analyze?

I'm doing this in a torture-test script, loading data. Every Fibonacci number of rows * 100 I VACUUM ANALYZE. So: 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use, run VACUUM ANALYZE, and shown the EXPLAIN ANALYZE below.

> If this is all updated, you want to post the pg_stats rows for the relevant columns?

pg_stats rows below. (I've tried to wrap the lines short so as not to mess up anybody's mailer.)
# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date > '1975-9-21' and sname = 'RAD' and status != 'A';

 Index Scan using census_date_sname on census (cost=0.00..2169.51 rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
   Index Cond: ((date > '1975-09-21'::date) AND (sname = 'RAD'::bpchar))
   Filter: (status <> 'A'::bpchar)
 Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD' and date > '1975-9-21' and status != 'A' limit 1;

 Limit (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097 rows=0 loops=1)
   ->  Index Scan using census_sname_date on census (cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094 rows=0 loops=1)
         Index Cond: ((sname = 'RAD'::bpchar) AND (date > '1975-09-21'::date))
         Filter: (status <> 'A'::bpchar)
 Total runtime: 0.133 ms
(5 rows)

# select * from pg_stats where tablename = 'census' and (attname = 'sname' or attname = 'date');

schemaname        | babase
tablename         | census
attname           | date
null_frac         | 0
avg_width         | 4
n_distinct        | 4687
most_common_vals  | {1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
most_common_freqs | {0.002,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013,0.0013,0.0013}
histogram_bounds  | {1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
correlation       | 1

schemaname        | babase
tablename         | census
attname           | sname
null_frac         | 0
avg_width         | 7
n_distinct        | 177
most_common_vals  | {MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL}
most_common_freqs | {0.017,0.015,0.015,0.0146667,0.014,0.014,0.0136667,0.0136667,0.013,0.013}
histogram_bounds  | {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN}
correlation       | 0.0446897

(2 rows)

Thanks.

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein
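Since the default statistics target came up above, here is a sketch of raising it per column (the target value 100 is an arbitrary example, not from the thread); finer-grained frequency data can help the planner choose between such indexes:

    ALTER TABLE census ALTER COLUMN sname SET STATISTICS 100;
    ANALYZE census;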
Re: [PERFORM] Performance analysis of plpgsql code
On 06/27/2005 08:34:19 PM, Michael Fuhr wrote:
> On Tue, Jun 28, 2005 at 01:54:08AM +0000, Karl O. Pinc wrote:
>> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>>> See timeofday().
>> That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes.
> Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions.

Very sorry. I did not read through the complete documentation.

> I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction.

For all your work, a documentation patch is appended that I think is easier to read and might avoid this problem in the future. If you don't read all the way through the current CVS version, you might think, as I did, that timeofday() is a CURRENT_TIMESTAMP-related function.

Sorry, but 3 lines wrap in the patch in my email client. :(

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein

--- func.sgml	2005-06-26 17:05:35.0 -0500
+++ func.sgml.new	2005-06-27 21:51:05.301097896 -0500
@@ -5787,15 +5787,6 @@
    </para>

    <para>
-    There is also the function <function>timeofday()</function>, which for historical
-    reasons returns a <type>text</type> string rather than a <type>timestamp</type> value:
-<screen>
-SELECT timeofday();
-<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
-</screen>
-   </para>
-
-   <para>
     It is important to know that
     <function>CURRENT_TIMESTAMP</function> and related functions return
     the start time of the current transaction; their values do not
@@ -5803,8 +5794,7 @@
     the intent is to allow a single transaction to have a consistent
     notion of the <quote>current</quote> time, so that multiple
     modifications within the same transaction bear the same
-    time stamp. <function>timeofday()</function>
-    returns the wall-clock time and does advance during transactions.
+    time stamp.
    </para>

    <note>
@@ -5815,6 +5805,18 @@
    </note>

    <para>
+    There is also the function <function>timeofday()</function> which
+    returns the wall-clock time and advances during transactions.  For
+    historical reasons <function>timeofday()</function> returns a
+    <type>text</type> string rather than a <type>timestamp</type>
+    value:
+<screen>
+SELECT timeofday();
+<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
+</screen>
+   </para>
+
+   <para>
     All the date/time data types also accept the special literal value
     <literal>now</literal> to specify the current date and time.  Thus,
     the following three all return the same result:
Re: [PERFORM] Poor index choice -- multiple indexes of the same
On 06/27/2005 09:36:51 PM, Karl O. Pinc wrote:
> I'm doing this in a torture-test script, loading data. Every Fibonacci number of rows * 100 I VACUUM ANALYZE. So: 100, 200, 300, 500, 800, etc.

(And of course I disconnect my client and re-connect so as to use the new statistics. Sure would be nice if I didn't have to do this.)

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein
[PERFORM] How can I speed up this function?
We have the following function in our home-grown mirroring package, but it isn't running as fast as we would like. We need to select statements from the pending_statement table, and we want to select all the statements for a single transaction (pending_trans) in one go (that is, we either select all the statements for a transaction, or none of them). We select as many blocks of statements as it takes to top the 100-statement limit (so if the last transaction we pull has enough statements to put our count at 110, we'll still take it, but then we're done). Here is our function:

CREATE OR REPLACE FUNCTION dbmirror.get_pending()
  RETURNS SETOF dbmirror.pending_statement AS
$BODY$
DECLARE
    count INT4;
    transaction RECORD;
    statement dbmirror.pending_statement;
BEGIN
    count := 0;

    FOR transaction IN SELECT t.trans_id as ID
        FROM pending_trans AS t WHERE fetched = false
        ORDER BY trans_id LIMIT 50
    LOOP
        UPDATE pending_trans SET fetched = true
            WHERE trans_id = transaction.id;

        FOR statement IN SELECT s.id, s.transaction_id, s.table_name, s.op, s.data
            FROM dbmirror.pending_statement AS s
            WHERE s.transaction_id = transaction.id
            ORDER BY s.id ASC
        LOOP
            count := count + 1;
            RETURN NEXT statement;
        END LOOP;

        IF count > 100 THEN
            EXIT;
        END IF;
    END LOOP;

    RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

Table schemas:

CREATE TABLE dbmirror.pending_trans (
    trans_id oid NOT NULL,
    fetched bool DEFAULT false,
    CONSTRAINT pending_trans_pkey PRIMARY KEY (trans_id)
) WITHOUT OIDS;

CREATE TABLE dbmirror.pending_statement (
    id oid NOT NULL DEFAULT nextval('dbmirror.statement_id_seq'::text),
    transaction_id oid NOT NULL,
    table_name text NOT NULL,
    op char NOT NULL,
    data text NOT NULL,
    CONSTRAINT pending_statement_pkey PRIMARY KEY (id)
) WITHOUT OIDS;

CREATE UNIQUE INDEX idx_stmt_tran_id_id
    ON dbmirror.pending_statement USING btree (transaction_id, id);

Postgres 8.0.1 on Linux.

Any help would be greatly appreciated. Regards

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 03:03:06AM +0000, Karl O. Pinc wrote:
> For all your work a documentation patch is appended that I think is easier to read and might avoid this problem in the future.

Patches should go to the pgsql-patches list -- the people who review and apply patches might not be following this thread.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Re: [PERFORM] How can I speed up this function?
What's wrong with Slony?

David Mitchell wrote:
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]
Re: [PERFORM] How can I speed up this function?
Christopher Kings-Lynne wrote:
> What's wrong with Slony?

Because it's not multi-master. Our mirroring package is.

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
>> What's wrong with Slony?
>
> Because it's not multi-master. Our mirroring package is.

I'm curious: how did you write a multi-master replication package in pgsql, when pgsql doesn't have two-phase commit or any kind of distributed syncing or conflict resolution in a release version?

Chris
Re: [PERFORM] How can I speed up this function?
David Mitchell wrote:
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]

David, I'm still a newbie and it may not affect performance, but why are you aliasing the tables? Can you not simply use

FOR transaction IN SELECT trans_id
    FROM pending_trans
    WHERE fetched = false
    ORDER BY trans_id
    LIMIT 50

and

FOR statement IN SELECT id, transaction_id, table_name, op, data
    FROM dbmirror.pending_statement
    WHERE pending_statement.transaction_id = transaction.trans_id
    ORDER BY pending_statement.id

I am pretty sure that the ORDER BY is slowing down both of these queries. Since you are going to go through the whole table eventually, do you really need to sort the data at this point?

--
Kind Regards,
Keith
Re: [PERFORM] How can I speed up this function?
Christopher Kings-Lynne wrote:
> I'm curious: how did you write a multi-master replication package in pgsql, when pgsql doesn't have two-phase commit or any kind of distributed syncing or conflict resolution in a release version?

We didn't write it entirely in pgsql; there is a worker process that takes care of actually committing to the database.

Cheers

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
Hi Keith,

Unfortunately, we must have those sorts. The statements within a transaction must be executed on the slave in the same order as they were on the master, and similarly, transactions must also go in the same order. As for aliasing the tables, that is just a remnant from previous versions of the code.

Thanks

David

Keith Worthington wrote:
> I am pretty sure that the ORDER BY is slowing down both of these queries. Since you are going to go through the whole table eventually do you really need to sort the data at this point? [...]

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
Merge the two select statements like this and try:

SELECT t.trans_id as ID, s.id, s.transaction_id, s.table_name, s.op, s.data
FROM pending_trans AS t
JOIN dbmirror.pending_statement AS s ON (s.transaction_id = t.trans_id)
WHERE t.fetched = false
ORDER BY t.trans_id, s.id
LIMIT 100;

(Note: the join condition must use t.trans_id; pending_trans has no column named id.) If the above query works in the way you want, then you can also do the update using the same.

with regards,
S.Gnanavel

> -----Original Message-----
> From: David Mitchell
> Subject: [PERFORM] How can I speed up this function?
>
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]
[PERFORM] LIKE search with ending % not optimized in v8
Hello. I believe that in earlier versions, a query of the form

select attrib from ttt where attrib like 'foo%'

would be able to take advantage of an index; I have seen this in the past. Currently I am using v8.0.3, and from what I can see, the execution plan uses a seq scan and totally ignores the index. Is this the case?

-Aditya
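One well-known cause of this symptom (a hedged guess, since the poster's locale isn't shown): a plain btree index can only serve anchored LIKE when the database uses the C locale. In other locales, an index built with the pattern operator class (available since 7.4) can serve it:

    -- names taken from the question; use varchar_pattern_ops for a varchar column
    CREATE INDEX ttt_attrib_like_idx ON ttt (attrib text_pattern_ops);
    EXPLAIN SELECT attrib FROM ttt WHERE attrib LIKE 'foo%';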
Re: [PERFORM] How can I speed up this function?
Hi Gnanavel,

Thanks, but that will return at most 100 statements. If there is a transaction with 110 statements, this will not return all of them, and we need the function to return all the statements for a transaction.

Cheers

David

Gnanavel Shanmugam wrote:
> Merge the two select statements like this and try: [...]

--
David Mitchell
Software Engineer
Telogis
[PERFORM] select distinct on varchar -- wild performance differences!
Hey, all. I've bounced this around in #postgres for an hour or so, and it was suggested that I post it here as well. Hopefully someone can help me out.

I have three machines. All have 512MB of RAM.

Machine A is a 2.0GHz Celeron, running Debian, pg version 7.4.6.
Machine B is a 1.8GHz Celeron, running CentOS 3.4, pg version 8.0.3 (7.3.9 also exhibited the behaviour below, by the way).
Machine C is a 1.0GHz Athlon, running CentOS 4.0, pg version 7.4.7.

The SAME data and schema is loaded (from a pg_dump, default parameters) onto all three machines. With the same query, select distinct model from exif_common, machines A and C return results quickly (1/4 second). Machine B chews on it for 30ish seconds! Note, this column is a VARCHAR(40). Here's an explain analyze for it.

Machine A (fast):

photos=# explain analyze select distinct model from exif_common;

 Unique (cost=2629.74..2732.11 rows=5 width=15) (actual time=211.358..265.049 rows=6 loops=1)
   ->  Sort (cost=2629.74..2680.93 rows=20473 width=15) (actual time=211.351..242.296 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=15) (actual time=0.022..58.635 rows=20473 loops=1)
 Total runtime: 265.928 ms
(5 rows)

Machine B (slow):

photos=# explain analyze select distinct model from exif_common;

 Unique (cost=2640.74..2743.11 rows=6 width=15) (actual time=27939.231..32914.134 rows=6 loops=1)
   ->  Sort (cost=2640.74..2691.93 rows=20473 width=15) (actual time=27939.222..27983.784 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=15) (actual time=0.071..97.772 rows=20473 loops=1)
 Total runtime: 32915.031 ms
(5 rows)

(Yes, I know, six distinct rows out of 20,000. But holy moly! 1/4 sec vs 32.9 sec?!?!)

Now, if I do a similar query against an INT column, the speeds are more in line with each other:

Machine A:

photos=# explain analyze select distinct imagewidth from exif_common;

 Unique (cost=2629.74..2732.11 rows=36 width=4) (actual time=179.899..225.934 rows=107 loops=1)
   ->  Sort (cost=2629.74..2680.93 rows=20473 width=4) (actual time=179.891..207.632 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=4) (actual time=0.024..62.946 rows=20473 loops=1)
 Total runtime: 226.707 ms
(5 rows)

Machine B:

photos=# explain analyze select distinct imagewidth from exif_common;

 Unique (cost=2640.74..2743.11 rows=24 width=4) (actual time=209.394..287.131 rows=107 loops=1)
   ->  Sort (cost=2640.74..2691.93 rows=20473 width=4) (actual time=209.384..251.693 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=4) (actual time=0.074..94.574 rows=20473 loops=1)
 Total runtime: 288.411 ms
(5 rows)

Machine C exhibits the same behaviour as A for all queries. This weird slow behaviour on machine B also appeared in 7.3.9; upgrading didn't seem to help. neilc from IRC thought it might be a qsort(2) quirk, but a sample C program I whipped up, testing different-sized data sets with a similar distribution, gave very similar sort timings between the three machines. Therefore, I don't think qsort(2) is to blame.

Anyone have any ideas as to what may be up with machine B?

Thanks, -Elliott
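The symptom (only varchar sorts slow, and only on one machine) is a classic signature of locale-aware string comparison: under a non-C locale, every comparison in the sort goes through strcoll(), which on some platform/locale combinations is dramatically slower than plain byte comparison. A cheap thing to check on each machine (a diagnostic suggestion, not from the thread):

    SHOW lc_collate;  -- 'C' means byte-wise comparison; something like 'en_US.UTF-8' means strcoll()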
[PERFORM] parameterized LIKE does not use index
Hi,

Consider the where-clauses:

WHERE lower(col) LIKE 'abc';
WHERE lower(col) LIKE 'abc%';

These will both use a b-tree functional index on lower(col) if one exists. The clause

WHERE lower(col) LIKE '%abc';

can't use the index, as you would expect, because of the wildcard at the front (as mentioned in the manual). Thus it has to do a seqscan, on what in my case is a very large table. But still that's not too bad, because I expect an overwhelming majority of the simple cases, and only very few that start with a percentage sign.

Now, what's problematic is if I replace the literal with a parameter, like this:

WHERE lower(col) LIKE ?

It seems that the parameterized query gets compiled once, and because the parameter is not yet known, one cannot be sure it doesn't start with a percentage sign. Using the parameterized version causes ALL cases to use a seqscan. Of course, I could modify the application and send different SQL depending on which case we're in, or just construct a query with a literal each time, but is there a way to add a hint to the SQL that would cause the query to be re-planned if it's a case that could use the index? Or can I convince the (Perl) driver to do so?

kurt.
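The generic-plan behaviour Kurt describes can be seen directly with a prepared statement (table and column names here are illustrative):

    -- literal: the planner sees the anchored prefix and can pick the index
    EXPLAIN SELECT * FROM t WHERE lower(col) LIKE 'abc%';

    -- parameter: one plan must serve any pattern, including '%abc',
    -- so the always-safe seqscan is chosen
    PREPARE q(text) AS SELECT * FROM t WHERE lower(col) LIKE $1;
    EXPLAIN EXECUTE q('abc%');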
[PERFORM] Postgresql7.4.5 running slow on plpgsql function
Hi, I need some help from the experts here. Currently we have a function that is used together with a temp table. It calls a search-result function, and every time the function is called, it goes through some filters before coming out with a result. Now we have a major problem: the first time the function executes, it takes about 13 seconds; the second time, about 17 seconds; and with every further execution the time taken grows by about 4 seconds. May I know what is going on here? Since we use the function with a temp table, every statement related to the temp table uses the EXECUTE command.

regards

ivan
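For context on the EXECUTE remark: plpgsql caches the plan of each ordinary statement for the life of the session, and a cached plan pins the specific temp table it was first planned against, so a function that re-creates its temp table on every call must route those statements through EXECUTE. A minimal sketch of the pattern in the quoted-string syntax of this era (all names are invented):

    CREATE OR REPLACE FUNCTION search_results() RETURNS integer AS '
    DECLARE
        rec record;
        n   integer := 0;
    BEGIN
        EXECUTE ''CREATE TEMP TABLE tmp_result (id integer)'';
        EXECUTE ''INSERT INTO tmp_result SELECT id FROM items WHERE active'';
        -- EXECUTE cannot return into a variable before 8.1, hence the loop
        FOR rec IN EXECUTE ''SELECT count(*) AS c FROM tmp_result'' LOOP
            n := rec.c;
        END LOOP;
        EXECUTE ''DROP TABLE tmp_result'';
        RETURN n;
    END;
    ' LANGUAGE 'plpgsql';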
[PERFORM] index selection by query planner
Hi, I have a table with two indexes on the same column, one of which is a partial index. I would like the query planner to use the partial index whenever the query condition lies within the range of the partial index, as it would yield better performance. Is there any way to enforce an ordering for the indexes? How does the query planner decide which index to use when a particular query is fired? 'EXPLAIN ANALYZE' showed the full index being used in a situation that could be satisfied by the partial index.

Thanks, Rohit
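A sketch of the situation described, with invented names; the planner compares estimated costs rather than honouring any fixed ordering, so EXPLAIN on the actual query is the way to see which index wins:

    CREATE INDEX events_ts_idx     ON events (ts);
    CREATE INDEX events_recent_idx ON events (ts) WHERE ts >= '2005-01-01';

    -- the condition lies entirely inside the partial index's range:
    EXPLAIN SELECT * FROM events WHERE ts >= '2005-06-01';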
[PERFORM] Too slow querying a table of 15 million records
Hello!

I use FreeBSD 4.11 with PostgreSQL 7.3.8. I have a huge database with roughly 15 million records. There is just one table, with a time field, a few ints and a few strings:

table test
fields: time (timestamp), source (string), destination (string), p1 (int), p2 (int)

I have run VACUUM ANALYZE. I have created indexes on every field, but for some reason my PostgreSQL server wants to use a seqscan, even though I know an indexed scan would be much faster:

create index test_time_idx on test (time);
create index test_source_idx on test (source);
create index test_destination_idx on test (destination);
create index test_p1_idx on test (p1);
create index test_p2_idx on test (p2);

What is really strange is that when I query a count(*) on one of the int fields (p1), which has a very low count, PostgreSQL uses a seqscan. In another count on the same int field (p1), where I know it gives about 2.2 million hits, it suddenly uses a seqscan too, instead of an indexed one. Isn't the whole idea of indexing to increase performance in large queries -- to make a sort of phonebook for the values, so it is faster to look up whatever you need? This just seems the opposite.

Here is an EXPLAIN of my query:

database=# explain select date_trunc('hour', time), count(*) as total from test where p1=53 and time > now() - interval '24 hours' group by date_trunc order by date_trunc;

 Aggregate (cost=727622.61..733143.23 rows=73608 width=8)
   ->  Group (cost=727622.61..731303.02 rows=736083 width=8)
         ->  Sort (cost=727622.61..729462.81 rows=736083 width=8)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test (cost=0.00..631133.12 rows=736083 width=8)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
(6 rows)

database=# drop INDEX test_<TAB>
test_source_idx  test_destination_idx  test_p1_idx  test_p2_idx  test_time_idx

After all this, I tried setting enable_seqscan to off and enable_nestloop to on. This didn't help much either; the time to run the query is still in minutes. My results are the number of elements for each hour, and it gives about 1000-2000 hits per hour.

I have read somewhere that PostgreSQL can easily handle 100-200 million records, and that a rightly tuned system gives great performance. I would like to learn how. :)

I also found this article (http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

> Tip #11: Don't bother indexing columns with huge numbers of records and a small range of values, such as BOOLEAN columns. This tip, regretfully, is perhaps the only tip where I cannot provide a good, real-world example from my work. So I'll give you a hypothetical situation instead: imagine that you have a database table with a list of every establishment vending ice cream in the US -- a table with almost 1 million rows but, due to simplistic data entry, only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and 3-OTHER) which are relatively evenly distributed. In this hypothetical situation, you might find (with testing using EXPLAIN) that an index on type is ignored and the planner uses a seq scan (or table scan) instead. This is because a table scan can actually be faster than an index scan in this situation. Thus, any index on type should be dropped. Certainly, the boolean column (active) requires no indexing, as it has only two possible values and no index will be faster than a table scan.

Then I ask: what is the use of indexing, when I can't use it on a VERY large database?
On my 15-million-record database it takes forever to do seqscans over and over again. This is probably why, as the quote above explains, it chooses a full scan and not an indexed one. So what do I do?

I've used SQL for years, but never at such a big scale, and thus never had to learn how to deal with large numbers of records -- usually a maximum of 1000 records. Now, with millions, I need to learn a way to make my sucky queries better. I'm trying to learn more about tuning my system, making better queries and such. I've found some documents on the Internet, but far from the best.

Feedback most appreciated!

Regards, a learning PostgreSQL user
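A hedged suggestion for this query shape (not from the thread; the index name is invented): an equality on p1 plus a range on time is exactly what a composite index serves, whereas each single-column index matches far too many rows on its own to beat a seqscan:

    CREATE INDEX test_p1_time_idx ON test (p1, "time");
    EXPLAIN select date_trunc('hour', "time"), count(*) as total
      from test
     where p1 = 53 and "time" > now() - interval '24 hours'
     group by 1 order by 1;

On planners of this era it can also help to compute the cutoff timestamp in the client and pass it as a literal instead of now() - interval, since the planner estimates non-constant comparisons poorly.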