Re: [PERFORM] bad performance on Solaris 10

2006-04-13 Thread Bruce Momjian
Jignesh K. Shah wrote:
 
 Bruce,
 
 Hard to answer that... People like me who know and love PostgreSQL and 
 Solaris find this an opportunity to make their favorite database 
 work best on their favorite operating system.
 
 Many things in PostgreSQL are based on the assumption that it will 
 run on Linux, and it is left to Solaris to emulate that behavior. That 
 said, there are ways to improve performance even on UFS on Solaris; it 
 just requires more tweaks.
 
 Hopefully this will lead to a few Solaris-friendly default values like 
 fsync/odatasync :-)

Yes, if someone wants to give us a clear answer on which wal_sync_method
is best on all versions of Solaris, we can easily make that change.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Better index strategy for many fields with few values

2006-04-13 Thread Markus Schaber
Hi, Jim,

Jim C. Nasby wrote:

I was also thinking about using a functional index.
If there's a logical relation between those values so that they can easily
be combined, that may be a good alternative.
 How would that be any better than just doing a multi-column index?

10 different values per column and 20 columns give 10^20 value combinations.

Partitioning on the first column leaves 10^19 combinations, which is
smaller than 2^64 and thus fits into a long value.

And I just guess that a 10-partition functional index on a long value
could perform better than a multi-column index on 20 columns of
character(10), if only because it is approx. 1/25th in size.
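
For illustration, a minimal sketch of the idea in SQL. The table name
"params", the integer-coded columns p1..p20, and the encoding function are
assumptions for the example only (the real columns are character(10), so a
mapping from each value to a digit would be needed as well):

-- Encode columns p2..p20 into one bigint key (10^19 < 2^64, so it fits).
-- Shown for three columns only; extend the expression to all 19.
CREATE FUNCTION params_key(integer, integer, integer)
RETURNS bigint AS '
    SELECT $1::bigint * 100 + $2::bigint * 10 + $3::bigint;
' LANGUAGE sql IMMUTABLE;

-- One partial ("partitioned") functional index per value of the first column:
CREATE INDEX params_key_p1_0 ON params (params_key(p2, p3, p4)) WHERE p1 = 0;
CREATE INDEX params_key_p1_1 ON params (params_key(p2, p3, p4)) WHERE p1 = 1;
-- ... and so on for the remaining values of p1.

-- A query then supplies both the partition column and the encoded key:
SELECT * FROM params
 WHERE p1 = 0 AND params_key(p2, p3, p4) = params_key(7, 3, 9);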

HTH,
Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Better index strategy for many fields with few values

2006-04-13 Thread Markus Schaber
Hi, Jim,

Jim Nasby wrote:
 Adding -performance back in
 I would like to try it.
 
 However, in another post I added that, contrary to what I stated
 initially, the paramXX columns are not all mandatory in the query. So
 it seems that requirement makes the problem more complex.

Okay, this rules out my functional index over 19 columns.

 Doesn't this new requirement rule out this solution? 
 
 No, just group the columns logically.

Yes, that's the solution.

If you have common groups of columns that appear and disappear
synchronously, pack those together in a (possibly partitioned and/or
functional) index.

Then rely on the query planner to combine the appropriate indices via a
bitmap index scan.
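
As a small sketch of what that could look like (table and column names are
hypothetical, following the paramXX naming from the earlier post):

-- Columns that usually appear together in queries share one index each:
CREATE INDEX params_grp_a ON params (param01, param02, param03);
CREATE INDEX params_grp_b ON params (param04, param05, param06);

-- A query restricting both groups lets the planner AND the two bitmaps
-- (look for BitmapAnd / Bitmap Index Scan nodes in the plan):
EXPLAIN ANALYZE
SELECT count(*) FROM params
 WHERE param01 = 'a' AND param02 = 'b' AND param03 = 'c'
   AND param04 = 'd' AND param05 = 'e' AND param06 = 'f';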

  By the way, I have tested indexing each column individually and checked
 what happens in relation to bitmap scans. My test table is 1
 million rows. The explain analyze command shows that a bitmap scan is
 sometimes used, but I still end up with queries that can take up to
 10s, which is way too much.

Is it on the first query, or on repeated queries?

It might be that you're I/O bound, and the backend has to fetch indices
and rows from disk into RAM.

I currently don't know whether the order of indices in a multi-index
bitmap scan is relevant, but I could imagine that it may be useful to
have the most selective index scanned first.

And keep in mind that, assuming an equal distribution of your
parameters, every index bitmap hits 1/10th of the whole table on
average, so the selectivity generally is low.

The selectivity of a partitioned 3-column index will be much better
(about 1/10000th of the whole table), and fewer index scans and bitmaps
have to be generated.

A functional index may also make sense to CLUSTER the table to optimize
the locality of search results (thus reducing disk I/O). In case your
table has low write activity but high read-only activity, the overhead
that comes with the additional index is negligible compared to the
performance improvement proper CLUSTERing can generate.
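
For reference, clustering on the hypothetical params_grp_a index sketched
above would look roughly like this (CLUSTER needs a non-partial index, and
whether an expression index can be used depends on the server version):

-- 8.1-era syntax; later versions also accept: CLUSTER params USING params_grp_a
CLUSTER params_grp_a ON params;
ANALYZE params;   -- refresh statistics after the physical reorder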

Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Slow query - possible bug?

2006-04-13 Thread Gavin Hamill
laterooms=# explain analyze select allocation0_.ID as y1_, 
allocation0_.RoomID as y2_, allocation0_.StatusID as y4_, 
allocation0_.Price as y3_, allocation0_.Number as y5_, 
allocation0_.Date as y6_ from Allocation allocation0_ where 
(allocation0_.Date between '2006-06-09 00:00:00.00' and 
'2006-06-09 00:00:00.00')and(allocation0_.RoomID in(4300591));

   QUERY PLAN
--
Index Scan using ix_date on Allocation allocation0_  (cost=0.00..4.77 
rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
  Index Cond: ((Date >= '2006-06-09'::date) AND (Date <= '2006-06-09'::date))

  Filter: (RoomID = 4300591)
Total runtime: 1689.917 ms
(4 rows)

Yep, the two dates are identical - yep I would change the client 
software to do where Date = '2006-06-09 00:00:00.00' if I could...


However, it's clear to see why this simple query is taking so long - the 
plan is selecting /all/ dates after 2006-06-09 and /all/ dates before 
then, and only returning the union of the two - a large waste of effort, 
surely?


VACUUM ANALYZE hasn't improved matters... the schema for the table is

ID int8 NOT NULL DEFAULT 
nextval(('public.allocation_id_seq'::text)::regclass),

 RoomID int4,
 Price numeric(10,2),
 StatusID int4,
 Number int4,
 Date date,

and there are indexes kept for 'RoomID' and 'Date' in this 4.3-million 
row table.


Is this a bug or a hidden feature in pg 8.1.3 ? :)

Cheers,
Gavin.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Slow query - possible bug?

2006-04-13 Thread chris smith
On 4/13/06, Gavin Hamill [EMAIL PROTECTED] wrote:
 laterooms=# explain analyze select allocation0_.ID as y1_,
 allocation0_.RoomID as y2_, allocation0_.StatusID as y4_,
 allocation0_.Price as y3_, allocation0_.Number as y5_,
 allocation0_.Date as y6_ from Allocation allocation0_ where
 (allocation0_.Date between '2006-06-09 00:00:00.00' and
 '2006-06-09 00:00:00.00')and(allocation0_.RoomID in(4300591));
 QUERY PLAN
 --
  Index Scan using ix_date on Allocation allocation0_  (cost=0.00..4.77
 rows=1 width=34) (actual time=1411.325..1689.860 rows=1 loops=1)
Index Cond: ((Date >= '2006-06-09'::date) AND (Date <= '2006-06-09'::date))
Filter: (RoomID = 4300591)
  Total runtime: 1689.917 ms
 (4 rows)

1.6secs isn't too bad on 4.3mill rows...

How many entries are there for that date range?

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow query - possible bug?

2006-04-13 Thread Gavin Hamill

chris smith wrote:


1.6secs isn't too bad on 4.3mill rows...

How many entries are there for that date range?
 

1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so 
good. My question is 'why does the planner choose such a bizarre range 
request when both elements of the 'between' are identical? :)'


If I replace the
(allocation0_.Date between '2006-06-09 00:00:00.00' and 
'2006-06-09 00:00:00.00')


with

allocation0_.Date ='2006-04-09 00:00:00.00'

then the query comes back in a few milliseconds (as I'd expect :) - and 
yup I've been using different dates for each test to avoid the query 
being cached.


For ref, there are typically 35000 rows per date :)

Cheers,
Gavin.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Slow query - possible bug?

2006-04-13 Thread Richard Huxton

Gavin Hamill wrote:

chris smith wrote:


1.6secs isn't too bad on 4.3mill rows...

How many entries are there for that date range?
 

1.7 secs /is/ good - it typically takes 5 or 6 seconds, which isn't so 
good. My question is 'why does the planner choose such a bizarre range 
request when both elements of the 'between' are identical? :)'


What's bizarre about the range request, and are you sure it's doing the
union of both conditions separately? It looks to me like it's doing a
standard range search. If it was trying to fetch 4.3 million rows via
that index, I'd expect it to use a different index instead.


If you've got stats turned on, look in pg_stat_user_indexes/tables 
before and after the query to see. Here's an example of a similar query 
against one of my log tables. It's small, but the clause is the same, 
and I don't see any evidence of the whole table being selected.


lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
  relid  | indexrelid | schemaname | relname |  indexrelname  | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
 6124993 |    7519044 | public     | act_log | act_log_ts_idx |       23 |           18 |            18
 6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)

lamp=> EXPLAIN ANALYSE SELECT * FROM act_log WHERE al_ts BETWEEN 
'2006-04-05 14:10:23+00'::timestamptz AND '2006-04-05 
14:10:23+00'::timestamptz;


QUERY PLAN
-
 Index Scan using act_log_ts_idx on act_log  (cost=0.00..3.02 rows=1 
width=102) (actual time=0.116..0.131 rows=1 loops=1)
   Index Cond: ((al_ts >= '2006-04-05 15:10:23+01'::timestamp with time 
zone) AND (al_ts <= '2006-04-05 15:10:23+01'::timestamp with time zone))

 Total runtime: 0.443 ms
(3 rows)

lamp=> SELECT * FROM pg_stat_user_indexes WHERE relname LIKE 'act%';
  relid  | indexrelid | schemaname | relname |  indexrelname  | idx_scan | idx_tup_read | idx_tup_fetch
---------+------------+------------+---------+----------------+----------+--------------+---------------
 6124993 |    7519044 | public     | act_log | act_log_ts_idx |       24 |           19 |            19
 6124993 |    7371115 | public     | act_log | act_log_pkey   |        0 |            0 |             0
(2 rows)


1. vacuum full verbose your table (and post the output please)
2. perhaps reindex?
3. Try the explain analyse again and see what happens.
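
Spelled out as statements, that is roughly (quote the table name if it was
created with a mixed-case identifier, as the plan output suggests):

VACUUM FULL VERBOSE Allocation;
REINDEX TABLE Allocation;
-- then re-run the EXPLAIN ANALYZE from the original post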
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pgmemcache

2006-04-13 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

Why are AFTER COMMIT triggers impossible?
 
 What happens if such a trigger gets an error?  You can't un-commit.

Then it must be specified that those triggers are in their own
transaction, and cannot abort the transaction.

Or use the 2-phase-commit infrastructure for them.

Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-13 Thread Gavin Hamill

Tom Lane wrote:


Gavin Hamill [EMAIL PROTECTED] writes:

would a simple #define LWLOCK_PADDED_SIZE 128 be sufficient?

Yeah, that's fine.


OK I tried that but noticed no real improvement... in the interim I've 
installed Debian on the pSeries (using 
http://debian.gonicus.de/debian/dists/sarge/main/disks-powerpc/current/pseries/install.txt 
) and using a simple load-test script - it picks a 'hotelsearch' select 
at random from a big file and just does a pg_query on that via PHP...


Using apachebench with 10 clients gave a loadavg of about 10 after a few 
minutes, and the logs showed typical query times of 8 seconds. Again, no 
disk activity, normal context-switching, just full-out CPU usage...


We're improving the quality + efficiency of the hotelsearch function all 
the time (Simon will certainly be able to vouch for its complexity) - am 
really uncertain what to do next tho! :/


Cheers,
Gavin.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] index is not used if I include a function that returns current time in my query

2006-04-13 Thread Cristian Veronesi

Hello, postgresql 7.4.8 on SuSE Linux here.

I have a table called DMO with a column called ORA_RIF defined as
timestamp without time zone ;

I created an index on this table based on this column only.

If I run a query against a text literal the index is used:


explain select * from dmo where ora_rif > '2006-01-01';

  QUERY PLAN
-
 Index Scan using dmo_ndx02 on dmo  (cost=0.00..1183.23 rows=736 width=156)
   Index Cond: (ora_rif > '2006-01-01 00:00:00'::timestamp without time zone)

If I try to use a function that returns the current time instead, a
sequential scan is always performed:


explain select * from dmo where ora_rif > localtimestamp;

  QUERY PLAN
--
 Seq Scan on dmo  (cost=0.00..1008253.22 rows=2703928 width=156)
   Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)

explain select * from dmo where ora_rif > localtimestamp::timestamp
without time zone;
  QUERY PLAN
--
 Seq Scan on dmo  (cost=0.00..1008253.22 rows=2703928 width=156)
   Filter: (ora_rif > ('now'::text)::timestamp(6) without time zone)

... etc. ...

(tried with all datetime functions with and without cast)

I even tried to write a function that explicitly returns a timestamp
without time zone value:

create or replace function f () returns timestamp without time zone
as '
declare
  x timestamp without time zone ;
begin
  x := ''2006-01-01 00:00:00'';
  return x ;
end ;
' language plpgsql ;

But the result is the same:


explain select * from dmo where ora_rif > f();

 QUERY PLAN
-
 Seq Scan on dmo  (cost=0.00..987973.76 rows=2703928 width=156)
   Filter: (ora_rif > f())

Any suggestion?

Kind regards,

--
Cristian Veronesi - C.R.P.A. S.p.A. - Reggio Emilia, Italy

The first thing you need to learn about databases is that
they are not just a fancy file system for storing data.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow query - possible bug?

2006-04-13 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 If I replace the
 (allocation0_.Date between '2006-06-09 00:00:00.00' and 
 '2006-06-09 00:00:00.00')
 with
 allocation0_.Date ='2006-04-09 00:00:00.00'
 then the query comes back in a few milliseconds (as I'd expect :)

Could we see EXPLAIN ANALYZE for
* both forms of the date condition, with the roomid condition;
* both forms of the date condition, WITHOUT the roomid condition;
* just the roomid condition

I'm thinking the planner is misestimating something, but it's hard
to tell what without breaking it down.
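
For concreteness, the requested plans could be collected with something like
the following (SELECT list simplified to * and identifiers written as in the
original query):

EXPLAIN ANALYZE SELECT * FROM Allocation a
 WHERE a.Date BETWEEN '2006-06-09' AND '2006-06-09' AND a.RoomID IN (4300591);
EXPLAIN ANALYZE SELECT * FROM Allocation a
 WHERE a.Date = '2006-06-09' AND a.RoomID IN (4300591);
EXPLAIN ANALYZE SELECT * FROM Allocation a
 WHERE a.Date BETWEEN '2006-06-09' AND '2006-06-09';
EXPLAIN ANALYZE SELECT * FROM Allocation a
 WHERE a.Date = '2006-06-09';
EXPLAIN ANALYZE SELECT * FROM Allocation a
 WHERE a.RoomID IN (4300591);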

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Better index strategy for many fields with few values

2006-04-13 Thread Oscar Picasso
Hi Markus,

Markus Schaber [EMAIL PROTECTED] wrote:

Hi, Oscar,

Please reply to the list and not privately, so others can learn from
your replies, and possibly have better ideas than me.

That was my intention. I made a mistake.

Oscar Picasso wrote:
 I cannot group the columns logically. Any column may or may not appear
 in a query.

That's suboptimal.

Summarizing what I have learned:
- I cannot use multi-column indexes because I cannot group the columns logically.
- I cannot use functional indexes.
- I cannot use clustering.

You still can have a set of partitioned multi-column indices, overlapping
enough that every combination of columns is covered (or risk a sequential
sub scan for the last two or three columns; this should not hurt too much
if the first 17 columns were selective enough).

The main problem with indices is that they also decrease write performance.

If disk costs are not limited, it will make sense to have WAL, table and
indices on different disks / raid arrays, to parallelize writes.

Btw, I guess you have multiple, concurrent users?

Yes I do.

I have just made other tests with only the individual indexes and
performance is much better than previously. Obviously there was an I/O
problem during my initial test.

Something interesting though. If I use few columns in the query the
results come very quickly and pg does a sequential scan. When it reaches
some threshold (4 or 5 columns) pg switches to bitmap scans. It then takes
an almost constant time (~ 2500 ms) no matter how many more columns I add
to the where clause.

Interestingly enough, queries with many columns are less common. They also
return fewer results, and many times no result at all. From the user's
point of view it would be nice to have a waiting time lower than 2500 ms
for these queries. Maybe I could achieve that goal simply by tuning
postgresql. In such a case, where should I look first in order to improve
bitmap scanning? Maybe I could, that way, avoid the use of partitioned
multi-column indexes.

Oscar

Re: [PERFORM] index is not used if I include a function that returns current time in my query

2006-04-13 Thread Tom Lane
Cristian Veronesi [EMAIL PROTECTED] writes:
 If I try to use a function that returns the current time instead, a
 sequential scan is always performed:
 ...
 Any suggestion?

1. Use something newer than 7.4 ;-)

2. Set up a dummy range constraint, ie

select ... where ora_rif > localtimestamp and ora_rif < 'infinity';

The problem you have is that the planner doesn't know the value of the
function and falls back to a default assumption about the selectivity of
the '>' condition --- and that default discourages indexscans.  (Note
the very large estimate of number of rows returned.)  In the
range-constraint situation, the planner still doesn't know the value of
the function, but its default assumption for a range constraint is
tighter and it (probably) will choose an indexscan.

Since PG 8.0, the planner understands that it's reasonable to
pre-evaluate certain functions like localtimestamp to obtain
better-than-guess values about selectivity, so updating would
be a better fix.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] multi column query

2006-04-13 Thread Jim Nasby
You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to? And the 
output of \d chkpfw_tr_dy_dimension. The cost for that index scan looks way too 
high.

And please reply-all so that the list is included.

 -Original Message-
 From: Sriram Dandapani [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 12, 2006 7:48 PM
 To: Jim Nasby
 Subject: RE: [PERFORM] multi column query
 
 
 I executed enable_seqscan=off and then ran an explain plan on 
 the query
 
 UPDATE chkpfw_tr_dy_dimension
SET summcount = a.summcount + b.summcount,
bytes = a.bytes + b.bytes,
duration = a.duration + b.duration
from chkpfw_tr_dy_dimension a,
 c_chkpfw_dy_tr_updates b
WHERE a.firstoccurrence = b.firstoccurrence
   AND a.customerid_id = b.customerid_id
AND a.sentryid_id = b.sentryid_id
   AND a.node_id = b.node_id
AND a.interface_id = b.interface_id
AND a.source_id = b.source_id
AND a.destination_id = b.destination_id
AND a.sourceport_id = b.sourceport_id
AND a.destinationport_id = b.destinationport_id
AND a.inoutbound_id = b.inoutbound_id
AND a.action_id = b.action_id
AND a.protocol_id = b.protocol_id
AND a.service_id = b.service_id
AND a.sourcezone_id = b.sourcezone_id
AND a.destinationzone_id =
 b.destinationzone_id;
 
 
 
 Here is the query plan
 
 
 Nested Loop  (cost=20036.18..221851442.39 rows=1 width=166)
   ->  Merge Join  (cost=10036.18..121620543.75 rows=1 width=96)
         Merge Cond: (("outer".firstoccurrence = "inner".firstoccurrence)
           AND ("outer".sentryid_id = "inner".sentryid_id)
           AND ("outer".node_id = "inner".node_id))
         Join Filter: (("outer".customerid_id = "inner".customerid_id)
           AND ("outer".interface_id = "inner".interface_id)
           AND ("outer".source_id = "inner".source_id)
           AND ("outer".destination_id = "inner".destination_id)
           AND ("outer".sourceport_id = "inner".s (..)
         ->  Index Scan using chkpfw_tr_dy_idx1 on chkpfw_tr_dy_dimension a
               (cost=0.00..21573372.84 rows=6281981 width=88)
         ->  Sort  (cost=10036.18..10037.38 rows=480 width=136)
               Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id
               ->  Seq Scan on c_chkpfw_dy_tr_updates b
                     (cost=1.00..10014.80 rows=480 width=136)
   ->  Seq Scan on chkpfw_tr_dy_dimension
         (cost=1.00..100168078.81 rows=6281981 width=70)
 
 -Original Message-
 From: Jim C. Nasby [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, April 12, 2006 5:44 PM
 To: Sriram Dandapani
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] multi column query
 
 On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
  Hi
  
  When I update a table that has 20 columns and the where clause
 includes
  16 of the columns (this is a data warehousing type update 
 on aggregate
  fields),
  
  The bitmap scan is not used by the optimizer. The table is 
 indexed on
 3
  of the 20 fields. The update takes really long to finish (on a 6
 million
  row table)
  
  Do I need to do some magic with configuration to turn on bitmap
 scans.
 
 No. What's explain analyze of the query show? What's it doing now?
 Seqscan? You might try set enable_seqscan=off and see what that does.
 -- 
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software      http://pervasive.com    work: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pgmemcache

2006-04-13 Thread Tom Lane
Christian Storm [EMAIL PROTECTED] writes:
 Not sure if I follow why this is a problem.  Seems like it would be  
 beneficial to have both BEFORE and AFTER COMMIT triggers.
 With the BEFORE COMMIT trigger you would have the ability to 'un- 
 commit' (rollback) the transaction.  With
 the AFTER COMMIT trigger you wouldn't have that option because the  
 commit has already been successful.  However,
 with an AFTER COMMIT you would be able to trigger other downstream  
 events that rely on a transaction successfully committing.

An AFTER COMMIT trigger would have to be in a separate transaction.
What happens if there's more than one, and one of them fails?  Even
more to the point, if it's a separate transaction, don't you have
to fire all these triggers again when you commit that transaction?
The idea seems circular.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] bad performance on Solaris 10

2006-04-13 Thread Robert Lor

Bruce Momjian wrote On 04/13/06 01:39 AM,:
 
 Yes, if someone wants to give us a clear answer on which wal_sync_method
 is best on all versions of Solaris, we can easily make that change.
 

We're doing tests to see how various parameters in postgresql.conf
affect performance on Solaris and will share the results shortly.

Regards,
-Robert


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Blocks read for index scans

2006-04-13 Thread Jim Nasby
While working on determining a good stripe size for a database, I  
realized it would be handy to know what the average request size is.  
Getting this info is a simple matter of joining pg_stat_all_tables  
and pg_statio_all_tables and doing some math, but there's one issue  
I've found; it appears that there's no information on how many heap  
blocks were read in by an index scan. Is there any way to get that info?
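
For reference, the kind of join meant here might look like the following
(which columns to combine, and what math to apply, depends on what you are
trying to measure):

SELECT s.relname,
       s.idx_tup_fetch,                      -- rows fetched via index scans
       io.heap_blks_read, io.heap_blks_hit,  -- heap block reads / cache hits
       io.idx_blks_read,  io.idx_blks_hit    -- index block reads / cache hits
  FROM pg_stat_all_tables s
  JOIN pg_statio_all_tables io USING (relid);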

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] pgmemcache

2006-04-13 Thread Jim Nasby

On Apr 13, 2006, at 12:38 PM, Tom Lane wrote:


Christian Storm [EMAIL PROTECTED] writes:

Not sure if I follow why this is a problem.  Seems like it would be
beneficial to have both BEFORE and AFTER COMMIT triggers.
With the BEFORE COMMIT trigger you would have the ability to 'un-
commit' (rollback) the transaction.  With
the AFTER COMMIT trigger you wouldn't have that option because the
commit has already been successful.  However,
with an AFTER COMMIT you would be able to trigger other downstream
events that rely on a transaction successfully committing.


An AFTER COMMIT trigger would have to be in a separate transaction.
What happens if there's more than one, and one of them fails?  Even
more to the point, if it's a separate transaction, don't you have
to fire all these triggers again when you commit that transaction?
The idea seems circular.


I suspect that in reality you'd probably want each on-commit trigger  
to be its own transaction, but it depends on what you're doing.  
Also, I can't see any use for them where you'd actually be  
interacting with the database, only if you were calling something  
externally via a function. One example would be sending an email out  
when a certain table changes; in many cases it's better to let the  
change happen even if the email can't be sent, and you'd rather not  
send an email if the transaction just ends up rolling back for some  
reason. And yes, you'd have to ensure you didn't code yourself up a  
trigger loop.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Inserts optimization?

2006-04-13 Thread Francisco Reyes

Chris writes:


If you can, use copy instead:
http://www.postgresql.org/docs/8.1/interactive/sql-copy.html


I am familiar with copy.
Can't use it in this scenario.

The data is coming from a program called Bacula (Backup server).
It is not static data.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Blocks read for index scans

2006-04-13 Thread Jim Nasby
Adding -performance back in...

 From: Steve Poe [mailto:[EMAIL PROTECTED]
 Jim,
 
 I could be way off, but doesn't pg_statio_user_tables
 contain this information?

http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html#MONITORING-STATS-VIEWS
 states:

numbers of disk blocks read and buffer hits in all indexes of that table

That leads me to believe that it's only tracking index blocks read, and not 
heap blocks read. One could presume that each index row read as reported by 
pg_stat_all_tables would represent a heap block read, but a large number of 
those would (hopefully) have already been in shared_buffers.

 On Thu, 2006-04-13 at 13:00 -0500, Jim Nasby wrote:
  While working on determining a good stripe size for a database, I  
  realized it would be handy to know what the average request 
 size is.  
  Getting this info is a simple matter of joining pg_stat_all_tables  
  and pg_statio_all_tables and doing some math, but there's 
 one issue  
  I've found; it appears that there's no information on how 
 many heap  
  blocks were read in by an index scan. Is there any way to 
 get that info?
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Inserts optimization?

2006-04-13 Thread Francisco Reyes

Tom Lane writes:


Or at least try to do multiple inserts per transaction.


Will see if the program has an option like that.
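
For reference, this is the kind of batching meant (hypothetical table and
columns; whether Bacula can be configured to do this is a separate question):

-- One transaction around many inserts avoids one fsync'd commit per row:
BEGIN;
INSERT INTO backup_file (jobid, filename) VALUES (42, '/etc/passwd');
INSERT INTO backup_file (jobid, filename) VALUES (42, '/etc/hosts');
-- ... many more rows ...
COMMIT;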



Also, increasing checkpoint_segments and possibly wal_buffers helps a


Will try those.


Try to get the WAL onto a separate disk
spindle if you can.  (These things don't matter for SELECTs, but they
do matter for writes.)


This particular server is pretty much what I inherited for now for this 
project, and it's RAID 5. There is a new server I am setting up 
soon... 8 disks which we are planning to set up as:

6 disks in RAID 10
2 Hot spares

In RAID 10 would it matter that WAL is in the same RAID set?
Would it be better:
4 disks in RAID10 Data
2 disks RAID 1 WAL
2 hot spares

All in the same RAID controller

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] bad performance on Solaris 10

2006-04-13 Thread Merlin Moncure
On 4/12/06, Josh Berkus josh@agliodbs.com wrote:
 People,

  Lately I find people are not so receptive to VxFS, and Sun is promoting
  ZFS, and we don't have a reasonable near term option for Raw IO in
  Postgres, so we need to work to find a reasonable path for Solaris users
  IMO. The long delays in ZFS production haven't helped us there, as the
  problems with UFS are severe.

I just recently worked with Sun Solaris 10 and found it to be
reasonably performant without much tuning.  This was on a dual sparc
SunBlade workstation which I felt was very well engineered.  I was
able (with zero Solaris experience) to get postgresql up and crunching
away at some really data intensive tasks while running an application
compiled with their very excellent Fortran compiler.

In the enterprise world I am finding that the only linux distributions
supported are redhat and suse, meaning if you have a problem with your
san running against your gentoo box you have a serious problem.
Solaris OTOH, is generally very well supported (especially on sun
hardware) and is free.  So I give sun great credit for providing a
free if not necessarily completely open platform for developing open
source applications in an enterprise environment.

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] multi column query

2006-04-13 Thread Sriram Dandapani
Hi Jim

The problem is fixed. The destination table that was being updated had 3
separate indexes. I combined them into a multi-column index and the effect
was amazing.
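
For reference, the change described amounts to something like the following
(the exact column list and old index names are assumptions based on the plan
posted earlier):

DROP INDEX chkpfw_tr_dy_idx1;
-- DROP INDEX ... (the other two original indexes)
CREATE INDEX chkpfw_tr_dy_dimension_comp
    ON chkpfw_tr_dy_dimension (firstoccurrence, sentryid_id, node_id);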
Thanks for your input

Sriram

-Original Message-
From: Jim Nasby [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 13, 2006 9:42 AM
To: Sriram Dandapani
Cc: Pgsql-Performance (E-mail)
Subject: RE: [PERFORM] multi column query

You need to run EXPLAIN ANALYZE. Also, what's random_page_cost set to?
And the output of \d chkpfw_tr_dy_dimension. The cost for that index
scan looks way too high.

And please reply-all so that the list is included.

 -Original Message-
 From: Sriram Dandapani [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 12, 2006 7:48 PM
 To: Jim Nasby
 Subject: RE: [PERFORM] multi column query
 
 
 I executed enable_seqscan=off and then ran an explain plan on 
 the query
 
 UPDATE chkpfw_tr_dy_dimension
SET summcount = a.summcount + b.summcount,
bytes = a.bytes + b.bytes,
duration = a.duration + b.duration
from chkpfw_tr_dy_dimension a,
 c_chkpfw_dy_tr_updates b
WHERE a.firstoccurrence = b.firstoccurrence
   AND a.customerid_id = b.customerid_id
AND a.sentryid_id = b.sentryid_id
   AND a.node_id = b.node_id
AND a.interface_id = b.interface_id
AND a.source_id = b.source_id
AND a.destination_id = b.destination_id
AND a.sourceport_id = b.sourceport_id
AND a.destinationport_id = b.destinationport_id
AND a.inoutbound_id = b.inoutbound_id
AND a.action_id = b.action_id
AND a.protocol_id = b.protocol_id
AND a.service_id = b.service_id
AND a.sourcezone_id = b.sourcezone_id
AND a.destinationzone_id =
 b.destinationzone_id;
 
 
 
 Here is the query plan
 
 
 Nested Loop  (cost=20036.18..221851442.39 rows=1 width=166)
   ->  Merge Join  (cost=10036.18..121620543.75 rows=1 width=96)
         Merge Cond: (("outer".firstoccurrence = "inner".firstoccurrence)
           AND ("outer".sentryid_id = "inner".sentryid_id)
           AND ("outer".node_id = "inner".node_id))
         Join Filter: (("outer".customerid_id = "inner".customerid_id)
           AND ("outer".interface_id = "inner".interface_id)
           AND ("outer".source_id = "inner".source_id)
           AND ("outer".destination_id = "inner".destination_id)
           AND ("outer".sourceport_id = "inner".s (..)
         ->  Index Scan using chkpfw_tr_dy_idx1 on chkpfw_tr_dy_dimension a
               (cost=0.00..21573372.84 rows=6281981 width=88)
         ->  Sort  (cost=10036.18..10037.38 rows=480 width=136)
               Sort Key: b.firstoccurrence, b.sentryid_id, b.node_id
               ->  Seq Scan on c_chkpfw_dy_tr_updates b
                     (cost=1.00..10014.80 rows=480 width=136)
   ->  Seq Scan on chkpfw_tr_dy_dimension
         (cost=1.00..100168078.81 rows=6281981 width=70)
 
 -Original Message-
 From: Jim C. Nasby [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, April 12, 2006 5:44 PM
 To: Sriram Dandapani
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] multi column query
 
 On Wed, Apr 12, 2006 at 05:32:32PM -0700, Sriram Dandapani wrote:
  Hi
  
  When I update a table that has 20 columns and the where clause
 includes
  16 of the columns (this is a data warehousing type update 
 on aggregate
  fields),
  
  The bitmap scan is not used by the optimizer. The table is 
 indexed on
 3
  of the 20 fields. The update takes really long to finish (on a 6
 million
  row table)
  
  Do I need to do some magic with configuration to turn on bitmap
 scans.
 
 No. What's explain analyze of the query show? What's it doing now?
 Seqscan? You might try set enable_seqscan=off and see what that does.
 -- 
 Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
 Pervasive Software      http://pervasive.com    work: 512-231-6117
 vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] pgmemcache

2006-04-13 Thread PFC



An AFTER COMMIT trigger would have to be in a separate transaction.


	I guess AFTER COMMIT triggers would be like a NOTIFY, but more powerful.  
While NOTIFY can't transmit information to another process, this trigger  
could, and the other process could then view the results of the committed  
transaction.
	Also, implementing a long process (something involving network  
roundtrips, for instance) in a BEFORE COMMIT trigger would delay the  
transaction and any locks it holds with no benefit.



What happens if there's more than one, and one of them fails?


Each one in its own transaction ?


Even more to the point, if it's a separate transaction, don't you have
to fire all these triggers again when you commit that transaction?
The idea seems circular.


	I guess AFTER COMMIT triggers are most useful when coupled to a trigger  
on a modification to a table. So, the before / after commit could be an  
attribute of an AFTER INSERT/UPDATE/DELETE trigger. If the AFTER COMMIT  
trigger doesn't do any modifications to the target table, there will be no  
infinite loop.


	The before/after commit could also be implemented not via triggers, but  
via deferred actions, by telling postgres to execute a specific query just  
before/after the transaction commits. This could be used to implement the  
triggers, but would also be more generic: a trigger on INSERT could then  
defer a call to memcache update once the transaction is committed. It gets  
lisp-ish, but it would be really cool?


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pgmemcache

2006-04-13 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Christian Storm [EMAIL PROTECTED] writes:
  Not sure if I follow why this is a problem.  Seems like it would be  
  beneficial to have both BEFORE and AFTER COMMIT triggers.
  With the BEFORE COMMIT trigger you would have the ability to 'un- 
  commit' (rollback) the transaction.  With
  the AFTER COMMIT trigger you wouldn't have that option because the  
  commit has already been successful.  However,
  with an AFTER COMMIT you would be able to trigger other downstream  
  events that rely on a transaction successfully committing.
 
 An AFTER COMMIT trigger would have to be in a separate transaction.
 What happens if there's more than one, and one of them fails?  Even
 more to the point, if it's a separate transaction, don't you have
 to fire all these triggers again when you commit that transaction?
 The idea seems circular.

Maybe it just means they would have to be limited to not making any database
modifications. Ie, all they can do is notify the outside world that the
transaction committed. 

Presumably if you wanted to make any database modifications you would just do
it in the transaction anyways since they wouldn't show up until the
transaction commits.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Blocks read for index scans

2006-04-13 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 While working on determining a good stripe size for a database, I  
 realized it would be handy to know what the average request size is.  
 Getting this info is a simple matter of joining pg_stat_all_tables  
 and pg_statio_all_tables and doing some math, but there's one issue  
 I've found; it appears that there's no information on how many heap  
 blocks were read in by an index scan. Is there any way to get that info?

If the table is otherwise idle, the change in the table's entry in
pg_statio_all_tables should do, no?

(This is as of 8.1 ... older versions acted a bit differently IIRC.)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pgmemcache

2006-04-13 Thread Tom Lane
PFC [EMAIL PROTECTED] writes:
   I guess AFTER COMMIT triggers would be like a NOTIFY, but more 
 powerful.  

I'll let you in on a secret: NOTIFY is actually a before-commit
operation.  This is good enough because it never, or hardly ever,
fails.  I would argue that anything you want to do in an AFTER COMMIT
trigger could just as well be done in a BEFORE COMMIT trigger; if that's
not reliable enough then you need to fix your trigger.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow

2006-04-13 Thread patrick keshishian
Hi Tom, et.al.,

So I changed the following settings in postgresql.conf file and
restarted PostgreSQL and then proceeded with pg_restore:

# new changes for this test-run
log_statement = true
sort_mem = 10240 # default 1024
vacuum_mem = 20480 # default 8192
# from before
checkpoint_segments = 10
log_pid = true
log_timestamp = true

With these settings and running:

pg_restore -vaOd dbname dbname.DUMP


Things seem to progress better. The first of the large
tables got COPY'ed within 1 hr 40 min:

start: 2006-04-13 11:44:19
finish: 2006-04-13 13:25:36

I ended up ctrl-C'ing out of the pg_restore as the second
large table was taking over 3 hours and the last PostgreSQL
log entry was from over 2.5hrs ago, with message:

2006-04-13 14:09:29 [3049] LOG:  recycled transaction log file
0006006B

Time for something different. Before attempting the same
procedure with fsync off, I ran the following sequence of
commands:

$ dropdb dbname
$ createdb dbname
$ pg_restore -vsOd dbname dbname.DUMP
$ date > db.restore ; pg_restore -vcOd dbname \
dbname.DUMP ; date >> db.restore
$ cat db.restore
Thu Apr 13 18:02:51 PDT 2006
Thu Apr 13 18:17:16 PDT 2006

That's just over 14 minutes!

Ideas?

Is this because the -c option drops all foreign keys and
so the restore goes faster?  Should this be the preferred,
recommended and documented method to run pg_restore?
Any drawbacks to this method?

Thanks,
--patrick




On 4/12/06, Tom Lane [EMAIL PROTECTED] wrote:
 patrick keshishian [EMAIL PROTECTED] writes:
  My dev box is much slower hardware than the customer's
  server.  Even with that difference I expected to be able to
  pg_restore the database within one day.  But no.

 Seems a bit odd.  Can you narrow down more closely which step of the
 restore is taking the time?  (Try enabling log_statements.)

 One thought is that kicking up work_mem and vacuum_mem is likely to
 help for some steps (esp. CREATE INDEX and foreign-key checking).
 And be sure you've done the usual tuning for write-intensive activity,
 such as bumping up checkpoint_segments.  Turning off fsync wouldn't
 be a bad idea either.

 regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pg 7.4.x - pg_restore impossibly slow

2006-04-13 Thread Tom Lane
patrick keshishian [EMAIL PROTECTED] writes:
 With these settings and running:
 pg_restore -vaOd dbname dbname.DUMP

If you had mentioned you were using random nondefault switches, we'd
have told you not to.  -a in particular is a horrid idea performancewise
--- a standard schema-plus-data restore goes way faster because it's
doing index builds and foreign key checks wholesale instead of
incrementally.

 Is this because the -c option drops all foreign keys and
 so the restore goes faster?  Should this be the preferred,
 recommended and documented method to run pg_restore?

It is documented in recent versions of the documentation: see
http://www.postgresql.org/docs/8.1/static/populate.html
particularly the last section.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster