Re: [PERFORM] Table UPDATE is too slow

2004-08-31 Thread Thomas F . O'Connell
What is the datatype of the id column?
-tfo
On Aug 31, 2004, at 1:11 PM, Ron St-Pierre wrote:
We have a web based application with data that is updated daily. The 
biggest bottleneck occurs when we try to update
one of the tables. This table contains 58,000 rows and 62 columns, and 
EVERY column is indexed. Every column is
queryable (?) by the users through the web interface so we are 
reluctant to remove the indexes (recreating them would
be time consuming too). The primary key is an INT and the rest of the 
columns are a mix of NUMERIC, TEXT, and DATEs.
A typical update is:
  UPDATE dataTable SET field01=44.5, field02=44.5, field03='Bob',
field04='foo', ... , field60='2004-08-30', field61='2004-08-29'
WHERE id = 1234;

Also of note is that the update is run about 10 times per day; we get 
blocks of data from 10 different sources, so we pre-process the
data and then update the table. We also run VACUUM FULL ANALYZE on a 
nightly basis.
Does anyone have some idea on how we can increase speed, either by 
changing the updates, designing the database
differently, etc, etc? This is currently a big problem for us.
Other notables:
  The UPDATE is run from within a function: FOR rec IN SELECT ... 
LOOP RETURN NEXT rec; UPDATE dataTable ... END LOOP (a rough sketch 
of this appears below).
  Postgres 7.4.3
  debian stable
  2 GB RAM
  80 GB IDE drive (we can't change it)
  shared_buffers = 2048
  sort_mem = 1024
  max_fsm_pages = 4
  checkpoint_segments = 5
  random_page_cost = 3
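
A rough sketch of the shape of that function, for illustration only (the 
loop source and SET list are simplified; this is not the real function):

  CREATE OR REPLACE FUNCTION do_update() RETURNS SETOF dataTable AS '
  DECLARE
      rec dataTable%ROWTYPE;
  BEGIN
      -- loop over the rows to be refreshed (details simplified)
      FOR rec IN SELECT * FROM dataTable LOOP
          RETURN NEXT rec;
          UPDATE dataTable SET field01 = 44.5, field02 = 44.5
           WHERE id = rec.id;
      END LOOP;
      RETURN;
  END;
  ' LANGUAGE plpgsql;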
Thanks
Ron

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Anomalies in 7.4.5

2004-10-21 Thread Thomas F . O'Connell
I know, I know: I should've done this before I posted. REINDEXing and  
VACUUMing mostly fixed this problem. Which gets me back to where I was  
yesterday, reviewing an import process (that existed previously) that  
populates tables in this system that seems to allow small data sets to  
cause simple queries like this to crawl. Is there anything about 
general COPY/INSERT activity that can cause small data sets to become 
so severely slow in postgres, and that can be prevented by anything 
other than diligent VACUUMing? I was hoping that pg_autovacuum along with 
post-import manual VACUUMs would be sufficient, but it doesn't seem to  
be the case necessarily. Granted, I haven't done a methodical and  
complete review of the process, but I'm still surprised at how quickly  
it seems able to debilitate postgres with even small amounts of data. I  
had a similar situation crawl yesterday based on a series of COPYs  
involving 5 rows!

As in, can I look for something to treat the cause rather than the  
symptoms?

If not, should I be REINDEXing manually, as well as VACUUMing manually  
after large data imports (whether via COPY or INSERT)? Or will a VACUUM  
FULL ANALYZE be enough?
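
(That is, after each large import, something along the lines of the 
following, with userdata standing in for whichever tables were loaded:

REINDEX TABLE userdata;
VACUUM ANALYZE userdata;

versus simply running VACUUM FULL ANALYZE userdata; and relying on 
pg_autovacuum the rest of the time.)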

Thanks!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Oct 21, 2004, at 3:36 PM, Thomas F.O'Connell wrote:
I'm seeing some weird behavior on a repurposed server that was wiped  
clean and set up to run as a database and application server with  
postgres and Apache, as well as some command-line PHP scripts.

The box itself is a quad processor (2.4 GHz Intel Xeons) Debian woody  
GNU/Linux (2.6.2) system.

postgres is crawling on some fairly routine queries. I'm wondering if  
this could somehow be related to the fact that this isn't a  
database-only server, but Apache is not really using any resources  
when postgres slows to a crawl.

Here's an example of analysis of a recent query:
EXPLAIN ANALYZE SELECT COUNT(DISTINCT u.id)
FROM userdata as u, userdata_history as h
WHERE h.id = '18181'
AND h.id = u.id;

QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=0.02..0.02 rows=1 width=8) (actual time=298321.421..298321.422 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..0.01 rows=1 width=8) (actual time=1.771..298305.531 rows=2452 loops=1)
         Join Filter: ("inner".id = "outer".id)
         ->  Seq Scan on userdata u  (cost=0.00..0.00 rows=1 width=8) (actual time=0.026..11.869 rows=2452 loops=1)
         ->  Seq Scan on userdata_history h  (cost=0.00..0.00 rows=1 width=8) (actual time=0.005..70.519 rows=41631 loops=2452)
               Filter: (id = 18181::bigint)
 Total runtime: 298321.926 ms
(7 rows)

userdata has a primary/foreign key on id, which references  
userdata_history.id, which is a primary key.

At the time of analysis, the userdata table had ~2,500 rows and 
userdata_history had ~50,000 rows. I can't imagine how even a seq 
scan could result in a runtime of nearly 5 minutes in these 
circumstances.

Also, doing a count( * ) from each table individually returns nearly  
instantly.

I can provide details of postgresql.conf and kernel settings if  
necessary, but I'm using some pretty well tested settings that I use  
any time I admin a postgres installation these days based on box  
resources and database size. I'm more interested in knowing if there  
are any bird's eye details I should be checking immediately.

Thanks.
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---(end of broadcast)---
TIP 8: explain analyze is your friend

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Triggers During COPY

2005-01-27 Thread Thomas F . O'Connell
I'm involved in an implementation of doing trigger-based counting as a 
substitute for count( * ) in real time in an application. My 
trigger-based counts seem to be working fine and dramatically improve 
the performance of the display of the counts in the application layer.

The problem comes in importing new data into the tables for which the 
counts are maintained. The current import process does some 
preprocessing and then does a COPY from the filesystem to one of the 
tables on which counts are maintained. This means that for each row 
being inserted by COPY, a trigger is fired. This didn't seem like a big 
deal to me until testing began on realistic data sets.

For a 5,000-record import, preprocessing plus the COPY took about 5 
minutes. Once the triggers used for maintaining the counts were added, 
this grew to 25 minutes. While I knew there would be a slowdown per row 
affected, I expected something closer to 2x than to 5x.

It's not unrealistic for this system to require data imports on the 
order of 100,000 records. Whereas this would've taken at most an hour 
and a half before (preprocessing takes a couple of minutes, so the 
actual original COPY takes closer to 2-3 minutes, or just over 1500 
rows per minute), the new version is likely to take more than 7 hours, 
which seems unreasonable to me. Additionally, the process is fairly CPU 
intensive.

I've examined the plans, and, as far as I can tell, the trigger 
functions are being prepared and using the indexes on the involved 
tables, which are hundreds of thousands of rows in the worst cases. The 
basic structure of the functions is a status lookup SELECT (to 
determine whether a count needs to be updated and which one) and one or 
two UPDATE statements (depending on whether both an increment and a 
decrement need to be performed). As I said, it looks like this basic 
format is using indexes appropriately.
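
For what it's worth, the functions are shaped roughly like this (table 
and column names here are made up; this is just a sketch of the 
structure described, not the real code):

CREATE OR REPLACE FUNCTION maintain_counts() RETURNS trigger AS '
DECLARE
    v_status integer;
BEGIN
    -- status lookup: does a count need updating, and which one?
    SELECT status INTO v_status
      FROM item_status
     WHERE item_id = NEW.item_id;

    IF v_status = 1 THEN
        -- one or two UPDATEs, depending on whether an increment,
        -- a decrement, or both are needed
        UPDATE item_counts SET n = n + 1 WHERE group_id = NEW.group_id;
    END IF;

    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER item_counts_trg AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE maintain_counts();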

Is there anything I could be overlooking that would tweak some more 
performance out of this scenario?

Would it be absurd to drop the triggers during import and recreate them 
afterward and update the counts in a summary update based on 
information from the import process?
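
Concretely, I'm picturing something like the following (same made-up 
names as in the sketch above; it assumes the import process can identify 
the newly loaded rows, here via a hypothetical import_batch column):

BEGIN;
DROP TRIGGER item_counts_trg ON items;
COPY items FROM '/path/to/import.dat';
-- one set-based summary update instead of one trigger firing per row
UPDATE item_counts SET n = n + i.new_rows
  FROM (SELECT group_id, count(*) AS new_rows
          FROM items
         WHERE import_batch = 1234
         GROUP BY group_id) AS i
 WHERE item_counts.group_id = i.group_id;
CREATE TRIGGER item_counts_trg AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE maintain_counts();
COMMIT;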

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Triggers During COPY

2005-01-27 Thread Thomas F . O'Connell
I forgot to mention that I'm running 7.4.6. The README includes the 
caveat that pgmemcache is designed for use with 8.0. My instinct is to 
be hesitant to use something like that in a production environment 
without some confidence that people have done so with good and reliable 
success, or without more extensive testing than I'm likely to have time 
for, primarily because support for 7.4.x is never likely to increase. 
Thanks for the tip, though.

For the time being, it sounds like I'll probably try to implement the 
drop/create trigger setup during import.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Jan 27, 2005, at 11:41 PM, Josh Berkus wrote:
Thomas,
Would it be absurd to drop the triggers during import and recreate 
them
afterward and update the counts in a summ ary update based on
information from the import process?
That's what I'd do.
Also, might I suggest storing the counts in memcached (see the 
pgmemcache
project on pgFoundry) rather than in a table?

--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] [ADMIN] Too slow

2005-03-22 Thread Thomas F . O'Connell
Please post the results of that query as run through EXPLAIN ANALYZE.
Also, I'm going to reply to this on pgsql-performance, which is 
probably where it better belongs.
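
In the meantime, one thing worth trying: NOT IN with a subquery is a 
classic performance trap, particularly on older releases, and an 
outer-join form (equivalent as long as sujetos.cedula contains no NULLs) 
often runs dramatically faster. Something like:

create table tmp_partes as
select p.*
  from partes p
  left join sujetos s on s.cedula = p.identificacion
 where s.cedula is null;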

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 22, 2005, at 8:23 AM, Sabio - PSQL wrote:
How can I improve speed on my queries? For example, this query has been 
executing for one day and still has not finished:
create table tmp_partes as select * from partes where identificacion 
not in (select cedula from sujetos)

partes has 1888000 rows and an index on identificacion
sujetos has 550 rows and an index on cedula

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Configuration/Tuning of server/DB

2005-03-28 Thread Thomas F . O'Connell
Reid,
There are a few very valuable tuning documents that are part of the  
established PostgreSQL-related literature. You don't mention which  
version of postgres you'll be running, but here are the documents  
you'll find useful:

postgresql.conf
7.4: http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
8.0: http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

general tuning
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Mar 24, 2005, at 1:46 PM, Reid Thompson wrote:
Using information found on the web, I've come up with some  
configuration and tuning parameters for a server/db that we will be  
implementing.  I was wondering if I could generate some feedback as to  
configuration and tuning so that I could compare my estimations with  
those of others.

Host is AIX 5.1 with 4 cpu's and 4 GB ram.  Postgresql will be sharing  
this machine with other processes.  Storage is an EMC storage array.  
The DB itself is very simple.  Two tables, one with 40-45 columns (  
largest column will likely contain no more than 32 chars of data ),  
the other with less than 5 columns ( largest column will contain no  
more than 20 chars data ).  Expected transactions will be along the  
order of ~600K +- 100K inserts and ~600K +-200K updates per week.

Thanks
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Follow-Up: How to improve db performance with $7K?

2005-04-06 Thread Thomas F . O'Connell
Things might've changed somewhat over the past year, but this is from 
_the_ Linux guy at Dell...

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
Date: Mon, 26 Apr 2004 14:15:02 -0500
From: Matt Domsch [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: PERC3/Di failure workaround hypothesis
On Mon, Apr 26, 2004 at 11:10:36AM -0500, Sellek, Greg wrote:
 Short of ordering a Perc4 for every 2650 that I want to upgrade to RH
 ES, is there anything else I can do to get around the Perc3/Di
 problem?
Our working hypothesis for a workaround is to do as follows:
In afacli, set:
Read Cache: enabled
Write Cache: enabled when protected
Then unplug the ROMB battery.  A reboot is not necessary.  The firmware 
will immediately drop into Write-Through Cache mode, which in our 
testing has not exhibited the problem.  Setting the write cache to 
disabled in afacli doesn't seem to help - you've got to unplug the 
battery with it in the above settings.

We are continuing to search for the root cause to the problem, and will 
update the list when we can.

Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com  www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
On Apr 5, 2005, at 11:44 PM, Kevin Brown wrote:
Thomas F.O'Connell wrote:
I'd use two of your drives to create a mirrored partition where 
pg_xlog
resides separate from the actual data.

RAID 10 is probably appropriate for the remaining drives.
Fortunately, you're not using Dell, so you don't have to worry about
the Perc3/Di RAID controller, which is not so compatible with
Linux...
Hmm...I have to wonder how true this is these days.
My company has a Dell 2500 with a Perc3/Di running Debian Linux, with
the 2.6.10 kernel.  The controller seems to work reasonably well,
though I wouldn't doubt that it's slower than a different one might
be.  But so far we haven't had any reliability issues with it.
Now, the performance is pretty bad considering the setup -- a RAID 5
with five 73.6 gig SCSI disks (10K RPM, I believe).  Reads through the
filesystem come through at about 65 megabytes/sec, writes about 35
megabytes/sec (at least, so says bonnie -s 8192).  This is on a
system with a single 3 GHz Xeon and 1 gigabyte of memory.  I'd expect
much better read performance from what is essentially a stripe of 4
fast SCSI disks.
While compatibility hasn't really been an issue, at least as far as
the basics go, I still agree with your general sentiment -- stay away
from the Dells, at least if they have the Perc3/Di controller.  You'll
probably get much better performance out of something else.
--
Kevin Brown   [EMAIL PROTECTED]
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-15 Thread Thomas F . O'Connell
I'm in the fortunate position of having a newly built database server 
that's pre-production. I'm about to run it through the wringer with some 
simulations of business data and logic, but I wanted to post the 
results of some preliminary pgbench marking.

http://www.sitening.com/pgbench.html
To me, it looks like basic transactional performance is modestly 
improved at 8.0 across a variety of metrics. I think this bodes well 
for more realistic loads, but I'll be curious to see the results of 
some of the simulations.

I've still got a little bit of preparatory time with this box, so I can 
continue to do some experimentation.

I'd be curious to see whether these numbers meet developer expectations 
and to see whether the developer and user community have insight into 
other pgbench options that would be useful to see.
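
For reference, the individual runs behind that page were invocations 
along these lines (database name made up; scaling factor, client counts, 
and transaction counts varied as described there):

pgbench -i -s 100 bench        # initialize with scaling factor 100
pgbench -c 25 -t 1000 bench    # 25 clients, 1000 transactions per client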

Thanks!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-23 Thread Thomas F . O'Connell
Steve,
Per your and Tom's recommendations, I significantly increased the 
number of transactions used for testing. See my last post.

The database will have pretty heavy mixed use, i.e., both reads and 
writes.

I performed 32 iterations per scenario this go-round.
I'll look into OSDB for further benchmarking. Thanks for the tip.
Since pgbench is part of the postgres distribution and I had it at hand 
and it seems to be somewhat widely referenced, I figured I'd go ahead and 
post preliminary results from it.

-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Apr 15, 2005, at 4:24 PM, Steve Poe wrote:
Tom,
People's opinions on pgbench may vary, so take what I say with a grain 
of salt. Here are my thoughts:

1) Test with no less than 200 transactions per client. I've heard with 
less than this, your results will vary too much with the direction of 
the wind blowing. A high enough value will help rule out some noise 
factor. If I am wrong, please let me know.

2) How is the database going to be used? What percentage will be 
read/write if you had to guess? Pgbench is like TPC-B, which will help 
gauge the potential throughput of your server in tps. However, it may 
not stress the server enough to help you make key performance changes. 
Then again, benchmarks are like statistics...full of lies <g>.

3) Run not just a couple pgbench runs, but *many* (I do between 20-40 
runs) so you can rule out noise and gauge improvement on median 
results.

4) Find something with which you can test OLTP-type transactions. I used 
OSDB since it is simple to implement and use, although OSDL's OLTP 
testing will be closer to reality.

Steve Poe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] pgbench Comparison of 7.4.7 to 8.0.2

2005-04-25 Thread Thomas F . O'Connell
Interesting. I should've included standard deviation in my pgbench 
iteration patch. Maybe I'll go back and do that.

I was seeing oscillation across the majority of iterations in the 25 
clients/1000 transaction runs on both database versions.

I've got my box specs and configuration files posted. If you see 
anything obvious about the tuning parameters that should be tweaked, 
please let me know.

Thanks for the feedback!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
Strategic Open Source: Open Your i
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Apr 25, 2005, at 1:58 AM, Steve Poe wrote:
There was some interesting oscillation behavior in both version of 
postgres that occurred with 25 clients and 1000 transactions at a 
scaling factor of 100. This was repeatable with the distribution 
version of pgbench run iteratively from the command line. I'm not 
sure how to explain this.

Tom,
When you see these oscillations, do they occur after so many generated 
results? Some oscillation is normal; in my opinion, 10-15% of the 
performance is noise-related.

The key is to tune the server so that you either 1) minimize the 
oscillation and/or 2) increase your overall performance above the 
10-15% baseline, and 3) find out what the mean and standard deviation 
are across all your results.

If your results are within that range, this may be normal. I'll 
follow up with you later on what I do.

Steve Poe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Need indexes on empty tables for good performance ?

2005-08-29 Thread Thomas F. O'Connell
Rohan,

You should note that in Postgres, indexes are not inherited by child 
tables.

Also, it seems difficult to select from a child table whose name you 
don't know unless you access the parent. And if you are accessing the 
data via the parent, I'm reasonably certain that you will find that 
indexes aren't used (even if they exist on the children) as a result of 
the way the children are accessed.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

On Aug 22, 2005, at 10:41 PM, Lenard, Rohan (Rohan) wrote:

I've read that indexes aren't used for COUNT(*) and I've noticed (7.3.x) 
with EXPLAIN that indexes never seem to be used on empty tables - is 
there any reason to have indexes on empty tables, or will postgresql 
never use them?

This is not as silly as it sounds - with table inheritance you might 
have table children with the data and a parent that is empty. It'd be 
nice to make sure postgresql knows to never really look at the parent - 
especially if you don't know the names of all the children...

Thoughts?

thx,
Rohan

Re: [PERFORM] poor VACUUM performance on large tables

2005-09-04 Thread Thomas F. O'Connell


On Sep 4, 2005, at 1:16 AM, Jan Peterson wrote:


Hello,

We have been experiencing poor performance of VACUUM in our production
database.  Relevant details of our implementation are as follows:

1.  We have a database that grows to about 100GB.
2.  The database is a mixture of large and small tables.
3.  Bulk data (stored primarily in pg_largeobject, but also in various
TOAST tables) comprises about 45% of our data.
4.  Some of our small tables are very active, with several hundred
updates per hour.
5.  We have a rolling delete function that purges older data on a
periodic basis to keep our maximum database size at or near 100GB.

Everything works great until our rolling delete kicks in.  Of course,
we are doing periodic VACUUMS on all tables, with frequent VACUUMs on
the more active tables.  The problem arises when we start deleting the
bulk data and have to VACUUM pg_largeobject and our other larger
tables.  We have seen VACUUM run for several hours (even tens of
hours).  During this VACUUM process, our smaller tables accumulate
dead rows (we assume because of the transactional nature of the
VACUUM) at a very rapid rate.  Statistics are also skewed during this
process and we have observed the planner choosing sequential scans on
tables where it is obvious that an index scan would be more efficient.

We're looking for ways to improve the performance of VACUUM.  We are
already experimenting with Hannu Krosing's patch for VACUUM, but it's
not really helping (we are still faced with doing a database wide
VACUUM about once every three weeks or so as we approach the
transaction id rollover point... this VACUUM has been measured at 28
hours in an active environment).

Other things we're trying are partitioning tables (rotating the table
that updates happen to and using a view to combine the sub-tables for
querying).  Unfortunately, we are unable to partition the
pg_largeobject table, and that table alone can take up 40+% of our
database storage.  We're also looking at somehow storing our large
objects externally (as files in the local file system) and
implementing a mechanism similar to Oracle's bfile functionality.  Of
course, we can't afford to give up the transactional security of being
able to roll back if a particular update doesn't succeed.

Does anyone have any suggestions to offer on good ways to proceed
given our constraints?  Thanks in advance for any help you can
provide.

-jan-


Do you have your Free Space Map settings configured appropriately?  
See section 16.4.3.2 of the docs:


http://www.postgresql.org/docs/8.0/static/runtime-config.html#RUNTIME-CONFIG-RESOURCE


You'll want to run a VACUUM VERBOSE and note the numbers at the end,  
which describe how many pages are used and how many are needed.  
max_fsm_pages should be set according to that, and you can set  
max_fsm_relations based on it, too, although typically one knows  
roughly how many relations are in a database.


http://www.postgresql.org/docs/8.0/static/sql-vacuum.html
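
As a rough illustration only (the numbers below are made up; use the 
figures your own VACUUM VERBOSE reports), the resulting postgresql.conf 
entries would look something like:

max_fsm_pages = 2000000      # at least the total pages needed reported by VACUUM VERBOSE
max_fsm_relations = 1000     # comfortably more than the number of relations in the cluster

Both allocate shared memory at startup, so changing them requires a 
restart.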

Finally, have you experimented with pg_autovacuum, which is located  
in contrib in the source tarballs (and is integrated into the backend  
in 8.1 beta and beyond)? You don't really say how often you're  
running VACUUM, and it might be that you're not vacuuming often enough.


--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)
---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Index Selection: ORDER BY vs. PRIMARY KEY

2005-09-19 Thread Thomas F. O'Connell
I have a query that looks roughly like this (I've removed irrelevant  
SELECT clause material and obfuscated names, trying to keep them  
consistent where altered in EXPLAIN output):


SELECT u.emma_member_id, h.action_ts
FROM user as u, history as h
WHERE u.user_id = h.user_id
AND h.action_id = '$constant_data'
ORDER BY h.action_ts DESC LIMIT 100 OFFSET 0

The user table has ~25,000 rows. The history table has ~750,000 rows.  
Currently, there is an index on history.action_ts and a separate one  
on history.action_id. There's also a PRIMARY KEY on user.user_id. If  
I run the query as such, I get a plan like this:


   QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..2196.30 rows=100 width=925) (actual time=947.208..3178.775 rows=3 loops=1)
   ->  Nested Loop  (cost=0.00..83898.65 rows=3820 width=925) (actual time=947.201..3178.759 rows=3 loops=1)
         ->  Index Scan Backward using h_action_ts_idx on history h  (cost=0.00..60823.53 rows=3820 width=480) (actual time=946.730..3177.953 rows=3 loops=1)
               Filter: (action_id = $constant_data::bigint)
         ->  Index Scan using user_pkey on user u  (cost=0.00..6.01 rows=1 width=445) (actual time=0.156..0.161 rows=1 loops=3)
               Index Cond: (u.user_id = "outer".user_id)
 Total runtime: 3179.143 ms
(7 rows)

If I drop the index on the timestamp field, I get a plan like this:

   QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=17041.41..17041.66 rows=100 width=925) (actual time=201.725..201.735 rows=3 loops=1)
   ->  Sort  (cost=17041.41..17050.96 rows=3820 width=925) (actual time=201.719..201.722 rows=3 loops=1)
         Sort Key: h.action_ts
         ->  Merge Join  (cost=13488.15..16814.13 rows=3820 width=925) (actual time=7.306..201.666 rows=3 loops=1)
               Merge Cond: ("outer".user_id = "inner".user_id)
               ->  Index Scan using user_pkey on user u  (cost=0.00..3134.82 rows=26802 width=445) (actual time=0.204..151.351 rows=24220 loops=1)
               ->  Sort  (cost=13488.15..13497.70 rows=3820 width=480) (actual time=0.226..0.234 rows=3 loops=1)
                     Sort Key: h.user_id
                     ->  Index Scan using h_action_id_idx on history h  (cost=0.00..13260.87 rows=3820 width=480) (actual time=0.184..0.195 rows=3 loops=1)
                           Index Cond: (action_id = $constant_data::bigint)
 Total runtime: 202.089 ms
(11 rows)

Clearly, if the index on the timestamp field is there, postgres wants  
to use it for the ORDER BY, even though the performance is worse. How  
is this preference made internally? If both indexes exist, will  
postgres always prefer the index on an ordered column? If I need the  
index on the timestamp field for other queries, is my best bet just  
to increase sort_mem for this query?
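
For reference, the indexes involved (reconstructed from the names in the 
plans above; the exact original definitions may differ slightly) are 
just single-column btrees plus the primary key:

CREATE INDEX h_action_ts_idx ON history (action_ts);
CREATE INDEX h_action_id_idx ON history (action_id);
-- plus user_pkey, the PRIMARY KEY on user (user_id)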


Here's my version string:
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC 2.95.4

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] wal_buffers

2005-10-06 Thread Thomas F. O'Connell


On Oct 5, 2005, at 8:23 AM, Ian Westmacott wrote:


Can anyone tell me what precisely a WAL buffer contains,
so that I can compute an appropriate setting for
wal_buffers (in 8.0.3)?

I know the documentation suggests there is little
evidence that supports increasing wal_buffers, but we
are inserting a large amount of data that, I believe,
easily exceeds the default 64K in a single transaction.
We are also very sensitive to write latency.

As background, we are doing a sustained insert of 2.2
billion rows in 1.3 million transactions per day.  That's
about 1700 rows per transaction, at (roughly) 50 bytes
per row.


Ian,

The WAL Configuration chapter (25.2) has a pretty good discussion of  
how wal_buffers is used:


http://www.postgresql.org/docs/8.0/static/wal-configuration.html

You might also take a look at Josh Berkus' recent testing on this  
setting:


http://www.powerpostgresql.com/
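
As a rough back-of-the-envelope check (my arithmetic, not a 
recommendation): 1700 rows at ~50 bytes per row is on the order of 85 kB 
of row data per transaction, which does exceed the 64 kB default 
(wal_buffers = 8, in 8 kB pages). If you do experiment, a modest bump in 
postgresql.conf would look something like:

wal_buffers = 32    # 32 x 8 kB pages = 256 kB; requires a server restart

and then measure whether your write latency actually improves.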

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] tuning seqscan costs

2005-10-26 Thread Thomas F. O'Connell


On Oct 19, 2005, at 9:51 AM, Katherine Stoovs wrote:


I want to correlate two index rows of different tables to find an
offset so that

table1.value = table2.value AND table1.id = table2.id + offset

is true for a maximum number of rows.

To achieve this, I have the two tables and a table with possible
offset values and execute a query:

SELECT value,(SELECT COUNT(*) FROM table1,table2
  WHERE table1.value = table2.value AND
table1.id = table2.id + offset)
 AS matches FROM offsets ORDER BY matches;

The query is very inefficient, however, because the planner doesn't
use my indexes and executes seqscans instead. I can get it to execute
fast by setting ENABLE_SEQSCAN to OFF, but I have read this will make
the performance bad on other query types so I want to know how to
tweak the planner costs or possibly other stats so the planner will
plan the query correctly and use index scans. There must be something
wrong in the planning parameters after all if a plan that is slower by
a factor of tens or hundreds becomes estimated better than the fast
variant.

I have already issued ANALYZE commands on the tables.

Thanks for your help,
Katherine Stoovs


Katherine,

If offset is a column in offsets, can you add an index on the 
expression table2.id + offset?


http://www.postgresql.org/docs/8.0/static/indexes-expressional.html
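
For reference, the general syntax looks like this (purely illustrative; 
note that an expression index can only reference columns of the table 
being indexed, so the offset would have to be folded in as a constant or 
a column of table2):

CREATE INDEX table2_id_offset_idx ON table2 ((id + 5));

The extra parentheses are required whenever the indexed expression is 
not a bare column.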

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Open Source Solutions. Optimized Web Development.

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Effects of cascading references in foreign keys

2005-10-29 Thread Thomas F. O'Connell


On Oct 29, 2005, at 9:48 AM, Bruno Wolff III wrote:


On Sat, Oct 29, 2005 at 13:10:31 +0200,
  Martin Lesser [EMAIL PROTECTED] wrote:

What effect do UPDATEs on REFERENCEd TABLEs have when only columns in
the referenced table are updated which are not part of the FOREIGN KEY
constraint?

In 8.1 there is a check to see if the foreign key value has changed, and
if not, a trigger isn't queued. In the currently released versions, any
update will fire the triggers.
The comment for the check in trigger.c didn't say whether this
optimization applies to both referencing and referenced keys or just one
of those.
If you need to know more you can look at the code at:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/
for trigger.c.


It seems like this warrants an item somewhere in the release notes,  
and I'm not currently seeing it (or a related item) anywhere. Perhaps  
E.1.3.1 (Performance Improvements)? For some of the more extreme  
UPDATE scenarios I've seen, this could be a big win.


--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Open Source Solutions. Optimized Web Development.

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Figuring out which command failed

2005-11-08 Thread Thomas F. O'Connell


On Nov 7, 2005, at 3:30 PM, Ralph Mason wrote:


Hi,

I have a transaction that has multiple separate commands in it 
(nothing unusual there).


However sometimes one of the sql statements will fail and so the  
whole transaction fails.


In some cases I could fix the failing statement if only I knew  
which one it was.  Can anyone think of any way to get which  
statement actually failed from the error message?  If the error  
message gave me the line of the failure it would be excellent, but  
it doesn't.  Perhaps it would be easy for me to patch my version of  
Postgres to do that?


I realize I could do this with 2 phase commit, but that isn't ready  
yet!


Any thoughts or ideas are much appreciated

Thanks
Ralph


2PC might not've been ready yesterday, but it's ready today!

http://www.postgresql.org/docs/whatsnew

--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Very slow queries - please help

2005-12-03 Thread Thomas F. O'Connell


On Nov 24, 2005, at 12:14 PM, Bealach-na Bo wrote:


The consensus seems to be that I need more indexes and I also need to
look into the NOT IN statement as a possible bottleneck. I've
introduced the indexes which has led to a DRAMATIC change in response
time. Now I have to experiment with INNER JOIN - OUTER JOIN
variations, SET ENABLE_SEQSCAN=OFF.

Forgive me for not mentioning each person individually and by name.
You have all contributed to confirming what I had suspected (and
hoped): that *I* have a lot to learn!

I'm attaching table descriptions, the first few lines of top output
while the queries were running, index lists, sample queries and
EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
vary very much, it's nearly 100% anyway, but the memory usage drops
markedly, which is another very nice result of the index introduction.

Any more comments and tips would be very welcome.


You might find the following resources from techdocs instructive:

http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep2.php


http://techdocs.postgresql.org/redir.php?link=/techdocs/pgsqladventuresep3.php


These documents provide some guidance into the process of index  
selection. It seems like you could still stand to benefit from more  
indexes based on your queries, table definitions, and current indexes.


--
Thomas F. O'Connell
Database Architecture and Programming
Co-Founder
Sitening, LLC

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005 (cell)
615-469-5150 (office)
615-469-5151 (fax)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] most bang for buck with ~ $20,000

2006-08-08 Thread Thomas F. O'Connell


On Aug 8, 2006, at 4:49 PM, Joshua D. Drake wrote:


I am considering a setup such as this:
  - At least dual cpu (possibly with 2 cores each)
  - 4GB of RAM
  - 2 disk RAID 1 array for root disk
  - 4 disk RAID 1+0 array for PGDATA
  - 2 disk RAID 1 array for pg_xlog
Does anyone know a vendor that might be able provide such setup?


Wouldn't it be preferable to put WAL on a multi-disk RAID 10 if you  
had the opportunity? This gives you the redundancy of RAID 1 but  
approaches the performance of RAID 0, especially as you add disks to  
the array. In benchmarking, I've seen consistent success with this  
approach.


--
Thomas F. O'Connell
Sitening, LLC

http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] most bang for buck with ~ $20,000

2006-08-08 Thread Thomas F. O'Connell


On Aug 8, 2006, at 5:28 PM, Joshua D. Drake wrote:


Thomas F. O'Connell wrote:

On Aug 8, 2006, at 4:49 PM, Joshua D. Drake wrote:

I am considering a setup such as this:
  - At least dual cpu (possibly with 2 cores each)
  - 4GB of RAM
  - 2 disk RAID 1 array for root disk
  - 4 disk RAID 1+0 array for PGDATA
  - 2 disk RAID 1 array for pg_xlog
Does anyone know a vendor that might be able provide such setup?
Wouldn't it be preferable to put WAL on a multi-disk RAID 10 if  
you had the opportunity? This gives you the redundancy of RAID 1  
but approaches the performance of RAID 0, especially as you add  
disks to the array. In benchmarking, I've seen consistent success  
with this approach.


WAL is written in order, so RAID 1 is usually fine. We also don't  
need journaling for WAL, so the speed is even faster.


In which case, which is theoretically better (since I don't have a  
convenient test bed at the moment) for WAL in a write-heavy  
environment? More disks in a RAID 10 (which should theoretically  
improve write throughput in general, to a point) or a 2-disk RAID 1?  
Does it become a price/performance question, or is there virtually no  
benefit to throwing more disks at RAID 10 for WAL if you turn off  
journaling on the filesystem?


--
Thomas F. O'Connell
Sitening, LLC

http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] most bang for buck with ~ $20,000

2006-08-08 Thread Thomas F. O'Connell


On Aug 8, 2006, at 6:24 PM, Joshua D. Drake wrote:

In which case, which is theoretically better (since I don't have a  
convenient test bed at the moment) for WAL in a write-heavy  
environment? More disks in a RAID 10 (which should theoretically  
improve write throughput in general, to a point) or a 2-disk RAID  
1? Does it become a price/performance question, or is there  
virtually no benefit to throwing more disks at RAID 10 for WAL if  
you turn off journaling on the filesystem?


Over 4 drives, I would gather that RAID 10 wouldn't gain you  
anything. Over 6 or 8, however, it may be faster because you are 
writing smaller chunks of data, even if two copies of each.


Yeah, where I've seen the benefits in practice, the scenarios have  
involved the availability of a minimum of 6 drives for a RAID 10 for  
WAL. I really should do a comparison of a 2-disk RAID 1 with a  
variety of multi-disk RAID 10 configurations at some point.


--
Thomas F. O'Connell
Sitening, LLC

http://www.sitening.com/
3004B Poston Avenue
Nashville, TN 37203-1314
615-469-5150 x802
615-469-5151 (fax)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org