[PERFORM] monitoring suggestions

2012-07-18 Thread Campbell, Lance
PostgreSQL: 9.1
OS: Red Hat 6
This PostgreSQL instance is used for dynamic web content.  It runs on a 
dedicated server.

So I need some PostgreSQL monitoring advice.  There are two basic strategies 
that I am aware of for configuring PostgreSQL:


1)  In memory:  With the in-memory option you give PostgreSQL 70% or more of 
the memory by setting shared_buffers accordingly.  You are relying on PostgreSQL 
to keep the data in the database in its own memory.  From my understanding, the 
only disk access should be the initial read of data into a block of memory and 
writes when data blocks are updated.  The advantage of this strategy is that if 
you notice an increase in Linux swap usage, you know you need to increase the 
memory on the server as well as PostgreSQL's share of it.

2)  Disk caching:  With this approach you are relying on the operating 
system to cache disk files in memory; PostgreSQL will find the data it needs in 
the OS disk cache.  To use this strategy you set shared_buffers to a low value, 
1G or less, and you set effective_cache_size to the amount of memory that you 
expect your server's OS to use for disk caching.  The only major drawback for me 
with this strategy is: how do I know when I need more memory for the OS to use 
when caching my files?
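
For reference, a rough sketch of what I mean by option #2 in postgresql.conf 
(the numbers are only placeholders, not recommendations):

shared_buffers = 1GB              # keep PostgreSQL's own cache relatively small
effective_cache_size = 12GB       # roughly what I expect the OS to use for disk caching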

If I were to use option #2 above what type of monitoring would you suggest I 
use to tell me when I need to add more memory?
Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382




[PERFORM] Tablespaces on a raid configuration

2012-03-30 Thread Campbell, Lance
PostgreSQL 9.0.x
When PostgreSQL storage is using a relatively large RAID 5 or 6 array, is 
there any value in having your tables distributed across multiple tablespaces 
if those tablespaces exist on the same RAID array?  I understand the 
value if you were to have the tablespaces on different RAID arrays.  But what 
about on the same one?


Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382



[PERFORM] database slowdown while a lot of inserts occur

2012-03-29 Thread Campbell, Lance
PostgreSQL 9.0.x
We have around ten different applications that use the same database.  When one 
particular application is active it does an enormous number of inserts.  Each 
insert is very small.  During this time the database seems to slow down in 
general.  The application in question is inserting into a particular table that 
is not used by the other applications.


1)  What should I do to confirm that the database is the issue and not the 
applications?

2)  How can I identify where the bottleneck is occurring if the issue 
happens to be with the database?

I have been using PostgreSQL for eight years.  It is an amazing database.

Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382



Re: [PERFORM] database slowdown while a lot of inserts occur

2012-03-29 Thread Campbell, Lance
I forgot to mention that the slowdown for the other applications occurs in 
particular when they are trying to insert into or update tables unrelated to 
the application, mentioned in my prior message, that does the massive number 
of small inserts.


Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Campbell, Lance
Sent: Thursday, March 29, 2012 12:59 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] database slowdown while a lot of inserts occur

PostgreSQL 9.0.x
We have around ten different applications that use the same database.  When one 
particular application is active it does an enormous number of inserts.  Each 
insert is very small.  During this time the database seems to slow down in 
general.  The application in question is inserting into a particular table that 
is not used by the other applications.


1)  What should I do to confirm that the database is the issue and not the 
applications?

2)  How can I identify where the bottleneck is occurring if the issue 
happens to be with the database?

I have been using PostgreSQL for eight years.  It is an amazing database.

Thanks,

Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382



Re: [PERFORM] How much memory is PostgreSQL using

2010-04-02 Thread Campbell, Lance
Greg,
Thanks for your help.

1) How does the number of buffers provided by pg_buffercache compare to
memory (buffers * X = Y meg)?  
2) Is there a way to tell how many total buffers I have available/max?

Thanks,

Lance Campbell
Software Architect/DBA/Project Manager
Web Services at Public Affairs
217-333-0382


-Original Message-
From: Greg Smith [mailto:g...@2ndquadrant.com] 
Sent: Monday, March 29, 2010 11:54 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How much memory is PostgreSQL using

Campbell, Lance wrote:

 Or is there some way to ask PostgreSQL "how much memory are you using 
 to cache disk blocks currently?"


You can install contrib/pg_buffercache into each database and count how 
many used blocks are there.  Note that running queries using that 
diagnostic tool is really intensive due to the locks it takes, so be 
careful not to do that often on a production system.
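
For example, a minimal check once pg_buffercache is installed (8 kB is the 
default block size):

  -- number of shared buffers currently holding a page
  SELECT count(*) FROM pg_buffercache WHERE relfilenode IS NOT NULL;
  -- multiply the count by 8192 to get an approximate number of bytes in use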


 When you do a PG_DUMP does PostgreSQL put the disk blocks into shared 
 buffers as it runs?


To some extent.  Most pg_dump activity involves sequential scans that 
are reading an entire table.  Those are no different from any other 
process that will put disk blocks into shared_buffers.  However, that 
usage pattern makes pg_dump particularly likely to run into an 
optimization in 8.3 and later that limits how much of shared_buffers is 
used when sequentially scanning a large table.  See P10 of 
http://www.westnet.com/~gsmith/content/postgresql/InsideBufferCache.pdf 
for the exact implementation.  Basically, anything bigger than 
shared_buffers / 4 uses a 256K ring to limit its cache use, but it's a 
little more complicated than that.
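
If you want to check whether a given table is big enough to hit that ring 
buffer path, something like this should work (the table name is made up, and 
pg_settings reports shared_buffers in 8 kB pages):

  SELECT pg_relation_size('some_big_table') >
         (SELECT setting::bigint * 8192 / 4
            FROM pg_settings
           WHERE name = 'shared_buffers') AS likely_uses_ring_buffer;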

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




[PERFORM] How much memory is PostgreSQL using

2010-03-29 Thread Campbell, Lance
PostgreSQL 8.4.3

OS: Linux Red Hat 4.x

 

I changed my strategy with PostgreSQL recently to use a large segment of
memory for shared buffers with the idea of caching disk blocks.  How can
I see how much memory PostgreSQL is using for this?

 

I tried:

 

ps aux | grep post | sort -k4

 

This lists the processes using memory at the bottom.  Are they sharing
memory or each using their own individual blocks of memory?

 

When I run top I see that VIRT is the value of my shared buffers plus a
tiny bit.  I see %MEM is only 2.4%, 2.6%, 1.0%, 1.5%, and 1.1% for all of
the running processes.  Do I add these percentages up to see what amount
of VIRT I am really using? 

 

Or is there some way to ask PostgreSQL "how much memory are you using to
cache disk blocks currently?"

 

When you do a PG_DUMP does PostgreSQL put the disk blocks into shared
buffers as it runs? 

 

Thanks,

 

Lance Campbell

Software Architect/DBA/Project Manager

Web Services at Public Affairs

217-333-0382

 



[PERFORM] memory question

2010-03-24 Thread Campbell, Lance
PostgreSQL 8.4.3

Linux Redhat 5.0

 

Question: How much memory do I really need?

 

From my understanding there are two primary strategies for setting up
PostgreSQL in relationship to memory:

 

1)  Rely on Linux to cache the files.  In this approach you set the
shared_buffers to a relatively low number.  

2)  You can set shared_buffers to a very large percentage of your
memory so that PostgreSQL reserves the memory for the database.

 

I am currently using option #1.  I have 24 Gig of memory on my server
and the database takes up 17 Gig of disk space.  When I do the Linux
command top I notice that 19 Gig is allocated for cache.  Is there a
way for me to tell how much of that cache is associated with the caching
of database files?

 

I am basically asking: how much memory do I really need?  Maybe it is
complete overkill.  Maybe I am getting to a point where I might need
more memory.

 

My thought was I could use option #2 and then set the number to a lower
amount.  If the performance is bad then slowly work the number up.

 

Our server manager seems to think that I have way too much memory.  He
thinks that we only need 5 Gig.  I don't really believe that.  But I
want to cover myself.  With money tight I don't want to be the person
who is wasting resources.  We need to replace our database servers so I
want to do the right thing.

 

Thanks,

 

Lance Campbell

Software Architect/DBA/Project Manager

Web Services at Public Affairs

217-333-0382

 



[PERFORM] Best settings to load a fresh database

2009-08-06 Thread Campbell, Lance
PostgreSQL 8.3
Linux RedHat 4.X
24G of memory

When loading a file generated from pg_dumpall, is there a key setting in
the configuration file that would allow the load to work faster?
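
The kind of temporary settings I have in mind, in case it helps frame the
question (these are guesses on my part, to be reverted after the load):

maintenance_work_mem = 1GB      # faster index rebuilds during the restore
checkpoint_segments = 64        # fewer, larger checkpoints while loading
autovacuum = off                # re-enable once the load is finished
# fsync = off                   # only if a crash mid-load just means re-running it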

Thanks,

Lance Campbell
Project Manager/Software Architect/DBA
Web Services at Public Affairs
217-333-0382



[PERFORM] How many inserts am I doing

2008-07-11 Thread Campbell, Lance
PostgreSQL: 8.2

How can you identify how many inserts are being done in a given time
frame for a database?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

My e-mail address has changed to [EMAIL PROTECTED]

 



[PERFORM] How many updates and inserts

2008-07-11 Thread Campbell, Lance
PostgreSQL: 8.2

How can I identify how many inserts and updates are being done in a
given time frame for a database?
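
One idea I had, assuming row-level stats are enabled (stats_row_level in
8.2), is to sample the collector's counters and diff them over the time
window of interest:

-- totals for the current database; run twice and subtract
SELECT sum(n_tup_ins) AS inserts,
       sum(n_tup_upd) AS updates
  FROM pg_stat_user_tables;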

 

Thanks,

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

My e-mail address has changed to [EMAIL PROTECTED]

 



[PERFORM] Creating a foreign key

2008-05-08 Thread Campbell, Lance
PostgreSQL: 8.2

 

When you create a foreign key to a table is there an index that is
created on the foreign key automatically?

 

Example:

Table A has a field called ID.

 

Table B has a field called fk_a_id which has a constraint making it a
foreign key to field ID of table A.

 

Is there an index automatically created on field fk_a_id in table B when
I create a foreign key constraint?

 

 

I assume yes.  But I wanted to check.  I did not see it specifically
mentioned in the documentation.

 

I also see "CREATE TABLE / PRIMARY KEY will create implicit index ..." when
creating a primary key but I don't see any similar statement when
creating a foreign key.

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Creating a foreign key

2008-05-08 Thread Campbell, Lance
Shaun,
Thanks for the very detailed description of why posgres does not auto
create indexes.  That makes a lot of sense.

Thanks again,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Shaun Thomas [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 08, 2008 12:19 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Creating a foreign key


On Thu, 2008-05-08 at 17:52 +0100, Campbell, Lance wrote:

 Is there an index automatically created on field fk_a_id in table B
 when I create a foreign key constraint?

No.  The problem with doing this is it assumes certain things about your
infrastructure that may be entirely false.  Indexes are to speed up
queries by logarithmically reducing the result set to matched index
parameters, and pretty much nothing else.  Indexes are also not free,
taking up both disk space and CPU time to maintain, slowing down
inserts.

Foreign keys are not bi-directional either.  They actually check the
index in the *referenced* table to see if the value exists.  Having an index
on a column referring to another table may be advantageous, but it's not
always necessary.  If you never use that column in a where clause, or it
isn't restrictive enough, you gain nothing and lose speed in table
maintenance.  It's totally up to the focus of your table schema design,
really.  Only careful app management and performance analysis can really
tell you where indexes need to go, beyond the rules-of-thumb concepts,
anyway.

 I also see CREATE TABLE / PRIMARY KEY will create implicit index
 when creating a primary key but I don't see any similar statement when
 creating a foreign key.

That's because the definition of a primary key is an index that acts as
the primary lookup for the table.  This is required to be an index,
partially because it has an implied unique constraint, and also because
it has a search-span of approximately 1 when locating a specific row
from that table.

But indexes aren't some kind of magical "make a query faster" sauce.
With too many matching values, the cost of scanning them individually
becomes prohibitive, and the database will fall back to a faster
sequential scan, which can take advantage of the block-fetch nature of
most storage devices to just blast through all the results for the values
it's looking for.  It's restrictive where clauses *combined* with
well-chosen indexes that give you good performance, with a little
tweaking here and there to make the query planner happy.

But that's the long version.  Postgres is by no means bare-bones, but it
assumes DBAs are smart enough to manage the structures they bolt onto
the metal. :)

-- 

Shaun Thomas
Database Administrator

Leapfrog Online 
807 Greenwood Street 
Evanston, IL 60201 
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com





[PERFORM] Backup causing poor performance - suggestions

2008-05-05 Thread Campbell, Lance
PostgreSQL: 8.2.4

 

We currently back up all of our database tables per schema using pg_dump
every half hour.  We have been noticing that the database performance
has been very poor during the backup process.  How can I improve the
performance?

 

Server Specs:

Dedicated DB server

Database takes up 8.0 Gig of disk space

2 Xeon 5160 dual cores 3.0 

16 G of memory

Two disks in raid 1 are used for the OS, database and backups.  SAS
10,000 RPM drives.

OS: Linux AS 4.x 64 bit

shared_buffers = 1 GB

work_mem = 20MB

max_fsm_pages = 524288

random_page_cost=1.0

effective_cache_size=16GB

max_connections=150

 

All other settings are the default settings.  

 

I have tried doing backups to a second set of disks but the performance
only improved somewhat.

 

Does anyone have advice on how to improve my performance during backup?
Would adding two quad core processors improve performance?

 

Thanks,

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Backup causing poor performance - suggestions

2008-05-05 Thread Campbell, Lance
Scott,
The server is a Dell PowerEdge 2900 II with the standard Perc 6/I SAS
controller with 256 MB cache. 

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Monday, May 05, 2008 10:06 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Backup causing poor performance - suggestions

On Mon, May 5, 2008 at 8:59 AM, Campbell, Lance [EMAIL PROTECTED] wrote:

 PostgreSQL: 8.2.4

You should update to 8.2.7 as a matter of periodic maintenance.  It's
a very short and easy update.

 We currently back up all of our database tables per schema using pg_dump
 every half hour.  We have been noticing that the database performance has
 been very poor during the backup process.  How can I improve the
 performance?



 Server Specs:

 Dedicated DB server

 Database takes up 8.0 Gig of disk space

 2 Xeon 5160 dual cores 3.0

 16 G of memory

 Two disks in raid 1 are used for the OS, database and backups.  SAS
 10,000 RPM drives.

 OS: Linux AS 4.x 64 bit

So, what kind of RAID controller are you using?  And can you add more
drives and / or battery backed cache to it?



Re: [PERFORM] Planning hot/live backups?

2008-03-24 Thread Campbell, Lance
I back up around 10 Gig of data every half hour using pg_dump.  I don't
back up the entire database at once.  Instead I back up at the schema
namespace level, but I do all of them every half hour.  It takes four
minutes, including the time to copy the files to the backup server.
I do each schema namespace backup consecutively.  I also run vacuum full
analyze once a day.  My system is up 24/7 as well.  I don't back up in
the middle of the night; there is so little to back up then, though I
could.  By skipping the hours when there are only a handful of
transactions, I am able to fit in more backups during the day.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve Poe
Sent: Monday, March 24, 2008 3:23 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Planning hot/live backups?

The owners of the animal hospital where I work want to consider live/hot
backups throughout the day so we're less likely to lose a whole day of
transactions.  We use Postgresql 8.0.15.  We do 3AM backups, using
pg_dumpall, to a file when there is very little activity.

The hospital enjoys the overall performance of the veterinary application
running on Postgresql.  I know doing a mid-day backup when up to 60
computers (consistently 35-40) are accessing client/patient information
will cause some frustration.  I understand there needs to be a balance
between performance and backing up current records.

While I know that not all situations are the same, I am hoping others can
share the performance impact they have experienced when doing backups
during the day and/or when planning for clustering (or other types of
redundancy).

My animal hospital operates 24x7 and is in the south part of the San
Francisco Bay area.  Outside of sharing your experiences/input with me, I
would not mind if you/your company do this type of consulting offline.

Thank you.

Steve



[PERFORM] Windows XP 64 bit

2008-03-19 Thread Campbell, Lance
I posted this question to the admin group.  I just realized that I
should have sent it here.

 

I just read the following article:

 

http://people.planetpostgresql.org/mha/index.php?/archives/162-PostgreSQL-vs-64-bit-windows.html

 

Would there be benefits in running PostgreSQL in a 32 bit mode on a 64
bit version of XP?   My thought is that the OS could access more of the
memory for the caching of the files.  On my production Linux box, I
don't allocate more than a 2 Gig to PostgreSQL.  I leave the rest of the
memory available for the caching of disk files.  So even though
PostgreSQL would be running in a 32 bit mode it seems like it would
still run better on a 64 bit XP box compared to a 32 bit version.  This
of course assumes that one does have a sizeable database and more than 3
Gig of memory.

 

Is this a correct assumption?

 

Would the performance be relatively similar to that of Linux? 

 

Thanks,

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Large Objects and Toast

2007-12-14 Thread Campbell, Lance
PostgreSQL: 8.2

 

My understanding is that when one creates a large object there is no way
to link the large object to a field in a table so that cascading delete
can occur.  Is this correct?  My understanding is that you have to
manually delete the large object.

 

I also read something about the OID being limited in size.  What is
the size limit of this OID type?  I am sure that it is bigger than the
number of files that I would be uploading into my db; but I just want to
get an idea of the range.

 

When putting a reference to a large object in a table, should the type
of the reference object be OID?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Putting files into fields in a table

2007-12-13 Thread Campbell, Lance
PostgreSQL: 8.2

I am looking at the possibility of storing files in some of my database
tables.  My concern is obviously performance.  I have configured
PostgreSQL to take advantage of Linux file caching.  So my PostgreSQL
does not have a large setting for shared_buffers even though I have 24G
of memory.  The performance today is very good.  

 

Some questions I have:

 

What data type should I use for fields that hold files?

Is there anything that I should be aware of when putting files into a
field in a table?

When PostgreSQL accesses a table that has fields that contain files does
it put the fields that contain the files into the shared_buffers memory
area?

  

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Putting files into fields in a table

2007-12-13 Thread Campbell, Lance
I did not see much info in the 8.2 documentation on BLOBs.  I did read
about the bytea, or binary, data type.  It seems like it would work for
storing files.  I guess I could stick with the OS for file storage but
it is a pain.  It would be easier to use the DB.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Alexander Staubo
Sent: Thursday, December 13, 2007 1:39 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Putting files into fields in a table

On 12/13/07, Campbell, Lance [EMAIL PROTECTED] wrote:
 I am looking at the possibility of storing files in some of my database
 tables.  My concern is obviously performance.  I have configured
 PostgreSQL to take advantage of Linux file caching.  So my PostgreSQL
 does not have a large setting for shared_buffers even though I have 24G
 of memory.

This used to be the recommended way before 8.0. In 8.0, it is
advantageous to give PostgreSQL more buffers. You should still make
some room for the kernel cache.

By "storing files", I assume you mean a lot of data imported from
files.  The pros and cons of storing large amounts of data as
PostgreSQL tuples have been debated before.  You might want to search
the archives.

My opinion is that PostgreSQL is fine up to a point (let's say 10,000
tuples x 2KB), above which I would merely store references to
file-system objects. Managing these objects can be painful, especially
in a cluster of more than one machine, but at least it's fast and
lightweight.

 What data type should I use for fields that hold files?

PostgreSQL has two ways of storing large amounts of data in a single
tuple: variable-length columns, and blobs.

Blobs are divided into chunks and stored in separate tables, one tuple
per chunk, indexed by offset, and PostgreSQL allows random access to
the data. The downside is that they take up more room, they're slower
to create, slower to read from end to end, and I believe there are
some operations (such as function expressions) that don't work on
them.  Some replication products, including (the last time I looked)
Slony, do not support replicating blobs.  Blobs are not deprecated, I
think, but they feel like they are.

Variable-length columns such as bytea and text support a system called
TOAST, which allow the first few kilobytes of the data to be stored
in-place in the tuple, and the overflow to be stored elsewhere and
potentially compressed. This system is much faster and tighter than
blobs, but does not offer random I/O.
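
As a concrete sketch of the variable-length-column route (table and column
names are just illustrative):

CREATE TABLE user_upload (
    id       serial PRIMARY KEY,
    filename text NOT NULL,
    content  bytea NOT NULL   -- TOAST moves/compresses the overflow out of line
);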

 Is there anything that I should be aware of when putting files into a
 field in a table?

Backup dumps will increase in size in proportion to the size of your
data. PostgreSQL is no speed demon at loading/storing data, so this
might turn out to be the Achilles heel.

 When PostgreSQL accesses a table that has fields that contain files does
 it put the fields that contain the files into the shared_buffers memory
 area?

I believe so.

Alexander.



Re: [PERFORM] Putting files into fields in a table

2007-12-13 Thread Campbell, Lance
Erik,
The advantage of storing things in the database versus the file system
is the number of lines of code.  I manage 18 software applications.  I
have developed an application that reads in an XML file and will
generate Java database code for inserting, updating, selecting and
deleting data.  So for me the database is a no-brainer.  But when I need
to store files that are uploaded by users I have to hand-code the
process.  It is not hard; it is just time consuming.  I want to keep
the amount I can do per hour at a very high level.  The less code the
better.
 
Using a database correctly really saves on the number of lines of code.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: Erik Jones [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 13, 2007 2:22 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org list
Subject: Re: [PERFORM] Putting files into fields in a table


On Dec 13, 2007, at 2:09 PM, Campbell, Lance wrote:

 I did not see much info in the 8.2 documentation on BLOBs.  I did read
 about the bytea, or binary, data type.  It seems like it would work for
 storing files.  I guess I could stick with the OS for file storage but
 it is a pain.  It would be easier to use the DB.

In postgres they're simply called Large Objects (or LOBs) and there  
is a whole chapter devoted to them in Part IV of the manual.  Note  
that you only need to use this facility if you're going to be storing  
data over 1G in size (at which point your limit becomes 2G).  What  
kind of data are in these files?  What gain do you foresee in storing  
the files directly in the db (as opposed, say, to storing the paths  
to the files in the filesystem)?

Erik Jones

Software Developer | Emma(r)
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





[PERFORM] clear pg_stats

2007-11-29 Thread Campbell, Lance
How can I clear the pg_stats views without restarting PostgreSQL?  I
thought there was a function.
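
If it is the function I am half-remembering, it would be used like this:

SELECT pg_stat_reset();   -- resets statistics counters for the current database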

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
Does the amount of memory allocate to work_mem get subtracted from
shared_buffers?

 

Example:

 

If work_mem is 1M and there are 10 connections and shared_buffers is
100M then would the total be 90 M left for shared_buffers?

 

Or does the amount of memory allocated for work_mem have nothing to do
with shared_buffers?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
Wow.  That is a nice logging feature in 8.3!

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Bill Moran [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 09, 2007 2:08 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Fri, 9 Nov 2007 12:08:57 -0600
Campbell, Lance [EMAIL PROTECTED] wrote:

 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

In 8.2 and older, it can be difficult to know, and I don't have a specific
recommendation.

8.3 includes a parameter to log the usage of temporary files by
Postgres.
When a sort can't fit in the available memory, it uses a temp file, thus
you could use this new feature to track when sorts don't fit in
work_mem.
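
In 8.3 the parameter is log_temp_files; for example, in postgresql.conf:

log_temp_files = 0    # log every temporary file created, along with its size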

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com



Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
It is amazing how, after working with databases very actively for over 8
years, I am still learning things.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 09, 2007 1:13 PM
To: Campbell, Lance
Cc: Heikki Linnakangas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Nov 9, 2007 12:08 PM, Campbell, Lance [EMAIL PROTECTED] wrote:
 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

Trial and error.  Note that you can set work_mem for a given session.
While it may seem that making work_mem bigger will always help, that's
not necessarily the case.
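
For instance, a per-session override looks like this (the value is only an
example):

SET work_mem = '16MB';   -- affects just the current session
RESET work_mem;          -- back to the server default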

Using this query:

select count(*) from (select * from myreporttable where lasttime >
now() - interval '1 week' order by random() ) as l

I did the following: (I ran the query by itself once to fill the
buffers / cache of the machine with the data)

work_mem Time:
1000kB 29215.563 ms
4000kB 20612.489 ms
8000kB 18408.087 ms
16000kB 16893.964 ms
32000kB 17681.221 ms
64000kB 22439.988 ms
125MB 23398.891 ms
250MB 25461.797 ms

Note that my best time was at around 16 Meg work_mem.  This data set
is MUCH bigger than 16 Meg, it's around 300-400 Meg.  But work_mem
optimized out at 16 Meg.  Btw, I tried it going as high as 768 Meg,
and it was still slower than 16M.

This machine has 2 Gigs ram and is optimized for IO not CPU performance.



[PERFORM] index stat

2007-11-05 Thread Campbell, Lance
PostgreSQL:8.2.4

 

I am collecting statistics info now on my database.  I have used the
following two queries:

 

select * from pg_stat_all_indexes;

select * from pg_statio_all_indexes;

 

How can I use the information from these two queries to better optimize
my indexes?  Or maybe even get rid of some unnecessary indexes.
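
One starting point I am considering (does this seem reasonable?) is to look
for indexes that have never been scanned since the counters were last reset:

SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_all_indexes
 WHERE idx_scan = 0
   AND schemaname NOT IN ('pg_catalog', 'pg_toast')
 ORDER BY schemaname, relname;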

 

Example output:

 

  relid | indexrelid | schemaname | relname               | indexrelname                 | idx_blks_read | idx_blks_hit
--------+------------+------------+-----------------------+------------------------------+---------------+--------------
  16801 |      57855 | a          | screen                | screen_index1                |          1088 |       213618
  16801 |      57857 | a          | screen                | screen_index3                |           905 |       201219
  16803 |      16805 | pg_toast   | pg_toast_16801        | pg_toast_16801_index         |          3879 |      1387471
  16978 |      16980 | pg_toast   | pg_toast_16976        | pg_toast_16976_index         |             0 |            0
 942806 |     942822 | b          | question_result_entry | question_result_entry_index1 |            18 |            0
 942806 |     942824 | b          | question_result_entry | question_result_entry_index2 |            18 |            0
 942806 |     942828 | b          | question_result_entry | question_result_entry_index3 |            18 |            0

  relid | indexrelid | schemaname | relname               | indexrelname                 | idx_scan | idx_tup_read | idx_tup_fetch
--------+------------+------------+-----------------------+------------------------------+----------+--------------+---------------
  16801 |      57855 | a          | screen                | screen_index1                |    48693 |      1961745 |       1899027
  16801 |      57857 | a          | screen                | screen_index3                |    13192 |       132214 |         87665
  16803 |      16805 | pg_toast   | pg_toast_16801        | pg_toast_16801_index         |   674183 |       887962 |        887962
  16978 |      16980 | pg_toast   | pg_toast_16976        | pg_toast_16976_index         |        0 |            0 |             0
 942806 |     942822 | b          | question_result_entry | question_result_entry_index1 |        0 |            0 |             0
 942806 |     942824 | b          | question_result_entry | question_result_entry_index2 |        0 |            0 |             0
 942806 |     942828 | b          | question_result_entry | question_result_entry_index3 |        0 |            0 |             0

 

 

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Training Recommendations

2007-11-05 Thread Campbell, Lance
PostgreSQL: 8.2.4

 

Does anyone have any companies they would recommend using for
performance tuning training of PostgreSQL for Linux?  Or general DBA
training?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Suggestions on an update query

2007-10-29 Thread Campbell, Lance
Thanks for all of your help. The problem was that the result_entry table
had some constraints that pointed to a third table.  When I removed
those constraints the performance was amazing.  The update took less
than seven minutes to execute.  I did not even consider the fact that
constraints to another table would impact the performance.
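
In case it is useful to anyone else, the pattern was roughly the following
(the constraint and referenced table names here are hypothetical, not my
real ones):

ALTER TABLE result_entry DROP CONSTRAINT result_entry_some_fkey;   -- hypothetical name
UPDATE result_entry SET question_number = question_number.question_number
  FROM question_number
 WHERE result_entry.fk_question_id = question_number.fk_question_id;
ALTER TABLE result_entry ADD CONSTRAINT result_entry_some_fkey
  FOREIGN KEY (fk_some_id) REFERENCES some_table (id);             -- hypothetical reference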

Thanks again,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gregory
Stark
Sent: Friday, October 26, 2007 9:05 PM
To: Joshua D. Drake
Cc: Campbell, Lance; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Suggestions on an update query


Joshua D. Drake [EMAIL PROTECTED] writes:

 On Fri, 26 Oct 2007 15:31:44 -0500
 Campbell, Lance [EMAIL PROTECTED] wrote:

 I forgot to include an additional parameter I am using in
 Postgresql.conf: 
 

 O.k. first, just to get it out of the way (and then I will try and
 help). Please do not top post, it makes replying contextually very
 difficult.

 PostgreSql version 8.2.4
 
 Memory = 8 Gig
 
 CPUs 1 dual core Zeon running at 3.0
 

 O.k. first you might be grinding through your 20 checkpoint segments
 but in reality what I think is happening is you are doing foreign key
 checks against all of it and slowing things down.

If you're going to berate someone about top-posting perhaps you should
attach your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints.  I
suspect you've guessed it right though.  In fact I suspect what's
happening is he doesn't have an index on the referencing column, so the
foreign key checks are doing sequential scans of the referencing table.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PERFORM] Suggestions on an update query

2007-10-26 Thread Campbell, Lance
I forgot to include an additional parameter I am using in
Postgresql.conf: 

 

checkpoint_segments = 30

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Campbell,
Lance
Sent: Friday, October 26, 2007 3:27 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Suggestions on an update query

 

PostgreSql version 8.2.4

Memory = 8 Gig

CPUs 1 dual core Zeon running at 3.0

 

I have a problem with an update query taking over 10 hours in order to
run.   I rebooted my server.  I ran the SQL command analyze.  Could
you please help me with any suggestions?  I have included the two tables
involved in the update below as well as the indexes I am using.  

 

The table result_entry contains 17,767,240 rows and the table
question_number contains 40,787.  Each row from the result_entry table
will match to one and only one row in the table question_number using
the fk_question_id field.  Each row from the question_number table
matches to an average of 436 rows on the result_entry table.

 

CREATE TABLE question_number

(

  fk_form_id        integer not null,

  fk_question_id    integer not null,

  question_number   integer not null,

  sequence_id   integer not null

);

 

ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey
PRIMARY KEY (fk_question_id);

CREATE INDEX question_number_index1 ON question_number USING btree
(question_number);

 

 

CREATE TABLE result_entry (

fk_result_submission_id integer NOT NULL,

fk_question_id integer NOT NULL,

fk_option_order_id integer NOT NULL,  

value character varying,

order_id integer NOT NULL,

question_number integer

);

 

CREATE INDEX result_entery_index1 ON result_entry USING btree
(fk_question_id);

 

 

update result_entry set question_number=question_number.question_number


from question_number where
result_entry.fk_question_id=question_number.fk_question_id;

 

 

 

explain update result_entry set
question_number=question_number.question_number  

from question_number where
result_entry.fk_question_id=question_number.fk_question_id;

 

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=1437.71..1046983.94 rows=17333178 width=32)
   Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
   ->  Seq Scan on result_entry  (cost=0.00..612216.78 rows=17333178 width=28)
   ->  Hash  (cost=927.87..927.87 rows=40787 width=8)
         ->  Seq Scan on question_number  (cost=0.00..927.87 rows=40787 width=8)
(5 rows)

 

 

 

Postgresql.conf settings:

 

shared_buffers = 1GB

work_mem = 10MB

max_fsm_pages = 204800

random_page_cost = 1.0

effective_cache_size = 8GB

 

 

Thanks for any help!

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Suggestions on an update query

2007-10-26 Thread Campbell, Lance
PostgreSql version 8.2.4

Memory = 8 Gig

CPUs 1 dual core Zeon running at 3.0

 

I have a problem with an update query taking over 10 hours in order to
run.   I rebooted my server.  I ran the SQL command analyze.  Could
you please help me with any suggestions?  I have included the two tables
involved in the update below as well as the indexes I am using.  

 

The table result_entry contains 17,767,240 rows and the table
question_number contains 40,787.  Each row from the result_entry table
will match to one and only one row in the table question_number using
the fk_question_id field.  Each row from the question_number table
matches to an average of 436 rows on the result_entry table.

 

CREATE TABLE question_number

(

  fk_form_id        integer not null,

  fk_question_id    integer not null,

  question_number   integer not null,

  sequence_id   integer not null

);

 

ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey
PRIMARY KEY (fk_question_id);

CREATE INDEX question_number_index1 ON question_number USING btree
(question_number);

 

 

CREATE TABLE result_entry (

fk_result_submission_id integer NOT NULL,

fk_question_id integer NOT NULL,

fk_option_order_id integer NOT NULL,  

value character varying,

order_id integer NOT NULL,

question_number integer

);

 

CREATE INDEX result_entery_index1 ON result_entry USING btree
(fk_question_id);

 

 

update result_entry set question_number=question_number.question_number


from question_number where
result_entry.fk_question_id=question_number.fk_question_id;

 

 

 

explain update result_entry set
question_number=question_number.question_number  

from question_number where
result_entry.fk_question_id=question_number.fk_question_id;

 

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=1437.71..1046983.94 rows=17333178 width=32)
   Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
   ->  Seq Scan on result_entry  (cost=0.00..612216.78 rows=17333178 width=28)
   ->  Hash  (cost=927.87..927.87 rows=40787 width=8)
         ->  Seq Scan on question_number  (cost=0.00..927.87 rows=40787 width=8)
(5 rows)

 

 

 

Postgresql.conf settings:

 

shared_buffers = 1GB

work_mem = 10MB

max_fsm_pages = 204800

random_page_cost = 1.0

effective_cache_size = 8GB

 

 

Thanks for any help!

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] SQL Monitoring

2007-10-09 Thread Campbell, Lance
PGSQL 8.2.4

 

I have noticed a slight spike in the amount of CPU usage in the last few
weeks.  I am sure it has to do with a change or two that was made to
some queries.  What is the best way to log the SQL that is being
executed?  I would prefer to limit the size of the log file to 2 G.  Is
there a way to do this?
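
The sort of postgresql.conf settings I am looking at, if this is the right
direction (there does not appear to be a single hard 2 G cap, so I assume it
has to be handled through rotation):

log_min_duration_statement = 250     # log statements running longer than 250 ms
#log_statement = 'all'               # or log every statement
redirect_stderr = on                 # 8.2 name for the logging collector
log_rotation_size = 102400           # kB; rotate the file at 100 MB
log_truncate_on_rotation = on
log_filename = 'postgresql-%a.log'   # one file per weekday, overwritten weekly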

 

Thanks for any help,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Installing PostgreSQL

2007-08-23 Thread Campbell, Lance
Should installation questions be sent here or to the admin listserv?

 

OS: redhat linux

Version of PostgreSQL: 8.2.4

 

I had a group that now manages our server set up a directory/partition
for us to put postgreSQL into.  The directory is called pgsql_data.  The
directory is more than a regular directory.  It contains a subdirectory
called lost+found.  I would assume this is a logical partition.  I
tried installing postgreSQL directly into this directory but it failed
since there is a file in this directory, lost+found.  Is there a way
around this?  Worst case scenario I will create a subdirectory called
data and put the install in there.  I would have preferred to put it
directly into the pgsql_data.  There would be no other files that would
have gone into the directory/partition other than postgreSQL.  Would it
be possible for me to install postgreSQL into a sub directory of
pgsql_data and then move the files up a directory into pgsql_data?

 

Thanks,

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Installing PostgreSQL

2007-08-23 Thread Campbell, Lance
Richard,
So what you are saying is that if you install PostgreSQL into a data
directory /abc/data you could then stop the database, move the files
into /def/data, and then start the database making sure to point to the
new data directory.  PostgreSQL is therefore referencing its files
relative to the data directory the files are in.

Is this a correct observation?

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 23, 2007 12:08 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Installing PostgreSQL

Campbell, Lance wrote:
 Should installation questions be sent here or to the admin listserv?

Probably the pgsql-general/admin/novice lists

 OS: redhat linux

RHES?

 Version of PostgreSQL: 8.2.4

OK

 I had a group that now manages our server set up a directory/partition
 for us to put postgreSQL into.  The directory is called pgsql_data.  The
 directory is more than a regular directory.  It contains a subdirectory
 called lost+found.  I would assume this is a logical partition.

No - if you get filesystem corruption any recovered disk-blocks are put 
into files here. All your disk partitions will have such a directory.

 I tried installing postgreSQL directly into this directory but it failed
 since there is a file in this directory, lost+found.  Is there a way
 around this?  Worst case scenario I will create a subdirectory called
 data and put the install in there.

That's what you want to do.  Apart from anything else it lets you set
ownership & permission of the directory.

 I would have preferred to put it directly into the pgsql_data.  There
 would be no other files that would have gone into the directory/partition
 other than postgreSQL.  Would it be possible for me to install postgreSQL
 into a sub directory of pgsql_data and then move the files up a directory
 into pgsql_data?

Just symlink your directory to the correct place if that's what you
want.

Partition at: /mnt/pg_disk
Directory is: /mnt/pg_disk/data
symlink to:   /var/db/data

-- 
   Richard Huxton
   Archonet Ltd



Re: [PERFORM] Installing PostgreSQL

2007-08-23 Thread Campbell, Lance
Richard,
I was able to prove that it works.  Thanks for your time.

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 23, 2007 12:26 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Installing PostgreSQL

Campbell, Lance wrote:
 Richard,
 So what you are saying is that if you install PostgreSQL into a data
 directory /abc/data you could then stop the database, move the files
 into /def/data, and then start the database making sure to point to the
 new data directory.  PostgreSQL is therefore referencing its files
 relative to the data directory the files are in.
 
 Is this a correct observation?

Yes - provided:
1. Ownership and permissions on the destination directory are correct
2. You remember to stop the server when copying


-- 
   Richard Huxton
   Archonet Ltd



[PERFORM] Terminology Question

2007-08-20 Thread Campbell, Lance
Terminology Question:

If I use the following statement:

 

I am backing up schema XYZ every 30 minutes.

 

Does this statement imply that I am only backing up the definition of
the data?  Or does it mean that I am backing up the definition of the
data and the data within the schema object?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Campbell, Lance
I have installed pgAdmin III 1.6.  In the tool when you click on a
particular table you can select a tab called Statistics.  This tab has
all kinds of info on your table.  For some reason the only info I see is
for table size, toast table size and indexes size.  Is there a reason
that the other 15 fields have zeros in them?  I was thinking that maybe
I needed to turn on a setting within my database in order to get
statistics reported.

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Campbell, Lance
All of the fields are zero except for the three I listed in my posting.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Page
Sent: Tuesday, July 24, 2007 12:50 PM
To: Jean-Max Reymond
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Table Statistics with pgAdmin III



 --- Original Message ---
 From: Jean-Max Reymond [EMAIL PROTECTED]
 To: pgsql-performance@postgresql.org
 Sent: 24/07/07, 18:23:53
 Subject: Re: [PERFORM] Table Statistics with pgAdmin III
 
Campbell, Lance wrote:
  I have installed pgAdmin III 1.6.  In the tool when you click on a 
  particular table you can select a tab called Statistics.  This tab has 
  all kinds of info on your table.  For some reason the only info I see is 
  for table size, toast table size and indexes size.  Is there a reason 
  that the other 15 fields have zeros in them?  I was thinking that maybe 
  I needed to turn on a setting within my database in order to get 
  statistics reported.
 
 it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you already 
see, in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave



Re: [PERFORM] Volunteer to build a configuration tool

2007-06-21 Thread Campbell, Lance
Greg,
I have a PostgreSQL database that runs on a dedicated server.  The
server has 24Gig of memory.  What would be the max size I would ever
want to set shared_buffers to if I were relying on the OS disk-caching
approach?  It seems that no matter how big your dedicated server is,
there would be an upper limit to a useful size for shared_buffers.
 
Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Smith
Sent: Thursday, June 21, 2007 2:15 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, Campbell, Lance wrote:

 If everything I said is correct then I agree Why have 
 effective_cache_size?  Why not just go down the approach that Oracle 
 has taken and require people to rely more on shared_buffers and the 
 general memory driven approach?  Why rely on the disk caching of the
OS?

First off, it may help explain the dynamics here if you know that until
fairly recent releases, the PostgreSQL shared_buffers cache had some
performance issues that made it impractical to make it too large.  It
hasn't been that long that relying more heavily on the Postgres cache was
technically feasible.  I think the user community at large is still
assimilating all the implications of that shift, and as such some of the
territory with making the Postgres memory really large is still being
mapped out.

There are also still some issues left in that area.  For example, the
bigger your shared_buffers cache is, the worse the potential is for having
a checkpoint take a really long time and disrupt operations.  There are OS
tunables that can help work around that issue; similar ones for the
PostgreSQL buffer cache won't be available until the 8.3 release.

In addition to all that, there are still several reasons to keep relying
on the OS cache:

1) The OS cache memory is shared with other applications, so relying on it
lowers the average memory footprint of PostgreSQL.  The database doesn't
have to be a pig that constantly eats all the memory up, while still
utilizing it when necessary.

2) The OS knows a lot more about the disk layout and similar low-level
details and can do optimizations a platform-independent program like
Postgres can't assume are available.

3) There are more people working on optimizing the caching algorithms in
modern operating systems than are coding on this project.  Using that
sophisticated cache leverages their work.

The "Oracle Way" presumes that you've got such a massive development staff
that you can solve these problems better yourself than the community at
large, and then support that solution on every platform.  This is why they
ended up with solutions like raw partitions, where they just put their own
filesystem on the disk and figure out how to make that work well
everywhere.  If you look at trends in this area, at this point the
underlying operating systems have gotten good enough that tricks like that
are becoming marginal.  Pushing more work toward the OS is a completely
viable design choice that strengthens every year.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Volunteer to build a configuration tool

2007-06-20 Thread Campbell, Lance
It is amazing how many times you can read something before it actually
sinks in.

There seem to be two possible approaches to optimizing PostgreSQL 8.2:

File caching approach:
This approach is based on the fact that the OS will cache the necessary
PostgreSQL files.  The key here is to set effective_cache_size as high as
you think the OS has memory to cache the files.  This approach would need
the value of shared_buffers to be relatively low.  Otherwise you are in a
sense storing the data twice.  One would also have to make sure that
work_mem is not too high.
Since the files would be cached by the OS, work_mem could be relatively
low.  This is an ideal approach if you have a dedicated server since
there would be no other software using memory or accessing files that
the OS would try to cache.

Memory driven approach:
In this approach you want to create a large value for shared_buffers.
You are relying on shared_buffers to hold the most commonly accessed
disk blocks.  The value for effective_cache_size would be relatively
small since you are not relying on the OS to cache files.  This seems
like it would be the ideal situation if you have other applications
running on the box.  By setting shared_buffers to a high value you are
guaranteeing memory available to PostgreSQL (this assumes the other
applications did not suck up too much memory to make your OS use virtual
memory).   This also seems more like how Oracle approaches things.  

Do I understand the possible optimization paths correctly?  The only
question I have about this approach is: if I use the memory driven
approach since effective_cache_size would be small I would assume I
would need to fiddle with random_page_cost since there would be no way
for PostgreSQL to know I have a well-configured system.

If everything I said is correct then I agree: "Why have
effective_cache_size?  Why not just go down the approach that Oracle
has taken and require people to rely more on shared_buffers and the
general memory-driven approach?  Why rely on the disk caching of the OS?"
Memory is only getting cheaper.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Greg Smith
Sent: Wednesday, June 20, 2007 10:21 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, PFC wrote:

 Except planner hints (argh) I see no way to give this information to
the 
 machine... since it's mostly in the mind of the DBA.

And the mind of the DBA has a funny way of being completely wrong some 
days about what's really happening under the hood.

 Maybe a per-table cache temperature param (hot, warm, cold), but
what 
 about the log table, the end of which is cached, but not the old
records 
 ? It's messy.

One of the things that was surprising to me when I started looking at the
organization of the PostgreSQL buffer cache is how little gross information
about its contents is available.  I kept expecting to find a summary section
where you could answer questions like "how much of the cache currently has
information about index/table X?" used as an input to the optimizer.  I
understand that the design model expects much of this is unknowable due to
the interaction with the OS cache, and in earlier versions you couldn't make
shared_buffers big enough for its contents to be all that interesting, so
until recently this wasn't worth collecting.

But in the current era, where it's feasible to have multi-GB caches
efficiently managed by PG and one can expect processor time is relatively
cheap, it seems to me one way to give a major boost to the optimizer is to
add some overhead to buffer cache management so it collects such
information.  When I was trying to do a complete overhaul on the background
writer, the #1 problem was that I had to assemble my own statistics on what
was inside the buffer cache as it was scanned, because a direct inspection
of every buffer is the only way to know things like what percentage of the
cache is currently dirty.
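
For what it's worth, the contrib/pg_buffercache module does expose the raw
per-buffer data, so a rough version of that summary can be assembled by
hand.  A minimal sketch, assuming the module is installed, that a plain
relfilenode join is good enough for the purpose, and that the module
version in use exposes the isdirty column:

-- Buffers (and dirty buffers) per relation in the current database
SELECT c.relname,
       count(*) AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty_buffers
FROM pg_buffercache b
JOIN pg_class c ON c.relfilenode = b.relfilenode
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;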

I can't figure out if I'm relieved or really worried to discover that
Tom 
isn't completely sure what to do with effective_cache_size either.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
Please read the whole email before replying:

 

I love the feedback I have received but I feel that somehow I did not
communicate the intent of this mini project very well.  So let me
outline a few basics and who the audience was intended for.

 

 

Mini project title:

Initial Configuration Tool for PostgreSQL for Dummies

 

1) This is intended for newbies, not for experienced users or advanced
DBAs.

 

2) This tool is NOT intended to monitor your PostgreSQL efficiency.

 

3) I suggested JavaScript because most people that get started with
PostgreSQL will go to the web in order to find out about issues relating
to configuration.  I wanted a very simple way for people to access the
tool that would not be tied to any particular environment or OS.  If
there is someone that is using a text browser to view the web then they
are probably geeky enough not to want to bother with using this tool.

 

4) The intent is just to give people that have no clue a better starting
point than some very generic defaults.

 

Please think simple.  I stress the word simple.  The real challenge here
is getting the formulas correct.  Someone suggested not focusing on the
exact values but just getting something out there for everyone to help
tweak.  I agree!

 

What questions do you think should be asked in order to figure out what
values should go into the formulas for the configuration suggestions?  

 

My thoughts:

 

What version of PostgreSQL are you using?

How much memory will be available to PostgreSQL?

How many connections will be made to PostgreSQL?

 

 

Thanks,

 

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
Francisco and Richard,
Why ask about disk or raid?  How would that impact any settings in
postgresql.conf?

I did forget the obvious question:

What OS are you using?

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: Francisco Reyes [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 19, 2007 11:58 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

Campbell, Lance writes:

 3) I suggested JavaScript because most people that get started with 
 PostgreSQL will go to the web in order to find out about issues
relating 

Why not c?
It could then go into contrib.
Anyways.. language is likely the least important issue..
As someone mentioned.. once the formulas are worked out it can be done
in a 
few languages.. as people desire..

 How much memory will be available to PostgreSQL?
 How many connections will be made to PostgreSQL?

Will this be a dedicated PostgreSQL server?
Will there be mostly reads or will there also be a significant amount of
writes?

Are you on a RAID system or do you have several disks over which you would
like to run PostgreSQL?



Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
Yudhvir,

I completely agree.  I was just putting together a similar email.

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Y Sidhu
Sent: Tuesday, June 19, 2007 12:49 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

 

 

On 6/19/07, Francisco Reyes [EMAIL PROTECTED] wrote:

Campbell, Lance writes:

 Francisco and Richard,
 Why ask about disk or raid?  How would that impact any settings in
 postgresql.conf?

If the user has 2 disks and says that he will do a lot of updates he
could 
put pg_xlog in the second disk.




Let's not ask about disk or raid at this level of sanity tuning. It is
important for a newbie to take the right first step. When it comes to
disks, we start talking I/O, SATA, SCSI and the varying degrees of SATA
and SCSI, and controller cards. Then we throw in RAID and the different
levels therein. Add to that, we can talk about drivers controlling these
drives and which OS is faster, more stable, etc. As you can see, a
newbie would get drowned. So, please keep it simple. I know many people
on this list are Gurus. We know you are the best in this field, but we
are not and are just trying to improve what we have.

-- 
Yudhvir Singh Sidhu
408 375 3134 cell 



Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
D'Arcy,
I wanted to put it on the www.postgresql.org site.  That is what I said
in my original email.  I don't believe anyone from the actual project
has contacted me.

I am setting up a JavaScript version first.  If someone wants to do a
different one feel free.  I will have all of the calculations in the
JavaScript so it should be easy to do it in any language.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of D'Arcy J.M.
Cain
Sent: Tuesday, June 19, 2007 12:32 PM
To: Francisco Reyes
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

On Tue, 19 Jun 2007 12:58:26 -0400
Francisco Reyes [EMAIL PROTECTED] wrote:
 Campbell, Lance writes:
  3) I suggested JavaScript because most people that get started with 
  PostgreSQL will go to the web in order to find out about issues
relating 
 
 Why not c?

Why not whatever, and install it on www.PostgreSQL.org?  Is there any
reason that this tool would need to be run on every installation?  Run
it on the site and it can always be up to date and can be written in
whatever language is easiest to maintain on the mother system.

I would also like to make a pitch for a JavaScript-free tool.  Just
collect all the pertinent information, work it out and display the
results in a second page.  Some people just don't like JavaScript and
turn it off even if their browser can run it.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



[PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
Below is a link to the HTML JavaScript configuration page I am creating:

 

http://www.webservices.uiuc.edu/postgresql/

 

I had many suggestions.  Based on the feedback I received, I put
together the initial list of questions.  This list of questions can be
changed.

 

Memory

There are many different ways to ask about memory.  Rather than ask a
series of questions I went with a single question, #2.  If it is better
to ask about the memory in a series of questions then please give me the
questions you would ask and why you would ask each of them.  From my
understanding the primary memory issue as it relates to PostgreSQL is
how much memory is available to PostgreSQL.  Remember that this needs
to be as simple as possible.

 

My next step is to list the PostgreSQL parameters found in the
postgresql.conf file and how I will generate their values based on the
questions I have so far.  I will primarily focus on PostgreSQL 8.2.x.
Once I have a consensus from everyone then I will put functionality
behind the Generate Suggested Settings button.

 

Thanks for all of the feedback, 

 

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread Campbell, Lance
Now I am at the difficult part: what parameters to calculate and how to
calculate them.  Everything below has to do with PostgreSQL version 8.2:

 

The parameters I would think we should calculate are:

max_connections

shared_buffers

work_mem

maintenance_work_mem

effective_cache_size

random_page_cost

 

Any other variables?  I am open to suggestions.

 

 

Calculations based on values supplied in the questions at the top of the
page:

 

max_connections = question #3, or a minimum of 8

 

effective_cache_size={question #2}MB

 

maintenance_work_mem= ({question #2} * .1) MB
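
For example (hypothetical inputs, just to show the arithmetic): if
question #2 is answered with 4096 (MB of memory available to PostgreSQL)
and question #3 with 100 connections, these formulas would give
max_connections = 100, effective_cache_size = 4096MB, and
maintenance_work_mem = 410MB (4096 * .1, rounded).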

 

Any thoughts on the other variables based on the questions found at the
top of the below web page?

 

http://www.webservices.uiuc.edu/postgresql/ 

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] Volunteer to build a configuration tool

2007-06-18 Thread Campbell, Lance
I am a Java Software architect, DBA, and project manager for the
University of Illinois, Department of Web Services.  We use PostgreSQL
to serve about 2 million pages of dynamic content a month; everything
from calendars, surveys, forms, discussion boards, RSS feeds, etc.  I am
really impressed with this tool.

 

The only major problem area I have found where PostgreSQL is really
lacking is in answering "what should my initial configuration settings
be?"  I realize that there are many elements that can impact a DBA's
specific database settings but it would be nice to have a configuration
tool that would get someone up and running better in the beginning.

 

This is my idea:

 

A JavaScript HTML page that would have some basic questions at the top:

1) How much memory do you have?

2) How many connections will be made to the database?

3) What operating system do you use?

4) Etc...

 

Next the person would press a button, "Generate", found below the
questions.  The JavaScript HTML page would then generate content for two
iframes at the bottom of the page.  One iframe would contain the
contents of the postgresql.conf file.  The postgresql.conf settings
would be tailored more to the individual's needs than the standard
default file.  The second iframe would contain the default settings one
should consider using with their operating system.

 

My web team would be very happy to develop this for the PostgreSQL
project.  A configuration tool like this would have saved us a lot of
time when we were getting started.  I am willing to make this a very
high priority for my team.

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] [DOCS] Volunteer to build a configuration tool

2007-06-18 Thread Campbell, Lance
Mario,
The JavaScript configuration tool I proposed would not be in the install
of PostgreSQL.  It would be an HTML page.  It would be part of the HTML
documentation or it could be a separate HTML page that would be linked
from the HTML documentation.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Mario Gonzalez [mailto:[EMAIL PROTECTED] 
Sent: Monday, June 18, 2007 10:16 AM
To: Campbell, Lance
Cc: [EMAIL PROTECTED]; pgsql-performance@postgresql.org
Subject: Re: [DOCS] Volunteer to build a configuration tool

On 18/06/07, Campbell, Lance [EMAIL PROTECTED] wrote:

 Next the person would press a button, generate, found below the
questions.
  The JavaScript HTML page would then generate content for two Iframes
at the
 bottom on the page.  One Iframe would contain the contents of the
 postgresql.conf file.  The postgresql.conf settings would be tailored
more
 to the individuals needs than the standard default file.  The second
Iframe
 would contain the default settings one should consider using with
their
 operating system.


  I think it could be a great help to newbies. IMVHO a bash script using
dialog could be better than a JavaScript file. There are many
administrators with no graphical browser or with JavaScript turned off.



-- 
http://www.advogato.org/person/mgonzalez/



[PERFORM] Recommended Initial Settings

2007-02-23 Thread Campbell, Lance
I would like to get someone's recommendations on the best initial
settings for a dedicated PostgreSQL server.  I do realize that there are
a lot of factors that influence how one should configure a database.  I
am just looking for a good starting point.  Ideally I would like the
database to reside as much as possible in memory with no disk access.
My current PostgreSQL 7.x database generates a 6 Gig file when doing a
database dump.

 

Dedicated PostgreSQL 8.2 Server

Redhat Linux 4.x AS 64 bit version (EM64T)

4 Intel Xeon Processors

20 Gig Memory

Current PostgreSQL database is 6 Gig file when doing a database dump

 

 

/etc/sysctl.conf  file settings:

 

# 11 Gig

kernel.shmmax = 11811160064

 

kernel.sem = 250 32000 100 128

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 262144  

net.core.rmem_max = 262144 

net.core.wmem_default = 262144

net.core.wmem_max = 262144  

 

 

postgresql.conf file settings (if not listed then I used the defaults):

 

max_connections = 300

shared_buffers = 10240MB

work_mem = 10MB

effective_cache_size = 512MB

maintenance_work_mem = 100MB

 

 

Any suggestions would be appreciated!

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Campbell, Lance
Richard,
Thanks for your reply.  

You said:
Your operating-system should be doing the caching for you.

My understanding is that as long as Linux has memory available it will
cache files.  Then from your comment I get the impression that since
Linux would be caching the data files for the postgres database it would
be redundant to have a large shared_buffers.  Did I understand you
correctly?

Thanks,



Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 23, 2007 10:29 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recommended Initial Settings

Campbell, Lance wrote:
 I would like to get someone's recommendations on the best initial
 settings for a dedicated PostgreSQL server.  I do realize that there
are
 a lot of factors that influence how one should configure a database.
I
 am just looking for a good starting point.  Ideally I would like the
 database to reside as much as possible in memory with no disk access.
 The current database size of my 7.x version of PostgreSQL generates a
6
 Gig file when doing a database dump.

Your operating-system should be doing the caching for you.

 Dedicated PostgreSQL 8.2 Server
 Redhat Linux 4.x AS 64 bit version (EM64T)
 4 Intel Xeon Processors

If these are older Xeons, check the mailing list archives for "xeon
context switch".

 20 Gig Memory
 Current PostgreSQL database is 6 Gig file when doing a database dump

OK, so it's plausible the whole thing will fit in RAM (as a 
rule-of-thumb I assume headers, indexes etc. triple or quadruple the 
size). To know better, check the actual disk-usage of $PGDATA.
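
One way to check that from inside the database rather than at the shell,
assuming the built-in size functions (available since 8.1):

-- Actual on-disk size of the current database (not the dump size)
SELECT pg_size_pretty(pg_database_size(current_database()));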

 /etc/sysctl.conf  file settings:
 
 # 11 Gig
 
 kernel.shmmax = 11811160064

Hmm - that's a lot of shared RAM. See shared_buffers below.

 kernel.sem = 250 32000 100 128
 
 net.ipv4.ip_local_port_range = 1024 65000
 
 net.core.rmem_default = 262144  
 
 net.core.rmem_max = 262144 
 
 net.core.wmem_default = 262144
 
 net.core.wmem_max = 262144  

 postgresql.conf file settings (if not listed then I used the
defaults):
 
 max_connections = 300

How many connections do you expect typically/peak? It doesn't cost much
to have max_connections set high but your workload is the most important
thing missing from your question.

 shared_buffers = 10240MB

For 7.x that's probably way too big, but 8.x organises its buffers
better. I'd still be tempted to start at 1 or 2GB and work up - see where
it stops buying you an improvement.

 work_mem = 10MB

If you have large queries doing big sorts, I'd increase this. Don't
forget it's per-sort, so if you have got about 300 connections live at
any one time that could be 300*10MB*N if they're all doing something
complicated. If you only have one connection live, you can increase this
quite substantially.

 effective_cache_size = 512MB

This isn't setting PG's memory usage, it's telling PG how much data your
operating-system is caching. Check free and see what it says. For you,
I'd expect 10GB+.

 maintenance_work_mem = 100MB

This is for admin-related tasks, so you could probably increase it.

Workload, workload, workload - we need to know what you're doing with it.
One connection summarising the entire database will want larger numbers
than 100 connections running many small queries.

HTH
-- 
   Richard Huxton
   Archonet Ltd



[PERFORM] work-mem

2007-01-28 Thread Campbell, Lance
I have been researching how to improve the overall performance of my
Postgres installation.  I am a little confused about how work_mem is
used in the postgresql.conf file.  The way I understand the
documentation, work_mem defines how much memory you want to allocate
per sort operation.  Couldn't you run out of memory?  This approach
seems kind of odd to me.  How do you tell the system not to allocate
too much memory if you are all of a sudden hit with a heavier number of
queries?
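
(Two things bear on this question: work_mem is an upper limit applied per
sort or hash operation rather than memory that is pre-allocated, and it
can be overridden for a single connection, so the server-wide value can
stay modest while one big job raises it just for itself.  A minimal
sketch, where big_table, some_column and the 256MB figure are purely
illustrative assumptions:

-- Raise work_mem for this session only, run the big sort, then go back
SET work_mem = '256MB';
SELECT * FROM big_table ORDER BY some_column;  -- the large sort gets the bigger allowance
RESET work_mem;                                -- back to the server-wide default
)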

 

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



[PERFORM] work-mem how do I identify the proper size

2007-01-28 Thread Campbell, Lance
If I set work_mem to a particular amount of memory, how do I answer the
following questions:

1)   How many of my queries were able to run inside the memory I
allocated for work_mem?

2)   How many of my queries had to run from disk because work_mem
was not set high enough?

3)   If a query had to go to disk in order to be sorted or completed,
is there a way to identify how much memory it would have taken in order
to run the query from memory?
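
(There is no direct counter for any of these in 8.2, but they can be
approximated per query.  In releases newer than 8.2 - 8.3 onward, if I
remember right - EXPLAIN ANALYZE reports the sort method, and a
log_temp_files setting can log every query that spills sort or temp files
to disk, which gets at question #2 across the whole workload.  A minimal
sketch, with big_table and some_column as hypothetical names:

EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_column;
-- Look for the Sort Method line in the output, e.g.:
--   Sort Method:  quicksort  Memory: 5124kB      (the sort fit within work_mem)
--   Sort Method:  external merge  Disk: 102400kB (it spilled; the Disk figure is a
--                                                 rough lower bound on the memory
--                                                 the sort would have wanted)
)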

 

Thanks for all of your help,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu