Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Ron

At 12:46 AM 12/28/2006, Guy Rouillier wrote:
I don't want to violate any license agreement by discussing 
performance, so I'll refer to a large, commercial 
PostgreSQL-compatible DBMS only as BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with 
PostgreSQL for at least some of our Java applications.  As a proof 
of concept, I started with a high-volume (but conceptually simple) 
network data collection application.  This application collects 
files of 5-minute usage statistics from our network devices, and 
stores a raw form of these stats into one table and a normalized 
form into a second table. We are currently storing about 12 million 
rows a day in the normalized table, and each month we start new 
tables.  For the normalized data, the app inserts rows initialized 
to zero for the entire current day first thing in the morning, then 
throughout the day as stats are received, executes updates against 
existing rows.  So the app has very high update activity.


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS 
and PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk 
arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, 
for those familiar with these devices.)  The arrays are set up with 
RAID5.  So I'm working with a consistent hardware platform for this 
comparison.  I'm only processing a small subset of files (144.)


BigDBMS processed this set of data in 20000 seconds, with all 
foreign keys in place.  With all foreign keys in place, PG took 
54000 seconds to complete the same job.  I've tried various 
approaches to autovacuum (none, 30-seconds) and it doesn't seem to 
make much difference.  What does seem to make a difference is 
eliminating all the foreign keys; in that configuration, PG takes 
about 30000 seconds.  Better, but BigDBMS still has it beat significantly.


If you are running pg with its default configuration, you are not 
getting pg's best performance.  Ditto if you are using data structures 
optimized for BigDBMS.


A= go through each query and see what work_mem needs to be for that 
query to be as RAM resident as possible.  If you have enough RAM, set 
work_mem that large for that query.  Remember that work_mem is =per 
query=, so queries running in parallel consume the sum of their work_mem settings.
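
For example, a rough sketch (the table, column, and 65536 KB figure here 
are only placeholders, not values from your application):

BEGIN;
SET LOCAL work_mem = 65536;     -- 64MB, in KB; applies only inside this transaction
EXPLAIN ANALYZE
SELECT device_id, sum(bytes_in)
FROM stats_normalized
GROUP BY device_id;
COMMIT;

SET LOCAL reverts automatically at COMMIT/ROLLBACK, so everything else 
keeps the server-wide setting.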


B= Make sure shared buffers is set reasonably.  A good rule of thumb 
for 8.x is that shared buffers should be at least ~1/4 your RAM.  If 
your E4500 is maxed with RAM, there's a good chance shared buffers 
should be considerably more than 1/4 of RAM.


C= What file system are you using?  Unlike BigDBMS, pg does not have 
its own native one, so you have to choose the one that best suits 
your needs.  For update heavy applications involving lots of small 
updates jfs and XFS should both be seriously considered.


D= Your table schema and physical table layout probably need to 
change.  What BigDBMS likes here is most likely different from what pg likes.


E= pg does not actually update records in place.  It appends new 
records to the table and marks the old version invalid.  This means 
that things like page size, RAID stripe size, etc. may need to 
have different values than they do for BigDBMS.  Another consequence 
is that pg likes RAID 10 even more than most of its competitors.


F= This may seem obvious, but how many of the foreign keys and other 
overhead do you actually need?  Get rid of the unnecessary ones.
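
For instance, if a constraint is only a problem during the big morning 
insert, the usual drop/re-add pattern looks roughly like this (constraint, 
table, and column names are placeholders):

ALTER TABLE stats_normalized DROP CONSTRAINT stats_device_fk;
-- ... run the bulk insert/update ...
ALTER TABLE stats_normalized
  ADD CONSTRAINT stats_device_fk
  FOREIGN KEY (device_id) REFERENCES devices (device_id);

Re-adding the constraint re-checks every existing row, so this only pays 
off when that one bulk check is cheaper than checking row by row all day.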


G= Bother the folks at Sun, like Josh Berkus, who know pg inside and 
out +and+ know your HW inside and out (or have access to those who 
do ;-) ).  I'll bet they'll have ideas I'm not thinking of.


H= Explain Analyze is your friend.  Slow queries may need better 
table statistics, or better SQL, or may be symptoms of issues "C" or 
"D" above or ...


I've got PG configured so that the system database is on disk 
array 2, as are the transaction log files.  The default table space 
for the test database is disk array 3.  I've got all the reference 
tables (the tables to which the foreign keys in the stats tables 
refer) on this array.  I also store the stats tables on this 
array.  Finally, I put the indexes for the stats tables on disk 
array 4.  I don't use disk array 1 because I believe it is a software array.
I= With 4 arrays of 12 HDs each, you definitely have enough spindles 
to place pg_xlog somewhere separate from all the other pg tables.  In 
addition, you should analyze your table access patterns and then 
scatter them across your 4 arrays in such a way as to minimize head 
contention.
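
A sketch of how tables and indexes can be pinned to specific arrays 
(mount points and names are placeholders; pg_xlog itself is normally 
relocated with a symlink while the server is stopped):

CREATE TABLESPACE stats_data  LOCATION '/array3/pgdata';
CREATE TABLESPACE stats_index LOCATION '/array4/pgindex';
ALTER TABLE stats_normalized      SET TABLESPACE stats_data;
ALTER INDEX stats_normalized_pkey SET TABLESPACE stats_index;

Moving an existing table this way rewrites it under an exclusive lock, so 
it is a maintenance-window job.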



I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.

Hope this helps,

Ron Peacetree 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Arnau

Hi all,

A= go through each query and see what work_mem needs to be for that 
query to be as RAM resident as possible.  If you have enough RAM, set 
work_mem for that query that large.  Remember that work_mem is =per 
query=, so queries running in parallel eat the sum of each of their 
work_mem's.


How can I know how much work_mem a query needs?

Regards
--
Arnau

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Backup/Restore too slow

2006-12-29 Thread Sebastián Baioni
Hi,
We have a database with one table of 10,000,000 tuples and 4 tables with 
5,000,000 tuples.
While in SQL Server it takes 3 minutes to restore this complete database, in 
PostgreSQL it takes more than 2 hours.
The backup takes 6 minutes in SQL Server and 13 minutes in PostgreSQL (which is not a problem).

We are running PostgreSQL 8.1 for Windows and we are using:
C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f 
"C:\BK\file.backup" base
and 
C:\pg_restore.exe  -i -h localhost -p 5432 -U usuario -d base -O -v 
"C:\BK\file.backup"

We use those parameters because we copied them  from PGAdminIII.

Is there any way to make it faster?

Thanks
 Sebastián

[PERFORM] What you would consider as heavy traffic?

2006-12-29 Thread [EMAIL PROTECTED]
Hey Everyone,

I am having a bit of trouble with a web host, and was wondering what
you would class as a high level of traffic (queries per second) to a
database on an average server running postgres in a shared hosting
environment (very modern servers).

Many Thanks in Advance,
Oliver


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] performance implications of binary placement

2006-12-29 Thread Bob Dusek
Hello all,

I've been running performance tests on various incantations of Postgres
on/off for a month or so.  And, I've just come across some unexpected
results.

When I start my Postgres build as such:

# (Scenario 1)

./configure --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
--includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
--mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
--disable-rpath --enable-thread-safety --enable-integer-datetimes
--without-python --without-perl --without-tcl --without-tk

It performs significantly worse than when I start my build like this:

# (Scenario 2)

./configure --disable-rpath --enable-thread-safety
--enable-integer-datetimes --without-python --without-perl --without-tcl
--without-tk

Note:  the only differences are that "Scenario 1" includes these
options:

--prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
--includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
--mandir=/usr/share/man --with-docdir=/usr/share/doc/packages

And, to be clear, "Scenario 1" performs worse than "Scenario 2".  Simple
insert statements are taking significantly longer. 

I did not expect to see a performance hit with these options, especially
since "/usr/" on the test machine is mounted as its own partition, and
in both cases, all of the binaries, include files, etc. are in that
partition. 

Has anyone seen this before?  Are hard drive mechanics the only thing in
play here?

The only difference I'm seeing in logging between the two versions is
that Scenario 2 has several of this message littered throughout the
logfile:

ERROR: could not open relation "pg_index_indexrelid_index": No such file
or directory

But, that doesn't seem to be affecting functionality or performance
(especially considering the fact that the logfile that contains that
message is part of the test that is performing better).

We're using Postgres 7.4.8, building from the SLES9 Postgres 7.4.8
source rpm. 

Thanks for any help you can provide.  I can provide more detail if
needed.

Thanks again,

Bob 


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Postgresql Configutation and overflow

2006-12-29 Thread Fabricio Peñuelas

Good day,

I have been reading about configuring PostgreSQL, but I have a 
server that does not give me the performance it should.  The tables are 
indexed and vacuumed regularly; I monitor with top, ps and 
pg_stat_activity, and when I checked it was slow without a heavy load average.


Previously, the server reached 2000 connections to PostgreSQL (with 
max_connections=3000 set for future growth).


I split the load with another server for better performance, and it now 
reaches 500 connections, but it is still overloaded.



My question is how much memory I should configure for shared_buffers 
and effective_cache_size.


Features:

- 4 Intel Xeon Dual 3.0 GHz processors
- 12 GB RAM
- 2 disks in RAID 1 for the OS
- 4 disks in RAID 5 for the DB
- OS: Slackware 11.0, Linux 2.6.17.7
- Postgres 8.1.4
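
Purely as arithmetic on the figures above (not a recommendation), 8.1 
counts shared_buffers and effective_cache_size in 8KB pages:

   shared_buffers = 81920 pages * 8KB = 640MB      (the current setting)
   1/4 of 12GB RAM = 3GB  ->  3GB / 8KB = 393216 pages
   effective_cache_size: if the OS cache typically holds ~8GB,
                         8GB / 8KB = 1048576 pages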


= On the Internet I found this:

Tuning PostgreSQL for performance
2 Some basic parameters
2.1 Shared buffers

#  Start at 4MB (512) for a workstation
# Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096)
# Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768)
==


My postgresql.conf configuration is:

#---
# FILE LOCATIONS
#---

# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'   # write an extra pid file


#---
# CONNECTIONS AND AUTHENTICATION
#---

# - Connection Settings -

listen_addresses = '*'  # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
port = 5432
max_connections = 3000
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#bonjour_name = ''  # defaults to the computer name



#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -


shared_buffers = 81920  # min 16 or max_connections*2, 8KB each
temp_buffers = 5000 # min 100, 8KB each
max_prepared_transactions = 1000# can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10240# min 64, size in KB
maintenance_work_mem = 253952   # min 1024, size in KB
max_stack_depth = 4096  # min 100, size in KB

# - Free Space Map -

#max_fsm_pages = 20000  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1   # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200    # 0-10000 credits

# - Background writer -

#bgwriter_delay = 200   # 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5  # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5  # 0-1000 buffers max written/round


#---
# WRITE AHEAD LOG
#---

# - Settings -

#fsync = on # turns forced synchronization on or off
#wal_sync_method = fsync# the default is the first option
# supported by the operating system:
#   open_datasync
# 

Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Rod Taylor
Rebuilding the indexes or integrity confirmations are probably taking  
most of the time.


What is your work_mem setting?

On 22-Dec-06, at 9:32 AM, Sebastián Baioni wrote:


Hi,
We have a database with one table of 10,000,000 tuples and 4 tables  
with 5,000,000 tuples.
While in SQL Server it takes 3 minutes to restore this complete  
database, in PostgreSQL it takes more than 2 hours.
The Backup takes 6 minutes in SQLServer and 13 minutes in PostgreSQL 
(which is not a problem)


We are running PostgreSQL 8.1 for Windows and we are using:
C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f "C:\BK\file.backup" base

and
C:\pg_restore.exe  -i -h localhost -p 5432 -U usuario -d base -O -v  
"C:\BK\file.backup"


We use those parameters because we copied them  from PGAdminIII.

Is there any way to make it faster?

Thanks
 Sebastián




Re: [PERFORM] What you would consider as heavy traffic?

2006-12-29 Thread Rod Taylor
Depends on what the query is. If the queries take 3 to 5 days to  
execute, then 1 query per day on a 4 CPU machine would be at capacity.


On 23-Dec-06, at 3:12 AM, [EMAIL PROTECTED] wrote:


Hey Everyone,

I am having a bit of trouble with a web host, and was wondering what
you would class as a high level of traffic (queries per second) to a
database on an average server running postgres in a shared hosting
environment (very modern servers).

Many Thanks in Advance,
Oliver


---(end of broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Tom Lane
Rod Taylor <[EMAIL PROTECTED]> writes:
> Rebuilding the indexes or integrity confirmations are probably taking
> most of the time.

> What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually.  I concur that
bumping it up might help.
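
One way to raise it just for the restore, without editing postgresql.conf,
is a per-database setting (a sketch; "base" is the database name from the
pg_restore command, and the value is in KB on 8.1):

ALTER DATABASE base SET maintenance_work_mem = 262144;   -- 256MB for new sessions
-- run pg_restore, then optionally put it back:
ALTER DATABASE base RESET maintenance_work_mem;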

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Need Help

2006-12-29 Thread JM
How can I get the disk usage for each table? Can I do it via SQL?
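
On 8.1 or later, something along these lines should do it (a sketch; it 
reports per-table size including indexes and TOAST, system catalogs included):

SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY pg_total_relation_size(oid) DESC;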



Thanks,

Mailing-Lists

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Sebastián Baioni
Thanks for answering.
This is my configuration:
# - Memory -

shared_buffers = 1000           # min 16, at least max_connections*2, 8KB each
#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

The PC where we are running the PostgreSQL server is:
AMD Athlon(tm) 64 Processor
3000+
1.79 GHz, 1.93 GB RAM
with Windows XP Professional, Version 2002, Service Pack 2.

How should we set it?

Thanks a lot!
 Sebastián

Tom Lane <[EMAIL PROTECTED]> wrote: Rod Taylor writes:
> Rebuilding the indexes or integrity confirmations are probably taking
> most of the time.

> What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually.  I concur that
bumping it up might help.

   regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] URGENT: Out of disk space pg_xlog

2006-12-29 Thread Simon Riggs
On Sat, 2006-12-23 at 13:13 -0500, Bruce Momjian wrote:

> The bottom line is that we know of no cases where a long-running
> transaction would delay recycling of the WAL files, so there is
> certainly something not understood here.

We can see from all of this that a checkpoint definitely didn't occur.
Tom's causal chain was just one way that could have happened, there
could well be others.

I've noticed previously that a checkpoint can be starved out when trying
to acquire the CheckpointStartLock. I've witnessed delays of two minutes
or more in obtaining the lock in the face of heavy transaction traffic.

If wal_buffers is small enough, WAL write rate high enough and the
transaction rate high enough, a long queue can form for the
WALWriteLock, which ensures that the CheckpointStartLock would queue
indefinitely. 

I've tried implementing a queueable shared lock for the
CheckpointStartLock. That helps the checkpoint, but it harms performance
of other transactions waiting to commit, so I let that idea go.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Backup/Restore too slow

2006-12-29 Thread Russell Smith

Sebastián Baioni wrote:

Thanks for answering.
This is my configuration:
# - Memory -

shared_buffers = 1000# min 16, at least max_connections*2, 8KB 
each

#work_mem = 1024# min 64, size in KB
#maintenance_work_mem = 16384# min 1024, size in KB
#max_stack_depth = 2048# min 100, size in KB

The PC where we are runing PostgreSQL server is:
AMD Athlon(tm) 64 Processor
3000+
1.79 GHz, 1.93 GB RAM
with WindowsXP Proffesional, Version 2002 Service Pack 2.

How should we set it?

Shared buffers even on a workstation should be higher than 1000 if you 
want some performance.  It depends how much memory you have spare to use 
for PostgreSQL.  But something like

shared_buffers = 20000
maintenance_work_mem = 256000

Will certainly give you a performance boost.  You will have to adjust 
those figures based on whatever else you are doing on the machine.


Russell Smith.


Thanks a lot!
 Sebastián

Tom Lane <[EMAIL PROTECTED]> wrote:

Rod Taylor writes:
> Rebuilding the indexes or integrity confirmations are probably taking
> most of the time.

> What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually. I concur that
bumping it up might help.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Alvaro Herrera
Ron wrote:

> C= What file system are you using?  Unlike BigDBMS, pg does not have 
> its own native one, so you have to choose the one that best suits 
> your needs.  For update heavy applications involving lots of small 
> updates jfs and XFS should both be seriously considered.

Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Jeff Frost

On Fri, 29 Dec 2006, Alvaro Herrera wrote:


Ron wrote:


C= What file system are you using?  Unlike BigDBMS, pg does not have
its own native one, so you have to choose the one that best suits
your needs.  For update heavy applications involving lots of small
updates jfs and XFS should both be seriously considered.


Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.


When benchmarking various options for a new PG server at one of my clients, I 
tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be fastest 
to have ext2 for the WAL.  The winning time was 157m46.713s for ext2, 
159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 
data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 1GB 
BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk RAID1 
for those of you who have been wondering if the BBU write back cache mitigates 
the need for separate WAL (at least on this workload).  Those are the fastest 
times for each config, but ext2 WAL was always faster than the other two 
options.  I didn't test any other filesystems in this go around.


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Alex Turner

You should search the archives for Luke Lonergan's posting about how IO in
Postgresql is significantly bottlenecked because it's not async.  A 12 disk
array is going to max out Postgresql's max theoretical write capacity to
disk, and therefore BigRDBMS is always going to win in such a config.  You
can also look towards Bizgres, which allegedly eliminates some of these
problems, and is cheaper than most BigRDBMS products.

Alex.

On 12/28/06, Guy Rouillier <[EMAIL PROTECTED]> wrote:


I don't want to violate any license agreement by discussing performance,
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL
for at least some of our Java applications.  As a proof of concept, I
started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5-minute
usage statistics from our network devices, and stores a raw form of
these stats into one table and a normalized form into a second table.
We are currently storing about 12 million rows a day in the normalized
table, and each month we start new tables.  For the normalized data, the
app inserts rows initialized to zero for the entire current day first
thing in the morning, then throughout the day as stats are received,
executes updates against existing rows.  So the app has very high update
activity.

In my test environment, I have a dual-x86 Linux platform running the
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those
familiar with these devices.)  The arrays are set up with RAID5.  So I'm
working with a consistent hardware platform for this comparison.  I'm
only processing a small subset of files (144.)

BigDBMS processed this set of data in 20000 seconds, with all foreign
keys in place.  With all foreign keys in place, PG took 54000 seconds to
complete the same job.  I've tried various approaches to autovacuum
(none, 30-seconds) and it doesn't seem to make much difference.  What
does seem to make a difference is eliminating all the foreign keys; in
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS
still has it beat significantly.

I've got PG configured so that the system database is on disk array
2, as are the transaction log files.  The default table space for the
test database is disk array 3.  I've got all the reference tables (the
tables to which the foreign keys in the stats tables refer) on this
array.  I also store the stats tables on this array.  Finally, I put the
indexes for the stats tables on disk array 4.  I don't use disk array 1
because I believe it is a software array.

I'm out of ideas how to improve this picture any further.  I'd
appreciate some suggestions.  Thanks.

--
Guy Rouillier


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings