Re: [PERFORM] What's better: RAID 0 or disk for separate pg_xlog

2005-03-11 Thread Richard Huxton
Karim Nassar wrote:
Thanks to all for the tips.
On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote:
How critical is your data? How update-heavy versus read-heavy are you, etc.?

Large, relatively infrequent uploads, with frequent reads. The
application is a web front-end to scientific research data. The
scientists have their own copy of the data, so if something went really
bad, we could probably get them to upload again.
If you have very few updates and your reads aren't mostly from RAM you 
could be better off with simply mirroring (assuming that gains you read 
bandwidth). Failing that, use the tablespace feature to balance your 
read load as far as you can.
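
For example, a minimal sketch of the tablespace approach (8.0 syntax; the
path and table name are illustrative):

  CREATE TABLESPACE fastdisk LOCATION '/mnt/disk2/pgdata';
  ALTER TABLE observations SET TABLESPACE fastdisk;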

--
  Richard Huxton
  Archonet Ltd


[PERFORM] Statistics not working??

2005-03-11 Thread Hugo Ferreira
Hi there!

I think I may have a problem with the statistics in my postgresql 8.0
running under Windows XP. When I view both pg_stat_all_tables and
pg_stat_all_indexes, all the numeric columns that should hold the
statistics are 0 (zero). My configuration file has the following:

stats_start_collector = true
stats_command_string = true
stats_reset_on_server_start = false

Any tip?

Thanks in advance,

Hugo Ferreira

-- 
GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85



Re: [PERFORM] Statistics not working??

2005-03-11 Thread Oleg Bartunov
On Fri, 11 Mar 2005, Hugo Ferreira wrote:
Hi there!
I think I may have a problem with the statistics in my postgresql 8.0
running under Windows XP. When I view both pg_stat_all_tables and
pg_stat_all_indexes, all the numeric columns that should hold the
statistics are 0 (zero). My configuration file has the following:
stats_start_collector = true
stats_command_string = true
stats_reset_on_server_start = false
Any tip?
You need to define stats_block_level and/or stats_row_level
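
A minimal postgresql.conf sketch with those collectors enabled (8.0
parameter names; they take effect on reload):

  stats_start_collector = true
  stats_command_string = true
  stats_block_level = true
  stats_row_level = true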

Thanks in advance,
Hugo Ferreira

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] [GENERAL] more execution time

2005-03-11 Thread Richard Huxton
ALİ ÇELİK wrote:
Why does this query need more time? It's very slow.
Difficult to say for sure - could you provide the output of EXPLAIN 
ANALYSE rather than just EXPLAIN?

Some other immediate observations:
1. Perhaps don't post to so many mailing lists at once. If you reply to 
this, maybe reduce it to pgsql-performance?
2. You don't say whether the row estimates are accurate in the EXPLAIN.
3. You seem to be needlessly coalescing personaldetails.masterid since 
you check for it being null in your WHERE clause
4. Do you really need to cast to numeric and generate a "sorting" column 
that you then don't ORDER BY?
5. Is ppid an id number? And are you sure it's safe to calculate it like 
that?
6. What is balance() and how long does it take to calculate its result?

select
  coalesce(personaldetails.masterid::numeric,personaldetails.id) +
(coalesce(personaldetails.id::numeric,0)/100) as sorting,
  floor(coalesce(personaldetails.masterid::numeric,personaldetails.id) +
(coalesce(personaldetails.id::numeric,0)/100)) as ppid,

  balance('MASTER-REGISTRATION',personaldetails.id) as balance,

  balance('MASTER-REGISTRATION',pd2.id) as accbalance,
I'm guessing point 6 is actually your problem - try it without the calls 
to balance() and see what that does to your timings.
--
  Richard Huxton
  Archonet Ltd



[PERFORM] What is the number of rows in explain?

2005-03-11 Thread Joost Kraaijeveld
Hi all,

Is the number of rows in explain the number of rows that is expected to be 
visited or retrieved?

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl 



Re: [PERFORM] What is the number of rows in explain?

2005-03-11 Thread John A Meinel
Joost Kraaijeveld wrote:
Hi all,
Is the number of rows in explain the number of rows that is expected to be 
visited or retrieved?
Groeten,
Joost Kraaijeveld

In general, it is the number of rows expected to be retrieved. A
Sequential Scan always visits every row, but the rows= number is the
estimate after filtering.
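
For illustration, in a hypothetical plan like this the seq scan visits
every row of the table, while rows=420 is the planner's estimate of how
many rows survive the filter (table, figures, and output are illustrative):

  EXPLAIN SELECT * FROM orders WHERE status = 'open';
                          QUERY PLAN
  -----------------------------------------------------------
   Seq Scan on orders  (cost=0.00..1693.00 rows=420 width=96)
     Filter: (status = 'open'::text)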

John
=:->




[PERFORM] Performance tuning

2005-03-11 Thread Jacques Caron
Hi all,
I'm preparing a set of servers which will eventually need to handle a high 
volume of queries (both reads and writes, but most reads are very simple 
index-based queries returning a limited set of rows, when not just one), 
and I would like to optimize things as much as possible, so I have a few 
questions on the exact way PostgreSQL's MVCC works, and how transactions, 
updates and vacuuming interact. I hope someone will be able to point me in 
the right direction (feel free to give pointers if I missed the places 
where this is described).

From what I understand (and testing confirms it), bundling many queries in 
one single transaction is more efficient than having each query be a 
separate transaction (like with autocommit on). However, I wonder about the 
limits of this:

- are there any drawbacks to grouping hundreds or thousands of queries 
(inserts/updates) over several minutes in one single transaction? Other 
than the fact that the inserts/updates will not be visible until committed, 
of course. Essentially turning autocommit off, and doing a commit once in a 
while.

- does this apply only to inserts/updates or also to selects? 
Another way to put this is: does a transaction with only one select 
actually have much transaction-related work to do? Or, does a transaction 
with only selects actually have any impact anywhere? Does it really leave a 
trace anywhere? Again, I understand that selects grouped in a transaction 
will not see updates done after the start of the transaction (unless done 
by the same process).

- if during a single transaction several UPDATEs affect the same row, will 
MVCC generate as many row versions as there are updates (like would be the 
case with autocommit) or will they be grouped into one single row version?

Another related issue is that many of the tables are indexed on a date 
field, and one process does a lot of updates on "recent" rows (which lead 
to many dead tuples), but after that "older" rows tend to remain pretty 
much unchanged for quite a while. Other than splitting the tables into 
"old" and "recent" tables, is there any way to make vacuum more efficient? 
Scanning the whole table for dead tuples when only a small portion of the 
table actually has any does not feel very efficient in this situation.

Other issue: every five minutes or so, I see a noticeable performance drop 
as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy hardware, I know 
8.0 with decent hardware and separate disk(s) for pg_xlog will definitely 
help, but I really wonder if there is any way to reduce the amount of work 
that needs to be done at that point (I'm a strong believer in fixing 
software before hardware). I have already bumped checkpoint_segments to 8, 
but I'm not quite sure I understand how this helps (or doesn't help) 
things. Logs show 3 to 6 "recycled transaction log file" lines at that 
time, which seems like quite a lot of work for a load that's still pretty low. 
Does grouping more queries into transactions help with this? Are there 
other parameters that can affect things, or is it just a matter of how many 
inserts/updates/deletes are done, and the amount of data that was changed?
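
For reference, the knobs involved (7.4 names; the 7.4 defaults are
checkpoint_segments = 3 and checkpoint_timeout = 300, and the first value
below reflects the bump mentioned above):

  checkpoint_segments = 8    # WAL segments (16 MB each) between checkpoints
  checkpoint_timeout = 300   # force a checkpoint at least this often (seconds)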

Last point: some of the servers have expendable data (and will be 
replicated with Slony-I) and will run with fsync off. I have read 
conflicting statements as to what exactly this does: some sources indicate 
that setting fsync off actually switches off WAL/checkpointing, others that 
it just prevents the fsync (or equivalent) system calls. Since I still see 
checkpointing in that case, I guess it's not exactly the former, but I 
would love to understand more about it. Really, I would love to be able to 
set some tables or databases to "go as fast as you can and don't worry 
about transactions, MVCC or anything like that", but I'm not sure that 
option exists...

Thanks,
Jacques.



[PERFORM] Query performance

2005-03-11 Thread Lou O'Quin

As a test, I ran a query in the pgAdmin query tool, which returns about 15K
records from a PostgreSQL v8.0.1 table on my Win2K server. I ran the same
query from the local server, from another PC on the same 100 Mbit local
network, and from a PC on a different network, over the internet. The times
for the query to run and the data to return for each of the three locations
are shown here:

Local server:  571 + 521 ms
Local network: 1187 + 1266 ms
Internet:      14579 + 4016 ms

My question is this: why does the execution time for the query to run
increase so much? Since the query should be running on the server, its time
should be somewhat independent of the network transport delay (unlike the
data transport time). However, it appears to actually be hypersensitive to
the transport delay. The ratios of time for the data transport (assuming 1
for the local server) are:

1 : 2.43 : 7.71

whereas the query execution time ratios are:

1 : 2.08 : 25.5  (!!!)

Obviously, the transport times will be greater. But why does the execution
time bloat so?


Re: [PERFORM] Query performance

2005-03-11 Thread Tom Lane
"Lou O'Quin" <[EMAIL PROTECTED]> writes:
> it appears to actually be hypersensitive to the transport delay. The
> ratios of time for the data transport (assuming 1 for the local server)
> are:
> 1 : 2.43 : 7.71

> whereas the query execution time ratios are:
> 1 : 2.08 : 25.5  (!!!)

How do you know that's what the data transport time is --- ie, how can
you measure that separately from the total query time?

regards, tom lane



Re: [PERFORM] Query performance

2005-03-11 Thread Lou O'Quin


Hi Tom.  I referenced the status line of pgAdmin.  Per the pgAdmin help file:
 
"The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page."
 
Lou


[PERFORM] Questions about 2 databases.

2005-03-11 Thread jelle
Hello All,
I have a couple of questions about running 2 databases:
1) on a single 7.4.6 postgres instance does each database have its own WAL
   file or is that shared? Is it the same on 8.0.x?
2) what's the high performance way of moving 200 rows between similar
   tables on different databases? Does it matter if the databases are
   on the same or separate postgres instances?
Background:
My web app does lots of inserts that aren't read until a session is 
complete. The plan is to put the heavy insert session onto a ramdisk based 
pg-db and transfer the relevant data to the master pg-db upon session 
completion. Currently running 7.4.6.

Individual session data is not as critical as the master pg-db so the risk 
associated with running the session pg-db on a ramdisk is acceptable. 
All this is to get past the I/O bottleneck, already tweaked the config 
files, run on multiple RAID-1 spindles, profiled the queries, maxed 
the CPU/ram. Migrating to 64bit fedora soon.

Thanks, this mailing list has been invaluable.
Jelle


Re: [PERFORM] Query performance

2005-03-11 Thread Tom Lane
"Lou O'Quin" <[EMAIL PROTECTED]> writes:
> Hi Tom.  I referenced the status line of pgAdmin.  Per the pgAdmin help
> file:
>
> "The status line will show how long the last query took to complete. If a
> dataset was returned, not only the elapsed time for server execution is
> displayed, but also the time to retrieve the data from the server to the
> Data Output page."

Well, you should probably ask the pgadmin boys exactly what they are
measuring.  In any case, the Postgres server overlaps query execution
with result sending, so I don't think it's possible to get a pure
measurement of just one of those costs --- certainly not by looking at
it only from the client end.

BTW, one factor to consider is that if the test client machines weren't
all the same speed, that would have some impact on their ability to
absorb 15K records ...

regards, tom lane



Re: [PERFORM] Questions about 2 databases.

2005-03-11 Thread Tom Lane
jelle <[EMAIL PROTECTED]> writes:
> 1) on a single 7.4.6 postgres instance does each database have its own WAL
> file or is that shared? Is it the same on 8.0.x?

Shared.

> 2) what's the high performance way of moving 200 rows between similar
> tables on different databases? Does it matter if the databases are
> on the same or separate postgres instances?

COPY would be my recommendation.  For a no-programming-effort solution
you could just pipe the output of pg_dump --data-only -t mytable
into psql.  Not sure if it's worth developing a custom application to
replace that.
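
Spelled out, that pipe is simply (database names are placeholders):

  pg_dump --data-only -t mytable sessiondb | psql masterdb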

> My web app does lots of inserts that aren't read until a session is 
> complete. The plan is to put the heavy insert session onto a ramdisk based 
> pg-db and transfer the relevant data to the master pg-db upon session 
> completion. Currently running 7.4.6.

Unless you have a large proportion of sessions that are abandoned and
hence never need be transferred to the main database at all, this seems
like a dead waste of effort :-(.  The work to put the data into the main
database isn't lessened at all; you've just added extra work to manage
the buffer database.

regards, tom lane



Re: [PERFORM] Query performance

2005-03-11 Thread Lou O'Quin


I'll post there concerning how they determine the query execution time vs. data retrieval time.
 
I did think about the processor/memory when choosing the machines - all three of the processors are similar.  All are Pentium P4s with 512 MB memory.
the server is Win2K, P4, 2.3 GHz
the local network client is WinXP Pro, P4, 2.2 GHz
the remote network client is WinXP Pro, P4, 1.9 GHz
 
Lou


Re: [PERFORM] Questions about 2 databases.

2005-03-11 Thread Richard_D_Levine
> this seems
> like a dead waste of effort :-(.  The work to put the data into the main
> database isn't lessened at all; you've just added extra work to manage
> the buffer database.

True from the viewpoint of the server, but not from the throughput in the
client session (client viewpoint). The client will have a blazingly fast
session with the buffer database. I'm assuming the buffer database table
size is zero or very small. Constraints will be a problem if there are
PKs or FKs that need to be satisfied on the server that are not adequately
testable in the buffer. It might not be a problem if the full table fits
on the RAM disk, but you still have to worry about two clients inserting
the same PK.

Rick



 


[PERFORM] Postgres on RAID5

2005-03-11 Thread Arshavir Grigorian
Hi,
I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
an Ext3 filesystem which is used by Postgres. Currently we are loading a
50G database on this server from a Postgres dump (copy, not insert) and
are experiencing very slow write performance (35 records per second).
Top shows that the Postgres process (postmaster) is being constantly put
into D state for extended periods of time (2-3 seconds) which I assume
is because it's waiting for disk io. I have just started gathering
system statistics and here is what sar -b shows: (this is while the db
is being loaded - pg_restore)
                tps     rtps     wtps   bread/s   bwrtn/s
01:35:01 PM  275.77    76.12   199.66    709.59   2315.23
01:45:01 PM  287.25    75.56   211.69    706.52   2413.06
01:55:01 PM  281.73    76.35   205.37    711.84   2389.86
02:05:01 PM  282.83    76.14   206.69    720.85   2418.51
02:15:01 PM  284.07    76.15   207.92    707.38   2443.60
02:25:01 PM  265.46    75.91   189.55    708.87   2089.21
02:35:01 PM  285.21    76.02   209.19    709.58   2446.46
Average:     280.33    76.04   204.30    710.66   2359.47
This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
compiled kernel. Postgres is installed from the Debian package and uses
all the configuration defaults.
I am also copying the pgsql-performance list.
Thanks in advance for any advice/pointers.
Arshavir
Following is some other info that might be helpful.
/proc/scsi# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
  Creation Time : Wed Feb 23 17:23:41 2005
 Raid Level : raid5
 Array Size : 123823616 (118.09 GiB 126.80 GB)
Device Size : 8844544 (8.43 GiB 9.06 GB)
   Raid Devices : 15
  Total Devices : 17
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Thu Feb 24 10:05:38 2005
  State : active
 Active Devices : 15
Working Devices : 16
 Failed Devices : 1
  Spare Devices : 1
 Layout : left-symmetric
 Chunk Size : 64K
   UUID : 81ae2c97:06fa4f4d:87bfc6c9:2ee516df
 Events : 0.8
   Number   Major   Minor   RaidDevice State
      0       8       64        0      active sync   /dev/sde
      1       8       80        1      active sync   /dev/sdf
      2       8       96        2      active sync   /dev/sdg
      3       8      112        3      active sync   /dev/sdh
      4       8      128        4      active sync   /dev/sdi
      5       8      144        5      active sync   /dev/sdj
      6       8      160        6      active sync   /dev/sdk
      7       8      176        7      active sync   /dev/sdl
      8       8      192        8      active sync   /dev/sdm
      9       8      208        9      active sync   /dev/sdn
     10       8      224       10      active sync   /dev/sdo
     11       8      240       11      active sync   /dev/sdp
     12      65        0       12      active sync   /dev/sdq
     13      65       16       13      active sync   /dev/sdr
     14      65       32       14      active sync   /dev/sds
     15      65       48       15      spare         /dev/sdt
# dumpe2fs -h /dev/md1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          1bb95bd6-94c7-4344-adf2-8414cadae6fc
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:  has_journal dir_index needs_recovery large_file
Default mount options:(none)
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
Inode count:  15482880
Block count:  30955904
Reserved block count: 1547795
Free blocks:  28767226
Free inodes:  15482502
First block:  0
Block size:   4096
Fragment size:    4096
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 16384
Inode blocks per group:   512
Filesystem created:   Wed Feb 23 17:27:13 2005
Last mount time:  Wed Feb 23 17:45:25 2005
Last write time:  Wed Feb 23 17:45:25 2005
Mount count:  2
Maximum mount count:  28
Last checked: Wed Feb 23 17:27:13 2005
Check interval:   15552000 (6 months)
Next check after: Mon Aug 22 18:27:13 2005
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   128
Journal inode:8
Default directory hash:   tea
Directory Hash Seed:  c35c0226-3b52-4dad-b102-f22feb773592
Journal backup:   inode blocks
# lspci | grep SCSI
0000:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

Re: [PERFORM] Questions about 2 databases.

2005-03-11 Thread jelle
On Fri, 11 Mar 2005, Tom Lane wrote:
[ snip ]
COPY would be my recommendation.  For a no-programming-effort solution
you could just pipe the output of pg_dump --data-only -t mytable
into psql.  Not sure if it's worth developing a custom application to
replace that.
I'm a programming-effort kind of guy so I'll try COPY.

My web app does lots of inserts that aren't read until a session is
complete. The plan is to put the heavy insert session onto a ramdisk based
pg-db and transfer the relevant data to the master pg-db upon session
completion. Currently running 7.4.6.
Unless you have a large proportion of sessions that are abandoned and
hence never need be transferred to the main database at all, this seems
like a dead waste of effort :-(.  The work to put the data into the main
database isn't lessened at all; you've just added extra work to manage
the buffer database.
The insert-heavy sessions average 175 page hits generating XML, and 1000 
insert/updates which comprise 90% of the insert/update load, of which 200 
inserts need to be transferred to the master db. The other sessions are 
read/cache bound. I'm hoping to get a speed-up from moving the temporary 
stuff off the master db and using 1 transaction instead of 175 to the 
disk-based master db.

Thanks,
Jelle


Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Tom Lane
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).

What PG version is this?  What version of pg_dump made the dump file?
How are you measuring that write rate (seeing that pg_restore doesn't
provide any such info)?

> Postgres is installed from the Debian package and uses
> all the configuration defaults.

The defaults are made for a fairly small machine, not big iron.  At a
minimum you want to kick shared_buffers up to 10K or more.
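
For example (postgresql.conf; 10K buffers at 8 kB apiece is roughly 80 MB
of shared memory):

  shared_buffers = 10000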

regards, tom lane



Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Arshavir Grigorian
Tom Lane wrote:
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
an Ext3 filesystem which is used by Postgres. Currently we are loading a
50G database on this server from a Postgres dump (copy, not insert) and
are experiencing very slow write performance (35 records per second).

What PG version is this?  What version of pg_dump made the dump file?
How are you measuring that write rate (seeing that pg_restore doesn't
provide any such info)?
Sorry I missed the version. Both (the db from which the dump was created 
and the one it's being loaded on) run on Pg 7.4.

Well, if the restore is going on for X number of hours and you have Y 
records loaded, it's not hard to ballpark.


Postgres is installed from the Debian package and uses
all the configuration defaults.

The defaults are made for a fairly small machine, not big iron.  At a
minimum you want to kick shared_buffers up to 10K or more.
			regards, tom lane
Will do. Thanks.
Arshavir


Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Alvaro Herrera
On Fri, Mar 11, 2005 at 05:29:11PM -0500, Arshavir Grigorian wrote:
> Tom Lane wrote:

> >The defaults are made for a fairly small machine, not big iron.  At a
> >minimum you want to kick shared_buffers up to 10K or more.
> >
> Will do. Thanks.

Also, it may help to bump up sort_mem while doing [the CREATE
INDEX part of] the restore.
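
For example (postgresql.conf; sort_mem is in kB, so this is 64 MB per sort
or index build; the value is illustrative, revert it after the restore):

  sort_mem = 65536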

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)



Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Tom Lane
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> How are you measuring that write rate (seeing that pg_restore doesn't
>> provide any such info)?

> Well, if the restore is going on for X number of hours and you have Y 
> records loaded, it's not hard to ballpark.

Yeah, but how do you know that you have Y records loaded?

What I'm trying to get at is what the restore is actually spending its
time on.  It seems unlikely that a COPY per se would run that slowly;
far more likely that the expense is involved with index construction
or foreign key verification.  You could possibly determine what's what
by watching the backend process with "ps" to see what statement type
it's executing most of the time.
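
For example (output is illustrative; the current statement type appears in
the process title):

  $ ps aux | grep postgres:
  postgres  4321  ...  postgres: appuser mydb [local] COPY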

BTW, is this a full database restore (schema + data), or are you trying
to load data into pre-existing tables?  The latter is generally a whole
lot slower because both index updates and foreign key checks have to be
done retail instead of wholesale.  There are various ways of working
around that but you have to be aware of what you're doing.

Also, if it is indexing that's eating the time, boosting the sort_mem
setting for the server would help a lot.

regards, tom lane



Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Arshavir Grigorian
Many thanks for all the response.
I guess there are a lot of things to change and tweak and I wonder what 
would be a good benchmarking sample dataset (size, contents).

My tables are very large (the smallest is 7+ mil records) and take 
several days to load (if not weeks). It would be nice to have a sample 
dataset that would be large enough to mimic my large datasets, but small 
enough to load in a short period of time. Any suggestions?

Arshavir


Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Josh Berkus
A,

> This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
> It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
> compiled kernel. Postgres is installed from the Debian package and uses
> all the configuration defaults.

Please read http://www.powerpostgresql.com/PerfList

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Questions about 2 databases.

2005-03-11 Thread PFC

My web app does lots of inserts that aren't read until a session is  
complete. The plan is to put the heavy insert session onto a ramdisk  
based pg-db and transfer the relevant data to the master pg-db upon  
session completion. Currently running 7.4.6.
From what you say I'd think you want to avoid making one write
transaction to the main database on each page view, right?

You could simply store the data in a file, and at the end of the session,
read the file and do all the writes in one transaction.
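
A minimal sketch of that replay step in SQL (table and values are
illustrative):

  BEGIN;
  INSERT INTO session_log (session_id, page, payload) VALUES (42, 1, '...');
  -- one INSERT per row buffered in the file
  COMMIT;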



Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread PFC
Look for the possibility that a foreign key check might not be using an
index. This would yield a seq scan for each insertion, which might be your
problem.
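
For example, the FK check effectively runs a lookup like this for every
inserted row; EXPLAINing it shows whether an index (rather than a seq scan)
is used (table and column names are illustrative):

  EXPLAIN SELECT 1 FROM customers WHERE customer_id = 42;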

On Fri, 11 Mar 2005 19:22:56 -0500, Arshavir Grigorian <[EMAIL PROTECTED]>  
wrote:

Many thanks for all the response.
I guess there are a lot of things to change and tweak and I wonder what  
would be a good benchmarking sample dataset (size, contents).

My tables are very large (the smallest is 7+ mil records) and take  
several days to load (if not weeks). It would be nice to have a sample  
dataset that would be large enough to mimic my large datasets, but small  
enough to load in a short period of time. Any suggestions?

Arshavir


Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Alex Turner
I would recommend running a bonnie++ benchmark on your array to see if
it's the array/controller/RAID being crap, or whether it's Postgres. I
have had some very surprising results from arrays that theoretically
should be fast, but turned out to be very slow.

I would also seriously have to recommend against a 14-drive RAID 5!
This is statistically as likely to fail as a 7-drive RAID 0 (not
counting the spare, but rebuilding onto a spare is very hard on existing
drives).
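
For example, an invocation along these lines (directory, size, and user are
illustrative; the size, in MB, should exceed RAM so the page cache doesn't
flatter the numbers):

  bonnie++ -d /mnt/raid/tmp -s 4096 -u postgres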

Alex Turner
netEconomist


On Fri, 11 Mar 2005 16:13:05 -0500, Arshavir Grigorian <[EMAIL PROTECTED]> 
wrote:
> Hi,
> 
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).
> 
> Top shows that the Postgres process (postmaster) is being constantly put
> into D state for extended periods of time (2-3 seconds) which I assume
> is because it's waiting for disk io. I have just started gathering
> system statistics and here is what sar -b shows: (this is while the db
> is being loaded - pg_restore)
> 
>                 tps     rtps     wtps   bread/s   bwrtn/s
> 01:35:01 PM  275.77    76.12   199.66    709.59   2315.23
> 01:45:01 PM  287.25    75.56   211.69    706.52   2413.06
> 01:55:01 PM  281.73    76.35   205.37    711.84   2389.86
> 
[snip]



Re: [PERFORM] Postgres on RAID5

2005-03-11 Thread Joshua D. Drake

On Fri, 11 Mar 2005 16:13:05 -0500, Arshavir Grigorian <[EMAIL PROTECTED]> wrote:
 

Hi,
I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
an Ext3 filesystem which is used by Postgres. Currently we are loading a
50G database on this server from a Postgres dump (copy, not insert) and
are experiencing very slow write performance (35 records per second).
   

That isn't that surprising. RAID 5 has never been known for its write
performance. You should be running RAID 10.
Sincerely,
Joshua D. Drake

Top shows that the Postgres process (postmaster) is being constantly put
into D state for extended periods of time (2-3 seconds) which I assume
is because it's waiting for disk io. I have just started gathering
system statistics and here is what sar -b shows: (this is while the db
is being loaded - pg_restore)
                tps     rtps     wtps   bread/s   bwrtn/s
01:35:01 PM  275.77    76.12   199.66    709.59   2315.23
01:45:01 PM  287.25    75.56   211.69    706.52   2413.06
01:55:01 PM  281.73    76.35   205.37    711.84   2389.86

[snip]


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL

