Re: [PERFORM] What's better: Raid 0 or disk for separate pg_xlog
Karim Nassar wrote:
> Thanks to all for the tips.
>
> On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote:
>> How critical is your data? How update heavy versus read heavy, etc are you?
>
> Large, relatively infrequent uploads, with frequent reads. The application
> is a web front-end to scientific research data. The scientists have their
> own copy of the data, so if something went really bad, we could probably
> get them to upload again.

If you have very few updates and your reads aren't mostly from RAM, you could be better off with simply mirroring (assuming that gains you read bandwidth). Failing that, use the tablespace feature to balance your read load as far as you can, as sketched below.

-- Richard Huxton
   Archonet Ltd
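For reference, a minimal sketch of the tablespace approach Richard mentions (new in PostgreSQL 8.0); the tablespace name, path, and table are hypothetical:

    CREATE TABLESPACE spindle2 LOCATION '/disk2/pg_tblspc';
    ALTER TABLE observations SET TABLESPACE spindle2;

Moving the most-read tables onto a second spindle this way spreads the read load without touching the rest of the cluster.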
[PERFORM] Statistics not working??
Hi there! I think I may have a problem with the statistics in my PostgreSQL 8.0 running under Windows XP. When I view both pg_stat_all_tables and pg_stat_all_indexes, all the numeric columns that should hold the statistics are 0 (zero). My configuration file has the following:

stats_start_collector = true
stats_command_string = true
stats_reset_on_server_start = false

Any tip? Thanks in advance,

Hugo Ferreira
-- GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85
Re: [PERFORM] Statistics not working??
On Fri, 11 Mar 2005, Hugo Ferreira wrote:
> Hi there! I think I may have a problem with the statistics in my
> PostgreSQL 8.0 running under Windows XP. When I view both
> pg_stat_all_tables and pg_stat_all_indexes, all the numeric columns
> that should hold the statistics are 0 (zero). My configuration file
> has the following:
>
> stats_start_collector = true
> stats_command_string = true
> stats_reset_on_server_start = false
>
> Any tip?

You need to define stats_block_level and/or stats_row_level.

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
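For anyone hitting the same symptom, a sketch of the relevant postgresql.conf entries (the block- and row-level collectors default to off in 8.0) and a hypothetical check query:

    stats_start_collector = true
    stats_block_level = true
    stats_row_level = true

After a configuration reload, something like the following should start showing non-zero counters:

    SELECT relname, seq_scan, idx_scan, n_tup_ins
    FROM pg_stat_all_tables
    WHERE relname = 'mytable';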
Re: [PERFORM] [GENERAL] more execution time
ALİ ÇELİK wrote:
> why does this query need more time? It's very slow

Difficult to say for sure - could you provide the output of EXPLAIN ANALYSE rather than just EXPLAIN? Some other immediate observations:

1. Perhaps don't post to so many mailing lists at once. If you reply to this, maybe reduce it to pgsql-performance?
2. You don't say whether the row estimates are accurate in the EXPLAIN.
3. You seem to be needlessly coalescing personaldetails.masterid since you check for it being null in your WHERE clause.
4. Do you really need to cast to numeric and generate a "sorting" column that you then don't ORDER BY?
5. Is ppid an id number? And are you sure it's safe to calculate it like that?
6. What is balance() and how long does it take to calculate its result?

select
  coalesce(personaldetails.masterid::numeric, personaldetails.id)
    + (coalesce(personaldetails.id::numeric, 0) / 100) as sorting,
  floor(coalesce(personaldetails.masterid::numeric, personaldetails.id)
    + (coalesce(personaldetails.id::numeric, 0) / 100)) as ppid,
  balance('MASTER-REGISTRATION', personaldetails.id) as balance,
  balance('MASTER-REGISTRATION', pd2.id) as accbalance,

I'm guessing point 6 is actually your problem - try it without the calls to balance() and see what that does to your timings.

-- Richard Huxton
   Archonet Ltd
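Following up point 6, a sketch of the comparison being suggested; the original FROM/WHERE clauses were not quoted, so the one here is a placeholder:

    EXPLAIN ANALYSE
    SELECT coalesce(personaldetails.masterid::numeric, personaldetails.id)
           + (coalesce(personaldetails.id::numeric, 0) / 100) AS sorting
    FROM personaldetails;  -- placeholder: run the real query minus the balance() columns

If this runs quickly while the full query is slow, the time is going into balance().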
[PERFORM] What is the number of rows in explain?
Hi all,

Is the number of rows in explain the number of rows that is expected to be visited or retrieved?

Regards, Joost Kraaijeveld
Askesis B.V. Molukkenstraat 14 6524NB Nijmegen
tel: 024-3888063 / 06-51855277 fax: 024-3608416
e-mail: [EMAIL PROTECTED] web: www.askesis.nl
Re: [PERFORM] What is the number of rows in explain?
Joost Kraaijeveld wrote:
> Is the number of rows in explain the number of rows that is expected to
> be visited or retrieved?

In general, it is the number of rows expected to be retrieved: a Sequential Scan always visits every row, but the rows= number is the estimate after filtering.

John
=:->
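A small illustration of the distinction (hypothetical table, illustrative numbers):

    EXPLAIN SELECT * FROM orders WHERE status = 'open';

     Seq Scan on orders  (cost=0.00..25.88 rows=120 width=48)
       Filter: (status = 'open'::text)

The sequential scan reads the whole table, but rows=120 is the planner's estimate of how many rows survive the filter, i.e. how many will be retrieved.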
[PERFORM] Performance tuning
Hi all,

I'm preparing a set of servers which will eventually need to handle a high volume of queries (both reads and writes, but most reads are very simple index-based queries returning a limited set of rows, when not just one), and I would like to optimize things as much as possible, so I have a few questions on the exact way PostgreSQL's MVCC works, and how transactions, updates and vacuuming interact. I hope someone will be able to point me in the right direction (feel free to give pointers if I missed the places where this is described).

From what I understand (and testing confirms it), bundling many queries in one single transaction is more efficient than having each query be a separate transaction (like with autocommit on). However, I wonder about the limits of this:

- are there any drawbacks to grouping hundreds or thousands of queries (inserts/updates) over several minutes in one single transaction? Other than the fact that the inserts/updates will not be visible until committed, of course. Essentially turning autocommit off, and doing a commit once in a while (see the sketch after this message).

- does this apply only to inserts/updates or also to selects? Another way to put this is: does a transaction with only one select actually have much transaction-related work to do? Or, does a transaction with only selects actually have any impact anywhere? Does it really leave a trace anywhere? Again, I understand that selects grouped in a transaction will not see updates done after the start of the transaction (unless done by the same process).

- if during a single transaction several UPDATEs affect the same row, will MVCC generate as many row versions as there are updates (like would be the case with autocommit) or will they be grouped into one single row version?

Another related issue is that many of the tables are indexed on a date field, and one process does a lot of updates on "recent" rows (which lead to many dead tuples), but after that "older" rows tend to remain pretty much unchanged for quite a while. Other than splitting the tables into "old" and "recent" tables, is there any way to make vacuum more efficient? Scanning the whole table for dead tuples when only a small portion of the table actually has any does not feel very efficient in this situation.

Other issue: every five minutes or so, I see a noticeable performance drop as PostgreSQL checkpoints. This is 7.4.3 with pretty lousy hardware; I know 8.0 with decent hardware and separate disk(s) for pg_xlog will definitely help, but I really wonder if there is any way to reduce the amount of work that needs to be done at that point (I'm a strong believer in fixing software before hardware). I have already bumped checkpoint_segments to 8, but I'm not quite sure I understand how this helps (or doesn't help) things. Logs show 3 to 6 "recycled transaction log file" lines at that time, which seems like quite a lot of work for a load that's still pretty low. Does grouping more queries into transactions help with this? Are there other parameters that can affect things, or is it just a matter of how many inserts/updates/deletes are done, and the amount of data that was changed?

Last point: some of the servers have expendable data (and will be replicated with Slony-I) and will run with fsync off. I have read conflicting statements as to what exactly this does: some sources indicate that setting fsync off actually switches off WAL/checkpointing, others that it just prevents the fsync (or equivalent) system calls. Since I still see checkpointing in that case, I guess it's not exactly the former, but I would love to understand more about it. Really, I would love to be able to set some tables or databases to "go as fast as you can and don't worry about transactions, MVCC or anything like that", but I'm not sure that option exists...

Thanks,
Jacques.
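A minimal sketch of the batching Jacques describes (table and values hypothetical): with autocommit off, the per-transaction overhead is paid once per batch rather than once per statement.

    BEGIN;
    INSERT INTO readings (sensor_id, ts, value) VALUES (1, now(), 0.5);
    INSERT INTO readings (sensor_id, ts, value) VALUES (2, now(), 0.7);
    -- ... hundreds or thousands more ...
    COMMIT;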
[PERFORM] Query performance
As a test, I ran a query in the pgAdmin query tool, which returns about 15K records from a PostgreSQL v8.0.1 table on my Win2K server. I ran the same query from the local server, from another PC on the same 100 Mbit local network, and from a PC on a different network, over the internet. The times for the query to run and the data to return for each of the three locations are shown here:

Local server:  571 + 521 ms
Local network: 1187 + 1266 ms
Internet:      14579 + 4016 ms

My question is this: why does the execution time for the query to run increase so much? Since the query should be running on the server, its time should be somewhat independent of the network transport delay (unlike the data transport time). However, it appears to actually be hypersensitive to the transport delay. The ratios of time for the data transport (assuming 1 for the local server) are:

1 : 2.43 : 7.71

whereas the query execution time ratios are:

1 : 2.08 : 25.5 (!!!)

Obviously, the transport times will be greater. But why does the execution time bloat so?
Re: [PERFORM] Query performance
"Lou O'Quin" <[EMAIL PROTECTED]> writes: > it appears to actually be hypersensitive to the transport delay. The = > ratios of time for the data transport (assuming 1 for the local server) = > are: > 1 : 2.43 : 7.71 > whereas the query execution time ratios are: > 1 : 2.08 : 25.5 (!!!) How do you know that's what the data transport time is --- ie, how can you measure that separately from the total query time? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Query performance
Hi Tom. I referenced the status line of pgAdmin. Per the pgAdmin help file:

"The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page."

Lou

>>> Tom Lane <[EMAIL PROTECTED]> 3/11/2005 12:10 PM >>>
> How do you know that's what the data transport time is --- ie, how can
> you measure that separately from the total query time?
[PERFORM] Questions about 2 databases.
Hello All,

I have a couple of questions about running 2 databases:

1) On a single 7.4.6 postgres instance does each database have its own WAL file or is that shared? Is it the same on 8.0.x?

2) What's the high-performance way of moving 200 rows between similar tables on different databases? Does it matter if the databases are on the same or separate postgres instances?

Background: my web app does lots of inserts that aren't read until a session is complete. The plan is to put the heavy insert session onto a ramdisk-based pg-db and transfer the relevant data to the master pg-db upon session completion. Currently running 7.4.6. Individual session data is not as critical as the master pg-db, so the risk associated with running the session pg-db on a ramdisk is acceptable. All this is to get past the I/O bottleneck; I've already tweaked the config files, run on multiple RAID-1 spindles, profiled the queries, and maxed the CPU/RAM. Migrating to 64-bit Fedora soon.

Thanks, this mailing list has been invaluable.

Jelle
Re: [PERFORM] Query performance
"Lou O'Quin" <[EMAIL PROTECTED]> writes: > Hi Tom. I referenced the status line of pgAdmin. Per the pgAdmin help > file: > > "The status line will show how long the last query took to complete. If a > dataset was returned, not only the elapsed time for server execution is > displayed, but also the time to retrieve the data from the server to the > Data Output page." Well, you should probably ask the pgadmin boys exactly what they are measuring. In any case, the Postgres server overlaps query execution with result sending, so I don't think it's possible to get a pure measurement of just one of those costs --- certainly not by looking at it only from the client end. BTW, one factor to consider is that if the test client machines weren't all the same speed, that would have some impact on their ability to absorb 15K records ... regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Questions about 2 databases.
jelle <[EMAIL PROTECTED]> writes:
> 1) on a single 7.4.6 postgres instance does each database have its own WAL
> file or is that shared? Is it the same on 8.0.x?

Shared.

> 2) what's the high-performance way of moving 200 rows between similar
> tables on different databases? Does it matter if the databases are
> on the same or separate postgres instances?

COPY would be my recommendation. For a no-programming-effort solution you could just pipe the output of pg_dump --data-only -t mytable into psql. Not sure if it's worth developing a custom application to replace that.

> My web app does lots of inserts that aren't read until a session is
> complete. The plan is to put the heavy insert session onto a ramdisk based
> pg-db and transfer the relevant data to the master pg-db upon session
> completion. Currently running 7.4.6.

Unless you have a large proportion of sessions that are abandoned and hence never need be transferred to the main database at all, this seems like a dead waste of effort :-(. The work to put the data into the main database isn't lessened at all; you've just added extra work to manage the buffer database.

regards, tom lane
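A sketch of the COPY route (table name and path hypothetical). Note that COPY to or from a server-side file requires superuser; psql's \copy does the same thing through the client connection for ordinary users:

    -- on the ramdisk/session database
    COPY sessiondata TO '/tmp/sessiondata.copy';

    -- on the master database
    COPY sessiondata FROM '/tmp/sessiondata.copy';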
Re: [PERFORM] Query performance
I'll post there concerning how they determine the query execution time vs. data retrieval time.

I did think about the processor/memory when choosing the machines - all three of the processors are similar. All are Pentium P4s with 512 MB memory:

the server is Win2K, P4, 2.3 GHz
the local network client is WinXP Pro, P4, 2.2 GHz
the remote network client is WinXP Pro, P4, 1.9 GHz

Lou

>>> Tom Lane <[EMAIL PROTECTED]> 3/11/2005 1:21 PM >>>
> Well, you should probably ask the pgadmin boys exactly what they are
> measuring. [...]
Re: [PERFORM] Questions about 2 databases.
Tom Lane wrote:
> this seems
> like a dead waste of effort :-(. The work to put the data into the main
> database isn't lessened at all; you've just added extra work to manage
> the buffer database.

True from the viewpoint of the server, but not from the throughput of the client session (client viewpoint). The client will have a blazingly fast session with the buffer database. I'm assuming the buffer database table size is zero or very small.

Constraints will be a problem if there are PKs or FKs that need to be satisfied on the server that are not adequately testable in the buffer. Might not be a problem if the full table fits on the RAM disk, but you still have to worry about two clients inserting the same PK.

Rick
[PERFORM] Postgres on RAID5
Hi,

I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an Ext3 filesystem which is used by Postgres. Currently we are loading a 50G database on this server from a Postgres dump (copy, not insert) and are experiencing very slow write performance (35 records per second).

Top shows that the Postgres process (postmaster) is being constantly put into D state for extended periods of time (2-3 seconds), which I assume is because it's waiting for disk I/O. I have just started gathering system statistics and here is what sar -b shows (this is while the db is being loaded - pg_restore):

             tps     rtps    wtps    bread/s  bwrtn/s
01:35:01 PM  275.77  76.12   199.66  709.59   2315.23
01:45:01 PM  287.25  75.56   211.69  706.52   2413.06
01:55:01 PM  281.73  76.35   205.37  711.84   2389.86
02:05:01 PM  282.83  76.14   206.69  720.85   2418.51
02:15:01 PM  284.07  76.15   207.92  707.38   2443.60
02:25:01 PM  265.46  75.91   189.55  708.87   2089.21
02:35:01 PM  285.21  76.02   209.19  709.58   2446.46
Average:     280.33  76.04   204.30  710.66   2359.47

This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM. It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom compiled kernel. Postgres is installed from the Debian package and uses all the configuration defaults.

I am also copying the pgsql-performance list. Thanks in advance for any advice/pointers.

Arshavir

Following is some other info that might be helpful.

/proc/scsi# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Wed Feb 23 17:23:41 2005
     Raid Level : raid5
     Array Size : 123823616 (118.09 GiB 126.80 GB)
    Device Size : 8844544 (8.43 GiB 9.06 GB)
   Raid Devices : 15
  Total Devices : 17
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Thu Feb 24 10:05:38 2005
          State : active
 Active Devices : 15
Working Devices : 16
 Failed Devices : 1
  Spare Devices : 1
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : 81ae2c97:06fa4f4d:87bfc6c9:2ee516df
         Events : 0.8

    Number   Major   Minor   RaidDevice  State
       0       8       64        0       active sync   /dev/sde
       1       8       80        1       active sync   /dev/sdf
       2       8       96        2       active sync   /dev/sdg
       3       8      112        3       active sync   /dev/sdh
       4       8      128        4       active sync   /dev/sdi
       5       8      144        5       active sync   /dev/sdj
       6       8      160        6       active sync   /dev/sdk
       7       8      176        7       active sync   /dev/sdl
       8       8      192        8       active sync   /dev/sdm
       9       8      208        9       active sync   /dev/sdn
      10       8      224       10       active sync   /dev/sdo
      11       8      240       11       active sync   /dev/sdp
      12      65        0       12       active sync   /dev/sdq
      13      65       16       13       active sync   /dev/sdr
      14      65       32       14       active sync   /dev/sds
      15      65       48       15       spare         /dev/sdt

# dumpe2fs -h /dev/md1
dumpe2fs 1.35 (28-Feb-2004)
Filesystem volume name:
Last mounted on:
Filesystem UUID:          1bb95bd6-94c7-4344-adf2-8414cadae6fc
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal dir_index needs_recovery large_file
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15482880
Block count:              30955904
Reserved block count:     1547795
Free blocks:              28767226
Free inodes:              15482502
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Filesystem created:       Wed Feb 23 17:27:13 2005
Last mount time:          Wed Feb 23 17:45:25 2005
Last write time:          Wed Feb 23 17:45:25 2005
Mount count:              2
Maximum mount count:      28
Last checked:             Wed Feb 23 17:27:13 2005
Check interval:           15552000 (6 months)
Next check after:         Mon Aug 22 18:27:13 2005
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      c35c0226-3b52-4dad-b102-f22feb773592
Journal backup:           inode blocks

# lspci | grep SCSI
0000:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:04.1 SCSI storag
Re: [PERFORM] Questions about 2 databases.
On Fri, 11 Mar 2005, Tom Lane wrote:

[ snip ]

> COPY would be my recommendation. For a no-programming-effort solution
> you could just pipe the output of pg_dump --data-only -t mytable into
> psql. Not sure if it's worth developing a custom application to replace
> that.

I'm a programming-effort kind of guy, so I'll try COPY.

>> My web app does lots of inserts that aren't read until a session is
>> complete. The plan is to put the heavy insert session onto a ramdisk based
>> pg-db and transfer the relevant data to the master pg-db upon session
>> completion. Currently running 7.4.6.
>
> Unless you have a large proportion of sessions that are abandoned and
> hence never need be transferred to the main database at all, this seems
> like a dead waste of effort :-(. The work to put the data into the main
> database isn't lessened at all; you've just added extra work to manage
> the buffer database.

The insert-heavy sessions average 175 page hits generating XML and about 1000 inserts/updates, which comprise 90% of the insert/update load; of those, only 200 inserts need to be transferred to the master db. The other sessions are read/cache bound. I'm hoping to get a speed-up from moving the temporary stuff off the master db and using 1 transaction instead of 175 to the disk-based master db.

Thanks,
Jelle
Re: [PERFORM] Postgres on RAID5
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).

What PG version is this? What version of pg_dump made the dump file? How are you measuring that write rate (seeing that pg_restore doesn't provide any such info)?

> Postgres is installed from the Debian package and uses
> all the configuration defaults.

The defaults are made for a fairly small machine, not big iron. At a minimum you want to kick shared_buffers up to 10K or more.

regards, tom lane
Re: [PERFORM] Postgres on RAID5
Tom Lane wrote:
> What PG version is this? What version of pg_dump made the dump file?
> How are you measuring that write rate (seeing that pg_restore doesn't
> provide any such info)?

Sorry I missed the version. Both (the db from which the dump was created and the one it's being loaded on) run on Pg 7.4. As for the write rate: if the restore is going on for X number of hours and you have Y records loaded, it's not hard to ballpark.

> The defaults are made for a fairly small machine, not big iron. At a
> minimum you want to kick shared_buffers up to 10K or more.

Will do. Thanks.

Arshavir
Re: [PERFORM] Postgres on RAID5
On Fri, Mar 11, 2005 at 05:29:11PM -0500, Arshavir Grigorian wrote:
> Tom Lane wrote:
> > The defaults are made for a fairly small machine, not big iron. At a
> > minimum you want to kick shared_buffers up to 10K or more.
>
> Will do. Thanks.

Also, it may help to bump up sort_mem while doing [the CREATE INDEX part of] the restore.

-- Alvaro Herrera (<[EMAIL PROTECTED]>)
"We are who we choose to be", sang the goldfinch when the sun is high (Sandman)
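A sketch of the postgresql.conf settings discussed in this thread, with illustrative (not tuned) values for a 2 GB machine running 7.4:

    shared_buffers = 10000   # 8 kB pages, so about 80 MB; Tom's "10K or more"
    sort_mem = 65536         # in kB; speeds up the CREATE INDEX steps of a
                             # restore (renamed work_mem in 8.0)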
Re: [PERFORM] Postgres on RAID5
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> How are you measuring that write rate (seeing that pg_restore doesn't
>> provide any such info)?
>
> Well, if the restore is going on for X number of hours and you have Y
> records loaded, it's not hard to ballpark.

Yeah, but how do you know that you have Y records loaded?

What I'm trying to get at is what the restore is actually spending its time on. It seems unlikely that a COPY per se would run that slowly; far more likely that the expense is involved with index construction or foreign key verification. You could possibly determine what's what by watching the backend process with "ps" to see what statement type it's executing most of the time.

BTW, is this a full database restore (schema + data), or are you trying to load data into pre-existing tables? The latter is generally a whole lot slower because both index updates and foreign key checks have to be done retail instead of wholesale. There are various ways of working around that but you have to be aware of what you're doing.

Also, if it is indexing that's eating the time, boosting the sort_mem setting for the server would help a lot.

regards, tom lane
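If it does turn out to be retail index and FK maintenance, a hypothetical sketch of the wholesale workaround (all object names invented):

    -- before the bulk load
    ALTER TABLE child DROP CONSTRAINT child_parent_fk;
    DROP INDEX child_ts_idx;

    COPY child FROM '/path/to/child.copy';

    -- after the load: rebuild in one pass each, instead of once per row
    CREATE INDEX child_ts_idx ON child (ts);
    ALTER TABLE child ADD CONSTRAINT child_parent_fk
      FOREIGN KEY (parent_id) REFERENCES parent (id);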
Re: [PERFORM] Postgres on RAID5
Many thanks for all the responses. I guess there are a lot of things to change and tweak, and I wonder what would be a good benchmarking sample dataset (size, contents). My tables are very large (the smallest is 7+ mil records) and take several days to load (if not weeks). It would be nice to have a sample dataset that would be large enough to mimic my large datasets, but small enough to load in a short period of time. Any suggestions?

Arshavir
Re: [PERFORM] Postgres on RAID5
A,

> This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
> It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
> compiled kernel. Postgres is installed from the Debian package and uses
> all the configuration defaults.

Please read http://www.powerpostgresql.com/PerfList

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Questions about 2 databases.
> My web app does lots of inserts that aren't read until a session is
> complete. The plan is to put the heavy insert session onto a ramdisk based
> pg-db and transfer the relevant data to the master pg-db upon session
> completion. Currently running 7.4.6.

From what you say, I'd think you want to avoid making one write transaction to the main database on each page view, right? You could simply store the data in a file and, at the end of the session, read the file and do all the writes in one transaction.
Re: [PERFORM] Postgres on RAID5
Look for the possibility that a foreign key check might not be using an index. This would yield a seq scan for each insertion, which might be your problem.

On Fri, 11 Mar 2005 19:22:56 -0500, Arshavir Grigorian <[EMAIL PROTECTED]> wrote:
> Many thanks for all the responses. I guess there are a lot of things to
> change and tweak, and I wonder what would be a good benchmarking sample
> dataset (size, contents). [...]
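A sketch of what to check (names hypothetical): for each inserted row the FK trigger effectively looks up the referenced key, so that lookup must be able to use an index, and the two columns should have matching types (a mismatch can prevent index use on these versions):

    EXPLAIN SELECT 1 FROM parent WHERE id = 42;
    -- want an Index Scan here, not a Seq Scan

    -- and if updates/deletes on parent are also slow, index the referencing side:
    CREATE INDEX child_parent_id_idx ON child (parent_id);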
Re: [PERFORM] Postgres on RAID5
I would recommend running a bonnie++ benchmark on your array to see if it's the array/controller/raid being crap, or whether it's postgres. I have had some very surprising results from arrays that theoretically should be fast, but turned out to be very slow.

I would also seriously have to recommend against a 14 drive RAID 5! This is statistically as likely to fail as a 7 drive RAID 0 (not counting the spare, but rebuilding onto a spare is very hard on existing drives).

Alex Turner
netEconomist

On Fri, 11 Mar 2005 16:13:05 -0500, Arshavir Grigorian <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).
>
> [snip]
Re: [PERFORM] Postgres on RAID5
On Fri, 11 Mar 2005 16:13:05 -0500, Arshavir Grigorian <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).
>
> [snip]

That isn't that surprising. RAID 5 has never been known for its write performance. You should be running RAID 10.

Sincerely,

Joshua D. Drake

-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL