[PERFORM] pg_dump slow
I made a new install on the machine last night, with the same results, on the local console. Windows 2000 Server Version 5.00.2195, PG Version 8.1. Franklin

Franklin: are you making pg_dump from a local or a remote box, and is this a clean install? Try a fresh, patched win2k install and see what happens. He claimed this was local, not network. It is certainly an intriguing possibility that W2K and WinXP handle bytea differently. I'm not competent to comment on that, however. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] About the relation between fragmentation of file and
On Thu, Dec 01, 2005 at 02:50:56PM +0900, Tatsumi Abe wrote: Could anyone advise some solutions for this fragmentation problem without stopping PostgreSQL ? Stop doing VACUUM FULL so often. If your table size is constant anyway, you're just wasting time by compacting the table and shrinking it, and encouraging fragmentation as each table file grows and then shrinks a little bit each day. Mike Stone ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
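Mike's suggestion in concrete terms (table name hypothetical): replace the heavy VACUUM FULL with a plain VACUUM, which marks dead space for reuse without the shrink/regrow cycle:

```sql
-- Plain VACUUM: reclaims dead tuples for reuse in place; no exclusive
-- lock, and the table file keeps a stable size
VACUUM my_table;

-- What to avoid running daily: VACUUM FULL compacts and truncates the
-- file, so it shrinks and regrows each day, encouraging filesystem
-- fragmentation
-- VACUUM FULL my_table;
```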
Re: [PERFORM] About the relation between fragmentation of file and VACUUM
On Dec 1, 2005, at 00:50, Tatsumi Abe wrote: However, in my current environment I can't stop PostgreSQL to undo fragmentation. Could anyone advise some solutions for this fragmentation problem without stopping PostgreSQL ? This is somewhat of an aside and intended just as a helpful suggestion, since I've been in this spot before: if you have this kind of uptime requirement, the first project to work on is getting the environment to the point where you can take out at least one database server at a time for maintenance. You're going to be forced to do this sooner or later - whether by disk failure, software error (Pg or OS), user error (restore from backup) or security issues (must-patch fixes). So disk fragmentation is a great thing to worry about at some point, but IMHO you've got your neck under the guillotine and you're worrying about your cuticles. I've heard the arguments before, usually around budget, and if the company can't spend any money but needs blood-from-stone performance tweaks, somebody isn't doing the math right (I'm assuming this isn't running on a satellite). Plus, your blood pressure will go down when things are more resilient. I've tried the superhero thing before and it's just not worth it. -Bill - Bill McGonigle, Owner Work: 603.448.4440 BFC Computing, LLC Home: 603.448.1668 [EMAIL PROTECTED] Mobile: 603.252.2606 http://www.bfccomputing.com/ Pager: 603.442.1833 Jabber: [EMAIL PROTECTED] Text: [EMAIL PROTECTED] Blog: http://blog.bfccomputing.com/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] slow insert into very large table
Hi there, I need a simple but large table with several million records. I do batch inserts with JDBC. After the first million or so records, the inserts degrade to become VERY slow (like 8 minutes vs. initially 20 seconds). The table has no indices except the PK while I do the inserts. This is with PostgreSQL 8.0 final for Windows XP on a Pentium 1.86 GHz, 1 GB memory. HD is fast IDE. I already have shared_buffers set to 25000. I wonder what else I can do. Any ideas? Kindest regards, Wolfgang Gehner -- Infonoia SA 7 rue de Berne 1211 Geneva 1 Tel: +41 22 9000 009 Fax: +41 22 9000 018 http://www.infonoia.com
[PERFORM] 15,000 tables
Hi, we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially the one which holds the attribute info) get too large. (We NEED that many tables, please don't recommend reducing them.) Logically these tables could be grouped into 500 databases. My question is: would performance be better if I had 500 databases (on one postgres server instance) each containing 30 tables, or is it better to have one large database with 15,000 tables? In the old days of postgres 6.5 we tried that, but performance was horrible with many databases ... BTW: I searched the mailing list but found nothing on the subject - and there also isn't any information in the documentation about the effects of the number of databases, tables or attributes on performance. Now, what do you say? Thanks in advance for any comment! Mike
Re: [PERFORM] slow insert into very large table
Wolfgang Gehner wrote: Hi there, I need a simple but large table with several million records. I do batch inserts with JDBC. After the first million or so records, the inserts degrade to become VERY slow (like 8 minutes vs. initially 20 seconds). The table has no indices except the PK while I do the inserts. This is with PostgreSQL 8.0 final for Windows XP on a Pentium 1.86 GHz, 1 GB memory. HD is fast IDE. I already have shared_buffers set to 25000. I wonder what else I can do. Any ideas? Run VACUUM ANALYZE so the statistics reflect the growth of the table. The planner probably still assumes your table to be small, and thus chooses the wrong plans for checking PK indexes and so on. Regards, Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
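A minimal sketch of Andreas' suggestion (table and column names are hypothetical): refresh the planner's statistics partway through the bulk load so its row estimates keep up with the growing table.

```sql
-- Hypothetical bulk-load target
CREATE TABLE big_table (
    id      integer PRIMARY KEY,
    payload text
);

-- ... after the first few hundred thousand batch inserts ...
ANALYZE big_table;          -- refresh planner statistics only (cheap)

-- or, to also reclaim dead tuples left by aborted batches:
VACUUM ANALYZE big_table;
```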
Re: [PERFORM] 15,000 tables
On Thu, 1 Dec 2005, Michael Riess wrote: Hi, we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially the one which holds the attribute info) get too large. is it because the internal tables get large, or is it a problem with disk I/O? with 15,000 tables you are talking about a LOT of files to hold these (30,000 files with one index each, and each table being small enough to not need more than one file to hold it); on linux ext2/3 this many files in one directory will slow you down horribly. try different filesystems (from my testing and from other posts it looks like XFS is a leading contender), and also play around with the tablespaces feature in 8.1 to move things out of the main data directory into multiple directories. if you do an ls -l on the parent directory you will see that the size of the directory itself is large if it's ever had lots of files in it; the only way to shrink it is to mv the old directory to a new name, create a new directory, and move the files from the old directory to the new one. David Lang
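The tablespace suggestion above could look roughly like this (paths and relation names are made up; the SQL-level tablespace feature exists from 8.0 on, and the location must be an empty, postgres-owned directory):

```sql
-- Put some relations on a different directory/spindle
CREATE TABLESPACE fastdisk LOCATION '/mnt/disk2/pgdata';

-- Move an existing table there...
ALTER TABLE big_table SET TABLESPACE fastdisk;

-- ...or create new ones there directly
CREATE TABLE archive_2005 (id integer) TABLESPACE fastdisk;
```

This spreads the per-relation files across multiple directories, which sidesteps the large-directory penalty described above.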
[PERFORM] filesystem performance with lots of files
this subject has come up a couple times just today (and it looks like one that keeps popping up). under linux, ext2/3 have two known weaknesses (or rather one weakness with two manifestations): searching through large objects on disk is slow. this applies to both directories (creating, opening, deleting files if there are (or have been) lots of files in a directory) and files (seeking to the right place in a file). the rule of thumb that I have used for years is that if files get over a few tens of megs or directories get over a couple thousand entries you will start slowing down.

common places you can see this (outside of postgres):

1. directories: mail or news storage. if you let your /var/spool/mqueue directory get large (for example, on a server that can't send mail for a while, or where mail gets misconfigured), there may only be a few files in there after it gets fixed, but if the directory was once large, just doing an ls on the directory will be slow. news servers that store each message as a separate file suffer from this as well; they work around it by using multiple layers of nested directories so that no directory has too many files in it (navigating the layers of directories costs as well; it's all about the tradeoffs). mail servers that use maildir (and Cyrus, which uses a similar scheme) have the same problem. to fix this you have to create a new directory, move the files to that directory, and then rename the new to the old. ext3 has an option to make searching directories faster (htree), but enabling it kills performance when you create files. and this doesn't help with large files.

2. files: mbox-formatted mail files and log files. as these files get large, the process of appending to them takes more time. syslog makes this very easy to test.
On a box that does synchronous syslog writing (the default for most systems using standard syslog; on linux make sure there is not a '-' in front of the logfile name), time how long it takes to write a bunch of syslog messages, then make the log file large and time it again. a few weeks ago I did a series of tests to compare different filesystems. the test was for a different purpose, so the particulars are not what I would do for testing aimed at postgres, but I think the data is relevant, and I saw major differences between different filesystems. I'll see about re-running the tests to get a complete set of benchmarks in the next few days. My tests had their times vary from 4 min to 80 min depending on the filesystem in use (ext3 with hash_dir posted the worst case). what testing have other people done with different filesystems? David Lang ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] 15,000 tables
Hi David, incidentally: The directory which holds our database currently contains 73883 files ... do I get a prize or something? ;-) Regards, Mike
Re: [PERFORM] 15,000 tables
Hi David, with 15,000 tables you are talking about a LOT of files to hold these (30,000 files with one index each, and each table being small enough to not need more than one file to hold it); on linux ext2/3 this many files in one directory will slow you down horribly. We use ReiserFS, and I don't think that this is causing the problem ... although it would probably help to split the directory up using tablespaces. But thanks for the suggestion!
Re: [PERFORM] About the relation between fragmentation of file and
On Thu, 1 Dec 2005, Richard Huxton wrote: Tatsumi Abe wrote: The question is about the relation between fragmentation of files and VACUUM performance. Environment OS: RedHat Enterprise Linux AS Release 3 (Taroon Update 6) Kernel 2.4.21-37.ELsmp on an i686 Filesystem Type ext3 Filesystem features: has_journal filetype needs_recovery sparse_super large_file

try different filesystems; ext2/3 do a very poor job when you have lots of files in a directory (and 7000+ files is a lot). you can also try mounting the filesystem with noatime,nodiratime to reduce the seeks when reading, and try mounting it with oldalloc (which changes how the files are arranged on disk when writing and extending them). I've seen drastic speed differences between ext2 and ext3 based on this option (ext2 defaults to oldalloc, ext3 defaults to orlov, which is faster in many cases).

CPU: Intel(R) Xeon(TM) CPU 2.80GHz stepping 01 Memory: 2.0GB HDD: 80GB (S-ATA) SATA max UDMA/133 PostgreSQL: 7.3.8 DB Environment 1. Approx. there are 3500 tables in the DB. When insert performance was measured in the above-mentioned environment, it took six minutes to write 1 lines after 4/5 days of measurement. While searching for the cause by executing the iostat command, it became clear that disk I/O was the bottleneck, as %iowait was almost 100% at that time. On the very first day the processing time of VACUUM is not a problem, but as the days progress its processing time increases. Then I examined the fragmentation of the database area (pgsql/data/base) using the following tool. Disk Allocation Viewer http://sourceforge.net/projects/davtools/ The fragmentation rate is 28% before defragging.

I'd guess the root of your problem is the number of tables (3500), which, if each has one index, represents at least 7000 files. That means a lot of your I/O time will probably be spent moving the disk heads between the different files.
depending on the size of the tables, it can actually be a lot worse than this (remember Postgres splits tables into fixed-size chunks). when postgres adds data it will eventually spill over into additional files. when you do a vacuum, does it re-write the tables into a smaller number of files, or just rewrite the individual files (making each of them smaller, but keeping the same number of files)? speaking of this, the selection of the size of these chunks is a compromise between the time needed to seek in an individual file and the number of files that are created. is there an easy way to tinker with this? (I am sure the default is not correct for all filesystems; filesystem handling of large and/or many files differs drastically.)

You say you can't stop the server, so there's no point in thinking about a quick hardware upgrade to help you. Also a version upgrade is not do-able for you.

there's a difference between stopping the server once for an upgrade (hardware or software) and having to stop it every few days to defrag things forever after. David Lang

I can only think of two other options: 1. Change the database schema to reduce the number of tables involved. I'm assuming that of the 3500 tables most hold the same data but for different clients (or something similar). This might not be practical either. 2. Re-order how you access the database. ANALYSE the updated tables regularly, but only VACUUM them after deletions. Group your inserts so that all the inserts for table1 go together, then all the inserts for table2 go together, and so on. This should help with the fragmentation by making sure the files get extended in larger chunks. Are you sure it's not possible to spend 15 mins offline to solve this? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PERFORM] 15,000 tables
On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: Hi, we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially the one which holds the attribute info) get too large. (We NEED that many tables, please don't recommend reducing them) Have you ANALYZEd your database? VACUUMed? BTW, are you using some kind of weird ERP? I have one that treats Informix like a fool and doesn't let me get at all of Informix's potential... maybe the same applies in your case... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
[PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Hi! I've got an urgent problem with an application which is evaluating a monthly survey; it's running quite a lot of queries like this:

select SOURCE.NAME as TYPE, count(PARTICIPANT.SESSION_ID) as TOTAL
from (
    select PARTICIPANT.SESSION_ID
    from survey.PARTICIPANT, survey.ANSWER
    where PARTICIPANT.STATUS = 1
      and date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month', now()-'1 month'::interval)
      and PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
      and ANSWER.QUESTION_ID = 6
      and ANSWER.VALUE = 1
) as PARTICIPANT, survey.ANSWER, survey.HANDY_JAVA SOURCE
where PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
  and ANSWER.QUESTION_ID = 16
  and ANSWER.VALUE = SOURCE.ID
group by SOURCE.NAME, SOURCE.POSITION
order by SOURCE.POSITION asc;

My current PostgreSQL version is PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2. Up to 8.0, a query like this took a couple of seconds, maybe even up to a minute. In 8.1 a query like this will run from 30 minutes up to two hours to complete, depending on its complexity. I've got autovacuum enabled and run a nightly vacuum analyze over all of my databases. Here's some information about the relevant tables: Table answer has got ~8.9M rows (estimated 8,872,130, counted 8,876,648), participant has got ~178K rows (estimated 178,165, counted 178,248), HANDY_JAVA has got three rows.
This is the explain analyze output for the above:

Sort (cost=11383.09..11383.10 rows=3 width=16) (actual time=1952676.858..1952676.863 rows=3 loops=1)
  Sort Key: source.position
  -> HashAggregate (cost=11383.03..11383.07 rows=3 width=16) (actual time=1952676.626..1952676.635 rows=3 loops=1)
    -> Nested Loop (cost=189.32..11383.00 rows=5 width=16) (actual time=6975.812..1952371.782 rows=9806 loops=1)
      -> Nested Loop (cost=3.48..3517.47 rows=42 width=20) (actual time=6819.716..15419.930 rows=9806 loops=1)
        -> Nested Loop (cost=3.48..1042.38 rows=738 width=16) (actual time=258.434..6233.039 rows=162723 loops=1)
          -> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=0.093..0.118 rows=3 loops=1)
          -> Bitmap Heap Scan on answer (cost=3.48..344.04 rows=246 width=8) (actual time=172.381..1820.499 rows=54241 loops=3)
            Recheck Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
            -> Bitmap Index Scan on idx02_performance (cost=0.00..3.48 rows=246 width=0) (actual time=98.321..98.321 rows=54245 loops=3)
              Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
        -> Index Scan using idx01_perf_0006 on participant (cost=0.00..3.34 rows=1 width=4) (actual time=0.049..0.050 rows=0 loops=162723)
          Index Cond: (participant.session_id = "outer".session_id)
          Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval))))
      -> Bitmap Heap Scan on answer (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806)
        Recheck Cond: (("outer".session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1))
        -> BitmapAnd (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806)
          -> Bitmap Index Scan on idx_answer_session_id (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806)
            Index Cond: ("outer".session_id = answer.session_id)
          -> Bitmap Index Scan on idx02_performance (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806)
            Index Cond: ((question_id = 6) AND (value = 1))
Total runtime: 1952678.393 ms

I am really sorry, but currently I haven't got any 8.0 installation left, so I cannot provide the explain analyze output for 8.0. I fiddled a little with the statement and managed to speed things up quite a lot:

select SOURCE.NAME as TYPE, count(ANSWER.SESSION_ID) as TOTAL
from survey.ANSWER, survey.HANDY_JAVA SOURCE
where ANSWER.QUESTION_ID = 16
  and ANSWER.VALUE = SOURCE.ID
  and ANSWER.SESSION_ID in (
    select PARTICIPANT.SESSION_ID
    from survey.PARTICIPANT, survey.ANSWER
    where PARTICIPANT.STATUS = 1
      and date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month', now()-'1 month'::interval)
      and PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
      and ANSWER.QUESTION_ID = 6
      and ANSWER.VALUE = 1
  )
group by SOURCE.NAME, SOURCE.POSITION
order by SOURCE.POSITION asc;

Here's the explain analyze output:

Sort (cost=27835.39..27835.39 rows=3 width=16) (actual time=9609.207..9609.212 rows=3 loops=1)
  Sort Key: source.position
  -> HashAggregate
Re: [PERFORM] slow insert into very large table
Wolfgang Gehner [EMAIL PROTECTED] writes: This is with PostgreSQL 8.0 final for WindowsXP on a Pentium 1.86 GHz, 1GB Memory. HD is fast IDE. Try something more recent, like 8.0.3 or 8.0.4. IIRC we had some performance issues in 8.0.0 with tables that grew from zero to large size during a single session. regards, tom lane
Re: [PERFORM] 15,000 tables
Michael Riess [EMAIL PROTECTED] writes: (We NEED that many tables, please don't recommend to reduce them) No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now. (If you really have 15000 distinct rowtypes, I'd like to know what your database design is...) regards, tom lane
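Tom's suggestion, sketched with hypothetical names: instead of one products table per client, a single shared table with the client as part of the key.

```sql
-- Before: products_client1, products_client2, ... (one table each)
-- After: one table, keyed by client
CREATE TABLE products (
    client_id  integer NOT NULL,
    product_id integer NOT NULL,
    name       text,
    price      numeric,
    PRIMARY KEY (client_id, product_id)
);

-- Per-client queries stay cheap via the leading key column
SELECT name, price FROM products WHERE client_id = 42;
```

This keeps pg_attribute and the data directory small: one rowtype and one set of files instead of hundreds.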
Re: [PERFORM] 15,000 tables
On 12/1/05, Tom Lane [EMAIL PROTECTED] wrote: Michael Riess [EMAIL PROTECTED] writes: (We NEED that many tables, please don't recommend to reduce them) No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now. (If you really have 15000 distinct rowtypes, I'd like to know what your database design is...) regards, tom lane Maybe he is using some kind of weird ERP... take the case of BaaN (sadly I use it in my work): BaaN creates about 1200 tables per company and I have no control over it... we have about 12000 tables right now... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ;)
Re: [PERFORM] pg_dump slow
Franklin: are you making pg_dump from a local or a remote box, and is this a clean install? Try a fresh, patched win2k install and see what happens. He claimed this was local, not network. It is certainly an intriguing possibility that W2K and WinXP handle bytea differently. I'm not competent to comment on that, however. Can you make a small extraction of this file (~100 rows), zip it, and send it to me off-list? I'll test it against both a 2000 and an XP server and try to reproduce your results. Merlin
Re: [PERFORM] 15,000 tables
Hi, On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: Hi, we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially the one which holds the attribute info) get too large. (We NEED that many tables, please don't recommend reducing them) Have you ANALYZEd your database? VACUUMed? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. BTW, are you using some kind of weird ERP? I have one that treats Informix like a fool and doesn't let me get at all of Informix's potential... maybe the same applies in your case... No. Our database contains tables for web content management systems. The server hosts approx. 500 CMS applications, and each of them has approx. 30 tables. That's why I'm asking whether it would be better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ... Mike
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Markus Wollny [EMAIL PROTECTED] writes: My current problem is that rewriting hundreds of queries, some of them quite a bit more complex than this one, but all of them using the same general scheme, would take quite a lot of time - and I'm expected to hand over the survey results ASAP. So I will obviously have to do a rewrite if there's just no other way, but I wondered if there might be some other option that would allow me to point the planner in the right direction so it would behave the same as in the previous versions, namely 8.0? It looks like set enable_nestloop = 0 might be a workable hack for the immediate need. Once you're not under deadline, I'd like to investigate more closely to find out why 8.1 does worse than 8.0 here. regards, tom lane
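The suggested stopgap, as it would look in practice (shown session-level here; it could also be set in postgresql.conf to apply globally, at the cost of hurting queries where nested loops are the right choice):

```sql
-- Steer the planner away from nested-loop joins for this session
SET enable_nestloop = off;   -- equivalently: SET enable_nestloop = 0;

-- ... run the affected survey queries ...

-- Restore the default afterwards
SET enable_nestloop = on;
```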
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
-----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED]] Sent: Thursday, December 1, 2005 17:26 To: Markus Wollny Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

It looks like set enable_nestloop = 0 might be a workable hack for the immediate need.

Wow - that works miracles :)

Sort (cost=81813.13..81813.14 rows=3 width=16) (actual time=7526.745..7526.751 rows=3 loops=1)
  Sort Key: source.position
  -> HashAggregate (cost=81813.07..81813.11 rows=3 width=16) (actual time=7526.590..7526.601 rows=3 loops=1)
    -> Merge Join (cost=81811.40..81813.03 rows=5 width=16) (actual time=7423.289..7479.175 rows=9806 loops=1)
      Merge Cond: ("outer".id = "inner".value)
      -> Sort (cost=1.05..1.06 rows=3 width=14) (actual time=0.085..0.091 rows=3 loops=1)
        Sort Key: source.id
        -> Seq Scan on handy_java source (cost=0.00..1.03 rows=3 width=14) (actual time=0.039..0.049 rows=3 loops=1)
      -> Sort (cost=81810.35..81811.81 rows=583 width=8) (actual time=7423.179..7440.062 rows=9806 loops=1)
        Sort Key: mafo.answer.value
        -> Hash Join (cost=27164.31..81783.57 rows=583 width=8) (actual time=6757.521..7360.822 rows=9806 loops=1)
          Hash Cond: ("outer".session_id = "inner".session_id)
          -> Bitmap Heap Scan on answer (cost=506.17..54677.92 rows=88334 width=8) (actual time=379.245..2660.344 rows=162809 loops=1)
            Recheck Cond: (question_id = 16)
            -> Bitmap Index Scan on idx_answer_question_id (cost=0.00..506.17 rows=88334 width=0) (actual time=274.632..274.632 rows=162814 loops=1)
              Index Cond: (question_id = 16)
          -> Hash (cost=26655.21..26655.21 rows=1175 width=8) (actual time=3831.362..3831.362 rows=9806 loops=1)
            -> Hash Join (cost=4829.33..26655.21 rows=1175 width=8) (actual time=542.227..3800.985 rows=9806 loops=1)
              Hash Cond: ("outer".session_id = "inner".session_id)
              -> Bitmap Heap Scan on answer (cost=182.84..21429.34 rows=20641 width=4) (actual time=292.067..2750.376 rows=165762 loops=1)
                Recheck Cond: ((question_id = 6) AND (value = 1))
                -> Bitmap Index Scan on idx02_performance (cost=0.00..182.84 rows=20641 width=0) (actual time=167.306..167.306 rows=165769 loops=1)
                  Index Cond: ((question_id = 6) AND (value = 1))
              -> Hash (cost=4621.13..4621.13 rows=10141 width=4) (actual time=182.842..182.842 rows=11134 loops=1)
                -> Index Scan using idx01_perf_0005 on participant (cost=0.01..4621.13 rows=10141 width=4) (actual time=0.632..136.126 rows=11134 loops=1)
                  Index Cond: (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval)))
                  Filter: (status = 1)
Total runtime: 7535.398 ms

Once you're not under deadline, I'd like to investigate more closely to find out why 8.1 does worse than 8.0 here.

Please tell me what I can do to help in clearing up this issue; I'd be very happy to help! Heck, I am happy anyway that there's such a quick fix, even if it's not a beautiful one :) Kind regards Markus
Re: [PERFORM] 15,000 tables
On 01.12.2005, at 17:04, Michael Riess wrote: No. Our database contains tables for web content management systems. The server hosts approx. 500 cms applications, and each of them has approx. 30 tables. Just for my curiosity: Are the ~30 tables similar in schema, or do they differ much? We have a small CMS system running here, where I keep all information for all clients in shared tables with relationships to a client table. But I assume you are running a pre-built CMS which is not designed for multi-client ability, right? cug -- PharmaLine, Essen, GERMANY Software and Database Development
Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0
Markus Wollny [EMAIL PROTECTED] writes: Once you're not under deadline, I'd like to investigate more closely to find out why 8.1 does worse than 8.0 here. Please tell me what I can do to help in clearing up this issue, I'd be very happy to help! The first thing to do is get 8.0's EXPLAIN ANALYZE for the same query. After we see how that differs from 8.1, we'll know what the next question should be ... regards, tom lane
Re: [PERFORM] 15,000 tables
Hi Tom, Michael Riess [EMAIL PROTECTED] writes: (We NEED that many tables, please don't recommend to reduce them) No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now. Been there, done that. (see below) (If you really have 15000 distinct rowtypes, I'd like to know what your database design is...) Sorry, I should have included that info in the initial post. You're right in that most of these tables have a similar structure. But they are independent and can be customized by the users. Think of it this way: On the server there are 500 applications, and each has 30 tables. One of these might be a table which contains the products of a webshop, another contains news items which are displayed on the website, and so on. The problem is that the customers can freely change the tables ... add columns, remove columns, change column types etc. So I cannot use system-wide tables with a key column. Mike ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] 15,000 tables
hi michael Have you ANALYZEd your database? VACUUMed? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. what I noticed is that autovacuum was not working properly as it should. I had 8.1 running with autovacuum for just 2 days or so and got warnings in pgadmin that my tables would need a vacuum. I posted this behaviour some weeks ago to the novice list requesting more info on how to tweak autovacuum properly - unfortunately without any responses. that's when I switched the nightly analyze job back on - everything has run smoothly since then. maybe it helps in your case as well? cheers, thomas
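For reference, these are the kinds of knobs the 8.1 integrated autovacuum exposes in postgresql.conf; the values below are illustrative defaults, not tuning recommendations, and may need lowering for tables that churn quickly:

```
# postgresql.conf (8.1) - integrated autovacuum settings
autovacuum = on                       # off by default in 8.1
autovacuum_naptime = 60               # seconds between checks
autovacuum_vacuum_threshold = 1000    # min dead tuples before vacuum
autovacuum_vacuum_scale_factor = 0.4  # plus this fraction of table size
autovacuum_analyze_threshold = 500
autovacuum_analyze_scale_factor = 0.2
```

With the default scale factors, a large table must accumulate a lot of dead tuples before autovacuum touches it, which could explain the pgadmin warnings above.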
[PERFORM] pg_stat* values ...
Not having found anything so far: does anyone know of, and can point me to, tools or articles that talk about doing tuning based on the information the pg_stat* views provide? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664
Re: [PERFORM] filesystem performance with lots of files
David Lang [EMAIL PROTECTED] wrote: a few weeks ago I did a series of tests to compare different filesystems. the test was for a different purpose, so the particulars are not what I would do for testing aimed at postgres, but I think the data is relevant, and I saw major differences between different filesystems. I'll see about re-running the tests to get a complete set of benchmarks in the next few days. My tests had their times vary from 4 min to 80 min depending on the filesystem in use (ext3 with hash_dir posted the worst case). what testing have other people done with different filesystems? That's good ... what benchmarks did you use? Regards, Qingqing
Re: [PERFORM] 15,000 tables
Michael Riess [EMAIL PROTECTED] writes: On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend to reduce them) Have you ANALYZEd your database? VACUUMing? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. VACUUM FULL was probably always overkill, unless always includes versions prior to 7.3... BTW, are you using some kind of weird ERP? I have one that treats informix as a fool and doesn't let me get all of informix's potential... maybe the same is true in your case... No. Our database contains tables for web content management systems. The server hosts approx. 500 cms applications, and each of them has approx. 30 tables. That's why I'm asking whether it would be better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ... This has the feeling of fitting with Alan Perlis' dictum below... Supposing you have 500 databases, each with 30 tables, each with 4 indices, then you'll find you have, on disk... # of files = 500 x 30 x 5 = 75000 files If each is regularly being accessed, that's bits of 75000 files getting shoved through OS and shared memory caches. Oh, yes, and you'll also have regular participation of some of the pg_catalog files, with ~500 instances of THOSE, multiplied some number of ways... An application with 15000 frequently accessed tables doesn't strike me as being something that can possibly turn out well. You have, in effect, more tables than (arguably) bloated ERP systems like SAP R/3; it only has a few thousand tables, and since many are module-specific, and nobody ever implements *all* the modules, it is likely only a few hundred that are hot spots.
No 15000 there... -- (format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com) http://www3.sympatico.ca/cbbrowne/languages.html It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures. -- Alan J. Perlis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] 15,000 tables
Hi Michael, I'm a fan of ReiserFS, and I may be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down. Gavin On Dec 1, 2005, at 6:51 AM, Michael Riess wrote: Hi David, with 15,000 tables you are talking about a LOT of files to hold these (30,000 files with one index each and each database being small enough to not need more than one file to hold it), on linux ext2/3 this many files in one directory will slow you down horribly. We use ReiserFS, and I don't think that this is causing the problem ... although it would probably help to split the directory up using tablespaces. But thanks for the suggestion! Gavin M. Roy 800 Pound Gorilla [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] COPY into table too slow with index
I'm running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3Ghz, 1 GB memory. I am using COPY to fill a table that contains one postgis geometry column. With no geometry index, it takes about 45 seconds to COPY one file. If I add a geometry index, this time degrades. It keeps getting worse as more records are added to the table. It was up to over three minutes per file on my most recent test. The problem is that each file contains about 5 to 10 minutes of data. Eventually, I want to add the data to the table in real time. So the COPY needs to take less time than actually generating the data. Here is the relevant section of my postgresql.conf. # - Memory - shared_buffers = 5000 # min 16 or max_connections*2, 8KB each #temp_buffers = 1000 # min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). work_mem = 2 # min 64, size in KB maintenance_work_mem = 2 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB Any suggestions for improvement?
Re: [PERFORM] 15,000 tables
Am Donnerstag, den 01.12.2005, 10:07 -0800 schrieb Gavin M. Roy: Hi Michael, I'm a fan of ReiserFS, and I can be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down. Have a 200G+ database, someone pulling the power plug or a regular reboot after a year or so. Wait for the fsck to finish. Now think again :-) ++Tino ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 15,000 tables
Agreed. Also the odds of fs corruption or data loss are higher in a non journaling fs. Best practice seems to be to use a journaling fs but to put the fs log on dedicated spindles separate from the actual fs or pg_xlog. Ron At 01:40 PM 12/1/2005, Tino Wildenhain wrote: Am Donnerstag, den 01.12.2005, 10:07 -0800 schrieb Gavin M. Roy: Hi Michael, I'm a fan of ReiserFS, and I can be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down. Have a 200G+ database, someone pulling the power plug or a regular reboot after a year or so. Wait for the fsck to finish. Now think again :-) ++Tino ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 15,000 tables
Here's a fairly recent post on reiserfs (and performance): http://archives.postgresql.org/pgsql-novice/2005-09/msg7.php I'm still digging on performance of ext2 vs. journaled filesystems, as I know I've seen it before. Gavin My point was not in doing an fsck, but rather in On Dec 1, 2005, at 10:40 AM, Tino Wildenhain wrote: Am Donnerstag, den 01.12.2005, 10:07 -0800 schrieb Gavin M. Roy: Hi Michael, I'm a fan of ReiserFS, and I can be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down. Have a 200G+ database, someone pulling the power plug or a regular reboot after a year or so. Wait for the fsck to finish. Now think again :-) ++Tino Gavin M. Roy 800 Pound Gorilla [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] 15,000 tables
Ron [EMAIL PROTECTED] writes: Agreed. Also the odds of fs corruption or data loss are higher in a non journaling fs. Best practice seems to be to use a journaling fs but to put the fs log on dedicated spindles separate from the actual fs or pg_xlog. I think we've determined that best practice is to journal metadata only (not file contents) on PG data filesystems. PG does expect the filesystem to remember where the files are, so you need metadata protection, but journalling file content updates is redundant with PG's own WAL logging. On a filesystem dedicated to WAL, you probably do not need any filesystem journalling at all --- we manage the WAL files in a way that avoids changing metadata for a WAL file that's in active use. A conservative approach would be to journal metadata here too, though. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
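On Linux with ext3, Tom's recommendation maps onto mount options; a sketch of an /etc/fstab fragment, with device names and mount points as placeholders:

```
# Data filesystem: journal metadata only. ext3's default (data=ordered)
# also orders data writes; data=writeback journals metadata alone, which
# is what PG needs since the WAL already protects file contents.
/dev/sdb1  /var/lib/pgsql/data  ext3  noatime,data=writeback  0 2

# WAL filesystem: filesystem journalling is largely redundant here;
# ext2, or ext3 with metadata-only journalling for the conservative
# approach Tom mentions.
/dev/sdc1  /var/lib/pgsql/wal   ext2  noatime                 0 2
```

This is a sketch under the assumptions above, not a tested recipe; the right choice depends on the kernel and controller in use.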
[PERFORM] Insert performance slows down in large batch
I am importing roughly 15 million rows in one batch transaction. I am currently doing this through batch inserts of around 500 at a time, although I am looking at ways to do this via multiple (one-per-table) copy commands for performance reasons. I am currently running: PostgreSQL 8.0.4, Redhat Enterprise Linux 4, ext3, all-on-one partition. I am aware of methods of improving performance by changing ext3 mounting options, splitting WAL, data, and indexes to separate physical disks, etc. I have also adjusted my shared_buffers, work_mem, maintenance_work_mem, and checkpoint_segments and can post their values if anyone thinks it is relevant to my question (See questions at the bottom) What confuses me is that at the beginning of the import, I am inserting roughly 25,000 rows every 7 seconds, and by the time I get towards the end of the import, it is taking 145 seconds for the same number of rows. The inserts are spread across 4 tables and I have dropped all indexes and constraints on these tables, including foreign keys, unique keys, and even primary keys (even though I don't think a primary key improves performance). The entire bulk import is done in a single transaction. The result is a table with 4.8 million rows, two tables with 4.8*2 million rows, and another table with several thousand rows. So, my questions are: 1) Why does the performance degrade as the table sizes grow? Shouldn't the insert performance remain fairly constant if there are no indexes or constraints? 2) Is there anything I can do to figure out where the time is being spent? Will postgres log any statistics or information to help me diagnose the problem? I have pasted a fairly representative sample of vmstat below my e-mail in case it helps, although I'm not quite sure how to interpret it in this case. 3) Any other advice, other than the things I listed above (I am aware of using copy, ext3 tuning, multiple disks, tuning postgresql.conf settings)?
Thanks in advance, Jeremy Haile #vmstat 2 20 procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 9368 4416 2536 177878400 124513 2 2 0 96 2 1 0 9368 4416 2536 177878400 0 0 100553 25 0 75 0 1 1 9368 3904 2544 177932000 12164 6 1103 262 24 1 59 16 1 0 9368 3704 2552 177938000 1625624 1140 344 23 1 53 23 1 1 9368 2936 2560 178012000 16832 6 1143 359 23 1 52 24 1 1 9368 3328 2560 177971200 13120 0 285 24 1 58 18 1 0 9368 4544 2560 177855600 5184 0 1046 141 25 0 67 8 1 1 9368 3776 2568 177929600 7296 6 1064 195 24 0 67 9 1 0 9368 4480 2568 177854800 4096 0 1036 133 24 0 69 6 1 0 9368 4480 2576 177860800 7504 0 1070 213 23 0 67 10 1 0 9368 3136 2576 177990000 9536 0 1084 235 23 0 66 10 1 1 9368 3072 2584 177996000 13632 6 1118 313 24 1 60 16 1 0 9368 4480 2592 177859200 857624 1075 204 24 0 63 12 1 0 9368 4480 2592 177859200 0 6 100452 25 0 75 0 1 0 9368 4544 2600 177865200 0 6 100555 25 0 75 0 1 1 9368 3840 2600 177933200 11264 4 1098 260 24 0 63 13 1 1 9368 3072 2592 178015600 1708814 1145 346 24 1 51 24 1 1 9368 4096 2600 177912800 16768 6 1140 360 23 1 54 21 1 1 9368 3840 2600 177933200 16960 0 1142 343 24 1 54 22 1 0 9368 3436 2596 177967600 16960 0 1142 352 24 1 53 23 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 15,000 tables
Heh looks like I left a trailing thought... My post wasn't saying don't use journaled filesystems, but rather that it can be slower than non-journaled filesystems, and I don't consider recovery time from a crash to be a factor in determining the speed of reads and writes on the data. That being said, I think Tom's reply on what to journal and not to journal should really put an end to this side of the conversation. Gavin On Dec 1, 2005, at 10:49 AM, Gavin M. Roy wrote: Here's a fairly recent post on reiserfs (and performance): http://archives.postgresql.org/pgsql-novice/2005-09/msg7.php I'm still digging on performance of ext2 vrs journaled filesystems, as I know I've seen it before. Gavin My point was not in doing an fsck, but rather in On Dec 1, 2005, at 10:40 AM, Tino Wildenhain wrote: Am Donnerstag, den 01.12.2005, 10:07 -0800 schrieb Gavin M. Roy: Hi Michael, I'm a fan of ReiserFS, and I can be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down. Have a 200G+ database, someone pulling the power plug or a regular reboot after a year or so. Wait for the fsck to finish. Now think again :-) ++Tino Gavin M. Roy 800 Pound Gorilla [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings Gavin M. Roy 800 Pound Gorilla [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Insert performance slows down in large batch
Jeremy Haile [EMAIL PROTECTED] writes: 1) Why does the performance degrade as the table sizes grow? Shouldn't the insert performance remain fairly constant if there are no indexes or constraints? Yeah, insert really should be a constant-time operation if there's no add-on operations like index updates or FK checks. Can you get more information about where the time is going with gprof or oprofile? (I'm not sure if oprofile is available for RHEL4, but it is in Fedora 4 so maybe RHEL4 has it too.) If you're not comfortable with performance measurement tools, perhaps you could crank up a test case program that just generates dummy data and inserts it in the same way as your real application does. If you can confirm a slowdown in a test case that other people can look at, we'd be happy to look into the reason for it. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
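A minimal sketch of the kind of reproducible test case Tom suggests: a generator for dummy data in COPY text format. The three-column layout (integer id, text field, float) is made up for illustration; adjust it to match the real tables.

```python
import random
import string

def make_copy_file(path, n_rows, seed=0):
    """Write n_rows of tab-separated dummy data in COPY text format:
    an integer id, a 32-char random text field, and a float."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    with open(path, "w") as f:
        for i in range(n_rows):
            text = "".join(rng.choice(string.ascii_lowercase)
                           for _ in range(32))
            f.write("%d\t%s\t%.6f\n" % (i, text, rng.random()))

make_copy_file("dummy_batch.txt", 1000)
```

The file can then be loaded repeatedly (e.g. `COPY mytable FROM '/path/dummy_batch.txt'`) while timing each batch, to see whether per-batch time really grows with table size in a case others can reproduce.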
Re: [PERFORM] 15,000 tables
Michael Riess [EMAIL PROTECTED] writes: On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend to reduce them) Have you ANALYZEd your database? VACUUMing? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. VACUUM FULL was probably always overkill, unless always includes versions prior to 7.3... Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, but the database got considerably slower near the end of the week. BTW, are you using some kind of weird ERP? I have one that treats informix as a fool and doesn't let me get all of informix's potential... maybe the same is true in your case... No. Our database contains tables for web content management systems. The server hosts approx. 500 cms applications, and each of them has approx. 30 tables. That's why I'm asking whether it would be better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ... This has the feeling of fitting with Alan Perlis' dictum below... Supposing you have 500 databases, each with 30 tables, each with 4 indices, then you'll find you have, on disk... # of files = 500 x 30 x 5 = 75000 files If each is regularly being accessed, that's bits of 75000 files getting shoved through OS and shared memory caches. Oh, yes, and you'll also have regular participation of some of the pg_catalog files, with ~500 instances of THOSE, multiplied some number of ways... Not all of the tables are frequently accessed. In fact I would estimate that only 20% are actually used ... but there is no way to determine if or when a table will be used.
I thought about a way to swap out tables which have not been used for a couple of days ... maybe I'll do just that. But it would be cumbersome ... I had hoped that an unused table does not hurt performance. But of course the internal tables which contain the meta info get too large. An application with 15000 frequently accessed tables doesn't strike me as being something that can possibly turn out well. You have, in effect, more tables than (arguably) bloated ERP systems like SAP R/3; it only has a few thousand tables, and since many are module-specific, and nobody ever implements *all* the modules, it is likely only a few hundred that are hot spots. No 15000 there.. I think that my system conforms to the 80/20 rule ... . ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] 15,000 tables
On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: Michael Riess [EMAIL PROTECTED] writes: On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend to reduce them) Have you ANALYZEd your database? VACUUMing? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. VACUUM FULL was probably always overkill, unless always includes versions prior to 7.3... Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, but the database got considerably slower near the end of the week. BTW, are you using some kind of weird ERP? I have one that treat informix as a fool and don't let me get all of informix potential... maybe the same is in your case... No. Our database contains tables for we content management systems. The server hosts approx. 500 cms applications, and each of them has approx. 30 tables. That's why I'm asking if it was better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ... This has the feeling of fitting with Alan Perlis' dictum below... Supposing you have 500 databases, each with 30 tables, each with 4 indices, then you'll find you have, on disk... # of files = 500 x 30 x 5 = 75000 files If each is regularly being accessed, that's bits of 75000 files getting shoved through OS and shared memory caches. Oh, yes, and you'll also have regular participation of some of the pg_catalog files, with ~500 instances of THOSE, multiplied some number of ways... Not all of the tables are frequently accessed. In fact I would estimate that only 20% are actually used ... but there is no way to determine if or when a table will be used. 
I thought about a way to swap out tables which have not been used for a couple of days ... maybe I'll do just that. But it would be cumbersome ... I had hoped that an unused table does not hurt performance. But of course the internal tables which contain the meta info get too large. An application with 15000 frequently accessed tables doesn't strike me as being something that can possibly turn out well. You have, in effect, more tables than (arguably) bloated ERP systems like SAP R/3; it only has a few thousand tables, and since many are module-specific, and nobody ever implements *all* the modules, it is likely only a few hundred that are hot spots. No 15000 there.. I think that my system conforms to the 80/20 rule ... . How many disks do you have? I imagine you can put tables forming one logical database in a tablespace and have tables spread across various disks... -- Regards, Jaime Casanova (DBA: DataBase Aniquilator ;) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 15,000 tables
we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend to reduce them) Logically these tables could be grouped into 500 databases. My question is: Would performance be better if I had 500 databases (on one postgres server instance) which each contain 30 tables, or is it better to have one large database with 15,000 tables? In the old days of postgres 6.5 we tried that, but performance was horrible with many databases ... BTW: I searched the mailing list, but found nothing on the subject - and there also isn't any information in the documentation about the effects of the number of databases, tables or attributes on the performance. Now, what do you say? Thanks in advance for any comment! I've never run near that many databases on one box so I can't comment on the performance. But let's assume for the moment pg runs fine with 500 databases. The most important advantage of multi-schema approach is cross schema querying. I think as you are defining your problem this is a better way to do things. Merlin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
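Merlin's multi-schema point, sketched in SQL (schema and table names are hypothetical): one database, one schema per CMS application. This kind of cross-schema query has no equivalent across 500 separate databases.

```sql
CREATE SCHEMA app_001;
CREATE SCHEMA app_002;
CREATE TABLE app_001.pages (id int PRIMARY KEY, title text);
CREATE TABLE app_002.pages (id int PRIMARY KEY, title text);

-- One query over every application's data:
SELECT 'app_001' AS app, count(*) FROM app_001.pages
UNION ALL
SELECT 'app_002', count(*) FROM app_002.pages;
```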
Re: [PERFORM] 15,000 tables
On Thu, 2005-12-01 at 13:34, Michael Riess wrote: Michael Riess [EMAIL PROTECTED] writes: On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend to reduce them) Have you ANALYZEd your database? VACUUMing? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. VACUUM FULL was probably always overkill, unless always includes versions prior to 7.3... Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, but the database got considerably slower near the end of the week. Generally, this means either your vacuums are too infrequent, or your fsm settings are too small. Note that vacuum and analyze aren't married any more, like in the old days. You can issue either separately, depending on your usage conditions. Note that with the newest versions of PostgreSQL you can change the settings for vacuum priority so that while it takes longer to vacuum, it doesn't stomp on the other processes toes so much anymore, so more frequent plain vacuums may be the answer. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
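The knobs Scott mentions are plain postgresql.conf settings; a sketch, with values that are only illustrative starting points for an install this size:

```
# Throttle vacuum so frequent plain vacuums don't stomp on other backends:
vacuum_cost_delay = 10       # ms; 0 (the default) disables cost-based delay
vacuum_cost_limit = 200      # accumulated cost before each sleep

# The free space map must cover the dead space created between vacuums:
max_fsm_pages = 200000
max_fsm_relations = 20000    # must exceed the number of tables + indexes
```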
Re: [PERFORM] 15,000 tables
[EMAIL PROTECTED] wrote: what i noticed is autovacuum not working properly as it should. i had 8.1 running with autovacuum for just 2 days or so and got warnings in pgadmin that my tables would need an vacuum. Hum, so how is autovacuum's documentation lacking? Please read it critically and let us know so we can improve it. http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM Maybe what you need is to lower the vacuum base threshold for tables that are small. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
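In 8.1 the per-table thresholds Alvaro refers to live in the pg_autovacuum system catalog; a sketch of lowering them for one small, frequently-updated table ('hot_small_table' is a placeholder, and -1 means "use the global default"):

```sql
INSERT INTO pg_autovacuum
       (vacrelid, enabled,
        vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor,
        vac_cost_delay, vac_cost_limit)
VALUES ('hot_small_table'::regclass, true,
        50, 0.2,
        25, 0.1,
        -1, -1);
```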
Re: [PERFORM] COPY into table too slow with index: now an I/O question
As a follow up to my own question: I reran the COPY both ways (with the index and without) while running iostat. The following values are averages:

          %user  %nice  %sys  %iowait  %idle
no index    39     0     2.8     11      47
index       16     1.5   2.1     34      46

I'm no performance guru, so please indulge a couple of silly questions: 1) Why is there so much idle time? I would think the CPU would either be busy or waiting for IO. 2) It seems that I need to improve my disk situation. Would it help to add another drive to my PC and keep the input data on a separate drive from my pg tables? If so, some pointers on the best way to set that up would be appreciated. Please let me know if anyone has additional ideas. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rick Schumeyer Sent: Thursday, December 01, 2005 12:58 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] COPY into table too slow with index I'm running postgresql 8.1.0 with postgis 1.0.4 on a FC3 system, 3Ghz, 1 GB memory. I am using COPY to fill a table that contains one postgis geometry column. With no geometry index, it takes about 45 seconds to COPY one file. If I add a geometry index, this time degrades. It keeps getting worse as more records are added to the table. It was up to over three minutes per file on my most recent test. The problem is that each file contains about 5 to 10 minutes of data. Eventually, I want to add the data to the table in real time. So the COPY needs to take less time than actually generating the data. Here is the relevant section of my postgresql.conf. # - Memory - shared_buffers = 5000 # min 16 or max_connections*2, 8KB each #temp_buffers = 1000 # min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 2 # min 64, size in KB maintenance_work_mem = 2 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB Any suggestions for improvement?
Re: [PERFORM] COPY into table too slow with index: now an I/O
Rick, On 12/1/05 2:18 PM, Rick Schumeyer [EMAIL PROTECTED] wrote: As a follow up to my own question: I reran the COPY both ways (with the index and without) while running iostat. The following values are averages: %user %nice %sys %iowait %idle / no index 39 0 2.8 11 47 / index 16 1.5 2.1 34 46 I'm no performance guru, so please indulge a couple of silly questions: 1) Why is there so much idle time? I would think the CPU would either be busy or waiting for IO. The 100% represents 2 CPUs. When one CPU is fully busy you should see 50% idle time. 2) It seems that I need to improve my disk situation. Would it help to add another drive to my PC and keep the input data on a separate drive from my pg tables? If so, some pointers on the best way to set that up would be appreciated. Putting the index and the table on separate disks will fix this IMO. I think you can do that using the TABLESPACE concept for each. The problem I see is nicely shown by the increase in IOWAIT between the two patterns (with and without index). It seems likely that the pattern is: A - insert a tuple into the table B - insert an entry into the index C - fsync the WAL - repeat This can be as bad as having a disk seek to access the table data every time the 8KB page boundary is crossed, then again for the index, then again for the WAL, and random disk seeks happen only as fast as about 10ms, so you can only do those at a rate of 100/s. Please let me know if anyone has additional ideas. This is a fairly common problem, some people drop the index, load the data, then recreate the index to get around it. - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
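The drop-and-recreate pattern Luke mentions, sketched in SQL. Table, index, and file names are hypothetical; a postgis GiST index is assumed from the original post.

```sql
DROP INDEX points_geom_idx;

COPY points FROM '/data/batch_0001.txt';

CREATE INDEX points_geom_idx ON points USING gist (geom);
ANALYZE points;   -- refresh planner stats after the bulk load
```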
Re: [PERFORM] COPY into table too slow with index: now an I/O
Luke Lonergan [EMAIL PROTECTED] writes: The problem I see is nicely shown by the increase in IOWAIT between the two patterns (with and without index). It seems likely that the pattern is: A - insert a tuple into the table B - insert an entry into the index C - fsync the WAL - repeat This can be as bad as having a disk seek to access the table data every time the 8KB page boundary is crossed, then again for the index, then again for the WAL, and random disk seeks happen only as fast as about 10ms, so you can only do those at a rate of 100/s. That analysis is far too simplistic, because only the WAL write has to happen before the transaction can commit. The table and index writes will normally happen at some later point in the bgwriter, and with any luck there will only need to be one write per page, not per tuple. It is true that having WAL and data on the same spindle is bad news, because the disk head has to divide its time between synchronous WAL writes and asynchronous writes of the rest of the files. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] COPY into table too slow with index: now an I/O question
I only have one CPU. Is my copy of iostat confused, or does this have something to do with hyperthreading or dual core? (AFAIK, I don't have a dual core!) The problem (for me) with dropping the index during a copy is that it takes tens of minutes (or more) to recreate the geometry index once the table has, say, 50 million rows. -Original Message- From: Luke Lonergan [mailto:[EMAIL PROTECTED] Sent: Thursday, December 01, 2005 9:27 PM To: Rick Schumeyer; pgsql-performance@postgresql.org Subject: Re: [PERFORM] COPY into table too slow with index: now an I/O question Rick, On 12/1/05 2:18 PM, Rick Schumeyer [EMAIL PROTECTED] wrote: As a follow up to my own question: I reran the COPY both ways (with the index and without) while running iostat. The following values are averages: %user %nice %sys %iowait %idle / no index 39 0 2.8 11 47 / index 16 1.5 2.1 34 46 I'm no performance guru, so please indulge a couple of silly questions: 1) Why is there so much idle time? I would think the CPU would either be busy or waiting for IO. The 100% represents 2 CPUs. When one CPU is fully busy you should see 50% idle time. 2) It seems that I need to improve my disk situation. Would it help to add another drive to my PC and keep the input data on a separate drive from my pg tables? If so, some pointers on the best way to set that up would be appreciated. Putting the index and the table on separate disks will fix this IMO. I think you can do that using the TABLESPACE concept for each. The problem I see is nicely shown by the increase in IOWAIT between the two patterns (with and without index). It seems likely that the pattern is: A - insert a tuple into the table B - insert an entry into the index C - fsync the WAL - repeat This can be as bad as having a disk seek to access the table data every time the 8KB page boundary is crossed, then again for the index, then again for the WAL, and random disk seeks happen only as fast as about 10ms, so you can only do those at a rate of 100/s.
Please let me know if anyone has additional ideas. This is a fairly common problem, some people drop the index, load the data, then recreate the index to get around it. - Luke ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 15,000 tables
So say I need 10,000 tables, but I can create tablespaces. Wouldn't that solve the performance problem caused by Linux's (or ext2/3's) problems with large directories? For example, if each user creates (say) 10 tables, and I have 1000 users, I could create 100 tablespaces, and assign groups of 10 users to each tablespace. This would limit each tablespace to 100 tables, and keep the ext2/3 file-system directories manageable. Would this work? Would there be other problems? Thanks, Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
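A sketch of the layout Craig describes (tablespace names and paths are hypothetical). Each tablespace maps to its own directory, so no single directory ends up holding more than about 100 tables' files:

```sql
-- Run as superuser, with the directory already created and owned by
-- the postgres user:
CREATE TABLESPACE grp01 LOCATION '/srv/pgdata/ts/grp01';

-- Tables for the ~10 users assigned to this group:
CREATE TABLE user_0001.notes (id int PRIMARY KEY, body text)
  TABLESPACE grp01;

-- Or make it the default for everything created in this session:
SET default_tablespace = grp01;
```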
Re: [PERFORM] COPY into table too slow with index: now an I/O
Tom, That analysis is far too simplistic, because only the WAL write has to happen before the transaction can commit. The table and index writes will normally happen at some later point in the bgwriter, and with any luck there will only need to be one write per page, not per tuple. That's good to know - makes sense. I suppose we might still thrash over a 1GB range in seeks if the BG writer starts running at full rate in the background, right? Or is there some write combining in the BG writer? It is true that having WAL and data on the same spindle is bad news, because the disk head has to divide its time between synchronous WAL writes and asynchronous writes of the rest of the files. That sounds right - could be tested by him turning fsync off, or by moving the WAL to a different spindle (note I'm not advocating running in production with fsync off). - Luke ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Database restore speed
Steve, When we restore the postmaster process tries to use 100% of the CPU. The questions we have are: 1) What is postmaster doing that it needs so much CPU? Parsing mostly, and attribute conversion from text to DBMS native formats. 2) How can we get our system to go faster? Use Postgres 8.1 or Bizgres. Get a faster CPU. These two points are based on our work to improve COPY speed, which led to a near doubling in Bizgres, and in the 8.1 version it's about 60-70% faster than in Postgres 8.0. There are currently two main bottlenecks in COPY, one is parsing + attribute conversion (if the postgres CPU is nailed at 100% that's what your limit is) and the other is the write speed through the WAL. You can roughly divide the write speed of your disk by 3 to get that limit, e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed might be limited to 33MB/s. You can tell which of these limits you've hit using vmstat 1 on Linux or iostat on Solaris and watch the blocks input/output on your disk while you watch your CPU. Note: We've tried adjusting the checkpoint_segments parameter to no effect. No surprise. - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] filesystem performance with lots of files
On Thu, 1 Dec 2005, Qingqing Zhou wrote: David Lang [EMAIL PROTECTED] wrote a few weeks ago I did a series of tests to compare different filesystems. the test was for a different purpose so the particulars are not what I would do for testing aimed at postgres, but I think the data is relevant) and I saw major differences between different filesystems, I'll see about re-running the tests to get a complete set of benchmarks in the next few days. My tests had their times vary from 4 min to 80 min depending on the filesystem in use (ext3 with hash_dir posted the worst case). what testing have other people done with different filesystems? That's good ... what benchmarks did you use? I was doing testing in the context of a requirement to sync over a million small files from one machine to another (rsync would take 10 hours to do this over a 100Mb network so I started with the question 'how long would it take to do a tar-ftp-untar cycle with no smarts') so I created 1m x 1K files in a three deep directory tree (10d/10d/10d/1000files) and was doing simple 'time to copy tree', 'time to create tar', 'time to extract from tar', 'time to copy tarfile' (1.6G file). I flushed the memory between each test with cat largefile > /dev/null (I know now that I should have unmounted and remounted between each test), source and destination on different IDE controllers. I don't have all the numbers readily available (and I didn't do all the tests on every filesystem), but I found that even with only 1000 files/directory ext3 had some problems, and if you enabled dir_hash some functions would speed up, but writing lots of files would just collapse (that was the 80 min run) I'll have to script it and re-do the tests (and when I do this I'll also set it to do a test with far fewer, far larger files as well) David Lang ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Open request for benchmarking input (fwd)
here are the suggestions from the MySQL folks, what additional tests should I do. I'd like to see some tests submitted that map out when not to use a particular database engine, so if you have a test that you know a particular database chokes on let me know (bonus credibility if you include tests that your own database has trouble with :) David Lang -- Forwarded message -- Date: Thu, 01 Dec 2005 16:14:25 David, The choice of benchmark depends on what kind of application you would like to see performance for. When someone speaks about one or another database being faster than another in general, it makes me smile. That would be the same as saying one car would be able to win all competitions, starting from Formula-1 and ending with off-road racing. There are certain well known cases when MySQL will be faster - for example the in-memory storage engine is hard to beat in point selects, or bulk inserts in MyISAM (no transactional overhead). There are certain known cases when MySQL would not perform well - it is easy to build a query using subqueries which would be horribly slow on MySQL but decent on PostgreSQL... but then, writing an application for MySQL you would not write such a query. I think the most database-agnostic way would be to select the workload from the user's point of view and have it implemented the most efficient way for the database in question - for example you may find TPC-C implementations by different vendors are a lot different. For my own interests, I would like to at least cover the following bases: 32 bit vs 64 bit vs 64 bit kernel + 32 bit user-space; data warehouse type tests (data larger than memory); and web perf tests (active data fits in RAM) You may grab Dell DVD store: http://linux.dell.com/dvdstore/ for a Web benchmark. It does not have PostgreSQL built in but there are some implementations available on the Internet DBT2 by OSDL is another good candidate - it does support PostgreSQL and MySQL natively. 
If you want some raw performance numbers such as selects/sec you may use SysBench - http://sysbench.sourceforge.net For data warehouse workloads you could grab the TPC-H or DBT3 implementation by OSDL - we run this successfully with MySQL You also could take a look at http://benchw.sourceforge.net/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] filesystem performance with lots of files
On Fri, 2 Dec 2005, David Lang wrote: I don't have all the numbers readily available (and I didn't do all the tests on every filesystem), but I found that even with only 1000 files/directory ext3 had some problems, and if you enabled dir_hash some functions would speed up, but writing lots of files would just collapse (that was the 80 min run) Interesting. I would suggest that testing a small number of bigger files would be better if the target is database performance comparison. By small number, I mean 10^2 - 10^3; by bigger, I mean file sizes from 8k to 1G (a PostgreSQL data file is at most this size under a normal installation). Let's take TPCC as an example: if we get a TPCC database of 500 files, each one at most 1G (PostgreSQL has this feature/limit in an ordinary installation), then this will give us a 500G database, which is big enough for your current configuration. Regards, Qingqing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 15,000 tables
On Thu, 1 Dec 2005, Craig A. James wrote: So say I need 10,000 tables, but I can create tablespaces. Wouldn't that solve the performance problem caused by Linux's (or ext2/3's) problems with large directories? For example, if each user creates (say) 10 tables, and I have 1000 users, I could create 100 tablespaces, and assign groups of 10 users to each tablespace. This would limit each tablespace to 100 tables, and keep the ext2/3 file-system directories manageable. Would this work? Would there be other problems? This would definitely help, however there's still the question of how large the tables get, and how many total files are needed to hold the 100 tables. you still have the problem of having to seek around to deal with all these different files (and tablespaces just spread them further apart), you can't solve this, but a large write-back journal (as opposed to metadata-only) would mask the problem. it would be a trade-off: you would end up writing all your data twice, so the throughput would be lower, but since the data is safe as soon as it hits the journal, the latency for any one request would be lower, which would allow the system to use the CPU more and overlap it with your seeking. David Lang ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
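For what it's worth, the full data journalling David describes corresponds to ext3's data=journal mount mode (as opposed to the default data=ordered, or data=writeback which journals metadata only); all data passes through the journal first, which is why it gets written twice. A hypothetical /etc/fstab entry (device and mount point invented) might look like:

```
/dev/sdb1  /pgdata  ext3  data=journal,noatime  0  2
```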
Re: [PERFORM] Database restore speed
On Fri, 2 Dec 2005, Luke Lonergan wrote: Steve, When we restore the postmaster process tries to use 100% of the CPU. The questions we have are: 1) What is postmaster doing that it needs so much CPU? Parsing mostly, and attribute conversion from text to DBMS native formats. 2) How can we get our system to go faster? Use Postgres 8.1 or Bizgres. Get a faster CPU. These two points are based on our work to improve COPY speed, which led to a near doubling in Bizgres, and in the 8.1 version it's about 60-70% faster than in Postgres 8.0. There are currently two main bottlenecks in COPY, one is parsing + attribute conversion (if the postgres CPU is nailed at 100% that's what your limit is) and the other is the write speed through the WAL. You can roughly divide the write speed of your disk by 3 to get that limit, e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed might be limited to 33MB/s. You can tell which of these limits you've hit using vmstat 1 on Linux or iostat on Solaris and watch the blocks input/output on your disk while you watch your CPU. Luke, would it help to have one machine read the file and have it connect to postgres on a different machine when doing the copy? (I'm thinking that the first machine may be able to do a lot of the parsing and conversion, leaving the second machine to just worry about doing the writes) David Lang ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings