Re: [PERFORM] Best OS for Postgres 8.2
On Tue, 2007-05-08 at 23:31 -0400, Greg Smith wrote: My issue wasn't with the idea, it was with the implementation. When I have my newbie hat on, it adds a layer of complexity that isn't needed for simple installs. I find it very hard to agree with that. As a newbie I install postgresql and have a database server installed, and operating. The fact that the DB files are installed somewhere like /var/lib/postgresql/8.1/main is waaay beyond newbie. At that point I can createdb or createuser, but I _do_ _not_ need to know anything about the cluster stuff until there is more than one DB on the machine. The Debian wrappers all default appropriately for the single-cluster case, so having a single cluster has added _no_ perceivable complexity for a newbie (as it should). If you have a second cluster, whether it's the same Pg version or not, things necessarily start to get complicated. OTOH I haven't had any problem explaining to people that the --cluster option applies, and there are sane ways to make that default to a reasonable thing as well. All in all I think that the Debian scripts are excellent. I'm sure there are improvements that could be made, but overall they don't get in the way, they do the right thing in the minimal case, and they give the advanced user a lot more choices about multiple DB instances on the same machine. Cheers, Andrew McMillan. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 Open Source: the difference between trust and antitrust - signature.asc Description: This is a digitally signed message part
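For readers who have not met the Debian wrappers being discussed, the workflow looks roughly like the sketch below. The cluster name `8.1/replica` and database name `mydb` are invented for illustration; the commands are collected into a string and printed rather than executed, so nothing here touches a live cluster.

```shell
#!/bin/sh
# Illustrative postgresql-common workflow (printed, not executed):
# with a single cluster no --cluster flag is needed; once a second
# cluster exists, --cluster selects which one the wrapper talks to.
cmds="pg_lsclusters
createdb mydb
pg_createcluster 8.1 replica
createdb --cluster 8.1/replica mydb"
echo "$cmds"
```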
Re: [PERFORM] Question about memory allocations
On Tue, 2007-04-10 at 15:28 -0400, Steve wrote: I'm trying to tune the memory usage of a new machine that has a -lot- of memory in it (32 gigs). ... shared_buffers = 16GB Really? Wow! Common wisdom in the past has been that values above a couple of hundred MB will degrade performance. Have you done any benchmarks on 8.2.x that show that you get an improvement from this, or did you just take the too much of a good thing is wonderful approach? Cheers, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 You have an unusual equipment for success. Be sure to use it properly. - signature.asc Description: This is a digitally signed message part

Re: [PERFORM] Temporary Table
On Tue, 2005-11-08 at 10:22 +, Christian Paul B. Cosinas wrote: I see. But how can I put this in the cron of my Linux server? I really don't have an idea :) What I want to do is to loop around all the databases in my server and execute the vacuum of these 3 tables in each database. I usually write a small shell script, something like:
==
#!/bin/sh
psql somedatabase <<EOQ
VACUUM this;
VACUUM that;
DELETE FROM someotherplace WHERE delete_this_record;
EOQ
==
and so forth... This makes the SQL quite nicely readable. Regards, Andrew McMillan.
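To cover the "loop around all the databases" part of the question, the same heredoc idea can be wrapped in a loop. A sketch with hypothetical database and table names: it prints the commands it would run, and in a real cron job you would fetch the list with `psql -At -c "SELECT datname FROM pg_database WHERE datallowconn"` and pipe each heredoc into psql instead of echoing it.

```shell
#!/bin/sh
# Print a VACUUM job for every database in the list.
# DATABASES and the table names are hypothetical placeholders.
DATABASES="sales archive"
for db in $DATABASES; do
    echo "psql $db <<EOQ"
    echo "VACUUM this;"
    echo "VACUUM that;"
    echo "EOQ"
done
```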
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote: On my laptop this takes: real 1m33.758s user 0m4.285s sys 0m1.181s One interesting effect is the data in bayes_vars has a huge number of updates and needs vacuum _frequently_. After the run a vacuum full compacts it down from 461 pages to 1 page. I wonder why your laptop is so much faster. My 2 min 30 sec test was done on a dual Xeon with an LSI MegaRAID with 128MB cache and writeback caching turned on. I only do development stuff on my laptop, and all of my databases are reconstructable from copies, etc... so I turn off fsync in this case. How often should this table be vacuumed, every 5 minutes? I would be tempted to vacuum after each e-mail, in this case. Also, this test goes a bit faster with sync turned off; if MySQL isn't using sync that would be why it's so much faster. Anyone know what the default for MySQL is? It depends on your table type for MySQL. For the data in question (i.e. bayes scoring) it would seem that not much would be lost if you did have to restore your data from a day-old backup, so perhaps fsync=false is OK for this particular application. Regards, Andrew McMillan.
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote: Ok, I finally got some test data together so that others can test without installing SA. The schema and test dataset is over at http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz I have a pretty fast machine with a tuned postgres and it takes it about 2 minutes 30 seconds to load the test data. Since the test data is the bayes information on 616 spam messages, that comes out to be about 250ms per message. While that is doable, it does add quite a bit of overhead to the email system. On my laptop this takes: real 1m33.758s user 0m4.285s sys 0m1.181s One interesting effect is the data in bayes_vars has a huge number of updates and needs vacuum _frequently_. After the run a vacuum full compacts it down from 461 pages to 1 page. Regards, Andrew.
Re: [PERFORM] Postgresql and Software RAID/LVM
On Fri, 2005-06-03 at 11:45 -0700, Steve Poe wrote: I have a small business client that cannot afford high-end/high quality RAID cards for their next server. That's a separate argument/issue right there for me, but what the client wants is what the client wants. Has anyone run Postgres with software RAID or LVM on a production box? What has been your experience? Hi, We regularly run LVM on top of software RAID for our PostgreSQL servers (and our other servers, for that matter). As far as I can see these systems have not had issues related to either software RAID or LVM - that's around 30 systems all up, maybe 8 running PostgreSQL, in production. The database servers are a variety of dual-Xeon (older) and dual-Opteron (newer) systems. The Xeons are all running Debian Woody with 2.4.xx kernels and the Opterons are all running Sarge with 2.6.x kernels. Regards, Andrew.
Re: [PERFORM] Adaptec/LSI/?? RAID
On Wed, 2005-06-01 at 20:42 -0700, Stacy White wrote: We're in the process of buying another Opteron server to run Postgres, and based on the suggestions in this list I've asked our IT director to get an LSI MegaRaid controller rather than one of the Adaptecs. But when we tried to place our order, our vendor (Penguin Computing) advised us: we find LSI does not work well with 4GB of RAM. Our engineering find that LSI card could cause system crashes. One of our customer ... has found that Adaptec cards works well on PostGres SQL -- they're using it as a preforce server with xfs and post-gress. Any comments? Suggestions for other RAID controllers? Hi, We're using the Megaraid (Intel branded model) on a dual Opteron system with 8G RAM very happily. The motherboard is a RioWorks one, the OS is Debian Sarge AMD64 with kernel 2.6.11.8 and PostgreSQL 7.4.7. Cheers, Andrew.
Re: [PERFORM] Adaptec/LSI/?? RAID (what about JBOD?)
On Thu, 2005-06-02 at 14:02 -0700, [EMAIL PROTECTED] wrote: I have a similar question about what to choose (either LSI or Adaptec U320), but plan to use them just for JBOD drivers. I expect to be using either NetBSD or FreeBSD. The system CPU will be Opteron. My impression is that both the ahd and mpt drivers (for U320 Adaptec and LSI, respectively) are quite stable, but not from personal experience. Like I said, I don't plan to have the cards doing RAID in hardware. Should I be pretty safe with either choice of HBA then? On the machine I mentioned earlier in this thread we use the MegaRAID for JBOD, but the card setup to use the disks that way was somewhat confusing, requiring us to configure logical drives that in fact matched the physical ones. The card still wanted to write that information onto the disks, reducing the total disk space available by some amount, but also meaning that we were unable to migrate our system from a previous non-RAID card cleanly. Regards, Andrew.
Re: [PERFORM] OID vs overall system performances on high load
On Sun, 2005-05-29 at 16:17 -0400, Eric Lauzon wrote: I am still in the dark due to my lack of knowledge of internal OID management, but I would presume that a table with OID enabled and that has high load would require some more work from pgsql internals to maintain the OID index for the database. So OID can be beneficial on static tables, or tables that you want to be able to manipulate with pgAdmin X, but can a table without OID increase performance on insert, delete, update, COPY? I am not really worried about the disk space that an OID column can take, but I was wondering if an insert into a table of 20 million rows or more that has OID would slow the insert process. Since OID seems to act as a global index, maybe maintaining that index can become costly under high table load for the postgresql backend. There is no OID index, unless you create one. The disk space that an OID column can take has an effect on performance: reducing the amount of physical disk reads will mean that more of your real data is cached, and so forth. How much effect it will have will depend on the relative size of the OID column and the other columns in your data. Regards, Andrew McMillan.
Re: [PERFORM] OID vs overall system performances on high load
On Fri, 2005-05-27 at 13:05 -0400, Eric Lauzon wrote: What are the effects of having around 500 insert/update/delete operations on two to eight tables in a time frame of 2 minutes, 24/24h, when you have OID enabled versus the same setup when you don't have OID? That deployment is done on a postgres with 8 to 9 databases, each having those 2 to 8 high load tables with OID enabled. Would the OID column slow down table scans when you have over 20 million rows? Would the cost of maintaining the OID column inside those high load tables be costly for postgres resources when there is no OID reference used for data seeking? The OID column is an extra few bytes on each row. If you don't have any use for it (and let's face it: most of us don't), then create your tables without OID. The amount of impact that it makes will depend on what the general row size is. If they are rows with a couple of integers then the size of an OID column will be a significant portion of the size of each row, and removing it will make the physical on-disk data size significantly smaller. If the size of the average row is (e.g.) 2k then the OID will only be a very small fraction of the data, and removing it will only make a small difference. Regards, Andrew McMillan.
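Andrew's point about relative row size can be made concrete with some back-of-envelope arithmetic. The ~28-byte per-row header below is an assumption for 7.x-era PostgreSQL, not a measured figure; the 4-byte OID and the two row shapes (two ints vs. a ~2k row) follow the discussion above.

```shell
#!/bin/sh
# Rough share of each on-disk row taken by a 4-byte OID, for a narrow
# row (two 4-byte ints) and a wide (~2k) row. Header size is assumed.
HEADER=28
OID=4
narrow=$((100 * OID / (HEADER + OID + 8)))
wide=$((100 * OID / (HEADER + OID + 2048)))
echo "narrow row: OID is ${narrow}% of the row"
echo "wide row:   OID rounds to ${wide}% of the row"
```

So dropping OIDs from a narrow table trims roughly a tenth of the on-disk size, while for wide rows the saving is negligible, exactly the distinction the reply draws.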
Re: [PERFORM] Using LIMIT changes index used by planner
On Mon, 2004-12-13 at 01:13 -0500, Sven Willenberger wrote: I have a question regarding a serious performance hit taken when using a LIMIT clause. I am using version 7.4.6 on FreeBSD 4.10-STABLE with 2GB of memory. The table in question contains some 25 million rows with a bigserial primary key, orderdate index and a referrer index. The 2 select statements are as follow: It's an interesting question, but to be able to get answers from this list you will need to provide EXPLAIN ANALYZE ... rather than just EXPLAIN AFAICS the bad plan on LIMIT is because it optimistically thinks the odds are around the 0.00 end, rather than the 64297840.86 end, and indeed that is what the Limit ... estimate is showing. A bad plan (in your case) is encouraged here by the combination of LIMIT and ORDER BY. For real background on this, and calculated recommendations, we'd need that more detailed output though. As a quick hack, it's possible that you could improve things by increasing the samples on relevant columns with some judicious ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ... commands. Cheers, Andrew McMillan.
Re: [PERFORM] pg_restore taking 4 hours!
On Wed, 2004-12-01 at 09:16 -0200, Rodrigo Carvalhaes wrote: I am using PostgreSQL with a proprietary ERP software in Brazil. The database has around 1.600 tables (each one with +/- 50 columns). ... max_fsm_pages = 20000 max_fsm_relations = 1000 Hi, I doubt that this will improve your pg_restore performance, but if you have 1600 tables in the database then you very likely want to increase the above two settings. In general max_fsm_relations should be more than the total number of tables across all databases in a given installation. The best way to set these is to do a VACUUM VERBOSE, which will print the appropriate minimum numbers at the end of the run, along with the current setting. Cheers, Andrew.
Re: [PERFORM] Postgres vs. DSpam
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with a quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular. I am using DSpam with PostgreSQL here. I have a daily job that cleans the DSpam database up, as follows: DELETE FROM dspam_token_data WHERE (innocent_hits*2) + spam_hits < 5 AND CURRENT_DATE - last_hit > 60; DELETE FROM dspam_token_data WHERE innocent_hits = 1 AND CURRENT_DATE - last_hit > 30; DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit > 180; DELETE FROM dspam_signature_data WHERE CURRENT_DATE - created_on > 14; VACUUM dspam_token_data; VACUUM dspam_signature_data; I also occasionally do a VACUUM FULL ANALYZE; on the database as well. In all honesty though, I think that MySQL is better suited to DSpam than PostgreSQL is. Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? I believe that what DSpam does that is not well-catered for in the way PostgreSQL operates, is that it does very frequent updates to rows in (eventually) quite large tables. In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. The reason that PostgreSQL operates this way is a direct result of the way transactional support is implemented, and it may well change in a version or two. It's got better over the last few versions, with things like pg_autovacuum, but that approach still doesn't suit some types of database updating. Cheers, Andrew.
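A rough sketch of why this update pattern hurts under MVCC: every UPDATE leaves a dead row version behind until VACUUM reclaims it. Only the 4000 messages/day figure comes from the post; the tokens-updated-per-message number is a made-up assumption for the arithmetic.

```shell
#!/bin/sh
# Dead row versions accumulated between daily vacuums (illustrative).
MESSAGES_PER_DAY=4000      # from the post
TOKENS_PER_MESSAGE=100     # hypothetical assumption
dead=$((MESSAGES_PER_DAY * TOKENS_PER_MESSAGE))
echo "up to $dead dead row versions per day in dspam_token_data"
```

Hundreds of thousands of dead tuples a day in one table is why the daily DELETE-plus-VACUUM job above matters so much here.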
Re: [PERFORM] Checking = with timestamp field is slow
On Fri, 2004-11-05 at 12:46 +0530, Antony Paul wrote: Hi all, I have a table which has more than 200,000 records. I need to get the records which match like this: where today::date = '2004-11-05'; This is the only condition in the query. There is a btree index on the column today. Is there any way to optimise it? Hi Antony, I take it your field is called today (seems dodgy, but these things happen...). Anyway, have you tried indexing on the truncated value? create index xyz_date on xyz ((today::date)); analyze xyz; That's one way. It depends on how many of those 200,000 rows are on each date too, as to whether it will get used by your larger query. Regards, Andrew.
Re: [PERFORM] Insert performance, what should I expect?
On Wed, 2004-10-20 at 11:53 +1000, Brock Henry wrote: Test 1, For each import, I'm dropping all indexes and pkeys/fkeys, then importing, then adding keys and indexes. Then I've got successive runs. I figure the reindexing will get more expensive as the database grows? Sounds like the right approach to me, if the tables are empty before the import. Successive Imports: 44, 49, 50, 57, 55, 61, 72 (seconds) = average 1051 inserts/second (which now that I've written this seems fairly good) (A) Are you doing the whole thing inside a transaction? This will be significantly quicker. COPY would probably be quicker still, but the biggest difference will be a single transaction. (B) If you are starting with empty files, are you ensuring that the dead records are vacuumed before you start? I would recommend a vacuum full on the affected tables prior to the first import run (i.e. when the tables are empty). This is likely to be the reason that the timing on your successive imports increases so much. sort_mem = 4096 You probably want to increase this - if you have 1G of RAM then there is probably some spare. But if you actually expect to use 32 connections then 32 * 4M = 128M might mean a careful calculation is needed. If you are really only likely to have 1-2 connections running concurrently then increase it to (e.g.) 32768. max_fsm_relations = 300 If you do a vacuum full verbose; the last line will give you some clues as to what to set this (and max_fsm_pages) too. effective_cache_size = 16000 16000 * 8k = 128M seems low for a 1G machine - probably you could say 64000 without fear of being wrong. What does free show as cached? Depending on how dedicated the machine is to the database, the effective cache size may be as much as 80-90% of that. Can I expect it to go faster than this? I'll see where I can make my script itself go faster, but I don't think I'll be able to do much. I'll do some pre-prepare type stuff, but I don't expect significant gains, maybe 5-10%.
I could happily turn off fsync for this job, but not for some other databases the server is hosting. You can probably double the speed - maybe more. Cheers, Andrew.
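The sort_mem arithmetic in this reply, spelled out: sort_mem is a per-sort, per-backend ceiling, so the worst case multiplies out across max_connections, while the suggested larger value is only safe because few backends run concurrently.

```shell
#!/bin/sh
# Worst-case sort memory: every backend using its full sort_mem at once.
MAX_CONNECTIONS=32
SORT_MEM_KB=4096
worst_mb=$((MAX_CONNECTIONS * SORT_MEM_KB / 1024))
echo "32 connections x 4096KB = ${worst_mb}MB"
# With only a couple of concurrent connections, the suggested 32768KB:
two_mb=$((2 * 32768 / 1024))
echo "2 connections x 32768KB = ${two_mb}MB"
```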
Re: [PERFORM] Seq scan vs. Index scan with different query
On Mon, 2004-07-05 at 12:15 +0200, [EMAIL PROTECTED] wrote: Hello, Can anybody suggest any hint on this: temp= EXPLAIN SELECT DISTINCT number FROM tablex WHERE Date BETWEEN '2004-06-28'::date AND '2004-07-04'::date AND Time BETWEEN '00:00:00'::time AND '18:01:00'::time; Unique (cost=305669.92..306119.43 rows=89 width=8) -> Sort (cost=305669.92..305894.67 rows=89903 width=8) Sort Key: number -> Index Scan using DateTimeIndex on tablex (cost=0.00..298272.66 rows=89903 width=8) Index Cond: ((Date >= '2004-06-28'::date) AND (Date <= '2004-07-04'::date) AND (Time >= '00:00:00'::time without time zone) AND (Time <= '18:01:00'::time without time zone)) temp= EXPLAIN SELECT DISTINCT number FROM tablex WHERE Date BETWEEN '2004-06-28'::date AND '2004-07-04'::date AND Time BETWEEN '00:00:00'::time AND '19:01:00'::time; Unique (cost=315252.77..315742.27 rows=97 width=8) -> Sort (cost=315252.77..315497.52 rows=97900 width=8) Sort Key: number -> Seq Scan on tablex (cost=0.00..307137.34 rows=97900 width=8) Filter: ((Date >= '2004-06-28'::date) AND (Date <= '2004-07-04'::date) AND (Time >= '00:00:00'::time without time zone) AND (Time <= '19:01:00'::time without time zone)) Basically, the difference is in the upper Time value (as you can see, it's 18:01:00 in the first query and 19:01:00 in the other one). The question is - why does it use the index in the first case and try to do a full sequential scan when the upper Time value is different? DateTimeIndex was created on both columns (Date/Time): CREATE INDEX DateTimeIndex ON tablex USING btree (Date, Time); PostgreSQL is always going to switch at some point, where the number of rows that have to be read from the table exceed some percentage of the total rows in the table. We can possibly be more helpful if you send EXPLAIN ANALYZE, rather than just EXPLAIN. A few things to be careful of: - Is this supposed to be a slice of midnight to 6pm, for each day between 28 June and 4 July?
If you want a continuous period from midnight 28 June - 6pm 4 July you're better to have a single timestamp field. - It is unlikely that the ", Time" part of your index is adding much to your selectivity, and it may be that you would be better off without it. - the DISTINCT can screw up your results, and it usually means that the SQL is not really the best it could be. A _real_ need for DISTINCT is quite rare in my experience, and from what I have seen it adds overhead and tends to encourage bad query plans when used unnecessarily. Hope this is some help. Regards, Andrew McMillan
Re: [PERFORM] Slow vacuum performance
On Thu, 2004-06-17 at 13:09 -0700, Patrick Hatcher wrote: Pg: 7.4.2 RedHat 7.3 Ram: 8gig I have a 6 million row table that I vacuum full analyze each night. The time seems to be stretching out further and further as I add more rows. I read the archives and Josh's annotated pg.conf guide that setting the FSM higher might help. Currently, my memory settings are set as such. Does this seem low? Last reading from vacuum verbose: INFO: analyzing cdm.cdm_ddw_customer INFO: cdm_ddw_customer: 209106 pages, 3000 rows sampled, 6041742 estimated total rows I think I should now set my max FSM to at least 210000 but wanted to make sure Yes, that's my interpretation of those numbers too. I would set max_fsm_pages to 300000 (or more) in that case. If you have 8G of RAM in the machine your shared_buffers seems very low too. Depending on how it is used I would increase that to at least the recommended maximum (10000 - 80M). You don't quote your setting for effective_cache_size, but you should probably look at what /usr/bin/free reports as cached, divide that by 10, and set it to that as a quick rule of thumb... Regards, Andrew McMillan shared_buffers = 2000 # min 16, at least max_connections*2, 8KB each sort_mem = 12288 # min 64, size in KB # - Free Space Map - max_fsm_pages = 100000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~50 bytes each
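The two rules of thumb in that reply, as arithmetic: size max_fsm_pages from the page count that VACUUM VERBOSE reported, and effective_cache_size from what `free` shows as cached. The page count is from the quoted output above; the cached figure is a hypothetical example for an 8G machine, not taken from the post.

```shell
#!/bin/sh
# Sizing sketch for max_fsm_pages and effective_cache_size.
TABLE_PAGES=209106          # from the quoted VACUUM VERBOSE output
echo "max_fsm_pages should exceed ${TABLE_PAGES} (round up, e.g. 300000)"
CACHED_KB=6000000           # hypothetical 'free' cached value, in KB
ecs=$((CACHED_KB / 10))
echo "effective_cache_size = ${ecs}   # cached/10, per the rule of thumb"
```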
Re: [PERFORM] DB Design
On Wed, 2004-05-19 at 15:37 +0800, Michael Ryan S. Puncia wrote: Hi Guys, My question is .. which is better design 1. Single Table with 50 million records or 2. Multiple Table using inheritance to the parents table It's not that simple. Given your e-mail address I assume you want to store Philippines Census data in such a table, but does Census data fit well in a single flat table structure? Not from what I have seen here in NZ, but perhaps Census is simpler there. So to know what the best answer to that question is, people here will surely need more and better information from you about database schema, record size, indexing and query characteristics, and so on. I will use this only for query purpose .. Then you may quite possibly want to consider a different database. Particularly if it is single-user query purposes. For example, there are some SQL databases that would load the entire database into RAM from static files, and then allow query against this. This can obviously give huge performance improvements in situations where volatility is not a problem. Cheers, Andrew.
Re: [PERFORM] Quad processor options
On Tue, 2004-05-11 at 15:46 -0700, Paul Tuckfield wrote: - the cache column shows that linux is using 2.3G for cache. (way too much) you generally want to give memory to postgres to keep it close to the user, not leave it unused to be claimed by linux cache (need to leave *some* for linux tho) My recommendations: - I'll bet you have a low value for shared buffers, like 10000. On your 3G system you should ramp up the value to at least 1G (125000 8k buffers) unless something else runs on the system. It's best to not do things too drastically, so if I'm right and you sit at 10000 now, try going to 30000 then 60000 then 125000 or above. Huh? Doesn't this run counter to almost every piece of PostgreSQL performance tuning advice given? I run my own boxes with buffers set to around 10000-20000 and an effective_cache_size = 375000 (8k pages - around 3G). That's working well with PostgreSQL 7.4.2 under Debian woody (using Oliver Elphick's backported packages from http://people.debian.org/~elphick/debian/). Regards, Andrew.
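Converting the two buffer counts that anchor this disagreement from 8KB pages to megabytes makes it concrete: both figures appear in the exchange above.

```shell
#!/bin/sh
# 8KB pages to megabytes (integer arithmetic, so ~1G comes out as 976MB).
big=$((125000 * 8 / 1024))      # Paul's suggested shared_buffers
cache=$((375000 * 8 / 1024))    # Andrew's effective_cache_size
echo "125000 pages = ${big}MB (~1G of shared buffers)"
echo "375000 pages = ${cache}MB (~3G left to the OS cache)"
```

Same total memory either way; the argument is over whether PostgreSQL or the kernel should manage it.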
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote: Tom, The tricky part is that a slow adaptation rate means we can't have every backend figuring this out for itself --- the right value would have to be maintained globally, and I'm not sure how to do that without adding a lot of overhead. This may be a moot point, since you've stated that changing the loop timing won't solve the problem, but what about making the test part of make? I don't think too many systems are going to change processor architectures once in production, and those that do can be told to re-compile. Sure they do - PostgreSQL is regularly provided as a pre-compiled distribution. I haven't compiled PostgreSQL for years, and we have it running on dozens of machines, some SMP, some not, but most running Debian Linux. Even having a compiler _installed_ on one of our client's database servers would usually be considered against security procedures, and would get a black mark when the auditors came through. Regards, Andrew McMillan
Re: [PERFORM] good pc but bad performance,why?
On Wed, 2004-04-07 at 20:56, huang yaqin wrote: Hello, Richard Huxton, You said turning fsync off may cause losing data, that's terrible. I use SCSI disk, and file system is ext3. I tune postgresql.conf and can't get any improvement. So what can I do? Does SCSI disk and IDE disk have difference? Yes, turning off fsync means that the database is not guaranteeing consistency of writes to disk any longer. On the other hand your IDE system probably never was, because IDE drives just typically turn on write caching in hardware without telling anyone. SCSI typically doesn't turn on write caching in the physical drive by default, as Tom Lane pointed out earlier. Good SCSI has a battery backed up cache, and then it is OK to turn on write caching, because the controller has enough battery to complete all writes in the event of a power failure. One thing I recommend is to use ext2 (or almost anything but ext3). There is no real need (or benefit) from having the database on a journalled filesystem - the journalling is only trying to give similar sorts of guarantees to what the fsync in PostgreSQL is doing. The suggestion someone else made regarding use of software raid is probably also a good one if you are trying to use the on-board RAID at the moment. Finally, I would say that because you are seeing poor performance on one box and great performance on another, you should look at the hardware, or at the hardware drivers, for the problem - not so much at PostgreSQL. Of course if it is application performance you want to achieve, we _can_ help here, but you will need to provide more details of what you are trying to do in your application, including; - confirmation that you have done a VACUUM and ANALYZE of all tables before you start - output from EXPLAIN ANALYZE for slow queries - anything else you think is useful. 
Without that sort of detail we can only give vague suggestions, like
"wrap everything in a transaction" - excellent advice, certainly, but
you can read that in the manual. There are no magic bullets, but I am
sure most of the people on this list have systems that regularly do far
more than 50 inserts per second on server hardware.

Regards,
Andrew McMillan
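The "wrap everything in a transaction" advice can be made concrete with a small sketch. This uses Python's stdlib sqlite3 purely so the example is self-contained and runnable; with PostgreSQL the same pattern applies through whatever driver the application uses, and the payoff is larger because each COMMIT forces an fsync:

```python
import sqlite3

def insert_rows(conn, rows, batch=True):
    """Insert rows either in one transaction (one commit/fsync at the
    end) or one transaction per row (a commit/fsync for every insert)."""
    cur = conn.cursor()
    if batch:
        with conn:  # one transaction around all inserts
            cur.executemany("INSERT INTO t(v) VALUES (?)", rows)
    else:
        for r in rows:  # autocommit-style: every insert pays a commit
            cur.execute("INSERT INTO t(v) VALUES (?)", r)
            conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(v INTEGER)")
insert_rows(conn, [(i,) for i in range(1000)], batch=True)
count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
```

The table name `t` and the helper are illustrative only; the point is simply that 1000 inserts inside one transaction cost one durable commit instead of 1000.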
Re: [PERFORM] 100 simultaneous connections, critical limit?
On Thu, 2004-01-15 at 01:48, Jón Ragnarsson wrote:
> I am writing a website that will probably have some traffic. Right now
> I wrap every .php page in pg_connect() and pg_close(). Then I read
> somewhere that Postgres only supports 100 simultaneous connections
> (default). Is that a limitation? Should I use some other method when
> writing code for a high-traffic website?

Whether the overhead of pg_connect()/pg_close() has a noticeable effect
on your application depends on what you do in between them. To be
honest, I never call the second one myself - PHP will close the
connection when the page is finished.

I have developed some applications which are trying to be
as-fast-as-possible and for which I either use pg_pconnect(), so you
have one DB connection per Apache process, or I use DBBalancer, where
pg_connect() is _actually_ connecting to DBBalancer in a very
low-overhead manner and you have a pool of connections out the back. (I
am the Debian package maintainer for DBBalancer.)

You may also want to consider differentiating based on whether the
application is writing to the database or not. Pooling and persistent
connections can give weird side-effects if transaction scoping is
bollixed in the application - a second page view re-using an earlier
connection which was serving a different page could find itself in the
middle of an unexpected transaction. Temp tables are one thing that can
bite you here.

There are a few database pooling solutions out there. Using
pg_pconnect() is the simplest of these, DBBalancer fixes some of its
issues, and others go further still.

Another point to consider is that database pooling will give you the
biggest performance increase if your queries are all returning small
datasets. If you return large datasets it can potentially make things
worse (depending on the implementation) through double-handling of the
data.

As others have said too: 100 is just a configuration setting in
postgresql.conf - not an implemented limit.
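The kind of pool that DBBalancer (or pg_pconnect) gives you can be sketched generically in a few lines. This is an illustrative model, not DBBalancer's implementation; the `connect` factory stands in for whatever driver call the application actually makes:

```python
import queue

class ConnectionPool:
    """Minimal fixed-size connection pool (illustrative sketch)."""

    def __init__(self, connect, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # open all connections up front

    def acquire(self):
        # Blocks until a connection is free, so at most `size` backend
        # connections ever exist - which is the point of pooling.
        return self._pool.get()

    def release(self, conn):
        self._pool.put(conn)

# Demo with a stand-in factory; real code would pass a database
# driver's connect function here.
made = []
pool = ConnectionPool(lambda: made.append(object()) or made[-1], size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()  # reuses a released connection, never opens a third
```

The blocking `acquire` is what keeps you under a `max_connections` limit: page handlers queue briefly for a connection instead of failing outright.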
Cheers,
Andrew McMillan.
Re: [PERFORM] Tuning PostgreSQL
On Wed, 2003-07-23 at 00:53, Alexander Priem wrote:
> Wow, I never figured how many different RAID configurations one could
> think of :) After reading lots of material, forums and of course this
> mailing list, I think I am going for a RAID5 configuration of 6 disks
> (18Gb, 15,000 rpm each); one of those six disks will be a 'hot spare'.
> I will just put the OS, the WAL and the data on one volume. RAID10 is
> way too expensive :)

The general heuristic is that RAID-5 is not the way to deal with
databases. Now surely someone will disagree with me, but as I understand
it RAID-5 pays a penalty on every small write to keep the parity
information up to date, and "bottleneck" is not a word you want to hear
in the context of a database server. RAID-1 (mirroring) or RAID-10
(striping across mirrored pairs) is the better choice.

As far as FS performance goes, a year or two ago I remember someone
doing an evaluation of FS performance for PostgreSQL and they found that
the best performance was... FAT. Yep: FAT. The reason is that a lot of
what the database is doing, especially guaranteeing writes (WAL) and so
forth, is best handled through a filesystem that does not get in the
way. The fundamentals will not have changed. It is for this reason that
ext2 is very likely to be better than ext3. XFS is possibly (maybe,
perhaps) OK, because there are optimisations in there for databases, but
the best optimisation is to not be there at all. That's why Oracle wants
direct IO to disk partitions: so they can implement their own filesystem
(i.e. record system... table system...) on a raw partition.

Personally I don't plan to reboot my DB server more than once a year, if
that (even my laptop currently has 37 days uptime, not including
suspend). On our DB servers I use ext2 (rather than ext3) mounted with
noatime, and I accept the 15 minutes to fsck (once a year) rather than
hurt general performance by running a journalling database on top of a
journalling FS.
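The RAID-5 write penalty can be put in rough numbers. The standard back-of-envelope model (not a benchmark, and the per-disk IOPS figure below is an assumed nominal value): a small random write on RAID-5 costs four disk I/Os (read old data, read old parity, write new data, write new parity), while RAID-10 costs two (one write to each side of a mirror).

```python
def random_write_iops(disk_iops, ndisks, penalty):
    """Aggregate small-random-write IOPS for an array, given the
    per-write I/O penalty of the RAID level (RAID-5: 4, RAID-10: 2)."""
    return disk_iops * ndisks // penalty

# Six 15k-rpm disks at a nominal ~180 IOPS each (rough assumption).
raid5 = random_write_iops(180, 6, penalty=4)   # parity update: 4 I/Os
raid10 = random_write_iops(180, 6, penalty=2)  # mirrored write: 2 I/Os
```

On the same six spindles, this simple model gives RAID-10 twice the random-write throughput of RAID-5, which is why write-heavy databases usually prefer it.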
I split pg_xlog onto a separate physical disk if performance
requirements are extreme.

Catalyst's last significant project was to write the Domain Name
registration system for .nz (using PostgreSQL). Currently we are
developing the electoral roll for the same country (2.8 million electors
living at 1.4 million addresses). We use Oracle (or Progress, or MySQL)
if a client demands them, but we use PostgreSQL if we get to choose.
Increasingly we get to choose. Good.

Regards,
Andrew.