Re: [PERFORM] SCSI vs SATA
On 4-Apr-07, at 8:46 AM, Andreas Kostyrka wrote:

> * Peter Kovacs <[EMAIL PROTECTED]> [070404 14:40]:
>> This may be a silly question, but: will not 3 times as many disk drives mean a 3 times higher probability of disk failure? Also, rumor has it that SATA drives are more prone to fail than SCSI drives. More failures will result, in turn, in more administration costs.
>
> Actually, the newest research papers show that all discs (be they desktop or high-end SCSI) have basically the same failure statistics. But yes, having 3 times the discs will increase the fault probability.

I highly recommend RAID 6 to anyone with more than 6 standard SATA drives in a single array. It's actually fairly probable that you will lose 2 drives in a 72-hour window (say, over a long weekend) at some point.

Andreas

Thanks
Peter

On 4/4/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> On Tue, 3 Apr 2007, Geoff Tolley wrote:
>
>> Ron wrote:
>>> At 07:07 PM 4/3/2007, Ron wrote:
>>>> For random IO, the 3ware cards are better than PERC.
>>>> Question: will 8 * 15K 73GB SCSI drives outperform 24 * 7.2K 320GB SATA II drives?
>>> Nope. Not even if the 15K 73GB HDs were the brand new Savvio 15K screamers.
>>> Example assuming 3.5" HDs and RAID 10 => 4 * 15K 73GB vs 12 * 7.2K 320GB. The 15Ks are 2x faster rpm, but they are only ~23% the density => advantage per HD to SATAs. Then there's the fact that there are 1.5x as many 7.2K spindles as 15K spindles...
>>> Oops, make that =3x= as many 7.2K spindles as 15K spindles...
>>
>> I don't think the density difference will be quite as high as you seem to think: most 320GB SATA drives are going to be 3-4 platters, the most that a 73GB SCSI is going to have is 2, and more likely 1, which would make the SCSIs more like 50% the density of the SATAs. Note that this only really makes a difference to theoretical sequential speeds; if the seeks are random, the SCSI drives could easily get there 50% faster (lower rotational latency, and they certainly will have better actuators for the heads).
Individual 15K SCSIs will trounce 7.2K SATAs in terms of I/Os per second.

True, but with 3x as many drives (and 4x the capacity per drive) the SATA system will have to do far less seeking. For that matter, with 20-ish 320G drives, how large would a partition be that only used the outer physical track of each drive? (Almost certainly multiple logical tracks.) If you took the time to set this up, you could eliminate seeking entirely (at the cost of not using your capacity, but since you are considering a 12x range in capacity, it's obviously not your primary concern).

If you care about how often you'll have to replace a failed drive, then the SCSI option, no question, although check the cases for hot-swappability. Note that the CMU and Google studies both commented on being surprised at the lack of difference between the reliability of SCSI and SATA drives.

David Lang

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
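The "lose 2 drives in a 72-hour window" risk is easy to put rough numbers on. A minimal sketch, assuming independent failures and an illustrative 3% annualized failure rate (the AFR is my assumption, not a figure from the thread):

```python
# Back-of-the-envelope check of the "second drive fails within 72 hours" risk.
# Assumes independent failures and a constant annualized failure rate (AFR).
HOURS_PER_YEAR = 365 * 24

def p_any_failure(n_drives: int, window_hours: float, afr: float) -> float:
    """Probability that at least one of n_drives fails within the window."""
    p_one = 1 - (1 - afr) ** (window_hours / HOURS_PER_YEAR)
    return 1 - (1 - p_one) ** n_drives

# A degraded 24-drive array has 23 survivors; any further loss is fatal
# for RAID 5, while RAID 6 survives it:
p = p_any_failure(23, 72, 0.03)
print(f"P(another failure within 72h) = {p:.2%}")
```

Per incident this comes out well under 1%, but multiplied over many long weekends and many arrays, the RAID 6 recommendation above starts to look prudent.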
Re: [PERFORM] What you would consider as heavy traffic?
Depends on what the query is. If the queries take 3 to 5 days to execute, then 1 query per day on a 4-CPU machine would be at capacity.

On 23-Dec-06, at 3:12 AM, [EMAIL PROTECTED] wrote:

> Hey everyone, I am having a bit of trouble with a web host, and was wondering what you would class as a high level of traffic to a database (queries per second) on an average server running Postgres in a shared hosting environment (very modern servers).
>
> Many thanks in advance,
> Oliver
Re: [PERFORM] Backup/Restore too slow
Rebuilding the indexes or integrity confirmations are probably taking most of the time. What is your work_mem setting?

On 22-Dec-06, at 9:32 AM, Sebastián Baioni wrote:

> Hi, we have a database with one table of 10,000,000 tuples and 4 tables with 5,000,000 tuples. While in SQL Server it takes 3 minutes to restore this complete database, in PostgreSQL it takes more than 2 hours. The backup takes 6 minutes in SQL Server and 13 minutes in PostgreSQL (which is not a problem).
>
> We are running PostgreSQL 8.1 for Windows and we are using:
>
> C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f "C:\BK\file.backup" base
>
> and
>
> C:\pg_restore.exe -i -h localhost -p 5432 -U usuario -d base -O -v "C:\BK\file.backup"
>
> We use those parameters because we copied them from PgAdmin III. Is there any way to make it faster?
>
> Thanks
> Sebastián
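To illustrate the maintenance-memory angle: these postgresql.conf settings are sometimes raised temporarily for a large restore. The values below are illustrative guesses for a machine of this era, not recommendations from the thread, and fsync = off is only acceptable if a crash just means re-running the restore:

```
maintenance_work_mem = 262144   # KB; speeds the index rebuilds pg_restore does
checkpoint_segments = 32        # fewer checkpoint pauses during the bulk load
fsync = off                     # ONLY for a throwaway restore; turn back on after
```

Revert all three once the restore finishes and reload the server.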
Re: [PERFORM] When to vacuum a table?
On 26-Nov-06, at 8:11 AM, Steinar H. Gunderson wrote:

> On Sun, Nov 26, 2006 at 09:43:11AM -0300, Marcelo Costa wrote:
>> All its tables constantly manipulated (INSERT, UPDATE, DELETE) they need a VACUUM
>
> Just a minor clarification here: INSERT does not create dead rows, only UPDATE and DELETE do. Thus, if you only insert rows, you do not need to vacuum (although you probably need to analyze).

Not entirely true. An insert & rollback will create dead rows. If you attempt and fail a large number of insert transactions, then you will still need to vacuum.
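A minimal sketch of the aborted-insert point (the table name is invented for illustration):

```sql
-- A rolled-back bulk insert still leaves dead row versions behind.
BEGIN;
INSERT INTO t SELECT generate_series(1, 100000);
ROLLBACK;

-- The 100,000 aborted rows still occupy space until vacuumed;
-- VACUUM VERBOSE reports them as removable tuples.
VACUUM VERBOSE t;
```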
Re: [PERFORM] autovacuum on a -mostly- r/o table
On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote:

> I have read that autovacuum cannot check to see pg load before launching vacuum, but is there any patch about it? That would sort out the problem in a good and simple way.

In some cases the solution to high load is to vacuum the tables being hit the heaviest -- meaning that simply checking machine load isn't enough to make that decision. In fact, that high load problem is exactly why autovacuum was created in the first place.
Re: [PERFORM] Optimizing DELETE
On Tue, 2006-09-19 at 15:22 +0200, Ivan Voras wrote:

> I've just fired off a "DELETE FROM table" command (i.e. unfiltered DELETE) on a trivially small table but with many foreign key references (on similar-sized tables), and I'm waiting for it to finish. It's been 10 minutes now, which seems very excessive for a table of 9000 rows on a 3 GHz desktop machine.

I would guess that a few of those referencing tables are missing indexes on the referencing (foreign key) column.

> 'top' says it's all spent in USER time, and there's a ~500KB/s write rate going on. Just before this DELETE, I've deleted data from a larger table (5 rows) using the same method and it finished in a couple of seconds - maybe it's a PostgreSQL bug?
>
> My question is: assuming it's not a bug, how to optimize DELETEs? Increasing work_mem maybe?
>
> (I'm using PostgreSQL 8.1.4 on FreeBSD 6 amd64)
>
> (I know about TRUNCATE; I need those foreign key references to cascade)
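The usual fix, sketched with invented names: every column that references the table being deleted from needs its own index, otherwise each deleted row costs a sequential scan of the referencing table for the FK check or cascade:

```sql
-- child.parent_id REFERENCES parent(id); without this index, every row
-- deleted from parent seq-scans child.
CREATE INDEX child_parent_id_idx ON child (parent_id);

-- Re-check the cost. Note: EXPLAIN ANALYZE really executes the DELETE,
-- so wrap it in a transaction you can roll back.
BEGIN;
EXPLAIN ANALYZE DELETE FROM parent;
ROLLBACK;
```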
Re: [PERFORM] sub select performance due to seq scans
> capsa=# explain analyze select name from capsa.flatomfilesysentry
>         where objectid in ( select dstobj from capsa.flatommemberrelation
>         where srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409');
>
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Nested Loop IN Join  (cost=0.00..1386.45 rows=5809 width=14) (actual time=2.933..101467.463 rows=5841 loops=1)
>    Join Filter: ("outer".objectid = "inner".dstobj)
>    ->  Seq Scan on flatomfilesysentry  (cost=0.00..368.09 rows=5809 width=30) (actual time=0.007..23.451 rows=5844 loops=1)
>    ->  Seq Scan on flatommemberrelation  (cost=0.00..439.05 rows=5842 width=16) (actual time=0.007..11.790 rows=2922 loops=5844)

A loop for an IN indicates that you are using a very old version of PostgreSQL (7.2 or earlier). Please double-check that the server is 8.1.3 as you indicated and not just the client. From psql:

select version();

Hmm... Perhaps it is an 8.1.3 server with mergejoin and hashjoin disabled?

show enable_mergejoin;
show enable_hashjoin;

You can try this query syntax:

select name
  from capsa.flatomfilesysentry
  join capsa.flatommemberrelation on (objectid = dstobj)
 where srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409';

>          Filter: (srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
>  Total runtime: 101482.256 ms
> (6 rows)
>
> capsa=# select count(*) from capsa.flatommemberrelation ;
>  count
> -------
>  11932
> (1 row)
>
> capsa=# select count(*) from capsa.flatomfilesysentry ;
>  count
> -------
>   5977
Re: [PERFORM] lots of updates on small table
On Thu, 2005-07-14 at 15:08 +1000, Alison Winters wrote:

> Hi,
>
> Our application requires a number of processes to select and update rows from a very small (<10 rows) Postgres table on a regular and frequent basis. These processes often run for weeks at a time, but over the

Are these long-running transactions, or is the process issuing many short transactions? If your transaction lasts a week, then a daily vacuum isn't really doing anything. I presume you also run ANALYZE in some shape or form periodically?

> space of a few days we find that updates start getting painfully slow. We are running a full vacuum/analyze and reindex on the table every day,

If they're short transactions, run vacuum (not vacuum full) every 100 or so updates. This might even be once a minute. Analyze periodically as well.
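One way to sketch the "plain vacuum once a minute" advice is a crontab entry on the database server (database and table names are placeholders; the vacuumdb options shown are the standard client flags):

```
# m h dom mon dow  command
*  *  *   *   *    vacuumdb --quiet --analyze --table=hot_table mydb
```

Plain VACUUM only takes light locks, so running it this often against a tiny, heavily updated table is cheap and keeps its dead-tuple bloat bounded.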
Re: [PERFORM] Mount database on RAM disk?
> If I could get and deploy some SSD (Solid State Disk) devices that would make this sort of thing *actually safe,* I'd expect that to be a pretty fabulous improvement, at least for write-heavy database activity.

Not nearly as much as you would expect. For the price of the SSD, plus a SCSI controller capable of keeping up with it, plus your regular storage with enough throughput to keep up with the rest of the I/O, you can purchase a pretty good mid-range SAN which will be just as capable and much more versatile.
Re: [PERFORM] Performance - moving from oracle to postgresql
> There are some immediate questions from our engineers about performance:
>
> "- Oracle has one particular performance enhancement that Postgres is missing. If you do a select that returns 100,000 rows in a given order, and all you want are rows 99101 to 99200, then Oracle can do that very efficiently. With Postgres, it has to read the first 99200 rows and then discard the first 99100. But... If we really want to look at performance, then we ought to put together a set of benchmarks of some typical tasks."
>
> Is this accurate? According to http://www.postgresql.org/docs/8.0/interactive/queries-limit.html -- "The rows skipped by an OFFSET clause still have to be computed inside the server; therefore a large OFFSET can be inefficient."

Yes, that's accurate. First you need to determine whether PostgreSQL's method is fast enough for that specific query, and whether the performance gains for other queries (inserts, updates, deletes) from reduced index management even out your concern. All performance gains through design changes either increase complexity dramatically or have a performance trade-off elsewhere.

I find it rather odd that anyone would issue a single one-off select for 0.1% of the data about 99.1% of the way through, without doing anything with the rest. Perhaps you want to take a look at using a CURSOR?

> Where is psql not appropriate to replace Oracle?

Anything involving reporting using complex aggregates or very long-running selects which Oracle can divide amongst multiple CPUs. Well, PostgreSQL can do it if you give it enough time to run the query, but a CUBE in PostgreSQL on a TB-sized table would likely take significantly longer to complete. It's mostly just that the Pg developers haven't implemented those features optimally, or at all, yet.
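A cursor-based sketch of the "rows 99101 to 99200" case (table and column names invented): the rows ahead of the window still get computed once, but a cursor lets you page through the result set without re-running the whole query for every page:

```sql
BEGIN;
DECLARE big_scan CURSOR FOR
    SELECT * FROM big_table ORDER BY some_col;
MOVE FORWARD 99100 IN big_scan;  -- skip ahead (rows are still computed once)
FETCH 100 FROM big_scan;         -- rows 99101..99200
CLOSE big_scan;
COMMIT;
```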
Re: [PERFORM] max_connections / shared_buffers /
> 1.) shared_buffers - I see lots of references to making this the size of available RAM (for the DB). However, I also read to make it the size of the pgdata directory.
> 2.) effective_cache_size - from what I read this is the 'total' allowed memory for postgresql to use, correct? So, if I am willing to allow 1GB of memory, should I make this 1GB?

shared_buffers in your case should be about 10000. It is not taken on a per-connection basis, but is global for that cluster. Perhaps your memory analysis tool is fooling with you?

effective_cache_size is what you want to set to the amount of RAM that you expect the kernel to use for caching the database information in memory. PostgreSQL will not allocate this memory, but it will make adjustments to the query execution methods (plan) chosen.

> 3.) max_connections - been trying to figure out 'how' to determine this #. I've read this is buffer_size + 500k per connection.
> ie. 128mb(buffer) + 500kb = 128.5mb per connection?

max_connections is the number of connections to the database you intend to allow. shared_buffers must be of a certain minimum size to have that number of connections, but the 10k number above should cover any reasonable configuration.

> work_mem and max_stack_depth set to 4096
> maintenance_work_mem set to 64mb

sort_mem and vacuum_mem became work_mem and maintenance_work_mem, as those terms better indicate what they really do.

> Thanks for any help on this. I'm sure bombardment of newbies gets old :)

That's alright. We only request that once you have things figured out, you, at your leisure, help out a few others.
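Putting that advice together as a postgresql.conf sketch for a box where roughly 1 GB of RAM can serve the database. The exact numbers are illustrative, and the units follow the 8.0-era defaults, where shared_buffers and effective_cache_size are counted in 8 kB pages:

```
shared_buffers = 10000          # ~80 MB of dedicated buffer cache
effective_cache_size = 100000   # ~800 MB the OS is expected to cache for us
work_mem = 4096                 # KB, per sort/hash operation
maintenance_work_mem = 65536    # KB, for VACUUM / CREATE INDEX
```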
Re: [PERFORM] How does the transaction buffer work?
> Now of course this isn't Nirvana, you must pay somewhere ;-) and our weak spot is the need for VACUUM. But you have no need to fear large individual transactions.

No need to fear long-running transactions, other than their ability to stop VACUUM from doing what it's supposed to be doing, thus possibly impacting performance.
Re: [PERFORM] Updates on large tables are extremely slow
> Ok, if all 21 are affected, I can understand the problem. But allow me to say that this is a "functional error".

It's a choice between total throughput on a high-load, high-connection basis (MVCC dramatically wins here), versus a single-user, low-load scenario (MS Access is designed for this). Believe me when I say that a lot of people have spent a lot of time explicitly making the system work that way.

> On 13 Jun 2005, at 18:02, Richard Huxton wrote:
>
> Begin forwarded message:
> From: Yves Vindevogel <[EMAIL PROTECTED]>
> Date: Mon 13 Jun 2005 17:45:19 CEST
> To: Tom Lane <[EMAIL PROTECTED]>
> Subject: Re: [PERFORM] Updates on large tables are extremely slow
>
>> Yes, but if I update one column, why should PG update 21 indexes? There's only one index affected!
>
> No - all 21 are affected. MVCC creates a new row on disk.
>
> --
> Richard Huxton
> Archonet Ltd

> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
>
> Yves Vindevogel
> Implements
> Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91
> Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76
> Web: http://www.implements.be
>
> First they ignore you. Then they laugh at you. Then they fight you. Then you win.
> Mahatma Ghandi.
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, 2005-06-06 at 12:00 -0400, Amit V Shah wrote:

> Hi all,
>
> Thanks for your replies.
>
> I ran a very preliminary test, and found the following results. I feel they are weird and I don't know what I am doing wrong!!!
>
> I made a schema with 5 tables. I have a master data table with foreign keys pointing to the other 4 tables. The master data table has around 4 million records. When I run a select joining it with the baby tables,
>
> postgres -> returns results in 2.8 seconds
> mysql -> takes around 16 seconds (this is with MyISAM ... with InnoDB it takes 220 seconds)

We said MySQL was faster for simple selects and non-transaction inserts on a limited number of connections. Assuming you rebuilt statistics in MySQL (myisamchk -a), I would presume that PostgreSQL's more mature optimizer has come into play in the above 5-table join test by finding a better (faster) way of executing the query. If you post EXPLAIN ANALYZE output for the queries, we might be able to tell you what they did differently.

> I am all for postgres at this point, however I just want to know why I am getting opposite results!!! Both DBs are on the same machine.

If possible, it would be wise to run a performance test with the expected load you will receive. If you expect to have 10 clients perform operation X at a time, then benchmark that specific scenario. Both PostgreSQL and MySQL will perform differently in a typical real-load situation than in a single-user, single-query situation.

> -----Original Message-----
> From: Jeffrey Tenny [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 06, 2005 11:51 AM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Need help to decide Mysql vs Postgres
>
> Re: your JDBC wishes: Consider IBM Cloudscape (now Apache Derby) too, which has an Apache license. It's all pure Java and it's easy to get going.
> As to MySQL vs Postgres: license issues aside, if you have transactionally complex needs (multi-table updates, etc), PostgreSQL wins hands down in my experience. There are a bunch of things about MySQL that just suck for high-end SQL needs. (I like my subqueries, and I absolutely demand transactional integrity.)
>
> There are some pitfalls to pgsql though, especially for existing SQL code using MAX and some other things which can really be blindsided (performance-wise) by pgsql if you don't use the workarounds.
>
> MySQL is nice for what I call "raw read speed" applications. But that license is an issue for me, as it is for you apparently.
>
> Some Cloudscape info:
> http://www-306.ibm.com/software/data/cloudscape/
>
> Some info on pitfalls of MySQL and PostgreSQL, an interesting contrast:
> http://sql-info.de/postgresql/postgres-gotchas.html
> http://sql-info.de/mysql/gotchas.html
Re: [PERFORM] Prefetch
On Wed, 2005-05-11 at 12:53 +0800, Christopher Kings-Lynne wrote:

>> Another trick you can use with large data sets like this when you want results back in seconds is to have regularly updated tables that aggregate the data along each column normally aggregated against the main data set.
>>
>> Maybe some bright person will prove me wrong by posting some working information about how to get these apparently absent features working.
>
> Most people just use simple triggers to maintain aggregate summary tables...

Agreed. I've also got a view which calls a function that will 1) use the summary table where data exists, or 2) calculate the summary information, load it into the summary table, and send a copy to the client (a partial query-results cache). It's not all nicely abstracted behind user-friendly syntax, but most of those features can be cobbled together (with effort) in PostgreSQL.
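A minimal sketch of the trigger-maintained summary table idea (every name here is invented for illustration; concurrent inserts for the same day would need locking or retry handling on top of this):

```sql
CREATE TABLE sales (
    sold_on date    NOT NULL,
    amount  numeric NOT NULL
);

CREATE TABLE sales_daily (
    day   date    PRIMARY KEY,
    total numeric NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION sales_roll_up() RETURNS trigger AS $$
BEGIN
    UPDATE sales_daily SET total = total + NEW.amount WHERE day = NEW.sold_on;
    IF NOT FOUND THEN
        INSERT INTO sales_daily (day, total) VALUES (NEW.sold_on, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_roll_up_trig AFTER INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_roll_up();

-- Reports then read the tiny sales_daily table instead of aggregating sales.
```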
Re: [PERFORM] Prefetch
> I've done other things that make sense, like using indexes, playing with the planner constants and turning up the postgres cache buffers.

After you load the new day's data, try running CLUSTER on the structure using a key of (stockID, date) -- probably your primary key. This should significantly reduce the amount of IO required for your calculations involving a few stocks over a period of time.
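Sketched with invented names (the CLUSTER syntax shown is the 8.0-era form; CLUSTER rewrites the table and takes an exclusive lock, so it fits best right after the nightly load):

```sql
-- One-time: an index matching the access pattern (a (stockID, date) PK works too)
CREATE INDEX stock_history_stock_date_idx ON stock_history (stockid, date);

-- After each load: physically order the table by that index, then re-analyze
CLUSTER stock_history_stock_date_idx ON stock_history;
ANALYZE stock_history;
```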
Re: [PERFORM] Final decision
> I did have a question: if any folks are using two servers, one for reporting and one for data entry, which system should be the beefier?

Yeah. We started putting up slaves for reporting purposes and application-specific areas using Slony, replicating partial data sets to various locations -- some for reporting. If your reports have a long runtime and don't require transactional safety for writes (a daily summary is written, or results aren't recorded in the DB at all), this is probably something to consider.

I understand that PgAdmin makes Slony fairly painless to set up, but it can be time-consuming to get going, and Slony can add new complications depending on the data size and what you're doing with it -- but they're working hard to reduce the impact of those complications.

> I have a 2-proc machine I will be using, and I can either put Sears off by themselves on this machine or split up functionality and have one for reporting and one for inserts and updates; so I'm not sure which machine would be best for which spot (reminder: the more robust is a 4-proc with 8 gigs and the 2-proc is 4 gigs, both Dells).
>
> Thank you for any ideas in this arena.
>
> Joel Fradkin
Re: [PERFORM] Disk filling, CPU filling, renegade inserts and
> I'm having a pretty serious problem with postgresql's performance. Currently, I have a cron task that is set to restart and vacuumdb -faz every six hours. If that doesn't happen, the disk goes from 10% full to 95% full within 2 days (and it's a 90GB disk... with the database being a 2MB download after dump), and the CPU goes from running at around a 2% load to a 99+% load right away (the stats look like a square wave).

Are you running frequent queries which use temporary tables?
Re: [PERFORM] Opteron vs Xeon (Was: What to do with 6 disks?)
> The HPs are at root pretty good machines -- and take 6 drives, so I expect you're mixed up there. However, they use HP's proprietary RAID controller which is seriously defective. So you need to factor replacing the RAID controller into the cost.

Do you have any additional materials on what is defective with their RAID controllers?
Re: [PERFORM] Spend 7K *WHERE*? WAS Intel SRCS16 SATA raid? and How
On Fri, 2005-04-15 at 15:43 -0500, Matthew Nuzum wrote:

> I think there are many people who feel that $7,000 is a good budget for a database server, me being one.

The budget for a database server is usually some percentage of the value of the data within the database or the value of its availability. Big-budget hardware (well, going from $7k to $100k) often brings more redundancy and reliability improvement than performance improvement.

If you're going to lose $100k in business because the database was unavailable for 12 hours, then kick $75k into the hardware and call it a profit of $25k over 3 years (hardware lifetime is 3 years, catastrophic failure happens once every 3 or so years...). Ditto for backup systems. If the company depends on the data in the database for its survival, where bankruptcy or worse would happen as a result of complete data loss, then it would be a good idea to invest a significant amount of the company's revenue into making damn sure that doesn't happen. Call it an insurance policy.

Performance for me dictates which hardware is purchased and which configuration is used within $BUDGET, but $BUDGET itself is nearly always defined by the value of the data stored.

> * I agree with the threads that more disks are better.
> * I also agree that SCSI is better, but can be hard to justify if your budget is tight, and I have great certainty that 2x SATA drives on a good controller is better than x SCSI drives for many workloads.
> * I also feel that good database design and proper maintenance can be one of the single biggest performance enhancers available. This can be labor intensive, however, and sometimes throwing more hardware at a problem is cheaper than restructuring a db.
>
> Either way, having a good hardware platform is an excellent place to start, as much of your tuning will depend on certain aspects of your hardware.
>
> So if you need a db server, and you have $7k to spend, I'd say spend it.
> From this list, I've gathered that I/O and RAM are your two most important investments.
>
> Once you get that figured out, you can still do some performance tuning on your new server using the excellent advice from this mailing list.
>
> By the way, for all those who make this list work: I've rarely found such a thorough, helpful and considerate group of people as these on the performance list.
Re: [PERFORM] Réf
On Wed, 2005-04-06 at 14:40 -0400, Alex Turner wrote:

> I think his point was that 9 * 4 != 2400

Oh.. heh.. I didn't even notice that. Can I pretend I did it in my head using HEX math and that it wasn't a mistake?

> On Apr 6, 2005 2:23 PM, Rod Taylor <[EMAIL PROTECTED]> wrote:
>> On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote:
>>> On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote:
>>>> Yeah, I think that can be done provided there is more than one worker. My limit seems to be about 1000 transactions per second each with a single insert for a single process (round trip time down the Fibre Channel is large) but running 4 simultaneously only drops throughput to about 900 per process (total of 2400 transactions per second) and the machine still seemed to have lots of oomph to spare.
>>>
>>> Erm, have I missed something here? 900 * 4 = 2400?
>>
>> Nope. You've not missed anything.
>>
>> If I ran 10 processes, the requirement would be met.
Re: [PERFORM] Réf
On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote:

> On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote:
>> Yeah, I think that can be done provided there is more than one worker. My limit seems to be about 1000 transactions per second each with a single insert for a single process (round trip time down the Fibre Channel is large) but running 4 simultaneously only drops throughput to about 900 per process (total of 2400 transactions per second) and the machine still seemed to have lots of oomph to spare.
>
> Erm, have I missed something here? 900 * 4 = 2400?

Nope. You've not missed anything.

If I ran 10 processes, the requirement would be met.
Re: Réf. : Re: RE : RE: [PERFORM] Postgresql
On Wed, 2005-04-06 at 19:08 +0200, [EMAIL PROTECTED] wrote:

> On our production server, I can insert 5000 tuples in 2100 ms.
>
> Single Xeon 2.6 GHz
> 2 gigs RAM
> 3ware RAID 5 SATA drive array, 3 drives only :-((
> PG 8.0 - fsync off
>
> I do think inserting 5000 tuples in a second (i.e. 5000 insert transactions, no bulk load) can be reached with a well-configured SCSI RAID 10 array.

Yeah, I think that can be done provided there is more than one worker. My limit seems to be about 1000 transactions per second each with a single insert for a single process (round trip time down the Fibre Channel is large), but running 4 simultaneously only drops throughput to about 900 per process (total of 2400 transactions per second), and the machine still seemed to have lots of oomph to spare. Also worth noting is that this test was performed on a machine which as a noise floor receives about 200 queries per second, which it was serving during the test.

> Is pgcluster worth giving a try, and can it be trusted in a production environment? Will it be possible to get a sort of real-time application?

From the design of pgcluster, it looks like it adds a significant amount of additional communication, so expect your throughput for a single process to drop through the floor.
Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this
On Wed, 2005-04-06 at 16:12 +0000, Mohan, Ross wrote:

> I wish I had a Dell system and run case to show you Alex, but I don't... however... using Oracle's "direct path" feature, it's pretty straightforward.
>
> We've done 110,000 rows per second into index-less tables on a big system (IBM Power5 chips, Hitachi SAN). (Yes, I am sure: over 100K a second. Sustained for almost 9 minutes.)

Just for kicks I did a local test on a desktop machine (single CPU, single IDE drive) using COPY from STDIN for a set of integers in via a single transaction, no indexes.

1572864 tuples were loaded in 13715.613 ms, which is approx 115k rows per second. Okay, no checkpoints and I didn't cross an index boundary, but I also haven't tuned the config file beyond bumping up the buffers.

Let's try again with more data this time: 31 million tuples were loaded in approx 279 seconds, or approx 112k rows per second.

> I'd love to see PG get into this range.. I am a big fan of PG (just a rank newbie) but I gotta think the underlying code to do this has to be not-too-complex.

I'd say we're there.

> -----Original Message-----
> From: Alex Turner [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, April 06, 2005 11:38 AM
> To: [EMAIL PROTECTED]
> Cc: pgsql-performance@postgresql.org; Mohan, Ross
> Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?
>
> I think everyone was scared off by the 5000 inserts per second number.
>
> I've never seen even Oracle do this on a top-end Dell system with copious SCSI-attached storage.
>
> Alex Turner
> netEconomist
>
> On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> Unfortunately.
>>
>> But we are in the process of choosing PostgreSQL with pgcluster. I'm currently running some tests (performance, stability...). Save the money on the license fees, you get it for your hardware ;-)
>>
>> I still welcome any advice or comments, and I'll let you know how the project is going on.
>>
>> Benjamin.
> > > > > > > > "Mohan, Ross" <[EMAIL PROTECTED]> > > > > 05/04/2005 20:48 > > > > Pour :<[EMAIL PROTECTED]> > > cc : > > Objet :RE: [PERFORM] Postgresql vs SQLserver for this > > application ? > > > > > > You never got answers on this? Apologies, I don't have one, but'd be > > curious to hear about any you did get > > > > thx > > > > Ross > > > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf > > Of [EMAIL PROTECTED] > > Sent: Monday, April 04, 2005 4:02 AM > > To: pgsql-performance@postgresql.org > > Subject: [PERFORM] Postgresql vs SQLserver for this application ? > > > > > > hi all. > > > > We are designing a quite big application that requires a > > high-performance database backend. The rates we need to obtain are at > > least 5000 inserts per second and 15 selects per second for one > > connection. There should only be 3 or 4 simultaneous connections. > > I think our main concern is to deal with the constant flow of data coming > > from the inserts that must be available for selection as fast as possible. > > (kind of real time access ...) > > > > As a consequence, the database should rapidly increase up to more > > than one hundred gigs. We still have to determine how and when we > > shoud backup old data to prevent the application from a performance > > drop. We intend to develop some kind of real-time partionning on our > > main table keep the flows up. > > > > At first, we were planning to use SQL Server as it has features that > > in my opinion could help us a lot : > > - replication > > - clustering > > > > Recently we started to study Postgresql as a solution for our project : > > - it also has replication > > - Postgis module can handle geographic datatypes (which would > > facilitate our developments) > > - We do have a strong knowledge on Postgresql administration > > (we use it for production processes) > > - it is free (!) and we could save money for hardware > > purchase. 
> > > > Is SQL server clustering a real asset ? How reliable are Postgresql > > replication tools ? Should I trust Postgresql performance for this > > kind of needs ? > > > > My question is a bit fuzzy but any advices are most welcome... > > hardware,tuning or design tips as well :)) > > > > Thanks a lot. > > > > Benjamin. > > > > > > > > ---(end of broadcast)--- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > -- ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What needs to be done for real Partitioning?
On Sun, 2005-03-20 at 00:29 -0400, Alvaro Herrera wrote: > On Sat, Mar 19, 2005 at 07:05:53PM -0500, Tom Lane wrote: > > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > > We probably also need multi-table indexes. > > > > As Josh says, that seems antithetical to the main point of partitioning, > > which is to be able to rapidly remove (and add) partitions of a table. > > If you have to do index cleaning before you can drop a partition, what's > > the point of partitioning? > > Hmm. You are right, but without that we won't be able to enforce > uniqueness on the partitioned table (we could only enforce it on each > partition, which would mean we can't partition on anything else than > primary keys if the tables have one). IMHO this is something to > consider. Could uniqueness across partitions be checked for using a mechanism similar to what a deferred unique constraint would use (trigger / index combination)?
Re: [PERFORM] Speeding up select distinct
On Wed, 2005-03-16 at 19:31 +0100, Laurent Martelli wrote: > >>>>> "Rod" == Rod Taylor <[EMAIL PROTECTED]> writes: > > Rod> On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote: > >> Consider this query: > >> > >> SELECT distinct owner from pictures; > > Rod> The performance has nothing to do with the number of rows > Rod> returned, but rather the complexity of calculations and amount > Rod> of data to sift through in order to find it. > > Yes, but I thought that an index might be able to know what distinct > values there are and help optime that query very much. The index does know. You just have to visit all of the pages within the index to find out, which it does, and that's why you dropped 10ms. But if you want a sub-millisecond query, you're going to have to normalize the structure.
Re: [PERFORM] Speeding up select distinct
On Wed, 2005-03-16 at 18:58 +0100, Laurent Martelli wrote: > Consider this query: > > SELECT distinct owner from pictures; The performance has nothing to do with the number of rows returned, but rather the complexity of calculations and amount of data to sift through in order to find it. > Any ideas, apart from more or less manually maintaining a list of > distinct owners in another table ? This would be the proper thing to do, along with adding a foreign key from pictures to the new owner structure for integrity enforcement.
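A sketch of that normalization (SQLite standing in for PostgreSQL here; table and column names other than `pictures`/`owner` are invented for illustration): the distinct owners live in their own small table, and `pictures` references it by foreign key, so finding the distinct owners no longer requires scanning the big table at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("""CREATE TABLE pictures (
    picture_id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES owners(owner_id),
    filename TEXT)""")

# Owners are inserted once; pictures reference them by id.
for name in ("alice", "bob"):
    conn.execute("INSERT INTO owners (name) VALUES (?)", (name,))
for i in range(1000):
    conn.execute("INSERT INTO pictures (owner_id, filename) VALUES (?, ?)",
                 (1 + i % 2, f"img{i}.jpg"))

# The old "SELECT DISTINCT owner FROM pictures" becomes a scan of a 2-row table.
distinct_owners = [r[0] for r in conn.execute("SELECT name FROM owners ORDER BY name")]
print(distinct_owners)  # ['alice', 'bob']
```

The foreign key is the "integrity enforcement" mentioned above: a picture can never reference an owner that isn't in the lookup table.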
Re: [PERFORM] Performance problem on delete from for 10k rows.
> I get this problem on my dev (Windows/7.4/Cygwin) environment. But now > I see that it's also have this problem on my production env. Yes I > tought I was maybe just a cygwin/Windows problem .. apparently not :- Care to try again with logging enabled on the PostgreSQL side within the development environment? log_statement = true log_duration = true log_connections = on Then run it via Java and from pgAdminIII and send us the two log snippets as attachments? Thanks.
Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout
On Wed, 2005-02-23 at 15:26 -0300, Bruno Almeida do Lago wrote: > Is there a real limit for max_connections? Here we've an Oracle server with > up to 1200 simultaneous conections over it! If you can reduce them by using something like pgpool between PostgreSQL and the client, you'll save some headache. PostgreSQL does not perform as well with a large number of idle connections as it does otherwise (last time I tested was 7.4 though -- perhaps it's better now). The kernel also starts to play a significant role with a high number of connections. Some operating systems don't perform as well with a high number of processes (process handling, scheduling, file handles, etc.). I think you can do it without any technical issues, but you will probably be happier with the result if you can hide idle connections from the database machine.
Re: [PERFORM] seq scan cache vs. index cache smackdown
> My concern is that this kind of testing has very little relevance to the > real world of multiuser processing where contention for the cache becomes an > issue. It may be that, at least in the current situation, postgres is > giving too much weight to seq scans based on single user, straight line To be fair, a large index scan can easily throw the buffers out of whack as well. An index scan on 0.1% of a table with 1 billion tuples will have a similar impact to buffers as a sequential scan of a table with 1 million tuples. Any solution fixing buffers should probably not take into consideration the method being performed (do you really want to skip caching a sequential scan of a 2 tuple table because it didn't use an index) but the volume of data involved as compared to the size of the cache. I've often wondered if a single 1GB toasted tuple could wipe out the buffers. I would suppose that toast doesn't bypass them. -- Rod Taylor <[EMAIL PROTECTED]>
Re: [PERFORM] VACCUM FULL ANALYZE PROBLEM
On Tue, 2005-02-15 at 09:34 +0800, Michael Ryan S. Puncia wrote: > Hi, > > > > I have 3 tables in the database with 80G of data, one of them is > almost 40G and the remaining 2 tables has 20G each. > > We use this database mainly for query and updating is done only > quarterly and the database perform well. My problem > > is after updating and then run VACCUM FULL ANALYZE vacuuming the > tables takes days to complete. I hope someone I suspect the VACUUM FULL is the painful part. Try running CLUSTER on the table or changing a column type (in 8.0) instead.
Re: [PERFORM] Performance Tuning
On Wed, 2005-02-09 at 15:01 -0500, Chris Kratz wrote: > Hello All, > > In contrast to what we hear from most others on this list, we find our > database servers are mostly CPU bound. We are wondering if this is because > we have postgres configured incorrectly in some way, or if we really need > more powerfull processor(s) to gain more performance from postgres. Not necessarily. I had a very disk bound system, bought a bunch of higher end equipment (which focuses on IO) and now have a (faster) but CPU bound system. It's just the way the cookie crumbles. Some things to watch for are large calculations which are easy to move client side, such as queries that sort for display purposes. Or data types which aren't really required (using numeric where an integer would do). > We continue to tune our individual queries where we can, but it seems we > still > are waiting on the db a lot in our app. When we run most queries, top shows > the postmaster running at 90%+ constantly during the duration of the request. > Is this for the duration of a single request or 90% constantly? If it's a single request, odds are you're going through much more information than you need to. Lots of aggregate work (max / min) perhaps or count(*)'s where an approximation would do? > Our question is simply this, is it better to invest in a faster processor at > this point, or are there configuration changes to make it faster? I've done If it's for a single request, you cannot get single processors which are much faster than what you describe as having. Want to send us a few EXPLAIN ANALYZE's of your longer running queries? -- Rod Taylor <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Sat, 2005-01-22 at 12:41 -0600, Bruno Wolff III wrote: > On Sat, Jan 22, 2005 at 12:13:00 +0900, > Tatsuo Ishii <[EMAIL PROTECTED]> wrote: > > > > Probably VACUUM works well for small to medium size tables, but not > > for huge ones. I'm considering about to implement "on the spot > > salvaging dead tuples". > > You are probably vacuuming too often. You want to wait until a significant > fraction of a large table is dead tuples before doing a vacuum. If you are > scanning a large table and only marking a few tuples as deleted, you aren't > getting much bang for your buck. The big problem occurs when you have a small set of hot tuples within a large table. In the time it takes to vacuum a table with 200M tuples one can update a small subset of that table many many times. Some special purpose vacuum which can target hot spots would be great, but I've always assumed this would come in the form of table partitioning and the ability to vacuum different partitions independently of each other.
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
On Thu, 2005-01-20 at 15:36 +0100, Hervé Piedvache wrote: > Le Jeudi 20 Janvier 2005 15:24, Christopher Kings-Lynne a écrit : > > > Is there any solution with PostgreSQL matching these needs ... ? > > > > You want: http://www.slony.info/ > > > > > Do we have to backport our development to MySQL for this kind of problem > > > ? Is there any other solution than a Cluster for our problem ? > > > > Well, Slony does replication which is basically what you want :) > > > > Only master->slave though, so you will need to have all inserts go via > > the master server, but selects can come off any server. > > Sorry but I don't agree with this ... Slony is a replication solution ... I > don't need replication ... what will I do when my database will grow up to 50 > Gb ... I'll need more than 50 Gb of RAM on each server ??? > This solution is not very realistic for me ... Slony has some other issues with databases > 200GB in size as well (well, it hates long running transactions -- and pg_dump is a regular long running transaction) However, you don't need RAM on each server for this, you simply need enough disk space. Have a Master which takes writes, a "replicator" which you can consider to be a hot-backup of the master, have N slaves replicate off of the otherwise untouched "replicator" machine. For your next trick, have the application send read requests for Clients A-C to slave 1, D-F to slave 2, ... You need enough memory to hold the index sections for clients A-C on slave 1. The rest of the index can remain on disk. It's available should it be required (D-F box crashed, so your application is now feeding those read requests to the A-C machine)... Go to more slaves and smaller segments as you require. Use the absolute cheapest hardware you can find for the slaves that gives reasonable performance. They don't need to be reliable, so RAID 0 on IDE drives is perfectly acceptable. PostgreSQL can do the replication portion quite nicely.
You need to implement the "cluster" part on the application side.
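The read-routing scheme described above (clients A-C to slave 1, D-F to slave 2, and so on) lives entirely in the application. A minimal sketch, with the ranges and slave names invented for illustration:

```python
# Route read requests to a slave based on the first letter of the
# client name. Ranges and slave names here are purely illustrative.
SLAVE_RANGES = [
    ("A", "C", "slave1"),
    ("D", "F", "slave2"),
    ("G", "Z", "slave3"),
]

def slave_for_client(client_name, fallback="slave1"):
    first = client_name[:1].upper()
    for lo, hi, slave in SLAVE_RANGES:
        if lo <= first <= hi:
            return slave
    # Unknown prefix: fall back to some machine rather than fail,
    # mirroring the "feed those reads to another box" idea above.
    return fallback

print(slave_for_client("Acme Corp"))   # slave1
print(slave_for_client("Dunder"))      # slave2
```

When a slave dies, its range is simply remapped onto a surviving machine; the slave that picks up the load serves from disk until its cache warms up.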
Re: [PERFORM] Low Performance for big hospital server ..
On Thu, 2005-01-06 at 12:35 -0500, Dave Cramer wrote: > Reading can be worse for a normalized db, which is likely what the > developers were concerned about. To a point. Once you have enough data that you start running out of space in memory then normalization starts to rapidly gain ground again because it's often smaller in size and won't hit the disk as much. Moral of the story is don't tune with a smaller database than you expect to have. > Frank Wiles wrote: > > >On Thu, 6 Jan 2005 09:06:55 -0800 > >Josh Berkus wrote: > > > > > > > >>I can't tell you how many times I've seen this sort of thing. And > >>the developers always tell me "Well, we denormalized for performance > >>reasons ... " > >> > >> > > > > Now that's rich. I don't think I've ever seen a database perform > > worse after it was normalized. In fact, I can't even think of a > > situation where it could! > > > > - > > Frank Wiles <[EMAIL PROTECTED]> > > http://www.wiles.org > > - > > > > > >---(end of broadcast)--- > >TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to [EMAIL PROTECTED] so that your > > message can get through to the mailing list cleanly > > > > > > > > > -- ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] query rewrite using materialized views
On Tue, 2005-01-04 at 13:26 -0600, Wager, Ryan D [NTK] wrote: > Rod, > I do this, PG gets forked many times, it is tough to find the max > number of times I can do this, but I have a Proc::Queue Manager Perl > driver that handles all of the copy calls. I have a quad CPU machine. > Each COPY only hits ones CPU for like 2.1% but anything over about 5 > kicks the load avg up. Sounds like disk IO is slowing down the copy then. > Ill get some explain analysis and table structures out there pronto. > > -Original Message- > From: Rod Taylor [mailto:[EMAIL PROTECTED] > Sent: Tuesday, January 04, 2005 1:02 PM > To: Wager, Ryan D [NTK] > Cc: Postgresql Performance > Subject: Re: [PERFORM] query rewrite using materialized views > > > 1)the 250 million records are currently whipped and reinserted as a > > "daily snapshot" and the fastest way I have found "COPY" to do this > from > > a file is no where near fast enough to do this. SQL*Loader from > Oracle > > does some things that I need, ie Direct Path to the db files access > > (skipping the RDBMS), inherently ignoring indexing rules and saving a > > ton of time (Dropping the index, COPY'ing 250 million records, then > > Recreating the index just takes way too long). > > If you have the hardware for it, instead of doing 1 copy, do 1 copy > command per CPU (until your IO is maxed out anyway) and divide the work > amongst them. I can push through 100MB/sec using methods like this -- > which makes loading 100GB of data much faster. > > Ditto for indexes. Don't create a single index on one CPU and wait -- > send off one index creation command per CPU. > > > 2)Finding a way to keep this many records in a fashion that can be > > easily queried. I even tried breaking it up into almost 2800 separate > > tables, basically views of the data pre-broken down, if this is a > > working method it can be done this way, but when I tried it, VACUUM, > and > > the COPY's all seemed to slow down extremely. 
> > Can you send us EXPLAIN ANALYSE output for the slow selects and a little > insight into what your doing? A basic table structure, and indexes > involved would be handy. You may change column and table names if you > like. > > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf Of Josh > Berkus > > Sent: Tuesday, January 04, 2005 12:06 PM > > To: pgsql-performance@postgresql.org > > Cc: Yann Michel > > Subject: Re: [PERFORM] query rewrite using materialized views > > > > Yann, > > > > > are there any plans for rewriting queries to preexisting > materialized > > > views? I mean, rewrite a query (within the optimizer) to use a > > > materialized view instead of the originating table? > > > > Automatically, and by default, no. Using the RULES system? Yes, you > > can > > already do this and the folks on the MattView project on pgFoundry are > > > working to make it easier. > > -- ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] query rewrite using materialized views
> 1)the 250 million records are currently whipped and reinserted as a > "daily snapshot" and the fastest way I have found "COPY" to do this from > a file is no where near fast enough to do this. SQL*Loader from Oracle > does some things that I need, ie Direct Path to the db files access > (skipping the RDBMS), inherently ignoring indexing rules and saving a > ton of time (Dropping the index, COPY'ing 250 million records, then > Recreating the index just takes way too long). If you have the hardware for it, instead of doing 1 copy, do 1 copy command per CPU (until your IO is maxed out anyway) and divide the work amongst them. I can push through 100MB/sec using methods like this -- which makes loading 100GB of data much faster. Ditto for indexes. Don't create a single index on one CPU and wait -- send off one index creation command per CPU. > 2)Finding a way to keep this many records in a fashion that can be > easily queried. I even tried breaking it up into almost 2800 separate > tables, basically views of the data pre-broken down, if this is a > working method it can be done this way, but when I tried it, VACUUM, and > the COPY's all seemed to slow down extremely. Can you send us EXPLAIN ANALYSE output for the slow selects and a little insight into what your doing? A basic table structure, and indexes involved would be handy. You may change column and table names if you like. > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Josh Berkus > Sent: Tuesday, January 04, 2005 12:06 PM > To: pgsql-performance@postgresql.org > Cc: Yann Michel > Subject: Re: [PERFORM] query rewrite using materialized views > > Yann, > > > are there any plans for rewriting queries to preexisting materialized > > views? I mean, rewrite a query (within the optimizer) to use a > > materialized view instead of the originating table? > > Automatically, and by default, no. Using the RULES system? 
Yes, you > can > already do this and the folks on the MattView project on pgFoundry are > working to make it easier.
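The "one COPY command per CPU" advice above can be sketched as follows (illustrative only: `load_chunk` is a stand-in for a real COPY issued over its own connection; plain threads suffice in the sketch because each real COPY would be I/O-bound on its own connection):

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(rows):
    # Stand-in for one "COPY ... FROM STDIN": a real loader would open
    # its own database connection here and stream the rows through it.
    return len(rows)

def parallel_load(rows, workers=4):
    # Divide the input into one chunk per worker -- "one copy command
    # per CPU (until your IO is maxed out anyway)".
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(load_chunk, chunks))

print(parallel_load(list(range(100_000))))  # 100000
```

The same fan-out applies to index builds: issue one CREATE INDEX per connection rather than building them serially.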
[PERFORM] Speeding up pg_dump
Are there any tricks to speeding up pg_dump aside from doing them from a replicated machine? I'm using -Fc with no compression.
Re: [PERFORM] VACUUM ANALYZE downgrades performance
On Thu, 2004-12-02 at 17:07 +0100, Dmitry Karasik wrote: > Hi Thomas! > > Thomas> Look at the ACTUAL TIME. It dropped from 0.029ms (using the index > Thomas> scan) to 0.009ms (using a sequential scan.) > > Thomas> Index scans are not always faster, and the planner/optimizer knows > Thomas> this. VACUUM ANALYZE is best run when a large proportion of data > Thomas> has been updated/loaded or in the off hours to refresh the > Thomas> statistics on large datasets. > > While I agree that generally this is true, look how stupid this > behavior looks in this particular case: A developer creates a table > and index, knowing that the table will be large and will be intensively > used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, > and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index > is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every > 5 minutes as a solution, right? You might want to try this on the next 8.0 beta to come out, or against CVS. Tom recently applied some changes which should mitigate this situation.
Re: [PERFORM] time to stop tuning?
> On limiting the client side connections: we've been gradually pushing up > the client-side connection pool and threads, and have seen steady > improvement in our throughput up to the current barrier we have reached. Very well.. Sometimes more simultaneous workers helps, other times it hinders. > I've attached the plans for the 4 queries that represent ~35% of our > load. These are run against the same dataset, but without any other > load. Another big query basically requires a test to be running because Those aren't likely from your production system as there isn't any data in those tables and the queries took less than 1ms. -- Rod Taylor <[EMAIL PROTECTED]>
Re: [PERFORM] time to stop tuning?
On Fri, 2004-11-26 at 12:13 -0500, David Parker wrote: > > I suspect the ultimate answer to our problem will be: > >1) aggressive client-side caching >2) SQL tuning >3) more backend hardware #0 might actually be using connection pooling and using cached query plans (PREPARE), disabling the statistics daemon, etc. For the plans, send us EXPLAIN ANALYZE output for each of the common queries. If you can try it, I'd give a try at FreeBSD or a newer Linux on your system instead of Solaris. Older versions of Solaris had not received the same amount of attention for Intel hardware as the BSDs and Linux have and I would imagine (having not tested it recently) that this is still true for 32bit Intel. Another interesting test might be to limit the number of simultaneous connections to 8 instead of 30 (client side connection retry) after client side connection pooling via pgpool or similar has been installed. Please report back with your findings. -- Rod Taylor <[EMAIL PROTECTED]>
Re: [PERFORM] Data type to use for primary key
On Mon, 2004-11-22 at 16:54 -0800, Josh Berkus wrote: > Alexandre, > > > What is the common approach? Should I use directly the product_code as > > my ID, or use a sequantial number for speed? (I did the same for the > > company_id, this is a 'serial' and not the shor name of the customer. > > I just don't know what is usually done. > > Don't use SERIAL just because it's there. Ideally, you *want* to use the > product_code if you can. It's your natural key and a natural key is always > superior to a surrogate key all other things being equal. It would be nice if PostgreSQL had some form of transparent surrogate keying in the background which would automatically run around and replace your real data with SERIAL integers. It could use a lookup table for conversions between the surrogate and real values so the user never knows that it's done, a bit like ENUM. Then we could all use the real values with no performance issues for 1) because it's an integer in the background, and 2) because a cascade only touches a single tuple in the lookup table.
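The transparent surrogate keying idea above can be modeled in a few lines (a toy sketch, not anything PostgreSQL actually provides; all names are invented): callers see the real values, storage holds small integers, and a "cascade" touches one lookup entry instead of every row.

```python
class SurrogateColumn:
    """Toy model of transparent surrogate keying: the caller works with
    real values, but each stored row holds only a small integer id."""

    def __init__(self):
        self._to_id = {}      # real value -> surrogate id
        self._to_value = []   # surrogate id -> real value
        self._rows = []       # what the table actually stores: ids

    def insert(self, value):
        if value not in self._to_id:
            self._to_id[value] = len(self._to_value)
            self._to_value.append(value)
        self._rows.append(self._to_id[value])

    def rename(self, old, new):
        # The "cascade": one lookup-table entry changes, no row does.
        i = self._to_id.pop(old)
        self._to_value[i] = new
        self._to_id[new] = i

    def values(self):
        # What a SELECT would show: the real values, reconstructed.
        return [self._to_value[i] for i in self._rows]
```

Renaming a product code here is O(1) regardless of how many rows reference it, which is the whole appeal of the scheme.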
[PERFORM] Tuning suggestions wanted
I'm looking for suggestions on tuning Solaris 9 for a SunFire 890 (Ultra IV chips) connected to an Hitachi 9500V running PostgreSQL 7.4. So that I don't lead people in a direction, I'll hold off for a while before posting our configuration settings. Database is approx 160GB in size with a churn of around 4GB per day (2 GB updated, 2GB inserted, very little removed). It's a mixture of OLTP and reporting. 5% is reports which do trickle writes 95% is short (30 second or less) transactions with about 10 selects, 10 writes (inserts, updates, deletes all mixed in) affecting 150 tuples. Thanks for any tips -- particularly Solaris kernel tuning. -- Rod Taylor <[EMAIL PROTECTED]>
[PERFORM] Solaris 9 Tuning Tips requested
I'm looking for suggestions on tuning Solaris 9 for a SunFire 890 (Ultra IV chips) connected to an Hitachi 9500V running PostgreSQL 7.4. Database is approx 160GB in size with a churn of around 4GB per day (2 GB updated, 2GB inserted, very little removed). It's a mixture of OLTP and reporting. 5% is reports which do trickle writes 95% is short (30 second or less) transactions with about 10 selects, 10 writes (inserts, updates, deletes all mixed in) affecting 150 tuples. Thanks for any tips -- particularly Solaris kernel tuning or oddities in Disk IO or configuration settings as they relate to Solaris (as they differ from an Intel).
Re: [PERFORM] Strange (?) Index behavior?
> It seems to me that a query saying "SELECT column FROM table WHERE > column LIKE 'AA%';" should be just as fast or very close to the first > case up above. However, explain tells me that this query is not using > the index above, which is what's not making sense to me. The planner looks for an exact expression match; it doesn't know that one pattern implies another. You can provide both clauses: WHERE column LIKE 'A%' AND column LIKE 'AA%';
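The reason a prefix LIKE can use a btree index at all is that `column LIKE 'AA%'` is equivalent to a range condition on the sorted values (PostgreSQL only applies this rewrite under a plain byte-order collation such as the C locale). A sketch of the equivalence, not of PostgreSQL internals:

```python
import bisect

def prefix_range(sorted_values, prefix):
    # column LIKE 'AA%' is equivalent to:
    #   column >= 'AA' AND column < 'AB'
    # for simple byte-order collations, which a btree answers with two seeks.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(sorted_values, prefix)
    hi = bisect.bisect_left(sorted_values, upper)
    return sorted_values[lo:hi]

owners = sorted(["AAA", "AAB", "ABC", "BAA", "AA"])
print(prefix_range(owners, "AA"))  # ['AA', 'AAA', 'AAB']
```

Everything matching the prefix sits in one contiguous run of the sorted order, which is why no full scan is needed.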
Re: [PERFORM] Better Hardware, worst Results
On Thu, 2004-11-04 at 17:42, [EMAIL PROTECTED] wrote: > Citando Rod Taylor <[EMAIL PROTECTED]>: > > Please send an explain analyze from both. > I'm sendin three explains. In the first the Dell machine didn't use existing > indexes, so I turn enable_seqscan off (this is the second explain). The total > cost decreased, but the total time not. The third explain refers to the cheaper > (and faster) machine. The last thing is the query itself. All 3 plans have crappy estimates. Run ANALYZE in production, then send another explain analyze (as an attachment please, to avoid linewrap).
Re: [PERFORM] Better Hardware, worst Results
On Thu, 2004-11-04 at 16:06, Alvaro Nunes Melo wrote: > Hi, > > I have a very tricky situation here. A client bought a Dell dual-machine > to be used as Database Server, and we have a cheaper machine used in > development. With identical databases, configuration parameters and > running the same query, our machine is almost 3x faster. Please send an explain analyze from both.
Re: [PERFORM] Performance Anomalies in 7.4.5
On Thu, 2004-10-28 at 12:31, Tom Lane wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > >> One drawback to this is that it would require an additional lseek per table > >> while planning, but that doesn't seem like a huge penalty. > > > Hmmm ... would the additional lseek take longer for larger tables, or would it > > be a fixed cost? > > Should be pretty much a fixed cost: one kernel call per table. Is this something that the bgwriter could periodically do and share the data? Possibly in the future it could even force a function or prepared statement recompile if the data has changed significantly?
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
On Tue, 2004-10-26 at 13:42, Anjan Dave wrote: > It probably is locking issue. I got a long list of locks held when we ran select * > from pg_locks during a peak time. > > relation | database | transaction | pid | mode | granted > --+--+-+---+--+- > 17239 |17142 | | 3856 | AccessShareLock | t How many have granted = false? > Vmstat would show a lot of disk IO at the same time. > > Is this pointing towards a disk IO issue? Not necessarily. Is your IO reaching the limit or is it just heavy?
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
On Mon, 2004-10-25 at 16:53, Anjan Dave wrote: > Hi, > > > > I am dealing with an app here that uses pg to handle a few thousand > concurrent web users. It seems that under heavy load, the INSERT and > UPDATE statements to one or two specific tables keep queuing up, to > the count of 150+ (one table has about 432K rows, other has about > 2.6Million rows), resulting in 'wait's for other queries, and then This isn't an index issue, it's a locking issue. Sounds like you have a bunch of inserts and updates hitting the same rows over and over again. Eliminate that contention point, and you will have solved your problem. Feel free to describe the processes involved, and we can help you do that.
Re: [PERFORM] Insert performance, what should I expect?
On Wed, 2004-10-20 at 12:45, Robert Creager wrote: > When grilled further on (Tue, 19 Oct 2004 22:12:28 -0400), > Rod Taylor <[EMAIL PROTECTED]> confessed: > > > > I've done some manual benchmarking running my script 'time script.pl' > > > I realise my script uses some of the time, bench marking shows that > > > %50 of the time is spent in dbd:execute. > > > > > 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI > > compliant) which has functions similar to putline() that allow COPY to > > be used. > > COPY can be used with DBD::Pg, per a script I use: > > $dbh->do( "COPY temp_obs_$band ( $col_list ) FROM stdin" ); > $dbh->func( join ( "\t", @data ) . "\n", 'putline' ); > $dbh->func( "\\.\n", 'putline' ); > $dbh->func( 'endcopy' ); Thanks for that. All of the conversations I've seen on the subject stated that DBD::Pg only supported standard DB features -- copy not amongst them. > With sets of data from 1000 to 8000 records, my COPY performance is consistent > at ~1 records per second. Well done.
Re: [PERFORM] Insert performance, what should I expect?
> I've done some manual benchmarking running my script 'time script.pl' > I realise my script uses some of the time, bench marking shows that > %50 of the time is spent in dbd:execute. The perl drivers don't currently use database level prepared statements which would give a small boost. But your best bet is to switch to using COPY instead of INSERT. Two ways to do this. 1) Drop DBD::Pg and switch to the Pg driver for Perl instead (non-DBI compliant) which has functions similar to putline() that allow COPY to be used. 2) Have your perl script output a .sql file with the data prepared (COPY statements) which you feed into the database via psql. You can probably achieve a 50% increase in throughput. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
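Option 2 above might look something like the following sketch (the table and column names are hypothetical, not from this thread). COPY reads tab-separated columns terminated by a line containing only `\.`:

```sql
-- Generated file (load.sql), fed into the database with:  psql mydb -f load.sql
COPY observations (obs_id, band, value) FROM stdin;
1	23	0.5
2	24	1.7
\.
```

Batching thousands of rows per COPY this way avoids the per-statement parse, plan, and network round-trip overhead of individual INSERTs, which is where the claimed ~50% throughput gain comes from.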
Re: [PERFORM] Vacuum takes a really long time, vacuum full required
> Whatever the case, the database still slows down to a halt after a month or > so, and I have to go in and shut everything down and do a VACUUM FULL by > hand. One index (of many many) takes 2000 seconds to vacuum. The whole > process takes a few hours. Do a REINDEX on that table instead, and run regular VACUUM more frequently. > $ pg_config --version > PostgreSQL 7.3.2 7.4.x deals with index growth a little better than 7.3 and older did. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
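A sketch of the suggested maintenance (object names here are hypothetical); note that REINDEX takes an exclusive lock, so run it during a quiet window:

```sql
REINDEX INDEX my_bloated_idx;   -- rebuild just the one slow index
-- or rebuild every index on the table at once:
REINDEX TABLE my_big_table;
VACUUM ANALYZE my_big_table;    -- then schedule plain VACUUM much more often
```

Rebuilding the index from scratch reclaims the bloat that plain VACUUM on 7.3-era servers could not, which is why the combination beats a monthly VACUUM FULL.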
Re: IBM P-series machines (was: [PERFORM] Excessive context
On Mon, 2004-10-11 at 13:38, Andrew Sullivan wrote: > On Tue, Oct 05, 2004 at 09:47:36AM -0700, Josh Berkus wrote: > > As long as you're on x86, scaling outward is the way to go. If you want to > > continue to scale upwards, ask Andrew Sullivan about his experiences running > > PostgreSQL on big IBM boxes. But if you consider a quad-Opteron server > > expensive, I don't think that's an option for you. > The 650s are not cheap, but boy are they fast. I don't have any > numbers I can share, but I can tell you that we recently had a few > days in which our write load was as large as the entire write load > for last year, and you couldn't tell. It is too early for us to say > whether the P series lives up to its billing in terms of reliability: > the real reason we use these machines is reliability, so if > approaching 100% uptime isn't important to you, the speed may not be > worth it. Agreed completely, and the 570 knocks the 650 out of the water -- nearly double the performance for math-heavy queries. Beware vendor support for Linux on these things though -- we ran into many of the same issues with vendor support on the IBM machines as we did with the Opterons. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Caching of Queries
> More to the point though, I think this is a feature that really really > should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] Getting rid of nested loop
I set enable_nestloop = off, which is why I have the high cost. @ is a PostGIS operator between 2 geometries (both polygons). It's the @ operator which is expensive. Is there a way to force a cheaper way of doing that join? -> Nested Loop (cost=11905.94..11906.08 rows=1 width=68) (actual time=1739.368..17047.422 rows=100 loops=1) Join Filter: ((COALESCE("outer".geom, "outer".geom) @ COALESCE("inner".geom, "inner".geom)) AND ("outer".region_id <> "inner".region_id)) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] NAS, SAN or any alternate solution ?
> Rod Taylor wrote: > | I've used both NetApp- and Hitachi-based SANs with PostgreSQL. Both > | work as well as expected, but do require some tweaking as they normally > | are not optimized for the datablock size that PostgreSQL likes to deal > | with (8k by default) -- this can make as much as a 50% difference in > | performance levels. > I'm also not entirely sure how to make the datablocks line up with the > filesystem blocks. Any suggestions on this would be greatly appreciated. We just played with Veritas settings while running pg_bench on a 200GB database. I no longer have access to the NetApp, but the settings for the Hitachi are below. In tunefstab we have: read_pref_io=8192,read_nstream=4,write_pref_io=8192,write_nstream=2 In fstab it's: defaults,mincache=tmpcache,noatime If you have better settings, please shoot them over so we can try them out. Perhaps even get someone over there to write a new SAN section in the Tuning Chapter. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Large # of rows in query extremely slow, not using
On Mon, 2004-09-13 at 20:51, Stephen Crowley wrote: > Does postgres cache the entire result set before it begins returning > data to the client? Sometimes you need to be careful as to how the clients treat the data. For example psql will resize column widths based on the length (width) of the data returned. PHP and Perl will retrieve and cache all of the rows if you request a row count ($sth->rows() or pg_num_rows($rset)). You may find that using a cursor will help you out. > I have a table with ~8 million rows and I am executing a query which > should return about ~800,000 rows. The problem is that as soon as I > execute the query it absolutely kills my machine and begins swapping > for 5 or 6 minutes before it begins returning results. Is postgres > trying to load the whole query into memory before returning anything? > Also, why would it choose not to use the index? It is properly > estimating the # of rows returned. If I set enable_seqscan to off it > is just as slow. > > Running postgres 8.0 beta2 dev2 > > explain select * from island_history where date='2004-09-07' and stock='QQQ'; > QUERY PLAN > Seq Scan on island_history (cost=0.00..266711.23 rows=896150 width=83) > Filter: ((date = '2004-09-07'::date) AND ((stock)::text = 'QQQ'::text)) > (2 rows) > > Any help would be appreciated > > --Stephen >
>            Table "public.island_history"
>       Column      |          Type          | Modifiers
> ------------------+------------------------+-----------
>  date             | date                   | not null
>  stock            | character varying(6)   |
>  time             | time without time zone | not null
>  reference_number | numeric(9,0)           | not null
>  message_type     | character(1)           | not null
>  buy_sell_ind     | character(1)           |
>  shares           | numeric(6,0)           |
>  remaining_shares | numeric(6,0)           |
>  price            | numeric(10,4)          |
>  display          | character(1)           |
>  match_number     | numeric(9,0)           | not null
> Indexes:
>     "island_history_pkey" PRIMARY KEY, btree (date, reference_number, message_type, "time", match_number)
>     "island_history_date_stock_time" btree (date, stock, "time")
>     "island_history_oid" btree (oid)
>
> ---(end of broadcast)--- > TIP 8: explain analyze is your friend -- Rod Taylor Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc
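The cursor suggestion, sketched against the poster's table (the cursor name is invented), pulls the ~800,000 rows in manageable batches instead of materializing the whole result set client-side:

```sql
BEGIN;
DECLARE hist CURSOR FOR
    SELECT * FROM island_history
    WHERE date = '2004-09-07' AND stock = 'QQQ';
FETCH FORWARD 1000 FROM hist;   -- repeat until it returns no rows
CLOSE hist;
COMMIT;
```

The client's memory use is then bounded by the batch size rather than by the result set, which avoids the swapping described above.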
Re: [PERFORM] Why does a simple query not use an obvious index?
> People expect count(*) _without a where clause_ to be cached in a single > global variable. Postgres can't do this, but the reason has everything to do Someone should write an approx_count('table') function that reads reltuples from pg_class and tell them to use it in combination with autovac. I've yet to see someone use count(*) across a table and not round the result themselves (approx 9 million clients). ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
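A minimal sketch of such a function (the name comes from the suggestion above; 7.x-era quoted function body assumed). Note that reltuples is only the estimate recorded by the last VACUUM/ANALYZE, so the result is approximate by design:

```sql
CREATE FUNCTION approx_count(text) RETURNS bigint AS '
    SELECT reltuples::bigint FROM pg_class WHERE relname = $1;
' LANGUAGE sql STABLE;

SELECT approx_count('mytable');
```

With autovacuum keeping the statistics reasonably fresh, this is effectively instant, versus a full scan for a bare count(*).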
Re: [PERFORM] using an index worst performances
On Fri, 2004-08-20 at 05:37, Gaetano Mendola wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Christopher Kings-Lynne wrote: > > |>>> Without index: 1.140 ms > |>>> With index: 1.400 ms > |>>> With default_statistics_target = 200: 1.800 ms > |>> > |>> > |>> > |>> > |>> Can I just check that 1.800ms means 1.8 secs (You're using . as the > |>> thousands separator)? > |>> > |>> If it means 1.8ms then frankly the times are too short to mean > |>> anything without running them 100 times and averaging. > |> > |> > |> > |> > |> It mean 1.8 ms and that execution time is sticky to that value even > |> with 1000 times. > | > | > | Given the almost irrelevant difference in the speed of those queries, I'd > | say that with the stats so high, postgres simply takes longer to check > | the statistics to come to the same conclusion. ie. it has to loop over > | 200 rows instead of just 10. > > The time increase seems too much. We can test this. What are the times without the index, with the index and with the higher statistics value when using a prepared query? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Help specifying new machine
On Wed, 2004-08-18 at 11:18, Raoul Buzziol wrote: > > You're not going to be able to get a Dual Athlon MP for the same price > > as a single Xeon. A few years back, this was the case because Xeon CPUs > > & MBs had a huge premium over Athlon. This is no longer true mainly > > because the number of people carrying Athlon MP motherboards has dropped > > down drastically. Go to pricewatch.com and do a search for 760MPX -- you > > get a mere 8 entries. Not surprisingly because who would not want to > > spend a few pennies more for a much superior Dual Opteron? The few > > sellers you see now just keep stuff in inventory for people who need > > replacement parts for emergencies and are willing to pay up the nose > > because it is an emergency. > > I saw pricewatch.com and you're right. > > I looked for some benchmarks, and I would know if I'm right on: > - Dual Opteron 246 have aproximately the same performance of a Dual Xeon 3Gh > (Opteron a little better) > - Opteron system equal or cheeper than Xeon system. For PostgreSQL, Opteron might be a touch worse than Xeon for single processor, little better for Dual, and a whole heck of a bunch better for Quads -- but this depends on your specific work load as memory bound, cpu bound, lots of float math, etc. work loads will perform differently. In general, an Opteron is a better bet simply because you can shove more ram onto it (without workarounds), and you can't beat an extra 8GB ram on an IO bound database (consider your datasize in 1 year). ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] insert
On Fri, 2004-08-13 at 08:10, Ulrich Wisser wrote: > Hi, > > my inserts are done in one transaction, but due to some foreign key > constraints and five indexes sometimes the 100 inserts will take more > than 5 minutes. It is likely that you are missing an index on one of those foreign key'd items. Do an EXPLAIN ANALYZE SELECT * FROM foreign_table WHERE foreign_col = ''; Fix them until they're quick. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
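For instance (table and column names here are hypothetical): if the referencing table has no index on the foreign-key column, every constraint check degenerates into a sequential scan, and the fix is a single CREATE INDEX:

```sql
EXPLAIN ANALYZE SELECT * FROM order_items WHERE order_id = 12345;
-- A Seq Scan in that plan means each FK check walks the whole table.
CREATE INDEX order_items_order_id_idx ON order_items (order_id);
```

Repeat the EXPLAIN for each foreign-key'd column until all of them show an index scan.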
Re: [PERFORM] Hardware upgrade for a high-traffic database
On Wed, 2004-08-11 at 18:03, Jason Coene wrote: > > -Original Message- > > From: Rod Taylor [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, August 11, 2004 5:46 PM > > To: Jason Coene > > Cc: 'Merlin Moncure'; Postgresql Performance > > Subject: Re: [PERFORM] Hardware upgrade for a high-traffic database > > > > > I'm wondering why our PG server is using so little memory... The system > > has > > > 2GB of memory, though only around 200MB of it are used. Is there a PG > > > > This is the second time you've said this. Surely you're not implying > > there is 1.8GB Free Memory -- rather than 1.8GB in Buffers or Cache. > > Hi Rod, > > I was looking at top and vmstat - which always show under 300MB "Active". > We may hit 400MB at peak. Everything I see (though this isn't my area of > expertise) points to most of the memory simply being unused. Results below, > am I missing something? This looks fine. The memory is not unused (only 5MB is actually empty) but is being used for disk cache. Active is memory used by programs and would need to be swapped if this space was needed. Inactive is memory that is generally dirty. Disk cache is often here. In your case, you likely write to the same pages you're reading from -- which is why this number is so big. It also explains why a checkpoint is a killer; a large chunk of this memory set needs to be pushed to disk. Cache is memory used generally for disk cache that is not dirty. It's been read from the disk and could be cleared immediately if necessary. Wired is memory that cannot be swapped. In your case, Shared Memory is probably Wired (this is good). There is another sysctl to check and set whether it is wired or swappable. Interesting (if dry) read: http://www.freebsd.org/doc/en_US.ISO8859-1/articles/vm-design/index.html ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware upgrade for a high-traffic database
On Wed, 2004-08-11 at 17:31, Brian Hirt wrote: > On Aug 11, 2004, at 3:18 PM, Jason Coene wrote: > > > > I'm wondering why our PG server is using so little memory... The > > system has > > 2GB of memory, though only around 200MB of it are used. Is there a PG > > setting to force more memory usage towards the cache? Additionally, > > we use > > FreeBSD. I've heard that Linux may manage that memory better, any > > truth > > there? Sorry if I'm grabbing at straws here :) > > > > i don't know about freebsd, but linux is very aggressive about using > unused memory for disk cache. we have dedicated linux box running pg Aggressive indeed.. I'm stuck with the version that has a tendency to swap out active processes rather than abandon disk cache -- it gets very annoying! ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Hardware upgrade for a high-traffic database
> I'm wondering why our PG server is using so little memory... The system has > 2GB of memory, though only around 200MB of it are used. Is there a PG This is the second time you've said this. Surely you're not implying there is 1.8GB Free Memory -- rather than 1.8GB in Buffers or Cache. Send output of the below: sysctl vm sysctl -a | grep buffers top | grep -E "(Mem|Swap):" ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware upgrade for a high-traffic database
> Our database is about 20GB on disk, we have some quite large tables - 2M > rows with TEXT fields in a sample table, accessed constantly. We average > about 4,000 - 5,000 queries per second - all from web traffic. As you can 99% is reads? and probably the same data over and over again? You might want to think about a small code change to cache sections of page output in memory for the most commonly generated pages (there are usually 3 or 4 that account for 25% to 50% of web traffic -- starting pages). The fact you're getting 5k queries/second off IDE drives tells me most of the active data is in memory -- so your actual working data set is probably quite small (less than 10% of the 20GB). If the above is all true (mostly reads, smallish dataset, etc.) and the database is not growing very quickly, you might want to look into RAM and RAM bandwidth over disk: for example, an Opteron with 8GB of RAM using the same old IDE drives. Get a motherboard with a SCSI RAID controller in it, so the disk component can be upgraded in the future (when necessary). ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Tuning queries on large database
On Wed, 2004-08-04 at 08:44, Valerie Schneider DSI/DEV wrote: > Hi, > > I have some problem of performance on a PG database, and I don't > know how to improve. I Have two questions : one about the storage > of data, one about tuning queries. If possible ! > > My job is to compare Oracle and Postgres. All our operational databases > have been running under Oracle for about fifteen years. Now I try to replace > Oracle by Postgres. You may assume some additional hardware may be required -- this would be purchased out of the Oracle License budget :) > My first remark is that the table takes a lot of place on disk, about > 70 Gb, instead of 35 Gb with oracle. > 125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea > not so bad for oracle. What about for PG ? How data is stored ? This is due to the datatype you've selected. PostgreSQL does not convert NUMERIC into a more appropriate integer format behind the scenes, nor will it use the faster routines for the math when it is an integer. Currently it makes the assumption that if you've asked for numeric rather than integer or float that you are dealing with either large numbers or require high precision math. Changing most of your columns to integer + Check constraint (where necessary) will give you a large speed boost and reduce disk requirements a little. > The different queries of the bench are "simple" queries (no join, > sub-query, ...) and are using indexes (I "explained" each one to > be sure) : Care to send us the EXPLAIN ANALYZE output for each of the 4 queries after you've improved the datatype selection? -- Rod Taylor Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc
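As an illustration of the datatype advice above (the column names are invented, not Valerie's actual schema), a NUMERIC column with no fractional part becomes a native integer plus a CHECK constraint preserving the old range:

```sql
-- Before:  pressure  numeric(7,0)
-- After: 4-byte storage and native integer math, with the range still enforced:
CREATE TABLE obs (
    station_id integer NOT NULL,
    pressure   integer CHECK (pressure >= 0 AND pressure <= 9999999)
);
```

The integer comparison and arithmetic routines are much cheaper than the arbitrary-precision NUMERIC ones, and the smaller on-disk rows reduce I/O as well.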
Re: [PERFORM] Page Miss Hits
> | ARC still helps, since it makes sure the shared_buffers don't all get > | flushed from the useful small datasets when a seq scan gets executed. > > I'm still not convinced. Why the last backend alive, have to throw away > bunch of memory copied in the SHM? And again, the ARC is a replacement > policy for a cache, which one ? As you know, ARC is a recent addition. I've not seen any benchmarks demonstrating that the optimal SHARED_BUFFERS setting is different today than it was in the past. We know it's changed, but the old buffer strategy had an equally hard time with a small buffer as it did a large one. Does that mean the middle of the curve is still at 15k buffers but the extremes are handled better? Or something completely different? Please feel free to benchmark 7.5 (OSDL folks should be able to help us as well) and report back. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] No index usage with "left join"
> SELECT count(*) FROM contacts LEFT JOIN companies ON contacts.sid = > companies.intfield01 > > contacts.sid (type text, b-tree index on it) > companies.intfield01 (type bigint, b-tree index on it) > How can I force the usage of the indexes when using "left join". Or > any other SQL construct that does the same !? Can anybody please give > us a hint !? You really don't need to use indexes since you're fetching all information from both tables. Anyway, we can be fairly sure this isn't PostgreSQL 7.4 (which would likely choose a far better plan -- hash join rather than nested loop) as it won't join a bigint to a text field without a cast. Try this: set enable_nestloop = false; SELECT count(*) FROM contacts LEFT JOIN companies ON cast(contacts.sid as bigint) = companies.intfield01; set enable_nestloop = true; ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] best way to fetch next/prev record based on index
You only want one record to be returned? Tack a LIMIT 1 onto the end of the query. > My problem is deceptively simple: how you read the next record from a > table based on a given set of values? In practice, this is difficult to > implement. If anybody can suggest a alternative/better way to this, I'm > all ears. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance over a LAN
On Fri, 2004-07-23 at 01:50, William Carney wrote: > Hello, > > Using a test client application that performs 10 insert operations on a > table, with the client application running on the same machine as the > Postgres server, I get the following results for the time taken to run the > test: > > Unix domain socket connection: 26 seconds > Inet domain socket ('localhost'): 35 seconds > The machines used are P4s running FreeBSD 5.2.1. The Postgres version is > 7.4.3. Can anyone tell me why there's such a big difference? Domain sockets have significantly less work to do than inet sockets, as well as fewer delays in the transmission itself. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Odd sorting behaviour
On Wed, 2004-07-21 at 06:04, Steinar H. Gunderson wrote: > On Tue, Jul 20, 2004 at 10:18:19PM -0400, Rod Taylor wrote: > > I've taken a look and managed to cut out quite a bit of used time. > > You'll need to confirm it's the same results though (I didn't -- it is > > the same number of results (query below) > > It looks very much like the same results. Oh... on my (slow) laptop it cut the time back significantly. > As my server is 7.2 and not 7.4, that obviously won't help much :-) Thanks > anyway, though -- we'll upgrade eventually, and it'll help then. I see. Yeah, avoid NOT IN like the plague on 7.2. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Odd sorting behaviour
> I could of course post the updated query plan if anybody is interested; let > me know. (The data is still available if anybody needs it as well, of > course.) I've taken a look and managed to cut out quite a bit of the time used. You'll need to confirm it returns the same results though (I didn't check -- but it does return the same number of rows; query below). First off, "DROP INDEX prodid_index;". It doesn't help anything since the primary key is just as usable, but it does take enough space that it causes thrashing in the buffer cache. Any queries based on prodid will use the index for the PRIMARY KEY instead. Secondly, I had no luck getting the hashjoin, but this probably doesn't matter. I've assumed that the number of users will climb faster than the product set offered, and generated additional data via the below command run 4 times: INSERT INTO opinions SELECT prodid, uid + (SELECT max(uid) FROM opinions), opinion FROM opinions; I found that by this point, the hashjoin and mergejoin have essentially the same performance -- in other words, as you grow you'll want the mergejoin eventually, so I wouldn't worry about it too much. The new query cuts about a third of the time, and forcing the hashjoin gets another third, but see the above note:

SELECT o3.prodid,
       SUM(o3.opinion * o12.correlation) AS total_correlation
FROM opinions o3
-- Plain join is okay since o12.correlation <> 0 eliminates any NULLs anyway.
-- Was: RIGHT JOIN
JOIN (SELECT o2.uid,
             SUM(o1.opinion * o2.opinion) / SQRT(count(*)::numeric) AS correlation
      FROM opinions AS o1
      JOIN opinions AS o2 USING (prodid)
      WHERE o1.uid = 1355
      GROUP BY o2.uid) AS o12 USING (uid)
-- Was: LEFT JOIN
WHERE o3.prodid NOT IN (SELECT prodid FROM opinions AS o4 WHERE uid = 1355)
  AND o3.opinion <> 0
  AND o12.correlation <> 0
GROUP BY o3.prodid
ORDER BY total_correlation DESC;

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] NAS, SAN or any alternate solution ?
> Would NAS or SAN be good solutions ? (I've read that NAS uses NFS > which could slow down the transfer rate ??) > Has anyone ever tried one of these with postgresql ? I've used both NetApp- and Hitachi-based SANs with PostgreSQL. Both work as well as expected, but do require some tweaking as they normally are not optimized for the datablock size that PostgreSQL likes to deal with (8k by default) -- this can make as much as a 50% difference in performance levels. For a NAS setup, be VERY careful that the NFS implementation you're using has the semantics that the database requires (do plenty of failure testing -- pull plugs and things at random). iSCSI looks more promising, but I've not tested how gracefully it fails. Have your supplier run a bunch of benchmarks for random IO with 8k blocks. One side note, SANs seem to be very good at scaling across multiple jobs from multiple sources, but beware your Fibre Channel drivers -- mine seems to spend quite a bit of time managing interrupts and I've not found a way to put it into a polling mode (I'm not a Linux person and that trick usually happens for me on the BSDs). ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] extrem bad performance
> The database grows very slowly. The main load comes from SELECT's and > not from INSERT's or UPDATE's, but the performance gets slower day by day... > > I have no idea where to search for the speed break! Let's start with an example. Please send us the EXPLAIN ANALYZE output of a couple of the poorly performing queries. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] query plan wierdness?
> Oddly enough, I put the same database on a different machine, and the > query now behaves as I hoped all along. Notice that I'm using the > "real" query, with the aspid in asc and the other fields in desc order, > yet the query does use the call_idx13 index: Notice that while it only takes 19 seconds to pull the data out of the table, it is spending 30 seconds sorting it -- so the index scan isn't buying you very much. Try it again with ORDER BY aspid DESC and you should get the query down to 20 seconds in total on that Sparc, though that isn't exactly the ordering you wanted. The decision about whether to use an index or not is borderline, and as you've shown they take approximately the same amount of time. Use of an index will not necessarily be faster than a sequential scan -- but the penalty for accidentally selecting one when it shouldn't have been is much higher. > > Any chance you could put together a test case demonstrating the above > > behaviour? Everything from CREATE TABLE, through dataload to the > EXPLAIN > > ANALYZE. > > > Forgive me for being thick: what exactly would be involved? Due to > HIPAA regulations, I cannot "expose" any of the data. Of course. But that doesn't mean you couldn't create a table with a different name and muck around with the values. But you're getting what you want, so it isn't a problem anymore. > > I hesitated to bring this up because I wanted to focus on the technical > issues rather than have this degenerate into a religious war. The chief > developer in charge of the project brought this query to my attention. > He has a fair amount of political sway in the company, and is now > lobbying to switch to MySQL because he maintains that PostgreSQL is > broken and/or too slow for our needs. He has apparently benchmarked the > same query using MySQL and gotten much more favorable results (I have > been unable to corroborate this yet).
> I wouldn't be surprised if MySQL did run this single query faster with nothing else going on during that time. MySQL was designed primarily with a single user in mind, but it is unlikely this will be your production situation so the benchmark is next to useless. Connect 50 clients to the databases running this (and a mixture of other selects) while another 20 clients are firing off updates, inserts, deletes on these and other structures -- or whatever matches your full production load. This is what PostgreSQL (and a number of other DBs) are designed for, typical production loads. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Working on huge RAM based datasets
> What would be most interesting to see is whether this makes it wise to > increase shared buffer size. It may be more effective to bump down > the cache a little, and bump up sort memory; hard to tell. How do we go about scheduling tests with the OSDL folks? If they could do 10 runs with buffers between 1k and 500k it would help us get a broad view of the situation. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] query plan wierdness?
> OK, that makes sense; however, this doesn't: > > elon2=# explain analyse select * from call where aspid='123C' and > OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 > 23:59:59.999' order by aspid asc, openeddatetime asc, callstatus asc, > calltype asc, callkey asc; > I've modified the "order by" to reflect the call_idx13 index, yet the > query still causes a sequence scan of the table. The query shown above does not have a LIMIT, where the original one had LIMIT 26. PostgreSQL has determined that pulling out all the table rows and sorting them in CPU is cheaper than pulling out all index rows, then randomly pulling out all table rows. Normally, that would be well on the mark: you can sort a large number of tuples for the cost of a single random disk seek. But this is not true for you. Considering you're pulling out 450k rows in 8 seconds, I'd also guess the data is mostly in memory. Is that normal? Or is this a result of having run several test queries against the same data multiple times? If it's normal, bump your effective_cache_size parameter higher to move the sort vs. scan threshold. > Again, that makes sense to me, but if I remove aspid from the query it > still ignores the index You've changed 2 variables. You removed the aspid AND removed the LIMIT. Add back the limit of 26 like you originally showed, and it'll do what I described.
> Setting enable_seqscan=off still doesn't cause the desired index to be > selected: > > elon2=# explain analyse select * from call where aspid='123C' and > OpenedDateTime between '2000-01-01 00:00:00.0' and '2004-06-24 > 23:59:59.999' order by aspid desc, openeddatetime desc, callstatus desc, > calltype desc, callkey desc; > > QUERY PLAN
> Sort (cost=355314.41..356482.87 rows=467384 width=295) (actual time=33382.92..34088.10 rows=461973 loops=1)
>   Sort Key: aspid, openeddatetime, callstatus, calltype, callkey
>   -> Index Scan using call_aspid on call (cost=0.00..43430.25 rows=467384 width=295) (actual time=0.24..7915.21 rows=461973 loops=1)
>        Index Cond: (aspid = '123C'::bpchar)
>        Filter: ((openeddatetime >= '2000-01-01 00:00:00-07'::timestamp with time zone) AND (openeddatetime <= '2004-06-24 23:59:59.999-07'::timestamp with time zone))
> Total runtime: 39196.39 msec

I'm a little surprised at this. It should have done a reverse index scan and skipped the sort step. In fact, with a very simple select, I get this:

rbt=# \d t
        Table "public.t"
 Column |              Type              | Modifiers
--------+--------------------------------+-----------
 col1   | bpchar                         |
 col2   | timestamp(0) without time zone |
 col3   | integer                        |
 col4   | integer                        |
 col5   | integer                        |
Indexes: "t_idx" btree (col1, col2, col3, col4, col5)

rbt=# set enable_seqscan = false;
SET
rbt=# explain analyze select * from t order by col1 desc, col2 desc, col3 desc, col4 desc, col5 desc;
 Index Scan Backward using t_idx on t (cost=0.00..6.20 rows=18 width=52) (actual time=0.046..0.219 rows=18 loops=1)
 Total runtime: 1.813 ms
(2 rows)

Any chance you could put together a test case demonstrating the above behaviour? Everything from CREATE TABLE, through dataload to the EXPLAIN ANALYZE. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] vacuum_mem
> It seems vacuum_mem does not have performance
> effect at all.

Wrong conclusion. It implies that your test case takes less than 64M of memory to track your removed tuples. I think it takes 8 bytes to track a tuple for vacuuming an index, which means 64MB should be able to track about 8 million deletions. Since your demonstration had 75 for removal, it's under the limit.

Try your test again with 32MB; it should make a single sequential pass on the table, and 2 passes on each index for that table. Either that, or do a few more aborted updates.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
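As a back-of-the-envelope check (assuming the 8-bytes-per-dead-tuple figure above), the capacity of a given vacuum_mem setting is easy to compute:

```sql
-- Dead tuples trackable by a 64MB vacuum_mem at 8 bytes apiece:
SELECT (64 * 1024 * 1024) / 8 AS trackable_dead_tuples;
-- => 8388608, i.e. roughly 8 million
```

Only when a vacuum removes more tuples than this does vacuum_mem start to matter, because the indexes must then be scanned more than once.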
Re: [PERFORM] query plan wierdness?
> However, this query performs a sequence scan on the table, ignoring the
> call_idx13 index (the only difference is the addition of the aspid field
> in the order by clause):

You do not have an index which matches the ORDER BY, so PostgreSQL cannot simply scan the index for the data you want. Thus it needs to find all matching rows, order them, etc.

> 23:59:59.999' order by aspid, openeddatetime desc, callstatus desc,
> calltype desc, callkey desc limit 26;

That is: aspid ASC, openeddatetime DESC, callstatus DESC, calltype DESC, callkey DESC.

> call_idx13 btree (aspid, openeddatetime, callstatus, calltype,
> callkey),

This index is: aspid ASC, openeddatetime ASC, callstatus ASC, calltype ASC, callkey ASC. A reverse scan would of course be DESC, DESC, DESC, DESC, DESC. Neither of these matches your requested ORDER BY, thus it cannot help reduce the rows looked at to 26.

This leaves your WHERE clause to restrict the dataset, and it doesn't do a very good job of it. There are more than 450k rows matching the WHERE clause, which means the sequential scan was probably the right choice (unless you have over 10 million entries in the table).

Since your WHERE clause contains a single aspid, an improvement to the PostgreSQL optimizer would be to ignore that field in the ORDER BY, as order is no longer important since there is only one possible value. If it did ignore aspid, it would use a plan similar to the first one you provided.

You can accomplish the same thing by leaving out aspid ASC OR by setting it to aspid DESC in the ORDER BY. Leaving it out entirely will be slightly faster, but DESC will cause PostgreSQL to use index "call_idx13".

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
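A sketch of the two workarounds described above, using the quoted table and index (date predicates omitted for brevity):

```sql
-- Option 1: make every column DESC, so a backward scan of
-- call_idx13 satisfies the ordering directly:
SELECT * FROM call
WHERE aspid = '123C'
ORDER BY aspid DESC, openeddatetime DESC, callstatus DESC,
         calltype DESC, callkey DESC
LIMIT 26;

-- Option 2: drop the constant column from the ORDER BY entirely;
-- aspid is fixed by the WHERE clause, so the result order is the same:
SELECT * FROM call
WHERE aspid = '123C'
ORDER BY openeddatetime DESC, callstatus DESC, calltype DESC,
         callkey DESC
LIMIT 26;
```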
Re: [PERFORM] Query performance
> Can I get any better performance? You can try bumping your sort memory way up (for this query only). Another method would be to cluster the table by the symbol column (eliminates the expensive sort). If you could run a very simple calculation against open & close numbers to eliminate a majority of symbols early, that would be useful as well. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
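Both suggestions can be sketched as follows (table and column names here are assumptions; on 7.x the sort-memory setting is sort_mem, measured in KB):

```sql
-- Bump sort memory for this session/query only, then put it back:
SET sort_mem = 65536;          -- 64MB, illustrative value
SELECT ... ;                   -- the expensive query
RESET sort_mem;

-- Or physically reorder the table by symbol so the sort disappears.
-- CLUSTER needs an index on the clustering column (pre-8.0 syntax):
CREATE INDEX quotes_symbol_idx ON quotes (symbol);
CLUSTER quotes_symbol_idx ON quotes;
ANALYZE quotes;                -- refresh stats after the rewrite
```

Note that CLUSTER is a one-time reordering; rows inserted afterwards are not kept in symbol order.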
Re: [PERFORM] postgresql and openmosix migration
> 2) You can hire a PG database expert. This will be much faster, but cost
> you a lot of money.

I wouldn't exactly say "a lot of money". Lots of consultants out there are willing to put in a week's worth of effort, on site, for significantly less than a support contract with most commercial DB organizations (including MySQL) -- and they often give better results, since they're on-site rather than working over the phone or via email.

But yes, doing it via this mailing list is probably the cheapest option.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] pg_fetch_array
> Does php need to read the database every time pg_fetch_array is executed
> in the while loop, or are all the rows already in memory after pg_query?

You may need to ask the php people about this one. The PostgreSQL protocol would allow data to continue streaming in at the same time as you are processing other rows (asynchronous retrieval). So, optionally, they may fetch and cache all rows in local memory at pg_query time, OR grab them in sets of 1000 rows and cache that (fetching the next set when the first set runs out), OR grab one row per fetch.

You could run a simple select that will fetch 100M rows from a table with no WHERE clause. See how quickly the first row comes in, and how much memory is used by the process.

I suspect they fetch all of the rows at pg_query execution. Otherwise they wouldn't know how to respond to a pg_num_rows() call.

On a side note, that is a rather unique email address.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] in-transaction insert performance in 7.5devel
> As I understand it, sync() is never called anymore. mdsync() hits all
> the files 1 by 1 with an fsync. My understanding of the commit
> process is that 30 tps is quite reasonable for my hardware.

Sorry. I didn't see the version in the subject, and assumed 7.4 on a Linux machine with excessive journaling enabled.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] in-transaction insert performance in 7.5devel
On Fri, 2004-06-11 at 14:40, Merlin Moncure wrote:
> I am batch inserting insert statements into a database with fsync = on.
> My single disk system is on a 10k drive... even though I am inside a
> transaction there is at least 1 file sync per row insert.

Which filesystem? PostgreSQL isn't issuing the sync except at commit of a transaction, but some filesystems do wacky things if you ask them to.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Index oddity
On Wed, 2004-06-09 at 21:45, Christopher Kings-Lynne wrote: > > If I take away the diagonalSize condition in my query I find that there > > are 225 rows that satisfy the other conditions. 155 of these have a > Maybe you should drop your random_page_cost to something less than 4, > eg. 3 or even 2... The big problem is a very poor estimate (off by a couple orders of magnitude). I was hoping someone with more knowledge in fixing those would jump in. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Index oddity
> ... and here is the plan with statistics set to 1000 ...
>
> Seq Scan on nrgfeature f (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>   Filter: ((upperrightx > 321264.236977215::double precision) AND
>   (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
>   123286.261898636::double precision) AND (lowerlefty <
>   124985.927450476::double precision) AND (diagonalsize > 49.999::double
>   precision))

It's better like this, but still way off the mark. Even your good query, which uses the index, was out by more than an order of magnitude.

Try raising the statistics levels for upperrightx, lowerleftx, upperrighty and lowerlefty.

Failing that, you might be able to push it back down again by giving diagonalsize an upper limit. Perhaps 500 is a value that would never occur:

AND (diagonalsize BETWEEN 49.999::double precision AND 500)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index oddity
On Wed, 2004-06-09 at 16:50, ken wrote:
> Thanks Rod,
>
> This setting has no effect however. If I set statistics to 1000, or

Okay.. but you never did send EXPLAIN ANALYZE output. I want to know what it is really finding.

> On Wed, 2004-06-09 at 13:12, Rod Taylor wrote:
> > It seems to believe that the number of rows returned for the >49.999
> > case will be 4 times the number for the >50 case. If that were true,
> > then the sequential scan would be correct.
> >
> > ALTER TABLE ALTER COLUMN diagonalsize SET STATISTICS 1000;
> > ANALYZE ;
> >
> > Send back EXPLAIN ANALYZE output for the >49.999 case.
> >
> > > The query plan for diagonalSize > 50.000 is ...
> > >
> > > Index Scan using nrgfeature_xys_index on nrgfeature f
> > > (cost=0.00..17395.79 rows=4618 width=220)
> > >   Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> > >   (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > >   123286.261898636::double precision) AND (lowerlefty <
> > >   124985.927450476::double precision) AND (diagonalsize > 50::double
> > >   precision))
> > >
> > > ... while for diagonalSize > 49.999 is ...
> > >
> > > Seq Scan on nrgfeature f (cost=0.00..31954.70 rows=18732 width=220)
> > >   Filter: ((upperrightx > 321264.236977215::double precision) AND
> > >   (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > >   123286.261898636::double precision) AND (lowerlefty <
> > >   124985.927450476::double precision) AND (diagonalsize > 49.999::double
> > >   precision))
>
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if
> > your joining column's datatypes do not match

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index oddity
It seems to believe that the number of rows returned for the >49.999 case will be 4 times the number for the >50 case. If that were true, then the sequential scan would be correct.

ALTER TABLE ALTER COLUMN diagonalsize SET STATISTICS 1000;
ANALYZE ;

Send back EXPLAIN ANALYZE output for the >49.999 case.

> The query plan for diagonalSize > 50.000 is ...
>
> Index Scan using nrgfeature_xys_index on nrgfeature f
> (cost=0.00..17395.79 rows=4618 width=220)
>   Index Cond: ((upperrightx > 321264.236977215::double precision) AND
>   (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
>   123286.261898636::double precision) AND (lowerlefty <
>   124985.927450476::double precision) AND (diagonalsize > 50::double
>   precision))
>
> ... while for diagonalSize > 49.999 is ...
>
> Seq Scan on nrgfeature f (cost=0.00..31954.70 rows=18732 width=220)
>   Filter: ((upperrightx > 321264.236977215::double precision) AND
>   (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
>   123286.261898636::double precision) AND (lowerlefty <
>   124985.927450476::double precision) AND (diagonalsize > 49.999::double
>   precision))

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
Re: [PERFORM] seq scan woes
On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> On 7 Jun 2004 at 16:00, Rod Taylor wrote:
> > On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> > > A production system has had a query recently degrade in performance.
> > > What once took < 1s now takes over 1s. I have tracked down the
> > > problem to a working example.
> >
> > What changes have you made to postgresql.conf?
>
> Nothing recently (ie. past few months). Nothing at all, really.
> Perhaps I need to start tuning that.
>
> > Could you send explain analyse again with SEQ_SCAN enabled but with
> > nested loops disabled?
>
> See http://rafb.net/paste/results/zpJEvb28.html

This doesn't appear to be the same query as we were shown earlier.

> > Off the cuff? I might hazard a guess that effective_cache is too low or
> > random_page_cost is a touch too high. Probably the former.
>
> I grep'd postgresql.conf:
>
> #effective_cache_size = 1000    # typically 8KB each
> #random_page_cost = 4           # units are one sequential page fetch cost

This would be the issue. You haven't told PostgreSQL anything about your hardware. The defaults are somewhat modest.

http://www.postgresql.org/docs/7.4/static/runtime-config.html

Skim through the run-time configuration parameters that can be set in postgresql.conf. Pay particular attention to:

* shared_buffers (you may be best with 2000 or 4000)
* effective_cache_size (set to 50% of ram size if dedicated db machine)
* random_page_cost (good disks will bring this down to a 2 from a 4)

-- Rod Taylor

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
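As a concrete starting point, the advice above might translate to postgresql.conf lines like these (illustrative values for a dedicated machine with 1GB of RAM; both buffer settings are counted in 8KB pages, and your numbers will differ):

```
shared_buffers = 4000           # ~32MB of shared buffers
effective_cache_size = 65536    # ~512MB, i.e. ~50% of RAM
random_page_cost = 2            # good disks; the default is 4
```

shared_buffers requires a server restart; the other two take effect on reload.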
Re: [PERFORM] seq scan woes
On Mon, 2004-06-07 at 15:45, Dan Langille wrote: > A production system has had a query recently degrade in performance. > What once took < 1s now takes over 1s. I have tracked down the > problem to a working example. What changes have you made to postgresql.conf? Could you send explain analyse again with SEQ_SCAN enabled but with nested loops disabled? Off the cuff? I might hazard a guess that effective_cache is too low or random_page_cost is a touch too high. Probably the former. -- Rod Taylor Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] WAL Optimisation - configuration and usage
> random_page_cost = 0.5

Not likely. The lowest this value should ever be is 1, and that's if you're using something like a RAM drive. If your drives are doing a ton of extra random IO due to the above (rather than sequential reads), it will lower the throughput quite a bit. Try a value of 2 for a while.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] postgres performance: comparing 2 data centers
> The members table contains about 500k rows. It has an index on
> (group_id, member_id) and on (member_id, group_id).

Yes, bad stats are causing it to pick a poor plan, but you're giving it too many options (which doesn't help) and using disk space up unnecessarily.

Keep (group_id, member_id)
Remove (member_id, group_id)
Add (member_id)

An index on just member_id is actually going to perform better than (member_id, group_id), since it has a smaller footprint on the disk. Anytime both group_id and member_id are in the query, the (group_id, member_id) index will likely be used.

-- Rod Taylor

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [PERFORM] postgres performance: comparing 2 data centers
On Fri, 2004-06-04 at 18:07, Michael Nonemacher wrote:
> Slight update:
>
> Thanks for the replies; this is starting to make a little more sense...
>
> I've managed to track down the root of the problem to a single query on
> a single table. I have a query that looks like this:
>
>    select count(*) from members where group_id = ? and member_id > 0;
>
> The members table contains about 500k rows. It has an index on
> (group_id, member_id) and on (member_id, group_id).
>
> It seems like the statistics are wildly different depending on whether
> the last operation on the table was a 'vacuum analyze' or an 'analyze'.

Yes, bad stats are causing it to pick a poor plan (might be better in 7.5), but you're giving it too many options (which doesn't help) and using disk space up unnecessarily.

Keep (group_id, member_id)
Remove (member_id, group_id)
Add (member_id)

An index on just member_id is actually going to perform better than (member_id, group_id), since it has a smaller footprint on the disk. Anytime both group_id and member_id are in the query, the (group_id, member_id) index will likely be used.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match
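The suggested index change, as DDL (a sketch; the index names are assumptions, since the originals weren't posted):

```sql
-- Drop the redundant two-column index; (group_id, member_id) stays.
DROP INDEX members_member_group_idx;    -- the (member_id, group_id) index

-- A plain single-column index has a smaller on-disk footprint:
CREATE INDEX members_member_idx ON members (member_id);

-- Refresh statistics so the planner sees the new layout:
ANALYZE members;
```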
Re: [PERFORM] PostgreSQL and Kernel 2.6.x
On Tue, 2004-06-01 at 23:16, V i s h a l Kashyap @ [Sai Hertz And Control Systems] wrote:
> Dear all,
>
> Has anyone compiled PostgreSQL with kernel 2.6.x? If YES:
> 1. Were there any performance gains?

OSDL reports approx 20% improvement. I've seen similar with some data access patterns.

> 2. What problems would keep us from compiling on kernel 2.6?

Nothing that I know of, assuming you have vendor support for it.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
Re: [PERFORM] PostgreSQL caching
> What is essentially required is the "prescient cacheing algorithm," > where the postmaster must consult /dev/esp in order to get a > prediction of what blocks it may need to refer to in the next sixty > seconds. Easy enough. Television does it all the time with live shows. The guy with the buzzer always seems to know what will be said before they say it. All we need is a 5 to 10 second delay... ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] TPCH 100GB - need some help
On Fri, 2004-05-14 at 14:00, Eduardo Almeida wrote: > Hi folks, > > I need some help in a TPCH 100GB benchmark. Performance with 7.5 is much improved over 7.4 for TPCH due to efforts of Tom Lane and OSDL. Give it a try with a recent snapshot of PostgreSQL. Otherwise, disable nested loops for that query. set enable_nestloop = off; ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] LIKE and INDEX
> but if I use:
>   select url from urlinfo where url like 'http://%.lycos.de';
> it won't use the index at all, NOT good!
> is there any way I can force the second query to use the index?

create index nowww on urlinfo (replace(replace(url, 'http://', ''), 'www.', ''));

SELECT url
FROM urlinfo
WHERE replace(replace(url, 'http://', ''), 'www.', '') = 'lycos.de'
  AND url LIKE 'http://%.lycos.de';

The replace() will narrow the search down to all records containing lycos.de. Feel free to write a more complex alternative to replace() that will deal with more than just an optional www. Once the results have been narrowed down, you may use the original LIKE expression to confirm it is your record.

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] Insert only tables and vacuum performance
> Or even better an offset into the datatable for the earliest deleted > row, so if you have a table where you update the row shortly after > insert and then never touch it vacuum can skip most of the table > (inserts are done at the end of the table, right?) Inserts are done at the end of the table as a last resort. But anyway, how do you handle a rolled back insert? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] planner/optimizer question
> It would be nice to get a feel for how much performance loss would be
> incurred in maintaining the index flags against possible performance
> gains for getting the data back out again.

I guess the real question is: why maintain index flags and not simply drop the index entry altogether?

A more interesting case would be to have the backend process record the index tuples that it would invalidate (if committed), then on commit send that list to a garbage collection process. It's still vacuum -- just the reaction time for it would be much quicker.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if
your joining column's datatypes do not match