[PERFORM] Horribly slow query/ sequential scan
This is a query migrated from Informix to PostgreSQL. In PostgreSQL it runs about 10,000 times *slower* than it did on Informix, despite somewhat newer hardware. The problem is entirely due to the planner. This is PostgreSQL 8.1.4 on Linux with 2 GB of RAM.

The table:

                Table "reporting.bill_rpt_work"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 report_id     | integer               |
 client_id     | character varying(10) |
 contract_id   | integer               | not null
 rate          | numeric               | not null
 appid         | character varying(10) | not null
 userid        | text                  | not null
 collection_id | integer               | not null
 client_name   | character varying(60) |
 use_sius      | integer               | not null
 is_subscribed | integer               | not null
 hits          | numeric               | not null
 sius          | numeric               | not null
 total_amnt    | numeric               | not null
 royalty_total | numeric               |
Indexes:
    "billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
    "billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
    "billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
    "$2" FOREIGN KEY (client_id) REFERENCES work.clients(client_id)

The query:

explain analyze
select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
    (select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

                                   QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=298061335.44..298259321.62 rows=26 width=58) (actual time=372213.673..372213.689 rows=2 loops=1)
   ->  Sort  (cost=298061335.44..29808.83 rows=8799354 width=58) (actual time=372213.489..372213.503 rows=37 loops=1)
         Sort Key: w.appid, w.rate, w.is_subscribed
         ->  Nested Loop  (cost=0.00..296121313.45 rows=8799354 width=58) (actual time=286628.486..372213.053 rows=37 loops=1)
               Join Filter: (subplan)
               ->  Seq Scan on bill_rpt_work w  (cost=0.00..85703.20 rows=11238 width=62) (actual time=1.239..1736.746 rows=61020 loops=1)
                     Filter: (((client_id)::text = '22741'::text) OR ((client_id)::text = '227410'::text))
               ->  Seq Scan on billing_reports b  (cost=0.00..29.66 rows=1566 width=8) (actual time=0.001..0.879 rows=1566 loops=61020)
               SubPlan
                 ->  Result  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.000..0.002 rows=1 loops=95557320)
                       One-Time Filter: ($1 = '2006-09-30'::date)
                       ->  Seq Scan on billing_reports  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.001..0.863 rows=1565 loops=61020)
 Total runtime: 372214.085 ms

Informix uses report_id/client_id as an index, thus eliminating a huge number of rows. The table currently has 2,280,545 rows; slightly fewer when the above analyze was run. Informix has about 5 times as much data.

select count(*) from bill_rpt_work
where report_id in
    (select report_id from billing_reports where report_s_date = '2006-09-30')
and (client_id = '22741' or client_id = '227410');
 count
-------
    37
(1 row)

So scanning everything seems particularly senseless. I had some success adding client_id and report_id to the initial select list, but that causes all sorts of problems in calling procedures that expect a different data grouping. Any suggestion would be welcome because this is a horrible show-stopper.

Thanks,

Greg Williamson
DBA, GlobeXplorer LLC

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [PERFORM] Horribly slow query/ sequential scan
Voila! You da man! (other expressions of awe and appreciation ...)

Having burdened others with my foolishness too often, I hesitate to ask, but could someone either point me to a reference or explain what the difference might be? I can see it with my eyes, but I am having trouble understanding what Informix might have been doing to my (bad?) SQL to fix the query. Seeing a redundancy and eliminating it?

The explain analyze for db's SQL (slightly faster than Informix on an older Sun machine ... about 20%):

 GroupAggregate  (cost=64.35..64.75 rows=8 width=58) (actual time=0.612..0.629 rows=2 loops=1)
   ->  Sort  (cost=64.35..64.37 rows=8 width=58) (actual time=0.463..0.476 rows=37 loops=1)
         Sort Key: w.appid, w.rate, w.is_subscribed
         ->  Nested Loop  (cost=8.11..64.23 rows=8 width=58) (actual time=0.130..0.211 rows=37 loops=1)
               Join Filter: ("inner".report_id = "outer".report_id)
               ->  HashAggregate  (cost=3.95..3.96 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)
                     ->  Index Scan using billrpt_sdate_ndx on billing_reports b  (cost=0.00..3.94 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1)
                           Index Cond: (report_s_date = '2006-09-30'::date)
               ->  Bitmap Heap Scan on bill_rpt_work w  (cost=4.17..59.92 rows=28 width=62) (actual time=0.084..0.111 rows=37 loops=1)
                     Recheck Cond: (((w.report_id = "outer".report_id) AND ((w.client_id)::text = '22741'::text)) OR ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227410'::text)))
                     ->  BitmapOr  (cost=4.17..4.17 rows=28 width=0) (actual time=0.078..0.078 rows=0 loops=1)
                           ->  Bitmap Index Scan on billrptw_ndx  (cost=0.00..2.08 rows=14 width=0) (actual time=0.053..0.053 rows=22 loops=1)
                                 Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '22741'::text))
                           ->  Bitmap Index Scan on billrptw_ndx  (cost=0.00..2.08 rows=14 width=0) (actual time=0.024..0.024 rows=15 loops=1)
                                 Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227410'::text))
 Total runtime: 6.110 ms
(16 rows)

Thanks again (and sorry for the top-posting, but this particular interface is ungainly)

G

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Tue 1/9/2007 4:35 AM
To: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Horribly slow query/ sequential scan

I don't think I understand the idea behind this query. Do you really need billing_reports twice?

The query:

explain analyze
select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
    (select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

Maybe this is the query you want instead?

select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w
where w.report_id in
    (select b.report_id from billing_reports b where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

/Dennis
Re: [PERFORM] Horribly slow query/ sequential scan
Thanks for the suggestion ... I will try it when I've had some sleep and the server is quiet again ... IN seems to have improved markedly since the 7.4 release, as advertised, so I will be interested in trying this.

GSW

-----Original Message-----
From: Nörder-Tuitje, Marcus [mailto:[EMAIL PROTECTED]]
Sent: Tue 1/9/2007 4:50 AM
To: [EMAIL PROTECTED]; Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Horribly slow query/ sequential scan

Forget about IN. It's horribly slow. Try:

select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w
where exists (select 1
              from billing_reports b
              where b.report_s_date = '2006-09-30'
              and w.report_id = b.report_id)
and w.client_id IN ('22741', '227410')
group by 1,2,3
order by 1,2,3;

This should be faster, assuming: an index on report_id in b; an index on (report_id, client_id) in w. To enforce usage of indexes on the grouping (depends on result size), consider extending the index on w with columns 1, 2, 3.

regards,
marcus

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of [EMAIL PROTECTED]
Sent: Tuesday, January 9, 2007 13:36
To: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Horribly slow query/ sequential scan

I don't think I understand the idea behind this query. Do you really need billing_reports twice?

The query:

explain analyze
select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
    (select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

Maybe this is the query you want instead?

select w.appid, w.rate, w.is_subscribed,
       sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,
       sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w
where w.report_id in
    (select b.report_id from billing_reports b where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

/Dennis
Re: [PERFORM] New to PostgreSQL, performance considerations
(Re)-design it to do both, unless there's reason to believe that doing one after the other would skew the results. Then old results are available, new results are also visible and useful for future comparisons. And seeing them side by side might be an interesting exercise as well, at least for a while.

(sorry for top-posting -- web-based interface that doesn't do proper quoting)

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Michael Glaesemann
Sent: Wed 12/13/2006 10:11 PM
To: Tom Lane
Cc: Joshua D. Drake; Josh Berkus; pgsql-performance@postgresql.org; Bruce Momjian; Alvaro Herrera; Alexander Staubo; Michael Stone
Subject: Re: [PERFORM] New to PostgreSQL, performance considerations

On Dec 14, 2006, at 14:44, Tom Lane wrote:

> The pgbench app itself becomes the bottleneck at high transaction rates. Awhile back I rewrote it to improve its ability to issue commands concurrently, but then desisted from submitting the changes --- if we change the app like that, future numbers would be incomparable to past ones, which sort of defeats the purpose of a benchmark, no?

At the same time, if the current pgbench isn't the tool we want to use, is this kind of backward comparison going to hinder any move to improve it? It sounds like there's quite a bit of room for improvement in pgbench, and in my opinion we should move forward to make an improved tool, one that measures what we want to measure. And while comparison with past results might not be possible, there remains the possibility of rerunning the improved pgbench on previous systems, I should think.

Michael Glaesemann
grzm seespotcode net
Re: [PERFORM] Hardware advice
If your data is valuable I'd recommend against RAID 5 ... see http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

Performance aside, I'd advise against RAID 5 in almost all circumstances. Why take chances?

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Sven Geisler
Sent: Wed 12/6/2006 1:09 AM
To: Alex Turner
Cc: Alexandru Coseru; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Hardware advice

Hi Alex,

Please check out http://www.powerpostgresql.com/PerfList before you use RAID 5 for PostgreSQL.

Anyhow, at a larger scale you end up at the response time of the I/O system for a read or write. In modern RAID and SAN environments the read is the part to focus on when you want to tune your system, because most RAID and SAN systems can buffer writes. PostgreSQL uses the Linux file system cache, which is normally much larger than the RAID or SAN cache, for reading. This means that whenever a PostgreSQL read goes to the RAID or SAN subsystem, the response time of the hard disk becomes interesting. I guess you can imagine that multiple reads to the same spindles cause a delay in the response time.

Alexandru,

You should have two XEONs, whatever your core count is. This would use the full benefit of the memory architecture: you know, two FSBs and two memory channels.

Cheers
Sven

Alex Turner schrieb:

> The test that I did - which was somewhat limited - showed no benefit from splitting disks into separate partitions for large bulk loads. The program read from one very large file and wrote the input out to two other large files. The total throughput on a single partition was close to the theoretical maximum for that logical drive, even though the process was reading and writing to three separate places on the disk.
> I don't know what this means for PostgreSQL setups directly, but I would postulate that the benefit from splitting pg_xlog onto a separate spindle is not as great as it might once have been for large bulk transactions. I am therefore going to go with a single 6-drive RAID 5 for my data warehouse application because I want the read speed to be available. I can benefit from fast reads when I want to do large data scans, at the expense of slightly slower insert speed.
>
> Alex.
>
> On 12/5/06, Alexandru Coseru [EMAIL PROTECTED] wrote:
>
>> Hello..
>>
>> Thanks for the advices.. Actually, I'm waiting for the Clovertown to show up on the market...
>>
>> Regards,
>> Alex
>>
>> ----- Original Message -----
>> From: Sven Geisler [EMAIL PROTECTED]
>> To: Alexandru Coseru [EMAIL PROTECTED]
>> Cc: pgsql-performance@postgresql.org
>> Sent: Tuesday, December 05, 2006 11:57 AM
>> Subject: Re: [PERFORM] Hardware advice
>>
>> Hi Alexandru,
>>
>> Alexandru Coseru schrieb:
>> [...]
>>> Question 1: The RAID layout should be:
>>>   a) 2 hdd in RAID 1 for system and pg_xlog, and 6 hdd in RAID 10 for data?
>>>   b) 8 hdd in RAID 10 for all?
>>>   c) 2 hdd in RAID 1 for system, 2 hdd in RAID 1 for pg_xlog, 4 hdd in RAID 10 for data?
>>> Obs: I'm going for setup a), but I want to hear your thoughts as well.
>>
>> This depends on your data size. I think options a and c are good. The potential bottleneck may be the RAID 1 for pg_xlog if you have a huge amount of updates and inserts. What about another setup:
>>   4 hdd in RAID 10 for system and pg_xlog - system partitions are normally not in heavy use, and pg_xlog should be fast for writing;
>>   4 hdd in RAID 10 for data.
>>
>>> Question 2: (Don't want to start a flame here, but here it goes) What filesystem should I run for data? ext3 or xfs? The tables have ~15,000 rel_pages each. The biggest table now has over 30,000 pages.
>>
>> We have a database running with 60,000+ tables.
>> The table sizes range from a few kilobytes for the small tables up to 30 GB for the largest one. We have had no issues with ext3 in the past.
>>
>>> Question 3: The block size in PostgreSQL is 8 kB. The stripe size in the RAID controller is 64 kB. Should I increase the pgsql block size to 16, 32, or even 64 kB?
>>
>> You should keep in mind that the file system also has a block size; ext3 has a maximum of 4 kB. I would set up the partitions aligned to the stripe size to prevent unaligned reads. I guess you can imagine that a larger PostgreSQL block size may also end up in unaligned reads, because the file system has a smaller block size.
>>
>> RAID volume and file system setup:
>> 1. Make all partitions aligned to
Re: [PERFORM] Slow SELECT on three or more clients
Operating system and some of the basic PostgreSQL config settings would be helpful, plus any info you have on your disks, the size of the relevant tables, their structure and indexes, vacuum/analyze status ... plus what others have said: upgrade! There have been considerable improvements in, well, *everything*!, since 7.3 (we have some databases at 7.4.x and I consider them out of date). Hopefully this list can provide help to get you through whatever your immediate crisis is, but do consider planning for this as soon as time and resources permit. Data integrity is a _good_ thing!

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Russell Smith
Sent: Wed 11/15/2006 5:31 AM
To: AMIR FRANCO D. JOVEN
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow SELECT on three or more clients

AMIR FRANCO D. JOVEN wrote:

> Hi! I'm new to PostgreSQL. My current project uses PostgreSQL 7.3.4.

Upgrading your version of PostgreSQL to 8.1 will give you significant performance benefits.

> The problem is like this: I have a table with 94 fields, and a select with only one resultset in only one client consumes about 0.86 seconds. The client executes three 'select' statements to perform the task, which consumes 2.58 seconds. With only one client this is acceptable, but the real problem is that as I add more clients, it goes more and more slowly. For a single select with one field in one resultset, is 0.86 seconds normal?

You will need to attach the query: EXPLAIN ANALYZE SELECT ..., where SELECT ... is your query. That will help us work out what the problem is. 0.86 seconds might be slow for a query that returns 1 row; it might be fast for a query that returns a large set with complex joins and where conditions. Fast and slow are not objective terms; they are very dependent on the query.

> I tried vacuuming and reindexing, but to no avail. The total record count in that particular table is 456,541.

456,541 is not all that many records.
But again, you will need to post more information for us to be able to assist.

> Thanks in advance.
Re: [PERFORM] Context switch storm
Based on what other people have posted, hyperthreading seems not to be beneficial for postgres -- try searching through the archives of this list. (And then turn it off and see if it helps.)

You might also post a few details:
- config settings (shared_buffers, work_mem, maintenance_work_mem, WAL and checkpoint settings, etc.)
- are you using autovacuum? are all tables vacuumed and analyzed regularly? How big are they? Do they and their indexes fit in RAM?
- any particular queries that are running and might be related (explain analyze results for them would be useful)
- disk configuration
- other processes on this box?
- # of connections to it (I've seen this alone push servers over the edge)

HTH,

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent: Fri 11/3/2006 2:32 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Context switch storm

Hi,

We've migrated one of our servers from pg 7.4 to 8.1, and from time to time (every 4 hours) the server starts doing a lot of context switching and all transactions become very slow. The average context switching for this server as vmstat shows is 1, but when the problem occurs it goes to 25. CPU and memory usage are OK.

What is producing these context switching storms? It is a box with 16 GB RAM and 4 XEON processors running Red Hat Enterprise Linux AS. Should I disable hyperthreading?

Thank you in advance!

Reimer
Re: [PERFORM] Optimization of this SQL sentence
Off hand I can't recommend anything, but perhaps you could post the details of the tables (columns, indexes), and some info on what version of Postgres you are using. Are the tables recently analyzed? How many rows are in them?

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Ruben Rubio
Sent: Tue 10/17/2006 2:05 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Optimization of this SQL sentence

This SQL statement is very simple. I need to get better results. I have tried some possibilities and I didn't get good results.

SELECT max(idcomment)
FROM ficha vf
INNER JOIN comment c ON (vf.idficha = c.idfile AND (idestado = 3 OR idestado = 4))
WHERE idstatus = 3 AND ctype = 1

                              QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual time=188.463..188.469 rows=1 loops=1)
   ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual time=141.464..185.404 rows=513 loops=1)
         Hash Cond: ("outer".idfile = "inner".idficha)
         ->  Seq Scan on comment c  (cost=0.00..1321.75 rows=1083 width=8) (actual time=0.291..36.112 rows=642 loops=1)
               Filter: ((idstatus = 3) AND (ctype = 1))
         ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual time=141.004..141.004 rows=6282 loops=1)
               ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178 width=4) (actual time=0.071..97.885 rows=6282 loops=1)
                     Filter: (((idestado)::text = '3'::text) OR ((idestado)::text = '4'::text))
 Total runtime: 188.809 ms

Thanks in advance,
Ruben Rubio
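One thing Ruben could experiment with -- a sketch only; the object names are taken from the posted plan rather than a verified schema, and whether it pays off depends on how selective the constant filters really are -- is a partial index matching the query's fixed predicates, so the seq scan on comment can become an index scan:

```sql
-- Hypothetical partial index covering only the rows the query cares about;
-- names come from the posted plan, not from a confirmed table definition.
CREATE INDEX comment_status3_ctype1_ndx
    ON comment (idfile, idcomment)
    WHERE idstatus = 3 AND ctype = 1;

ANALYZE comment;  -- refresh statistics so the planner can consider the new index
```

The WHERE clause keeps the index small and restricted to exactly the rows the reported query filters on.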
Re: [PERFORM] Is postgresql ca do the job for software deployed in
A sodden late-night idea ... schemas don't need to have names that are meaningful to outsiders. Still, the point about political aspects is an important one. OTOH, schemas provide an elegant way of segregating data.

My $0.02 (not worth what it was)

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Markus Schaber
Sent: Wed 7/5/2006 3:38 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is postgresql ca do the job for software deployed in

Hi, Mikael,

Just my 2 cents:

Mikael Carneholm wrote:

> Do you really need to create one *DB* per client - that is, is one schema (in the same DB) per client out of the question?

Sometimes schemas would work _technically_, but not politically, as a PostgreSQL user cannot be prevented from listing all schemas (or even all databases in the same cluster), regardless of whether he/she has access rights. But it is not always acceptable that a customer knows which other customers one has. This forces the use of the one-cluster-per-customer paradigm.

Thanks,
Markus

--
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal
1.451 ms = 1.451 milliseconds
1451.0 ms = 1.451 seconds

... so 32.918 ms for a commit seems perhaps reasonable?

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Zeugswetter Andreas DCP SD
Sent: Thu 5/11/2006 12:55 AM
To: Jim C. Nasby; PFC
Cc: Greg Stark; Tom Lane; pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

> Something else worth considering is not using the normal catalog methods for storing information about temp tables, but hacking that together would probably be a rather large task.

But the timings suggest that it cannot be the catalogs in the worst case he showed:

> 0.101 ms BEGIN
> 1.451 ms CREATE TEMPORARY TABLE tmp (a INTEGER NOT NULL, b INTEGER NOT NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect?

> 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id DESC LIMIT 20
> 0.443 ms ANALYZE tmp
> 0.365 ms SELECT * FROM tmp
> 0.310 ms DROP TABLE tmp
> 32.918 ms COMMIT

> CREATING the table is OK, but what happens on COMMIT? I hear the disk seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the file is fsynced before it is dropped.

Andreas
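The unit mixup Greg is correcting can be sanity-checked with a throwaway query -- psql's \timing and EXPLAIN ANALYZE both report milliseconds:

```sql
-- The quoted timings converted to seconds: these are fractions of a second.
SELECT 1.451 / 1000.0  AS create_temp_seconds,   -- 0.001451 s, not "1.4 seconds"
       32.918 / 1000.0 AS commit_seconds;        -- 0.032918 s, not "32 seconds"
```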
Re: [PERFORM] Benchmarck PostgreSQL 8.1.4 MySQL 5.0.20 and Oracle 10g2
That fsync = off would make me very unhappy in a production environment ... not that turning it on would help Postgres in this benchmark, but one advantage of Postgres is its reliability under a "pull the plug" scenario, and this setting defeats that.

FWIW, Xeon has gotten quite negative reviews in these quarters (Opteron seems to do way better), IIRC, and I know we've had issues with Dell's disk I/O, admittedly on a different box.

Quite interesting results, even if a bit disappointing to a (newly minted) fan of Postgres. I'll be quite interested to hear more. Thanks for the work, although it seems like some of it won't be able to be released, unless Oracle has given some new blessing to releasing benchmark results.

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Olivier Andreotti
Sent: Thu 5/18/2006 2:57 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Benchmarck PostgreSQL 8.1.4 MySQL 5.0.20 and Oracle 10g2

Hello,

I'm running a benchmark with these 3 databases, and the first results are not very good for PostgreSQL. PostgreSQL has 20% less performance than MySQL (InnoDB tables).

My benchmark uses the same server for these 3 databases: Dell PowerEdge - Xeon 2.8 GHz - 2 GB RAM - 3 SCSI disks - Debian Sarge - Linux 2.6.

The transactions are a random mix of read (select) and write (insert, delete, update) requests on many tables of about 100,000 to 15,000,000 rows. Transactions are executed from 500 connections.
For the tuning of PostgreSQL I used the official documentation and these web sites:

http://www.revsys.com/writings/postgresql-performance.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

Some important points of my postgresql.conf file:

max_connections = 510
shared_buffer = 16384
max_prepared_transactions = 510
work_mem = 1024
maintenance_work_mem = 1024
fsync = off
wal_buffers = 32
commit_delay = 500
checkpoint_segments = 10
checkpoint_timeout = 300
checkpoint_warning = 0
effective_cache_size = 165 000
autovaccuum = on
default_transaction_isolation = 'read_committed'

What do you think of my tuning?

Best regards.

O.A
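For contrast, a hedged sketch of how the questioned lines might look with durability kept on. The values are illustrative, not a tuned recommendation for this box; note also that as posted, "shared_buffer", "autovaccuum", and the internal space in "165 000" would be rejected or silently ignored by the server:

```
# Illustrative postgresql.conf fragment (8.1-era units: shared_buffers and
# effective_cache_size are counted in 8 kB pages, work_mem in kB)
fsync = on                      # keep the pull-the-plug safety discussed in the reply
shared_buffers = 16384          # note the plural; 16384 pages = 128 MB
work_mem = 1024                 # 1 MB per sort; 500 connections can multiply this
effective_cache_size = 165000   # no internal spaces allowed
autovacuum = on                 # correct spelling
```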
Re: [PERFORM] [postgis-users] Is my query planner failing me, or vice versa?
Forgive the cross-posting, but I found myself wondering if there might not be some future way of telling the planner that a given table (column?) has a high likelihood of being TOASTed, similar in spirit to random_page_cost. We've got a lot of indexed data that is spatial, and we have some tables where no data is toasted (road segments) and others where everything is.

An idle suggestion from one who knows that he is meddling with ;-}

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Jessica M Salmon
Sent: Wednesday, December 14, 2005 9:09 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Is my query planner failing me, or vice versa?

Thanks, Marcus, for explaining. And thanks, Robert, for asking that question about adjusting page size.

My tuples are definitely toasted (some of my geometries are 30X too big for a single page!), so I'm glad I'm aware of the TOAST tables now. I suppose there's not much to be done about it, but it's good to know.

Thanks everyone for such an array of insightful help.

-Meghan
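A quick way to see how big a column's values actually are as stored -- a sketch; `road_segments` and `the_geom` are placeholder names, and `pg_column_size` reports the on-disk (possibly compressed, possibly toasted) size of each value:

```sql
-- Placeholder names: substitute your own table and geometry column.
-- Values much larger than ~2 kB will have been compressed and/or moved
-- out of line into the TOAST table.
SELECT avg(pg_column_size(the_geom))::int AS avg_stored_bytes,
       max(pg_column_size(the_geom))      AS max_stored_bytes
FROM road_segments;
```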
Re: [PERFORM] Question
As a sometime Informix and PostgreSQL DBA, I disagree with the contentions below. We have many tables with tens of millions of rows in Postgres. We have had (alas) power issues with our lab on more than one occasion, and the afflicted servers have recovered like a champ, every time.

This person may not like Postgres (or very much likes Informix), but he shouldn't conjure up spurious reasons to support his/her prejudice. Informix is an excellent product, but it can be costly for web-related applications. PostgreSQL is also an excellent database. Each has differences which may make the decision between the two of them clear. But facts are necessary to have a real discussion.

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Alejandro Lemus
Sent: Monday, July 11, 2005 6:00 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question

In the past week, a guy from a Unix group in Colombia said: "Postgres in production is bad; if the power goes off at any time, the data is lost because the data is in plain files. Postgres does not support databases with more than 1 million records." What can you tell me in this respect? Is Informix better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39
Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260
Fax: 571-2624070
email: [EMAIL PROTECTED]
Re: [PERFORM] Low Performance for big hospital server ..
Amrit --

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Mon 1/3/2005 12:18 AM
To: Mark Kirkwood
Cc: PGsql-performance
Subject: Re: [PERFORM] Low Performance for big hospital server ..

> shared_buffers = 12000 will use 12000*8192 bytes (i.e. about 96 MB). It is shared, so no matter how many connections you have, it will only use 96 MB.

Now I use the figure of 27853. Will increasing the effective cache size to around 20 make a little bit of improvement? Do you think so? Decreasing sort_mem too much [8196] made performance much slower, so I use sort_mem = 16384 and leave the effective cache at the same value; the result is quite a bit better, but I should wait for tomorrow morning [official hours] to see the end result.

> I would leave it at the figure you proposed (128897), and monitor your performance. (You can always increase it later and see what the effect is.)

Yes, I use this figure. If the result is still poor, will putting in more RAM, 6-8 GB [also putting in more money too], solve the problem?

Adding RAM will almost always help, at least for a while. Our small runtime servers have 2 GB of RAM; the larger ones have 4 GB; I do anticipate the need to add RAM as we add users. If you have evaluated the queries that are running and verified that they are using indexes properly, etc., and tuned the other parameters for your system and its disks, adding memory helps because it increases the chance that data is already in memory, thus saving the time to fetch it from disk. Studying performance under load with top, vmstat, etc. and detailed analysis of queries can often trade some human time for the money that extra hardware would cost. It is sometimes easier to do than getting downtime for a critical server, as well.
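The shared_buffers arithmetic quoted above is easy to verify (PostgreSQL's page size is 8192 bytes):

```sql
-- 12000 buffers of 8 kB each: roughly the "about 96 MB" figure cited above.
SELECT 12000 * 8192 AS bytes,                              -- 98304000
       round(12000 * 8192 / (1024.0 * 1024.0), 1) AS mib;  -- 93.8
```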
If you don't have a reliable way of reproducing real loads on a test system, it is best to change things cautiously and observe the system under load; if you change too many things (ideally only 1 at a time, but often that is not possible) you may actually defeat a good change with a bad one; at the least, you may not know which change was the most important one if you make several at once. Best of luck, Greg Williamson DBA GlobeXplorer LLC Thanks, Amrit Thailand
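The sizing arithmetic behind the figures in this thread can be checked directly in psql; both parameters are measured in 8 KB disk pages (the numbers below are the ones quoted above, not recommendations):

```sql
-- shared_buffers = 12000 pages * 8192 bytes/page = 98,304,000 bytes
-- (the "about 96Mb" mentioned above)
SELECT 12000 * 8192 AS shared_buffers_bytes;

-- the proposed effective_cache_size of 128897 pages is roughly 1 GB,
-- i.e. an estimate of how much of the OS file cache holds Postgres data
SELECT 128897 * 8192 AS effective_cache_bytes;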
Re: [PERFORM] Improve BULK insertion
Rodrigo -- You should definitely drop the indexes and any other FK constraints before loading and then rebuild them. Check your logs and see if there are warnings about checkpoint intervals -- only 3 logs seems like it might be small; if you have the disk space I would definitely consider raising the number. If you haven't already posted your config settings you might do so -- this seems very slow. I regularly use COPY to load or unload data sets in the 200k-900k range and they don't take 90 minutes, even on slower hardware (and usually only a few minutes on our production servers; rebuilding the indexes usually takes longer). This is unloading a 300k+ row data set on a Dell linux box with not very good disks and 1 gig of RAM: Starting copy of parcel staging table parcels_12031 at Thu Dec 2 01:13:52 2004 Done with staging table copy at Thu Dec 2 01:15:16 2004 ... Starting compression of parcel file at Thu Dec 2 01:15:22 2004 gzip: /tmp/parcels_12031.unl.gz already exists; do you wish to overwrite (y or n )? y Done with compression of parcel file at Thu Dec 2 01:17:23 2004 ... And loading them on a rather faster server: Starting unzip of parcels at Thu Dec 2 01:29:15 2004 Finished with unzip at Thu Dec 2 01:29:22 2004 ... 
Target db detail table updated at Thu Dec 2 01:29:29 2004 Dropping indexes Dropping fk constraint on tracking id Dropping indexes Done dropping indexes on target parcels table at Thu Dec 2 01:29:30 2004 NOTICE: drop cascades to table f12031.parcel_pins NOTICE: drop cascades to table f12031.parcel_addresses NOTICE: drop cascades to table f12031.parcel_owner_fti NOTICE: drop cascades to table f12031.parcel_owners Removing old parcels entries starting at Thu Dec 2 01:29:30 2004 Done deleting schema and parcels for track_id 10163541 at Thu Dec 2 01:33:04 2004 Starting load of parcels at Thu Dec 2 01:33:04 2004 Done copying data into parcels at Thu Dec 2 01:35:18 2004 Deleting old v_detail reference for track_id 10163541 Done with delete of old v_detail reference Starting creation of foreign key constraint at Thu Dec 2 01:39:43 2004 Done with creation of foreign key constraint at Thu Dec 2 01:42:14 2004 Starting spatial index create at Thu Dec 2 01:42:14 2004 Done creating spatial index at Thu Dec 2 01:55:04 2004 Starting stats on geometry column now Done doing stats for spatial index at Thu Dec 2 02:03:47 2004 Starting index on PIN now Done creating pin index at Thu Dec 2 02:09:36 2004 Starting index on tracking id now Done creating trid index at Thu Dec 2 02:12:35 2004 Starting centroid index now Done creating centroid index at Thu Dec 2 02:24:11 2004 Starting stats on centroid column Done doing stats for spatial index at Thu Dec 2 02:29:55 2004 Doing City/Street Index on parcels table ...Done creating city/street index at Thu Dec 2 02:42:41 2004 with result -1 Committing changes So this took about 70 minutes to delete 20+ rows from a table with about 5 million rows, load a new set and reindex them (and do some statistics for spatial geometry). If the table had only this data the indexing would have been *much* faster. These are moderate size columns -- about 2 dozen columns and some spatial data (polygon and point). 
Both servers have rather more log files than your setup, but I am not familiar enough with postgres to know how much of an impact that alone will have. The comment about it slowing down part way through a load makes me suspect indexing issues, somehow (not from postgres experience but it rings a bell with other DBs); if you explicitly drop the indexes first and then load, does it show the same performance behavior? If you are doing the data read, the database write, and the WAL logging all on a single disk drive, then I would guess that that is your bottleneck. If you use vmstat and/or top or the like, is your I/O pegged? HTH Greg Williamson DBA GlobeXplorer LLC -Original Message- From: Rodrigo Carvalhaes [mailto:[EMAIL PROTECTED] Sent: Sun 12/5/2004 11:52 AM To: Christopher Browne Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Improve BULK insertion Hi! 1. I am doing the inserts using pg_restore. The dump was created using pg_dump and the standard format (copy statements) 2. See below the table schema. There are only 7 indexes. 3. My transaction log configuration is: checkpoint_segments = 3 and checkpoint_timeout = 300, and my transaction logs are on the same disk. I know that I can increase the performance by separating the transaction logs and making a RAID 5 array BUT I am really curious about WHY this performance is so poor and HOW I can try to improve it on this actual machine, because actually these inserts are taking around 90 minutes!!! Cheers! Rodrigo dadosadv=# \d si2010 Table public.si2010 Column | Type | Modifiers
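A minimal sketch of the drop-then-rebuild pattern recommended above, against Rodrigo's si2010 table; the index, constraint, and column names here are invented for illustration (the real ones would come from \d si2010):

```sql
-- Drop indexes and FK constraints so the load doesn't maintain them row by row
DROP INDEX si2010_idx1;
ALTER TABLE si2010 DROP CONSTRAINT si2010_cliente_fk;

-- Load in one transaction via COPY (the form pg_dump emits)
BEGIN;
COPY si2010 FROM '/tmp/si2010.dat';
COMMIT;

-- Rebuild once, in bulk -- far cheaper than incremental index maintenance.
-- Also consider raising checkpoint_segments above 3 in postgresql.conf.
CREATE INDEX si2010_idx1 ON si2010 (codigo);
ALTER TABLE si2010 ADD CONSTRAINT si2010_cliente_fk
  FOREIGN KEY (cliente_id) REFERENCES clientes (cliente_id);
ANALYZE si2010;
```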
Re: [PERFORM] Performance vs Schemas
Igor, I'm not sure if it is proper to state that schemas are themselves speeding things up. As an example, we have data that is usually accessed by county; when we put all of the data into one big table and select from it using a code for a county of interest, the process is fairly slow, as there are several hundred thousand candidate rows from that county in a table with many millions of rows. When we broke out certain aspects of the data into schemas (one per county) the searches became very fast indeed, because we can skip the search for a specific county code in the relevant tables and there is less (unneeded) data in the table being searched. As always, EXPLAIN ANALYZE ... is your friend in understanding what the planner is doing with a given query. See http://www.varlena.com/varlena/GeneralBits/Tidbits/ for some useful information, especially under the performance tips section. HTH, Greg Williamson DBA GlobeXplorer LLC -Original Message- From: Igor Maciel Macaubas [mailto:[EMAIL PROTECTED] Sent: Thu 10/14/2004 11:38 AM To: [EMAIL PROTECTED] Cc: Subject: [PERFORM] Performance vs Schemas Hi all, I recently migrated my database from schema 'public' to multiple schemas. I have around 100 tables, and divided them into 14 different schemas, and then adapted my application to use schemas as well. I could perceive that the query / insert / update times got much faster than when I was using the old single schema, and I'd just like to confirm with you whether using schemas speeds things up. Is that true? What else can I do to speed up the query processing -- best practices, recommendations ... ? What about indexed views, does postgresql support them? Regards, Igor -- [EMAIL PROTECTED]
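A sketch of the schema-per-county arrangement described above; the schema, table, and column names are invented for illustration:

```sql
-- One schema per county, each with its own copy of the heavy tables
CREATE SCHEMA county_12031;
CREATE TABLE county_12031.parcels (
    pin        text,
    owner_name text
);

-- Point a session at one county; queries then touch only that county's
-- (much smaller) tables, with no county-code filter needed
SET search_path TO county_12031, public;
SELECT count(*) FROM parcels;
```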
Re: [PERFORM] Cleaning up indexes
If you have set up the postgres instance to write stats, check the tables pg_stat_user_indexes, pg_statio_all_indexes and so on (use the \dS option at the psql prompt to see these system tables); also check the pg_stat_user_tables table and similar beasts for information on total access, etc. Between these you can get a good idea of what indexes are not being used, and from the sequential scan info on tables perhaps some idea of what may need some indexes. HTH, Greg Williamson DBA GlobeXplorer LLC -Original Message- From: Martin Foster [mailto:[EMAIL PROTECTED] Sent: Thu 9/23/2004 3:16 PM To: [EMAIL PROTECTED] Cc: Subject: [PERFORM] Cleaning up indexes My database was converted from MySQL a while back and has maintained all of the indexes which were previously used. At the time, however, there were limitations on the way PostgreSQL handled indexes compared to MySQL, meaning that under MySQL it would make use of a multi-column index even if the rows within did not match. When the conversion was made, more indexes were created overall to correct this and proceed with the conversion. Now the time has come to clean up the indexes. Essentially, I want to know if there is a way to determine which indexes are being used and which are not. This will allow me to drop the unneeded ones and reduce database load as a result. And have things changed so as to allow for mismatched multi-column indexes in version 7.4.x or even the upcoming 8.0.x? Martin Foster [EMAIL PROTECTED]
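With the stats collector running, queries along these lines over the views named above will surface candidates -- indexes never scanned since the counters were last reset, and tables that are mostly read sequentially:

```sql
-- Indexes with zero scans since the stats were last reset: candidates to drop
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY relname, indexrelname;

-- Tables dominated by sequential scans may be missing a useful index
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables
 ORDER BY seq_scan DESC;
```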
Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
FWIW, Informix does allow the fragmentation of data over named dbspaces by round-robin and expression; this is autosupporting as long as the dba keeps enough space available. You may also fragment the index, although there are some variations depending on the type of Informix (XPS, etc.); this is available in at least 9.3 ... I have never used the index fragmentation as its own beast, but the fragmenting of data works like a charm for spreading load over more disks. Greg Williamson DBA GlobeXplorer LLC -Original Message- From: Gaetano Mendola [mailto:[EMAIL PROTECTED] Sent: Thursday, August 26, 2004 2:10 PM To: Bruce Momjian; [EMAIL PROTECTED] Subject: Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX? Bruce Momjian wrote: How do vendors actually implement auto-clustering? I assume they move rows around during quiet periods or have lots of empty space in each value bucket. --- IIRC Informix doesn't have it, and you have to periodically recluster the table. After having clustered the table with an index, in order to recluster it with another index you have to release the previous one (ALTER index TO NOT CLUSTER); the CLUSTER is an index attribute and each table can have only one index with that attribute ON. Regards Gaetano Mendola
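For comparison, the PostgreSQL side of this -- a one-shot physical reorder that has to be repeated as new rows arrive, much as Gaetano describes for Informix -- looks like the following sketch (table and index names invented):

```sql
-- Physically rewrite parcels in parcels_pin_idx order (takes an exclusive lock)
CLUSTER parcels_pin_idx ON parcels;

-- Rows inserted afterwards are not kept in order; the chosen index is
-- remembered, so a later bare CLUSTER on the table re-sorts it
CLUSTER parcels;
ANALYZE parcels;
```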
Re: [PERFORM] postgresql performance with multimedia
Not sure about the overall performance, etc., but I think that in order to collect statistics you need to set some values in the postgresql.conf config file, to wit:

#---
# RUNTIME STATISTICS
#---

# - Statistics Monitoring -
#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -
stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
#stats_reset_on_server_start = true

If the appropriate values aren't set this could account for why no entries are found in the pg_stat tables. The manual has details on these; you'll need to reload postgres to make any changes effective. Greg -Original Message- From: my ho [mailto:[EMAIL PROTECTED] Sent: Tue 8/24/2004 11:54 PM To: [EMAIL PROTECTED] Cc: Jan Wieck Subject: Re: [PERFORM] postgresql performance with multimedia Tom Lane answered that question. The code in question does resolve localhost with getaddrinfo() and then tries to create and bind a UDP socket to all returned addresses. For some reason localhost on your system resolves to an address that is not available for bind(2). I tried to put my_ip instead of localhost in bufmng.c and it seems to work (no more complaining). However I checked pg_statio_all_tables and don't see any recorded statistics at all (all the columns are '0'). Sometimes postmaster shuts down with this error msg: LOG: statistics collector process (process_id) exited with exit code 1 I start postmaster with this command: postmaster -i -p $PORT -D $PGDATA -k $PGDATA -N 32 -B 64 -o -s BTW, what I want to ask here is: does postgreSQL have any kind of read-ahead buffer implemented? 'cos it would be useful in the multimedia case when we always scan the large table for continuous data. Since there is no mechanism to control that data is stored contiguously in the tables, what would that be good for? I thought that rows in the table would be stored contiguously? 
In that case, if the user requests 1 row, we make sure that the following rows are ready in the buffer pool so that when they are next requested, they won't have to be read from disk. For multimedia data this is important, 'cos data needs to be presented continuously without any waiting. Thanks again for your help MT Ho
Re: [PERFORM] Bulk Insert and Index use
Usually any bulk load is faster with indexes dropped and then rebuilt ... failing that (like you really need the indexes while loading, say into a hot table) be sure to wrap all the SQL into one transaction (BEGIN; ... COMMIT;) ... if any data fails it all fails, which is usually easier to deal with than partial data loads, and it is *much* faster than having each insert be its own transaction. HTH, Greg Williamson DBA GlobeXplorer LLC -Original Message- From: Rudi Starcevic [mailto:[EMAIL PROTECTED] Sent: Tue 8/10/2004 4:04 PM To: [EMAIL PROTECTED] Cc: Subject: [PERFORM] Bulk Insert and Index use Hi, I have a question on bulk checking, inserting into a table, and how best to use an index for performance. The data I have to work with is a monthly CD-ROM csv data dump of 300,000 property owners from one area/shire. So every CD has 300,000-odd lines, each line of data which fills the 'property' table. Beginning with the first CD, each line should require one SELECT and one INSERT as it will be the first property with this address. The SELECT uses fields like 'street' and 'suburb' to check for an existing property, so I have built an index on those fields. My question is: does each INSERT rebuild the index on the 'street' and 'suburb' fields? I believe it does but I'm asking to be sure. If this is the case I guess performance will suffer when I have, say, 200,000 rows in the table. Would it be like: a) Use index to search on 'street' and 'suburb' b) No result? Insert new record c) Rebuild index on 'street' and 'suburb' for each row? Would this mean that after 200,000 rows each INSERT will require the index of 000's of rows to be re-indexed? So far I believe my only options are to use either an index or sequential scan and see which is faster. A minute for your thoughts and/or suggestions would be great. Thanks. Regards, Rudi.
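The one-big-transaction advice above, sketched for Rudi's case (the table and column names are guesses based on his description):

```sql
BEGIN;
INSERT INTO property (street, suburb) VALUES ('1 Main St', 'Northtown');
INSERT INTO property (street, suburb) VALUES ('2 Main St', 'Northtown');
-- ... one INSERT per CSV line ...
COMMIT;
-- One commit for the whole batch means one fsync instead of one per row,
-- and if any row fails, the whole load rolls back cleanly.
```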
Re: [PERFORM] Bulk Insert and Index use
If it has to read a majority (or even a good percentage) of the rows in question, a sequential scan is probably faster ... and as Jim pointed out, a temp table can often be a useful medium for getting speed in a load and then allowing you to clean/alter data for a final (easy) push. G -Original Message- From: Rudi Starcevic [mailto:[EMAIL PROTECTED] Sent: Tue 8/10/2004 8:33 PM To: [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] Bulk Insert and Index use Hi Jim, Thanks for your time. If the bulk load has the possibility of duplicating data Yes, each row will require either: a) One SELECT + One INSERT or b) One SELECT + One UPDATE I did think of using more than one table, i.e. a temp table. As each month's worth of data is added I expect to see a change from lots of INSERTs to lots of UPDATEs. Perhaps when the UPDATEs become more dominant it would be best to start using indexes. While INSERTs are more prevalent perhaps a seq. scan is better. I guess of all the options available it boils down to which is quicker for my data: index or sequential scan. Many thanks. Jim J wrote: If the bulk load has the possibility of duplicating data, then you need to change methods. Try bulk loading into a temp table, index it like the original, eliminate the dups and merge the tables. It is also possible to do an insert from the temp table into the final table like: insert into original (x,x,x) (select temp.1, temp.2, etc from temp left join original on temp.street=original.street where original.street is null) Good Luck Jim Rudi Starcevic wrote: Hi, I have a question on bulk checking, inserting into a table and how best to use an index for performance. The data I have to work with is a monthly CD Rom csv data dump of 300,000 property owners from one area/shire. So every CD has 300,000 odd lines, each line of data which fills the 'property' table. Beginning with the first CD each line should require one SELECT and one INSERT as it will be the first property with this address. 
-- Regards, Rudi. Internet Media Productions
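Jim's temp-table merge, written out as complete SQL; the file path and column names are illustrative:

```sql
-- Bulk load into an unindexed scratch table first
CREATE TEMP TABLE property_load AS
  SELECT * FROM property LIMIT 0;   -- same shape as property, no rows, no indexes
COPY property_load FROM '/tmp/monthly.csv' WITH DELIMITER ',';

-- Merge: insert only the rows not already present in the real table
INSERT INTO property (street, suburb, owner_name)
SELECT t.street, t.suburb, t.owner_name
  FROM property_load t
  LEFT JOIN property p
    ON p.street = t.street AND p.suburb = t.suburb
 WHERE p.street IS NULL;
```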
Re: [PERFORM] General performance questions about postgres on Apple
Why is my name on a mail from Tom Lane? Really, he knows a *lot* more than I and should get due credit. Seriously, is this the performance remailer mangling something? Greg Williamson (the real one) -Original Message- From: Gregory S. Williamson Sent: Sun 6/6/2004 10:46 PM To: Sean Shanny Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] General performance questions about postgres on Apple In-reply-to: [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] Comments: In-reply-to Sean Shanny [EMAIL PROTECTED] message dated Sun, 22 Feb 2004 21:48:54 -0500 Date: Sun, 22 Feb 2004 22:24:29 -0500 Message-ID: [EMAIL PROTECTED] From: Tom Lane [EMAIL PROTECTED] X-Virus-Scanned: by amavisd-new at postgresql.org X-Mailing-List: pgsql-performance Precedence: bulk Sender: [EMAIL PROTECTED] X-imss-version: 2.5 X-imss-result: Passed X-imss-scores: Clean:99.9 C:21 M:2 S:5 R:5 X-imss-settings: Baseline:2 C:2 M:2 S:2 R:2 (0.1500 0.3000) Return-Path: [EMAIL PROTECTED] X-OriginalArrivalTime: 07 Jun 2004 05:27:21.0994 (UTC) FILETIME=[1BC0EEA0:01C44C50] Sean Shanny [EMAIL PROTECTED] writes: We have the following setting for random page cost: random_page_cost = 1 # units are one sequential page fetch cost Any suggestions on what to bump it up to? Well, the default setting is 4 ... what measurements prompted you to reduce it to 1? The particular example you showed suggested that the true value on your setup might be 10 or more. Now I would definitely not suggest that you settle on any particular value based on only one test case. You need to try to determine an appropriate average value, bearing in mind that there's likely to be lots of noise in any particular measurement. But in general, setting random_page_cost to 1 is only reasonable when you are dealing with a fully-cached-in-RAM database, which yours isn't. 
regards, tom lane
Re: [PERFORM] Hardware recommendations to scale to silly load
Nitpicking -- Perhaps the 4th data line is meant to be: "Inserts in separate transactions 2500 inserts/second"? Greg Williamson -Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Tue 9/9/2003 8:25 PM To: Matt Clark Cc: Ron Johnson; PgSQL Performance ML Subject: Re: [PERFORM] Hardware recommendations to scale to silly load Matt Clark wrote: Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI drive I can do 4k inserts/second if I turn fsync off. If you have a battery-backed controller, you should be able to do the same. (You will not need to turn fsync off --- fsync will just be fast because of the disk drive RAM). Am I missing something? I think Ron asked this, but I will too: is that 4k inserts in one transaction or 4k transactions each with one insert? fsync is very much faster (as are all random writes) with the write-back cache, but I'd hazard a guess that it's still not nearly as fast as turning fsync off altogether. I'll do a test perhaps... Sorry to be replying late. Here is what I found.

fsync on
    Inserts all in one transaction      3700 inserts/second
    Inserts in separate transactions     870 inserts/second

fsync off
    Inserts all in one transaction      3700 inserts/second
    Inserts all in one transaction      2500 inserts/second

ECPG test program attached. -- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073