[PERFORM] Forcing more aggressive index scans for BITMAP AND
Just wondering if there's a special tweak I can do to force more usage of indexes to do BITMAP ANDs? I have a table like:

    A int, B int, C int, D int, E int, F int, G int

where A/B/C/D/E are indexed. There are ~20 million rows in the table. Queries are something like this:

    select * from table where A = X and B = Y and C = Z and D = AA and E = BB

The query plan will only pick 2 of the indexes to do the bitmap AND, and I'm not sure how to tweak the config for it to use more indexes. The box is a Celeron 1.7 with 768MB RAM, shared_buffers at 250MB and effective_cache_size at 350MB.
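For illustration, the plan shape being described looks something like this (table and index names are made up, not from the original post): only two of the five candidate indexes are ANDed together, and the remaining conditions are applied as a filter on the heap.

    EXPLAIN ANALYZE
    SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3 AND d = 4 AND e = 5;

    --  Bitmap Heap Scan on t
    --    Recheck Cond: ((b = 2) AND (a = 1))
    --    Filter: ((c = 3) AND (d = 4) AND (e = 5))
    --    ->  BitmapAnd
    --          ->  Bitmap Index Scan on t_b_idx
    --                Index Cond: (b = 2)
    --          ->  Bitmap Index Scan on t_a_idx
    --                Index Cond: (a = 1)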
Re: [PERFORM] Forcing more aggressive index scans for BITMAP AND
On Mon, 7 Apr 2008, Ow Mun Heng wrote:
> just wondering if there's a special tweak i can do to force more usage
> of indexes to do BITMAP ands?

There's no need to post this again. You have already had a couple of useful answers.

Matthew

--
All of this sounds mildly turgid and messy and confusing... but what
the heck. That's what programming's all about, really
                        -- Computer Science Lecturer
[PERFORM] Looking for bottleneck during load test
We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine with 8 Intel Xeon cores and 32 GB RAM (64-bit Linux 2.6.18). Data is stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). When we do random I/O with a small test tool (reading random 8k blocks from big files in 200 threads) on the disk, we retrieve data at about 25 MB/s.

For testing purposes, a test set of about 700.000 queries (those were logged during a problem situation) is executed against the database in 180 concurrent threads. Some of the queries are very small and fast - other ones read more than 50000 blocks. All queries are selects (using cursors) - there is only read activity on the database.

By setting tuple_fraction for cursors to 0.0 instead of 0.1 (http://archives.postgresql.org/pgsql-performance/2008-04/msg00018.php) we reduced reads during the test (pg_statio_all_tables):

- 8.2.7: reads from disk: 4.395.276, reads from cache: 471.575.925
- 8.2.7 with cursor_tuple_fraction=0.0: reads from disk: 3.406.164, reads from cache: 37.924.625

But the duration of the test was only reduced by 18% (from 110 minutes to 90 minutes). When running the test with tuple_fraction=0.0 we observe the following on the server:

- avg read from disk is at 7 MB/s
- when we start the random I/O tool during the test, we again read data at about 25 MB/s from disk (so to me it seems that the disk isn't the bottleneck)
- CPU time is divided between idle and iowait - user and system CPU are practically zero
- there are from 5000 to 10000 context switches per second

I can't see a bottleneck here. Does anyone have an explanation for that behavior?

Regards,
Robert
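For reference, the disk vs. cache read totals quoted above come from the statistics views; a query along these lines (assuming stats collection is enabled) produces them:

    SELECT sum(heap_blks_read + coalesce(idx_blks_read, 0)) AS reads_from_disk,
           sum(heap_blks_hit  + coalesce(idx_blks_hit, 0))  AS reads_from_cache
    FROM pg_statio_all_tables;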
Re: [PERFORM] Looking for bottleneck during load test
Hell, Robert wrote:
> We have a PostgreSQL 8.2.7 database (~230 GB) running on a machine
> with 8 Intel Xeon cores and 32 GB RAM (64-bit Linux 2.6.18). Data is
> stored on an EMC² CLARiiON on RAID 1/0 (8 x 146 GB 15k rpm). When we
> do random I/O with a small test tool (reading random 8k blocks from
> big files in 200 threads) on the disk, we retrieve data at about
> 25 MB/s.

How do you test random IO? Do you use this utility: http://arctic.org/~dean/randomio/ ? If not, try using it, with the same parameters. It might be that the latency is destroying your performance. Do you use NFS or are you accessing the storage as a SAN?
Re: [SOLVED] [PERFORM] Query plan excluding index on view
Removing the constants definitely did take care of the issue on 8.3 (still the same query plan on 8.1). Thanks for your help in getting this resolved, and sorry again for not including all relevant information in my initial request.

On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane [EMAIL PROTECTED] wrote:
> Matt Klinker [EMAIL PROTECTED] writes:
> > --Joined View:
> > CREATE OR REPLACE VIEW directory_listing AS
> >   SELECT school.id, school.name, school.description, 119075291 AS listing_type_fid
> >   FROM school
> >   UNION ALL
> >   SELECT company.id, company.name, company.description, 119074833 AS listing_type_fid
> >   FROM company;
>
> Ah, there's the problem :-(. Can you get rid of the constants here?
> The planner's currently not smart about UNION ALL subqueries unless
> their SELECT lists contain just simple column references. (Yes,
> fixing that is on the todo list, but don't hold your breath... it'll
> be 8.4 material at the earliest.)
>
> regards, tom lane
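One way to satisfy the "simple column references" restriction (a sketch - the poster's actual rewrite isn't shown) is to store the listing type as an ordinary column in each underlying table, so each UNION ALL branch selects only plain columns:

    ALTER TABLE school  ADD COLUMN listing_type_fid integer;  -- hypothetical
    ALTER TABLE company ADD COLUMN listing_type_fid integer;  -- hypothetical
    UPDATE school  SET listing_type_fid = 119075291;
    UPDATE company SET listing_type_fid = 119074833;

    CREATE OR REPLACE VIEW directory_listing AS
      SELECT id, name, description, listing_type_fid FROM school
      UNION ALL
      SELECT id, name, description, listing_type_fid FROM company;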
[PERFORM] Severe performance problems for simple query
Hi folks,

Here is the executive summary:

* RHEL5 (PostgreSQL 8.1, .conf tweaked for performance [1])
* 2x Intel E5410 @ 2.33GHz (8 cores), 8GB RAM, 15kRPM SAS disks
* 4.9 million records in a table (IP address info)
* composite primary key: primary key(ipFrom, ipTo)
* ipFrom/ipTo are int8 (see below for full schema info [2])
* bad performance on queries of the form:
  select * from ipTable where ipFrom <= val and val <= ipTo

Part of the problem is that most of the time PostgreSQL decides to use seq scans on the table, resulting in queries taking many seconds (sometimes 3, 7, 20 sec). We did ANALYZE and enabled statistics, and that sometimes fixes the problem temporarily, but overnight (without the database being used) it reverts to seq scans. For example:

perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on ipligenceipaddress  (cost=0.00..139903.80 rows=1209530 width=145) (actual time=1233.628..2100.891 rows=1 loops=1)
   Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto))
 Total runtime: 2100.928 ms
(3 rows)

Moreover, even when it is using the index, it is not all that fast:

perpedes_db=# SET enable_seqscan = off;
SET
perpedes_db=# EXPLAIN ANALYZE select * from static.ipligenceipaddress where 3507360727 between ipfrom and ipto;
                          QUERY PLAN
---------------------------------------------------------------
 Index Scan using ipligenceipaddress_pkey on ipligenceipaddress  (cost=0.00..148143.67 rows=806199 width=146) (actual time=351.316..351.320 rows=1 loops=1)
   Index Cond: ((3507360727::bigint >= ipfrom) AND (3507360727::bigint <= ipto))
 Total runtime: 351.355 ms
(3 rows)

So, my questions are:

* did we miss any obvious settings?
* why does it decide all of a sudden to do seq scans?
* adding a limit 1 sometimes causes the query to be even slower, when in fact it should have helped the DB return faster, no?
* in the ideal case, what execution times should I be expecting? Is ~400ms reasonable? I would have hoped for 40ms...
* AFAICT, the (ipFrom, ipTo) intervals should be mutually exclusive, so the result should be at most one row. Can this info help the DB do a faster query? If so, how can I express that?
* the DB takes tens of minutes to do an ANALYZE on this table, which doesn't happen with the default configuration. Any idea how I can fix that?

Thank you!

[1] Changes from the standard config:

--- /var/lib/pgsql/data/postgresql.conf.orig    2008-03-21 11:51:45.000000000 -0400
+++ /var/lib/pgsql/data/postgresql.conf 2008-03-21 21:04:38.000000000 -0400
@@ -90,19 +90,19 @@

 # - Memory -

-shared_buffers = 1000          # min 16 or max_connections*2, 8KB each
-#temp_buffers = 1000           # min 100, 8KB each
-#max_prepared_transactions = 5 # can be 0 or more
+shared_buffers = 50000         # min 16 or max_connections*2, 8KB each
+temp_buffers = 10000           # min 100, 8KB each
+max_prepared_transactions = 100        # can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).

-#work_mem = 1024               # min 64, size in KB
-#maintenance_work_mem = 16384  # min 1024, size in KB
+work_mem = 2048                # min 64, size in KB
+maintenance_work_mem = 131072  # min 1024, size in KB
 #max_stack_depth = 2048        # min 100, size in KB

 # - Free Space Map -

-#max_fsm_pages = 20000         # min max_fsm_relations*16, 6 bytes each
-#max_fsm_relations = 1000      # min 100, ~70 bytes each
+max_fsm_pages = 200000         # min max_fsm_relations*16, 6 bytes each
+max_fsm_relations = 10000      # min 100, ~70 bytes each

 # - Kernel Resource Usage -

@@ -111,11 +111,11 @@

 # - Cost-Based Vacuum Delay -

-#vacuum_cost_delay = 0         # 0-1000 milliseconds
-#vacuum_cost_page_hit = 1      # 0-10000 credits
+vacuum_cost_delay = 200        # 0-1000 milliseconds
+vacuum_cost_page_hit = 6       # 0-10000 credits
 #vacuum_cost_page_miss = 10    # 0-10000 credits
 #vacuum_cost_page_dirty = 20   # 0-10000 credits
-#vacuum_cost_limit = 200       # 0-10000 credits
+vacuum_cost_limit = 100
Re: [PERFORM] Partitioned tables - planner won't use indexes
kevin kempter wrote:
> One of the things we need to query is the min date from the master
> table - we may explore alternatives for this particular query, however
> even if we fix this query I think we have a fundamental issue with the
> use of indexes (actually the non-use) by the planner.

We had a similar requirement, so I've been using a function that loops over the child tables and queries for the min date from each. If all you need is the date, you can try a function call. Here is a modified version of what I've been using:

CREATE OR REPLACE FUNCTION get_min_date() RETURNS DATE AS $_$
DECLARE
    x RECORD;
    min_date DATE;
    min_date_tmp DATE;
    qry TEXT;
BEGIN
    /* could also test the MIN() aggregate, rather than ORDER BY/LIMIT */
    FOR x IN EXECUTE 'select tablename from pg_tables where tablename like ''part_20%''' LOOP
        qry := 'SELECT logdate FROM '||x.tablename||' ORDER BY logdate LIMIT 1';
        EXECUTE qry INTO min_date_tmp;
        IF (min_date IS NULL OR (min_date_tmp IS NOT NULL AND min_date_tmp < min_date)) THEN
            min_date := min_date_tmp;
        END IF;
    END LOOP;
    RETURN min_date;
END;
$_$ LANGUAGE plpgsql IMMUTABLE;
Re: [PERFORM] Severe performance problems for simple query
On Mon, 7 Apr 2008, Dimi Paun wrote:
> * bad performance on queries of the form:
>   select * from ipTable where ipFrom <= val and val <= ipTo

This type of query is very hard for a normal B-tree index to answer. For example, say val is half-way between the min and max values. If you have an index on ipFrom, it will be able to restrict the entries to about half of them, which is no real benefit over a sequential scan. Likewise, an index on ipTo will be able to restrict the entries to half of them, with no benefit. The intersection of these two halves may be just one entry, but finding that out is non-trivial. An index bitmap scan would do it if you can persuade Postgres to do that, but really you want an R-tree index on the two columns, like I have requested in the past.

You can achieve that to some degree by using Postgres' geometric indexes, but it's ugly. Note that the following is completely untested and may not work with int8 values.

Firstly, you need to create the index. The index will contain fake boxes that stretch from ipFrom to ipTo:

CREATE INDEX index_name ON table_name USING gist ((box(point(ipFrom, 0), point(ipTo, 2))));

Then, instead of querying simply on fromIp and toIp, query on whether the fake box contains a point representing val:

SELECT blah FROM table_name
WHERE box(point(ipFrom, 0), point(ipTo, 2)) @> point(val, 1);

Or alternatively you could adapt the seg GiST index to int8 values. Hope you get this sorted out - it's something I'll have to do at some point soon too.

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] Severe performance problems for simple query
On Mon, 7 Apr 2008, Dimi Paun wrote:
> * bad performance on queries of the form:
>   select * from ipTable where ipFrom <= val and val <= ipTo

Oh yes, if you can guarantee that no two entries overlap at all, then there is a simpler way. Just create a B-tree index on ipFrom as usual, sort by ipFrom descending, and LIMIT to the first result:

SELECT blah FROM table_name
WHERE ipFrom <= 42
ORDER BY ipFrom DESC LIMIT 1

This should run *very* quickly. However, if any entries overlap at all then you will get incorrect results.

Matthew

--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)
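The no-overlap assumption can be checked up front with a one-off self-join; an untested sketch against the table from this thread:

    -- returns overlapping pairs; zero rows means the LIMIT 1 trick is safe
    SELECT a.ipfrom, a.ipto, b.ipfrom, b.ipto
    FROM static.ipligenceipaddress a
    JOIN static.ipligenceipaddress b
      ON a.ipfrom < b.ipfrom      -- look only at later-starting ranges
     AND b.ipfrom <= a.ipto       -- ...that begin before the earlier one ends
    LIMIT 10;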
Re: [PERFORM] Severe performance problems for simple query
Matthew wrote:
> On Mon, 7 Apr 2008, Dimi Paun wrote:
> > * bad performance on queries of the form:
> >   select * from ipTable where ipFrom <= val and val <= ipTo
>
> This type of query is very hard for a normal B-tree index to answer.
> [...] An index bitmap scan would do it if you can persuade Postgres to
> do that, but really you want an R-tree index on the two columns, like
> I have requested in the past.

If I understood the original post correctly, the ipFrom and ipTo columns actually split a single linear IP address space into non-overlapping chunks. Something like this:

ipFrom  ipTo
     1    10
    10    20
    20    50
    50    60
...

In that case, a regular index on (ipFrom, ipTo) should work just fine, and that's what he's got. Actually, an index on just ipFrom would probably work just as well. The problem is that the planner doesn't know about that special relationship between ipFrom and ipTo. Perhaps it could be hinted by explicitly specifying AND ipTo > ipFrom in the query?

I don't know why the single index lookup took 300ms, though. That does seem high to me.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Severe performance problems for simple query
On Mon, 2008-04-07 at 17:32 +0100, Heikki Linnakangas wrote:
> If I understood the original post correctly, the ipFrom and ipTo
> columns actually split a single linear IP address space into
> non-overlapping chunks.

Indeed.

> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well.

No, it doesn't:

perpedes_db=# CREATE INDEX temp1 ON static.ipligenceipaddress (ipFrom);
CREATE INDEX
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto limit 1;
                          QUERY PLAN
--------------------------------------------------------------
 Limit  (cost=0.00..0.07 rows=1 width=145) (actual time=1519.526..1519.527 rows=1 loops=1)
   ->  Index Scan using temp1 on ipligenceipaddress  (cost=0.00..84796.50 rows=1209308 width=145) (actual time=1519.524..1519.524 rows=1 loops=1)
         Index Cond: (ipfrom <= 2130706433)
         Filter: (2130706433 <= ipto)
 Total runtime: 1519.562 ms
(5 rows)

This is huge, I'd say...

> The problem is that the planner doesn't know about that special
> relationship between ipFrom and ipTo. Perhaps it could be hinted by
> explicitly specifying AND ipTo > ipFrom in the query?

Unfortunately, it still does a seq scan:

perpedes_db=# SET enable_seqscan = on;
SET
perpedes_db=# explain ANALYZE select * from static.ipligenceipaddress where ipfrom <= 2130706433 and 2130706433 <= ipto AND ipTo > ipFrom limit 1;
                          QUERY PLAN
--------------------------------------------------------------
 Limit  (cost=0.00..0.35 rows=1 width=145) (actual time=1245.293..1245.294 rows=1 loops=1)
   ->  Seq Scan on ipligenceipaddress  (cost=0.00..142343.80 rows=403103 width=145) (actual time=1245.290..1245.290 rows=1 loops=1)
         Filter: ((ipfrom <= 2130706433) AND (2130706433 <= ipto) AND (ipto > ipfrom))
 Total runtime: 1245.335 ms
(4 rows)

> I don't know why the single index lookup took 300ms, though. That does
> seem high to me.

That is my feeling. I would have expected order-of-magnitude faster execution times; the DB runs on fairly decent hardware...

--
Dimi Paun [EMAIL PROTECTED]
Lattica, Inc.
Re: [PERFORM] Severe performance problems for simple query
On Mon, 7 Apr 2008, Heikki Linnakangas wrote:
> In that case, a regular index on (ipFrom, ipTo) should work just fine,
> and that's what he's got. Actually, an index on just ipFrom would
> probably work just as well. The problem is that the planner doesn't
> know about that special relationship between ipFrom and ipTo. Perhaps
> it could be hinted by explicitly specifying AND ipTo > ipFrom in the
> query?

Actually, the problem is that the database doesn't know that the entries don't overlap. For all it knows, you could have data like this:

        0         10
       10         20
       20         30
... ten million rows later
100000030  100000040
100000040  100000050
        0  100000050

So say you wanted to search for the value of 50,000,000. The index on ipFrom would select five million rows, all of which then have to be filtered by the constraint on ipTo. Likewise, an index on ipTo would return five million rows, all of which then have to be filtered by the constraint on ipFrom. If you just read the index and took the closest entry to the value, then you would miss out on the last entry, which overlaps with the whole range. An R-tree on both fields will correctly find the small set of entries that are relevant.

It would be very cool to be able to create an R-tree index that would just make the original query run fast without needing alteration. I had a look at this a while back, but it is not currently possible in GiST, because only one field is handed to the index at a time. So all the current R-tree implementations require that you generate an object containing the two values, like the box, and then index that. Something for 8.4?

Matthew

--
$ rm core
Segmentation Fault (core dumped)
[PERFORM] performance using table partitions in Postgres 8.2.6
Hi. I am looking for information about using table partitions in Postgres, in particular w.r.t. performance when querying a partitioned table. I implemented table partitioning following the documentation, which is quite good and easy to follow (chapter 5.9).

I am doing some testing, so at this point my test dataset has 1M records; in the table, each row has an autoincrement integer index, two numerics, one integer (DMID), and one smallint. I created five master tables and created a different number of partitions for each one (2, 4, 8, 16, 32). I am using a range partition for the integer, DMID, which represents a file index. The values of DMID range from 0 to 180360. I also create an index for DMID.

I don't understand the timing results that I am getting. I got these times by averaging the results of querying the database from within a loop in a Perl script:

no. of partitions   constraint_exclusion off   constraint_exclusion on
 2                  0.597 ms                   0.427 ms
 4                  0.653 ms                   0.414 ms
 8                  0.673 ms                   0.654 ms
16                  1.068 ms                   1.014 ms
32                  2.301 ms                   1.537 ms

I expected that the query time would decrease as the number of partitions increases, but that's not what I am seeing. I get better results (0.29 ms) if I simply index DMID and don't use the partitions. When I run explain analyze on a query, the results (with and without constraint_exclusion set) indicate that fewer partitions are being scanned when constraint_exclusion is set to on.

I am testing table partitioning in Postgres against table partitioning using MySQL. The results for MySQL make sense: more partitions, faster query times.

The underlying application is a file index. It is expected that groups of files in selected ranges of DMID values will be accessed more often, but this is not the key implementation issue. This is basically a write once, read often database. We expect that the database will grow to 50M records in a few years, and I thought that using range partitions for the DMID value might decrease the query time.

Should I be using many more partitions? Am I expecting too much in terms of performance when using partitions? Do these results point to some obvious implementation error? Thank you for any help/suggestions you can give.

Janet Jacobsen

--
Janet Jacobsen
NERSC Analytics/HPCRD Visualization Group
Lawrence Berkeley National Laboratory
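For context, a minimal sketch of the 8.2-style setup being tested (illustrative names and ranges - the real schema isn't shown):

    CREATE TABLE files (
        id    serial,
        val1  numeric,
        val2  numeric,
        dmid  integer,
        flag  smallint
    );

    -- two range partitions on DMID, via inheritance + CHECK constraints
    CREATE TABLE files_p0 (CHECK (dmid >= 0     AND dmid < 90180))   INHERITS (files);
    CREATE TABLE files_p1 (CHECK (dmid >= 90180 AND dmid <= 180360)) INHERITS (files);

    CREATE INDEX files_p0_dmid ON files_p0 (dmid);
    CREATE INDEX files_p1_dmid ON files_p1 (dmid);

    SET constraint_exclusion = on;
    EXPLAIN SELECT * FROM files WHERE dmid = 12345;  -- files_p1 should be excluded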
Re: [PERFORM] performance using table partitions in Postgres 8.2.6
On Monday 07 April 2008, Janet Jacobsen wrote:
> no. of partitions   constraint_exclusion off   constraint_exclusion on
>  2                  0.597 ms                   0.427 ms
>  4                  0.653 ms                   0.414 ms
>  8                  0.673 ms                   0.654 ms
> 16                  1.068 ms                   1.014 ms
> 32                  2.301 ms                   1.537 ms
>
> I expected that the query time would decrease as the number of
> partitions increases, but that's not what I am seeing. I get better
> results (0.29 ms) if I simply index DMID and don't use the partitions.

I see really small times here, so the overhead that partitioning imposes probably isn't worth it yet. Maybe with 50M rows it'll help; you could try feeding those 50M tuples and testing again.

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.
Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12
[PERFORM] Performance with temporary table
Hi,

I have written a program that imputes (or rather corrects) data within my database. I am using a temporary table into which I put data from another partitioned table; I then query this table to get the desired data. But the thing is, this temporary table has to be created for every record that I need to correct, and there are thousands of such records that need to be corrected. So the program necessarily creates a temporary table every time it has to correct a record, and this table is dropped after each record is corrected.

The program works fine, but it runs for a very long time - it runs for days. I am particularly finding that it takes more time during this statement:

NOTICE: theQuery in createtablevolumelaneshist
CREATE TEMPORARY TABLE predictiontable(lane_id, measurement_start, speed, volume, occupancy) AS
SELECT lane_id, measurement_start, speed, volume, occupancy
FROM samantha.lane_data_I_495
WHERE lane_id IN (1317)
  AND measurement_start BETWEEN '2007-11-18 09:25:00' AND '2007-11-19 01:39:06'

I am not sure if I can use a cursor to replicate the functionality of the temp table. Is the performance bad because of the creation and deletion of the temp table?

Thanks,
Samantha
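One pattern that avoids the per-record create/drop cycle (a sketch - the column types here are guesses, not the original schema) is to create the temporary table once per session and let each transaction clear it:

    CREATE TEMPORARY TABLE predictiontable (
        lane_id           integer,
        measurement_start timestamp,
        speed             real,
        volume            real,
        occupancy         real
    ) ON COMMIT DELETE ROWS;

    -- then, for each record to correct:
    BEGIN;
    INSERT INTO predictiontable
    SELECT lane_id, measurement_start, speed, volume, occupancy
    FROM samantha.lane_data_I_495
    WHERE lane_id IN (1317)
      AND measurement_start BETWEEN '2007-11-18 09:25:00' AND '2007-11-19 01:39:06';
    -- ... correction queries against predictiontable here ...
    COMMIT;  -- rows are removed automatically, the table itself remains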
[PERFORM] what worked: performance improvements for geo-spatial searching on FreeBSD
The http://www.1-800-save-a-pet.com site is hosted with FreeBSD and PostgreSQL and has a geo-spatial search as a central feature. One thing that made a substantial performance improvement was switching from the geo_distance() search in the earthdistance contrib module to the cube-based geo-spatial calculations, also available in contrib/. In our case, the slight loss of precision between the two methods didn't matter.

The other thing that made a noticeable performance improvement was upgrading our servers from FreeBSD 4.x or 5.x (old, I know!) to FreeBSD 6.2 or later. We also upgraded these systems from PostgreSQL 8.1 to 8.2 at the same time, so I assume the upgrade to 8.2 must be responsible at least in part for the performance gains. The result of these two rounds of updates is that our overall CPU capacity in the cluster seems to be double or triple what it was before.

As the site grows we continue to be very happy with the performance, features and stability of PostgreSQL.

Mark
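For anyone considering the same switch: the cube-based approach uses the earthdistance functions built on contrib/cube. Roughly (table and column names invented here, not the site's actual schema), a radius search looks like:

    -- functional GiST index on the cube expression
    CREATE INDEX pets_earth_idx ON pets USING gist (ll_to_earth(lat, lon));

    -- find records within ~10 miles (16093 m) of a point
    SELECT id,
           earth_distance(ll_to_earth(lat, lon),
                          ll_to_earth(40.44, -79.99)) AS meters
    FROM pets
    WHERE earth_box(ll_to_earth(40.44, -79.99), 16093) @> ll_to_earth(lat, lon)
    ORDER BY meters;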
[PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews
When traffic to our PostgreSQL-backed website spikes, the first resource we see being exhausted is the DB slots on the master server (currently set to about 400). I expect that as new Apache/mod_perl children are being put to use, they are creating new database connections.

I'm interested in recommendations to funnel more of that traffic through fewer DB slots, if that's possible. (We could also consider increasing the handles available, since the DB server has some CPU and memory to spare.)

I'm particularly interested in reviews of DBD::Gofer, which seems like it would help with this in our Perl application: http://search.cpan.org/dist/DBI/lib/DBD/Gofer.pm

I realize it has limitations, like no transactions, but I think we would still be able to use it selectively in our application.

Mark
Re: [PERFORM] recommendations for web/db connection pooling or DBD::Gofer reviews
On Mon, 07 Apr 2008 14:36:00 -0400, Mark Stosberg [EMAIL PROTECTED] wrote:
> I'm particularly interested in reviews of DBD::Gofer, which seems like
> it would help with this in our Perl application:
> http://search.cpan.org/dist/DBI/lib/DBD/Gofer.pm
>
> I realize it has limitations, like no transactions, but I think we
> would still be able to use it selectively in our application.

I would stick to proven PostgreSQL technologies such as pgbouncer.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Re: [PERFORM] what worked: performance improvements for geo-spatial searching on FreeBSD
Mark, do you know about our sky segmentation code Q3C? See details at http://www.sai.msu.su/~megera/wiki/SkyPixelization - we use it for billions of objects in a database and are quite happy with it.

On Mon, 7 Apr 2008, Mark Stosberg wrote:
> One thing that made a substantial performance improvement was
> switching from the geo_distance() search in the earthdistance contrib
> module to the cube-based geo-spatial calculations, also available in
> contrib/. In our case, the slight loss of precision between the two
> methods didn't matter.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [PERFORM] Severe performance problems for simple query
* Dimi Paun:
> * 4.9 million records in a table (IP address info)

You should use the ip4r type for that.
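A sketch of what that could look like with the ip4r contrib type (untested; assumes the int8 columns hold IPv4 addresses as 32-bit integers):

    ALTER TABLE ipligenceipaddress ADD COLUMN iprange ip4r;
    UPDATE ipligenceipaddress
       SET iprange = ip4r('0.0.0.0'::ip4 + ipfrom, '0.0.0.0'::ip4 + ipto);
    CREATE INDEX ipligenceipaddress_iprange ON ipligenceipaddress USING gist (iprange);

    -- the interval test becomes a single indexed containment lookup:
    SELECT * FROM ipligenceipaddress WHERE iprange >>= '127.0.0.1'::ip4;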
[PERFORM] peer review of PgBouncer / Slony design
> I would stick to proven PostgreSQL technologies such as pgbouncer.

Thanks for the fast recommendation, Joshua. I'll consider it.

Our application is a Slony-replicated web/db project with two slaves. Does this design seem sensible?

- Run one pgbouncer server on the master, with settings to service the master and both slaves.
- We already handle balancing traffic between the slaves separately, so that can remain unchanged.
- Use Session Pooling for both the master and the slaves. In theory, the slaves should just be doing transaction-less SELECT statements, so a more aggressive setting might be possible, but I believe there might be a leak in the logic where we create a temporary table on the slave in one case.
- Redirect all application connections through pgbouncer.

###

From graphs we keep, we can see that the slaves currently use a max of about 64 connections... they are far from maxing out what's possible. So I was trying to think through whether it makes sense to bother using the pgBouncer layer with them. I thought of two potential reasons to still use it:

- In the event of a major traffic spike on the web servers, pgbouncer would keep the number of db slots under control.
- Potentially there's a performance gain in having pgBouncer hold the connections open.

Does that analysis seem correct?

For the master's pool size, I thought I would just choose a number that's a little larger than the daily max number of DB slots in use.

Mark
Re: [PERFORM] Looking for bottleneck during load test
I tried several other tools for random IO (including a self-written one which does random lseek and read). This tool, started during one of our tests, achieves 2 iops (8k each). Started alone, I get about 1,500 iops with an avg latency of 100 ms.

We are using a SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really our bottleneck? Any other tuning ideas?

Regards,
Robert

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ivan Voras
Sent: Montag, 07. April 2008 14:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Looking for bottleneck during load test

> How do you test random IO? Do you use this utility:
> http://arctic.org/~dean/randomio/ ? If not, try using it, with the
> same parameters. It might be that the latency is destroying your
> performance. Do you use NFS or are you accessing the storage as a SAN?
Re: [PERFORM] Looking for bottleneck during load test
In response to Hell, Robert [EMAIL PROTECTED]:

> I tried several other tools for random IO (including a self-written
> one which does random lseek and read). This tool, started during one
> of our tests, achieves 2 iops (8k each). Started alone, I get about
> 1,500 iops with an avg latency of 100 ms.
>
> We are using a SAN (EMC CLARiiON CX 300) - are those ~7 MB/s really
> our bottleneck? Any other tuning ideas?

You know, with all the performance problems people have been bringing up with regard to SANs, I'm putting SAN in the same category as RAID-5 ...

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
Re: [PERFORM] Looking for bottleneck during load test
On Mon, 7 Apr 2008, Bill Moran wrote:
> You know, with all the performance problems people have been bringing
> up with regard to SANs, I'm putting SAN in the same category as
> RAID-5 ...

Not really fair, because unlike RAID5 it's at least *possible* to get good write performance out of a SAN. Just harder than most people think it is.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Looking for bottleneck during load test
In response to Greg Smith [EMAIL PROTECTED]:

> Not really fair, because unlike RAID5 it's at least *possible* to get
> good write performance out of a SAN. Just harder than most people
> think it is.

*shrug* In theory it's possible to get good write performance out of RAID-5 as well, with fast enough disks and enough cache ...

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
[PERFORM] bulk insert performance problem
Hi,

I have a performance problem with a script that does massive bulk inserts into 6 tables. When the script starts, the performance is really good, but it degrades minute after minute and takes almost a day to finish!

I tried almost everything suggested on this list: changed our external RAID array from RAID 5 to RAID 10, tweaked postgresql.conf to the best of my knowledge, moved pg_xlog to a different array, dropped the tables before running the script. But the performance gain was negligible even after all these changes...

IMHO the hardware that we use should be up to the task: Dell PowerEdge 6850, 4 x 3.0GHz dual-core Xeon, 8GB RAM, 3 x 300GB SAS 10K in RAID 5 for / and 6 x 300GB SAS 10K in RAID 10 (MD1000) for PG data; the data filesystem is ext3 mounted with noatime and data=writeback. Running on openSUSE 10.3 with PostgreSQL 8.2.7. The server is dedicated to PostgreSQL...

We tested the same script and schema with Oracle 10g on the same machine and it took only 2.5h to complete! What I don't understand is that with Oracle the performance seems consistent throughout, but with PG it deteriorates over time...

Any idea? Are there any other improvements I could make?

Thanks,
Christian
Re: [PERFORM] bulk insert performance problem
Christian Bourque wrote:
> I have a performance problem with a script that does massive bulk
> inserts into 6 tables. When the script starts, the performance is
> really good, but it degrades minute after minute and takes almost a
> day to finish!

Would I be correct in guessing that there are foreign key relationships between those tables, and that there are significant numbers of indexes in use? The foreign key checking costs will go up as the tables grow, and AFAIK the indexes get a bit more expensive to maintain too.

If possible you should probably drop your foreign key relationships and drop your indexes, insert your data, then re-create the indexes and foreign keys. The foreign keys will be rechecked when you recreate them, and it's *vastly* faster to do it that way. Similarly, building an index from scratch is quite a bit faster than progressively adding to it. Of course, dropping the indexes is only useful if you aren't querying the tables as you build them.

Also, if you're loading data using stored procedures, you should avoid the use of exception blocks. I had some major problems with my bulk data conversion code due to overuse of exception blocks creating large numbers of subtransactions behind the scenes and slowing everything to a crawl.

--
Craig Ringer
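As a sketch of that load pattern (all names hypothetical):

    -- before the load: drop constraint and index
    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    DROP INDEX orders_customer_idx;

    COPY orders FROM '/tmp/orders.dat';          -- bulk load

    -- after the load: rebuild from scratch
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ALTER TABLE orders
      ADD CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id)
      REFERENCES customers (id);                 -- FK rechecked in one pass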
Re: [PERFORM] bulk insert performance problem
Craig Ringer wrote:
> If possible you should probably drop your foreign key relationships
> and drop your indexes, insert your data, then re-create the indexes
> and foreign keys. [...] Of course, dropping the indexes is only useful
> if you aren't querying the tables as you build them.

If you are, add ANALYZE commands throughout the import, e.g. every 10,000 rows. Then your checks should be a bit faster.

The other suggestion would be to do block commits:

begin;
do stuff for 5000 rows;
commit;

repeat until finished.

--
Postgresql php tutorials
http://www.designmagick.com/
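Spelled out, that batching pattern looks roughly like this (table and values are placeholders):

    CREATE TABLE big_table (id integer, payload text);  -- placeholder table

    BEGIN;
    INSERT INTO big_table VALUES (1, 'row 1');
    -- ... on the order of 5000 inserts per transaction ...
    COMMIT;
    ANALYZE big_table;  -- refresh planner stats periodically during the load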
[PERFORM] bulk data loading
Hi all,

I need to do a bulk data load of around 704GB (log file size), at present in 8 hrs (1am - 9am). The data file size may increase 3 to 5 times in the future. Using COPY, it takes 96 hrs to finish the task. What is the best way to do it?

HARDWARE: SUN THUMPER / RAID10
OS: SOLARIS 10
DB: Greenplum/Postgres

Regards,
Srikanth K Potluri
+63 9177444783 (Philippines)
Re: [PERFORM] bulk insert performance problem
I use 10000 rows, have big blobs.

bitaoxiao
2008-04-08

From: Chris
Sent: 2008-04-08 11:35:57
To: Christian Bourque
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] bulk insert performance problem

> The other suggestion would be to do block commits:
>
> begin;
> do stuff for 5000 rows;
> commit;
>
> repeat until finished.