Re: [PERFORM] Savepoints in transactions for speed?
Mike,

Is there anything that the 1.5 million rows have in common that would allow you to use partitions? If so, you could load the new data into a partition at your leisure, start a transaction, alter the partition table with the old data to no longer inherit from the parent, alter the new partition table to inherit from the parent, commit, then drop the old table. This operation would be very fast; the users probably won't even notice.

Bob Lunney

On Nov 27, 2012, at 4:04 PM, Mike Blackwell mike.blackw...@rrd.com wrote:

I need to delete about 1.5 million records from a table and reload it in one transaction. The usual advice when loading with inserts seems to be to group them into transactions of around 1k records. Committing at that point would leave the table in an inconsistent state. Would issuing a savepoint every 1k or so records negate whatever downside there is to keeping a transaction open for all 1.5 million records, or just add more overhead? The data to reload the table is coming from a Perl DBI connection to a different database (not PostgreSQL), so I'm not sure the COPY alternative applies here. Any suggestions are welcome.

Mike
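A minimal sketch of the partition-swap approach described above, assuming inheritance partitioning and hypothetical table names (mytable as the parent, mytable_old holding the current rows, mytable_new for the reload):

-- Load the replacement data into a new child table at your leisure.
CREATE TABLE mytable_new (LIKE mytable INCLUDING ALL);
-- populate mytable_new here, e.g. with batched INSERTs from the Perl DBI source

-- Swap the partitions in one short transaction.
BEGIN;
ALTER TABLE mytable_old NO INHERIT mytable;   -- detach the stale rows
ALTER TABLE mytable_new INHERIT mytable;      -- attach the reloaded rows
COMMIT;

-- The old rows are no longer visible through the parent; drop them whenever convenient.
DROP TABLE mytable_old;

The swap transaction touches only catalog entries, which is why it is fast regardless of row count.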
Re: [PERFORM] SELECT AND AGG huge tables
Houman,

Partition by date and revise your processes to create and load a new child table every day. Since you already know the date, append it to the table base name and go straight to the data you need. Also, the index on T.c won't help for this query; you're looking at a full table scan every time.

Bob

Sent from my iPhone

On Oct 15, 2012, at 3:59 PM, houmanb hou...@gmx.at wrote:

Dear all, We have a DB containing transactional data. There are about *50* to *100 x 10^6* rows in one *huge* table. We are using postgres 9.1.6 on linux with a *SSD card on PCIex* providing us a constant seeking time. A typical select (see below) takes about 200 secs. As the database is the backend for a web-based reporting facility, 200 to 500 or even more secs response times are not acceptable for the customer. Is there any way to speed up select statements like this:

SELECT SUM(T.x), SUM(T.y), SUM(T.z), AVG(T.a), AVG(T.b) FROM T WHERE T.creation_date=$SOME_DATE GROUP BY T.c;

There is an index on T.c. But would it help to partition the table by T.c? It should be mentioned that T.c is actually a foreign key to a table containing a tiny number of rows (15 rows representing different companies). My postgres.conf is actually the default one, except that we increased the value for work_mem=128MB. Thanks in advance Houman

-- View this message in context: http://postgresql.1045698.n5.nabble.com/SELECT-AND-AGG-huge-tables-tp5728306.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
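A minimal sketch of the per-day child table approach suggested above, using inheritance partitioning; the parent name t and the example date are hypothetical, while the column names come from the query in the post:

-- One child table per day; the date is part of the table name.
CREATE TABLE t_20121015 (
    CHECK (creation_date = DATE '2012-10-15')
) INHERITS (t);

-- Daily loads go straight into that child table.
-- COPY t_20121015 FROM ... ;

-- Reports for a known date can then hit the child directly and skip the full table scan.
SELECT SUM(x), SUM(y), SUM(z), AVG(a), AVG(b)
FROM t_20121015
GROUP BY c;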
Re: [PERFORM] database slowdown while a lot of inserts occur
Tomas,

You are correct. I was assuming that each insert was issued as an implicit transaction, without the benefit of an explicit BEGIN/COMMIT batching many of them together, as I've seen countless times in tight loops trying to pose as a batch insert.

Bob

From: Tomas Vondra t...@fuzzy.cz To: pgsql-performance@postgresql.org Sent: Friday, March 30, 2012 8:11 PM Subject: Re: [PERFORM] database slowdown while a lot of inserts occur

On 29.3.2012 21:27, Bob Lunney wrote: Lance, Many small inserts cause frequent fsyncs. Is there any way those small inserts can be batched into some larger sets of inserts that use COPY to perform the load?

Not necessarily - fsync happens at COMMIT time, not when the INSERT is performed (unless each INSERT stands on its own). Tomas

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
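A minimal sketch of the pattern being discussed, with a hypothetical table log_entries; each standalone INSERT commits (and typically fsyncs) on its own, while an explicit transaction pays the commit-time fsync only once per batch:

-- Implicit transactions: one commit, and typically one fsync, per row.
INSERT INTO log_entries (msg) VALUES ('row 1');
INSERT INTO log_entries (msg) VALUES ('row 2');

-- Explicit transaction: many rows, a single commit.
BEGIN;
INSERT INTO log_entries (msg) VALUES ('row 1');
INSERT INTO log_entries (msg) VALUES ('row 2');
-- ... the rest of the batch ...
COMMIT;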
Re: [PERFORM] database slowdown while a lot of inserts occur
Lance,

Many small inserts cause frequent fsyncs. Is there any way those small inserts can be batched into some larger sets of inserts that use COPY to perform the load?

Bob Lunney

From: Campbell, Lance la...@illinois.edu To: Campbell, Lance la...@illinois.edu; pgsql-performance@postgresql.org pgsql-performance@postgresql.org Sent: Thursday, March 29, 2012 1:02 PM Subject: Re: [PERFORM] database slowdown while a lot of inserts occur

I forgot to mention that the slowdown in particular for other applications is when they are trying to insert or update tables unrelated to the application mentioned in my prior email that does the massive small inserts. Thanks, Lance Campbell Software Architect Web Services at Public Affairs 217-333-0382

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Campbell, Lance Sent: Thursday, March 29, 2012 12:59 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] database slowdown while a lot of inserts occur

PostgreSQL 9.0.x We have around ten different applications that use the same database. When one particular application is active it does an enormous number of inserts. Each insert is very small. During this time the database seems to slow down in general. The application in question is inserting into a particular table that is not used by the other applications. 1) What should I do to confirm that the database is the issue and not the applications? 2) How can I identify where the bottleneck is occurring if the issue happens to be with the database? I have been using PostgreSQL for eight years. It is an amazing database. Thanks, Lance Campbell Software Architect Web Services at Public Affairs 217-333-0382
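A hedged sketch of the COPY-based batching suggested above; the table name, columns and file path are placeholders, and COPY FROM STDIN would be the variant to use when the rows are generated by a client program:

-- Accumulate the small inserts into a batch, then load the whole batch at once.
BEGIN;
COPY event_log (logged_at, source, payload)
    FROM '/tmp/event_batch.csv' WITH CSV;   -- or FROM STDIN when streaming from the client
COMMIT;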
Re: [PERFORM] From Simple to Complex
Possibly. What does psql show work_mem; say? Bob Lunney From: Alessandro Gagliardi alessan...@path.com To: pgsql-performance@postgresql.org Sent: Wednesday, February 1, 2012 12:19 PM Subject: Re: [PERFORM] From Simple to Complex Final update on this thread: since it is only necessary for me to get a rough ratio of the distribution (and not the absolute count), I refactored the query to include a subquery that samples from the moments table thus: SELECT moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY RANDOM() LIMIT 1; I also took advantage of another table called blocks that happens to contain the moment_type as well (thus making it so I don't need to reference pg_class). The final query looks like: SELECT moment_type, emotion, COUNT(feedback_id) FROM (SELECT moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY RANDOM() LIMIT 1) AS sample_moments JOIN blocks USING (block_id) JOIN emotions USING (moment_id) GROUP BY moment_type, emotion ORDER BY moment_type, emotion The explain is at http://explain.depesz.com/s/lYh Interestingly, increasing the limit does not seem to increase the runtime in a linear fashion. When I run it with a limit of 6 I get a runtime of 14991 ms. But if I run it with a limit of 7 I get a runtime of 77744 ms. I assume that that's because I'm hitting a memory limit and paging out. Is that right? On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi alessan...@path.com wrote: I just got a pointer on presenting EXPLAIN ANALYZE in a more human friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S From this it looks like the bottleneck happens when Postgres does an Index Scan using emotions_moment_id_idx on emotions before filtering on moments.inserted so I thought I'd try filtering on emotions.inserted instead but that only made it worse. At the same time, I noticed that FROM pg_class, moments WHERE moments.tableoid = pg_class.oid tends to run a bit faster than FROM pg_class JOIN moments ON moments.tableoid = pg_class.oid. So I tried: SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments, emotions WHERE moments.tableoid = pg_class.oid AND emotions.inserted 'yesterday' AND moments.inserted BETWEEN 'yesterday' AND 'today' AND emotions.moment_id = moments.moment_id GROUP BY relname, emotion ORDER BY relname, emotion; That was a bit faster, but still very slow. Here's the EXPLAIN: http://explain.depesz.com/s/ZdF On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi alessan...@path.com wrote: I changed the query a bit so the results would not change over the course of the day to: SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments JOIN emotions USING (moment_id) WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND moments.tableoid = pg_class.oid GROUP BY relname, emotion ORDER BY relname, emotion;
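A minimal sketch of checking the setting Bob asks about, plus a hedged way to test the paging-out theory for the current session only; the work_mem value and the LIMIT are placeholders:

SHOW work_mem;                  -- what the session is currently allowed per sort/hash operation

SET work_mem = '256MB';         -- experiment for this session only
EXPLAIN ANALYZE
SELECT moment_id, block_id
FROM moments
WHERE inserted BETWEEN 'yesterday' AND 'today'
ORDER BY RANDOM()
LIMIT 10000;                    -- look for "Sort Method: external merge  Disk:" vs. "quicksort"
RESET work_mem;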
Re: [PERFORM] Response time increases over time
Otto,

Separate the pg_xlog directory onto its own filesystem and retry your tests.

Bob Lunney

From: Havasvölgyi Ottó havasvolgyi.o...@gmail.com To: Marti Raudsepp ma...@juffo.org Cc: Aidan Van Dyk ai...@highrise.ca; pgsql-performance@postgresql.org Sent: Thursday, December 8, 2011 9:48 AM Subject: Re: [PERFORM] Response time increases over time

I have moved the data directory (xlog, base, global, and everything) to an ext4 file system. The result hasn't changed unfortunately. With the same load test the average response time is 80ms; everything from 40ms to 120ms occurs. This ext4 has default settings in fstab. Have you got any other idea what is going on here? Thanks, Otto

2011/12/8 Marti Raudsepp ma...@juffo.org On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk ai...@highrise.ca wrote: Let me guess, debian squeeze, with data and xlog both on a single ext3 filesystem, and the fsync done by your commit (xlog) is flushing all the dirty data of the entire filesystem (including PG data writes) out before it can return... This is fixed with the data=writeback mount option, right? (If it's the root file system, you need to add rootfsflags=data=writeback to your kernel boot flags) While this setting is safe and recommended for PostgreSQL and other transactional databases, it can cause garbage to appear in recently written files after a crash/power loss -- for applications that don't correctly fsync data to disk. Regards, Marti
Re: [PERFORM] poor execution plan because column dependence
Zito, Using psql log in as the database owner and run analyze verbose. Happiness will ensue. Also, when requesting help with a query its important to state the database version (select version();) and what, if any, configuration changes you have made in postgresql.conf. Listing ony the ones that have changed is sufficient. Finally, the wiki has some good information on the care and feeding of a PostgreSQL database: http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT Bob Lunney --- On Tue, 4/12/11, Václav Ovsík vaclav.ov...@i.cz wrote: From: Václav Ovsík vaclav.ov...@i.cz Subject: [PERFORM] poor execution plan because column dependence To: pgsql-performance@postgresql.org Date: Tuesday, April 12, 2011, 7:23 PM Hi, I have done migration of the Request Tracker 3.8.9 (http://requesttracker.wikia.com/wiki/HomePage) from Mysql to PostgreSQL in testing environment. The RT schema used can be viewed at https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg. I have added full text search on table Attachments based on trigrams (and still experimenting with it), but is is not interesting for the problem (the problem is not caused by it directly). The full text search alone works quite good. A user testing a new RT instance reported a poor performance problem with a bit more complex query (more conditions resulting in table joins). Queries are constructed by module DBIx::SearchBuilder. The problematic query logged: rt=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 'deleted') AND (main.Status = 'resolved' AND main.LastUpdated '2008-12-31 23:00:00' AND main.Created '2005-12-31 23:00:00' AND main.Queue = '15' AND ( Attachments_2.trigrams @@ text_to_trgm_tsquery('uir') AND Attachments_2.Content ILIKE '%uir%' ) ) AND (main.Type = 'ticket') AND (main.EffectiveId = main.id) ORDER BY main.id ASC; QUERY PLAN - Unique (cost=23928.60..23928.67 rows=1 width=162) (actual time=5201.139..5207.965 rows=649 loops=1) - Sort (cost=23928.60..23928.61 rows=1 width=162) (actual time=5201.137..5201.983 rows=5280 loops=1) Sort Key: main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject, main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told, main.starts, main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled Sort Method: quicksort Memory: 1598kB - Nested Loop (cost=0.00..23928.59 rows=1 width=162) (actual time=10.060..5120.834 rows=5280 loops=1) - Nested Loop (cost=0.00..10222.38 rows=1734 width=166) (actual time=8.702..1328.970 rows=417711 loops=1) - Seq Scan on tickets main (cost=0.00..5687.88 rows=85 width=162) (actual time=8.258..94.012 rows=25410 loops=1) Filter: (((status)::text 'deleted'::text) AND (lastupdated '2008-12-31 23:00:00'::timestamp without time zone) AND (created '2005-12-31 23:00:00'::timestamp without time zone) AND (effectiveid = id) AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 'resolved'::text)) - Index Scan using transactions1 on transactions transactions_1 (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039 rows=16 loops=25410) Index Cond: (((transactions_1.objecttype)::text = 'RT::Ticket'::text) AND (transactions_1.objectid = 
main.effectiveid)) - Index Scan using attachments2 on attachments attachments_2 (cost=0.00..7.89 rows=1 width=4) (actual time=0.008..0.009 rows=0 loops=417711) Index Cond: (attachments_2.transactionid = transactions_1.id) Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) AND (attachments_2.content ~~* '%uir%'::text)) Total runtime: 5208.149 ms (14 rows) The above times are for already
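A minimal sketch of the steps suggested at the top of this post, using the RT table names quoted in the thread:

SELECT version();                -- include this in any request for help

-- Run as the database owner to refresh the planner statistics.
ANALYZE VERBOSE;                 -- whole database
ANALYZE VERBOSE attachments;     -- or a single table, e.g. the trigram-indexed one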
Re: [PERFORM] Slow deleting tables with foreign keys
Jeremy, Does table_revision have a unique index on id? Also, I doubt these two indexes ever get used: CREATE INDEX idx_crs_action_expired_created ON table_version.bde_crs_action_revision USING btree (_revision_expired, _revision_created); CREATE INDEX idx_crs_action_expired_key ON table_version.bde_crs_action_revision USING btree (_revision_expired, audit_id); Bob Lunney --- On Wed, 3/30/11, Jeremy Palmer jpal...@linz.govt.nz wrote: From: Jeremy Palmer jpal...@linz.govt.nz Subject: [PERFORM] Slow deleting tables with foreign keys To: pgsql-performance@postgresql.org pgsql-performance@postgresql.org Date: Wednesday, March 30, 2011, 10:16 PM Hi All, I'm trying to delete one row from a table and it's taking an extremely long time. This parent table is referenced by other table's foreign keys, but the particular row I'm trying to delete is not referenced any other rows in the associative tables. This table has the following structure: CREATE TABLE revision ( id serial NOT NULL, revision_time timestamp without time zone NOT NULL DEFAULT now(), start_time timestamp without time zone NOT NULL DEFAULT clock_timestamp(), schema_change boolean NOT NULL, comment text, CONSTRAINT revision_pkey PRIMARY KEY (id) ) WITH ( OIDS=FALSE ); This table is referenced from foreign key by 130 odd other tables. The total number of rows from these referencing tables goes into the hundreds of millions. Each of these tables has been automatically created by script and has the same _revision_created, _revision_expired fields, foreign keys and indexes. Here is an example of one: CREATE TABLE table_version.bde_crs_action_revision ( _revision_created integer NOT NULL, _revision_expired integer, tin_id integer NOT NULL, id integer NOT NULL, sequence integer NOT NULL, att_type character varying(4) NOT NULL, system_action character(1) NOT NULL, audit_id integer NOT NULL, CONSTRAINT pkey_table_version.bde_crs_action_revision PRIMARY KEY (_revision_created, audit_id), CONSTRAINT bde_crs_action_revision__revision_created_fkey FOREIGN KEY (_revision_created) REFERENCES table_version.revision (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT bde_crs_action_revision__revision_expired_fkey FOREIGN KEY (_revision_expired) REFERENCES table_version.revision (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH ( OIDS=FALSE ); ALTER TABLE table_version.bde_crs_action_revision OWNER TO bde_dba; ALTER TABLE table_version.bde_crs_action_revision ALTER COLUMN audit_id SET STATISTICS 500; CREATE INDEX idx_crs_action_audit_id ON table_version.bde_crs_action_revision USING btree (audit_id); CREATE INDEX idx_crs_action_created ON table_version.bde_crs_action_revision USING btree (_revision_created); CREATE INDEX idx_crs_action_expired ON table_version.bde_crs_action_revision USING btree (_revision_expired); CREATE INDEX idx_crs_action_expired_created ON table_version.bde_crs_action_revision USING btree (_revision_expired, _revision_created); CREATE INDEX idx_crs_action_expired_key ON table_version.bde_crs_action_revision USING btree (_revision_expired, audit_id); All of the table have been analysed before I tried to run the query. The fact the all of the foreign keys have a covering index makes me wonder why this delete is taking so long. 
The explain for delete from table_version.revision where id = 1003 Delete (cost=0.00..1.02 rows=1 width=6) - Seq Scan on revision (cost=0.00..1.02 rows=1 width=6) Filter: (id = 100) I'm running POstgreSQL 9.0.2 on Ubuntu 10.4 Cheers Jeremy __ This message contains information, which is confidential and may be subject to legal privilege. If you are not the intended recipient, you must not peruse, use, disseminate, distribute or copy this message. If you have received this message in error, please notify us immediately (Phone 0800 665 463 or i...@linz.govt.nz) and destroy the original message. LINZ accepts no responsibility for changes to this email, or for any attachments, after its transmission from LINZ. Thank you. __ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
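Two hedged checks for the questions raised at the top of this post, using the standard catalog and statistics views and the names from the quoted schema:

-- Does table_version.revision have a (unique) index on id?  The primary key should provide one.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'table_version' AND tablename = 'revision';

-- How often have the suspect composite indexes actually been used since the last stats reset?
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_crs_action_expired_created', 'idx_crs_action_expired_key');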
Re: [PERFORM] COPY with high # of clients, partitioned table locking issues?
John,

Sorry to hear you're struggling with such underpowered hardware. ;-) A little more information would be helpful, though:

1. What version of PG are you running?
2. What are the constraints on the child tables?
3. How many rows does each COPY insert?
4. Are these wrapped in transactions?
5. Are the child tables created at the same time the copies are taking place? In the same transaction?
6. Are the indexes in place on the child table(s) when the copies are running? Do they have to be to validate the data?
7. What are the configuration settings for the database? (Just the ones changed from the default, please.)
8. Which file system are you running for the database files? Mount options?
9. Are the WAL files on the same file system?

Bob Lunney

--- On Wed, 3/30/11, Strange, John W john.w.stra...@jpmchase.com wrote: From: Strange, John W john.w.stra...@jpmchase.com Subject: [PERFORM] COPY with high # of clients, partitioned table locking issues? To: pgsql-performance@postgresql.org pgsql-performance@postgresql.org Date: Wednesday, March 30, 2011, 4:56 PM

Just some information on our setup:
- HP DL585 G6
- 4 x AMD Opteron 8435 (24 cores)
- 256GB RAM
- 2 FusionIO 640GB PCI-SSD (RAID0)
- dual 10GB ethernet
- we have several tables that we store calculated values in.
- these are inserted by a compute farm that calculates the results and stores them into a partitioned schema (schema listed below)
- whenever we do a lot of inserts we seem to get exclusive locks.

Is there something we can do to improve the performance around locking when doing a lot of parallel inserts with COPY into? We are not I/O bound. What happens is that the copies start to slow down while more continue to come in and cause the client to swap; we hit a point where over 800 COPYs were in a waiting state, which forced us to start paging heavily, creating an issue. If we can figure out the locking issue the COPYs should clear faster, requiring less memory in use.
[ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 LOG: process 14405 still waiting for ExclusiveLock on extension of relation 470273 of database 16384 after 5001.894 ms [ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 CONTEXT: COPY reportvalues_part_1931, line 1: 660250 41977959 11917 584573.43642105709 [ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 STATEMENT: COPY reportvalues_part_1931 FROM stdin USING DELIMITERS ' ' [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e LOG: process 7294 still waiting for ExclusiveLock on extension of relation 470606 of database 16384 after 5062.968 ms [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e CONTEXT: COPY reportvalues_part_1932, line 158: 660729 41998839 887 45000.0 [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e STATEMENT: COPY reportvalues_part_1932 FROM stdin USING DELIMITERS ' ' [ 2011-03-30 15:54:56.077 EDT ] 25781 [local] asgprod:4d938556.64b5 LOG: process 25781 still waiting for ExclusiveLock on extension of relation 470606 of database 16384 after 5124.463 ms relation | 16384 | 470606 | | | | | | | | 93/677526 | 14354 | RowExclusiveLock | t relation | 16384 | 470606 | | | | | | | | 1047/4 | 27451 | RowExclusiveLock | t relation | 16384 | 470606 | | | | | | | | 724/58891 | 20721 | RowExclusiveLock | t transactionid | | | | | | 94673393 | | | | 110/502566 | 1506 | ExclusiveLock | t virtualxid | | | | | 975/92 | | | | | 975/92 | 25751 | ExclusiveLock | t extend | 16384 | 470606 | | | | | | | | 672/102043 | 20669 | ExclusiveLock | f extend | 16384 | 470606 | | | | | | | | 1178/10 | 6074 | ExclusiveLock | f virtualxid | | | | | 37/889225 | | | | | 37/889225 | 4623 | ExclusiveLock | t relation | 16384 | 405725 | | | | | | | | 39/822056 | 32502 | AccessShareLock | t transactionid | | | | | | 94673831 | | | | 917/278 | 23134 | ExclusiveLock
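The log excerpt above shows sessions waiting on relation extension locks; a hedged sketch for watching them through pg_locks (column names per the standard catalog view):

-- Who is waiting on (or holding) relation extension locks right now?
SELECT locktype, database, relation::regclass AS rel, pid, mode, granted
FROM pg_locks
WHERE locktype = 'extend' OR NOT granted
ORDER BY granted, pid;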
Re: [PERFORM] Really really slow select count(*)
Ross, Way off topic now, but from my time programming electrical meters I can tell you pulling the meter from its socket is potentially an extremely dangerous thing to do. If there is a load across the meter's poles the spark that results on disconnect could kill the puller instantly. (You don't want to know what happens if the person isn't killed.) I don't know what property your admin type is trying to protect, but I'm inclined to let it burn and live to work through the insurance collection process. Oh, and +1 for timed escalation of a shutdown. Bob Lunney --- On Wed, 2/16/11, Ross J. Reedstrom reeds...@rice.edu wrote: From: Ross J. Reedstrom reeds...@rice.edu Subject: Re: [PERFORM] Really really slow select count(*) big snip Following you off topic, I know of one admin type who has stated I don't care what sort of fine the power company wants to give me, if my property's on fire, I'm going to pull the meter, in order to hand it to the first responder, rather than have them sit there waiting for the power tech to arrive while my house burns. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] anti-join chosen even when slower than old plan
--- On Thu, 11/11/10, Mladen Gogala mladen.gog...@vmsinfo.com wrote: From: Mladen Gogala mladen.gog...@vmsinfo.com Subject: Re: [PERFORM] anti-join chosen even when slower than old plan To: Kenneth Marshall k...@rice.edu Cc: Robert Haas robertmh...@gmail.com, Tom Lane t...@sss.pgh.pa.us, Kevin Grittner kevin.gritt...@wicourts.gov, pgsql-performance@postgresql.org pgsql-performance@postgresql.org Date: Thursday, November 11, 2010, 9:15 AM Kenneth Marshall wrote: I agree with the goal of avoiding the need for a GUC. This needs to be as automatic as possible. One idea I had had was computing a value for the amount of cache data in the system by keeping a sum or a weighted sum of the table usage in the system. Smaller tables and indexes would contribute a smaller amount to the total, while larger indexes and tables would contribute a larger amount. Then by comparing this running total to the effective_cache_size, set the random and sequential costs for a query. This would allow the case of many 4MB tables to favor disk I/O more than memory I/O. The weighting could be a function of simultaneous users of the table. I know this is a bit of hand-waving but some sort of dynamic feedback needs to be provided to the planning process as system use increases. Regards, Ken Kenneth, you seem to be only concerned with the accuracy of the planning process, not with the plan stability. As a DBA who has to monitor real world applications, I find things like an execution plan changing with the use of the system to be my worst nightmare. The part where you say that this needs to be as automatic as possible probably means that I will not be able to do anything about it, if the optimizer, by any chance, doesn't get it right. That looks to me like an entirely wrong way to go. When application developer tunes the SQL both him and me expect that SQL to always perform that way, not to change the execution plan because the system is utilized more than it was 1 hour ago. Nobody seems to have taken my suggestion about having a parameter which would simply invent the percentage out of thin air seriously, because it's obviously not accurate. However, the planner accuracy is not the only concern. Running applications on the system usually requires plan stability. Means of external control of the execution plan, DBA knobs and buttons that can be turned and pushed to produce the desired plan are also very much desired. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com Mladen, Been there, done that with Oracle for more years than I care to remember or admit. Having the necessary knobs was both daunting and a godsend, depending on if you could find the right one(s) to frob during production use, and you turned them the right way and amount. I personally find having less knobbage with PostgreSQL to be a huge benefit over Oracle. In that spirit, I offer the following suggestion: (Ken's original suggestion inspired me, so if I misunderstand it, Ken, please correct me.) What if the code that managed the shared buffer cache kept track of how many buffers were in the cache for each table and index? Then the optimizer could know the ratio of cached to non-cached table of index buffers (how many pages are in PG's buffer cache vs. the total number of pages required for the entire table, assuming autovacuum is working well) and plan accordingly. It would even be possible to skew the estimate based on the ratio of shared_buffers to effective_cache_size. 
The optimizer could then dynamically adjust the random and sequential costs per query prior to planning, with (hopefully) plans optimized to the current condition of the server and host caches just prior to execution. There are lots of assumptions here, the primary ones being that the shared buffer cache's state doesn't change significantly between the start of planning and actual execution time, and that the host is dedicated to running the database and nothing else that would trash the host's file system cache. I admit that I haven't looked at the code for this yet, so I don't know if I'm on to something or off in the weeds.

Regards,

Bob Lunney

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
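A hedged sketch of how the cached-vs-total ratio described above can be observed today, assuming the pg_buffercache contrib module is installed; this only inspects the buffer cache state, it does not feed the planner:

CREATE EXTENSION pg_buffercache;   -- contrib module; on older releases use the SQL install script instead

-- Fraction of each relation currently sitting in shared_buffers.
SELECT c.relname,
       count(b.bufferid)                                            AS buffers_cached,
       c.relpages                                                   AS total_pages,
       round(count(b.bufferid)::numeric / NULLIF(c.relpages, 0), 2) AS cached_ratio
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
              AND b.reldatabase = (SELECT oid FROM pg_database WHERE datname = current_database())
GROUP BY c.relname, c.relpages
ORDER BY buffers_cached DESC
LIMIT 20;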
Re: [PERFORM] out of memory problem
Be sure that you are starting PostgreSQL using an account with sufficient memory limits: ulimit -m If the account has memory limit below the server's configuration you may get the out of memory error. Bob Lunney --- On Tue, 11/9/10, Till Kirchner till.kirch...@vti.bund.de wrote: From: Till Kirchner till.kirch...@vti.bund.de Subject: [PERFORM] out of memory problem To: pgsql-performance@postgresql.org Date: Tuesday, November 9, 2010, 5:39 AM Hello together, I get an out of memory problem I don't understand. The installed Postgres-Version is: PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.3-5) 4.3.3 It is running on a 32bit Debian machine with 4GB RAM. Thanks for any help in advance Till -- - Main settings are as follows: checkpoint_segments 16 checkpoint_timeout 120s effective_cache_size 128MB maintenance_work_mem 128MB max_fsm_pages 153600 shared_buffers 1GB wal_buffers 256MB work_mem 256MB -- - Used query is: CREATE TABLE temp.bwi_atkis0809_forestland AS SELECT b.gid AS bwi_gid, a.dlm0809id, a.objart_08, a.objart_09 FROM bwi.bwi_pkt AS b, atkis.atkis0809_forestland AS a WHERE b.the_geom a.the_geom AND ST_Within(b.the_geom, a.the_geom) ; COMMIT; (The JOIN is a Spatial one using PostGIS-Functions) -- - Full Table Sizes: atkis0809_forestland 2835mb bwi_pkt 47mb -- - Error Message is: FEHLER: Speicher aufgebraucht DETAIL: Fehler bei Anfrage mit Größe 32. ** Fehler ** FEHLER: Speicher aufgebraucht SQL Status:53200 Detail:Fehler bei Anfrage mit Größe 32. in english: ERROR: out of memory detail: error for request with size 32 -- - The LOG looks as follows: TopMemoryContext: 42800 total in 5 blocks; 4816 free (5 chunks); 37984 used CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used TopTransactionContext: 8192 total in 1 blocks; 5520 free (0 chunks); 2672 used Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used MessageContext: 65536 total in 4 blocks; 35960 free (10 chunks); 29576 used smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks); 128 used ExecutorState: 1833967692 total in 230 blocks; 9008 free (3 chunks); 1833958684 used GiST temporary context: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 8192 total in 1 blocks; 8176 free (9 chunks); 16 used ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used ExprContext: 8192 total in 1 blocks; 3880 free (4 chunks); 4312 used Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used CacheMemoryContext: 667472 total in 20 blocks; 195408 free (3 chunks); 472064 used pg_toast_12241534_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used pg_shdepend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used idx_atkis0809_forestland_the_geom_gist: 1024 total in 1 blocks; 136 free (0 chunks); 888 used atkis0809_forestland_pkey: 1024 total in 1 
blocks; 344 free (0 chunks); 680 used btree_bwi_pkt_enr: 1024 total in 1 blocks; 344 free (0 chunks); 680 used btree_bwi_pkt_tnr: 1024 total in 1 blocks; 344 free (0 chunks); 680 used rtree_bwi_pkt: 1024 total in 1 blocks; 136 free (0 chunks); 888 used bwi_pkt_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used pg_constraint_conrelid_index: 1024 total in 1 blocks
Re: [PERFORM] Memory usage - indexes
Tobias, Consult pg_statio_user_indexes to see which indexes have been used and how much. Indexes with comparitively low usages rates aren't helping you much and are candidates for elimination. Also, partitioning large tables can help, since the indexes on each partition are smaller than one huge index on the original table. Good luck! Bob Lunney --- On Thu, 9/23/10, Tobias Brox tobi...@gmail.com wrote: From: Tobias Brox tobi...@gmail.com Subject: [PERFORM] Memory usage - indexes To: pgsql-performance@postgresql.org Date: Thursday, September 23, 2010, 5:50 PM We've come to a tipping point with one of our database servers, it's generally quite loaded but up until recently it was handling the load well - but now we're seeing that it struggles to process all the selects fast enough. Sometimes we're observing some weird lock-like behaviour (see my other post on that), but most of the time the database server is just not capable of handling the load fast enough (causing the queries to pile up in the pg_stat_activity-view). My main hypothesis is that all the important indexes would fit snuggly into the memory before, and now they don't. We'll eventually get the server moved over to new and improved hardware, but while waiting for that to happen we need to do focus on reducing the memory footprint of the database. I have some general questions now ... 1) Are there any good ways to verify my hypothesis? Some months ago I thought of running some small memory-gobbling program on the database server just to see how much memory I could remove before we would see indications of the database being overloaded. It seems a bit radical, but I think the information learned from such an experiment would be very useful ... and we never managed to set up any testing environment that faithfully replicates production traffic. Anyway, it's sort of too late now that we're already observing performance problems even without the memory gobbling script running. 2) I've seen it discussed earlier on this list ... shared_buffers vs OS caches. Some claims that it has very little effect to adjust the size of the shared buffers. Anyway, isn't it a risk that memory is wasted because important data is stored both in the OS cache and the shared buffers? What would happen if using almost all the available memory for shared buffers? Or turn it down to a bare minimum and let the OS do almost all the cache handling? 3) We're discussing to drop some overlapping indexes ... i.e. to drop one out of two indexes looking like this: some_table(a) some_table(a,b) Would the query select * from some_table where a=? run slower if we drop the first index? Significantly? (in our situation I found that the number of distinct b's for each a is low and that the usage stats on the second index is quite low compared with the first one, so I think we'll drop the second index). 4) We're discussing to drop other indexes. Does it make sense at all as long as we're not experiencing problems with inserts/updates? I suppose that if the index isn't used it will remain on disk and won't affect the memory usage ... but what if the index is rarely used ... wouldn't it be better to do a seqscan on a table that is frequently accessed and mostly in memory than to consult an index that is stored on the disk? 
Sorry for all the stupid questions ;-) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
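A hedged sketch of the usage check suggested above; indexes with an idx_scan of (or near) zero since the last statistics reset are the candidates for elimination:

-- How often each index has been used, largest rarely-used indexes first.
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
LIMIT 25;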
Re: [PERFORM] Memory usage - indexes
Tobias, First off, what version of PostgreSQL are you running? If you have 8.4, nightly vacuuming shouldn't be necessary with properly tuned autovacuum jobs. The pertinent difference between pg_stat_user_indexes and pg_statio_user_indexes is the latter shows the number of blocks read from disk or found in the cache. You're correct, unused indexes will remain on disk, but indexes that don't completely fit into memory must be read from disk for each index scan, and that hurts performance. (In fact, it will suddenly drop like a rock. BTDT.) By making smaller equivalent indexes on partitioned data the indexes for individual partitions are more likely to stay in memory, which is particularly important when multiple passes are made over the index by a query. You are correct on all the points you make concerning indexes, but point 4 is the one I'm referring to. You discovered this independently yourself, according to your anecdote about the overlapping indexes. Bob Lunney --- On Fri, 9/24/10, Tobias Brox tobi...@gmail.com wrote: From: Tobias Brox tobi...@gmail.com Subject: Re: [PERFORM] Memory usage - indexes To: Bob Lunney bob_lun...@yahoo.com Cc: pgsql-performance@postgresql.org Date: Friday, September 24, 2010, 12:46 PM On 24 September 2010 18:23, Bob Lunney bob_lun...@yahoo.com wrote: Consult pg_statio_user_indexes to see which indexes have been used and how much. What is the main differences between pg_statio_user_indexes and pg_stat_user_indexes? Indexes with comparitively low usages rates aren't helping you much and are candidates for elimination. No doubt about that - but the question was, would it really help us to drop those indexes? I think the valid reasons for dropping indexes would be: 1) To speed up inserts, updates and deletes 2) To spend less disk space 3) Eventually, speed up nightly vacuum (it wouldn't be an issue with autovacuum though) 4) To spend less memory resources? I'm not at all concerned about 1 and 2 above - we don't have any performance issues on the write part, and we have plenty of disk capacity. We are still doing the nightly vacuum thing, and it does hurt us a bit since it's dragging ever more out in time. Anyway, it's number four I'm wondering most about - is it anything to be concerned about or not for the least frequently used indexes? An index that aren't being used would just stay on disk anyway, right? And if there are limited memory resources, the indexes that are most frequently used would fill up the cache space anyway? That's my thoughts at least - are they way off? We did have similar experiences some years ago - everything was running very fine all until one day when some semi-complicated very-frequently-run selects started taking several seconds to run rather than tens of milliseconds. I found that we had two slightly overlapping indexes like this ... account_transaction(customer_id, trans_type) account_transaction(customer_id, trans_type, created) both of those indexes where heavily used. I simply dropped the first one, and the problems disappeared. I assume that both indexes up to some point fitted snuggly into memory, but one day they were competing for the limited memory space, dropping the redundant index solved the problem all until the next hardware upgrade. I would never have found those indexes searching for the least used indexes in the pg_stat(io)_user_indexes view. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
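To illustrate the distinction drawn above, a hedged sketch against pg_statio_user_indexes; idx_blks_read counts index pages that had to be read in from outside shared buffers, idx_blks_hit counts pages already in the cache:

-- Indexes whose pages are frequently read from disk rather than found in shared buffers.
SELECT schemaname, relname, indexrelname,
       idx_blks_read, idx_blks_hit,
       round(idx_blks_hit::numeric / NULLIF(idx_blks_hit + idx_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 25;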
Re: [PERFORM] Slow Query
We need more information than that, like:What version of PostgreSQL?What does the hardware look like?What does the disk and tablespace layout look like?How are your configuration variables set?Other than that, are the statistics up to date on the VehicleMake table?Bob Lunney--- On Thu, 8/26/10, Ozer, Pam po...@automotive.com wrote:From: Ozer, Pam po...@automotive.comSubject: [PERFORM] Slow QueryTo: pgsql-performance@postgresql.orgDate: Thursday, August 26, 2010, 8:03 PM I am new to Postgres and I am trying to understand the Explain Analyze so I can tune the following query. I run the same query using mysql and it takes less than 50ms. I run it on postgres and it takes 10 seconds. I feel like I am missing something very obvious. (VehicleUsed is a big table over 750,000records) and datasetgroupyearmakemodel has 15 records. It looks like the cost is highest in the Hash Join on Postalcode. Am I reading this correctly.? I do have indexes on the lower(postalcode) in both tables. Why wouldn’t be using the index? Thanks in advance for any help. Here is my query: select distinct VehicleMake.VehicleMake from VehicleUsed inner join PostalCodeRegionCountyCity on ( lower ( VehicleUsed.PostalCode ) = lower ( PostalCodeRegionCountyCity.PostalCode ) ) INNER JOIN DATASETGROUPYEARMAKEMODEL ON ( VEHICLEUSED.VEHICLEYEAR = DATASETGROUPYEARMAKEMODEL.VEHICLEYEAR ) AND ( VEHICLEUSED.VEHICLEMAKEID = DATASETGROUPYEARMAKEMODEL.VEHICLEMAKEID ) AND ( VEHICLEUSED.VEHICLEMODELID = DATASETGROUPYEARMAKEMODEL.VEHICLEMODELID ) inner join VehicleMake on ( VehicleUsed.VehicleMakeId = VehicleMake.VehicleMakeId ) where ( DatasetGroupYearMakeModel.DatasetGroupId = 3 ) and ( VehicleUsed.DatasetId 113 ) and ( VehicleUsed.ProductGroupId 13 ) and ( PostalCodeRegionCountyCity.RegionId = 36 ) order by VehicleMake.VehicleMake limit 50 Here is the explain analyze "Limit (cost=38292.53..38293.19 rows=261 width=8) (actual time=10675.857..10675.892 rows=42 loops=1)" " - Sort (cost=38292.53..38293.19 rows=261 width=8) (actual time=10675.855..10675.868 rows=42 loops=1)" " Sort Key: vehiclemake.vehiclemake" " Sort Method: quicksort Memory: 18kB" " - HashAggregate (cost=38279.45..38282.06 rows=261 width=8) (actual time=10675.710..10675.728 rows=42 loops=1)" " - Hash Join (cost=436.31..38270.51 rows=3576 width=8) (actual time=4.471..10658.291 rows=10425 loops=1)" " Hash Cond: (vehicleused.vehiclemakeid = vehiclemake.vehiclemakeid)" " - Hash Join (cost=428.43..38213.47 rows=3576 width=4) (actual time=4.152..10639.742 rows=10425 loops=1)" " Hash Cond: (lower((vehicleused.postalcode)::text) = lower((postalcoderegioncountycity.postalcode)::text))" " - Nested Loop (cost=101.81..37776.78 rows=11887 width=10) (actual time=1.172..9876.586 rows=382528 loops=1)" " - Bitmap Heap Scan on datasetgroupyearmakemodel (cost=101.81..948.81 rows=5360 width=6) (actual time=0.988..17.800 rows=5377 loops=1)" " Recheck Cond: (datasetgroupid = 3)" " - Bitmap Index Scan on datasetgroupyearmakemodel_i04 (cost=0.00..100.47 rows=5360 width=0) (actual time=0.830..0.830 rows=5377 loops=1)" " Index Cond: (datasetgroupid = 3)" " - Index Scan using vehicleused_i10 on vehicleused (cost=0.00..6.85 rows=1 width=12) (actual time=0.049..1.775 rows=71 loops=5377)" " Index Cond: ((vehicleused.vehiclemodelid = datasetgroupyearmakemodel.vehiclemodelid) AND (vehicleused.vehiclemakeid = datasetgroupyearmakemodel.vehiclemakeid) AND (vehicleused.vehicleyear = datasetgroupyearmakemodel.vehicleyear))" " Filter: ((vehicleused.datasetid 113) AND (vehicleused.productgroupid 13))" " - Hash 
(cost=308.93..308.93 rows=1416 width=6) (actual time=2.738..2.738 rows=1435 loops=1)" " - Bitmap Heap Scan on postalcoderegioncountycity (cost=27.23..308.93 rows=1416 width=6) (actual time=0.222..0.955 rows=1435 loops=1)" " Recheck Cond: (regionid = 36)" " - Bitmap Index Scan on postalcoderegioncountycity_i05 (cost=0.00..26.87 rows=1416 width=0) (actual time=0.202..0.202 rows=1435 loops=1)" " Index Cond: (regionid = 36)" " - Hash (cost=4.61..4.61 rows=261 width=10) (actual time=0.307..0.307 rows=261 loops=1)" " - Seq Scan on vehiclemake (cost=0.00..4.61 rows=261 width=10) (actual time=0.033..0.154 rows=261 loops=1)" "Total runtime: 10676.058 ms" Pam Ozer Data Architect po...@automotive.com tel. 949.705.3468 Source Interlink Media 1733 Alton Pkwy Suite 100, Irvine, CA 92606 www.simautomotive.com Confidentiality Notice- This electronic communication, and all information herein, including files attached hereto, is private, and is the property of the sender. This commun
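A hedged sketch of gathering the details requested above; the settings query is the usual idiom against pg_settings, and the analyze check uses pg_stat_user_tables with the table names shown in the plan:

SELECT version();

-- Settings changed from their compiled-in defaults.
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override');

-- Are the statistics on the joined tables current?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('vehicleused', 'vehiclemake', 'datasetgroupyearmakemodel', 'postalcoderegioncountycity');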
Re: [PERFORM] Triggers or code?
That depends on your application's requirements. If a transaction on table X fails, do you still want the history (noting the failure)? If so, go with embedding the code in your script. If you only want history for successful transactions, a trigger will take care of that for you automatically. Bob Lunney --- On Mon, 8/23/10, DM dm.a...@gmail.com wrote: From: DM dm.a...@gmail.com Subject: [PERFORM] Triggers or code? To: pgsql-performance@postgresql.org Date: Monday, August 23, 2010, 2:42 PM Hello There, I have a table x and a history table x_hist, whats the best way to update the history table. should i need to use triggers or embed a code in my script to update the history table? what is the performance impact of a trigger versus embedding the code in the script? thanks for your time. - Deepak
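A hedged sketch of the trigger approach, since the original post gives no column definitions; the column layout, and the assumption that x_hist mirrors x plus an operation and timestamp column, are hypothetical:

-- Hypothetical history table: same columns as x, plus audit columns.
CREATE TABLE x_hist (LIKE x);
ALTER TABLE x_hist ADD COLUMN hist_op   text,
                   ADD COLUMN hist_time timestamptz DEFAULT now();

CREATE OR REPLACE FUNCTION x_history_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO x_hist SELECT OLD.*, TG_OP, now();
        RETURN OLD;
    ELSE
        INSERT INTO x_hist SELECT NEW.*, TG_OP, now();
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER x_hist_trigger
AFTER INSERT OR UPDATE OR DELETE ON x
FOR EACH ROW EXECUTE PROCEDURE x_history_trg();

With this in place every committed change to x is recorded automatically; as noted above, history for failed transactions still has to be written by the application itself.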
Re: [PERFORM] requested shared memory size overflows size_t
Tom, First off, I wouldn't use a VM if I could help it, however, sometimes you have to make compromises. With a 16 Gb machine running 64-bit Ubuntu and only PostgreSQL, I'd start by allocating 4 Gb to shared_buffers. That should leave more than enough room for the OS and file system cache. Then I'd begin testing by measuring response times of representative queries with significant amounts of data. Also, what is the disk setup for the box? Filesystem? Can WAL files have their own disk? Is the workload OLTP or OLAP, or a mixture of both? There is more that goes into tuning a PG server for good performance than simply installing the software, setting a couple of GUCs and running it. Bob --- On Thu, 6/10/10, Tom Wilcox hungry...@gmail.com wrote: From: Tom Wilcox hungry...@gmail.com Subject: Re: [PERFORM] requested shared memory size overflows size_t To: Bob Lunney bob_lun...@yahoo.com Cc: Robert Haas robertmh...@gmail.com, pgsql-performance@postgresql.org Date: Thursday, June 10, 2010, 10:45 AM Thanks guys. I am currently installing Pg64 onto a Ubuntu Server 64-bit installation running as a VM in VirtualBox with 16GB of RAM accessible. If what you say is true then what do you suggest I do to configure my new setup to best use the available 16GB (96GB and native install eventually if the test goes well) of RAM on Linux. I was considering starting by using Enterprise DBs tuner to see if that optimises things to a better quality.. Tom On 10/06/2010 15:41, Bob Lunney wrote: True, plus there are the other issues of increased checkpoint times and I/O, bgwriter tuning, etc. It may be better to let the OS cache the files and size shared_buffers to a smaller value. Bob Lunney --- On Wed, 6/9/10, Robert Haasrobertmh...@gmail.com wrote: From: Robert Haasrobertmh...@gmail.com Subject: Re: [PERFORM] requested shared memory size overflows size_t To: Bob Lunneybob_lun...@yahoo.com Cc: pgsql-performance@postgresql.org, Tom Wilcoxhungry...@googlemail.com Date: Wednesday, June 9, 2010, 9:49 PM On Wed, Jun 2, 2010 at 9:26 PM, Bob Lunneybob_lun...@yahoo.com wrote: Your other option, of course, is a nice 64-bit linux variant, which won't have this problem at all. Although, even there, I think I've heard that after 10GB you don't get much benefit from raising it further. Not sure if that's accurate or not... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query about index usage
Jayadevan, PostgreSQL must go to the table to determine if the row you are requesting is visible to your transaction. This is an artifact of the MVCC implementation. Oracle can fetch the data from the index, since it doesn't keep multiple representations of the rows, but it may need to check the undo logs to determine the state that applies to your transaction. Its just two different ways to accomplish the same thing. Bob Lunney --- On Fri, 6/11/10, Jayadevan M jayadevan.maym...@ibsplc.com wrote: From: Jayadevan M jayadevan.maym...@ibsplc.com Subject: [PERFORM] Query about index usage To: pgsql-performance@postgresql.org Date: Friday, June 11, 2010, 5:56 AM Hello all, One query about PostgreSQL's index usage. If I select just one column on which there is an index (or select only columns on which there is an index), and the index is used by PostgreSQL, does PostgreSQL avoid table access if possible? I am trying to understand the differences between Oracle's data access patterns and PostgreSQL's. Here is how it works in Oracle. Case 1 - SELECT column which is not there in the index SQL select name from myt where id = 13890; NAME --- Execution Plan -- Plan hash value: 2609414407 - | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | - | 0 | SELECT STATEMENT | | 1 | 65 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| MYT | 1 | 65 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | MYIDX | 1 | | 1 (0)| 00:00:01 | - Predicate Information (identified by operation id): --- 2 - access(ID=13890) Note - - dynamic sampling used for this statement Statistics -- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 409 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Case 1 - SELECT column which is there in the index SQL select id from myt where id = 13890; ID -- 13890 Execution Plan -- Plan hash value: 2555454399 -- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -- | 0 | SELECT STATEMENT | | 1 | 13 | 1 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| MYIDX | 1 | 13 | 1 (0)| 00:00:01 | -- Predicate Information (identified by operation id): --- 1 - access(ID=13890) Note - - dynamic sampling used for this statement Statistics -- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 407 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed In the second query where id was selected, the table was not used at all. In PosgreSQL, explain gives me similar output in both cases. Table structure - postgres=# \d myt Table public.myt Column | Type | Modifiers +---+--- id | integer | name | character varying(20) | Indexes: myidx btree (id) Regards, Jayadevan DISCLAIMER: The information in this e-mail and any attachment is intended only for the person to whom it is addressed and may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contact the sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guarantees the accuracy, adequacy or completeness of the information contained in this email or any attachment and is not liable
Re: [PERFORM] requested shared memory size overflows size_t
True, plus there are the other issues of increased checkpoint times and I/O, bgwriter tuning, etc. It may be better to let the OS cache the files and size shared_buffers to a smaller value. Bob Lunney --- On Wed, 6/9/10, Robert Haas robertmh...@gmail.com wrote: From: Robert Haas robertmh...@gmail.com Subject: Re: [PERFORM] requested shared memory size overflows size_t To: Bob Lunney bob_lun...@yahoo.com Cc: pgsql-performance@postgresql.org, Tom Wilcox hungry...@googlemail.com Date: Wednesday, June 9, 2010, 9:49 PM On Wed, Jun 2, 2010 at 9:26 PM, Bob Lunney bob_lun...@yahoo.com wrote: Your other option, of course, is a nice 64-bit linux variant, which won't have this problem at all. Although, even there, I think I've heard that after 10GB you don't get much benefit from raising it further. Not sure if that's accurate or not... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] requested shared memory size overflows size_t
Tom, A 32 bit build could only reference at most 4 Gb - certainly not 60 Gb. Also, Windows doesn't do well with large shared buffer sizes anyway. Try setting shared_buffers to 2 Gb and let the OS file system cache handle the rest. Your other option, of course, is a nice 64-bit linux variant, which won't have this problem at all. Good luck! Bob Lunney --- On Wed, 6/2/10, Tom Wilcox hungry...@googlemail.com wrote: From: Tom Wilcox hungry...@googlemail.com Subject: Re: [PERFORM] requested shared memory size overflows size_t To: pgsql-performance@postgresql.org Date: Wednesday, June 2, 2010, 6:58 AM Hi, Sorry to revive an old thread but I have had this error whilst trying to configure my 32-bit build of postgres to run on a 64-bit Windows Server 2008 machine with 96GB of RAM (that I would very much like to use with postgres). I am getting: 2010-06-02 11:34:09 BSTFATAL: requested shared memory size overflows size_t 2010-06-02 11:41:01 BSTFATAL: could not create shared memory segment: 8 2010-06-02 11:41:01 BSTDETAIL: Failed system call was MapViewOfFileEx. which makes a lot of sense since I was setting shared_buffers (and effective_cache_size) to values like 60GB.. Is it possible to get postgres to make use of the available 96GB RAM on a Windows 32-bit build? Otherwise, how can I get it to work? Im guessing my options are: - Use the 64-bit Linux build (Not a viable option for me - unless from a VM - in which case recommendations?) or - Configure Windows and postgres properly (Preferred option - but I don't know what needs to be done here or if Im testing properly using Resource Monitor) Thanks, Tom -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
Jori, What is the PostgreSQL version/shared_buffers/work_mem/effective_cache_size/default_statistics_target? Are the statistics for the table up to date? (Run analyze verbose tablename to update them.) Table and index structure would be nice to know, too. If all else fails you can set enable_seqscan = off for the session, but that is a Big Hammer for what is probably a smaller problem. Bob Lunney --- On Wed, 6/2/10, Jori Jovanovich j...@dimensiology.com wrote: From: Jori Jovanovich j...@dimensiology.com Subject: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present To: pgsql-performance@postgresql.org Date: Wednesday, June 2, 2010, 4:28 PM hi, I have a problem space where the main goal is to search backward in time for events. Time can go back very far into the past, and so the table can get quite large. However, the vast majority of queries are all satisfied by relatively recent data. I have an index on the row creation date and I would like almost all of my queries to have a query plan looking something like:
Limit ...
  ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
        ...
However, PostgreSQL frequently tries to do a full table scan. Often what controls whether a scan is performed or not is dependent on the size of the LIMIT and how detailed the WHERE clause is. In practice, the scan is always the wrong answer for my use cases (where always is defined to be 99.9%). Some examples:
(1) A sample query that devolves to a full table scan
EXPLAIN SELECT events.id, events.client_duration, events.message, events.created_by, events.source, events.type, events.event, events.environment, events.server_timestamp, events.session_id, events.reference, events.client_uuid FROM events WHERE client_uuid ~* E'^foo bar so what' ORDER BY server_timestamp DESC LIMIT 20;
QUERY PLAN (BAD!)
Limit  (cost=363278.56..363278.61 rows=20 width=177)
  ->  Sort  (cost=363278.56..363278.62 rows=24 width=177)
        Sort Key: server_timestamp
        ->  Seq Scan on events  (cost=0.00..363278.01 rows=24 width=177)
              Filter: (client_uuid ~* '^foo bar so what'::text)
(2) Making the query faster by making the string match LESS specific (odd, seems like it should be MORE)
EXPLAIN SELECT events.id, events.client_duration, events.message, events.created_by, events.source, events.type, events.event, events.environment, events.server_timestamp, events.session_id, events.reference, events.client_uuid FROM events WHERE client_uuid ~* E'^foo' ORDER BY server_timestamp DESC LIMIT 20;
QUERY PLAN (GOOD!)
Limit  (cost=0.00..1433.14 rows=20 width=177)
  ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
        Filter: (client_uuid ~* '^foo'::text)
(3) Alternatively making the query faster by using a smaller limit
EXPLAIN SELECT events.id, events.client_duration, events.message, events.created_by, events.source, events.type, events.event, events.environment, events.server_timestamp, events.session_id, events.reference, events.client_uuid FROM events WHERE client_uuid ~* E'^foo bar so what' ORDER BY server_timestamp DESC LIMIT 10;
QUERY PLAN (GOOD!)
Limit  (cost=0.00..259606.63 rows=10 width=177)
  ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=24 width=177)
        Filter: (client_uuid ~* '^foo bar so what'::text)
I find myself wishing I could just put a SQL HINT on the query to force the index to be used but I understand that HINTs are considered harmful and are therefore not provided for PostgreSQL, so what is the recommended way to solve this? thank you very much
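For reference, a minimal sketch of the "Big Hammer" Bob mentions, applied to the first example query (column list abridged); the setting is session-local, so other connections keep the default planner behaviour:

SET enable_seqscan = off;   -- discourage sequential scans for this session only
SELECT events.id, events.server_timestamp, events.client_uuid
FROM events
WHERE client_uuid ~* E'^foo bar so what'
ORDER BY server_timestamp DESC
LIMIT 20;
RESET enable_seqscan;       -- restore the default afterwards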
Re: [PERFORM] Slow Bulk Delete
Thilo, Just a few thoughts off the top of my head:
1. If you know the ids of the rows you want to delete beforehand, insert them in a table, then run the delete based on a join with this table.
2. Better yet, insert the ids into a table using COPY, then use a join to create a new table with the rows you want to keep from the first table. Drop the original source table, truncate the id table, rename the copied table and add indexes and constraints.
3. See if you can partition the table somehow so the rows you want to delete are in a single partitioned child table. When it's time to delete them just drop the child table.
Of course, if the 1M rows you need to delete is very small compared to the total overall size of the original table the first two techniques might not buy you anything, but it's worth a try. Good luck! Bob Lunney --- On Sat, 5/8/10, thilo thilo.tan...@gmail.com wrote: From: thilo thilo.tan...@gmail.com Subject: [PERFORM] Slow Bulk Delete To: pgsql-performance@postgresql.org Date: Saturday, May 8, 2010, 7:39 AM Hi all! We moved from MySQL to Postgresql for some of our projects. So far we're very impressed with the performance (especially INSERTs and UPDATEs), except for a strange problem with the following bulk delete query: DELETE FROM table1 WHERE table2_id = ? I went through these Wiki pages, trying to solve the problem: http://wiki.postgresql.org/wiki/SlowQueryQuestions and http://wiki.postgresql.org/wiki/Performance_Optimization but unfortunately without much luck. Our application is doing batch jobs. On every batch run, we must delete approx. 1M rows in table1 and recreate these entries. The inserts are very fast, but deletes are not. We cannot make updates, because there's no identifying property in the objects of table1. This is what EXPLAIN is telling me:
EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
QUERY PLAN
Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6) (actual time=0.111..0.154 rows=4 loops=1)
  Index Cond: (table2_id = 11242939)
Total runtime: 0.421 ms
(3 rows)
This seems to be very fast (using the index), but running this query from JDBC takes up to 20ms each. For 1M rows this sums up to several hours. When I have a look at pg_top, psql uses most of the time for the deletes. CPU usage is 100% (for the core used by postgresql). So it seems that postgresql is doing some sequential scanning or constraint checks. This is the table structure:
id bigint (primary key)
table2_id bigint (foreign key constraint to table 2, *indexed*)
table3_id bigint (foreign key constraint to table 3, *indexed*)
some non-referenced text and boolean fields
My server settings (PostgreSQL 8.4.2):
shared_buffers = 1024MB
effective_cache_size = 2048MB
work_mem = 128MB
wal_buffers = 64MB
checkpoint_segments = 32
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
It would be very nice to give me a hint to solve the problem. It drives me crazy ;-) If you need more details please feel free to ask! Thanks in advance for your help! Kind regards Thilo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
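For what it's worth, a rough sketch of Bob's first suggestion using the table1/table2_id names from the thread; the staging table name and the COPY source are illustrative assumptions:

BEGIN;
CREATE TEMP TABLE ids_to_delete (table2_id bigint PRIMARY KEY);
COPY ids_to_delete FROM STDIN;              -- or batched INSERTs from the application
DELETE FROM table1
  USING ids_to_delete d
  WHERE table1.table2_id = d.table2_id;     -- one set-based delete instead of ~1M single-row deletes
COMMIT;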
Re: [PERFORM] pg_dump far too slow
If you have a multi-processor machine (more than 2) you could look into pigz, which is a parallelized implementation of gzip. I've gotten dramatic reductions in wall time using it to zip dump files. The compressed file is readable by gunzip. Bob Lunney From: Dave Crooke dcro...@gmail.com Subject: Re: [PERFORM] pg_dump far too slow To: David Newall postgre...@davidnewall.com Cc: Tom Lane t...@sss.pgh.pa.us, pgsql-performance@postgresql.org, robertmh...@gmail.com Date: Sunday, March 21, 2010, 10:33 AM One more from me: if you think that the pipe to GZIP may be causing pg_dump to stall, try putting something like buffer(1) in the pipeline ... it doesn't generally come with Linux, but you can download source or create your own very easily ... all it needs to do is asynchronously poll stdin and write stdout. I wrote one in Perl when I used to do a lot of digital video hacking, and it helped with chaining together tools like mplayer and mpeg. However, my money says that Tom's point about it being (disk) I/O bound is correct :-) Cheers Dave On Sun, Mar 21, 2010 at 8:17 AM, David Newall postgre...@davidnewall.com wrote: Thanks for all of the suggestions, guys, which gave me some pointers on new directions to look, and I learned some interesting things. The first interesting thing was that piping (uncompressed) pg_dump into gzip, instead of using pg_dump's internal compressor, does bring a lot of extra parallelism into play. (Thank you, Matthew Wakeling.) I observed gzip using 100% CPU, as expected, and also two, count them, two postgres processes collecting data, each consuming a further 80% CPU. It seemed to me that Postgres was starting and stopping these to match the capacity of the consumer (i.e. pg_dump and gzip.) Very nice. Unfortunately one of these processes dropped eventually, and, according to top, the only non-idle process running was gzip (100%.) Obviously there were postgres and pg_dump processes, too, but they were throttled by gzip's rate of output and effectively idle (less than 1% CPU). That is also interesting. The final output from gzip was being produced at the rate of about 0.5MB/second, which seems almost unbelievably slow. I next tried Tom Lane's suggestion, COPY WITH BINARY, which produced the complete 34GB file in 30 minutes (a good result.) I then compressed that with gzip, which took an hour and reduced the file to 32GB (hardly worth the effort) for a total run time of 90 minutes. In that instance, gzip produced output at the rate of 10MB/second, so I tried pg_dump -Z0 to see how quickly that would dump the file. I had the idea that I'd go on to see how quickly gzip would compress it, but unfortunately it filled my disk before finishing (87GB at that point), so there's something worth knowing: pg_dump's output for binary data is very much less compact than COPY WITH BINARY; all those backslashes, as Tom pointed out. For the aforementioned reason, I didn't get to see how gzip would perform. For the record, pg_dump with no compression produced output at the rate of 26MB/second; a rather meaningless number given the 200%+ expansion of final output. I am now confident the performance problem is from gzip, not Postgres, and wonder if I should read up on gzip to find why it would work so slowly on a pure text stream, albeit a representation of PDF which intrinsically is fairly compressed.
Given the spectacular job that postgres did in adjusting its rate of output to match the consumer process, I did wonder if there might have been a tragic interaction between postgres and gzip; perhaps postgres limits its rate of output to match gzip; and gzip tries to compress what's available, that being only a few bytes; and perhaps that might be so inefficient that it hogs the CPU; but I don't think that likely. I had a peek at gzip's source (surprisingly readable) and on first blush it does seem that unfortunate input could result in only a few bytes being written each time through the loop, meaning only a few more bytes could be read in. Just to complete the report, I created a child table to hold the PDFs, which are static, and took a dump of just that table, and adjusted my backup command to exclude it. Total size of the compressed backup sans PDFs was circa 7MB, taking around 30 seconds.
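For the record, a minimal sketch of the pigz pipeline Bob suggests at the top of this thread; the database name, core count and file name are assumptions, not from the thread:

pg_dump -Z0 mydb | pigz -p 8 > mydb.sql.gz     # uncompressed pg_dump output, compressed in parallel on 8 cores
gunzip -c mydb.sql.gz | psql mydb_restored     # the result is an ordinary gzip file, readable by gunzip/zcat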
Re: [PERFORM] Block at a time ...
Greg is correct, as usual. Geometric growth of files is A Bad Thing in an Oracle DBA's world, since you can unexpectedly (automatically?) run out of file system space when the database determines it needs x% more extents than last time. The concept of contiguous extents, however, has some merit, particularly when restoring databases. Prior to parallel restore, a table's files were created and extended in roughly contiguous allocations, presuming there was no other activity on your database disks. (You do dedicate disks, don't you?) When using 8-way parallel restore against a six-disk RAID 10 group I found that table and index scan performance dropped by about 10x. I/O performance was restored by either clustering the tables one at a time, or by dropping and restoring them one at a time. The only reason I can come up with for this behavior is file fragmentation and increased seek times. If PostgreSQL had a mechanism to pre-allocate files prior to restoring the database that might mitigate the problem. Then if we could only get parallel index operations ... Bob Lunney --- On Wed, 3/17/10, Greg Stark gsst...@mit.edu wrote: From: Greg Stark gsst...@mit.edu Subject: Re: [PERFORM] Block at a time ... To: Pierre C li...@peufeu.com Cc: Alvaro Herrera alvhe...@commandprompt.com, Dave Crooke dcro...@gmail.com, pgsql-performance@postgresql.org Date: Wednesday, March 17, 2010, 5:52 AM On Wed, Mar 17, 2010 at 7:32 AM, Pierre C li...@peufeu.com wrote: I was thinking of something like that, except that the factor I'd use would be something like 50% or 100% of current size, capped at (say) 1 GB. This turns out to be a bad idea. One of the first things Oracle DBAs are told to do is change this default setting to allocate some reasonably large fixed size rather than scaling upwards. This might be mostly due to Oracle's extent-based space management but I'm not so sure. Recall that the filesystem is probably doing some rounding itself. If you allocate 120kB it's probably allocating 128kB itself anyways. Having two layers rounding up will result in odd behaviour. In any case I was planning on doing this a while back. Then I ran some experiments and couldn't actually demonstrate any problem. ext2 seems to do a perfectly reasonable job of avoiding this problem. All the files were mostly large contiguous blocks after running some tests -- IIRC running pgbench. Using fallocate()? I think we need posix_fallocate(). -- greg -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
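A minimal sketch of the re-clustering workaround Bob describes above; the table and index names are hypothetical:

-- CLUSTER rewrites the table in index order, leaving its files in roughly contiguous allocations,
-- at the cost of an exclusive lock and enough free disk space for a full copy of the table.
CLUSTER big_table USING big_table_pkey;
ANALYZE big_table;   -- refresh planner statistics after the rewrite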
Re: [PERFORM] sub-select in IN clause results in sequential scan
Try replacing the "current_timestamp - interval '8 days'" portion with explicit values (e.g. partitioned_column > '2009-10-21'::date) and see if that works. I think the query planner can only use explicit values to determine if it should go straight to partitioned tables. Bob --- On Thu, 10/29/09, Anj Adu fotogra...@gmail.com wrote: From: Anj Adu fotogra...@gmail.com Subject: Re: [PERFORM] sub-select in IN clause results in sequential scan To: Angayarkanni kangayarka...@gmail.com Cc: Grzegorz Jaśkiewicz gryz...@gmail.com, pgsql-performance@postgresql.org Date: Thursday, October 29, 2009, 10:10 AM Join did not help. A sequential scan is still being done. The hardcoded value in the IN clause performs the best. The time difference is more than an order of magnitude. 2009/10/29 Angayarkanni kangayarka...@gmail.com: 2009/10/29 Grzegorz Jaśkiewicz gryz...@gmail.com On Wed, Oct 28, 2009 at 6:13 PM, Anj Adu fotogra...@gmail.com wrote: Postgres consistently does a sequential scan on the child partitions for this query:
select * from partitioned_table where partitioned_column > current_timestamp - interval '8 days' and x in (select yy from z where colname like 'aaa%')
If I replace the query with:
select * from partitioned_table where partitioned_column > current_timestamp - interval '8 days' and x in (hardcode_value)
The results are in line with expectation (very fast and uses a Bitmap Index Scan on the column X). use JOIN luke.. -- GJ Yes you try by using Join JAK -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
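A sketch of the rewrite Bob suggests, using the names from the original query; the literal date is only an example:

SELECT *
FROM partitioned_table
WHERE partitioned_column > '2009-10-21'::date   -- explicit constant, so constraint exclusion can skip old child partitions
  AND x IN (SELECT yy FROM z WHERE colname LIKE 'aaa%');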
Re: [PERFORM] Insert performance and multi-column index order
Greg, Thanks for the mental prod! Yes, the original data is more closely sorted by the timestamptz column, since they represent events coming into the collection system in real time. As for the distribution of data values, it goes without saying the timestamptz value is monotonically increasing, with roughly 1300 entries having the same timestamptz value. The other three columns' values are essentially reference data, with 400 values for the varchar, 680 for the first text column, and 60 for the second text column. The distribution is fairly even, with some small spikes but nothing significant. The duh moment came for me when you pointed out the implicit sort order of the data. After resorting the data into the new index column order the insert performance was largely restored. I didn't monitor the process with vmstat, however - the end result is good enough for me. I believe that the index maintenance of page splitting, etc., that you describe below was exactly the culprit, and that presorting the data solved that problem. I call it my duh moment since I've presorted data for Sybase and Oracle for exactly the same reason, but forgot to apply the lesson to PostgreSQL. BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it matters. Thanks for the help, Greg and Tom! --- On Sat, 6/27/09, Greg Smith gsm...@gregsmith.com wrote: From: Greg Smith gsm...@gregsmith.com Subject: Re: [PERFORM] Insert performance and multi-column index order To: bob_lun...@yahoo.com Cc: pgsql-performance@postgresql.org Date: Saturday, June 27, 2009, 1:08 AM On Fri, 26 Jun 2009, bob_lun...@yahoo.com wrote: The original unique index was in the order (timestamptz, varchar, text, text) and most queries against it were slow. I changed the index order to (varchar, text, timestamptz, text) and queries now fly, but loading data (via copy from stdin) in the table is 2-4 times slower. Is the input data closer to being sorted by the timestamptz field than the varchar field? What you might be seeing is that the working set of index pages needed to keep building the varchar index are bigger or have more of a random access component to them as they spill in and out of the buffer cache. Usually you can get a better idea what the difference is by comparing the output from vmstat while the two are loading. More random read/write requests in the mix will increase the waiting for I/O percentage while not increasing the total amount read/written per second. -- * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
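For what it's worth, a rough sketch of the presorting fix described above; the table and column names are hypothetical, since the thread doesn't give them. The point is simply to load rows in the same order as the new (varchar, text, timestamptz, text) unique index so its pages fill sequentially:

CREATE TEMP TABLE events_stage (LIKE events INCLUDING DEFAULTS);
COPY events_stage FROM STDIN;                          -- stage the raw feed, order unknown
INSERT INTO events
SELECT * FROM events_stage
ORDER BY source_name, category, event_ts, detail;      -- matches the new index column order
DROP TABLE events_stage;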