Re: [PERFORM] Savepoints in transactions for speed?

2012-11-27 Thread Bob Lunney
Mike,

Is there anything that the 1.5 million rows have in common that would allow you 
to use partitions?  If so, you could load the new data into a partition at your 
leisure, then start a transaction, alter the partition holding the old data so 
it no longer inherits from the parent, alter the new partition to inherit from 
the parent, commit, and finally drop the old table.  This operation would be 
very fast; the users probably won't even notice.
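
A rough sketch of that swap (all table names here are illustrative; big_table
is the inheritance parent and big_table_old is the partition holding the rows
being replaced):

  -- load the replacement rows at leisure, outside the critical transaction
  CREATE TABLE big_table_new (LIKE big_table INCLUDING ALL);
  -- ... COPY or INSERT the 1.5 million rows into big_table_new ...

  BEGIN;
  ALTER TABLE big_table_old NO INHERIT big_table;
  ALTER TABLE big_table_new INHERIT big_table;
  COMMIT;

  DROP TABLE big_table_old;

The swap itself only touches catalog entries, which is why it is so quick.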

Bob Lunney

On Nov 27, 2012, at 4:04 PM, Mike Blackwell mike.blackw...@rrd.com wrote:

 I need to delete about 1.5 million records from a table and reload it in one 
 transaction.  The usual advice when loading with inserts seems to be group 
 them into transactions of around 1k records.  Committing at that point would 
 leave the table in an inconsistent state.  Would issuing a savepoint every 1k 
 or so records negate whatever downside there is to keeping a transaction open 
 for all 1.5 million records, or just add more overhead?
 
 The data to reload the table is coming from a Perl DBI connection to a 
 different database (not PostgreSQL) so I'm not sure the COPY alternative 
 applies here.
 
 Any suggestions are welcome.
 
 
 Mike


Re: [PERFORM] SELECT AND AGG huge tables

2012-10-15 Thread Bob Lunney
Houman,

Partition by date and revise your processes to create and load a new child 
table every day.  Since you already know the date, append it to the table base 
name and go straight to the data you need.  Also, the index on T.c won't help 
for this query; you're looking at a full table scan every time.  
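
A minimal sketch of that layout, using the columns from the query below (the
names and date are illustrative):

  -- one child per day, created and loaded by the daily process
  CREATE TABLE t_20121015 (
      CHECK (creation_date = DATE '2012-10-15')
  ) INHERITS (t);

  -- the report then reads only the day it needs
  SELECT c, SUM(x), SUM(y), SUM(z), AVG(a), AVG(b)
    FROM t_20121015
   GROUP BY c;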

Bob

Sent from my iPhone

On Oct 15, 2012, at 3:59 PM, houmanb hou...@gmx.at wrote:

 Dear all,
 We have a DB containing transactional data. 
 There are about *50* to *100 x 10^6* rows in one *huge* table.
 We are using postgres 9.1.6 on Linux with an *SSD card on PCIe* giving us
 constant seek times.
 
 A typical select (see below) takes about 200 secs. As the database is the
 backend for a web-based reporting facility, response times of 200 to 500
 seconds or more are not acceptable to the customer.
 
 Is there any way to speed up select statements like this:
 
 SELECT
   SUM(T.x),
   SUM(T.y),
   SUM(T.z),
   AVG(T.a),
   AVG(T.b)
 FROM T
 WHERE 
   T.creation_date=$SOME_DATE
 GROUP BY 
   T.c;
 
 There is an index on T.c. But would it help to partition the table by T.c?
 It should be mentioned that T.c is actually a foreign key to a table
 containing a tiny number of rows (15 rows representing different companies).
 My postgresql.conf is actually the default one, apart from the fact that we
 increased the value of work_mem to 128MB.
 
 Thanks in advance
 Houman
 
 
 
 
 
 




Re: [PERFORM] database slowdown while a lot of inserts occur

2012-03-31 Thread Bob Lunney
Tomas,

You are correct.  I was assuming that each insert was issued as an implicit 
transaction, without the benefit of an explicit BEGIN/COMMIT batching many of 
them together, as I've seen countless times in tight loops trying to pose as a 
batch insert.

Bob




 From: Tomas Vondra t...@fuzzy.cz
To: pgsql-performance@postgresql.org 
Sent: Friday, March 30, 2012 8:11 PM
Subject: Re: [PERFORM] database slowdown while a lot of inserts occur
 
On 29.3.2012 21:27, Bob Lunney wrote:
 Lance,
 
 Many small inserts cause frequent fsyncs.  Is there any way those small
 inserts can be batched into some larger sets of inserts that use COPY to
 perform the load?

Not necessarily - fsync happens at COMMIT time, not when the INSERT is
performed (unless each INSERT stands on its own).

Tomas


Re: [PERFORM] database slowdown while a lot of inserts occur

2012-03-29 Thread Bob Lunney
Lance,

Many small inserts cause frequent fsyncs.  Is there any way those small inserts 
can be batched into some larger sets of inserts that use COPY to perform the 
load?
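
A sketch of the difference, with an illustrative table:

  -- thousands of these, each in its own implicit transaction,
  -- means thousands of fsyncs:
  INSERT INTO busy_table (a, b) VALUES (1, 'x');

  -- batching them amortizes the commit cost over the whole set:
  BEGIN;
  INSERT INTO busy_table (a, b) VALUES (1, 'x');
  INSERT INTO busy_table (a, b) VALUES (2, 'y');
  -- ... many more rows ...
  COMMIT;

  -- and COPY is cheaper still for bulk loads:
  COPY busy_table (a, b) FROM STDIN;
  -- data rows follow, terminated by \.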

Bob Lunney



 From: Campbell, Lance la...@illinois.edu
To: Campbell, Lance la...@illinois.edu; pgsql-performance@postgresql.org 
pgsql-performance@postgresql.org 
Sent: Thursday, March 29, 2012 1:02 PM
Subject: Re: [PERFORM] database slowdown while a lot of inserts occur
 

 
I forgot to mention that the slowdown for other applications occurs in 
particular when they are trying to insert into or update tables unrelated to 
the application, mentioned in my prior email, that does the massive small 
inserts.
 
 
Thanks,
 
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382
 
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Campbell, Lance
Sent: Thursday, March 29, 2012 12:59 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] database slowdown while a lot of inserts occur
 
PostgreSQL 9.0.x
We have around ten different applications that use the same database.  When one 
particular application is active it does an enormous number of inserts.  Each 
insert is very small.  During this time the database seems to slow down in 
general.  The application in question is inserting into a particular table that 
is not used by the other applications.
 
1)  What should I do to confirm that the database is the issue and not the 
applications?
2)  How can I identify where the bottleneck is occurring if the issue 
happens to be with the database?
 
I have been using PostgreSQL for eight years.  It is an amazing database.
 
Thanks,
 
Lance Campbell
Software Architect
Web Services at Public Affairs
217-333-0382

Re: [PERFORM] From Simple to Complex

2012-02-01 Thread Bob Lunney
Possibly.  What does

  psql=> show work_mem;

say?
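
If it comes back small, a session-level bump is a cheap experiment (the value
below is only illustrative):

  SHOW work_mem;
  SET work_mem = '256MB';   -- affects this session only
  -- then re-run the EXPLAIN ANALYZE and see whether the sort or hash
  -- still spills to disk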

Bob Lunney



 From: Alessandro Gagliardi alessan...@path.com
To: pgsql-performance@postgresql.org 
Sent: Wednesday, February 1, 2012 12:19 PM
Subject: Re: [PERFORM] From Simple to Complex
 

Final update on this thread: since it is only necessary for me to get a rough 
ratio of the distribution (and not the absolute count), I refactored the query 
to include a subquery that samples from the moments table thus: SELECT 
moment_id, block_id FROM moments WHERE inserted BETWEEN 'yesterday' AND 
'today' ORDER BY RANDOM() LIMIT 1; I also took advantage of another table 
called blocks that happens to contain the moment_type as well (thus making it 
so I don't need to reference pg_class). The final query looks like:

SELECT moment_type, emotion, COUNT(feedback_id) 
  FROM (SELECT moment_id, block_id 
          FROM moments 
         WHERE inserted BETWEEN 'yesterday' AND 'today' 
         ORDER BY RANDOM() LIMIT 1) AS sample_moments
  JOIN blocks USING (block_id)
  JOIN emotions USING (moment_id)
 GROUP BY moment_type, emotion
 ORDER BY moment_type, emotion

The explain is at http://explain.depesz.com/s/lYh

Interestingly, increasing the limit does not seem to increase the runtime in a 
linear fashion. When I run it with a limit of 6 I get a runtime of 14991 
ms. But if I run it with a limit of 7 I get a runtime of 77744 ms. I assume 
that that's because I'm hitting a memory limit and paging out. Is that right?

On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi alessan...@path.com 
wrote:

I just got a pointer on presenting EXPLAIN ANALYZE in a more human friendly 
fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S


From this it looks like the bottleneck happens when Postgres does an Index 
Scan using emotions_moment_id_idx on emotions before filtering on 
moments.inserted, so I thought I'd try filtering on emotions.inserted instead, 
but that only made it worse. At the same time, I noticed that FROM pg_class, 
moments WHERE moments.tableoid = pg_class.oid tends to run a bit faster than 
FROM pg_class JOIN moments ON moments.tableoid = pg_class.oid. So I tried:


SELECT relname, emotion, COUNT(feedback_id) 
  FROM pg_class, moments, emotions
 WHERE moments.tableoid = pg_class.oid 
   AND emotions.inserted  'yesterday' 
   AND moments.inserted BETWEEN 'yesterday' AND 'today' 
   AND emotions.moment_id = moments.moment_id
 GROUP BY relname, emotion 
 ORDER BY relname, emotion;


That was a bit faster, but still very slow. Here's the 
EXPLAIN: http://explain.depesz.com/s/ZdF


On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi alessan...@path.com 
wrote:

I changed the query a bit so the results would not change over the
course of the day to:


SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND
moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;


Re: [PERFORM] Response time increases over time

2011-12-08 Thread Bob Lunney
Otto,

Separate the pg_xlog directory onto its own filesystem and retry your tests.

Bob Lunney



 From: Havasvölgyi Ottó havasvolgyi.o...@gmail.com
To: Marti Raudsepp ma...@juffo.org 
Cc: Aidan Van Dyk ai...@highrise.ca; pgsql-performance@postgresql.org 
Sent: Thursday, December 8, 2011 9:48 AM
Subject: Re: [PERFORM] Response time increases over time
 

I have moved the data directory (xlog, base, global, and everything) to an ext4 
file system. The result hasn't changed, unfortunately. With the same load test 
the average response time is 80 ms; everything from 40 ms to 120 ms occurs.
This ext4 has default settings in fstab.
Have you got any other idea what is going on here?

Thanks,
Otto





2011/12/8 Marti Raudsepp ma...@juffo.org

On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk ai...@highrise.ca wrote:
 Let me guess, debian squeeze, with data and xlog on both on a single
 ext3 filesystem, and the fsync done by your commit (xlog) is flushing
 all the dirty data of the entire filesystem (including PG data writes)
 out before it can return...

This is fixed with the data=writeback mount option, right?
(If it's the root file system, you need to add
rootflags=data=writeback to your kernel boot flags)

While this setting is safe and recommended for PostgreSQL and other
transactional databases, it can cause garbage to appear in recently
written files after a crash/power loss -- for applications that don't
correctly fsync data to disk.

Regards,
Marti


Re: [PERFORM] poor execution plan because column dependence

2011-04-12 Thread Bob Lunney
Zito,

Using psql, log in as the database owner and run ANALYZE VERBOSE.  Happiness 
will ensue.

Also, when requesting help with a query it's important to state the database 
version (select version();) and what, if any, configuration changes you have 
made in postgresql.conf.  Listing only the ones that have changed is sufficient.
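
Roughly, something along these lines (a sketch):

  ANALYZE VERBOSE;        -- run as the database owner
  SELECT version();
  SELECT name, setting
    FROM pg_settings
   WHERE source = 'configuration file';   -- values coming from postgresql.conf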

Finally, the wiki has some good information on the care and feeding of a 
PostgreSQL database:

http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT



Bob Lunney

--- On Tue, 4/12/11, Václav Ovsík vaclav.ov...@i.cz wrote:

 From: Václav Ovsík vaclav.ov...@i.cz
 Subject: [PERFORM] poor execution plan because column dependence
 To: pgsql-performance@postgresql.org
 Date: Tuesday, April 12, 2011, 7:23 PM
 Hi,
 I have done migration of the Request Tracker 3.8.9
 (http://requesttracker.wikia.com/wiki/HomePage) from
 Mysql to
 PostgreSQL in testing environment.
 The RT schema used can be viewed at
 https://github.com/bestpractical/rt/blob/3.8-trunk/etc/schema.Pg.
 I have added full text search on the Attachments table based on trigrams
 (and am still experimenting with it), but it is not relevant to the
 problem (the problem is not caused by it directly).
 The full text search alone works quite well. A user testing a new RT
 instance reported poor performance with a slightly more complex query
 (more conditions resulting in table joins).
 Queries are constructed by module DBIx::SearchBuilder.
 The problematic query logged:
 
 rt=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main
        JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id )
        JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id )
        WHERE (Transactions_1.ObjectType = 'RT::Ticket') AND (main.Status != 'deleted')
        AND (main.Status = 'resolved' AND main.LastUpdated  '2008-12-31 23:00:00'
        AND main.Created  '2005-12-31 23:00:00' AND main.Queue = '15'
        AND ( Attachments_2.trigrams @@ text_to_trgm_tsquery('uir')
        AND Attachments_2.Content ILIKE '%uir%' ) )
        AND (main.Type = 'ticket') AND (main.EffectiveId = main.id)
        ORDER BY main.id ASC;
 
                                     QUERY PLAN
 ---------------------------------------------------------------------------------
  Unique  (cost=23928.60..23928.67 rows=1 width=162) (actual time=5201.139..5207.965 rows=649 loops=1)
    ->  Sort  (cost=23928.60..23928.61 rows=1 width=162) (actual time=5201.137..5201.983 rows=5280 loops=1)
          Sort Key: main.effectiveid, main.issuestatement, main.resolution, main.owner, main.subject, main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.timeleft, main.told, main.starts, main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled
          Sort Method:  quicksort  Memory: 1598kB
          ->  Nested Loop  (cost=0.00..23928.59 rows=1 width=162) (actual time=10.060..5120.834 rows=5280 loops=1)
                ->  Nested Loop  (cost=0.00..10222.38 rows=1734 width=166) (actual time=8.702..1328.970 rows=417711 loops=1)
                      ->  Seq Scan on tickets main  (cost=0.00..5687.88 rows=85 width=162) (actual time=8.258..94.012 rows=25410 loops=1)
                            Filter: (((status)::text <> 'deleted'::text) AND (lastupdated  '2008-12-31 23:00:00'::timestamp without time zone) AND (created  '2005-12-31 23:00:00'::timestamp without time zone) AND (effectiveid = id) AND (queue = 15) AND ((type)::text = 'ticket'::text) AND ((status)::text = 'resolved'::text))
                      ->  Index Scan using transactions1 on transactions transactions_1  (cost=0.00..53.01 rows=27 width=8) (actual time=0.030..0.039 rows=16 loops=25410)
                            Index Cond: (((transactions_1.objecttype)::text = 'RT::Ticket'::text) AND (transactions_1.objectid = main.effectiveid))
                ->  Index Scan using attachments2 on attachments attachments_2  (cost=0.00..7.89 rows=1 width=4) (actual time=0.008..0.009 rows=0 loops=417711)
                      Index Cond: (attachments_2.transactionid = transactions_1.id)
                      Filter: ((attachments_2.trigrams @@ '''uir'''::tsquery) AND (attachments_2.content ~~* '%uir%'::text))
  Total runtime: 5208.149 ms
 (14 rows)
 
 The above times are for already

Re: [PERFORM] Slow deleting tables with foreign keys

2011-03-31 Thread Bob Lunney
Jeremy,

Does table_revision have a unique index on id?  Also, I doubt these two indexes 
ever get used:

CREATE INDEX idx_crs_action_expired_created
  ON table_version.bde_crs_action_revision
  USING btree
  (_revision_expired, _revision_created);

CREATE INDEX idx_crs_action_expired_key
  ON table_version.bde_crs_action_revision
  USING btree
  (_revision_expired, audit_id);
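
A quick way to check whether those two indexes are ever used is a sketch like
this (idx_scan counts index scans since the last statistics reset):

  SELECT indexrelname, idx_scan
    FROM pg_stat_user_indexes
   WHERE schemaname = 'table_version'
     AND relname = 'bde_crs_action_revision'
   ORDER BY idx_scan;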

Bob Lunney

--- On Wed, 3/30/11, Jeremy Palmer jpal...@linz.govt.nz wrote:

 From: Jeremy Palmer jpal...@linz.govt.nz
 Subject: [PERFORM] Slow deleting tables with foreign keys
 To: pgsql-performance@postgresql.org pgsql-performance@postgresql.org
 Date: Wednesday, March 30, 2011, 10:16 PM
 Hi All,
 
 I'm trying to delete one row from a table and it's taking an extremely
 long time. This parent table is referenced by other tables' foreign keys,
 but the particular row I'm trying to delete is not referenced by any other
 rows in the associative tables. This table has the following structure:
 
 CREATE TABLE revision
 (
   id serial NOT NULL,
   revision_time timestamp without time zone NOT NULL
 DEFAULT now(),
   start_time timestamp without time zone NOT NULL
 DEFAULT clock_timestamp(),
   schema_change boolean NOT NULL,
   comment text,
   CONSTRAINT revision_pkey PRIMARY KEY (id)
 )
 WITH (
   OIDS=FALSE
 );
 
 This table is referenced from foreign key by 130 odd other
 tables. The total number of rows from these referencing
 tables goes into the hundreds of millions. Each of these
 tables has been automatically created by script and has the
 same _revision_created, _revision_expired fields, foreign
 keys and indexes. Here is an example of one:
 
 CREATE TABLE table_version.bde_crs_action_revision
 (
   _revision_created integer NOT NULL,
   _revision_expired integer,
   tin_id integer NOT NULL,
   id integer NOT NULL,
   sequence integer NOT NULL,
   att_type character varying(4) NOT NULL,
   system_action character(1) NOT NULL,
   audit_id integer NOT NULL,
   CONSTRAINT
 pkey_table_version.bde_crs_action_revision PRIMARY KEY
 (_revision_created, audit_id),
   CONSTRAINT
 bde_crs_action_revision__revision_created_fkey FOREIGN KEY
 (_revision_created)
       REFERENCES table_version.revision (id)
 MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO
 ACTION,
   CONSTRAINT
 bde_crs_action_revision__revision_expired_fkey FOREIGN KEY
 (_revision_expired)
       REFERENCES table_version.revision (id)
 MATCH SIMPLE
       ON UPDATE NO ACTION ON DELETE NO
 ACTION
 )
 WITH (
   OIDS=FALSE
 );
 ALTER TABLE table_version.bde_crs_action_revision OWNER TO
 bde_dba;
 ALTER TABLE table_version.bde_crs_action_revision ALTER
 COLUMN audit_id SET STATISTICS 500;
 
 
 CREATE INDEX idx_crs_action_audit_id
   ON table_version.bde_crs_action_revision
   USING btree
   (audit_id);
 
 CREATE INDEX idx_crs_action_created
   ON table_version.bde_crs_action_revision
   USING btree
   (_revision_created);
 
 CREATE INDEX idx_crs_action_expired
   ON table_version.bde_crs_action_revision
   USING btree
   (_revision_expired);
 
 CREATE INDEX idx_crs_action_expired_created
   ON table_version.bde_crs_action_revision
   USING btree
   (_revision_expired, _revision_created);
 
 CREATE INDEX idx_crs_action_expired_key
   ON table_version.bde_crs_action_revision
   USING btree
   (_revision_expired, audit_id);
 
 
 All of the tables have been analysed before I tried to run
 the query.
 
 The fact that all of the foreign keys have a covering index
 makes me wonder why this delete is taking so long.
 
 The explain for 
 
 delete from table_version.revision where id = 1003
 
 
 Delete  (cost=0.00..1.02 rows=1 width=6)
   ->  Seq Scan on revision  (cost=0.00..1.02 rows=1 width=6)
         Filter: (id = 100)
 
 I'm running PostgreSQL 9.0.2 on Ubuntu 10.04
 
 Cheers
 Jeremy
 __
 
 This message contains information, which is confidential
 and may be subject to legal privilege. 
 If you are not the intended recipient, you must not peruse,
 use, disseminate, distribute or copy this message.
 If you have received this message in error, please notify
 us immediately (Phone 0800 665 463 or i...@linz.govt.nz)
 and destroy the original message.
 LINZ accepts no responsibility for changes to this email,
 or for any attachments, after its transmission from LINZ.
 
 Thank you.
 __
 
 






Re: [PERFORM] COPY with high # of clients, partitioned table locking issues?

2011-03-30 Thread Bob Lunney
John,

Sorry to hear you're struggling with such underpowered hardware.  ;-)  A little 
more information would be helpful, though:

1.  What version of PG are you running?
2.  What are the constraints on the child tables?
3.  How many rows does each copy insert?
4.  Are these wrapped in transactions?
5.  Are the child tables created at the same time the copies are taking place?  
In the same transaction?
6.  Are the indexes in place on the child table(s) when the copies are running? 
 Do they have to be to validate the data?
7.  What are the configuration settings for the database?  (Just the ones 
changed from the default, please.)
8.  Which file system are you running for the database files?  Mount options?
9.  Are the WAL files on the same file system?
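
While gathering that, a snapshot of the ungranted locks can confirm it really
is relation-extension contention (a sketch):

  SELECT locktype, relation::regclass, mode, pid, virtualtransaction
    FROM pg_locks
   WHERE NOT granted;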


Bob Lunney

--- On Wed, 3/30/11, Strange, John W john.w.stra...@jpmchase.com wrote:

 From: Strange, John W john.w.stra...@jpmchase.com
 Subject: [PERFORM] COPY with high # of clients, partitioned table locking 
 issues?
 To: pgsql-performance@postgresql.org pgsql-performance@postgresql.org
 Date: Wednesday, March 30, 2011, 4:56 PM
 Just some information on our setup:
 
 - HP DL585 G6 
 - 4 x AMD Opteron 8435 (24 cores)
 - 256GB RAM
 - 2 FusionIO 640GB PCI-SSD (RAID0)
 - dual 10GB ethernet.
 
 - we have several tables that we store calculated values
 in.
 - these are inserted by a compute farm that calculates the
 results and stores them into a partitioned schema (schema
 listed below)
 - whenever we do a lot of inserts we seem to get exclusive
 locks.
 
 Is there something we can do to improve the performance around locking
 when doing a lot of parallel inserts with COPY?  We are not I/O bound;
 what happens is that the copies start to slow down while more continue to
 come in, causing the client to swap.  We had over 800 COPYs in a waiting
 state, which forced us to start paging heavily, creating an issue.  If we
 can figure out the locking issue the COPYs should clear faster, requiring
 less memory.
 
 [ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 LOG:  process 14405 still waiting for ExclusiveLock on extension of relation 470273 of database 16384 after 5001.894 ms
 [ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 CONTEXT:  COPY reportvalues_part_1931, line 1: 660250  41977959  11917  584573.43642105709
 [ 2011-03-30 15:54:55.886 EDT ] 14405 [local] asgprod:4d938288.3845 STATEMENT:  COPY reportvalues_part_1931 FROM stdin USING DELIMITERS '      '
 [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e LOG:  process 7294 still waiting for ExclusiveLock on extension of relation 470606 of database 16384 after 5062.968 ms
 [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e CONTEXT:  COPY reportvalues_part_1932, line 158: 660729  41998839  887  45000.0
 [ 2011-03-30 15:54:56.015 EDT ] 7294 [local] asgprod:4d938939.1c7e STATEMENT:  COPY reportvalues_part_1932 FROM stdin USING DELIMITERS '      '
 [ 2011-03-30 15:54:56.077 EDT ] 25781 [local] asgprod:4d938556.64b5 LOG:  process 25781 still waiting for ExclusiveLock on extension of relation 470606 of database 16384 after 5124.463 ms
 
  locktype      | database | relation | virtualxid | transactionid | virtualtransaction |  pid  |       mode       | granted
 ---------------+----------+----------+------------+---------------+--------------------+-------+------------------+---------
  relation      |    16384 |   470606 |            |               | 93/677526          | 14354 | RowExclusiveLock | t
  relation      |    16384 |   470606 |            |               | 1047/4             | 27451 | RowExclusiveLock | t
  relation      |    16384 |   470606 |            |               | 724/58891          | 20721 | RowExclusiveLock | t
  transactionid |          |          |            |      94673393 | 110/502566         |  1506 | ExclusiveLock    | t
  virtualxid    |          |          | 975/92     |               | 975/92             | 25751 | ExclusiveLock    | t
  extend        |    16384 |   470606 |            |               | 672/102043         | 20669 | ExclusiveLock    | f
  extend        |    16384 |   470606 |            |               | 1178/10            |  6074 | ExclusiveLock    | f
  virtualxid    |          |          | 37/889225  |               | 37/889225          |  4623 | ExclusiveLock    | t
  relation      |    16384 |   405725 |            |               | 39/822056          | 32502 | AccessShareLock  | t
  transactionid |          |          |            |      94673831 | 917/278            | 23134 | ExclusiveLock

Re: [PERFORM] Really really slow select count(*)

2011-02-16 Thread Bob Lunney
Ross,

Way off topic now, but from my time programming electrical meters I can tell 
you pulling the meter from its socket is potentially an extremely dangerous 
thing to do.  If there is a load across the meter's poles the spark that 
results on disconnect could kill the puller instantly.  (You don't want to know 
what happens if the person isn't killed.)  

I don't know what property your admin type is trying to protect, but I'm 
inclined to let it burn and live to work through the insurance collection 
process.

Oh, and +1 for timed escalation of a shutdown.

Bob Lunney

--- On Wed, 2/16/11, Ross J. Reedstrom reeds...@rice.edu wrote:

 From: Ross J. Reedstrom reeds...@rice.edu
 Subject: Re: [PERFORM] Really really slow select count(*)

big snip

 
 Following you off topic, I know of one admin type who has stated: "I don't
 care what sort of fine the power company wants to give me, if my property's
 on fire, I'm going to pull the meter, in order to hand it to the first
 responder, rather than have them sit there waiting for the power tech to
 arrive while my house burns."


  



Re: [PERFORM] anti-join chosen even when slower than old plan

2010-11-11 Thread Bob Lunney

--- On Thu, 11/11/10, Mladen Gogala mladen.gog...@vmsinfo.com wrote:

 From: Mladen Gogala mladen.gog...@vmsinfo.com
 Subject: Re: [PERFORM] anti-join chosen even when slower than old plan
 To: Kenneth Marshall k...@rice.edu
 Cc: Robert Haas robertmh...@gmail.com, Tom Lane t...@sss.pgh.pa.us, 
 Kevin Grittner kevin.gritt...@wicourts.gov, 
 pgsql-performance@postgresql.org pgsql-performance@postgresql.org
 Date: Thursday, November 11, 2010, 9:15 AM
 Kenneth Marshall wrote:
  I agree with the goal of avoiding the need for a GUC.
 This needs to
  be as automatic as possible. One idea I had had was
 computing a value
  for the amount of cache data in the system by keeping
 a sum or a
  weighted sum of the table usage in the system. Smaller
 tables and
  indexes would contribute a smaller amount to the
 total, while larger
  indexes and tables would contribute a larger amount.
 Then by comparing
  this running total to the effective_cache_size, set
 the random and
  sequential costs for a query. This would allow the
 case of many 4MB
  tables to favor disk I/O more than memory I/O. The
 weighting could
  be a function of simultaneous users of the table. I
 know this is a
  bit of hand-waving but some sort of dynamic feedback
 needs to be
  provided to the planning process as system use
 increases.
  
  Regards,
  Ken
  
    
 Kenneth, you seem to be only concerned with the accuracy of
 the planning process, not with the plan stability. As a DBA
 who has to monitor real world applications, I find things
 like an execution plan changing with the use of the system
 to be my worst nightmare. The part where you say that this
 needs to be as automatic as possible probably means that I
 will not be able to do anything about it, if the optimizer,
 by any chance, doesn't get it right. That looks to me like
 an entirely wrong way to go.
 When an application developer tunes the SQL, both he and I expect that SQL
 to always perform that way, not to change the execution plan because the
 system is utilized more than it was an hour ago. Nobody seems to have taken
 my suggestion about having a parameter which would simply invent the
 percentage out of thin air seriously, because it's obviously not accurate.
 However, the planner accuracy is not the only concern.
 Running applications on the system usually requires plan
 stability. Means of
 external control of the execution plan, DBA knobs and
 buttons that can be turned and pushed to produce the desired
 plan are also very much desired.
 
 -- Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 www.vmsinfo.com 
 

Mladen,

Been there, done that with Oracle for more years than I care to remember or 
admit.  Having the necessary knobs was both daunting and a godsend, depending 
on if you could find the right one(s) to frob during production use, and you 
turned them the right way and amount.  I personally find having less knobbage 
with PostgreSQL to be a huge benefit over Oracle.  In that spirit, I offer the 
following suggestion: (Ken's original suggestion inspired me, so if I 
misunderstand it, Ken, please correct me.)

What if the code that managed the shared buffer cache kept track of how many 
buffers were in the cache for each table and index?  Then the optimizer could 
know the ratio of cached to non-cached table or index buffers (how many pages 
are in PG's buffer cache vs. the total number of pages required for the entire 
table, assuming autovacuum is working well) and plan accordingly.  It would 
even be possible to skew the estimate based on the ratio of shared_buffers to 
effective_cache_size.  The optimizer could then dynamically adjust the random 
and sequential costs per query prior to planning, with (hopefully) plans 
optimized to the current condition of the server and host caches just prior to 
execution.
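
For what it's worth, the contrib module pg_buffercache already lets you eyeball 
that ratio by hand today.  A rough sketch, run in the database of interest:

  SELECT c.relname, count(*) AS buffers_cached
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    JOIN pg_database d ON b.reldatabase = d.oid
                      AND d.datname = current_database()
   GROUP BY c.relname
   ORDER BY buffers_cached DESC
   LIMIT 20;

Comparing buffers_cached against each relation's total page count gives the 
cached fraction the planner would need.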

There are lots of assumptions here, the primary ones being the shared buffer 
cache's state doesn't change significantly between the start of planning and 
actual execution time, and the host is dedicated to running the database and 
nothing else that would trash the host's file system cache.  I admit that I 
haven't looked at the code for this yet, so I don't know if I'm on to something 
or off in the weeds.

Regards,

Bob Lunney








Re: [PERFORM] out of memory problem

2010-11-09 Thread Bob Lunney
Be sure that you are starting PostgreSQL using an account with sufficient 
memory limits:

  ulimit -m

If the account has a memory limit below the server's configuration you may get 
an out of memory error.

Bob Lunney

--- On Tue, 11/9/10, Till Kirchner till.kirch...@vti.bund.de wrote:

 From: Till Kirchner till.kirch...@vti.bund.de
 Subject: [PERFORM] out of memory problem
 To: pgsql-performance@postgresql.org
 Date: Tuesday, November 9, 2010, 5:39 AM
 Hello together,
 
 I get an out of memory problem I don't understand.
 The installed Postgres-Version is:
 PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC
 gcc-4.3.real (Debian 4.3.3-5) 4.3.3
 It is running on a 32bit Debian machine with 4GB RAM.
 
 Thanks for any help in advance
 
 Till
 
 --
 -
 
 Main settings are as follows:
 checkpoint_segments 16
 checkpoint_timeout 120s
 effective_cache_size 128MB
 maintenance_work_mem 128MB
 max_fsm_pages 153600
 shared_buffers 1GB
 wal_buffers 256MB
 work_mem 256MB
 
 --
 -
 
 Used query is:
     CREATE TABLE temp.bwi_atkis0809_forestland
 AS
     SELECT
     b.gid AS bwi_gid,
     a.dlm0809id,
     a.objart_08,
     a.objart_09
     FROM
     bwi.bwi_pkt AS b,
     atkis.atkis0809_forestland AS a
     WHERE
     b.the_geom  a.the_geom AND
 ST_Within(b.the_geom, a.the_geom)
     ;
     COMMIT;
 
 (The JOIN is a Spatial one using PostGIS-Functions)
 
 --
 -
 
 Full Table Sizes:
 atkis0809_forestland 2835mb
 bwi_pkt 47mb
 
 --
 -
 
 Error Message is:
 FEHLER:  Speicher aufgebraucht
 DETAIL:  Fehler bei Anfrage mit Größe 32.
 
 ** Fehler **
 
 FEHLER: Speicher aufgebraucht
 SQL Status:53200
 Detail:Fehler bei Anfrage mit Größe 32.
 
 in english:
 ERROR: out of memory
 detail: error for request with size 32
 
 --
 -
 
 The LOG looks as follows:
 
 TopMemoryContext: 42800 total in 5 blocks; 4816 free (5
 chunks); 37984 used
   CFuncHash: 8192 total in 1 blocks; 4936 free (0
 chunks); 3256 used
   TopTransactionContext: 8192 total in 1 blocks; 5520
 free (0 chunks); 2672 used
   Operator class cache: 8192 total in 1 blocks; 3848
 free (0 chunks); 4344 used
   Operator lookup cache: 24576 total in 2 blocks;
 14072 free (6 chunks); 10504 used
   MessageContext: 65536 total in 4 blocks; 35960 free
 (10 chunks); 29576 used
   smgr relation table: 8192 total in 1 blocks; 2808
 free (0 chunks); 5384 used
   TransactionAbortContext: 32768 total in 1 blocks;
 32752 free (0 chunks); 16 used
   Portal hash: 8192 total in 1 blocks; 3912 free (0
 chunks); 4280 used
   PortalMemory: 8192 total in 1 blocks; 8040 free (0
 chunks); 152 used
     PortalHeapMemory: 1024 total in 1 blocks; 896
 free (0 chunks); 128 used
       ExecutorState: 1833967692 total in 230
 blocks; 9008 free (3 chunks); 1833958684 used
         GiST temporary context: 8192
 total in 1 blocks; 8176 free (0 chunks); 16 used
         ExprContext: 0 total in 0
 blocks; 0 free (0 chunks); 0 used
         ExprContext: 8192 total in 1
 blocks; 8176 free (9 chunks); 16 used
         ExprContext: 0 total in 0
 blocks; 0 free (0 chunks); 0 used
         ExprContext: 8192 total in 1
 blocks; 3880 free (4 chunks); 4312 used
   Relcache by OID: 8192 total in 1 blocks; 2856 free
 (0 chunks); 5336 used
   CacheMemoryContext: 667472 total in 20 blocks;
 195408 free (3 chunks); 472064 used
     pg_toast_12241534_index: 1024 total in 1
 blocks; 240 free (0 chunks); 784 used
     pg_shdepend_depender_index: 1024 total in 1
 blocks; 152 free (0 chunks); 872 used
     pg_shdepend_reference_index: 1024 total in 1
 blocks; 240 free (0 chunks); 784 used
     pg_depend_depender_index: 1024 total in 1
 blocks; 152 free (0 chunks); 872 used
     pg_depend_reference_index: 1024 total in 1
 blocks; 152 free (0 chunks); 872 used
     idx_atkis0809_forestland_the_geom_gist: 1024
 total in 1 blocks; 136 free (0 chunks); 888 used
     atkis0809_forestland_pkey: 1024 total in 1
 blocks; 344 free (0 chunks); 680 used
     btree_bwi_pkt_enr: 1024 total in 1 blocks;
 344 free (0 chunks); 680 used
     btree_bwi_pkt_tnr: 1024 total in 1 blocks;
 344 free (0 chunks); 680 used
     rtree_bwi_pkt: 1024 total in 1 blocks; 136
 free (0 chunks); 888 used
     bwi_pkt_pkey: 1024 total in 1 blocks; 344
 free (0 chunks); 680 used
     pg_attrdef_adrelid_adnum_index: 1024 total in
 1 blocks; 240 free (0 chunks); 784 used
     pg_constraint_conrelid_index: 1024 total in 1
 blocks

Re: [PERFORM] Memory usage - indexes

2010-09-24 Thread Bob Lunney
Tobias,

Consult pg_statio_user_indexes to see which indexes have been used and how 
much.  Indexes with comparatively low usage rates aren't helping you much and 
are candidates for elimination.  Also, partitioning large tables can help, 
since the indexes on each partition are smaller than one huge index on the 
original table.
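
For example, something along these lines (a sketch) lists each index's scan 
count next to its cache hits and misses:

  SELECT s.schemaname, s.relname, s.indexrelname,
         s.idx_scan, io.idx_blks_hit, io.idx_blks_read
    FROM pg_stat_user_indexes s
    JOIN pg_statio_user_indexes io USING (indexrelid)
   ORDER BY s.idx_scan;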

Good luck!

Bob Lunney

--- On Thu, 9/23/10, Tobias Brox tobi...@gmail.com wrote:

 From: Tobias Brox tobi...@gmail.com
 Subject: [PERFORM] Memory usage - indexes
 To: pgsql-performance@postgresql.org
 Date: Thursday, September 23, 2010, 5:50 PM
 We've come to a tipping point with
 one of our database servers, it's
 generally quite loaded but up until recently it was
 handling the load
 well - but now we're seeing that it struggles to process
 all the
 selects fast enough.  Sometimes we're observing some
 weird lock-like
 behaviour (see my other post on that), but most of the time
 the
 database server is just not capable of handling the load
 fast enough
 (causing the queries to pile up in the
 pg_stat_activity-view).
 
 My main hypothesis is that all the important indexes used to fit snugly
 into memory, and now they don't.  We'll eventually get the server moved
 over to new and improved hardware, but while waiting for that to happen we
 need to focus on reducing the memory footprint of the database.  I have
 some general questions now ...
 
 1) Are there any good ways to verify my hypothesis? 
 Some months ago I
 thought of running some small memory-gobbling program on
 the database
 server just to see how much memory I could remove before we
 would see
 indications of the database being overloaded.  It
 seems a bit radical,
 but I think the information learned from such an experiment
 would be
 very useful ... and we never managed to set up any testing
 environment
 that faithfully replicates production traffic. 
 Anyway, it's sort of
 too late now that we're already observing performance
 problems even
 without the memory gobbling script running.
 
 2) I've seen it discussed earlier on this list ...
 shared_buffers vs
 OS caches.  Some claims that it has very little effect
 to adjust the
 size of the shared buffers.  Anyway, isn't it a risk
 that memory is
 wasted because important data is stored both in the OS
 cache and the
 shared buffers?  What would happen if using almost all
 the available
 memory for shared buffers?  Or turn it down to a bare
 minimum and let
 the OS do almost all the cache handling?
 
 3) We're discussing to drop some overlapping indexes ...
 i.e. to drop
 one out of two indexes looking like this:
 
 some_table(a)
 some_table(a,b)
 
 Would the query select * from some_table where a=? run
 slower if we
 drop the first index?  Significantly?
 
 (in our situation I found that the number of distinct b's
 for each a
 is low and that the usage stats on the second index is
 quite low
 compared with the first one, so I think we'll drop the
 second index).
 
 4) We're discussing to drop other indexes.  Does it
 make sense at all
 as long as we're not experiencing problems with
 inserts/updates?  I
 suppose that if the index isn't used it will remain on disk
 and won't
 affect the memory usage ... but what if the index is rarely
 used ...
 wouldn't it be better to do a seqscan on a table that is
 frequently
 accessed and mostly in memory than to consult an index that
 is stored
 on the disk?
 
 Sorry for all the stupid questions ;-)
 
 






Re: [PERFORM] Memory usage - indexes

2010-09-24 Thread Bob Lunney
Tobias,

First off, what version of PostgreSQL are you running?  If you have 8.4, 
nightly vacuuming shouldn't be necessary with properly tuned autovacuum jobs.  

The pertinent difference between pg_stat_user_indexes and 
pg_statio_user_indexes is the latter shows the number of blocks read from disk 
or found in the cache.  You're correct, unused indexes will remain on disk, but 
indexes that don't completely fit into memory must be read from disk for each 
index scan, and that hurts performance.  (In fact, it will suddenly drop like  
a rock.  BTDT.)  By making smaller equivalent indexes on partitioned data the 
indexes for individual partitions are more likely to stay in memory, which is 
particularly important when multiple passes are made over the index by a query.
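
A sketch of one way to spot the indexes that are too big to stay cached:

  SELECT indexrelname,
         pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
         idx_blks_hit, idx_blks_read
    FROM pg_statio_user_indexes
   ORDER BY pg_relation_size(indexrelid) DESC
   LIMIT 20;

A large index with a high idx_blks_read relative to idx_blks_hit is the kind 
that suddenly hurts once it no longer fits in memory.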

You are correct on all the points you make concerning indexes, but point 4 is 
the one I'm referring to.  You discovered this independently yourself, 
according to your anecdote about the overlapping indexes.

Bob Lunney


--- On Fri, 9/24/10, Tobias Brox tobi...@gmail.com wrote:

 From: Tobias Brox tobi...@gmail.com
 Subject: Re: [PERFORM] Memory usage - indexes
 To: Bob Lunney bob_lun...@yahoo.com
 Cc: pgsql-performance@postgresql.org
 Date: Friday, September 24, 2010, 12:46 PM
 On 24 September 2010 18:23, Bob
 Lunney bob_lun...@yahoo.com
 wrote:
  Consult pg_statio_user_indexes to see which indexes
 have been used
  and how much.
 
 What are the main differences between pg_statio_user_indexes and
 pg_stat_user_indexes?
 
    Indexes with comparitively low usages rates
 aren't helping you much and are
  candidates for elimination.
 
 No doubt about that - but the question was, would it really
 help us to
 drop those indexes?
 
 I think the valid reasons for dropping indexes would be:
 
 1) To speed up inserts, updates and deletes
 
 2) To spend less disk space
 
 3) Eventually, speed up nightly vacuum (it wouldn't be an
 issue with
 autovacuum though)
 
 4) To spend less memory resources?
 
 I'm not at all concerned about 1 and 2 above - we don't
 have any
 performance issues on the write part, and we have plenty of
 disk
 capacity.  We are still doing the nightly vacuum
 thing, and it does
 hurt us a bit since it's dragging ever more out in
 time.  Anyway, it's
 number four I'm wondering most about - is it anything to be
 concerned
 about or not for the least frequently used indexes? 
 An index that
 aren't being used would just stay on disk anyway,
 right?  And if there
 are limited memory resources, the indexes that are most
 frequently
 used would fill up the cache space anyway?  That's my
 thoughts at
 least - are they way off?
 
 We did have similar experiences some years ago - everything
 was
 running very fine all until one day when some
 semi-complicated
 very-frequently-run selects started taking several seconds
 to run
 rather than tens of milliseconds.  I found that we had
 two slightly
 overlapping indexes like this ...
 
   account_transaction(customer_id, trans_type)
   account_transaction(customer_id, trans_type,
 created)
 
 both of those indexes where heavily used.  I simply
 dropped the first
 one, and the problems disappeared.  I assume that both
 indexes up to
 some point fitted snuggly into memory, but one day they
 were competing
 for the limited memory space, dropping the redundant index
 solved the
 problem all until the next hardware upgrade.  I would
 never have found
 those indexes searching for the least used indexes in the
 pg_stat(io)_user_indexes view.
 






Re: [PERFORM] Slow Query

2010-08-26 Thread Bob Lunney
We need more information than that, like:

  What version of PostgreSQL?
  What does the hardware look like?
  What does the disk and tablespace layout look like?
  How are your configuration variables set?

Other than that, are the statistics up to date on the VehicleMake table?

Bob Lunney

--- On Thu, 8/26/10, Ozer, Pam po...@automotive.com wrote:

From: Ozer, Pam po...@automotive.com
Subject: [PERFORM] Slow Query
To: pgsql-performance@postgresql.org
Date: Thursday, August 26, 2010, 8:03 PM

I am new to Postgres and I am trying to understand the Explain Analyze output
so I can tune the following query. I run the same query using MySQL and it
takes less than 50 ms. I run it on Postgres and it takes 10 seconds. I feel
like I am missing something very obvious. (VehicleUsed is a big table with over
750,000 records; datasetgroupyearmakemodel has 15 records.)

It looks like the cost is highest in the Hash Join on Postalcode. Am I reading
this correctly? I do have indexes on lower(postalcode) in both tables. Why
wouldn’t it be using the index? Thanks in advance for any help.

  

Here is my query: 

  

select distinct VehicleMake.VehicleMake
from VehicleUsed
inner join PostalCodeRegionCountyCity on ( lower ( VehicleUsed.PostalCode ) = lower ( PostalCodeRegionCountyCity.PostalCode ) )
INNER JOIN DATASETGROUPYEARMAKEMODEL ON ( VEHICLEUSED.VEHICLEYEAR = DATASETGROUPYEARMAKEMODEL.VEHICLEYEAR )
AND ( VEHICLEUSED.VEHICLEMAKEID = DATASETGROUPYEARMAKEMODEL.VEHICLEMAKEID )
AND ( VEHICLEUSED.VEHICLEMODELID = DATASETGROUPYEARMAKEMODEL.VEHICLEMODELID )
inner join VehicleMake on ( VehicleUsed.VehicleMakeId = VehicleMake.VehicleMakeId )
where
( DatasetGroupYearMakeModel.DatasetGroupId = 3 ) and
( VehicleUsed.DatasetId  113 )
and ( VehicleUsed.ProductGroupId  13 )
and ( PostalCodeRegionCountyCity.RegionId = 36 )
order by VehicleMake.VehicleMake
limit 50

Here is the explain analyze 

  

"Limit (cost=38292.53..38293.19 rows=261 width=8)
(actual time=10675.857..10675.892 rows=42 loops=1)" 

" - Sort (cost=38292.53..38293.19 rows=261
width=8) (actual time=10675.855..10675.868 rows=42 loops=1)" 

" Sort Key: vehiclemake.vehiclemake" 

" Sort Method: quicksort Memory: 18kB" 

" - HashAggregate (cost=38279.45..38282.06
rows=261 width=8) (actual time=10675.710..10675.728 rows=42 loops=1)" 

" - Hash Join (cost=436.31..38270.51
rows=3576 width=8) (actual time=4.471..10658.291 rows=10425 loops=1)" 

" Hash Cond:
(vehicleused.vehiclemakeid = vehiclemake.vehiclemakeid)" 

" - Hash Join
(cost=428.43..38213.47 rows=3576 width=4) (actual time=4.152..10639.742
rows=10425 loops=1)" 

" Hash Cond:
(lower((vehicleused.postalcode)::text) =
lower((postalcoderegioncountycity.postalcode)::text))" 

" - Nested Loop
(cost=101.81..37776.78 rows=11887 width=10) (actual time=1.172..9876.586
rows=382528 loops=1)" 

" - Bitmap Heap
Scan on datasetgroupyearmakemodel (cost=101.81..948.81 rows=5360 width=6)
(actual time=0.988..17.800 rows=5377 loops=1)" 

" Recheck Cond:
(datasetgroupid = 3)" 

" - Bitmap
Index Scan on datasetgroupyearmakemodel_i04 (cost=0.00..100.47 rows=5360
width=0) (actual time=0.830..0.830 rows=5377 loops=1)" 

" Index
Cond: (datasetgroupid = 3)" 

" - Index Scan
using vehicleused_i10 on vehicleused (cost=0.00..6.85 rows=1 width=12) (actual
time=0.049..1.775 rows=71 loops=5377)" 

" Index Cond:
((vehicleused.vehiclemodelid = datasetgroupyearmakemodel.vehiclemodelid) AND
(vehicleused.vehiclemakeid = datasetgroupyearmakemodel.vehiclemakeid) AND
(vehicleused.vehicleyear = datasetgroupyearmakemodel.vehicleyear))" 

" Filter:
((vehicleused.datasetid  113) AND (vehicleused.productgroupid 
13))" 

" - Hash
(cost=308.93..308.93 rows=1416 width=6) (actual time=2.738..2.738 rows=1435
loops=1)" 

" - Bitmap Heap
Scan on postalcoderegioncountycity (cost=27.23..308.93 rows=1416 width=6)
(actual time=0.222..0.955 rows=1435 loops=1)" 

" Recheck Cond:
(regionid = 36)" 

" - Bitmap
Index Scan on postalcoderegioncountycity_i05 (cost=0.00..26.87 rows=1416
width=0) (actual time=0.202..0.202 rows=1435 loops=1)" 

" Index
Cond: (regionid = 36)" 

" - Hash (cost=4.61..4.61
rows=261 width=10) (actual time=0.307..0.307 rows=261 loops=1)" 

" - Seq Scan on
vehiclemake (cost=0.00..4.61 rows=261 width=10) (actual time=0.033..0.154
rows=261 loops=1)" 

"Total runtime: 10676.058 ms" 

  



Pam Ozer
Data Architect
po...@automotive.com
tel. 949.705.3468

Source Interlink Media
1733 Alton Pkwy Suite 100, Irvine, CA 92606
www.simautomotive.com

Re: [PERFORM] Triggers or code?

2010-08-25 Thread Bob Lunney
That depends on your application's requirements.  If a transaction on table X 
fails, do you still want the history (noting the failure)?  If so, go with 
embedding the code in your script.  If you only want history for successful 
transactions, a trigger will take care of that for you automatically.
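
A minimal sketch of the trigger route, assuming x_hist has a timestamp column 
followed by the same columns as x:

  CREATE OR REPLACE FUNCTION x_history() RETURNS trigger AS $$
  BEGIN
      INSERT INTO x_hist SELECT now(), NEW.*;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER x_hist_trg
      AFTER INSERT OR UPDATE ON x
      FOR EACH ROW EXECUTE PROCEDURE x_history();

Because the trigger fires inside the same transaction as the change, its 
history row disappears along with a rolled-back transaction - which is exactly 
why it only records successful ones.
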
Bob Lunney 

--- On Mon, 8/23/10, DM dm.a...@gmail.com wrote:

From: DM dm.a...@gmail.com
Subject: [PERFORM] Triggers or code?
To: pgsql-performance@postgresql.org
Date: Monday, August 23, 2010, 2:42 PM

Hello There,

I have a table x and a history table x_hist; what's the best way to update the 
history table?

should i need to use triggers or embed a code in my script to update the 
history table?

what is the performance impact of a trigger versus embedding the code in the 
script?


thanks for your time.

- Deepak




  

Re: [PERFORM] requested shared memory size overflows size_t

2010-06-11 Thread Bob Lunney
Tom,

First off, I wouldn't use a VM if I could help it; however, sometimes you have 
to make compromises.  With a 16 GB machine running 64-bit Ubuntu and only 
PostgreSQL, I'd start by allocating 4 GB to shared_buffers.  That should leave 
more than enough room for the OS and file system cache.  Then I'd begin testing 
by measuring response times of representative queries with significant amounts 
of data.

Also, what is the disk setup for the box?  Filesystem?  Can WAL files have 
their own disk?  Is the workload OLTP or OLAP, or a mixture of both?  There is 
more that goes into tuning a PG server for good performance than simply 
installing the software, setting a couple of GUCs and running it.
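
As an illustrative starting point only (postgresql.conf values for a 16 GB box 
dedicated to PostgreSQL - measure and adjust from there):

  shared_buffers = 4GB           # per the above; leave the rest to the OS cache
  effective_cache_size = 12GB    # a planner hint only, nothing is allocated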

Bob

--- On Thu, 6/10/10, Tom Wilcox hungry...@gmail.com wrote:

 From: Tom Wilcox hungry...@gmail.com
 Subject: Re: [PERFORM] requested shared memory size overflows size_t
 To: Bob Lunney bob_lun...@yahoo.com
 Cc: Robert Haas robertmh...@gmail.com, pgsql-performance@postgresql.org
 Date: Thursday, June 10, 2010, 10:45 AM
 Thanks guys. I am currently installing Pg64 onto an Ubuntu Server 64-bit
 installation running as a VM in VirtualBox with 16GB of RAM accessible.
 If what you say is true, then what do you suggest I do to configure my new
 setup to best use the available 16GB (96GB and a native install eventually,
 if the test goes well) of RAM on Linux?
 
 I was considering starting by using EnterpriseDB's tuner to see if that
 optimises things better.
 
 Tom
 
 On 10/06/2010 15:41, Bob Lunney wrote:
  True, plus there are the other issues of increased checkpoint times and
  I/O, bgwriter tuning, etc.  It may be better to let the OS cache the
  files and size shared_buffers to a smaller value.
 
  Bob Lunney
 
  --- On Wed, 6/9/10, Robert Haas robertmh...@gmail.com wrote:
 
   From: Robert Haas robertmh...@gmail.com
   Subject: Re: [PERFORM] requested shared memory size overflows size_t
   To: Bob Lunney bob_lun...@yahoo.com
   Cc: pgsql-performance@postgresql.org, Tom Wilcox hungry...@googlemail.com
   Date: Wednesday, June 9, 2010, 9:49 PM
   On Wed, Jun 2, 2010 at 9:26 PM, Bob Lunney bob_lun...@yahoo.com wrote:
    Your other option, of course, is a nice 64-bit linux variant, which
    won't have this problem at all.
 
   Although, even there, I think I've heard that after 10GB you don't get
   much benefit from raising it further.  Not sure if that's accurate or
   not...
 
   -- Robert Haas
   EnterpriseDB: http://www.enterprisedb.com
   The Enterprise Postgres Company
 
 






Re: [PERFORM] Query about index usage

2010-06-11 Thread Bob Lunney
Jayadevan,

PostgreSQL must go to the table to determine if the row you are requesting is 
visible to your transaction.  This is an artifact of the MVCC implementation.  
Oracle can fetch the data from the index, since it doesn't keep multiple 
representations of the rows, but it may need to check the undo logs to 
determine the state that applies to your transaction.  It's just two different 
ways to accomplish the same thing.
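
You can see it with the table from the message below: both of these produce an 
index scan that still visits the heap, so the plans look alike (PostgreSQL has 
no index-only scans):

  EXPLAIN SELECT name FROM myt WHERE id = 13890;
  EXPLAIN SELECT id   FROM myt WHERE id = 13890;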

Bob Lunney

--- On Fri, 6/11/10, Jayadevan M jayadevan.maym...@ibsplc.com wrote:

 From: Jayadevan M jayadevan.maym...@ibsplc.com
 Subject: [PERFORM] Query about index usage
 To: pgsql-performance@postgresql.org
 Date: Friday, June 11, 2010, 5:56 AM
 Hello all,
 
 One query about PostgreSQL's index usage. If I select just
 one column on 
 which there is an index (or select only columns on which
 there is an 
 index), and the index is used by PostgreSQL, does
 PostgreSQL avoid table 
 access if possible?  I am trying to understand the
 differences between 
 Oracle's data access patterns and PostgreSQL's. 
 Here is how it works in Oracle.
 
 Case 1 - SELECT column which is not there in the index 
 
 SQL select name from myt where id = 13890;
 
 NAME
 ---
 
 
 
 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2609414407
 
 -------------------------------------------------------------------------------------
 | Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
 -------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT            |       |     1 |    65 |     2   (0)| 00:00:01 |
 |   1 |  TABLE ACCESS BY INDEX ROWID| MYT   |     1 |    65 |     2   (0)| 00:00:01 |
 |*  2 |   INDEX RANGE SCAN          | MYIDX |     1 |       |     1   (0)| 00:00:01 |
 -------------------------------------------------------------------------------------
 
 Predicate Information (identified by operation id):
 ---------------------------------------------------
 
    2 - access(ID=13890)
 
 Note
 -----
    - dynamic sampling used for this statement
 
 
 Statistics
 ----------------------------------------------------------
           0  recursive calls
           0  db block gets
           4  consistent gets
           0  physical reads
           0  redo size
         409  bytes sent via SQL*Net to client
         384  bytes received via SQL*Net from client
           2  SQL*Net roundtrips to/from client
           0  sorts (memory)
           0  sorts (disk)
           1  rows processed
 
  
  
 Case 2 - SELECT column which is there in the index 
 
 SQL select id from myt where id = 13890;
 
         ID
 --
      13890
 
 
 Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2555454399
 
 --------------------------------------------------------------------------
 | Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
 --------------------------------------------------------------------------
 |   0 | SELECT STATEMENT |       |     1 |    13 |     1   (0)| 00:00:01 |
 |*  1 |  INDEX RANGE SCAN| MYIDX |     1 |    13 |     1   (0)| 00:00:01 |
 --------------------------------------------------------------------------
 
 Predicate Information (identified by operation id):
 ---------------------------------------------------
 
    1 - access(ID=13890)
 
 Note
 -----
    - dynamic sampling used for this statement
 
 
 Statistics
 ----------------------------------------------------------
           0  recursive calls
           0  db block gets
           3  consistent gets
           0  physical reads
           0  redo size
         407  bytes sent via SQL*Net to client
         384  bytes received via SQL*Net from client
           2  SQL*Net roundtrips to/from client
           0  sorts (memory)
           0  sorts (disk)
           1  rows processed
 
 In the second query, where id was selected, the table was not used at all.
 In PostgreSQL, EXPLAIN gives me similar output in both cases.
 Table structure - 
 
 postgres=# \d myt
              Table "public.myt"
  Column |         Type          | Modifiers
 --------+-----------------------+-----------
  id     | integer               |
  name   | character varying(20) |
 Indexes:
     "myidx" btree (id)
 
 
 Regards,
 Jayadevan
 
 
 
 
 
Re: [PERFORM] requested shared memory size overflows size_t

2010-06-10 Thread Bob Lunney
True, plus there are the other issues of increased checkpoint times and I/O, 
bgwriter tuning, etc.  It may be better to let the OS cache the files and size 
shared_buffers to a smaller value.  

Bob Lunney

--- On Wed, 6/9/10, Robert Haas robertmh...@gmail.com wrote:

 From: Robert Haas robertmh...@gmail.com
 Subject: Re: [PERFORM] requested shared memory size overflows size_t
 To: Bob Lunney bob_lun...@yahoo.com
 Cc: pgsql-performance@postgresql.org, Tom Wilcox hungry...@googlemail.com
 Date: Wednesday, June 9, 2010, 9:49 PM
 On Wed, Jun 2, 2010 at 9:26 PM, Bob
 Lunney bob_lun...@yahoo.com
 wrote:
  Your other option, of course, is a nice 64-bit linux
 variant, which won't have this problem at all.
 
 Although, even there, I think I've heard that after 10GB
 you don't get
 much benefit from raising it further.  Not sure if
 that's accurate or
 not...
 
 -- 
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company
 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] requested shared memory size overflows size_t

2010-06-02 Thread Bob Lunney
Tom,

A 32 bit build could only reference at most 4 Gb - certainly not 60 Gb.  Also, 
Windows doesn't do well with large shared buffer sizes anyway.  Try setting 
shared_buffers to 2 Gb and let the OS file system cache handle the rest.

Your other option, of course, is a nice 64-bit linux variant, which won't have 
this problem at all.
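
For what it's worth, a minimal postgresql.conf sketch along those lines (the 
effective_cache_size figure below is only an assumption; it is a planner hint, not 
allocated memory, so it is not limited by the 32-bit address space):

shared_buffers = 2GB             # modest setting; let the Windows file system cache do the heavy lifting
effective_cache_size = 48GB      # assumption: roughly what the OS can realistically cache on a 96GB box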

Good luck!

Bob Lunney

--- On Wed, 6/2/10, Tom Wilcox hungry...@googlemail.com wrote:

 From: Tom Wilcox hungry...@googlemail.com
 Subject: Re: [PERFORM] requested shared memory size overflows size_t
 To: pgsql-performance@postgresql.org
 Date: Wednesday, June 2, 2010, 6:58 AM
 Hi,
 
 Sorry to revive an old thread but I have had this error
 whilst trying to configure my 32-bit build of postgres to
 run on a 64-bit Windows Server 2008 machine with 96GB of RAM
 (that I would very much like to use with postgres).
 
 I am getting:
 
 2010-06-02 11:34:09 BST FATAL:  requested shared memory size overflows size_t
 2010-06-02 11:41:01 BST FATAL:  could not create shared memory segment: 8
 2010-06-02 11:41:01 BST DETAIL:  Failed system call was MapViewOfFileEx.
 
 which makes a lot of sense since I was setting
 shared_buffers (and effective_cache_size) to values like
 60GB..
 
 Is it possible to get postgres to make use of the available
 96GB RAM on a Windows 32-bit build? Otherwise, how can I get
 it to work?
 
 I'm guessing my options are:
 
 - Use the 64-bit Linux build (Not a viable option for me -
 unless from a VM - in which case recommendations?)
 or
 - Configure Windows and postgres properly (Preferred option
 - but I don't know what needs to be done here or if I'm
 testing properly using Resource Monitor)
 
 Thanks,
 Tom
 
 
 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present

2010-06-02 Thread Bob Lunney
Jori,

What is the PostgreSQL 
version/shared_buffers/work_mem/effective_cache_size/default_statistics_target? 
 Are the statistics for the table up to date?  (Run analyze verbose tablename 
to update them.)  Table and index structure would be nice to know, too.

If all else fails you can set enable_seqscan = off for the session, but that is 
a Big Hammer for what is probably a smaller problem.
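
If it helps, the session-level form is just the standard SET/RESET pair, something like:

SET enable_seqscan = off;   -- discourage sequential scans for this session only
-- run the problem query here, then
RESET enable_seqscan;       -- restore the default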

Bob Lunney

--- On Wed, 6/2/10, Jori Jovanovich j...@dimensiology.com wrote:

From: Jori Jovanovich j...@dimensiology.com
Subject: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
To: pgsql-performance@postgresql.org
Date: Wednesday, June 2, 2010, 4:28 PM

hi,

I have a problem space where the main goal is to search backward in time for 
events.  Time can go back very far into the past, and so the
table can get quite large.  However, the vast majority of queries are all 
satisfied by relatively recent data.  I have an index on the row creation date 
and I would like almost all of my queries to have a query plan looking 
something like:

 Limit ...
   ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
         ...
However, PostgreSQL frequently tries to do a full table scan.  Often what 
controls whether a scan is performed or not is dependent on the size of the 
LIMIT and how detailed the WHERE clause is.  In practice, the scan is always 
the wrong answer for my use cases (where always is defined to be 99.9%).

Some examples:

(1) A sample query that devolves to a full table scan

  EXPLAIN
   SELECT events.id, events.client_duration, events.message, events.created_by, 
events.source, events.type, events.event, events.environment,
          events.server_timestamp, events.session_id, events.reference, 
events.client_uuid     FROM events
    WHERE client_uuid ~* E'^foo bar so what' ORDER BY server_timestamp DESC
    LIMIT 20;

                                QUERY PLAN (BAD!)
------------------------------------------------------------------------------
 Limit  (cost=363278.56..363278.61 rows=20 width=177)
   ->  Sort  (cost=363278.56..363278.62 rows=24 width=177)
         Sort Key: server_timestamp
         ->  Seq Scan on events  (cost=0.00..363278.01 rows=24 width=177)
               Filter: (client_uuid ~* '^foo bar so what'::text)


(2) Making the query faster by making the string match LESS specific (odd, 
seems like it should be MORE)

  EXPLAIN
   SELECT events.id, events.client_duration, events.message, events.created_by, 
events.source, events.type, events.event, events.environment,
          events.server_timestamp, events.session_id, events.reference, 
events.client_uuid     FROM events
    WHERE client_uuid ~* E'^foo' ORDER BY server_timestamp DESC
    LIMIT 20;

                                QUERY PLAN (GOOD!)
------------------------------------------------------------------------------
 Limit  (cost=0.00..1433.14 rows=20 width=177)
   ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=8695 width=177)
         Filter: (client_uuid ~* '^foo'::text)

(3) Alternatively making the query faster by using a smaller limit

  EXPLAIN

   SELECT events.id, events.client_duration, events.message, events.created_by, 
events.source, events.type, events.event, events.environment,
          events.server_timestamp, events.session_id, events.reference, 
events.client_uuid     FROM events
    WHERE client_uuid ~* E'^foo bar so what' ORDER BY server_timestamp DESC
    LIMIT 10;
                                QUERY PLAN (GOOD!)
------------------------------------------------------------------------------
 Limit  (cost=0.00..259606.63 rows=10 width=177)
   ->  Index Scan Backward using server_timestamp_idx on events  (cost=0.00..623055.91 rows=24 width=177)
         Filter: (client_uuid ~* '^foo bar so what'::text)

I find myself wishing I could just put a SQL HINT on the query to force the 
index to be used but I understand that HINTs are considered harmful and are 
therefore not provided for PostgreSQL, so what is the recommended way to solve 
this?

thank you very much




  

Re: [PERFORM] Slow Bulk Delete

2010-05-12 Thread Bob Lunney
Thilo,

Just a few thoughts off the top of my head:

1.  If you know the ids of the rows you want to delete beforehand, insert them 
into a table, then run the delete based on a join with this table.

2.  Better yet, insert the ids into a table using COPY, then use a join to 
create a new table with the rows you want to keep from the first table.  Drop 
the original source table, truncate the id table, rename the copied table and 
add indexes and constraints.

3.  See if you can partition the table somehow so the rows you want to delete 
are in a single partitioned child table.  When it's time to delete them, just 
drop the child table.

Of course, if the 1M rows you need to delete is very small compared to the 
total overall size of the original table, the first two techniques might not buy 
you anything, but it's worth a try.
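
A rough sketch of options 1 and 2, assuming a table keyed by id; the table, column, 
and file names below are placeholders, not anything from your actual schema:

-- Option 1: stage the ids, then delete via a join
CREATE TEMP TABLE ids_to_delete (id bigint PRIMARY KEY);
\copy ids_to_delete from 'ids_to_delete.txt'

DELETE FROM big_table t
 USING ids_to_delete d
 WHERE t.id = d.id;

-- Option 2: keep the survivors instead, then swap the tables
CREATE TABLE big_table_keep AS
SELECT t.*
  FROM big_table t
  LEFT JOIN ids_to_delete d ON d.id = t.id
 WHERE d.id IS NULL;
-- then drop big_table, rename big_table_keep, and recreate indexes and constraints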

Good luck!

Bob Lunney

--- On Sat, 5/8/10, thilo thilo.tan...@gmail.com wrote:

 From: thilo thilo.tan...@gmail.com
 Subject: [PERFORM] Slow Bulk Delete
 To: pgsql-performance@postgresql.org
 Date: Saturday, May 8, 2010, 7:39 AM
 Hi all!
 
 We moved from MySQL to Postgresql for some of our projects.
 So far
 we're very impressed with the performance (especially
 INSERTs and
 UPDATEs), except for a strange problem with the following
 bulk delete
 query:
 
 DELETE FROM table1 WHERE table2_id = ?
 
 I went through these Wiki pages, trying to solve the
 problem:
 
 http://wiki.postgresql.org/wiki/SlowQueryQuestions and
 http://wiki.postgresql.org/wiki/Performance_Optimization
 
 but unfortunately without much luck.
 
 Our application is doing batch jobs. On every batch run, we
 must
 delete approx. 1M rows in table1 and recreate these
 entries. The
 inserts are very fast, but deletes are not. We cannot make
 updates,
 because there's no identifying property in the objects of
 table1.
 
 This is what EXPLAIN is telling me:
 
 EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
 
                                                  QUERY PLAN
 -------------------------------------------------------------------------------------------------------------------
  Index Scan using sr_index on table1  (cost=0.00..8.56 rows=4 width=6) (actual time=0.111..0.154 rows=4 loops=1)
    Index Cond: (table2_id = 11242939)
  Total runtime: 0.421 ms
 (3 rows)
 
 This seems to be very fast (using the index), but running
 this query
 from JDBC takes up to 20ms each. For 1M rows this sum up to
 several
 hours. When I have a look at pg_top psql uses most of the
 time for the
 deletes. CPU usage is 100% (for the core used by
 postgresql). So it
 seems that postgresql is doing some sequential scanning or
 constraint
 checks.
 
 This is the table structure:
 
 id          bigint (primary key)
 table2_id   bigint (foreign key constraint to table 2, *indexed*)
 table3_id   bigint (foreign key constraint to table 3, *indexed*)
 some non-referenced text and boolean fields
 
 My server settings (Potgresql 8.4.2):
 
 shared_buffers = 1024MB
 effective_cache_size = 2048MB
 work_mem = 128MB
 wal_buffers = 64MB
 checkpoint_segments = 32
 checkpoint_timeout = 15min
 checkpoint_completion_target = 0.9
 
 It would be very nice to give me a hint to solve the
 problem. It
 drives me crazy ;-)
 
 If you need more details please feel free to ask!
 
 Thanks in advance for your help!
 
 Kind regards
 
 Thilo
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump far too slow

2010-03-21 Thread Bob Lunney
If you have a multi-processor machine (more than 2) you could look into pigz, 
which is a parallelized implementation of gzip. I've gotten dramatic reductions in 
wall time using it to compress dump files.  The compressed file is still readable by 
gunzip.

Bob Lunney

From: Dave Crooke dcro...@gmail.com
Subject: Re: [PERFORM] pg_dump far too slow
To: David Newall postgre...@davidnewall.com
Cc: Tom Lane t...@sss.pgh.pa.us, pgsql-performance@postgresql.org, 
robertmh...@gmail.com
Date: Sunday, March 21, 2010, 10:33 AM

One more from me 

If you think that the pipe to GZIP may be causing pg_dump to stall, try putting 
something like buffer(1) in the pipeline ... it doesn't generally come with 
Linux, but you can download source or create your own very easily ... all it 
needs to do is asynchronously poll stdin and write stdout. I wrote one in Perl 
when I used to do a lot of digital video hacking, and it helped with chaining 
together tools like mplayer and mpeg.


However, my money says that Tom's point about it being (disk) I/O bound is 
correct :-)

Cheers
Dave

On Sun, Mar 21, 2010 at 8:17 AM, David Newall postgre...@davidnewall.com 
wrote:

Thanks for all of the suggestions, guys, which gave me some pointers on new 
directions to look, and I learned some interesting things.




The first interesting thing was that piping (uncompressed) pg_dump into gzip, 
instead of using pg_dump's internal compressor, does bring a lot of extra 
parallelism into play.  (Thank you, Matthew Wakeling.)  I observed gzip using 
100% CPU, as expected, and also two, count them, two postgres processes 
collecting data, each consuming a further 80% CPU.  It seemed to me that 
Postgres was starting and stopping these to match the capacity of the consumer 
(i.e. pg_dump and gzip.)  Very nice.  Unfortunately one of these processes 
dropped eventually, and, according to top, the only non-idle process running 
was gzip (100%.)  Obviously there were postgres and pg_dump processes, too, 
but they were throttled by gzip's rate of output and effectively idle (less 
than 1% CPU).  That is also interesting.  The final output from gzip was being 
produced at the rate of about 0.5MB/second, which seems almost unbelievably 
slow.




I next tried Tom Lane's suggestion, COPY WITH BINARY, which produced the 
complete 34GB file in 30 minutes (a good result.)  I then compressed that with 
gzip, which took an hour and reduced the file to 32GB (hardly worth the effort) 
for a total run time of 90 minutes.  In that instance, gzip produced output at 
the rate of 10MB/second, so I tried pg_dump -Z0 to see how quickly that would 
dump the file.  I had the idea that I'd go on to see how quickly gzip would 
compress it, but unfortunately it filled my disk before finishing (87GB at that 
point), so there's something worth knowing: pg_dump's output for binary data is 
very much less compact than COPY WITH BINARY; all those backslashes, as Tom 
pointed out.  For the aforementioned reason, I didn't get to see how gzip would 
perform.  For the record, pg_dump with no compression produced output at the 
rate of 26MB/second; a rather meaningless number given the 200%+ expansion of 
final output.
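
(For reference, the COPY WITH BINARY form being discussed looks like the following; 
the table and file names here are made up:)

-- server-side binary dump of one table
COPY pdf_documents TO '/backup/pdf_documents.bin' WITH BINARY;

-- and the corresponding reload
COPY pdf_documents FROM '/backup/pdf_documents.bin' WITH BINARY;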




I am now confident the performance problem is from gzip, not Postgres and 
wonder if I should read up on gzip to find why it would work so slowly on a 
pure text stream, albeit a representation of PDF which intrinsically is fairly 
compressed.  Given the spectacular job that postgres did in adjusting its rate 
of output to match the consumer process, I did wonder if there might have been 
a tragic interaction between postgres and gzip; perhaps postgres limits its 
rate of output to match gzip; and gzip tries to compress what's available, that 
being only a few bytes; and perhaps that might be so inefficient that it hogs 
the CPU; but I don't think that likely.  I had a peek at gzip's source 
(surprisingly readable) and on first blush it does seem that unfortunate input 
could result in only a few bytes being written each time through the loop, 
meaning only a few more bytes could be read in.




Just to complete the report, I created a child table to hold the PDFs, which 
are static, took a dump of just that table, and adjusted my backup command 
to exclude it.  Total size of the compressed backup sans PDFs is circa 7MB, 
taking around 30 seconds.







  

Re: [PERFORM] Block at a time ...

2010-03-17 Thread Bob Lunney
Greg is correct, as usual.  Geometric growth of files is A Bad Thing in an  
Oracle DBA's world, since you can unexpectedly (automatically?) run out of file 
system space when the database determines it needs x% more extents than last 
time.

The concept of contiguous extents, however, has some merit, particularly when 
restoring databases.  Prior to parallel restore, a table's files were created 
and extended in roughly contiguous allocations, presuming there was no other 
activity on your database disks.  (You do dedicate disks, don't you?)  When 
using 8-way parallel restore against a six-disk RAID 10 group I found that 
table and index scan performance dropped by about 10x.  I/O performance was 
restored by either clustering the tables one at a time, or by dropping and 
restoring them one at a time.  The only reason I can come up with for this 
behavior is file fragmentation and increased seek times.
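
For the record, the clustering workaround amounts to something like this per table 
(the table and index names are placeholders):

-- rewrite the table in index order, which also recreates its files in (mostly) contiguous extents
CLUSTER my_big_table USING my_big_table_pkey;
ANALYZE my_big_table;   -- physical-ordering statistics are stale after CLUSTER, so re-analyze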

If PostgreSQL had a mechanism to pre-allocate files prior to restoring the 
database that might mitigate the problem.  

Then if we could only get parallel index operations ...

Bob Lunney

--- On Wed, 3/17/10, Greg Stark gsst...@mit.edu wrote:

 From: Greg Stark gsst...@mit.edu
 Subject: Re: [PERFORM] Block at a time ...
 To: Pierre C li...@peufeu.com
 Cc: Alvaro Herrera alvhe...@commandprompt.com, Dave Crooke 
 dcro...@gmail.com, pgsql-performance@postgresql.org
 Date: Wednesday, March 17, 2010, 5:52 AM
 On Wed, Mar 17, 2010 at 7:32 AM,
 Pierre C li...@peufeu.com
 wrote:
  I was thinking in something like that, except that
 the factor I'd use
  would be something like 50% or 100% of current
 size, capped at (say) 1 GB.
 
 This turns out to be a bad idea. One of the first thing
 Oracle DBAs
 are told to do is change this default setting to allocate
 some
 reasonably large fixed size rather than scaling upwards.
 
 This might be mostly due to Oracle's extent-based space
 management but
 I'm not so sure. Recall that the filesystem is probably
 doing some
 rounding itself. If you allocate 120kB it's probably
 allocating 128kB
 itself anyways. Having two layers rounding up will result
 in odd
 behaviour.
 
 In any case I was planning on doing this a while back. Then
 I ran some
 experiments and couldn't actually demonstrate any problem.
 ext2 seems
 to do a perfectly reasonable job of avoiding this problem.
 All the
 files were mostly large contiguous blocks after running
 some tests --
 IIRC running pgbench.
 
 
  Using fallocate() ?
 
 I think we need posix_fallocate().
 
 -- 
 greg
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


  

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] sub-select in IN clause results in sequential scan

2009-10-29 Thread Bob Lunney
Try replacing the 'current_timestamp - interval 8 days' portion with explicit 
values (e.g. partitioned_column > '2009-10-21'::date) and see if that works.  
I think the query planner can only use explicit values to determine if it 
should go straight to partitioned tables.
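
In other words, something like the following (the date literal is just an example); 
with a constant comparison the planner can exclude non-matching partitions at plan time:

SELECT *
  FROM partitioned_table
 WHERE partitioned_column > '2009-10-21'::date
   AND x IN (SELECT yy FROM z WHERE colname LIKE 'aaa%');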

Bob

--- On Thu, 10/29/09, Anj Adu fotogra...@gmail.com wrote:

 From: Anj Adu fotogra...@gmail.com
 Subject: Re: [PERFORM] sub-select in IN clause results in sequential scan
 To: Angayarkanni kangayarka...@gmail.com
 Cc: Grzegorz Jaśkiewicz gryz...@gmail.com, 
 pgsql-performance@postgresql.org
 Date: Thursday, October 29, 2009, 10:10 AM
 Join did not help. A sequential scan
 is still being done. The
 hardcoded value in the IN clause performs the best. The
 time
 difference is more than an order of magnitude.
 
 2009/10/29 Angayarkanni kangayarka...@gmail.com:
 
  2009/10/29 Grzegorz Jaśkiewicz gryz...@gmail.com
 
 
  On Wed, Oct 28, 2009 at 6:13 PM, Anj Adu fotogra...@gmail.com
 wrote:
 
  Postgres consistently does a sequential scan
 on the child partitions
  for this query
 
  select * from partitioned_table
  where partitioned_column > current_timestamp - interval '8 days'
  and x in (select yy from z where colname like 'aaa%')
 
  If I replace the query with
 
  select * from partitioned_table
  where partitioned_column > current_timestamp - interval '8 days'
  and x in (hardcode_value)
 
  The results are in line with expectation (very
 fast and uses a Bitmap
  Index Scan on the column X)
  \
 
  use JOIN luke..
 
  --
  GJ
 
  Yes, try using a JOIN
 
  JAK
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Insert performance and multi-column index order

2009-06-30 Thread Bob Lunney

Greg,

Thanks for the mental prod!  Yes, the original data is more closely sorted by 
the timestamptz column, since they represent events coming into the collection 
system in real time.  As for the distribution of data values, it goes without 
saying the timestamptz value is monotonically increasing, with roughly 1300 
entries having the same timestamptz value.  The other three columns' values are 
essentially reference data, with 400 values for the varchar, 680 for the first 
text column, and 60 for the second text column.  The distribution is fairly 
even, with some small spikes but nothing significant.

The duh moment came for me when you pointed out the implicit sort order of 
the data.  After resorting the data into the new index column order the insert 
performance was largely restored.  I didn't monitor the process with vmstat, 
however - the end result is good enough for me.  I believe that the index 
maintenance of page splitting, etc., that you describe below was exactly the 
culprit, and that presorting the data solved that problem.  

I call it my duh moment since I've presorted data for Sybase and Oracle for 
exactly the same reason, but forgot to apply the lesson to PostgreSQL.
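
For anyone else hitting this, the presort can also be done inside the database rather 
than in the extract file; a rough sketch, with made-up staging and target table names 
and columns standing in for the (varchar, text, timestamptz, text) index order:

-- load the raw extract into a staging table first
CREATE TABLE events_stage (LIKE events_archive INCLUDING DEFAULTS);
COPY events_stage FROM STDIN;   -- feed the extract here (end with \.)

-- then insert in the same order as the new unique index
INSERT INTO events_archive
SELECT *
  FROM events_stage
 ORDER BY source_name, category, event_ts, detail;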

BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it 
matters.

Thanks for the help, Greg and Tom!

--- On Sat, 6/27/09, Greg Smith gsm...@gregsmith.com wrote:

 From: Greg Smith gsm...@gregsmith.com
 Subject: Re: [PERFORM] Insert performance and multi-column index order
 To: bob_lun...@yahoo.com
 Cc: pgsql-performance@postgresql.org
 Date: Saturday, June 27, 2009, 1:08 AM
 On Fri, 26 Jun 2009, bob_lun...@yahoo.com
 wrote:
 
  The original unique index was in the order
 (timestamptz, varchar, text, text) and most queries against
 it were slow.  I changed the index order to (varchar, text,
 timestamptz, text) and queries now fly, but loading data
 (via copy from stdin) in the table is 2-4 times slower.
 
 Is the input data closer to being sorted by the timestamptz
 field than the varchar field?  What you might be seeing
 is that the working set of index pages needed to keep
 building the varchar index are bigger or have more of a
 random access component to them as they spill in and out of
 the buffer cache.  Usually you can get a better idea
 what the difference is by comparing the output from vmstat
 while the two are loading.  More random read/write
 requests in the mix will increase the waiting for I/O
 percentage while not increasing the total amount
 read/written per second.
 
 --
 * Greg Smith gsm...@gregsmith.com
 http://www.gregsmith.com Baltimore, MD




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance