Re: [PERFORM] ECC RAM really needed?

2007-05-25 Thread Bruno Wolff III
On Fri, May 25, 2007 at 18:45:15 -0700,
  Craig James [EMAIL PROTECTED] wrote:
 We're thinking of building some new servers.  We bought some a while back 
 that have ECC (error correcting) RAM, which is absurdly expensive compared 
 to the same amount of non-ECC RAM.  Does anyone have any real-life data 
 about the error rate of non-ECC RAM, and whether it matters or not?  In my 
 long career, I've never once had a computer that corrupted memory, or at 
 least I never knew if it did.  ECC sounds like a good idea, but is it 
 solving a non-problem?

In the past when I purchased ECC RAM, it wasn't that much more expensive
than non-ECC RAM.

Wikipedia suggests a rule of thumb of one error per month per gigabyte,
though it notes that error rates vary widely. It references a paper that should
provide you with more background.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] BUG #3270: limit < 16 optimizer behaviour

2007-05-11 Thread Bruno Wolff III
This should have been asked on the performance list, not filed as a bug.
I doubt anyone will have a complete answer to your question without
EXPLAIN ANALYZE output from the query.

Have you ANALYZE'd the tables recently? Poor statistics are one possible
cause of the issue you are having.
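
Concretely, that means running ANALYZE and then posting the EXPLAIN ANALYZE
output for the query from your report, along the lines of:

ANALYZE historiandata;
ANALYZE rtunodes;
ANALYZE rtus;

EXPLAIN ANALYZE
SELECT historianid, storagedate, slotdate, status, value FROM historiandata
JOIN rtunodes ON (historiandata.historianid = rtunodes.nodeid)
JOIN rtus ON (rtunodes.rtuid = rtus.nodeid)
WHERE realmid IN (1119,1422,698,1428)
AND historianid IN (2996)
ORDER BY storagedate DESC
LIMIT 10;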

On Fri, May 11, 2007 at 14:07:57 +,
  Liviu Ionescu [EMAIL PROTECTED] wrote:
 
 The following bug has been logged online:
 
 Bug reference:  3270
 Logged by:  Liviu Ionescu
 Email address:  [EMAIL PROTECTED]
 PostgreSQL version: 8.2.4
 Operating system:   Linux
 Description:        limit < 16 optimizer behaviour
 Details: 
 
 I have a table of about 15M rows, and a query like this:
 
 SELECT historianid,storagedate,slotdate,status,value FROM historiandata 
 JOIN rtunodes ON(historiandata.historianid=rtunodes.nodeid)
 JOIN rtus ON(rtunodes.rtuid=rtus.nodeid)
 WHERE realmid IN (1119,1422,698,1428) 
 AND historianid in (2996)
 ORDER BY storagedate desc 
 LIMIT 10
 
 if there are no records with the given historianid, if limit is >= 16 the
 query is quite fast, otherwise it takes forever.
 
 my current fix was to always increase the limit to 16, but, although I know
 the optimizer behaviour depends on LIMIT, I still feel this looks like a
 bug; if the resultset has no records the value of the LIMIT should not
 matter.
 
 regards,
 
 Liviu Ionescu
 
 
 
 CREATE TABLE historiandata
 (
   historianid int4 NOT NULL,
   status int2 NOT NULL DEFAULT 0,
   value float8,
   slotdate timestamptz NOT NULL,
   storagedate timestamptz NOT NULL DEFAULT now(),
   CONSTRAINT historiandata_pkey PRIMARY KEY (historianid, slotdate),
   CONSTRAINT historianid_fkey FOREIGN KEY (historianid)
   REFERENCES historians (nodeid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT
 ) 
 WITHOUT OIDS;
 ALTER TABLE historiandata OWNER TO tomcat;
 
 
 -- Index: historiandata_historianid_index
 
 -- DROP INDEX historiandata_historianid_index;
 
 CREATE INDEX historiandata_historianid_index
   ON historiandata
   USING btree
   (historianid);
 
 -- Index: historiandata_slotdate_index
 
 -- DROP INDEX historiandata_slotdate_index;
 
 CREATE INDEX historiandata_slotdate_index
   ON historiandata
   USING btree
   (slotdate);
 
 -- Index: historiandata_storagedate_index
 
 -- DROP INDEX historiandata_storagedate_index;
 
 CREATE INDEX historiandata_storagedate_index
   ON historiandata
   USING btree
   (storagedate);
 
 
 CREATE TABLE rtunodes
 (
   nodeid int4 NOT NULL,
   rtuid int4 NOT NULL,
   no_publicnodeid int4,
   name varchar(64) NOT NULL,
   isinvalid bool NOT NULL DEFAULT false,
   nodetype varchar(16),
   CONSTRAINT rtunodes_pkey PRIMARY KEY (nodeid),
   CONSTRAINT nodeid_fkey FOREIGN KEY (nodeid)
   REFERENCES nodes (nodeid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT,
   CONSTRAINT rtuid_fkey FOREIGN KEY (rtuid)
   REFERENCES rtus (nodeid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT
 ) 
 WITHOUT OIDS;
 ALTER TABLE rtunodes OWNER TO tomcat;
 
 
 
 CREATE TABLE rtus
 (
   nodeid int4 NOT NULL,
   passwd varchar(10) NOT NULL,
   xml text,
   no_nextpublicnodeid int4 NOT NULL DEFAULT 1,
   rtudriverid int2,
   realmid int4 NOT NULL,
   enablegetlogin bool NOT NULL DEFAULT false,
   enablegetconfig bool NOT NULL DEFAULT false,
   businfoxml text,
   uniqueid varchar(32) NOT NULL,
   no_publicrtuid int4,
   loginname varchar(10) NOT NULL,
   protocolversion varchar(8) DEFAULT '0.0'::character varying,
   isinvalid bool DEFAULT false,
   CONSTRAINT rtus_pkey PRIMARY KEY (nodeid),
   CONSTRAINT nodeid_fkey FOREIGN KEY (nodeid)
   REFERENCES nodes (nodeid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT,
   CONSTRAINT realmid_fkey FOREIGN KEY (realmid)
   REFERENCES realms (nodeid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT,
   CONSTRAINT rtudriverid_fkey FOREIGN KEY (rtudriverid)
   REFERENCES rtudrivers (rtudriverid) MATCH SIMPLE
   ON UPDATE CASCADE ON DELETE RESTRICT,
   CONSTRAINT rtus_loginname_unique UNIQUE (loginname),
   CONSTRAINT rtus_uniqueid_unique UNIQUE (uniqueid)
 ) 
 WITHOUT OIDS;
 ALTER TABLE rtus OWNER TO tomcat;
 
 
 -- Index: rtus_realmid_index
 
 -- DROP INDEX rtus_realmid_index;
 
 CREATE INDEX rtus_realmid_index
   ON rtus
   USING btree
   (realmid);
 
 -- Index: rtus_rtudriverid_index
 
 -- DROP INDEX rtus_rtudriverid_index;
 
 CREATE INDEX rtus_rtudriverid_index
   ON rtus
   USING btree
   (rtudriverid);
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-28 Thread Bruno Wolff III
On Tue, Feb 27, 2007 at 15:35:13 +1030,
  Shane Ambler [EMAIL PROTECTED] wrote:
 
 From all that I have heard this is another advantage of SCSI disks - 
 they honor these settings as you would expect - many IDE/SATA disks 
 often say "sure, I'll disable the cache" but continue to use it, or don't 
 retain the setting after a restart.

It is easy enough to test if your disks lie about disabling the cache.
I doubt that it is all that common for modern disks to do that.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Two hard drives --- what to do with them?

2007-02-28 Thread Bruno Wolff III
On Wed, Feb 28, 2007 at 05:21:41 +1030,
  Shane Ambler [EMAIL PROTECTED] wrote:
 
 The difference between SCSI and IDE/SATA in this case is that a lot, if not 
 all, IDE/SATA drives tell you that the cache is disabled when you ask them 
 to, but they either don't actually disable it or they don't retain the 
 setting, so you get caught later. SCSI disks can be trusted when you set 
 this option.

I have some Western Digital Caviars and they don't lie about disabling
write caching.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Bruno Wolff III
On Tue, Feb 06, 2007 at 09:39:54 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
 I've been investigating partial indexes for the pets table. It has about
 300,000 rows, but only about 10 are active, and those are the ones we
 care about. Queries are also frequently made on males vs females, dogs vs 
 cats

It probably won't pay to make partial indexes on sex or species (at least
for the popular ones), as you aren't likely to save enough by eliminating only
half the cases to make up for maintaining another index. A partial index for
active rows probably does make sense.
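
For the active case, a sketch would be something like the following (assuming
the flag is a boolean column named active on the pets table, and pet_id and
species are stand-ins for your real columns):

CREATE INDEX pets_active_idx ON pets (pet_id) WHERE active;

-- The planner will only consider this index for queries whose conditions
-- it can prove imply the index predicate, e.g.:
SELECT pet_id FROM pets WHERE active AND species = 'dog';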

 or specific ages, and those specific cases seem like possible candidates for 
 partial indexes
 as well. I played with that approach some, but had trouble coming up with any 
 thing that
 benchmarked faster.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-06 Thread Bruno Wolff III
On Tue, Feb 06, 2007 at 10:31:26 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:
 
 Sure it's possible:
 
 CREATE TABLE parent (col1 int4);
 -- insert many millions of rows into parent
 CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
 -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index in the
child table, but do in the parent table.
Quote from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html:
The referenced columns must be the columns of a unique or primary key
constraint in the referenced table.
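
So for the example above to even be accepted, parent.col1 has to be a primary
key or carry a unique constraint (which is what gives you the index on the
parent side), roughly:

CREATE TABLE parent (col1 int4 PRIMARY KEY);
-- insert many millions of rows into parent
CREATE TABLE child (col1 int4 REFERENCES parent(col1));

-- An index on the child column is still optional; it only matters if you
-- update or delete parent rows and want the referential checks against
-- child to be fast:
CREATE INDEX child_col1_idx ON child (col1);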

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] optimizing a geo_distance() proximity query

2007-02-05 Thread Bruno Wolff III
On Mon, Feb 05, 2007 at 14:47:25 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
 This is also interesting. Is this approach practical if I want to index
 what's near each of about 40,000 US zipcodes, or is the approach mostly
 useful if there are just a small number of fixed points to address?

I think the answer depends on what your data model is. If you treat each
zip code as having a location at a single point, the earth distance stuff
should work. If you are trying to include the shape of each zip code in
your model and measure distances to the nearest point of zip codes, then
you will probably be better off using postgis.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Bruno Wolff III
On Mon, Feb 05, 2007 at 18:01:05 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
 It's also notable that the units used are meters, not miles like
 geo_distance(). That's what the magic number of 16093.44 is-- 10 miles
 converted to meters.

You can change the earth() function in earthdistance.sql before running it
to use some unit other than meters:

-- earth() returns the radius of the earth in meters. This is the only
-- place you need to change things for the cube base distance functions
-- in order to use different units (or a better value for the Earth's radius).

CREATE OR REPLACE FUNCTION earth() RETURNS float8
LANGUAGE 'sql' IMMUTABLE
AS 'SELECT ''6378168''::float8';

 However, my next step was to try a more real world query that involved
  a more complex where clause and a couple of table joins. So far, that
 result is coming out /slower/ with the new approach, even though the
 index is being used. I believe this may be because of the additional
 results found that are outside of the sphere, but inside the cube. This
 causes additional rows that need processing in the joined tables.

This is unlikely to be the cause. The ratio of the area of the cube to
the circle for small radii (compared to the radius of the earth, so that
we can consider things flat) is 4/pi (about 1.27), which shouldn't cause that
much of a change.
It might be that you are getting a bad plan. The guess on the selectivity
of the gist constraint may not be very good.
Some people here may be able to tell you more if you show us explain
analyze output.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Querying distinct values from a large table

2007-01-30 Thread Bruno Wolff III
On Tue, Jan 30, 2007 at 14:33:34 +0600,
  Igor Lobanov [EMAIL PROTECTED] wrote:
 Greetings!
 
 I have a rather large table with about 5 million rows and a dozen 
 columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I need 
 to query distinct pairs of ('a';'b') from this table.
 
 Is there any way to somehow improve the performance of this operation? 
 Table can not be changed.

DISTINCT currently can't use a hash aggregate plan and will use a sort.
If there aren't many distinct values, the hash aggregate plan will run much
faster. To get around this limitation, rewrite the query as a group by.
Something like:
SELECT a, b FROM table GROUP BY a, b;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] slow result

2007-01-23 Thread Bruno Wolff III
On Tue, Jan 23, 2007 at 11:34:52 +0100,
  Laurent Manchon [EMAIL PROTECTED] wrote:
 Hi,
 
 I have a slow response from my PostgreSQL 7.4 database using this query below
 on a table with 80 rows:
 
 select count(*) from tbl;
 
 PostgreSQL returns the result in 28 sec every time,
 although MS-SQL returns the result in 0.02 sec every time.

Besides the other advice mentioned in this thread, check that you don't
have a lot of dead tuples in that table. 28 seconds seems a bit high
for even a sequential scan of 80 tuples unless they are pretty large.
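
On 7.4 a quick way to check is something like:

VACUUM VERBOSE tbl;

-- Look at the removable/nonremovable row version counts and the number of
-- pages reported. A handful of live rows spread across tens of thousands of
-- pages means the table is badly bloated, and a VACUUM FULL (plus vacuuming
-- more often afterwards) should bring the count(*) time back down.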

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] DB benchmark and pg config file help

2007-01-19 Thread Bruno Wolff III
On Fri, Jan 19, 2007 at 09:05:35 -0500,
  Kevin Hunter [EMAIL PROTECTED] wrote:
 
 Seriously though, that would have bitten me.  Thank you, I did not know 
 that.  Does that mean that I can't publish the results outside of my 
 work/research/personal unit at all?  Or do I just need to obscure 
 which DB I'm talking about?  (Like Vendor {1,2,3,...} Product).

Check with your lawyer. Depending on where you are, those clauses may not even
be valid.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Configuration Advice

2007-01-17 Thread Bruno Wolff III
 From: Steve [EMAIL PROTECTED]
 To: pgsql-performance@postgresql.org
 Sent: 1/17/2007 2:41 PM
 Subject: [PERFORM] Configuration Advice
 
 SO ... our goal here is to make this load process take less time.  It 
 seems the big part is building the big summary table; this big summary 
 table is currently 9 million rows big.  Every night, we drop the table, 
 re-create it, build the 9 million rows of data (we use COPY to put the 
 data in when it's prepared, not INSERT), and then build the indexes on it 
 -- of which there are many.  Unfortunately this table gets queried 
 in a lot of different ways and needs these indexes; also unfortunately, we 
 have operator class indexes to support both ASC and DESC sorting on 
 columns so these are for all intents and purposes duplicate but required 
 under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still 
 a requirement?)

Note that you only need to have the ASC and DESC versions of opclasses when
you are going to use multicolumn indexes with some columns in ASC order and
some in DESC order. For columns used by themselves in an index, you don't
need to do this, no matter which order you are sorting on.
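
In other words, something like this (the table, column, and operator class
names here are only placeholders for whatever you actually created):

-- Mixed-direction multicolumn index: only the DESC column needs the
-- reverse-sort operator class.
CREATE INDEX summary_a_asc_b_desc_idx ON summary (col_a, col_b reverse_int4_ops);

-- Single-column index: usable for both ORDER BY col_a ASC and
-- ORDER BY col_a DESC, no special opclass needed.
CREATE INDEX summary_a_idx ON summary (col_a);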

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] File Systems Compared

2006-12-17 Thread Bruno Wolff III
On Fri, Dec 15, 2006 at 10:44:39 -0600,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 
 The other feature I would like is to be able to use write barriers with
  encrypted file systems. I haven't found anything on whether or not there
  are near term plans by anyone to support that.

I asked about this on the dm-crypt list and was told that write barriers
work pre 2.6.19. There was a change for 2.6.19 that might break things for
SMP systems. But that will probably get fixed eventually.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
The reply wasn't directly copied to the performance list, but I will
copy this one back.

On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
  On Thu, Dec 14, 2006 at 01:39:00 -0500,
Jim Nasby [EMAIL PROTECTED] wrote:
  On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
  This appears to be changing under Linux. Recent kernels have write  
  barriers implemented using cache flush commands (which 
  some drives ignore,  so you need to be careful).
 
 Is it true that some drives ignore this; or is it mostly
 an urban legend that was started by testers that didn't
 have kernels with write barrier support.   I'd be especially
 interested in knowing if there are any currently available
 drives which ignore those commands.
 
  In very recent kernels, software raid using raid 1 will also
  handle write barriers. To get this feature, you are supposed to
  mount ext3 file systems with the barrier=1 option. For other file  
  systems, the parameter may need to be different.
 
 With XFS the default is apparently to enable write barrier
 support unless you explicitly disable it with the nobarrier mount option.
 It also will warn you in the system log if the underlying device
 doesn't have write barrier support.
 
 SGI recommends that you use the nobarrier mount option if you do
 have a persistent (battery backed) write cache on your raid device.
 
   http://oss.sgi.com/projects/xfs/faq.html#wcache
 
 
  But would that actually provide a meaningful benefit? When you  
  COMMIT, the WAL data must hit non-volatile storage of some kind,  
  which without a BBU or something similar, means hitting the platter.  
  So I don't see how enabling the disk cache will help, unless of  
  course it's ignoring fsync.
 
 With write barriers, fsync() waits for the physical disk; but I believe
 the background writes from write() done by pdflush don't have to; so
 it's kinda like only disabling the cache for WAL files and the filesystem's
 journal, but having it enabled for the rest of your write activity (the
 tables except at checkpoints?  the log file?).
 
  Note the use case for this is more for hobbyists or development boxes. You can
  only use it on software raid (md) 1, which rules out most real systems.
  
 
 Ugh.  Looking for where that's documented; and hoping it is or will soon
 work on software 1+0 as well.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
On Thu, Dec 14, 2006 at 13:21:11 -0800,
  Ron Mayer [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
  On Thu, Dec 14, 2006 at 01:39:00 -0500,
Jim Nasby [EMAIL PROTECTED] wrote:
  On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
  This appears to be changing under Linux. Recent kernels have write  
  barriers implemented using cache flush commands (which 
  some drives ignore,  so you need to be careful).
 
 Is it true that some drives ignore this; or is it mostly
 an urban legend that was started by testers that didn't
 have kernels with write barrier support.   I'd be especially
 interested in knowing if there are any currently available
 drives which ignore those commands.

I saw posts claiming this, but no specific drives mentioned. I did see one
post that claimed that the cache flush command was mandated (not optional)
by the spec.

  In very recent kernels, software raid using raid 1 will also
  handle write barriers. To get this feature, you are supposed to
  mount ext3 file systems with the barrier=1 option. For other file  
  systems, the parameter may need to be different.
 
 With XFS the default is apparently to enable write barrier
 support unless you explicitly disable it with the nobarrier mount option.
 It also will warn you in the system log if the underlying device
 doesn't have write barrier support.

I think there might be a similar patch for ext3 going into 2.6.19. I haven't
checked a 2.6.19 kernel to make sure though.

 
 SGI recommends that you use the nobarrier mount option if you do
 have a persistent (battery backed) write cache on your raid device.
 
   http://oss.sgi.com/projects/xfs/faq.html#wcache
 
 
  But would that actually provide a meaningful benefit? When you  
  COMMIT, the WAL data must hit non-volatile storage of some kind,  
  which without a BBU or something similar, means hitting the platter.  
  So I don't see how enabling the disk cache will help, unless of  
  course it's ignoring fsync.
 
 With write barriers, fsync() waits for the physical disk; but I believe
 the background writes from write() done by pdflush don't have to; so
 it's kinda like only disabling the cache for WAL files and the filesystem's
 journal, but having it enabled for the rest of your write activity (the
 tables except at checkpoints?  the log file?).

Not exactly. Whenever you commit the file system log or fsync the wal file,
all previously written blocks will be flushed to the disk platter, before
any new write requests are honored. So journalling semantics will work
properly.

  Note the use case for this is more for hobbyists or development boxes. You can
  only use it on software raid (md) 1, which rules out most real systems.
  
 
 Ugh.  Looking for where that's documented; and hoping it is or will soon
 work on software 1+0 as well.

I saw a comment somewhere that raid 0 provided some problems and the suggestion
was to handle the barrier at a different level (though I don't know how you
could). So I don't believe 1+0 or 5 are currently supported or will be in the
near term.

The other feature I would like is to be able to use write barriers with
encrypted file systems. I haven't found anything on whether or not there
are near term plans by anyone to support that.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] File Systems Compared

2006-12-15 Thread Bruno Wolff III
On Fri, Dec 15, 2006 at 10:34:15 -0600,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 The reply wasn't directly copied to the performance list, but I will
 copy this one back.

Sorry about this one, I meant to intersperse my replies and hit the 'y'
key at the wrong time. (And there ended up being a copy on performance
anyway from the news gateway.)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] File Systems Compared

2006-12-14 Thread Bruno Wolff III
On Thu, Dec 14, 2006 at 01:39:00 -0500,
  Jim Nasby [EMAIL PROTECTED] wrote:
 On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote:
 
 This appears to be changing under Linux. Recent kernels have write  
 barriers
 implemented using cache flush commands (which some drives ignore,  
 so you
 need to be careful). In very recent kernels, software raid using  
 raid 1
 will also handle write barriers. To get this feature, you are  
 supposed to
 mount ext3 file systems with the barrier=1 option. For other file  
 systems,
 the parameter may need to be different.
 
 But would that actually provide a meaningful benefit? When you  
 COMMIT, the WAL data must hit non-volatile storage of some kind,  
 which without a BBU or something similar, means hitting the platter.  
 So I don't see how enabling the disk cache will help, unless of  
 course it's ignoring fsync.

When you do an fsync, the OS sends a cache flush command to the drive. On
most drives (though supposedly there are ones that ignore this command) the
flush doesn't return until all of the cached pages have been written to the
platter, and the fsync doesn't return until the flush is complete.
While this writes more sectors than you really need, it is safe. And it allows
for caching to speed up some things (though not as much as having queued
commands would).

I have done some tests on my systems and the speeds I am getting make it
clear that write barriers slow things down to about the same range as having
caches disabled. So I believe that it is likely working as advertised.

Note the use case for this is more for hobbyists or development boxes. You can
only use it on software raid (md) 1, which rules out most real systems.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] File Systems Compared

2006-12-11 Thread Bruno Wolff III
On Wed, Dec 06, 2006 at 08:55:14 -0800,
  Mark Lewis [EMAIL PROTECTED] wrote:
  Anyone run their RAIDs with disk caches enabled, or is this akin to
  having fsync off?
 
 Disk write caches are basically always akin to having fsync off.  The
 only time a write-cache is (more or less) safe to enable is when it is
 backed by a battery or in some other way made non-volatile.
 
 So a RAID controller with a battery-backed write cache can enable its
 own write cache, but can't safely enable the write-caches on the disk
 drives it manages.

This appears to be changing under Linux. Recent kernels have write barriers
implemented using cache flush commands (which some drives ignore, so you
need to be careful). In very recent kernels, software raid using raid 1
will also handle write barriers. To get this feature, you are supposed to
mount ext3 file systems with the barrier=1 option. For other file systems,
the parameter may need to be different.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] File Systems Compared

2006-12-06 Thread Bruno Wolff III
On Wed, Dec 06, 2006 at 18:45:56 +0100,
  Markus Schiltknecht [EMAIL PROTECTED] wrote:
 
 Cool, thank you for the example :-)  I thought the MTA or at least the 
 mailing list would wrap mails at some limit. I've now set word-wrap to  
 characters (it seems it's not possible to turn it off completely in 
 thunderbird). But when writing, I'm now getting one long line.
 
 What's common practice? What's it on the pgsql mailing lists?

If you do this you should set format=flowed (see RFC 2646). If you do that,
then clients can break the lines in an appropriate way. This is actually
better than fixing the line width in the original message, since the
recipient may not have the same number of characters (or pixels) of display
as the sender.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] BUG #2784: Performance serious degrades over a period of a month

2006-11-27 Thread Bruno Wolff III
This really should have been asked on pgsql-performance and would probably
get a better response there.

On Sun, Nov 26, 2006 at 16:35:52 +,
  Michael Simms [EMAIL PROTECTED] wrote:
 PostgreSQL version: 8.1.4
 Operating system:   Linux kernel 2.6.12
 Description:Performance serious degrades over a period of a month
 Details: 
 
 OK, we have a database that runs perfectly well after a dump and restore,
 but over a period of a month or two, it just degrades to the point of
 uselessness.
 vacuumdb -a is run every 24 hours. We have also run for months at a time
 using -a -z but the effect doesn't change.
 

This sounds like you either need to increase your FSM setting or vacuum
more often. I think vacuumdb -v will give you enough information to tell
if FSM is too low at the frequency you are vacuuming.

 The database is for a counter, not the most critical part of the system, but
 a part of the system nonetheless. Other tables we have also degrade over
 time, but the counter is the most pronounced. There seems to be no common
 feature of the tables that degrade. All I know is that a series of queries
 that are run on the database every 24 hours, after a dump/restore takes 2
 hours. Now, 2 months after, it is taking over 12. We are seriously
 considering switching to mysql to avoid this issue. 

You probably will want to vacuum the counter table more often than the other
tables in the database. Depending on how often the counter(s) are being
updated and how many separate counters are in the table you might want to
vacuum that table as often as once a minute.

Depending on your requirements you might also want to consider using a sequence
instead of a table row for the counter.
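
A minimal sketch of the sequence approach (the name is just an example):

CREATE SEQUENCE hit_counter;

-- Bumping the counter creates no dead tuples and doesn't make concurrent
-- sessions wait on a row lock the way updating a single counter row does.
SELECT nextval('hit_counter');

-- Reading the current value:
SELECT last_value FROM hit_counter;

-- Caveat: sequence increments are not rolled back with the transaction, so
-- this only fits counters that can tolerate occasional gaps.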

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Simple join optimized badly?

2006-10-10 Thread Bruno Wolff III
On Mon, Oct 09, 2006 at 23:33:03 +0200,
  Tobias Brox [EMAIL PROTECTED] wrote:
 
 Just a comment from the side line; can't the rough set
 enable_seqscan=off be considered as sort of a hint anyway?  There have
 been situations where we've actually had to resort to such crud.

That only works for simple queries. To be generally useful, you want to
be able to hint how to handle each join being done in the query. The
current controls affect all joins.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Bruno Wolff III
On Wed, Oct 04, 2006 at 10:03:00 +0200,
  Luc Delgado [EMAIL PROTECTED] wrote:
 
  Please unsubscribe me!  Thank you!

If you really can't figure out how to unsubscribe from a list, you should
contact the list owner, not the list. The list members can't unsubscribe you
(and it isn't their job to) and the owner may not be subscribed to the
list. The convention for lists is that adding '-owner' to the local part
of the list email address will be an address for the owner.
A good way to find out how to unsubscribe from a list is to search
for the mailing list using Google. The information on how to subscribe
and unsubscribe is usually in the same place, and you were able to find out how
to subscribe in the first place, so you should be able to figure out how
to unsubscribe by yourself as well.

 
 
  Also, it would be better to have a message foot saying how to unsubscribe.

No, the standard is that the list information is kept in the headers so that
it can be extracted by mail clients that care to. There is an RFC describing
these headers. They are supplied by the mailing list software used for the
Postgres mailing lists. Have your mail client display full headers for one
of the list messages to get the instructions from there.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Unsubscribe

2006-10-04 Thread Bruno Wolff III
On Wed, Oct 04, 2006 at 08:30:03 -0700,
  Joshua D. Drake [EMAIL PROTECTED] wrote:
 
 Although I 100% agree with you Bruno, it should be noted that our lists
 are a closed box for most people. They don't follow what is largely
 considered standard amongst lists which is to have list information at
 the bottom of each e-mail.

There are reasons you don't want to do that. Footers work OK for single
part email messages. They don't make so much sense in multipart messages.
You can probably take a crap shoot and add the footer to the first
text/plain part and not break things. This won't work so well for multipart
alternative messages that have text/plain and text/html parts. You could
also try to insert a footer in to the html part, but thats a bit trickier
since you can't just put it at the end.

However, since the postgres lists are mostly just using text/plain parts
for message bodies and there are already footers being used to distribute
tips, it wouldn't make things significantly worse to add unsubscribe
information as well.

I would prefer just making the unsubscribe instructions easy to find on
the web.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Bruno Wolff III
On Tue, Oct 03, 2006 at 12:13:43 -0700,
  Graham Davis [EMAIL PROTECTED] wrote:
 Also, the multikey index of (assetid, ts) would already be sorted and 
 that is why using such an index in this case is
 faster than doing a sequential scan that does the sorting afterwards.

That isn't necessarily true. The sequential scan and sort will need a lot
fewer disk seeks and could run faster than using an index scan that has
the disk drives doing seeks for every tuple (in the worst case, where
the on disk order of tuples doesn't match the order in the index).

If your server is caching most of the blocks, then the index scan might
give better results. You might try disabling sequential scans to
try to coerce the other plan and see what results you get. If it is
substantially faster the other way, then you might want to look at lowering
the random page cost factor. However, since this can affect other queries
you need to be careful that you don't speed up one query at the expense
of a lot of other queries.
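
The experiment itself is just a session-local setting change (the table name
is a stand-in for yours, using the assetid/ts columns from your index):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM assets WHERE assetid = 42 ORDER BY ts;
SET enable_seqscan = on;

-- If the index plan wins by a wide margin, try lowering the planner's
-- estimate of how expensive random I/O is (the default is 4.0), and then
-- re-check your other important queries with the new value:
SET random_page_cost = 2;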

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] BUG #2543: Performance delay acrros the same day

2006-07-21 Thread Bruno Wolff III
On Fri, Jul 21, 2006 at 07:41:02 +,
  Alaa El Gohary [EMAIL PROTECTED] wrote:
 
 The following bug has been logged online:

The report below isn't a bug, it's a performance question and should have
been sent to [EMAIL PROTECTED]. I am redirecting replies there.

 A query on the postgresql DB takes about 5 seconds and then it starts to
 take more time till it reaches about 60 seconds by the end of the same day.
 I tried vacuum but nothing changed; the only thing that works is to dump the
 DB, drop it, and create a new one with the dump taken.
 I need to know if there is any way to restore the performance back without
 the need for drop and create,
 because I can't do this across the day.

You most likely aren't vacuuming often enough and/or don't have your FSM
setting high enough.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Commit slower on faster PC

2006-07-12 Thread Bruno Wolff III
On Wed, Jul 12, 2006 at 10:16:40 -0600,
  Koth, Christian (DWBI) [EMAIL PROTECTED] wrote:
 
 I have noticed a strange performance behaviour using a commit statement on 
 two different machines. On one of the machines the commit is many times 
 faster than on the other machine which has faster hardware. Server and client 
 are running always on the same machine.
 
 Server version (same on both machines): PostgreSQL 8.1.3. (same binaries as 
 well)
 
 PC1:
 
 IDE-HDD (approx. 50 MB/s rw), fs: ext3
 
 PC2:
 
 SCSI-HDD (approx. 65 MB/s rw), fs: ext3
 
 Both installations of the database have the same configuration, different 
 from default are only the following settings on both machines:
 
 pgbench gives me the following results:
 PC1:
 
 tps = 293.625393 (excluding connections establishing)
 
 PC2:
 
 tps = 46.519634 (excluding connections establishing)

Have you checked to see if the IDE drive is lying about having written the
data to the platters?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] hyper slow after upgrade to 8.1.4

2006-07-12 Thread Bruno Wolff III
On Wed, Jul 12, 2006 at 15:41:14 -0500,
  Medora Schauer [EMAIL PROTECTED] wrote:
 I have just upgraded from 7.3.4 to 8.1.4 and now *all* db access calls
 are extremely slow.  I didn't need to preserve any old data so at this
 point all my tables are empty.  Just connecting to a db takes several
 seconds.
 
 I know I'm not giving much to go on but I'm stumped.  Can anyone suggest
 how I might track down the cause of this problem?

That connections are slow makes me think DNS is worth looking at. It might
be that reverse lookups are timing out.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] High CPU Usage - PostgreSQL 7.3

2006-07-11 Thread Bruno Wolff III
On Mon, Jul 10, 2006 at 17:55:38 +1000,
  Neil Hepworth [EMAIL PROTECTED] wrote:
 
 running on our server (obviously we need to update certain queries,
 e.g. delete .. using.. and test with 8.1 first) - I will be pushing
 for an upgrade as soon as possible.  And the fsync=false is a

You can set add_missing_from if you want to delay rewriting queries that
use that feature.
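
E.g. (it can also be set in postgresql.conf so it applies everywhere):

SET add_missing_from = on;

-- With that on, old-style queries such as the hypothetical
--   DELETE FROM foo WHERE foo.id = bar.id;
-- keep working on 8.1 instead of erroring out, buying you time to rewrite
-- them as explicit DELETE ... USING statements.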

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimizer internals

2006-06-23 Thread Bruno Wolff III
On Thu, Jun 15, 2006 at 15:38:32 -0400,
  John Vincent [EMAIL PROTECTED] wrote:
 Any suggestions? FYI the original question wasn't meant as a poke at
 comparing PG to MySQL to DB2. I'm not making any value judgements either
 way. I'm just trying to understand how we can use it the best way possible.
 
 
 Actually we just thought about something. With PG, we can create an index
 that is a SUM of the column we're indexing, no? We're going to test this in
 a few hours. Would that be able to be satisfied by an index scan?

No, that won't work. While you can make indexes on functions of a row, you
can't make indexes on aggregate functions.

You might find making a materialized view of the information you want can
help with performance. The issues with sum are pretty much the same ones
as with count. You can find a couple different ways of doing materialized
views for count in the archives. There is a simple way of doing it that
doesn't work well with lots of concurrent updates and a more complicated
method that does work well with lots of concurrent updates.
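
The simple variant looks roughly like this (table, column, and function names
are made up for the sketch; plpgsql has to be installed in the database):

CREATE TABLE detail (grp int4 NOT NULL, amount numeric NOT NULL);
CREATE TABLE detail_sums (grp int4 PRIMARY KEY, total numeric NOT NULL);

CREATE FUNCTION detail_sums_maintain() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE detail_sums SET total = total + NEW.amount WHERE grp = NEW.grp;
        IF NOT FOUND THEN
            INSERT INTO detail_sums (grp, total) VALUES (NEW.grp, NEW.amount);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE detail_sums SET total = total - OLD.amount WHERE grp = OLD.grp;
    ELSIF TG_OP = 'UPDATE' THEN
        -- assumes grp itself is never changed by an update
        UPDATE detail_sums SET total = total - OLD.amount + NEW.amount
            WHERE grp = NEW.grp;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER detail_sums_trig AFTER INSERT OR UPDATE OR DELETE ON detail
    FOR EACH ROW EXECUTE PROCEDURE detail_sums_maintain();

-- The report query then becomes an index lookup instead of a scan:
SELECT total FROM detail_sums WHERE grp = 42;

The reason this simple variant breaks down under heavy concurrent updates is
that every writer touching the same group serializes on that one summary row,
which is the problem the more complicated approach is designed to avoid.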

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Question about clustering multiple columns

2006-06-16 Thread Bruno Wolff III
On Tue, Jun 13, 2006 at 09:04:15 -0700,
  Benjamin Arai [EMAIL PROTECTED] wrote:
 Hi,
  
 I have a database where there are three columns (name,date,data).  The
 queries are almost always something like SELECT date,data FROM table WHERE
 name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently have three
 B-tree indexes, one for each of the columns.  Is clustering on date index
 going to be what I want, or do I need a index that contains both name and
 date?

I would expect that clustering on the name would be better for the above
query.
You probably want an index on name and date combined.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Question about clustering multiple columns

2006-06-16 Thread Bruno Wolff III
On Fri, Jun 16, 2006 at 11:11:59 -0700,
  Benjamin Arai [EMAIL PROTECTED] wrote:
 Hi,
 
 Thanks for the reply. I have one more question.  Does it matter in which
 order that I make the index?

Please keep replies copied to the lists so that other people can learn from
and contribute to the discussion.

In this case I am just going to copy back to the performance list, since it
is generally better for performance questions than the general list.

 For example, should I create an index cusip,date or date,cusip, does it
 matter which order.  My goal is to cluster the entries by cusip, then for
 each cusip order the data by date (maybe the order by data occurs
 automatically).  Hm, in that case maybe I only need to cluster by cusip, but
 then how do I ensure that each cusip had its data ordered by date?

I think that you want to order by cusip (assuming that corresponds to name
in your sample query below) first. You won't end up having to go through values
in the index that will be filtered out if you do it that way.

The documentation for the cluster command says that it clusters on indexes,
not columns. So if the index is on (cusip, date), then the records will be
ordered by cusip, date immediately after the cluster. (New records added 
after the cluster are not guaranteed to be ordered by the index.)
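
Concretely, using the old-style CLUSTER indexname ON tablename syntax from
these releases (the table name is a stand-in):

CREATE INDEX prices_cusip_date_idx ON prices (cusip, date);
CLUSTER prices_cusip_date_idx ON prices;

-- Re-run CLUSTER periodically if you need the ordering to stay tight, since
-- rows added later are appended in arrival order, not index order.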

 
 Benjamin
 
 -Original Message-
 From: Bruno Wolff III [mailto:[EMAIL PROTECTED] 
 Sent: Friday, June 16, 2006 8:32 AM
 To: Benjamin Arai
 Cc: pgsql-general@postgresql.org; pgsql-performance@postgresql.org
 Subject: Re: Question about clustering multiple columns
 
 On Tue, Jun 13, 2006 at 09:04:15 -0700,
   Benjamin Arai [EMAIL PROTECTED] wrote:
  Hi,
   
  I have a database where there are three columns (name,date,data).  The 
  queries are almost always something like SELECT date,data FROM table 
  WHERE name=blah AND date > 1/1/2005 AND date < 1/1/2006;.  I currently 
  have three B-tree indexes, one for each of the columns.  Is clustering 
  on date index going to be what I want, or do I need a index that 
  contains both name and date?
 
 I would expect that clustering on the name would be better for the above
 query.
 You probably want an index on name and date combined.
 
 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] App very unresponsive while performing simple update

2006-05-31 Thread Bruno Wolff III
On Wed, May 31, 2006 at 01:23:07 -0500,
  Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Sun, May 28, 2006 at 07:20:59PM -0400, Greg Stark wrote:
  Brendan Duddridge [EMAIL PROTECTED] writes:
  More likely you were blocking on some lock. Until that other query holding
  that lock tries to commit Postgres won't actually detect a deadlock, it'll
  just sit waiting until the lock becomes available.
 
 Wow, are you sure that's how it works? I would think it would be able to
 detect deadlocks as soon as both processes are waiting on each other's
 locks.

I don't see how it could wait for a commit. If a command is blocked waiting for
a lock, how are you going to get a commit (you might get a rollback if the
query is aborted)?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] is it possible to make this faster?

2006-05-25 Thread Bruno Wolff III
On Thu, May 25, 2006 at 16:07:19 -0400,
  Merlin Moncure [EMAIL PROTECTED] wrote:
 been doing a lot of pgsql/mysql performance testing lately, and there
 is one query that mysql does much better than pgsql...and I see it a
 lot in normal development:
 
 select a,b,max(c) from t group by a,b;
 
 t has an index on a,b,c.
 
 in my sample case with cardinality of 1000 for a, 2000 for b, and
 30 records in t, pgsql does a seq. scan on dev box in about a
 second (returning 2000 records).
 
 recent versions of mysql do much better, returning same set in < 20ms.
 mysql explain says it uses an index to optimize the group by somehow.
 is there a faster way to write this query?

SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] is it possible to make this faster?

2006-05-25 Thread Bruno Wolff III
On Thu, May 25, 2006 at 16:31:40 -0400,
  Merlin Moncure [EMAIL PROTECTED] wrote:
 On 5/25/06, Bruno Wolff III [EMAIL PROTECTED] wrote:
 On Thu, May 25, 2006 at 16:07:19 -0400,
   Merlin Moncure [EMAIL PROTECTED] wrote:
  been doing a lot of pgsql/mysql performance testing lately, and there
  is one query that mysql does much better than pgsql...and I see it a
  lot in normal development:
 
  select a,b,max(c) from t group by a,b;
 
 
 SELECT DISTINCT ON (a, b) a, b, c FROM t ORDER BY a DESC, b DESC, c DESC;
 
 that is actually slower than group by in my case...am i missing
 something? (both essentially resolved to seq_scan)

If there aren't many c's for each (a,b), then a sort might be the best way to
do this. I don't remember if skip scanning ever got done, but if it did, it
would have been 8.1 or later.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Bruno Wolff III
Please don't reply to previous messages to start new threads. This makes it
harder to find stuff in the archives and may keep people from noticing your
message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  Craig A. James [EMAIL PROTECTED] wrote:
 Here's a corner case that might interest someone.  It tripped up one of 
 our programmers.
 
 We have a table with > 10 million rows.  The ID column is indexed, the 
 table has been vacuum/analyzed.  Compare these two queries:
 
   select * from tbl where id >= 1000 limit 1;
   select * from tbl where id >= 1000 order by id limit 1;
 
 The first takes 4 seconds, and uses a full table scan.  The second takes 32 
 msec and uses the index.  Details are below.

I suspect it wasn't intended to be a full table scan, but rather a sequential
scan until it found a matching row. If the data in the table is ordered by
id, this strategy may not work out well. Whereas if the data is randomly
ordered, it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the
table, that is going to make it more likely to choose a bad plan.


 I understand why the planner makes the choices it does -- the id >= 
 1000 isn't very selective and under normal circumstances a full table 
 scan is probably the right choice.  But the limit 1 apparently doesn't 
 alter the planner's strategy at all.  We were surprised by this.
 
 Adding the order by was a simple solution.
 
 Craig
 
 
 
 pg=> explain analyze select url, url_digest from url_queue where priority 
 >= 1000 limit 1;
   QUERY PLAN
 --
 Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117 
 rows=1 loops=1)
   -  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108) 
   (actual time=4036.101..4036.101 rows=1 loops=1)
 Filter: (priority >= 1000)
 Total runtime: 4036.200 ms
 (4 rows)
 
 pg=> explain analyze select url, url_digest from url_queue where priority 
 >= 1000 order by priority limit 1;
   QUERY PLAN
 --
 Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448 
 rows=1 loops=1)
   -  Index Scan using url_queue_priority on url_queue  
   (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434 
   rows=1 loops=1)
 Index Cond: (priority >= 1000)
 Total runtime: 32.566 ms
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-12 Thread Bruno Wolff III
On Thu, May 11, 2006 at 18:41:25 -0500,
  Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Thu, May 11, 2006 at 07:20:27PM -0400, Bruce Momjian wrote:
 
 My damn powerbook drive recently failed with very little warning, other
 than I did notice that disk activity seemed to be getting a bit slower.
 IIRC it didn't log any errors or anything. Even if it did, if the OS was
 catching them I'd hope it would pop up a warning or something. But from
 what I've heard, some drives now-a-days will silently remap dead sectors
 without telling the OS anything, which is great until you've used up all
 of the spare sectors and there's nowhere to remap to. :(

You might look into smartmontools. One part of this is a daemon that runs
selftests on the disks on a regular basis. You can have warnings mailed to
you on various conditions. Drives will fail the self test before they
run out of spare sectors. There are other drive characteristics that can
be used to tell if drive failure is imminent and give you a chance to replace
a drive before it fails.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] UNSUBSCRIBE

2006-05-10 Thread Bruno Wolff III
On Wed, May 10, 2006 at 01:15:11 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Maybe the real problem is at the other end of the process, ie we should
 require some evidence of a greater-than-room-temp IQ to subscribe in the
 first place?

I suspect it is more laziness than smarts. They had to at least figure out
how to respond to the confirmation message in the first place in order to get
subscribed.
My theory is that they don't want to take the trouble to figure out how to
unsubscribe when they (think that they) can just send a message to the list
(not even the admin) asking to be unsubscribed and it will (well actually won't
on these lists) happen.

Maybe posts with unsubscribe in the subject could be held for moderation
and/or get an automated reply with instructions for unsubscribing.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Bruno Wolff III
On Tue, May 09, 2006 at 12:10:32 +0200,
  Jean-Yves F. Barbier [EMAIL PROTECTED] wrote:
 Naa, you can find ATA | SATA ctrlrs for about EUR30 !

But those are the ones that you would generally be better off not using.

 Definitely NOT; however, if your server doesn't have a heavy load, the
 software overhead won't be noticed (essentially cache managing and
 syncing)

It is fairly common for database machines to be IO, rather than CPU, bound
and so the CPU impact of software raid is low.

 Some hardware ctrlrs are able to avoid the loss of a disk if you happen
 to have some faulty sectors (by relocating them internally); software
 RAID doesn't, as sectors *must* be @ the same (linear) addresses.

That is not true. Software raid works just fine on drives that have internally
remapped sectors.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] postgresql transaction id monitoring with nagios

2006-05-02 Thread Bruno Wolff III
On Tue, May 02, 2006 at 12:06:30 -0700,
  Tony Wasson [EMAIL PROTECTED] wrote:
 
 Ah thanks, it's a bug in my understanding of the thresholds.
 
 With the standard freezing policy, the age column will start at one
 billion for a freshly-vacuumed database.
 
 So essentially, 1B is normal, 2B is the max. The logic is now..
 
 The script detects a wrap at 2 billion. It starts warning once one or
 more databases show an age over 1.5 billion transactions. It reports
 critical at 1.75B transactions.
 
 If anyone else understands differently, hit me with a clue bat.

Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against
wraparound.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Running on an NFS Mounted Directory

2006-04-27 Thread Bruno Wolff III
On Thu, Apr 27, 2006 at 08:57:51 -0400,
  Ketema Harris [EMAIL PROTECTED] wrote:
 performance from the db.  I also would hopefully then not have to do
 periodic backups from the db server to some other type of storage.  Is this
 not a good idea?  How bad of a performance hit are we talking about?  Also,

You always need to do backups if you care about your data. What if someone
accidentally deletes a lot of data? What if someone blows up your data
center (or there is a flood)?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Running on an NFS Mounted Directory

2006-04-27 Thread Bruno Wolff III
On Thu, Apr 27, 2006 at 09:06:48 -0400,
  Ketema Harris [EMAIL PROTECTED] wrote:
 Yes, you're right, I meant not having to do the backups from the db server
 itself.  I can do that within the storage device now, by allocating space
 for it, and letting the device copy the data files on some periodic basis.

Only if the database server isn't running, or your SAN provides a way to
take a snapshot of the data at a particular instant in time.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-21 Thread Bruno Wolff III
On Fri, Apr 21, 2006 at 10:12:24 +0530,
  soni de [EMAIL PROTECTED] wrote:
 I don't want to query exactly 81900 rows into a set. I just want to fetch 50
 or 100 rows at a time in decreasing order of stime (i.e. 50 or 100 rows
 starting from last to end).

You can do this efficiently if stime has an index and you can deal with using
the stime from the previous query instead of the record count. The idea is to
select up to 50 or 100 records in descending order where the stime is <=
the previous stime. This can give you some overlapping records, so you need
some way to deal with this.
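
With the wan table from your original query, that would look roughly like:

-- First page:
SELECT * FROM wan ORDER BY stime DESC LIMIT 50;

-- Subsequent pages: pass in the smallest stime returned by the previous
-- page (the value below is only an example). Using <= rather than < is what
-- keeps ties from being lost, at the cost of a few overlapping rows you
-- have to skip on the client side.
SELECT * FROM wan
WHERE stime <= '2006-04-01 12:34:56'
ORDER BY stime DESC
LIMIT 50;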

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-20 Thread Bruno Wolff III
On Thu, Apr 20, 2006 at 11:07:31 +0530,
  soni de [EMAIL PROTECTED] wrote:
 Please provide me some help regarding how could I use cursor in following
 cases? :
 
 I want to fetch 50 records at a time starting from largest stime.
 
 SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

Something like the following may be faster:
SELECT * FROM wan ORDER BY stime DESC LIMIT 50;

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Help optimizing a slow index scan

2006-03-17 Thread Bruno Wolff III
On Fri, Mar 17, 2006 at 08:34:26 -0700,
  Dan Harris [EMAIL PROTECTED] wrote:
 Markus Bertheau wrote:
 Have you tried using a GIST index on lat and long? These things are
 meant for two-dimensional data, whereas btree doesn't handle
 two-dimensional data that well. How many rows satisfy either of the
 long / lat conditions?
 
   
 
 According to the analyze, less than 500 rows matched.  I'll look into 
 GIST indexes, thanks for the feedback.

Have you looked at using the Earth Distance contrib module? If a spherical
model of the earth is suitable for your application, then it may work for you
and might be easier than trying to create Gist indexes yourself.
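
A rough sketch of what that looks like (table and column names are invented;
it needs the cube and earthdistance contrib modules installed, and on older
releases the cube containment operator is spelled @ rather than @>):

CREATE INDEX events_location_idx ON events USING gist (ll_to_earth(lat, lon));

-- Everything within 10 miles (16093.44 meters) of a point:
SELECT * FROM events
WHERE earth_box(ll_to_earth(45.0, -93.0), 16093.44) @ ll_to_earth(lat, lon);

-- earth_box() overshoots slightly (a box around the circle), so add an exact
-- distance check as well if that matters:
--   AND earth_distance(ll_to_earth(45.0, -93.0), ll_to_earth(lat, lon)) < 16093.44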

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] No vacuum for insert-only database?

2006-03-13 Thread Bruno Wolff III
On Mon, Mar 13, 2006 at 09:19:32 -0800,
  Craig A. James [EMAIL PROTECTED] wrote:
 Alvaro Herrera wrote:
 If I only insert data into a table, never update or delete, then I should 
 never have to vacuum it.  Is that correct?
 
 You still need to vacuum eventually, to avoid transaction Id wraparound
 issues.  But not as often.
 
 Thanks.  Any suggestions for what "not as often" means?  For example, if my 
 database will never contain more than 10 million rows, is that a problem?  
 100 million rows?  When does transaction ID wraparound become a problem?

I believe it is at a billion (10^9).

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] help needed asap....

2006-03-12 Thread Bruno Wolff III
On Sun, Mar 12, 2006 at 11:46:25 -,
  Phadnis [EMAIL PROTECTED] wrote:
   
 1) when I try to query for a count or for anything it takes a long time to 
 return the result. How to avoid this?

Postgres doesn't cache counts, so if you are counting a lot of records, this
may take a while to run. If you do a lot of counts or need them to be fast
even if it slows other things down, there are some things you can do to address
this. Several strategies have been repeatedly discussed in the archives.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Large Table With Only a Few Rows

2006-02-28 Thread Bruno Wolff III
On Mon, Feb 27, 2006 at 06:48:02 -0800,
  Nik [EMAIL PROTECTED] wrote:
 I have a table that has only a few records in it at the time, and they
 get deleted every few seconds and new records are inserted. Table never
 has more than 5-10 records in it.
 
 However, I noticed a deteriorating performance in deletes and inserts
 on it. So I performed vacuum analyze on it three times (twice in a row,
 and once two days later). In the statistics it says that the table size
 is 863Mb, toast table size is 246Mb, and indexes size is 134Mb, even
 though the table has only 5-10 rows in it. I was wondering how I can
 reclaim all this space and improve the performance?

You can use VACUUM FULL to recover the space. You should be running normal
VACUUMs on that table every minute or two, not once a day.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] rotate records

2006-02-27 Thread Bruno Wolff III
On Tue, Feb 28, 2006 at 09:14:59 +0530,
  Jeevanandam, Kathirvel (IE10) [EMAIL PROTECTED] wrote:
 Hi all,

Please don't hijack existing threads to start new ones. This can cause
people to miss your question and messes up the archives.

Performance questions should generally be posted to the performance list.
I have redirected followups to there.

 
 I am facing performance issues even with less than 3000 records, I am
 using Triggers/SPs in all the tables. What could be the problem.
 Any idea it is good to use triggers w.r.t performance?

A common cause of this kind of thing is not running vacuum often enough,
leaving you with a lot of dead tuples.

You should probably start by doing a vacuum full analyse and then showing
the list some problem query sources along with explain analyse output
for them.

 
 Regards,
 Jeeva.K
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Bruno Wolff III
On Thu, Feb 02, 2006 at 09:12:59 +1300,
  Ralph Mason [EMAIL PROTECTED] wrote:
 Hi,
 
 I have 2 tables both have an index on ID (both ID columns are an oid).
 
 I want to find only only rows in one and not the other.
 
 Select ID from TableA where ID not IN ( Select ID from Table B)
 
 This always generates sequential scans.
 
 Table A has about 250,000 rows.   Table B has about 250,000 Rows.
 
 We should get a Scan on Table B and a Index Lookup on Table A.

I don't think that is going to work if there are NULLs in table B.
I don't know whether or not Postgres has code to special case NULL testing
(either for constraints ruling them out, or doing probes for them in addition
to the key it is trying to match) for doing NOT IN. Just doing a simple
index probe into table A isn't going to tell you all you need to know if
you don't find a match.
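
If NULLs are possible in table B (or just to give the planner more options),
the usual rewrites are NOT EXISTS or an outer join. A hedged sketch using the
table names from the question:

-- Anti-join via NOT EXISTS: rows of TableA with no matching ID in TableB.
SELECT a.id
  FROM tablea a
 WHERE NOT EXISTS (SELECT 1 FROM tableb b WHERE b.id = a.id);

-- The same result written as an outer join.
SELECT a.id
  FROM tablea a
  LEFT JOIN tableb b ON b.id = a.id
 WHERE b.id IS NULL;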

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-25 Thread Bruno Wolff III
On Sat, Dec 24, 2005 at 22:13:43 -0500,
  Luke Lonergan [EMAIL PROTECTED] wrote:
 David,
 
  now hot-swap may not be supported on all interface types, that may be what 
  you have run into, but with SCSI or SATA you should be able to hot-swap 
  with the right controller.
 
 That's actually the problem - Linux hot swap is virtually non-functional for 
 SCSI.  You can write into the proper places in /proc, then remove and rescan 
 to get a new drive up, but I've found that the resulting OS state is flaky.  
 This is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers.
 
 The problems I've seen are  with Linux, not the controllers.

The other option is to keep hot spares available so that you can have a failure
or two before you have to pull drives out. This might allow you to get to a
maintenance window to swap out the bad drives.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Bruno Wolff III
On Thu, Dec 15, 2005 at 21:41:06 -0800,
  Craig A. James [EMAIL PROTECTED] wrote:
 
 If I understand enable_seqscan, it's an all-or-nothing affair.  Turning it 
 off turns it off for the whole database, right?  The same is true of all of 

You can turn it off just for specific queries. However, it will apply to
all joins within a query.
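
For example (hypothetical query; the setting only lasts until you set it back
or the session ends):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
SET enable_seqscan = on;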

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-07 Thread Bruno Wolff III
On Wed, Dec 07, 2005 at 14:14:31 +0200,
  Assaf Yaari [EMAIL PROTECTED] wrote:
 Hi Jan,
 
 As I'm novice with PostgreSQL, can you elaborate the term FSM and
 settings recommendations?
http://developer.postgresql.org/docs/postgres/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM
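
The two FSM settings live in postgresql.conf; the numbers below are only
placeholders to show the shape, not recommendations:

max_fsm_pages = 200000      # pages worth of free space tracked between vacuums
max_fsm_relations = 1000    # number of tables and indexes tracked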

 BTW: I'm issuing VACUUM ANALYZE every 15 minutes (using cron) and also
 changes the setting of fsync to false in postgresql.conf but still time
 seems to be growing.

You generally don't want fsync set to false.

 Also no other transactions are open.

Have you given us explain analyse samples yet?

 
 Thanks,
 Assaf.
 
  -Original Message-
  From: Jan Wieck [mailto:[EMAIL PROTECTED] 
  Sent: Tuesday, December 06, 2005 2:35 PM
  To: Assaf Yaari
  Cc: Bruno Wolff III; pgsql-performance@postgresql.org
  Subject: Re: [PERFORM] Performance degradation after 
  successive UPDATE's
  
  On 12/6/2005 4:08 AM, Assaf Yaari wrote:
   Thanks Bruno,
   
   Issuing VACUUM FULL seems not to have influence on the time.
   I've added to my script VACUUM ANALYZE every 100 UPDATE's 
  and run the 
   test again (on different record) and the time still increase.
  
  I think he meant
  
   - run VACUUM FULL once,
   - adjust FSM settings to database size and turnover ratio
   - run VACUUM ANALYZE more frequent from there on.
  
  
  Jan
  
   
   Any other ideas?
   
   Thanks,
   Assaf. 
   
   -Original Message-
   From: Bruno Wolff III [mailto:[EMAIL PROTECTED]
   Sent: Monday, December 05, 2005 10:36 PM
   To: Assaf Yaari
   Cc: pgsql-performance@postgresql.org
   Subject: Re: Performance degradation after successive UPDATE's
   
   On Mon, Dec 05, 2005 at 19:05:01 +0200,
 Assaf Yaari [EMAIL PROTECTED] wrote:
Hi,
 
I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
 
My application updates counters in DB. I left a test 
  over the night 
that increased counter of specific record. After night running 
(several hundreds of thousands updates), I found out 
  that the time 
spent on UPDATE increased to be more than 1.5 second (at
   the beginning
it was less than 10ms)! Issuing VACUUM ANALYZE and even
   reboot didn't
seemed to solve the problem.
   
   You need to be running vacuum more often to get rid of the deleted 
   rows (update is essentially insert + delete). Once you get 
  too many, 
   plain vacuum won't be able to clean them up without 
  raising the value 
   you use for FSM. By now the table is really bloated and 
  you probably 
   want to use vacuum full on it.
   
   
   ---(end of 
   broadcast)---
   TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's 
  datatypes do not
  match
  
  
  --
  #=
  =#
  # It's easier to get forgiveness for being wrong than for 
  being right. #
  # Let's break this rule - forgive me. 
   #
  #== 
  [EMAIL PROTECTED] #
  

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Bruno Wolff III
On Tue, Dec 06, 2005 at 10:52:57 +0100,
  Csaba Nagy [EMAIL PROTECTED] wrote:
 Joost,
 
 Why do you use an offset here ? I guess you're traversing the table
 somehow, in this case it would be better to remember the last zipcode +
 housenumber and put an additional condition to get the next bigger than
 the last one you've got... that would go for the index on
 zipcode+housenumber and be very fast. The big offset forces postgres to
 traverse that many entries until it's able to pick the one row for the
 result...

The other problem with saving an offset is that, unless the data isn't changing
or you are doing all of the searches in one serialized transaction, the
fixed offset might not put you back where you left off.
Using the last key, instead of counting records, is normally a better way
to do this.
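
A hedged sketch of that approach (the table name is made up, the column names
come from the thread): remember the last (zipcode, housenumber) returned and
ask for everything after it.

SELECT *
  FROM addresses
 WHERE zipcode > '1234AB'
    OR (zipcode = '1234AB' AND housenumber > 42)
 ORDER BY zipcode, housenumber
 LIMIT 100;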

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance degradation after successive UPDATE's

2005-12-05 Thread Bruno Wolff III
On Mon, Dec 05, 2005 at 19:05:01 +0200,
  Assaf Yaari [EMAIL PROTECTED] wrote:
 Hi,
  
 I'm using PostgreSQL 8.0.3 on Linux RedHat WS 3.0.
  
 My application updates counters in DB. I left a test over the night that
 increased counter of specific record. After night running (several
 hundreds of thousands updates), I found out that the time spent on
 UPDATE increased to be more than 1.5 second (at the beginning it was
 less than 10ms)! Issuing VACUUM ANALYZE and even reboot didn't seemed to
 solve the problem.

You need to be running vacuum more often to get rid of the deleted rows
(update is essentially insert + delete). Once you get too many, plain
vacuum won't be able to clean them up without raising the value you use for
FSM. By now the table is really bloated and you probably want to use
vacuum full on it.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Effects of cascading references in foreign keys

2005-10-29 Thread Bruno Wolff III
On Sat, Oct 29, 2005 at 13:10:31 +0200,
  Martin Lesser [EMAIL PROTECTED] wrote:
 Which effects have UPDATEs on REFERENCEd TABLEs when only columns in the
 referenced table are updated which are not part of the FOREIGN KEY
 constraint?

In 8.1 there is a check to see if the foreign key value has changed and if
not a trigger isn't queued. In the currently released versions any update
will fire triggers.
The comment for the check in trigger.c didn't say whether this optimization applies
to both referencing and referenced keys or just one of those.
If you need to know more you can look at the code at:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/
for trigger.c.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Effects of cascading references in foreign keys

2005-10-29 Thread Bruno Wolff III
On Sat, Oct 29, 2005 at 08:24:32 -0600,
  Michael Fuhr [EMAIL PROTECTED] wrote:
  Does an UPDATE of e.g. m_fld1 in t_master cause a 'lookup' in all tables
  which have a cascading update-rule or is this 'lookup' only triggered if
  the referenced column in t_master is explicitly updated?
 
 My tests suggest that a lookup on the referring key is done only
 if the referenced key is changed.  Here's an example from 8.1beta4;
 I used this version because EXPLAIN ANALYZE shows triggers and the
 time spent in them, but I see similar performance characteristics
 in earlier versions.  I've intentionally not put an index on the
 referring column to make lookups on it slow.

It looks like this feature was added last May, so I think it only applies
to 8.1.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] blue prints please

2005-10-26 Thread Bruno Wolff III
On Tue, Oct 25, 2005 at 22:24:06 -0600,
  Sidar López Cruz [EMAIL PROTECTED] wrote:
 where can i find bests practices for tunning postgresql?

You should first read the documentation. For 8.1, that would be here:
http://developer.postgresql.org/docs/postgres/runtime-config.html

There is also good information on techdocs at:
http://techdocs.postgresql.org/#techguides
(Look under the subcategory optimising.)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] prepared transactions that persist across sessions?

2005-10-23 Thread Bruno Wolff III
On Sun, Oct 23, 2005 at 00:14:23 -0400,
  [EMAIL PROTECTED] wrote:
 Hey all.
 
 Please point me to a place I should be looking if this is a common
 question that has been debated periodically and at great length
 already. :-)

You probably want to read:
http://candle.pha.pa.us/main/writings/pgsql/sgml/runtime-config-query.html

Connection pooling might be another approach, since it should be possible
to reuse prepared statements when reusing a connection.
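
As a hedged illustration of what that reuse buys (table names are made up),
the expensive planning happens once at PREPARE time and each EXECUTE only
pays the execution cost:

PREPARE user_report (int) AS
    SELECT u.name, count(*)
      FROM users u JOIN logins l ON l.user_id = u.id
     WHERE u.id = $1
     GROUP BY u.name;

EXECUTE user_report(42);
EXECUTE user_report(43);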

 I have a complex query. It's a few Kbytes large, and yes, I've already
 worked on reducing it to be efficient in terms of database design, and
 minimizing the expressions used to join the tables. Running some timing
 tests, I've finding that the query itself, when issued in full, takes
 around 60 milliseconds to complete on modest hardware. If prepared, and
 then executed, however, it appears to take around 60 milliseconds to
 prepare, and 20 milliseconds to execute. I'm not surprised. PostgreSQL
 is very likely calculating the costs of many, many query plans.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] The need for full vacuum / reindex

2005-09-27 Thread Bruno Wolff III
On Wed, Sep 28, 2005 at 05:33:27 +0200,
  Tobias Brox [EMAIL PROTECTED] wrote:
 By occation, we dropped the whole production database and refreshed it from
 a database backup - and all our performance problems seems to have gone.  I
 suppose this means that to keep the database efficient, one eventually does
 have to do reindexing and/or full vacuum from time to time?

Normally you only need to do that if you didn't vacuum often enough or with
a high enough FSM setting, and bloat has gotten out of hand to the point that
you need to recover some space.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] VACUUM FULL vs CLUSTER

2005-09-23 Thread Bruno Wolff III
On Tue, Sep 20, 2005 at 14:53:19 -0400,
  Markus Benne [EMAIL PROTECTED] wrote:
 I have a table that is purged by 25% each night.  I'd like to do a
 vacuum nightly after the purge to reclaim the space, but I think I'll
 still need to do a vacuum full weekly.
 
 Would there be any benefit to doing a cluster instead of the vacuum?

If you have a proper FSM setting you shouldn't need to do vacuum fulls
(unless you have an older version of postgres where index bloat might
be an issue).

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] VACUUM FULL vs CLUSTER

2005-09-23 Thread Bruno Wolff III
On Fri, Sep 23, 2005 at 18:16:44 +0200,
  Stef [EMAIL PROTECTED] wrote:
 Bruno Wolff III mentioned :
 = If you have a proper FSM setting you shouldn't need to do vacuum fulls
 = (unless you have an older version of postgres where index bloat might
 = be an issue).
 
 What version of postgres was the last version that had
 the index bloat problem?

You can check the release notes to be sure, but my memory is that the
unbounded bloat problem was fixed in 7.4. There still are usage patterns
that can result in bloating, but it is limited to some constant multiplier
of the minimum index size.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow update

2005-09-12 Thread Bruno Wolff III
On Mon, Sep 12, 2005 at 10:14:25 +0100,
  Hilary Forbes [EMAIL PROTECTED] wrote:
 Hello everyone
 
 I must be doing something very wrong here so help please!  I have two tables
 
 tableA has 300,000 recs
 tableB has 20,000 recs
 
 I need to set the value of a field in table A to a value in table B depending 
 on the existence of the record in table B.  So what I have done is
 
 UPDATE tableA set a.val1=b.somefield FROM tableA a, tableB b WHERE 
 a.key1=b.key1;
 
 The primary key of tableA is key1 and that of tableB is key1 ie the join is 
 on primary keys.
 
 The optimizer has elected to d a sequential scan on tableA to determine 
 which fields to update rather than the query being driveb by tableB and it is 
 taking forever.  Surely I must be able to force the system to read down 
 tableB in preference to reading down tableA?

It would help to see the exact query and the explain analyze output. Hopefully
you didn't really write the query as shown above, since it uses illegal
syntax, and if it were changed slightly to become legal it would do a
cross join of tableA with the inner join of tableA and tableB, which isn't
what you want.
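
For the record, a hedged guess at the statement that was probably intended
(the target table is not repeated in the FROM list):

UPDATE tableA
   SET val1 = b.somefield
  FROM tableB b
 WHERE tableA.key1 = b.key1;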

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] RAID Configuration Sugestion

2005-08-30 Thread Bruno Wolff III
On Tue, Aug 30, 2005 at 09:37:17 -0300,
  Alvaro Nunes Melo [EMAIL PROTECTED] wrote:
 
 The server is a Dual Xeon 3.0 with 2 GB RAM and two SCSI disks. Our main 
 doubt is what is the best configuration for the disks. We are thinking 
 about use them in a RAID-0 array. Is this the best option? What do you 
 suggest on partitioning? Separate partitions for the OS, data and pg_xlog?

You don't have a lot of options with just two disks. What are you trying
to accomplish with raid?

Raid 0 will possibly give you some speed up, while raid 1 will give you some
fault tolerance and some speedup of reads, but cost you half your disk space.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] RAID Configuration Sugestion

2005-08-30 Thread Bruno Wolff III
Please keep replies copied to the list so that others may contribute to
and learn from the discussion.

On Tue, Aug 30, 2005 at 10:15:13 -0300,
  Alvaro Nunes Melo [EMAIL PROTECTED] wrote:
 Hello Bruno,
 
 Bruno Wolff III wrote:
 
 On Tue, Aug 30, 2005 at 09:37:17 -0300,
  Alvaro Nunes Melo [EMAIL PROTECTED] wrote:
  
 
 The server is a Dual Xeon 3.0 with 2 GB RAM and two SCSI disks. Our main 
 doubt is what is the best configuration for the disks. We are thinking 
 about use them in a RAID-0 array. Is this the best option? What do you 
 suggest on partitioning? Separate partitions for the OS, data and pg_xlog?
 
 Our main goal is performance speedup. Disk space might not be a problem. 
 I've read a lot here about movig pg_xlog to different partitions, and 
 we'll surely make tests to see what configuration might be better.

This isn't a very good mix of hardware for running postgres. Xeons have
some context switching issues for which you will probably see some
speed up in 8.1. (So if you aren't going into production for several
months you might want to be using 8.1beta.) Having only two disk drives
is also not a good idea.

With what you have you either want to use raid 0 and not worry too much
about how the disks are partitioned or use one disk for wal logging
and the other for other stuff. There are other people on the list who
can probably give you a better idea of which of these options is likely
to be better in your case. However, they may need to know more about
your raid controller. In particular how much battery backed memory does
it have and its model.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Need indexes on empty tables for good performance ?

2005-08-27 Thread Bruno Wolff III
On Tue, Aug 23, 2005 at 13:41:32 +1000,
  Lenard, Rohan (Rohan) [EMAIL PROTECTED] wrote:
 I've read that indexes aren't used for COUNT(*) and I've noticed (7.3.x)
 with EXPLAIN that indexes never seem to be used on empty tables - is
 there any reason to have indexes on empty tables, or will postgresql
 never use them.

count will use indexes if appropriate. The counts themselves are NOT in the
indexes, so counts of significant fractions of a table (in particular
of the whole table) won't benefit from indexes.

You aren't going to get query speed ups by putting indexes on empty tables.
However, they may be required if you have unique or primary keys declared
in the table. You may want them to enforce some kinds of constraints.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] How does the planner execute unions?

2005-08-26 Thread Bruno Wolff III
On Fri, Aug 26, 2005 at 16:14:18 -0400,
  Chris Hoover [EMAIL PROTECTED] wrote:
 Hopefully a quick question.
 
 In 7.3.4, how does the planner execute a query with union alls in it?
 
 Does it execute the unions serially, or does it launch a thread for
 each union (or maybe something else entirely).

Postgres doesn't have parallel execution of parts of queries. So it is
going to do one part followed by the other part.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Caching by Postgres

2005-08-23 Thread Bruno Wolff III
On Tue, Aug 23, 2005 at 10:10:45 -0700,
  gokulnathbabu manoharan [EMAIL PROTECTED] wrote:
 Hi all,
 
 I like to know the caching policies of Postgresql. 
 What parameter in the postgresql.conf affects the
 cache size used by the Postgresql?  As far as I have
 searched my knowledge of the parameters are

The main policy is to let the OS do most of the caching.

 1. shared_buffers - Sets the limit on the amount of
 shared memory used.  If I take this is as the cache
 size then my performance should increase with the
 increase in the size of shared_buffers.  But it seems
 it is not the case and my performance actually
 decreases with the increase in the shared_buffers.  I
 have a RAM size of 32 GB.  The table which I use more
 frequently has around 68 million rows.  Can I cache
 this entire table in RAM?

Using extremely large values for shared buffers is known to be a performance
loss for Postgres. Some improvements were made for 8.0 and more for 8.1.

The OS will cache frequently used data from files for you. So if you are using
that table a lot and the rows aren't too wide, it should mostly be cached
for you by the OS.

 2. work_mem - It is the amount of memory used by an
 operation.  My guess is once the operation is complete
 this is freed and hence has nothing to do with the
 caching.

This is used for sorts and some other things.

 3. effective_cache_size - The parameter used by the
 query planner and has nothing to do with the actual
 caching.

You are supposed to use this to give the planner an idea about how much
space the OS will be using for caching on behalf of Postgres.
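
To make the units concrete, here is a hedged example only (8.0-era units,
where these settings are counted in 8KB pages or in KB; the right numbers
have to come from your workload, not from this list):

shared_buffers = 50000           # ~400MB; very large values hurt before 8.1
work_mem = 32768                 # 32MB per sort/hash step, per backend
effective_cache_size = 3000000   # ~24GB: a guess at what the OS will cache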

 So kindly help me in pointing me to the correct
 parameter to set.
 
 It will be great if you can point me to the docs that
 explains the implementation of caching in Postgresql
 which will help me in understanding things much
 clearly.

You probably want to read the following:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] BUG #1797: Problem using Limit in a function, seqscan

2005-07-29 Thread Bruno Wolff III
On Fri, Jul 29, 2005 at 13:52:45 +0100,
  Magno Leite [EMAIL PROTECTED] wrote:
 
 Description:Problem using Limit in a function, seqscan
 
 I looked for about this problem in BUG REPORT but I can't find. This is my
 problem, when I try to use limit in a function, the Postgre doesn't use my
 index, then it use sequencial scan. What is the problem ?

You haven't described the problem well enough to allow us to help you and
you posted it to the wrong list. This should be discussed on the performance
list, not the bug list.

It would help if you showed us the query you are running and run it outside
of the function with EXPLAIN ANALYSE and show us that output. Depending
on what that output shows, we may ask you other questions.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] slow joining very large table to smaller ones

2005-07-15 Thread Bruno Wolff III
On Thu, Jul 14, 2005 at 16:29:58 -0600,
  Dan Harris [EMAIL PROTECTED] wrote:
 
 Ok, I tried this one.  My ssh keeps getting cut off by a router  
 somewhere between me and the server due to inactivity timeouts, so  
 all I know is that both the select and explain analyze are taking  
 over an hour to run.  Here's the explain select for that one, since  
 that's the best I can get.

Are you using NAT at home? That's probably where the issue is. If you
have control of that box you can probably increase the timeout to a
couple of hours.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] join and query planner

2005-07-12 Thread Bruno Wolff III
On Wed, Jul 06, 2005 at 18:54:02 -0300,
  Dario Pudlo [EMAIL PROTECTED] wrote:
 (first at all, sorry for my english)
 Hi.
- Does left join restrict the order in which the planner must join
 tables? I've read about join, but i'm not sure about left join...

The left join operator is not associative so in general the planner doesn't
have much flexibility to reorder left (or right) joins.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Why the planner is not using the INDEX .

2005-07-05 Thread Bruno Wolff III
On Mon, Jul 04, 2005 at 20:29:50 -0400,
  David Gagnon [EMAIL PROTECTED] wrote:
 Thanks .. I miss that FK don't create indexed ...  since Primary key 
 implicitly does ...
 
 I'm a bit surprised of that behavior thought, since it means that if we 
 delete a row from table A all tables (B,C,D) with FK pointing to this 
 table (A) must be scanned. 

But in some applications you don't ever do that, so you don't save
anything by having the index for deletes but have to pay the cost to
update it when modifying the referencing table.

If you think an index will help in your case, just create one.
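
For instance (hypothetical names), indexing the referencing column is a
one-liner:

CREATE INDEX tableb_a_id_idx ON tableB (a_id);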

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] tricky query

2005-06-28 Thread Bruno Wolff III
On Tue, Jun 28, 2005 at 10:21:16 -0400,
  Merlin Moncure [EMAIL PROTECTED] wrote:
 I need a fast way (sql only preferred) to solve the following problem:
 
 I need the smallest integer that is greater than zero that is not in the
 column of a table.  In other words, if an 'id' column has values
 1,2,3,4,6 and 7, I need a query that returns the value of 5.
 
 I've already worked out a query using generate_series (not scalable) and
 pl/pgsql.  An SQL only solution would be preferred, am I missing
 something obvious?

I would expect that using generate_series from 1 to the max (using
order by and limit 1 to avoid extra sequential scans), subtracting
out the current list using EXCEPT, and then taking the minimum value
would be the best way to do this if the list is pretty dense and
you don't want to change the structure.

If it is sparse, then you can do a special check for 1 and, if that
is present, find the first row whose successor is not in the table.
That shouldn't be too slow.

If you are willing to change the structure you might keep one row for
each number and use a flag to mark which ones are empty. If there are
relatively few empty rows at any time, then you can create a partial
index on the row number for only empty rows.
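
A hedged sketch of that first approach, using the id_test table that appears
later in the thread (the ORDER BY/LIMIT subquery stands in for max() so an
index on id can supply it):

SELECT min(candidate) AS smallest_free
  FROM (
        SELECT generate_series(1,
                 (SELECT id FROM id_test ORDER BY id DESC LIMIT 1) + 1)
        EXCEPT
        SELECT id FROM id_test
       ) AS gaps(candidate);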

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] tricky query

2005-06-28 Thread Bruno Wolff III
On Tue, Jun 28, 2005 at 12:02:09 -0400,
  Merlin Moncure [EMAIL PROTECTED] wrote:
 
 Confirmed.  Hats off to you, the above some really wicked querying.
 IIRC I posted the same question several months ago with no response and
 had given up on it.  I think your solution (smallest X1 not in X) is a
 good candidate for general bits, so I'm passing this to varlena for
 review :)
 
 SELECT t1.id+1 as id_new FROM id_test t1
 WHERE NOT EXISTS
 (SELECT t2.id FROM id_test t2 WHERE t2.id = t1.id+1)
 ORDER BY t1.id LIMIT 1;

You need to rework this to check to see if row '1' is missing. The
above returns the start of the first gap after the first row that
isn't missing.
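
One hedged way to handle that is to add 1 itself as a candidate and then take
the smallest candidate overall:

SELECT id_new FROM (
    SELECT 1 AS id_new
     WHERE NOT EXISTS (SELECT 1 FROM id_test WHERE id = 1)
    UNION ALL
    SELECT t1.id + 1
      FROM id_test t1
     WHERE NOT EXISTS (SELECT 1 FROM id_test t2 WHERE t2.id = t1.id + 1)
) AS candidates
ORDER BY id_new
LIMIT 1;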

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] parameterized LIKE does not use index

2005-06-23 Thread Bruno Wolff III
On Thu, Jun 23, 2005 at 10:33:18 +0200,
  Kurt De Grave [EMAIL PROTECTED] wrote:
 
 Now it's tempting to dream of some mechanism that could make the
 database consider
 replanning the query automatically once it knows the parameter, or
 choose from
 a set of plans depending on the parameter. In this case the general plan
 was about three orders
 of magnitude slower than the specialized plan. But I guess this case is
 not all that common
 and the developer can work around it.

I remember some discussion about delaying planning until the first
actual query so that planning could use actual parameters to do
the planning. If you really want to have it check the parameters
every time, I think you will need to replan every time. I don't
know if there is a way to save some of the prepare work while
doing this.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Needed: Simplified guide to optimal memory configuration

2005-06-16 Thread Bruno Wolff III
On Thu, Jun 16, 2005 at 07:46:45 -0700,
  Todd Landfried [EMAIL PROTECTED] wrote:
 Yes, it is 7.2. Why? because an older version of our software runs on  
 RH7.3 and that was the latest supported release of Postgresql for  
 RH7.3 (that we can find). We're currently ported to 8, but we still  
 have a large installed base with the other version.

You can build it from source. I run 8.0 stable from CVS on a RH 6.1 box.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Needed: Simplified guide to optimal memory configuration

2005-06-15 Thread Bruno Wolff III
On Wed, Jun 15, 2005 at 02:06:27 -0700,
  Todd Landfried [EMAIL PROTECTED] wrote:
 
 What's the problem? The sucker gets s-l-o-w on relatively simple  
 queries. For example, simply listing all of the users online at one  
 time takes 30-45 seconds if we're talking about 800 users. We've  
 adjusted the time period for vacuuming the tables to the point where  
 it occurs once an hour, but we're getting only a 25% performance gain  
 from that. We're looking at the system settings now to see how those  
 can be tweaked.

It might be useful to see example slow queries and the corresponding
explain analyze output.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index ot being used

2005-06-13 Thread Bruno Wolff III
On Mon, Jun 13, 2005 at 00:29:08 -0400,
  Madison Kelly [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
 On Sun, Jun 12, 2005 at 23:42:05 -0400,
   Madison Kelly [EMAIL PROTECTED] wrote:
 
 As you probably saw in my last reply, I went back to the old index and 
 tried the query you and Tom Lane recommended. Should this not have 
 caught the index?
 
 
 Probably, but there might be some other reason the planner thought it
 was better to not use it. Using indexes is not always faster.
 
 It would help to see your latest definition of the table and indexes,
 the exact query you used and explain analyze output.
 
 
 Okay, here's what I have at the moment:
 
 tle-bu= \d file_info_7   Table 
 public.file_info_7
 Column| Type |Modifiers
 --+--+-
  file_group_name  | text |
  file_group_uid   | bigint   | not null
  file_mod_time| bigint   | not null
  file_name| text | not null
  file_parent_dir  | text | not null
  file_perm| text | not null
  file_size| bigint   | not null
  file_type| character varying(2) | not null default 
 'f'::character varying
  file_user_name   | text |
  file_user_uid| bigint   | not null
  file_backup  | boolean  | not null default true
  file_display | boolean  | not null default false
  file_restore_display | boolean  | not null default false
  file_restore | boolean  | not null default false
 Indexes:
 file_info_7_display_idx btree (file_parent_dir, file_name)
 
 
 tle-bu= \d file_info_7_display_idx
 Index public.file_info_7_display_idx
  Column  | Type
 -+--
  file_parent_dir | text
  file_name   | text
 btree, for table public.file_info_7
 
 
 tle-bu= EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display 
 FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, 
 file_name ASC;
  QUERY PLAN
 
  Sort  (cost=15091.53..15165.29 rows=29502 width=114) (actual 
 time=12834.933..12955.136 rows=25795 loops=1)
Sort Key: file_parent_dir, file_name
-  Seq Scan on file_info_7  (cost=0.00..11762.44 rows=29502 
 width=114) (actual time=0.244..2533.388 rows=25795 loops=1)
  Filter: ((file_type)::text = 'd'::text)
  Total runtime: 13042.421 ms
 (5 rows)
 
 
   Since my last post I went back to a query closer to what I actually 
 want. What is most important to me is that 'file_parent_dir, file_name, 
 file_display' are returned and that the results are sorted by 
 'file_parent_dir, file_name' and the results are restricted to where 
 'file_info='d''.

I am guessing you mean 'file_type' instead of 'file_info'.

To do this efficiently you want an index on (file_type, file_parent_dir,
file_name). Currently you only have an index on (file_parent_dir, file_name)
which won't help for this query. You also need to order by file_type
even though it will be constant for all of the returned rows in order
to help out the planner. This will allow an index scan over the desired
rows that returns them in the desired order.

Please actually try this before changing anything else.
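
Spelled out, the suggestion is (a sketch; the index name is made up):

CREATE INDEX file_info_7_type_dir_name_idx
    ON file_info_7 (file_type, file_parent_dir, file_name);

SELECT file_name, file_parent_dir, file_display
  FROM file_info_7
 WHERE file_type = 'd'
 ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC;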

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Index ot being used

2005-06-13 Thread Bruno Wolff III
On Mon, Jun 13, 2005 at 09:51:57 -0500,
  John A Meinel [EMAIL PROTECTED] wrote:
 
 I don't know if there are specific reasons why not, other than just not
 being implemented yet. It might be tricky to get it correct (for
 instance, how do you know which columns can be added, which ones will be
 constant) Perhaps you could just potentially add the WHERE items if they
 have an equality constraint with a constant. But I'm guessing there are
 more cases than that where the optimization could be performed.

I think there is already some intelligence about which expressions are
constant in particular parts of a plan.

I think you need to be able to do two things. One is to drop constant
expressions from order by lists. The other is, when looking for an index
to produce a specific ordering, to ignore leading constant expressions
when comparing to the order by expressions.

 Also, the more options you give the planner, the longer it takes on
 average to plan any single query. Yes, it is beneficial for this use
 case, but does that balance out slowing down all the other queries by a
 tiny bit.

But there aren't that many possible indexes, so I don't expect this will
slow things down much more than the current check for potentially useful
indexes.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Index ot being used

2005-06-13 Thread Bruno Wolff III
On Mon, Jun 13, 2005 at 12:22:14 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 I don't think the use-case has been shown that justifies doing this much
 work to ignore useless ORDER BY clauses.  The examples that have come up
 in the past all suggest ignoring index columns not the other way 'round.
 Can you make a case that we need to do that part of it?

I don't think so. I don't think people are likely to order by constant
expressions except by adding them to the front to help optimization.
When I was thinking about this I was looking at what equivalences could
be used and didn't look back to see which ones would be useful in the
normal case. And I think it is a lot more likely people will leave out
columns they know not to be relevant than to include them.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index ot being used

2005-06-13 Thread Bruno Wolff III
On Mon, Jun 13, 2005 at 11:46:46 -0500,
  Kevin Grittner [EMAIL PROTECTED] wrote:
 I agree that ignoring useless columns in an ORDER BY clause is less
 important than ignoring index columns where the value is fixed.  There
 is one use case for ignoring useless ORDER BY columns that leaps to
 mind, however -- a column is added to the ORDER BY clause of a query to
 help out the optimizer, then the indexes are modified such that that
 column is no longer useful.  Whether this merits the programming effort
 and performance hit you describe seems highly questionable, though.

I suspect that this isn't a big deal. There was a question like that
that has been going back and forth over the last couple of days.

If you remove the constant expression from the index, you aren't likely
to use the index anyway, but will instead sort the output rows
from either a sequential scan or an index scan based on an index
that does use the constant expression.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Index ot being used

2005-06-13 Thread Bruno Wolff III
On Mon, Jun 13, 2005 at 15:05:00 -0400,
  Madison Kelly [EMAIL PROTECTED] wrote:
 Wow!
 
 With the sequence scan off my query took less than 2sec. When I turned 
 it back on the time jumped back up to just under 14sec.
 
 
 tle-bu= set enable_seqscan = off; SET
 tle-bu= EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display 
 FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, 
 file_parent_dir ASC, file_name ASC;
 
 QUERY PLAN
 
  Index Scan using file_info_7_display_idx on file_info_7 
 (cost=0.00..83171.78 rows=25490 width=119) (actual 
 time=141.405..1700.459 rows=25795 loops=1)
Index Cond: ((file_type)::text = 'd'::text)
  Total runtime: 1851.366 ms
 (3 rows)
 
 
 tle-bu= set enable_seqscan = on; SET
 tle-bu= EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_display 
 FROM file_info_7 WHERE file_type='d' ORDER BY file_type ASC, 
 file_parent_dir ASC, file_name ASC;
  QUERY PLAN
 
  Sort  (cost=14810.92..14874.65 rows=25490 width=119) (actual 
 time=13605.185..13728.436 rows=25795 loops=1)
Sort Key: file_type, file_parent_dir, file_name
-  Seq Scan on file_info_7  (cost=0.00..11956.84 rows=25490 
 width=119) (actual time=0.048..2018.996 rows=25795 loops=1)
  Filter: ((file_type)::text = 'd'::text)
  Total runtime: 13865.830 ms
 (5 rows)
 
   So the index obiously provides a major performance boost! I just need 
 to figure out how to tell the planner how to use it...

The two things you probably want to look at are (in postgresql.conf):
effective_cache_size = 1# typically 8KB each
random_page_cost = 2# units are one sequential page fetch cost

Increasing effective cache size and decreasing the penalty for random
disk fetches will favor using index scans. People have reported that
dropping random_page_cost from the default of 4 to 2 works well.
Effective cache size should be set to some reasonable estimate of
the memory available on your system to postgres, not counting that
set aside for shared buffers.

However, since the planner thought the index scan plan was going to be 6 times
slower than the sequential scan plan, I don't know whether tweaking these values
enough to switch the plan choice might cause problems for other queries.
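
One low-risk way to experiment is to change them for a single session only
and watch what the plans do (the values here are placeholders, not
recommendations):

SET random_page_cost = 2;
SET effective_cache_size = 50000;    -- ~400MB at 8KB pages; adjust to your machine
EXPLAIN ANALYZE
SELECT file_name, file_parent_dir, file_display
  FROM file_info_7
 WHERE file_type = 'd'
 ORDER BY file_type, file_parent_dir, file_name;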

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index ot being used

2005-06-12 Thread Bruno Wolff III
On Sun, Jun 12, 2005 at 10:12:27 -0400,
  Madison Kelly [EMAIL PROTECTED] wrote:
 Indexes:
 file_info_7_display_idx btree (file_type, file_parent_dir, file_name)

   Here is my full query:
 
 tle-bu= EXPLAIN ANALYZE SELECT file_name, file_parent_dir, file_type 
 FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, 
 file_name ASC;
  QUERY PLAN
 -

This is a case where postgres's planner can't make a deduction needed for
it to realize that the index can be used. Try rewriting the query as:

SELECT file_name, file_parent_dir, file_type 
  FROM file_info_7 WHERE file_type='d'
  ORDER BY file_type ASC, file_parent_dir ASC, file_name ASC;

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index ot being used

2005-06-12 Thread Bruno Wolff III
On Sun, Jun 12, 2005 at 18:52:05 -0400,
  Madison Kelly [EMAIL PROTECTED] wrote:
 
   After sending that email I kept plucking away and in the course of 
 doing so decided that I didn't need to return the 'file_type' column. 
 Other than that, it would see my query now matches what you two have 
 recommended in the 'ORDER BY...' front but I still can't get an index 
 search.

No it doesn't. Even if you don't return file_type you still need it
in the order by clause if you want postgres to consider using your
index.

Is there some reason you didn't actually try out our suggestion, but are
now asking for more advice?

 
   Here is the latest query and the new index:
 
 tle-bu= \d file_info_7_display_idx;
 Index public.file_info_7_display_idx
  Column  | Type
 -+--
  file_parent_dir | text
  file_name   | text
 btree, for table public.file_info_7
 
 tle-bu= EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display 
 FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, 
 file_name ASC;

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index ot being used

2005-06-12 Thread Bruno Wolff III
On Sun, Jun 12, 2005 at 22:00:01 -0500,
  Bruno Wolff III [EMAIL PROTECTED] wrote:
 On Sun, Jun 12, 2005 at 18:52:05 -0400,
   Madison Kelly [EMAIL PROTECTED] wrote:
  
After sending that email I kept plucking away and in the course of 
  doing so decided that I didn't need to return the 'file_type' column. 
  Other than that, it would see my query now matches what you two have 
  recommended in the 'ORDER BY...' front but I still can't get an index 
  search.
 
 No it doesn't. Even if you don't return file_type you still need it
 in the order by clause if you want postgres to consider using your
 index.

I didn't notice that you had changed the index. The reason this index
doesn't help is that you can't use it to select on records with the
desired file_type.

 
 Is there some reason you didn't actually try out our suggestion, but are
 now asking for more advice?
 
  
Here is the latest query and the new index:
  
  tle-bu= \d file_info_7_display_idx;
  Index public.file_info_7_display_idx
   Column  | Type
  -+--
   file_parent_dir | text
   file_name   | text
  btree, for table public.file_info_7
  
  tle-bu= EXPLAIN ANALYZE SELECT file_parent_dir, file_name, file_display 
  FROM file_info_7 WHERE file_type='d' ORDER BY file_parent_dir ASC, 
  file_name ASC;
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index ot being used

2005-06-12 Thread Bruno Wolff III
On Sun, Jun 12, 2005 at 23:42:05 -0400,
  Madison Kelly [EMAIL PROTECTED] wrote:
 
 As you probably saw in my last reply, I went back to the old index and 
 tried the query you and Tom Lane recommended. Should this not have 
 caught the index?

Probably, but there might be some other reason the planner thought it
was better to not use it. Using indexes is not always faster.

It would help to see your latest definition of the table and indexes,
the exact query you used and explain analyze output.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] full outer performance problem

2005-06-08 Thread Bruno Wolff III
On Wed, Jun 08, 2005 at 11:37:40 +0200,
  Kim Bisgaard [EMAIL PROTECTED] wrote:
 Hi,
 
 I'm having problems with the query optimizer and FULL OUTER JOIN on 
 PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. 
 I might be naive, but I think that it should be possible?
 
 I have two BIG tables (virtually identical) with 3 NOT NULL columns 
 Station_id, TimeObs, Temp_, with unique indexes on (Station_id, 
 TimeObs) and valid ANALYSE (set statistics=100). I want to join the two 
 tables with a FULL OUTER JOIN.
 
 When I specify the query as:
 
 SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b 
USING (station_id, timeobs)
WHERE station_id = 52981
  AND timeobs = '2004-1-1 0:0:0'
 
 I get the correct results
 
 station_id |   timeobs   | temp_grass | temp_dry_at_2m
 +-++
  52944 | 2004-01-01 00:10:00 ||   -1.1
 (1 row)
 
 BUT LOUSY performance, and the following EXPLAIN:
 
   
QUERY PLAN
 --
 Merge Full Join  (cost=1542369.83..1618958.58 rows=6956994 width=32) 
 (actual time=187176.408..201436.264 rows=1 loops=1)
   Merge Cond: ((outer.station_id = inner.station_id) AND 
   (outer.timeobs = inner.timeobs))
   Filter: ((COALESCE(outer.station_id, inner.station_id) = 52981) AND 
   (COALESCE(outer.timeobs, inner.timeobs) = '2004-01-01 
   00:00:00'::timestamp without time zone))
   -  Sort  (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual 
   time=145748.253..153851.607 rows=6956994 loops=1)
 Sort Key: a.station_id, a.timeobs
 -  Seq Scan on temp_dry_at_2m a  (cost=0.00..117549.94 
 rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 
 loops=1)
   -  Sort  (cost=334456.38..340472.11 rows=2406292 width=16) (actual 
   time=31668.876..34491.123 rows=2406292 loops=1)
 Sort Key: b.station_id, b.timeobs
 -  Seq Scan on temp_grass b  (cost=0.00..40658.92 rows=2406292 
 width=16) (actual time=0.052..5484.489 rows=2406292 loops=1)
 Total runtime: 201795.989 ms
 (10 rows)

Someone else will need to comment on why Postgres can't use a more
efficient plan. What I think will work for you is to restrict
the station_id and timeobs on each side and then do a full join.
You can try something like the sample query below (which hasn't been tested):
SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
  FROM
(SELECT station_id, timeobs, temp_dry_at_2m
  FROM temp_dry_at_2m
  WHERE
station_id = 52981
AND
timeobs = '2004-1-1 0:0:0') a
FULL OUTER JOIN
(SELECT station_id, timeobs, temp_grass
  FROM temp_grass
  WHERE
station_id = 52981
AND
timeobs = '2004-1-1 0:0:0') b
USING (station_id, timeobs)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Help with rewriting query

2005-06-08 Thread Bruno Wolff III
On Wed, Jun 08, 2005 at 15:48:27 -0700,
  Junaili Lie [EMAIL PROTECTED] wrote:
 Hi,
 The suggested query below took forever when I tried it.
 In addition, as suggested by Tobias, I also tried to create index on
 food(p_id, id), but still no goal (same query plan).
 Here is the explain:
 TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
 (f.p_id = p.id) group by p.id;

The above is going to require reading all of the food table (assuming no
orphaned records), so the plan below seems reasonable.

   QUERY PLAN
 
  GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
   -  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
 Merge Cond: (outer.id = inner.p_id)
 -  Index Scan using person_pkey on person p
 (cost=0.00..25.17 rows=569 width=8)
 -  Index Scan using person_id_food_index on food f
 (cost=0.00..164085.54 rows=2884117 width=16)
 (5 rows)
 
 
 
 
 TEST1=# explain select p.id, (Select f.id from food f where
 f.p_id=p.id order by f.id desc limit 1) from person p;

Using a subselect seems to be the best hope of getting better performance.
I think you almost got it right, but in order to use the index on
(p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't
deduce this index can be used because f.p_id is constant in the subselect;
you need to give it some help.
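
Concretely, the hinted subselect would look something like this (the column
alias is made up):

SELECT p.id,
       (SELECT f.id
          FROM food f
         WHERE f.p_id = p.id
         ORDER BY f.p_id DESC, f.id DESC
         LIMIT 1) AS last_food_id
  FROM person p;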

QUERY PLAN
 ---
  Seq Scan on Person p  (cost=1.00..17015.24 rows=569 width=8)
   SubPlan
 -  Limit  (cost=0.00..12.31 rows=1 width=8)
   -  Index Scan Backward using food_pkey on food f
 (cost=0.00..111261.90 rows=9042 width=8)
 Filter: (p_id = $0)
 (5 rows)
 
 any ideas or suggestions is appreciate.
 
 
 On 6/8/05, Tobias Brox [EMAIL PROTECTED] wrote:
  [Junaili Lie - Wed at 12:34:32PM -0700]
   select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
   by f.p_id will work.
   But I understand this is not the most efficient way. Is there another
   way to rewrite this query? (maybe one that involves order by desc
   limit 1)
  
  eventually, try something like
  
   select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc 
  limit 1)
   from person p
  
  not tested, no warranties.
  
  Since subqueries can be inefficient, use explain analyze to see which one
  is actually better.
  
  This issue will be solved in future versions of postgresql.
  
  --
  Tobias Brox, +47-91700050
  Tallinn
 
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] BUG #1697: Select getting slower on continously updating data

2005-06-03 Thread Bruno Wolff III
On Fri, Jun 03, 2005 at 00:09:00 -0700,
  Bahadur Singh [EMAIL PROTECTED] wrote:
 
 Many thanks for this tip !
 But is this good idea to analyse/vacuuming the
 database tables while updates are taking place..
 Since, I update continuously say (100,000 ) times or
 more the same data set.
 
 This is the result of analyze command.
 
 INFO:  analyzing public.salesarticle
 INFO:  salesarticle: scanned 3000 of 20850 pages,
 containing 62 live rows and 134938 dead rows; 62 rows
 in sample, 431 estimated total rows
 
 Gesamtlaufzeit der Abfrage: 5531 ms.
 Total Time Taken : 5531 ms.
 
 Can you suggest me some clever way to so, because I
 would prefer to do vaccumming while database is not
 loaded with queries/transactions.

While that may be a nice preference, under your usage pattern that does
not appear to be a good idea. As long as your disk I/O isn't saturated
you want to be running vacuums a lot more often than you are. (Analyze should
only be needed if the distribution of values is changing constantly. An example
would be timestamps indicating when an update occurred.)

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Index on a NULL-value

2005-05-30 Thread Bruno Wolff III
On Tue, May 31, 2005 at 11:02:07 +0800,
  Tobias Brox [EMAIL PROTECTED] wrote:
 I read in the manual today:
 
   Indexes are not used for IS NULL clauses by default. The best way to use
   indexes in such cases is to create a partial index using an IS NULL
   predicate.
   
 This is from the documentation for PostgreSQL 8.  I did not find anything
 equivalent in the 7.4.8-documentation.
   
 I wasn't aware of this until it became an issue :-) Well, so I follow the
 tip but in vain.  Reduced and reproduced like this in PostgreSQL 7.4.7:
 
 test=# create table mock(a int, b int);
 CREATE TABLE
 test=# create index b_is_null on mock((b IS NULL));
 CREATE INDEX
 test=# insert into mock values (10,20);
 INSERT 70385040 1
 test=# insert into mock values (20,30);
 INSERT 70385041 1
 test=# insert into mock values (30, NULL);
 INSERT 70385042 1
 test=# set enable_seqscan=off; 
 SET
 test=# explain select * from mock where b is NULL;
  QUERY PLAN 
 
  Seq Scan on mock (cost=1.00..10020.00 rows=6 width=8)
Filter: (b IS NULL)
 (2 rows)
 
 vacuum analyze also didn't help to recognize the index ;-)

It isn't surprising that an index wasn't used since a sequential scan is
going to be faster in your test case.

If you want to test this out, you want to use realistically sized tables.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Index on a NULL-value

2005-05-30 Thread Bruno Wolff III
On Tue, May 31, 2005 at 11:21:20 +0800,
  Tobias Brox [EMAIL PROTECTED] wrote:
 [Tobias Brox - Tue at 11:02:07AM +0800]
  test=# explain select * from mock where b is NULL;
   QUERY PLAN 
  
   Seq Scan on mock (cost=1.00..10020.00 rows=6 width=8)
 Filter: (b IS NULL)
  (2 rows)
 
 (...)
 
  ---(end of broadcast)---
  TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
 
 That tip helped me :-)
 
 test=# explain select * from mock where (b IS NULL)=true;
   QUERY PLAN  
 
 --
  Index Scan using b_is_null on mock  (cost=0.00..4.68 rows=1 width=8)
Index Cond: ((b IS NULL) = true)
 (2 rows)

I looked back at your first example and saw that you didn't use a partial
index, which is why you had to contort things to make it possible to
use an indexed search. (Though the planner really shouldn't have chosen
the index scan here, since all of the rows should be in one disk block
and doing an index scan should require more disk reads than a sequential
scan for the test case you used.)

You want something like this:
CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

The advantage is that the index can be a lot smaller than an index over all
of the rows in the case where only a small fraction of rows have a null value
for b. (If this isn't the case you probably don't want the index.)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index on a NULL-value

2005-05-30 Thread Bruno Wolff III
On Tue, May 31, 2005 at 11:31:58 +0800,
  Tobias Brox [EMAIL PROTECTED] wrote:
 [Tobias Brox]
  test=# set enable_seqscan=off; 
 
 [Bruno Wolff III - Mon at 10:16:53PM -0500]
  It isn't surprising that an index wasn't used since a sequential scan is
  going to be faster in your test case.
  
  If you want to test this out, you to want use realistically sized tables.
 
 Wrong.  In this case I was not wondering about the planners choise of not
 using the index, but the fact that the planner could not find the index at
 all.  Reproducing it on a simple table in a test environment was a valid
 strategy to solve this specific problem.

I missed that you turned sequential scans off for your test.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Bruno Wolff III
On Tue, May 24, 2005 at 08:36:36 -0700,
  mark durrant [EMAIL PROTECTED] wrote:
 
 --MSSQL's ability to hit the index only and not having
 to go to the table itself results in a _big_
 performance/efficiency gain. If someone who's in
 development wants to pass this along, it would be a
 nice addition to PostgreSQL sometime in the future.
 I'd suspect that as well as making one query faster,
 it would make everything else faster/more scalable as
 the server load is so much less.

This gets brought up a lot. The problem is that the index doesn't include
information about whether the current transaction can see the referenced
row. Putting this information in the index will add significant overhead
to every update and the opinion of the developers is that this would be
a net loss overall.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Can anyone explain this: duplicate dbs.

2005-05-24 Thread Bruno Wolff III
On Tue, May 24, 2005 at 21:39:15 -0500,
  John A Meinel [EMAIL PROTECTED] wrote:
 
 By the way, I think doing:
 
 CREATE DATABASE tempdb WITH TEMPLATE = originaldb;
 
 Is a much faster way of doing dump and load. I *think* it would recreate
 indexes, etc. If it just does a copy it may not show the dump/restore
 improvement.

You need to be careful when doing this. See section 18.3 of the 8.0 docs
for caveats.
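
As a rough illustration of the caveat being referred to: copying from a template only works while nobody else is connected to the source database. Database names below are placeholders:

-- No other sessions may be connected to originaldb while this runs,
-- or CREATE DATABASE will refuse to copy it.
CREATE DATABASE tempdb WITH TEMPLATE = originaldb;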

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Bruno Wolff III
On Tue, May 10, 2005 at 08:02:50 -0700,
  Adam Haberlach [EMAIL PROTECTED] wrote:
 
 
 With all the Opteron v. Xeon around here, and talk of $30,000 machines,
 perhaps it would be worth exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5GHz Pentium 4 w/ 256MB of RAM and a 40GB hard drive and gigabit ethernet.
 The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
 harness as a single machine.

That isn't going to be ECC ram. I don't think you really want to use
non-ECC ram in a critical database.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] COPY vs INSERT

2005-05-06 Thread Bruno Wolff III
On Fri, May 06, 2005 at 01:51:29 -0500,
  Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Wed, May 04, 2005 at 10:22:56PM -0400, Tom Lane wrote:
  Also, there is a whole lot of one-time-per-statement overhead that can
  be amortized across many rows instead of only one.  Stuff like opening
  the target table, looking up the per-column I/O conversion functions,
  identifying trigger functions if any, yadda yadda.  It's not *that*
  expensive, but compared to an operation as small as inserting a single
  row, it's significant.
 
 Has thought been given to supporting inserting multiple rows in a single
 insert? DB2 supported:
 
 INSERT INTO table VALUES(
 (1,2,3),
 (4,5,6),
 (7,8,9)
 );
 
 I'm not sure how standard that is or if other databases support it.

It's on the TODO list. I don't remember anyone bringing this up for about
a year now, so I doubt anyone is actively working on it.
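
For reference, the standard multi-row form has no extra parentheses around the whole VALUES list; PostgreSQL accepts it from 8.2 on. Table and column names below are placeholders:

-- Standard SQL multi-row insert: one VALUES clause, one parenthesized
-- row per tuple, nothing wrapping the list as a whole.
INSERT INTO mytable (col1, col2, col3) VALUES
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9);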

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Question on REINDEX

2005-04-18 Thread Bruno Wolff III
On Mon, Apr 18, 2005 at 12:21:42 -0700,
  Bill Chandler [EMAIL PROTECTED] wrote:
 
 Running PostgreSQL 7.4.2 on Solaris.
 
 1) When is it necessary to run REINDEX or drop/create
 an index?  All I could really find in the docs is:
 
 In some situations it is worthwhile to rebuild
 indexes periodically with the REINDEX command. (There
 is also contrib/reindexdb which can reindex an entire
 database.) However, PostgreSQL 7.4 has substantially
 reduced the need for this activity compared to earlier
 releases.

In pathological cases it is possible to have a lot of empty space on a lot
of your index pages, and reindexing rebuilds the index into a smaller number
of more densely packed pages. In earlier versions, I think it was possible to
end up with completely empty pages, and this happened for patterns of use
(new values monotonically increasing, oldest values deleted first) that were
actually seen in practice.
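
A short sketch of the commands involved, with placeholder names; which one to run depends on where the bloat actually is:

-- Rebuild a single index (name is a placeholder):
REINDEX INDEX mytable_created_idx;

-- Or rebuild every index on a table; note that REINDEX locks the table
-- against writes while it runs:
REINDEX TABLE mytable;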

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Question on vacuumdb

2005-04-18 Thread Bruno Wolff III
On Mon, Apr 18, 2005 at 12:27:08 -0700,
  Bill Chandler [EMAIL PROTECTED] wrote:
 All,
 
 If I run the command vacuumdb mydb I understand that
 it does some disk space recovery (but not as much as
 vacuumdb --full mydb).  

You are better off not using vacuum full unless some unusual event has
bloated your database. By running normal vacuums often enough (and with
a large enough fsm setting) your database should reach a steady state size.

 Question: if I run the command vacuumdb --analyze
 mydb does it still do the aforementioned disk space
 recovery AS WELL AS update query planning statistics? 
 Or are those two completely separate operations
 requiring separate invocations of 'vacuumdb'.

Yes: vacuumdb --analyze recovers space in the same way a plain vacuumdb run
does and also updates the planner statistics, so it is better to do both with
one command.
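
In SQL terms, when connected to the database in question, the two vacuumdb invocations above correspond roughly to the following; this is a sketch of the equivalence, not a transcript of what vacuumdb issues:

-- vacuumdb --analyze mydb: reclaims dead row space for reuse and
-- refreshes the planner statistics in a single pass.
VACUUM ANALYZE;

-- vacuumdb --full mydb: rewrites tables to return space to the OS,
-- but takes exclusive locks; rarely needed if plain vacuums run often.
VACUUM FULL;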

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Bruno Wolff III
On Wed, Apr 06, 2005 at 18:09:37 -0400,
  Tom Lane [EMAIL PROTECTED] wrote:
 
 Can anyone suggest a more general rule?  Do we need for example to
 consider whether the relation membership is the same in two clauses
 that might be opposite sides of a range restriction?  It seems like
 
   a.x > b.y AND a.x < b.z
 
 probably can be treated as a range restriction on a.x for this purpose,
 but I'm much less sure that the same is true of
 
   a.x > b.y AND a.x < c.z
 
 Thoughts?

I think it makes sense to guess that a smaller fraction of the rows will
be returned when a column value is bounded above and below than if it
is only bounded on one side, even if the bounds aren't fixed. You can
certainly be wrong. The difference between this and the normal case is that
column statistics aren't normally going to be that useful.

If date/time ranges are the common use for this construct, it might be better
to create date and/or time range types that use rtree or gist indexes.
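
To make the construct concrete, here is a hypothetical query of the kind being discussed, where a.x is bounded on both sides by columns of the joined relation (table and column names are invented):

-- Neither bound is a constant, so ordinary column statistics on a.x
-- don't directly tell the planner how selective the range is.
SELECT a.*
FROM events a
JOIN spans b ON a.x > b.start_ts AND a.x < b.end_ts;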

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Delete query takes exorbitant amount of time

2005-04-03 Thread Bruno Wolff III
On Tue, Mar 29, 2005 at 01:48:48 -0700,
  Karim A Nassar [EMAIL PROTECTED] wrote:
 
 For this FK check, there only need be one referring id to invalidate the
 delete. ISTM that for any delete with a FK reference, the index could
 always be used to search for a single value in the referring table
 (excepting very small tables). Why then must a sequential scan be
 performed in this case, and/or in general? 

First the index needs to exist. It isn't created automatically because not
everyone wants such an index. Second, you need to have analyzed the
referencing table so that the planner will know it is big enough that
using an indexed search is worthwhile. The planner is getting better
about dealing with size changes without reanalyzing, but it seems there
are still some gotchas in 8.0.
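
A hedged sketch of what that means in practice, with invented table and column names; the index goes on the referencing column, and the referencing table should be analyzed so the planner knows its size:

-- child.parent_id references parent(id); no index is created for it
-- automatically, so add one to make the FK check on DELETE cheap.
CREATE INDEX child_parent_id_idx ON child (parent_id);

-- Refresh statistics so the planner knows the table is big enough
-- for an indexed search to be worthwhile.
ANALYZE child;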

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] coalesce alternative

2005-04-03 Thread Bruno Wolff III
On Tue, Mar 29, 2005 at 14:21:13 +0300,
  ALİ ÇELİK [EMAIL PROTECTED] wrote:
 I have used coalesce function for null fields but coalesce is too slow.
 I need fast alternative for coalesce

It is unlikely that coalesce itself is your problem. People might be able to
provide some help if you post the actual slow query along with its
EXPLAIN ANALYZE output.
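
For anyone unsure what to post, a minimal sketch of gathering that output; the query, table, and column names below are placeholders:

-- Run the slow query under EXPLAIN ANALYZE and post the full output;
-- the plan, row estimates, and timings are what people need to see.
EXPLAIN ANALYZE
SELECT COALESCE(nullable_col, 0) AS value
FROM mytable
WHERE some_col = 42;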

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

