Re: [PERFORM] Question about CLUSTER
On Mon, Feb 11, 2008 at 03:33:37PM -0600, Scott Marlowe wrote: On Feb 11, 2008 2:03 PM, salman [EMAIL PROTECTED] wrote: I'm planning to cluster a few large tables in our database but I'm unable to find any recommendations/documentation on best practices -- Mainly, whether it's better to use an index which has a higher idx_scan value, a higher idx_tup_read value, or the highest idx_tup_fetch value. I'm assuming that idx_tup_read would probably be the best choice, but want to get other opinions before proceeding.

If you've got two indexes that are both being hit a lot, it might be worth looking into their correlation, and if they get used a lot together, look at creating an index on both. But I'd guess that idx_tup_read would be a good bet.

You might also consider the ratio idx_tup_read::float8 / idx_scan to see which indexes access a lot of rows per scan.

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
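The ratio Michael suggests can be pulled straight from the statistics views; a sketch against pg_stat_user_indexes (the NULLIF guards against never-scanned indexes, and the NULLS LAST clause needs 8.3 or later -- drop it on older servers):

```sql
-- Rows read per index scan, widest-scanning indexes first
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read,
       idx_tup_read::float8 / NULLIF(idx_scan, 0) AS tup_read_per_scan
FROM pg_stat_user_indexes
ORDER BY tup_read_per_scan DESC NULLS LAST;
```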
Re: [PERFORM] Questions about enabling SSL
On Mon, Feb 11, 2008 at 05:37:51PM -0700, Michael Fuhr wrote: On Mon, Feb 11, 2008 at 04:58:35PM -0700, fabrix peñuelas wrote: If ssl is enable in postgresql decreanse the performance of the database? How much? The performance impact of an encrypted connection depends on how expensive the queries are and how much data they return. Another consideration is how much time you spend using each connection vs. how much time it takes to establish each connection. A thousand simple queries over the same encrypted connection might be significantly faster than running each query over a separate unencrypted connection, which in turn will probably be significantly faster than using separate encrypted connections that must each carry out a relatively expensive key establishment. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Questions about enabling SSL
On Mon, Feb 11, 2008 at 04:58:35PM -0700, fabrix peñuelas wrote: If ssl is enable in postgresql decreanse the performance of the database? How much? The performance impact of an encrypted connection depends on how expensive the queries are and how much data they return. A query that joins several tables and aggregates millions of rows might take several seconds or minutes to run and return only a few rows; for such a query the impact of an encrypted connection is insignificant. But if you make many queries that run quickly and return large result sets then you might indeed notice the impact of an encrypted connection vs. a non-encrypted connection. The most reliable way to assess the impact would be to run representative queries over your data and measure the difference yourself. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
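A rough way to run that comparison yourself, assuming libpq's sslmode connection parameter and a server configured to accept both SSL and non-SSL connections (host and database names here are placeholders):

```shell
# Time 100 separate connections each running a trivial query,
# first unencrypted, then encrypted; the gap is mostly SSL key setup
time for i in $(seq 1 100); do
  psql "host=db.example.com dbname=test sslmode=disable" -c "SELECT 1" >/dev/null
done
time for i in $(seq 1 100); do
  psql "host=db.example.com dbname=test sslmode=require" -c "SELECT 1" >/dev/null
done
```

Repeating the comparison over a single connection (one psql session with \timing) would separate per-connection overhead from per-query overhead.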
Re: [PERFORM] Query taking too long. Problem reading explain output.
On Wed, Oct 03, 2007 at 10:03:53AM +0200, Henrik wrote: I have a little query that takes too long and what I can see in the explain output is a seq scan on my biggest table (tbl_file_structure) which I can't explain why.

Here's where almost all of the time is taken:

Hash Join  (cost=8605.68..410913.87 rows=19028 width=40) (actual time=22.810..16196.414 rows=17926 loops=1)
  Hash Cond: (tbl_file_structure.fk_file_id = tbl_file.pk_file_id)
  ->  Seq Scan on tbl_file_structure  (cost=0.00..319157.94 rows=16591994 width=16) (actual time=0.016..7979.083 rows=16591994 loops=1)
  ->  Hash  (cost=8573.62..8573.62 rows=2565 width=40) (actual time=22.529..22.529 rows=2221 loops=1)
        ->  Bitmap Heap Scan on tbl_file  (cost=74.93..8573.62 rows=2565 width=40) (actual time=1.597..20.691 rows=2221 loops=1)
              Filter: (lower((file_name)::text) ~~ 'index.php%'::text)
              ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..74.28 rows=2565 width=0) (actual time=1.118..1.118 rows=2221 loops=1)
                    Index Cond: ((lower((file_name)::text) ~>=~ 'index.php'::character varying) AND (lower((file_name)::text) ~<~ 'index.phq'::character varying))

Does tbl_file_structure have an index on fk_file_id? If so then what's the EXPLAIN ANALYZE output if you set enable_seqscan to off? I don't recommend disabling sequential scans permanently but doing so can be useful when investigating why the planner thinks one plan will be faster than another. What are your settings for random_page_cost, effective_cache_size, work_mem, and shared_buffers? If you're using the default random_page_cost of 4 then what's the EXPLAIN ANALYZE output if you reduce it to 3 or 2 (after setting enable_seqscan back to on)?

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
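The experiment described above fits in one psql session; the SELECT here is a placeholder for Henrik's join, and the SET statements only affect the current session:

```sql
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;   -- does the planner now pick the fk_file_id index?
SET enable_seqscan = on;

SET random_page_cost = 2;     -- cheaper estimate for random I/O
EXPLAIN ANALYZE SELECT ...;
```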
Re: [PERFORM] one column from huge view
On Thu, Jul 12, 2007 at 09:50:42AM +0100, Heikki Linnakangas wrote: Marcin Stępnicki wrote: Let's say I've got a view with 100 columns and 1mln rows; some of them are calculated on the fly. For some reason I want only one column from this view: select col1 from huge_view; Now, does PostgreSQL skip all the calculations from other columns and execute this query faster than select * from huge_view? In simple cases, yes. But for example, if you have a LEFT OUTER JOIN in the view, the join is performed even if your query doesn't return any columns from the outer relation. Also, if the calculation contains immutable functions, it's not skipped. Don't you mean if the calculation contains VOLATILE functions, it's not skipped? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Query Analyser
On Tue, Jul 10, 2007 at 08:17:05PM +0530, Gauri Kanekar wrote: Is there any way to indicate to the query planner not to use the plan which it is using regularly, and use a new plan?

You can't dictate the query plan but you can influence the planner's decisions with various configuration settings.

http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html

Disabling planner methods (enable_seqscan, etc.) should be a last resort -- before doing so make sure that settings like shared_buffers and effective_cache_size are appropriately sized for your system, that you're gathering enough statistics (see below), and that the statistics are current (run ANALYZE or VACUUM ANALYZE). After all that, if you still think you need to disable a planner method then consider posting the query and the EXPLAIN ANALYZE output to pgsql-performance to see if anybody has other suggestions.

From where does the query planner get all the info to prepare a plan? Is it only from the pg_statistics table or are there any other tables which have this info stored?

The planner also uses pg_class.{reltuples,relpages}.

http://www.postgresql.org/docs/8.2/interactive/planner-stats.html
http://www.postgresql.org/docs/8.2/interactive/planner-stats-details.html

And can we change the statistics?

You can increase the amount of statistics gathered for a specific column with ALTER TABLE SET STATISTICS or system-wide by adjusting default_statistics_target.

http://www.postgresql.org/docs/8.2/interactive/sql-altertable.html
http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-OTHER

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
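For example, the two ways of raising the statistics target look like this (table and column names are illustrative; ANALYZE must run afterwards for the new target to take effect):

```sql
-- Per-column: keep more detailed statistics for a skewed column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;

-- Session- or server-wide default for all columns
SET default_statistics_target = 100;
```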
Re: [PERFORM] Delete Cascade FK speed issue
On Tue, Jul 03, 2007 at 08:05:27AM +0200, Patric de Waha wrote: Is there a way I can find out what postgres does, and where it hangs around, so I know where the FK might not be indexed. (The dbase is too big to analyze it by hand.)

You could query the system catalogs to look for foreign key constraints that don't have an index on the referencing column(s). Something like the following should work for single-column foreign keys:

select n1.nspname, c1.relname, a1.attname, t.conname,
       n2.nspname as fnspname, c2.relname as frelname, a2.attname as fattname
from pg_constraint t
join pg_attribute a1 on a1.attrelid = t.conrelid and a1.attnum = t.conkey[1]
join pg_class c1 on c1.oid = t.conrelid
join pg_namespace n1 on n1.oid = c1.relnamespace
join pg_class c2 on c2.oid = t.confrelid
join pg_namespace n2 on n2.oid = c2.relnamespace
join pg_attribute a2 on a2.attrelid = t.confrelid and a2.attnum = t.confkey[1]
where t.contype = 'f'
  and not exists (
    select 1
    from pg_index i
    where i.indrelid = t.conrelid
      and i.indkey[0] = t.conkey[1]
  )
order by n1.nspname, c1.relname, a1.attname;

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] pg_statistic doesnt contain details for specific table
On Mon, Jun 11, 2007 at 02:28:32PM +0530, Nimesh Satam wrote: We have already used analyze command on the table. We have also ran the vacuum analyze command. But they are not helping. Is there any data in the table? What does ANALYZE VERBOSE or VACUUM ANALYZE VERBOSE show for this table? Is there any chance that somebody set all of the columns' statistics targets to zero? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] pg_statistic doesnt contain details for specific table
On Mon, Jun 11, 2007 at 07:22:24PM +0530, Nimesh Satam wrote:

INFO:  analyzing public.cam_attr
INFO:  cam_attr: scanned 103 of 103 pages, containing 11829 live rows and 0 dead rows; 6000 rows in sample, 11829 estimated total rows

Looks reasonable.

Also how do we check if the statistics are set to Zero for the table?

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'public.cam_attr'::regclass
  AND attnum > 0
  AND NOT attisdropped;

If nobody has changed the statistics targets then they're all probably -1. Negative attstattarget values mean to use the system default, which you can see with:

SHOW default_statistics_target;

How exactly are you determining that no statistics are showing up for this table? Are you running a query like the following?

SELECT * FROM pg_stats WHERE schemaname = 'public' AND tablename = 'cam_attr';

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x
On Sat, Jun 02, 2007 at 09:13:32AM -0400, Douglas J Hunley wrote: Our 'esteemed' Engr group recently informed a customer that in their testing, upgrading to 8.2.x improved the performance of our J2EE application approximately 20%, so of course, the customer then tasked me with upgrading them. We dumped their db, removed pgsql, installed the 8.2.4 rpms from postgresql.org, did an initdb, and then pg_restored their data. It's been about a week now, and the customer is complaining that in their testing, they are seeing a 30% /decrease/ in general performance.

After the restore, did you ANALYZE the entire database to update the planner's statistics? Have you enabled autovacuum or are you otherwise vacuuming and analyzing regularly? What kind of queries are slower than desired? If you post an example query and the EXPLAIN ANALYZE output then we might be able to see if the slowness is due to query plans.

A few differences between the configuration files stand out. The 7.4 file has the following settings:

shared_buffers = 25000
sort_mem = 15000
effective_cache_size = 196608

The 8.2 config has:

#shared_buffers = 32MB
#work_mem = 1MB
#effective_cache_size = 128MB

To be equivalent to the 7.4 config the 8.2 config would need:

shared_buffers = 195MB
work_mem = 15000kB
effective_cache_size = 1536MB

With 8GB of RAM you might try increasing shared_buffers to 400MB - 800MB (less if the entire database isn't that big) and effective_cache_size to 5GB - 6GB. You might have to increase the kernel's shared memory settings before increasing shared_buffers.

Some of the other settings are the same between the configurations but deserve discussion:

fsync = off

Disabling fsync is dangerous -- are all parties aware of the risk and willing to accept it? Has the risk been weighed against the cost of upgrading to a faster I/O subsystem? How much performance benefit are you realizing by disabling fsync? What kind of activity led to the decision to disable fsync?
Are applications doing anything like executing large numbers of insert/update/delete statements outside of a transaction block when they could be done in a single transaction?

commit_delay = 2
commit_siblings = 3

What kind of activity led to the above settings? Are they a guess or were they determined empirically? How much benefit are they providing and how did you measure that?

enable_mergejoin = off
geqo = off

I've occasionally had to tweak planner settings but I prefer to do so for specific queries instead of changing them server-wide.

-- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
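On the transaction-block point: outside an explicit transaction every statement commits (and, with fsync on, forces a WAL flush) individually, so batching related statements can help substantially. A sketch with a hypothetical table:

```sql
-- One WAL flush at COMMIT instead of one per statement
BEGIN;
INSERT INTO audit_log (msg) VALUES ('first event');
INSERT INTO audit_log (msg) VALUES ('second event');
-- ... thousands more statements ...
COMMIT;
```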
Re: [PERFORM] Help with TOAST Compression
On Tue, Apr 17, 2007 at 04:13:36PM -0500, David Hinkle wrote: I have a table where I store email, the bodies are mostly kept in a toast table. The toast table is 940 Meg in size. The whole database is about 1.2 Gig in size. When I back the database up using pg_dump in custom output mode, I pipe the output into gzip. My backups are only about 600 meg in size. From this, I assume that the toast table isn't getting compressed.

How are you measuring the toast table and database sizes? Have you taken indexes and uncompressible data and metadata into account? The database compresses only certain data, whereas when you pipe a dump into gzip you get compression on the entire dump. Some of the space might be taken up by dead rows and unused item pointers. How often do you vacuum? What does VACUUM VERBOSE tablename show?

Is there any way I can tell for sure if the messages from this column are being stored compressed?

You could look at a hex/ascii dump of the base and toast tables -- you might see runs of legible text but it should be obvious where the data is compressed. See the TOAST section in the documentation for more information about how and when data is compressed:

http://www.postgresql.org/docs/8.2/interactive/storage-toast.html

Note that "The TOAST code is triggered only when a row value to be stored in a table is wider than BLCKSZ/4 bytes (normally 2 kB)." And I'm no expert at compression algorithms but it's possible that the "fairly simple and very fast member of the LZ family of compression techniques" isn't as space-efficient as the algorithm that gzip uses (LZ77 according to its manual page). Maybe one of the developers can comment.

I know I can set the compression settings using the ALTER TABLE ALTER SET STORAGE syntax, but is there a way I can see what this value is currently set to?
You could query pg_attribute.attstorage: http://www.postgresql.org/docs/8.2/interactive/catalog-pg-attribute.html -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
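A query along those lines, with the table name as a placeholder (attstorage is 'p' = plain, 'e' = external, 'm' = main, 'x' = extended):

```sql
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```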
Re: [PERFORM] postgres 8.2 seems to prefer Seq Scan
On Fri, Apr 06, 2007 at 04:38:33PM -0400, Alex Deucher wrote: One more anomaly between 7.4 and 8.2. DB dumped from 7.4 and loaded onto 8.2, both have locale set to C. 8.2 seems to prefer Seq Scans for the first query while the ordering in the second query seems to perform worse on 8.2. I ran analyze. I've tried with the encoding set to UTF-8 and SQL_ASCII; same numbers and plans. Any ideas how to improve this?

Are you sure the data sets are identical? The 7.4 query returned 0 rows; the 8.2 query returned 1 row. If you're running the same query against the same data in both versions then at least one of them appears to be returning the wrong result. Exactly which versions of 7.4 and 8.2 are you running? Have you analyzed all tables in both versions? The row count estimate in 7.4 is much closer to reality than in 8.2:

7.4
->  Index Scan using pnum_idx on event  (cost=0.00..3.37 rows=19 width=172) (actual time=0.063..0.063 rows=0 loops=1)
      Index Cond: ((pnum)::text = 'AB5819188'::text)

8.2
->  Index Scan using pnum_idx on event  (cost=0.00..3147.63 rows=1779 width=171) (actual time=0.030..0.033 rows=1 loops=1)
      Index Cond: ((pnum)::text = 'AB5819188'::text)

If analyzing the event table doesn't improve the row count estimate then try increasing the statistics target for event.pnum and analyzing again. Example:

ALTER TABLE event ALTER pnum SET STATISTICS 100;
ANALYZE event;

You can set the statistics target as high as 1000 to get more accurate results at the cost of longer ANALYZE times.

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Wrong plan sequential scan instead of an index one
On Fri, Mar 30, 2007 at 12:08:26PM +0200, Gaetano Mendola wrote: Claus Guttesen wrote: Try changing random_page_cost from the default 4 to 2 in postgresql.conf: random_page_cost = 2 I have tuned that number already at 2.5, lowering it to 2 doesn't change the plan.

The following 19-fold overestimate is influencing the rest of the plan:

->  Seq Scan on l_pvcp  (cost=0.00..2.17 rows=19 width=4) (actual time=0.066..0.081 rows=1 loops=1)
      Filter: (value ~~* '%pi%'::text)

Have you tried increasing the statistics target on l_pvcp.value? I ran your queries against canned data in 8.2.3 and better statistics resulted in more accurate row count estimates for this and other parts of the plan. I don't recall if estimates for non-leading-character matches in earlier versions can benefit from better statistics.

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Wrong plan sequential scan instead of an index one
On Fri, Mar 30, 2007 at 04:46:11AM -0600, Michael Fuhr wrote: Have you tried increasing the statistics target on l_pvcp.value? I ran your queries against canned data in 8.2.3 and better statistics resulted in more accurate row count estimates for this and other parts of the plan. I don't recall if estimates for non-leading-character matches in earlier versions can benefit from better statistics. This might work only in 8.2. I see the following in the Release Notes: * Improve the optimizer's selectivity estimates for LIKE, ILIKE, and regular expression operations (Tom) -- Michael Fuhr ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Nested Loop
On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote: how to speedup nested loop queries and by which parameters. Please post a query you're trying to tune and the EXPLAIN ANALYZE output, as well as any changes you've already made in postgresql.conf or configuration variables you've set in a particular session. Without more information we can't give much advice other than to make sure you're vacuuming and analyzing the tables often enough to keep them from becoming bloated with dead rows and to keep the statistics current, and to review a configuration checklist such as this one: http://www.powerpostgresql.com/PerfList -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance of count(*)
On Thu, Mar 22, 2007 at 01:29:46PM +0100, Andreas Kostyrka wrote: * Andreas Tille [EMAIL PROTECTED] [070322 13:24]: Well, to be honest I'm not really interested in the performance of count(*). I was just discussing general performance issues on the phone line and when my colleague asked me about the size of the database he just wonderd why this takes so long for a job his MS-SQL server is much faster. So in principle I was just asking a first question that is easy to ask. Perhaps I come up with more difficult optimisation questions. Simple. MSSQL is optimized for this case, and uses older datastructures. PG uses a MVCC storage, which is not optimized for this usecase. It's quite fast for different kinds of queries. Ask about performing concurrent selects, inserts, updates, and deletes in SQL Server and about the implications on ACID of locking hints such as NOLOCK. Then consider how MVCC handles concurrency without blocking or the need for dirty reads. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
On Tue, Mar 13, 2007 at 09:19:47AM +0100, [EMAIL PROTECTED] wrote: Is there an option in the 8.2.3 to change in order to have the same execution plan than before ? Let's see if we can figure out why 8.2.3 is choosing a bad plan. Have you run ANALYZE on the tables in 8.2.3? Could you post the query and the complete output of EXPLAIN ANALYZE (preferably without wrapping) for both versions? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Not Picking Index
On Fri, Feb 16, 2007 at 06:26:51PM +0530, Gauri Kanekar wrote: I want to run a Select Query on a table. But i dont want the query to pick a index defined on that table. So can i instruct the planner not to pick that index. Why don't you want the planner to use the index? Is there a specific index you want to ignore or do you want the planner to ignore all indexes? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Moving a tablespace
On Tue, Aug 22, 2006 at 06:16:54PM -0700, Craig A. James wrote: Is there a way to move a tablespace to a new location without a dump/restore? I, er, this hypothetical guy, knows he can move it and put a symbolic link in for /disk2, but this is somewhat unsatisfactory since /disk2 would have to exist forever. The last paragraph of the Tablespaces documentation might be helpful: http://www.postgresql.org/docs/8.1/interactive/manage-ag-tablespaces.html The directory $PGDATA/pg_tblspc contains symbolic links that point to each of the non-built-in tablespaces defined in the cluster. Although not recommended, it is possible to adjust the tablespace layout by hand by redefining these links. Two warnings: do not do so while the postmaster is running; and after you restart the postmaster, update the pg_tablespace catalog to show the new locations. (If you do not, pg_dump will continue to show the old tablespace locations.) I just tested this and it appeared to work, but this hypothetical DBA might want to wait for others to comment before proceeding. He might also want to initdb and populate a test cluster and practice the procedure before doing it for real. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
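A sketch of that procedure for 8.1-era servers, where pg_tablespace still carries a spclocation column (paths, tablespace name, and OID are hypothetical; do this only with the postmaster stopped and a fresh backup in hand):

```sql
-- With the postmaster stopped, at the shell:
--   mv /disk2/pgspace /disk3/pgspace
--   ln -sfn /disk3/pgspace $PGDATA/pg_tblspc/<tablespace oid>
-- Then restart the postmaster and fix the catalog so pg_dump
-- reports the new location:
UPDATE pg_tablespace SET spclocation = '/disk3/pgspace'
WHERE spcname = 'myspace';
```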
Re: [PERFORM] Big differences in plans between 8.0 and 8.1
On Sat, Jul 15, 2006 at 04:14:11PM +0200, Gabriele Turchi wrote: Hi all. I have a strange (and serious) problem with an application ported from postgres 8.0 to 8.1. The old installation is postgresql-8.0.4-2.FC4.1 running on a Fedora 4, the new one is postgresql-8.1.4-1.FC5.1 running on a Fedora 5. Some queries are now _very_ slow. I've found some deep differences between query plans.

Have you run ANALYZE in 8.1? Some of the row count estimates in the 8.1 plan differ significantly from the actual number of rows returned, while in the 8.0 plan the estimates are accurate. For example, in one case the 8.0 plan shows 349 rows estimated, 349 rows returned:

->  Seq Scan on registrazioni  (cost=0.00..11.98 rows=349 width=19) (actual time=0.029..2.042 rows=349 loops=1)
      Filter: (date((now() - '02:00:00'::interval)) = data)

but the 8.1 plan shows 2 rows estimated, 349 rows returned:

->  Seq Scan on registrazioni  (cost=0.00..11.98 rows=2 width=44) (actual time=0.025..2.315 rows=349 loops=1)
      Filter: (date((now() - '02:00:00'::interval)) = data)

This suggests that the 8.1 statistics are out of date, possibly because ANALYZE or VACUUM ANALYZE hasn't been run since the data was loaded. Try running ANALYZE in 8.1 and post the new plans if that doesn't help.

-- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] hyper slow after upgrade to 8.1.4
On Thu, Jul 13, 2006 at 08:22:46AM -0500, Medora Schauer wrote: Can it be that the connection delay is because first an IPv6 socket is trying to be established and when that fails an IPv4 socket is created? A sniffer like tcpdump or ethereal might reveal why connecting is so slow. The problem might be with DNS queries for AAAA (IPv6) records prior to queries for A (IPv4) records; see this thread from almost a year ago: http://archives.postgresql.org/pgsql-general/2005-08/msg00216.php -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Solaris shared_buffers anomaly?
On Tue, Jun 13, 2006 at 06:22:07PM -0400, Tom Lane wrote: Mischa Sandberg [EMAIL PROTECTED] writes: vmstat showed that it was swapping like crazy. Dropped shared_buffers back down again. Swapping stopped. Does Solaris have any call that allows locking a shmem segment in RAM? The Solaris 9 shmctl manpage mentions this token: SHM_LOCK Lock the shared memory segment specified by shmid in memory. This command can be executed only by a process that has an effective user ID equal to super-user. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Posrgres speed problem
On Mon, Jun 12, 2006 at 04:38:57PM +0200, Ruben Rubio Rey wrote: I have two similar servers, one in production and another for testing purposes. Databases are equal (with a difference of some hours) In the testing server, an sql sentence takes arround 1 sec. In production server (low server load) takes arround 50 secs, and uses too much resources. Explain analyze takes too much load, i had to cancel it! The EXPLAIN ANALYZE output would be helpful, but if you don't want to run it to completion then please post the output of EXPLAIN ANALYZE for the fast system and EXPLAIN (without ANALYZE) for the slow one. As someone else asked, are you running ANALYZE regularly? What about VACUUM? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] lowering priority automatically at connection
On Thu, May 25, 2006 at 06:16:24PM +0200, Chris Mair wrote: I find this very helpful: Lowering the priority of a PostgreSQL query http://weblog.bignerdranch.com/?p=11 Now I was wondering whether one could have a SELECT pg_setpriority(10); executed automatically each time a certain user connects (not necessarily using psql)? Beware that setting priorities can have unintended, adverse effects. Use a search engine to find information about priority inversion before deciding that query priorities are a good idea. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Adding and filling new column on big table
On Wed, May 17, 2006 at 03:19:26AM +0200, Jonathan Blitz wrote: I have a table of about 500,000 rows. I need to add a new column and populate it. So, I have tried to run the following command. The command never finishes (I gave up after about an hour and a half!).

If you install contrib/pgstattuple you can figure out how fast the update is running. Run SELECT * FROM pgstattuple('mytable') a few times and note the rate at which dead_tuple_count is increasing. If it's not increasing at all then query pg_locks and look for locks where granted is false.

I created a test table, populated it with 500,000 rows of random data, and ran the update you posted. On a 500MHz Pentium III with 512M RAM and a SCSI drive from the mid-to-late 90s, running PostgreSQL 8.1.3 on FreeBSD 6.1, the update finished in just over two minutes. The table had one index (the primary key).

Note that none of the columns have indexes.

Do you mean that no columns in the table have indexes? Or that the columns referenced in the update don't have indexes but that other columns do? What does \d mytable show? Do other tables have foreign key references to this table? What non-default settings do you have in postgresql.conf? What version of PostgreSQL are you running and on what platform? How busy is the system? What's the output of EXPLAIN UPDATE mytable ...?

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
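For the blocked-lock check mentioned above, something like this lists any lock requests that have not been granted:

```sql
SELECT locktype, relation::regclass AS rel, pid, mode, granted
FROM pg_locks
WHERE NOT granted;
```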
Re: [PERFORM] Array performance
On Fri, Mar 24, 2006 at 07:06:19AM -0600, Jim C. Nasby wrote: On Fri, Mar 24, 2006 at 02:01:29PM +0100, Ruben Rubio Rey wrote: mmm ... i have read in postgres documentation that null values on arrays are not supported ... Damn, you're right. Another reason I tend to stay away from them...

8.2 will support NULL array elements.

http://archives.postgresql.org/pgsql-committers/2005-11/msg00385.php
http://developer.postgresql.org/docs/postgres/arrays.html

test=> SELECT '{1,2,NULL,3,4}'::integer[];
      int4
----------------
 {1,2,NULL,3,4}
(1 row)

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Help optimizing a slow index scan
On Fri, Mar 17, 2006 at 11:41:11PM -0500, Tom Lane wrote: Dan Harris [EMAIL PROTECTED] writes: Furthermore, by doing so, I am tying my queries directly to postgres-isms. One of the long term goals of this project is to be able to fairly transparently support any ANSI SQL-compliant back end with the same code base. Unfortunately, there isn't any portable or standard (not exactly the same thing ;-)) SQL functionality for dealing gracefully with two-dimensional searches, which is what your lat/long queries are.

The OpenGIS Simple Features Specification [1] is a step in that direction, no? PostGIS [2], MySQL [3], and Oracle Spatial [4] implement it to varying degrees. With PostGIS you do have to add non-standard operators to a query's predicate to benefit from GiST indexes on spatial columns, but the rest of the query can be straight out of the SQL and OGC standards.

[1] http://www.opengeospatial.org/docs/99-049.pdf
[2] http://www.postgis.org/
[3] http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html
[4] http://www.oracle.com/technology/products/spatial/index.html

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
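As an illustration of that split between standard SQL and the PostGIS-specific operator (table, column, and SRID are hypothetical, and ST_MakeEnvelope assumes a reasonably recent PostGIS; older releases spell the envelope construction differently):

```sql
-- A GiST index makes the bounding-box operator below fast
CREATE INDEX incidents_geom_idx ON incidents USING GIST (geom);

-- && (bounding-box overlap) is the only non-standard piece of the predicate
SELECT id, occurred_at
FROM incidents
WHERE geom && ST_MakeEnvelope(-105.5, 39.5, -104.5, 40.5, 4326);
```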
Re: [PERFORM] PG Statistics
On Mon, Mar 13, 2006 at 06:49:39PM -0500, mcelroy, tim wrote: Does anyone know how much of a performance hit turning stats_block_level and stats_row_level on will incur? Do both need to be on to gather cache related statistics? I know the annotated_conf_80 document states to only turn them on for debug but if they're not that performance intensive I cannot see the harm. I ran some tests a few months ago and found that stats_command_string had a significant impact, whereas stats_block_level and stats_row_level were almost negligible. Here are my test results: http://archives.postgresql.org/pgsql-performance/2005-12/msg00307.php Your results may vary. If you see substantially different results then please post the particulars. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Process Time X200
On Fri, Mar 10, 2006 at 08:11:44AM +0100, NbForYou wrote: As you can see the query isn't useful anymore because of the processtime. Please Also notice that both systems use a different query plan. Also on the webhost we have a loop of 162409 (403 rows * 403 rows). Both systems also use a different postgresql version. But I cannot believe that the performance difference between 1 version could be this big regarding self outer join queries! What versions are both servers? I'd guess that the webhost is using 7.3 or earlier and you're using 7.4 or later. I created a table like yours, populated it with test data, and ran your query on several versions of PostgreSQL. I saw the same horrible plan on 7.3 and the same good plan on later versions. The 7.4 Release Notes do mention improvements in query planning; apparently one of those improvements is making the difference. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query time
On Fri, Mar 10, 2006 at 11:05:57AM +0100, Ruben Rubio Rey wrote: Sometimes a big query takes between 5 and 15 seconds. It happens at various times throughout the day and does not depend on whether the database is busy. I have measured that statement at 15-70 ms under normal circumstances. Is it the *exact* same query, including the values you're querying for? The same query with different values can run with different plans depending on row count estimates. It might be useful to see the query string and the EXPLAIN ANALYZE output for a fast query and a slow one. How many tables are you querying? Might you be hitting geqo_threshold (default 12)? If so then the following thread might be helpful: http://archives.postgresql.org/pgsql-performance/2006-01/msg00132.php -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Can anyone explain this pgbench results?
On Mon, Mar 06, 2006 at 07:46:05PM +0100, Joost Kraaijeveld wrote: Michael Fuhr wrote: What's your setting? Default. Have you tweaked postgresql.conf at all? If so, what non-default settings are you using? Are your test results more consistent if you execute CHECKPOINT between them? Could you tell me how I could do that? Connect to the database as a superuser and execute a CHECKPOINT statement. http://www.postgresql.org/docs/8.1/interactive/sql-checkpoint.html From the command line you could do something like:

psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Can anyone explain this pgbench results?
[Please copy the mailing list on replies.] On Mon, Mar 06, 2006 at 09:38:20PM +0100, Joost Kraaijeveld wrote: Michael Fuhr wrote: Have you tweaked postgresql.conf at all? If so, what non-default settings are you using? Yes, I have tweaked the following settings:

shared_buffers = 4
work_mem = 512000
maintenance_work_mem = 512000
max_fsm_pages = 4
effective_cache_size = 131072

Are you sure you need work_mem that high? How did you decide on that value? Are all other settings at their defaults? No changes to the write-ahead log (WAL) or background writer (bgwriter) settings? What version of PostgreSQL are you running? The paths in your original message suggest 8.1.x. Are your test results more consistent if you execute CHECKPOINT between them?

psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test
psql -c checkpoint
pgbench -c 10 -t 150 test

OK, that leads to a consistent high score. I also noticed that psql -c checkpoint results in I/O on the database partition but not on the partition that has the logfiles (pg_xlog directory). Do you know if that's how it should be? A checkpoint updates the database files with the data from the write-ahead log; you're seeing those writes to the database partition. The postmaster does checkpoints every checkpoint_timeout seconds (default 300) or every checkpoint_segments log segments (default 3); it also uses a background writer to trickle pages to the database files between checkpoints so the checkpoints don't have as much work to do. I've been wondering if your pgbench runs are being affected by that background activity; the fact that you get consistently good performance after forcing a checkpoint suggests that that might be the case. If you run pgbench several times without intervening checkpoints, do your postmaster logs have any messages like checkpoints are occurring too frequently? It might be useful to increase checkpoint_warning up to the value of checkpoint_timeout and then see if you get any such messages during pgbench runs. 
If checkpoints are happening a lot more often than every checkpoint_timeout seconds then try increasing checkpoint_segments (assuming you have the disk space). After doing so, restart the database and run pgbench several times without intervening checkpoints and see if performance is more consistent. Note that tuning PostgreSQL for pgbench performance might be irrelevant for your actual needs unless your usage patterns happen to resemble what pgbench does. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
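The checkpoint spacing discussed above can be put into rough numbers. A back-of-envelope sketch assuming the default 16 MB WAL segment size; the WAL write rates are hypothetical workload figures, not measurements from this thread:

```python
# Back-of-envelope model of checkpoint spacing: a checkpoint fires either
# after checkpoint_segments WAL segments (16 MB each by default) are
# written, or after checkpoint_timeout seconds, whichever comes first.
WAL_SEGMENT_MB = 16

def checkpoint_interval_s(checkpoint_segments, wal_mb_per_s,
                          checkpoint_timeout_s=300):
    # Seconds between checkpoints for a given sustained WAL write rate.
    wal_driven = checkpoint_segments * WAL_SEGMENT_MB / wal_mb_per_s
    return min(wal_driven, checkpoint_timeout_s)

# Default checkpoint_segments = 3 with a hypothetical 2 MB/s of WAL:
# a checkpoint every 24 s, far more often than checkpoint_timeout's 300 s.
# Raising checkpoint_segments to 30 stretches that to 240 s.
print(checkpoint_interval_s(3, 2.0))   # 24.0
print(checkpoint_interval_s(30, 2.0))  # 240.0
```

This is only a model of the trigger logic, but it shows why a write-heavy pgbench run can checkpoint every few seconds at the default settings.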
Re: [PERFORM] SQL Function Performance
On Tue, Feb 14, 2006 at 11:33:57AM +0200, Adnan DURSUN wrote:

 ->  Nested Loop  (cost=5.90..267.19 rows=3 width=101) (actual time=76.240..30974.777 rows=63193 loops=1)
       ->  Nested Loop  (cost=5.90..123.48 rows=26 width=73) (actual time=32.082..4357.786 rows=14296 loops=1)

A prepared query is planned before the parameters' values are known, so the planner can't take full advantage of column statistics to estimate row counts. The planner must therefore decide on a plan that should be reasonable in most cases; apparently this isn't one of those cases, as the disparity between estimated and actual rows shows. Maybe Tom (one of the core developers) can comment on whether anything can be done to improve the plan in this case. Absent a better solution, you could write a PL/pgSQL function and build the query as a text string, then EXECUTE it. That would give you a new plan each time, one that can take better advantage of statistics, at the cost of having to plan the query each time you call the function (but you probably don't care about that cost as long as the overall results are better). Here's an example:

CREATE FUNCTION fooquery(qval text) RETURNS SETOF foo AS $$
DECLARE
    row   foo%ROWTYPE;
    query text;
BEGIN
    query := 'SELECT * FROM foo WHERE val = ' || quote_literal(qval);
    FOR row IN EXECUTE query LOOP
        RETURN NEXT row;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql STABLE STRICT;

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] SQL Function Performance
On Mon, Feb 13, 2006 at 07:57:07PM -0500, Tom Lane wrote: Adnan DURSUN [EMAIL PROTECTED] writes: EXPLAIN ANALYZE EXECUTE stmt (...); Here is the EXPLAIN ANALYZE output for prepared statement : This is exactly the same as the other plan --- you did not parameterize the query. To see what's going on, you need to insert PREPARE parameters in the places where the function uses plpgsql variables. Actually it was an SQL function, but that also does PREPARE/EXECUTE, right? Adnan, what Tom is saying is that I requested this (simplified):

PREPARE stmt (integer) AS SELECT * FROM foo WHERE id = $1;
EXPLAIN ANALYZE EXECUTE stmt (12345);

but instead you appear to have done this:

PREPARE stmt AS SELECT * FROM foo WHERE id = 12345;
EXPLAIN ANALYZE EXECUTE stmt;

We can tell because if you had done it the first way (parameterized) then the EXPLAIN ANALYZE output would have shown the parameters as $1, $2, $3, etc., which it didn't. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SQL Function Performance
On Sun, Feb 12, 2006 at 10:25:28PM +0200, Adnan DURSUN wrote: My database has an SQL function. The result comes in 30-40 seconds when i use the SQL function. On the other hand; The result comes 300-400 milliseconds when i run the SQL statement. Any idea ?? Have you analyzed the tables? If that's not the problem then could you post the EXPLAIN ANALYZE output for the direct query and for a prepared query? For the prepared query do this: PREPARE stmt (varchar, date, int4, varchar, varchar) AS SELECT ... ; where ... is the same SQL as in the function body, including the numbered parameters ($1, $2, etc.). To execute the query do this: EXPLAIN ANALYZE EXECUTE stmt (...); Where ... is the same parameter list you'd pass to the function (the same values you used in the direct query). If you need to re-prepare the query then run DEALLOCATE stmt before doing so. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sequential scan being used despite indexes
On Tue, Jan 31, 2006 at 07:29:51PM -0800, Joshua D. Drake wrote: Any ideas? What does explain analyze say? Also, have the tables been vacuumed and analyzed? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sequential scan being used despite indexes
On Wed, Feb 01, 2006 at 01:33:08PM +0900, James Russell wrote: Reading about this issue further in the FAQ, it seems that I should ensure that Postgres has adequate and accurate information about the tables in question by regularly running VACUUM ANALYZE, something I don't do currently. Many people use a cron job (or the equivalent) to run VACUUM ANALYZE at regular intervals; some also use the pg_autovacuum daemon, which is a contrib module in 8.0 and earlier and part of the backend as of 8.1. How often to vacuum/analyze depends on usage. Once per day is commonly cited, but busy tables might need it more often than that. Just recently somebody had a table that could have used vacuuming every five minutes or less (all records were updated every 30 seconds); pg_autovacuum can be useful in such cases. I disabled SeqScan as per the FAQ, and it indeed was a lot slower so Postgres was making the right choice in this case. The planner might be making the right choice given the statistics it has, but it's possible that better statistics would lead to a different plan, perhaps one where an index scan would be faster. What happens if you run VACUUM ANALYZE on all the tables, then run the query again with EXPLAIN ANALYZE? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Incorrect Total runtime Reported by Explain Analyze!?
On Thu, Jan 26, 2006 at 04:49:59PM +, Richard Huxton wrote: Jozsef Szalay wrote: I have seen it on occasion that the total runtime reported by explain analyze was much higher than the actual time the query needed to complete. The differences in my case ranged between 20-120 seconds. I'm just curious if anyone else has experienced this and whether there is something that I can do to convince explain analyze to report the execution time of the query itself rather than the time of its own execution. Engine version is 8.1.1. I think it's down to all the gettime() calls that have to be made to measure how long each stage of the query takes. In some cases these can take a substantial part of the overall query time. Another possibility is that the total query time was indeed that long because the query was blocked waiting for a lock. For example:

T1: BEGIN;
T2: BEGIN;
T1: SELECT * FROM foo WHERE id = 1 FOR UPDATE;
T2: EXPLAIN ANALYZE UPDATE foo SET x = x + 1 WHERE id = 1;
T1: (do something for a long time)
T1: COMMIT;

When T2's EXPLAIN ANALYZE finally returns it'll show something like this:

test=> EXPLAIN ANALYZE UPDATE foo SET x = x + 1 WHERE id = 1;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=14) (actual time=0.123..0.138 rows=1 loops=1)
   Index Cond: (id = 1)
 Total runtime: 31926.304 ms
(3 rows)

SELECT queries can be blocked by operations that take an Access Exclusive lock, such as CLUSTER, VACUUM FULL, or REINDEX. Have you ever examined pg_locks during one of these queries to look for ungranted locks? If this weren't 8.1 I'd ask if you had any triggers (including foreign key constraints), whose execution time EXPLAIN ANALYZE doesn't show in earlier versions. 
For example:

8.1.2:

test=> EXPLAIN ANALYZE DELETE FROM foo WHERE id = 1;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=6) (actual time=0.136..0.154 rows=1 loops=1)
   Index Cond: (id = 1)
 Trigger for constraint bar_fooid_fkey: time=1538.054 calls=1
 Total runtime: 1539.732 ms
(4 rows)

8.0.6:

test=> EXPLAIN ANALYZE DELETE FROM foo WHERE id = 1;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.92 rows=1 width=6) (actual time=0.124..0.147 rows=1 loops=1)
   Index Cond: (id = 1)
 Total runtime: 1746.173 ms
(3 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Inconsistant query plan
On Tue, Jan 24, 2006 at 04:15:57PM -0700, Daniel Gish wrote: We are running Postgresql 8.1, and getting dramatically inconsistent results after running VACUUM ANALYZE. Sometimes after analyzing the database, the query planner chooses a very efficient plan (15 rows, 4.744 ms), and sometimes a terrible one (24 rows, 3536.995 ms). Here's the abbreviated query: SELECT * FROM t1 INNER JOIN (t2 INNER JOIN (t3 INNER JOIN t4 ON t3.gid = t4.gid) ON t3.gid = t2.gid) ON t2.eid = t1.eid WHERE ... How abbreviated is that example? Are you actually joining more tables than that? In another recent thread varying plans were attributed to exceeding geqo_threshold: http://archives.postgresql.org/pgsql-performance/2006-01/msg00132.php Does your situation look similar? -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Use of Stored Procedures and
On Tue, Jan 17, 2006 at 09:04:53AM +, Marcos wrote: I already read the documentation for to use the SPI_PREPARE and SPI_EXEC... but sincerely I don't understand how I will use this resource in my statements. What statements? What problem are you trying to solve? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert without oids
On Fri, Jan 13, 2006 at 03:10:11PM -0500, Michael Stone wrote: Are there plans on updating the insert API for the post-OID world? Are you looking for this TODO item? * Allow INSERT/UPDATE ... RETURNING new.col or old.col This is useful for returning the auto-generated key for an INSERT. One complication is how to handle rules that run as part of the insert. http://www.postgresql.org/docs/faqs.TODO.html -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 12:56:55AM -0700, Michael Fuhr wrote:

WHERE ... AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
          AND doy <= EXTRACT(doy FROM now())

To work on 1 Jan this should be more like:

WHERE ... AND (doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
           OR  doy <= EXTRACT(doy FROM now()))

In any case the point is to add conditions to the WHERE clause that will use an index on the table for which you're currently getting a sequential scan. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
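The year-boundary problem that motivates the OR form can be illustrated outside SQL. A small Python sketch, with datetime standing in for EXTRACT(doy FROM ...); the helper names are mine, not from the thread:

```python
from datetime import datetime, timedelta

# Why a plain AND range on day-of-year breaks across a year boundary.
def doy(ts):
    return ts.timetuple().tm_yday

def matches_and(ts, now):
    lo, hi = doy(now - timedelta(hours=24)), doy(now)
    return lo <= doy(ts) <= hi             # fails when lo > hi (1 Jan)

def matches_or(ts, now):
    lo, hi = doy(now - timedelta(hours=24)), doy(now)
    return doy(ts) >= lo or doy(ts) <= hi  # handles the wraparound

now = datetime(2006, 1, 1, 12, 0)    # 1 Jan 2006: doy(now) = 1, lo = 365
recent = now - timedelta(hours=18)   # 31 Dec 2005 18:00: doy = 365
print(matches_and(recent, now))      # False -- the row is wrongly excluded
print(matches_or(recent, now))       # True
```

On any ordinary day lo <= hi and both forms agree; only when the 24-hour window straddles 31 Dec/1 Jan does the AND version return an empty range.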
Re: [PERFORM] Index isn't used during a join.
On Wed, Jan 11, 2006 at 08:02:37AM -0700, Robert Creager wrote: The query is wrong as stated, as it won't work when the interval crosses a year boundary, but it's a stop gap for now. Yeah, I realized that shortly after I posted the original and posted a correction. http://archives.postgresql.org/pgsql-performance/2006-01/msg00104.php -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Showing Column Statistics Number
On Wed, Jan 11, 2006 at 04:05:18PM -0600, Dave Dutcher wrote: I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with Alter table alter column set statistics n, is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. pg_attribute.attstattarget http://www.postgresql.org/docs/8.1/interactive/catalog-pg-attribute.html -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] indexes on primary and foreign keys
On Thu, Jan 12, 2006 at 10:26:58AM +0900, Michael Glaesemann wrote: On Jan 12, 2006, at 9:36 , K C Lau wrote: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Set[1] your log_min_messages to WARNING or higher[2]. Or client_min_messages, depending on where you don't want to see the notice. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Index isn't used during a join.
On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote: The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Have you tried adding restrictions on doy in the WHERE clause? Something like this, I think:

WHERE ... AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval)
          AND doy <= EXTRACT(doy FROM now())

Something else occurred to me: do you (or will you) have more than one year of data? If so then matching on doy could be problematic unless you also check for the year, or unless you want to match more than one year. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index isn't used during a join.
On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote: I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. The first query you posted returns 285 rows and the second returns over one million; index usage aside, that difference surely accounts for a performance penalty. And as is often pointed out, index scans aren't always faster than sequential scans: the more of a table a query has to fetch, the more likely a sequential scan will be faster. Have the tables been vacuumed and analyzed? The planner's estimates for windspeed are pretty far off, which could be affecting the query plan:

  ->  Sort  (cost=12997.68..13157.98 rows=64120 width=28) (actual time=2286.155..2286.450 rows=284 loops=1)
        Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
        ->  Seq Scan on windspeed  (cost=0.00..7878.18 rows=64120 width=28) (actual time=2279.275..2285.271 rows=284 loops=1)
              Filter: (unmunge_time(time_group) > (now() - '24:00:00'::interval))

That's a small amount of the total query time, however, so although an index scan might help it probably won't provide the big gain you're looking for. Have you done any tests with enable_seqscan disabled? That'll show whether an index or bitmap scan would be faster. And have you verified that the join condition is correct? Should the query be returning over a million rows? -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance problems with 8.1.1 compared to 7.4.7
On Tue, Dec 27, 2005 at 05:09:28PM +0100, Albert Cervera Areny wrote: However, now we have a query that is much slower with 8.1 compared to 7.4. The query takes 7 minutes (every time we try) with 8.1, keeping CPU usage at 93~97%, while it takes 25 seconds in 7.4 the first time, going down to 4 seconds on the following tries. We're not experts at all but we can't see anything strange in the differences between the EXPLAIN outputs. Below I paste the query and the EXPLAIN output. Could you post the EXPLAIN ANALYZE output of the query on both systems? That'll show how accurate the planner's estimates are. Have you run ANALYZE (or VACUUM ANALYZE) on the tables in both versions? The row count estimates in the 8.1.1 query differ from those in the 7.4.7 query. Are the two versions using the same data set? Are your configuration settings the same in both versions? You mentioned increasing work_mem, but what about others like effective_cache_size, random_page_cost, and shared_buffers? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Wrong index used when ORDER BY LIMIT 1
On Wed, Dec 21, 2005 at 07:03:00PM +0100, Szűcs Gábor wrote: Version: 7.4.6 [...] Query is: SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. I expected the planner to choose the index on muvelet, then sort by idopont. Instead, it took the other index. I think the planner is guessing that since you're ordering on idopont, scanning the idopont index will find the first matching row faster than using the muvelet index would. In many cases that's a good bet, but in this case the guess is wrong and you end up with a suboptimal plan. I just ran some tests with 8.1.1 and it chose the better plan for a query similar to what you're doing. One of the developers could probably explain why; maybe it's because of the changes that allow better use of multicolumn indexes. Try 8.1.1 if you can and see if you get better results. -- workaround 2: quite ugly but seems to work (at least for this -- one test case):

# explain analyze
  select idopont from
    (select idopont from muvelet_vonalkod
      where muvelet=6859 order by idopont) foo
  order by idopont limit 1;

Another workaround is to use OFFSET 0 in the subquery. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Speed of different procedural language
On Wed, Dec 21, 2005 at 12:06:47PM +0100, Steinar H. Gunderson wrote: On Sun, Dec 18, 2005 at 01:10:21AM -, Ben Trewern wrote: I know I should be writing these in C but that's a bit beyond me. I was going to try PL/Python or PL/Perl or even PL/Ruby. Has anyone any idea which language is fastest, or is the data access going to swamp the overhead of small functions? I'm not sure if it's what you ask for, but there _is_ a clear difference between the procedural languages -- I've had a 10x speed increase from rewriting PL/PgSQL stuff into PL/Perl, for instance. The difference is clear only in specific cases; just because you saw a 10x increase in some cases doesn't mean you can expect that kind of increase, or indeed any increase, in others. I've seen PL/pgSQL beat all other PL/* challengers handily many times, especially when the function does a lot of querying and looping through large result sets. I tend to use PL/pgSQL except in cases where PL/pgSQL can't do what I want or the job would be much easier in another language (e.g., string manipulation, for which I'd use PL/Perl or PL/Ruby). Even then I might use the other language only to write small functions that a PL/pgSQL function could call. As Merlin suggested, maybe Ben could tell us what he wants to do that he thinks should be written in C or a language other than PL/pgSQL. Without knowing what problem is to be solved it's near impossible to recommend an appropriate tool. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speed of different procedural language
On Wed, Dec 21, 2005 at 10:38:10PM +0100, Steinar H. Gunderson wrote: On Wed, Dec 21, 2005 at 02:24:42PM -0700, Michael Fuhr wrote: The difference is clear only in specific cases; just because you saw a 10x increase in some cases doesn't mean you can expect that kind of increase, or indeed any increase, in others. I've seen PL/pgSQL beat all other PL/* challengers handily many times, especially when the function does a lot of querying and looping through large result sets. That's funny, my biggest problems with PL/PgSQL have been (among others) exactly with large result sets... Out of curiosity, do you have a simple test case? I'd be interested in seeing what you're doing in PL/pgSQL that's contradicting what I'm seeing. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Speed of different procedural language
On Thu, Dec 22, 2005 at 02:08:23AM +0100, Steinar H. Gunderson wrote: On Wed, Dec 21, 2005 at 03:10:28PM -0700, Michael Fuhr wrote: That's funny, my biggest problems with PL/PgSQL have been (among others) exactly with large result sets... Out of curiosity, do you have a simple test case? I'd be interested in seeing what you're doing in PL/pgSQL that's contradicting what I'm seeing. I'm not sure if I have the code anymore (it was under 7.4 or 8.0), but it was largely scanning through ~2 million rows once, noting differences from the previous rows as it went. In that case, I didn't benchmark against any of the other PL/* languages, but it was pretty clear that even on a pretty speedy Opteron, it was CPU bound, which it really shouldn't have been. Try looping through two million rows with PL/Perl or PL/Tcl and you'll probably see significantly worse performance than with PL/pgSQL -- so much worse that I'd be surprised to see those languages make up the difference with whatever processing they'd be doing for each row unless it was something they're particularly good at and PL/pgSQL is particularly bad at. In 8.1 PL/Perl has a couple of ways to fetch query results: spi_exec_query to fetch all the rows at once into a single data structure, and spi_query/spi_fetchrow to fetch the rows one at a time. In my tests with one million rows, spi_exec_query was around 8 times slower than a loop in PL/pgSQL, not to mention requiring a lot of memory. spi_query/spi_fetchrow was about 25 times slower but didn't require the amount of memory that spi_exec_query did. A PL/Tcl function that used spi_exec was about 10 times slower than PL/pgSQL, or only slightly slower than PL/Perl and spi_exec_query. If you didn't benchmark the two million row query, do you have an example that you did benchmark? I don't doubt that PL/Perl and other languages can do some things faster than PL/pgSQL, but looping through large result sets doesn't seem to be one of them. 
-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] make bulk deletes faster?
On Sat, Dec 17, 2005 at 09:10:40PM -0800, James Klo wrote: I'd like some suggestions on how to get the deletes to happen faster: while deleting individual rows appears to be extremely fast, when I go to delete lots of rows the operation takes an extremely long time to complete (5000 rows takes about 3 minutes, 100 rows takes close to 4 hours or more depending upon server load; wall time, btw). Those times do seem excessive -- do any other tables have foreign key references to the table you're deleting from? If so, do those tables have indexes on the referring columns? Does this table or any referring table have triggers? Also, are you regularly vacuuming and analyzing your tables? Have you examined pg_locks to see if an unacquired lock might be slowing things down? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
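The first culprit asked about above, a referencing table with no index on its foreign-key column, can be modeled in miniature. A toy Python sketch (not PostgreSQL internals; the table and column names are invented): each deleted parent row forces either a full scan of the referencing rows or a single indexed lookup.

```python
from collections import Counter

# Toy model of why bulk deletes crawl when a referencing table lacks an
# index on its foreign-key column.
referencing = [{"id": i, "parent_id": i % 1000} for i in range(50_000)]

def has_children_seqscan(parent_id):
    # What each FK check degenerates to without an index: O(rows) per delete.
    return any(r["parent_id"] == parent_id for r in referencing)

# Build the "index" once: parent_id -> number of referencing rows.
index = Counter(r["parent_id"] for r in referencing)

def has_children_indexed(parent_id):
    # With an index: a single O(1) lookup per delete.
    return index[parent_id] > 0

# Parents 0..999 have children; 1000 and above do not.
print(has_children_seqscan(5), has_children_indexed(5))        # True True
print(has_children_seqscan(1005), has_children_indexed(1005))  # False False
```

Both functions give the same answer, but deleting n parent rows costs n full scans in the first case versus n constant-time lookups in the second, which is roughly the difference between minutes and hours in the numbers quoted above.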
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote: Given the rather lackadaisical way in which the stats collector makes the data available, it seems like the backends are being much too enthusiastic about posting their stats_command_string status immediately. Might be worth thinking about how to cut back the overhead by suppressing some of these messages. Would a GUC setting akin to log_min_duration_statement be feasible? Does the backend support, or could it be easily modified to support, a mechanism that would post the command string after a configurable amount of time had expired, and then continue processing the query? That way admins could avoid the overhead of posting messages for short-lived queries that nobody's likely to see in pg_stat_activity anyway. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 01:33:27PM -0500, Merlin Moncure wrote: The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. That approximately confirms my results, except that the penalty may even be a little bit higher in the worst-case scenario. Row level stats hit the hardest if you are doing 1 row at a time operations over a persistent connection. That's basically how the application I tested works: it receives data from a stream and performs whatever insert/update/delete statements are necessary to update the database for each chunk of data. Repeat a few thousand times. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 10:23:42AM -0300, Alvaro Herrera wrote: Michael Fuhr wrote: The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. This series of i/u/d operations ran with no sleep in between, right? I wouldn't expect a normal OLTP operation to be like this. (If it is you have a serious shortage of hardware ...) There's no sleeping but there is some client-side processing between groups of i/u/d operations. As I mentioned in another message, the application reads a chunk of data from a stream, does a few i/u/d operations to update the database, and repeats several thousand times. The hardware is old but it's adequate for this application. What kind of overhead would you expect? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How much expensive are row level statistics?
On Mon, Dec 12, 2005 at 06:01:01PM -0500, Tom Lane wrote: IIRC, the only significant cost from enabling stats is the cost of transmitting the counts to the stats collector, which is a cost basically paid once at each transaction commit. So short transactions will definitely have more overhead than longer ones. Even for a really simple transaction, though, 30% seems high --- the stats code is designed deliberately to minimize the penalty. Now there goes Tom with his skeptical eye again, and here comes me saying oops again. Further tests show that for this application the killer is stats_command_string, not stats_block_level or stats_row_level. Here are timings for the same set of operations (thousands of insert, update, and delete statements in one transaction) run under various settings:

stats_command_string = off
stats_block_level = off
stats_row_level = off
time: 2:09.46

stats_command_string = off
stats_block_level = on
stats_row_level = off
time: 2:12.28

stats_command_string = off
stats_block_level = on
stats_row_level = on
time: 2:14.38

stats_command_string = on
stats_block_level = off
stats_row_level = off
time: 2:50.58

stats_command_string = on
stats_block_level = on
stats_row_level = on
time: 2:53.76

[Wanders off, swearing that he ran these tests before and saw higher penalties for block- and row-level statistics.] -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
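For reference, the relative overheads implied by those timings can be recomputed directly (a quick sketch; the percentages follow from the reported wall times):

```python
# Recompute the relative overheads from the m:ss.ss wall-clock times
# reported in the test runs above.
def seconds(t):
    m, s = t.split(":")
    return int(m) * 60 + float(s)

base      = seconds("2:09.46")  # all three settings off
block     = seconds("2:12.28")  # block-level stats only
block_row = seconds("2:14.38")  # block- and row-level stats
cmd       = seconds("2:50.58")  # stats_command_string only
all_on    = seconds("2:53.76")  # everything on

def pct(t):
    return round(100 * (t - base) / base, 1)

# stats_command_string accounts for nearly all of the slowdown:
print(pct(block), pct(block_row), pct(cmd), pct(all_on))
# 2.2 3.8 31.8 34.2
```

So block- and row-level statistics cost a few percent here, while stats_command_string alone adds roughly 32%, consistent with the "killer" conclusion above.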
Re: [PERFORM] How much expensive are row level statistics?
On Sun, Dec 11, 2005 at 11:53:36AM +, Carlos Benkendorf wrote: I would like to use autovacuum but is not too much expensive collecting row level statistics? The cost depends on your usage patterns. I did tests with one of my applications and saw no significant performance difference for simple selects, but a series of insert/update/delete operations ran about 30% slower when block- and row-level statistics were enabled versus when the statistics collector was disabled. -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Executing a shell command from a PG function
On Sat, Dec 10, 2005 at 04:55:56PM +0100, Yves Vindevogel wrote: Is it possible to run a shell script, passing values of fields to it, in a Postgres function ? Not directly from SQL or PL/pgSQL functions, but you can execute shell commands with the untrusted versions of PL/Perl, PL/Tcl, PL/Python, etc. There's even a PL/sh: http://pgfoundry.org/projects/plsh/ -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
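As a rough sketch of the untrusted-language approach mentioned above (assuming a superuser has installed the language with CREATE LANGUAGE plperlu; the function name and shell command here are illustrative only, not from the thread):

```sql
-- Untrusted PL/Perl (plperlu) may run shell commands; trusted plperl may not.
-- The argument is interpolated into the command line, so real code would
-- need careful quoting/validation of the value.
CREATE FUNCTION log_value(v text) RETURNS text AS $$
    my ($v) = @_;
    my $out = `echo "got: $v"`;   # any shell command could go here
    return $out;
$$ LANGUAGE plperlu;

SELECT log_value('hello');
```

Note that the command runs as the operating-system account the PostgreSQL server runs under, which is why these languages are restricted to superusers.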
Re: [PERFORM] Stored Procedure
On Tue, Nov 22, 2005 at 07:29:37PM +0100, Yves Vindevogel wrote: Is there another way in PG to return a recordset from a function than to declare a type first ? In 8.1 some languages support OUT and INOUT parameters.

CREATE FUNCTION foo(IN x integer, INOUT y integer, OUT z integer) AS $$
BEGIN
    y := y * 10;
    z := x * 10;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT * FROM foo(1, 2);
 y  | z
----+----
 20 | 10
(1 row)

CREATE FUNCTION fooset(IN x integer, INOUT y integer, OUT z integer) RETURNS SETOF record AS $$
BEGIN
    y := y * 10;
    z := x * 10;
    RETURN NEXT;
    y := y + 1;
    z := z + 1;
    RETURN NEXT;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT * FROM fooset(1, 2);
 y  | z
----+----
 20 | 10
 21 | 11
(2 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Stored Procedure
On Tue, Nov 22, 2005 at 11:20:09PM +0100, Yves Vindevogel wrote: 8.1, hmm, that's brand new. Yes, but give it a try, at least in a test environment. The more people use it, the more we'll find out if it has any problems. But, still, it's quite some coding for a complete recordset, not ? How so? The examples I posted are almost identical to how you'd return a composite type created with CREATE TYPE or SETOF that type, except that you declare the return columns as INOUT or OUT parameters and you no longer have to create a separate type. If you're referring to how I wrote two sets of assignments and RETURN NEXT statements, you don't have to do it that way: you can use a loop, just as you would with any other set-returning function. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Perl DBD and an alarming problem
On Thu, Nov 17, 2005 at 01:04:21PM -0800, Craig A. James wrote: When I set statement_timeout in the config file, it just didn't do anything - it never timed out (PG 8.0.3). I finally found in the documentation that I can do set statement_timeout = xxx from PerlDBI on a per-client basis, and that works. You probably shouldn't set statement_timeout on a global basis anyway, but did you reload the server after you made the change? Setting statement_timeout in postgresql.conf and then reloading the server works here in 8.0.4. -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Perl DBD and an alarming problem
[Please copy the mailing list on replies.] On Thu, Nov 17, 2005 at 05:38:13PM -0800, Craig A. James wrote: You probably shouldn't set statement_timeout on a global basis anyway The server is a one-trick pony, so setting a global timeout value is actually appropriate. Beware that statement_timeout also applies to maintenance commands like VACUUM; it might be more appropriate to set per-user timeouts with ALTER USER. If you do set a global timeout then you might want to set a per-user timeout of 0 for database superusers so maintenance activities don't get timed out. ... but did you reload the server after you made the change? Setting statement_timeout in postgresql.conf and then reloading the server works here in 8.0.4. Yes. By reload I assume you mean restarting it from scratch. Either a restart or a pg_ctl reload, which sends a SIGHUP to the server. You can effect some changes by sending a signal to a running server without having to restart it entirely. In this case, I use /etc/init.d/postgresql restart. It definitely had no effect at all. I tried values clear down to 1 millisecond, but the server never timed out for any query. Did you use SHOW statement_timeout to see if the value was set to what you wanted? Are you sure you edited the right file? As a database superuser execute SHOW config_file to see what file the server is using. What exactly did the line look like after you changed it? -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
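A sketch of the per-user arrangement suggested above (role names are hypothetical; in 8.0 the value is in milliseconds):

```sql
-- The global default can stay in postgresql.conf, e.g.
--     statement_timeout = 30000
-- and is picked up after: pg_ctl reload

-- Per-role overrides take effect at the role's next login:
ALTER USER app_user SET statement_timeout = 30000;
ALTER USER postgres SET statement_timeout = 0;  -- 0 disables it, so VACUUM etc. never time out

-- In any session, check what value is actually in effect:
SHOW statement_timeout;
```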
Re: [PERFORM] Perl DBD and an alarming problem
On Wed, Nov 16, 2005 at 12:59:21PM -0800, Craig A. James wrote:

eval {
    local $SIG{ALRM} = sub { die "Timeout" };
    $time = gettimeofday;
    alarm 20;
    $sth = $dbh->prepare("a query that may take a long time...");
    $sth->execute();
    alarm 0;
};
if ($@ && $@ =~ /Timeout/) {
    my $elapsed = gettimeofday - $time;
    print "Timed out after $elapsed seconds";
}

Now the mystery: It works, but it hardly matters what time I use for the alarm call; the actual alarm event always happens at 26 seconds. I can set alarm 1 or alarm 20, and it almost always hits right at 26 seconds. High-level languages' signal handlers don't always work well with low-level libraries. I haven't dug into the Perl source code but I'd guess that since only certain things are safe to do in a signal handler, Perl's handler simply sets some kind of state that the interpreter will examine later during normal execution. If you're using only Perl facilities then that probably happens fairly timely, but if you're stuck in a low-level library (e.g., libpq) then you might have to wait until that library returns control to Perl before Perl recognizes that a signal occurred. As an example, if I run code such as yours with alarm(2) and a query that takes 5 seconds, I see the following in a process trace (from ktrace/kdump on FreeBSD):

55395 perl 0.000978 CALL  poll(0xbfbfe1b8,0x1,0x)
55395 perl 1.996629 RET   poll -1 errno 4 Interrupted system call
55395 perl 0.13     PSIG  SIGALRM caught handler=0x281be22c mask=0x0 code=0x0
55395 perl 0.50     CALL  sigprocmask(0x1,0,0x805411c)
55395 perl 0.05     RET   sigprocmask 0
55395 perl 0.20     CALL  sigreturn(0xbfbfde60)
55395 perl 0.07     RET   sigreturn JUSTRETURN
55395 perl 0.19     CALL  poll(0xbfbfe1b8,0x1,0x)
55395 perl 3.004065 RET   poll 1
55395 perl 0.24     CALL  recvfrom(0x3,0x81c6000,0x4000,0,0,0)
55395 perl 0.16     GIO   fd 3 read 60 bytes

The poll() call is interrupted by SIGALRM after 2 seconds but then it starts again and doesn't return until the query completes after the remaining 3 seconds. 
Only sometime later does Perl invoke the ALRM handler I installed, presumably because it can't do so until the low-level code returns control to Perl. Is there a reason you're using alarm() in the client instead of setting statement_timeout on the server? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Searching union views not using indices
On Fri, Nov 04, 2005 at 12:38:30PM +0100, Michal Taborsky wrote: SELECT object_id FROM commonview WHERE link_id=1234567 The result is sequential scan on all tables, append, sort and then filter scan on this whole thing. Which of course is slow as hell. We use version 8.0.2. I couldn't duplicate this in 8.0.4; I don't know if anything's changed since 8.0.2 that would affect the query plan. Could you post the EXPLAIN ANALYZE output? It might also be useful to see the output with enable_seqscan disabled. Have the tables been vacuumed and analyzed recently? -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Effects of cascading references in foreign keys
Does an UPDATE of e.g. m_fld1 in t_master cause a 'lookup' in all tables which have a cascading update-rule or is this 'lookup' only triggered if the referenced column in t_master is explicitly updated? My tests suggest that a lookup on the referring key is done only if the referenced key is changed. Here's an example from 8.1beta4; I used this version because EXPLAIN ANALYZE shows triggers and the time spent in them, but I see similar performance characteristics in earlier versions. I've intentionally not put an index on the referring column to make lookups on it slow.

CREATE TABLE foo (id serial PRIMARY KEY, x integer NOT NULL);
CREATE TABLE bar (fooid integer NOT NULL REFERENCES foo ON UPDATE CASCADE);
INSERT INTO foo (x) SELECT * FROM generate_series(1, 10);
INSERT INTO bar (fooid) SELECT * FROM generate_series(1, 10);
ANALYZE foo;
ANALYZE bar;

EXPLAIN ANALYZE UPDATE foo SET x = 1 WHERE id = 10;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=10) (actual time=0.059..0.070 rows=1 loops=1)
   Index Cond: (id = 10)
 Total runtime: 0.633 ms
(3 rows)

EXPLAIN ANALYZE UPDATE foo SET x = 1, id = 20 WHERE id = 10;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..3.01 rows=1 width=6) (actual time=0.082..0.092 rows=1 loops=1)
   Index Cond: (id = 10)
 Trigger for constraint bar_fooid_fkey: time=232.612 calls=1
 Total runtime: 233.073 ms
(4 rows)

I'm not sure if this is the right place to look, but I see several places in src/backend/utils/adt/ri_triggers.c with code that looks like this:

    /*
     * No need to do anything if old and new keys are equal
     */
    if (ri_KeysEqual(pk_rel, old_row, new_row, qkey, RI_KEYPAIR_PK_IDX))
    {
        heap_close(fk_rel, RowExclusiveLock);
        return PointerGetDatum(NULL);
    }

After removing some detail tables which are no longer needed we see improved performance, so at the moment it _looks_ like each update in t_master triggers a 'lookup' in each referencing table even if the referenced column (m_id) is not changed. 
Do you have statistics enabled? You might be able to infer what happens by looking at pg_stat_user_tables or pg_statio_user_tables before and after an update, assuming that no concurrent activity is also affecting the statistics. I suppose there's overhead just from having a foreign key constraint, and possibly additional overhead for each constraint. If so then that might explain at least some of the performance improvement. Maybe one of the developers will comment. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Effects of cascading references in foreign keys
On Sat, Oct 29, 2005 at 09:49:47AM -0500, Bruno Wolff III wrote: On Sat, Oct 29, 2005 at 08:24:32 -0600, Michael Fuhr [EMAIL PROTECTED] wrote: My tests suggest that a lookup on the referring key is done only if the referenced key is changed. Here's an example from 8.1beta4; I used this version because EXPLAIN ANALYZE shows triggers and the time spent in them, but I see similar performance characteristics in earlier versions. I've intentionally not put an index on the referring column to make lookups on it slow. It looks like this feature was added last May, so I think it only applies to 8.1. Earlier versions appear to have at least some kind of optimization. Here's a test in 7.3.11 using the same tables I used in 8.1beta4, although on a slower box.

test=> UPDATE foo SET x = 1 WHERE id = 10;
UPDATE 1
Time: 32.18 ms
test=> UPDATE foo SET x = 1, id = 20 WHERE id = 10;
UPDATE 1
Time: 4144.95 ms
test=> DROP TABLE bar;
DROP TABLE
Time: 240.87 ms
test=> UPDATE foo SET x = 1, id = 10 WHERE id = 20;
UPDATE 1
Time: 63.52 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] impact of stats_command_string
On Thu, Oct 20, 2005 at 01:33:07PM -0700, [EMAIL PROTECTED] wrote: If I turn on stats_command_string, how much impact would it have on PostgreSQL server's performance during a period of massive data INSERTs? Do you really need to be doing massive data INSERTs? Can you use COPY, which is much more efficient for bulk loads? http://www.postgresql.org/docs/8.0/interactive/populate.html -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What gets cached?
On Fri, Oct 21, 2005 at 07:34:30AM -0500, Martin Nickel wrote: I don't know of any way to view what is actually cached at any point in time In 8.1 (currently in beta) you can use contrib/pg_buffercache. Code for older versions is available on PgFoundry: http://pgfoundry.org/projects/pgbuffercache/ Note that pg_buffercache shows only pages in PostgreSQL's buffer cache; it doesn't show your operating system's cache. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
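Assuming the contrib module is installed, a query along these lines (joining on relfilenode, as the pg_buffercache documentation shows) summarizes which relations occupy pages of PostgreSQL's buffer cache:

```sql
-- Top 10 relations by number of shared buffers they occupy.
-- Only shows PostgreSQL's own cache, not the OS page cache.
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```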
Re: [PERFORM] Inefficient escape codes.
On Tue, Oct 18, 2005 at 06:07:12PM +, Rodrigo Madera wrote: 1) Is there any way for me to send the binary field directly without needing escape codes? In 7.4 and later the client/server protocol supports binary data transfer. If you're programming with libpq you can use PQexecParams() to send and/or retrieve values in binary instead of text. http://www.postgresql.org/docs/8.0/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN APIs built on top of libpq or that implement the protocol themselves might provide hooks to this capability; check your documentation. What language and API are you using? See also COPY BINARY: http://www.postgresql.org/docs/8.0/interactive/sql-copy.html 2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this? Binary transfer sends data in binary, not by automatically converting to and from text. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Inefficient escape codes.
[Please copy the mailing list on replies so others can participate in and learn from the discussion.] On Tue, Oct 18, 2005 at 07:09:08PM +, Rodrigo Madera wrote: What language and API are you using? I'm using libpqxx. A nice STL-style library for C++ (I am 101% C++). I've only dabbled with libpqxx; I don't know if or how you can make it send data in binary instead of text. See the documentation or ask in a mailing list like libpqxx-general or pgsql-interfaces. Binary transfer sends data in binary, not by automatically converting to and from text. Uh, I'm sorry I didn't get that... If I send: insert into foo values('\\001\\002') will libpq send 0x01, 0x02 or 001002?? If you do it that way libpq will send the string as text with escape sequences; you can use a sniffer like tcpdump or ethereal to see this for yourself. To send the data in binary you'd call PQexecParams() with a query like INSERT INTO foo VALUES ($1). The $1 is a placeholder; the other arguments to PQexecParams() provide the data itself, the data type and length, and specify whether the data is in text format or binary. See the libpq documentation for details. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] functions and execution plan caching
On Thu, Oct 06, 2005 at 08:17:54AM -0500, Kelly Burkhart wrote: Given: - Processes which are connected to the database for long periods of time (transactions are always short). - These processes will use some functions to query data. - Lots of data is being inserted into tables that these functions query. - Vacuums are done frequently. Am I at risk of degrading performance after some time due to stale execution plans? Yes, because plans are chosen based on the statistics that were current when the function was first called. For example, if a sequential scan made sense when you first called the function, then subsequent calls will also use a sequential scan. You can see this for yourself with a simple test: create a table, populate it with a handful of records, and call a function that issues a query that can (but won't necessarily) use an index. Then add a lot of records to the table and call the function again. You'll probably notice that the function runs slower than the same query run from outside the function, and that the function runs fast if you recreate it or call it in a new session. If you set debug_print_plan to on and client_min_messages to debug1, then you'll see the plan that the function chose (but only on the first call to the function). If you have statistics enabled, then you can query pg_stat_user_tables and pg_stat_user_indexes to see whether subsequent calls use sequential or index scans (this should be done when nobody else is querying the table so the statistics represent only what you did). You can avoid cached plans by using EXECUTE. You'll have to run tests to see whether the potential gain is worth the overhead. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
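A minimal sketch of the two styles (the table and function names are hypothetical; the FOR ... IN EXECUTE form is used because it works in 8.0-era PL/pgSQL):

```sql
-- Static query: the plan is chosen on the function's first call in a
-- session and cached, even if the table grows a lot afterwards.
CREATE FUNCTION count_big_static() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n FROM events WHERE id > 1000;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- Dynamic query: EXECUTE plans the query afresh on every call, so it
-- always reflects current statistics, at some per-call planning cost.
CREATE FUNCTION count_big_dynamic() RETURNS bigint AS $$
DECLARE
    r record;
    n bigint;
BEGIN
    FOR r IN EXECUTE 'SELECT count(*) AS n FROM events WHERE id > 1000' LOOP
        n := r.n;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```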
Re: [PERFORM] Queries 15 times slower on 8.1 beta 2 than on 8.0
On Thu, Sep 22, 2005 at 03:19:05PM -0700, Josh Berkus wrote:
 -> Seq Scan on price p  (cost=0.00..11317.75 rows=581475 width=4) (actual time=0.004..1143.720 rows=581475 loops=1)
Well, this is your pain point. Can we see the index scan plan on 8.1? Given that it's *expecting* only one row, I can't understand why it's using a seq scan ... I've created a simplified, self-contained test case for this:

CREATE TABLE price (priceid integer PRIMARY KEY);
CREATE TABLE supplier (supplierid integer PRIMARY KEY);
CREATE TABLE content (
    contentid  integer PRIMARY KEY,
    supplierid integer NOT NULL REFERENCES supplier,
    priceid    integer NOT NULL REFERENCES price
);
INSERT INTO price (priceid) SELECT * FROM generate_series(1, 5);
INSERT INTO supplier (supplierid) SELECT * FROM generate_series(1, 1);
INSERT INTO content (contentid, supplierid, priceid) VALUES (1, 1, 5);
ANALYZE price;
ANALYZE supplier;
ANALYZE content;

EXPLAIN ANALYZE
SELECT 0
FROM content c
LEFT OUTER JOIN supplier s ON c.supplierid = s.supplierid
LEFT OUTER JOIN price p ON c.priceid = p.priceid;

Here's the EXPLAIN ANALYZE from 8.0.3:

 Nested Loop Left Join  (cost=0.00..7.06 rows=1 width=0) (actual time=0.180..0.232 rows=1 loops=1)
   -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.105..0.133 rows=1 loops=1)
         -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.021..0.029 rows=1 loops=1)
         -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.052..0.059 rows=1 loops=1)
               Index Cond: ("outer".supplierid = s.supplierid)
   -> Index Scan using price_pkey on price p  (cost=0.00..3.01 rows=1 width=4) (actual time=0.046..0.055 rows=1 loops=1)
         Index Cond: ("outer".priceid = p.priceid)
 Total runtime: 0.582 ms

Here it is from 8.1beta2:

 Merge Right Join  (cost=4.05..1054.06 rows=1 width=0) (actual time=676.863..676.895 rows=1 loops=1)
   Merge Cond: ("outer".priceid = "inner".priceid)
   -> Index Scan using price_pkey on price p  (cost=0.00..925.00 rows=5 width=4) (actual time=0.035..383.345 rows=5 loops=1)
   -> Sort  (cost=4.05..4.05 rows=1 width=4) (actual time=0.152..0.159 rows=1 loops=1)
         Sort Key: c.priceid
         -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.082..0.111 rows=1 loops=1)
               -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.016..0.024 rows=1 loops=1)
               -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.039..0.047 rows=1 loops=1)
                     Index Cond: ("outer".supplierid = s.supplierid)
 Total runtime: 677.563 ms

If we change content's priceid then we get the same plan but faster results:

UPDATE content SET priceid = 1;

 Merge Right Join  (cost=4.05..1054.06 rows=1 width=0) (actual time=0.268..0.303 rows=1 loops=1)
   Merge Cond: ("outer".priceid = "inner".priceid)
   -> Index Scan using price_pkey on price p  (cost=0.00..925.00 rows=5 width=4) (actual time=0.049..0.061 rows=2 loops=1)
   -> Sort  (cost=4.05..4.05 rows=1 width=4) (actual time=0.187..0.192 rows=1 loops=1)
         Sort Key: c.priceid
         -> Nested Loop Left Join  (cost=0.00..4.04 rows=1 width=4) (actual time=0.099..0.128 rows=1 loops=1)
               -> Seq Scan on content c  (cost=0.00..1.01 rows=1 width=8) (actual time=0.025..0.033 rows=1 loops=1)
               -> Index Scan using supplier_pkey on supplier s  (cost=0.00..3.01 rows=1 width=4) (actual time=0.046..0.053 rows=1 loops=1)
                     Index Cond: ("outer".supplierid = s.supplierid)
 Total runtime: 0.703 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Queries 15 times slower on 8.1 beta 2 than on 8.0
On Thu, Sep 22, 2005 at 07:07:41PM -0400, Jean-Pierre Pelletier wrote: I don't know if it makes a difference but in my tables, content.supplierid and content.priceid were nullable. That makes no difference in the tests I've done. Tom Lane says he's found the problem; I expect he'll be committing a fix shortly. -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] CHECK vs REFERENCES
On Sat, Sep 10, 2005 at 01:03:03AM -0300, Marc G. Fournier wrote: On Fri, 9 Sep 2005, Michael Fuhr wrote: INSERT INTO test_check SELECT 1 FROM generate_series(1, 10); INSERT 0 10 Time: 3492.344 ms INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10); INSERT 0 10 Time: 23578.853 ms Yowch, I expected CHECK to be better ... but not so significantly ... I figured I'd be saving milliseconds, which, on a busy server, would add up fast ... but not 10k' of milliseconds ... Results will differ depending on the table structure: if you're indexing ten columns and have five triggers then the foreign key check will have less of an overall impact. -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] CHECK vs REFERENCES
On Sat, Sep 10, 2005 at 12:23:19AM -0300, Marc G. Fournier wrote: Which is faster, where the list involved is fixed? My thought is that since it doesn't have to check a separate table, the CHECK itself should be the faster of the two, but I can't find anything that seems to validate that theory ... Why not just benchmark each method as you intend to use them? Here's a simplistic example:

CREATE TABLE test_none (val integer NOT NULL);

CREATE TABLE test_check (val integer NOT NULL CHECK (val IN (1, 2, 3, 4, 5)));

CREATE TABLE test_vals (id integer PRIMARY KEY);
INSERT INTO test_vals SELECT * FROM generate_series(1, 5);

CREATE TABLE test_fk (val integer NOT NULL REFERENCES test_vals);

\timing
INSERT INTO test_none SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3109.089 ms
INSERT INTO test_check SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 3492.344 ms
INSERT INTO test_fk SELECT 1 FROM generate_series(1, 10);
INSERT 0 10
Time: 23578.853 ms

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Advise about how to delete entries
On Fri, Sep 02, 2005 at 01:43:05PM +0200, Arnau wrote: statistic_id | numeric(10,0) | not null default nextval('STATISTICS_OPERATOR_ID_SEQ'::text) Any reason this column is numeric instead of integer or bigint? That table contains about 7,000,000 entries and I have to remove 33,000 entries. I have created an SQL file with all the delete statements, e.g.:

DELETE FROM statistics_sasme WHERE statistic_id = 9832;

then I do \i delete_items.sql. Removing a single entry takes more than 10 seconds. What would you do to speed it up? The referential integrity triggers might be slowing down the delete. Do you have indexes on all foreign key columns that refer to this table? Do all foreign key columns that refer to statistic_id have the same type as statistic_id (numeric)? What's the output of EXPLAIN ANALYZE DELETE ...? Do you vacuum and analyze the tables regularly? What version of PostgreSQL are you using? -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
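For the common case where the slowness comes from the referential integrity check, the fix is an index on every referring column (the referencing table and index names below are hypothetical; only statistics_sasme and statistic_id come from the thread):

```sql
-- Suppose some table references statistics_sasme(statistic_id).
-- Without an index on its referring column, every DELETE on the parent
-- forces a sequential scan of the referencing table to find dependent rows.
CREATE INDEX some_detail_statistic_id_idx
    ON some_detail_table (statistic_id);

-- Then re-check the cost of a single delete:
EXPLAIN ANALYZE DELETE FROM statistics_sasme WHERE statistic_id = 9832;
```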
Re: [PERFORM] Massive performance issues
On Thu, Sep 01, 2005 at 06:42:31PM +0100, Matthew Sackman wrote: address_pc_top_index btree (postcode_top) address_pc_top_middle_bottom_index btree (postcode_top, postcode_middle, postcode_bottom) address_pc_top_middle_index btree (postcode_top, postcode_middle) This doesn't address the query performance problem, but isn't only one of these indexes necessary? The second one, on all three columns, because searches involving only postcode_top or only postcode_top and postcode_middle could use it, making the indexes on only those columns superfluous. Or am I missing something? -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Weird performance drop after VACUUM
On Mon, Aug 29, 2005 at 11:07:17AM -0700, asif ali wrote: The database is on the same system. What I am doing is only VACUUM ANALYZE conversion_table. I did the same thing on a newly created database and got the same result, so after VACUUM ANALYZE performance dropped. Please see this: runtime changes from 7755.115 to 14859.291 ms. As has been pointed out a couple of times, you're getting a different plan after VACUUM ANALYZE because the row count estimates are more accurate. Unfortunately the more accurate estimates result in a query plan that's slower than the plan for the less accurate estimates. PostgreSQL *thinks* the plan will be faster but your results show that it isn't, so you might need to adjust some of the planner's cost constants. I asked some questions that you didn't answer, so I'll ask them again: What's your effective_cache_size setting? What's your work_mem (8.x) or sort_mem (7.x) setting? What's your random_page_cost setting? How much available RAM does the machine have? What version of PostgreSQL are you running? Various tuning guides give advice on how to set the above and other configuration variables. Here's one such guide: http://www.powerpostgresql.com/PerfList/ -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] difference in plan between 8.0 and 8.1?
On Fri, Aug 26, 2005 at 10:45:07AM -0400, Alan Stange wrote:

 -> Bitmap Heap Scan on tbltimeseries  (cost=48714.09..1331000.42 rows=10854026 width=8)
      Recheck Cond: (timeseriesid > 0)
      -> Bitmap Index Scan on idx_timeseris  (cost=0.00..48714.09 rows=10854026 width=0)
           Index Cond: (timeseriesid > 0)

I'm hoping someone can explain the new query plan (as I'm not sure I understand what it is doing). Search for "bitmap" in the 8.1 Release Notes: http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1 You could probably find more detailed discussion in the pgsql-hackers archives. -- Michael Fuhr ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, Aug 26, 2005 at 03:52:24PM -0700, asif ali wrote: I have the same issue. After doing VACUUM ANALYZE performance of the query dropped. Your EXPLAIN output doesn't show the actual query times -- could you post the EXPLAIN ANALYZE output? That'll also show how accurate the planner's row count estimates are.

Before VACUUM ANALYZE:

 Index Scan using conversion_table_pk on keyword_conversion_table c  (cost=0.00..18599.25 rows=4986 width=95)
   Index Cond: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))

After VACUUM ANALYZE:

 Seq Scan on conversion_table c  (cost=0.00..29990.83 rows=1094820 width=66)
   Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))

I don't know why the system is doing a seq scan now. Notice the row count estimates: 4986 in the "before" query and 1094820 in the "after" query. In the latter, the planner thinks it has to fetch so much of the table that a sequential scan would be faster than an index scan. You can see whether that guess is correct by disabling enable_seqscan to force an index scan. It might be useful to see the output of the following:

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT ...;

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT ...;

You might also experiment with planner variables like effective_cache_size and random_page_cost to see how changing them affects the query plan. However, be careful of tuning the system based on one query: make sure adjustments result in reasonable plans for many different queries. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Weird performance drop after VACUUM
On Fri, Aug 26, 2005 at 05:10:49PM -0700, asif ali wrote:

 GroupAggregate  (cost=195623.66..206672.52 rows=20132 width=16) (actual time=8205.283..10139.369 rows=55291 loops=1)
   -> Sort  (cost=195623.66..198360.71 rows=1094820 width=16) (actual time=8205.114..9029.501 rows=863883 loops=1)
         Sort Key: keyword_id
         -> Seq Scan on keyword_conversion_table c  (cost=0.00..29990.83 rows=1094820 width=16) (actual time=0.057..1422.319 rows=863883 loops=1)
               Filter: ((conversion_date >= '2005-06-07'::date) AND (conversion_date <= '2005-08-17'::date))
 Total runtime: 14683.617 ms

What are your effective_cache_size and work_mem (8.x) or sort_mem (7.x) settings? How much RAM does the machine have? If you have enough memory then raising those variables should result in better plans; you might also want to experiment with random_page_cost. Be careful not to set work_mem/sort_mem too high, though. See "Run-time Configuration" in the "Server Run-time Environment" chapter of the documentation for more information about these variables. -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Speedier count(*)
On Wed, Aug 10, 2005 at 05:37:49PM -0600, Dan Harris wrote: Also, I am using select ... group by ... order by .. limit 1 to get the min/max since I have already been bit by the issue of min() max() being slower. PostgreSQL 8.1 will have optimizations for certain MIN and MAX queries. http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php -- Michael Fuhr ---(end of broadcast)--- TIP 6: explain analyze is your friend
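The rewrite being alluded to looks roughly like this (table and column names are hypothetical); on pre-8.1 servers the ORDER BY ... LIMIT 1 form can use an index on the column, and 8.1's optimization plans the plain aggregates the same way:

```sql
-- Pre-8.1 idiom: an index on t(x) satisfies these directly
SELECT x FROM t ORDER BY x LIMIT 1;        -- equivalent of min(x)
SELECT x FROM t ORDER BY x DESC LIMIT 1;   -- equivalent of max(x), NULL handling aside

-- 8.1 and later optimize the aggregate form itself:
SELECT min(x), max(x) FROM t;
```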
Re: [PERFORM] Table locking problems?
On Tue, Aug 09, 2005 at 12:04:11PM -0600, Dan Harris wrote: These changes have definitely improved performance, but I am now finding some trouble with UPDATE or DELETE queries hanging and never releasing their locks. As this happens, other statements queue up behind it. Have you examined pg_locks to see if the UPDATE or DELETE is blocked because of a lock another session holds? Are you using foreign keys? When updating referencing rows, released versions of PostgreSQL acquire a lock on the referenced row that can hurt concurrency or cause deadlock (this will be improved in 8.1). -- Michael Fuhr ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Two queries are better than one?
On Thu, Jul 28, 2005 at 04:04:25PM -0700, Karim Nassar wrote: I ran into a situation today maintaining someone else's code where the sum time running 2 queries seems to be faster than 1. The original code was split into two queries. I thought about joining them, but considering the intelligence of my predecessor, I wanted to test it. The question is, which technique is really faster? Is there some hidden setup cost I don't see with explain analyze? To see which technique will be faster in your application, time the application code. The queries you show are taking fractions of a millisecond; the communications overhead of executing two queries might make that technique significantly slower than just the server execution time that EXPLAIN ANALYZE shows. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to revoke a password
On Fri, Jul 08, 2005 at 05:16:27PM -0700, Bailey, Larry wrote: Thanks but it is still prompting for a password. Let's back up a bit: what problem are you trying to solve? Do you want the user to be able to log in without entering a password? If so then see Client Authentication in the documentation: http://www.postgresql.org/docs/8.0/static/client-authentication.html If you're trying to do something else then please elaborate, as it's not clear what you mean by I want to ALTER that user to exclude the password. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
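For the common case of simply letting a user log in without a password, the change is in pg_hba.conf rather than ALTER USER. A sketch (the user name and address are examples only; see the Client Authentication chapter for the exact field semantics and the security implications of trust):

```
# TYPE  DATABASE  USER   CIDR-ADDRESS    METHOD
local   all       larry                  trust
host    all       larry  127.0.0.1/32    trust
```

Remember to reload the server (e.g. pg_ctl reload) after editing the file, or the change won't take effect.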
Re: [PERFORM] ALTER TABLE tabla ALTER COLUMN columna SET STATISTICS number
On Wed, Jul 06, 2005 at 04:49:21PM -0300, Dario wrote: where is stored the value set by ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS = [1-1000]? pg_attribute.attstattarget Example query: SELECT attrelid::regclass, attname, attstattarget FROM pg_attribute WHERE attstattarget > 0; See the System Catalogs chapter in the documentation for more information. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgresql7.4.5 running slow on plpgsql function
[Please copy the mailing list on replies so others can contribute to and learn from the discussion.] On Wed, Jun 29, 2005 at 12:29:42PM +0800, Chun Yit(Chronos) wrote: Yes, the function will delete records from the temporary table every time on each call. If it leaves a lot of dead tuples, then how can we solve it? If the function deletes all records from the temporary table then you could use TRUNCATE instead of DELETE. Otherwise you could VACUUM the table between calls to the function (you can't run VACUUM inside a function). -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
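A sketch of the TRUNCATE variant in 7.4-era PL/pgSQL (function and table names are hypothetical; TRUNCATE is transaction-safe as of 7.4):

```sql
-- Clear the temp table without leaving dead tuples behind.
-- EXECUTE is used because, on these versions, cached plans referencing
-- a temp table break once the table is dropped and recreated.
CREATE OR REPLACE FUNCTION clear_results() RETURNS void AS '
BEGIN
    EXECUTE ''TRUNCATE search_results'';  -- instead of DELETE FROM search_results
    RETURN;
END;
' LANGUAGE plpgsql;
```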
Re: [PERFORM] Postgresql7.4.5 running slow on plpgsql function
On Thu, Jun 23, 2005 at 05:56:52PM +0800, Chun Yit(Chronos) wrote: Currently we have a function that is used together with a temp table; it calls a search result function, and every time this function is called it goes through some filters before producing a result. Now we have a major problem: the first time the function executes it takes about 13 seconds, the second time about 17 seconds, and each further execution grows by about 4 seconds. May I know what is going on here? Since we use the function with a temp table, every statement related to the temp table uses the EXECUTE command. Could you post the function? Without knowing what the code is doing it's impossible to say what's happening. Is the temporary table growing on each function call? Does the function delete records from the table on each call, leaving a lot of dead tuples? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 01:54:08AM +, Karl O. Pinc wrote: On 06/27/2005 06:33:03 PM, Michael Fuhr wrote: See timeofday(). That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes. Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions. I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction. Does it not work on your system? If not then something's broken -- what OS and version of PostgreSQL are you using? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
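The difference is easy to verify in psql; a sketch (pg_sleep() appeared in 8.2, so on older versions substitute any statement that takes a while):

```sql
BEGIN;
SELECT now(), timeofday();   -- note both values
SELECT pg_sleep(2);          -- or any slow statement on pre-8.2 servers
SELECT now(), timeofday();   -- now() is frozen at transaction start;
                             -- timeofday() has advanced
COMMIT;
```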
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 03:03:06AM +, Karl O. Pinc wrote: For all your work a documentation patch is appended that I think is easier to read and might avoid this problem in the future. Patches should go to the pgsql-patches list -- the people who review and apply patches might not be following this thread. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Limit clause not using index
On Tue, Jun 21, 2005 at 09:46:39PM +0200, Tobias Brox wrote: [John A Meinel - Tue at 10:14:24AM -0500] I believe if you drop the indexes inside a transaction, they will still be there for other queries, and if you rollback instead of commit, you won't lose anything. Has anyone tested this? Observations from tests with 8.0.3: DROP INDEX acquires an AccessExclusiveLock on the table and on the index. This will cause the transaction executing the DROP INDEX to block until no other transaction holds any kind of lock on either, and once the locks are acquired, no other transaction will be able to access the table or the index until the transaction doing the DROP INDEX commits or rolls back. Rolling back leaves the index in place. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
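The pattern under discussion, for reference (index name and query are hypothetical):

```sql
BEGIN;
DROP INDEX foo_x_idx;   -- takes AccessExclusiveLock on table and index;
                        -- all other sessions touching the table now block
EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 42;  -- test the plan without the index
ROLLBACK;               -- the index is back, nothing lost
```

The rollback restores the index, but note that every other session needing the table waited for the duration, so this is only practical on a quiet or test system.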
Re: [PERFORM] Need help to decide Mysql vs Postgres
On Mon, Jun 06, 2005 at 08:25:08PM +0300, Tobias Brox wrote: [Jeffrey Tenny - Mon at 11:51:22AM -0400] There are some pitfalls to pgsql though, especially for existing SQL code using MAX and some other things which can really be blindsided (performance-wise) by pgsql if you don't use the workarounds. Yes, I discovered that - select max(num_attr) does a full table scan even if the figure can be found easily through an index. PostgreSQL 8.1 will be able to use indexes for MIN and MAX. http://archives.postgresql.org/pgsql-committers/2005-04/msg00163.php http://archives.postgresql.org/pgsql-committers/2005-04/msg00168.php -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] timestamp indexing
On Mon, May 30, 2005 at 05:19:51PM +0800, Tobias Brox wrote: We have a production database with transaction-style data, in most of the tables we have a timestamp attribute created telling the creation time of the table row. Naturally, this attribute is always increasing. The message subject is timestamp indexing but you don't mention whether you have an index on the timestamp column. Do you? By now we are hitting the limit where the table data does not fit in caches anymore. We have a report section where there are constantly requests for things like sum up all transactions for the last two weeks, and those requests seem to do a full table scan, even though only the last parts of the table are needed - so by now those reports have started to cause lots of iowait. Could you post an example query and its EXPLAIN ANALYZE output? If the query uses a sequential scan then it might also be useful to see the EXPLAIN ANALYZE output with enable_seqscan turned off. Since caching can cause a query to be significantly faster after being run several times, it might be a good idea to run EXPLAIN ANALYZE three times and post the output of the last run -- that should put the queries under comparison on a somewhat equal footing (i.e., we don't want to be misled about how much faster one query is than another simply because one query happened to use more cached data on a particular run). How many records are in the tables you're querying? Are you regularly vacuuming and analyzing the database or the individual tables? Are any of the tables clustered? If so, on what indexes and how often are you re-clustering them? What version of PostgreSQL are you using? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
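As a sketch of the comparison being asked for (table, column, and index names are invented):

```sql
-- An index on the timestamp column is the usual first step:
CREATE INDEX transactions_created_idx ON transactions (created);
ANALYZE transactions;

-- Compare the chosen plan with and without sequential scans allowed;
-- run each EXPLAIN ANALYZE a few times so caching affects both equally.
EXPLAIN ANALYZE SELECT sum(amount) FROM transactions
WHERE created >= current_date - interval '14 days';

SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT sum(amount) FROM transactions
WHERE created >= current_date - interval '14 days';
SET enable_seqscan TO on;
```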
Re: [PERFORM] index on different types
On Fri, Apr 29, 2005 at 04:35:13AM +0200, Enrico Weigelt wrote: there's often some talk about indices cannot be used if datatypes don't match. PostgreSQL 8.0 is smarter than previous versions in this respect. It'll use an index if possible even when the types don't match. On a larger (and long-since grown) application I tend to use OID for references on new tables while old stuff is using integer. If you're using OIDs as primary keys then you might wish to reconsider. See the caveats in the documentation and in the FAQ: http://www.postgresql.org/docs/8.0/interactive/datatype-oid.html http://www.postgresql.org/docs/faqs.FAQ.html#4.12 Is the planner smart enough to see both as compatible datatype or is manual casting required? You can use EXPLAIN to see what the planner will do, but be aware that the planner won't always use an index even if it could: if it thinks a sequential scan would be faster then it won't use an index. To see if using an index is possible, you could set enable_seqscan to off before executing EXPLAIN. In any case, a foreign key column probably ought to have the same type as the column it references -- is there a reason for making them different? -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
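A sketch of the enable_seqscan test described above (table, column, and values are hypothetical):

```sql
SET enable_seqscan TO off;  -- make an index scan the only attractive option

-- Cross-type comparison: older planners may refuse the index here
-- unless the constant is cast to the column's type.
EXPLAIN SELECT * FROM old_stuff WHERE ref = 12345;       -- bare integer literal
EXPLAIN SELECT * FROM old_stuff WHERE ref = 12345::oid;  -- explicit cast

SET enable_seqscan TO on;
```

If only the cast form shows an index scan, the type mismatch is the problem; matching the column types (or casting in the query) fixes it.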
Re: [PERFORM] Sort and index
On Mon, Apr 18, 2005 at 10:44:43AM -0500, Dave Held wrote: I thought that an index can be used for sorting. I'm a little confused about the following result: create index OperationsName on Operations(cOperationName); explain SELECT * FROM Operations ORDER BY cOperationName; QUERY PLAN ---------------------------------------------------------------- Sort (cost=185.37..189.20 rows=1532 width=498) Sort Key: coperationname -> Seq Scan on operations (cost=0.00..104.32 rows=1532 width=498) (3 rows) Is this supposed to be so? Since you are fetching the entire table, you are touching all the rows. If the query were to fetch the rows in index order, it would be seeking all over the table's tracks. By fetching in sequence order, it has a much better chance of fetching rows in a way that minimizes head seeks. Since disk I/O is generally 10-100x slower than RAM, the in-memory sort can be surprisingly slow and still beat indexed disk access. Of course, this is only true if the table can fit and be sorted entirely in memory (which, with 1500 rows, it probably can). Out of curiosity, what are the results of the following queries? (Queries run twice to make sure time differences aren't due to caching.) SET enable_seqscan TO on; SET enable_indexscan TO off; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; SET enable_seqscan TO off; SET enable_indexscan TO on; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; EXPLAIN ANALYZE SELECT * FROM Operations ORDER BY cOperationName; SELECT version(); With 1500 rows of random data, I consistently see better performance with an index scan (about twice as fast as a sequential scan), and the planner uses an index scan if it has a choice (i.e., when enable_seqscan and enable_indexscan are both on). But my test case and postgresql.conf settings might be different enough from yours to account for different behavior.
-- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] refcurosr vs. setof
On Sun, Apr 17, 2005 at 10:05:29PM +0200, Rüdiger Herrmann wrote: I need to write several PL/pgSQL functions all returning a result set which can be obtained by a single SELECT statement. For now the functions are called by a Java application. Both REFCURSOR and SETOF serve my purpose, but I was wondering if there is a performance difference between the two. The result set can become quite large. Here's an excerpt from the Control Structures section of the PL/pgSQL documentation: The current implementation of RETURN NEXT for PL/pgSQL stores the entire result set before returning from the function, as discussed above. That means that if a PL/pgSQL function produces a very large result set, performance may be poor: data will be written to disk to avoid memory exhaustion, but the function itself will not return until the entire result set has been generated. ... Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable. You might want to test both ways in typical and worst-case scenarios and see how each performs. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
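Two minimal sketches of the alternatives (the table items and both function names are made up):

```sql
-- SETOF: RETURN NEXT materializes the whole result set before returning.
CREATE OR REPLACE FUNCTION get_items() RETURNS SETOF items AS '
DECLARE
    r items%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM items LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

-- REFCURSOR: the caller can FETCH rows incrementally.
CREATE OR REPLACE FUNCTION get_items_cur() RETURNS refcursor AS '
DECLARE
    c refcursor;
BEGIN
    OPEN c FOR SELECT * FROM items;
    RETURN c;
END;
' LANGUAGE plpgsql;
```

The SETOF version can be queried like a table (SELECT * FROM get_items()); the refcursor version must be used inside a transaction, fetching with FETCH against the returned cursor name, which is what lets it avoid materializing a large result set all at once.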
Re: [PERFORM] profiling postgresql queries?
On Tue, Apr 12, 2005 at 12:46:43PM +0200, hubert lubaczewski wrote: the problem is that both the inserts and updates operate on heavily-triggered tables. And it made me wonder - is there a way to tell how much time of the backend was spent on triggers, index updates and so on? Like: total query time: 1 second, trigger a: 0.50 second, trigger b: 0.25 second, index update: 0.1 second. EXPLAIN ANALYZE in 8.1devel (CVS HEAD) prints a few statistics for triggers: EXPLAIN ANALYZE UPDATE foo SET x = 10 WHERE x = 20; QUERY PLAN -- Index Scan using foo_x_idx on foo (cost=0.00..14.44 rows=10 width=22) (actual time=0.184..0.551 rows=7 loops=1) Index Cond: (x = 20) Trigger row_trig1: time=1.625 calls=7 Trigger row_trig2: time=1.346 calls=7 Trigger stmt_trig1: time=1.436 calls=1 Total runtime: 9.659 ms (6 rows) 8.1devel changes frequently (sometimes requiring initdb) and isn't suitable for production, but if the trigger statistics would be helpful then you could set up a test server and load a copy of your database into it. Just beware that because it's bleeding edge, it might destroy your data and it might behave differently than released versions. -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Functionscan estimates
On Fri, Apr 08, 2005 at 03:15:50PM -0700, Josh Berkus wrote: I'm wondering if it might be useful to be able to add estimated selectivity to a function definition for purposes of query estimation. Currently function scans automatically return a flat default 1000 estimated rows. It seems like the DBA ought to be able to ALTER FUNCTION and give it a row estimate for planning purposes. About a month ago I mentioned that I'd find that useful. In a followup, Christopher Kings-Lynne brought up the idea of a GUC variable that could give hints about the expected row count. http://archives.postgresql.org/pgsql-hackers/2005-03/msg00146.php http://archives.postgresql.org/pgsql-hackers/2005-03/msg00153.php -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] date - range
On Fri, Apr 01, 2005 at 09:59:44PM -0800, Mischa wrote: select ... where first_date <= today and last_date >= today Whatever index we create system always does a sequential scan (which I can understand). Has someone a smarter solution? Yep, standard SQL problem. The answer is sort of a hand-rolled GIST index. That might not be necessary in this case. CREATE TABLE foo ( id serial PRIMARY KEY, first_date date NOT NULL, last_date date NOT NULL, CONSTRAINT check_date CHECK (last_date >= first_date) ); /* populate table */ CREATE INDEX foo_date_idx ON foo (first_date, last_date); ANALYZE foo; EXPLAIN SELECT * FROM foo WHERE first_date <= current_date AND last_date >= current_date; QUERY PLAN Index Scan using foo_date_idx on foo (cost=0.01..15.55 rows=97 width=12) Index Cond: ((first_date <= ('now'::text)::date) AND (last_date >= ('now'::text)::date)) (2 rows) -- Michael Fuhr http://www.fuhr.org/~mfuhr/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]