Re: [PERFORM] Tuning FreeBSD and PostgreSQL
On Monday 14 Jul 2003 3:31 pm, Stephen Howie wrote:
[snip]
> My problem is that I have not totally put my head around the concepts of
> shmmax, shmmaxpgs, etc. as they pertain to my current setup and the shared
> memory values in postgresql.conf. I'm looking for a good rule of thumb when
> approaching this. Any help or direction would be greatly appreciated.

There are two articles recently posted here:

http://www.varlena.com/GeneralBits/

They should provide a good start.

-- 
Richard Huxton

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
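The shmmax question above comes down to simple arithmetic. As a rough, illustrative sketch (the 16 MB overhead figure is an assumption for illustration, not something from the articles):

```python
# Illustrative sketch: the kernel's shared-memory ceiling (kern.ipc.shmmax
# on FreeBSD) must cover PostgreSQL's shared memory segment, which is
# roughly shared_buffers (counted in 8 KB pages) plus some fixed overhead
# for other shared structures. The overhead figure is a guess.

PAGE_BYTES = 8192  # bytes per PostgreSQL buffer page

def shmmax_needed(shared_buffers, overhead_bytes=16 * 1024 * 1024):
    """Approximate bytes of shared memory PostgreSQL will request."""
    return shared_buffers * PAGE_BYTES + overhead_bytes

# e.g. shared_buffers = 4096 pages (32 MB of buffers):
print(shmmax_needed(4096))  # 50331648 bytes, i.e. 48 MB
```

If shmmax comes out below this figure, the postmaster will fail to start, so size the kernel limit with some headroom.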
Re: [PERFORM] factoring problem with view in 7.3.3
On Wednesday 23 July 2003 11:21, Rajesh Kumar Mallah wrote:
> Hi,
> I have a view which is a union of selects of certain fields from identical
> tables. The problem is that when we query a column on which an index exists
> for each of the tables, it does not use the indexes. But when we query the
> individual tables, it uses indexes.
>
> tradein_clients=# create view sent_enquiry_eyp_iid_ip_cat1 as
>   select rfi_id,sender_uid,receiver_uid,subject,generated from eyp_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from iid_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from ip_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from catalog_rfi;
> CREATE VIEW
> tradein_clients=# explain analyze select rfi_id from
>   sent_enquiry_eyp_iid_ip_cat1 where sender_uid = 34866;
[snip query plan showing full selects being done and then filtering on the outputs]

I do remember some talk about issues with pushing WHERE clauses down into unions on a view (sorry - can't remember when - maybe check the archives). Actually, I thought work had been done on that for 7.3.3, but it might have been 7.4.

If you generally run that particular query (checking against sender_uid), then the simplest solution is to build an SQL function to push the comparison down for you:

CREATE FUNCTION my_function(int4) RETURNS SETOF my_type AS '
  SELECT ... FROM eyp_rfi WHERE sender_uid = $1
  UNION
  ...etc...
' LANGUAGE 'SQL';

Note that you may get an error about an operator =$ if you miss the spaces around the =.

HTH
-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] PostgreSQL performance problem - tuning
On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:
> Hi All!
> First, thanks for the answers!
>
> Richard Huxton wrote:
>> On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>>> IIRC there is a limit on filesystem cache on FreeBSD - 300MB by default.
>>> If that is the case, you might have to raise it to make
>>> effective_cache_size really effective.
>
> Trying various sysctls tells me nothing. I want to use *all available RAM*
> (of course, minus what the OS needs) for PostgreSQL.
>
>> PG will be using the OS' disk caching.
>
> I think all applications use OS disk caching. ;) Or do you mean that
> PostgreSQL is tuned for an OS-specific cache implementation? Do you know a
> method for examining the real size of the OS filesystem cache? If I
> understood right, PostgreSQL dynamically uses all available RAM minus
> shared_buffers minus k * sort_mem minus effective_cache_size? I want to
> configure PostgreSQL to use the _maximum_ of available RAM.

PG's memory use can be split into four areas (note - I'm not a developer, so this could be wrong):

1. Shared memory - vital so that different connections can communicate with each other. Shouldn't be too large, otherwise PG spends too long managing its shared memory rather than working on your queries.
2. Sort memory - if you have to sort results during a query, it will use up to the amount you define in sort_mem and then use disk if it needs any more. This is for each sort.
3. Results memory - if you're returning 8000 rows then PG will assemble these and send them to the client, which also needs space to store the 8000 rows.
4. Working memory - to actually run the queries: stack and heap space to keep track of its calculations etc.

Your best bet is to start off with some smallish reasonable values and step them up gradually until you don't see any improvement. What is vital is that the OS can cache enough disk-space to keep all your commonly used tables and indexes in memory - if it can't, then you'll see performance drop rapidly as PG has to keep accessing the disk.
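To make the "start smallish and step up" advice concrete, here is a hypothetical starting-point calculator. The fractions are illustrative assumptions of my own, not values recommended anywhere in this thread:

```python
# Hypothetical starting values for a dedicated server, following the
# "start small, step up gradually" advice above. All fractions here are
# illustrative guesses, not recommendations from the thread.

def starting_values(ram_mb):
    pages = ram_mb * 1024 // 8             # RAM expressed as 8 KB pages
    return {
        "shared_buffers": pages // 16,      # a modest slice of RAM to start
        "sort_mem": 4096,                   # KB, and *per sort* - keep small
        "effective_cache_size": pages // 2, # rough guess at the OS disk cache
    }

print(starting_values(512))
# {'shared_buffers': 4096, 'sort_mem': 4096, 'effective_cache_size': 32768}
```

The point is the shape of the split, not the exact numbers: most RAM is left for the OS cache, with effective_cache_size merely describing that cache to the planner rather than allocating anything.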
> For the moment, I'd leave the settings roughly where they are while we look
> at the query, then once that's out of the way we can fine-tune the settings.
[snip suggestion to break the query down]

> Yes, you're right. I've tested a few statements and obtained interesting
> results. SELECT * FROM v_file02wide WHERE... executes in about 34 seconds.
> SELECT showcalc(...); executes in from 0.7 seconds (without recursion) up
> to 6.3 seconds if recursion is used! :( This means that the approximate
> execution time for the fully qualified SELECT, with about 8K rows, is...
> about 13 hours! :-O

Hmm - not good.

> Hence, the problem is in my function showcalc:

That's certainly the place to start, although we might be able to do something with v_file02wide later.

> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
>     NUMERIC(16)) RETURNS NUMERIC(16)
> LANGUAGE SQL STABLE AS '
> -- Parameters: code, dd, r020, t071
> SELECT COALESCE(
>   (SELECT sc.koef * $4
>    FROM showing AS s NATURAL JOIN showcomp AS sc
>    WHERE s.kod = $1
>      AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
>      AND SUBSTR(acc_mask, 1, 4) = $3
>      AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
>   (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
>        LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>    FROM showing AS s NATURAL JOIN showcomp AS sc
>    WHERE s.kod = $1
>      AND SUBSTR(acc_mask, 1, 1) = ''[''),
>   0) AS showing;
> ';
>
> BTW, a cross join with a WHERE clause doesn't improve performance relative
> to NATURAL JOIN. Additionally, with the user-defined function beginchar
> (SUBSTR(..., 1, 1)), used for indexing, showcalc executes in about 16
> seconds. With the built-in SUBSTR, the same showcalc executes in 6 seconds.

Fair enough - substr should be fairly efficient.

[snip explanation of table structures and usage]

I'm not going to claim I understood everything in your explanation, but there are a couple of things I can suggest.
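The "about 13 hours" estimate quoted earlier in this message is easy to sanity-check with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the "about 13 hours" figure quoted above:
# showcalc() takes up to ~6 seconds per call when recursion is involved,
# and the fully qualified SELECT covers roughly 8,000 rows.

rows = 8000
seconds_per_call = 6.0             # near the worst case measured above

total_hours = rows * seconds_per_call / 3600
print(round(total_hours, 1))       # ~13.3 hours
```

Which is why attacking the per-call cost of showcalc() - rather than the surrounding query - is the right first move here.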
However, before you go and do any of that, can I ask you to post an EXPLAIN ANALYSE of two calls to your showcalc() function (once for a simple account, once for one with recursion)? You'll need to cut and paste the query as standard SQL, since the explain won't look inside the function body.

OK - bear in mind that these suggestions are made without the benefit of the explain analyse:

1. You could try splitting out the various tags of your mask into different fields - that will instantly eliminate all the substr() calls and might make a difference. If you want to keep the mask for display purposes, we could build a trigger to keep it in sync with the separate flags.

2. Use a calculations table and build your results step by step. So - calculate all
Re: [PERFORM] PostgreSQL performance problem - tuning
On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>> IIRC there is a limit on filesystem cache on FreeBSD - 300MB by default.
>> If that is the case, you might have to raise it to make
>> effective_cache_size really effective.
>
> Trying various sysctls tells me nothing. I want to use *all available RAM*
> (of course, minus what the OS needs) for PostgreSQL.

PG will be using the OS' disk caching.

> While idle, top says:
>
> Mem: 14M Active, 1944K Inact, 28M Wired, 436K Cache, 48M Buf, 331M Free
> Swap: 368M Total, 17M Used, 352M Free, 4% Inuse
>
> After 1 minute of executing
> EXPLAIN ANALYZE SELECT SUM(showcalc('B00204', dd, r020, t071)) FROM
> v_file02wide WHERE a011 = 3 AND inrepdate(data) AND b030 IN (SELECT b030
> FROM dov_bank WHERE dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;
>
> Mem: 64M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 221M Free
> Swap: 368M Total, 3192K Used, 365M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
> 59063 postgres  49    0 65560K 55492K RUN     1:06 94.93% 94.63% postgres
>
> After 12 minutes of query execution:
>
> Mem: 71M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 214M Free
> Swap: 368M Total, 3192K Used, 365M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
> 59063 postgres  56    0 73752K 62996K RUN    12:01 99.02% 99.02% postgres
>
> I suspect that the swap-file size is too small for my query... but the
> query isn't too large, about 8K rows only. :-|

Looks fine - PG isn't growing too large and your swap usage seems steady. We can try upping the sort memory later, but given the amount of data you're dealing with, I'd guess 64MB should be fine.

I think we're going to have to break the query down a little and see where the issue is.
What's the situation with:

EXPLAIN ANALYZE SELECT some_field FROM v_file02wide WHERE a011 = 3 AND
inrepdate(data) AND b030 IN (SELECT b030 FROM dov_bank WHERE
dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;

and:

EXPLAIN ANALYZE SELECT SUM(showcalc(parameters)) FROM something_simple;

Hopefully one of these will run in a reasonable time, and the other will not. Then we can examine the slow query in more detail. Nothing from your previous EXPLAIN (email of yesterday 13:42) looks unreasonable, but something must be going wild in the heart of the query, otherwise you wouldn't be here.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] PostgreSQL performance problem - tuning
is to get those two configuration settings somewhere sane, then we can tune properly. You might like the document at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Performance problems on a fairly big table with two key columns.
On Thursday 04 September 2003 23:53, Rasmus Aveskogh wrote:
> Hi, I have a table that looks like this:
>
> DATA | ID | TIME
>
> The table holds app. 14M rows now and grows by app. 350k rows a day. The
> ID column holds about 1500 unique values (integer). The TIME column is of
> type timestamp without time zone. I have one index (b-tree) on the ID
> column and one index (b-tree) on the TIME column. My queries most often
> look like this:
>
> SELECT DATA FROM tbl WHERE ID = 1 AND TIME > now() - '1 day'::interval;
[snip]
> I tried applying a multicolumn index on ID and TIME, but that one won't
> even be used (after ANALYZE).

The problem is likely to be that the planner isn't spotting that now() - '1 day' is constant. Try an explicit time and see if the index is used. If so, you can write a wrapper function for your expression (mark it STABLE so the planner knows it won't change during the statement). Alternatively, you can do the calculation in the application and use an explicit time.

HTH
-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [GENERAL] Seq scan of table?
On Friday 05 September 2003 09:47, Bjorn T Johansen wrote:
> I think I have found out why... I have a WHERE clause on an ID field, but
> it seems I need to cast this integer to the same integer type as the field
> is defined with in the table, else it will do a table scan. Is this
> assumption correct? And if it is, do I then need to change all my SQL
> statements to cast the WHERE clause where I just have a number (e.g. WHERE
> field = 1) to force the planner to use an index scan instead of a seq scan?

PG's parser will assume an explicit number is an int4 - if you need an int8 etc. you'll need to cast it, yes. You should find plenty of discussion of why in the archives, but the short reason is that PG's type structure is quite flexible, which means it can't afford to make too many assumptions.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [GENERAL] Seq scan of table?
On Friday 05 September 2003 19:20, Neil Conway wrote:
> On Fri, 2003-09-05 at 06:07, Richard Huxton wrote:
>> PG's parser will assume an explicit number is an int4 - if you need an
>> int8 etc. you'll need to cast it, yes.
>
> Or enclose the integer literal in single quotes.
>
>> You should find plenty of discussion of why in the archives, but the
>> short reason is that PG's type structure is quite flexible, which means
>> it can't afford to make too many assumptions.
>
> Well, it's definitely a bug in PG, its quite flexible type structure
> notwithstanding.

It certainly catches out a lot of people. I'd guess it's in the top three issues in the general/sql lists. I'd guess part of the problem is that it's so silent. In some ways it would be better to issue a NOTICE every time a typecast is forced in a comparison - irritating as that would be.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] pgsql inserts problem
On Wednesday 27 August 2003 13:50, Tarhon-Onu Victor wrote:
> shared_buffers = 520
> max_locks_per_transaction = 128
> wal_buffers = 8
> max_fsm_relations = 3
> max_fsm_pages = 482000
> sort_mem = 131072
> vacuum_mem = 131072
> effective_cache_size = 10000
> random_page_cost = 2

Slightly off-topic, but I think your tuning settings are a bit out. You've got 4MB allocated to shared_buffers but 128MB allocated to sort_mem? And only 80MB to effective_cache_size? Your settings might be right, but you'd need a very strange set of circumstances.

As for PG silently discarding inserts, your best bet might be to write a short Perl script to reproduce the problem. Without that, people are likely to be sceptical - if PG tended to do this sort of thing, none of us would use it.

-- 
Richard Huxton
Archonet Ltd
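The imbalance Richard points out becomes obvious once the settings are converted to common units. Recall that shared_buffers and effective_cache_size are counted in 8 KB pages, while sort_mem is in KB; the 10000-page figure below is inferred from the "80MB" in the reply:

```python
# Convert the posted settings to MB so the imbalance is visible.
# shared_buffers / effective_cache_size are in 8 KB pages; sort_mem is KB.

def pages_to_mb(pages):
    return pages * 8 / 1024

def kb_to_mb(kb):
    return kb / 1024

print(pages_to_mb(520))      # shared_buffers: ~4 MB
print(kb_to_mb(131072))      # sort_mem: 128 MB, allocated *per sort*
print(pages_to_mb(10000))    # effective_cache_size: ~78 MB (the "80MB")
```

A single query with a few concurrent sorts could thus claim far more memory than the buffer cache and planner cache estimate combined, which is the "very strange set of circumstances" being alluded to.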
Re: [PERFORM] Need advice about triggers
On Tuesday 09 September 2003 13:40, Mindaugas Riauba wrote:
> Hello, I have a small table (up to 1 rows) and every row will be updated
> once per minute. The table also has a before-update-on-each-row trigger
> written in plpgsql. But 99.99% of the time the trigger will do nothing to
> the database. It will just compare the old and new values in the row, and
> those values will almost always be identical. I tried a simple test and
> was able to do 1 updates on a 1000-row table in ~30s. That's practically
> enough, but I'd like to have more room to slow down. Also, the best result
> I achieved was by doing commit+vacuum every ~500 updates. How can I
> improve performance, and will version 7.4 bring something valuable for my
> task? Rewriting in some other scripting language is not a problem - the
> trigger is simple enough.

Well, try it without the trigger. If performance improves markedly, it might be worth rewriting it in C. If not, you're probably saturating the disk I/O - using iostat/vmstat will let you see what's happening. If it is your disks, you might see if moving the WAL onto a separate drive would help, or check the archives for plenty of discussion about RAID setups.

> Postgres v7.3.4, shared_buffers=4096, max_fsm settings also bumped up 10
> times.

Well, effective_cache_size is useful for reads, but won't help with writing. You might want to look at wal_buffers and see if increasing that helps, but I couldn't say for sure.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] A Basic Question
On Friday 03 October 2003 07:34, [EMAIL PROTECTED] wrote:
> Dear All,
> This question is regarding the performance of queries in general. The
> performance of a query would vary depending on the number of tuples it
> returns and the sort of algorithm that will be used for the retrieval.
> Now if the relation returns zero tuples (the seq and the index scan being
> the best options), and if there are 1 or more, then the rest of the
> PG-supported scans will be best. Now here is where I am having a bit of
> trouble: my relation works fast when it returns more than one tuple, but
> gets slow when it returns zero tuples. Now how should I go about it?

If PG has to examine a lot of tuples to rule them out, then returning no rows can take longer. If you post EXPLAIN ANALYSE output for both queries, someone will be able to explain why in your case.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [SQL] sql performance and cache
On Saturday 11 October 2003 10:43, Chris Faulkner wrote:
> Hello all,
> I have two very similar queries which I need to execute. They both have
> exactly the same FROM/WHERE conditions. When I execute the first, it takes
> about 16 seconds. The second is executed almost immediately after; it
> takes 13 seconds. In short, I'd like to know why the query result isn't
> being cached, and any ideas on how to improve the execution.

The short answer is that PG doesn't cache query results. The only way it could do so safely is to lock all the tables you access, to make sure that no other process changes them. That would effectively turn PG into a single-user DB in short order.

> The first query attempts to find the maximum size of an array in the
> result set - the field is called level. It contains anything between 1 and
> 10 integers. I just need to know what the largest size is. I do this to
> find out the maximum size of the level array:
>
> max(replace(split_part(array_dims(level),':',2),']','')::int)
>
> I know this is big and ugly, but is there any better way of doing it?
> The second query just returns the result set - it has exactly the same
> FROM/WHERE clause.

I assume these two queries are linked? If you rely on the max size being unchanged and have more than one process using the database, you should make sure you lock the rows in question.

> OK - so I could execute the query once, and get the maximum size of the
> array and the result set in one. I know what I am doing is less than
> optimal, but I had expected the query results to be cached, so the second
> execution would be very quick. So why aren't they? I have increased my
> cache size - shared_buffers is 2000 and I have doubled the default
> max_fsm... settings (although I am not sure what they do). sort_mem is
> 8192.

PG will cache the underlying data, but not the results. The values you are changing are used to hold table/index rows etc. This means the second query shouldn't need to access the disk if the rows it requires are cached.
There is a discussion of the postgresql.conf file and how to tune it at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN ANALYSE of either/both queries to the performance list? I'd drop the sql list when we're just talking about performance.

-- 
Richard Huxton
Archonet Ltd
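Since PG doesn't cache query results, one workaround for genuinely read-only data is to memoize results in the application. This is a hypothetical sketch, not something suggested in the thread; run_query() is a stand-in for a real database round-trip:

```python
# Hypothetical application-side result cache for read-only data, since
# PostgreSQL does not cache query results itself. run_query() is a
# placeholder for a real database call.

from functools import lru_cache

def run_query(sql):
    # pretend this is an expensive round-trip to the database
    return ("row1", "row2")

@lru_cache(maxsize=128)
def cached_query(sql):
    return run_query(sql)

cached_query("SELECT max_level_query")   # hits the "database"
cached_query("SELECT max_level_query")   # served from the cache
print(cached_query.cache_info().hits)    # 1
```

This is only safe under exactly the condition discussed above: no other process may change the underlying rows while the cache is live.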
Re: [PERFORM] [SQL] sql performance and cache
On Saturday 11 October 2003 12:12, Chris Faulkner wrote:
> Hello,
> Thanks for the reply.
>
>> The short answer is that PG doesn't cache query results. The only way it
>> could do so safely is to lock all the tables you access, to make sure
>> that no other process changes them. That would effectively turn PG into
>> a single-user DB in short order.
>
> I am not sure I agree with you. I have done similar things with Oracle
> and found that the second query will execute much more quickly than the
> first.

It could be made to work in at least two scenarios. I'm guessing that's because the underlying rows, and perhaps the plan, are cached rather than the results. If you cached the results of the first query, you'd only have the max length - not your other data - anyway.

[snip]

>> I assume these two queries are linked? If you rely on the max size being
>> unchanged and have more than one process using the database, you should
>> make sure you lock the rows in question.
>
> I can rely on the max size remaining the same. As I mentioned above, the
> tables are entirely read-only. The data will not be updated or deleted by
> anyone - I don't need to worry about that. The data will be updated en
> masse once every 3 months.

Hmm - might be worth adding a column for your array length and pre-calculating it if your data is basically static.

>> There is a discussion of the postgresql.conf file and how to tune it at:
>> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> Thanks for that.
>
>> Given the explain attached, 16 secs seems slow. Could you post an
>> EXPLAIN ANALYSE of either/both queries to the performance list? I'd drop
>> the sql list when we're just talking about performance.
>
> To be honest, my main concern was about the cache. If the second one
> could use a cache and execute in 2 seconds, that would be better than
> reducing the execution of each individual query by 30% or so.

I'm puzzled as to why they aren't both below 2 seconds to start with - you're not dealing with that many rows.

> Thanks for the offer of help on this one.
> explain analyze gives me the same as the last message - did you want
> verbose?

Nope, this is what I need. Verbose prints pages of stuff that only the developers would be interested in. This one actually runs the query and gives you a second set of figures showing times.

> Nested Loop (cost=0.00..147.11 rows=1 width=148)
>     (actual time=84.00..12323.00 rows=67 loops=1)
>   Join Filter: ("inner".GEOM_ID_OF_POINT = "outer".POINT_ID)
>   -> Index Scan using gidx_oscar_point on oscar_point p
>       (cost=0.00..61.34 rows=1 width=57)
>       (actual time=0.00..9.00 rows=67 loops=1)
>     Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,
>         530540 180307.12 0)'::geometry)
>     Filter: (((TILE_REF = 'TQ27NE'::bpchar) OR (TILE_REF =
>         'TQ28SE'::bpchar) OR (TILE_REF = 'TQ37NW'::bpchar) OR (TILE_REF =
>         'TQ38SW'::bpchar)) AND (FEAT_CODE = 3500))

This next bit is the issue. It's joining on TILE_REF and then filtering by your static values. That's taking 67 * 150ms = 10.05 secs:

>   -> Index Scan using idx_on_tile_ref on oscar_node n
>       (cost=0.00..85.74 rows=2 width=91)
>       (actual time=0.06..150.07 rows=4797 loops=67)
>     Index Cond: (n.TILE_REF = "outer".TILE_REF)
>     Filter: ((TILE_REF = 'TQ27NE'::bpchar) OR (TILE_REF =
>         'TQ28SE'::bpchar) OR (TILE_REF = 'TQ37NW'::bpchar) OR (TILE_REF =
>         'TQ38SW'::bpchar))

Now if you look at the first set of figures, it's estimating 2 rows rather than the 4797 you're actually getting. That's probably why it's chosen to join then filter rather than the other way around. I'd suggest the following:

1. VACUUM FULL on the table in question if you haven't done so since the last update/reload. If you aren't doing this after every bulk upload, you probably should be.
2. VACUUM ANALYSE/ANALYSE the table.
3. Check the tuning document I mentioned and make sure your settings are at least reasonable. They don't have to be perfect - that last 10% takes forever - but if they are badly wrong it can cripple you.
4. PG should now have up-to-date stats and a reasonable set of config settings.
If it's still getting its row estimates wrong, we'll have to look at the statistics it's got. If we reach the statistics-tinkering stage, it might be better to wait till Monday if you can - more people on the list then.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Tuning for mid-size server
On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
> Pretty soon, a PowerEdge 6650 with 4 x 2GHz Xeons and 8GB memory, with
> internal drives on RAID5, will be delivered. Postgres will be from RH8.0.

You'll want to upgrade PG to v7.3.4.

> I am planning for these values for the postgres configuration - to begin
> with:
>
> shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
> sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 167772
> effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the effective cache once you know load levels. I'd probably start with something like the values below and work up:

shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)

You'll find the annotated postgresql.conf and performance tuning articles useful:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some
> documents. The database will be small in size and will gradually grow
> over time from a few thousand to a few million records, or more. The
> activity will be mostly SELECT statements from a few tables, with joins,
> ORDER BY and GROUP BY clauses. The web application is based on
> Apache/Resin and HotSpot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how much memory will be required by Apache/Java. If your database grows radically, you'll probably want to re-tune as it grows.

-- 
Richard Huxton
Archonet Ltd
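The original poster's arithmetic checks out; the problem is the units and the per-sort multiplier, which a quick computation makes plain (8 KB pages for shared_buffers, KB for sort_mem):

```python
# Verify the proposed settings' arithmetic for the 8 GB machine above.
# shared_buffers is counted in 8 KB pages, sort_mem in KB.

RAM = 8 * 1024**3                      # 8589934592 bytes

shared_buffers = int(RAM * 0.25 / 8192)
sort_mem_kb = int(RAM * 0.04 / 1024)

print(shared_buffers)                  # 262144 pages = 2 GB of buffers
print(sort_mem_kb)                     # 335544 KB, i.e. ~328 MB *per sort*
```

Set against the reply's suggested 8,000-10,000 pages (roughly 62-78 MB) for shared_buffers, the 2 GB figure shows how far apart the "25% of RAM" rule and the start-small advice are.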
Re: [PERFORM] Problem with insert into select...
On Thursday 20 November 2003 21:04, Stephen Farrell wrote:
> I'm having a problem with a query like:
>
> INSERT INTO FACT (x,x,x,x,x,x)
>   SELECT a.key, b.key, c.key, d.key, e.key, f.key
>   FROM x, a, b, c, d, e, f
>   WHERE x=a AND x=b ...
>
> -- postgres 7.4 is running out of memory.

When this has happened to me, it's always been because I've got an unconstrained join due to pilot error. Try an EXPLAIN on the SELECT part and see if that turns up anything.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] expression (functional) index use in joins
On Wednesday 26 November 2003 16:38, Roger Ging wrote:
> I just installed v7.4 and restored a database from v7.3.4.
[snip]

Hmm - you seem to be getting different row estimates in the plan. Can you re-analyse both versions and post EXPLAIN ANALYSE rather than just EXPLAIN?

> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>
> planner results on 7.3.4:
>
> -> Index Scan using idx_program_mri_id_no_program on program p
>    (cost=0.00..3209.16 rows=870 width=20)

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Followup - expression (functional) index use in joins
On Wednesday 26 November 2003 18:39, Roger Ging wrote:
> version 7.4 results:
>
> explain analyse
> SELECT L.row_id FROM music.logfile L
> LEFT JOIN music.program P
>   ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
> WHERE L.station = UPPER('kabc')::VARCHAR
>   AND L.air_date = '04/12/2002'::TIMESTAMP
>   AND P.cutoff_date IS NULL
> ORDER BY L.chron_start, L.chron_end;
>
> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>    (actual time=109.045..1955.882 rows=173998 loops=1)

The estimated number of rows here (4335) is *way* off (173998 actually). If you only had 4335 rows, then this might be a more sensible plan.

First step is to run:

VACUUM ANALYSE program;

Then check the definition of your function fn_mri_id_no_program() and make sure it is marked immutable/stable (depending on what it does) and that it's returning a varchar.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] tuning questions
On Thursday 04 December 2003 19:50, Jack Coates wrote:
> I'm trying to set Postgres's shared memory usage in a fashion that allows
> it to return requested results quickly. Unfortunately, none of these
> changes allows PG to use more than a little under 300M RAM. vacuumdb
> --analyze is now taking an inordinate amount of time as well (40 minutes
> and counting), so that change needs to be rolled back.

You don't want PG to use all your RAM - it's designed to let the underlying OS do a lot of caching for it. It's probably worth having a look at vmstat/iostat and seeing if it's saturating on I/O.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Thursday 04 December 2003 19:59, William Yu wrote:
> Ivar Zarans wrote:
>> I am experiencing strange behaviour, where a simple UPDATE of one field
>> is very slow compared to an INSERT into a table with multiple indexes. I
>> have two tables - one with raw data records (about 24000), where one field
>
> In Postgres and any other DB that uses MVCC (multi-version concurrency),
> UPDATEs will always be slower than INSERTs. With MVCC, what the DB does
> is make a copy of the record, update that record and then invalidate the
> previous record.
[snip]

Yes, but he's seeing 0.25 secs to update one row - that's something odd.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] tuning questions
On Thursday 04 December 2003 23:16, Jack Coates wrote:
>>> effective_cache_size = 1
>>
>> This is way the heck too low. It's supposed to be the size of all
>> available RAM; I'd set it to 2GB*65% as a start.
>
> This makes a little bit of difference. I set it to 65% (15869 pages).

That's still only about 127MB (15869 * 8KB).

> Now we have some real disk IO:
>
> procs                     memory     swap         io     system        cpu
>  r  b  w   swpd   free   buff   cache  si  so    bi  bo   in    cs us sy id
>  0  3  1   2804  10740  40808 1899856   0   0 26624   0  941  4144

According to this, your cache is currently 1,899,856 KB, which in 8KB blocks is 237,482 - be frugal and say effective_cache_size = 200000 (or even 150000 if the trace above isn't typical).

-- 
Richard Huxton
Archonet Ltd
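The conversion Richard does here - from the vmstat "cache" column to an effective_cache_size in pages - is worth spelling out, since the mismatch of units (KB in vmstat, 8 KB pages in postgresql.conf) is exactly what tripped up the earlier settings:

```python
# Turn the vmstat "cache" figure (reported in KB) into the 8 KB pages
# that effective_cache_size is counted in, as done in the reply above.

cache_kb = 1899856          # "cache" column from the vmstat trace
pages = cache_kb // 8       # 8 KB per PostgreSQL page

print(pages)                # 237482
```

Rounding that down to 200000 (or 150000 for a non-typical trace) gives the frugal settings suggested in the reply.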
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Friday 05 December 2003 02:07, Ivar Zarans wrote:
> I have played around with explain and explain analyze and noticed one
> interesting oddity:
[snip]
> Why does the first example, where recid is given as a numeric constant,
> use a sequential scan, while the second example, where recid is given as
> a string constant, works with an index scan, as expected? The third
> example shows that the numeric constant must be typecast in order to
> function properly. Is this normal behaviour for fields of bigint type?

As Christopher says, normal (albeit irritating). Not sure it applies here though - all the examples you've shown me are using the index.

Well - I must admit I'm stumped. Unless you have a *lot* of indexes and foreign keys to check, I can't see why it would take so long to update a single row. Can you post the schema for the table?

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Nested loop question
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
> Hi -
> I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.) I'm
> unsure what is happening next. I notice that an index scan is occurring
> on actor_summary_pk, with an actual time of 9.15, but then it looks like
> a nested loop occurs at the next level to join these tables. Does this
> mean that each probe of the actor_summary index will take 9.15 msec, but
> the nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops. In your case this would seem to be about 33 seconds.

> -> Index Scan using actor_summary_pk on actor_summary
>    (cost=0.00..8.11 rows=1 width=72)
>    (actual time=9.14..9.15 rows=1 loops=3639)
>   Index Cond: ("outer".actor_id = actor_summary.actor_id)
>
> The nested loop appears to be where most of my time is going, so I'm
> focusing on this area, but don't know if there is a better approach to
> this join. Is there a more efficient means than a nested loop to handle
> such a join? Would a different method be chosen if there was exactly one
> row in actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor then look up the corresponding values in actor_summary. Given that you have 3 million rows in both tables, that seems a reasonable approach. You could always try forcing different plans by switching the various ENABLE_HASHJOIN etc. options (see the runtime configuration section of the manuals). I'm not sure that will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that it's going to disk each time. Does that sound plausible, or do you think you have enough RAM to cache your large indexes?
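The multiplication Richard refers to, using the figures from the plan fragment above:

```python
# Total time spent in the inner index scan of a nested loop is the
# per-loop actual time multiplied by the number of loops, using the
# figures from the EXPLAIN ANALYZE fragment above.

loops = 3639
ms_per_loop = 9.15               # upper bound of "actual time" per loop

total_seconds = loops * ms_per_loop / 1000
print(round(total_seconds, 1))   # ~33.3 seconds
```

This is why a plan node that looks cheap in isolation can dominate the runtime: EXPLAIN ANALYZE reports per-loop times, not totals, for the inner side of a nested loop.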
-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Slow query problem
On Friday 09 January 2004 07:29, Dennis Björklund wrote: On Thu, 8 Jan 2004, Bradley Tate wrote: select invheadref, invprodref, sum(units) from invtran group by invheadref, invprodref For the above query, shouldn't you have one index for both columns (invheadref, invprodref). Then it should not need to sort at all to do the grouping and it should all be fast. Not sure if that would make a difference here, since the whole table is being read. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
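Dennis's suggestion amounts to something like the following (the index name is an example); as noted above, it may not help much when the whole table has to be read anyway:

```sql
-- one composite index covering both GROUP BY columns,
-- so rows can be read in pre-sorted order
CREATE INDEX invtran_headref_prodref_idx ON invtran (invheadref, invprodref);
ANALYZE invtran;
```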
Re: [PERFORM] freebsd 5.2 and max_connections
On Tuesday 13 January 2004 16:04, David Hill wrote: Hello - I am using postgresql to hold aliases, users, and relay_domains for postfix and courier to do lookups from. I am not storing mail in sql. I need postgresql to have fast read performance, so i set up indexes on the tables. Also, the queries are basically select blah from table where domain='domain.com';, so i don't need to be able to support large results. I will have a lot of mail servers connecting to this postgresql db, so i need to support a lot of connections... but don't need to support large results. Firstly - if you don't know about the tuning guidelines/annotated config file, you should go here: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php Hmm - small result sets accessed directly via indexed fields, so sort_mem probably isn't important to you. Make sure your effective_cache_size setting is accurate though, so PG can estimate whether it'll need to access the disks. Not sure if clustering one or more tables will help - I'm guessing not. What might help is to increase the statistics gathered on important columns. That should give the planner a more accurate estimate of value distribution and shouldn't cost you too much to keep accurate, since I'm guessing a low rate of updating. You might want to play with the random_page_cost setting, but more RAM for a bigger disk cache is probably the simplest tweak. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
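Increasing the statistics gathered is a per-column setting followed by a re-analyze; a sketch, assuming a table aliases with a heavily-queried domain column:

```sql
-- the default statistics target is 10; raise it for the important column
ALTER TABLE aliases ALTER COLUMN domain SET STATISTICS 100;
ANALYZE aliases;  -- re-gather stats at the new resolution
```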
Re: [PERFORM] Trigger question
On Thursday 15 January 2004 13:13, pginfo wrote: Hi, I am using pg 7.4.1 and have created a trigger over a table with 3M rows. If I start a massive update on this table, pg executes this trigger on every row and dramatically slows the system. Is there any way in pg to define the trigger to execute only if there are changes on some fields? Not at the moment (and I don't know of any plans for it). For example I am able to declare this in oracle. My trigger is written in pgSQL. Hmm - I can only think of two things you can try: 1. check for the change as the first thing you do and exit if it isn't there 2. do the same, but write the trigger function in 'C' -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
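Option 1 looks something like this in plpgsql (table, column and function names are examples; note the simple equality test treats NULL-to-NULL as "changed", so add explicit NULL handling if the field is nullable):

```sql
CREATE OR REPLACE FUNCTION skip_unchanged() RETURNS trigger AS '
BEGIN
    IF NEW.some_field = OLD.some_field THEN
        RETURN NEW;   -- the interesting field did not change: exit cheaply
    END IF;
    -- ... the expensive trigger logic goes here ...
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER big_table_trig BEFORE UPDATE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE skip_unchanged();
```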
Re: [PERFORM] shared_buffer value
On Thursday 15 January 2004 22:49, Anjan Dave wrote: Gurus, I have defined the following values on a db: shared_buffers = 10240 # 10240 = 80MB max_connections = 100 sort_mem = 1024 # 1024KB is 1MB per operation effective_cache_size = 262144 # equals to 2GB for 8k pages Rest of the values are unchanged from default. The poweredge 2650 machine has 4GB RAM, and the size of the database (size of 'data' folder) is about 5GB. PG is 7.4, RH9. OK - settings don't look unreasonable so far. The machine has been getting quite busy (when, say, 50 students log in at the same time, when others have logged in already) and is maxing out at 100 connections (will increase this tonight probably to 200). We have been getting the 'too many clients' message upon trying to connect. Once connected, pgmonitor and 'pg_stat_activity' show connections reaching about 100. There's a series of SELECT and UPDATE statements that get called when a group of users log in simultaneously...and for some reason, many of them stay there for a while... During that time, if i do a 'top', i can see multiple postmaster processes, each about 87MB in size. The memory utilization drops down to about 30MB free, and i can see a little bit of swap utilization in vmstat then. On linux you'll see three values: SIZE, RSS and SHARE. SIZE is what you're looking at, RSS is resident set size (it's in main memory) and SHARE is how much is shared with other processes. So - 3 processes each with RSS=15MB, SHARE=10MB take up 10+5+5+5 = 25MB (the 10MB shared portion is counted once, plus 5MB private per process). Don't worry about a tiny bit of swap - how is your buff/cache doing then? Should i decrease the buffer value to about 50MB and monitor? That shared_buffers memory is shared between all backends. The sort_mem however, is *per sort*, not even per backend. So - if a complicated query uses four sorts you could use 4MB in one backend. 
Interestingly, at one point, we vacuumed the database, and the size reported by 'df -k' on the pgsql slice dropped very significantly...guess, it had been using a lot of temp files? You need to run VACUUM regularly to reclaim unused space. Since you're on 7.4, take a look at the pg_autovacuum utility, or start by running VACUUM ANALYZE from a cron job every evening. Perhaps a VACUUM FULL at weekends? Further steps will be to add more memory, and possibly drop/recreate a couple of indexes that are used in the UPDATE statements. A REINDEX might be worthwhile. Details on this and VACUUM in the manuals. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
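The cron-based variant might look like this (schedule and options are examples; vacuumdb ships with PostgreSQL and would run from the postgres user's crontab):

```
# nightly VACUUM ANALYZE, plus a weekly VACUUM FULL on Sundays
0 2 * * 1-6  vacuumdb --all --analyze --quiet
0 3 * * 0    vacuumdb --all --analyze --full --quiet
```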
Re: [PERFORM] Trigger question
On Tuesday 20 January 2004 00:01, Neil Conway wrote: Harald Fuchs [EMAIL PROTECTED] writes: Does anyone know how to access the affected values for statement-level triggers? I mean what the old and new pseudo-records are for row-level triggers. Yeah, I didn't get around to implementing that. If anyone wants this feature, I'd encourage them to step up to the plate -- I'm not sure when I'll get the opportunity/motivation to implement this myself. I didn't think they'd be meaningful for a statement-level trigger. Surely OLD/NEW are by definition row-level details. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Trigger question
On Tuesday 20 January 2004 16:42, Tom Lane wrote: Harald Fuchs [EMAIL PROTECTED] writes: Why? If the underlying table has a primary key, finding corresponding pairs is trivial; if there isn't, it's impossible. Exactly. Nonetheless, the correspondence exists --- the UPDATE definitely updated some particular row of the OLD set into some particular one of the NEW set. If the trigger API makes it impossible to reconstruct the matchup, the API is broken. Perhaps they should be cursors? The only sensible way I can think of working with them would be: 1. count how many rows affected 2. step through one row at a time, doing something. I suppose there might be cases where you'd want to GROUP BY... which would mean you'd need some oid/row-id added to a real recordset. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Queries with timestamps
On Wednesday 21 January 2004 19:06, Josh Berkus wrote: Arnau, As the number of rows grows the time needed to execute this query takes longer. What should I do to improve the performance of this query? Tip #1) add an index to the timestamp column Tip #2) make sure that you VACUUM and ANALYZE regularly Tip #3) You will get better performance if you pass the current_date - 1 month as a constant from the client instead of in the query. This is a known issue, expected to be fixed in 7.5. (I think Tip 3 is already fixed in 7.3, or I misunderstand what Josh is saying) Note that this is timestamp-related and not timestamp with time zone related. Most of the time you want the latter anyway. If you can use with time zones and drop the cast you might well find the index is being used... EXPLAIN ANALYSE SELECT * FROM log_core WHERE log_ts > CURRENT_DATE - '1 week'::interval; QUERY PLAN: Index Scan using log_core_ts_idx on log_core (cost=0.00..18.73 rows=239 width=117) (actual time=0.79..0.79 rows=0 loops=1) Index Cond: (log_ts > ((('now'::text)::date - '7 days'::interval))::timestamp with time zone) Total runtime: 1.03 msec (3 rows) It seems to help the planner reach an accurate row estimate if you put both an upper and a lower limit in. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Explain plan for 2 column index
On Thursday 29 January 2004 19:29, [EMAIL PROTECTED] wrote: I have a 2-column index. The question is whether the optimizer can use both columns of an index or not. Should do. i.e. the plan should read like this: Index Cond: ((name)::text = 'name1'::text) AND ((date_from)::timestamp with time zone >= ('now'::text)::timestamp(6) with time zone) Whilst I am getting an index scan on the first column and a filter on the other: Index Scan using testtab_name_date_from on testtab (cost=0.00..2.01 rows=1 width=18) Index Cond: ((name)::text = 'name1'::text) Filter: ((date_from)::timestamp with time zone >= ('now'::text)::timestamp(6) with time zone) Could the problem be the timestamp column or timestamp with time zones? What types are the columns here? If date_from isn't timestamp with time zone, that might be the issue. Also, I'm not convinced timestamp is the same thing as timestamp(6) - why the different accuracies? Also, note that 'now' is deprecated - now() or CURRENT_TIMESTAMP/DATE/etc are preferred. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Bulk Record upload (was Re: [PERFORM] Increasing number of PG connections)
On Monday 02 February 2004 19:39, Qing Zhao wrote: I am new here. I have a question related to this in some way. Hmm - no real connection I can see - might have been better to start a new thread rather than replying to this one. Also, it is usually considered best practice not to quote large amounts of the previous message if you're not replying to it. Our web site needs to upload a large volume of data into Postgres at a time. The performance deteriorates as the number of rows becomes larger. When it reaches 2500 rows, it never comes back to the GUI. Since the tests were run through the GUI, my suspicion is that it might be caused by the way the application server talks to the Postgres server, the connections, etc.. What might be the factors involved here? Does anyone know? You don't really give us enough information. What GUI are you talking about? How are you loading this data - as a series of INSERT statements, text-file with separators, from Access/MySQL etc? In general, the fastest way to add a large number of rows is via the COPY sql command. Next best is to batch your inserts together into larger transactions of say 100-1000 inserts. Two other things to be aware of are: use of VACUUM/ANALYZE and configuration tuning (see http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php). PG shouldn't have a problem with inserting a few thousand rows, so I suspect it's something to do with your application/GUI setup. Hope that helps, if not try turning on statement logging for PG and then we can see what commands your GUI is sending. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
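To make those two suggestions concrete (table, columns and file path are placeholders):

```sql
-- fastest: server-side bulk load from a tab-separated file
COPY mytable (id, name) FROM '/tmp/load.txt';

-- next best: batch plain INSERTs into one transaction
-- instead of one transaction per statement
BEGIN;
INSERT INTO mytable (id, name) VALUES (1, 'first');
INSERT INTO mytable (id, name) VALUES (2, 'second');
-- ... several hundred more ...
COMMIT;
```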
Re: [PERFORM] Database conversion woes...
On Tuesday 03 February 2004 16:42, Kevin Carpenter wrote: Thanks in advance, will give more detail - just looking for some open directions and maybe some kicks to fuel my thought in other areas. I've taken to doing a lot of my data manipulation (version conversions etc) in PG even if the final resting place is MySQL. It's generally not too difficult to transfer data but you will have problems with MySQL's more relaxed attitude to data types (things like all-zero timestamps). I tend to write a script to tidy the data before export, and repeatedly restore from backup until the script corrects all problems. Not sure how convenient that'll be with dozens of gigs of data. Might be practical to start with the smaller databases, let your script grow in capabilities before importing the larger ones. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Database conversion woes...
On Tuesday 03 February 2004 22:29, Kevin wrote: The mammoth replicator has been working well. I had tried the pgsql-r and had limited success with it, and dbmirror was just taking to long having to do 4 db transactions just to mirror one command. I have eserv but was never really a java kind of guy. When this is over and you've got the time, I don't suppose you could put together a few hundred words describing your experiences with the Mammoth replicator - there are a couple of places they could be posted. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Index Performance Help
On Thursday 05 February 2004 12:13, Damien Dougan wrote: Hi All, I've been seeing very slow read performance on a database of 1 million indexed subscribers, which I believe is nothing to do with the data itself, but delays on processing the index. If I make a random jump into the index (say X), it can take about 50ms to read the subscriber. If I then make a close-by lookup (say X+10), it takes only about 0.5ms to read the subscriber. Making another lookup far away (say X+1000), it again takes about 50ms to read. The first time, it has to fetch a block from disk. The second time that disk block is already in RAM so it's much faster. The third time it needs a different disk block. Am I correct in my analysis? Is there anything I can do to improve the performance of the index lookups? Make sure you have enough RAM to buffer your disks. Buy faster disks. I've tried increasing the index memory and making a number of queries around the index range, but straying several hundred entries from a cached one always results in a major lookup delay. Yep, that'll be your disks. I've also increased the shared memory available to Postgres to 80MB in case this is a paging of the index, but it hasn't seemed to have any effect. Probably the wrong thing to do (although you don't mention what hardware you've got). Read the tuning document at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php Sample analyze output for an initial query: hydradb=# explain analyze select * from pvsubscriber where actorid = 'b3432-asdas-232-Subscriber793500'; ... - Index Scan using mc_actor_key on mc_actor (cost=0.00..4.08 rows=1 width=69) (actual time=39.497..39.499 rows=1 loops=1) ... Total runtime: 49.845 ms And the analyze output for a nearby subscriber (10 indexes away): hydradb=# explain analyze select * from pvsubscriber where actorid = 'b3432-asdas-232-Subscriber793510'; ... 
- Index Scan using mc_actor_key on mc_actor (cost=0.00..4.08 rows=1 width=69) (actual time=0.220..0.221 rows=1 loops=1) Total runtime: 0.428 ms (15 rows) That certainly seems to be the big change - the only way to consistently get 1ms timings is going to be to make sure all your data is cached. Try the tuning guide above and see what difference that makes. If that's no good, post again with details of your config settings, hardware, number of clients etc... -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] cacheable stored functions?
On Friday 20 February 2004 15:35, Bill Moran wrote: I'm converting a SQL application to PostgreSQL. The majority of the logic in this application is in the stored functions in the database. Somewhere, I saw a reference to WITH (iscachable) for stored functions, looking again, I'm unable to find any reference to this directive. I have a single function that is _obviously_ safe to cache using this, and it generates no errors or problems that I can see. Now I'm looking at a lot of other functions that, if cached, would speed up performance considerably. Yet I'm reluctant to use this directive since I can't find documentation on it anywhere. From memory, iscachable was replaced in version 7.3 by the three finer-grained settings IMMUTABLE, STABLE, VOLATILE. I'm guessing the old behaviour is still there for backwards compatibility, but it's probably best to use the new versions. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
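Side by side, the two spellings look like this (the function is a trivial example):

```sql
-- 7.2 and earlier (still accepted for backwards compatibility):
-- CREATE FUNCTION add_one(int4) RETURNS int4
--     AS 'SELECT $1 + 1;' LANGUAGE 'sql' WITH (iscachable);

-- 7.3 onwards; IMMUTABLE is the closest equivalent of iscachable
CREATE FUNCTION add_one(int4) RETURNS int4
    AS 'SELECT $1 + 1;' LANGUAGE 'sql' IMMUTABLE;
```

STABLE is the right choice for functions that don't modify the database but whose results can change between statements (e.g. anything reading table data), and VOLATILE for everything else.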
Re: [PERFORM] A cache for the results of queries ?
On Thursday 26 February 2004 13:30, David Pradier wrote: Hi everybody, i'd like to know if there exists a system of caching for the results of queries. What i'd like to do: select whatever_things from (selection_in_cache) where special_conditions; The interesting thing would be to have a precalculated selection_in_cache, especially when selection_in_cache is a very long list of joins... You might want to search the archives for the -sql list for a message Materialized View Summary - some time this week. That's almost exactly what you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select-Insert-Query
On Friday 27 February 2004 16:52, [EMAIL PROTECTED] wrote: *please* don't post HTML-only messages. What is the most performant way to select, for example, the first 99 rows of a table and insert them into another table? At the moment i do this: for userrecord in select * from table where account_id = a_account_id and counter_id = userrecord.counter_id and visitortable_id between a_minid and a_maxid limit 99 loop insert into lastusers (account_id, counter_id, date, ip, hostname) values(a_account_id, userrecord.counter_id, userrecord.date, userrecord.ip, userrecord.hostname); end loop; If that is the actual query, I'm puzzled as to what you're doing, since you don't know what it is you just inserted. Anyway, you can do this as a single query: INSERT INTO lastusers (account_id ... hostname) SELECT a_account_id, counter_id... FROM table WHERE... The LIMIT shouldn't take any time in itself, although if you are sorting then PG may need to sort all the rows before discarding all except the first 99. If this new query is no better, make sure you have vacuum analyse'd the tables and post the output of EXPLAIN ANALYSE for the query. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
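Spelled out, the single-query version would look roughly like this (the source table name and literal values stand in for the original's plpgsql variables):

```sql
INSERT INTO lastusers (account_id, counter_id, date, ip, hostname)
SELECT account_id, counter_id, date, ip, hostname
FROM visitors
WHERE account_id = 42
  AND visitortable_id BETWEEN 1000 AND 2000
LIMIT 99;
```

Note that without an ORDER BY, "the first 99 rows" is whichever 99 rows PG happens to return first; add an ORDER BY if a particular 99 is wanted (at the cost of the sort mentioned above).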
Re: [PERFORM] WAL Optimisation - configuration and usage
Rob Sir - I have to congratulate you on having the most coherently summarised and yet complex list query I have ever seen. I fear that I will be learning from this problem rather than helping, but one thing did puzzle me - you've set your random_page_cost to 0.5? I'm not sure this is sensible - you may be compensating for some other parameter out-of-range. -- Richard Huxton ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] syslog slowing the database?
On Tuesday 09 March 2004 20:29, Greg Spiegelberg wrote: iostat reported ~2000 blocks written every 2 seconds to the DB file system. I turned syslog off to see if it was blocking anything and in the past couple minutes 1GB has been restored and iostat reports ~35,000 blocks written every 2 seconds to the DB file system. Can anyone confirm this for me? If syslog is set to sync after every line and you're logging too much then it could slow things down as the disk heads shift back and forth between two areas of disk. How many disks do you have and in what configuration? Also - was PG logging a lot of info, or is some other application the culprit? Tip: put a minus - in front of the file-path in your syslog.conf and it won't sync to disk after every entry. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
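The syslog tip in context (the facility and path are examples):

```
# /etc/syslog.conf -- the leading '-' stops syslogd fsyncing after every line
local0.*        -/var/log/postgresql.log
```

Remember to signal syslogd to reload its configuration (typically a SIGHUP) after editing.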
Re: [PERFORM] atrocious update performance
On Tuesday 16 March 2004 00:08, Tom Lane wrote: I'm inclined to suspect an issue with foreign-key checking. You didn't give us any details about foreign key relationships your cust table is involved in --- could we see those? And the schemas of the other tables involved? Two questions Tom: 1. Do the stats tables record FK checks, or just explicit table accesses? 2. If not, should they? If the only real activity is this update then simple before/after views of the stats might be revealing. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] two seperate queries run faster than queries ORed together
On Thursday 18 March 2004 21:21, Joseph Shraibman wrote: explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ; QUERY PLAN --- Aggregate (cost=128867.45..128867.45 rows=1 width=4) - Hash Join (cost=32301.47..128866.77 rows=272 width=4) Hash Cond: (outer.ukey = inner.ukey) Join Filter: ((inner.status = 3) OR (outer.status = 3)) - Seq Scan on u (cost=0.00..41215.97 rows=407824 width=6) Filter: ((pkey = 260) AND (NOT boolfield)) There's your problem. For some reason it thinks it's getting 407,824 rows back from that filtered seq-scan. I take it that pkey is a primary-key and is defined as being UNIQUE? If you actually did have several hundred thousand matches then a seq-scan might be sensible. I'd start by analyze-ing the table in question, and if that doesn't have any effect look at the column stats and see what spread of values it thinks you have. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
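Checking what the planner believes after an ANALYZE is straightforward via the pg_stats view; a sketch using the names from the plan above:

```sql
ANALYZE u;
-- the planner's view of the distribution of u.pkey
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'u' AND attname = 'pkey';
```

If the most-common-values list looks badly wrong, raising the column's statistics target with ALTER TABLE ... SET STATISTICS and re-analyzing is the usual next step.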
Re: [PERFORM] Help with query plan inconsistencies
On Tuesday 23 March 2004 18:49, Woody Woodring wrote: Hello, I am using postgres 7.4.2 as a backend for geocode data for a mapping application. My question is why I can't get consistent use of my indexes during a query; I tend to get a lot of seq scan results. I'm not sure it wants to be using the indexes all of the time. Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1) Total runtime: 5100.028 ms Nested Loop Left Join (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1) Total runtime: 17416.501 ms Nested Loop Left Join (cost=0.00..29160.02 rows=2327 width=34) (actual time=0.279..510.773 rows=5935 loops=1) Total runtime: 516.782 ms #1 = 630 rows/sec (with index on cable_billing) #2 = 1,148 rows/sec (without index) #3 = 11,501 rows/sec (with index) The third case is so much faster, I suspect the data wasn't cached at the beginning of this run. In any case #2 is faster than #1. If the planner is getting things wrong, you're not showing it here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Raw vs Filesystem
On Monday 29 March 2004 22:56, Jaime Casanova wrote: ok. if i don't misunderstand you (english is not my mother tongue, so i can be wrong). your point is that speed is not necessarily performance, that's right. so, the real question is what is the best filesystem for optimal speed in postgresql? That's going to depend on a number of things: 1. Size of database 2. Usage patterns (many updates or mostly reads? single user or many?...) 3. What hardware you've got 4. What OS you're running. 5. How you've configured your hardware, OS and PG. There are some test results people have provided in the archives, but whether they apply to your setup is open to argument. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Raw vs Filesystem
On Tuesday 30 March 2004 17:43, Josh Berkus wrote: Jaime, Richard, That's going to depend on a number of things: There are some test results people have provided in the archives, but whether they apply to your setup is open to argument. True. On Linux overall, XFS, JFS, and Reiser have all looked good at one time or another. Ext3 has never been a leader for performance, though, so that's an easy elimination. True, but on the sorts of commodity boxes I use, it doesn't make sense for me to waste time setting up non-standard filesystems - it's cheaper to spend a little more for better performance. I think SuSE offer Reiser though, so maybe we'll see a wider selection available by default. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] select slow?
On Tuesday 30 March 2004 20:25, Jaime Casanova wrote: hi all, i have an amd athlon with 256MB ram (i know, this is not a *real* server but my tables are small) Nothing wrong with it - it's what I still use as my development server. i'm using vb6 (win98) with pgsql-7.3.4 (rh8) through the psqlodbc. when i do a select it takes long to execute, here is an example CREATE TABLE ICC_M_BANCO ( CodBanco SMALLINT NOT NULL, select * from icc_m_banco where codbanco = 1; it takes 13s from when it's sent until it's executed. Try: SELECT * FROM icc_m_banco WHERE codbanco = 1::smallint; By default, PG will treat a numeric constant as integer not smallint, so when it looks for an index it can't find one for integer, so scans instead. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] What index for 'like (%keyword%)' ???
On Wednesday 31 March 2004 10:51, Priem, Alexander wrote: Hi everyone, I am building a query which uses a clause like Where doc_description like '%keyword%'. I know a normal index won't be of any use here, but since the table in question will get fairly big, I do want to use an index. Can anyone give me some advise on what kind of index I can use here? Or shouldn't I use one in this case? You probably want to look at the contrib/tsearch2 full-text indexing module. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
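A minimal tsearch2 setup looks something like this (table/column names are examples; the module must first be loaded from contrib, and you would normally add a trigger to keep the tsvector column current as rows change):

```sql
ALTER TABLE documents ADD COLUMN fti tsvector;
UPDATE documents SET fti = to_tsvector('default', doc_description);
CREATE INDEX documents_fti_idx ON documents USING gist (fti);

-- indexed full-text search, instead of an unindexable LIKE '%keyword%'
SELECT * FROM documents WHERE fti @@ to_tsquery('default', 'keyword');
```

Note this gives word-based matching rather than arbitrary substring matching, which is usually what's actually wanted.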
[PERFORM] Back to Linux 2.6 kernel thoughts...
I seem to remember discussion of anticipatory vs deadline scheduler in 2.6. Here is what Andrew Morton (I think) says: The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving the read requests a better deadline than write requests, thus ensuring that applications rarely will be delayed by read requests. Deadline scheduling is best suited for database servers and high disk performance systems. Morton has experienced up to 15 percent increases on database loads while using deadline scheduling. http://story.news.yahoo.com/news?tmpl=story&cid=75&e=2&u=/nf/20040405/tc_nf/23603 Nothing very in-depth in the story. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] good pc but bad performance,why?
On Wednesday 07 April 2004 05:00, huang yaqin wrote: hello Thanks, you are right. I use postmaster -o -F to start my PG and performance improved greatly. I don't think Tom was recommending turning fsync off. If you have a system crash/power glitch then the database can become corrupted. If you are happy with the possibility of losing your data, write performance will improve noticeably. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
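For completeness, the postgresql.conf equivalent of -F, with the same caveat - a crash or power loss can then corrupt the database:

```
# postgresql.conf -- same effect as starting with "postmaster -o -F"
fsync = false
```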
Re: [PERFORM] plan problem
On Wednesday 07 April 2004 10:03, Ken Geis wrote: Richard Huxton wrote: On Tuesday 06 April 2004 21:25, Ken Geis wrote: I am trying to find an efficient way to draw a random sample from a complex query. I also want it to be easy to use within my application. So I've defined a view that encapsulates the query. The id in the driving table is exposed, and I run a query like: select * from stats_record_view where id in (select id from driver_stats order by random() limit 3); How about a join? SELECT s.* FROM stats_record_view s JOIN (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r ON s.id = r.id; Yes, I tried this too after I sent the first mail, and this was somewhat better. I ended up adding a random column to the driving table, putting an index on it, and exposing that column in the view. Now I can say SELECT * FROM stats_record_view WHERE random < 0.093; For my application, it's OK if the same sample is picked time after time and it may change if data is added. Fair enough - that'll certainly do it. Also worth checking the various list archives - this has come up in the past, but some time ago. There are some messages in the archives about how to get a random sample. I know how to do that, and that's not why I posted my message. Are you saying that the planner behavior I spoke of is in the archives? I wouldn't know what to search on to find that thread. Does anyone think that the planner issue has merit to address? Can someone help me figure out what code I would look at? I was assuming after getting a random subset they'd see the same problem you are. If not, probably worth looking at. In which case, an EXPLAIN ANALYZE of your original query would be good. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
On Wednesday 14 April 2004 18:53, Rajesh Kumar Mallah wrote: Hi I have .5 million rows in a table. My problem is select count(*) takes ages. VACUUM FULL does not help. can anyone please tell me how i can enhance the performance of the setup. SELECT count(*) from eyp_rfi; If this is the actual query you're running, and you need a guaranteed accurate result, then you only have one option: write a trigger function to update a table_count table with every insert/delete to eyp_rfi. There is loads of info on this (and why it isn't as simple as you might think) in the archives. First though: 1. Is this the actual query, or just a representation? 2. Do you need an accurate figure or just something near enough? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
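The trigger-maintained count usually looks something like this (names are examples). The main catch discussed in the archives is that every insert and delete now contends for the single counter row, which serializes concurrent writers:

```sql
CREATE TABLE eyp_rfi_count (n bigint NOT NULL);
INSERT INTO eyp_rfi_count SELECT count(*) FROM eyp_rfi;

CREATE FUNCTION eyp_rfi_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE eyp_rfi_count SET n = n + 1;
    ELSE
        UPDATE eyp_rfi_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER eyp_rfi_count_t AFTER INSERT OR DELETE ON eyp_rfi
    FOR EACH ROW EXECUTE PROCEDURE eyp_rfi_count_trig();

-- the count is then a single-row read:
SELECT n FROM eyp_rfi_count;
```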
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
On Thursday 15 April 2004 08:10, Rajesh Kumar Mallah wrote: The problem is that i want to know if i need a hardware upgrade at the moment. Eg i have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; Time: 117560.635 ms Which is approximately 4804 records per second. Is that acceptable performance on the hardware below: RAM: 2 GB DISKS: ultra160 , 10 K , 18 GB Processor: 2* 2.0 Ghz Xeon Hmm - doesn't seem good, does it? If you run it again, is it much faster (since the data should be cached then)? What does vmstat 10 show while you're running the query? One thing you should have done is read the performance tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php The default values are very conservative, and you will need to change them. What kind of upgrades should be put on the server for it to become reasonably fast? If you've only got one disk, then a second disk for OS/logging. Difficult to say more without knowing numbers of users/activity etc. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
On Thursday 15 April 2004 17:19, Rajesh Kumar Mallah wrote: Bill Moran wrote: Rajesh Kumar Mallah wrote: Hi, The problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution. If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct? Yes, the table results from an import process and undergoes lots of inserts and updates, but that's before the vacuum full operation. The table is not accessed during vacuum. What I want to know is: is there any way to automate the dumping and reload of a table individually? Will the below be safe and effective? Shouldn't be necessary assuming you vacuum (not full) regularly. However, looking back at your original posting, the vacuum output doesn't seem to show any rows that need removing. # VACUUM full verbose eyp_rfi; INFO: vacuuming public.eyp_rfi INFO: eyp_rfi: found 0 removable, 505960 nonremovable row versions in 71987 pages DETAIL: 0 dead row versions cannot be removed yet. Since your select count(*) showed 505960 rows, I can't see how dropping/replacing could make a difference on a sequential scan. Since we're not using any indexes I don't see how it could be related to that.

begin work;
create table new_tab AS select * from tab;
truncate table tab;
insert into tab select * from new_tab;
drop table new_tab;
commit;
analyze tab;

I have not tried it but plan to do so, but I feel the insert would take ages to update the indexes, if any. It will have to update them, which will take time. BTW is there any way to disable checks and triggers on a table temporarily while loading data (is updating reltriggers in pg_class safe?) You can take a look at pg_restore and copy how it does it. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [OFF-TOPIC] - Known maximum size of the PostgreSQL
Christopher Kings-Lynne wrote: What's the biggest PostgreSQL database (in size and number of records) that you know of? Didn't someone say that RedSheriff had a 10TB postgres database or something? From http://www.redsheriff.com/us/news/news_4_201.html According to the company, RedSheriff processes 10 billion records a month and the total amount of data managed is more than 32TB. Griffin said PostgreSQL has been in production for 12 months with not a single database fault in that time. The stability of the database cannot be questioned. Needless to say, we are extremely happy. I think it's safe to assume this is not on a spare Dell 600SC though. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] LIKE and INDEX
Jie Liang wrote: All, This is an old topic. When I use: select url from urlinfo where url like 'http://www.lycos.de%'; it uses the index, good! But if I use: select url from urlinfo where url like 'http://%.lycos.de'; it won't use the index at all, NOT good! Is there any way I can force the second query to use an index??? I've seen people define a reverse(text) function via plperl or similar, then build a functional index on reverse(url). Of course, that would rely on your knowing which end of your search pattern has the % wildcard. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
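A rough sketch of that reverse-index trick, assuming a hand-rolled plpgsql reverse() (there is no built-in reverse(text) in this era; also note that whether LIKE can use the index depends on your locale - in a non-C locale you may need the *_pattern_ops operator classes):

```sql
-- Hand-rolled reverse(text); details are illustrative.
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
    result text := '''';
    i integer;
BEGIN
    FOR i IN REVERSE length($1)..1 LOOP
        result := result || substring($1 FROM i FOR 1);
    END LOOP;
    RETURN result;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

-- Functional index on the reversed URL:
CREATE INDEX urlinfo_rev_url_idx ON urlinfo (reverse(url));

-- The trailing-wildcard search becomes a leading-anchored one,
-- which the index can serve:
SELECT url FROM urlinfo
 WHERE reverse(url) LIKE reverse('.lycos.de') || '%';
```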
Re: [PERFORM] PostgreSQL caching
Vitaly Belman wrote: Hello, I have the following problem: When I run some query just after starting the postmaster, it takes several seconds to execute (sometimes more than 10); if I rerun it again afterwards, it takes mere milliseconds. So, I guess it has to do with PostgreSQL caching... But how exactly does it work? What does it cache? And how can I control it? There are two areas of cache - PostgreSQL's shared buffers and the operating system's disk-cache. You can't directly control what data is cached, it just keeps track of recently used data. It sounds like PG isn't being used for a while so your OS decides to use its cache for webserver files. I would like to load selected information into memory before a user runs the query. Can I do it somehow? As PostgreSQL is used in my case as a webserver backend, it isn't really helping if the user has to wait 10 seconds every time he goes to a new page (even if refreshing the page would be really quick, since PostgreSQL already loaded the data into memory). If you could pin data in the cache it would run quicker, but at the cost of everything else running slower. Suggested steps: 1. Read the configuration/tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php 2. Post a sample query/explain analyse that runs very slowly when not cached. 3. If needs be, you can write a simple timed script that performs a query. Or, the autovacuum daemon might be what you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] select max(id) from aTable is very slow
David Teran wrote: Hi, we have a table with about 6.000.000 rows. There is an index on a column with the name id which is an integer and serves as primary key. When we execute select max(id) from theTable; it takes about 10 seconds. Explain analyze returns: Due to the open-ended nature of PG's aggregate function system, it can't see inside the max() function to realise it doesn't need all the values. Fortune favours the flexible however - the simple workaround is to use the equivalent: SELECT id FROM theTable ORDER BY id DESC LIMIT 1; -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] [SQL] Materialized View Summary
On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: I've written a summary of my findings on implementing and using materialized views in PostgreSQL. I've already deployed eagerly updating materialized views on several views in a production environment for a company called RedWeek: http://redweek.com/. As a result, some queries that were taking longer than 30 seconds to run now run in a fraction of a millisecond. You can view my summary at http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html Interesting (and well written) summary. Even if not a built in feature, I'm sure that plenty of people will find this useful. Make sure it gets linked to from techdocs. If you could identify candidate keys on a view, you could conceivably automate the process even more. That's got to be possible in some cases, but I'm not sure how difficult it is to do in all cases. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Major differences between oracle and postgres performance
Gary Cowell wrote: I'm not as familiar with postgresql as I am with Oracle, but I think I've configured comparable buffering and sort area sizes; certainly there isn't much physical IO going on in either case. People are going to want to know: 1. version of PG 2. explain analyse output, rather than just explain 3. What values you've used in the postgresql.conf file The actual plan from explain analyse isn't going to be much use - as you say, a scan of the whole table followed by sorting is the best you'll get. However, the actual costs of these steps might say something useful. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] memory allocation
Michael Ryan S. Puncia wrote: Hi everyone. How much memory should I give to the kernel and PostgreSQL? I have 1 GB of memory and 120 GB of HD. Devrim's pointed you to a guide to the configuration file. There's also an introduction to performance tuning on the same site. An important thing to remember is that sort_mem is the amount of memory available *per sort*, and some queries can use several sorts. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] after using pg_resetxlog, db lost
Tom Lane wrote: Shea,Dan [CIS] [EMAIL PROTECTED] writes: The pg_resetxlog was run as root. It caused ownership problems of pg_control and xlog files. Now we have no access to the data through psql. The data is still there under /var/lib/pgsql/data/base/17347 (PWFPM_DEV DB name). But there is no reference to 36 of our tables in pg_class. Also the 18 other tables that are reported in this database have no data in them. Is there any way to have the database resync or make it aware of the data under /var/lib/pgsql/data/base/17347? How can this problem be resolved? What this sounds like is that you reset the transaction counter along with the xlog, so that those tables appear to have been created by transactions in the future. This could be repaired by doing pg_resetxlog with a more appropriate initial transaction ID, but figuring out what that value should be is not easy :-( Tom - would there be any value in adding this to a pg_dump? I'm assuming the numbers attached to tables etc are their OIDs anyway, so it might be a useful reference in cases like this. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] postgres 7.4 at 100%
Chris Cheston wrote: Hi all, I was running Postgres 7.3 and it was running at about 15% with my application. On Postgres 7.4 on another box, it was running at 100%... People are going to need more information. Are you talking about CPU/disk IO/memory? My settings are default on both boxes I think. Doubtful - PG crawls with the default settings. Check your old postgresql.conf file and compare. Also, read the tuning article at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php There are only about 20 inserts per second, which is really low. Anyone have any ideas as to something I have to do to Postgres 7.4 to change it from the default so that it's not eating up all my CPU? I have no clue how to debug this... What does top/vmstat/iostat show during heavy usage? Help please. Should I downgrade to 7.3 to see what happens? BTW I'm running Postgres 7.3.2 on: Linux box 2.4.25-040218 #1 SMP Wed Feb 18 17:59:29 CET 2004 i686 i686 i386 GNU/Linux on a single processor P4 1.4GHz, 512 MB RAM. Does the SMP kernel do something with the single-processor CPU? Or should this not affect psql? Don't know about the SMP thing. Unlikely that one of the big distributions would mess that up much though. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Query performance
Bill wrote: Ok... so here lies the output of oclh (i.e. \d oclh):

        Table public.oclh
 Column |         Type          |           Modifiers
--------+-----------------------+-------------------------------
 symbol | character varying(10) | not null default ''
 date   | date                  | not null default '0001-01-01'
 open   | numeric(12,2)         | not null default '0.00'
 close  | numeric(12,2)         | not null default '0.00'
 low    | numeric(12,2)         | not null default '0.00'
 high   | numeric(12,2)         | not null default '0.00'
Indexes: symbol_2_oclh_index btree (symbol, date),
         symbol_oclh_index btree (symbol, date)

Well, I'm not sure why the two indexes on the same columns, and I'm not sure it makes sense to have defaults for _any_ of the columns there. So - you want: 1. ratio = abs(closing-opening)/opening 2. average = all the ratios of each day of each stock 3. Highest average Well, I don't know what you mean by #2, but #1 is just: SELECT symbol, date, abs(close - open)/open AS ratio FROM oclh; I'd probably fill in a summary table with this and use that as the basis for your further queries. Presumably from yesterday back, the ratios/averages won't change. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Query performance
Bill wrote: Ok, thanks. So let me explain query number 2, as this is the more difficult to write. So I have a list of stocks; this table contains the price of all of the stocks at the open and close date. Ok, now we have a ratio from query (1) that returns at least a very rough index of the daily performance of a given stock, with each ratio representing the stock's performance in one day. Now we need to average this with the same stock's ratio every day, to get a total average for each stock contained in the database. Now I would simply like to find a ratio like this that represents the average of every stock in the table and simply find the greatest ratio. Sorry about the lousy explanation before, is this a bit better? Here is an example if needed. Say we have a stock by the name of YYY. I know, due to query 1, that stock YYY has an abs(close-open)/open price ratio of, for example, 1.3 on Dec 1 and (for simplicity let's say we only have two dates) on Dec 2 the ratio for YYY is 1.5. So the query averages and gets 1.4. Now it needs to do this for all of the stocks in the table and sort by increasing ratio. Well, the simplest would be something like: CREATE VIEW my_ratios AS SELECT ...(select details we used for #1 previously) Query #1 then becomes: SELECT * FROM my_ratios; Then you could do: SELECT symbol, avg(ratio) as ratio_avg FROM my_ratios GROUP BY symbol ORDER BY avg(ratio) ; Now, in practice, I'd probably create a symbol_ratio table and fill that one day at a time. Then #2,#3 would be easier. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
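The symbol_ratio table mentioned at the end might look like this - a sketch only, with column types guessed from the \d output earlier in the thread:

```sql
-- Hypothetical summary table, filled one day at a time.
CREATE TABLE symbol_ratio (
    symbol varchar(10) NOT NULL,
    date   date        NOT NULL,
    ratio  numeric     NOT NULL,
    PRIMARY KEY (symbol, date)
);

-- Run once per day, after that day's prices are loaded:
INSERT INTO symbol_ratio (symbol, date, ratio)
SELECT symbol, date, abs(close - open) / open
  FROM oclh
 WHERE date = current_date
   AND open <> 0;  -- guard against the table's 0.00 default

-- #2 and #3 then become a cheap aggregate over the summary:
SELECT symbol, avg(ratio) AS ratio_avg
  FROM symbol_ratio
 GROUP BY symbol
 ORDER BY ratio_avg DESC;
```

Since historic ratios never change, each day's rows are written once and the aggregate never has to touch the raw price table.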
Re: [PERFORM] [GENERAL] How to know which queries are to be optimised?
Ulrich Wisser wrote: You can log queries that run for at least a specified amount of time. This will be useful in finding what the long-running queries are. You can then use explain analyse to see why they are long-running. But is there a tool that could compile a summary out of the log? The log grows awfully big after a short time. You might want to look at the Practical Query Analyser - haven't used it myself yet, but it seems a sensible idea. http://pqa.projects.postgresql.org/ -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Weird Database Performance problem!
Arash Zaryoun wrote: Hi Richard, Thanks for your prompt reply. It fixed the problem. Just one more question: Do I need to create an index for FKs? You don't _need_ to, but on the referring side (e.g. table GCTBALLOT in your example) PostgreSQL won't create one automatically. Of course, the primary-key side will already have an index being used as part of the constraint. I've cc:ed the list on this, the question pops up quite commonly. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
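Concretely, that just means an ordinary CREATE INDEX on the referencing column. A minimal sketch - GCTBALLOT is the referring table from the thread, but the column name here is invented, since the thread doesn't show its definition:

```sql
-- The referencing (foreign-key) side gets no automatic index,
-- so create one by hand; the column name is hypothetical.
CREATE INDEX gctballot_voter_idx ON gctballot (ballot_voter_id);

-- The primary-key side needs nothing extra: its unique index
-- already exists as part of the PK constraint.
```

This matters most for DELETEs and UPDATEs on the referenced table, where each row change triggers a lookup in the referring table.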
Re: [PERFORM] using an index worst performances
Gaetano Mendola wrote: Hi all, I'm trying to optimize the following query: http://rafb.net/paste/results/YdO9vM69.html As you can see from the explain, after defining the index the performance is worse. If I raise default_statistics_target to 200 then the performance is worse than before: Without index: 1.140 ms With index: 1.400 ms With default_statistics_target = 200: 1.800 ms Can I just check that 1.800 ms means 1.8 secs (you're using . as the thousands separator)? If it means 1.8ms then frankly the times are too short to mean anything without running them 100 times and averaging. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] What is the best way to do attribute/values?
Daniel Ceregatti wrote: Hi list, I have a database with 1M people in it. Each person has about 20 attributes, such as height, weight, eye color, etc. I need to be able to search for people based on these attributes. A search can be conducted on one attribute, all attributes, or any number in between. How would _you_ do this? I have already attempted to answer this. My attempts are detailed here: http://sh.nu/email.txt Hmm... interesting. Shot in the dark - try a tsearch2 full-text index. Your problem could be translated into searching strings of the form hair=black eyes=blue age=117 Not pretty, but might give you the speed you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizer Selecting Incorrect Index
Dennis Bjorklund wrote: On Wed, 25 Aug 2004, Richard Huxton wrote: These queries are different. The first returns 687 rows and the second 713 rows. The 687 and 713 are the number of rows in the plan, not the number of rows the queries return. D'OH! Thanks Dennis -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [ADMIN] slower every day
G u i d o B a r o s i o wrote: Conclusion: If you comment out a line in the conf file and reload it, the setting will remain in its last state (whether it was true or false, while I expected a default). Yes, that's correct. No, you're not the only one to have been caught out by this. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] View Query Performance
Igor Maciel Macaubas wrote: Hi all, I'm trying to find smarter ways to dig data from my database, and have the following scenario: table1 -- id -- name . . . . . . table2 -- id -- number . . . . . . I want to create a view to give me back just what I want: the id, the name and the number. I thought of doing the following: create view my_view as select t1.id, t1.name, t2.number from table1 as t1, table2 as t2 where t1.id = t2.id; Will this be fast enough? Is there a faster way to make it work? This table is mid-big, around 100K records. That's as simple a way as you will find. If you apply further conditions, e.g. SELECT * FROM my_view WHERE id = 123; then you should see any index on id being used. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [pgsql-hackers-win32] Poor Performance for large queries
John Meinel wrote: So notice that when doing the actual select it is able to do the index query. But for some reason with a prepared statement, it is not able to do it. Any ideas? In the index-using example, PG knows the value you are comparing to. So, it can make a better estimate of how many rows will be returned. With the prepared/compiled version it has to come up with a plan that makes sense for any value. If you look back at the explain output you'll see PG is guessing 181,923 rows will match with the prepared query but only 1 for the second query. If in fact you returned that many rows, you wouldn't want to use the index - it would mean fetching values twice. The only work-around if you are using plpgsql functions is to use EXECUTE to make sure your queries are planned for each value provided. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
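With plpgsql, that EXECUTE work-around looks roughly like this (table, column and function names are invented for illustration; the point is that the query string is built and planned afresh for each call, so the planner sees the actual value):

```sql
-- Hypothetical sketch: EXECUTE re-plans the query per value, so the
-- planner can pick the index when the supplied value is selective.
CREATE OR REPLACE FUNCTION count_for_value(int4) RETURNS bigint AS '
DECLARE
    rec record;
BEGIN
    FOR rec IN EXECUTE ''SELECT count(*) AS c FROM some_table ''
                    || ''WHERE some_col = '' || quote_literal($1::text)
    LOOP
        RETURN rec.c;
    END LOOP;
END;
' LANGUAGE 'plpgsql';
```

A statically written `SELECT ... WHERE some_col = $1` inside the function would be planned once for any possible value, which is exactly the prepared-statement behaviour described above.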
Re: [PERFORM] inconsistent/weird index usage
Dustin Sallings wrote: The following view creates the illusion of the old ``single-table'' model: create view samples as select * from samples_1999 union select * from samples_2000 union select * from samples_2001 union select * from samples_2002 union select * from samples_2003 union select * from samples_2004 Try this with UNION ALL (you know there won't be any duplicates) and possibly with some limits too: SELECT * FROM samples_1999 WHERE ts BETWEEN '1999-01-01 00:00:00+00' AND '1999-12-31 11:59:59+00' UNION ALL ... select s.serial as serial_num, s.name as name, date(ts) as day, min(sample) as min_temp, avg(sample) as avg_temp, stddev(sample) as stddev_temp, max(sample) as max_temp from samples inner join sensors s using (sensor_id) where ts > current_date - 7 group by serial_num, name, day order by serial_num, day desc Try restricting the timestamp too: WHERE ts BETWEEN (current_date - 7) AND current_timestamp Hopefully that will give the planner enough smarts to know it can skip most of the sample_200x tables. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] index not used if using IN or OR
Mario Ivankovits wrote: Hello! Sorry if this has been discussed before; it is just hard to find in the archives using the words "or" or "in" :-o I use postgres-8.0 beta4 for windows. I broke down my problem to a very simple table - two columns, primary_key and secondary_key. Creates and inserts you will find below. If I query the _empty_ freshly created table I get the following explain result: select * from tt where seckey = 1; Index Scan using seckey_key on tt (cost=0.00..17.07 rows=5 width=12) Index Cond: (seckey = 1) If I use OR (or IN) things get worse: select * from tt where seckey = 1 or seckey = 2 Seq Scan on tt (cost=0.00..0.00 rows=1 width=12) Filter: ((seckey = 1) OR (seckey = 2)) Note the Seq Scan instead of using the index. But as you said, your table is *empty* - why would an index be faster? Try running EXPLAIN ANALYSE on these queries and look at the actual times. After populating the table with 8920 records and running analyze, the scenario gets even worse: select * from tt where seckey = 1; Seq Scan on tt (cost=0.00..168.50 rows=1669 width=12) (actual time=0.000..15.000 rows=1784 loops=1) Filter: (seckey = 1) Total runtime: 31.000 ms Now even this simple query uses a Seq Scan. Well, it thinks it's going to be returning 1669 rows. If that's roughly right, then scanning the table probably is faster. Run the queries again with EXPLAIN ANALYSE. Also try issuing set enable_seqscan=false; This will force the planner to use any indexes it finds. Compare the times with and without, and don't forget to account for the effects of caching. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: FW: [PERFORM] FW: Index usage
BBI Edwin Punzalan wrote: Thanks, Tim. I tried adding an upper limit and it's still the same, as follows: == db=# explain analyze select date from chatlogs where date>='11/24/04'; NOTICE: QUERY PLAN: Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1) Total runtime: 246.22 msec EXPLAIN db=# explain analyze select date from chatlogs where date>='11/23/04' and date<'11/24/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1) Total runtime: 4455.56 msec We have two issues here: 1. In the first example it only picks an index because it thinks it is going to get 37 rows; it actually gets 10737. 2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too). Questions: 1. How many rows do you have in chatlogs? 2. Is this the only problem you are experiencing, or just one of many? 3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres vs. MySQL
Evilio del Rio wrote: Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? Do you have any record of configuration, system hardware, usage patterns, queries executed? If you can tell us what you tried (and why) then we might be able to help, otherwise there's not much information here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgres backend using huge amounts of ram
Gary Doades wrote: How much RAM can a single postgres backend use? I've just loaded a moderately sized dataset into postgres and was applying RI constraints to the tables (using pgadmin on windows). Part way though I noticed the (single) postgres backend had shot up to using 300+ MB of my RAM! Oops - guess that's why they call it a Beta. My first guess was a queue of pending foreign-key checks or triggers etc. but then you go on to say... Since I can't get an explain of what the alter table was doing I used this: select count(*) from booking_plan,reqt_dates where booking_plan.reqt_date_id = reqt_dates.reqt_date_id and sure enough this query caused the backend to use 300M RAM. The plan for this was: [snip] I then analysed the database. Note, there are no indexes at this stage except the primary keys. the same query then gave: [snip] This is the same set of hash joins, BUT the backend only used 30M of private RAM. I'm guessing in the first case that the default estimate of 1000 rows in a table means PG chooses to do the join in RAM. Once it knows there are a lot of rows it can tell not to do so. However, I thought PG was supposed to spill to disk when the memory required exceeded config-file limits. If you could reproduce a simple test case I'm sure someone would be interested in squashing this bug. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Normalization or Performance
Alvaro Nunes Melo wrote: Hi, Before writing this mail, I researched a little about this topic, and got some opinions from guys like Fabien Pascal, who argues that logical design should be separated from physical design, and other sources. As this is not settled fact, I'm writing to you guys, who make things work in the real world. I believe he's right. Or at least that you should only compromise your logical design once it becomes absolutely necessary due to physical limitations. We started our first big (for our company standards) project always thinking in normalization. But once we imported legacy data into the DB, things got harder. One example is the client's status. A client might be active, inactive or pending (for many reasons). We store all the statuses a client has had since it entered the system. To check what the actual status of a client is, we get the last status from this historical status table. This takes a considerable time, so our best results were achieved building a function that checks the status and indexing this function. The problem is that indexed functions must be immutable, so as you can figure, if the status changes after the creation of the index, the return of the function is still the same. What do you suggest for situations like this? Should I add a field to the clients table and store its actual status, keep storing data in the historical table, and control its changes with a trigger? Trigger + history table is a common solution, it's easy to implement and there's nothing non-relational about it as a solution. There are other situations that are making things difficult for us. For example, one query must return the total amount a client bought in the last 90 days. It's taking too long, when we must know it for many clients, many times. So should I create summarization tables to store this kind of stuff, update it with a trigger on a daily basis (for example), and solve this problem with one join? 
One solution I use for this sort of thing is a summary table grouped by date, and accurate until the start of today. Then, I check the summary table and the live table for today's information and sum those. Our database is not that big. The largest table has about 7.000.000 rows. About 50.000 clients, half of them active. All that I pointed out above uses indexes for queries, etc. But even with this it's not been fast enough. We have a Dell server for this (I know, the Dell issue), a Dual Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system? Swap one of your processors for more RAM and disks, perhaps. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
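The trigger + history table suggestion from the post above might be sketched like this (all table and column names are invented; adjust to the real schema):

```sql
-- Hypothetical schema: a current_status column on clients, kept up to
-- date by a trigger that fires on every insert into the history table.
ALTER TABLE clients ADD COLUMN current_status integer;

CREATE OR REPLACE FUNCTION sync_current_status() RETURNS trigger AS '
BEGIN
    UPDATE clients
       SET current_status = NEW.status
     WHERE client_id = NEW.client_id;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER status_history_sync AFTER INSERT ON client_status_history
    FOR EACH ROW EXECUTE PROCEDURE sync_current_status();

-- Reading the current status is then a plain indexed lookup:
-- SELECT current_status FROM clients WHERE client_id = 123;
```

Unlike the immutable-function index, this stays correct when the status changes, because the trigger rewrites the denormalised column at the moment the history row is added.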
Re: [PERFORM] Seqscan rather than Index
David Brown wrote: You might want to reduce random_page_cost a little. Keep in mind that your test case is small enough to fit in RAM and is probably not reflective of what will happen with larger tables. I am also running 8.0 rc1 for Windows. Despite many hours spent tweaking various planner cost constants, I found little effect on cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had negligible impact and failed to significantly influence the planner. I'm not sure setting random_page_cost below 1.0 makes much sense. Increasing the statistics target for the last_name column to 250 or so *may* help, at least if you're only selecting one name at a time. Not going to do anything in this case. The planner is roughly right about how many rows will be returned, it's just not expecting everything to be in RAM. That's the standard advice around here and the only thing I've found useful. Half the threads in this forum are about under-utilized indexes. It would be great if someone could admit the planner is broken and talk about actually fixing it! Not sure I agree here - when the stats are accurate, you can get the planner to make near-optimal choices most of the time. Is there any particular pattern you've seen? I'm unconvinced that the planner only favours sequential scans as table size decreases. In my experience so far, larger tables have the same problem only it's more noticeable. Hmm - assuming your statistics are good, this would suggest the other cost settings just aren't right for your hardware. The issue hits PostgreSQL harder than others because of its awful sequential scan speed, which is two to five times slower than other DBMS. The archives show there has been talk for years about this, but it seems, no solution. The obvious thing to consider is the block size, but people have tried increasing this in the past with only marginal success. Must admit this puzzles me. Are you saying you can't saturate your disk I/O? 
Or are you saying other DBMSs store records in half to one-fifth the space that PG does? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres on Linux Cluster!
Gurpreet Sachdeva wrote: I have recently transferred a big database onto the master node of a 4-node openSSI cluster... The system is working fine but sometimes I get the following errors: http://192.168.1.100/cgi-bin/search.py File /usr/lib/python2.2/site-packages/pyPgSQL/PgSQL.py, line 3067, in execute, referer: http://192.168.1.100/cgi-bin/search.py self.conn.conn.query('ROLLBACK WORK'), referer: http://192.168.1.100/cgi-bin/search.py libpq.ProgrammingError: no connection to the server, referer: http://192.168.1.100/cgi-bin/search.py , referer: http://192.168.1.100/cgi-bin/search.py At a wild guess, this happens when a CGI process is migrated to another node without migrating the accompanying connection (however you'd do that). This error comes while insertion of data takes place... Is Postgres successful on a cluster??? Will that give me a performance enhancement in any way??? Please help... Probably not, and almost certainly not. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance delay
Hasnul Fadhly bin Hasan wrote: Hi, just want to share with all of you a weird thing that I found when I tested it. I was doing a query that calls a function long2ip to convert bigint to ips. So the query looks something like this: select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30; For your info, there are about 300k rows for that timeframe. It cost me about 57+ secs to get the list. Which is about the same if I query select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' It will cost me about 57+ secs also. Now if I did this select id, long2ip(srcip), long2ip(dstip) from (select * from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30) as t; it will cost me about 3+ secs. The difference will be that in the final case you only make 30 calls to long2ip() whereas in the first two you call it 300,000 times and then throw away most of them. Try running EXPLAIN ANALYSE ... for both - that will show how PG is planning the query. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] sum of all values
Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems.

SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t';

You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough. Here are the schemas, in case they help:

tle-bu=> \d file_info_1
          Table "public.file_info_1"
     Column      |  Type   |         Modifiers
-----------------+---------+----------------------------
 file_acc_time   | bigint  | not null
 file_group_name | text    | not null
 file_group_uid  | integer | not null
 file_mod_time   | bigint  | not null
 file_name       | text    | not null
 file_parent_dir | text    | not null
 file_perm       | text    | not null
 file_size       | bigint  | not null
 file_type       | text    | not null default 'f'::text
 file_user_name  | text    | not null
 file_user_uid   | integer | not null
Indexes:
    "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type)
    "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type)

tle-bu=> \d file_set_1
          Table "public.file_set_1"
    Column     |  Type   |        Modifiers
---------------+---------+----------------------------
 fs_backup     | boolean | not null default true
 fs_display    | boolean | not null default false
 fs_name       | text    | not null
 fs_parent_dir | text    | not null
 fs_restore    | boolean | not null default false
 fs_type       | text    | not null default 'f'::text
Indexes:
    "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type)

1. WHERE ARE YOUR PRIMARY KEYS??? 2. Why do you have two identical indexes on file_info_1? 3. WHERE ARE YOUR PRIMARY KEYS??? 4. Am I right in thinking that always file_name==fs_name (i.e. they represent the same piece of information), and if so, why are you storing it twice? Same for _parent_dir too. 5. file_type/fs_type are being held as unbounded text?
Not an index into some lookup table or a varchar(N)? Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
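A minimal sketch of the kind of changes points 1-5 suggest (table and column names taken from the schemas above; the constraint and lookup-table names are hypothetical, and the right key columns depend on what's actually unique in the data):

```sql
-- Hypothetical sketch only: give each table a primary key, drop the
-- duplicate index, and move the free-text type column into a lookup table.
ALTER TABLE file_info_1
    ADD CONSTRAINT file_info_1_pkey
    PRIMARY KEY (file_parent_dir, file_name, file_type);

ALTER TABLE file_set_1
    ADD CONSTRAINT file_set_1_pkey
    PRIMARY KEY (fs_parent_dir, fs_name, fs_type);

-- file_info_1_display_idx is identical to file_info_1_search_idx
DROP INDEX file_info_1_display_idx;

-- One row per distinct type ('f', 'd', ...) instead of unbounded text
-- repeated on every row.
CREATE TABLE file_types (
    type_code varchar(1) PRIMARY KEY,
    type_desc text NOT NULL
);
```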
Re: [PERFORM] sum of all values
Madison Kelly wrote: Richard Huxton wrote: Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems. This is a program for general consumption (hopefully... eventually...) so I want to leave the psql config alone. Once I am happier with the program I will try different tuning options and write a FAQ, though I expect 9 out of 10 users won't read it. PostgreSQL is not Firefox, and you can't expect it to work efficiently without doing at least some configuration. The settings to support 100 simultaneous connections on a dual-Opteron with 8GB RAM are not the same as on a single-user laptop. Take half an hour to read through the performance-tuning guide here: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough.
tle-bu=> EXPLAIN ANALYZE SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t';

QUERY PLAN
Aggregate  (cost=2202.54..2202.54 rows=1 width=8) (actual time=5078.744..5078.748 rows=1 loops=1)
  ->  Merge Join  (cost=724.94..2202.51 rows=11 width=8) (actual time=3281.677..4969.719 rows=12828 loops=1)
        Merge Cond: (("outer".file_parent_dir = "inner".fs_parent_dir) AND ("outer".file_name = "inner".fs_name) AND ("outer".file_type = "inner".fs_type))
        ->  Index Scan using file_info_1_search_idx on file_info_1 a  (cost=0.00..1317.11 rows=12828 width=104) (actual time=0.042..116.825 rows=12828 loops=1)
        ->  Sort  (cost=724.94..740.97 rows=6414 width=96) (actual time=3281.516..3350.640 rows=12828 loops=1)
              Sort Key: b.fs_parent_dir, b.fs_name, b.fs_type
              ->  Seq Scan on file_set_1 b  (cost=0.00..319.35 rows=6414 width=96) (actual time=0.029..129.129 rows=12828 loops=1)
                    Filter: (fs_backup = true)
Total runtime: 5080.729 ms

Well, it's slow, but that's probably your settings. Run VACUUM ANALYSE on the tables though; it looks like you've got default statistics (it's expecting exactly half the fs_backup values to be true - 6414 out of 12828).
Here are the schemas, in case they help: [snip schemas and points 1-5 from previous message] Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd This is where I have to admit my novice level of knowledge. Until now I have been more concerned with making it work. It is only now that I have finished (more or less) the program that I have started going back and trying to find ways to speed it up. I have not used postgres (or perl or anything) before this program. I hope my questions aren't too basic. ^.^; There's a rule of thumb about throwing the first version of anything
Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql
[EMAIL PROTECTED] wrote: I would like to upgrade both OS kernel and PGsql version, so in my opinion the best way to handle it is to *backup* the data in .tar Just remember if you're going from 7.3.2 to 7.4.x or 8.0 then you'll need to use pg_dump, not just tar up the directories. If you do use tar, remember to back up *all* the directories. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
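A hedged sketch of the dump-and-reload path mentioned above (the file name is hypothetical; a real upgrade needs the new server initdb'd first, and ideally the *new* version's pg_dump run against the old server):

```shell
# Sketch only: dump everything (roles + all databases) as plain SQL...
pg_dumpall > full_cluster.sql

# ...install the new PostgreSQL version and initdb a fresh data
# directory, then reload into the new server:
psql -f full_cluster.sql postgres
```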
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Andrei Bintintan wrote: Hi to all, I have the following 2 examples. Now, regarding the offset, if it is small (10) or big, what is the impact on the performance of the query?? I noticed that if I return more data (columns) or if I make more joins then the query runs even slower if the OFFSET is bigger. How can I somehow improve the performance on this? There's really only one way to do an offset of 1000 and that's to fetch 1000 rows and then some and discard the first 1000. If you're using this to provide pages of results, could you use a cursor? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Andrei Bintintan wrote: If you're using this to provide pages of results, could you use a cursor? What do you mean by that? Cursor? Yes I'm using this to provide pages, but if I jump to the last pages it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't work if you have a web-app making repeated connections. If you've got a web-application then you'll probably want to insert the results into a cache table for later use. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Alex Turner wrote: I am also very interested in this very question. Is there any way to declare a persistent cursor that remains open between pg sessions? Not sure how this would work. What do you do with multiple connections? Only one can access the cursor, so which should it be? This would be better than a temp table because you would not have to do the initial select and insert into a fresh table and incur those IO costs, which are often very heavy, and the reason why one would want to use a cursor. I'm pretty sure two things mean there's less difference than you might expect: 1. Temp tables don't fsync 2. A cursor will spill to disk beyond a certain size -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Ron Mayer wrote: Richard Huxton wrote: If you've got a web-application then you'll probably want to insert the results into a cache table for later use. If I have quite a bit of activity like this (people selecting 1 out of a few million rows and paging through them in a web browser), would it be good to have a single table with a userid column shared by all users, or a separate table for each user that can be truncated/dropped? I started out with one table; but with people doing 10s of thousands of inserts and deletes per session, I had a pretty hard time figuring out a reasonable vacuum strategy. As often as you can, and make sure your config allocates enough free-space-map for them. Unless, of course, you end up I/O saturated. Eventually I started doing a whole bunch of create table tmp_[userid] tables, where [userid] is a userid; and a script to drop these tables - but that's quite ugly in a different way. With 8.0 I guess I'll try the single table again - perhaps what I want may be to always have an I/O-throttled vacuum running... hmm. Well, there have been some tweaks, but I don't know if they'll help in this case. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
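The shared-cache-table approach discussed above might look something like this (all table, column, and value names are hypothetical; the point is that one table keyed by user needs only one vacuum schedule):

```sql
-- Hypothetical sketch: one shared cache table keyed by user, cleaned
-- up per session so a single VACUUM strategy covers everyone.
CREATE TABLE result_cache (
    user_id integer NOT NULL,
    row_num integer NOT NULL,
    payload text,
    PRIMARY KEY (user_id, row_num)
);

-- Fill the cache once per search (42 = hypothetical userid)...
INSERT INTO result_cache (user_id, row_num, payload)
SELECT 42, ..., ... FROM ...;

-- ...page cheaply out of it via the primary key...
SELECT payload FROM result_cache
 WHERE user_id = 42 AND row_num BETWEEN 101 AND 150;

-- ...and clear it when the session ends, then VACUUM periodically.
DELETE FROM result_cache WHERE user_id = 42;
```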
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Greg Stark wrote: Andrei Bintintan [EMAIL PROTECTED] writes: If you're using this to provide pages of results, could you use a cursor? What do you mean by that? Cursor? Yes I'm using this to provide pages, but if I jump to the last pages it goes very slow. The best way to do pages is not to use offset or cursors but to use an index. This only works if you can enumerate all the sort orders the application might be using and can have an index on each of them. To do this the query would look something like: SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50 Then you take note of the last value used on a given page and if the user selects next you pass that as the starting point for the next page. Greg's is the most efficient, but you need to make sure you have a suitable key available in the output of your select. Also, since you are repeating the query you could get different results as people insert/delete rows. This might or might not be what you want. A similar solution is to partition by date/alphabet or similar, then page those results. That can reduce your resultset to a manageable size. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
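Greg's approach, sketched concretely (table, column, and the value 1234 are hypothetical; col must be indexed and ideally unique so no rows are skipped at page boundaries):

```sql
-- Hypothetical sketch of keyset paging: no OFFSET, just an index scan.
-- First page:
SELECT * FROM tab ORDER BY col LIMIT 50;

-- Suppose the last row on that page had col = 1234. Next page:
SELECT * FROM tab WHERE col > 1234 ORDER BY col LIMIT 50;

-- Each page is a bounded index-range scan, so page 1000 costs roughly
-- the same as page 1 - unlike OFFSET, which reads and discards rows.
```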
Re: [PERFORM] Profiling a function...
Robert Sanford wrote: How do I profile a user-defined function so that I know which parts of the function are the ones that are taking the biggest chunk of time? When I run EXPLAIN on the queries within the function none of them show up as onerous burdens to the performance. But when they are all operating together within the function and within the functional logic they become really expensive. Obviously I've made a mistake somewhere but it isn't obvious (otherwise it would be fixed already) and I'd prefer having a profile report telling me what is taking so long rather than guessing and possibly making things worse. So is there any way to get a line-by-line timing profile of a user-defined function? Not really. What you can do is simulate the queries in functions by using PREPARE. You're probably seeing a difference because when PG plans the queries for functions/prepared queries it doesn't know the actual values. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
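The PREPARE simulation suggested above might look like this (table and column names are hypothetical; the point is that the plan is built against a placeholder, just as it is inside a function, rather than against the actual value):

```sql
-- Hedged sketch: PREPARE plans with a $1 placeholder, reproducing the
-- "planned without knowing the value" behaviour of queries in functions.
PREPARE q(integer) AS
    SELECT count(*) FROM orders WHERE customer_id = $1;

EXPLAIN ANALYZE EXECUTE q(42);   -- shows the generic plan's real cost
DEALLOCATE q;
```

Comparing that plan against a plain `EXPLAIN ANALYZE SELECT ... WHERE customer_id = 42` shows how much the missing value information costs.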
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out whether a query I have is going to be slow when I have more information in my tables. Both tables involved will likely have ~500K rows within a year or so. Specifically I can't tell if I'm causing myself future problems with the subquery, and should maybe re-write the query to use a join. The reason I went with the subquery is that I don't know whether a row in Assignments will have a corresponding row in Assignment_Settings. The query is: SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; You could always use a LEFT JOIN instead, like you say. I'd personally be tempted to select distinct user_ids then join, but it depends on how many of each. You're not going to know for sure whether you'll have problems without testing. Generate 500k rows of plausible-looking test-data and give it a try. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
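The LEFT JOIN rewrite mentioned above might be sketched like this (same tables and columns as the original query; untested, and note it also matches rows where a 'Status' setting exists but its Value is NULL, which is what the subquery form does too):

```sql
-- Hedged sketch: LEFT JOIN form of the subquery above.
SELECT A.User_ID
  FROM Assignments A
  LEFT JOIN Assignment_Settings S
    ON S.Assignment_ID = A.Assignment_ID
   AND S.Setting = 'Status'
 WHERE A.User_ID IS NOT NULL
   AND S.Value IS NULL
 GROUP BY A.User_ID;
```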
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux Oh, and move to the latest in the 7.4 series too. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Very important choice
Lago, Bruno Almeida do wrote: Hello my friends, I'd like to know (based on your experience and technical details) which OS is recommended for running PostgreSQL keeping in mind 3 indicators: 1 - Performance (SO, Network and IO) 2 - SO Stability 3 - File System Integrity The short answer is almost certainly whichever OS you are most familiar with. If you have a problem, you don't want to be learning new details about your OS while fixing it. That rules out FreeBSD for now. What hardware you want to use will affect performance and choice of OS. You'll need to decide what hardware you're looking to use. As far as file-systems are concerned, ext3 seems to be the slowest, and the general feeling seems to be that XFS is perhaps the fastest. In terms of reliability, avoid cutting-edge releases of any file-system - let others test them for you. One thing to consider is how long it takes to recover from a crash - you can run PostgreSQL on ext2, but checking a large disk can take hours after a crash. That's the real benefit of journalling for PG - speed of recovery. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
Din Adrian wrote: Hello, I have a little time and I decided to improve the performance of my server(s). I have found on google many 'tips' in tuning linux kernel and postgresql database ... but I can't decide wich 'how-to' is better ... :( So the question is: where to find a 'easy' and complete documentation about this tweaks ... ? Try the performance tuning article linked from this page: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
Please CC the mailing list as well as replying to me, so that others can help too. Din Adrian wrote: yes I have read this as well ... One question about this option: fsync = true / false a) I have RAID and UPS - is it safe to turn this off ... ('But be very aware that any unexpected database shutdown will force you to restore the database from your last backup.' - from my last backup if the server goes down ??? why ? just at 'any unexpected database shutdown' ? !!!) Because fsync=true flushes transaction details to disk (the Write Ahead Log). That way if (say) the power supply in your server fails you can check the WAL and compare it to the main database files to make sure everything is in a known state. b) the docs say that after 7.2 setting this to false doesn't turn off the WAL ...!? which option does? The docs don't say that, as far as I can see. It doesn't make sense to turn off the WAL. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
I'll repeat myself: Please CC the mailing list as well as replying to me, so that others can help too. Din Adrian wrote: On Thu, 03 Feb 2005 13:56:50 +, Richard Huxton dev@archonet.com wrote: Please CC the mailing list as well as replying to me, so that others can help too. b) the docs say that after 7.2 setting this to false doesn't turn off the WAL ...!? which option does? The docs don't say that, as far as I can see. It doesn't make sense to turn off the WAL. hmm this is the doc about ... 'NOTE: Since 7.2, turning fsync off does NOT stop WAL. It does stop checkpointing, however. This is a change in the notes that follow. Turn WAL off (fsync=false) only for a read-only database or one where the database can be regenerated from external software. While RAID plus UPSes can do a lot to protect your data, turning off fsync means that you will be restoring from backup in the event of hardware or power failure.' I don't know what this is, and you don't give a URL, but it DOES NOT appear to be in the manuals. You should probably read the sections of the manuals regarding run-time configuration and write ahead logs. The manuals are quite extensive, are available online at http://www.postgresql.org/ and also in most distributions. This is probably a good place to start: http://www.postgresql.org/docs/8.0/interactive/runtime-config.html#RUNTIME-CONFIG-WAL If you turn it off you should have more speed ... !!!??? Basically, as I said in my last email - fsync=true makes sure transaction details are safely stored on disk. If you turn this off, the database doesn't have to wait for the data to physically be written to the disk. But, if power fails then data might be in OS or disk cache and so lost when you restart the machine. Please CC the mailing list if you reply to this message. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to interpret this explain analyse?
Joost Kraaijeveld wrote: Hi all, A question on how to read and interpret the explain analyse statement (and what to do). I have a query SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; Both tables have a btree index on klantnummer (int4, the column the join is on). I have vacuumed and analyzed both tables. The explain analyse is: Indexes are not necessarily useful here since you're fetching all rows in A and presumably much of B.

Sort
  Hash Left Join
    Seq Scan on orders a
    Hash
      Seq Scan on klt_alg b

I've trimmed the above from your explain output. It's sequentially scanning b and using a hash to join to a before sorting the results. Questions: - Hash Left Join (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1) 0. What exactly are the numbers in cost=41557.43..110069.51 (I assume for the other questions that 41557.43 is the estimated ms the query will take; what are the others)? The cost numbers represent effort rather than time. They're only really useful in that you can compare one part of the query to another. There are two numbers because the first shows startup cost, the second total cost. So - the outer parts of the query will have increasing startup values since the inner parts will have to do their work first. The actual time is measured in ms, but remember to multiply it by the loops value. Oh, and actually measuring the time slows the query down too. 1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the actual cost. Is the difference acceptable? 2. If not, what can I do about it? The key thing to look for here is the number of rows. If PG expects say 100 rows but there are instead 10,000 then it may choose the wrong plan. In this case the estimate is 1,100,836 and the actual is 1,104,380 - very close.
-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help with tuning this query
Ken Egervari wrote: I've tried to use Dan Tow's tuning method Who? What? and created all the right indexes from his diagramming method, but the query still performs quite slowly, both inside the application and just inside pgadmin III. Can anyone be kind enough to help me tune it so that it performs better in postgres? I don't think it's using the right indexes, or maybe postgres needs special treatment. I've converted the below query to SQL from a Hibernate query, so the syntax is probably not perfect but its semantics are exactly the same. I've done so by looking at the source code, but I can't run it to get the exact SQL since I don't have the database on my home machine. Hibernate is a java thing, no? It'd be helpful to have the actual SQL the hibernate class (or whatever) generates. One of the problems with SQL is that you can have multiple ways to get the same results and it's not always possible for the planner to convert from one to the other. Anyway, people will want to see EXPLAIN ANALYSE for the query in question. Obviously, make sure you've vacuumed and analysed the tables in question recently. Oh, and make sure you say what version of PG you're running.

select s.* from shipment s
inner join carrier_code cc on s.carrier_code_id = cc.id
inner join carrier c on cc.carrier_id = c.id
inner join carrier_to_person ctp on ctp.carrier_id = c.id
inner join person p on p.id = ctp.person_id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where p.id = :personId and s.is_purged = false and rc.number = '9'
and cs is not null and cs.date >= current_date - 31
order by cs.date desc

1. Why are you quoting the 9 when checking against rc.number? 2. The cs is not null doesn't appear to be qualified - which table?
Just assume I have no indexes for the moment because while some of the indexes I made make it work faster, it's still around 250 milliseconds and under heavy load, the query performs very badly (6-7 seconds). 3. If you rewrite the current_date - 31 as a suitable ago(31) function then you can use an index on cs.date 4. Are you familiar with the configuration setting join_collapse_limit? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
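Point 3 might be sketched like this (the name ago() comes from the suggestion above and is not a built-in; marking the function STABLE lets the planner treat its result as fixed for the duration of the query, so an ordinary btree index on the date column can be used):

```sql
-- Hedged sketch: a STABLE wrapper so "cs.date >= ago(31)" is indexable.
CREATE FUNCTION ago(integer) RETURNS date AS '
    SELECT current_date - $1;
' LANGUAGE SQL STABLE;

-- Supporting index (name hypothetical):
CREATE INDEX shipment_status_date_idx ON shipment_status (date);

-- The predicate then becomes:
--   ... AND cs.date >= ago(31)
```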
Re: [PERFORM] bad plan
Gaetano Mendola wrote: running a 7.4.5 engine, I'm facing this bad plan:

empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-# FROM v_sc_user_request
empdb-# WHERE
empdb-#   login = 'babinow1'
empdb-# LIMIT 10 ;

->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
      ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
            Merge Cond: ("outer".id_program = "inner".id_program)

The problem to address is in this subquery. That's a total of 31 x (1668.754 - 25.328) = 50 seconds (about). Since your query is so simple, I'm guessing v_sc_user_request is a view. Can you provide the definition? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] bad plan
Tom Lane wrote: Putting ORDER BYs in views that you intend to use as components of other views is a bad practice from a performance perspective... There are also a lot of views involved here for very few output columns. Tom - is the planner smart enough to optimise out unneeded columns from a SELECT * view if it's part of a join/subquery and you only use one or two columns? Secondly, in the original plan we have:

->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31 loops=1)

Now, we've got 31 rows instead of 1 here. The one side of the join ends up as:

->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
      ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)

Would I be right in thinking the planner doesn't materialise the subquery because it's expecting 1 loop, not 31? If there were 1 row the plan would seem OK to me. Is there any mileage in the idea of a lazy planner that keeps some alternative paths around in case they're needed? Or a reactive one that can re-plan nodes when assumptions turn out to be wrong? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] What's better: Raid 0 or disk for seperate pg_xlog
Karim Nassar wrote: Thanks to all for the tips. On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote: How critical is your data? How update heavy versus read heavy, etc are you? Large, relatively infrequent uploads, with frequent reads. The application is a web front-end to scientific research data. The scientists have their own copy of the data, so if something went really bad, we could probably get them to upload again. If you have very few updates and your reads aren't mostly from RAM you could be better off with simply mirroring (assuming that gains you read bandwidth). Failing that, use the tablespace feature to balance your read load as far as you can. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
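The tablespace suggestion (an 8.0 feature) might be sketched as follows; the mount point, tablespace, table, and index names are all hypothetical:

```sql
-- Hypothetical sketch: spread read load by placing heavily-read
-- tables and indexes on a separate spindle via an 8.0 tablespace.
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

ALTER TABLE measurements SET TABLESPACE disk2;
CREATE INDEX measurements_taken_idx ON measurements (taken_at)
    TABLESPACE disk2;
```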