Re: [PERFORM] Various performance questions

2003-10-27 Thread Shridhar Daithankar
Dror Matalon wrote:

On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
Most of the time involves:

a) Reading each page of the table, and
b) Figuring out which records on those pages are still live.


The table has been VACUUM ANALYZED so that there are no dead records.
It's still not clear why select count() would be slower than select with
a where clause.
Do a VACUUM FULL VERBOSE and then everything should be within a small range of
each other.
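
For concreteness, against the items table that would be something along the
lines of (just a sketch; the ANALYZE refreshes the planner statistics at the
same time):

  VACUUM FULL VERBOSE ANALYZE items;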

Also, in the where clause, does explicit typecasting help?

Like 'where channel < 5000::int2;'

 HTH

 Shridhar

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote:
 Dror Matalon wrote:
 
 On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
 Most of the time involves:
 
 a) Reading each page of the table, and
 b) Figuring out which records on those pages are still live.
 
 
 The table has been VACUUM ANALYZED so that there are no dead records.
 It's still not clear why select count() would be slower than select with
 a where clause.
 
 Do a VACUUM FULL VERBOSE and then everything should be within a small range 
 of each other.
 

I did vacuum full verbose and the results are the same as before, 55
seconds for count(*) and 26 seconds for count(*) where channel < 5000.

 Also, in the where clause, does explicit typecasting help?
 
 Like 'where channel < 5000::int2;'

It makes no difference.

 
  HTH
 
  Shridhar
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] explicit casting required for index use

2003-10-27 Thread Neil Conway
On Sat, 2003-10-25 at 13:49, Reece Hart wrote:
 Having to explicitly cast criterion is very non-intuitive. Moreover,
 it seems quite straightforward that PostgreSQL might incorporate casts

This is a well-known issue with the query optimizer -- search the
mailing list archives for lots more information. The executive summary
is that this is NOT a trivial issue to fix, and it hasn't been fixed in
7.4, but there is some speculation on how to fix it at some point in the
future.
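
In the meantime the usual workaround is to make the literal's type match the
column's type. A quick sketch with a hypothetical int8 column (the table and
names are made up, but the behaviour is the well-known pre-7.5 one):

  CREATE TABLE big (id int8 PRIMARY KEY, payload text);
  -- 42 is taken as int4, so the int8 index tends to be ignored:
  EXPLAIN SELECT * FROM big WHERE id = 42;
  -- casting the constant (or quoting it, so it starts life untyped)
  -- lets the index be considered:
  EXPLAIN SELECT * FROM big WHERE id = 42::int8;
  EXPLAIN SELECT * FROM big WHERE id = '42';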

-Neil



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance Concern

2003-10-27 Thread Manfred Koizar
On Sun, 26 Oct 2003 00:13:36 +0300, Hannu Krosing [EMAIL PROTECTED] wrote:
UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
       SELECT baz_key
         FROM baz innerbaz
        WHERE customer_id IS NULL
          AND innerbaz.baz_key = baz.baz_key
        LIMIT 1000 );

AFAICS this is not what the OP intended.  It is equivalent to 

UPDATE baz
   SET customer_id = '1234'
 WHERE customer_id IS NULL;

because the subselect is now correlated to the outer query and is
evaluated for each row of the outer query which makes the LIMIT clause
ineffective.
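
If the intent was to update only 1000 rows per statement, the subselect has to
stay uncorrelated, roughly like this (sketch only):

UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
       SELECT baz_key
         FROM baz
        WHERE customer_id IS NULL
        LIMIT 1000 );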

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
 I was answering an earlier response that suggested that maybe the actual
 counting took time so it would take quite a bit longer when there are
 more rows to count.

Well, if a where clause allows the system to use an index to search
for the subset of elements, that would reduce the number of pages that
have to be examined, thereby diminishing the amount of work.

Why don't you report what EXPLAIN ANALYZE returns as output for the
query with WHERE clause?  That would allow us to get more of an idea
of what is going on...
-- 
(format nil [EMAIL PROTECTED] cbbrowne acm.org)
http://www3.sympatico.ca/cbbrowne/spiritual.html
When  replying, it  is  often possible  to cleverly edit  the original
message in such a way  as to subtly alter  its meaning or tone to your
advantage while  appearing that you are  taking pains  to preserve the
author's intent.   As a   bonus,   it will   seem that your   superior
intellect is cutting through all the excess verbiage to the very heart
of the matter.  -- from the Symbolics Guidelines for Sending Mail

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Christopher Browne [EMAIL PROTECTED] writes:

 In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
  I was answering an earlier response that suggested that maybe the actual
  counting took time so it would take quite a bit longer when there are
  more rows to count.

That was my theory. I guess it's wrong. There is other work involved in
processing a record, but I'm surprised it's as large as the work to actually
pull the record from the kernel and check whether it's visible.

 Well, if a where clause allows the system to use an index to search
 for the subset of elements, that would reduce the number of pages that
 have to be examined, thereby diminishing the amount of work.

It's not. Therein lies the mystery.

 Why don't you report what EXPLAIN ANALYZE returns as output for the
 query with WHERE clause?  That would allow us to get more of an idea
 of what is going on...

He did, right at the start of the thread.

For a 1 million record table, without a where clause he's seeing

 select 1 from tab
 select count(*) from tab

being comparable with only a slight delay for the count(*) whereas

 select 1 from tab where c < 1000
 select count(*) from tab where c < 1000

are much faster even though they still use a sequential scan.

I'm puzzled why the where clause speeds things up as much as it does.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Various performance questions

2003-10-27 Thread Tarhon-Onu Victor
On Sun, 26 Oct 2003, Dror Matalon wrote:

 Here's the structure of the items table
[snip]
  pubdate   | timestamp with time zone |
 Indexes:
 item_channel_link btree (channel, link)
 item_created btree (dtstamp)
 item_signature btree (signature)
 items_channel_article btree (channel, articlenumber)
 items_channel_tstamp btree (channel, dtstamp)
 
 
 5. Any other comments/suggestions on the above setup.

	Try "set enable_seqscan = off; set enable_indexscan = on;" to 
force the planner to use one of the indexes. Analyze the queries from 
your application, see which columns are most used in WHERE clauses, 
and recreate the indexes accordingly. "select count(*) from items where 
channel < 5000;" will never use any of the current indexes because none 
matches your WHERE clause (channel appears now only in multicolumn indexes).
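
Something along these lines, just to see whether an index plan gets picked at
all (a sketch; the single-column index name is made up, and it is only worth
keeping if it actually beats the seqscan):

 SET enable_seqscan = off;
 SET enable_indexscan = on;
 EXPLAIN ANALYZE SELECT count(*) FROM items WHERE channel < 5000;
 SET enable_seqscan = on;

 -- if none of the existing indexes gets used for this predicate:
 CREATE INDEX items_channel ON items (channel);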

-- 
Any views or opinions presented within this e-mail are solely those of
the author and do not necessarily represent those of any company, unless
otherwise expressly stated.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Linux Filesystem Shootout

2003-10-27 Thread Christopher Kings-Lynne
http://fsbench.netnation.com/

Seems to answer a few of the questions about which might be the best 
filesystem...

Chris



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Mon, 2003-10-27 at 10:15, Tarhon-Onu Victor wrote:
 select count(*) from items where channel < 
 5000; will never use any of the current indexes because none matches 
 your WHERE clause (channel appears now only in multicolumn indexes).

No -- a multi-column index can be used to answer queries on a prefix of
the index's column list. So an index on (channel, xyz) can be used to
answer queries on (just) channel.
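
For example (a sketch, reusing the existing items_channel_article index, which
is on (channel, articlenumber)):

 -- a predicate on just channel can still use the multicolumn index;
 -- disabling seqscans simply makes the planner show that plan:
 SET enable_seqscan = off;
 EXPLAIN SELECT count(*) FROM items WHERE channel < 5000;
 SET enable_seqscan = on;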

-Neil



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
 What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an
 int8 to store its count so it's not limited to 4 billion records.
 Unfortunately int8 is somewhat inefficient as it has to be dynamically
 allocated repeatedly.

Uh, what? Why would an int8 need to be dynamically allocated
repeatedly?

-Neil



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] My own performance/tuning qa

2003-10-27 Thread Vivek Khera
 AL == Allen Landsidel [EMAIL PROTECTED] writes:

 you need to bump some header file constant and rebuild the kernel.  it
 also increases the granularity of how the buffer cache is used, so I'm
 not sure how it affects overall system.  nothing like an experiment...

AL So far I've found a whole lot of questions about this, but nothing
AL about the constant.  The sysctl (vfs.hibufspace I believe is the one)
AL is read only, although I should be able to work around that via
AL /boot/loader.conf if I can't find the kernel option.

Here's what I have in my personal archive.  I have not tried it yet.
BKVASIZE is in a system header file, so is not a regular tunable for
a kernel.  That is, you must muck with the source files to change it,
which makes for maintenance headaches.



From: Sean Chittenden [EMAIL PROTECTED]
Subject: Re: go for a script! / ex:  PostgreSQL vs. MySQL
Newsgroups: ml.postgres.performance
To: Vivek Khera [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Date: Mon, 13 Oct 2003 12:04:46 -0700
Organization: none

  echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
  
  I've used it for my dedicated servers.  Is this calculation correct?
 
 SC> Yes, or it's real close at least.  vfs.hibufspace is the amount of
 SC> kernel space that's used for caching IO operations (minus the
 
 I'm just curious if anyone has a tip to increase the amount of
 memory FreeBSD will use for the cache?

Recompile your kernel with BKVASIZE set to 4 times its current value
and double your nbuf size.  According to Bruce Evans:

Actually there is a way: the vfs_maxbufspace gives the amount of
space reserved for buffer kva (= nbuf * BKVASIZE).  nbuf is easy to
recover from this, and the buffer kva space may be what is wanted
anyway.
[snip]
I've never found setting nbuf useful, however.  I want most
parametrized sizes including nbuf to scale with resource sizes, and
it's only with RAM sizes of similar sizes to the total virtual address
size that it's hard to get things to fit.  I haven't hit this problem
myself since my largest machine has only 1GB.  I use an nbuf of
something like twice the default one, and a BKVASIZE of 4 times the
default.  vfs.maxbufspace ends up at 445MB on the machine with 1GB, so
it is maxed out now.

YMMV.

-sc


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] My own performance/tuning qa

2003-10-27 Thread Neil Conway
On Fri, 2003-10-24 at 20:11, Allen Landsidel wrote:
 However, I do the same thing with the reindex, so I'll definitely be taking 
 it out there, as that one does lock.. although I would think the worst this 
 would do would be making the index unavailable and forcing a seq scan.. 
 is that not the case?

No, it exclusively locks the table. It has been mentioned before that we
should probably be able to fall back to a seqscan while the REINDEX is
going on, but that's not currently done.
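
The usual lower-impact workaround in the meantime is to build a replacement
index next to the old one and then drop the old one: reads keep working while
the new index builds, though writes are still blocked during the build, and it
doesn't help for indexes backing PRIMARY KEY/UNIQUE constraints. A sketch with
made-up names:

 CREATE INDEX foo_col_idx2 ON foo (col);
 DROP INDEX foo_col_idx;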

-Neil



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Various performance questions

2003-10-27 Thread Vivek Khera
 DM == Dror Matalon [EMAIL PROTECTED] writes:

DM effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192

DM 1. While it seems to work correctly, I'm unclear on why this number is
DM correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
DM seems like the number should be more like 1 - 1.5 Gigs.

Nope, that's correct...


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] vacuum locking

2003-10-27 Thread Rob Nagler
Greg Stark writes:
 I don't understand why you would expect overwriting to win here. 
 What types of updates do you do on these tables? 

These are statistics that we're adjusting.  I think that's pretty
normal stuff.  The DSS component is the avg() of these numbers on
particular groups.  The groups are related to foreign keys to
customers and other things.

 Normally I found using update on such a table was too awkward to
 contemplate so I just delete all the relation records that I'm
 replacing for the key I'm working with and insert new ones. This
 always works out to be cleaner code. In fact I usually leave such
 tables with no UPDATE grants on them.

In accounting apps, we do this, too.  It's awkward with all the
relationships to update all the records in the right order.  But
Oracle wins on delete/insert, too, because it reuses the tuples it
already has in memory, and it can reuse the same foreign key index
pages, too, since the values are usually the same.
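
The delete-and-reinsert pattern Greg describes looks roughly like this (a
sketch with made-up table names):

 BEGIN;
 DELETE FROM daily_stats WHERE customer_id = 42;
 INSERT INTO daily_stats (customer_id, metric, value)
      SELECT customer_id, metric, value
        FROM staging_stats
       WHERE customer_id = 42;
 COMMIT;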

The difference between Oracle and postgres seems to be optimism.
postgres assumes the transaction will fail and/or that a transaction
will modify lots of data that is used by other queries going on in
parallel.  Oracle assumes that the transaction is going to be
committed, and it might as well make the changes in place.

 In that situation I would have actually expected Postgres to do as well as or
 better than Oracle since that makes them both functionally
 equivalent.

I'll find out soon enough. :-)

Rob



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote:
  DM == Dror Matalon [EMAIL PROTECTED] writes:
 
 DM effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192
 
 DM 1. While it seems to work correctly, I'm unclear on why this number is
 DM correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
 DM seems like the number should be more like 1 - 1.5 Gigs.
 
 Nope, that's correct...

I know it's correct. I was asking why it's correct.

 
 
 -- 
 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Vivek Khera, Ph.D.Khera Communications, Inc.
 Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
 AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote:
 In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
  I was answering an earlier response that suggested that maybe the actual
  counting took time so it would take quite a bit longer when there are
  more rows to count.
 
 Well, if a where clause allows the system to use an index to search
 for the subset of elements, that would reduce the number of pages that
 have to be examined, thereby diminishing the amount of work.
 
 Why don't you report what EXPLAIN ANALYZE returns as output for the
 query with WHERE clause?  That would allow us to get more of an idea
 of what is going on...


Here it is once again, and I've added another data point, channel < 1000,
which takes even less time than channel < 5000. It almost seems like the
optimizer knows that it can skip certain rows (rows=4910762 vs rows=1505605).
But how can it do that without using an index or actually looking at each
row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245044.53..245044.53 rows=1 width=0) (actual time=55806.893..55806.897 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..232767.62 rows=4910762 width=0) (actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 5000;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=248808.54..248808.54 rows=1 width=0) (actual time=26071.264..26071.269 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=1505605 width=0) (actual time=0.161..17623.033 rows=1632057 loops=1)
         Filter: (channel < 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 1000;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245429.74..245429.74 rows=1 width=0) (actual time=10225.272..10225.276 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=154085 width=0) (actual time=7.633..10083.246 rows=25687 loops=1)
         Filter: (channel < 1000)
 Total runtime: 10225.373 ms
(4 rows)


 -- 
 (format nil [EMAIL PROTECTED] cbbrowne acm.org)
 http://www3.sympatico.ca/cbbrowne/spiritual.html
 When  replying, it  is  often possible  to cleverly edit  the original
 message in such a way  as to subtly alter  its meaning or tone to your
 advantage while  appearing that you are  taking pains  to preserve the
 author's intent.   As a   bonus,   it will   seem that your   superior
 intellect is cutting through all the excess verbiage to the very heart
 of the matter.  -- from the Symbolics Guidelines for Sending Mail
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] vacuum locking

2003-10-27 Thread Greg Stark
Rob Nagler [EMAIL PROTECTED] writes:

 I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
 set sort_mem in the conf file to 512000, restarted postgres.  Reran
 the simpler query (no name) 3 times, and it was still 27 secs.

Sorry, I don't know how that bubbled up from the depths of my Oracle memory.
In postgres it's just SET

db=> set sort_mem = 512000;
SET

  To convince it to do the right thing you would have to do either:
  
  SELECT a, t2.name 
FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1 
JOIN t2 USING (f2)
  
  Or use a subquery:
  
  SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
FROM t1
   GROUP BY f2 
 
 This doesn't solve the problem.  It's the GROUP BY that is doing the
 wrong thing.  It's grouping, then aggregating.

But at least in the form above it will consider using an index on f2, and it
will consider using indexes on t1 and t2 to do the join.

It's unlikely to go ahead and use the indexes though because normally sorting
is faster than using the index when scanning the whole table. You should
compare the explain analyze results for the original query and these two.
And check the results with set enable_seqscan = off as well. 

I suspect you'll find your original query uses sequential scans even when
they're disabled because it has no alternative. With the two above it can use
indexes but I suspect you'll find they actually take longer than the
sequential scan and sort -- especially if you have sort_mem set large enough.
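
Roughly this, for the comparison (a sketch using the t1/t2 shapes above; the
"AS a" alias is added so the outer select compiles):

 EXPLAIN ANALYZE
 SELECT a, t2.name
   FROM (SELECT avg(f1) AS a, f2 FROM t1 GROUP BY f2) AS t1
   JOIN t2 USING (f2);

 SET enable_seqscan = off;
 -- rerun the EXPLAIN ANALYZE above and compare the two plans, then:
 SET enable_seqscan = on;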

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Neil Conway [EMAIL PROTECTED] writes:

 On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
  What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an
  int8 to store its count so it's not limited to 4 billion records.
  Unfortunately int8 is somewhat inefficient as it has to be dynamically
  allocated repeatedly.
 
 Uh, what? Why would an int8 need to be dynamically allocated
 repeatedly?

Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
profiling showed that the bulk of the cost in count() went to allocating
int8s. He commented that this could be optimized by having count() and sum()
bypass the regular api. I don't have the original message handy.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Very Poor Insert Performance

2003-10-27 Thread Greg Stark

Damien Dougan [EMAIL PROTECTED] writes:

 Our batch upload is performing a number of stored procedures to insert data on 
 the database. Initially, this results in quite good performance, but rapidly 
 spirals down to approximately 1 per second after some minutes.

It's fairly unlikely anyone will be able to help without you saying what
you're doing. What are these procedures doing? What do the tables look like?
What indexes exist?

At a guess the foreign key relationships you're enforcing don't have indexes
to help them. If they do perhaps postgres isn't using them.
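
Concretely, the thing to check is that both sides of each foreign key are
indexed; the referenced (parent) side usually is via its primary key, but the
referencing (child) side gets no index automatically. A sketch with made-up
names:

 -- child.parent_id REFERENCES parent(id):
 -- inserts into child look up parent.id (covered by the PK index), but
 -- deletes/updates on parent have to scan child.parent_id, so index it too:
 CREATE INDEX child_parent_id_idx ON child (parent_id);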

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
 Neil Conway [EMAIL PROTECTED] writes:
 Uh, what? Why would an int8 need to be dynamically allocated
 repeatedly?
 
 Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
 profiling showed that the bulk of the cost in count() went to allocating
 int8s. He commented that this could be optimized by having count() and sum()
 bypass the regular api. I don't have the original message handy.

 I'm still confused: int64 should be stack-allocated, AFAICS. Tom, do you
 recall what the issue here is?

Greg is correct.  int8 is a pass-by-reference datatype and so every
aggregate state-transition function cycle requires at least one palloc
(to return the function result).  I think in the current state of the
code it requires two pallocs :-(, because we can't trust the transition
function to palloc its result in the right context without palloc'ing
leaked junk in that context, so an extra palloc is needed to copy the
result Datum into a longer-lived context than we call the function in.

There was some speculation a few weeks ago about devising a way to let
performance-critical transition functions avoid the excess palloc's by
working with a specialized API instead of the standard function call
API, but I think it'll have to wait for 7.5.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Mon, 2003-10-27 at 13:52, Tom Lane wrote:
 Greg is correct.  int8 is a pass-by-reference datatype and so every
 aggregate state-transition function cycle requires at least one palloc
 (to return the function result).

Interesting. Is there a reason why int8 is pass-by-reference? (ISTM that
pass-by-value would be sufficient...)

Thanks for the information, Tom  Greg.

-Neil



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Interesting. Is there a reason why int8 is pass-by-reference?

Pass-by-value types have to fit into Datum.

On a 64-bit machine (ie, one where pointers are 64-bits anyway) it would
make sense to convert int8 (and float8 too) into pass-by-value types.
If the machine does not already need Datum to be 8 bytes, though, I
think that widening Datum to 8 bytes just for the benefit of these two
datatypes would be a serious net loss.  Not to mention that it would
just plain break everything on machines with no native 8-byte-int
datatype.

One of the motivations for the version-1 function call protocol was to
allow the pass-by-value-or-by-ref nature of these datatypes to be hidden
from most of the code, with an eye to someday making this a
platform-specific choice.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Greg is correct.  int8 is a pass-by-reference datatype 

Just to keep the conversation on track: the evidence from this particular post
seems to indicate that my theory was wrong and the overhead for count(*) is
_not_ a big time sink. It seems to be at most 10% and usually less. A simple
"select 1 from tab" takes nearly as long.

I'm still puzzled why the times on these are so different when the latter
returns fewer records and both are doing sequential scans:

 select 1 from tab

 select 1 from tab where a < 1000

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark

In fact the number of records seems to be almost irrelevant. A sequential scan
takes almost exactly the same amount of time up until a critical region (for
me around 10 records) at which point it starts going up very quickly.

It's almost as if it's doing some disk i/o, but I'm watching vmstat and don't
see anything. And in any case it would have to read all the same blocks to do
the sequential scan regardless of how many records match, no?

I don't hear the disk seeking either -- though oddly there is some sound
coming from the computer while this is running. It sounds like a high
pitched sound, almost like a floppy drive reading without seeking. Perhaps
there is some i/o happening and linux is lying about it? Perhaps I'm not
hearing seeking because it's reading everything from one track and not
seeking? Very strange.


slo= explain analyze select 1::int4 from test where a  1 ;
  QUERY PLAN   
   
--
 Seq Scan on test  (cost=0.00..1693.00 rows=11 width=0) (actual time=417.468..417.468 
rows=0 loops=1)
   Filter: (a  1)
 Total runtime: 417.503 ms
(3 rows)

Time: 418.181 ms


slo= explain analyze select 1::int4 from test where a  100 ;
 QUERY PLAN
  
-
 Seq Scan on test  (cost=0.00..1693.00 rows=53 width=0) (actual time=0.987..416.224 
rows=50 loops=1)
   Filter: (a  100)
 Total runtime: 416.301 ms
(3 rows)

Time: 417.008 ms


slo= explain analyze select 1::int4 from test where a  1 ;
   QUERY PLAN  
  
-
 Seq Scan on test  (cost=0.00..1693.00 rows=5283 width=0) (actual time=0.812..434.967 
rows=5000 loops=1)
   Filter: (a  1)
 Total runtime: 439.620 ms
(3 rows)

Time: 440.665 ms


slo= explain analyze select 1::int4 from test where a  10 ;
QUERY PLAN 

---
 Seq Scan on test  (cost=0.00..1693.00 rows=50076 width=0) (actual time=0.889..458.623 
rows=5 loops=1)
   Filter: (a  10)
 Total runtime: 491.281 ms
(3 rows)

Time: 491.998 ms


slo= explain analyze select 1::int4 from test where a  100 ;
 QUERY PLAN
 

 Seq Scan on test  (cost=0.00..1693.00 rows=1 width=0) (actual time=0.018..997.421 
rows=715071 loops=1)
   Filter: (a  100)
 Total runtime: 1461.851 ms
(3 rows)

Time: 1462.898 ms


slo= explain analyze select 1::int4 from test where a  1000 ;
 QUERY PLAN
  
-
 Seq Scan on test  (cost=0.00..1693.00 rows=1 width=0) (actual 
time=0.015..1065.456 rows=80 loops=1)
   Filter: (a  1000)
 Total runtime: 1587.481 ms
(3 rows)

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Very Poor Insert Performance

2003-10-27 Thread Tom Lane
Damien Dougan [EMAIL PROTECTED] writes:
 Has anyone any ideas as to what could be causing the spiraling performance?

You really haven't provided any information that would allow anything
but guesses, but I'll guess anyway: poor plans for foreign key checks?
See nearby threads.
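
One quick way to test that guess is to EXPLAIN the lookup the FK trigger has to
perform, using a literal of the same type as the referencing column (made-up
names again):

 EXPLAIN SELECT 1 FROM parent WHERE id = 12345;
 -- if this shows a seqscan, e.g. because parent.id and the referencing
 -- column are different integer types, every insert pays for a full scan;
 -- making the column types match is the usual fix.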

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Guesses on what this NestLoop is for?

2003-10-27 Thread Josh Berkus
Folks,

I'm getting this plan on 7.2.4:

--
explain
select events.event_id, events.event_name, type_name,
COALESCE(cases.case_name || '(' || cases.docket || ')', 
trial_groups.tgroup_name) as event_case,
jw_date_format(events.event_date, events.event_tz, events.duration) as 
show_date
FROM event_types, events
LEFT OUTER JOIN cases ON (events.link_type = 'case' AND events.case_id = 
cases.case_id)
LEFT OUTER JOIN trial_groups ON ( events.link_type = 'tg' AND
events.case_id = trial_groups.tgroup_id )
LEFT OUTER JOIN event_history eh ON events.event_id = eh.event_id
WHERE events.status = 1 or events.status = 11
and events.event_date  '2003-10-27'
and events.etype_id = event_types.etype_id
and (
( events.mod_user = 562 AND eh.event_id IS NULL )
OR
( eh.mod_user = 562
  and not exists (select 1 from event_history eh2
where eh2.event_id = eh.event_id
and eh2.mod_date  eh.mod_date) )
);

Nested Loop  (cost=14949.08..2676373923.96 rows=3666858 width=197)
  ->  Hash Join  (cost=4949.08..8519.60 rows=43568 width=165)
        ->  Hash Join  (cost=4407.81..6615.02 rows=43568 width=149)
              ->  Hash Join  (cost=4403.21..6485.29 rows=43568 width=125)
                    ->  Seq Scan on events  (cost=0.00..1515.70 rows=43568 width=79)
                    ->  Hash  (cost=3108.07..3108.07 rows=115355 width=46)
                          ->  Seq Scan on cases  (cost=0.00..3108.07 rows=115355 width=46)
              ->  Hash  (cost=4.43..4.43 rows=143 width=24)
                    ->  Seq Scan on trial_groups  (cost=0.00..4.43 rows=143 width=24)
        ->  Hash  (cost=524.72..524.72 rows=13240 width=16)
              ->  Seq Scan on event_history eh  (cost=0.00..524.72 rows=13240 width=16)
  ->  Seq Scan on event_types  (cost=0.00..4.32 rows=106 width=32)
  SubPlan
    ->  Seq Scan on event_history eh2  (cost=0.00..557.82 rows=1 width=0)
-

What I can't figure out is what that incredibly expensive nested loop is for.
If I could figure that out, maybe I could query around it.

Unfortunately, I can't EXPLAIN ANALYZE because the present query swamps the 
machine, and it's a production server.  Also it never completes.

And yes, the system is vacuum full analyzed.   Event_history is under-indexed, 
but the other tables are heavily indexed.

Ideas?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Optimizing Performance

2003-10-27 Thread Kamalraj Singh Madhan
Hi,
  I'm having major performance issues with a PostgreSQL 7.3.1 database. Kindly suggest all 
the possible means by which I can optimize the performance of this database. If not 
all, some ideas (even if they are common) are also welcome. No optimisation has been 
done to the default configuration of the installed database. Kindly suggest.  

regards
Kamalraj

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Optimizing Performance

2003-10-27 Thread Shridhar Daithankar
Kamalraj Singh Madhan wrote:

Hi,
  I'm having major performance issues with a PostgreSQL 7.3.1 database. Kindly suggest
all the possible means by which I can optimize the performance of this database. If not
all, some ideas (even if they are common) are also welcome. No optimisation has been
done to the default configuration of the installed database. Kindly suggest.
Check
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
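
The knobs such guides usually start with, purely as an illustration (the right
values depend on your RAM and workload, and shared_buffers can only be changed
in postgresql.conf plus a restart):

 SHOW shared_buffers;              -- in 8 KB buffers; the default is tiny
 SET sort_mem = 8192;              -- per-sort memory, in KB (7.3 name)
 SET effective_cache_size = 25000; -- planner's estimate of OS cache, in 8 KB pages
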
 Shridhar

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html