Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> [EMAIL PROTECTED] (Dror Matalon) wrote:
> > On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
> >> Dror Matalon <[EMAIL PROTECTED]> writes:
> >> 
> >> > explain analyze select count(*) from items where channel < 5000;
> >> > QUERY PLAN
> >> > --
> >> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual 
> >> > time=26224.603..26224.608 rows=1 loops=1)
> >> >->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual 
> >> > time=7.599..17686.869 rows=1632057 loops=1)
> >> >  Filter: (channel < 5000)
> >> >  Total runtime: 26224.703 ms
> >> > 
> >> > 
> >> > How can it do a sequential scan and apply a filter to it in less time
> >> > than the full sequential scan? Is it actually using an index without
> >> > really telling me? 
> >> 
> >> It's not using the index and not telling you. 
> >> 
> >> It's possible the count(*) operator itself is taking some time. Postgres
> >
> > I find it hard to believe that the actual counting would take a
> > significant amount of time.
> 
> Most of the time involves:
> 
>  a) Reading each page of the table, and
>  b) Figuring out which records on those pages are still "live."

The table has been VACUUM ANALYZED so that there are no "dead" records.
It's still not clear why select count() would be slower than select with
a "where" clause.

> 
> What work were you thinking was involved in doing the counting?

I was answering an earlier response that suggested that maybe the actual
counting took time so it would take quite a bit longer when there are
more rows to count.

> 
> >> doesn't have to call it on the rows that don't match the where clause. How
> >> long does "explain analyze select 1 from items" with and without the where
> >> clause take?
> >
> > Same as count(*). Around 55 secs with no where clause, around 25 secs
> > with.
> 
> Good; at least that's consistent...
> -- 
> (format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
> http://www3.sympatico.ca/cbbrowne/postgresql.html
> Signs of a Klingon  Programmer #2: "You  question the worthiness of my
> code? I should kill you where you stand!"
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Shridhar Daithankar
Dror Matalon wrote:

> On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> > Most of the time involves:
> >
> > a) Reading each page of the table, and
> > b) Figuring out which records on those pages are still "live."
>
> The table has been VACUUM ANALYZED so that there are no "dead" records.
> It's still not clear why select count() would be slower than select with
> a "where" clause.

Do a VACUUM FULL VERBOSE and then everything should be within a small
range of each other.

Also, in the WHERE clause, does explicit typecasting help?

Like 'where channel < 5000::int2;'

 HTH

 Shridhar



Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote:
> Dror Matalon wrote:
> 
> >On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> >>Most of the time involves:
> >>
> >>a) Reading each page of the table, and
> >>b) Figuring out which records on those pages are still "live."
> >
> >
> >The table has been VACUUM ANALYZED so that there are no "dead" records.
> >It's still not clear why select count() would be slower than select with
> >a "where" clause.
> 
> Do a vacuum verbose full and then everything should be within small range 
> of each other.
> 

I did VACUUM FULL VERBOSE and the results are the same as before: 55
seconds for count(*) and 26 seconds for count(*) where channel < 5000.

> Also, in the WHERE clause, does explicit typecasting help?
> 
> Like 'where channel < 5000::int2;'

It makes no difference.

> 
>  HTH
> 
>  Shridhar
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com



Re: [PERFORM] explicit casting required for index use

2003-10-27 Thread Neil Conway
On Sat, 2003-10-25 at 13:49, Reece Hart wrote:
> Having to explicitly cast criterion is very non-intuitive. Moreover,
> it seems quite straightforward that PostgreSQL might incorporate casts

This is a well-known issue with the query optimizer -- search the
mailing list archives for lots more information. The executive summary
is that this is NOT a trivial issue to fix, and it hasn't been fixed in
7.4, but there is some speculation on how to fix it at some point in the
future.
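For readers searching the archives later, the usual workaround in the
meantime is to make the literal's type match the indexed column's type
explicitly. A hedged sketch (the table and column names below are made up,
not from Reece's schema):

```sql
-- Suppose "id" is an int8 column with a btree index. A bare integer
-- literal is typed int4, so the 7.3/7.4-era planner won't consider
-- the index:
SELECT * FROM t WHERE id = 42;          -- sequential scan

-- Casting the literal, or quoting it so it starts out untyped,
-- lets the index be used:
SELECT * FROM t WHERE id = 42::int8;    -- index scan possible
SELECT * FROM t WHERE id = '42';        -- likewise
```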

-Neil





Re: [PERFORM] Performance Concern

2003-10-27 Thread Manfred Koizar
On Sun, 26 Oct 2003 00:13:36 +0300, Hannu Krosing <[EMAIL PROTECTED]> wrote:
>UPDATE baz
>   SET customer_id = '1234'
> WHERE baz_key IN (
>SELECT baz_key
>  FROM baz innerbaz
> WHERE customer_id IS NULL
>   and innerbaz.baz_key = baz.baz_key
> LIMIT 1000 );

AFAICS this is not what the OP intended.  It is equivalent to 

UPDATE baz
   SET customer_id = '1234'
 WHERE customer_id IS NULL;

because the subselect is now correlated to the outer query and is
evaluated for each row of the outer query, which makes the LIMIT clause
ineffective.
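A sketch of what was presumably intended — leave the subselect
uncorrelated so it runs exactly once and the LIMIT is honored (not tested
against the OP's schema):

```sql
-- Batch-update only 1000 rows: the subselect no longer references
-- the outer "baz", so it is evaluated a single time.
UPDATE baz
   SET customer_id = '1234'
 WHERE baz_key IN (
        SELECT baz_key
          FROM baz
         WHERE customer_id IS NULL
         LIMIT 1000);
```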

Servus
 Manfred



Re: [PERFORM] Various performance questions

2003-10-27 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
> I was answering an earlier response that suggested that maybe the actual
> counting took time so it would take quite a bit longer when there are
> more rows to count.

Well, if a "where clause" allows the system to use an index to search
for the subset of elements, that would reduce the number of pages that
have to be examined, thereby diminishing the amount of work.

Why don't you report what EXPLAIN ANALYZE returns as output for the
query with WHERE clause?  That would allow us to get more of an idea
of what is going on...
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/spiritual.html
When  replying, it  is  often possible  to cleverly edit  the original
message in such a way  as to subtly alter  its meaning or tone to your
advantage while  appearing that you are  taking pains  to preserve the
author's intent.   As a   bonus,   it will   seem that your   superior
intellect is cutting through all the excess verbiage to the very heart
of the matter.  -- from the Symbolics Guidelines for Sending Mail



Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Christopher Browne <[EMAIL PROTECTED]> writes:

> In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
> > I was answering an earlier response that suggested that maybe the actual
> > counting took time so it would take quite a bit longer when there are
> > more rows to count.

That was my theory. I guess it's wrong. There is other work involved in
processing a record, but I'm surprised it takes as long as the work to
actually pull the record from the kernel and check whether it's visible.

> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.

It's not. Therein lies the mystery.

> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with WHERE clause?  That would allow us to get more of an idea
> of what is going on...

He did, right at the start of the thread.

For a 1 million record table, without the where clause he's seeing

 select 1 from tab
 select count(*) from tab

being comparable with only a slight delay for the count(*) whereas

 select 1 from tab where c < 1000
 select count(*) from tab where c < 1000

are much faster even though they still use a sequential scan.

I'm puzzled why the where clause speeds things up as much as it does.

-- 
greg




Re: [PERFORM] Various performance questions

2003-10-27 Thread Tarhon-Onu Victor
On Sun, 26 Oct 2003, Dror Matalon wrote:

> Here's the structure of the items table
[snip]
>  pubdate   | timestamp with time zone |
> Indexes:
> "item_channel_link" btree (channel, link)
> "item_created" btree (dtstamp)
> "item_signature" btree (signature)
> "items_channel_article" btree (channel, articlenumber)
> "items_channel_tstamp" btree (channel, dtstamp)
> 
> 
> 5. Any other comments/suggestions on the above setup.

	Try "set enable_seqscan = off; set enable_indexscan = on;" to 
force the planner to use one of the indexes. Analyze the queries from 
your application, see which columns are most used in WHERE clauses, 
and recreate the indexes accordingly. "select count(*) from items where 
channel < 5000;" will never use any of the current indexes because none 
matches your WHERE clause (channel now appears only in multicolumn 
indexes).

-- 
Any views or opinions presented within this e-mail are solely those of
the author and do not necessarily represent those of any company, unless
otherwise expressly stated.



[PERFORM] Linux Filesystem Shootout

2003-10-27 Thread Christopher Kings-Lynne
http://fsbench.netnation.com/

Seems to answer a few of the questions about which might be the best 
filesystem...

Chris





Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Mon, 2003-10-27 at 10:15, Tarhon-Onu Victor wrote:
> select count(*) from items where channel < 
> 5000; will never use any of the current indexes because none matches 
> your WHERE clause (channel appears now only in multicolumn indexes).

No -- a multi-column index can be used to answer queries on a prefix of
the index's column list. So an index on (channel, xyz) can be used to
answer queries on (just) "channel".
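A hedged sketch of what that means for the schema posted earlier in the
thread:

```sql
-- Given the existing two-column index:
--   "items_channel_article" btree (channel, articlenumber)
-- a predicate on the leading column alone can make use of it:
EXPLAIN SELECT count(*) FROM items WHERE channel < 5000;
-- Whether the planner actually *picks* the index over a seq scan still
-- depends on its selectivity estimates, as seen elsewhere in the thread.
```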

-Neil





Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an
> int8 to store its count so it's not limited to 4 billion records.
> Unfortunately int8 is somewhat inefficient as it has to be dynamically
> allocated repeatedly.

Uh, what? Why would an int8 need to be "dynamically allocated
repeatedly"?

-Neil





pgsql-performance@postgresql.org

2003-10-27 Thread Vivek Khera
> "AL" == Allen Landsidel <[EMAIL PROTECTED]> writes:

AL> However, I do the same thing with the reindex, so I'll definitely be
AL> taking it out there, as that one does lock.. although I would think
AL> the worst this would do would be a making the index unavailable and
AL> forcing a seq scan.. is that not the case?

Nope.  *All* access to the table is locked out.





-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



pgsql-performance@postgresql.org

2003-10-27 Thread Vivek Khera
> "AL" == Allen Landsidel <[EMAIL PROTECTED]> writes:

>> you need to bump some header file constant and rebuild the kernel.  it
>> also increases the granularity of how the buffer cache is used, so I'm
>> not sure how it affects overall system.  nothing like an experiment...

AL> So far I've found a whole lot of questions about this, but nothing
AL> about the constant.  The sysctl (vfs.hibufspace I believe is the one)
AL> is read only, although I should be able to work around that via
AL> /boot/loader.conf if I can't find the kernel option.

Here's what I have in my personal archive.  I have not tried it yet.
BKVASIZE is in a system header file, so it is not a regular "tunable" for
a kernel.  That is, you must muck with the source files to change it,
which makes for maintenance headaches.



From: Sean Chittenden <[EMAIL PROTECTED]>
Subject: Re: go for a script! / ex:  PostgreSQL vs. MySQL
Newsgroups: ml.postgres.performance
To: Vivek Khera <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Date: Mon, 13 Oct 2003 12:04:46 -0700
Organization: none

> >> echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
> >> 
> >> I've used it for my dedicated servers.  Is this calculation correct?
> 
> SC> Yes, or it's real close at least.  vfs.hibufspace is the amount of
> SC> kernel space that's used for caching IO operations (minus the
> 
> I'm just curious if anyone has a tip to increase the amount of
> memory FreeBSD will use for the cache?

Recompile your kernel with BKVASIZE set to 4 times its current value
and double your nbuf size.  According to Bruce Evans:

"Actually there is a way: the vfs_maxbufspace gives the amount of
space reserved for buffer kva (= nbuf * BKVASIZE).  nbuf is easy to
recover from this, and the buffer kva space may be what is wanted
anyway."
[snip]
"I've never found setting nbuf useful, however.  I want most
parametrized sizes including nbuf to scale with resource sizes, and
it's only with RAM sizes of similar sizes to the total virtual address
size that its hard to get things to fit.  I haven't hit this problem
myself since my largest machine has only 1GB.  I use an nbuf of
something like twice the default one, and a BKVASIZE of 4 times the
default.  vfs.maxbufspace ends up at 445MB on the machine with 1GB, so
it is maxed out now."

YMMV.

-sc


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



pgsql-performance@postgresql.org

2003-10-27 Thread Neil Conway
On Fri, 2003-10-24 at 20:11, Allen Landsidel wrote:
> However, I do the same thing with the reindex, so I'll definitely be taking 
> it out there, as that one does lock.. although I would think the worst this 
> would do would be a making the index unavailable and forcing a seq scan.. 
> is that not the case?

No, it exclusively locks the table. It has been mentioned before that we
should probably be able to fall back to a seqscan while the REINDEX is
going on, but that's not currently done.
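A common workaround from that era (a hedged sketch; index names here are
hypothetical): build a replacement index under a new name, then drop the
old one. CREATE INDEX takes only a share lock, so reads keep working
while the new index builds:

```sql
CREATE INDEX items_channel_article_new ON items (channel, articlenumber);
DROP INDEX items_channel_article;
-- Writes are still blocked during the CREATE INDEX, but SELECTs proceed,
-- unlike with REINDEX's exclusive lock.
```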

-Neil





Re: [PERFORM] Various performance questions

2003-10-27 Thread Vivek Khera
> "DM" == Dror Matalon <[EMAIL PROTECTED]> writes:

DM> effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192

DM> 1. While it seems to work correctly, I'm unclear on why this number is
DM> correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
DM> seems like the number should be more like 1 - 1.5 Gigs.

Nope, that's correct...


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



[PERFORM] Very Poor Insert Performance

2003-10-27 Thread Damien Dougan
Hi All,

We've been experiencing extremely poor batch upload performance on our 
Postgres 7.3 (and 7.3.4) database, and I've not been able to improve matters 
significantly using any suggestions I've gleaned from the mailing list 
archives ... so I was wondering if anyone with a bigger brain in this area 
could help :)

Our batch upload performs a number of stored procedure calls to insert data 
into the database. Initially this results in quite good performance, but it 
rapidly spirals down to approximately one operation per second after some minutes.

I've got a script that generates stored procedure calls to upload test input 
data, and the script is capable of inserting BEGIN and END at different 
intervals, together with VACUUM ANALYZE commands as well.

I've tried varying the commit level from every operation, every 5, every 10, 
every 25, every 100 operations (again, each operation is 5 SP calls) without 
any noticeable improvement. Likewise, I've varied the VACUUM ANALYZE from 
every 50 to every 100 operations - again without any measurable improvement.

top reports that CPU usage is pretty constant at 99%, and there is 
approximately 1GB of free physical memory available to the OS (with 
approximately 1GB of physical memory in use).

I have been running postmaster with fsync switched off.

I also tried running with backbuffers of default (64), 128, 256, 512 and even 
1024. Again, with no measurable change.

The typical metrics are (completed operations - each of these requires 5 SP 
calls):

1 min: 1036 (1036 operations)
2 min: 1426 (390 operations)
3 min: 1756 (330 operations)
4 min: 2026 (270 operations)
5 min: 2266 (240 operations)

When left running, it's not too long before the code snails to 1 operation per 
second.


Has anyone any ideas as to what could be causing the spiraling performance?


With approximately 20,000 operations committed in the database, it takes about 
1 minute to upload a dump of the database - unfortunately we cannot use the 
COPY command to upload brand new data - it really has to go through the 
Stored Procedures to ensure relationships and data integrity across the 
schema (it would be very difficult to develop and maintain code to generate 
COPY commands for inserting new data). And whilst I appreciate INSERTs are 
inherently slower than COPY, I was hoping for something significantly faster 
than the 1 operation/second that things fairly quickly descend to...


Thanks for any advice!

Damien




Re: [PERFORM] vacuum locking

2003-10-27 Thread Rob Nagler
Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.

BTW, I greatly appreciate your support on this stuff.  This list is a
fantastic resource.

I think we agree.  The question is what is the workload.  On tables
without updates, postgres will be fast enough.  However, postgres is
slow on tables with updates afaict.  I think of OLTP as a system with
updates.  One can do DSS on an OLTP database with Oracle, at least it
seems to work for one of our projects.

> Firstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.

We'll be trying it as soon as it is out.

> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.

It is at 8000.  This is probably as high as I can go with multiple
postmasters.  The sort area is shared in Oracle (I think :-) in the
UGA.

> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.

This is a web app.  I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.

I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000, restarted postgres.  Reran
the simpler query (no name) 3 times, and it was still 27 secs.

> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.

I don't have to.  The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.

> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:

I put f2 in the group by, and it doesn't matter.  That's the point.
It's the on-disk sort before the aggregate that's killing the query.

> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.

I'll be finding out this week.

> To convince it to do the right thing you would have to do either:
> 
> SELECT a, t2.name 
>   FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1 
>   JOIN t2 USING (f2)
> 
> Or use a subquery:
> 
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
>   FROM t1
>  GROUP BY f2 

This doesn't solve the problem.  It's the GROUP BY that is doing the
wrong thing.  It's grouping, then aggregating.

Rob





Re: [PERFORM] vacuum locking

2003-10-27 Thread Rob Nagler
Greg Stark writes:
> I don't understand why you would expect overwriting to win here. 
> What types of updates do you do on these tables? 

These are statistics that we're adjusting.  I think that's pretty
normal stuff.  The DSS component is the avg() of these numbers on
particular groups.  The groups are related to foreign keys to
customers and other things.

> Normally I found using update on such a table was too awkward to
> contemplate so I just delete all the relation records that I'm
> replacing for the key I'm working with and insert new ones. This
> always works out to be cleaner code. In fact I usually leave such
> tables with no UPDATE grants on them.

In accounting apps, we do this, too.  It's awkward with all the
relationships to update all the records in the right order.  But
Oracle wins on delete/insert, too, because it reuses the tuples it
already has in memory, and it can reuse the same foreign key index
pages, too, since the values are usually the same.

The difference between Oracle and postgres seems to be optimism.
postgres assumes the transaction will fail and/or that a transaction
will modify lots of data that is used by other queries going on in
parallel.  Oracle assumes that the transaction is going to be
committed, and it might as well make the changes in place.

> In that situation I would have actually expected Postgres to do as well as or
> better than Oracle since that makes them both functionally
> equivalent.

I'll find out soon enough. :-)

Rob





Re: [PERFORM] Use of multipart index with "IN"

2003-10-27 Thread Tom Lane
Rob Messer <[EMAIL PROTECTED]> writes:
> The problem comes in when we are selecting multiple field_name values
> in one query.  The normal SQL syntax we have been using is like this:

> select field_name, option_tag from ds_rec_fld where recid = 3001 and
> field_name in ('Q3A1', 'Q3A9');

You'd have better luck if field_name were the first column of the
two-column index.  See the archives.

Improving this situation is on the to-do list but it seems not trivial
to fix.

regards, tom lane



Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote:
> > "DM" == Dror Matalon <[EMAIL PROTECTED]> writes:
> 
> DM> effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192
> 
> DM> 1. While it seems to work correctly, I'm unclear on why this number is
> DM> correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
> DM> seems like the number should be more like 1 - 1.5 Gigs.
> 
> Nope, that's correct...

I know it's correct. I was asking why it's correct.
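For the record, the arithmetic behind the number (effective_cache_size is
measured in 8 kB pages):

```sql
-- 25520 pages at 8192 bytes each:
SELECT 25520 * 8192;                  -- 209059840 bytes
SELECT 25520 * 8192 / (1024 * 1024);  -- 199, i.e. roughly the 200 MB cited
```

As for *why* it's only ~200 MB on a 2 GB box: presumably because
vfs.hibufspace measures only the kernel's buffer KVA (nbuf * BKVASIZE),
not everything FreeBSD can use for caching file data, which is what the
BKVASIZE discussion quoted earlier in this digest is about.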

> 
> 
> -- 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.                Khera Communications, Inc.
> Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com



Re: [PERFORM] Various performance questions

2003-10-27 Thread Dror Matalon
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote:
> In the last exciting episode, [EMAIL PROTECTED] (Dror Matalon) wrote:
> > I was answering an earlier response that suggested that maybe the actual
> > counting took time so it would take quite a bit longer when there are
> > more rows to count.
> 
> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.
> 
> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with WHERE clause?  That would allow us to get more of an idea
> of what is going on...


Here it is once again, and I've added another data point, "channel <
1000", which takes even less time than channel < 5000. It almost seems
like the optimizer knows that it can skip certain rows ("rows=4910762" vs
"rows=1505605"). But how can it do that without using an index or
actually looking at each row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
QUERY PLAN
--
 Aggregate  (cost=245044.53..245044.53 rows=1 width=0) (actual 
time=55806.893..55806.897 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..232767.62 rows=4910762 width=0)
(actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 5000;
QUERY PLAN
--
 Aggregate  (cost=248808.54..248808.54 rows=1 width=0) (actual 
time=26071.264..26071.269 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=1505605 width=0)
(actual time=0.161..17623.033 rows=1632057 loops=1)
 Filter: (channel < 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 1000;
  QUERY PLAN
---
 Aggregate  (cost=245429.74..245429.74 rows=1 width=0) (actual 
time=10225.272..10225.276 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=154085 width=0) (actual 
time=7.633..10083.246 rows=25687 loops=1)
 Filter: (channel < 1000)
 Total runtime: 10225.373 ms
(4 rows)


> -- 
> (format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
> http://www3.sympatico.ca/cbbrowne/spiritual.html
> When  replying, it  is  often possible  to cleverly edit  the original
> message in such a way  as to subtly alter  its meaning or tone to your
> advantage while  appearing that you are  taking pains  to preserve the
> author's intent.   As a   bonus,   it will   seem that your   superior
> intellect is cutting through all the excess verbiage to the very heart
> of the matter.  -- from the Symbolics Guidelines for Sending Mail
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com



Re: [PERFORM] vacuum locking

2003-10-27 Thread Greg Stark
Rob Nagler <[EMAIL PROTECTED]> writes:

> I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
> set sort_mem in the conf file to 512000, restarted postgres.  Reran
> the simpler query (no name) 3 times, and it was still 27 secs.

Sorry, I don't know how that bubbled up from the depths of my Oracle memory.
In postgres it's just "SET"

db=> set sort_mem = 512000;
SET

> > To convince it to do the right thing you would have to do either:
> > 
> > SELECT a, t2.name 
> >   FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1 
> >   JOIN t2 USING (f2)
> > 
> > Or use a subquery:
> > 
> > SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> >   FROM t1
> >  GROUP BY f2 
> 
> This doesn't solve the problem.  It's the GROUP BY that is doing the
> wrong thing.  It's grouping, then aggregating.

But at least in the form above it will consider using an index on f2, and it
will consider using indexes on t1 and t2 to do the join.

It's unlikely to go ahead and use the indexes though because normally sorting
is faster than using the index when scanning the whole table. You should
compare the "explain analyze" results for the original query and these two.
And check the results with "set enable_seqscan = off" as well. 

I suspect you'll find your original query uses sequential scans even when
they're disabled because it has no alternative. With the two above it can use
indexes but I suspect you'll find they actually take longer than the
sequential scan and sort -- especially if you have sort_mem set large enough.

-- 
greg




Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Neil Conway <[EMAIL PROTECTED]> writes:

> On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> > What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an
> > int8 to store its count so it's not limited to 4 billion records.
> > Unfortunately int8 is somewhat inefficient as it has to be dynamically
> > allocated repeatedly.
> 
> Uh, what? Why would an int8 need to be "dynamically allocated
> repeatedly"?

Perhaps I'm wrong; I'm extrapolating from a comment Tom Lane made that
profiling showed that the bulk of the cost in count() went to allocating
int8s. He commented that this could be optimized by having count() and sum()
bypass the regular API. I don't have the original message handy.

-- 
greg




Re: [PERFORM] Very Poor Insert Performance

2003-10-27 Thread Greg Stark

Damien Dougan <[EMAIL PROTECTED]> writes:

> Our batch upload is performing a number of stored procedures to insert data on 
> the database. Initially, this results in quite good performance, but rapidly 
> spirals down to approximately 1 per second after some minutes.

It's fairly unlikely anyone will be able to help without you saying what
you're doing. What are these procedures doing? What do the tables look like?
What indexes exist?

At a guess, the foreign key relationships you're enforcing don't have indexes
to help them. If they do, perhaps postgres isn't using them.
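
If missing foreign-key indexes turn out to be the cause, the usual fix is to index the referencing column yourself (postgres indexes the referenced primary key automatically, but not the referencing side; the names below are hypothetical):

```sql
-- Without this, every foreign-key check or cascaded action
-- scans child_table sequentially.
CREATE INDEX child_parent_id_idx ON child_table (parent_id);

-- Update planner statistics afterwards.
ANALYZE child_table;
```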

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > Uh, what? Why would an int8 need to be "dynamically allocated
> > repeatedly"?
> 
> Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
> profiling showed that the bulk of the cost in count() went to allocating
> int8s. He commented that this could be optimized by having count() and sum()
> bypass the regular api. I don't have the original message handy.

I'm still confused: int64 should be stack-allocated, AFAICS. Tom, do you
recall what the issue here is?

-Neil





Re: [PERFORM] Various performance questions

2003-10-27 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
>> Neil Conway <[EMAIL PROTECTED]> writes:
>>> Uh, what? Why would an int8 need to be "dynamically allocated
>>> repeatedly"?
>> 
>> Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
>> profiling showed that the bulk of the cost in count() went to allocating
>> int8s. He commented that this could be optimized by having count() and sum()
>> bypass the regular api. I don't have the original message handy.

> I'm still confused: int64 should be stack-allocated, AFAICS. Tom, do you
> recall what the issue here is?

Greg is correct.  int8 is a pass-by-reference datatype and so every
aggregate state-transition function cycle requires at least one palloc
(to return the function result).  I think in the current state of the
code it requires two pallocs :-(, because we can't trust the transition
function to palloc its result in the right context without palloc'ing
leaked junk in that context, so an extra palloc is needed to copy the
result Datum into a longer-lived context than we call the function in.

There was some speculation a few weeks ago about devising a way to let
performance-critical transition functions avoid the excess palloc's by
working with a specialized API instead of the standard function call
API, but I think it'll have to wait for 7.5.

regards, tom lane



Re: [PERFORM] Various performance questions

2003-10-27 Thread Neil Conway
On Mon, 2003-10-27 at 13:52, Tom Lane wrote:
> Greg is correct.  int8 is a pass-by-reference datatype and so every
> aggregate state-transition function cycle requires at least one palloc
> (to return the function result).

Interesting. Is there a reason why int8 is pass-by-reference? (ISTM that
pass-by-value would be sufficient...)

Thanks for the information, Tom & Greg.

-Neil





Re: [PERFORM] Various performance questions

2003-10-27 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Interesting. Is there a reason why int8 is pass-by-reference?

Pass-by-value types have to fit into Datum.

On a 64-bit machine (ie, one where pointers are 64-bits anyway) it would
make sense to convert int8 (and float8 too) into pass-by-value types.
If the machine does not already need Datum to be 8 bytes, though, I
think that widening Datum to 8 bytes just for the benefit of these two
datatypes would be a serious net loss.  Not to mention that it would
just plain break everything on machines with no native 8-byte-int
datatype.

One of the motivations for the version-1 function call protocol was to
allow the pass-by-value-or-by-ref nature of these datatypes to be hidden
from most of the code, with an eye to someday making this a
platform-specific choice.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg is correct.  int8 is a pass-by-reference datatype 

Just to keep the conversation on track: the evidence from this particular post
seems to indicate that my theory was wrong and the overhead for count(*) is
_not_ a big time sink. It seems to be at most 10% and usually less; a simple
"select 1 from tab" takes nearly as long.

I'm still puzzled why the times on these are so different when the latter
returns fewer records and both are doing sequential scans:

 select 1 from tab

 select 1 from tab where a < 1000

-- 
greg




Re: [PERFORM] Various performance questions

2003-10-27 Thread Greg Stark

In fact, the number of records seems to be almost irrelevant. A sequential scan
takes almost exactly the same amount of time up until a critical region (for
me around 10 records), at which point the time starts climbing quickly.

It's almost as if it's doing some disk i/o, but I'm watching vmstat and don't
see anything. And in any case it would have to read all the same blocks to do
the sequential scan regardless of how many records match, no?

I don't hear the disk seeking either -- though oddly there is a high-pitched
sound coming from the computer while the query is running, almost like a
floppy drive reading without seeking. Perhaps there is some i/o happening and
linux is lying about it? Perhaps I'm not hearing seeking because it's reading
everything from one track? Very strange.


slo=> explain analyze select 1::int4 from test where a < 1 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=11 width=0) (actual time=417.468..417.468 rows=0 loops=1)
   Filter: (a < 1)
 Total runtime: 417.503 ms
(3 rows)

Time: 418.181 ms


slo=> explain analyze select 1::int4 from test where a < 100 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=53 width=0) (actual time=0.987..416.224 rows=50 loops=1)
   Filter: (a < 100)
 Total runtime: 416.301 ms
(3 rows)

Time: 417.008 ms


slo=> explain analyze select 1::int4 from test where a < 1 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=5283 width=0) (actual time=0.812..434.967 rows=5000 loops=1)
   Filter: (a < 1)
 Total runtime: 439.620 ms
(3 rows)

Time: 440.665 ms


slo=> explain analyze select 1::int4 from test where a < 10 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=50076 width=0) (actual time=0.889..458.623 rows=5 loops=1)
   Filter: (a < 10)
 Total runtime: 491.281 ms
(3 rows)

Time: 491.998 ms


slo=> explain analyze select 1::int4 from test where a < 100 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=1 width=0) (actual time=0.018..997.421 rows=715071 loops=1)
   Filter: (a < 100)
 Total runtime: 1461.851 ms
(3 rows)

Time: 1462.898 ms


slo=> explain analyze select 1::int4 from test where a < 1000 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=1 width=0) (actual time=0.015..1065.456 rows=80 loops=1)
   Filter: (a < 1000)
 Total runtime: 1587.481 ms
(3 rows)

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Various performance questions

2003-10-27 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> I'm still puzzled why the times on these are so different when the latter
> returns fewer records and both are doing sequential scans:

My best guess is that it's simply the per-tuple overhead of cycling
tuples through the two plan nodes.  When you have no actual I/O happening,
the seqscan runtime is going to be all CPU time, something of the form
cost_per_page * number_of_pages_processed +
cost_per_tuple_scanned * number_of_tuples_scanned +
cost_per_tuple_returned * number_of_tuples_returned
I don't have numbers for the relative sizes of those three costs, but
I doubt that any of them are negligible compared to the other two.

Adding a WHERE clause increases cost_per_tuple_scanned but reduces the
number_of_tuples_returned, and so it cuts the contribution from the
third term, evidently by more than the WHERE clause adds to the second
term.

My own profiling had suggested that the cost-per-tuple-scanned in the
aggregate node dominated the seqscan CPU costs, but that might be
platform-specific, or possibly have something to do with the fact that
I was profiling an assert-enabled build.

It might be worth pointing out that EXPLAIN ANALYZE adds two kernel
calls (gettimeofday or some such) into each cycle of the plan nodes;
that's probably inflating the cost_per_tuple_returned by a noticeable
amount.

regards, tom lane



Re: [PERFORM] Very Poor Insert Performance

2003-10-27 Thread Tom Lane
Damien Dougan <[EMAIL PROTECTED]> writes:
> Has anyone any ideas as to what could be causing the spiraling performance?

You really haven't provided any information that would allow anything
but guesses, but I'll guess anyway: poor plans for foreign key checks?
See nearby threads.

regards, tom lane



[PERFORM] Guesses on what this NestLoop is for?

2003-10-27 Thread Josh Berkus
Folks,

I'm getting this plan on 7.2.4:

--
explain
select events.event_id, events.event_name, type_name,
    COALESCE(cases.case_name || '(' || cases.docket || ')', trial_groups.tgroup_name) as event_case,
    jw_date_format(events.event_date, events.event_tz, events.duration) as show_date
FROM event_types, events
LEFT OUTER JOIN cases ON (events.link_type = 'case' AND events.case_id = cases.case_id)
LEFT OUTER JOIN trial_groups ON (events.link_type = 'tg' AND events.case_id = trial_groups.tgroup_id)
LEFT OUTER JOIN event_history eh ON events.event_id = eh.event_id
WHERE events.status = 1 or events.status = 11
    and events.event_date > '2003-10-27'
    and events.etype_id = event_types.etype_id
    and (
        ( events.mod_user = 562 AND eh.event_id IS NULL )
        OR
        ( eh.mod_user = 562
          and not exists (select 1 from event_history eh2
                          where eh2.event_id = eh.event_id
                          and eh2.mod_date < eh.mod_date) )
    );

Nested Loop  (cost=14949.08..2676373923.96 rows=3666858 width=197)
  ->  Hash Join  (cost=4949.08..8519.60 rows=43568 width=165)
        ->  Hash Join  (cost=4407.81..6615.02 rows=43568 width=149)
              ->  Hash Join  (cost=4403.21..6485.29 rows=43568 width=125)
                    ->  Seq Scan on events  (cost=0.00..1515.70 rows=43568 width=79)
                    ->  Hash  (cost=3108.07..3108.07 rows=115355 width=46)
                          ->  Seq Scan on cases  (cost=0.00..3108.07 rows=115355 width=46)
              ->  Hash  (cost=4.43..4.43 rows=143 width=24)
                    ->  Seq Scan on trial_groups  (cost=0.00..4.43 rows=143 width=24)
        ->  Hash  (cost=524.72..524.72 rows=13240 width=16)
              ->  Seq Scan on event_history eh  (cost=0.00..524.72 rows=13240 width=16)
  ->  Seq Scan on event_types  (cost=0.00..4.32 rows=106 width=32)
  SubPlan
    ->  Seq Scan on event_history eh2  (cost=0.00..557.82 rows=1 width=0)
-

What I can't figure out is what that incredibly expensive nested loop is for.
If I could figure that out, maybe I could query around it.

Unfortunately, I can't EXPLAIN ANALYZE because the present query swamps the
machine, and it's a production server. It also never completes.

And yes, the system is vacuum full analyzed.   Event_history is under-indexed, 
but the other tables are heavily indexed.

Ideas?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Optimizing Performance

2003-10-27 Thread Kamalraj Singh Madhan
Hi,
  I'm having major performance issues with a PostgreSQL 7.3.1 database. Kindly suggest all
the possible means by which I can optimize the performance of this database. If not
all, some ideas (even common ones) are also welcome. No optimization has been
done to the default configuration of the installed database. Kindly suggest.

regards
Kamalraj



Re: [PERFORM] Optimizing Performance

2003-10-27 Thread Shridhar Daithankar
Kamalraj Singh Madhan wrote:

> Hi,
>   I'm having major performance issues with a PostgreSQL 7.3.1 database. Kindly suggest all the possible means by which I can optimize the performance of this database. If not all, some ideas (even common ones) are also welcome. No optimization has been done to the default configuration of the installed database. Kindly suggest.
Check
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
 Shridhar
