Re: [PERFORM] count * performance issue

2008-03-11 Thread Andrew Sullivan
On Tue, Mar 11, 2008 at 02:19:09PM +, Matthew wrote:
> of rows with IS NULL, then someone changes a row, then you find the count 
> of rows with IS NOT NULL. Add the two together, and there may be rows that 
> were counted twice, or not at all.

Only if you count in READ COMMITTED.

A




Re: [PERFORM] count * performance issue

2008-03-11 Thread Heikki Linnakangas

Matthew wrote:
No, actually I was referring to a race condition. So, you find the count 
of rows with IS NULL, then someone changes a row, then you find the 
count of rows with IS NOT NULL. Add the two together, and there may be 
rows that were counted twice, or not at all.


Not a problem if you use a serializable transaction, or if you do

SELECT COUNT(*) from table WHERE indexed_field IS NULL
UNION ALL
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

as one statement.
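
Spelled out, the serializable variant would be something along these lines
(both counts then see the same snapshot):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) from table WHERE indexed_field IS NULL;
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL;
COMMIT;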

However, this makes no sense whatsoever, as both index scans (assuming 
the planner even chooses an index scan for them, which seems highly 
unlikely) still have to visit each tuple in the heap. It's always going 
to be slower than a single "SELECT COUNT(*) FROM table" with a seq scan.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] count * performance issue

2008-03-11 Thread Matthew

On Tue, 11 Mar 2008, Tino Wildenhain wrote:

And certain, qualified definitions of "accurate" as well. Race condition?


You mean in a three-state logic? NULL, NOT NULL and something different?


True, False, and FILE_NOT_FOUND.

No, actually I was referring to a race condition. So, you find the count 
of rows with IS NULL, then someone changes a row, then you find the count 
of rows with IS NOT NULL. Add the two together, and there may be rows that 
were counted twice, or not at all.


Matthew

--
It's one of those irregular verbs - "I have an independent mind," "You are
an eccentric," "He is round the twist."
 -- Bernard Woolly, Yes Prime Minister



Re: [PERFORM] count * performance issue

2008-03-11 Thread Tino Wildenhain

Hi,

Matthew wrote:

On Tue, 11 Mar 2008, Bill Moran wrote:


In response to "Robins Tharakan" <[EMAIL PROTECTED]>:

Sorry, if I am missing something here, but shouldn't something like this
allow us to get a (fast) accurate count ?

SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL


For certain, qualified definitions of "fast", sure.


And certain, qualified definitions of "accurate" as well. Race condition?


You mean in a three-state logic? NULL, NOT NULL and something different?

;-)

Tino



Re: [PERFORM] count * performance issue

2008-03-11 Thread Matthew

On Tue, 11 Mar 2008, Bill Moran wrote:


In response to "Robins Tharakan" <[EMAIL PROTECTED]>:

Sorry, if I am missing something here, but shouldn't something like this
allow us to get a (fast) accurate count ?

SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL


For certain, qualified definitions of "fast", sure.


And certain, qualified definitions of "accurate" as well. Race condition?

Matthew

--
"Television is a medium because it is neither rare nor well done." 
 -- Fred Friendly




Re: [PERFORM] count * performance issue

2008-03-11 Thread Bill Moran
In response to "Robins Tharakan" <[EMAIL PROTECTED]>:

> Hi,
> 
> I have been reading this conversation for a few days now and I just wanted
> to ask this. From the release notes, one of the new additions in 8.3 is
> (Allow col IS NULL to use an index (Teodor)).
> 
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
> 
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

For certain, qualified definitions of "fast", sure.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023






Re: [PERFORM] count * performance issue

2008-03-11 Thread Albert Cervera Areny
On Tuesday 11 March 2008 04:11, Scott Marlowe wrote:
> On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have been reading this conversation for a few days now and I just
> > wanted to ask this. From the release notes, one of the new additions in
> > 8.3 is (Allow col IS NULL to use an index (Teodor)).
> >
> > Sorry, if I am missing something here, but shouldn't something like this
> > allow us to get a (fast) accurate count ?
> >
> > SELECT COUNT(*) from table WHERE indexed_field IS NULL
> >  +
> > SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>
> It really depends on the distribution of the null / not nulls in the
> table.  If it's 50/50 there's no advantage to using the index, as you
> still have to check visibility info in the table itself.
>
> OTOH, if NULL (or conversely not null) are rare, then yes, the index
> can help.  I.e. if 1% of the tuples are null, the select count(*) from
> table where field is null can use the index efficiently.

But you'll get a sequential scan for the NOT NULL case, which will end up 
taking more time (Seq Scan + Index Scan > Seq Scan).

-- 
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12





Re: [PERFORM] count * performance issue

2008-03-10 Thread Scott Marlowe
On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have been reading this conversation for a few days now and I just wanted
> to ask this. From the release notes, one of the new additions in 8.3 is
> (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>  +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

It really depends on the distribution of the null / not nulls in the
table.  If it's 50/50 there's no advantage to using the index, as you
still have to check visibility info in the table itself.

OTOH, if NULL (or conversely not null) are rare, then yes, the index
can help.  I.e. if 1% of the tuples are null, the select count(*) from
table where field is null can use the index efficiently.



Re: [PERFORM] count * performance issue

2008-03-10 Thread Joshua D. Drake
On Tue, 11 Mar 2008 08:27:05 +0530
"Robins Tharakan" <[EMAIL PROTECTED]> wrote:

> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

If the selectivity is appropriate, yes. However, if you have 1 million
rows and 200k of those rows are null (or not null), it is still going
to seqscan.

joshua d. drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit





Re: [PERFORM] count * performance issue

2008-03-10 Thread Mark Mielke

Robins Tharakan wrote:

Hi,

I have been reading this conversation for a few days now and I just 
wanted to ask this. From the release notes, one of the new additions 
in 8.3 is (Allow col IS NULL to use an index (Teodor)).


Sorry, if I am missing something here, but shouldn't something like 
this allow us to get a (fast) accurate count ?


SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL


For PostgreSQL: You still don't know whether the row is visible until 
you check the row. That it's NULL or NOT NULL does not influence this truth.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [PERFORM] count * performance issue

2008-03-10 Thread Robins Tharakan
Hi,

I have been reading this conversation for a few days now and I just wanted
to ask this. From the release notes, one of the new additions in 8.3 is
(Allow col IS NULL to use an index (Teodor)).

Sorry, if I am missing something here, but shouldn't something like this
allow us to get a (fast) accurate count ?

SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

*Robins Tharakan*



Re: [PERFORM] count * performance issue

2008-03-10 Thread Greg Smith

On Mon, 10 Mar 2008, Joe Mirabal wrote:

I run queries on the data and get immediate max and min as well as other 
aggregate functions very quickly, however a select count(*) of the table 
takes forever, usually nearly an hour or more.


Are you sure the form of "select count(*)" you're using is actually 
utilizing the index to find a useful subset?  What do you get out of 
EXPLAIN ANALYZE on the query?


In order for indexes to be helpful a couple of things need to happen:
1) They have to be structured correctly to be useful
2) There need to be large enough settings for shared_buffers and 
effective_cache_size that the database thinks it can use them efficiently
3) The tables involved need to be ANALYZEd to keep their statistics up to 
date.


The parameters to run a 400GB *table* are very different from the 
defaults; if you want tuning suggestions you should post the non-default 
entries in your postgresql.conf file from what you've already adjusted 
along with basic information about your server (PostgreSQL version, OS, 
memory, disk setup).
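
If you're not sure which entries those are, one way to pull them out of a 
running server (assuming 8.0 or later, where pg_settings has a source 
column) is roughly:

SELECT name, setting, source
  FROM pg_settings
 WHERE source <> 'default';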


We in our warehouse use the count(*) as our verification of counts by 
day/month etc.


If you've got a database that size and you're doing that sort of thing on 
it, you really should be considering partitioning as well.


 --
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-10 Thread Scott Marlowe
On Mon, Mar 10, 2008 at 1:54 PM, Joe Mirabal <[EMAIL PROTECTED]> wrote:
> Gregory,
>
> I just joined this listserv and was happy to see this posting.  I have a
> 400GB table that I have indexed (building the index took 27 hours); loading
> the table with 10 threads took 9 hours.  I run queries on the data and get
> immediate max and min as well as other aggregate functions very quickly,
> however a select count(*) of the table takes forever, usually nearly an hour
> or more.
>
> Do you have any tuning recommendations?  We in our warehouse use the
> count(*) as our verification of counts by day/month etc., and in Netezza it's
> immediate.  I tried adding oids, BUT the situation I learned was that
> adding the oids in the table adds a significant amount of space to the data
> AND the index.

Yeah, this is a typical problem people run into with MVCC databases to
one extent or another.  PostgreSQL has no native way to just make it
faster.  However, if it's a table with wide rows, you can use a lookup
table to help a bit.  Have a FK with cascading deletes from the master
table to a table that just holds the PK for it, and do count(*) on
that table.
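
A rough sketch of that layout (names here are invented, not from any real
schema):

-- hypothetical wide table
CREATE TABLE master (id integer PRIMARY KEY, payload text);

-- narrow shadow table: deletes on master cascade into it automatically,
-- inserts still have to be mirrored by the application or a trigger
CREATE TABLE master_ids (
    id integer PRIMARY KEY REFERENCES master (id) ON DELETE CASCADE
);

-- counting the narrow table touches far fewer heap pages than the wide one
SELECT count(*) FROM master_ids;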

Otherwise, you have the trigger solution mentioned previously.

Also, if you only need an approximate count, then you can use the
system tables to get that with something like

select reltuples from pg_class where relname='tablename';

after an analyze.  It won't be 100% accurate, but it will be pretty
close most of the time.



Re: [PERFORM] count * performance issue

2008-03-10 Thread Greg Smith

On Mon, 10 Mar 2008, Bill Moran wrote:


Some searches through the archives should turn up details on these
methods.


I've collected up what looked like the best resources on this topic into 
the FAQ entry at http://www.postgresqldocs.org/index.php/Slow_Count


General Bits has already done two good summary articles here and I'd think 
wading through the archives directly shouldn't be necessary.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-10 Thread Bill Moran
In response to "Joe Mirabal" <[EMAIL PROTECTED]>:

> Gregory,
> 
> I just joined this listserv and was happy to see this posting.  I have a
> 400GB table that I have indexed (building the index took 27 hours); loading
> the table with 10 threads took 9 hours.  I run queries on the data and get
> immediate max and min as well as other aggregate functions very quickly,
> however a select count(*) of the table takes forever, usually nearly an hour
> or more.
> 
> Do you have any tuning recommendations?  We in our warehouse use the
> count(*) as our verification of counts by day/month etc., and in Netezza it's
> immediate.  I tried adding oids, BUT the situation I learned was that
> adding the oids in the table adds a significant amount of space to the data
> AND the index.
> 
> As you may gather from this we are relatively new on Postgres.
> 
> Any suggestions you can give me would be most helpful.

One approach to this problem is to create triggers that keep track of
the total count whenever rows are added or deleted.  This adds some
overhead to the update process, but the correct row count is always
quickly available.
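
A minimal sketch of that idea (invented names, 8.x PL/pgSQL, untested):

CREATE TABLE row_count (tablename text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_count VALUES ('mytable', (SELECT count(*) FROM mytable));

CREATE FUNCTION track_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_count SET n = n + 1 WHERE tablename = TG_RELNAME;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE row_count SET n = n - 1 WHERE tablename = TG_RELNAME;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE track_count();

-- the current count is then a cheap single-row lookup
SELECT n FROM row_count WHERE tablename = 'mytable';

Note that every writer now touches the same counter row, which is the
serialization cost discussed elsewhere in the thread.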

Another is to use EXPLAIN to get an estimate of the # of rows from
the planner.  This works well if an estimate is acceptable, but can't
be trusted for precise counts.
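
One way to grab that planner estimate from SQL is a small PL/pgSQL helper
that scrapes the first "rows=" figure out of the EXPLAIN output (an
untested sketch, not a built-in):

CREATE FUNCTION count_estimate(q text) RETURNS bigint AS $$
DECLARE
    rec record;
    cnt bigint;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || q LOOP
        -- each row of EXPLAIN output is a single "QUERY PLAN" text column
        cnt := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN cnt IS NOT NULL;
    END LOOP;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;

-- e.g. SELECT count_estimate('SELECT * FROM mytable');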

Some searches through the archives should turn up details on these
methods.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] count * performance issue

2008-03-10 Thread Joe Mirabal
Gregory,

I just joined this listserv and was happy to see this posting.  I have a
400GB table that I have indexed (building the index took 27 hours); loading
the table with 10 threads took 9 hours.  I run queries on the data and get
immediate max and min as well as other aggregate functions very quickly,
however a select count(*) of the table takes forever, usually nearly an hour
or more.

Do you have any tuning recommendations?  We in our warehouse use the
count(*) as our verification of counts by day/month etc., and in Netezza it's
immediate.  I tried adding oids, BUT the situation I learned was that
adding the oids in the table adds a significant amount of space to the data
AND the index.

As you may gather from this we are relatively new on Postgres.

Any suggestions you can give me would be most helpful.

Cheers,
Joe

On Mon, Mar 10, 2008 at 11:16 AM, Gregory Stark <[EMAIL PROTECTED]>
wrote:

> "Tom Lane" <[EMAIL PROTECTED]> writes:
>
> > Well, scanning an index to get a count might be significantly faster
> > than scanning the main table, but it's hardly "instantaneous".  It's
> > still going to take time proportional to the table size.
>
> Hm, Mark's comment about bitmap indexes makes that not entirely true. A
> bitmap
> index can do RLE compression which makes the relationship between the size
> of
> the table and the time taken to scan the index more complex. In the
> degenerate
> case where there are no concurrent updates (assuming you can determine
> that
> quickly) it might actually be constant time.
>
> > Unless they keep a central counter of the number of index entries;
> > which would have all the same serialization penalties we've talked
> > about before...
>
> Bitmap indexes do in fact have concurrency issues -- arguably they're just
> a
> baroque version of this central counter in this case.
>
> --
>  Gregory Stark
>  EnterpriseDB  http://www.enterprisedb.com
>  Ask me about EnterpriseDB's Slony Replication support!
>



-- 
Mirabili et Veritas
Joe Mirabal


Re: [PERFORM] count * performance issue

2008-03-10 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous".  It's
> still going to take time proportional to the table size.

Hm, Mark's comment about bitmap indexes makes that not entirely true. A bitmap
index can do RLE compression which makes the relationship between the size of
the table and the time taken to scan the index more complex. In the degenerate
case where there are no concurrent updates (assuming you can determine that
quickly) it might actually be constant time.

> Unless they keep a central counter of the number of index entries;
> which would have all the same serialization penalties we've talked
> about before...

Bitmap indexes do in fact have concurrency issues -- arguably they're just a
baroque version of this central counter in this case.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [PERFORM] count * performance issue

2008-03-08 Thread Arjen van der Meijden

On 6-3-2008 16:28 Craig James wrote:
On the one hand, I understand that Postgres has its architecture, and I 
understand the issue of row visibility, and so forth.  On the other 
hand, my database is just sitting there, nothing going on, no 
connections except me, and... it takes FIFTY FIVE SECONDS to count 20 
million rows, a query that either Oracle or MySQL would answer in a 
fraction of a second.  It's hard for me to believe there isn't a better 
way.


Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forum topics and is therefore 
relatively large. The hardware is quite beefy for a forum, however (4 
3GHz cores, 16GB, 14+1 disk RAID5). This table has about 20GB of data.


If I use a table that contains about the same number of records as the 
above, and which before this query was probably much less present in the 
InnoDB buffer (but also less frequently touched by other queries), we 
see this:


mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index, 
which for those tables is more or less identical.


Apparently the time is still table-size related, not necessarily 
tuple-count related. As this shows:


mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

That table is only 2.4GB but contains 57% more records, although this 
was another database on a system with somewhat different specs (8 
2.6GHz cores, 16GB, 7+7+1 RAID50); it used a non-primary index, and I have 
no idea how well that index was in the system's cache prior to this query.


Repeating it brings that query down to 6.65 seconds; repeating the 
12GB query doesn't make it any faster.


Anyway, long story short: MySQL's table-count behaviour also seems 
table-size related. As soon as the index it uses fits in the cache, or it 
doesn't have to use the primary index, it might be a different story, 
but when the table (index) is too large to fit, it is quite slow.
Actually, it doesn't appear to be much faster than PostgreSQL's (8.2) 
table-based counts. If I use a much slower machine (2 2GHz Opterons, 8GB 
DDR memory, 5+1 old 15k rpm SCSI disks in RAID5) with a 1GB, 13M record 
table which is similar to the above articlestats, it is able to return a 
count(*) in 3 seconds after priming the cache.


If you saw instantaneous results with MySQL, you have either seen the 
query cache at work or were using MyISAM. Or perhaps with a fast 
system, you had small tuples with a nice index in a nicely primed cache.


Best regards,

Arjen



Re: [PERFORM] count * performance issue

2008-03-07 Thread paul rivers

Mark Mielke wrote:

Josh Berkus wrote:

Count() on Oracle and MySQL is almost instantaneous, even for very
large tables. So why can't Postgres do what they do?
  

AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.



Nope.  Oracle's MVCC is implemented through rollback segments, rather than 
non-overwriting the way ours is.  So Oracle can just do a count(*) on the 
index, then check the rollback segment for any concurrent 
update/delete/insert activity and adjust the count.  This sucks if there's 
a *lot* of concurrent activity, but in the usual case it's pretty fast


I read the "almost instantaneous" against "the above claim is false" and 
"Nope.", and I am not sure from the above whether you are saying that 
Oracle keeps an up-to-date count for the index (which might make it 
instantaneous?), or whether you are saying it still has to scan the 
index - which can take time if the index is large (therefore not 
instantaneous).


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Oracle scans the index pages, if the b-tree index is on non-nullable 
columns, or if the bitmap index is on low-ish cardinality data. 
Otherwise, it table scans.  MyISAM in MySQL would be an example where a 
counter is kept.








Re: [PERFORM] count * performance issue

2008-03-07 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes:
> I know when I'm playing with pgbench the primary key index on the big 
> accounts table is 1/7 the size of the table, and when using that table 
> heavily shared_buffers ends up being mostly filled with that index. The 
> usage counts are so high on the index blocks relative to any section of 
> the table itself that they're very sticky in memory.  And that's toy data; 
> on some of the webapps people want these accurate counts for the ratio of 
> index size to table data is even more exaggerated (think web forum).

Remember that our TOAST mechanism acts to limit the width of the
main-table row.

regards, tom lane



Re: [PERFORM] count * performance issue

2008-03-07 Thread Mark Mielke

Josh Berkus wrote:

Count() on Oracle and MySQL is almost instantaneous, even for very
large tables. So why can't Postgres do what they do?
  

AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.



Nope.  Oracle's MVCC is implemented through rollback segments, rather than 
non-overwriting the way ours is.  So Oracle can just do a count(*) on the 
index, then check the rollback segment for any concurrent 
update/delete/insert activity and adjust the count.  This sucks if there's 
a *lot* of concurrent activity, but in the usual case it's pretty fast


I read the "almost instantaneous" against "the above claim is false" and 
"Nope.", and I am not sure from the above whether you are saying that 
Oracle keeps an up-to-date count for the index (which might make it 
instantaneous?), or whether you are saying it still has to scan the 
index - which can take time if the index is large (therefore not 
instantaneous).


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [PERFORM] count * performance issue

2008-03-07 Thread Greg Smith

On Fri, 7 Mar 2008, Tom Lane wrote:


Well, scanning an index to get a count might be significantly faster
than scanning the main table, but it's hardly "instantaneous".  It's
still going to take time proportional to the table size.


If this is something that's happening regularly, you'd have to hope that 
most of the index is already buffered in memory somewhere though, so now 
you're talking a buffer/OS cache scan that doesn't touch disk much. 
Should be easier for that to be true because the index is smaller than the 
table, right?


I know when I'm playing with pgbench the primary key index on the big 
accounts table is 1/7 the size of the table, and when using that table 
heavily shared_buffers ends up being mostly filled with that index. The 
usage counts are so high on the index blocks relative to any section of 
the table itself that they're very sticky in memory.  And that's toy data; 
on some of the webapps people want these accurate counts for the ratio of 
index size to table data is even more exaggerated (think web forum).


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-07 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
> Tom,
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>> 
>> AFAIK the above claim is false for Oracle.  They have the same
>> transactional issues we do.

> Nope.  Oracle's MVCC is implemented through rollback segments, rather than 
> non-overwriting the way ours is.  So Oracle can just do a count(*) on the 
> index, then check the rollback segment for any concurrent 
> update/delete/insert activity and adjust the count.  This sucks if there's 
> a *lot* of concurrent activity, but in the usual case it's pretty fast.

Well, scanning an index to get a count might be significantly faster
than scanning the main table, but it's hardly "instantaneous".  It's
still going to take time proportional to the table size.

Unless they keep a central counter of the number of index entries;
which would have all the same serialization penalties we've talked
about before...

regards, tom lane



Re: [PERFORM] count * performance issue

2008-03-07 Thread Josh Berkus
Tom,

> > Count() on Oracle and MySQL is almost instantaneous, even for very
> > large tables. So why can't Postgres do what they do?
>
> AFAIK the above claim is false for Oracle.  They have the same
> transactional issues we do.

Nope.  Oracle's MVCC is implemented through rollback segments, rather than 
non-overwriting the way ours is.  So Oracle can just do a count(*) on the 
index, then check the rollback segment for any concurrent 
update/delete/insert activity and adjust the count.  This sucks if there's 
a *lot* of concurrent activity, but in the usual case it's pretty fast.

I've been thinking that when we apply the Dead Space Map we might be able 
to get a similar effect in PostgreSQL.  That is, just do a count over the 
index, and visit only the heap pages flagged in the DSM.  Again, for a 
heavily updated table this wouldn't have any benefit, but for most cases 
it would be much faster.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] count * performance issue

2008-03-07 Thread Mark Kirkwood

Craig James wrote:

Tom Lane wrote:

Craig James <[EMAIL PROTECTED]> writes:
Count() on Oracle and MySQL is almost instantaneous, even for very 
large tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application 
from Oracle to Postgres, this was the single biggest performance 
problem.  count() in Oracle was always very fast.  We're not talking 
about a 20% or 50% difference, we're talking about a small fraction of 
a second (Oracle) versus a minute (Postgres) -- something like two or 
three orders of magnitude.




To convince yourself do this in Oracle:

EXPLAIN PLAN FOR SELECT count(*) FROM table_without_any_indexes

and you will see a full table scan. If you add (suitable) indexes you'll 
see something like an index full fast scan.
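
On a reasonably recent Oracle release the resulting plan can then be
displayed with something like:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);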



In fact you can make count(*) *very* slow indeed in Oracle, by having an 
older session try to count a table that a newer session is modifying and 
committing to. The older session's data for the count is reconstructed 
from the rollback segments - which is very expensive.


regards

Mark





Re: [PERFORM] count * performance issue

2008-03-06 Thread paul rivers

Craig James wrote:

Tom Lane wrote:

Craig James <[EMAIL PROTECTED]> writes:
Count() on Oracle and MySQL is almost instantaneous, even for very 
large tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application 
from Oracle to Postgres, this was the single biggest performance 
problem.  count() in Oracle was always very fast.  We're not talking 
about a 20% or 50% difference, we're talking about a small fraction of 
a second (Oracle) versus a minute (Postgres) -- something like two or 
three orders of magnitude.


It may be that Oracle has a way to detect when there's no transaction 
and use a faster method.  If so, this was a clever optimization -- in 
my experience, that represents the vast majority of the times you want 
to use count().  It's not very useful to count the rows of a table 
that many apps are actively modifying since the result may change the 
moment your transaction completes.  Most of the time when you use 
count(), it's because you're the only one modifying the table, so the 
count will be meaningful.


Craig




Oracle will use a b-tree index on a not-null set of columns to do a fast 
full index scan, which can be an order of magnitude or more faster than 
a table scan.  Also, Oracle can use a bitmap index (in cases where a 
bitmap index isn't otherwise silly) for a bitmap fast index scan / bitmap 
conversion, with similarly dramatic results.

For "large" tables, Oracle is not going to be as fast as MyISAM tables 
in MySQL, even with these optimizations, since MyISAM doesn't have to 
scan even index pages to get a count(*) answer against the full table.


Paul





Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

Tom Lane wrote:

Craig James <[EMAIL PROTECTED]> writes:

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application from 
Oracle to Postgres, this was the single biggest performance problem.  count() 
in Oracle was always very fast.  We're not talking about a 20% or 50% 
difference, we're talking about a small fraction of a second (Oracle) versus a 
minute (Postgres) -- something like two or three orders of magnitude.

It may be that Oracle has a way to detect when there's no transaction and use a 
faster method.  If so, this was a clever optimization -- in my experience, that 
represents the vast majority of the times you want to use count().  It's not 
very useful to count the rows of a table that many apps are actively modifying 
since the result may change the moment your transaction completes.  Most of the 
time when you use count(), it's because you're the only one modifying the 
table, so the count will be meaningful.

Craig




Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Kirkwood

Craig James wrote:


My question is: What do the other databases do that Postgres can't do, 
and why not?


Count() on Oracle and MySQL is almost instantaneous, even for very 
large tables. So why can't Postgres do what they do?




I think MySQL can only do that for the MyISAM engine - InnoDB and 
Falcon are similar to Postgres.


I don't believe Oracle optimizes bare count(*) on a table either - though 
it may be able to use a suitable index (if present) to get the answer 
quicker.


regards

Mark



Re: [PERFORM] count * performance issue

2008-03-06 Thread Tom Lane
Craig James <[EMAIL PROTECTED]> writes:
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.

regards, tom lane



Re: [PERFORM] count * performance issue

2008-03-06 Thread D'Arcy J.M. Cain
On Thu, 06 Mar 2008 07:28:50 -0800
Craig James <[EMAIL PROTECTED]> wrote:
> In the 3 years I've been using Postgres, the problem of count() performance 
> has come up more times than I can recall, and each time the answer is, "It's 
> a sequential scan -- redesign your application."
> 
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

It's a tradeoff.  The only way to get that information quickly is to
maintain it internally when you insert or delete a row.  So when do you
want to take your hit?  It sounds like Oracle has made this decision
for you.  In PostgreSQL you can use triggers and rules to manage this
information if you need it.  You can even do stuff like track how many
of each type of something you have.  That's something you can't do if
your database engine has done a generic speedup for you.  You would
still have to create your own table for something like that and then
you get the hit twice.

-- 
D'Arcy J.M. Cain <[EMAIL PROTECTED]> |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Lewis
On Thu, 2008-03-06 at 07:28 -0800, Craig James wrote:
...
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?
...

I can vouch that Oracle can still take linear time to perform a
count(*), at least in some cases.

I have also seen count(*) fast in some cases too... my understanding is
that they maintain a list of "interested transactions" on a per-relation
basis.  Perhaps they do an optimization based on the index size if there
are no pending DML transactions?

-- Mark





Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Page
On Thu, Mar 6, 2008 at 3:49 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
>
>  You only get this accelerated significantly when using MyISAM, which can
>  tell you an exact count of all the rows it hasn't corrupted yet.

Please don't do that again. I'm going to have to spend the next hour
cleaning coffee out of my laptop keyboard.

:-)

-- 
Dave Page
EnterpriseDB UK Ltd: http://www.enterprisedb.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk



Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, Steinar H. Gunderson wrote:


On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:

Count() on Oracle and MySQL is almost instantaneous, even for very large
tables. So why can't Postgres do what they do?


In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)


Exactly.  There is a good discussion of this at 
http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ and I 
found the comments from Ken Jacobs were the most informative.


In short, if you want any reasonable database integrity you have to use 
InnoDB with MySQL, and once you make that choice it has the same problem. 
You only get this accelerated significantly when using MyISAM, which can 
tell you an exact count of all the rows it hasn't corrupted yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-06 Thread Bill Moran
In response to Craig James <[EMAIL PROTECTED]>:

> In the 3 years I've been using Postgres, the problem of count() performance 
> has come up more times than I can recall, and each time the answer is, "It's 
> a sequential scan -- redesign your application."
> 
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

I don't know about Oracle, but MySQL has this problem as well.  Use
innodb tables and see how slow it is.  The only reason myisam tables
don't have this problem is because they don't implement any of the
features that make the problem difficult to solve.

> On the one hand, I understand that Postgres has its architecture, and I 
> understand the issue of row visibility, and so forth.  On the other hand, my 
> database is just sitting there, nothing going on, no connections except me, 
> and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
> either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
> me to believe there isn't a better way.

There's been discussion about putting visibility information in indexes.
I don't know how far along that effort is, but I expect that will improve
count() performance significantly.

> This is a real problem.  Countless people (including me) have spent 
> significant effort rewriting applications because of this performance flaw in 
> Postgres.  Over and over, the response is, "You don't really need to do that 
> ... change your application."  Well, sure, it's always possible to change the 
> application, but that misses the point.  To most of us users, count() seems 
> like it should be a trivial operation.  On other relational database systems, 
> it is a trivial operation.
> 
> This is really a significant flaw on an otherwise excellent relational 
> database system.

Not really.  It really is a design flaw in your application ... it doesn't
make relational sense to use the number of rows in a table for anything.
Just because other people do it frequently doesn't make it right.

That being said, it's still a useful feature, and I don't hear anyone
denying that.  As I said, google around a bit WRT PG storing
visibility information in indexes, as I think that's the way this will
be improved.

> My rant for today...

Feel better now?

-- 
Bill Moran



Re: [PERFORM] count * performance issue

2008-03-06 Thread Steinar H. Gunderson
On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?

In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)

/* Steinar */
-- 
Homepage: http://www.sesse.net/




Re: [PERFORM] count * performance issue

2008-03-06 Thread Bruce Momjian
Craig James wrote:
> This is a real problem.  Countless people (including me) have
> spent significant effort rewriting applications because of this
> performance flaw in Postgres.  Over and over, the response is,
> "You don't really need to do that ... change your application."
> Well, sure, it's always possible to change the application, but
> that misses the point.  To most of us users, count() seems like
> it should be a trivial operation.  On other relational database
> systems, it is a trivial operation.
> 
> This is really a significant flaw on an otherwise excellent
> relational database system.

Have you read the TODO items related to this?

--
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

In the 3 years I've been using Postgres, the problem of count() performance has come up 
more times than I can recall, and each time the answer is, "It's a sequential scan 
-- redesign your application."

My question is: What do the other databases do that Postgres can't do, and why 
not?

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?

On the one hand, I understand that Postgres has its architecture, and I 
understand the issue of row visibility, and so forth.  On the other hand, my 
database is just sitting there, nothing going on, no connections except me, 
and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
me to believe there isn't a better way.

This is a real problem.  Countless people (including me) have spent significant effort 
rewriting applications because of this performance flaw in Postgres.  Over and over, the 
response is, "You don't really need to do that ... change your application."  
Well, sure, it's always possible to change the application, but that misses the point.  
To most of us users, count() seems like it should be a trivial operation.  On other 
relational database systems, it is a trivial operation.

This is really a significant flaw on an otherwise excellent relational database 
system.

My rant for today...
Craig



Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, sathiya psql wrote:

anyway, will you explain what this COST, actual time and other stuff 
is?


There's a long list of links to tools and articles on this subject at 
http://www.postgresqldocs.org/index.php/Using_EXPLAIN


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-06 Thread Greg Smith

On Thu, 6 Mar 2008, sathiya psql wrote:


is there any article saying the difference between this 7.3 and 8.4


I've collected a list of everything on this topic I've seen at 
http://www.postgresqldocs.org/index.php/Version_8.3_Changes


The Feature Matrix linked to there will be a quicker way to see what's 
happened than sorting through the release notes.


None of these changes change the fact that getting an exact count in this 
situation takes either a sequential scan or triggers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] count * performance issue

2008-03-06 Thread Mark Mielke

A. Kretschmer wrote:

On Thu, 06.03.2008 at 12:17:55 +0530, sathiya psql wrote the following:
  

TRIGGER i can use if i want the count of the whole table, but i require for
some of the rows with WHERE condition

so how to do that ???



Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on
this column. Can you show us the output of an EXPLAIN ANALYSE SELECT
count(*) from <table> WHERE <column> = ... ?
  


Actually - in this case, TRIGGER can be a good idea. If your count table 
can include the where information, then you no longer require an 
effective table-wide lock for updates.


In the past I have used sequential article numbers within a topic for 
an online community. Each topic row had an article_count. To generate a 
new article, I could update the article_count and use the number I had 
generated as the article number. To query the number of articles in a 
particular topic, article_count was available. Given thousands of 
topics, and 10s of thousands of articles, the system worked pretty well. 
Not in the millions range like the original poster, but I saw no reason 
why this wouldn't scale.
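
A stripped-down version of that pattern (invented names; RETURNING needs
8.2 or later, on older versions you would read the value back separately):

-- allocate the next article number and bump the per-topic count in one step
UPDATE topics
   SET article_count = article_count + 1
 WHERE topic_id = 42
RETURNING article_count;

-- "how many articles does this topic have?" is then a single-row read
SELECT article_count FROM topics WHERE topic_id = 42;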


For the original poster: You might be desperate and looking for help 
from the only place you know to get it from, but some of your recent 
answers have shown that you are either not reading the helpful responses 
provided to you, or you are unwilling to do your own research. If that 
continues, I won't be posting to aid you.


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [PERFORM] count * performance issue

2008-03-06 Thread Alvaro Herrera
sathiya psql wrote:
> > Yes it is the latest stable version.
> 
> is there any article saying the difference between this 7.3 and 8.4

http://www.postgresql.org/docs/8.3/static/release.html

In particular,
http://www.postgresql.org/docs/8.3/static/release-8-3.html
http://www.postgresql.org/docs/8.3/static/release-8-2.html
http://www.postgresql.org/docs/8.3/static/release-8-1.html
http://www.postgresql.org/docs/8.3/static/release-8-0.html
which are all the major releases between 7.4 and 8.3.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] count * performance issue

2008-03-06 Thread A. Kretschmer
On Thu, 06.03.2008 at 18:13:50 +0530, sathiya psql wrote the following:
> 
> Yes it is the latest stable version.
> 
> 
> is there any article saying the difference between this 7.3 and 8.4

http://developer.postgresql.org/pgdocs/postgres/release.html


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



Re: [PERFORM] count * performance issue

2008-03-06 Thread Harald Armin Massa
Of course, the official documentation covers that information in its
release notes

http://www.postgresql.org/docs/8.3/static/release.html

best wishes

Harald

On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <[EMAIL PROTECTED]> wrote:
>
>
> >
> >
> >
> > Yes it is the latest stable version.
>
> is there any article saying the difference between this 7.3 and 8.4
>
>
>



-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!



Re: [PERFORM] count * performance issue

2008-03-06 Thread sathiya psql
> Yes it is the latest stable version.
>

is there any article saying the difference between this 7.3 and 8.4


Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer

Hi,

On 6-Mar-08, at 6:58 AM, sathiya psql wrote:


The only thing that is going to help you is really fast disks, and
more memory, and you should consider moving to 8.3 for all the other
performance benefits.
Is 8.3 a stable version, or what is the latest stable version of  
postgres??



Yes it is the latest stable version.

moving my database from 7.4 to 8.3 will it do any harm ??


You will have to test this yourself. There may be issues

what are all the advantages of moving from 7.4 to 8.3

Every version of PostgreSQL has improved performance and robustness,  
so you will get better overall performance. However, I want to caution  
you that this is not a panacea: it will NOT solve your seq scan problem.




Dave





Re: [PERFORM] count * performance issue

2008-03-06 Thread Dave Cramer


On 6-Mar-08, at 1:43 AM, sathiya psql wrote:


is there any way to explicitly force the postgres to use index scan




If you want to count all the rows in the table, there is only one way
to do it (without keeping track yourself with a trigger): a seq scan.


An index will not help you.

The only thing that is going to help you is really fast disks, and  
more memory, and you should consider moving to 8.3 for all the other  
performance benefits.


Dave 


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-05 Thread A. Kretschmer
On Thu, 06.03.2008 at 12:36:48 +0530, sathiya psql wrote the following:
> 
>   QUERY PLAN
> --
>  Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=
> 114675.042..114675.042 rows=1 loops=1)
>->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0)
> (actual time=11.754..91429.594 rows=5061619 loops=1)
>  Filter: (call_id > 0)
>  Total runtime: 114699.797 ms
> (4 rows)

Is 'call_id > 0' your WHERE condition? An INDEX can't help here: all rows
with call_id > 0 are in the result, and I guess that's every record in
the table.
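If the application only ever needs counts for a selective subset, an index
on the filtered column can pay off; a purely hypothetical sketch (the
call_time column and the one-day window are invented for illustration, and
every matching row still has to be fetched from the heap to check
visibility):

  CREATE INDEX call_log_call_time_idx ON call_log (call_time);

  EXPLAIN ANALYSE
  SELECT count(*) FROM call_log
  WHERE call_time > now() - interval '1 day';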


> 
> 
> It is now taking 114 seconds, I think because of load on my system. Anyway,
> will you explain what this COST, actual time and the other figures mean?


08:16 < akretschmer> ??explain
08:16 < rtfm_please> For information about explain
08:16 < rtfm_please> see http://explain-analyze.info
08:16 < rtfm_please> or 
http://www.depesz.com/index.php/2007/08/06/better-explain-analyze/
08:16 < rtfm_please> or 
http://www.postgresql.org/docs/current/static/sql-explain.html

and 

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf


Read this to learn more about explain.
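Very briefly, the plan you posted reads roughly like this (a rough gloss
only; see the links above for the details):

 Aggregate  (cost=205756.95..205756.95 rows=1 width=0)
            (actual time=114675.042..114675.042 rows=1 loops=1)

 cost        = planner estimate, startup cost .. total cost, in abstract
               units (roughly "sequential page fetches")
 rows, width = estimated number of result rows and their average width in bytes
 actual time = measured milliseconds, time to first row .. time to last row
 loops       = how often this plan node was executed

In the Seq Scan line the estimate (rows=5013112) is close to what was really
scanned (rows=5061619), so the statistics are fine; the 114 seconds is simply
the cost of reading the whole table once.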


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
  QUERY PLAN
--
 Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=
114675.042..114675.042 rows=1 loops=1)
   ->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0)
(actual time=11.754..91429.594 rows=5061619 loops=1)
 Filter: (call_id > 0)
 Total runtime: 114699.797 ms
(4 rows)


It is now taking 114 seconds, I think because of load on my system. Anyway,
will you explain what this COST, actual time and the other figures mean?

On Thu, Mar 6, 2008 at 12:27 PM, A. Kretschmer <
[EMAIL PROTECTED]> wrote:

> On Thu, 06.03.2008 at 12:17:55 +0530, sathiya psql wrote the following:
> > A TRIGGER I can use if I want the count of the whole table, but I require
> > it for some of the rows with a WHERE condition.
> >
> > So how to do that?
>
> Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on
> that column. Can you show us the output of an EXPLAIN ANALYSE SELECT
> count(*) FROM <table> WHERE <column> = ... ?
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
>
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
>


Re: [PERFORM] count * performance issue

2008-03-05 Thread A. Kretschmer
On Thu, 06.03.2008 at 12:17:55 +0530, sathiya psql wrote the following:
> A TRIGGER I can use if I want the count of the whole table, but I require it
> for some of the rows with a WHERE condition.
> 
> So how to do that?

Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on
that column. Can you show us the output of an EXPLAIN ANALYSE SELECT
count(*) FROM <table> WHERE <column> = ... ?


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-05 Thread A. Kretschmer
On Thu, 06.03.2008 at 12:13:17 +0530, sathiya psql wrote the following:
> Is there any way to explicitly force Postgres to use an index scan?

Not really; PG uses a cost-based optimizer and will use an INDEX if it
makes sense.


> 
> On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <
> [EMAIL PROTECTED]> wrote:

please, no silly top-posting with the complete quote below.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Fwd: [PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
-- Forwarded message --
From: sathiya psql <[EMAIL PROTECTED]>
Date: Thu, Mar 6, 2008 at 12:17 PM
Subject: Re: [PERFORM] count * performance issue
To: "A. Kretschmer" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]


A TRIGGER I can use if I want the count of the whole table, but I require it
for some of the rows with a WHERE condition.

So how to do that?


On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <
[EMAIL PROTECTED]> wrote:

> On Thu, 06.03.2008 at 1:26:46 -0500, Mark Mielke wrote the following:
> >
> >
> > There isn't a general solution. If you really need the exact count of
> > tuples, you can use a TRIGGER to increase/decrease the tuple count
> > for this table in an extra table.
> >
> >
> > Of course, this means accepting the cost of obtaining update locks on
> > the count table.
> >
> > The original poster should understand that they can either get a fast
> > estimated count, or they can get a slow accurate count (either slow in
> > terms of select using count(*) or slow in terms of updates using
> > triggers and locking).
> >
> > Other systems have their own issues. An index scan may be faster than a
> > table scan for databases that can accurately determine counts using only
> > the index,
>
> No. The current index implementation contains no information about
> row visibility within the current transaction. You need to scan the
> whole data table to determine whether a given row is visible to the
> current transaction.
>
>
> > but it's still a relatively slow operation, and people don't normally
> > need an accurate count for records in the range of 100,000+? :-)
>
> right.
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
>
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
>


Re: [PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
Is there any way to explicitly force Postgres to use an index scan?

On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <
[EMAIL PROTECTED]> wrote:

> On Thu, 06.03.2008 at 1:26:46 -0500, Mark Mielke wrote the following:
> >
> >
> > There isn't a general solution. If you really need the exact count of
> > tuples, you can use a TRIGGER to increase/decrease the tuple count
> > for this table in an extra table.
> >
> >
> > Of course, this means accepting the cost of obtaining update locks on
> > the count table.
> >
> > The original poster should understand that they can either get a fast
> > estimated count, or they can get a slow accurate count (either slow in
> > terms of select using count(*) or slow in terms of updates using
> > triggers and locking).
> >
> > Other systems have their own issues. An index scan may be faster than a
> > table scan for databases that can accurately determine counts using only
> > the index,
>
> No. The current index implementation contains no information about
> row visibility within the current transaction. You need to scan the
> whole data table to determine whether a given row is visible to the
> current transaction.
>
>
> > but it's still a relatively slow operation, and people don't normally
> > need an accurate count for records in the range of 100,000+? :-)
>
> right.
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
>
> http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
>


Re: [PERFORM] count * performance issue

2008-03-05 Thread A. Kretschmer
On Thu, 06.03.2008 at 1:26:46 -0500, Mark Mielke wrote the following:
>  
> 
> There isn't a general solution. If you really need the exact count of
> tuples, you can use a TRIGGER to increase/decrease the tuple count for
> this table in an extra table.
> 
> 
> Of course, this means accepting the cost of obtaining update locks on
> the count table.
> 
> The original poster should understand that they can either get a fast
> estimated count, or they can get a slow accurate count (either slow in terms
> of select using count(*) or slow in terms of updates using triggers and
> locking).
> 
> Other systems have their own issues. An index scan may be faster than a table
> scan for databases that can accurately determine counts using only the index,

No. The current index implementation contains no information about
row visibility within the current transaction. You need to scan the
whole data table to determine whether a given row is visible to the
current transaction.


> but it's still a relatively slow operation, and people don't normally need an
> accurate count for records in the range of 100,000+? :-)

right.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-05 Thread Shoaib Mir
On Thu, Mar 6, 2008 at 5:31 PM, sathiya psql <[EMAIL PROTECTED]> wrote:

> Will you please tell what autovacuuming is and what it will do? Is
> there any good article on this?
>
>
>
Read this -->
http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#AUTOVACUUM

-- 
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]


Re: [PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
Will you please tell what autovacuuming is and what it will do? Is there
any good article on this?

On Thu, Mar 6, 2008 at 11:56 AM, Shoaib Mir <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[EMAIL PROTECTED]>
> wrote:
>
> > But every time I need to run ANALYZE...
> > that takes the same time as count(*) takes, so what is the use?
> >
> >
> >
> Don't you have autovacuuming running in the background, which takes care
> of the analyze as well?
>
> If not, then turn it on; a manual analyze then shouldn't, I guess,
> take much time!
>
> But yes, I will say that if it's possible, go with the trigger option, as
> that might be more helpful and a very fast way to do it.
>
>
> --
> Shoaib Mir
> Fujitsu Australia Software Technology
> [EMAIL PROTECTED]
>


Re: [PERFORM] count * performance issue

2008-03-05 Thread Mark Mielke




There isn't a general solution. If you really need the exact count of
tuples, you can use a TRIGGER to increase/decrease the tuple count for
this table in an extra table.



Of course, this means accepting the cost of obtaining update locks on 
the count table.


The original poster should understand that they can either get a fast 
estimated count, or they can get a slow accurate count (either slow in 
terms of select using count(*) or slow in terms of updates using 
triggers and locking).


Other systems have their own issues. An index scan may be faster than a 
table scan for databases that can accurately determine counts using only 
the index, but it's still a relatively slow operation, and people don't 
normally need an accurate count for records in the range of 100,000+? :-)


Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>



Re: [PERFORM] count * performance issue

2008-03-05 Thread Shoaib Mir
On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[EMAIL PROTECTED]> wrote:

> But every time I need to run ANALYZE...
> that takes the same time as count(*) takes, so what is the use?
>
>
>
Don't you have autovacuuming running in the background, which takes care
of the analyze as well?

If not, then turn it on; a manual analyze then shouldn't, I guess, take
much time!

But yes, I will say that if it's possible, go with the trigger option, as that
might be more helpful and a very fast way to do it.

-- 
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]


Re: [PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
But every time I need to run ANALYZE...
that takes the same time as count(*) takes, so what is the use?

On Thu, Mar 6, 2008 at 11:45 AM, Shoaib Mir <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <
> [EMAIL PROTECTED]> wrote:>
>
> > > am having a table with nearly 50 lakh records,
> > >
> > > it has more than 15 columns, i want to count how many records are
> > there, it is
> > > taking nearly 17 seconds to do that...
> > >
> > > i know that to get a approximate count we can use
> > >  SELECT reltuples FROM pg_class where relname = TABLENAME;
> > >
> > > but this give approximate count, and i require exact count...
> >
> > There isn't a general solution. If you really need the exact count of
> > tuples, you can use a TRIGGER to increase/decrease the tuple count for
> > this table in an extra table.
> >
> >
> >
> Or do something like:
>
> ANALYZE tablename;
> select reltuples from pg_class where relname = 'tablename';
>
> That will also return the total number of rows in a table, and I guess it
> might be much faster than doing a count(*); but yes, if a trigger can be an
> option, that is the easiest way to do it and the fastest too.
>
> --
> Shoaib Mir
> Fujitsu Australia Software Technology
> [EMAIL PROTECTED]


Re: [PERFORM] count * performance issue

2008-03-05 Thread Shoaib Mir
On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <
[EMAIL PROTECTED]> wrote:>

> > am having a table with nearly 50 lakh records,
> >
> > it has more than 15 columns, i want to count how many records are there,
> it is
> > taking nearly 17 seconds to do that...
> >
> > i know that to get a approximate count we can use
> >  SELECT reltuples FROM pg_class where relname = TABLENAME;
> >
> > but this give approximate count, and i require exact count...
>
> There isn't a general solution. If you really need the exact count of
> tuples, you can use a TRIGGER to increase/decrease the tuple count for
> this table in an extra table.
>
>
>
Or do something like:

ANALYZE tablename;
select reltuples from pg_class where relname = 'tablename';

That will also return the total number of rows in a table, and I guess it might
be much faster than doing a count(*); but yes, if a trigger can be an option,
that is the easiest way to do it and the fastest too.

-- 
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]


Re: [PERFORM] count * performance issue

2008-03-05 Thread A. Kretschmer
On Thu, 06.03.2008 at 11:13:01 +0530, sathiya psql wrote the following:
> count(*) tooks much time...
> 
> but with the where clause we can make this to use indexing,... what where
> clause we can use??

An index without a WHERE can't help to avoid a seq. scan.


> 
> Am using postgres 7.4 in Debian OS with 1 GB RAM,

PG 7.4 is very old... Recent versions are MUCH faster.



> 
> am having a table with nearly 50 lakh records,
> 
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
> 
> i know that to get a approximate count we can use
>  SELECT reltuples FROM pg_class where relname = TABLENAME;
> 
> but this give approximate count, and i require exact count...

There isn't a general solution. If you really need the exact count of
tuples, you can use a TRIGGER to increase/decrease the tuple count for
this table in an extra table.
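A minimal sketch of that approach (all names are made up; the dollar quoting
needs 8.0 or later, on 7.4 the function body has to be single-quoted, and
plpgsql must already be installed in the database):

  CREATE TABLE call_log_rowcount (n bigint NOT NULL);
  INSERT INTO call_log_rowcount SELECT count(*) FROM call_log;

  CREATE OR REPLACE FUNCTION call_log_count_trig() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE call_log_rowcount SET n = n + 1;  -- serializes concurrent writers
          RETURN NEW;
      ELSIF TG_OP = 'DELETE' THEN
          UPDATE call_log_rowcount SET n = n - 1;
          RETURN OLD;
      END IF;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER call_log_count
      AFTER INSERT OR DELETE ON call_log
      FOR EACH ROW EXECUTE PROCEDURE call_log_count_trig();

  -- the exact count is then a cheap single-row read:
  SELECT n FROM call_log_rowcount;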


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-05 Thread Chris

sathiya psql wrote:

count(*) tooks much time...

but with the where clause we can make this to use indexing,... what 
where clause we can use??


Am using postgres 7.4 in Debian OS with 1 GB RAM,

am having a table with nearly 50 lakh records,


Looks suspiciously like a question asked yesterday:

http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php

--
Postgresql & php tutorials
http://www.designmagick.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance


[PERFORM] count * performance issue

2008-03-05 Thread sathiya psql
count(*) takes too much time...

but with a WHERE clause we can make it use an index... what WHERE
clause can we use?

I am using Postgres 7.4 on Debian with 1 GB RAM,

and I have a table with nearly 50 lakh (5 million) records;

it has more than 15 columns. I want to count how many records are there, and
it is taking nearly 17 seconds to do that...

I know that to get an approximate count we can use
 SELECT reltuples FROM pg_class WHERE relname = 'TABLENAME';

but this gives an approximate count, and I require an exact count...


Re: [PERFORM] count(*) performance

2006-03-28 Thread Jim C. Nasby
On Mon, Mar 27, 2006 at 12:20:54PM -0700, Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still  
> should do a regular vacuum analyze periodically?

Doing a periodic vacuumdb -avz and keeping an eye on the last few lines
isn't a bad idea. It would also be helpful if there was a log parser
that could take a look at the output of a vacuumdb -av and look for any
problem areas, such as relations that have a lot of free space in them.
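For example (an illustrative crontab entry for the postgres user; paths and
schedule are arbitrary):

  # nightly vacuum + analyze of all databases, verbose output kept for review
  0 3 * * * vacuumdb -avz >> /var/log/vacuumdb.log 2>&1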
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] count(*) performance

2006-03-28 Thread Markus Schaber
Gábriel Ákos wrote:

> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

Hi, Gabriel, it may be that your Free Space Map (FSM) setting is way too
low.

Try increasing it.

Btw, VACUUM outputs a Warning if FSM is not high enough, maybe you can
find useful hints in the log file.

HTH
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] count(*) performance

2006-03-27 Thread Mikael Carneholm
I think it is definitely necessary from an administration point of view - as an 
administrator, I want to know:

1) Are there any stats (at all) in a schema
2) Are there any stats on the table that slow_query_foo is targeting
3) If I have stats, how recent are they
4) Could it be that there are a lot of dead tuples lying around (given the 
amount of traffic I know I have)

These would be (are always!) the first questions I ask myself when I'm about to 
identify performance problems in an app, don't know how other people do though 
:)

Maybe something I'll try to look into this weekend, if I can spare some time.

- Mikael


-Original Message-
From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED]
Sent: den 28 mars 2006 00:43
To: Mikael Carneholm
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance


Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
> 
> It SOUNDS easy to implement, but that has fooled me before... :-)


It is fairly easy to implement; however, it has been discussed before and 
decided that it wasn't necessary.  What the system cares about is how 
long it's been since the last vacuum in terms of XIDs not time.  Storing 
a timestamp would make it more human readable, but I'm not sure the 
powers that be want to add two new columns to some system table to 
accommodate this.

Matt

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) performance

2006-03-27 Thread Tom Lane
"Matthew T. O'Connor"  writes:
> It is fairly easy to implement, however it has been discussed before and 
> decided that it wasn't necessary.  What the system cares about is how 
> long it's been since the last vacuum in terms of XIDs not time.

I think Alvaro is intending to do the latter (store per-table vacuum xid
info) for 8.2.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] count(*) performance

2006-03-27 Thread Matthew T. O'Connor

Mikael Carneholm wrote:

This is where a "last_vacuumed" (and "last_analyzed") column in
pg_statistic(?) would come in handy. Each time vacuum or analyze has
finished, update the row for the specific table that was
vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
and each time a user complains about bad performance, one could request
the user to do a "select s.last_vacuumed, s.last_analyzed from
pg_statistic s, pg_attribute a, pg_class c where ..."

It SOUNDS easy to implement, but that has fooled me before... :-)



It is fairly easy to implement; however, it has been discussed before and 
decided that it wasn't necessary.  What the system cares about is how 
long it's been since the last vacuum in terms of XIDs not time.  Storing 
a timestamp would make it more human readable, but I'm not sure the 
powers that be want to add two new columns to some system table to 
accommodate this.


Matt

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] count(*) performance

2006-03-27 Thread Mikael Carneholm
This is where a "last_vacuumed" (and "last_analyzed") column in
pg_statistic(?) would come in handy. Each time vacuum or analyze has
finished, update the row for the specific table that was
vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
and each time a user complains about bad performance, one could request
the user to do a "select s.last_vacuumed, s.last_analyzed from
pg_statistic s, pg_attribute a, pg_class c where ..."

It SOUNDS easy to implement, but that has fooled me before... :-)

- Mikael

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Guido
Neitzer
Sent: den 27 mars 2006 21:44
To: Brendan Duddridge
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance


On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote:

> Does that mean that even though autovacuum is turned on, you still  
> should do a regular vacuum analyze periodically?

It seems that there are situations where autovacuum does not do a really
good job.

However, in our application I have made a stupid design decision which  
I want to change as soon as possible. I have a "visit count" column  
in one of the very large tables, so updates are VERY regular. I've  
just checked and saw that autovacuum does a great job with that.

Nevertheless I have set up a cron job to do a standard vacuum every  
month. I've used vacuum full only once after I did a bulk update of  
about 200.000 rows ...

cug

-- 
PharmaLine, Essen, GERMANY
Software and Database Development



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] count(*) performance

2006-03-27 Thread Guido Neitzer

On 27.03.2006, at 21:20 Uhr, Brendan Duddridge wrote:

Does that mean that even though autovacuum is turned on, you still  
should do a regular vacuum analyze periodically?


It seems that there are situations where autovacuum does not do a really
good job.


However, in our application I have made a stupid design decision which  
I want to change as soon as possible. I have a "visit count" column  
in one of the very large tables, so updates are VERY regular. I've  
just checked and saw that autovacuum does a great job with that.


Nevertheless I have set up a cron job to do a standard vacuum every  
month. I've used vacuum full only once after I did a bulk update of  
about 200.000 rows ...


cug

--
PharmaLine, Essen, GERMANY
Software and Database Development






Re: [PERFORM] count(*) performance

2006-03-27 Thread Alvaro Herrera
Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still  
> should do a regular vacuum analyze periodically?

No, it probably means you have set FSM settings too low, or not tuned
the autovacuum parameters to your specific situation.
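The relevant knobs live in postgresql.conf; a sketch with purely illustrative
values (the right numbers depend on table sizes and update rates, and the
autovacuum_* settings exist as shown only from 8.1 on):

  max_fsm_pages = 500000          # must cover the dead pages reported by VACUUM VERBOSE
  max_fsm_relations = 1000
  autovacuum = on
  autovacuum_naptime = 60         # seconds between autovacuum runs
  autovacuum_vacuum_scale_factor = 0.2    # vacuum after ~20% of a table changed
  autovacuum_analyze_scale_factor = 0.1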

A bug in the autovacuum daemon is not unexpected however, so if it
doesn't work after tuning, let us know.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] count(*) performance

2006-03-27 Thread Matthew T. O'Connor

Gábriel Ákos wrote:

Luke Lonergan wrote:

Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:


That gave me an idea. I thought that autovacuum is doing it right, but I
issued a vacuum full analyze verbose , and it worked all the day.
After that I've tweaked memory settings a bit too (more fsm_pages)


Oops! I replied to your disk speed before I saw this.

The only thing is - you probably don't want to do a "vacuum full", but
rather a simple "vacuum" should be enough.


I thought that too. Autovacuum is running on our system but it didn't do 
the trick. Anyway the issue is solved, thank you all for helping. :)


Yeah, it would be nice if autovacuum had some way of raising a flag to 
the admin that given current settings (thresholds, FSM etc...), it's not 
keeping up with the activity.  I don't know how to do this, but I hope 
someone else has some good ideas.


Matt


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] count(*) performance

2006-03-27 Thread Brendan Duddridge
Does that mean that even though autovacuum is turned on, you still  
should do a regular vacuum analyze periodically?


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 27, 2006, at 11:14 AM, Luke Lonergan wrote:


Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:

That gave me an idea. I thought that autovacuum is doing it right,  
but I

issued a vacuum full analyze verbose , and it worked all the day.
After that I've tweaked memory settings a bit too (more fsm_pages)


Oops! I replied to your disk speed before I saw this.

The only thing is - you probably don't want to do a "vacuum full", but
rather a simple "vacuum" should be enough.

- Luke



---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings







Re: [PERFORM] count(*) performance

2006-03-27 Thread Gábriel Ákos

Luke Lonergan wrote:

Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:


That gave me an idea. I thought that autovacuum is doing it right, but I
issued a vacuum full analyze verbose , and it worked all the day.
After that I've tweaked memory settings a bit too (more fsm_pages)


Oops! I replied to your disk speed before I saw this.

The only thing is - you probably don't want to do a "vacuum full", but
rather a simple "vacuum" should be enough.


I thought that too. Autovacuum is running on our system but it didn't do 
the trick. Anyway the issue is solved, thank you all for helping. :)



--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) performance

2006-03-27 Thread Luke Lonergan
Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:

> That gave me an idea. I thought that autovacuum is doing it right, but I
> issued a vacuum full analyze verbose , and it worked all the day.
> After that I've tweaked memory settings a bit too (more fsm_pages)

Oops! I replied to your disk speed before I saw this.

The only thing is - you probably don't want to do a "vacuum full", but
rather a simple "vacuum" should be enough.
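Roughly (a sketch only, using the table from this thread):

  -- plain VACUUM marks dead space for reuse and runs alongside normal traffic
  VACUUM ANALYZE infx.infx_product;

  -- VACUUM FULL compacts the table and gives space back to the OS, but takes
  -- an exclusive lock for the duration; usually only worth it after huge
  -- bulk deletes/updates
  VACUUM FULL ANALYZE infx.infx_product;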

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] count(*) performance

2006-03-27 Thread Gábriel Ákos

Jim C. Nasby wrote:


But in this case, I'd bet money that if it's taking 4 minutes something
else is wrong. Have you been vacuuming that table frequently enough?


That gave me an idea. I thought that autovacuum is doing it right, but I
issued a vacuum full analyze verbose , and it worked all the day.
After that I've tweaked memory settings a bit too (more fsm_pages)

Now:

staging=# SELECT count(*) from infx.infx_product;
  count
-
 3284997
(1 row)

Time: 1301.049 ms

As I saw the output, the database was compressed to 10% of its size :)
This table has quite big changes every 4 hour, let's see how it works.
Maybe I'll have to issue full vacuums from cron regularly.


What's SELECT relpages FROM pg_class WHERE relname='tablename' show?


This went to 10% as well, now it's around 156000 pages.

Regards,
Akos


--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] count(*) performance

2006-03-27 Thread Gábriel Ákos

Luke Lonergan wrote:


To test your disk speed, use the following commands and report the times
here:

  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"


[EMAIL PROTECTED]:/fast #   time bash -c "dd if=/dev/zero of=bigfile bs=8k
count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 45.469404 seconds (90082553 bytes/sec)
real0m56.880s
user0m0.112s
sys 0m18.937s


  time dd if=bigfile of=/dev/null bs=8k


[EMAIL PROTECTED]:/fast #   time dd if=bigfile of=/dev/null bs=8k
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 53.542147 seconds (76500481 bytes/sec)

real0m53.544s
user0m0.048s
sys 0m10.637s

I guess these values aren't that bad :)

--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) performance

2006-03-27 Thread Luke Lonergan
Gabriel,

On 3/27/06 5:34 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:

> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?

Before changing anything with your Postgres configuration, you should check
your hard drive array performance.  All select count(*) does is a sequential
scan of your data, and if the table is larger than memory, or if it's the
first time you've scanned it, it is limited by your disk speed.

To test your disk speed, use the following commands and report the times
here:

  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
  time dd if=bigfile of=/dev/null bs=8k

If these are taking a long time, from another session watch the I/O rate
with "vmstat 1" for a while and report that here.

- Luke   



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) performance

2006-03-27 Thread Jim C. Nasby
On Mon, Mar 27, 2006 at 03:34:32PM +0200, Gábriel Ákos wrote:
> Hi,
> 
> I guess this is an age-old 100times answered question, but I didn't find 
> the answer to it yet (neither in the FAQ nor in the mailing list archives).
> 
> Question: I have a table with 2.5M rows. count(*) on this table is 
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10 
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I 
> change in postgresql.conf and how?

First, count(*) on PostgreSQL tends to be slow because you can't do
index covering[1].

But in this case, I'd bet money that if it's taking 4 minutes something
else is wrong. Have you been vacuuming that table frequently enough?
What's SELECT relpages FROM pg_class WHERE relname='tablename' show?
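For example (a sketch of how to read the answer, assuming the default 8 kB
block size):

  SELECT relname, relpages, reltuples
    FROM pg_class
   WHERE relname = 'tablename';

  -- relpages * 8 kB is roughly what a seq scan has to read from disk; if
  -- that is far more than the live rows justify, the table is bloated and
  -- needs more aggressive (auto)vacuuming, plus FSM settings big enough to
  -- remember the freed space.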

[1] 
http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Feb.asp#5
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] count(*) performance

2006-03-27 Thread Gábriel Ákos

Hi,

I guess this is an age-old, hundred-times-answered question, but I didn't find 
the answer to it yet (neither in the FAQ nor in the mailing list archives).


Question: I have a table with 2.5M rows. count(*) on this table is 
running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10 
array (sata, not scsi)) Is this normal? How could I make it run faster?
Maybe make it run faster for the 2nd time? Which parameters should I 
change in postgresql.conf and how?






--
Üdvözlettel,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings