Re: [PERFORM] count * performance issue
On Tue, Mar 11, 2008 at 02:19:09PM +, Matthew wrote:
> of rows with IS NULL, then someone changes a row, then you find the count
> of rows with IS NOT NULL. Add the two together, and there may be rows
> that were counted twice, or not at all.

Only if you count in READ COMMITTED.

A

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Matthew wrote:
> No, actually I was referring to a race condition. So, you find the count
> of rows with IS NULL, then someone changes a row, then you find the count
> of rows with IS NOT NULL. Add the two together, and there may be rows
> that were counted twice, or not at all.

Not a problem if you use a serializable transaction, or if you do

SELECT COUNT(*) from table WHERE indexed_field IS NULL
UNION ALL
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

as one statement. However, this makes no sense whatsoever, as both index
scans (assuming the planner even chooses an index scan for them, which
seems highly unlikely) still have to visit each tuple in the heap. It's
always going to be slower than a single "SELECT COUNT(*) FROM table" with
a seq scan.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
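A minimal sketch of the serializable-transaction variant Heikki mentions (table and column names are placeholders); both counts then read from the same snapshot, so the race disappears:

```sql
-- In PostgreSQL of this era, SERIALIZABLE gives snapshot isolation:
-- both SELECTs see the database as of the first statement, so no
-- concurrent change can be counted twice or missed.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM some_table WHERE indexed_field IS NULL;
SELECT count(*) FROM some_table WHERE indexed_field IS NOT NULL;
COMMIT;
```

As Heikki notes, this fixes correctness but not speed: the two scans together still visit every heap tuple.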
Re: [PERFORM] count * performance issue
On Tue, 11 Mar 2008, Tino Wildenhain wrote:
>> And certain, qualified definitions of "accurate" as well. Race
>> condition?
> You mean in a three-state-logic? null, not null and something different?

True, False, and FILE_NOT_FOUND.

No, actually I was referring to a race condition. So, you find the count
of rows with IS NULL, then someone changes a row, then you find the count
of rows with IS NOT NULL. Add the two together, and there may be rows that
were counted twice, or not at all.

Matthew

--
It's one of those irregular verbs - "I have an independent mind," "You are
an eccentric," "He is round the twist."
                                  -- Bernard Woolly, Yes Prime Minister
Re: [PERFORM] count * performance issue
Hi,

Matthew wrote:
> On Tue, 11 Mar 2008, Bill Moran wrote:
>> In response to "Robins Tharakan" <[EMAIL PROTECTED]>:
>>> Sorry, if I am missing something here, but shouldn't something like
>>> this allow us to get a (fast) accurate count ?
>>>
>>> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>>> +
>>> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>> For certain, qualified definitions of "fast", sure.
> And certain, qualified definitions of "accurate" as well. Race condition?

You mean in a three-state-logic? null, not null and something different?
;-)

Tino
Re: [PERFORM] count * performance issue
On Tue, 11 Mar 2008, Bill Moran wrote:
> In response to "Robins Tharakan" <[EMAIL PROTECTED]>:
>> Sorry, if I am missing something here, but shouldn't something like this
>> allow us to get a (fast) accurate count ?
>>
>> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>> +
>> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
> For certain, qualified definitions of "fast", sure.

And certain, qualified definitions of "accurate" as well. Race condition?

Matthew

--
"Television is a medium because it is neither rare nor well done."
                                               -- Fred Friendly
Re: [PERFORM] count * performance issue
In response to "Robins Tharakan" <[EMAIL PROTECTED]>:

> Hi,
>
> I have been reading this conversation for a few days now and I just
> wanted to ask this. From the release notes, one of the new additions in
> 8.3 is (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

For certain, qualified definitions of "fast", sure.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
Re: [PERFORM] count * performance issue
On Tuesday 11 March 2008 04:11, Scott Marlowe wrote:
> On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have been reading this conversation for a few days now and I just
> > wanted to ask this. From the release notes, one of the new additions in
> > 8.3 is (Allow col IS NULL to use an index (Teodor)).
> >
> > Sorry, if I am missing something here, but shouldn't something like this
> > allow us to get a (fast) accurate count ?
> >
> > SELECT COUNT(*) from table WHERE indexed_field IS NULL
> > +
> > SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>
> It really depends on the distribution of the null / not nulls in the
> table. If it's 50/50 there's no advantage to using the index, as you
> still have to check visibility info in the table itself.
>
> OTOH, if NULL (or conversely not null) are rare, then yes, the index
> can help. I.e. if 1% of the tuples are null, the select count(*) from
> table where field is null can use the index efficiently.

But you'll get a sequential scan with the NOT NULL case which will end up
taking more time. (Seq Scan + Index Scan > Seq Scan)

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.
Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12
Re: [PERFORM] count * performance issue
On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have been reading this conversation for a few days now and I just
> wanted to ask this. From the release notes, one of the new additions in
> 8.3 is (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

It really depends on the distribution of the null / not nulls in the
table. If it's 50/50 there's no advantage to using the index, as you
still have to check visibility info in the table itself.

OTOH, if NULL (or conversely not null) are rare, then yes, the index
can help. I.e. if 1% of the tuples are null, the select count(*) from
table where field is null can use the index efficiently.
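Scott's selectivity point can be checked directly with EXPLAIN; a hedged sketch, assuming 8.3 (where IS NULL can use a btree index) and placeholder table/column names:

```sql
-- With NULLs rare (say ~1% of rows), the IS NULL count can use the
-- index; the IS NOT NULL side matches ~99% of rows, so a seqscan
-- stays cheaper there.
CREATE INDEX t_field_idx ON t (field);
ANALYZE t;
EXPLAIN SELECT count(*) FROM t WHERE field IS NULL;      -- index/bitmap scan likely
EXPLAIN SELECT count(*) FROM t WHERE field IS NOT NULL;  -- seqscan likely
```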
Re: [PERFORM] count * performance issue
On Tue, 11 Mar 2008 08:27:05 +0530
"Robins Tharakan" <[EMAIL PROTECTED]> wrote:

> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

If the selectivity is appropriate, yes. However if you have 1 million
rows, and 200k of those rows are null (or not null), it is still going to
seqscan.

joshua d. drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit
Re: [PERFORM] count * performance issue
Robins Tharakan wrote:
> Hi,
>
> I have been reading this conversation for a few days now and I just
> wanted to ask this. From the release notes, one of the new additions in
> 8.3 is (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

For PostgreSQL: You still don't know whether the row is visible until you
check the row. That it's NULL or NOT NULL does not influence this truth.

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
Re: [PERFORM] count * performance issue
Hi,

I have been reading this conversation for a few days now and I just wanted
to ask this. From the release notes, one of the new additions in 8.3 is
(Allow col IS NULL to use an index (Teodor)).

Sorry, if I am missing something here, but shouldn't something like this
allow us to get a (fast) accurate count ?

SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

*Robins Tharakan*

---------- Forwarded message ----------
From: Greg Smith <[EMAIL PROTECTED]>
Date: Tue, Mar 11, 2008 at 4:31 AM
Subject: Re: [PERFORM] count * performance issue
To: Joe Mirabal <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org

On Mon, 10 Mar 2008, Joe Mirabal wrote:
> I run queries on the data and get immediate max and min as well as other
> aggregate functions very quickly, however a select count(*) of the table
> takes forever usually nearly an hour or more.

Are you sure the form of "select count(*)" you're using is actually
utilizing the index to find a useful subset? What do you get out of
EXPLAIN ANALYZE on the query?

In order for indexes to be helpful a couple of things need to happen:
1) They have to be structured correctly to be useful
2) There need to be large enough settings for shared_buffers and
   effective_cache_size that the database thinks it can use them
   efficiently
3) The tables involved need to be ANALYZEd to keep their statistics up to
   date.

The parameters to run a 400GB *table* are very different from the
defaults; if you want tuning suggestions you should post the non-default
entries in your postgresql.conf file from what you've already adjusted
along with basic information about your server (PostgreSQL version, OS,
memory, disk setup).

> We in our warehouse use the count(*) as our verification of counts by
> day/month's etc

If you've got a database that size and you're doing that sort of thing on
it, you really should be considering partitioning as well.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] count * performance issue
On Mon, 10 Mar 2008, Joe Mirabal wrote:
> I run queries on the data and get immediate max and min as well as other
> aggregate functions very quickly, however a select count(*) of the table
> takes forever usually nearly an hour or more.

Are you sure the form of "select count(*)" you're using is actually
utilizing the index to find a useful subset? What do you get out of
EXPLAIN ANALYZE on the query?

In order for indexes to be helpful a couple of things need to happen:
1) They have to be structured correctly to be useful
2) There need to be large enough settings for shared_buffers and
   effective_cache_size that the database thinks it can use them
   efficiently
3) The tables involved need to be ANALYZEd to keep their statistics up to
   date.

The parameters to run a 400GB *table* are very different from the
defaults; if you want tuning suggestions you should post the non-default
entries in your postgresql.conf file from what you've already adjusted
along with basic information about your server (PostgreSQL version, OS,
memory, disk setup).

> We in our warehouse use the count(*) as our verification of counts by
> day/month's etc

If you've got a database that size and you're doing that sort of thing on
it, you really should be considering partitioning as well.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
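The checks Greg lists might look like this in practice (the table and column names here are placeholders, not from the original post):

```sql
-- 1) Does the plan actually use the index for the subset you count?
EXPLAIN ANALYZE SELECT count(*) FROM big_table WHERE load_day = '2008-03-10';
-- 2) What are the cache-related settings the planner relies on?
SHOW shared_buffers;
SHOW effective_cache_size;
-- 3) Refresh statistics before re-testing:
ANALYZE big_table;
```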
Re: [PERFORM] count * performance issue
On Mon, Mar 10, 2008 at 1:54 PM, Joe Mirabal <[EMAIL PROTECTED]> wrote:
> Gregory,
>
> I just joined this listserv and was happy to see this posting. I have a
> 400GB table that I have indexed (building the index took 27 hours);
> loading the table with 10 threads took 9 hours. I run queries on the data
> and get immediate max and min as well as other aggregate functions very
> quickly, however a select count(*) of the table takes forever, usually
> nearly an hour or more.
>
> Do you have any tuning recommendations? We in our warehouse use the
> count(*) as our verification of counts by day/month's etc and in Netezza
> it's immediate. I tried by adding oids. BUT the situation I learned was
> that adding the oids in the table adds a significant amount of space to
> the data AND the index.

Yeah, this is a typical problem people run into with MVCC databases to one
extent or another. PostgreSQL has no native way to just make it faster.
However, if it's a table with wide rows, you can use a lookup table to
help a bit. Have a FK with cascading deletes from the master table to a
table that just holds the PK for it, and do count(*) on that table.

Otherwise, you have the trigger solution mentioned previously.

Also, if you only need an approximate count, then you can use the system
tables to get that with something like

select reltuples from pg_class where relname='tablename';

after an analyze. It won't be 100% accurate, but it will be pretty close
most of the time.
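The narrow lookup-table idea Scott describes might be sketched like this (all names are placeholders). Note the FK cascade only keeps the side table in sync on delete; inserts still need a trigger or application code to add the matching id row:

```sql
CREATE TABLE master (
    id      bigint PRIMARY KEY,
    payload text              -- stands in for the wide row data
);
-- Side table holding only the PK; deletes propagate via the cascade.
CREATE TABLE master_ids (
    id bigint PRIMARY KEY REFERENCES master (id) ON DELETE CASCADE
);
-- count(*) now scans a much narrower table than the master:
SELECT count(*) FROM master_ids;
```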
Re: [PERFORM] count * performance issue
On Mon, 10 Mar 2008, Bill Moran wrote:
> Some searches through the archives should turn up details on these
> methods.

I've collected up what looked like the best resources on this topic into
the FAQ entry at http://www.postgresqldocs.org/index.php/Slow_Count

General Bits has already done two good summary articles here and I'd
think wading through the archives directly shouldn't be necessary.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] count * performance issue
In response to "Joe Mirabal" <[EMAIL PROTECTED]>:

> Gregory,
>
> I just joined this listserv and was happy to see this posting. I have a
> 400GB table that I have indexed (building the index took 27 hours);
> loading the table with 10 threads took 9 hours. I run queries on the data
> and get immediate max and min as well as other aggregate functions very
> quickly, however a select count(*) of the table takes forever, usually
> nearly an hour or more.
>
> Do you have any tuning recommendations? We in our warehouse use the
> count(*) as our verification of counts by day/month's etc and in Netezza
> it's immediate. I tried by adding oids. BUT the situation I learned was
> that adding the oids in the table adds a significant amount of space to
> the data AND the index.
>
> As you may gather from this we are relatively new on Postgres.
>
> Any suggestions you can give me would be most helpful.

One approach to this problem is to create triggers that keep track of the
total count whenever rows are added or deleted. This adds some overhead to
the update process, but the correct row count is always quickly available.

Another is to use EXPLAIN to get an estimate of the # of rows from the
planner. This works well if an estimate is acceptable, but can't be
trusted for precise counts.

Some searches through the archives should turn up details on these
methods.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
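A rough sketch of the trigger approach Bill describes (all names are placeholders, and this is only one way to wire it up):

```sql
CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts VALUES ('big_table', 0);

-- Maintain the counter on every insert/delete of the tracked table.
CREATE OR REPLACE FUNCTION count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
        RETURN NEW;
    ELSE  -- DELETE
        UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER big_table_count
    AFTER INSERT OR DELETE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE count_trig();

-- The count becomes a single-row lookup:
SELECT n FROM row_counts WHERE table_name = 'big_table';
```

Every writer updates the same counter row, so a heavy concurrent write load will serialize on it, which is the overhead Bill mentions; it also does not see TRUNCATE.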
Re: [PERFORM] count * performance issue
Gregory,

I just joined this listserv and was happy to see this posting. I have a
400GB table that I have indexed (building the index took 27 hours);
loading the table with 10 threads took 9 hours. I run queries on the data
and get immediate max and min as well as other aggregate functions very
quickly, however a select count(*) of the table takes forever, usually
nearly an hour or more.

Do you have any tuning recommendations? We in our warehouse use the
count(*) as our verification of counts by day/month's etc and in Netezza
it's immediate. I tried by adding oids. BUT the situation I learned was
that adding the oids in the table adds a significant amount of space to
the data AND the index.

As you may gather from this we are relatively new on Postgres.

Any suggestions you can give me would be most helpful.

Cheers,
Joe

On Mon, Mar 10, 2008 at 11:16 AM, Gregory Stark <[EMAIL PROTECTED]> wrote:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>
> > Well, scanning an index to get a count might be significantly faster
> > than scanning the main table, but it's hardly "instantaneous". It's
> > still going to take time proportional to the table size.
>
> Hm, Mark's comment about bitmap indexes makes that not entirely true. A
> bitmap index can do RLE compression which makes the relationship between
> the size of the table and the time taken to scan the index more complex.
> In the degenerate case where there are no concurrent updates (assuming
> you can determine that quickly) it might actually be constant time.
>
> > Unless they keep a central counter of the number of index entries;
> > which would have all the same serialization penalties we've talked
> > about before...
>
> Bitmap indexes do in fact have concurrency issues -- arguably they're
> just a baroque version of this central counter in this case.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's Slony Replication support!

--
Mirabili et Veritas
Joe Mirabal
Re: [PERFORM] count * performance issue
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous". It's
> still going to take time proportional to the table size.

Hm, Mark's comment about bitmap indexes makes that not entirely true. A
bitmap index can do RLE compression which makes the relationship between
the size of the table and the time taken to scan the index more complex.
In the degenerate case where there are no concurrent updates (assuming you
can determine that quickly) it might actually be constant time.

> Unless they keep a central counter of the number of index entries;
> which would have all the same serialization penalties we've talked
> about before...

Bitmap indexes do in fact have concurrency issues -- arguably they're just
a baroque version of this central counter in this case.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Re: [PERFORM] count * performance issue
On 6-3-2008 16:28 Craig James wrote:
> On the one hand, I understand that Postgres has its architecture, and I
> understand the issue of row visibility, and so forth. On the other hand,
> my database is just sitting there, nothing going on, no connections
> except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows,
> a query that either Oracle or MySQL would answer in a fraction of a
> second. It's hard for me to believe there isn't a better way.

Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore
relatively large. The hardware is quite beefy for a forum however (4 3Ghz
cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same amount of records as the
above and was before this query probably much less present in the
innodb-buffer (but also less frequently touched by other queries), we see
this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index,
which for those tables is more or less identical. Apparently the time is
still table-size related, not necessarily tuple-count related. As this
shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this was
on another database on a system with somewhat different specs (8 2.6Ghz
cores, 16GB, 7+7+1 raid50), used a non-primary index and I have no idea
how well that index was in the system's cache prior to this query.
Repeating it makes it do that query in 6.65 seconds; repeating the
12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems table-size
related. As soon as the index it uses fits in the cache or it doesn't have
to use the primary index, it might be a different story, but when the
table(index) is too large to fit, it is quite slow. Actually, it doesn't
appear to be much faster than Postgresql's (8.2) table-based counts.

If I use a much slower machine (2 2Ghz opterons, 8GB ddr memory, 5+1 old
15k rpm scsi disks in raid5) with a 1GB, 13M record table which is similar
to the above articlestats, it is able to return a count(*) in 3 seconds
after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the
query-cache at work or were using myisam. Or perhaps with a fast system,
you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen
Re: [PERFORM] count * performance issue
Mark Mielke wrote:
> Josh Berkus wrote:
>>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>>> large tables. So why can't Postgres do what they do?
>>> AFAIK the above claim is false for Oracle. They have the same
>>> transactional issues we do.
>> Nope. Oracle's MVCC is implemented through rollback segments, rather
>> than non-overwriting the way ours is. So Oracle can just do a count(*)
>> on the index, then check the rollback segment for any concurrent
>> update/delete/insert activity and adjust the count. This sucks if
>> there's a *lot* of concurrent activity, but in the usual case it's
>> pretty fast
>
> I read the "almost instantaneous" against "the above claim is false"
> and "Nope.", and I am not sure from the above whether you are saying
> that Oracle keeps an up-to-date count for the index (which might make
> it instantaneous?), or whether you are saying it still has to scan the
> index - which can take time if the index is large (therefore not
> instantaneous).

Oracle scans the index pages, if the b-tree index is on non-nullable
columns, or if the bitmap index is on low-ish cardinality data. Otherwise,
it table scans. MyISAM in MySQL would be an example where a counter is
kept.
Re: [PERFORM] count * performance issue
Greg Smith <[EMAIL PROTECTED]> writes:

> I know when I'm playing with pgbench the primary key index on the big
> accounts table is 1/7 the size of the table, and when using that table
> heavily shared_buffers ends up being mostly filled with that index. The
> usage counts are so high on the index blocks relative to any section of
> the table itself that they're very sticky in memory. And that's toy
> data; on some of the webapps people want these accurate counts for the
> ratio of index size to table data is even more exaggerated (think web
> forum).

Remember that our TOAST mechanism acts to limit the width of the
main-table row.

			regards, tom lane
Re: [PERFORM] count * performance issue
Josh Berkus wrote:
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>> AFAIK the above claim is false for Oracle. They have the same
>> transactional issues we do.
> Nope. Oracle's MVCC is implemented through rollback segments, rather
> than non-overwriting the way ours is. So Oracle can just do a count(*)
> on the index, then check the rollback segment for any concurrent
> update/delete/insert activity and adjust the count. This sucks if
> there's a *lot* of concurrent activity, but in the usual case it's
> pretty fast

I read the "almost instantaneous" against "the above claim is false" and
"Nope.", and I am not sure from the above whether you are saying that
Oracle keeps an up-to-date count for the index (which might make it
instantaneous?), or whether you are saying it still has to scan the index
- which can take time if the index is large (therefore not instantaneous).

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
Re: [PERFORM] count * performance issue
On Fri, 7 Mar 2008, Tom Lane wrote:
> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous". It's
> still going to take time proportional to the table size.

If this is something that's happening regularly, you'd have to hope that
most of the index is already buffered in memory somewhere though, so now
you're talking a buffer/OS cache scan that doesn't touch disk much.
Should be easier for that to be true because the index is smaller than
the table, right?

I know when I'm playing with pgbench the primary key index on the big
accounts table is 1/7 the size of the table, and when using that table
heavily shared_buffers ends up being mostly filled with that index. The
usage counts are so high on the index blocks relative to any section of
the table itself that they're very sticky in memory. And that's toy data;
on some of the webapps people want these accurate counts for the ratio of
index size to table data is even more exaggerated (think web forum).

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] count * performance issue
Josh Berkus <[EMAIL PROTECTED]> writes:
> Tom,
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>>
>> AFAIK the above claim is false for Oracle. They have the same
>> transactional issues we do.
>
> Nope. Oracle's MVCC is implemented through rollback segments, rather
> than non-overwriting the way ours is. So Oracle can just do a count(*)
> on the index, then check the rollback segment for any concurrent
> update/delete/insert activity and adjust the count. This sucks if
> there's a *lot* of concurrent activity, but in the usual case it's
> pretty fast.

Well, scanning an index to get a count might be significantly faster than
scanning the main table, but it's hardly "instantaneous". It's still
going to take time proportional to the table size.

Unless they keep a central counter of the number of index entries; which
would have all the same serialization penalties we've talked about
before...

			regards, tom lane
Re: [PERFORM] count * performance issue
Tom,

> > Count() on Oracle and MySQL is almost instantaneous, even for very
> > large tables. So why can't Postgres do what they do?
>
> AFAIK the above claim is false for Oracle. They have the same
> transactional issues we do.

Nope. Oracle's MVCC is implemented through rollback segments, rather than
non-overwriting the way ours is. So Oracle can just do a count(*) on the
index, then check the rollback segment for any concurrent
update/delete/insert activity and adjust the count. This sucks if there's
a *lot* of concurrent activity, but in the usual case it's pretty fast.

I've been thinking that when we apply the Dead Space Map we might be able
to get a similar effect in PostgreSQL. That is, just do a count over the
index, and visit only the heap pages flagged in the DSM. Again, for a
heavily updated table this wouldn't have any benefit, but for most cases
it would be much faster.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [PERFORM] count * performance issue
Craig James wrote: Tom Lane wrote: Craig James <[EMAIL PROTECTED]> writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. To convince yourself do this in Oracle: EXPLAIN PLAN FOR SELECT count(*) FROM table_without_any_indexes and you will see a full table scan. If you add (suitable) indexes you'll see something like an index full fast scan. In fact you can make count(*) *very* slow indeed in Oracle, by having an older session try to count a table that a newer session is modifying and committing to. The older session's data for the count is reconstructed from the rollback segments - which is very expensive. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Craig James wrote: Tom Lane wrote: Craig James <[EMAIL PROTECTED]> writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful. Craig Oracle will use a btree index on a not null set of columns to do a fast full index scan, which can be an order of magnitude or faster compared to a table scan. Also, Oracle can use a bitmap index (in cases where a bitmap index isn't otherwise silly) for a bitmap fast index scan/bitmap conversion for similar dramatic results. For "large" tables, Oracle is not going to be as fast as MyISAM tables in MySQL, even with these optimizations, since MyISAM doesn't have to scan even index pages to get a count(*) answer against the full table. Paul -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Tom Lane wrote: Craig James <[EMAIL PROTECTED]> writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Craig James wrote: My question is: What do the other databases do that Postgres can't do, and why not? Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? I think Mysql can only do that for the myisam engine - innodb and falcon are similar to Postgres. I don't believe Oracle optimizes bare count(*) on a table either - tho it may be able to use a suitable index (if present) to get the answer quicker. regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
Craig James <[EMAIL PROTECTED]> writes: > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 06 Mar 2008 07:28:50 -0800 Craig James <[EMAIL PROTECTED]> wrote: > In the 3 years I've been using Postgres, the problem of count() performance > has come up more times than I can recall, and each time the answer is, "It's > a sequential scan -- redesign your application." > > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? It's a tradeoff. The only way to get that information quickly is to maintain it internally when you insert or delete a row. So when do you want to take your hit? It sounds like Oracle has made this decision for you. In PostgreSQL you can use triggers and rules to manage this information if you need it. You can even do stuff like track how many of each type of something you have. That's something you can't do if your database engine has done a generic speedup for you. You would still have to create your own table for something like that and then you get the hit twice. -- D'Arcy J.M. Cain <[EMAIL PROTECTED]> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
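D'Arcy's "take your hit at write time" suggestion can be sketched with triggers maintaining a counter table. The sketch below is illustrative only (table names are made up, not from the thread) and uses SQLite so it is self-contained; the PostgreSQL equivalent would use CREATE FUNCTION ... RETURNS trigger plus CREATE TRIGGER, but the bookkeeping is identical:

```python
import sqlite3

# Sketch of a trigger-maintained row count.  Every write pays a small
# extra cost so that count lookups become a single-row read.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE call_log (call_id INTEGER PRIMARY KEY, caller TEXT);
    CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_counts VALUES ('call_log', 0);

    -- Pay the cost at write time: every INSERT/DELETE adjusts the counter.
    CREATE TRIGGER call_log_ins AFTER INSERT ON call_log
    BEGIN UPDATE row_counts SET n = n + 1 WHERE tbl = 'call_log'; END;
    CREATE TRIGGER call_log_del AFTER DELETE ON call_log
    BEGIN UPDATE row_counts SET n = n - 1 WHERE tbl = 'call_log'; END;
""")

conn.executemany("INSERT INTO call_log (caller) VALUES (?)",
                 [("alice",), ("bob",), ("carol",)])
conn.execute("DELETE FROM call_log WHERE caller = 'bob'")

# Single-row lookup instead of a sequential scan over the whole table.
fast = conn.execute("SELECT n FROM row_counts WHERE tbl = 'call_log'").fetchone()[0]
slow = conn.execute("SELECT count(*) FROM call_log").fetchone()[0]
print(fast, slow)  # 2 2
```

As the thread notes, the downside is that every writer serializes on the single counter row, which is exactly the "hit twice" D'Arcy warns about when you also need per-category counts.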
Re: [PERFORM] count * performance issue
On Thu, 2008-03-06 at 07:28 -0800, Craig James wrote: ... > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? ... I can vouch that Oracle can still take linear time to perform a count(*), at least in some cases. I have also seen count(*) fast in some cases too... my understanding is that they maintain a list of "interested transactions" on a per-relation basis. Perhaps they do an optimization based on the index size if there are no pending DML transactions? -- Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, Mar 6, 2008 at 3:49 PM, Greg Smith <[EMAIL PROTECTED]> wrote: > > You only get this accelerated significantly when using MyISAM, which can > tell you an exact count of all the rows it hasn't corrupted yet. Please don't do that again. I'm going to have to spend the next hour cleaning coffee out of my laptop keyboard. :-) -- Dave Page EnterpriseDB UK Ltd: http://www.enterprisedb.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, Steinar H. Gunderson wrote: On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.) Exactly. There is a good discussion of this at http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ and I found the comments from Ken Jacobs were the most informative. In short, if you want any reasonable database integrity you have to use InnoDB with MySQL, and once you make that choice it has the same problem. You only get this accelerated significantly when using MyISAM, which can tell you an exact count of all the rows it hasn't corrupted yet. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
In response to Craig James <[EMAIL PROTECTED]>: > In the 3 years I've been using Postgres, the problem of count() performance > has come up more times than I can recall, and each time the answer is, "It's > a sequential scan -- redesign your application." > > My question is: What do the other databases do that Postgres can't do, and > why not? > > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? I don't know about Oracle, but MySQL has this problem as well. Use innodb tables and see how slow it is. The only reason myisam tables don't have this problem is because they don't implement any of the features that make the problem difficult to solve. > On the one hand, I understand that Postgres has its architecture, and I > understand the issue of row visibility, and so forth. On the other hand, my > database is just sitting there, nothing going on, no connections except me, > and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that > either Oracle or MySQL would answer in a fraction of a second. It's hard for > me to believe there isn't a better way. There's been discussion about putting visibility information in indexes. I don't know how far along that effort is, but I expect that will improve count() performance significantly. > This is a real problem. Countless people (including me) have spent > significant effort rewriting applications because of this performance flaw in > Postgres. Over and over, the response is, "You don't really need to do that > ... change your application." Well, sure, it's always possible to change the > application, but that misses the point. To most of us users, count() seems > like it should be a trivial operation. On other relational database systems, > it is a trivial operation. > > This is really a significant flaw on an otherwise excellent relational > database system. Not really. It really is a design flaw in your application ... 
it doesn't make relational sense to use the number of rows in a table for anything. Just because other people do it frequently doesn't make it right. That being said, it's still a useful feature, and I don't hear anyone denying that. As I said, google around a bit WRT to PG storing visibility information in indexes, as I think that's the way this will be improved. > My rant for today... Feel better now? -- Bill Moran -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote: > Count() on Oracle and MySQL is almost instantaneous, even for very large > tables. So why can't Postgres do what they do? In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.) /* Steinar */ -- Homepage: http://www.sesse.net/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Craig James wrote: > This is a real problem. Countless people (including me) have > spent significant effort rewriting applications because of this > performance flaw in Postgres. Over and over, the response is, > "You don't really need to do that ... change your application." > Well, sure, it's always possible to change the application, but > that misses the point. To most of us users, count() seems like > it should be a trivial operation. On other relational database > systems, it is a trivial operation. > > This is really a significant flaw on an otherwise excellent > relational database system. Have you read the TODO items related to this? -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall, and each time the answer is, "It's a sequential scan -- redesign your application." My question is: What do the other databases do that Postgres can't do, and why not? Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a second. It's hard for me to believe there isn't a better way. This is a real problem. Countless people (including me) have spent significant effort rewriting applications because of this performance flaw in Postgres. Over and over, the response is, "You don't really need to do that ... change your application." Well, sure, it's always possible to change the application, but that misses the point. To most of us users, count() seems like it should be a trivial operation. On other relational database systems, it is a trivial operation. This is really a significant flaw on an otherwise excellent relational database system. My rant for today... Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, sathiya psql wrote: any way will you explain., what is this COST, actual time and other stuffs There's a long list of links to tools and articles on this subject at http://www.postgresqldocs.org/index.php/Using_EXPLAIN -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 6 Mar 2008, sathiya psql wrote: is there any article saying the difference between this 7.3 and 8.4 I've collected a list of everything on this topic I've seen at http://www.postgresqldocs.org/index.php/Version_8.3_Changes The Feature Matrix linked to there will be a quicker way to see what's happened than sorting through the release notes. None of these changes change the fact that getting an exact count in this situation takes either a sequential scan or triggers. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
A. Kretschmer wrote: On Thu, 06.03.2008 at 12:17:55 +0530, sathiya psql wrote: TRIGGER i can use if i want the count of the whole table, but i require for some of the rows with WHERE condition so how to do that ??? Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on this row. Can you show us the output for an EXPLAIN ANALYSE SELECT count(*) from WHERE = ... ? Actually - in this case, a TRIGGER can be a good idea. If your count table can include the WHERE information, then you no longer require an effective table-wide lock for updates. In the past I have used sequential article numbers within a topic for an online community. Each topic row had an article_count. To generate a new article, I could update the article_count and use the number I had generated as the article number. To query the number of articles in a particular topic, article_count was available. Given thousands of topics, and 10s of thousands of articles, the system worked pretty well. Not in the millions range like the original poster's, but I saw no reason why this wouldn't scale. For the original poster: You might be desperate and looking for help from the only place you know to get it from, but some of your recent answers have shown that you are either not reading the helpful responses provided to you, or you are unwilling to do your own research. If that continues, I won't be posting to aid you. Cheers, mark -- Mark Mielke <[EMAIL PROTECTED]>
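Mark's per-topic article_count scheme can be sketched as follows. This is an illustrative reconstruction (table and column names are assumptions, not the original system's code), written in Python with SQLite so it runs anywhere; in PostgreSQL the same UPDATE would take a row-level lock on just the one topic row:

```python
import sqlite3

# Each topic row carries its own article_count, so counting articles in a
# topic is a single-row read, and bumping the counter also hands out the
# next per-topic article number.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (topic_id INTEGER PRIMARY KEY,
                         article_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE articles (topic_id INTEGER, article_no INTEGER,
                           PRIMARY KEY (topic_id, article_no));
    INSERT INTO topics (topic_id) VALUES (1);
""")

def new_article(topic_id):
    # One transaction: the UPDATE locks only this topic's row, so writers
    # in unrelated topics are never serialized against each other.
    with conn:
        conn.execute("UPDATE topics SET article_count = article_count + 1 "
                     "WHERE topic_id = ?", (topic_id,))
        n = conn.execute("SELECT article_count FROM topics WHERE topic_id = ?",
                         (topic_id,)).fetchone()[0]
        conn.execute("INSERT INTO articles VALUES (?, ?)", (topic_id, n))
        return n

nums = [new_article(1) for _ in range(3)]
count = conn.execute("SELECT article_count FROM topics "
                     "WHERE topic_id = 1").fetchone()[0]
print(nums, count)  # [1, 2, 3] 3
```

The design choice Mark describes is visible here: because the count lives with the WHERE key (the topic), the table-wide counter bottleneck disappears.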
Re: [PERFORM] count * performance issue
sathiya psql wrote: > > Yes it is the latest stable version. > > is there any article saying the difference between this 7.3 and 8.4 http://www.postgresql.org/docs/8.3/static/release.html In particular, http://www.postgresql.org/docs/8.3/static/release-8-3.html http://www.postgresql.org/docs/8.3/static/release-8-2.html http://www.postgresql.org/docs/8.3/static/release-8-1.html http://www.postgresql.org/docs/8.3/static/release-8-0.html which are all the major releases between 7.4 and 8.3. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 06.03.2008 at 18:13:50 +0530, sathiya psql wrote: > > Yes it is the latest stable version. > > > is there any article saying the difference between this 7.3 and 8.4 http://developer.postgresql.org/pgdocs/postgres/release.html Andreas -- Andreas Kretschmer Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Of course, the official documentation covers that information in its release notes http://www.postgresql.org/docs/8.3/static/release.html best wishes Harald On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <[EMAIL PROTECTED]> wrote: > > > > > > > > > > Yes it is the latest stable version. > > is there any article saying the difference between this 7.3 and 8.4 > > > -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
> Yes it is the latest stable version. > is there any article saying the difference between this 7.3 and 8.4
Re: [PERFORM] count * performance issue
Hi, On 6-Mar-08, at 6:58 AM, sathiya psql wrote: The only thing that is going to help you is really fast disks, and more memory, and you should consider moving to 8.3 for all the other performance benefits. Is 8.3 is a stable version or what is the latest stable version of postgres ?? Yes it is the latest stable version. moving my database from 7.4 to 8.3 will it do any harm ?? You will have to test this yourself. There may be issues what are all the advantages of moving from 7.4 to 8.3 Every version of postgresql has improved performance, and robustness; so you will get better overall performance. However I want to caution you this is not a panacea. It will NOT solve your seq scan problem. Dave
Re: [PERFORM] count * performance issue
On 6-Mar-08, at 1:43 AM, sathiya psql wrote: is there any way to explicitly force the postgres to use index scan If you want to count all the rows in the table there is only one way to do it (without keeping track yourself with a trigger ); a seq scan. An index will not help you. The only thing that is going to help you is really fast disks, and more memory, and you should consider moving to 8.3 for all the other performance benefits. Dave -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 06.03.2008 at 12:36:48 +0530, sathiya psql wrote: > > QUERY PLAN > -- > Aggregate (cost=205756.95..205756.95 rows=1 width=0) (actual time= > 114675.042..114675.042 rows=1 loops=1) > -> Seq Scan on call_log (cost=0.00..193224.16 rows=5013112 width=0) > (actual time=11.754..91429.594 rows=5061619 loops=1) > Filter: (call_id > 0) > Total runtime: 114699.797 ms > (4 rows) Is 'call_id > 0' your WHERE condition? An INDEX can't help: all rows with call_id > 0 are in the result, and I guess that's all the records in the table. > > > it is now taking 114 seconds, i think because of load in my system any way will you explain., what is this COST, actual time and other stuffs 08:16 < akretschmer> ??explain 08:16 < rtfm_please> For information about explain 08:16 < rtfm_please> see http://explain-analyze.info 08:16 < rtfm_please> or http://www.depesz.com/index.php/2007/08/06/better-explain-analyze/ 08:16 < rtfm_please> or http://www.postgresql.org/docs/current/static/sql-explain.html and http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf Read these to learn more about EXPLAIN. Andreas -- Andreas Kretschmer Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
QUERY PLAN -- Aggregate (cost=205756.95..205756.95 rows=1 width=0) (actual time= 114675.042..114675.042 rows=1 loops=1) -> Seq Scan on call_log (cost=0.00..193224.16 rows=5013112 width=0) (actual time=11.754..91429.594 rows=5061619 loops=1) Filter: (call_id > 0) Total runtime: 114699.797 ms (4 rows) it is now taking 114 seconds, i think because of load in my system any way will you explain., what is this COST, actual time and other stuffs On Thu, Mar 6, 2008 at 12:27 PM, A. Kretschmer < [EMAIL PROTECTED]> wrote: > am Thu, dem 06.03.2008, um 12:17:55 +0530 mailte sathiya psql folgendes: > > TRIGGER i can use if i want the count of the whole table, but i require > for > > some of the rows with WHERE condition > > > > so how to do that ??? > > Okay, in this case a TRIGGER are a bad idea. You can use an INDEX on > this row. Can you show us the output for a EXPLAIN ANALYSE SELECT > count(*) from WHERE = ... ? > > > Andreas > -- > Andreas Kretschmer > Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) > GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > > http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance >
Re: [PERFORM] count * performance issue
On Thu, 06.03.2008 at 12:17:55 +0530, sathiya psql wrote: > TRIGGER i can use if i want the count of the whole table, but i require for > some of the rows with WHERE condition > > so how to do that ??? Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on this row. Can you show us the output for an EXPLAIN ANALYSE SELECT count(*) from WHERE = ... ? Andreas -- Andreas Kretschmer Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, 06.03.2008 at 12:13:17 +0530, sathiya psql wrote: > is there any way to explicitly force the postgres to use index scan Not really; PG uses a cost-based optimizer and uses an INDEX if it makes sense. > > On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer < > [EMAIL PROTECTED]> wrote: please, no silly top-posting with the complete quote below. Andreas -- Andreas Kretschmer Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Fwd: [PERFORM] count * performance issue
-- Forwarded message -- From: sathiya psql <[EMAIL PROTECTED]> Date: Thu, Mar 6, 2008 at 12:17 PM Subject: Re: [PERFORM] count * performance issue To: "A. Kretschmer" <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED] TRIGGER i can use if i want the count of the whole table, but i require for some of the rows with WHERE condition so how to do that ??? On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer < [EMAIL PROTECTED]> wrote: > am Thu, dem 06.03.2008, um 1:26:46 -0500 mailte Mark Mielke folgendes: > > > > > > There aren't a general solution. If you realy need the exact > count of > > tuples than you can play with a TRIGGER and increase/decrease > the > > tuple-count for this table in an extra table. > > > > > > Of course, this means accepting the cost of obtaining update locks on > the count > > table. > > > > The original poster should understand that they can either get a fast > estimated > > count, or they can get a slow accurate count (either slow in terms of > select > > using count(*) or slow in terms of updates using triggers and locking). > > > > Other systems have their own issues. An index scan may be faster than a > table > > scan for databases that can accurately determine counts using only the > index, > > No. The current index-implementation contains no information about the > row-visibility within the current transaction. You need to scan the > whole data-table to obtain if the current row are visible within the > current transaction. > > > > but it's still a relatively slow operation, and people don't normally > need an > > accurate count for records in the range of 100,000+? :-) > > right. 
> > > Andreas > -- > Andreas Kretschmer > Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) > GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > > http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance >
Re: [PERFORM] count * performance issue
is there any way to explicitly force the postgres to use index scan On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer < [EMAIL PROTECTED]> wrote: > am Thu, dem 06.03.2008, um 1:26:46 -0500 mailte Mark Mielke folgendes: > > > > > > There aren't a general solution. If you realy need the exact > count of > > tuples than you can play with a TRIGGER and increase/decrease > the > > tuple-count for this table in an extra table. > > > > > > Of course, this means accepting the cost of obtaining update locks on > the count > > table. > > > > The original poster should understand that they can either get a fast > estimated > > count, or they can get a slow accurate count (either slow in terms of > select > > using count(*) or slow in terms of updates using triggers and locking). > > > > Other systems have their own issues. An index scan may be faster than a > table > > scan for databases that can accurately determine counts using only the > index, > > No. The current index-implementation contains no information about the > row-visibility within the current transaction. You need to scan the > whole data-table to obtain if the current row are visible within the > current transaction. > > > > but it's still a relatively slow operation, and people don't normally > need an > > accurate count for records in the range of 100,000+? :-) > > right. > > > Andreas > -- > Andreas Kretschmer > Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) > GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > > http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance >
Re: [PERFORM] count * performance issue
On Thu, 06.03.2008 at 1:26:46 -0500, Mark Mielke wrote: > > > There isn't a general solution. If you really need the exact count of > tuples then you can play with a TRIGGER and increase/decrease the > tuple-count for this table in an extra table. > > > Of course, this means accepting the cost of obtaining update locks on the > count > table. > > The original poster should understand that they can either get a fast > estimated > count, or they can get a slow accurate count (either slow in terms of select > using count(*) or slow in terms of updates using triggers and locking). > > Other systems have their own issues. An index scan may be faster than a table > scan for databases that can accurately determine counts using only the index, No. The current index implementation contains no information about row visibility within the current transaction. You need to scan the whole data table to determine whether the current row is visible within the current transaction. > but it's still a relatively slow operation, and people don't normally need an > accurate count for records in the range of 100,000+? :-) right. Andreas -- Andreas Kretschmer Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header) GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
On Thu, Mar 6, 2008 at 5:31 PM, sathiya psql <[EMAIL PROTECTED]> wrote:
> will you please tell, what is autovacuuming... and what will it do... is
> there any good article on this

Read this --> http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#AUTOVACUUM

--
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]
Re: [PERFORM] count * performance issue
will you please tell, what is autovacuuming... and what will it do... is
there any good article on this

On Thu, Mar 6, 2008 at 11:56 AM, Shoaib Mir <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[EMAIL PROTECTED]> wrote:
> > but every time i need to put ANALYZE...
> > this takes the same time as count(*) takes, what is the use ??
>
> Don't you have autovacuuming running in the background, which is taking
> care of the analyze as well?
>
> If not, then turn it on; doing a manual analyze then shouldn't, I guess,
> take much time!
>
> But yes, I will say if it's possible go with the trigger option as that
> might be more helpful and a very fast way to do that.
>
> --
> Shoaib Mir
> Fujitsu Australia Software Technology
> [EMAIL PROTECTED]
Re: [PERFORM] count * performance issue
> There isn't a general solution. If you really need the exact count of
> tuples then you can play with a TRIGGER and increase/decrease the
> tuple-count for this table in an extra table.

Of course, this means accepting the cost of obtaining update locks on the
count table.

The original poster should understand that they can either get a fast
estimated count, or they can get a slow accurate count (either slow in
terms of select using count(*) or slow in terms of updates using triggers
and locking).

Other systems have their own issues. An index scan may be faster than a
table scan for databases that can accurately determine counts using only
the index, but it's still a relatively slow operation, and people don't
normally need an accurate count for records in the range of 100,000+? :-)

Cheers,
mark

--
Mark Mielke <[EMAIL PROTECTED]>
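[Editor's sketch] The trigger-maintained counter that Andreas and Mark describe could look something like this. Table and trigger names here are hypothetical, and the syntax assumes PostgreSQL 8.2 or later (the original poster's 7.4 would need TG_RELNAME instead of TG_TABLE_NAME and old-style string quoting). Note the trade-off Mark mentions: every insert and delete now serializes on the single counter row.

```sql
-- Hypothetical counter table; one row per tracked table.
CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL
);

-- Seed it once while blocking concurrent writers, e.g.:
--   BEGIN;
--   LOCK TABLE mytable IN SHARE MODE;
--   INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;
--   COMMIT;

CREATE OR REPLACE FUNCTION count_rows() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_count
AFTER INSERT OR DELETE ON mytable
FOR EACH ROW EXECUTE PROCEDURE count_rows();
```

The "count" then becomes a single-row lookup instead of a heap scan:
SELECT n FROM row_counts WHERE table_name = 'mytable';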
Re: [PERFORM] count * performance issue
On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <[EMAIL PROTECTED]> wrote:
> but every time i need to put ANALYZE...
> this takes the same time as count(*) takes, what is the use ??

Don't you have autovacuuming running in the background, which is taking
care of the analyze as well?

If not, then turn it on; doing a manual analyze then shouldn't, I guess,
take much time!

But yes, I will say if it's possible go with the trigger option as that
might be more helpful and a very fast way to do that.

--
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]
Re: [PERFORM] count * performance issue
but every time i need to put ANALYZE...
this takes the same time as count(*) takes, what is the use ??

On Thu, Mar 6, 2008 at 11:45 AM, Shoaib Mir <[EMAIL PROTECTED]> wrote:
> On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <[EMAIL PROTECTED]> wrote:
> > > am having a table with nearly 50 lakh records,
> > > it has more than 15 columns, i want to count how many records are
> > > there, it is taking nearly 17 seconds to do that...
> > >
> > > i know that to get an approximate count we can use
> > > SELECT reltuples FROM pg_class where relname = TABLENAME;
> > >
> > > but this gives an approximate count, and i require the exact count...
> >
> > There isn't a general solution. If you really need the exact count of
> > tuples then you can play with a TRIGGER and increase/decrease the
> > tuple-count for this table in an extra table.
>
> Or do something like:
>
> ANALYZE tablename;
> SELECT reltuples FROM pg_class WHERE relname = 'tablename';
>
> That will also return the total number of rows in a table and I guess
> might be much faster than doing a count(*), but yes if a trigger can be
> an option that can be the easiest way to do it and fastest too.
>
> --
> Shoaib Mir
> Fujitsu Australia Software Technology
> [EMAIL PROTECTED]
Re: [PERFORM] count * performance issue
On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <[EMAIL PROTECTED]> wrote:
> > am having a table with nearly 50 lakh records,
> > it has more than 15 columns, i want to count how many records are
> > there, it is taking nearly 17 seconds to do that...
> >
> > i know that to get an approximate count we can use
> > SELECT reltuples FROM pg_class where relname = TABLENAME;
> >
> > but this gives an approximate count, and i require the exact count...
>
> There isn't a general solution. If you really need the exact count of
> tuples then you can play with a TRIGGER and increase/decrease the
> tuple-count for this table in an extra table.

Or do something like:

ANALYZE tablename;
SELECT reltuples FROM pg_class WHERE relname = 'tablename';

That will also return the total number of rows in a table and I guess
might be much faster than doing a count(*), but yes if a trigger can be an
option that can be the easiest way to do it and fastest too.

--
Shoaib Mir
Fujitsu Australia Software Technology
[EMAIL PROTECTED]
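[Editor's sketch] Spelled out, Shoaib's estimate looks like this. The table name is a placeholder; keep in mind reltuples is a float maintained only by VACUUM and ANALYZE, so the value drifts away from the true count between runs.

```sql
-- Approximate row count from the planner's statistics.
-- 'mytable' is hypothetical; ANALYZE refreshes the estimate.
ANALYZE mytable;

SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'mytable';
```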
Re: [PERFORM] count * performance issue
On Thu, 06 Mar 2008 at 11:13:01 +0530, sathiya psql wrote the following:
> count(*) took much time...
>
> but with a where clause we can make this use indexing,... what where
> clause can we use??

Without a WHERE clause, an index can't help to avoid a seq scan.

> Am using postgres 7.4 in Debian OS with 1 GB RAM,

PG 7.4 is very old... Recent versions are MUCH faster.

> am having a table with nearly 50 lakh records,
> it has more than 15 columns, i want to count how many records are there,
> it is taking nearly 17 seconds to do that...
>
> i know that to get an approximate count we can use
> SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this gives an approximate count, and i require the exact count...

There isn't a general solution. If you really need the exact count of
tuples then you can play with a TRIGGER and increase/decrease the
tuple-count for this table in an extra table.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
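[Editor's sketch] Andreas's point shows up directly in the plan. The table name is hypothetical and the exact costs depend on version and statistics, but with no WHERE clause the planner has to read the whole heap:

```sql
-- A bare count(*) has no predicate an index could satisfy, and row
-- visibility lives in the heap, so expect a sequential scan:
EXPLAIN SELECT count(*) FROM mytable;
--  Aggregate  (cost=... rows=1 ...)
--    ->  Seq Scan on mytable  (cost=... ...)
```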
Re: [PERFORM] count * performance issue
sathiya psql wrote:
> count(*) took much time...
> but with a where clause we can make this use indexing,... what where
> clause can we use??
>
> Am using postgres 7.4 in Debian OS with 1 GB RAM,
> am having a table with nearly 50 lakh records,

Looks suspiciously like a question asked yesterday:
http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php

--
Postgresql & php tutorials
http://www.designmagick.com/
[PERFORM] count * performance issue
count(*) took much time...

but with a where clause we can make this use indexing,... what where
clause can we use??

Am using postgres 7.4 in Debian OS with 1 GB RAM,

am having a table with nearly 50 lakh (5 million) records,
it has more than 15 columns, i want to count how many records are there,
it is taking nearly 17 seconds to do that...

i know that to get an approximate count we can use
SELECT reltuples FROM pg_class where relname = TABLENAME;

but this gives an approximate count, and i require the exact count...
Re: [PERFORM] count(*) performance
On Mon, Mar 27, 2006 at 12:20:54PM -0700, Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

Doing a periodic vacuumdb -avz and keeping an eye on the last few lines
isn't a bad idea. It would also be helpful if there was a log parser that
could take a look at the output of a vacuumdb -av and look for any problem
areas, such as relations that have a lot of free space in them.

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] count(*) performance
Gábriel Ákos wrote:
> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

Hi, Gabriel,

it may be that your Free Space Map (FSM) setting is way too low. Try
increasing it.

Btw, VACUUM outputs a warning if the FSM is not high enough; maybe you can
find useful hints in the log file.

HTH
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Re: [PERFORM] count(*) performance
I think it is definitely necessary from an administration point of view -
as an administrator, I want to know:

1) Are there any stats (at all) in a schema
2) Are there any stats on the table that slow_query_foo is targeting
3) If I have stats, how recent are they
4) Could it be that there are a lot of dead tuples lying around (given the
   amount of traffic I know I have)

These would be (are always!) the first questions I ask myself when I'm
about to identify performance problems in an app; don't know how other
people do it though :)

Maybe something I'll try to look into this weekend, if I can spare some
time.

- Mikael

-----Original Message-----
From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED]]
Sent: 28 March 2006 00:43
To: Mikael Carneholm
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance

It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary. What the system cares about is how long
it's been since the last vacuum in terms of XIDs, not time. [...]
Re: [PERFORM] count(*) performance
"Matthew T. O'Connor" writes:
> It is fairly easy to implement, however it has been discussed before and
> decided that it wasn't necessary. What the system cares about is how
> long it's been since the last vacuum in terms of XIDs, not time.

I think Alvaro is intending to do the latter (store per-table vacuum xid
info) for 8.2.

regards, tom lane
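[Editor's note] As it happens, later releases did grow human-readable timestamps as well: if memory serves, from 8.2 on the statistics views expose per-table vacuum and analyze times, which answers Mikael's questions 1-3 directly. A query in that spirit (columns are NULL until the action has run since the last stats reset):

```sql
-- When was each user table last vacuumed/analyzed? (PostgreSQL 8.2+)
SELECT relname,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```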
Re: [PERFORM] count(*) performance
Mikael Carneholm wrote:
> This is where a "last_vacuumed" (and "last_analyzed") column in
> pg_statistic(?) would come in handy. Each time vacuum or analyze has
> finished, update the row for the specific table that was
> vacuumed/analyzed with a timestamp in the last_vacuumed/last_analyzed
> column. No more guessing "maybe I haven't vacuumed/analyzed in a while",
> and each time a user complains about bad performance, one could request
> the user to do a "select s.last_vacuumed, s.last_analyzed from
> pg_statistic s, pg_attribute a, pg_class c where ..."
>
> It SOUNDS easy to implement, but that has fooled me before... :-)

It is fairly easy to implement, however it has been discussed before and
decided that it wasn't necessary. What the system cares about is how long
it's been since the last vacuum in terms of XIDs, not time. Storing a
timestamp would make it more human readable, but I'm not sure the powers
that be want to add two new columns to some system table to accommodate
this.

Matt
Re: [PERFORM] count(*) performance
This is where a "last_vacuumed" (and "last_analyzed") column in
pg_statistic(?) would come in handy. Each time vacuum or analyze has
finished, update the row for the specific table that was vacuumed/analyzed
with a timestamp in the last_vacuumed/last_analyzed column. No more
guessing "maybe I haven't vacuumed/analyzed in a while", and each time a
user complains about bad performance, one could request the user to do a
"select s.last_vacuumed, s.last_analyzed from pg_statistic s,
pg_attribute a, pg_class c where ..."

It SOUNDS easy to implement, but that has fooled me before... :-)

- Mikael

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Guido Neitzer
Sent: 27 March 2006 21:44
To: Brendan Duddridge
Cc: Postgresql Performance
Subject: Re: [PERFORM] count(*) performance

It seems that there are situations where autovacuum does not do a really
good job. [...]
Re: [PERFORM] count(*) performance
On 27.03.2006, at 21:20, Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

It seems that there are situations where autovacuum does not do a really
good job.

However, in our application I have made a stupid design decision which I
want to change as soon as possible. I have a "visit count" column in one
of the very large tables, so updates are VERY regular. I've just checked
and saw that autovacuum does a great job with that.

Nevertheless I have set up a cron job to do a standard vacuum every month.
I've used vacuum full only once after I did a bulk update of about 200.000
rows ...

cug

--
PharmaLine, Essen, GERMANY
Software and Database Development
Re: [PERFORM] count(*) performance
Brendan Duddridge wrote:
> Does that mean that even though autovacuum is turned on, you still
> should do a regular vacuum analyze periodically?

No, it probably means you have set FSM settings too low, or not tuned the
autovacuum parameters to your specific situation.

A bug in the autovacuum daemon is not unexpected however, so if it doesn't
work after tuning, let us know.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] count(*) performance
Gábriel Ákos wrote:
> Luke Lonergan wrote:
> > On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:
> > > That gave me an idea. I thought that autovacuum is doing it right,
> > > but I issued a vacuum full analyze verbose, and it worked all the
> > > day. After that I've tweaked memory settings a bit too (more
> > > fsm_pages)
> >
> > Oops! I replied to your disk speed before I saw this. The only thing
> > is - you probably don't want to do a "vacuum full", but rather a
> > simple "vacuum" should be enough.
>
> I thought that too. Autovacuum is running on our system but it didn't do
> the trick. Anyway the issue is solved, thank you all for helping. :)

Yeah, it would be nice if autovacuum had some way of raising a flag to the
admin that given current settings (thresholds, FSM etc...) it's not
keeping up with the activity. I don't know how to do this, but I hope
someone else has some good ideas.

Matt
Re: [PERFORM] count(*) performance
Does that mean that even though autovacuum is turned on, you still should
do a regular vacuum analyze periodically?

Thanks,

Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9
http://www.clickspace.com

On Mar 27, 2006, at 11:14 AM, Luke Lonergan wrote:
> Gabriel,
>
> On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:
> > That gave me an idea. I thought that autovacuum is doing it right, but
> > I issued a vacuum full analyze verbose, and it worked all the day.
> > After that I've tweaked memory settings a bit too (more fsm_pages)
>
> Oops! I replied to your disk speed before I saw this. The only thing is
> - you probably don't want to do a "vacuum full", but rather a simple
> "vacuum" should be enough.
>
> - Luke
Re: [PERFORM] count(*) performance
Luke Lonergan wrote:
> Gabriel,
>
> On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:
> > That gave me an idea. I thought that autovacuum is doing it right, but
> > I issued a vacuum full analyze verbose, and it worked all the day.
> > After that I've tweaked memory settings a bit too (more fsm_pages)
>
> Oops! I replied to your disk speed before I saw this. The only thing is
> - you probably don't want to do a "vacuum full", but rather a simple
> "vacuum" should be enough.

I thought that too. Autovacuum is running on our system but it didn't do
the trick. Anyway the issue is solved, thank you all for helping. :)

--
Regards,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-
Re: [PERFORM] count(*) performance
Gabriel,

On 3/27/06 10:05 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:
> That gave me an idea. I thought that autovacuum is doing it right, but I
> issued a vacuum full analyze verbose, and it worked all the day.
> After that I've tweaked memory settings a bit too (more fsm_pages)

Oops! I replied to your disk speed before I saw this. The only thing is -
you probably don't want to do a "vacuum full", but rather a simple
"vacuum" should be enough.

- Luke
Re: [PERFORM] count(*) performance
Jim C. Nasby wrote:
> But in this case, I'd bet money that if it's taking 4 minutes something
> else is wrong. Have you been vacuuming that table frequently enough?

That gave me an idea. I thought that autovacuum is doing it right, but I
issued a vacuum full analyze verbose, and it worked all the day. After
that I've tweaked memory settings a bit too (more fsm_pages).

Now:

staging=# SELECT count(*) from infx.infx_product;
  count
---------
 3284997
(1 row)

Time: 1301.049 ms

As I saw from the output, the database was compressed to 10% of its former
size :) This table has quite big changes every 4 hours; let's see how it
works. Maybe I'll have to issue full vacuums from cron regularly.

> What's SELECT relpages FROM pg_class WHERE relname='tablename' show?

This went to 10% as well; now it's around 156000 pages.

Regards,
Akos
--
Regards,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-
Re: [PERFORM] count(*) performance
Luke Lonergan wrote:
> To test your disk speed, use the following commands and report the times
> here:
>
> time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"

[EMAIL PROTECTED]:/fast # time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 45.469404 seconds (90082553 bytes/sec)

real    0m56.880s
user    0m0.112s
sys     0m18.937s

> time dd if=bigfile of=/dev/null bs=8k

[EMAIL PROTECTED]:/fast # time dd if=bigfile of=/dev/null bs=8k
500000+0 records in
500000+0 records out
4096000000 bytes transferred in 53.542147 seconds (76500481 bytes/sec)

real    0m53.544s
user    0m0.048s
sys     0m10.637s

I guess these values aren't that bad :)

--
Regards,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-
Re: [PERFORM] count(*) performance
Gabriel,

On 3/27/06 5:34 AM, "Gábriel Ákos" <[EMAIL PROTECTED]> wrote:
> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?

Before changing anything with your Postgres configuration, you should
check your hard drive array performance. All select count(*) does is a
sequential scan of your data, and if the table is larger than memory, or
if it's the first time you've scanned it, it is limited by your disk
speed.

To test your disk speed, use the following commands and report the times
here:

time bash -c "dd if=/dev/zero of=bigfile bs=8k count=500000 && sync"
time dd if=bigfile of=/dev/null bs=8k

If these are taking a long time, from another session watch the I/O rate
with "vmstat 1" for a while and report that here.

- Luke
Re: [PERFORM] count(*) performance
On Mon, Mar 27, 2006 at 03:34:32PM +0200, Gábriel Ákos wrote:
> Hi,
>
> I guess this is an age-old, hundred-times-answered question, but I
> didn't find the answer to it yet (neither in the FAQ nor in the mailing
> list archives).
>
> Question: I have a table with 2.5M rows. count(*) on this table is
> running 4 minutes long. (dual opteron, 4gig ram, db on 4 disk raid10
> array (sata, not scsi)) Is this normal? How could I make it run faster?
> Maybe make it run faster for the 2nd time? Which parameters should I
> change in postgresql.conf and how?

First, count(*) on PostgreSQL tends to be slow because you can't do index
covering[1]. But in this case, I'd bet money that if it's taking 4 minutes
something else is wrong. Have you been vacuuming that table frequently
enough? What's SELECT relpages FROM pg_class WHERE relname='tablename'
show?

[1] http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Feb.asp#5

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
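[Editor's sketch] Jim's relpages check can be extended into a rough size and bloat estimate. The table name is a placeholder, and the arithmetic assumes the default 8 kB block size; a page count wildly out of proportion to reltuples suggests dead space that vacuum should reclaim.

```sql
-- Pages on disk vs. estimated live rows, plus approximate size in MB.
-- 'mytable' is hypothetical; relpages/reltuples are refreshed by
-- VACUUM and ANALYZE, so run one of those first for current numbers.
SELECT relname,
       relpages,
       reltuples::bigint   AS approx_rows,
       relpages * 8 / 1024 AS approx_size_mb
FROM pg_class
WHERE relname = 'mytable';
```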
[PERFORM] count(*) performance
Hi,

I guess this is an age-old, hundred-times-answered question, but I didn't
find the answer to it yet (neither in the FAQ nor in the mailing list
archives).

Question: I have a table with 2.5M rows. count(*) on this table runs for
4 minutes. (dual opteron, 4gig ram, db on 4 disk raid10 array (sata, not
scsi)) Is this normal? How could I make it run faster? Maybe make it run
faster the 2nd time? Which parameters should I change in postgresql.conf
and how?

--
Regards,
Gábriel Ákos
-=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894=-