Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith
Mladen Gogala wrote: I agree, but I am afraid that after the demise of SGI, XFS isn't being developed. It's back to being well maintained again; see http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html for some history here and why it's become relevant to RedHat in

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith
Samuel Gendler wrote: I spent some time going through the various tuning docs on the wiki whie bringing some new hardware up and I can't remember seeing any discussion of tweaking read-ahead at all in the normal performance-tuning references. Do you have any documentation of the kinds of

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Vitalii Tymchyshyn
11.10.10 20:46, Craig James написав(ла): First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people. Sooner or later a large fraction of developers will run into this. It's

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Craig Ringer
On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote: BTW: There is a lot of talk about MVCC, but is next solution possible: 1) Create a page information map that for each page in the table will tell you how may rows are within and if any write (either successful or not) were done to this page.

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david
On Tue, 12 Oct 2010, Craig Ringer wrote: BTW: There is a lot of talk about MVCC, but is next solution possible: 1) Create a page information map that for each page in the table will tell you how may rows are within and if any write (either successful or not) were done to this page. This even

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Vitalii Tymchyshyn
12.10.10 11:14, Craig Ringer написав(ла): On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote: BTW: There is a lot of talk about MVCC, but is next solution possible: 1) Create a page information map that for each page in the table will tell you how may rows are within and if any write (either

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Craig Ringer
On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. At the cost of a fair bit more complexity, though, and slowing everything else down. The proper solution

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala
da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. Fixing PR stuff is not the approach that I would take. People are complaining about select count(*) because they're using it in

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Robert Haas
On Mon, Oct 11, 2010 at 11:11 PM, gnuo...@rcn.com wrote: An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term).  A tablespace/bufferpool match is defined.   Then tables and

Re: [PERFORM] Runtime dependency from size of a bytea field

2010-10-12 Thread Robert Haas
On Fri, Oct 8, 2010 at 12:53 AM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote: The difference to my test is that we use the ODBC interface in our C program. Could it be that the difference in the runtimes is caused by the ODBC? I've heard tell that ODBC is substantially slower than

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jon Nelson
On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: So, the results weren't cached the first time around. The explanation is the fact that Oracle, as of the version 10.2.0, reads the table in the private process memory, not in the shared buffers.  This table alone

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith
Jon Nelson wrote: Is there a command to tell postgresql to drop/clear/reset it's buffer_cache? No. Usually the sequence used to remove all cached data from RAM before a benchmark is: pg_ctl stop sync echo 3 /proc/sys/vm/drop_caches pg_ctl start -- Greg Smith, 2ndQuadrant US

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Luca Tettamanti
On Tue, Oct 12, 2010 at 3:07 PM, Jon Nelson jnelson+pg...@jamponi.net wrote: On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: So, the results weren't cached the first time around. The explanation is the fact that Oracle, as of the version 10.2.0, reads the table

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jon Nelson
On Tue, Oct 12, 2010 at 8:18 AM, Greg Smith g...@2ndquadrant.com wrote: No.  Usually the sequence used to remove all cached data from RAM before a benchmark is: All cached data (as cached in postgresql - *not* the Linux system caches)..., right? -- Jon -- Sent via pgsql-performance mailing

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
Mladen Gogala mladen.gog...@vmsinfo.com writes: The number of rows is significantly smaller, but the table contains rather significant text field which consumes quite a bit of TOAST storage and the sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2 minutes and

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Kevin Grittner
Neil Whelchel neil.whelc...@gmail.com wrote: What is the best method to make a page of results and a list of links to other pages of results? For our most heavily used web app we decided to have the renderer just read the list of cases and render the pages to disk, and then present the first

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala
Tom Lane wrote: Mladen Gogala mladen.gog...@vmsinfo.com writes: The number of rows is significantly smaller, but the table contains rather significant text field which consumes quite a bit of TOAST storage and the sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread gnuoytr
The discussions I've seen indicated that, in use, tablespaces were at the database level, but, yes, the docs do say that a table can be assigned to a defined tablespace. What I still can't find is syntax which establishes buffers/caches/whatever and assigns them to tablespaces. Without that,

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Joe Uhl
The biggest single problem with select count(*) is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than few hundred records. SQL language, of which PostgreSQL

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Kevin Grittner
gnuo...@rcn.com wrote: An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/ bufferpool match is defined. Then tables and indexes are assigned to the

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread gnuoytr
Couldn't have said it better myself; covered all the bases. If PG wants to become an industrial strength database, worthy of replacing DB2/etc., then these are the sorts of knobs and switches it will need. -- None of that is anything for amateurs to play with. Not jam a stick in anybody's

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Kevin Grittner
gnuo...@rcn.com wrote: -- None of that is anything for amateurs to play with. Not jam a stick in anybody's eye, but shouldn't database pros not be amateurs? While many PostgreSQL installations are managed by professional DBAs, or programmers or consultants with a deep enough grasp of the

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread bricklen
On Sat, Oct 9, 2010 at 4:26 PM, Neil Whelchel neil.whelc...@gmail.com wrote: Maybe an estimate(*) that works like count but gives an answer from the index without checking visibility? I am sure that this would be good enough to make a page list, it is really no big deal if it errors on the

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Kevin Grittner
Jon Nelson jnelson+pg...@jamponi.net wrote: Greg Smith g...@2ndquadrant.com wrote: Usually the sequence used to remove all cached data from RAM before a benchmark is: All cached data (as cached in postgresql - *not* the Linux system caches)..., right? No. The stop and start of

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris
On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david
On Tue, 12 Oct 2010, Joe Uhl wrote: The biggest single problem with select count(*) is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than few hundred records.

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david
On Tue, 12 Oct 2010, Mladen Gogala wrote: da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. Fixing PR stuff is not the approach that I would take. People are complaining about

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david
On Tue, 12 Oct 2010, Craig Ringer wrote: On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. At the cost of a fair bit more complexity, though, and slowing

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
On Oct 11, 2010, at 9:21 PM, Samuel Gendler wrote: On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey sc...@richrelevance.commailto:sc...@richrelevance.com wrote: I can't speak to documentation, but it is something that helps as your I/O subsystem gets more powerful, and how much it helps depends

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Samuel Gendler
On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey sc...@richrelevance.comwrote: However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Chris Browne
sgend...@ideasculptor.com (Samuel Gendler) writes: Geez.  I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs.  I couldn't find anything that made a strong filesystem recommendation.  How does xfs compare to ext4?  I wound

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
A count with any joins or filter criteria would still have to scan all pages with visible tuples in them. So the visibility map helps speed up scanning of bloated tables, but doesn't provide a magical fast count except in the utterly trivial select count(*) from tablename; case, and

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jesper Krogh
On 2010-10-12 18:02, Scott Carey wrote: However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once,

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
On Oct 12, 2010, at 8:39 AM, Dan Harris wrote: On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote: On Tue, 12 Oct 2010, Craig Ringer wrote: On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. At the

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
On Oct 12, 2010, at 9:46 AM, Scott Carey wrote: On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote: On Tue, 12 Oct 2010, Craig Ringer wrote: On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid

Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
cr...@postnewspapers.com.au (Craig Ringer) writes: Hey, maybe I should try posting YouTube video answers to a few questions for kicks, see how people react ;-) And make sure it uses the same voice as is used in the MongoDB is web scale video, to ensure that people interpret it correctly :-). --

Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I have a logical problem with asynchronous commit. The commit command should instruct the database to make the outcome of the transaction permanent. The application should wait to see whether the commit was successful or not. Asynchronous

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris
On 10/12/10 10:44 AM, Scott Carey wrote: On Oct 12, 2010, at 8:39 AM, Dan Harris wrote: On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
So I spent a bit of quality time with oprofile this morning, and found once again that there's no substitute for having actual data before theorizing. Test case software: current Git HEAD (plus one code change explained below), compiled with --enable-debug to support oprofile, cassert off; no

[PERFORM] read only transactions

2010-10-12 Thread Jon Nelson
Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? -- Jon -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription:

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala
da...@lang.hm wrote: On Tue, 12 Oct 2010, Mladen Gogala wrote: da...@lang.hm wrote: from a PR point of view, speeding up the trivil count(*) case could be worth it, just to avoid people complaining about it not being fast. Fixing PR stuff is not the approach that I would

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jesper Krogh
On 2010-10-12 19:07, Tom Lane wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. I don't think any of the previous discussion in this thread is on-point at all, except for the parts where people suggested avoiding it. I would have to say that allthough it is

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes: On 2010-10-12 19:07, Tom Lane wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. Just having 32 bytes bytes of payload would more or less double you time to count if I read you test results correctly?. .. and in the situation

Re: [PERFORM] read only transactions

2010-10-12 Thread Kevin Grittner
Jon Nelson jnelson+pg...@jamponi.net wrote: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? I don't think it allows much optimization in any current release. It wouldn't be a bad idea to use it

Re: [PERFORM] read only transactions

2010-10-12 Thread Tom Lane
Jon Nelson jnelson+pg...@jamponi.net writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? No. regards, tom lane -- Sent via pgsql-performance mailing list

Re: [PERFORM] read only transactions

2010-10-12 Thread Chris Browne
jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit... Transactions done READ ONLY do not generate actual

[PERFORM] bulk load performance question

2010-10-12 Thread Samuel Gendler
I've got a few tables that periodically get entirely refreshed via a COPY. I don't really have a good mechanism for detecting only rows which have changed so even though the differences are small, a full copy is easiest. However, the data includes a primary key column, so I can't simply load into

Re: [PERFORM] read only transactions

2010-10-12 Thread Tom Lane
Chris Browne cbbro...@acm.org writes: jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit...

Re: [PERFORM] bulk load performance question

2010-10-12 Thread Tom Lane
Samuel Gendler sgend...@ideasculptor.com writes: Is there some way to do the drop+rename in a manner which will preserve the OID or otherwise allow blocked queries to execute correctly once they unblock? No, but you could consider begin; truncate original_table; insert

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Pierre C
suggest that 99% instances of the select count(*) idiom are probably bad use of the SQL language. Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options : - you're google, and your sorting algorithms are so good that the answer

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Neil Whelchel
On Tuesday 12 October 2010 07:19:57 you wrote: The biggest single problem with select count(*) is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than few

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala
Pierre C wrote: Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options : With Google, I usually lose patience on the page 3. All that I, as an end user, need to know is whether there are more than 10 pages. The fact that

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Neil Whelchel
On Tuesday 12 October 2010 14:35:01 you wrote: suggest that 99% instances of the select count(*) idiom are probably bad use of the SQL language. Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options : - you're google, and

Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris
On 10/12/10 4:33 PM, Neil Whelchel wrote: On Tuesday 12 October 2010 08:39:19 Dan Harris wrote: On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file

Re: [PERFORM] Query much faster with enable_seqscan=0

2010-10-12 Thread Ogden
On Sep 21, 2010, at 6:30 PM, Tom Lane wrote: Ogden li...@darkstatic.com writes: SELECT tr.id, tr.sid FROM test_registration tr, INNER JOIN test_registration_result r on (tr.id = r.test_registration_id) WHERE.

Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Ron Mayer
Kevin Grittner wrote: ...Sybase named caches...segment off portions of the memory for specific caches... bind specific database objects (tables and indexes) to specific caches. ... When I posted to the list about it, the response was that LRU eviction was superior to any tuning any