Re: [PERFORM] Slow count(*) again...
Mladen Gogala wrote:
> I agree, but I am afraid that after the demise of SGI, XFS isn't being developed.

It's back to being well maintained again; see http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html for some history here and why it's become relevant to RedHat in particular recently.

-- 
Greg Smith, 2ndQuadrant US
g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"  Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
Re: [PERFORM] Slow count(*) again...
Samuel Gendler wrote:
> I spent some time going through the various tuning docs on the wiki while bringing some new hardware up, and I can't remember seeing any discussion of tweaking read-ahead at all in the normal performance-tuning references. Do you have any documentation of the kinds of tweaking you have done and its effects on different types of workloads?

Much of my recent research has gone into the book you'll see plugged below rather than the wiki. The basic idea of read-ahead tuning is that increasing it improves bonnie++ sequential read results, up to a point. Get to that point and stop, and you should be in good shape.

-- 
Greg Smith, 2ndQuadrant US
g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"  Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
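To make that tuning loop concrete, a minimal sketch, assuming a Linux block device at /dev/sda (device name, mount point, and sizes are placeholders):

# Check the current read-ahead setting (in 512-byte sectors):
blockdev --getra /dev/sda

# Try progressively larger values; 4096 sectors = 2MB read-ahead:
blockdev --setra 4096 /dev/sda

# Re-run a sequential read benchmark after each change and keep the
# smallest setting at which throughput stops improving:
bonnie++ -d /mnt/pgdata -s 16g -n 0 -u postgres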
Re: [PERFORM] Slow count(*) again...
11.10.10 20:46, Craig James wrote:
> First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people. Sooner or later a large fraction of developers will run into this. It's probably been the most-asked question I've seen on this forum in the four years I've been here. It's a real problem, and it needs a real solution. I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense?

BTW: There is a lot of talk about MVCC, but is the next solution possible:

1) Create a page information map that for each page in the table will tell you how many rows are within, and whether any write (either successful or not) was done to this page. This can even be two maps, to make the second one really small (a bit per page) - so that it could be in memory most of the time.
2) When you need to do count(*) or an index check - first check if there were no writes to the page. If not - you can use count information from page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am I using terminology correctly?).

In this case all read-only (archive) data will have this bit off, and index/count(*) will be really fast. Am I missing something?

Best regards, Vitalii Tymchyshyn.
Re: [PERFORM] Slow count(*) again...
On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:
> BTW: There is a lot of talk about MVCC, but is the next solution possible:
> 1) Create a page information map that for each page in the table will tell you how many rows are within, and whether any write (either successful or not) was done to this page. This can even be two maps, to make the second one really small (a bit per page) - so that it could be in memory most of the time.
> 2) When you need to do count(*) or an index check - first check if there were no writes to the page. If not - you can use count information from page info/index data without going to the page itself.
> 3) Let vacuum clear the bit after freezing all the tuples in the page (am I using terminology correctly?).

Part of this already exists. It's called the visibility map, and is present in 8.4 and above. It's not currently used for queries, but can potentially be used to aid some kinds of query.

http://www.postgresql.org/docs/8.4/static/storage-vm.html

> In this case all read-only (archive) data will have this bit off, and index/count(*) will be really fast.

A count with any joins or filter criteria would still have to scan all pages with visible tuples in them. So the visibility map helps speed up scanning of bloated tables, but doesn't provide a magical fast count except in the utterly trivial "select count(*) from tablename;" case, and can probably only be used for accurate results when there are no read/write transactions currently open. Even if you kept a count of tuples in each page along with the MVCC transaction ID information required to determine for which transactions that count is valid, it'd only be useful if you didn't have to do any condition checks, and it'd be yet another thing to update with every insert/delete/update. Perhaps for some users that'd be worth having, but it seems to me like it'd have pretty narrow utility. I'm not sure that's the answer.

-- Craig Ringer
Re: [PERFORM] Slow count(*) again...
On Tue, 12 Oct 2010, Craig Ringer wrote:
>> BTW: There is a lot of talk about MVCC, but is the next solution possible:
>> 1) Create a page information map that for each page in the table will tell you how many rows are within, and whether any write (either successful or not) was done to this page. This can even be two maps, to make the second one really small (a bit per page) - so that it could be in memory most of the time.
>> 2) When you need to do count(*) or an index check - first check if there were no writes to the page. If not - you can use count information from page info/index data without going to the page itself.
>> 3) Let vacuum clear the bit after freezing all the tuples in the page (am I using terminology correctly?).
>
> Part of this already exists. It's called the visibility map, and is present in 8.4 and above. It's not currently used for queries, but can potentially be used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will have this bit off, and index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all pages with visible tuples in them. So the visibility map helps speed up scanning of bloated tables, but doesn't provide a magical fast count except in the utterly trivial "select count(*) from tablename;" case, and can probably only be used for accurate results when there are no read/write transactions currently open. Even if you kept a count of tuples in each page along with the MVCC transaction ID information required to determine for which transactions that count is valid, it'd only be useful if you didn't have to do any condition checks, and it'd be yet another thing to update with every insert/delete/update. Perhaps for some users that'd be worth having, but it seems to me like it'd have pretty narrow utility. I'm not sure that's the answer.

from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast.

in the case where you are doing a count(*) query and the where clause is on an indexed column, could the search just look at the index + the visibility map rather than doing a sequential scan through the table?

as for your worries about the accuracy of a visibility-based count in the face of other transactions, wouldn't you run into the same issues if you are doing a sequential scan with the same transactions in process?

David Lang
Re: [PERFORM] Slow count(*) again...
12.10.10 11:14, Craig Ringer wrote:
> On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:
>> BTW: There is a lot of talk about MVCC, but is the next solution possible: [...]
>
> Part of this already exists. It's called the visibility map, and is present in 8.4 and above. It's not currently used for queries, but can potentially be used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will have this bit off, and index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all pages with visible tuples in them.

If one doesn't use partitioning. With proper partitioning, the filter can simply select partitions. Also, filtering can be mapped onto the index lookup. And if one could join an index with the visibility map, much like two indexes can be bitmap-joined now, count could be really fast for all but non-frozen tuples.

> So the visibility map helps speed up scanning of bloated tables, but doesn't provide a magical fast count except in the utterly trivial "select count(*) from tablename;" case, and can probably only be used for accurate results when there are no read/write transactions currently open.

Why so? You simply have to recount the pages that are marked dirty in the usual way. But the count problem usually occurs when there is a lot of archive data (you need to count over 100K records) that is not modified.

Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Slow count(*) again...
On 10/12/2010 04:22 PM, da...@lang.hm wrote:
> from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast.

At the cost of a fair bit more complexity, though, and slowing everything else down. The proper solution here remains, IMO, support for visibility information in indexes, whether by storing it once in the index and once in the heap (ouch!), storing it out-of-line, or using a covering index where one or more columns are stored wholly in the index, not in the table heap at all.

Here are a few of the many past discussions about this that have already covered some of the same ground:

http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr

http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html (and the rest of the thread)

A decent look with Google will find many, many more.

> in the case where you are doing a count(*) query and the where clause is on an indexed column, could the search just look at the index + the visibility map rather than doing a sequential scan through the table?

Nope, because the visibility map, which is IIRC only one bit per page, doesn't record how many tuples there are on the page, or enough information about them to determine how many of them are visible to the current transaction*.

> as for your worries about the accuracy of a visibility-based count in the face of other transactions, wouldn't you run into the same issues if you are doing a sequential scan with the same transactions in process?

No. Every tuple in a table heap in PostgreSQL has hidden fields, some of which are used to determine whether the current transaction* can see the tuple - it may have been inserted after this transaction started, or deleted before this transaction started, so it's not visible to this transaction but may still be to others.

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

This information isn't available in the visibility map, or in indexes. That's why PostgreSQL has to hit the heap to find it.

* "current transaction" should really be "current snapshot". The snapshot is taken at the start of the whole transaction for SERIALIZABLE isolation, and at the start of each statement for READ COMMITTED isolation.

-- Craig Ringer
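A quick way to see those hidden per-tuple fields for yourself ("tablename" is a placeholder for any table you have):

-- xmin/xmax record which transactions created and (if any) deleted
-- each row version; ctid is its physical (page, item) location:
SELECT xmin, xmax, ctid, * FROM tablename LIMIT 5;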
Re: [PERFORM] Slow count(*) again...
da...@lang.hm wrote:
> from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast.

Fixing PR stuff is not the approach that I would take. People are complaining about "select count(*)" because they're using it in all the wrong places. My assessment that there is a problem with sequential scan was wrong. Now, let's again take Oracle as the measure. Someone asked me about caching the data. Here it is:

SQL> connect system/*
Connected.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:12.68
SQL> connect adbase/*
Connected.
SQL> alter session set db_file_multiblock_read_count=128;

Session altered.

Elapsed: 00:00:00.41
SQL> select count(*) from ni_occurrence;

  COUNT(*)
----------
 402062638

Elapsed: 00:02:37.77
SQL> select bytes/1048576 MB from user_segments
  2  where segment_name='NI_OCCURRENCE';

        MB
----------
     35329

Elapsed: 00:00:00.20
SQL>

So, the results weren't cached the first time around. The explanation is the fact that Oracle, as of version 10.2.0, reads the table into the private process memory, not into the shared buffers. This table alone is 35GB in size; Oracle took 2 minutes 47 seconds to read it using the full table scan. If I do the same thing with PostgreSQL and a comparable table, Postgres is, in fact, faster:

psql (9.0.1)
Type "help" for help.

news=> \timing
Timing is on.
news=> select count(*) from moreover_documents_y2010m09;
  count
----------
 17242655
(1 row)

Time: 113135.114 ms
news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 100.849 ms
news=>

The number of rows is significantly smaller, but the table contains a rather significant text field which consumes quite a bit of TOAST storage, and the sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2 minutes, and Oracle took 2 minutes 37 seconds to read through 35GB. I stand corrected: there is nothing wrong with the speed of the Postgres sequential scan.

-- 
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
Re: [PERFORM] How does PG know if data is in memory?
On Mon, Oct 11, 2010 at 11:11 PM, gnuo...@rcn.com wrote:
> An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement.
> The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes?

We already have tablespaces, and our data already is accessed through the buffer pool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Runtime dependency from size of a bytea field
On Fri, Oct 8, 2010 at 12:53 AM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote:
> The difference to my test is that we use the ODBC interface in our C program. Could it be that the difference in the runtimes is caused by the ODBC?

I've heard tell that ODBC is substantially slower than a native libpq connection, but I don't know that for a fact, not being an ODBC user.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote:
> So, the results weren't cached the first time around. The explanation is the fact that Oracle, as of version 10.2.0, reads the table into the private process memory, not into the shared buffers. This table alone is 35GB in size; Oracle took 2 minutes 47 seconds to read it using the full table scan. If I do the same thing with PostgreSQL and a comparable table, Postgres is, in fact, faster:

Well, I didn't quite mean that - having no familiarity with Oracle I don't know what the "alter system" statement does, but I was talking specifically about the Linux buffer and page cache. The easiest way to drop the Linux caches in one fell swoop is:

echo 3 > /proc/sys/vm/drop_caches

Is there a command to tell PostgreSQL to drop/clear/reset its buffer cache? Clearing/dropping both the system (Linux) and the DB caches is important when doing benchmarks that involve I/O.

-- Jon
Re: [PERFORM] Slow count(*) again...
Jon Nelson wrote:
> Is there a command to tell PostgreSQL to drop/clear/reset its buffer cache?

No. Usually the sequence used to remove all cached data from RAM before a benchmark is:

pg_ctl stop
sync
echo 3 > /proc/sys/vm/drop_caches
pg_ctl start

-- 
Greg Smith, 2ndQuadrant US
g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"  Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 3:07 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:
>> So, the results weren't cached the first time around. The explanation is the fact that Oracle, as of version 10.2.0, reads the table into the private process memory, not into the shared buffers. This table alone is 35GB in size; Oracle took 2 minutes 47 seconds to read it using the full table scan. If I do the same thing with PostgreSQL and a comparable table, Postgres is, in fact, faster:
>
> Well, I didn't quite mean that - having no familiarity with Oracle I don't know what the "alter system" statement does, but I was talking specifically about the Linux buffer and page cache. The easiest way to drop the Linux caches in one fell swoop is:
>
> echo 3 > /proc/sys/vm/drop_caches

AFAIK this won't affect Oracle when using direct IO (which bypasses the page cache).

Luca
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 8:18 AM, Greg Smith g...@2ndquadrant.com wrote:
> No. Usually the sequence used to remove all cached data from RAM before a benchmark is:

All cached data (as cached in PostgreSQL - *not* the Linux system caches)..., right?

-- Jon
Re: [PERFORM] Slow count(*) again...
Mladen Gogala mladen.gog...@vmsinfo.com writes:
> The number of rows is significantly smaller, but the table contains a rather significant text field which consumes quite a bit of TOAST storage, and the sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2 minutes, and Oracle took 2 minutes 37 seconds to read through 35GB. I stand corrected: there is nothing wrong with the speed of the Postgres sequential scan.

Um ... the whole point of TOAST is that the data isn't in-line. So what Postgres was actually reading through was probably quite a lot less than 27Gb. It's probably hard to make a completely apples-to-apples comparison because the two databases are so different, but I don't think this one proves that PG is faster than Oracle.

regards, tom lane
Re: [PERFORM] Slow count(*) again...
Neil Whelchel neil.whelc...@gmail.com wrote:
> What is the best method to make a page of results and a list of links to other pages of results?

For our most heavily used web app we decided to have the renderer just read the list of cases and render the pages to disk, and then present the first one. We set a limit of 500 entries on the list; if we get past 500 we put up a page telling them to refine their search criteria. That won't work for all circumstances, but it works well for our web app.

-Kevin
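A minimal sketch of that cap-and-refine pattern - the 500 cap is from the description above, while the table and column names are hypothetical. The trick is to fetch one row past the cap, so the renderer can detect overflow without ever running count(*):

-- If 501 rows come back, show the "refine your search" page
-- instead of rendering result pages:
SELECT case_id, title
FROM cases
WHERE filed_date >= '2010-01-01'
ORDER BY case_id
LIMIT 501;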
Re: [PERFORM] Slow count(*) again...
Tom Lane wrote:
> Mladen Gogala mladen.gog...@vmsinfo.com writes:
>> The number of rows is significantly smaller, but the table contains a rather significant text field which consumes quite a bit of TOAST storage, and the sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2 minutes, and Oracle took 2 minutes 37 seconds to read through 35GB. I stand corrected: there is nothing wrong with the speed of the Postgres sequential scan.
>
> Um ... the whole point of TOAST is that the data isn't in-line. So what Postgres was actually reading through was probably quite a lot less than 27Gb. It's probably hard to make a completely apples-to-apples comparison because the two databases are so different, but I don't think this one proves that PG is faster than Oracle.
>
> regards, tom lane

As is usually the case, you're right. I will try copying the table to Postgres over the weekend; my management would not look kindly upon my copying 35GB of production data during working hours for scientific reasons. I have the storage and I can test, and I will post the result. I developed quite an efficient Perl script which does the copying without an intervening CSV file, so the copy should not take more than 2 hours. I will be able to impose a shared lock on the table over the weekend, so that I don't blow away the UNDO segments.

-- 
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
Re: [PERFORM] How does PG know if data is in memory?
Robert Haas wrote:
> We already have tablespaces, and our data already is accessed through the buffer pool.

The discussions I've seen indicated that, in use, tablespaces were at the database level, but, yes, the docs do say that a table can be assigned to a defined tablespace. What I still can't find is syntax which establishes buffers/caches/whatever and assigns them to tablespaces. Without that, I'm not sure what benefit there is to tablespaces, other than a sort of RAID-lite.

Robert
Re: [PERFORM] Slow count(*) again...
>> The biggest single problem with "select count(*)" is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than a few hundred records. SQL language, of which PostgreSQL offers an excellent implementation, has offered the [NOT] EXISTS clause since its inception in the Jurassic era. The problem is with the sequential scan, not with counting. I'd even go as far as to suggest that 99% of instances of the select count(*) idiom are probably bad use of the SQL language.
>
> I agree, I have seen many very bad examples of using count(*). I will go so far as to question the use of count(*) in my examples here. Is there a better way to come up with a page list than using count(*)? What is the best method to make a page of results and a list of links to other pages of results? Am I barking up the wrong tree here?

One way I have dealt with this on very large tables is to cache the count(*) at the application level (using memcached, terracotta, or something along those lines) and then increment that cache whenever you add a row to the relevant table. On application restart that cache is re-initialized with a regular old count(*). This approach works really well, and all large systems in my experience need caching in front of the DB eventually. If you have a simpler system with, say, a single application/web server, you can simply store the value in a variable; the specifics would depend on the language and framework you are using.

Another, more all-DB approach is to create a statistics table into which you place aggregated statistics rows (num deleted, num inserted, totals, etc.) at an appropriate time interval in your code. So you have rows containing aggregated statistics information for the past, and some tiny portion of the new data happening right now that hasn't yet been aggregated. Queries then look like a summation of the aggregated values in the statistics table plus a count(*) over just the newest portion of the data table (see the sketch below), and are generally very fast.

Overall I have found that once things get big, the layers of your app stack start to blend together and have to be combined in clever ways to keep speed up. Postgres is a beast, but when you run into things it can't do well, just find a way to cache it or make it work together with some other persistence tech to handle those cases.
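Two minimal sketches of the points above, using hypothetical table and column names. First, the existence check that [NOT] EXISTS is meant for, in place of count(*):

SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);

And a sketch of the statistics-table pattern:

-- Aggregated history, maintained by a periodic job:
CREATE TABLE doc_stats (
    day      date PRIMARY KEY,
    num_rows bigint NOT NULL
);

-- Total = aggregated history + exact count over only the small,
-- not-yet-aggregated slice of the data table:
SELECT (SELECT coalesce(sum(num_rows), 0) FROM doc_stats)
     + (SELECT count(*) FROM documents
        WHERE created_at >= current_date) AS total;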
Re: [PERFORM] How does PG know if data is in memory?
gnuo...@rcn.com wrote:
> An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement.

This sounds similar to Sybase named caches. You can segment off portions of the memory for specific caches, break that up into space reserved for different I/O buffer sizes, and bind specific database objects (tables and indexes) to specific caches. On the few occasions where someone had failed to configure the named caches when setting up a machine, it was caught almost immediately after deployment because of end-user complaints about poor performance. This was so critical to performance for us when we were using Sybase that one of my first reactions on finding it missing in PostgreSQL was distress over the inability to tune as I had.

When I posted to the list about it, the response was that LRU eviction was superior to any tuning any human would do. I didn't and don't believe that, but have found it's close enough in the PostgreSQL environment to be *way* down my list of performance issues. In fact, when looking at the marginal benefits it would generate in PostgreSQL when done right, versus the number of people who would shoot themselves in the foot with it, even I have come around to feeling it's probably not a good idea.

FWIW, the four main reasons for using it were:
(1) Heavily used data could be kept fully cached in RAM and not driven out by transient activity.
(2) You could flag a cache used for (1) above as using "relaxed LRU accounting" -- it saved a lot of time tracking repeated references, leaving more CPU for other purposes.
(3) Each named cache had its own separate set of locks, reducing contention.
(4) Large tables for which the heap was often scanned in its entirety, or for a range on the clustered index, could be put in a relatively small cache with large I/O buffers. This avoided blowing out the default cache space for situations which almost always required disk I/O anyway.

None of that is anything for amateurs to play with. You need to set up caches like that based on evidence from monitoring and do careful benchmarking of the results to actually achieve improvements over LRU logic.

> The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes?

As far as I can tell, there is nobody with that intent.

-Kevin
Re: [PERFORM] How does PG know if data is in memory?
Couldn't have said it better myself; covered all the bases. If PG wants to become an industrial strength database, worthy of replacing DB2/etc., then these are the sorts of knobs and switches it will need.

> None of that is anything for amateurs to play with.

Not to jam a stick in anybody's eye, but shouldn't database pros not be amateurs? Or are most PG-ers coders who don't really want to design and tune a database?

Robert
Re: [PERFORM] How does PG know if data is in memory?
gnuo...@rcn.com wrote:
>> None of that is anything for amateurs to play with.
>
> Not to jam a stick in anybody's eye, but shouldn't database pros not be amateurs?

While many PostgreSQL installations are managed by professional DBAs, or programmers or consultants with a deep enough grasp of the issues to tune a knob like that appropriately, PostgreSQL is also used in environments without such staff. In fact, there is pressure to make PostgreSQL easier to configure for exactly that reason. If we add more knobs which are this hard to tune correctly, we would risk inundation with complaints from people who tried to use it and made things worse.

-Kevin
Re: [PERFORM] Slow count(*) again...
On Sat, Oct 9, 2010 at 4:26 PM, Neil Whelchel neil.whelc...@gmail.com wrote:
> Maybe an estimate(*) that works like count but gives an answer from the index without checking visibility? I am sure that this would be good enough to make a page list, it is really no big deal if it errors on the positive side, maybe the list of pages has an extra page off the end. I can live with that. What I can't live with is taking 13 seconds to get a page of results from 850,000 rows in a table.
> -Neil-

FWIW, Michael Fuhr wrote a small function to parse the EXPLAIN plan a few years ago and it works pretty well assuming your stats are up to date.

http://markmail.org/message/gknqthlwry2eoqey
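A sketch along those lines - a PL/pgSQL function that pulls the planner's row estimate out of EXPLAIN output. This illustrates the technique and is not necessarily Fuhr's exact function:

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    rec  record;
    rows bigint;
BEGIN
    -- The top line of EXPLAIN output contains "rows=<estimate>".
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql;

-- Usage (table name is a placeholder):
-- SELECT count_estimate('SELECT * FROM tablename WHERE flag');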
Re: [PERFORM] Slow count(*) again...
Jon Nelson jnelson+pg...@jamponi.net wrote:
>> Greg Smith g...@2ndquadrant.com wrote:
>> Usually the sequence used to remove all cached data from RAM before a benchmark is:
>
> All cached data (as cached in PostgreSQL - *not* the Linux system caches)..., right?

No. The stop and start of PostgreSQL causes empty PostgreSQL caches. These lines, in between the stop and start, force the Linux cache to be empty (on recent kernel versions):

sync
echo 3 > /proc/sys/vm/drop_caches

-Kevin
Re: [PERFORM] Slow count(*) again...
On 10/11/10 8:02 PM, Scott Carey wrote:
> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.

We just had a corrupt table caused by an XFS online defrag. I'm scared to use this again while the db is live. Has anyone else found this to be safe? But, I can vouch for the fragmentation issue, it happens very quickly in our system.

-Dan
Re: [PERFORM] Slow count(*) again...
On Tue, 12 Oct 2010, Joe Uhl wrote:
>>> The biggest single problem with "select count(*)" is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than a few hundred records. SQL language, of which PostgreSQL offers an excellent implementation, has offered the [NOT] EXISTS clause since its inception in the Jurassic era. The problem is with the sequential scan, not with counting. I'd even go as far as to suggest that 99% of instances of the select count(*) idiom are probably bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go so far as to question the use of count(*) in my examples here. Is there a better way to come up with a page list than using count(*)? What is the best method to make a page of results and a list of links to other pages of results? Am I barking up the wrong tree here?
>
> One way I have dealt with this on very large tables is to cache the count(*) at the application level (using memcached, terracotta, or something along those lines) and then increment that cache whenever you add a row to the relevant table. On application restart that cache is re-initialized with a regular old count(*). This approach works really well, and all large systems in my experience need caching in front of the DB eventually. If you have a simpler system with, say, a single application/web server, you can simply store the value in a variable; the specifics would depend on the language and framework you are using.

this works if you know ahead of time what the criteria of the search is going to be. so it will work for

select count(*) from table;

what this won't work for is cases where the criteria of the search is unpredictable, i.e. ask the user for input:

select count(*) from table where field=$input;

David Lang
Re: [PERFORM] Slow count(*) again...
On Tue, 12 Oct 2010, Mladen Gogala wrote:
>> from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast.
>
> Fixing PR stuff is not the approach that I would take. People are complaining about "select count(*)" because they're using it in all the wrong places.

that may be the case, but if it's possible to make it less painful it will mean more people use postgres, both because it works better for them when they are using the suboptimal programs, but also because when people do their trivial testing of databases to decide which one they will use, they won't rule out postgres because it's so slow.

the fact of the matter is that people do use count(*), and even though there are usually ways to avoid doing so, having the programmer do something different for postgres than they do for other databases raises a barrier against postgres utilization in anything.

David Lang
Re: [PERFORM] Slow count(*) again...
On Tue, 12 Oct 2010, Craig Ringer wrote:
> On 10/12/2010 04:22 PM, da...@lang.hm wrote:
>> from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast.
>
> At the cost of a fair bit more complexity, though, and slowing everything else down.

complexity probably, although given how complex the planner already is, is this significant?

as far as slowing everything else down, why would it do that? (beyond the simple fact that any new thing the planner can do makes the planner take a little longer)

David Lang
Re: [PERFORM] Slow count(*) again...
On Oct 11, 2010, at 9:21 PM, Samuel Gendler wrote:
> On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey sc...@richrelevance.com wrote:
>> I can't speak to documentation, but it is something that helps as your I/O subsystem gets more powerful, and how much it helps depends more on your hardware, which may have adaptive read ahead on its own, and your file system, which may be more or less efficient at sequential I/O. For example ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS on a DELL PERC6 RAID card (but still ends up slower).
>
> Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch.
>
> As it happens, this is a system where all of the heavy workload is in the form of sequential scan type load. The OLTP workload is very minimal (tens of queries per minute on a small number of small tables), but there are a lot of reporting queries that wind up doing sequential scans of large partitions (millions to tens of millions of rows). We've sized the new hardware so that the most commonly used partitions fit into memory, but if we could speed the queries that touch less frequently used partitions, that would be good. I'm the closest thing our team has to a DBA, which really only means that I'm the one person on the dev team or the ops team to have read all of the postgres docs and wiki and the mailing lists. I claim no actual DBA experience or expertise and have limited cycles to devote to tuning and testing, so if there is an established wisdom for filesystem choice and read ahead tuning, I'd be very interested in hearing it.

ext4 is a very fast file system. It's faster than ext2, but has many more features and has the all-important journaling. However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O. Supposedly, an online defragmenter is in the works for ext4, but it may be years before it's available.
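A sketch of that daily defrag routine, assuming the PostgreSQL data directory lives on an XFS filesystem at /var/lib/pgsql on /dev/sdb1 (paths, device, and time limit are placeholders):

# Report the filesystem's current fragmentation:
xfs_db -r -c frag /dev/sdb1

# Run the online defragmenter for at most 2 hours (7200 seconds),
# typically from cron during a quiet period:
xfs_fsr -v -t 7200 /var/lib/pgsql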
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey sc...@richrelevance.com wrote:
> However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O.

Our reporting tables are written sequentially and left unmodified until entire partitions are dropped. However, equivalent partitions tend to get a little bit larger over time, so newer partitions won't necessarily fit into the gaps left by prior partition drops; it is possible that partitions will be split into two sections, but they should still be very sequential, if not perfectly so.

It would seem that we stumbled into an ideal architecture for doing this kind of work - mostly by virtue of starting with 8.2.x and having huge problems with autovacuum and vacuum taking forever and dragging the db to a halt, which caused us to move to an architecture which aggregates and then drops older data in entire partitions, instead of updating aggregates individually and then deleting rows. Partitions are sized such that most reporting queries run over entire partitions, too (which was completely accidental, since I had not yet delved into individual query optimization at the time), so even though we are doing sequential scans, we at least run as few of them as possible and are able to keep hot data in memory.

--sam
Re: [PERFORM] Slow count(*) again...
sgend...@ideasculptor.com (Samuel Gendler) writes:
> Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch.

It's news to me (in this thread!) that XFS is actually getting some developer love, which is a pretty crucial factor to considering it relevant. XFS was an SGI creation, and, with:

a) the not-scintillating performance of the company,
b) the lack of a lot of visible work going into the filesystem,
c) the paucity of support by Linux vendors (for a long time, if you told RHAT you were having problems, and were using XFS, the next step would be to park the ticket awaiting your installing a "supported" filesystem)

it didn't look like XFS was a terribly good bet. Those issues were certainly causing concern a couple of years ago.

Faster "raw" performance isn't much good if it comes with a risk of:
- Losing data
- Losing support from vendors

If XFS now *is* getting support from both the development and support perspectives, then the above concerns may have been invalidated. It would be very encouraging, if so.

-- 
output = ("cbbrowne" "@" "gmail.com")
Rules of the Evil Overlord #228. "If the hero claims he wishes to confess in public or to me personally, I will remind him that a notarized deposition will serve just as well."
Re: [PERFORM] Slow count(*) again...
> A count with any joins or filter criteria would still have to scan all pages with visible tuples in them. So the visibility map helps speed up scanning of bloated tables, but doesn't provide a magical fast count except in the utterly trivial "select count(*) from tablename;" case, and can probably only be used for accurate results when there are no read/write transactions currently open.

select count(*) from tablename where [condition or filter that can use an index] [group by on columns in the index]

will also work, I think. Additionally, I think it can work if other open transactions exist, provided they haven't written to the table being scanned. If they have, then only those pages that have been altered and marked in the visibility map need to be cracked open the normal way.

> Even if you kept a count of tuples in each page along with the mvcc transaction ID information required to determine for which transactions that count is valid, it'd only be useful if you didn't have to do any condition checks, and it'd be yet another thing to update with every insert/delete/update.

Yes, lots of drawbacks and added complexity.

> Perhaps for some users that'd be worth having, but it seems to me like it'd have pretty narrow utility. I'm not sure that's the answer.
>
> -- Craig Ringer
Re: [PERFORM] Slow count(*) again...
On 2010-10-12 18:02, Scott Carey wrote:
> However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O.
>
> Supposedly, an online defragmenter is in the works for ext4, but it may be years before it's available.

If some clever postgres hacker could teach postgres to allocate blocks using posix_fallocate in quite large batches, say ... something like:

fallocate(min(current_relation_size * 0.1, 1073741824))

then for a relation already filling 10GB, the next segment would be fully allocated by the filesystem on the first byte written. That would ensure that a large table sits efficiently on the filesystem, with a minimum of fragmentation, on ext4 (and other filesystems supporting posix_fallocate), and for small systems it would only fill 10% more of the disk space...

... last night I spent an hour looking for where it's done, but couldn't find the source file where extension of an existing relation takes place... can someone give directions?

-- Jesper
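For illustration, a minimal C sketch of that batching idea. The 10%/1GB sizing rule comes from the paragraph above; the function itself and its file handling are hypothetical, not actual PostgreSQL code:

#include <fcntl.h>
#include <sys/stat.h>

#define GIGABYTE (1024L * 1024L * 1024L)

/* Pre-allocate contiguous space at the end of an open relation file:
 * extend by 10% of the current size, capped at 1GB. */
static int preallocate_batch(int fd)
{
    struct stat st;
    off_t batch;

    if (fstat(fd, &st) != 0)
        return -1;

    batch = st.st_size / 10;
    if (batch > GIGABYTE)
        batch = GIGABYTE;
    if (batch == 0)
        batch = 8192;   /* at least one 8KB block */

    /* posix_fallocate reserves the space up front; on extent-based
     * filesystems such as ext4 this minimizes fragmentation. */
    return posix_fallocate(fd, st.st_size, batch);
}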
Re: [PERFORM] Slow count(*) again...
On Oct 12, 2010, at 8:39 AM, Dan Harris wrote:
> On 10/11/10 8:02 PM, Scott Carey wrote:
>> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.
>
> We just had a corrupt table caused by an XFS online defrag. I'm scared to use this again while the db is live. Has anyone else found this to be safe? But, I can vouch for the fragmentation issue, it happens very quickly in our system.
>
> -Dan

What version? I'm using the latest CentOS extras build. We've been doing online defrag for a while now on a very busy database with 8TB of data. Not that that means there are no bugs...

It is a relatively simple thing in xfs -- it writes a new file to temp in a way that allocates contiguous space if available, then if the file has not been modified since it was re-written it is essentially moved on top of the other one. This should be safe provided the journaling and storage are safe, etc.
Re: [PERFORM] Slow count(*) again...
On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote: On Tue, 12 Oct 2010, Craig Ringer wrote: On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast. At the cost of a fair bit more complexity, though, and slowing everything else down. complexity probably, although given how complex the planner already is, is this significant? as far as slowing everything else down, why would it do that? (beyond the simple fact that any new thing the planner can do makes the planner take a little longer) David Lang

I wouldn't even expect the planner to do more work. An Index Scan can simply avoid going to the tuples for visibility under some circumstances. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Oct 12, 2010, at 9:46 AM, Scott Carey wrote: On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote: On Tue, 12 Oct 2010, Craig Ringer wrote: On 10/12/2010 04:22 PM, da...@lang.hm wrote: from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast. At the cost of a fair bit more complexity, though, and slowing everything else down. complexity probably, although given how complex the planner already is, is this significant? as far as slowing everything else down, why would it do that? (beyond the simple fact that any new thing the planner can do makes the planner take a little longer) David Lang I wouldn't even expect the planner to do more work. An Index Scan can simply avoid going to the tuples for visibility under some circumstances.

Of course, the planner has to know about it. Otherwise it won't choose the Index Scan over the sequential scan. So the cost of index scans when all the info other than visibility is in the index would need to be lowered.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
cr...@postnewspapers.com.au (Craig Ringer) writes: Hey, maybe I should try posting YouTube video answers to a few questions for kicks, see how people react ;-) And make sure it uses the same voice as is used in the "MongoDB is web scale" video, to ensure that people interpret it correctly :-). -- output = (cbbrowne @ gmail.com) http://linuxdatabases.info/info/nonrdbms.html The *Worst* Things to Say to a Police Officer: "Hey, is that a 9 mm? That's nothing compared to this .44 magnum." -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I have a logical problem with asynchronous commit. The commit command should instruct the database to make the outcome of the transaction permanent. The application should wait to see whether the commit was successful or not. Asynchronous behavior in the commit statement breaks the ACID rules and should not be used in a RDBMS system. If you don't need ACID, you may not need RDBMS at all. You may try with MongoDB. MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs

The client always has the option of connecting to a set of databases, and stowing parts of the data hither and thither. That often leads to the relaxation called BASE. (And IBM has been selling that relaxation as MQ-Series since the early '90s!) There often *ARE* cases where it is acceptable for some of the data to not be as durable, because that data is readily reconstructed. This is particularly common for calculated/cached/aggregated data. Many things can get relaxed for a data warehouse data store, where the database is not authoritative, but rather aggregates data drawn from other authoritative sources. In such applications, neither the A, C, I, nor the D are pointedly crucial, in the DW data store.

- We don't put the original foreign key constraints into the DW database; they don't need to be enforced a second time. Ditto for constraints of all sorts.
- Batching of the loading of updates is likely to break several of the letters. And I find it *quite* acceptable to lose D if the data may be safely reloaded into the DW database.

I don't think this is either cavalier nor that it points to "MongoDB is web scale."

-- cbbrowne,@,gmail.com Rules of the Evil Overlord #181. "I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire." -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
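Worth noting: PostgreSQL (8.3 and later) lets you make exactly this durability trade-off per transaction rather than globally, so only the readily-reconstructed data relaxes the D. A sketch; the table here is invented:

    BEGIN;
    -- Only this transaction's commit may be lost after a crash; asynchronous
    -- commit cannot corrupt the database or break its consistency.
    SET LOCAL synchronous_commit = off;
    INSERT INTO daily_aggregates (day, total) VALUES ('2010-10-12', 42);
    COMMIT;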
Re: [PERFORM] Slow count(*) again...
On 10/12/10 10:44 AM, Scott Carey wrote: On Oct 12, 2010, at 8:39 AM, Dan Harris wrote: On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously. We just had a corrupt table caused by an XFS online defrag. I'm scared to use this again while the db is live. Has anyone else found this to be safe? But, I can vouch for the fragmentation issue, it happens very quickly in our system. What version? I'm using the latest CentOS extras build. We've been doing online defrag for a while now on a very busy database with 8TB of data. Not that that means there are no bugs... It is a relatively simple thing in xfs -- it writes a new file to temp in a way that allocates contiguous space if available, then if the file has not been modified since it was re-written it is essentially moved on top of the other one. This should be safe provided the journaling and storage is safe, etc.

I'm not sure how to figure out what version of XFS we're on, but it's Linux kernel 2.6.24-24 x86_64 on Ubuntu Server 8.04.3. Postgres version 8.3. We're due for an upgrade on that server soon so we'll do some more testing once we upgrade. Right now we are just living with the fragmentation. I'm glad to hear the regular on-line defrag is working successfully; at least that gives me hope we can rely on it in the future.

-Dan

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
So I spent a bit of quality time with oprofile this morning, and found once again that there's no substitute for having actual data before theorizing.

Test case software: current Git HEAD (plus one code change explained below), compiled with --enable-debug to support oprofile, cassert off; no other special configure options. Running on current Fedora 13 (gcc 4.4.4 in particular). All postgresql.conf options are out-of-the-box.

Test case hardware: recently purchased mid-grade desktop, dual Xeon E5503 processors (Nehalem cores, 2GHz), 4GB DDR3-800 RAM, no-name SATA disk.

Test query: select count(*) from t, where t has 4 nonnull integer columns and 81920000 rows, occupying 3459MB. I chose that size specifically to fit into available RAM, so that on repeated executions no physical I/O will occur.

On this setup I find that select count(*) runs in about 7.5sec when the data is fully cached in RAM, for a scanning speed of 460MB/sec. This is well in excess of what the machine's disk hardware can do: bonnie++ rates the machine's disk read speed at 152MB/sec. So in theory PG should be able to completely saturate the disk when processing a table bigger than RAM. In reality the test case run time if I've just flushed cache is about 28sec, working out to a scan rate of 123MB/sec. I expect if I'd bothered to tune the kernel readahead parameters as outlined earlier in this thread, I could get to 150MB/sec.

Now of course this disk setup is far from industrial strength, but the processor isn't what you'd put in a serious database server either (in particular, its available memory bandwidth is well behind the curve). Also, the table is pretty narrow (only 16 payload bytes per row), and any wider test table would show a pretty much linear scaling of achievable scan rate versus table width. So I don't see much support here at all for the notion that we scan slower than available disk bandwidth.

Further details from poking at it with oprofile: in the fully-cached case the CPU time is about 80% Postgres and 20% kernel. That kernel time is of course all to do with moving pages from kernel disk buffers into Postgres shared memory. Although I've not bothered to increase shared_buffers from the default 32MB, it wouldn't matter on this benchmark unless I were able to make shared_buffers hold the entire table ... and even then I'd only save 20%.

oprofile further shows that (with stock Postgres sources) the userspace runtime breaks down like this:

    samples   %        symbol name
    141267   13.0810   heapgettup_pagemode
     85947    7.9585   advance_aggregates
     83031    7.6885   ExecProject
     78975    7.3129   advance_transition_function
     75060    6.9504   heapgetpage
     73540    6.8096   ExecClearTuple
     69355    6.4221   ExecProcNode
     59288    5.4899   heap_getnext
     57745    5.3470   ExecScan
     55618    5.1501   HeapTupleSatisfiesMVCC
     47057    4.3574   MemoryContextReset
     41904    3.8802   ExecStoreTuple
     37146    3.4396   SeqNext
     32206    2.9822   ExecAgg
     22135    2.0496   XidInMVCCSnapshot
     21142    1.9577   int8inc
     19280    1.7853   AllocSetReset
     18211    1.6863   hash_search_with_hash_value
     16285    1.5079   TransactionIdPrecedes

I also looked at the source-line-level breakdown, though that's too bulky to post here. The most interesting fact here is that tuple visibility testing (MVCC) overhead is simply nonexistent: it'd be in heapgetpage() if it were being done, which it isn't because all the pages of the table have the PageIsAllVisible bit set. In a previous run where those bits weren't set but the per-tuple hint bits were, visibility testing still only ate a percent or two of the runtime.
So the theory some people have espoused in this thread, that visibility testing is the bottleneck, doesn't hold water either. If you go back and look at previous pgsql-hackers discussions about that, what people have been worried about is not the CPU cost of visibility testing but the need for indexscan queries to visit the heap for no other purpose than to check the visibility flags. In a seqscan it's not going to matter.

I looked a bit more closely at the heapgettup_pagemode timing. The lines shown by opannotate as more than 0.1 percent of the runtime are:

    22545  2.2074 : {    /* heapgettup_pagemode total: 153737 15.0528 */
     5685  0.5566 :     bool        backward = ScanDirectionIsBackward(dir);
     5789  0.5668 :     if (!scan->rs_inited)
     5693  0.5574 :     lineindex = scan->rs_cindex + 1;
    11429  1.1190 :     dp = (Page) BufferGetPage(scan->rs_cbuf);
     5693  0.5574 :     linesleft = lines - lineindex;
     5766  0.5646 :     while (linesleft > 0)
     5129  0.5022 :     lineoff = scan->rs_vistuples[lineindex];
    44461  4.3533 :     tuple->t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
    11135  1.0903 :     tuple->t_len = ItemIdGetLength(lpp);
     5692  0.5573 :     if (key != NULL)
     5773  0.5653 :
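For anyone who wants to reproduce the flavor of this test, a comparably narrow table can be generated like so. A sketch only: the row count is taken from the description above, and timings will of course differ with hardware:

    CREATE TABLE t AS
        SELECT i AS c1, i AS c2, i AS c3, i AS c4
        FROM generate_series(1, 81920000) AS g(i);
    VACUUM t;                  -- sets hint bits / PageIsAllVisible bits
    SELECT count(*) FROM t;    -- run twice; the second run is fully cached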
[PERFORM] read only transactions
Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? -- Jon -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
da...@lang.hm wrote: On Tue, 12 Oct 2010, Mladen Gogala wrote: da...@lang.hm wrote: from a PR point of view, speeding up the trivial count(*) case could be worth it, just to avoid people complaining about it not being fast. Fixing PR stuff is not the approach that I would take. People are complaining about select count(*) because they're using it in all the wrong places.

that may be the case, but if it's possible to make it less painful it will mean more people use postgres, both because it works better for them when they are using the suboptimal programs, but also because when people do their trivial testing of databases to decide which one they will use, they won't rule out postgres because it's so slow

There is no free lunch. If the count field is maintained somewhere, the concurrency will suffer. I find the idea of fixing the "count delusion" ridiculous, may Richard Dawkins forgive me for this pun. Saying that something is slow without testing and proper consideration is ridiculous. As a DBA, I usually get complaints like "the database is slow today" 3 times before lunch, every day. The database is never slow; the database is a warehouse where you keep your data. What is slow is the access to the data, and that is done by, guess what, the application program. Almost always, it's the application that's slow, not the database.

As for the select count(*) idiom, what are you trying to do? Where are you using it? If you are using it for pagination, consider the possibility of not specifying the number of pages on the website, just the links "next" and "prev". Alternatively, you can fetch a small amount into the web page and direct the users who would like to see the complete information to a background reporting tool. Mixing batch reports and online reports is a very easy thing to do. If you are using it to establish existence, you're doing it wrong.

I've had a problem like that this morning. A developer came to me with the usual phrase that "the database is slow". It was a PHP form which should write an output file and let the user know where the file is. The function looks like this:

    function put_xls($sth)
    {
        global $FNAME;
        $FNAME = $FNAME . ".xls";
        $lineno = 0;
        $ncols = $sth->FieldCount();
        for ($i = 0; $i < $ncols; $i++) {
            $cols[$i] = $sth->FetchField($i);
            $colnames[$i] = $cols[$i]->name;
        }
        $workbook = new Spreadsheet_Excel_Writer("/software" . $FNAME);
        $format_bold = $workbook->addFormat();
        $format_bold->setBold();
        $format_bold->setAlign('left');
        $format_left = $workbook->addFormat();
        $format_left->setAlign('left');
        $worksheet = $workbook->addWorksheet('Moreover Search');
        $worksheet->writeRow($lineno++, 0, $colnames, $format_bold);
        while ($row = $sth->FetchRow()) {
            $worksheet->writeRow($lineno++, 0, $row, $format_left);
        }
        $workbook->close();
        $cnt = $sth->RecordCount();
        return ($cnt);
    }

The relevant includes are here:

    require ('Date.php');
    require ('adodb5/tohtml.inc.php');
    require_once ('adodb5/adodb.inc.php');
    require_once ('adodb5/adodb-exceptions.inc.php');
    require_once 'Spreadsheet/Excel/Writer.php';
    $ADODB_FETCH_MODE = ADODB_FETCH_NUM;

So, what is the problem here? Why was the database slow? As it turns out, the PEAR module for writing Excel spreadsheets, which is the tool used here, creates the entire spreadsheet in memory and writes it out on the close command. What was spinning was the httpd process; the database was completely and utterly idle, rolling thumbs and awaiting orders. Using fputcsv instead made the function fly. The only thing that was lost were the bold column titles.
Changing little things can result in big performance gains. Making select count(*) unnaturally fast would encourage bad programming practices. I am not sure that this is a desirable development. You can't expect people to adjust the database software to your application. Applications are always database specific. Writing an application that will access a PostgreSQL database is not the same as writing an application that will access an Oracle database. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On 2010-10-12 19:07, Tom Lane wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. I don't think any of the previous discussion in this thread is on-point at all, except for the parts where people suggested avoiding it.

I would have to say that although it is nice to get count(*) faster, I think your testing is way too simple. It pretty much proves that in terms of the code involved in the count(*) process there is not much to be achieved. But your table has way too little payload. As PG currently is, it will start by pushing data off to TOAST when the tuple size reaches 1KB, and the speed of count(*) is very much dominated by the amount of dead weight it has to draw in, together with the heap access for the row, when accessing the table. Creating a case where the table is this slim is (in my viewpoint) very much to the extreme on the small side.

Just having 32 bytes of payload would more or less double your time to count, if I read your test results correctly; and in the situation where disk access would be needed, way more. Dividing pg_relation_size by the amount of tuples in our production system, I end up having no avg tuple size less than 100 bytes.

Without having complete insight, a visibility map that could be used in conjunction with indices would solve that. What the cost would be of maintaining it is also a factor.

-- Jesper -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
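Jesper's arithmetic is easy to repeat on any table; a sketch, with a placeholder table name:

    -- Average heap bytes per row (TOAST storage not included):
    SELECT pg_relation_size('mytable') / NULLIF(count(*), 0) AS avg_bytes_per_row
    FROM mytable;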
Re: [PERFORM] Slow count(*) again...
Jesper Krogh jes...@krogh.cc writes: On 2010-10-12 19:07, Tom Lane wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. Just having 32 bytes of payload would more or less double your time to count if I read your test results correctly? ... and in the situation where disk access would be needed, way more. Dividing pg_relation_size by the amount of tuples in our production system I end up having no avg tuple size less than 100 bytes.

Well, yeah. I deliberately tested with a very narrow table so as to stress the per-row CPU costs as much as possible. With any wider table you're just going to be I/O bound.

.. without having complete insight.. a visibility map that could be used in conjunction with indices would solve that. What the cost would be of maintaining it is also a factor.

I'm less than convinced that that approach will result in a significant win. It's certainly not going to do anything to convert COUNT(*) into an O(1) operation, which frankly is what the complainants are expecting. There's basically no hope of solving the "PR problem" without somehow turning COUNT(*) into a materialized-view reference. We've discussed that in the past, and know how to do it in principle, but the complexity and distributed overhead are daunting.

regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] read only transactions
Jon Nelson jnelson+pg...@jamponi.net wrote: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? I don't think it allows much optimization in any current release. It wouldn't be a bad idea to use it where appropriate, though, as future releases might do something with it. If you include this on the BEGIN statement, that will save a round trip. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
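Kevin's suggestion, spelled out (the table name is a placeholder):

    BEGIN TRANSACTION READ ONLY;   -- intent declared in the BEGIN itself
    SELECT count(*) FROM some_table;
    COMMIT;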
Re: [PERFORM] read only transactions
Jon Nelson jnelson+pg...@jamponi.net writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? No. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] read only transactions
jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit... Transactions done READ ONLY do not generate actual XIDs, which reduces the amount of XID generation (pretty tautological!), which reduces the need to do VACUUM to protect against XID wraparound. http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-BASICS If you process 50 million transactions, that chews thru 50 million XIDs. If 45 million of those were processed via READ ONLY transactions, then the same processing only chews thru 5 million XIDs, meaning that the XID-relevant vacuums can be done rather less frequently. This only terribly much matters if: a) your database is so large that there are tables on which VACUUM would run for a very long time, and b) you are chewing through XIDs mighty quickly. If either condition isn't true, then the indirect effect isn't important either. -- let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;; I'm not switching from slrn. I'm quite confident that anything that *needs* to be posted in HTML is fatuous garbage not worth my time. -- David M. Cook davec...@home.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
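Chris's condition (b), how quickly XIDs are being chewed through, is easy to watch; the routine-vacuuming documentation page linked above suggests tracking the age of each database's oldest XID over time, for example:

    SELECT datname, age(datfrozenxid) FROM pg_database;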
[PERFORM] bulk load performance question
I've got a few tables that periodically get entirely refreshed via a COPY. I don't really have a good mechanism for detecting only rows which have changed, so even though the differences are small, a full copy is easiest. However, the data includes a primary key column, so I can't simply load into the existing table and then drop older rows. So we load into a table with a different name and then, within a transaction, drop the old and rename the new. However, while the transaction will cause a query against that table to block until the transaction commits, when the transaction commits, the blocked query will fail with an error message like:

    ERROR: could not open relation with OID 17556

Is there some way to do the drop+rename in a manner which will preserve the OID or otherwise allow blocked queries to execute correctly once they unblock?

A secondary issue is that if permissions were granted to a role on the old table, the new table does not acquire those permissions and they must be granted again.

The biggest table that gets updated like this is a couple hundred thousand rows, with maybe a few thousand rows actually changing or being added with each load. Suggestions for alternative mechanisms for doing the loading are welcome. I'd really rather avoid updating every row in a several hundred thousand row table, especially without easy upsert functionality. The data is small enough that selecting everything and then comparing in memory before updating modified rows is doable, but sure seems like a lot of work if it can be avoided.

Writing this caused me to think of a possible solution, which appears to work correctly, but I'd like to confirm it with folks in the know. Instead of this:

    CREATE TABLE mytable_temp ...;
    COPY mytable_temp FROM ...;
    BEGIN;
    DROP TABLE mytable;
    ALTER TABLE mytable_temp RENAME TO mytable;
    COMMIT;

which will cause any overlapping queries to pick up the wrong OID for mytable and then fail when the transaction commits, I tested this:

    COPY mytable_temp FROM ...;
    BEGIN;
    ALTER TABLE mytable RENAME TO mytable_old;
    ALTER TABLE mytable_temp RENAME TO mytable;
    COMMIT;
    DROP TABLE mytable_old;

It would appear that any query that uses mytable which overlaps with the transaction will pick up the OID of the original mytable and then block until the transaction commits. When the transaction commits, those queries will successfully run against the original OID (no queries write to this table except for the bulk load) and will complete, at which time the table drop will finally complete. Meanwhile, any queries which don't overlap (or perhaps any queries which start after the rename from mytable_temp to mytable has occurred) will successfully complete against the new table.

The net result appears to be that I will no longer suffer the missing OID error, which seemed to periodically completely hose a db connection, requiring that the connection be closed since no subsequent queries would ever succeed, whether they touched the table in question or not. I've only seen that erroneous behaviour on 8.3 (so far - we only recently upgraded to 8.4.4), but it was fairly mysterious because I've never been able to replicate it in testing. I could get a single missing OID error, but never one that would break all subsequent queries. Are my assumptions about this correct?
Re: [PERFORM] read only transactions
Chris Browne cbbro...@acm.org writes: jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit... Transactions done READ ONLY do not generate actual XIDs, which reduces the amount of XID generation (pretty tautological!), which reduces the need to do VACUUM to protect against XID wraparound. You're right that a read-only transaction doesn't generate an XID. But that is not a function of whether you do SET TRANSACTION READ ONLY; it's a function of refraining from attempting any database changes. The SET might be useful for clarifying and enforcing your intent, but it's not a performance boost to use it, versus just doing the read-only transaction without it. Also, I believe that SET TRANSACTION READ ONLY isn't a hard read only restriction anyway --- it'll still allow writes to temp tables for example, which will cause assignment of an XID. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] bulk load performance question
Samuel Gendler sgend...@ideasculptor.com writes: Is there some way to do the drop+rename in a manner which will preserve the OID or otherwise allow blocked queries to execute correctly once they unblock?

No, but you could consider:

    begin;
    truncate original_table;
    insert into original_table select * from new_data;
    commit;

A secondary issue is that if permissions were granted to a role on the old table, the new table does not acquire those permissions and they must be granted again.

Not to mention foreign keys ...

regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
suggest that 99% instances of the select count(*) idiom are probably bad use of the SQL language.

Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options:

- you're Google, and your sorting algorithms are so good that the answer the user wants is on the first page
- or the user will refine his search by entering more keywords to get a manageable result set

So, in both cases, the count(*) was useless anyway. And the slowest ones are the most useless, since the user will immediately discard the result and refine his query. If your full text search is slow, try Xapian or Lucene.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Tuesday 12 October 2010 07:19:57 you wrote: The biggest single problem with select count(*) is that it is seriously overused. People use that idiom to establish existence, which usually leads to a performance disaster in the application using it, unless the table has no more than few hundred records. SQL language, of which PostgreSQL offers an excellent implementation, offers [NOT] EXISTS clause since its inception in the Jurassic era. The problem is with the sequential scan, not with counting. I'd even go as far as to suggest that 99% instances of the select count(*) idiom are probably bad use of the SQL language.

I agree, I have seen many very bad examples of using count(*). I will go so far as to question the use of count(*) in my examples here. Is there a better way to come up with a page list than using count(*)? What is the best method to make a page of results and a list of links to other pages of results? Am I barking up the wrong tree here?

One way I have dealt with this on very large tables is to cache the count(*) at the application level (using memcached, Terracotta, or something along those lines) and then increment that cache whenever you add a row to the relevant table. On application restart that cache is re-initialized with a regular old count(*). This approach works really well and all large systems in my experience need caching in front of the DB eventually. If you have a simpler system with, say, a single application/web server, you can simply store the value in a variable; the specifics would depend on the language and framework you are using.

I use this method whenever possible. I talked about it in my first post. I generally keep a table around I call counts. It has many rows that store count numbers from frequently used views. The one that I can't do anything about is the case where you have no control over the WHERE clause (or where there may be simply too many options to count everything ahead of time without making things even slower). That is the point of this entire thread, or was... ;) -Neil-

Another, more all-DB approach is to create a statistics table into which you place aggregated statistics rows (num deleted, num inserted, totals, etc.) at an appropriate time interval in your code. So you have rows containing aggregated statistics information for the past and some tiny portion of the new data happening right now that hasn't yet been aggregated. Queries then look like a summation of the aggregated values in the statistics table plus a count(*) over just the newest portion of the data table, and are generally very fast.

Overall I have found that once things get big, the layers of your app stack start to blend together and have to be combined in clever ways to keep speed up. Postgres is a beast, but when you run into things it can't do well, just find a way to cache it or make it work together with some other persistence tech to handle those cases. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
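To make the counts-table idea above concrete, here is a minimal all-DB sketch. All names are invented, and note that funneling every insert and delete through a single counter row is exactly the concurrency cost Mladen warns about elsewhere in this thread:

    CREATE TABLE row_counts (
        table_name text PRIMARY KEY,
        n          bigint NOT NULL
    );
    INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;

    CREATE OR REPLACE FUNCTION bump_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
        ELSE  -- DELETE
            UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
        END IF;
        RETURN NULL;  -- return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count
        AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE bump_count();

    -- The cheap "count(*)" is then:
    SELECT n FROM row_counts WHERE table_name = 'mytable';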
Re: [PERFORM] Slow count(*) again...
Pierre C wrote: Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options:

With Google, I usually lose patience on page 3. All that I, as an end user, need to know is whether there are more than 10 pages. The fact that there are 1776 pages in the result set is not particularly useful to me. I couldn't care less whether the number of returned pages is 1492, 1776 or 1861; I'm going to look at, at most, the first 5 of them.

- you're Google, and your sorting algorithms are so good that the answer the user wants is on the first page
- or the user will refine his search by entering more keywords to get a manageable result set

So, in both cases, the count(*) was useless anyway. And the slowest ones are the most useless, since the user will immediately discard the result and refine his query. If your full text search is slow, try Xapian or Lucene.

May I also recommend Sphinx? It's a very nice text search engine, with the price equal to that of Lucene.

-- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
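If all the application really needs to know is "is there a next page?", that can be answered with no count(*) at all by fetching one row past the window. A sketch with invented names:

    -- 10 results per page, showing page 5: ask for 11 rows.
    -- Getting an 11th row back means a "next" link is needed.
    SELECT id, title
      FROM search_results
     ORDER BY rank DESC, id
     LIMIT 11 OFFSET 40;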
Re: [PERFORM] Slow count(*) again...
On Tuesday 12 October 2010 14:35:01 you wrote: suggest that 99% instances of the select count(*) idiom are probably bad use of the SQL language. Well, suppose you paginate results. If the user sees that the search query returns 500 pages, there are two options: - you're Google, and your sorting algorithms are so good that the answer the user wants is on the first page - or the user will refine his search by entering more keywords to get a manageable result set So, in both cases, the count(*) was useless anyway. And the slowest ones are the most useless, since the user will immediately discard the result and refine his query. If your full text search is slow, try Xapian or Lucene.

I guess I have to comment here again and point out that while I am having this issue with text searches, I avoid using count(*) in such cases; I just use next and previous links. The real problem (for me) is when someone searches a date or time range. My application keeps track of huge amounts of realtime transactional data. So an administrator might want a report as to what some data point did yesterday between 3 and 4 PM. Under normal conditions the range of records that match can be between 0 and over 5,000. This is really killing me, especially when the reporting people want a list of how many transactions each node on points in a given zipcode had this morning between 8 and 9 AM; it takes about 5 minutes to run on a server that has enough RAM to hold the entire table!

Pseudo query: Show how many transactions per node in zipcode 92252 between 8:00 and 9:00 today:

    point_number | number_of_transactions
    -------------+------------------------
           65889 |                     31
           34814 |                   4865
           28349 |                      0
            3358 |                    364
    ...

24 total rows, 5 minutes.

Then they want every node to be a link to a list of actual data within the specified timeframe. This is where I have to do the same query twice: first to find out how many rows there are for the page links, then again to get a page of results.

Sure, I could keep tables around that have numbers by the hour, minute, day or whatever to cache up results for speeding things up; then the problem is that when the data is put into the server, there are so many statistics tables to update that the front end becomes a huge problem. Also, it makes for a huge mess of tables to think about when I need to make a report. -Neil-

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
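For reference, the pseudo query above would look something like this. Purely a sketch: every table and column name here is a guess, and the LEFT JOIN is what lets the zero-transaction nodes show up in the output:

    SELECT p.point_number,
           count(t.id) AS number_of_transactions
      FROM points p
      LEFT JOIN transactions t
             ON t.point_number = p.point_number
            AND t.trans_time >= '2010-10-13 08:00'
            AND t.trans_time <  '2010-10-13 09:00'
     WHERE p.zipcode = '92252'
     GROUP BY p.point_number;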
Re: [PERFORM] Slow count(*) again...
On 10/12/10 4:33 PM, Neil Whelchel wrote: On Tuesday 12 October 2010 08:39:19 Dan Harris wrote: On 10/11/10 8:02 PM, Scott Carey wrote: would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously. We just had a corrupt table caused by an XFS online defrag. I'm scared to use this again while the db is live. Has anyone else found this to be safe? But, I can vouch for the fragmentation issue, it happens very quickly in our system. -Dan I would like to know the details of what was going on that caused your problem. I have been using XFS for over 9 years, and it has never caused any trouble at all in a production environment. Sure, I had many problems with it on the test bench, but in most cases the issues were very clear and easy to avoid in production. There were some (older) XFS tools that caused some problems, but that is in the past, and as time goes on, it seems take less and less planning to make it work properly. -Neil-

There were roughly 50 transactions/sec going on at the time I ran it. xfs_db reported 99% fragmentation before it ran (we haven't been running it via cron). The operation completed in about 15 minutes (360GB of used data on the file system) with no errors. Everything seemed fine until the next morning when a user went to query a table and we got a message about a missing file inside the pg cluster. We were unable to query the table at all via psql. It was a bit of a panic situation, so we restored that table from backup immediately and the problem was solved without doing more research.

This database has been running for years with no problem (and none since); that was the first time I tried to do an on-line defrag, and it was the only unusual variable introduced into the system at that time, so it was a strong enough correlation for me to believe it caused the corruption. Hopefully this was just a corner case.

-Dan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query much faster with enable_seqscan=0
On Sep 21, 2010, at 6:30 PM, Tom Lane wrote: Ogden li...@darkstatic.com writes:

    SELECT tr.id, tr.sid
    FROM test_registration tr
    INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
    WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
    GROUP BY tr.id, tr.sid

Seeing that tr.id is a primary key, I think you might be a lot better off if you avoided the inner join and group by. I think what you really want here is something like:

    SELECT tr.id, tr.sid
    FROM test_registration tr
    WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
      AND EXISTS (SELECT 1 FROM test_registration_result r
                  WHERE tr.id = r.test_registration_id)

regards, tom lane

Thank you for this suggestion. However, what if I wanted some columns from test_registration_result? This wouldn't work, for example, if I wanted test_registration_result.answer to be fetched. Hence, I had to have a JOIN with test_registration_result and a GROUP BY. I still am not happy with my query; the EXISTS executes at great speed, however I cannot retrieve any of the columns from that table.

Thank you, Ogden -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
Kevin Grittner wrote: ...Sybase named caches...segment off portions of the memory for specific caches... bind specific database objects (tables and indexes) to specific caches. ... When I posted to the list about it, the response was that LRU eviction was superior to any tuning any human would do. I didn't and don't believe that

Lightly-used-but-important data seems like another use case. LRU is probably far better than me at optimizing for total throughput and/or average response time. But if there's a requirement like "even though this query's very rare, it should respond ASAP, even at the expense of the throughput of the rest of the system," it sounds like this kind of hand-tuning might be useful.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance