Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Le mercredi 22 février 2012 20:12:35, Pavel Stehule a écrit : > 2012/2/22 Kevin Grittner : > > Pavel Stehule wrote: > >> usual pattern in our application is > >> > >> create table xx1 as select > >> analyze xx1 > >> create table xx2 as select from xx1, > >> analyze xx2 > >> create table xx3 as select ... from xx3, > >> analyze xx3 > >> create table xx4 as select ... from xx1, ... > >> > >> tables xx** are use as cache. > >> > >> so we have to refresh statistic early. > >> > >> in this situation - and I found so in this case VACUUM ANALYZE is > >> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb > >> and 8Kb > >> > >> This is not usual pattern for OLTP - Application is strictly OLAP. > > > > Is the VACUUM ANALYZE step faster, or is the overall job faster if > > VACUUM ANALYZE is run? You may be running into the need to rewrite > > pages at an inopportune time or order without the VACUUM. Have you > > tried getting a time VACUUM FREEZE ANALYZE on these cache tables > > instead of plain VACUUM ANALYZE? > > > > -Kevin > > vacuum freeze analyze is slower as expected. vacuum analyze is little > bit faster or same in any step then analyze. > > I expected so just analyze should be significantly faster and it is not. > > Tom's demonstration is enough for me. ANALYZE doesn't read complete > table, but uses random IO. VACUUM ANALYZE reads complete table, but it > uses seq IO and vacuum is fast (because it does nothing) in our case. VACUUM does read the 1st block to be sure readahead is done when ANALYSE does not. For ANALYZE, maybe it is interesting to issue a read on the first block or use POSIX_FADVISE to (try) to force a readahead of the table when it is small enough (so ANALYSE can start working while blocks are read and put in cache). That's being said, I am surprised that the pattern "create table...analyze create table analyze" of such smalls ones make the data being flush from OS cache so quickly that they need to be read again from disk. Pavel, can you check the cache status of the tables just before the analyze ? (you can use OS tools or pgfincore extension for that) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Le mercredi 22 février 2012 20:12:35, Pavel Stehule a écrit : > 2012/2/22 Kevin Grittner : > > Pavel Stehule wrote: > >> usual pattern in our application is > >> > >> create table xx1 as select > >> analyze xx1 > >> create table xx2 as select from xx1, > >> analyze xx2 > >> create table xx3 as select ... from xx3, > >> analyze xx3 > >> create table xx4 as select ... from xx1, ... > >> > >> tables xx** are use as cache. > >> > >> so we have to refresh statistic early. > >> > >> in this situation - and I found so in this case VACUUM ANALYZE is > >> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb > >> and 8Kb > >> > >> This is not usual pattern for OLTP - Application is strictly OLAP. > > > > Is the VACUUM ANALYZE step faster, or is the overall job faster if > > VACUUM ANALYZE is run? You may be running into the need to rewrite > > pages at an inopportune time or order without the VACUUM. Have you > > tried getting a time VACUUM FREEZE ANALYZE on these cache tables > > instead of plain VACUUM ANALYZE? > > > > -Kevin > > vacuum freeze analyze is slower as expected. vacuum analyze is little > bit faster or same in any step then analyze. > > I expected so just analyze should be significantly faster and it is not. > > Tom's demonstration is enough for me. ANALYZE doesn't read complete > table, but uses random IO. VACUUM ANALYZE reads complete table, but it > uses seq IO and vacuum is fast (because it does nothing) in our case. VACUUM does read the 1st block to be sure readahead is done when ANALYSE does not. For ANALYZE, maybe it is interesting to issue a read on the first block or use POSIX_FADVISE to (try) to force a readahead of the table when it is small enough (so ANALYSE can start working while blocks are read and put in cache). That's being said, I am surprised that the pattern "create table...analyze create table analyze" of such smalls ones make the data being flush from OS cache so quickly that they need to be read again from disk. Pavel, can you check the cache status of the tables just before the analyze ? (you can use OS tools or pgfincore extension for that) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Thu, Feb 23, 2012 at 3:34 AM, Simon Riggs wrote: > On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas wrote: >> On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs wrote: >>> The industry accepted description for non-sequential access is "random >>> access" whether or not the function that describes the movement is >>> entirely random. To argue otherwise is merely hairsplitting. >> >> I don't think so. > > PostgreSQL already uses a parameter called "random_page_cost" to > describe non-sequential access. Perhaps that is wrong and we need a > third parameter? > >> For example, a bitmap index scan contrives to speed >> things up by arranging for the table I/O to happen in ascending block >> number order, with skips, rather than in random order, as a plain >> index scan would do, and that seems to be a pretty effective >> technique. Except to the extent that it interferes with the kernel's >> ability to do readahead, it really can't be to read blocks 1, 2, 3, 4, >> and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't >> require more effort than reading it. > > By that argument, ANALYZE never could run longer than VACUUM ANALYZE, > so you disagree with Tom and I and you can't explain Pavel's > results > > cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost > of accessing blocks, even though the author knew the access was in > ascending block number order. Why was that? > > Note that the cost_bitmap_heap_scan() cost can be > than > cost-seqscan() for certain parameter values. I think all three of us are saying more or less the same thing in slightly different words, so I'd rather not have an argument about this one. But you're right: I can't explain Pavel's results, unless doing ANALYZE before VACUUM is causing skip-block reads that defeat the kernel's read-ahead detection. I think it's fairly self-evident that reading a fixed-size subset of the pages in ascending order can't *in general* be more expensive than reading all of an arbitrarily large table, and so I believe we're all in agreement that the behavior he observed is unusual. As to the cost estimation stuff, we use random_page_cost as an approximation: there may be a head seek involved, but to do better we'd have to estimate the likely length of the seek based on the number of blocks skipped, something we currently view as irrelevant, and it's not clear that it would improve the quality of the estimate very much - there are other, probably larger sources of error, such as the fact that the sequential logical block number doesn't imply sequential physical position on the platter, since the OS often fragments the file, especially (I think) on Windows. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas wrote: > On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs wrote: >> The industry accepted description for non-sequential access is "random >> access" whether or not the function that describes the movement is >> entirely random. To argue otherwise is merely hairsplitting. > > I don't think so. PostgreSQL already uses a parameter called "random_page_cost" to describe non-sequential access. Perhaps that is wrong and we need a third parameter? > For example, a bitmap index scan contrives to speed > things up by arranging for the table I/O to happen in ascending block > number order, with skips, rather than in random order, as a plain > index scan would do, and that seems to be a pretty effective > technique. Except to the extent that it interferes with the kernel's > ability to do readahead, it really can't be to read blocks 1, 2, 3, 4, > and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't > require more effort than reading it. By that argument, ANALYZE never could run longer than VACUUM ANALYZE, so you disagree with Tom and I and you can't explain Pavel's results cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost of accessing blocks, even though the author knew the access was in ascending block number order. Why was that? Note that the cost_bitmap_heap_scan() cost can be > than cost-seqscan() for certain parameter values. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs wrote: > The industry accepted description for non-sequential access is "random > access" whether or not the function that describes the movement is > entirely random. To argue otherwise is merely hairsplitting. I don't think so. For example, a bitmap index scan contrives to speed things up by arranging for the table I/O to happen in ascending block number order, with skips, rather than in random order, as a plain index scan would do, and that seems to be a pretty effective technique. Except to the extent that it interferes with the kernel's ability to do readahead, it really can't be to read blocks 1, 2, 3, 4, and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't require more effort than reading it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 3:29 PM, Tom Lane wrote: > Simon Riggs writes: >> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule >> wrote: >>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I >>> expected so ANALYZE should be faster then VACUUM ANALYZE. > >> VACUUM ANALYZE scans the whole table sequentially. > >> ANALYZE accesses a random sample of data blocks. Random access is >> slower than sequential access, so at some threshold of sample size and >> sequential/random I/O speed ratio ANALYZE could become slower. > > That analysis is entirely wrong. In the first place, although ANALYZE > doesn't read all the blocks, what it does read it reads in block number > order. So it's not like there are "random" seeks all over the disk that > would not need to happen anyway. Entirely right it would seem, since your later comments match my own. The industry accepted description for non-sequential access is "random access" whether or not the function that describes the movement is entirely random. To argue otherwise is merely hairsplitting. The disk access is not-sequential for ANALYZE. Not-sequential access is slower on some hardware, and so given a large enough sample it can account for the observed difference. Additional access to the disk while the ANALYZE was running would actually make it fully random, if anyone really cares. > If the filesystem is hugely biased towards sequential I/O for some > reason, and the VACUUM scan causes the whole table to become resident in > RAM where ANALYZE can read it "for free", then I guess it might be > possible to arrive at Pavel's result. But it would be an awfully narrow > corner case. I cannot believe that his statement is true in general, > or even for a noticeably large fraction of cases. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Kevin Grittner : > Pavel Stehule wrote: > >> usual pattern in our application is >> >> create table xx1 as select >> analyze xx1 >> create table xx2 as select from xx1, >> analyze xx2 >> create table xx3 as select ... from xx3, >> analyze xx3 >> create table xx4 as select ... from xx1, ... >> >> tables xx** are use as cache. >> >> so we have to refresh statistic early. >> >> in this situation - and I found so in this case VACUUM ANALYZE is >> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb >> and 8Kb >> >> This is not usual pattern for OLTP - Application is strictly OLAP. > > Is the VACUUM ANALYZE step faster, or is the overall job faster if > VACUUM ANALYZE is run? You may be running into the need to rewrite > pages at an inopportune time or order without the VACUUM. Have you > tried getting a time VACUUM FREEZE ANALYZE on these cache tables > instead of plain VACUUM ANALYZE? > > -Kevin vacuum freeze analyze is slower as expected. vacuum analyze is little bit faster or same in any step then analyze. I expected so just analyze should be significantly faster and it is not. Tom's demonstration is enough for me. ANALYZE doesn't read complete table, but uses random IO. VACUUM ANALYZE reads complete table, but it uses seq IO and vacuum is fast (because it does nothing) in our case. Thank You Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Pavel Stehule wrote: > usual pattern in our application is > > create table xx1 as select > analyze xx1 > create table xx2 as select from xx1, > analyze xx2 > create table xx3 as select ... from xx3, > analyze xx3 > create table xx4 as select ... from xx1, ... > > tables xx** are use as cache. > > so we have to refresh statistic early. > > in this situation - and I found so in this case VACUUM ANALYZE is > faster (30%) than ANALYZE. Size of xx** is usually between 500Kb > and 8Kb > > This is not usual pattern for OLTP - Application is strictly OLAP. Is the VACUUM ANALYZE step faster, or is the overall job faster if VACUUM ANALYZE is run? You may be running into the need to rewrite pages at an inopportune time or order without the VACUUM. Have you tried getting a time VACUUM FREEZE ANALYZE on these cache tables instead of plain VACUUM ANALYZE? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas : > On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier > wrote: >> 2012/2/22 Robert Haas : >> >>> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule >>> wrote: >>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I expected so ANALYZE should be faster then VACUUM ANALYZE. But is not true. Why? >>> >>> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in >>> general, because VACUUM has to scan the whole table, and ANALYZE only >>> a fixed-size subset of its pages. >> >> It sounds like you just said the opposite of what you wanted to say. > > Yeah, I did. Woops. Let me try that again: > > ANALYZE should be faster; reads only some pages. > > VACUUM ANALYZE should be slower; reads them all. > > Dunno why Pavel's seeing the opposite without more info. usual pattern in our application is create table xx1 as select analyze xx1 create table xx2 as select from xx1, analyze xx2 create table xx3 as select ... from xx3, analyze xx3 create table xx4 as select ... from xx1, ... tables xx** are use as cache. so we have to refresh statistic early. in this situation - and I found so in this case VACUUM ANALYZE is faster (30%) than ANALYZE. Size of xx** is usually between 500Kb and 8Kb This is not usual pattern for OLTP - Application is strictly OLAP. Regards Pavel > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On 2012-02-22 16:29, Tom Lane wrote: (Snip context) VACUUM ANALYZE consists of two separate passes, VACUUM and then ANALYZE, and the second pass is going to be "random" I/O by your definition no matter what. I don't suppose there's a case where the VACUUM (1) gets to delete lots and lots of rows that then don't need ANALYZE'ing, and (2) can do so without actually touching all those pages? Jeroen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 10:29:56AM -0500, Tom Lane wrote: > Simon Riggs writes: > > On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule > > wrote: > >> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I > >> expected so ANALYZE should be faster then VACUUM ANALYZE. > > > VACUUM ANALYZE scans the whole table sequentially. > > > ANALYZE accesses a random sample of data blocks. Random access is > > slower than sequential access, so at some threshold of sample size and > > sequential/random I/O speed ratio ANALYZE could become slower. > > That analysis is entirely wrong. In the first place, although ANALYZE > doesn't read all the blocks, what it does read it reads in block number > order. So it's not like there are "random" seeks all over the disk that > would not need to happen anyway. In the second place, VACUUM ANALYZE > consists of two separate passes, VACUUM and then ANALYZE, and the second > pass is going to be "random" I/O by your definition no matter what. > > If the filesystem is hugely biased towards sequential I/O for some > reason, and the VACUUM scan causes the whole table to become resident in > RAM where ANALYZE can read it "for free", then I guess it might be > possible to arrive at Pavel's result. But it would be an awfully narrow > corner case. I cannot believe that his statement is true in general, > or even for a noticeably large fraction of cases. > > regards, tom lane > Wouldn't a full sequential scan trigger the kernel read-ahead, which might not trigger for the analyze block reads, even though they are in order? That could account for the observation. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
Simon Riggs writes: > On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule > wrote: >> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I >> expected so ANALYZE should be faster then VACUUM ANALYZE. > VACUUM ANALYZE scans the whole table sequentially. > ANALYZE accesses a random sample of data blocks. Random access is > slower than sequential access, so at some threshold of sample size and > sequential/random I/O speed ratio ANALYZE could become slower. That analysis is entirely wrong. In the first place, although ANALYZE doesn't read all the blocks, what it does read it reads in block number order. So it's not like there are "random" seeks all over the disk that would not need to happen anyway. In the second place, VACUUM ANALYZE consists of two separate passes, VACUUM and then ANALYZE, and the second pass is going to be "random" I/O by your definition no matter what. If the filesystem is hugely biased towards sequential I/O for some reason, and the VACUUM scan causes the whole table to become resident in RAM where ANALYZE can read it "for free", then I guess it might be possible to arrive at Pavel's result. But it would be an awfully narrow corner case. I cannot believe that his statement is true in general, or even for a noticeably large fraction of cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule wrote: > I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I > expected so ANALYZE should be faster then VACUUM ANALYZE. VACUUM ANALYZE scans the whole table sequentially. ANALYZE accesses a random sample of data blocks. Random access is slower than sequential access, so at some threshold of sample size and sequential/random I/O speed ratio ANALYZE could become slower. So it depends upon the hardware and the setting of stats_target. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier wrote: > 2012/2/22 Robert Haas : > >> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule >> wrote: >> >>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I >>> expected so ANALYZE should be faster then VACUUM ANALYZE. >>> >>> But is not true. Why? >> >> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in >> general, because VACUUM has to scan the whole table, and ANALYZE only >> a fixed-size subset of its pages. > > It sounds like you just said the opposite of what you wanted to say. Yeah, I did. Woops. Let me try that again: ANALYZE should be faster; reads only some pages. VACUUM ANALYZE should be slower; reads them all. Dunno why Pavel's seeing the opposite without more info. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
2012/2/22 Robert Haas : > On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule > wrote: > >> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I >> expected so ANALYZE should be faster then VACUUM ANALYZE. >> >> But is not true. Why? > > I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in > general, because VACUUM has to scan the whole table, and ANALYZE only > a fixed-size subset of its pages. It sounds like you just said the opposite of what you wanted to say. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?
On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule wrote: > I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I > expected so ANALYZE should be faster then VACUUM ANALYZE. > > But is not true. Why? I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in general, because VACUUM has to scan the whole table, and ANALYZE only a fixed-size subset of its pages. Not sure what's happening in your particular case... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers