Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Fri, Feb 19, 2016 at 12:05 PM, Alvaro Herrera wrote: > But those pages are supposed to be used as the index grows. So unless > they are forgotten by the FSM, they shouldn't accumulate. (Except where > the table doesn't grow but only shrinks, so there's no need for new > index pages, but I don't think that's an interesting case.) Sure. I'm talking about a narrow issue around how things are represented in pgstatindex() only. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan wrote: > On Thu, Feb 18, 2016 at 4:53 PM, Tom Lane wrote: > >> there are usage patterns where half-dead pages might accumulate. > > > > Other than a usage pattern of "randomly SIGKILL backends every few > > seconds", I don't see how that would happen. > > I meant where pages could accumulate without being recycled. But those pages are supposed to be used as the index grows. So unless they are forgotten by the FSM, they shouldn't accumulate. (Except where the table doesn't grow but only shrinks, so there's no need for new index pages, but I don't think that's an interesting case.) -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 4:53 PM, Tom Lane wrote: > Only a physical-order scan, ie vacuum, would visit a dead page > (ignoring transient corner cases like a page getting deleted while an > indexscan is in flight to it). So I think treating it as part of the > fragmentation measure is completely wrong: the point of that measure, > AFAICS, is to model how close an index-order traversal is to linear. > Half-dead pages are also normally very transient --- the only way they > persist is if there's a crash partway through a page deletion. So I think > it's appropriate to assume that future indexscans won't visit those, > either. Okay. >> there are usage patterns where half-dead pages might accumulate. > > Other than a usage pattern of "randomly SIGKILL backends every few > seconds", I don't see how that would happen. I meant where pages could accumulate without being recycled. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > On Thu, Feb 18, 2016 at 4:15 PM, Tom Lane wrote: >> Because they've been removed from the right-link/left-link chains. > That isn't the same thing as being inaccessible by scans, clearly > (just what you call the "leaf scan sequence"). Only a physical-order scan, ie vacuum, would visit a dead page (ignoring transient corner cases like a page getting deleted while an indexscan is in flight to it). So I think treating it as part of the fragmentation measure is completely wrong: the point of that measure, AFAICS, is to model how close an index-order traversal is to linear. Half-dead pages are also normally very transient --- the only way they persist is if there's a crash partway through a page deletion. So I think it's appropriate to assume that future indexscans won't visit those, either. > there are usage patterns where half-dead pages might accumulate. Other than a usage pattern of "randomly SIGKILL backends every few seconds", I don't see how that would happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 4:15 PM, Tom Lane wrote: > Because they've been removed from the right-link/left-link chains. That isn't the same thing as being inaccessible by scans, clearly (just what you call the "leaf scan sequence"). Besides, half-dead pages still have right-link/left-link chains, and there are usage patterns where half-dead pages might accumulate. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > Why would dead pages not get traversed by scans? Because they've been removed from the right-link/left-link chains. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 3:06 PM, Tom Lane wrote: > Don't see why; the documentation and field names clearly imply that those > numbers are accumulated only over leaf pages. Then I guess what might be lacking is a delineation of what a leaf page is for the purposes of pgstatindex(). > I certainly wouldn't expect > the fragmentation measure to include dead pages, for example, since they > would not get traversed by scans. (Whether the "rightlink points to a > higher page number" rule for fragmentation is actually very useful is a > separate question; but as long as that's the measure, only pages that > are part of the leaf scan sequence should count.) Why would dead pages not get traversed by scans? It's clear that they would be traversed to the extent that there will be buffer locking and inspection of the B-Tree special area flags to determine that the page should be ignored. >> Having looked at the 2008 commit d287818eb514d431 myself, ISTM >> that your intent might well have been to have that happen -- why else >> would any reasonable person have changed the order at all? > > My best guess is that I was thinking that the tests were independent Dunno. I'd have thought that the basic fact that the ordering had some significance was fairly obvious. One of the tests is "if (P_IGNORE(...))", and I can't imagine that not provoking thought. I'm particular about things like this because I care about making sure these tools are useful for teaching novice hackers about Postgres internals. I've made my point and will leave it at that, though. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > I think that the P_ISLEAF() instrumentation of free space and > fragments might still need to happen for deleted and/or half dead > pages. Don't see why; the documentation and field names clearly imply that those numbers are accumulated only over leaf pages. I certainly wouldn't expect the fragmentation measure to include dead pages, for example, since they would not get traversed by scans. (Whether the "rightlink points to a higher page number" rule for fragmentation is actually very useful is a separate question; but as long as that's the measure, only pages that are part of the leaf scan sequence should count.) > Having looked at the 2008 commit d287818eb514d431 myself, ISTM > that your intent might well have been to have that happen -- why else > would any reasonable person have changed the order at all? My best guess is that I was thinking that the tests were independent, and rearranged them so that the most common case would be tested first. I'm quite sure I didn't intend to change the statistical behavior, else I would have updated docs and/or mentioned it in the commit message. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 11:52 AM, Tom Lane wrote: > It looks like this was done correctly to begin with, and I broke it in > d287818eb514d431b1a68e1f3940cd958f82aa34. Not sure what I was thinking :-( I think that you might not have simply changed the order in a totally misguided way back in 2008, as you seem to imply. Consider what this block does following your commit just now: ... else if (P_IGNORE(opaque)) indexStat.empty_pages++;/* this is the "half dead" state */ else if (P_ISLEAF(opaque)) { int max_avail; max_avail = BLCKSZ - (BLCKSZ - ((PageHeader) page)->pd_special + SizeOfPageHeaderData); indexStat.max_avail += max_avail; indexStat.free_space += PageGetFreeSpace(page); indexStat.leaf_pages++; /* * If the next leaf is on an earlier block, it means a * fragmentation. */ if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno) indexStat.fragments++; } ... I think that the P_ISLEAF() instrumentation of free space and fragments might still need to happen for deleted and/or half dead pages. Having looked at the 2008 commit d287818eb514d431 myself, ISTM that your intent might well have been to have that happen -- why else would any reasonable person have changed the order at all? The avg_leaf_density and leaf_fragmentation fields might now be argued to misrepresent the true picture. This is not clearly a departure from their documented behavior, if only because the descriptions are almost the same as the names of the fields themselves. If you think that the instrumentation of free space is the most useful possible behavior as of today, which it might well be, then you might have clarified that this behavior was your intent in today's commit, for example by updating the descriptions of the fields avg_leaf_density and leaf_fragmentation in the docs. Just a thought. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
I wrote: > Peter Geoghegan writes: >> I think we should change it. It seems like a bug to me. > Me too. Is it enough bug-like to be something to back-patch, or should > we just change it in HEAD? Actually, there's a significantly worse bug here: I just realized that the page type tests are done in the wrong order. A deleted page that was formerly a leaf will be reported as though it was a live leaf page, because both the BTP_LEAF and BTP_DELETED flags are set for such a page. It looks like this was done correctly to begin with, and I broke it in d287818eb514d431b1a68e1f3940cd958f82aa34. Not sure what I was thinking :-( Anyway, I think that puts the final nail in the coffin of the idea that the current code's behavior is sane enough to preserve. I think we should fix all these things and back-patch 'em all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
I vote back patch. Subtle differences between the branches should be avoided. -- Peter Geoghegan
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > On Thu, Feb 18, 2016 at 10:49 AM, Tom Lane wrote: >> Yeah, that seems a bit strange to me as well. Should we change it to >> count the root as an internal page, or is that going too far? > I think we should change it. It seems like a bug to me. Me too. Is it enough bug-like to be something to back-patch, or should we just change it in HEAD? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 10:49 AM, Tom Lane wrote: > Yeah, that seems a bit strange to me as well. Should we change it to > count the root as an internal page, or is that going too far? I think we should change it. It seems like a bug to me. I've had the same point come up ("leaf-ness/internal-ness and root-ness are orthogonal") a couple of times with Heikki over the years. I just haven't used pgstattuple very much for some reason, and so didn't catch it before now. > Note that it's already the case that in a one-page index (root is also > a leaf), the root will be included in the leaf_pages count. So it > sure seems inconsistent that it's not included in the internal_pages > count when it's not a leaf. That's what I was thinking. > Well, actually, since we don't have write lock on the index it'd be > possible to see zero or multiple roots because the root's location > changes. That's already mentioned in the documentation, if somewhat > obliquely. Ah, yes. Another consequence of going in physical order. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > That's odd. Having taken a quick look at pgstatindex_impl(), I dislike > that it counts the root separately from internal pages. That's not how > they're actually presented and understood in the code. Yeah, that seems a bit strange to me as well. Should we change it to count the root as an internal page, or is that going too far? Note that it's already the case that in a one-page index (root is also a leaf), the root will be included in the leaf_pages count. So it sure seems inconsistent that it's not included in the internal_pages count when it's not a leaf. > It's also possible to have more than one root page, since we do a scan > in physical order, which the code considers. There could be a fast > root and a true root. Looks like one is counted as deleted, though: Well, actually, since we don't have write lock on the index it'd be possible to see zero or multiple roots because the root's location changes. That's already mentioned in the documentation, if somewhat obliquely. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 10:01 AM, Tom Lane wrote: > Yeah ... the numbers already appear not to add up because the root page > is counted in index_size but not any other column, so there's already > something worthy of explanation there. Maybe something like "The reported > index_size will normally correspond to two more pages than are accounted > for by internal_pages + leaf_pages + empty_pages + deleted_pages, because > it also includes the index's metapage and root page". That's odd. Having taken a quick look at pgstatindex_impl(), I dislike that it counts the root separately from internal pages. That's not how they're actually presented and understood in the code. It's also possible to have more than one root page, since we do a scan in physical order, which the code considers. There could be a fast root and a true root. Looks like one is counted as deleted, though: regression=# select index_size/2^13 as total_pages, root_block_no, internal_pages, leaf_pages, deleted_pages from pgstatindex('btree_tall_idx'); total_pages │ root_block_no │ internal_pages │ leaf_pages │ deleted_pages ─┼───┼┼┼─── 206 │81 │ 3 │160 │42 (1 row) BTW, I am actively working on the amcheck B-Tree checker tool, and hope to post it soon. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
Peter Geoghegan writes: > On Thu, Feb 18, 2016 at 9:40 AM, Tom Lane wrote: >> I think this is a bug and we ought to fix the code to include the >> metapage in the reported index_size. Thoughts? > I tend to agree, but I think you should note that specifically in the > documentation. I'm in favor of tools like pgstattuple and pageinspect > going into this kind of detail in their documentation generally. Yeah ... the numbers already appear not to add up because the root page is counted in index_size but not any other column, so there's already something worthy of explanation there. Maybe something like "The reported index_size will normally correspond to two more pages than are accounted for by internal_pages + leaf_pages + empty_pages + deleted_pages, because it also includes the index's metapage and root page". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
On Thu, Feb 18, 2016 at 9:40 AM, Tom Lane wrote: > I think this is a bug and we ought to fix the code to include the > metapage in the reported index_size. Thoughts? I tend to agree, but I think you should note that specifically in the documentation. I'm in favor of tools like pgstattuple and pageinspect going into this kind of detail in their documentation generally. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?
=?UTF-8?B?5aSn5aGa5oay5Y+4?= writes: > It seems that description of index_size that is returned by pgstatindex() is > wrong. > The document says "Total number of pages in index". > But it looks like the number of bytes is stored in index_size. Hmm, you're right, because what the code does is values[j++] = psprintf(INT64_FORMAT, (indexStat.root_pages + indexStat.leaf_pages + indexStat.internal_pages + indexStat.deleted_pages + indexStat.empty_pages) * BLCKSZ); so the result is clearly measured in bytes not pages. I was going to just change the docs to say "Total index size in bytes", but then I noticed that this number is not that either: it does not count the metapage and therefore is BLCKSZ less than the true file size. As an example: regression=# select * from pgstatindex('tenk1_unique1'); version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation -+++---++--- -+-+---+--+ 2 | 1 | 237568 | 3 | 0 | 28 | 0 | 0 |87.91 | 0 (1 row) regression=# select relfilenode from pg_class where relname = 'tenk1_unique1'; relfilenode - 27449 (1 row) $ ls -l 27449 -rw---. 1 postgres postgres 245760 Feb 17 18:58 27449 I think this is a bug and we ought to fix the code to include the metapage in the reported index_size. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers