Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, Jul 15, 2013 at 1:41 PM, Jeff Davis pg...@j-davis.com wrote: I'm of the opinion that we ought to extract the parts of the patch that hold the VM pin for longer, review those separately, and if they're good and desirable, apply them. I'm confused. My patch holds a VM page pinned for those cases where PD_ALL_VISIBLE is currently used -- scans or insert/update/delete. If we have PD_ALL_VISIBLE, there's no point in the cache, right? Hmm. You might be right. I thought there might be some benefit there, but I guess we're not going to clear the bit more than once, so maybe not. To check visibility from an index scan, you either need to pin a heap page or a VM page. Why would checking the heap page be cheaper? Is it just other code in the VM-testing path that's slower? Or is there concurrency involved in your measurements which may indicate contention over VM pages? Well, I have seen some data that hints at contention problems with VM pages, but it's not conclusive. But the real issue is that, if the index-only scan finds the VM page not set, it still has to check the heap page. Thus, it ends up accessing two pages rather than one, and it turns out that's more expensive. It's unfortunately hard to predict whether the cost of checking VM first will pay off. It's a big win if we learn that we don't need to look at the heap page (because we don't need to read, lock, pin, or examine it, in that case) but it's a loss if we do (because checking the VM isn't free). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Sun, 2013-07-14 at 23:06 -0400, Robert Haas wrote: Of course, there's a reason that PD_ALL_VISIBLE is not like a normal hint: we need to make sure that inserts/updates/deletes clear the VM bit. But my patch already addresses that by keeping the VM page pinned. I'm of the opinion that we ought to extract the parts of the patch that hold the VM pin for longer, review those separately, and if they're good and desirable, apply them. I'm confused. My patch holds a VM page pinned for those cases where PD_ALL_VISIBLE is currently used -- scans or insert/update/delete. If we have PD_ALL_VISIBLE, there's no point in the cache, right? I am not convinced. I thought about the problem of repeatedly switching pinned VM pages during the index-only scans work, and decided that we could live with it because, if the table was large enough that we were pinning VM pages frequently, we were also avoiding I/O. Of course, this is a logical fallacy, since the table could easily be large enough to have quite a few VM pages and yet small enough to fit in RAM. And, indeed, at least in the early days, an index scan could beat out an index-only scan by a significant margin on a memory-resident table, precisely because of the added cost of the VM lookups. I haven't benchmarked lately so I don't know for sure whether that's still the case, but I bet it is. To check visibility from an index scan, you either need to pin a heap page or a VM page. Why would checking the heap page be cheaper? Is it just other code in the VM-testing path that's slower? Or is there concurrency involved in your measurements which may indicate contention over VM pages? I think this idea is worth exploring, although I fear the overhead is likely to be rather large. We could find out, though. Suppose we simply change XLOG_HEAP2_VISIBLE to emit FPIs for the heap pages; how much does that slow down vacuuming a large table into which many pages have been bulk loaded? Sadly, I bet it's rather a lot, but I'd like to be wrong. My point was that, if freezing needs to emit an FPI anyway, and we combine freezing and PD_ALL_VISIBLE, then using WAL properly wouldn't cost us anything. Whether that makes sense depends on what other combination of proposals makes it in, of course. I agree that we don't want to start adding FPIs unnecessarily. Anyway, thanks for the feedback. Moved out of this 'fest. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Fri, Jul 5, 2013 at 4:18 PM, Jeff Davis pg...@j-davis.com wrote: On Tue, 2013-07-02 at 10:12 -0700, Jeff Davis wrote: Regardless, this is at least a concrete issue that I can focus on, and I appreciate that. Are scans of small tables the primary objection to this patch, or are there others? If I solve it, will this patch make real progress? I had an idea here: What if we keep PD_ALL_VISIBLE, but make it more like other hints, and make it optional? If a page is all visible, either or both of the VM bit or PD_ALL_VISIBLE could be set (please suspend disbelief for a moment). Then, we could use a heuristic, like setting PD_ALL_VISIBLE in the first 256 pages of a relation; but in later pages, only setting it if the page is already dirty for some other reason. That has the following benefits: 1. Eliminates the worry over contention related to scans, because we wouldn't need to use the VM for small tables. And I don't think anyone was worried about using the VM on a large table scan. 2. Still avoids dirtying lots of pages after a data load. I'm not worried if a few MB of data is rewritten on a large table. 3. Eliminates the complex way in which we (ab)use WAL and the recovery mechanism to keep PD_ALL_VISIBLE and the VM bit in sync. Of course, there's a reason that PD_ALL_VISIBLE is not like a normal hint: we need to make sure that inserts/updates/deletes clear the VM bit. But my patch already addresses that by keeping the VM page pinned. I'm of the opinion that we ought to extract the parts of the patch that hold the VM pin for longer, review those separately, and if they're good and desirable, apply them. Although that optimization becomes more necessary if we were to adopt your proposal than it is now, it's really separate from this patch. Given that VM pin caching can be done with or without removing PD_ALL_VISIBLE, it seems to me that the fair comparison is between master + VM pin caching and master + VM pin caching + remove PD_ALL_VISIBLE. Comparing the latter vs. unpatched master seems to me to be confusing the issue. That has some weaknesses: as Heikki pointed out[1], you can defeat the cache of the pin by randomly seeking between 512MB regions during an update (would only be noticable if it's all in shared buffers already, of course). But even in that case, it was a fairly modest degradation (20%), so overall this seems like a fairly minor drawback. I am not convinced. I thought about the problem of repeatedly switching pinned VM pages during the index-only scans work, and decided that we could live with it because, if the table was large enough that we were pinning VM pages frequently, we were also avoiding I/O. Of course, this is a logical fallacy, since the table could easily be large enough to have quite a few VM pages and yet small enough to fit in RAM. And, indeed, at least in the early days, an index scan could beat out an index-only scan by a significant margin on a memory-resident table, precisely because of the added cost of the VM lookups. I haven't benchmarked lately so I don't know for sure whether that's still the case, but I bet it is. From your other email: I have a gut feeling that the complexity we go through to maintain PD_ALL_VISIBLE is unnecessary and will cause us problems later. If we could combine freezing and marking all-visible, and use WAL for PD_ALL_VISIBLE in a normal fashion, then I'd be content with that. I think this idea is worth exploring, although I fear the overhead is likely to be rather large. We could find out, though. Suppose we simply change XLOG_HEAP2_VISIBLE to emit FPIs for the heap pages; how much does that slow down vacuuming a large table into which many pages have been bulk loaded? Sadly, I bet it's rather a lot, but I'd like to be wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Tue, 2013-07-02 at 10:12 -0700, Jeff Davis wrote: Regardless, this is at least a concrete issue that I can focus on, and I appreciate that. Are scans of small tables the primary objection to this patch, or are there others? If I solve it, will this patch make real progress? I had an idea here: What if we keep PD_ALL_VISIBLE, but make it more like other hints, and make it optional? If a page is all visible, either or both of the VM bit or PD_ALL_VISIBLE could be set (please suspend disbelief for a moment). Then, we could use a heuristic, like setting PD_ALL_VISIBLE in the first 256 pages of a relation; but in later pages, only setting it if the page is already dirty for some other reason. That has the following benefits: 1. Eliminates the worry over contention related to scans, because we wouldn't need to use the VM for small tables. And I don't think anyone was worried about using the VM on a large table scan. 2. Still avoids dirtying lots of pages after a data load. I'm not worried if a few MB of data is rewritten on a large table. 3. Eliminates the complex way in which we (ab)use WAL and the recovery mechanism to keep PD_ALL_VISIBLE and the VM bit in sync. Of course, there's a reason that PD_ALL_VISIBLE is not like a normal hint: we need to make sure that inserts/updates/deletes clear the VM bit. But my patch already addresses that by keeping the VM page pinned. That has some weaknesses: as Heikki pointed out[1], you can defeat the cache of the pin by randomly seeking between 512MB regions during an update (would only be noticable if it's all in shared buffers already, of course). But even in that case, it was a fairly modest degradation (20%), so overall this seems like a fairly minor drawback. Thoughts? Regards, Jeff Davis [1] http://www.postgresql.org/message-id/50fd11c5.1030...@vmware.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 2013-07-01 19:52:57 -0700, Jeff Davis wrote: 2. Can you be more specific about the scenarios that you *are* concerned about? Preferably in a form that could be tested on a 64-core box; but at least some kind of analysis involving numbers. More page accesses is scary, but completely meaningless without saying how *many* more and in which situations. Ok, so you want some benchmark results. I spent 20 minutes concocting some quick tests. Here you go: master (384f933046dc9e9a2b416f5f7b3be30b93587c63): tps = 155075.448341 (including connections establishing) tps = 155259.752267 (excluding connections establishing) dev (384f933046dc9e9a2b416f5f7b3be30b93587c63 + patch): tps = 151450.387021 (including connections establishing) tps = 152512.741161 (excluding connections establishing) That's about a 3% regression. Testsetup: ~/build/postgres/{master,dev}-optimize/vpath/src/backend/postgres \ -D /srv/dev/pdav-{master,dev}/ \ -c shared_buffers=1GB -c max_connections=150 Data loaded: load.sql. Test run: SELECT key, data FROM kv WHERE key = 'key-17'; Test: pgbench postgres -n -S -M prepared -f /tmp/query.sql -T 100 -c 100 -j 100 So basically we're just scanning a smalling relation that's all visible rather frequently. With lookup tables - that might even be accessed in a correlated subquery - that's not a unrealistic scenario. I am pretty sure it's not all that hard to create a test where the loss is bigger due to the loss of all_visible on small relations (the SCAN_VM_THRESHOLD thing). I am not sure whether that's big enough to say the idea of SCAN_VM_THRESHOLD is dead, but it's not small enough to dismiss either. So, running the same test with 'kv' having 36 pages/5500 tuples instead of just 1 page/100 tuples: master: tps = 171260.444722 (including connections establishing) tps = 173335.031802 (excluding connections establishing) dev: tps = 170809.702066 (including connections establishing) tps = 171730.291712 (excluding connections establishing) ~1% With SELECT count(*) FROM kv; master: tps = 13740.652186 (including connections establishing) tps = 13779.507250 (excluding connections establishing) dev: tps = 13409.639239 (including connections establishing) tps = 13466.905051 (excluding connections establishing) ~2%. All that isn't a too big regression, but it shows that this isn't free lunch either. Would be interesting to see whether that shows up worse on bigger hardware than mine (2 x E5520). Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 2013-07-02 14:02:22 +0200, Andres Freund wrote: Data loaded: load.sql. Attached... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services DROP TABLE IF EXISTS kv; CREATE TABLE kv( id serial primary key, key text NOT NULL UNIQUE, data text NOT NULL ); INSERT INTO kv(key, data) SELECT 'key-'||g.i::text, 'abcdefg' FROM generate_series(1, 100) g(i); VACUUM (ANALYZE, VERBOSE, FREEZE) kv; VACUUM (ANALYZE, VERBOSE) kv; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Tue, 2013-07-02 at 14:02 +0200, Andres Freund wrote: Ok, so you want some benchmark results. I spent 20 minutes concocting some quick tests. Here you go: master (384f933046dc9e9a2b416f5f7b3be30b93587c63): tps = 155075.448341 (including connections establishing) tps = 155259.752267 (excluding connections establishing) dev (384f933046dc9e9a2b416f5f7b3be30b93587c63 + patch): tps = 151450.387021 (including connections establishing) tps = 152512.741161 (excluding connections establishing) That's about a 3% regression. I had a little trouble reproducing this result on my workstation, and my previous tests on the 64-core box didn't seem to show a difference (although I didn't spend a lot of time on it, so perhaps I could try again). I did see some kind of difference, I think. But the fastest run without the patch beat the slowest run with the patch by about 1.4%. The delta generally seemed closer to 0.5%. The noise seemed to be around 2.6%. Why did you do this as a concurrent test? The difference between reading hints and PD_ALL_VISIBLE doesn't seem to have much to do with concurrency. Regardless, this is at least a concrete issue that I can focus on, and I appreciate that. Are scans of small tables the primary objection to this patch, or are there others? If I solve it, will this patch make real progress? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 2013-07-02 10:12:31 -0700, Jeff Davis wrote: On Tue, 2013-07-02 at 14:02 +0200, Andres Freund wrote: Ok, so you want some benchmark results. I spent 20 minutes concocting some quick tests. Here you go: master (384f933046dc9e9a2b416f5f7b3be30b93587c63): tps = 155075.448341 (including connections establishing) tps = 155259.752267 (excluding connections establishing) dev (384f933046dc9e9a2b416f5f7b3be30b93587c63 + patch): tps = 151450.387021 (including connections establishing) tps = 152512.741161 (excluding connections establishing) That's about a 3% regression. I had a little trouble reproducing this result on my workstation, and my previous tests on the 64-core box didn't seem to show a difference (although I didn't spend a lot of time on it, so perhaps I could try again). I did see some kind of difference, I think. But the fastest run without the patch beat the slowest run with the patch by about 1.4%. The delta generally seemed closer to 0.5%. The noise seemed to be around 2.6%. It was more reliably reproduceable here, I ran every test 5 times and the differences between runs weren't big. But I wouldn't be surprised if it's dependent on the exact CPU. Why did you do this as a concurrent test? The difference between reading hints and PD_ALL_VISIBLE doesn't seem to have much to do with concurrency. Primarily because I didn't spend too much time on it and just wanted to get a feeling for things. I originally wanted to check how much the additional pin/buffer reference on a small table (i.e. ~33+ pages) is noticeable, but noticed this before. Also, cache issues are often easier to notice in concurrent scenarios where the CPUs are overcommitted since increased cache usage shows up more prominently and intelligence on the cpu level can save less. Regardless, this is at least a concrete issue that I can focus on, and I appreciate that. Are scans of small tables the primary objection to this patch, or are there others? If I solve it, will this patch make real progress? Well, I still have my doubts that it's a good idea to remove this knowledge from the page level. And that's not primarily related to performance. Unfortunately a good part of judging architectural questions is gut feeling... The only real argument for the removal I see - removal of one cycle of dirtying - wouldn't really weigh much if we would combine it with freezing (which we can do if Robert's forensic freezing patch makes it in). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Tue, 2013-07-02 at 19:34 +0200, Andres Freund wrote: Well, I still have my doubts that it's a good idea to remove this knowledge from the page level. And that's not primarily related to performance. Unfortunately a good part of judging architectural questions is gut feeling... The only real argument for the removal I see - removal of one cycle of dirtying - wouldn't really weigh much if we would combine it with freezing (which we can do if Robert's forensic freezing patch makes it in). I'll have to take a closer look at the relationship between Robert's and Heikki's proposals. I have a gut feeling that the complexity we go through to maintain PD_ALL_VISIBLE is unnecessary and will cause us problems later. If we could combine freezing and marking all-visible, and use WAL for PD_ALL_VISIBLE in a normal fashion, then I'd be content with that. Even better would be if we could eliminate the freeze I/O with Heikki's proposal, and eliminate the PD_ALL_VISIBLE I/O with my patch. But, that's easier said than done. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Sun, 2013-06-30 at 22:58 -0400, Robert Haas wrote: I thought that Jeff withdrew this patch. No -- was there a reason you thought that? I know it could use another round of testing before commit, and there may be a couple other things to clear up. But I don't want to invest a lot of time there right now, because, as I understand it, you still object to the patch anyway. I am still not entirely clear on the objections to this patch: 1. Contention was a concern, but I believe I have mitigated it. Strictly speaking, additional pins may be acquired, but the cost of those pin operations will be spread over a lot of other work. 2. There are quite a few different ideas about where we're going with PD_ALL_VISIBLE and freezing, but it seems like removing PD_ALL_VISIBLE is potentially compatible with most of them. Any others? The patch reduces code complexity and reduces writes during a data load. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, Jul 1, 2013 at 1:21 PM, Jeff Davis pg...@j-davis.com wrote: On Sun, 2013-06-30 at 22:58 -0400, Robert Haas wrote: I thought that Jeff withdrew this patch. No -- was there a reason you thought that? I thought I remembered you saying you were going to abandon it in the face of objections. I know it could use another round of testing before commit, and there may be a couple other things to clear up. But I don't want to invest a lot of time there right now, because, as I understand it, you still object to the patch anyway. I am still not entirely clear on the objections to this patch: 1. Contention was a concern, but I believe I have mitigated it. Strictly speaking, additional pins may be acquired, but the cost of those pin operations will be spread over a lot of other work. 2. There are quite a few different ideas about where we're going with PD_ALL_VISIBLE and freezing, but it seems like removing PD_ALL_VISIBLE is potentially compatible with most of them. Any others? The patch reduces code complexity and reduces writes during a data load. Well, I don't believe there's any way to really eliminate the contention concern completely. There's no way around the fact that it means more access to the visibility map, and I've seen recent (albeit circumstantial thus far) evidence that that can be a real problem. The buffer mapping locks are a problem, too, so anything that means more page accesses can't be taken lightly. I agree your proposed changes reduce the chances of problems; I don't agree that they eliminate them. The other concern I remember being expressed (and not just by me, but by a number of people) is that your patch turns loss of a visibility map bit into a data corruption scenario, which it currently isn't. Right now, if your visibility map gets corrupted, you can always recover by deleting it. Under your proposal that would no longer be possible. I think that's sufficient grounds to reject the patch by itself, even if there were no other issues. If that doesn't strike you as very dangerous, I'm baffled as to why not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, 2013-07-01 at 16:05 -0400, Robert Haas wrote: The other concern I remember being expressed (and not just by me, but by a number of people) is that your patch turns loss of a visibility map bit into a data corruption scenario, which it currently isn't. Right now, if your visibility map gets corrupted, you can always recover by deleting it. Under your proposal that would no longer be possible. I think that's sufficient grounds to reject the patch by itself, even if there were no other issues. If that doesn't strike you as very dangerous, I'm baffled as to why not. Can you point me to that criticism? Why can't you just drop the VM completely if it becomes corrupted? (You might be referring to another idea of mine that was related to Andres's proposal for getting rid of freezing.) Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, Jul 1, 2013 at 8:23 PM, Jeff Davis pg...@j-davis.com wrote: Can you point me to that criticism? Why can't you just drop the VM completely if it becomes corrupted? (You might be referring to another idea of mine that was related to Andres's proposal for getting rid of freezing.) One of several relevant emails is at: http://www.postgresql.org/message-id/51a7473c.6070...@vmware.com It is definitely possible that I am mixing up two different things. But if I am, I don't know what the other one is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, 2013-07-01 at 20:59 -0400, Robert Haas wrote: One of several relevant emails is at: http://www.postgresql.org/message-id/51a7473c.6070...@vmware.com It is definitely possible that I am mixing up two different things. But if I am, I don't know what the other one is. I believe you are mixing up two different things. The patch in the commitfest now doesn't cause that problem at all. The thread above is about one proposal in which Andres basically suggested treating all visible as frozen. I threw out the idea that my proposal was not necessarily in conflict with that one, although others pointed out some problems with combining them. However, that only matters if Andres's proposal is going to actually make it in. Heikki also made a very interesting proposal related to freezing here: http://www.postgresql.org/message-id/51a7553e.5070...@vmware.com and that seems compatible with my proposal (which is one of the advantages you list). So, if you object because we're moving toward another incompatible proposal that is more desirable, then I understand that. It can be a bit frustrating to me though if my proposal is rejected because one of several proposals is in conflict. (Not that it's necessarily wrong to do so, but I'm sure you can see how that is frustrating.) I'll see if I can help out with Heikki's patch. If it starts to look like it's going to make it, will you drop this particular objection? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Mon, 2013-07-01 at 16:05 -0400, Robert Haas wrote: Well, I don't believe there's any way to really eliminate the contention concern completely. There's no way around the fact that it means more access to the visibility map, and I've seen recent (albeit circumstantial thus far) evidence that that can be a real problem. The buffer mapping locks are a problem, too, so anything that means more page accesses can't be taken lightly. I agree your proposed changes reduce the chances of problems; I don't agree that they eliminate them. If you have a 1000-page table that is being accessed concurrently, that requires 1000 pins. My patch would make that 1001, which doesn't sound very scary to me. 1. Do you agree that concurrent access to 1000-page tables is not a problem with the design of my patch? 2. Can you be more specific about the scenarios that you *are* concerned about? Preferably in a form that could be tested on a 64-core box; but at least some kind of analysis involving numbers. More page accesses is scary, but completely meaningless without saying how *many* more and in which situations. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Sat, Jun 29, 2013 at 11:24 AM, Robins rob...@pobox.com wrote: On 10 June 2013 00:17, Jeff Davis pg...@j-davis.com wrote: On Thu, 2013-05-30 at 10:07 -0700, Jeff Davis wrote: Come to think of it, even without the torn page checksum issue, do we really want to actively clear the all-visible flags after upgrade? Removed that from the patch and rebased. I think the best approach is to remove the bit opportunistically when we're already dirtying the page for something else. However, right now, there is enough skepticism of the general approach in this patch (and enough related proposals) that I'll leave this to be resolved if and when there is more agreement that my approach is a good one. Did some basic checks on this patch. List-wise feedback below. - Cleanly applies to Git-Head: Yes (Some offsets, but thats probably because of delay in review) - Documentation Updated: No. (Required?) - Tests Updated: No. (Required?) - All tests pass: Yes - Does it Work : ??? - Any visible issues: No - Any compiler warnings: No - Others: Number of uncovered lines: Reduced by 167 lines I thought that Jeff withdrew this patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
I thought that Jeff withdrew this patch. He did, but nobody removed it from the commitfest --- partly because of a change of subject line breaking the thread. Bounced to returned with feedback now. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 10 June 2013 00:17, Jeff Davis pg...@j-davis.com wrote: On Thu, 2013-05-30 at 10:07 -0700, Jeff Davis wrote: Come to think of it, even without the torn page checksum issue, do we really want to actively clear the all-visible flags after upgrade? Removed that from the patch and rebased. I think the best approach is to remove the bit opportunistically when we're already dirtying the page for something else. However, right now, there is enough skepticism of the general approach in this patch (and enough related proposals) that I'll leave this to be resolved if and when there is more agreement that my approach is a good one. Did some basic checks on this patch. List-wise feedback below. - Cleanly applies to Git-Head: Yes (Some offsets, but thats probably because of delay in review) - Documentation Updated: No. (Required?) - Tests Updated: No. (Required?) - All tests pass: Yes - Does it Work : ??? - Any visible issues: No - Any compiler warnings: No - Others: Number of uncovered lines: Reduced by 167 lines Robins Tharakan
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 30.05.2013 06:54, Jeff Davis wrote: Continuation of: http://www.postgresql.org/message-id/1353551097.11440.128.camel@sussancws0025 Rebased patch attached; no other changes. @@ -675,6 +675,16 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, } /* +* If this page is left over from an upgraded system, it may have a +* PD_ALL_VISIBLE bit set (which is deprecated). If so, clear it. +*/ + if (PageIsAllVisible(page)) + { + PageClearAllVisible(page); + MarkBufferDirty(buf); + } + + /* * Prune all HOT-update chains in this page. * * We count tuples removed by the pruning step as removed by VACUUM. That could cause a torn page and checksum failure if checksums are enabled. Actually, I think the later PageClearAllVisible() call later in the function has the same problem, even without this patch. Instead of adding a new vmbuffer argument to heap_insert() and friends, could we put that into BulkInsertStateData? The new argument is similar to the current bulk-insert state in spirit. That would simplify the callers and make the heapam API cleaner. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On 30.05.2013 11:26, Heikki Linnakangas wrote: On 30.05.2013 06:54, Jeff Davis wrote: Continuation of: http://www.postgresql.org/message-id/1353551097.11440.128.camel@sussancws0025 Rebased patch attached; no other changes. @@ -675,6 +675,16 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, } /* + * If this page is left over from an upgraded system, it may have a + * PD_ALL_VISIBLE bit set (which is deprecated). If so, clear it. + */ + if (PageIsAllVisible(page)) + { + PageClearAllVisible(page); + MarkBufferDirty(buf); + } + + /* * Prune all HOT-update chains in this page. * * We count tuples removed by the pruning step as removed by VACUUM. That could cause a torn page and checksum failure if checksums are enabled. Come to think of it, even without the torn page checksum issue, do we really want to actively clear the all-visible flags after upgrade? For tables that haven't been changed, and thus have the all-visible bits set, that amounts to a complete rewrite on the first vacuum after upgrade. That's expensive. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating PD_ALL_VISIBLE, take 2
On Thu, 2013-05-30 at 11:32 +0300, Heikki Linnakangas wrote: That could cause a torn page and checksum failure if checksums are enabled. Thank you, I missed that in the rebase; it should be MarkBufferDirtyHint(). Come to think of it, even without the torn page checksum issue, do we really want to actively clear the all-visible flags after upgrade? For tables that haven't been changed, and thus have the all-visible bits set, that amounts to a complete rewrite on the first vacuum after upgrade. That's expensive. I expected that question and intended to raise it for discussion when and if we get past performance concerns (I believe Robert is still not convinced that the patch is viable). We have a few options: We can ignore the bit entirely, or we can aggressively unset it, or we can opportunistically unset it if the page is already dirty. I don't think we're in a hurry to reuse that bit for something else, so maybe it's best to just ignore it entirely. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers