On Wed, Apr 22, 2015 at 12:39 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> The thing that made me nervous about that approach is that it made the LSN
> of each page critical information. If you somehow zeroed out the LSN, you
> could no longer tell which pages are frozen and which are not. I'm sure it
> could be made to work - and I got it working to some degree anyway - but
> it's a bit scary. It's similar to the multixid changes in 9.3: multixids
> also used to be data that you can just zap at restart, and when we changed
> the rules so that you lose data if you lose multixids, we got trouble. Now,
> LSNs are much simpler, and there wouldn't be anything like the
> multioffset/member SLRUs that you'd have to keep around forever or vacuum,
> but still..

LSNs are already pretty critical.  If they're in the future, you can't
flush those pages.  Ever.  And if they're wrong in either direction,
crash recovery is broken.  But it's still worth thinking about ways
that we could make this more robust.

I keep coming back to the idea of treating any page that is marked as
all-visible as frozen, and deferring freezing until the page is again
modified.  The big downside of this is that if the page is set as
all-visible and then immediately thereafter modified, it sucks to have
to freeze when the XIDs in the page are still present in CLOG.  But if
we could determine from the LSN that the XIDs in the page are new
enough to still be considered valid, then we could skip freezing in
those cases and only do it when the page is "old".  That way, if
somebody zeroed out the LSN (why, oh why?) the worst that would happen
is that we'd do some extra freezing when the page was next modified.

> I would feel safer if we added a completely new "epoch" counter to the page
> header, instead of reusing LSNs. But as we all know, changing the page
> format is a problem for in-place upgrade, and takes some space too.

Yeah.  We have a serious need to reduce the size of our on-disk
format.  On a TPC-C-like workload Jan Wieck recently tested, our data
set was 34% larger than another database at the beginning of the test,
and 80% larger by the end of the test.  And we did twice the disk
writes.  See "The Elephants in the Room.pdf" at
https://sites.google.com/site/robertmhaas/presentations

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

