Re: [HACKERS] Freeze avoidance of very large table.

Jim Nasby Mon, 06 Apr 2015 10:09:06 -0700

On 4/6/15 11:12 AM, Sawada Masahiko wrote:

On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:

On 4/6/15 1:46 AM, Sawada Masahiko wrote:


On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:


On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby <jim.na...@bluetreble.com>
wrote:



On 4/3/15 12:59 AM, Sawada Masahiko wrote:



+                               case HEAPTUPLE_LIVE:
+                               case HEAPTUPLE_RECENTLY_DEAD:
+                               case HEAPTUPLE_INSERT_IN_PROGRESS:
+                               case HEAPTUPLE_DELETE_IN_PROGRESS:
+                                       if
(heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+
mxactcutoff, &frozen[nfrozen]))
+
frozen[nfrozen++].offset
= offnum;
+                                       break;




This doesn't seem safe enough to me. Can't there be tuples that are
still
new enough that they can't be frozen, and are still live?




Yep.  I've set a table to read only while it contained unfreezable
tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.


Somewhat related... instead of forcing the freeze to happen
synchronously,
can't we set this up so a table is in one of three states? Read/Write,
Read
Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
the
appropriate state, and all the vacuum infrastructure would continue to
process those tables as it does today. lazy_vacuum_rel would become
responsible for tracking if there were any non-frozen tuples if it was
also
attempting a freeze. If it discovered there were none, AND the table was
marked as ReadOnly, then it would change the table state to Frozen and
set
relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
AT_SetReadWrite could change relfrozenxid to it's own Xid as an
optimization. Doing it that way leaves all the complicated vacuum code
in
one place, and would eliminate concerns about race conditions with still
running transactions, etc.




+1 here as well.  I might want to set tables to read only for reasons
other
than to avoid repeated freezing.  (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to
automatically
trigger a vacuum freeze in the process.


Thank you for comments.

Somewhat related... instead of forcing the freeze to happen
synchronously, can't we set this up so a table is in one of three states?
Read/Write, Read Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would
simply change to > the appropriate state, and all the vacuum infrastructure
would continue to process those tables as it does today. lazy_vacuum_rel
would become responsible for tracking if there were any non-frozen tuples if
it was also attempting > a freeze. If it discovered there were none, AND the
table was marked as ReadOnly, then it would change the table state to Frozen
and set relfrozenxid = InvalidTransactionId and relminxid =
InvalidMultiXactId. AT_SetReadWrite > could change relfrozenxid to it's own
Xid as an optimization. Doing it that way leaves all the complicated vacuum
code in one place, and would eliminate concerns about race conditions with
still running transactions, etc.



I agree with 3 status, Read/Write, ReadOnly and Frozen.
But I'm not sure when we should do to freeze tuples, e.g., scan whole
tables.
I think that the any changes to table are completely
ignored/restricted if table is marked as ReadOnly table,
and it's accompanied by freezing tuples, just mark as ReadOnly.
Frozen table ensures that all tuples of its table completely has been
frozen, so it also needs to scan whole table as well.
e.g., we should need to scan whole table at two times. right?



No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on the
table.

The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not change
during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table

That does not necessitate 2 scans.


I understood this comcept, and have question as I wrote below.

+1 here as well.  I might want to set tables to read only for reasons
other than to avoid repeated freezing.  (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to automatically
trigger a
vacuum freeze in the process.

There is another possibility here, too. We can completely divorce a
ReadOnly mode (which I think is useful for other things besides freezing)
from the question of whether we need to force-freeze a relation if we create
a
FrozenMap, similar to the visibility map. This has the added advantage of
helping freeze scans on relations that are not ReadOnly in the case of
tables that are insert-mostly or any other pattern where most pages stay
all-frozen.
Prior to the visibility map this would have been a rather daunting
project, but I believe this could piggyback on the VM code rather nicely.
Anytime you clear the VM you clearly must clear the FrozenMap as well. The
logic for
setting the FM is clearly different, but that would be entirely
self-contained to vacuum. Unlike the VM, I don't see any point to marking
special bits in the page itself for FM.



I was thinking this idea (FM) to avoid freezing all tuples actually.
As you said, it might not be good idea (or overkill) that the reason
why settings table to read only is avoidance repeated freezing.
I'm attempting to try design FM to avoid freezing relations as well.
Is it enough that each bit of FM has information that corresponding
pages are completely frozen on each bit?



If I'm understanding your implied question correctly, I don't think there
would actually be any relationship between FM and marking ReadOnly. It would
come into play if we wanted to do the Frozen state, but if we have the FM,
marking an entire relation as Frozen becomes a lot less useful. What's going
to happen with a VACUUM FREEZE once we have FM is that vacuum will be able
to skip reading pages if they are all-visible *and* the FM shows them as
frozen, whereas today we can't use the VM to skip pages if scan_all is true.

For simplicity, I would start out with each FM bit representing a single
page. That means the FM would be very similar in operation to the VM; the
only difference would be when a bit in the FM was set. I would absolutely
split this into 2 patches as well; one for ReadOnly (and skip the Frozen
status for now), and one for FM.
When I looked at the VM code briefly it occurred to me that it might be
quite difficult to have 1 FM bit represent multiple pages. The issue is the
locking necessary between VACUUM and clearing a FM bit. In the VM that's
handled by the cleanup lock, but that will only work at a page level. We'd
need something to ensure that nothing came in and performed DML while the
vacuum code was getting ready to set a FM bit. There's probably several ways
this could be accomplished, but I think it would be foolish to try and do
anything about it in the initial patch. Especially because it's only
supposition that there would be much benefit to having multiple pages per
bit.


Yes, I will separate the patch into two patches.

I'd like to confirm about whether what I'm thinking is correct here.
In first version of patch, each FM bit represent a single page is
imply whether the all tuple of the page completely has been frozen, it
would be one patch.


Yes.

The second patch adds 3 states and read-only table which disable to


Actually, I would start simply with ReadOnly and ReadWrite.

As I understand it, the goal here is to prevent huge amounts of periodicfreeze work due to XID wraparound. I don't think we need the Freezestate to accomplish that.

With a single bit per page in the Frozen Map, checking a 800GB tablewould require reading a mere 100MB of FM. That's pretty tiny, andlargely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but Isuggest you leave that for a 3rd patch. I think you'll find that justgetting the first 2 accomplished will be a significant amount of work.

Also, note that you don't really even need the ReadOnly patch. As longas you're not actually touching the table at all the FM will eventuallyread as everything is frozen; that gets you 80% of the way there. So I'dsuggest starting with the FM, then doing ReadOnly, and only thenattempting to add the Frozen state.

any write to table. The trigger which changes state from Read/Write to
Read-Only is ALTER TABLE SET READ ONLY. And the trigger changes from
Read-Only to Frozen is vacuum only when the table has been marked as
Read-Only at vacuum is started *and* the vacuum did not any freeze
tuple(including skip the page refer to FM). If we support FM, we would
be able to avoid repeated freezing whole table even if the table has
not been marked as Read-Only.

In order to change state to Frozen, we need to do VACUUM FREEZE or
wait for running of auto vacuum. Generally, the threshold of cutoff
xid is different between VACUUM (and autovacuum) and VACUUM FREEZE. We
would not expect to change status using by explicit vacuum and
autovacuum. Inevitably, we would need to do both command ALTER TABLE
SET READ ONLY and VACUUM FREEZE to change state to Frozen.
I think that we should also add DDL which does both freezing tuple and
changing state in one pass, like ALTER TABLE SET READ ONLY WITH FREEZE
or ALTER TABLE SET FROZEN.

Regards,

-------
Sawada Masahiko



--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Freeze avoidance of very large table.

Reply via email to