Hello, Antonin!

Some comments for 0006:

> SnapBuildSnapshotForRepack(SnapBuild *builder)
Does it also "replays" previously processed WAL to the position that
snapshot is ready to use?
I am afraid we may see some non-yet processed parts of WAL leading to
duplicate insertion.

> first_block
What is the reason for that variable? It seems to be always the first block
of relation.
Also, what if we have a huge amount of empty space at the start. In that
case the first block will be the block of the first "filled" page. But
insert may (and will) fill empty pages before first_block - out of the
range.
So, I think we should delete it and always use 0 instead.

> DecodeMultiInsert
ItemPointerSet-related logic seems missing.

> tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
With SnapshotAny we are going to check the same tuple multiple times.
Better to let scan logic handle it (and change snapshots used by scan code).
See [0] for a way to reset snapshots during the scan.

> if (blkno >= range_end)
I don't think it is legal to switch a snapshot while holding the tuple.
Nothing is protecting it from being pruned\reused.
Snapshots need to be switched "between" pages. You may check how it is done
at [0].

> PopActiveSnapshot();
> InvalidateCatalogSnapshot();
I think it is a good idea to add here assert for MyProc->xmin and
MyProc->xid to be invalid. To ensure we really allow the horizon to advance.

> /* Set to request a snapshot. */
> bool snapshot_requested;
We know the end or region in advance, so it should be possible to filter
before writing changes to file.
So, it is some kind of "this is the end of region, create new file and
store everything before + create new snapshot for me".

> PopActiveSnapshot();
Sometimes without InvalidateCatalogSnapshot().

> PushActiveSnapshot(GetTransactionSnapshot());
GetLatestSnapshot() feels better here.

> * The individual builds might still be a problem, but that's a
> * separate issue.
Opening the index may create a catalog snapshot, so it needs to be
invalidated after.

> * TODO Can we somehow use the fact that the new heap is not yet
> * visible to other transaction, and thus cannot be vacuumed?
Snapshot resetting [0] may work here (without CIC, just as part of the scan
+ some code to ensure catalog snapshot is managed correctly).
Also, to correctly build a unique index - some tech from [0] is required
(building a unique index with multiple snapshots is a little bit tricky).
Or we may implement some super lightweight way - just SnapshotAny without
any visibility checks (just assume everything is ok since it copied from
another relation with the same index set).

> This approach introduces one limitation though: if the USING INDEX clause
is
> specified, an explicit sort is always used. Index scan wouldn't work
because
> it does not return the tuples sorted by CTID.

Technically we may just use keys (if they are comparable) as a way to
specify regions. Instead of number of pages to switch snapshot - number of
tuples or time.
But because we don't know the region end in advance - we have to keep all
the changes in file and filter only while applying.

> Assert(XLogRecPtrIsInvalid(shared->lsn_upto));
>  /* Initially we're expected to provide a snapshot and only that. */
>  Assert(shared->snapshot_requested &&
>    XLogRecPtrIsInvalid(shared->lsn_upto));

XLogRecPtrIsInvalid(shared->lsn_upto) assertion is duplicated here.

> range_end = repack_blocks_per_snapshot;
Should be repack_blocks_per_snapshot + ctx->first_block ? (but better to
remove the first_block at all).

> * XXX It might be worth Assert(CatalogSnapshot == NULL)
> * here, however that symbol is not external.
As said above - better assert for MyProc->xmin/xid + add
InvalidateCatalogSnapshot.

> extern Snapshot
> extern void

Some "externs" used in C files (not headers).

> ctx->block_ranges = NIL;
may be used with list_free?

>  * Remember here to which pages should applied to changes recorded in
given
>  * file.
"should apply"

>  of pages, so that VACUUM does not get block for too long
"blocked"

> there are no restriction on block
"there is" or "restrictions"

> repack_add_block_range
extra new-line after function.

> Is REPACKED (CONCURRENTLY) is being run by this backend?
REPACK and double "is"

[0]: https://commitfest.postgresql.org/patch/6401/

Reply via email to