[HACKERS] Difficulty in partial writable TOAST value

KaiGai Kohei Tue, 25 Aug 2009 20:57:38 -0700

I've been considering how to implement access controls on the largeobject
feature since the first commit fest. One suggestion was to add a partial
read/write interface to TOAST values (which would be a new facility), and
to make the existing largeobject feature a simple wrapper around that
interface.
However, this approach has a few hard problems to be resolved.


* Visibility of TOAST chunks

The toast_fetch_datum(), which is called to detoast an externally stored
datum, scans the target toast relation with SnapshotToast. It performs no
visibility checks except those needed for vacuuming, because the visibility
of toast chunks is controlled by the visibility of the toast pointer on the
referencing side. In other words, all the corresponding toast chunks are
visible as long as their toast pointer is visible to the transaction.

When we update a value which can be toasted, the toast mechanism inserts
the new value with a different chunk_id (in separate pages if necessary),
and deletes the old chunks. The returned toast pointer contains the new
chunk_id, so the new chunks are visible to the transactions which can see
the new toast pointer stored on the referencing side.
If you are familiar with the Linux operating system, you will notice that
this algorithm is similar to the RCU mechanism.
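
To make the behavior concrete, here is a toy model in C. It is not
PostgreSQL code: every name in it (ToyChunk, toy_save, toy_fetch,
toy_update) is made up for illustration, and the chunk size is tiny on
purpose. It only sketches the idea that a fetch follows the pointer
without checking chunk visibility, and that an update writes the whole
value again under a new chunk_id.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_CHUNKS 64
#define TOY_CHUNK_SIZE 4        /* tiny on purpose; real chunks are far larger */

typedef struct {
    unsigned chunk_id;          /* shared by all chunks of one value */
    int      chunk_seq;         /* position of the chunk in the value */
    int      len;               /* bytes used in data[] */
    int      dead;              /* marked deleted, not yet vacuumed */
    char     data[TOY_CHUNK_SIZE];
} ToyChunk;

static ToyChunk toy_rel[TOY_MAX_CHUNKS];
static int      toy_nchunks = 0;
static unsigned toy_next_id = 1;

/* Store a value under a brand-new chunk_id; the id plays the toast pointer. */
static unsigned toy_save(const char *val, int len)
{
    unsigned id = toy_next_id++;
    int seq = 0;

    for (int off = 0; off < len; off += TOY_CHUNK_SIZE) {
        ToyChunk *c;

        assert(toy_nchunks < TOY_MAX_CHUNKS);
        c = &toy_rel[toy_nchunks++];
        c->chunk_id = id;
        c->chunk_seq = seq++;
        c->len = (len - off < TOY_CHUNK_SIZE) ? len - off : TOY_CHUNK_SIZE;
        c->dead = 0;
        memcpy(c->data, val + off, (size_t) c->len);
    }
    return id;
}

/* Fetch by pointer. It never looks at "dead": no per-chunk visibility check. */
static int toy_fetch(unsigned id, char *out, int outsize)
{
    int n = 0;

    for (int i = 0; i < toy_nchunks; i++) {
        if (toy_rel[i].chunk_id != id)
            continue;
        assert(n + toy_rel[i].len < outsize);
        memcpy(out + n, toy_rel[i].data, (size_t) toy_rel[i].len);
        n += toy_rel[i].len;
    }
    out[n] = '\0';
    return n;
}

/* Update: write every chunk again under a new id, then mark the old ones dead. */
static unsigned toy_update(unsigned old_id, const char *val, int len)
{
    unsigned new_id = toy_save(val, len);

    for (int i = 0; i < toy_nchunks; i++)
        if (toy_rel[i].chunk_id == old_id)
            toy_rel[i].dead = 1;
    return new_id;
}
```

In this model a reader which still holds the old id keeps getting the old
value, while a reader holding the new id gets the new one. Note also that
changing a single byte of a 10-byte value still writes three new chunks.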

The largeobject interfaces allow us to read/write a part of very large
data, so it is not acceptable to replace all the unchanged chunks whenever
we update a byte of a largeobject. (Please consider a situation where a
partial write of 100 bytes replaces 10000 chunks.)
This makes it hard to implement the largeobject interfaces as a wrapper
around partially readable/writable toast values.
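
The cost difference can be worked out with a little arithmetic. The
helpers below are only a sketch; chunks_total and chunks_touched are
names I made up, and the chunk size of 2048 bytes is an assumption
(it matches LOBLKSIZE, which is BLCKSZ / 4, with the default 8kB block
size).

```c
#include <assert.h>

/* Number of chunks needed to hold "size" bytes. */
static long chunks_total(long size, long chunk_size)
{
    assert(size >= 0 && chunk_size > 0);
    return (size + chunk_size - 1) / chunk_size;
}

/* Number of chunks overlapped by a write of "len" bytes at "offset". */
static long chunks_touched(long offset, long len, long chunk_size)
{
    long first;
    long last;

    assert(offset >= 0 && len > 0 && chunk_size > 0);
    first = offset / chunk_size;
    last = (offset + len - 1) / chunk_size;
    return last - first + 1;
}
```

Under these assumptions a 20480000-byte object occupies 10000 chunks,
and a 100-byte write overlaps one chunk, or two if it crosses a chunk
boundary. Replacing the whole value on every write would rewrite all
10000 of them.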

So, it seems to me that we first need to decide whether or not to change
the way the visibility of a datum is handled.
If a toast value has its own visibility, the caller of pg_detoast_datum()
must supply a proper snapshot to be used in scanning the toast relation.
However, this function is called from routines all over the code (and most
of them don't have enough information about which snapshot should be used),
so the impact of such a change would be very large.

At this moment, I don't think it is reasonable to rework the whole toast
mechanism just to implement access control features for largeobjects.
If you have any good ideas, please let me know.


* The way to implement access controls in largeobject.

The current pg_largeobject is used to store the data chunks of largeobjects.
Because a largeobject consists of multiple separate data chunks, this
catalog is not suitable for storing the metadata (ownership, ACLs, ...) of
a largeobject.
So, I'd like to change the definition of pg_largeobject as follows:

  #define LargeObjectRelationId  2613

  CATALOG(pg_largeobject,2613)
  {
      Oid       loowner;      /* OID of the owner */
      Oid       lochunk;      /* OID of the chunk on TOAST relation */
      aclitem   loacl[1];     /* access permissions */
  } FormData_pg_largeobject;

I plan to put all the data chunks in the toast relation of the
pg_largeobject system catalog, but the data chunks associated with
a given largeobject will not be accessed via the existing toast mechanism.
In other words, largeobject uses its toast relation just as a relation
to store its data chunks.
(Note that the current pg_largeobject has a definition identical to that
of a toast relation.)
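
As a rough illustration of the split between the metadata row and the
data chunks, here is a toy sketch in C. It is not PostgreSQL code and
nothing in it is an existing API: ToyLoMeta, ToyLoChunks and
toy_lo_write are hypothetical names, the chunk size is tiny on purpose,
and the permission check is reduced to an owner comparison (the ACL
array is left out).

```c
#include <assert.h>
#include <string.h>

#define LO_CHUNK_SIZE 8         /* tiny on purpose */
#define LO_MAX_CHUNKS 16

typedef unsigned int ToyOid;

/* Metadata row: one per largeobject (ACL array omitted in this toy). */
typedef struct {
    ToyOid loowner;             /* owner, checked before any write */
    ToyOid lochunk;             /* key of the data chunks */
} ToyLoMeta;

/* Data chunks of one largeobject, indexed by chunk sequence number. */
typedef struct {
    ToyOid chunk_id;
    char   data[LO_MAX_CHUNKS][LO_CHUNK_SIZE];
} ToyLoChunks;

/*
 * Write "len" bytes at "offset", touching only the chunks which overlap
 * the range. Returns the number of chunks modified, or -1 if the caller
 * is not the owner.
 */
static int toy_lo_write(const ToyLoMeta *meta, ToyLoChunks *chunks,
                        ToyOid caller, int offset, const char *buf, int len)
{
    int touched = 0;

    assert(meta->lochunk == chunks->chunk_id);
    assert(offset >= 0 && len > 0);
    assert(offset + len <= LO_MAX_CHUNKS * LO_CHUNK_SIZE);

    if (caller != meta->loowner)
        return -1;              /* the check is made on the metadata row */

    while (len > 0) {
        int seq = offset / LO_CHUNK_SIZE;
        int pos = offset % LO_CHUNK_SIZE;
        int n = LO_CHUNK_SIZE - pos;

        if (n > len)
            n = len;
        memcpy(&chunks->data[seq][pos], buf, (size_t) n);
        buf += n;
        offset += n;
        len -= n;
        touched++;
    }
    return touched;
}
```

The point of the sketch is that the permission check needs only the
single metadata row, while a partial write modifies only the chunks it
overlaps and leaves the chunk key stored in the metadata row unchanged.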

Any comments?

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei <kai...@ak.jp.nec.com>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
