Hi,

Yesterday, I promised to outline the requirements of Postgres-R for tuple serialization, which we have discussed before. There are basically three ways to serialize tuple changes, depending on whether they originate from an INSERT, UPDATE or DELETE. For updates and deletes, Postgres-R saves the old pkey as well as the origin (a global transaction id) of the tuple (required for consistent serialization on remote nodes). For inserts and updates, all added or changed attributes need to be serialized as well.

           pkey+origin    changes
  INSERT        -            x
  UPDATE        x            x
  DELETE        x            -

Note that the pkey attributes may never be null, so an isnull bit field can be skipped for those attributes. For the insert case, all attributes (including the primary key attributes) are serialized. Updates require an additional bit field (well, I'm using chars ATM) to store which attributes have changed; only those need to be transferred.
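
To make that more concrete, here is a rough sketch in C of what the fixed part of such a serialized change could look like. All names and the exact layout are made up for illustration and are not the actual Postgres-R format:

#include <stdint.h>

/* One change type per originating statement. */
typedef enum ChangeType
{
    CHANGE_INSERT,
    CHANGE_UPDATE,
    CHANGE_DELETE
} ChangeType;

/*
 * Fixed header of a serialized tuple change.  Variable-length data
 * follows, depending on the change type:
 *
 *   UPDATE/DELETE: the old pkey attribute values -- no isnull bits
 *                  needed, since pkey attributes may never be null
 *   UPDATE:        one "changed" flag per attribute (chars ATM)
 *   INSERT/UPDATE: the new or changed attribute values
 */
typedef struct SerializedChange
{
    uint8_t     type;       /* a ChangeType value */
    uint32_t    origin;     /* global transaction id (UPDATE/DELETE only) */
    uint16_t    natts;      /* number of attributes in the relation */
} SerializedChange;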

I'm tempted to unify that, so that inserts are serialized as the difference against the default values or NULL (see the sketch below). That would make things easier for Postgres-R. However, what about other uses of such a fast tuple applicator? Does such a use case exist at all? For parallelizing COPY FROM STDIN, one certainly doesn't want to serialize all input tuples into that format before feeding multiple helper backends. Instead, I'd recommend letting the helper backends do the parsing themselves and thereby parallelize that step as well.
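
To sketch that unification in C: the same diff routine could serve both cases if, for an INSERT, the base tuple is filled with each attribute's default value or NULL. This is a toy model with made-up names and fixed-width integer attributes, just to show the idea:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy attribute: a fixed-width integer value plus a null flag. */
typedef struct
{
    int32_t val;
    bool    isnull;
} Att;

/*
 * Serialize "newtup" as a diff against "base", returning the number
 * of bytes written to buf.  For an UPDATE, base is the old tuple;
 * for an INSERT, base holds each attribute's default value or NULL.
 * One change flag per attribute (chars ATM); only changed attributes
 * are actually transferred.
 */
static size_t
serialize_diff(uint8_t *buf, const Att *base, const Att *newtup, int natts)
{
    size_t off = 0;

    for (int i = 0; i < natts; i++)
    {
        bool changed = (base[i].isnull != newtup[i].isnull) ||
                       (!newtup[i].isnull && base[i].val != newtup[i].val);

        buf[off++] = changed;               /* per-attribute change flag */
        if (changed)
        {
            buf[off++] = newtup[i].isnull;  /* null marker for changed atts */
            if (!newtup[i].isnull)
            {
                memcpy(buf + off, &newtup[i].val, sizeof(int32_t));
                off += sizeof(int32_t);
            }
        }
    }
    return off;
}

The deserializer on the remote node would walk the flags the same way, starting from the same base tuple, so inserts and updates could go through one code path.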

For other features, like parallel pg_dump or even parallel query execution, this tuple serialization code doesn't help much, IMO. So I'm thinking that optimizing it for Postgres-R's internal use is the best way to go.

Comments? Opinions?

Regards

Markus
