On May 30, 2012, at 22:28, james wrote:
> Well, I was assuming that there was some intelligence in the receiver that 
> could effectively parse this for the application; are you suggesting that is 
> effectively binary deltas to apply to raw pages?

In parts. The log that is streamed to replication slaves is the same as the 
write-ahead log that postgres uses for crash recovery. Some records in this log 
are simply page images, while others describe operations (like inserting a 
tuple). But none of them include enough information to convert them into, say, 
SQL statements which would redo that operation.

Tuple insertion records, for example, simply contain the binary representation 
of the tuple to be inserted. To convert that back into SQL, you need meta-data 
(like the tuple layout for the table in question). Now, in theory you could 
query that meta-data from the master. In practice, however, doing that 
correctly would be extremely hard, since you'd need to get the meta-data 
present at the time the record was created, *not* at the time it is shipped 
out. Otherwise, there'll be a window after every DDL statement where you decode 
the records incorrectly, since the records refer to a different table structure 
than currently present on the master.
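
To make the timing problem concrete, here is a minimal sketch in Python. It 
does not use postgres' real tuple or WAL format: the two-column layouts, the 
LSN values and the helpers layout_at() and decode() are all hypothetical and 
only illustrate why the layout has to be looked up as of the record's 
creation, not as of the time it is decoded.

```python
import struct
from bisect import bisect_right

# Hypothetical catalog history: (first LSN at which the layout is valid, layout).
# A DDL statement at LSN 100 replaced the int column "qty" with a float "price".
# The struct codes ("i", "f") are a toy stand-in for real column types.
CATALOG_HISTORY = [
    (0,   [("id", "i"), ("qty", "i")]),
    (100, [("id", "i"), ("price", "f")]),
]

def layout_at(lsn):
    """Return the layout that was in effect at the given LSN."""
    starts = [start for start, _ in CATALOG_HISTORY]
    return CATALOG_HISTORY[bisect_right(starts, lsn) - 1][1]

def decode(payload, layout):
    """Interpret the raw bytes of a toy tuple according to a layout."""
    fmt = "<" + "".join(code for _, code in layout)
    values = struct.unpack(fmt, payload)
    return dict(zip((name for name, _ in layout), values))

# A toy insertion record, written at LSN 90 under the old layout.
record_lsn = 90
payload = struct.pack("<ii", 7, 42)

# Decoded with the layout as of the record's creation: correct.
right = decode(payload, layout_at(record_lsn))

# Decoded with the layout present when the record is shipped (LSN 120):
# the bytes of qty=42 get reinterpreted as a float "price".
wrong = decode(payload, layout_at(120))

print(right)
print(wrong)
```

The second decode does not fail. It silently yields a "price" that was never 
inserted, which is what makes the window after a DDL statement dangerous.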

Also, records generally describe pretty low-level operations. An actual INSERT, 
for example, will produce records for inserting the tuple into the heap, and 
separate records for inserting it into whatever indexes are defined on the 
table. If one of the inserted fields is larger than the TOAST threshold, you'll 
also get a separate record for the TOAST-table insertion, and the main tuple 
will only contain references to the chunks in the TOAST table.
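
As a rough illustration of that fan-out, the following Python sketch models 
how one logical INSERT turns into several low-level records. It is a toy 
model, not postgres' actual record format: the record names, the threshold 
and chunk size, and the function records_for_insert() are assumptions made 
up for the example. It also leaves out details such as index records for the 
TOAST table itself.

```python
# Illustrative constants only; the real values depend on the build.
TOAST_THRESHOLD = 2000
CHUNK_SIZE = 2000

def records_for_insert(row, indexes):
    """Return the list of toy log records produced by inserting one row."""
    records = []
    heap_row = {}
    for col, value in row.items():
        if isinstance(value, (str, bytes)) and len(value) > TOAST_THRESHOLD:
            # Oversized field: store it out of line, one record per chunk.
            chunks = [value[i:i + CHUNK_SIZE]
                      for i in range(0, len(value), CHUNK_SIZE)]
            for seq, chunk in enumerate(chunks):
                records.append(("toast_insert", col, seq, len(chunk)))
            # The main tuple only keeps a reference to the chunks.
            heap_row[col] = ("toast_pointer", len(chunks))
        else:
            heap_row[col] = value
    records.append(("heap_insert", heap_row))
    # One additional record per index defined on the table.
    for index in indexes:
        records.append(("index_insert", index))
    return records

records = records_for_insert(
    {"id": 1, "body": "x" * 4500},
    indexes=["t_pkey", "t_body_idx"],
)
for rec in records:
    print(rec)
```

A receiver that wanted to rebuild the original statement would have to 
recognise that all of these records belong together, and fetch the field's 
actual value from the chunk records rather than from the heap tuple.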

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
