On Fri, Jul 03, 2015 at 07:21:21PM +0200, Andres Freund wrote:
> On 2015-07-03 19:14:26 +0200, Martijn van Oosterhout wrote:
> > Am I missing something. ISTM that if the truncate record was simply not
> > logged at all everything would work fine. The whole point is that the
> > table was created in this transaction and so if it exists the table on
> > disk must be the correct representation.
> 
> That'd not work either. Consider:
> 
> BEGIN;
> CREATE TABLE ...
> INSERT;
> TRUNCATE;
> INSERT;
> COMMIT;
> 
> If you replay that without a truncation wal record the second INSERT
> will try to add stuff to already occupied space. And they can have
> different lengths and stuff, so you cannot just ignore that fact.

I was about to disagree with you by suggesting that if the table was
created in this transaction then WAL logging is skipped. But testing
shows that inserts are indeed logged, as you point out.

With inserts the WAL records look as follows (relfilenodes changed):

martijn@martijn-jessie:~/git/ctm/docker$ sudo 
/usr/lib/postgresql/9.4/bin/pg_xlogdump -p /tmp/pgtest/postgres/pg_xlog/ 
000000010000000000000001 |grep -wE '16386|16384|16390'
rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 
0/016A79C8, prev 0/016A79A0, bkp: 0000, desc: file create: base/12139/16384
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 
0/016B4258, prev 0/016B2508, bkp: 0000, desc: log: rel 1663/12139/16384
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 
0/016B4318, prev 0/016B4258, bkp: 0000, desc: file create: base/12139/16386
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 
0/016B9468, prev 0/016B9418, bkp: 0000, desc: file create: base/12139/16390
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 
0/016BC938, prev 0/016BC880, bkp: 0000, desc: log: rel 1663/12139/16384
rmgr: Sequence    len (rec/tot):    158/   190, tx:        683, lsn: 
0/016BCAF0, prev 0/016BCAA0, bkp: 0000, desc: log: rel 1663/12139/16384
rmgr: Heap        len (rec/tot):     35/    67, tx:        683, lsn: 
0/016BCBB0, prev 0/016BCAF0, bkp: 0000, desc: insert(init): rel 
1663/12139/16386; tid 0/1
rmgr: Btree       len (rec/tot):     20/    52, tx:        683, lsn: 
0/016BCBF8, prev 0/016BCBB0, bkp: 0000, desc: newroot: rel 1663/12139/16390; 
root 1 lev 0
rmgr: Btree       len (rec/tot):     34/    66, tx:        683, lsn: 
0/016BCC30, prev 0/016BCBF8, bkp: 0000, desc: insert: rel 1663/12139/16390; tid 
1/1
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 
0/016BCC78, prev 0/016BCC30, bkp: 0000, desc: file truncate: base/12139/16386 
to 0 blocks
rmgr: Storage     len (rec/tot):     16/    48, tx:        683, lsn: 
0/016BCCA8, prev 0/016BCC78, bkp: 0000, desc: file truncate: base/12139/16390 
to 0 blocks
rmgr: Heap        len (rec/tot):     35/    67, tx:        683, lsn: 
0/016BCCD8, prev 0/016BCCA8, bkp: 0000, desc: insert(init): rel 
1663/12139/16386; tid 0/1
rmgr: Btree       len (rec/tot):     20/    52, tx:        683, lsn: 
0/016BCD20, prev 0/016BCCD8, bkp: 0000, desc: newroot: rel 1663/12139/16390; 
root 1 lev 0
rmgr: Btree       len (rec/tot):     34/    66, tx:        683, lsn: 
0/016BCD58, prev 0/016BCD20, bkp: 0000, desc: insert: rel 1663/12139/16390; tid 
1/1
 
   relname   | relfilenode 
-------------+-------------
 test        |       16386
 test_id_seq |       16384
 test_pkey   |       16390
(3 rows)

And amazingly, the database cluster successfuly recovers and there's no
error now.  So the problem is *only* because there is no data in the
table at commit time.  Which indicates that it's the 'newroot" record
that saves the day normally.  And it's apparently generated by the
first insert.

> Agreed. I think the problem is something else though. Namely that we
> reuse the relfilenode for heap_truncate_one_rel(). That's just entirely
> broken afaics. We need to allocate a new relfilenode and write stuff
> into that. Then we can forgo WAL logging the truncation record.

Would that properly initialise the index though?

Anyway, this is way outside my expertise, so I'll bow out now. Let me
know if I can be of more assistance.

Have a nice day,
-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment: signature.asc
Description: Digital signature

Reply via email to