RE: [HACKERS] vacuum crash on 6.5.3

2000-12-30 Thread Hiroshi Inoue

Just a supplement.
Essentially this isn't a crash bug.
This has been a disastrous bug that silently causes data loss.
(It is known as the 'HEAP_MOVED_IN was not expected' bug,
but the result could be more serious than I had recognized.)

Please apply the patch if you still have pre-7.0 PostgreSQL databases
and you don't love data loss.

Regards.
Hiroshi Inoue




Re: [HACKERS] vacuum crash on 6.5.3

2000-12-29 Thread Tatsuo Ishii

 Although this happens on old 6.5.3, I would like to know if this has
 already been fixed...
 
 Here is the scenario:
 
 1) Before vacuum, table A has 8850 tuples.
 
 2) Vacuum on table A makes postgres crash.
 
 3) It crashes at line 1758:
 
      Assert(num_moved == checked_moved);
 
    I examined the variables using gdb: num_moved == 8849,
    checked_moved == 8813, num_tuples == 18.
 
 4) If PostgreSQL is not compiled with assertions, vacuum does not
    crash. However, after vacuum, the number of tuples decreases from
    8850 to 8814!! (I am not sure which number is correct, though.)
 
 I think this is an important problem since data loss might
 happen. Any idea?
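The reported figures can be set against each other. This only restates numbers from the report. Reading the gap as the lost tuples is an inference, not something established in the thread:

$$
\mathit{num\_moved} - \mathit{checked\_moved} = 8849 - 8813 = 36 = 8850 - 8814
$$

The gap in the failed assertion appears to match the 36 tuples that went missing in the run without assertions.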

It turns out that this was caused by a bug in vacuum. Thanks to Hiroshi,
the problem has been identified. I have checked other versions of
PostgreSQL and found that we have had the bug since at least
6.3.2, and that it has been fixed in 7.0. Included are patches for 6.5.3 and
a test script to reproduce the bug. Both were made by Hiroshi.
--
Tatsuo Ishii


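-- Clean up objects from a previous run (on a first run these report errors, which can be ignored).
drop sequence t1_i_seq;
drop table t1;
create table t1 (i serial, t text);
-- 22 rows with 2048- or 3970-character values; i is assigned 1 to 22.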
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',2048,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
insert into t1(t) values(rpad('x',3970,'x'));
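-- Physical location of each row (ctid = block number, item offset) before the deletes.
select ctid,i,char_length(t) from t1;
-- Delete 10 of the earlier rows to leave free space in the earlier blocks of the table.
delete from t1 where i = 1;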
delete from t1 where i = 4;
delete from t1 where i = 7;
delete from t1 where i = 10;
delete from t1 where i = 11;
delete from t1 where i = 12;
delete from t1 where i = 13;
delete from t1 where i = 14;
delete from t1 where i = 15;
delete from t1 where i = 16;
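-- Before vacuum: 12 rows should remain (i = 2, 3, 5, 6, 8, 9 and 17-22).
select ctid,i,char_length(t) from t1;
vacuum t1;
-- After vacuum: compare with the listing above; all 12 rows should still be present.
select ctid,i,char_length(t) from t1;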
select version();


*** commands/vacuum.c.orig  Tue Dec 26 23:24:01 2000
--- commands/vacuum.c   Wed Dec 27 00:36:46 2000
***************
*** 1025,1030 ****
--- 1025,1031 ----
  			   *idcur;
  	int			last_fraged_block,
  				last_vacuum_block,
+ 				last_moved_in_block,
  				i = 0;
  	Size		tuple_len;
  	int			num_moved,
***************
*** 1060,1065 ****
--- 1061,1067 ----
  	vacuumed_pages = vacuum_pages->vpl_num_pages - vacuum_pages->vpl_empty_end_pages;
  	last_vacuum_page = vacuum_pages->vpl_pagedesc[vacuumed_pages - 1];
  	last_vacuum_block = last_vacuum_page->vpd_blkno;
+ 	last_moved_in_block = 0;
  	Assert(last_vacuum_block >= last_fraged_block);
  	cur_buffer = InvalidBuffer;
  	num_moved = 0;
***************
*** 1073,1078 ****
--- 1075,1083 ----
  		/* if it's reapped page and it was used by me - quit */
  		if (blkno == last_fraged_block && last_fraged_page->vpd_offsets_used > 0)
  			break;
+ 		/* couldn't shrink any more if this block has MOVED_IN tuples - quit */
+ 		if (blkno == last_moved_in_block)
+ 			break;
  
  		buf = ReadBuffer(onerel, blkno);
  		page = BufferGetPage(buf);
***************
*** 1447,1452 ****
--- 1452,1459 ----
  					pfree(newtup.t_data);
  					newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
  					ItemPointerSet(&(newtup.t_self), vtmove[ti].vpd->vpd_blkno, newoff);
+ 					if (vtmove[ti].vpd->vpd_blkno > last_moved_in_block)
+ 						last_moved_in_block = vtmove[ti].vpd->vpd_blkno;
  
  					/*
  					 * Set t_ctid pointing to itself for last tuple in
***************
*** 1579,1584 ****
--- 1586,1593 ----
  			newtup.t_data = (HeapTupleHeader) PageGetItem(ToPage, newitemid);
  			ItemPointerSet(&(newtup.t_data->t_ctid), cur_page->vpd_blkno, newoff);
  			newtup.t_self = newtup.t_data->t_ctid;
+ 			if (cur_page->vpd_blkno > last_moved_in_block)
+