[ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-04 Thread Chris Withers

...rather than just incrementing integers?

I'm asking 'cos I've just started having time-stamp reduction errors 
on a production system where a contingent system is having a .fs file 
that's been re-constituted from repozo backups tested with fstest.py...


cheers,

Chris

--
Simplistix - Content Management, Zope  Python Consulting
   - http://www.simplistix.co.uk
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


[ZODB-Dev] zodb time-stamp reduction error without time chaning on server

2006-10-04 Thread Chris Withers

Hi All,

One of my customers has a large (21GB) production zodb which they back 
up onto a contingency server using repozo and rsync. The process is 
roughly as follows:


1. pack the production database to 3 days once a day.

2. create a full backup with repozo and rsync this to the contingency
   storage server once day.

3. on the contingency server, use repozo to turn the backup back into a
   zodb and run fstest to make sure it's sane.

4. every 2 hrs throughout the day, create an incremental backup with
   repozo and rsync the increments to the contingency storage server.

5. once the increments have arrived on the contingency storage server,
   use repozo there to turn the full backup and the increments back into
   a zodb and run fstest to make it

The process has worked fine for several months now. However, yesterday, 
after repozo had combined the full backup with two increments, fstest.py 
spat out the following:


Data.fs.restore
time-stamp reduction at 3102046:
0x03689aa5aa40e444 = 0x03689aea40d19233

First thought was that the times on the production storage server had 
screwed up, but a lot of digging has failed to show any evidence of 
this, so it looks like the above error has occurred without the usual 
time slip ups.


One of the guys here applied the attached patch to fstest.py to get more 
info on what was happening. I've attached a section of the output of 
running this over the database.


The interesting thing is that it looks like the transactions where the 
time appears to go backwards are duplicates of earlier transactions:


position in file   tid time from tid
310253762330x03689abb582f1311  2006-10-03 04:43:20.668098
310253765080x03689abdbbe5f000  2006-10-03 04:45:44.038639
310253767830x03689abddbe6be55  2006-10-03 04:45:51.539377

...lots of transactions...

310256469130x03689abb582f1311  2006-10-03 04:43:20.668098
310256471880x03689abdbbe5f000  2006-10-03 04:45:44.038639
310256474630x03689abddbe6be55  2006-10-03 04:45:51.539377

Would this seem to be an accurate reading of the attached log?

If so:

1. Could repozo have a bug that resulted in this?

2. If repozo has no bug, should it have checking that makes sure it
   doesn't build insane .fs files, or is fstest the way to go for that?

3. If repozo is not to blame, what could be?

cheers,

Chris

--
Simplistix - Content Management, Zope  Python Consulting
   - http://www.simplistix.co.uk
--- fstest.py.original 2006-08-29 11:33:50.0 +0100
+++ fstest.py2006-10-04 13:57:37.0 +0100
@@ -1,2 +1,2 @@
 ##
 #
@@ -40,3 +40,4 @@
 import string
 import struct
 import sys
+from ZODB.utils import p64, readable_tid_repr as rtr

 class FormatError(ValueError):
 There is a problem with the format of the FileStorage.
+def __repr__(self):
+return self.msg

 class Status:
 checkpoint = 'c'
@@ -80,1 +83,1 @@
 return l

 def check(path):
 file = open(path, 'rb')

 file.seek(0, 2)
 file_size = file.tell()
 if file_size == 0:
-raise FormatError(empty file)
+print FormatError(empty file)
 file.seek(0)
 if file.read(4) != packed_version:
-raise FormatError(invalid file header)
+print FormatError(invalid file header)

 pos = 4L
 tid = '\000' * 8 # lowest possible tid to start
 i = 0
 while pos:
 _pos = pos
 pos, tid = check_trec(path, file, pos, tid, file_size)
 if tid is not None:
-chatter(%10d: transaction tid %s #%d \n %
-(_pos, hexify(tid), i))
+chatter(%10d: transaction tid %s tid_as_date %s #%d \n %
+(_pos, hexify(tid), rtr(p64(long(hexify(tid), 16))), i))
 i = i + 1
+if i  100:
+break


 def check_trec(path, file, pos, ltid, file_size):
@@ -114,4 +123,4 @@
 if not h:
 return None, None
 if len(h) != TREC_HDR_LEN:
-raise FormatError(%s truncated at %s % (path, pos))
+print FormatError(%s truncated at %s % (path, pos))

 tid, stl, status, ul, dl, el = struct.unpack(8s8scHHH, h)
 tmeta_len = TREC_HDR_LEN + ul + dl + el

 if tid = ltid:
-raise FormatError(%s time-stamp reduction at %s: %s = %s %
+print FormatError(%s time-stamp reduction at %s: %s = %s %
   (path, pos, hexify(tid), hexify(ltid)))
 ltid = tid

 tl = U64(stl) # transaction record length - 8
 if pos + tl + 8  file_size:
-raise FormatError(%s truncated possibly because of
+print FormatError(%s truncated possibly because of
damaged records at %s % (path, pos))
 if status == Status.checkpoint:
-raise FormatError(%s checkpoint flag was not cleared at %s
+print FormatError(%s checkpoint flag was not cleared at %s
 

[ZODB-Dev] Re: [Zope-dev] zodb time-stamp reduction error without time chaning on server

2006-10-04 Thread Dieter Maurer
Chris Withers wrote at 2006-10-4 15:06 +0100:
 ...
The interesting thing is that it looks like the transactions where the 
time appears to go backwards are duplicates of earlier transactions:

position in file   tid time from tid
310253762330x03689abb582f1311  2006-10-03 04:43:20.668098
310253765080x03689abdbbe5f000  2006-10-03 04:45:44.038639
310253767830x03689abddbe6be55  2006-10-03 04:45:51.539377

...lots of transactions...

310256469130x03689abb582f1311  2006-10-03 04:43:20.668098
310256471880x03689abdbbe5f000  2006-10-03 04:45:44.038639
310256474630x03689abddbe6be55  2006-10-03 04:45:51.539377

Would this seem to be an accurate reading of the attached log?

It looks as if you had given the same incremental file twice
to repozo.

1. Could repozo have a bug that resulted in this?

Maybe, especially when the same file is twice integrated

2. If repozo has no bug, should it have checking that makes sure it
doesn't build insane .fs files, or is fstest the way to go for that?

As fstest found this problem, it was not too bad.

3. If repozo is not to blame, what could be?

One possibility would be a bad call.



-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-04 Thread Dieter Maurer
Chris Withers wrote at 2006-10-4 09:45 +0100:
...rather than just incrementing integers?

I'm asking 'cos I've just started having time-stamp reduction errors 
on a production system where a contingent system is having a .fs file 
that's been re-constituted from repozo backups tested with fstest.py...

You should be happy about the much more explicit information.
It may allow you to analyse your problem better.

For example, these timestamps precisely tell you from when
the doubled transaction entries come. It may help you to verify
that they come from a single incremental backup file.



-- 
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev