Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-05 Thread Chris Withers

Dieter Maurer wrote:

> You should be happy about the much more explicit information.
> It may allow you to analyse your problem better.


This question has nothing to do with that problem, it just came up as a 
result of once again being reminded that we use timestamps as 
transaction ids.



> For example, these timestamps precisely tell you from when
> the doubled transaction entries come. It may help you to verify
> that they come from a single incremental backup file.


Yes, but using timestamps also means:

- we're dependent on the system clock being accurate for no good reason

- under high load, we have to deal with the possibility of duplicate 
transaction ids


I'm wondering why we take on those issues rather than just use an 
incrementing integer sequence instead?


Chris

--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk

___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-05 Thread Jim Fulton

Chris Withers wrote:

> Dieter Maurer wrote:

>> You should be happy about the much more explicit information.
>> It may allow you to analyse your problem better.


> This question has nothing to do with that problem, it just came up as a
> result of once again being reminded that we use timestamps as
> transaction ids.


It would be more correct to say that we base transaction ids on timestamps.


>> For example, these timestamps precisely tell you from when
>> the doubled transaction entries come. It may help you to verify
>> that they come from a single incremental backup file.


> Yes, but using timestamps also means:

> - we're dependent on the system clock being accurate for no good reason


I'm hoping that Jeremy or Tim will chime in, since we considered switching
to integers a while back.

> - under high load, we have to deal with the possibility of duplicate
> transaction ids


This is not the case.  A storage guarantees that time stamps are unique,
incrementing time stamps, if necessary, to do so.
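
As a rough illustration of the "increment if necessary" rule described above, here is a minimal sketch. It is not the actual storage code: the class name TidClock, the method new_tid and the microsecond integer encoding are all assumptions made for the example.

```python
import time


class TidClock:
    """Hand out strictly increasing ids that start from the wall clock.

    Hypothetical helper for illustration only; not a ZODB API.
    """

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the behaviour can be tested
        self._last = 0       # last id handed out, in microseconds

    def new_tid(self):
        # Starting guess: the current time as integer microseconds.
        candidate = int(self._clock() * 1_000_000)
        if candidate <= self._last:
            # The clock repeated itself or went backwards: step just past
            # the previous id instead of reusing it.
            candidate = self._last + 1
        self._last = candidate
        return candidate
```

With a clock that returns the same reading twice and then jumps backwards, the ids still come out unique and increasing, at the price of drifting slightly ahead of real time until the clock catches up.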

> I'm wondering why we take on those issues rather than just use an
> incrementing integer sequence instead?


Note that originally, transaction ids were not required to be
based on time stamps. In fact, I don't know that they are required
to be based on time stamps and I don't know where to find this out. :(

(ZODB is really in need of some Zope3-style cleanup, without
 Zope3-style reimplementation and refactoring.)

My original intent was that transaction ids would be opaque strings.
Note that, until MVCC, transaction ids were largely internal implementation
details of storages.

With MVCC, transaction ids have to be monotonically increasing
and must be the same as serials.

Long ago, in a fit of laziness on my part, we began leveraging the
fact that serials were time stamps to display object modification times.
If we switched to integers, this would break.  This isn't to say that we
shouldn't fix this, but doing so would entail some significant disruption.
I would go so far as to say that the benefit isn't worth the cost for
FileStorage.
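
To show why a time-based serial is convenient for displaying modification times, here is a minimal sketch of an 8-byte id that both sorts chronologically and decodes back to a date. The byte layout (minutes counted from 1900 in the high four bytes, a fraction of a minute in the low four) is an assumption for illustration and is not stated in this thread; check it against the ZODB source before relying on it. The function names are made up.

```python
import struct

# The low 32 bits count units of 60 / 2**32 seconds within the minute.
_SECONDS_PER_UNIT = 60.0 / (1 << 32)


def pack_timestamp(year, month, day, hour, minute, second):
    """Pack a UTC time into 8 big-endian bytes that sort chronologically."""
    minutes = ((year - 1900) * 12 + month - 1) * 31 + day - 1
    minutes = (minutes * 24 + hour) * 60 + minute
    fraction = int(second / _SECONDS_PER_UNIT)
    return struct.pack(">II", minutes, fraction)


def unpack_timestamp(raw):
    """Recover (year, month, day, hour, minute, second) from 8 bytes."""
    minutes, fraction = struct.unpack(">II", raw)
    minute = minutes % 60
    hour = minutes // 60 % 24
    day = minutes // (60 * 24) % 31 + 1
    month = minutes // (60 * 24 * 31) % 12 + 1
    year = minutes // (60 * 24 * 31 * 12) + 1900
    return year, month, day, hour, minute, fraction * _SECONDS_PER_UNIT
```

Because the bytes are big-endian, comparing two packed ids as plain byte strings gives the same answer as comparing the times they encode, so one value can serve as both an ordering key and a printable date.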

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-05 Thread Jim Fulton

Dieter Maurer wrote:

> Chris Withers wrote at 2006-10-5 09:52 +0100:

> ...

>> I'm wondering why we take on those issues rather than just use an
>> incrementing integer sequence instead?


> With integer keys, you would not be able to pack to something
> like n days before -- as you do not have any time in your
> storage file


You would still need to track times, but you could manage
this as transaction meta data, rather than using it as an id.
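
A minimal sketch of that idea, assuming plain integer ids with the commit time stored alongside as metadata. The names TxnRecord and pack_boundary are hypothetical, and the sketch assumes the recorded commit times never decrease in commit order.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class TxnRecord:
    tid: int          # plain incrementing integer id (hypothetical)
    committed: float  # commit time in seconds since the epoch, kept as metadata


def pack_boundary(records, pack_time):
    """Return the tid of the last transaction committed at or before pack_time.

    `records` must be in commit order with non-decreasing commit times.
    Returns None if no transaction is old enough.
    """
    times = [r.committed for r in records]
    index = bisect.bisect_right(times, pack_time)
    return records[index - 1].tid if index else None
```

Packing to "n days before" then becomes a lookup of pack_boundary(records, time.time() - n * 86400), followed by a pack up to the returned id.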

Jim



Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-05 Thread Dieter Maurer
Jim Fulton wrote at 2006-10-5 13:58 -0400:
 ...
>> With integer keys, you would not be able to pack to something
>> like n days before -- as you do not have any time in your
>> storage file

> You would still need to track times, but you could manage
> this as transaction meta data, rather than using it as an id.

Yes, but the only thing I would gain, is that I would not need
to worry about uniqueness of the timestamps -- which is
a very small part of the overall complexity.



-- 
Dieter


Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-05 Thread Tim Peters

...

[Chris Withers]

>> Yes, but using timestamps also means:

>> - we're dependent on the system clock being accurate for no good reason


[Jim Fulton]

> I'm hoping that Jeremy or Tim will chime in, since we considered switching
> to integers a while back.


Not much to say here.  It's not true that correctness in recent
versions of ZODB depends on the system clock being accurate; and
FileStorage.__init__ logs messages about seeming time travel because
users (not ZODB) get confused if tids don't bear a close relationship
to true wall-clock time.

Or at least that's the theory ;-)  That Chris /is/ seeing timestamp
reductions suggests something may be broken, although Jim & I did an
eyeball review of all seemingly relevant code paths for days a year or
so ago, and didn't find anything that looked even vaguely suspicious.

Philosophically, I'm a fan of making things as general as /necessary/,
rarely as general as /possible/.  We need /some/ idea of monotonically
increasing time regardless.  If that can also serve for transaction
identifiers and object revision identifiers, why not? is the first
question I ask.  Less stuff to go wrong the fewer the concepts.


>> - under high load, we have to deal with the possibility of duplicate
>>   transaction ids



> This is not the case.  A storage guarantees that time stamps are unique,
> incrementing time stamps, if necessary, to do so.


Which is a way of dealing with duplicates, so I'll take Chris's point.
It's in fact very likely that successive calls to time.time() will
deliver exactly the same value on Windows, so the "increment time
stamps, if necessary" code gets exercised frequently on Windows even
when the system clock is 100% healthy (well, not so much in production
code, but in the test suite some parts generate transactions at very
high rates).

But, Chris, in this sense we have to deal with the possibility of
duplicates using /any/ form of identifier.  A real advantage of using
time.time() as a /starting guess/ is that it's quite likely to deliver
unique /starting guesses/ in production code (and even on Windows) all
by itself.  If you want to use integers instead, then you need hairier
code, and have to resort to a hand-managed mutex in the end.
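
For comparison, a minimal sketch of the hand-managed mutex that a raw integer sequence would need. The class name IntegerTidSource is made up for the example, and the sketch ignores the separate problem of recovering the last id from the storage after a restart.

```python
import threading


class IntegerTidSource:
    """Thread-safe source of consecutive integer ids (hypothetical helper)."""

    def __init__(self, start=0):
        self._lock = threading.Lock()
        self._next = start + 1

    def new_tid(self):
        # The lock makes read-and-increment one atomic step, so two threads
        # can never be handed the same id.
        with self._lock:
            tid = self._next
            self._next += 1
        return tid
```

The locking itself is short, but it is bookkeeping the storage has to get right on every path, just as the timestamp approach has to get its duplicate handling right.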

Of course that's doable -- but why bother?  If there's some path in
ZODB that's failing to handle duplicates now, despite that all paths
definitely /intend/ to make duplicates impossible, there's no reason
to imagine that the hairier code needed to prevent raw-integer
duplicates would somehow be more reliable.  (OK, I'll confess that I
don't consider the timestamp duplication-prevention code to be hairy
because it's coded in C -- this is partly an "out of sight, out of
mind" illusion -- but even so, the C code doing this is
straightforward).


>> I'm wondering why we take on those issues rather than just use an
>> incrementing integer sequence instead?


As above, I don't believe that's actually easier, or more reliable, in
any relevant sense.


> Note that originally, transaction ids were not required to be
> based on time stamps. In fact, I don't know that they are required
> to be based on time stamps and I don't know where to find this out. :(


While I mostly care whether there's a killer-strong reason /not/ to
base them on time as a starting guess.  If there isn't,
YMTYGTNIBNHSFSWTF (you may think you're going to need it but nobody
has so far so WTF ;-)).


> (ZODB is really in need of some Zope3-style cleanup, without
>   Zope3-style reimplementation and refactoring.)

> My original intent was that transaction ids would be opaque strings.
> Note that, until MVCC, transaction ids were largely internal implementation
> details of storages.


IMO the code got easier to follow when the distinction between tids
and serials went away, and doing so eliminated more possibility of
nasty bugs due to confusing them.


> With MVCC, transaction ids have to be monotonically increasing
> and must be the same as serials.


Right, the code relies on those heavily, and in particular the ZEO
cache (which potentially needs to store multiple revisions of each
object, and needs to order revisions by commit order -- time works
fine for that too).
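
A minimal sketch of the kind of lookup this ordering makes possible: keep several revisions per object, sorted by the tid that wrote them, and fetch the newest one written before a given tid. The class RevisionCache and its methods are hypothetical and far simpler than a real ZEO cache.

```python
import bisect


class RevisionCache:
    """Keep several revisions per object, ordered by the tid that wrote them."""

    def __init__(self):
        self._tids = {}    # oid -> sorted list of tids
        self._states = {}  # (oid, tid) -> stored state

    def store(self, oid, tid, state):
        tids = self._tids.setdefault(oid, [])
        if (oid, tid) not in self._states:
            bisect.insort(tids, tid)  # keep commit order whatever the arrival order
        self._states[(oid, tid)] = state

    def load_before(self, oid, tid):
        """Return (tid, state) of the newest revision written before `tid`."""
        tids = self._tids.get(oid, [])
        index = bisect.bisect_left(tids, tid)
        if index == 0:
            return None  # no revision is older than the requested tid
        found = tids[index - 1]
        return found, self._states[(oid, found)]
```

The lookup only needs ids that compare in commit order; whether they are timestamps or integers makes no difference to this part.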


> Long ago, in a fit of laziness on my part, we began leveraging the
> fact that serials were time stamps to display object modification times.


Which proved to be extremely useful for debugging and analysis.  All
hail laziness!


> If we switched to integers, this would break.  This isn't to say that we
> shouldn't fix this, but doing so would entail some significant disruption.
> I would go so far as to say that the benefit isn't worth the cost for
> FileStorage.


While I would go so far as to suggest there may well be no real
benefit at all.  It's hard to imagine opening a new ZODB for Christmas
and exclaiming "oh boy!  transaction ids are /finally/ meaningless
bags of bits!  how did we ever manage when printing one was
informative?" :-)

Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?

2006-10-04 Thread Dieter Maurer
Chris Withers wrote at 2006-10-4 09:45 +0100:
> ...rather than just incrementing integers?

> I'm asking 'cos I've just started having time-stamp reduction errors
> on a production system where a contingent system is having a .fs file
> that's been re-constituted from repozo backups tested with fstest.py...

You should be happy about the much more explicit information.
It may allow you to analyse your problem better.

For example, these timestamps precisely tell you from when
the doubled transaction entries come. It may help you to verify
that they come from a single incremental backup file.



-- 
Dieter