Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
...

[Chris Withers]
>> Yes, but using timestamps also means:
>>
>> - we're dependent on the system clock being accurate for no good reason

[Jim Fulton]
> I'm hoping that Jeremy or Tim will chime in, since we considered switching to integers a while back.

Not much to say here. It's not true that correctness in recent versions of ZODB depends on the system clock being accurate; and FileStorage.__init__ logs messages about seeming "time travel" because users (not ZODB) get confused if tids don't bear a close relationship to true wall-clock time. Or at least that's the theory ;-)

That Chris /is/ seeing timestamp reductions suggests something may be broken, although Jim & I did an eyeball review of all seemingly relevant code paths for days a year or so ago, and didn't find anything that looked even vaguely suspicious.

Philosophically, I'm a fan of making things as general as /necessary/, rarely as general as /possible/. We need /some/ idea of monotonically increasing time regardless. If that can also serve for transaction identifiers and object revision identifiers, "why not?" is the first question I ask. Less stuff to go wrong the fewer the concepts.

>> - under high load, we have to deal with the possibility of duplicate transaction ids

> This is not the case. A storage guarantees that time stamps are unique, incrementing time stamps, if necessary, to do so.

Which is a way of dealing with duplicates, so I'll take Chris's point. It's in fact very likely that successive calls to time.time() will deliver exactly the same value on Windows, so the "increment time stamps, if necessary" code gets exercised frequently on Windows even when the system clock is 100% healthy (well, not so much in production code, but in the test suite some parts generate transactions at very high rates).

But, Chris, in this sense we have to "deal with the possibility of duplicates" using /any/ form of identifier.
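The "increment time stamps, if necessary" rule can be sketched in a few lines of Python. This is only an illustration, not ZODB's code: the names `encode_tid` and `next_tid` and the 8-byte microsecond encoding are assumptions made for the example, and the sketch is single-threaded.

```python
import struct
import time


def encode_tid(micros):
    """Pack a microsecond count into 8 big-endian bytes.

    Hypothetical encoding, chosen so that comparing two tids as byte
    strings agrees with comparing the underlying integers.
    """
    return struct.pack(">Q", micros)


def next_tid(last_tid, clock=time.time):
    """Return a tid strictly greater than last_tid.

    The clock reading is only a starting guess. If it is not later than
    the previous tid (same reading twice, or the clock went backwards),
    fall back to "previous + 1".
    """
    guess = int(clock() * 1_000_000)
    last = struct.unpack(">Q", last_tid)[0]
    if guess <= last:
        guess = last + 1
    return encode_tid(guess)


if __name__ == "__main__":
    tid = encode_tid(0)
    for _ in range(3):
        tid = next_tid(tid)
        print(tid.hex())
```

With a healthy clock the guess is normally already unique, so the fallback branch only runs when two calls see the same reading or the clock steps back.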
A real advantage of using time.time() as a /starting guess/ is that it's quite likely to deliver unique /starting guesses/ in production code (and even on Windows) all by itself. If you want to use integers instead, then you need hairier code, and have to resort to a hand-managed mutex in the end. Of course that's doable -- but why bother? If there's some path in ZODB that's failing to handle duplicates now, despite that all paths definitely /intend/ to make duplicates impossible, there's no reason to imagine that the hairier code needed to prevent raw-integer duplicates would somehow be more reliable. (OK, I'll confess that I don't consider the timestamp duplication-prevention code to "be hairy" because it's coded in C -- this is partly an "out of sight, out of mind" illusion -- but even so, the C code doing this is straightforward).

>> I'm wondering why we take on those issues rather than just use an incrementing integer sequence instead?

As above, I don't believe that's actually easier, or more reliable, in any relevant sense.

> Note that originally, transaction ids were not required to be based on time stamps. In fact, I don't know that they are required to be based on time stamps and I don't know where to find this out. :(

While I mostly care whether there's a killer-strong reason /not/ to base them on time as a starting guess. If there isn't, YMTYGTNIBNHSFSWTF (you may think you're going to need it but nobody has so far so WTF ;-)).

> (ZODB is really in need of some Zope3-style cleanup, without Zope3-style reimplementation and refactoring.) My original intent was that transaction ids would be opaque strings. Note that, until MVCC, transaction ids were largely internal implementation details of storages.

IMO the code got easier to follow when the distinction between tids and serials went away, and doing so eliminated more possibility of nasty bugs due to confusing them.

> With MVCC, transaction ids have to be monotonically increasing and must be the same as serials.
Right, the code relies on those heavily, and in particular the ZEO cache (which potentially needs to store multiple revisions of each object, and needs to order revisions by commit order -- "time" works fine for that too).

> Long ago, in a fit of laziness on my part, we began leveraging the fact that serials were time stamps to display object modification times.

Which proved to be extremely useful for debugging and analysis. All hail laziness!

> If we switched to integers, this would break. This isn't to say that we shouldn't fix this, but doing so would entail some significant disruption. I would go so far as to say that the benefit isn't worth the cost for FileStorage.

While I would go so far as to suggest there may well be no real benefit at all. It's hard to imagine opening a new ZODB for Christmas and exclaiming "oh boy! transaction ids are /finally/ meaningless bags of bits! how did we ever manage when printing one was informative?" :-)

___
For more information about ZODB, see the ZOD
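To illustrate why time-derived ids are handy for display and for ordering revisions, here is a small Python sketch. It assumes a made-up encoding (8 big-endian bytes holding microseconds since the Unix epoch); ZODB's real tid layout is different, and the helper names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_tid(when):
    """Encode an aware datetime as a hypothetical 8-byte tid."""
    micros = (when - EPOCH) // timedelta(microseconds=1)
    return struct.pack(">Q", micros)


def tid_to_datetime(tid):
    """Recover the commit time from a hypothetical 8-byte tid."""
    micros = struct.unpack(">Q", tid)[0]
    return EPOCH + timedelta(microseconds=micros)


def newest_first(revisions):
    """Order (tid, data) pairs by commit order, latest revision first.

    Big-endian tids sort correctly as plain byte strings.
    """
    return sorted(revisions, key=lambda rev: rev[0], reverse=True)


if __name__ == "__main__":
    tid = datetime_to_tid(datetime(2006, 10, 5, 13, 58, tzinfo=timezone.utc))
    print(tid.hex(), "->", tid_to_datetime(tid).isoformat())
```

An opaque integer id would need a separate lookup table to answer "when was this object last modified?"; here the answer is in the id itself.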
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Jim Fulton wrote at 2006-10-5 13:58 -0400:
> ...
>> With integer keys, you would not be able to pack to something
>> like n days before -- as you do not have any time in your
>> storage file
>
>You would still need to track times, but you could manage
>this as transaction meta data, rather than using it as an id.

Yes, but the only thing I would gain, is that I would not need to worry about uniqueness of the timestamps -- which is a very small part of the overall complexity.

--
Dieter
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Dieter Maurer wrote:
> Chris Withers wrote at 2006-10-5 09:52 +0100:
>> ...
>> I'm wondering why we take on those issues rather than just use an
>> incrementing integer sequence instead?
>
> With integer keys, you would not be able to pack to something
> like n days before -- as you do not have any time in your
> storage file

You would still need to track times, but you could manage this as transaction meta data, rather than using it as an id.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]    Python Powered!
CTO                  (540) 361-1714              http://www.python.org
Zope Corporation     http://www.zope.com         http://www.zope.org
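A rough Python sketch of the alternative Jim describes: plain integer ids, with the commit time kept as per-transaction metadata and consulted only when a time-based pack point is needed. The class `TransactionLog` and its methods are hypothetical and exist only for this example; nothing here reflects how any ZODB storage is implemented.

```python
import bisect


class TransactionLog:
    """Hypothetical log: integer ids, commit time stored as metadata."""

    def __init__(self):
        self._ids = []
        self._times = []  # commit times in seconds, kept non-decreasing

    def append(self, commit_time):
        """Record a transaction and return its integer id."""
        # Design choice for this sketch: clamp a backwards clock reading
        # so the time column stays sorted and bisect remains valid.
        if self._times and commit_time < self._times[-1]:
            commit_time = self._times[-1]
        tid = len(self._ids) + 1
        self._ids.append(tid)
        self._times.append(commit_time)
        return tid

    def pack_cutoff(self, now, days):
        """Return the last id committed at or before now - days, or None."""
        limit = now - days * 86400
        index = bisect.bisect_right(self._times, limit)
        return self._ids[index - 1] if index else None
```

The ids themselves are trivially unique, but the time column still has to be stored, kept ordered, and searched, which is the remaining work Dieter's reply points at.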
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Chris Withers wrote at 2006-10-5 09:52 +0100:
>Dieter Maurer wrote:
>> You should be happy about the much more explicit information.
>> It may allow you to analyse your problem better.
>
>This question has nothing to do with that problem, it just came up as a
>result of once again being reminded that we use timestamps as
>transaction ids.
>
>> For example, these timestamps precisely tell you from when
>> the doubled transaction entries come. It may help you to verify
>> that they come from a single incremental backup file.
>
>Yes, but using timestamps also means:
>
>- we're dependent on the system clock being accurate for no good reason

Requirements are very moderate. Only packing really depends on those dates and you usually do not need packing control below second resolution.

>- under high load, we have to deal with the possibility of duplicate
>transaction ids

No, because the storages take care of this.

>
>I'm wondering why we take on those issues rather than just use an
>incrementing integer sequence instead?

With integer keys, you would not be able to pack to something like n days before -- as you do not have any time in your storage file

--
Dieter
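The "pack to n days before" point can be shown with a short Python sketch. It assumes, purely for illustration, that a tid is 8 big-endian bytes holding microseconds since the epoch; `time_to_tid` and `packable` are hypothetical names, and real storages use a different layout and a far more involved pack procedure.

```python
import struct

DAY = 86400  # seconds


def time_to_tid(seconds):
    """Encode a non-negative time in seconds as a hypothetical 8-byte tid."""
    return struct.pack(">Q", int(seconds * 1_000_000))


def packable(tids, now, days):
    """Return the tids committed at or before now - days.

    Because the id is derived from the commit time, the cutoff is just
    another tid and the test is a plain byte-string comparison.
    """
    cutoff = time_to_tid(now - days * DAY)
    return [tid for tid in tids if tid <= cutoff]
```

No extra time record is consulted here: the pack point is computed from the clock and compared directly against the ids already in the file.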
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Chris Withers wrote:
> Dieter Maurer wrote:
>> You should be happy about the much more explicit information.
>> It may allow you to analyse your problem better.
>
> This question has nothing to do with that problem, it just came up as a result of once again being reminded that we use timestamps as transaction ids.

It would be more correct to say that we base transaction ids on timestamps.

>> For example, these timestamps precisely tell you from when
>> the doubled transaction entries come. It may help you to verify
>> that they come from a single incremental backup file.
>
> Yes, but using timestamps also means:
>
> - we're dependent on the system clock being accurate for no good reason

I'm hoping that Jeremy or Tim will chime in, since we considered switching to integers a while back.

> - under high load, we have to deal with the possibility of duplicate transaction ids

This is not the case. A storage guarantees that time stamps are unique, incrementing time stamps, if necessary, to do so.

> I'm wondering why we take on those issues rather than just use an incrementing integer sequence instead?

Note that originally, transaction ids were not required to be based on time stamps. In fact, I don't know that they are required to be based on time stamps and I don't know where to find this out. :(

(ZODB is really in need of some Zope3-style cleanup, without Zope3-style reimplementation and refactoring.)

My original intent was that transaction ids would be opaque strings. Note that, until MVCC, transaction ids were largely internal implementation details of storages. With MVCC, transaction ids have to be monotonically increasing and must be the same as serials.

Long ago, in a fit of laziness on my part, we began leveraging the fact that serials were time stamps to display object modification times. If we switched to integers, this would break. This isn't to say that we shouldn't fix this, but doing so would entail some significant disruption. I would go so far as to say that the benefit isn't worth the cost for FileStorage.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]    Python Powered!
CTO                  (540) 361-1714              http://www.python.org
Zope Corporation     http://www.zope.com         http://www.zope.org
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Dieter Maurer wrote:
> You should be happy about the much more explicit information.
> It may allow you to analyse your problem better.

This question has nothing to do with that problem, it just came up as a result of once again being reminded that we use timestamps as transaction ids.

> For example, these timestamps precisely tell you from when
> the doubled transaction entries come. It may help you to verify
> that they come from a single incremental backup file.

Yes, but using timestamps also means:

- we're dependent on the system clock being accurate for no good reason

- under high load, we have to deal with the possibility of duplicate transaction ids

I'm wondering why we take on those issues rather than just use an incrementing integer sequence instead?

Chris

--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk
Re: [ZODB-Dev] Why DO we use timestamps as transaction ids?
Chris Withers wrote at 2006-10-4 09:45 +0100:
>...rather than just incrementing integers?
>
>I'm asking 'cos I've just started having "time-stamp reduction" errors
>on a production system where a contingent system is having a .fs file
>that's been re-constituted from repozo backups tested with fstest.py...

You should be happy about the much more explicit information. It may allow you to analyse your problem better.

For example, these timestamps precisely tell you from when the doubled transaction entries come. It may help you to verify that they come from a single incremental backup file.

--
Dieter
[ZODB-Dev] Why DO we use timestamps as transaction ids?
...rather than just incrementing integers?

I'm asking 'cos I've just started having "time-stamp reduction" errors on a production system where a contingent system is having a .fs file that's been re-constituted from repozo backups tested with fstest.py...

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
           - http://www.simplistix.co.uk