On Aug 2, 2006, at 6:03 AM, Dominik Huber wrote:

Hi Gary

Thanks for your inputs. I was also offline this days...

Gary Poster wrote:
Well, first of all, I suspect your situation, based on what I've seen so far, looks something like this:

- code creates obj P1
- code puts P1 in intids, with connection.
- code does something incorrect (either your code or code that you are relying on), probably involving a savepoint because of your discussion below, that keeps keyreference to P1 in intids BTree, but rolls back addition of P1 to connection. Right now we have the bad data (a key reference to an object without a connection in the BTree), but we don't know about it yet.
- Now code creates obj P2.
- code tries to put P2 in intids. If the BTree logic happens to make it compare against the bad keyreference to P1, you get the __cmp__ error. This is a red herring for debugging: the problem has already occurred (see above).

Few more details:
There is a import handler which creates p1 , p2 to pn and sets those object to a certain folder. After each addition of a certain p an initializer subscriber listen to IObjectAddedEvent is invoked too. The error occurs sometimes during this initialization. In case of an error the creation and initialization of the error-prone p is rolled back by an Importer (compare the following code):

class Importer(object):
   def __call__(self, importhandelr)
          for d in import_reader:
               if not d[first].startswith('#'):
                   savepoint = transaction.savepoint()
data = dict([(key, value for key, value in d.items()]) importhandler(data) # <- call importhandler which creates the p

                   except Exception, e:
                       savepoint.rollback() # clean up

Huh. I wonder if a savepoint rollback cleans up sufficiently after a commit; I've never used it like that. Out of curiosity, if you take the commit out does the problem go away? Generally, looking at the code, one of the very first things that Transaction.commit does is _invalidate_all_savepoints, so the pattern you have here doesn't look advisable.

That said, if that were the problem here, it looks like you should have gotten an exception when you did the rollback, so it's probably just something to change, but not the cause of what we're talking about.

Also, in the vein of things that are probably not the cause of the problem we are discussing, but are maybe worth highlighting, the whole bare except thing around a commit is an interesting discussion. For a one-time import, maybe this is fine, but for more robust code I prefer something a little more sophisticated. I've tried various things over the years; my current recipe is something like this:

    ...do stuff and commit...
except ZODB.POSException.TransactionError:
    ...abort and retry some number of times, and eventually raise...
except (SystemExit, KeyboardInterrupt, ZODB.POSException.POSError):
   ...abort and raise...
    ...abort and log, or whatever is appropriate...

That's probably somewhat controversial--maybe TransactionError is too broad, for instance--but I offer it for what it's worth.

I'm pretty sure your BTree is already bad by the time you get to the failing __cmp__.
The only way I know to fix it at this late stage in the story is drastic: you'll need to create a new BTree, iterate over the old one and add all the contents except for the broken keyreferences, and replace the old intid BTree with the new one.

I don't think that's appropriate error catching for the BTree code. I think if you get the error you describe, you have a serious problem in "your" code (i.e., code that you've written or code that you're relying on in your stack, like Zope). How--and when--to handle it is application policy, not something that should be in the base intids, I think.
Yes, but the whole problem seems not so heavy:

1. If I restart the server no wrong intid entries are there.

One possible explanation is that you have some in-memory data structure that the savepoint can't roll back. Another is that the savepoint code isn't quite right. A third is that the BTree code isn't quite right. I suspect something like the first more than the second or third, but that doesn't rule anything out. Have you analyzed what exception triggers the problem? Have you analyzed the bad in-memory tree to see if it gives you any clues?

2. If I patch the __cmp__ method of KeyReferenceToPersistent the proposed way it only runs right after a wrong p creation into the except. After another successful transaction commit I don't have any bad intids entries

IMO that points that somehow the persistent data are ok. That we only might have an 'old' volatile reference to a roled back intid somewhere, that causes the error ... My problem is that I cannot the error situation by a test case where intids are invoked. Probably it's a problem of the order of invoked subscribers listen to the object added event.

Maybe so, but I believe that you are going to *have* to produce a reliable test case to have a reliable fix. This is too weird, and we don't even know what is implicated.

The error occurs after a savepoint.rollback. I can reproduce the error within the basic error story (-> see testcase). It happens if we compare a commited persistent object with a rollbacked persistent object.

The current implementation hurts the demeter's law. A fresh instance does not support the db attribute, even though the situation showed within the basic error story

   >>> from persistent import Persistent
   >>> p = Persistent()
   >>> p._p_jar.db().database_name
   Traceback (most recent call last):
   AttributeError: 'NoneType' object has no attribute 'db'

I remember knowing Demeter's law. ;-) I should look it up, but I've already spent more time than I should have this morning. The above behavior is expected. The connection.add(obj) code in the persistent keyreference code gives the obj a _p_jar.

IMO the error should still be handled by KeyReferenceToPersistent __cmp__ method.

This is too late. Anything at this stage in the game is going to be a dangerous cover-up of an already serious problem.
Demeter says only long.dotted.name.calls() on references of an object are leading to pretty instable code.

Ah, I see. Not sure if I ever knew the "law" or not ;-) but I see the point.

We have this problem in the current implementation as shown in the test case. If we use rollback's we can produce an object state of a persistent that does not allow to lookup p._p_jar.db ().database_name. That means the current implementation relies on an implicit interface assumption too.

Well, I agree that the current implementation relies on an implicit interface assumption, but we part ways after that. The current implementation would be explicit if it documented that it relies on other persistent key reference implementations having an `object` attribute. Maybe, in fact, the intent is that key_type_id is only associated with a single class--but this isn't a documented assertion. The persistent.txt file does document that persistent key references must be to objects that have a connection, and so the `_p_jar` requirement in __cmp__ is reasonable, since the _p_jar is the connection. This could maybe be better documented, in an interface perhaps; and, as I said in one of my first replies to this thread, __cmp__ could maybe raise a more informative error to make this edge case easier to identify.

  >>> transaction.commit()
  >>> savepoint = transaction.savepoint()
  >>> p2 = P()
  >>> p2.__parent__ = top
  >>> keyref2 = KeyReferenceToPersistent(p2)
  >>> cmp(keyref1, keyref2)
  >>> savepoint.rollback()
  >>> cmp(keyref1, keyref2)
  Traceback (most recent call last):
  AttributeError: 'NoneType' object has no attribute 'db'

Yes, you've showed this before.  This is to be expected.

IMO it is a pretty self-punishment-strategy to use a __cmp__ method to recognize corrupt intids.

:-) It's not ideal, but I don't have a better suggestion. Something is going wrong in the stack you are using, and you are getting in an exceptional edge case. The root cause needs to be addressed. With an error like this, it is a horrible time to modify the BTree, so the __cmp__ can't gloss things over (see brief examples of "lies" below); and a __cmp__ method is a very impractical and inappropriate time to write to the database to fix things.

Your aspect of corrupt intids is *very* important, but removing those after an error-prone transaction would be much comfortable with a working __cmp__ function. So I still propose the following fix...

class KeyReferenceToPersistent(object):
  def __cmp__(self, other):
      if self.key_type_id == other.key_type_id:
              return cmp(
(self.object._p_jar.db().database_name, self.object._p_oid), (other.object._p_jar.db().database_name, other.object._p_oid),
          except AttributeError:
              # a transaction might be rolled back
              self_db = getattr(self.object._p_jar, 'db', '')
              if self_db:
                  self_db = self_db().database_name
              other_db = getattr(other.object._p_jar, 'db', '')
              if other_db:
                  other_db = other_db().database_name

              return cmp(
                  (self_db,  self.object._p_oid),
                  (other_db, other.object._p_oid),

Again, like your other similar suggestion, this gives you the wrong answer--it lies. In this case, if your code compares something that doesn't have a database, it assume the database name is '': the default name for the main Zope database. Here are a few ways that can be a lie:

- the default database name is different. When (if!) the object that has no connection is added to the database, it will sort differently than it does now: corrupt BTree.

- You have a multi-database. The object is actually going to be added to a database with a different name. When it is, it will sort differently: corrupt BTree.

- You have an insane situation in your BTree, involving in-memory data structures or savepoints or whatever else is causing the problem here, so that one or more key references in the BTree have objects without connections. This glosses over the problem, causing...the unknown. Possibly a corrupt BTree. We don't know enough about what's going on for me to speculate too much.

If you don't ever use multiple databases, and you never change the default database name, you *might* be ok. We use multiple databases. We might change the default database name. The change is too fragile for us.

(I'm sure this is frustrating. :-( You have my sympathy. :-) )

Zope3-dev mailing list
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com

Reply via email to