Re: [ZODB-Dev] Unpickler.noload, zc.zodbdgc and multi-database reference bug (IndexError)?

2014-01-30 Thread Jim Fulton
On Thu, Jan 30, 2014 at 12:40 PM,   wrote:
>
> On Jan 30, 2014, at 11:12, jason.mad...@nextthought.com wrote:
>
>> So it seems that the behaviour of `noload` might have changed between 2.6.x 
>> and 2.7.x?
>
> Apologies for replying to myself, but I think I found the root cause.
>
> After some further investigation, I found issue 1101399 
> (http://bugs.python.org/issue1101399), complaining that noload is broken for 
> subclasses of dict. The fix for this issue was applied to the cPython trunk 
> in October of 2009 without any corresponding tests 
> (http://hg.python.org/releasing/2.7.6/rev/d0f005e6fadd). In releases with 
> this fix (if I'm reading the code correctly),

Probably because the original code had no tests (because we weren't
doing that then) and
no documentation either (my bad) because this was added for the
specific use case of
efficiently scraping out references in ZODB.



>
> This fix means that multi-database references are always going to be returned 
> as an empty list under noload (again, if I'm reading the code correctly). 
> This means that multi-references and noload don't work under Python 2.7.x or 
> 3.x with zodbpickle and so consequently neither does an unmodified zc.zodbdgc.

Sigh.  We're still using Python 2.6 for  out database servers. :)

> I don't know what the best way forward is. Our solution to use `load` instead 
> seems to work for us, but may not work for everyone.

It will probably work, but be slower, which isn't a huge deal since
zc.zodbdgc runs out of
process. We run it on ZRS secondaries, which are otherwise idle.

> Maybe zodbpickle could revert the fix in its branch and zc.zodbgc could 
> depend on that? I'm happy to help test other ideas.

That may be the best way forward.

I can't speak with much confidence until I've had a chance to
wade in and refresh my memory on this stuff.

Something I wish I'd done differently in ZODB and contemplated changing on a
number of occasions is the handling of references. I wish I'd stored
them outside the
pickle so they could be analyzed without unpicking (or at least
without unpickling the
application data).

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Unpickler.noload, zc.zodbdgc and multi-database reference bug (IndexError)?

2014-01-30 Thread jason . madden

On Jan 30, 2014, at 11:12, jason.mad...@nextthought.com wrote:

> So it seems that the behaviour of `noload` might have changed between 2.6.x 
> and 2.7.x?

Apologies for replying to myself, but I think I found the root cause.

After some further investigation, I found issue 1101399 
(http://bugs.python.org/issue1101399), complaining that noload is broken for 
subclasses of dict. The fix for this issue was applied to the cPython trunk in 
October of 2009 without any corresponding tests 
(http://hg.python.org/releasing/2.7.6/rev/d0f005e6fadd). In releases with this 
fix (if I'm reading the code correctly), the pickle opcodes append and appends 
become no-ops, clearing the pickle stack when they are encountered. 

A multi-database reference is pickled as a list, so it uses the appends opcode:

   23: ]EMPTY_LIST
   24: qBINPUT 3
   26: (MARK
   27: USHORT_BINSTRING 'm'
   30: (MARK
   31: USHORT_BINSTRING 'Users_1_Prod'
   45: qBINPUT 4
   47: USHORT_BINSTRING 
'\x00\x00\x00\x00\x00\x00\x00\x01'
   57: qBINPUT 5
   59: cGLOBAL 'zope.site.folder Folder'
   84: qBINPUT 6
   86: tTUPLE  (MARK at 30)
   87: qBINPUT 7
   89: eAPPENDS(MARK at 26)
   90: QBINPERSID

This fix means that multi-database references are always going to be returned 
as an empty list under noload (again, if I'm reading the code correctly). This 
means that multi-references and noload don't work under Python 2.7.x or 3.x 
with zodbpickle and so consequently neither does an unmodified zc.zodbdgc. 

I don't know what the best way forward is. Our solution to use `load` instead 
seems to work for us, but may not work for everyone. Maybe zodbpickle could 
revert the fix in its branch and zc.zodbgc could depend on that? I'm happy to 
help test other ideas.

Thanks again,

Jason
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Unpickler.noload, zc.zodbdgc and multi-database reference bug (IndexError)?

2014-01-30 Thread jason . madden
On Jan 30, 2014, at 10:14, Jim Fulton  wrote:

> On Thu, Jan 30, 2014 at 9:58 AM,   wrote:

>> I was curious if anyone else had seen this
> 
> I haven't. I'm the author of zodbdgc and I use it regularly, including on
> large (for some definition) databases.

Hi Jim,

I really appreciate your input on this. Thanks for taking the time to respond.

>> Details:
>> 
>> I'm working under Python 2.7.6 and 2.7.3 with ZODB 4.0.0, zc.zodbdgc 0.6.1 
>> and eventually zodbpickle 0.5.2. Most of my results were repeated on both 
>> Mac OS X and Linux.
> 
> Why are you using zodbpickle?  Perhaps that is behaving differently
> from cPickle in some fashion?

I didn't start out with zodbpickle, I just tried it as an alternative to 
cPickle after cPickle was failing. The hope was that they would actually behave 
differently in some fashion, but the observed behaviour is identical.


> I'd try using ZODB 3.10.  I suspect a ZODB 4 incompatibility of some sort.
> 
> Unfortunately, I don't have time to dig into this now.
> 
> This weekend, I'll at least see if I can make zodbdgc tests pass with ZODB 4.
> Perhaps that will shed light.

That's a very good point. I'm not sure what it will take to repeat the test 
with ZODB 3.10, but I can try (although, since the pickles are already written 
at this point, I probably wouldn't expect to see a difference? Nothing mutates 
the actual pickle data loaded from a storage, excluding transformers like 
zc.zlibstorage). 

Along those lines, I tried just downloading zc.zodbdgc and running its tests 
against ZODB 3.10 (under both Python 2.7.6/OS X and 2.7.3/Linux), and 
unfortunately, they immediately fail with the exact same IndexError I see when 
trying to GC our database:

$ mkvirtualenv ZODB3
$ workon ZODB3
$ pip install ZODB3==3.10.5

$ wget 
https://pypi.python.org/packages/source/z/zc.zodbdgc/zc.zodbdgc-0.6.1.tar.gz
...
$ tar -xf zc.zodbdgc-0.6.1.tar.gz && cd zc.zodbdgc-0.6.1
$ python ./setup.py test
...
==
FAIL: zc.zodbdgc-0.6.1/src/zc/zodbdgc/README.test
Doctest: README.test
--
Traceback (most recent call last):
  File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py",
 line 2201, in runTest
raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for README.test
  File "tmp/zc.zodbdgc-0.6.1/src/zc/zodbdgc/README.test", line 0

--
File "zc.zodbdgc-0.6.1/src/zc/zodbdgc/README.test", line 194, in README.test
Failed example:
bad = zc.zodbdgc.gc('config', days=2)
Exception raised:
Traceback (most recent call last):
  File 
"/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/doctest.py",
 line 1289, in __run
compileflags, 1) in test.globs
  File "", line 1, in 
bad = zc.zodbdgc.gc('config', days=2)
  File "zc.zodbdgc-0.6.1/src/zc/zodbdgc/__init__.py", line 105, in gc
return gc_(close, conf, days, ignore, conf2, fs, untransform, ptid)
  File "zc.zodbdgc-0.6.1/src/zc/zodbdgc/__init__.py", line 209, in gc_
for ref in getrefs(data, name, ignore):
  File "zc.zodbdgc-0.6.1/src/zc/zodbdgc/__init__.py", line 287, in getrefs
ref = ref[1]
IndexError: list index out of range

The tests failed the same way under ZODB 4.0.0 and ZODB3 3.9.0 (the minimum 
requirement listed in setup.py).

However, when I tried Python 2.6.8, the tests pass with ZODB 3.10.5 (they don't 
pass with ZODB 4, but it looks to be unrelated). So it seems that the behaviour 
of `noload` might have changed between 2.6.x and 2.7.x? Is it possible you're 
running on Python < 2.7?

Thanks,

Jason

___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] Unpickler.noload, zc.zodbdgc and multi-database reference bug (IndexError)?

2014-01-30 Thread Jim Fulton
On Thu, Jan 30, 2014 at 9:58 AM,   wrote:
> Hello ZODB dev,
>
> I was recently trying to GC a large multi-database setup for the first time 
> using zc.zodbdgc. The process wouldn't complete (or really even get started) 
> because of an IndexError being thrown from `zc.zodbdgc.getrefs` (__init__.py 
> line 287). As I traced through it, it began to look like the combination of 
> `cPickle.Unpickler.noload` and multi-database persistent ids (which in ZODB 
> are list objects) fails, generating an empty list instead of the expected 
> [ref type, args] list documented in `ZODB.serialize`. This makes it 
> impossible to correctly GC a multi-database.
>
> I was curious if anyone else had seen this

I haven't. I'm the author of zodbdgc and I use it regularly, including on
large (for some definition) databases.

>, or maybe I'm just doing something wrong? We solved our problem by using 
>`load` instead of `noload`, but I wondered if there might be a better way?
>
> Details:
>
> I'm working under Python 2.7.6 and 2.7.3 with ZODB 4.0.0, zc.zodbdgc 0.6.1 
> and eventually zodbpickle 0.5.2. Most of my results were repeated on both Mac 
> OS X and Linux.

Why are you using zodbpickle?  Perhaps that is behaving differently
from cPickle in some
fashion?


> After hitting the IndexError, I began debugging the problem. When it became 
> clear that the persistent_load callback was simply getting the wrong 
> persistent ids passed to it (empty lists instead of complete multi-db refs), 
> I tried swapping in zodbpickle for the stock cPickle to the same effect. 
> Here's some code demonstrating the problem:
>
>
> This pickle data came right out of ZODB, captured during a debug session of 
> zc.zodbdgc. It has three persistent ids, two cross database and one in the 
> same database:
>
> >>> p = 
> 'cBTrees.OOBTree\nOOBTree\nq\x01.X\x0c\x00\x00\x00Users_1_Prodq\x02]q\x03(U\x01m(U\x0cUsers_1_Prodq\x04U\x08\x00\x00\x00\x00\x00\x00\x00\x01q\x05czope.site.folder\nFolder\nq\x06tq\x07eQX\x0c\x00\x00\x00Users_2_Prodq\x08]q\t(U\x01m(U\x0cUsers_2_Prodq\nU\x08\x00\x00\x00\x00\x00\x00\x00\x01q\x0bh\x06tq\x0ceQX\x0b\x00\x00\x00dataserver2q\r(U\x08\x00\x00\x00\x00\x00\x00\x00\x10q\x0eh\x06tQq\x0f.'
>
> This code is copy-and-pasted out of zc.zodbgc getrefs. It's supposed to find 
> all the persistent refs and put them inside the `refs` list:
>
> >>> import cPickle
> >>> import cStringIO
> >>> refs = []
> >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
> >>> u.persistent_load = refs
> >>> u.noload()
> >>> u.noload()
>
> But if we look at `refs`, we see that the first two cross-database refs are 
> returned as empty lists, not the correct value:
>
> >>> refs
> [[], [], ('\x00\x00\x00\x00\x00\x00\x00\x10', None)]
>
> If instead we use `load` to read the state, we get the correct references:
>
> >>> refs = []
> >>> u = cPickle.Unpickler(cStringIO.StringIO(p))
> >>> u.persistent_load = refs
> >>> u.noload()
> >>> u.load()
>
> >>> refs
> [['m', ('Users_1_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01',  'zope.site.folder.Folder'>)],
>  ['m', ('Users_2_Prod', '\x00\x00\x00\x00\x00\x00\x00\x01',  'zope.site.folder.Folder'>)],
>  ('\x00\x00\x00\x00\x00\x00\x00\x10', )]
>
> The results are the same using zodbpickle or using an actual callback 
> function instead of the append-directly-to-list shortcut.
>
> If we fix the IndexError by checking the size of the list first, we miss all 
> the cross-db references, meaning that a GC is going to be too aggressive. But 
> using `load` is slower and requires access to all of the classes referenced. 
> If anyone has run into this before or has other suggestions, I'd appreciate 
> hearing them.

I'd try using ZODB 3.10.  I suspect a ZODB 4 incompatibility of some sort.

Unfortunately, I don't have time to dig into this now.

This weekend, I'll at least see if I can make zodbdgc tests pass with ZODB 4.
Perhaps that will shed light.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zodb-dev