hi tres,

On 14.12.2010, at 18:11, Tres Seaver wrote:
> On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
>> Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
>> ===================================================================
>> --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 13:24:24 UTC 
>> (rev 118862)
>> +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 14:12:11 UTC 
>> (rev 118863)
>> @@ -145,44 +145,28 @@
>>    # Don't access data until necessary
>>    def __init__(self, func, seq, length=None):
>> -        self._seq = seq
>> -        self._data = []
>> -        self._func = func
>> +        self._seq=seq
>> +        self._func=func
>>        if length is not None:
>> -            self._len = length
>> +            self._len=length
> Please don't un-PEP8 a cleaned-up module:  leave spaces around
> operators.  If that was unintentional (maybe you pasted from a copy of a
> file made before the module was cleaned up?), then you still need to
> review the diff and edit out unintended changes first.

yes, in was indeed unintentional, or rather unfortunate.  i've originally 
worked on the 2.12 branch (in a plone 4 buildout) where i think hanno's 
pep8-related changes haven't been applied/back-ported yet.  after that i pasted 
the version into 2.13 (and yes, the commit order was the other way round :)) 
and was in fact wondering about the longer diff when hanno (who was actually 
sitting next to me) pointed out he had changed the one-line if: and else: 
expressions on that branch.  however, my alias for "svn diff" tells diff to not 
show white space changes so i missed the now again removed spaces around the =.

anyway, it's fixed in c118895.

>>        else:
>>            self._len = len(seq)
>> +        self._marker = object()
>> +        self._data = [self._marker] * self._len
> Have you measured the impact of pre-allocating here for very large
> sequences?

i had not, but have now.  turns out the allocation makes a difference earlier 
than i would have expected:  accessing the first 20 items of the map is slower 
with the new version starting at sequences of a little more than 2000 items.  
and of course it (relatively) gets slower the longer the sequence is.  setting 
up a 10^5 "empty" sequence takes about 0.5 ms on my machine.

> I'm *sure* that is a not a good trade-off for all cases:  the catalog is
> often used for queries which return result sets in the order of 10^5 or
> more objects, and *very* rarely is it accessed randomly (I had never
> seen it before reading the bug entry you cite).  The normal case is to
> take the first few entries (batch_size * batch_number) from the huge set.

true, but the fix wasn't only about this use-case.  it's mainly about avoiding 
extra function calls when fetching a batch other than the first.  so far the 
map function wall called for every item up to the one accessed, e.g. for the 
5th batch of 20 items each, you'd end up with 80 unnecessary (and possible 
expensive) calls.  or iow, the old version wasn't really as lazy as it should 
have been… :)

> I think you would be better off finding a way to inject your style of
> LazyMap into the catalog for the random-access case, e.g., by passing
> '_lazy_map' into the methods you are using.

i disagree.  i think it should be fixed properly upstream.  we should really 
avoid all that monkey-patching all over the place.  but see below…

> BTW, another interesting alternative impplementation might be to use a
> dictionary instead of a list for 'data'.  You could then avoid any
> pre-allocation at all.

with the benchmark in place, i was curious and tried that as well:  it beats 
the other versions independently of the sequence length and the number of 
accessed items, both sequentially and repeatedly over a single batch.  in the 
benchmark they're accessed from the start, but skipping a few batches would 
make it even faster anyway.  hence i've updated the code again in c118896.  
please let me know what you think, or if you want the test script i've been 

> While we're at it, the 'try: ... except:'  here is wasteful and slow.
> Lazy objects aren't persistent, and the '_seq' attribute is added
> unconditionally in '__init__'.

you're right, this should be cleaned up as well.  but since this isn't "my 
code" i was trying to keep the diff minimal (like on 2.12, i.e. in c118864, it 
obviously didn't work in the other one :)).  in any case, it's also gone with 


pgp key at http://zitc.de/pgp - http://wwwkeys.de.pgp.net/
plone 4.0 released! -- http://plone.org/4

Attachment: PGP.sig
Description: This is a digitally signed message part

Zope-Dev maillist  -  Zope-Dev@zope.org
**  No cross posts or HTML encoding!  **
(Related lists - 
 https://mail.zope.org/mailman/listinfo/zope )

Reply via email to