hi tres,
On 14.12.2010, at 18:11, Tres Seaver wrote:
On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
===
--- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 13:24:24 UTC
(rev 118862)
+++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 14:12:11 UTC
(rev 118863)
@@ -145,44 +145,28 @@
# Don't access data until necessary
def __init__(self, func, seq, length=None):
-self._seq = seq
-self._data = []
-self._func = func
+self._seq=seq
+self._func=func
if length is not None:
-self._len = length
+self._len=length
Please don't un-PEP8 a cleaned-up module: leave spaces around
operators. If that was unintentional (maybe you pasted from a copy of a
file made before the module was cleaned up?), then you still need to
review the diff and edit out unintended changes first.
yes, in was indeed unintentional, or rather unfortunate. i've originally
worked on the 2.12 branch (in a plone 4 buildout) where i think hanno's
pep8-related changes haven't been applied/back-ported yet. after that i pasted
the version into 2.13 (and yes, the commit order was the other way round :))
and was in fact wondering about the longer diff when hanno (who was actually
sitting next to me) pointed out he had changed the one-line if: and else:
expressions on that branch. however, my alias for svn diff tells diff to not
show white space changes so i missed the now again removed spaces around the =.
anyway, it's fixed in c118895.
else:
self._len = len(seq)
+self._marker = object()
+self._data = [self._marker] * self._len
Have you measured the impact of pre-allocating here for very large
sequences?
i had not, but have now. turns out the allocation makes a difference earlier
than i would have expected: accessing the first 20 items of the map is slower
with the new version starting at sequences of a little more than 2000 items.
and of course it (relatively) gets slower the longer the sequence is. setting
up a 10^5 empty sequence takes about 0.5 ms on my machine.
I'm *sure* that is a not a good trade-off for all cases: the catalog is
often used for queries which return result sets in the order of 10^5 or
more objects, and *very* rarely is it accessed randomly (I had never
seen it before reading the bug entry you cite). The normal case is to
take the first few entries (batch_size * batch_number) from the huge set.
true, but the fix wasn't only about this use-case. it's mainly about avoiding
extra function calls when fetching a batch other than the first. so far the
map function wall called for every item up to the one accessed, e.g. for the
5th batch of 20 items each, you'd end up with 80 unnecessary (and possible
expensive) calls. or iow, the old version wasn't really as lazy as it should
have been… :)
I think you would be better off finding a way to inject your style of
LazyMap into the catalog for the random-access case, e.g., by passing
'_lazy_map' into the methods you are using.
i disagree. i think it should be fixed properly upstream. we should really
avoid all that monkey-patching all over the place. but see below…
BTW, another interesting alternative impplementation might be to use a
dictionary instead of a list for 'data'. You could then avoid any
pre-allocation at all.
with the benchmark in place, i was curious and tried that as well: it beats
the other versions independently of the sequence length and the number of
accessed items, both sequentially and repeatedly over a single batch. in the
benchmark they're accessed from the start, but skipping a few batches would
make it even faster anyway. hence i've updated the code again in c118896.
please let me know what you think, or if you want the test script i've been
using.
While we're at it, the 'try: ... except:' here is wasteful and slow.
Lazy objects aren't persistent, and the '_seq' attribute is added
unconditionally in '__init__'.
you're right, this should be cleaned up as well. but since this isn't my
code i was trying to keep the diff minimal (like on 2.12, i.e. in c118864, it
obviously didn't work in the other one :)). in any case, it's also gone with
c118896.
cheers,
andi
--
pgp key at http://zitc.de/pgp - http://wwwkeys.de.pgp.net/
plone 4.0 released! -- http://plone.org/4
PGP.sig
Description: This is a digitally signed message part
___
Zope-Dev maillist - Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
https://mail.zope.org/mailman/listinfo/zope-announce
https://mail.zope.org/mailman/listinfo/zope )