Re: [Zope-dev] SVN: Zope/branches/2.13/ fix `LazyMap` to avoid unnecessary function calls when not accessing items in order (fixes http://dev.plone.org/plone/ticket/9018)

2010-12-15 Thread Andreas Zeidler
hi tres,

On 14.12.2010, at 18:11, Tres Seaver wrote:
 On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
 Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
 ===
 --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 13:24:24 UTC 
 (rev 118862)
 +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py 2010-12-14 14:12:11 UTC 
 (rev 118863)
 @@ -145,44 +145,28 @@
# Don't access data until necessary
 
def __init__(self, func, seq, length=None):
 -self._seq = seq
 -self._data = []
 -self._func = func
 +self._seq=seq
 +self._func=func
if length is not None:
 -self._len = length
 +self._len=length
 
 Please don't un-PEP8 a cleaned-up module:  leave spaces around
 operators.  If that was unintentional (maybe you pasted from a copy of a
 file made before the module was cleaned up?), then you still need to
 review the diff and edit out unintended changes first.

yes, in was indeed unintentional, or rather unfortunate.  i've originally 
worked on the 2.12 branch (in a plone 4 buildout) where i think hanno's 
pep8-related changes haven't been applied/back-ported yet.  after that i pasted 
the version into 2.13 (and yes, the commit order was the other way round :)) 
and was in fact wondering about the longer diff when hanno (who was actually 
sitting next to me) pointed out he had changed the one-line if: and else: 
expressions on that branch.  however, my alias for svn diff tells diff to not 
show white space changes so i missed the now again removed spaces around the =.

anyway, it's fixed in c118895.

else:
self._len = len(seq)
 +self._marker = object()
 +self._data = [self._marker] * self._len
 
 Have you measured the impact of pre-allocating here for very large
 sequences?

i had not, but have now.  turns out the allocation makes a difference earlier 
than i would have expected:  accessing the first 20 items of the map is slower 
with the new version starting at sequences of a little more than 2000 items.  
and of course it (relatively) gets slower the longer the sequence is.  setting 
up a 10^5 empty sequence takes about 0.5 ms on my machine.

 I'm *sure* that is a not a good trade-off for all cases:  the catalog is
 often used for queries which return result sets in the order of 10^5 or
 more objects, and *very* rarely is it accessed randomly (I had never
 seen it before reading the bug entry you cite).  The normal case is to
 take the first few entries (batch_size * batch_number) from the huge set.

true, but the fix wasn't only about this use-case.  it's mainly about avoiding 
extra function calls when fetching a batch other than the first.  so far the 
map function wall called for every item up to the one accessed, e.g. for the 
5th batch of 20 items each, you'd end up with 80 unnecessary (and possible 
expensive) calls.  or iow, the old version wasn't really as lazy as it should 
have been… :)

 I think you would be better off finding a way to inject your style of
 LazyMap into the catalog for the random-access case, e.g., by passing
 '_lazy_map' into the methods you are using.

i disagree.  i think it should be fixed properly upstream.  we should really 
avoid all that monkey-patching all over the place.  but see below…

 BTW, another interesting alternative impplementation might be to use a
 dictionary instead of a list for 'data'.  You could then avoid any
 pre-allocation at all.

with the benchmark in place, i was curious and tried that as well:  it beats 
the other versions independently of the sequence length and the number of 
accessed items, both sequentially and repeatedly over a single batch.  in the 
benchmark they're accessed from the start, but skipping a few batches would 
make it even faster anyway.  hence i've updated the code again in c118896.  
please let me know what you think, or if you want the test script i've been 
using.

 While we're at it, the 'try: ... except:'  here is wasteful and slow.
 Lazy objects aren't persistent, and the '_seq' attribute is added
 unconditionally in '__init__'.

you're right, this should be cleaned up as well.  but since this isn't my 
code i was trying to keep the diff minimal (like on 2.12, i.e. in c118864, it 
obviously didn't work in the other one :)).  in any case, it's also gone with 
c118896.

cheers,


andi

-- 
pgp key at http://zitc.de/pgp - http://wwwkeys.de.pgp.net/
plone 4.0 released! -- http://plone.org/4



PGP.sig
Description: This is a digitally signed message part
___
Zope-Dev maillist  -  Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 https://mail.zope.org/mailman/listinfo/zope-announce
 https://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] SVN: Zope/branches/2.13/ fix `LazyMap` to avoid unnecessary function calls when not accessing items in order (fixes http://dev.plone.org/plone/ticket/9018)

2010-12-14 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
 Log message for revision 118863:
 fix `LazyMap` to avoid unnecessary function calls when not accessing
 items in order (fixes http://dev.plone.org/plone/ticket/9018)

First, thanks for looking into this issue.

 Changed:
   U   Zope/branches/2.13/doc/CHANGES.rst
   U   Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
   U   Zope/branches/2.13/src/Products/ZCatalog/tests/test_lazy.py
 
 -=-
 Modified: Zope/branches/2.13/doc/CHANGES.rst
 ===
 --- Zope/branches/2.13/doc/CHANGES.rst2010-12-14 13:24:24 UTC (rev 
 118862)
 +++ Zope/branches/2.13/doc/CHANGES.rst2010-12-14 14:12:11 UTC (rev 
 118863)
 @@ -11,6 +11,8 @@
  Bugs Fixed
  ++
  
 +- Fix `LazyMap` to avoid unnecessary function calls.
 +
  - LP 686664: WebDAV Lock Manager ZMI view wasn't accessible.
  
  2.13.1 (2010-12-07)
 
 Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
 ===
 --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py  2010-12-14 13:24:24 UTC 
 (rev 118862)
 +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py  2010-12-14 14:12:11 UTC 
 (rev 118863)
 @@ -145,44 +145,28 @@
  # Don't access data until necessary
  
  def __init__(self, func, seq, length=None):
 -self._seq = seq
 -self._data = []
 -self._func = func
 +self._seq=seq
 +self._func=func
  if length is not None:
 -self._len = length
 +self._len=length

Please don't un-PEP8 a cleaned-up module:  leave spaces around
operators.  If that was unintentional (maybe you pasted from a copy of a
file made before the module was cleaned up?), then you still need to
review the diff and edit out unintended changes first.

  else:
  self._len = len(seq)
 +self._marker = object()
 +self._data = [self._marker] * self._len

Have you measured the impact of pre-allocating here for very large
sequences?  You are trading off space in return for speed for the
(relatively) rare case of random-access indexing off-the end.

I'm *sure* that is a not a good trade-off for all cases:  the catalog is
often used for queries which return result sets in the order of 10^5 or
more objects, and *very* rarely is it accessed randomly (I had never
seen it before reading the bug entry you cite).  The normal case is to
take the first few entries (batch_size * batch_number) from the huge set.

I think you would be better off finding a way to inject your style of
LazyMap into the catalog for the random-access case, e.g., by passing
'_lazy_map' into the methods you are using.

BTW, another interesting alternative impplementation might be to use a
dictionary instead of a list for 'data'.  You could then avoid any
pre-allocation at all.

 -def __getitem__(self, index):
 -data = self._data
 +def __getitem__(self,index):
 +data=self._data
  try:
 -s = self._seq
 +s=self._seq
  except AttributeError:
  return data[index]

While we're at it, the 'try: ... except:'  here is wasteful and slow.
Lazy objects aren't persistent, and the '_seq' attribute is added
unconditionally in '__init__'.

 -i = index
 -if i  0:
 -i = len(self) + i
 -if i  0:
 -raise IndexError(index)
 +value = data[index]
 +if value is self._marker:
 +value = data[index] = self._func(s[index])
 +return value
 -ind = len(data)
 -if i  ind:
 -return data[i]
 -ind = ind - 1
  
 -func = self._func
 -while i  ind:
 -try:
 -ind = ind + 1
 -data.append(func(s[ind]))
 -except IndexError:
 -del self._func
 -del self._seq
 -raise IndexError(index)
 -return data[i]
 -
 -

Finally, whatever cleanup we settle on also needs to be forward-ported
to the trunk.


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0HpV0ACgkQ+gerLs4ltQ4FsQCg2zpfEn+Hih6lNBqboQJDDBl5
7cIAn1rnqKtXsBv7j/s+PlcK0Nals7m1
=b/uG
-END PGP SIGNATURE-

___
Zope-Dev maillist  -  Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 https://mail.zope.org/mailman/listinfo/zope-announce
 https://mail.zope.org/mailman/listinfo/zope )