[Zope-dev] Re: 64-bit BTrees

2006-04-21 Thread Fred Drake
On 4/13/06, Fred Drake [EMAIL PROTECTED] wrote:
 I've created a feature development branch for
 this, and checked in my initial implementation.

I've made another branch for this, with a different twist.  I'm not
sure it'll be interesting, but I think it'll solve my immediate need
until I can get around to reasonable testing of the performance
implications.

The fdrake-optional-64bits branch will compile using the C int
type for I keys and values by default, and using the PY_LONG_LONG
type if ZODB_64BIT_INTS is defined.

This allows 64-bit BTrees by building ZODB like this:

python setup.py build_ext -D ZODB_64BIT_INTS build

BTrees.IIBTree.using64bits will be True if ZODB_64BIT_INTS is used.


  -Fred

--
Fred L. Drake, Jr.fdrake at gmail.com
Don't let schooling interfere with your education. -- Mark Twain
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: 64-bit BTrees

2006-04-18 Thread Chris Withers

Philipp von Weitershausen wrote:

in memory. Dieter estimates 20% to 35% slowdown for the C algorithms
(whatever that means), Tim seems to think it won't have such a big
effect. I guess we'll only know after some benchmarks.


Can we please not make any definite decisions until this issue has been 
resolved, some of us do actually care about performance ;-)


Yes, ideally, I'd like to see just IIBTree's but only if there are not 
perfomance implications. I think BTrees sit low enough in the stack that 
it's perfectly justifiable to have both an I BTree and an L BTree.


If having two isn't acceptable, then why do we have an I and O BTree's, 
not to mention the special ones used for in-memory ZODB indexes? Surely 
we should just have one BTree class?


cheers,

Chris

--
Simplistix - Content Management, Zope  Python Consulting
   - http://www.simplistix.co.uk

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: 64-bit BTrees

2006-04-17 Thread Jim Fulton

Tres Seaver wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Jim Fulton wrote:


Tres Seaver wrote:



-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Fred Drake wrote:



I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
and I'm not the first.  I've created a feature development branch for
this, and checked in my initial implementation.

I've modified the existing code to use PY_LONG_LONG instead of int for
the key and/or value type; there's no longer a 32-bit version in the
modified code.  Any Python int or long that can fit in 64 bits is
accepted; ValueError is raised for values that require 65 bits (or
more).  Keys and values that can be reported as Python ints are, and
longs are only returned when the value cannot be converted to a Python
int.

This can have a substantial effect on memory consumption, since keys
and/or values now take twice the space.  There may be performance
issues as well, but those have not been tested.

There are new unit tests, but more are likely needed.

If you're interested in getting the code from Subversion, it's
available at:

  svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/

Ideally, this or some variation on this could be folded back into the
main development for ZODB.  If this is objectionable, making 64-bit
btrees available would require introducing new versions of the btrees
(possibly named LLBTree, LOBTree, and OLBTree).




I think coming up with new types is the only reasonable thing to do,
given the prevalence of persistent BTrees out in the wild.  Changing the
runtime behavior (footprint, performance) of those objects is probably
not something which most users are going to want, at least not without
carefully considering the implications.



It really depends on what the impact is.  It would be nice to get a feel
for whether this really impacts memory or performance for real
applications.
This adds 4-bytes per key or value.  That isn't much, especially in a
typical
Zope application.  Similarly, it's hard to say what the difference in C
integer
operations will be.  I can easily imagine it being negligible (or being
significant :).

OTOH, adding a new type could be a huge PITA. We'd like to use these
with existing
catalog and index code, all of which uses IIBTrees.  If the performance
impacts are
modest, I'd much rather declare IIBTrees to use 64-bit rather than
32-bit integers.

I suppose an alternative would be to add a mechanism to configure
IIBTrees to use
either 32-bit or 64-bit integers at run-time.



Who uses IOBTree / OIBTree / IIBTree?

  - Catalogs map RIDs to UIDs as IOBTrees (one record per
indexed object)

  - Most indexes (those derived from Unindex) map RID to indexed value
as an IOBTree (one record per object with a value meaningful to that
index) and map values to RIDs as OOBTrees (where the second O is
usually an IITreeSet).

  - ZCTextIndex uses IIBTrees to map word IDs to RIDs, in various ways,
and make use of IOBTrees as wel..

  - Relationship indexes (typically not stored within catalogs)
usually have an IIBTree which is the mapping
of the edges as pairs of internal node IDs (one per explicit
relationship), with OIBTrees to map the user-supplied node value
to a node ID.

I would guess that if you could do a census of all the OIDs in all the
Datas.fs in the world, a significant majority of them would be instances
of classes declared in IOBTree / IIBTree (certainly the bulk of
*transaction* records are going to be tied up with them).


OK.  I think we are misscommunicating. Using 64 bits for IIBTrees
types would not in any way invalidate existing pickles.
64-bit IIBTrees types can be unpickled from existing data.
Of course, someone who created 64-bit BTrees type instances
that had values outside the 32-bit range would have trouble reading
these values with 32-bit IIBTrees,

The fact that IIBTrees is so widely used is exatly the reason
I want to use 64-bits for the existing types rather than having to
introduce a new type.

Jim

--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope )


Re: [ZODB-Dev] Re: [Zope-dev] Re: 64-bit BTrees

2006-04-17 Thread Fred Drake
On 4/17/06, Jim Fulton [EMAIL PROTECTED] wrote:
 The fact that IIBTrees is so widely used is exatly the reason
 I want to use 64-bits for the existing types rather than having to
 introduce a new type.

Oops, I was checking in the separated version of 64-bit BTrees while
this was landing in my inbox.  ;-/

Once we determine which approach we're going with, I should make an
alpha release of ZODB 3.7 and knit that into the Zope 3 trunk so we
can get more testing in context.


  -Fred

--
Fred L. Drake, Jr.fdrake at gmail.com
Don't let schooling interfere with your education. -- Mark Twain
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


[Zope-dev] Re: 64-bit BTrees

2006-04-17 Thread Philipp von Weitershausen
Fred Drake wrote:
 I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
 and I'm not the first.  I've created a feature development branch for
 this, and checked in my initial implementation.
 
 I've modified the existing code to use PY_LONG_LONG instead of int for
 the key and/or value type; there's no longer a 32-bit version in the
 modified code.  Any Python int or long that can fit in 64 bits is
 accepted; ValueError is raised for values that require 65 bits (or
 more).  Keys and values that can be reported as Python ints are, and
 longs are only returned when the value cannot be converted to a Python
 int.
 
 This can have a substantial effect on memory consumption, since keys
 and/or values now take twice the space.  There may be performance
 issues as well, but those have not been tested.
 
 There are new unit tests, but more are likely needed.
 
 If you're interested in getting the code from Subversion, it's available at:
 
 svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
 
 Ideally, this or some variation on this could be folded back into the
 main development for ZODB.  If this is objectionable, making 64-bit
 btrees available would require introducing new versions of the btrees
 (possibly named LLBTree, LOBTree, and OLBTree).

-1 to the L*BTree flavours. The 'long' type is disappearing from Python
anyways, why should the implementation detail worry us regarding BTrees?

+1 to making the I*BTree flavours 64bit-aware.

If Jim is right, existing pickles will be read back just fine and as
long as the integers stay under 32 bit, they won't even be bigger. The
only real implication therefore is the pushing around twice as many bits
in memory. Dieter estimates 20% to 35% slowdown for the C algorithms
(whatever that means), Tim seems to think it won't have such a big
effect. I guess we'll only know after some benchmarks.

Philipp

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


[Zope-dev] Re: 64-bit BTrees

2006-04-13 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Fred Drake wrote:
 I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
 and I'm not the first.  I've created a feature development branch for
 this, and checked in my initial implementation.
 
 I've modified the existing code to use PY_LONG_LONG instead of int for
 the key and/or value type; there's no longer a 32-bit version in the
 modified code.  Any Python int or long that can fit in 64 bits is
 accepted; ValueError is raised for values that require 65 bits (or
 more).  Keys and values that can be reported as Python ints are, and
 longs are only returned when the value cannot be converted to a Python
 int.
 
 This can have a substantial effect on memory consumption, since keys
 and/or values now take twice the space.  There may be performance
 issues as well, but those have not been tested.
 
 There are new unit tests, but more are likely needed.
 
 If you're interested in getting the code from Subversion, it's available at:
 
 svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/
 
 Ideally, this or some variation on this could be folded back into the
 main development for ZODB.  If this is objectionable, making 64-bit
 btrees available would require introducing new versions of the btrees
 (possibly named LLBTree, LOBTree, and OLBTree).

I think coming up with new types is the only reasonable thing to do,
given the prevalence of persistent BTrees out in the wild.  Changing the
runtime behavior (footprint, performance) of those objects is probably
not something which most users are going to want, at least not without
carefully considering the implications.


Tres.
- --
===
Tres Seaver  +1 202-558-7113  [EMAIL PROTECTED]
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEPpyu+gerLs4ltQ4RAmh1AJ9/dLigNMrQgIFNASKWbpvboapywwCePV22
/3d8kFGTjipAVCsy5fnuLa4=
=xe6v
-END PGP SIGNATURE-

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: 64-bit BTrees

2006-04-13 Thread Jim Fulton

Tres Seaver wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Fred Drake wrote:


I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
and I'm not the first.  I've created a feature development branch for
this, and checked in my initial implementation.

I've modified the existing code to use PY_LONG_LONG instead of int for
the key and/or value type; there's no longer a 32-bit version in the
modified code.  Any Python int or long that can fit in 64 bits is
accepted; ValueError is raised for values that require 65 bits (or
more).  Keys and values that can be reported as Python ints are, and
longs are only returned when the value cannot be converted to a Python
int.

This can have a substantial effect on memory consumption, since keys
and/or values now take twice the space.  There may be performance
issues as well, but those have not been tested.

There are new unit tests, but more are likely needed.

If you're interested in getting the code from Subversion, it's available at:

   svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/

Ideally, this or some variation on this could be folded back into the
main development for ZODB.  If this is objectionable, making 64-bit
btrees available would require introducing new versions of the btrees
(possibly named LLBTree, LOBTree, and OLBTree).



I think coming up with new types is the only reasonable thing to do,
given the prevalence of persistent BTrees out in the wild.  Changing the
runtime behavior (footprint, performance) of those objects is probably
not something which most users are going to want, at least not without
carefully considering the implications.


It really depends on what the impact is.  It would be nice to get a feel
for whether this really impacts memory or performance for real applications.
This adds 4-bytes per key or value.  That isn't much, especially in a typical
Zope application.  Similarly, it's hard to say what the difference in C integer
operations will be.  I can easily imagine it being negligible (or being
significant :).

OTOH, adding a new type could be a huge PITA. We'd like to use these with 
existing
catalog and index code, all of which uses IIBTrees.  If the performance impacts 
are
modest, I'd much rather declare IIBTrees to use 64-bit rather than 32-bit 
integers.

I suppose an alternative would be to add a mechanism to configure IIBTrees to 
use
either 32-bit or 64-bit integers at run-time.

Jim

--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Re: 64-bit BTrees

2006-04-13 Thread Tres Seaver
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Jim Fulton wrote:
 Tres Seaver wrote:
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Fred Drake wrote:

 I have a need for 64-bit BTrees (at least for IOBTree and OIBTree),
 and I'm not the first.  I've created a feature development branch for
 this, and checked in my initial implementation.

 I've modified the existing code to use PY_LONG_LONG instead of int for
 the key and/or value type; there's no longer a 32-bit version in the
 modified code.  Any Python int or long that can fit in 64 bits is
 accepted; ValueError is raised for values that require 65 bits (or
 more).  Keys and values that can be reported as Python ints are, and
 longs are only returned when the value cannot be converted to a Python
 int.

 This can have a substantial effect on memory consumption, since keys
 and/or values now take twice the space.  There may be performance
 issues as well, but those have not been tested.

 There are new unit tests, but more are likely needed.

 If you're interested in getting the code from Subversion, it's
 available at:

svn://svn.zope.org/repos/main/ZODB/branches/fdrake-64bits/

 Ideally, this or some variation on this could be folded back into the
 main development for ZODB.  If this is objectionable, making 64-bit
 btrees available would require introducing new versions of the btrees
 (possibly named LLBTree, LOBTree, and OLBTree).



 I think coming up with new types is the only reasonable thing to do,
 given the prevalence of persistent BTrees out in the wild.  Changing the
 runtime behavior (footprint, performance) of those objects is probably
 not something which most users are going to want, at least not without
 carefully considering the implications.
 
 
 It really depends on what the impact is.  It would be nice to get a feel
 for whether this really impacts memory or performance for real
 applications.
 This adds 4-bytes per key or value.  That isn't much, especially in a
 typical
 Zope application.  Similarly, it's hard to say what the difference in C
 integer
 operations will be.  I can easily imagine it being negligible (or being
 significant :).
 
 OTOH, adding a new type could be a huge PITA. We'd like to use these
 with existing
 catalog and index code, all of which uses IIBTrees.  If the performance
 impacts are
 modest, I'd much rather declare IIBTrees to use 64-bit rather than
 32-bit integers.
 
 I suppose an alternative would be to add a mechanism to configure
 IIBTrees to use
 either 32-bit or 64-bit integers at run-time.

Who uses IOBTree / OIBTree / IIBTree?

  - Catalogs map RIDs to UIDs as IOBTrees (one record per
indexed object)

  - Most indexes (those derived from Unindex) map RID to indexed value
as an IOBTree (one record per object with a value meaningful to that
index) and map values to RIDs as OOBTrees (where the second O is
usually an IITreeSet).

  - ZCTextIndex uses IIBTrees to map word IDs to RIDs, in various ways,
and make use of IOBTrees as wel..

  - Relationship indexes (typically not stored within catalogs)
usually have an IIBTree which is the mapping
of the edges as pairs of internal node IDs (one per explicit
relationship), with OIBTrees to map the user-supplied node value
to a node ID.

I would guess that if you could do a census of all the OIDs in all the
Datas.fs in the world, a significant majority of them would be instances
of classes declared in IOBTree / IIBTree (certainly the bulk of
*transaction* records are going to be tied up with them).


Tres.
- --
===
Tres Seaver  +1 202-558-7113  [EMAIL PROTECTED]
Palladion Software   Excellence by Designhttp://palladion.com
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFEPtfe+gerLs4ltQ4RAkDoAJ9998Bj5yMqVpKoQOn/s3Hf5GZkBwCcC4uY
kXTqmBsu6vMYx4fzAOWF5uo=
=yVVq
-END PGP SIGNATURE-
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope3-dev] Re: [Zope-dev] Re: 64-bit BTrees

2006-04-13 Thread Tim Peters
[Tres Seaver]
 ...
 I would guess that if you could do a census of all the OIDs in all the
 Datas.fs in the world, a significant majority of them would be instances
 of classes declared in IOBTree / IIBTree (certainly the bulk of
 *transaction* records are going to be tied up with them).

Provided it still works, people can use ZODB's analyze.py to figure that out.

But supposing I flavors of BTrees are the only objects that exist,
what follows from that?  It's not clear.  I can guarantee that
multiunion() will run slower, because its workhorse radix sort will
need 8 (instead of 4) passes.  Beyond that, it requires someone to try
it.  I'm reminded that when the MEMS Exchange wrote Durus (a kind of
ZODB lite ;-):

http://www.mems-exchange.org/software/durus/

)

they left their entire BTree implementation coded in Python -- it was
fast enough that way.  The difference between ZODB's BTree C code
pushing 4 or 8 bytes around at a time may well be insignificant
overall.

If done carefully, pickle sizes probably won't change:  cPickle has a
large number of ways to pickle integers, and picks the smallest one
needed to hold an integer's actual value.  Provided the internal
getstate() functions are careful to avoid Python longs when possible,
bigger pickles won't happen unless more than 32 bits are actually
needed to hold an integer.

There's also that ZODB's current I trees are badly broken on 64-bit
boxes (except for Win64) in at least this way:

http://collector.zope.org/Zope/1592

That problem would go away by magic.

looks-like-a-case-of-measure-twice-cut-once-ly y'rs  - tim
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )