Thanks for the clarification, Tim!

Due to the possible future use of 64-bit BTrees, I'm inclined to go for the 2-character fix for now and possibly remove it once we hit 64-bit everywhere.

Otherwise I'd agree on the fix using the comparison you mentioned:
>    (long)(int)some_C_long != some_C_long

(That would require me to activate my very rusty C-knowledge and to change and possibly break things in weird ways.)

So, could somebody (Fred? Jim?) jump in here and tell me which fix they are fine to use?


Tim Peters wrote:
thanks Tim, for the partial enlightenment. :) Unfortunately my
BTree/C-wisdom is much much smaller than yours, so I got to check a
couple of assumptions over here. :)

Yup, it really helps if you have a 64-bit box to check them on.

Hah. Looks like BTrees can accept 2**31 on machines where maxint is a
larger ...

Note that "accept" in this case means "silently loses the
most-significant bits", and that's a BTree bug on platforms where
sizeof(long) > sizeof(int):

I was able to reproduce this on an Intel 64-Bit machine (EM64T) running
Linux and gcc.

And I expect the same will happen on any 64-bit platform other than
Win64 (which is unique, AFAICT, in leaving sizeof(long) == 4 instead
of boosting it to 8).

For one: I didn't see any compiler warnings. That sounds bad, right?

The underlying BTree bug is assignments of the form (usually "hidden"
in macro expansions):

   some_C_int = some_C_long;

When sizeof(int) < sizeof(long), that can silently lose information.
I was really hoping that major compilers on boxes where sizeof(int) <
sizeof(long) would warn about that.  Oh well.

The problem arises because what _Python_ calls "int" is what C calls
"long", but the I-flavor BTree code stores C "int", and C "int"
doesn't correspond to any Python type (except "by accident" on 32-bit
boxes).  C "int" is 4 bytes on all known current 32- and 64-bit
platforms, but the size of what Python calls "int" varies.  The BTree
code isn't aware of the possible mismatch, storing Python "int" (C
"long") into C "int" without any checking.

2**31 doesn't actually lose any bits when you store it, but it will
probably misinterpret the high-order data bit as the sign bit when you
fetch it again, magically changing it into -2**31.

I can store 2**31 in the BTree, but the keys() method will tell you that
it actually stored -2**31.

Right.  If it's not clear, this is because the _bit_ pattern


is 2**31 when viewed as an 8-byte C long, but is -2**31 when viewed as
a 4-byte C long.  If you had, e.g., stored 2**32 instead, you would
have gotten 0 back when you fetched it (the top 32 bits are simply
thrown away).

Did you see my simpler suggestion for fixing the underlying bug (it
was a one-liner change to the original code)?  When you get tired of
fighting the 64-bit BTree bug here (it will be a minor miracle if the
test actually passes now on a 64-bit box, despite all you've tried
...), look that up ;-)

Nope, I didn't find your one-liner. :) Can you post it explicitly for my
blind eyes?

Here; it was a reply to one of the checkin messages:

The key problem in the original intid code is that it used randint()
instead of randrange(), theoretically allowing 2**31 to be a return
value.  It's _almost_ enough just to use randrange() instead.
Unfortunately, that's not quite enough; see the msg for what is.

It _could_ be fixed by adding 2 characters to the original, changing
2**31 to 2**31-2 (or to 0x7ffffffe), but that would leave it pretty
mysterious ;-)

I think I could have made up something that made it work, but I started
looking into making the BTree behave sanely.

My idea was, to modify the BTree code in a way that it actually checks
after the type cast whether the casted value is equal to the requested
key, or alternatively try making the CHECK_KEY macro do an "exact type
match" instead of allowing subclasses. But that wouldn't work either as
2**31 is still an int on those platforms. I'm a bit puzzled now. :)

There's potential silent information loss for both Ix-flavor BTree
keys and xI-flavor BTree values.  There are two robust ways to check
("robust" means that Python's C code has used these ways at various
times for many years now without problems).  Complain if:

   some_C_long < INT_MIN || some_C_long > INT_MAX

or complain if (this sounds close to what you had in mind above, and
is my favorite):

   (long)(int)some_C_long != some_C_long

Because the problem is due to bogus assumptions about the relationship
between C types, it's not going to help to examine Python's idea of

Checking isn't needed (can't fail) if SIZEOF_INT == SIZEOF_LONG
(Python.h supplies definitions for those macros), so there's some
worth to skipping checks when that's not true.  Unfortunately, C
doesn't allow "#if" preprocessor statements _inside_ macro expansions,
so the best way to do that isn't immediately clear.

In short, irritiating little issues abound :-(.  That's why I couldn't
make time to fix it (relatively high cost with no benefit on most Zope

Note that if ZODB moves to 64-bit Ix/xI BTrees on all boxes (IIRC, Jim
and Fred were agitating in that direction, but suffered massive
short-sighted ;-) opposition), the BTree problem would go away by
magic (C "int" would no longer be the type of Ix keys or xI values).
Zope3-dev mailing list

gocept gmbh & co. kg - forsterstra├če 29 - 06112 halle/saale - germany - [EMAIL PROTECTED] - phone +49 345 122 9889 7 -
fax +49 345 122 9889 1 - zope and plone consulting and development

Zope3-dev mailing list

Reply via email to