Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)

2006-05-29 Thread Martin v. Löwis
Thomas Wouters wrote:
 But switching PyInts to use (a symbolic type of the same size as)
 Py_ssize_t means that, when the time comes that 32-bit architectures are
 rare, Win64 isn't left as the only platform (barring other LLP64
 systems) that has slow 33-to-64-bit Python numbers (because they'd be
 PyLongs there, even though the platform has 64-bit registers.) Given the
 timeframe and the impact, though, perhaps we should just do it -- now --
 in the p3yk branch and forget about 2.x; gives people all the more
 reason to switch, two years from now.

I thought Py3k would have a single integer type whose representation
varies depending on the value being represented.

I haven't seen an actual proposal for such a type, so let me make
one:

struct PyInt{
  struct PyObject ob;
  Py_ssize_t value_or_size;
  char is_long;
  digit ob_digit[1];
};

If is_long is false, then value_or_size is the value (represented
as Py_ssize_t), else the value is in ob_digit, and value_or_size
is the size.
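
A minimal sketch of how code might read such a tagged layout, written as
standalone C rather than against the real headers. The names
sketch_ssize_t, sketch_int, sketch_from_small, sketch_get_small and
sketch_digit_count are hypothetical, the PyObject header is left out,
and the sign handling of the digit form is not covered because the
proposal does not specify it.

```c
#include <stddef.h>

typedef ptrdiff_t sketch_ssize_t;  /* stand-in for Py_ssize_t (assumption) */
typedef unsigned short digit;      /* 15-bit digits stored in 16 bits */

/* Same fields as the proposal, minus the PyObject header. */
struct sketch_int {
    sketch_ssize_t value_or_size;  /* the value, or the digit count */
    char is_long;                  /* 0: small form, nonzero: digit form */
    digit ob_digit[1];             /* first digit; more would follow */
};

/* Hypothetical constructor for the small form. */
struct sketch_int sketch_from_small(sketch_ssize_t value)
{
    struct sketch_int v;
    v.value_or_size = value;
    v.is_long = 0;
    v.ob_digit[0] = 0;
    return v;
}

/* Stores the value in *out and returns 1 for the small form;
   returns 0 and leaves *out untouched for the digit form. */
int sketch_get_small(const struct sketch_int *v, sketch_ssize_t *out)
{
    if (v->is_long)
        return 0;
    *out = v->value_or_size;
    return 1;
}

/* Number of digits in use; 0 for the small form.
   Assumes the size is stored as a non-negative count. */
size_t sketch_digit_count(const struct sketch_int *v)
{
    return v->is_long ? (size_t)v->value_or_size : 0;
}
```

A caller would try sketch_get_small first and fall back to the digit
array only when it returns 0.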

PyLong_* will be synonyms for PyInt_*. PyInt_FromLong/AsLong will
continue to exist; PyInt_AsLong will indicate an overflow with -1.
Likewise, the PyArg_ParseTuple "i" format will continue to produce int, and
raise an exception (OverflowError?) when the value is out of range.

C code can then decide whether to parse a Python integer as
C int, long, long long, or ssize_t.
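
The range check behind such a narrowing conversion can be sketched in
standalone C as below. The helper sketch_as_int is hypothetical and not
part of any Python API. It reports overflow through a separate status
code, because a bare -1 return is also a legitimate value (with the
real PyInt_AsLong, callers have to check PyErr_Occurred() to tell the
two apart).

```c
#include <limits.h>

/* Hypothetical helper: narrow a wide integer to a C int.
   Returns 0 and stores the result in *out on success;
   returns -1 and leaves *out untouched if the value does not fit. */
int sketch_as_int(long long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}
```

The same pattern would apply to long, long long or ssize_t targets,
with the limits swapped accordingly.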

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)

2006-05-29 Thread Guido van Rossum
[Adding the py3k list; please remove python-dev in followups.]

On 5/29/06, Martin v. Löwis wrote:
 I thought Py3k will have a single integer type whose representation
 varies depending on the value being represented.

That's one proposal. Another is to have an abstract 'int' type with
two concrete subtypes, e.g. 'short' and 'long', corresponding to
today's int and long. At the C level the API should be unified so C
programmers are isolated from the difference (they aren't today).

 I haven't seen an actual proposal for such a type,

I'm not sure that my proposal above has ever been said out loud. I'm
also not partial; I think we may have to do an experiment to decide.

 so let me make one:

 struct PyInt{
   struct PyObject ob;
   Py_ssize_t value_or_size;
   char is_long;
   digit ob_digit[1];
 };

 If is_long is false, then value_or_size is the value (represented
 as Py_ssize_t), else the value is in ob_digit, and value_or_size
 is the size.

Nice. I guess if we store the long value in big-endian order we could
drop is_long, since the first digit of the long would always be
nonzero. This would save a byte (on average) for the longs, but it
would do nothing for the wasted space for short ints.
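
One reading of this idea, as a rough standalone sketch: with the most
significant digit stored first and always nonzero for a normalized
long, a zero in ob_digit[0] could mark the small form. The struct and
function names are hypothetical, and the PyObject header is omitted.

```c
#include <stddef.h>

typedef unsigned short digit;  /* 15-bit digits stored in 16 bits */

/* Layout without the is_long flag. */
struct sketch_int_noflag {
    ptrdiff_t value_or_size;   /* the value, or the digit count */
    digit ob_digit[1];         /* most significant digit first */
};

/* Nonzero first digit: digit form. Zero first digit: small form. */
int sketch_is_long(const struct sketch_int_noflag *v)
{
    return v->ob_digit[0] != 0;
}
```

The ob_digit slot is still present in every object, which is why this
variant does not help the small form.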

 PyLong_* will be synonyms for PyInt_*.

Why do we need to keep the PyLong_* APIs at all? Even at the Python
level we're not planning any backward compatibility features; at the C
level I like even more freedom to break things.

 PyInt_FromLong/AsLong will
 continue to exist; PyInt_AsLong will indicate an overflow with -1.
 Likewise, PyArg_ParseTuple i will continue to produce int, and
 raise an exception (OverflowError?) when the value is out of range.

 C code can then decide whether to parse a Python integer as
 C int, long, long long, or ssize_t.

Nice. I like the unified API and I like using Py_ssize_t instead of
long for the value; this ensures that an int can hold a pointer (if we
allow for signed pointers) and matches the native word size better on
Windows (I guess it makes no difference for any other platform, where
ssize_t and long already have the same size).

I worry about all the wasted space for alignment caused by the extra
flag byte though. That would be 4 bytes per integer on 32-bit machines
(where they are currently 12 bytes) and 8 bytes on 64-bit machines
(where they are currently 24 bytes).
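
The arithmetic can be checked with sizeof on stand-in structs. This is
a sketch under assumptions: sketch_header imitates an object header of
two pointer-sized fields, ptrdiff_t stands in for Py_ssize_t, and the
compiler is assumed to use the usual natural alignment. All names are
hypothetical.

```c
#include <stddef.h>

typedef unsigned short digit;  /* 15-bit digits stored in 16 bits */

/* Stand-in for the object header: reference count plus type pointer. */
struct sketch_header {
    ptrdiff_t refcnt;
    void *type;
};

/* Today's short int: header plus one word for the value. */
struct sketch_plain_int {
    struct sketch_header ob;
    ptrdiff_t value;
};

/* The proposed layout: flag byte and first digit added. */
struct sketch_flagged_int {
    struct sketch_header ob;
    ptrdiff_t value_or_size;
    char is_long;
    digit ob_digit[1];
};

size_t sketch_plain_size(void)
{
    return sizeof(struct sketch_plain_int);
}

size_t sketch_flagged_size(void)
{
    return sizeof(struct sketch_flagged_int);
}

/* Extra bytes per object caused by the flag, the digit slot and padding. */
size_t sketch_flag_overhead(void)
{
    return sketch_flagged_size() - sketch_plain_size();
}
```

On common ABIs the overhead comes out as one machine word: the flag
byte and the 2-byte digit slot are padded up to the struct's alignment.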

That's why I'd like my alternative proposal (int as ABC and two
subclasses that may remain anonymous to the Python user); it'll save
the alignment waste for short ints and will let us use a smaller int
type for the size for long ints (if we care about the latter).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)

2006-05-29 Thread Martin v. Löwis
Guido van Rossum wrote:
 struct PyInt{
   struct PyObject ob;
   Py_ssize_t value_or_size;
   char is_long;
   digit ob_digit[1];
 };

 
 Nice. I guess if we store the long value in big-endian order we could
 drop is_long, since the first digit of the long would always be
 nonzero. This would save a byte (on average) for the longs, but it
 would do nothing for the wasted space for short ints.

Right; alternatively, the top-most bit of ob_digit[0] could also be
used, since longs currently have 15-bit digits.
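
A sketch of that bit trick in standalone C, assuming a 16-bit storage
unit holding a 15-bit digit so that bit 15 is free. The macro and
function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_SHIFT 15
#define SKETCH_MASK      ((uint16_t)((1u << SKETCH_SHIFT) - 1))  /* 0x7FFF */
#define SKETCH_LONG_FLAG ((uint16_t)(1u << SKETCH_SHIFT))        /* 0x8000 */

/* Set the spare top bit to mark the digit form. */
uint16_t sketch_mark_long(uint16_t d)
{
    return (uint16_t)(d | SKETCH_LONG_FLAG);
}

/* Test the spare top bit. */
int sketch_is_long(uint16_t d)
{
    return (d & SKETCH_LONG_FLAG) != 0;
}

/* Strip the flag to recover the 15-bit digit. */
uint16_t sketch_digit_value(uint16_t d)
{
    return (uint16_t)(d & SKETCH_MASK);
}
```

Every read of ob_digit[0] would then have to go through the mask, which
is the cost of folding the flag into the digit.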

 Why do we need to keep the PyLong_* APIs at all? Even at the Python
 level we're not planning any backward compatibility features; at the C
 level I like even more freedom to break things.

Indeed, they should get dropped.

 I worry about all the wasted space for alignment caused by the extra
 flag byte though. That would be 4 bytes per integer on 32-bit machines
 (where they are currently 12 bytes) and 8 bytes on 64-bit machines
 (where they are currently 24 bytes).

I think ints should get managed by PyMalloc in Py3k. With my proposal,
an int has 16 bytes on a 32-bit machine, so there wouldn't be any
wastage for PyMalloc (which allocates 16 bytes for 12-byte objects,
anyway). On a 64-bit machine, it would indeed waste 8 bytes per
int.
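
The rounding behind these numbers can be sketched as follows, assuming
an allocator that hands out blocks in multiples of 8 bytes. The macro
SKETCH_ALIGNMENT and the function sketch_round_up are hypothetical
names, not PyMalloc's own.

```c
#include <stddef.h>

#define SKETCH_ALIGNMENT 8  /* assumed block granularity in bytes */

/* Round a request up to the next multiple of SKETCH_ALIGNMENT. */
size_t sketch_round_up(size_t nbytes)
{
    return (nbytes + SKETCH_ALIGNMENT - 1) & ~(size_t)(SKETCH_ALIGNMENT - 1);
}
```

Under that assumption a 12-byte object already occupies a 16-byte
block, so growing it to 16 bytes costs nothing, while growing a 24-byte
object to 32 bytes costs a full 8 bytes.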

 That's why I'd like my alternative proposal (int as ABC and two
 subclasses that may remain anonymous to the Python user); it'll save
 the alignment waste for short ints and will let us use a smaller int
 type for the size for long ints (if we care about the latter).

I doubt they can remain anonymous. People often dispatch by type
(e.g. pickle, xmlrpclib, ...), and need to put the type into a
dictionary. If the type is anonymous, they will do

   dispatch[type(0)] = marshal_int
   dispatch[type(sys.maxint+1)] = marshal_int

Plus, their current code has

   dispatch[int] = marshal_int

which will silently break (although it won't be silent if they also
have dispatch[long] = marshal_long).

Regards,
Martin
