Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)
Thomas Wouters wrote:
> But switching PyInts to use (a symbolic type of the same size as) Py_ssize_t means that, when the time comes that 32-bit architectures are rare, Win64 isn't left as the only platform (barring other LLP64 systems) that has slow 33-to-64-bit Python numbers (because they'd be PyLongs there, even though the platform has 64-bit registers.) Given the timeframe and the impact, though, perhaps we should just do it -- now -- in the p3yk branch and forget about 2.x; gives people all the more reason to switch, two years from now.

I thought Py3k will have a single integer type whose representation varies depending on the value being represented.

I haven't seen an actual proposal for such a type, so let me make one:

  struct PyInt{
    struct PyObject ob;
    Py_ssize_t value_or_size;
    char is_long;
    digit ob_digit[1];
  };

If is_long is false, then value_or_size is the value (represented as Py_ssize_t), else the value is in ob_digit, and value_or_size is the size.

PyLong_* will be synonyms for PyInt_*. PyInt_FromLong/AsLong will continue to exist; PyInt_AsLong will indicate an overflow with -1. Likewise, PyArg_ParseTuple i will continue to produce int, and raise an exception (OverflowError?) when the value is out of range.

C code can then decide whether to parse a Python integer as C int, long, long long, or ssize_t.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
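A minimal Python sketch of the representation proposed in the message above may help make the two cases concrete. It is a toy model only, not CPython code. The class name TaggedInt and the methods to_int and as_ssize_t are made up for illustration. Carrying the sign in value_or_size for the long case is an assumption borrowed from how the existing long type uses its size field, and raising OverflowError stands in for the C-level "return -1" convention.

```python
import sys

SHIFT = 15                    # longs currently use 15-bit digits
MASK = (1 << SHIFT) - 1
SSIZE_MAX = sys.maxsize       # largest Py_ssize_t on this platform
SSIZE_MIN = -sys.maxsize - 1


class TaggedInt:
    """Toy model (hypothetical name): one type, two representations."""

    def __init__(self, value):
        if SSIZE_MIN <= value <= SSIZE_MAX:
            # Short case: the field holds the value itself.
            self.is_long = False
            self.value_or_size = value
            self.ob_digit = []
        else:
            # Long case: the field holds the digit count; digits are
            # stored least-significant first, 15 bits each.
            self.is_long = True
            magnitude = abs(value)
            digits = []
            while magnitude:
                digits.append(magnitude & MASK)
                magnitude >>= SHIFT
            self.ob_digit = digits
            # Assumption: a negative size marks a negative value.
            self.value_or_size = len(digits) if value > 0 else -len(digits)

    def to_int(self):
        """Rebuild the Python integer from whichever representation is used."""
        if not self.is_long:
            return self.value_or_size
        magnitude = 0
        for digit in reversed(self.ob_digit):
            magnitude = (magnitude << SHIFT) | digit
        return magnitude if self.value_or_size > 0 else -magnitude

    def as_ssize_t(self):
        """Stand-in for a C accessor: refuse values that do not fit."""
        if self.is_long:
            raise OverflowError("value does not fit in Py_ssize_t")
        return self.value_or_size
```

Callers that only need the machine-word case would use as_ssize_t and handle OverflowError; everything else goes through to_int.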
Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)
[Adding the py3k list; please remove python-dev in followups.]

On 5/29/06, Martin v. Löwis wrote:
> I thought Py3k will have a single integer type whose representation varies depending on the value being represented.

That's one proposal. Another is to have an abstract 'int' type with two concrete subtypes, e.g. 'short' and 'long', corresponding to today's int and long. At the C level the API should be unified so C programmers are isolated from the difference (they aren't today).

> I haven't seen an actual proposal for such a type,

I'm not sure that my proposal above has ever been said out loud. I'm also not partial; I think we may have to do an experiment to decide.

> so let me make one:
>
>   struct PyInt{
>     struct PyObject ob;
>     Py_ssize_t value_or_size;
>     char is_long;
>     digit ob_digit[1];
>   };
>
> If is_long is false, then value_or_size is the value (represented as Py_ssize_t), else the value is in ob_digit, and value_or_size is the size.

Nice. I guess if we store the long value in big-endian order we could drop is_long, since the first digit of the long would always be nonzero. This would save a byte (on average) for the longs, but it would do nothing for the wasted space for short ints.

> PyLong_* will be synonyms for PyInt_*.

Why do we need to keep the PyLong_* APIs at all? Even at the Python level we're not planning any backward compatibility features; at the C level I like even more freedom to break things.

> PyInt_FromLong/AsLong will continue to exist; PyInt_AsLong will indicate an overflow with -1. Likewise, PyArg_ParseTuple i will continue to produce int, and raise an exception (OverflowError?) when the value is out of range.
>
> C code can then decide whether to parse a Python integer as C int, long, long long, or ssize_t.

Nice. I like the unified API and I like using Py_ssize_t instead of long for the value; this ensures that an int can hold a pointer (if we allow for signed pointers) and matches the native word size better on Windows (I guess it makes no difference for any other platform, where ssize_t and long already have the same size).

I worry about all the wasted space for alignment caused by the extra flag byte though. That would be 4 bytes per integer on 32-bit machines (where they are currently 12 bytes) and 8 bytes on 64-bit machines (where they are currently 24 bytes).

That's why I'd like my alternative proposal (int as ABC and two subclasses that may remain anonymous to the Python user); it'll save the alignment waste for short ints and will let us use a smaller int type for the size for long ints (if we care about the latter).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
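The alignment cost mentioned in the message above can be checked on whatever machine is at hand with a rough ctypes model of the two layouts. This is a sketch under stated assumptions: a non-debug object header of one Py_ssize_t refcount plus one type pointer, and digit modelled as unsigned short. The class names ObjectHead, CurrentInt and ProposedInt and the helper extra_bytes are invented for the illustration and do not exist in CPython.

```python
import ctypes


class ObjectHead(ctypes.Structure):
    # Assumed non-debug header: reference count plus type pointer.
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                ("ob_type", ctypes.c_void_p)]


class CurrentInt(ctypes.Structure):
    # Today's short int: header plus a single C long.
    _fields_ = [("ob", ObjectHead),
                ("ob_ival", ctypes.c_long)]


class ProposedInt(ctypes.Structure):
    # The proposed unified layout, field for field.
    _fields_ = [("ob", ObjectHead),
                ("value_or_size", ctypes.c_ssize_t),
                ("is_long", ctypes.c_char),
                ("ob_digit", ctypes.c_ushort * 1)]  # digit assumed 16 bits wide


def extra_bytes():
    """Bytes per object added by the flag, the digit slot and padding."""
    return ctypes.sizeof(ProposedInt) - ctypes.sizeof(CurrentInt)


print("current:", ctypes.sizeof(CurrentInt),
      "proposed:", ctypes.sizeof(ProposedInt),
      "extra:", extra_bytes())
```

Under these assumptions the extra cost comes out to one pointer-sized word per object, which is the 4-byte and 8-byte figure quoted above for 32-bit and 64-bit machines respectively.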
Re: [Python-Dev] Integer representation (Was: ssize_t question: longs in header files)
Guido van Rossum wrote:
>>   struct PyInt{
>>     struct PyObject ob;
>>     Py_ssize_t value_or_size;
>>     char is_long;
>>     digit ob_digit[1];
>>   };
>
> Nice. I guess if we store the long value in big-endian order we could drop is_long, since the first digit of the long would always be nonzero. This would save a byte (on average) for the longs, but it would do nothing for the wasted space for short ints.

Right; alternatively, the top-most bit of ob_digit[0] could also be used, as longs currently have 15-bit digits.

> Why do we need to keep the PyLong_* APIs at all? Even at the Python level we're not planning any backward compatibility features; at the C level I like even more freedom to break things.

Indeed, they should get dropped.

> I worry about all the wasted space for alignment caused by the extra flag byte though. That would be 4 bytes per integer on 32-bit machines (where they are currently 12 bytes) and 8 bytes on 64-bit machines (where they are currently 24 bytes).

I think ints should get managed by PyMalloc in Py3k. With my proposal, an int has 16 bytes on a 32-bit machine, so there wouldn't be any wastage for PyMalloc (which allocates 16 bytes for 12-byte objects, anyway). On a 64-bit machine, it would indeed waste 8 bytes per int.

> That's why I'd like my alternative proposal (int as ABC and two subclasses that may remain anonymous to the Python user); it'll save the alignment waste for short ints and will let us use a smaller int type for the size for long ints (if we care about the latter).

I doubt they can remain anonymous. People often dispatch by type (e.g. pickle, xmlrpclib, ...), and need to put the type into a dictionary. If the type is anonymous, they will do

  dispatch[type(0)] = marshal_int
  dispatch[type(sys.maxint+1)] = marshal_int

Plus, their current code has

  dispatch[int] = marshal_int

which will silently break (although it won't be silent if they also have dispatch[long] = marshal_long).

Regards,
Martin
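The dispatch concern in the message above can be illustrated with a small sketch. The classes Int, ShortInt and LongInt are hypothetical stand-ins for an abstract int with two concrete subtypes; nothing here reflects an actual Py3k design, and find_handler is an invented helper that mimics the exact-type dictionary lookup such code typically performs.

```python
class Int:
    """Hypothetical abstract integer type."""


class ShortInt(Int):
    """Hypothetical concrete subtype for machine-word values."""


class LongInt(Int):
    """Hypothetical concrete subtype for arbitrary-precision values."""


def marshal_int(obj):
    return "int"


def find_handler(dispatch, obj):
    # Exact-type lookup: subclass relationships are not consulted,
    # so a key on the abstract base never matches a concrete instance.
    return dispatch.get(type(obj))


# Existing code keyed on the one public name finds no handler...
keyed_on_base = {Int: marshal_int}

# ...so callers end up registering every concrete type, which means
# the concrete types cannot really stay anonymous.
keyed_on_concrete = {ShortInt: marshal_int, LongInt: marshal_int}
```

With keyed_on_base, find_handler returns None for both ShortInt and LongInt instances; what happens next depends on how the calling code treats a missing entry.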