Re: [Python-Dev] Type of range object members
Greg Ewing wrote:
>>> Also it means you'd pay a penalty every time you access it
>> That penalty is already paid today.
> You'd still have that penalty, *plus* the overhead of bit masking
> to get at the value.

No, the penalty gets smaller if there is only a single type. For example, abstract.c now has

    if (res && (!PyInt_Check(res) && !PyLong_Check(res))) {
        PyErr_Format(PyExc_TypeError,
                     "__int__ returned non-int (type %.200s)",
                     res->ob_type->tp_name);
        Py_DECREF(res);
        return NULL;
    }

Currently, if a long int is returned, it performs two subtype tests. If the long type is dropped, the second test can go away. In this specific code, there is no penalty for a representation flag, since the value is not accessed.

Code that wants to support both int and long and needs the value often calls PyLong_AsLong these days, which supports int as well. This currently reads

    if (vv == NULL || !PyLong_Check(vv)) {
        if (vv != NULL && PyInt_Check(vv))
            return PyInt_AsLong(vv);
        PyErr_BadInternalCall();
        return -1;
    }

Notice that this checks twice whether the object is an int, and both are subtype checks. With a single type, this would become

    if (vv == NULL || !PyInt_Check(vv)) {
        PyErr_BadInternalCall();
        return -1;
    }
    if (!vv->ob_size)
        return PyInt_AsLong(vv);

Actually, the implementation of PyInt_AsLong might get inlined; it currently starts with a third PyInt_Check.

So overall, I would expect that a single type would improve performance, not decrease it. As you say, any change is likely not noticeable in performance, though, either way.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Type of range object members
Neal Norwitz wrote:
> It would change the CheckExact()s from: op->ob_type ==
> global-variable, to: op->ob_type & CONSTANT == CONSTANT. Check would
> be the same as the CheckExact, just with different constants. The
> Check version would then drop the || condition.

Hmm. I still don't see the need for the FAST_SUBCLASS bit. I would set the relevant bit in the type object itself, and then have

    #define PyInt_CheckExact(op) ((op)->ob_type == &PyInt_Type)
    #define PyInt_Check(op) \
        PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_INT_SUBCLASS)

Then, in inherit_special, I'd do

    type->tp_flags |= base->tp_flags & Py_TPFLAGS_FAST_SUBCLASS_MASK;

So you would have a pointer comparison for the exact check, and the bit mask check for the subtype check.

It's likely that the pointer comparison is still more efficient: it does *not*, normally, need to read a global variable to get the address of PyInt_Type. Currently, on x86, with non-PIC code on Linux, the pointer check compiles as

    cmpl $PyInt_Type, 4(%eax)    ; %eax is the object

where the linker fills the address of PyInt_Type into the machine instruction. OTOH, the access to the flags compiles as

    movl 4(%eax), %eax           ; %eax is the object
    movl 84(%eax), %eax
    andl $2013265920, %eax
    cmpl $2013265920, %eax

Even with PIC code, the address check is still more efficient:

    movl [EMAIL PROTECTED](%ecx), %eax
    cmpl %eax, 4(%edx)           ; %edx is the object

Regards,
Martin
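The flag-inheritance step Martin describes can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `TypeModel` and `fast_subclass` are hypothetical names, the mask is taken to occupy bits 30-27 (0x78000000, which equals the 2013265920 constant in the assembly above), and the comparison `(flags & MASK) == flag` is inferred from that assembly rather than from the macro's definition, which is not shown in the thread.

```python
# Hypothetical model of the tp_flags scheme; not CPython code.
FAST_SUBCLASS_MASK = 0x78000000           # assumed: bits 30-27
FAST_SUBCLASS = 1 << 27
INT_SUBCLASS = FAST_SUBCLASS | 0x70000000
LIST_SUBCLASS = FAST_SUBCLASS | 0x50000000


class TypeModel:
    """Stand-in for a type object that carries a tp_flags word."""

    def __init__(self, name, base=None, own_flags=0):
        self.name = name
        self.base = base
        self.tp_flags = own_flags
        if base is not None:
            # Mirrors: type->tp_flags |= base->tp_flags & MASK
            self.tp_flags |= base.tp_flags & FAST_SUBCLASS_MASK


def fast_subclass(t, flag):
    """Constant-time subtype test: mask the flags word and compare."""
    return (t.tp_flags & FAST_SUBCLASS_MASK) == flag


int_t = TypeModel("int", own_flags=INT_SUBCLASS)
myint_t = TypeModel("myint", base=int_t)   # inherits the int bits
other_t = TypeModel("other")               # unrelated type, no bits set
```

Because the subclass copies the masked bits from its base at creation time, the later check never has to walk the base chain.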
Re: [Python-Dev] Type of range object members
Greg Ewing wrote:
> Martin v. Löwis wrote:
>> We had this discussion before; if you use ob_size==0 to indicate
>> that it's an int, this space isn't needed in a long int.
> What about int subclasses?

It's what Guido proposes. It would still leave two types (perhaps three) at the C level, so C code might have to keep conditional code depending on which of these it is. Also, Python code that dispatches by type still needs to make the distinction.

Regards,
Martin
Re: [Python-Dev] Type of range object members
Greg Ewing wrote:
> Guido van Rossum wrote:
>> I worry that dropping the special allocator will be too slow.
> Surely there's some compromise that would allow recently-used ints
> to be kept around, but reclaimed if memory becomes low?

Hardly. The efficiency of the special-case allocator also comes from the fact that it doesn't ever have to release memory. Just try changing it to see what I mean.

Regards,
Martin
Re: [Python-Dev] Type of range object members
Greg Ewing wrote:
> There isn't? Actually a lot of APIs currently assume that.
> Also it means you'd pay a penalty every time you access it,
> whereas presumably short ints are the case we want to optimise
> for speed as well.

That penalty is already paid today. Much code dealing with ints has a type test for whether it's an int or a long. If int and long become subtypes of each other or of some abstract type, performance will decrease even more, because a subtype test is quite expensive if the object is neither int nor long (it has to traverse the entire base type hierarchy to find out it's not inherited from int).

Regards,
Martin
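To make the cost argument concrete, here is a simplified Python model of a subtype test that walks the base chain. It is an illustration only: `slow_is_subtype` is a hypothetical helper that follows `__base__` links and counts steps, and it does not reproduce CPython's actual subtype routine.

```python
def slow_is_subtype(t, target):
    """Walk the single-inheritance base chain, counting visited types.

    Returns (found, steps). A miss costs one step per type in the chain,
    which is the worst case described in the message above.
    """
    steps = 0
    while t is not None:
        steps += 1
        if t is target:
            return True, steps
        t = t.__base__          # object.__base__ is None, ending the loop
    return False, steps


class A:
    pass


class B(A):
    pass


hit = slow_is_subtype(bool, int)   # bool -> int: found on the second step
miss = slow_is_subtype(B, int)     # B -> A -> object: whole chain visited
```

A hit near the bottom of the hierarchy is cheap; a miss always pays for the full chain, which is why the test is most expensive for objects that are neither int nor long.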
Re: [Python-Dev] Type of range object members
On 8/15/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
> That penalty is already paid today. Much code dealing with ints
> has a type test whether it's an int or a long. If int and long
> become subtypes of each other or of some abstract type, performance
> will decrease even more because a subtype test is quite expensive
> if the object is neither int nor long (it has to traverse the entire
> base type hierarchy to find out it's not inherited from int).

I was playing around with a little patch to avoid that penalty. It doesn't take any additional memory, just a handful of bits we aren't using. :-)

For the more common builtin types, it stores whether it's a subclass in tp_flags, so there's no function call necessary and it's a constant-time operation. It was faster when doing simple stuff. Haven't thought much about whether this is really worthwhile or not.

n

Index: Include/stringobject.h
===================================================================
--- Include/stringobject.h (revision 51237)
+++ Include/stringobject.h (working copy)
@@ -55,8 +55,9 @@
 PyAPI_DATA(PyTypeObject) PyBaseString_Type;
 PyAPI_DATA(PyTypeObject) PyString_Type;
 
-#define PyString_Check(op) PyObject_TypeCheck(op, &PyString_Type)
 #define PyString_CheckExact(op) ((op)->ob_type == &PyString_Type)
+#define PyString_Check(op) (PyString_CheckExact(op) || \
+ PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_STRING_SUBCLASS))
 
 PyAPI_FUNC(PyObject *) PyString_FromStringAndSize(const char *, Py_ssize_t);
 PyAPI_FUNC(PyObject *) PyString_FromString(const char *);
Index: Include/dictobject.h
===================================================================
--- Include/dictobject.h (revision 51237)
+++ Include/dictobject.h (working copy)
@@ -90,8 +90,9 @@
 
 PyAPI_DATA(PyTypeObject) PyDict_Type;
 
-#define PyDict_Check(op) PyObject_TypeCheck(op, &PyDict_Type)
 #define PyDict_CheckExact(op) ((op)->ob_type == &PyDict_Type)
+#define PyDict_Check(op) (PyDict_CheckExact(op) || \
+ PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_DICT_SUBCLASS))
 
 PyAPI_FUNC(PyObject *) PyDict_New(void);
 PyAPI_FUNC(PyObject *) PyDict_GetItem(PyObject *mp, PyObject *key);
Index: Include/unicodeobject.h
===================================================================
--- Include/unicodeobject.h (revision 51237)
+++ Include/unicodeobject.h (working copy)
@@ -390,8 +390,9 @@
 
 PyAPI_DATA(PyTypeObject) PyUnicode_Type;
 
-#define PyUnicode_Check(op) PyObject_TypeCheck(op, &PyUnicode_Type)
 #define PyUnicode_CheckExact(op) ((op)->ob_type == &PyUnicode_Type)
+#define PyUnicode_Check(op) (PyUnicode_CheckExact(op) || \
+ PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_UNICODE_SUBCLASS))
 
 /* Fast access macros */
 #define PyUnicode_GET_SIZE(op) \
Index: Include/intobject.h
===================================================================
--- Include/intobject.h (revision 51237)
+++ Include/intobject.h (working copy)
@@ -27,8 +27,9 @@
 
 PyAPI_DATA(PyTypeObject) PyInt_Type;
 
-#define PyInt_Check(op) PyObject_TypeCheck(op, &PyInt_Type)
 #define PyInt_CheckExact(op) ((op)->ob_type == &PyInt_Type)
+#define PyInt_Check(op) (PyInt_CheckExact(op) || \
+ PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_INT_SUBCLASS))
 
 PyAPI_FUNC(PyObject *) PyInt_FromString(char*, char**, int);
 #ifdef Py_USING_UNICODE
Index: Include/listobject.h
===================================================================
--- Include/listobject.h (revision 51237)
+++ Include/listobject.h (working copy)
@@ -40,8 +40,9 @@
 
 PyAPI_DATA(PyTypeObject) PyList_Type;
 
-#define PyList_Check(op) PyObject_TypeCheck(op, &PyList_Type)
 #define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type)
+#define PyList_Check(op) (PyList_CheckExact(op) || \
+ PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_LIST_SUBCLASS))
 
 PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
 PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *);
Index: Include/object.h
===================================================================
--- Include/object.h (revision 51237)
+++ Include/object.h (working copy)
@@ -517,6 +517,18 @@
 /* Objects support nb_index in PyNumberMethods */
 #define Py_TPFLAGS_HAVE_INDEX (1L<<17)
 
+/* These flags are used to determine if a type is a subclass. */
+/* Uses bits 30-27. */
+#define Py_TPFLAGS_FAST_SUBCLASS_MASK (0x78000000)
+#define Py_TPFLAGS_FAST_SUBCLASS (1L<<27)
+#define Py_TPFLAGS_INT_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x70000000)
+#define Py_TPFLAGS_LONG_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x60000000)
+#define Py_TPFLAGS_LIST_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x50000000)
+#define Py_TPFLAGS_TUPLE_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x40000000)
+#define Py_TPFLAGS_STRING_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x30000000)
+#define Py_TPFLAGS_UNICODE_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x20000000)
+#define Py_TPFLAGS_DICT_SUBCLASS (Py_TPFLAGS_FAST_SUBCLASS | 0x10000000)
+
 #define Py_TPFLAGS_DEFAULT ( \
 Py_TPFLAGS_HAVE_GETCHARBUFFER | \
Re: [Python-Dev] Type of range object members
    Guido> I worry that dropping the special allocator will be too slow.

    Greg> Surely there's some compromise that would allow recently-used ints
    Greg> to be kept around, but reclaimed if memory becomes low?

    Martin> Hardly. The efficiency of the special-case allocator also comes
    Martin> from the fact that it doesn't ever have to release memory. Just
    Martin> try changing it to see what I mean.

Wouldn't use of obmalloc offset much of that? Before obmalloc was available, the int free list was a huge win. Is it likely to be such a huge win today?

Skip
Re: [Python-Dev] Type of range object members
On 8/15/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Greg Ewing wrote:
>> Martin v. Löwis wrote:
>>> We had this discussion before; if you use ob_size==0 to indicate
>>> that it's an int, this space isn't needed in a long int.
>> What about int subclasses?
>
> It's what Guido proposes. It would still leave two types (perhaps
> three) at the C level, so C code might have to continue making
> conditional code depending on which of these it is. Also, Python
> code that dispatches by type still needs to make the distinction.

I'm not sure that subclassing ints gives us much. We could make int and long final types, and then all we have to do is tweak type() and __class__ so that they always return the 'int' type.

Alternatively, yes, there would be some minimal awareness of the two types in Python -- but nothing like we currently have; dispatching on exact type (which we discourage anyway) would be the only case. Would that be so bad?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
Neal Norwitz wrote:
> On 8/15/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
>> That penalty is already paid today. Much code dealing with ints
>> has a type test whether it's an int or a long. If int and long
>> become subtypes of each other or of some abstract type, performance
>> will decrease even more because a subtype test is quite expensive
>> if the object is neither int nor long (it has to traverse the entire
>> base type hierarchy to find out it's not inherited from int).
>
> I was playing around with a little patch to avoid that penalty. It
> doesn't take any additional memory, just a handful of bits we aren't
> using. :-)
>
> For the more common builtin types, it stores whether it's a subclass
> in tp_flags, so there's no function call necessary and it's a constant
> time operation. It was faster when doing simple stuff. Haven't
> thought much whether this is really worthwhile or not.

This might also be helpful when exceptions have to inherit from BaseException in Py3k.

Georg
Re: [Python-Dev] Type of range object members
[EMAIL PROTECTED] wrote:
>     Guido> I worry that dropping the special allocator will be too slow.
>
>     Greg> Surely there's some compromise that would allow recently-used ints
>     Greg> to be kept around, but reclaimed if memory becomes low?
>
>     Martin> Hardly. The efficiency of the special-case allocator also comes
>     Martin> from the fact that it doesn't ever have to release memory. Just
>     Martin> try changing it to see what I mean.
>
> Wouldn't use of obmalloc offset much of that? Before obmalloc was
> available, the int free list was a huge win. Is it likely to be such a
> huge win today?

That's my theory: it isn't a huge win. Guido has another theory: it's still faster. Only benchmarking can tell.

Regards,
Martin
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> I'm not sure that subclassing ints gives us much. We could make int
> and long final types, and then all we have to do is tweak type() and
> __class__ so that they always return the 'int' type.

I don't think this can work - there would be too many ways for the real types to leak, anyway. People would come up with hacks like reload(sys), and then complain that they have to use such hacks.

> Alternatively, yes, there would be some minimal awareness of the two
> types in Python -- but nothing like we currently have; dispatching on
> exact type (which we discourage anyway) would be the only case. Would
> that be so bad?

I thought it was the ultimate goal of PEP 237 to unify int and long, in all respects.

I'll do some benchmarking.

Regards,
Martin
Re: [Python-Dev] Type of range object members
On 8/15/06, Neal Norwitz [EMAIL PROTECTED] wrote:
> I was playing around with a little patch to avoid that penalty. It
> doesn't take any additional memory, just a handful of bits we aren't
> using. :-)
>
> For the more common builtin types, it stores whether it's a subclass
> in tp_flags, so there's no function call necessary and it's a constant
> time operation. It was faster when doing simple stuff. Haven't
> thought much whether this is really worthwhile or not.

I like it! I wonder if you should use another bit for "inherits from BaseException". That would make catching and raising exceptions a bit faster.

It applies cleanly to py3k -- perhaps you should just check it in there? +1 from me!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
At 11:46 PM 8/15/2006 -0700, Neal Norwitz wrote:
> On 8/15/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
>> That penalty is already paid today. Much code dealing with ints
>> has a type test whether it's an int or a long. If int and long
>> become subtypes of each other or of some abstract type, performance
>> will decrease even more because a subtype test is quite expensive
>> if the object is neither int nor long (it has to traverse the entire
>> base type hierarchy to find out it's not inherited from int).
>
> I was playing around with a little patch to avoid that penalty. It
> doesn't take any additional memory, just a handful of bits we aren't
> using. :-)
>
> For the more common builtin types, it stores whether it's a subclass
> in tp_flags, so there's no function call necessary and it's a constant
> time operation. It was faster when doing simple stuff. Haven't
> thought much whether this is really worthwhile or not.

It seems to me that you could drop the FAST_SUBCLASS bit, since none of the other bits will be set if it is not a subclass of a builtin. That would free up one flag bit -- perhaps usable for that BaseException flag Guido wants. :)

(Of course, if you can't inherit from both BaseException and one of the other builtin types, it can just be another enumeration value within the bit mask.)
Re: [Python-Dev] Type of range object members
Neal Norwitz wrote:
> I was playing around with a little patch to avoid that penalty. It
> doesn't take any additional memory, just a handful of bits we aren't
> using. :-)

There are common schemes that allow constant-time issubclass tests, although they do require more memory:

1. Give each base class a small unique number, starting with 1
   (0 means no number has been assigned). Give each class a bitmap
   of all base classes, plus a field for the maximum-numbered base
   class. Then,

       def issubclass(C, B):
           return B.classnum and (B.classnum < C.maxbasenum) and \
                  bit_set(C.basenums, B.classnum)

   Supports multiple inheritance; the space requirement is linear
   in the number of classes that are used as base classes.
   Numbering should recycle class numbers when a class is gc'ed.

2. Restrict the optimization to single inheritance. Give each class
   a "depth" (distance from object, 0 for object and
   multiply-inherited classes). Also give each class an array
   of bases, ordered by depth. Then,

       def issubclass(C, B):
           if not C.depth:
               return expensive_issubclass(C, B)
           return B.depth < C.depth and (C.bases[B.depth] is B)

   The space requirement is linear in the depth of the class
   (but I think tp_mro could be used, if the formula is changed
   to (C.bases[C.depth-B.depth] is B)).

Regards,
Martin
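Scheme 1 can be sketched in runnable form as follows. The sketch makes several assumptions that are not in the message: `Cls` and `is_subclass` are hypothetical names, the bitmap is stored as an arbitrary-precision Python int, a class receives its number lazily the first time it is used as a base, the bound check uses `<=` so that the highest-numbered base itself passes, and recycling of numbers on garbage collection is left out.

```python
import itertools

_next_num = itertools.count(1)   # class numbers start at 1; 0 means "unassigned"


class Cls:
    """Toy class record carrying the bookkeeping fields from scheme 1."""

    def __init__(self, name, bases=()):
        self.name = name
        self.classnum = 0        # assigned lazily, when first used as a base
        self.basenums = 0        # bitmap of all (transitive) base numbers
        self.maxbasenum = 0      # highest number present in the bitmap
        for b in bases:
            if not b.classnum:
                b.classnum = next(_next_num)
            self.basenums |= b.basenums | (1 << b.classnum)
            self.maxbasenum = max(self.maxbasenum, b.maxbasenum, b.classnum)


def bit_set(bitmap, n):
    """True if bit n of the bitmap is set."""
    return bool((bitmap >> n) & 1)


def is_subclass(C, B):
    """Constant-time test: one bound check plus one bit test."""
    if C is B:
        return True
    return (bool(B.classnum)
            and B.classnum <= C.maxbasenum
            and bit_set(C.basenums, B.classnum))


obj = Cls("object")
A = Cls("A", (obj,))
B = Cls("B", (obj,))
C = Cls("C", (A, B))             # multiple inheritance
D = Cls("D", (obj,))
```

The memory cost shows up in `basenums`: each class carries one bit per numbered base class in the whole program, which is the linear space requirement mentioned in the message.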
Re: [Python-Dev] Type of range object members
Martin v. Löwis wrote:
> Neal Norwitz wrote:
>> I was playing around with a little patch to avoid that penalty. It
>> doesn't take any additional memory, just a handful of bits we aren't
>> using. :-)
>
> There are common schemes that allow constant-time issubclass tests,
> although they do require more memory:
>
> 1. Give each base class a small unique number, starting with 1
>    (0 means no number has been assigned). Give each class a bitmap
>    of all base classes, plus a field for the maximum-numbered base
>    class. Then,
>
>        def issubclass(C, B):
>            return B.classnum and (B.classnum < C.maxbasenum) and \
>                   bit_set(C.basenums, B.classnum)
>
>    Supports multiple inheritance; the space requirement is linear
>    in the number of classes that are used as base classes.
>    Numbering should recycle class numbers when a class is gc'ed.
>
> 2. Restrict the optimization to single inheritance. Give each class
>    a "depth" (distance from object, 0 for object and
>    multiply-inherited classes). Also give each class an array
>    of bases, ordered by depth. Then,
>
>        def issubclass(C, B):
>            if not C.depth:
>                return expensive_issubclass(C, B)
>            return B.depth < C.depth and (C.bases[B.depth] is B)
>
>    The space requirement is linear in the depth of the class
>    (but I think tp_mro could be used, if the formula is changed
>    to (C.bases[C.depth-B.depth] is B)).

Two more:

3. Use a global cache of class objects that caches the issubclass()
   lookups. This is only amortized constant time, but easy to
   implement and has a few other nice features, such as enabling
   traversal of the complete class inheritance forest (or tree if
   you just have new-style classes).

   Use weak references to the classes if you want to be able to
   GC them.

4. "Freeze" classes after they are constructed, meaning that all
   attributes from base classes get bound to the inheriting class.

   This also speeds up method lookups considerably. Works great
   with classic classes; I'm not sure about new-style classes using
   e.g. staticmethods, slots and the like.

   mxTools has an implementation for classic classes called freeze().

   If you add special attributes such as ._issubclass_XYZ in the
   process, issubclass() would then be a single attribute lookup,
   which is constant time.

--
Marc-Andre Lemburg
eGenix.com
Re: [Python-Dev] Type of range object members
Martin v. Löwis wrote:
> Greg Ewing wrote:
>> Also it means you'd pay a penalty every time you access it
> That penalty is already paid today.

You'd still have that penalty, *plus* the overhead of bit masking to get at the value.

Maybe that wouldn't be noticeable among all the other overheads, though.

--
Greg
Re: [Python-Dev] Type of range object members
On 8/16/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
> It seems to me that you could drop the FAST_SUBCLASS bit, since none of the
> other bits will be set if it is not a subclass of a builtin. That would
> free up one flag bit -- perhaps usable for that BaseException flag Guido
> wants. :)

:-) Right, I'm not using the bit currently. I was thinking that it would be interesting to change the CheckExact versions to also use this. It's a little more work, but you lose the second comparison for Check. I expect that it would be slower, but I was just curious.

So with the patch we currently have:

    #define PyInt_CheckExact(op) ((op)->ob_type == &PyInt_Type)
    #define PyInt_Check(op) (PyInt_CheckExact(op) || \
        PyType_FastSubclass((op)->ob_type, Py_TPFLAGS_INT_SUBCLASS))

But we could have something like:

    #define PyInt_CheckExact(op) (PyType_FastClass(op,Py_TPFLAGS_INT_CLASS))
    #define PyInt_Check(op) (PyType_FastSubclass(op,Py_TPFLAGS_INT_SUBCLASS))

It would change the CheckExact()s from: op->ob_type == global-variable, to: op->ob_type & CONSTANT == CONSTANT. Check would be the same as the CheckExact, just with different constants. The Check version would then drop the || condition.

I might play with this at the sprint next week. It does seem to make sense to do BaseException too. It will take 4 or 5 bits to handle the current ones plus BaseException, which we can easily spare in tp_flags.

n
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> To be honest I have no idea how/why Martin or Tim picked this name.
> Perhaps it is in POSIX?

it's from <sys/types.h>, which is a posix thing, afaik:

    http://www.opengroup.org/onlinepubs/007908799/xsh/systypes.h.html

/F
Re: [Python-Dev] Type of range object members
Alexander Belopolsky wrote:
> The range object is currently defined in Objects/rangeobject.c as
>
>     typedef struct {
>         PyObject_HEAD
>         long start;
>         long step;
>         long len;
>     } rangeobject;
>
> Is this consistent with PEP 353, or should Py_ssize_t be used instead of long?

As others have said: no. The range object produces ints. People have asked to make it produce arbitrary longs instead, but none of these patches ever got successfully reviewed.

> It looks like some of the code in rangeobject.c is already Py_ssize_t
> aware (see range_item and range_length), but it assumes that it is
> safe to cast long to ssize_t and back.

Where does it assume that it is safe to cast ssize_t -> long? That would be a bug.

Regards,
Martin
Re: [Python-Dev] Type of range object members
On Aug 15, 2006, at 12:14 AM, Guido van Rossum wrote:
> Feel free to submit a patch for Python 2.6.

Please see
http://sourceforge.net/tracker/index.php?func=detail&aid=1540617&group_id=5470&atid=305470

> Perhaps it is in POSIX?

Yes, ssize_t - "Used for a count of bytes or an error indication.", defined in <sys/types.h>.

I promised to shut up, but let me respond to Martin:

On Aug 15, 2006, at 3:25 AM, Martin v. Löwis wrote:
> Alexander Belopolsky wrote:
> [snip]
>> I did not notice the double 's' and thought it was an unsigned type.
>
> Hmm. That you fail to read it correctly can hardly be an argument
> against it.

Google tells me I am not the only one: http://dbforums.com/showthread.php?t=1430615. What happened to "readability counts"?

> In the rationale (XRAT) they say
>
> "This is intended to be a signed analog of size_t."

That's the POSIX rationale, http://www.opengroup.org/onlinepubs/009695399/xrat/xsh_chap02.html. PEP 353 says: "A new type Py_ssize_t is introduced, which has the same size as the compiler's size_t type, but is signed."

> They don't mandate it to have the same size, but it is the expectation
> that implementations typically will.

That's one of the reasons why POSIX' ssize_t is a wrong analogy. Another reason is that POSIX interprets negative values of ssize_t as an error indication, not as an offset from the end. A better POSIX analogy would be off_t (as used in lseek).

> In the discussion, ptrdiff_t was never brought up as an alternative

Yes, it was, by you: http://mail.python.org/pipermail/python-dev/2006-January/059562.html :-).

I was not suggesting Py_ptrdiff_t. My suggestion was Py_index_t, but I am six months late :-(.
Re: [Python-Dev] Type of range object members
FWIW, I propose that in py3k, the int type uses Py_ssize_t instead of long. Already decided is that in py3k, the (x)range object will support arbitrary integer sizes, so that e.g. range(10**10, 10**10+10) is valid (it currently is, but for different reasons; it isn't with xrange, which will replace range in py3k).

For 2.6 and onwards, I propose to let the issue rest *or* eventually backport the py3k xrange implementation; fixing xrange to use Py_ssize_t doesn't seem worth the churn.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
Alexander Belopolsky wrote:
> Another reason is that POSIX interprets negative values of ssize_t as
> an error indication, not as an offset from the end. A better POSIX
> analogy would be off_t (as used in lseek).

That's subtle. By this reasoning, ptrdiff_t would be wrong, as well, since it really means "negative index", not "count from the end".

>> In the discussion, ptrdiff_t was never brought up as an alternative
>
> Yes, it was, by you
> http://mail.python.org/pipermail/python-dev/2006-January/059562.html :-).

Ah, that's why it isn't ptrdiff_t: Tim Peters said he liked ssize_t.

> I was not suggesting Py_ptrdiff_t. My suggestion was Py_index_t, but
> I am six months late :-(.

Indeed, it is too late for such a change.

Regards,
Martin
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> FWIW, I propose that in py3k, the int type uses Py_ssize_t instead of long.

This is really a py3k issue, but: shouldn't the int and long types really get unified into a single type in Py3k?

I suppose the "i" parameter to PyArg_ParseTuple would continue to parse int?

Regards,
Martin
Re: [Python-Dev] Type of range object members
On 8/15/06, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Guido van Rossum wrote:
>> FWIW, I propose that in py3k, the int type uses Py_ssize_t instead of long.
>
> This is really a py3k issue, but: shouldn't the int and long types really
> get unified into a single type in Py3k?

From the Python *user*'s perspective, yes, as much as possible. But I'm still playing with the thought of having two implementation types, since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform) to the single *bit* telling the difference between the two internal representations.

I haven't decided whether to make 'int' an abstract base type and have 'short' and 'long' subtypes (perhaps using other names), or whether to make 'int' the base type and 'short' the subtype, or whether to cheat and hack type() so that type() of any integer always returns int. But in any case, this would mean that at the C level the distinction continues to exist.

Maybe we can discuss this at the sprint next week?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> From the Python *user*'s perspective, yes, as much as possible. But
> I'm still playing with the thought of having two implementation types,
> since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform)
> to the single *bit* telling the difference between the two internal
> representations.

We had this discussion before; if you use ob_size==0 to indicate that it's an int, this space isn't needed in a long int. On a 32-bit platform, the size of an int would go up from 12 to 16 bytes; if we stop using a special-cased allocator (which we should (*)), there isn't any space increase on such a platform. On a 64-bit platform, the size of an int would go up from 24 bytes to 32 bytes.

Regards,
Martin

(*) People have complained that the memory allocated for a large number of ints isn't ever reused. They consumed it by passing range() some large argument.
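The byte counts above can be checked with a little arithmetic. The sketch below assumes an int object laid out as three words (reference count, type pointer, value), one extra word for ob_size, a word size of 4 bytes on the 32-bit platform and 8 on the 64-bit one, and a general-purpose allocator that rounds every request up to a multiple of 8 bytes; the 8-byte rounding is an assumption used to explain the "no space increase" remark, not something stated in the message.

```python
import struct


def round_up(n, align=8):
    """Size actually consumed if the allocator rounds up to `align` bytes."""
    return (n + align - 1) // align * align


# "=" selects standard sizes without padding: i is 4 bytes, q is 8 bytes.
int32_now = struct.calcsize("=3i")   # refcount + type pointer + value
int32_new = struct.calcsize("=4i")   # ... plus ob_size
int64_now = struct.calcsize("=3q")
int64_new = struct.calcsize("=4q")

sizes = {
    "32-bit": (int32_now, int32_new, round_up(int32_now), round_up(int32_new)),
    "64-bit": (int64_now, int64_new, round_up(int64_now), round_up(int64_new)),
}
```

Under these assumptions the 32-bit object grows from 12 to 16 bytes on paper but occupies 16 bytes either way once rounded, while the 64-bit object really does grow from 24 to 32.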
Re: [Python-Dev] Type of range object members
On 8/15/06, Martin v. Löwis wrote:
> Guido van Rossum wrote:
> > From the Python *user*'s perspective, yes, as much as possible. But I'm still playing with the thought of having two implementation types, since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform) to the single *bit* telling the difference between the two internal representations.
>
> We had this discussion before; if you use ob_size==0 to indicate that it's an int, this space isn't needed in a long int. On a 32-bit platform, the size of an int would go up from 12 to 16; if we stop using a special-cased allocator (which we should (*)), there isn't any space increase on such a platform. On a 64-bit platform, the size of an int would go up from 24 bytes to 32 bytes.
>
> (*) people have complained that the memory allocated for a large number of ints isn't ever reused. They consumed it by passing range() some large argument.

Since range() won't return a real list any more, this no longer is the case.

I worry that dropping the special allocator will be too slow. And the space increase on 64-bit platforms is too much IMO. But clearly these issues can only be addressed by careful benchmarking. Perhaps we should do some of that to settle the issue.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
On Aug 15, 2006, at 6:20 PM, Martin v. Löwis wrote:
> Guido van Rossum wrote:
> > From the Python *user*'s perspective, yes, as much as possible. But I'm still playing with the thought of having two implementation types, since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform) to the single *bit* telling the difference between the two internal representations.
>
> We had this discussion before; if you use ob_size==0 to indicate that it's an int, this space isn't needed in a long int. On a 32-bit platform, the size of an int would go up from 12 to 16; if we stop using a special-cased allocator (which we should (*)), there isn't any space increase on such a platform. On a 64-bit platform, the size of an int would go up from 24 bytes to 32 bytes.

But it's the short int that you probably really want to make size efficient. Which is of course also doable via something like:

    typedef struct {
        PyObject_HEAD
        long ob_islong : 1;
        long ob_ival_or_size : LONG_BITS - 1;
        long ob_digit[0];
    } PyIntObject;

There's no particular reason that a short int must be able to store the entire range of C long, so as many bits can be stolen from it as desired.

James
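
A minimal sketch of the "stolen bit" idea above, using explicit shifts and masks instead of bit-fields (whose layout and signedness for type long are implementation-defined). Everything here is hypothetical and not CPython's encoding: word_t, TAG_LONG, SMALL_MAX, SMALL_MIN and the pack/unpack helpers are names invented for illustration, and the code assumes unsigned long has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* One machine word: the lowest bit is the tag, the rest is payload. */
typedef unsigned long word_t;

#define TAG_LONG  ((word_t)1)
/* Range that still fits once one bit has been given up. */
#define SMALL_MAX (LONG_MAX >> 1)
#define SMALL_MIN (-SMALL_MAX - 1)

static int fits_small(long v)
{
    return v >= SMALL_MIN && v <= SMALL_MAX;
}

/* Stores a small value; the tag bit is left at 0. */
static word_t pack_small(long v)
{
    assert(fits_small(v));
    return (word_t)v << 1;
}

/* Stores a digit count for the long form; the tag bit is set to 1. */
static word_t pack_size(unsigned long ndigits)
{
    assert(ndigits <= (ULONG_MAX >> 1));
    return (ndigits << 1) | TAG_LONG;
}

static int is_long(word_t w)
{
    return (w & TAG_LONG) != 0;
}

/* Recovers a small value without relying on signed right shifts. */
static long unpack_small(word_t w)
{
    word_t u = w >> 1;
    assert(!is_long(w));
    if (u <= (word_t)SMALL_MAX)
        return (long)u;                       /* non-negative value */
    return (long)(u - (word_t)SMALL_MAX - 1) + SMALL_MIN;
}

static unsigned long unpack_size(word_t w)
{
    assert(is_long(w));
    return w >> 1;
}
```

The sketch makes both costs raised in the thread visible: every read of a small value goes through a shift and a branch, and values outside SMALL_MIN..SMALL_MAX no longer fit even though they fit in a C long.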
Re: [Python-Dev] Type of range object members
On 8/15/06, James Y Knight wrote:
> On Aug 15, 2006, at 6:20 PM, Martin v. Löwis wrote:
> > Guido van Rossum wrote:
> > > From the Python *user*'s perspective, yes, as much as possible. But I'm still playing with the thought of having two implementation types, since otherwise we'd have to devote 4 bytes (8 on a 64-bit platform) to the single *bit* telling the difference between the two internal representations.
> >
> > We had this discussion before; if you use ob_size==0 to indicate that it's an int, this space isn't needed in a long int. On a 32-bit platform, the size of an int would go up from 12 to 16; if we stop using a special-cased allocator (which we should (*)), there isn't any space increase on such a platform. On a 64-bit platform, the size of an int would go up from 24 bytes to 32 bytes.
>
> But it's the short int that you probably really want to make size efficient. Which is of course also doable via something like:
>
>     typedef struct {
>         PyObject_HEAD
>         long ob_islong : 1;
>         long ob_ival_or_size : LONG_BITS - 1;
>         long ob_digit[0];
>     } PyIntObject;
>
> There's no particular reason that a short int must be able to store the entire range of C long, so, as many bits can be stolen from it as desired.

There isn't? Actually a lot of APIs currently assume that.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
James Y Knight wrote:
> But it's the short int that you probably really want to make size efficient.

Only if you have many of them. And if you do, you have the problem of the special-cased allocator: when the many ints go away, Python will hold onto their memory forever.

Regards,
Martin
Re: [Python-Dev] Type of range object members
On 8/15/06, Martin v. Löwis wrote:
> James Y Knight wrote:
> > But it's the short int that you probably really want to make size efficient.
>
> Only if you have many of them. And if you do, you have the problem of the special-cased allocator: when the many ints go away, Python will hold onto their memory forever.

But that's a bit of a corner case. There are plenty of cases where ints are allocated and deallocated at a fast rate without allocating tons of them at once, or where there's no need to reuse the same memory for something else later.

I wonder if we could have a smarter int allocator that allocates some ints in a special array but switches to the standard allocator if too many are being allocated?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
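
One way the "special array, then standard allocator" idea might look, as a sketch only. Nothing here comes from CPython: slot_t, POOL_SLOTS, int_alloc and int_free are hypothetical names, the pool is kept deliberately tiny so the fallback is easy to trigger, and malloc stands in for "the standard allocator".

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SLOTS 4   /* deliberately tiny for illustration */

typedef struct slot {
    long value;
    struct slot *next_free;   /* only meaningful while the slot is free */
    int pooled;               /* 1 if the slot lives in the fixed array */
} slot_t;

static slot_t pool[POOL_SLOTS];
static slot_t *free_list = NULL;
static int pool_ready = 0;

/* Threads every slot of the fixed array onto the free list. */
static void pool_init(void)
{
    int i;
    for (i = 0; i < POOL_SLOTS; i++) {
        pool[i].pooled = 1;
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

/* Takes a slot from the array if one is free, otherwise falls back
   to malloc. Returns NULL only if malloc fails. */
static slot_t *int_alloc(long v)
{
    slot_t *s;
    if (!pool_ready)
        pool_init();
    if (free_list != NULL) {
        s = free_list;
        free_list = s->next_free;
        assert(s->pooled);
    } else {
        s = malloc(sizeof *s);
        if (s == NULL)
            return NULL;
        s->pooled = 0;
    }
    s->value = v;
    s->next_free = NULL;
    return s;
}

/* Pool slots go back on the free list; overflow slots are returned
   to the system immediately. */
static void int_free(slot_t *s)
{
    if (s == NULL)
        return;
    if (s->pooled) {
        s->next_free = free_list;
        free_list = s;
    } else {
        free(s);
    }
}
```

With this shape, a burst of allocations beyond POOL_SLOTS costs malloc speed but its memory is given back on release, while the common case of a few short-lived integers never leaves the array. The per-slot pooled flag costs space that a real design would probably avoid; it is used here to keep the sketch simple.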
Re: [Python-Dev] Type of range object members
On Aug 15, 2006, at 7:06 PM, Guido van Rossum wrote:
> > There's no particular reason that a short int must be able to store the entire range of C long, so, as many bits can be stolen from it as desired.
>
> There isn't? Actually a lot of APIs currently assume that.

I thought we were talking about Py3k. *IF* the idea is to integrate both short/long ints into a single type, with only an internal distinction (which is what is being discussed), all those APIs are broken already. The particular internal division of the new int object between short and long doesn't matter. Which is all I was saying.

If combining all integers into a single type isn't actually desired, then neither my message nor Martin's is relevant.

James
Re: [Python-Dev] Type of range object members
On 8/15/06, James Y Knight wrote:
> On Aug 15, 2006, at 7:06 PM, Guido van Rossum wrote:
> > > There's no particular reason that a short int must be able to store the entire range of C long, so, as many bits can be stolen from it as desired.
> >
> > There isn't? Actually a lot of APIs currently assume that.
>
> I thought we were talking about Py3k. *IF* the idea is to integrate both short/long ints into a single type, with only an internal distinction (which is what is being discussed), all those APIs are broken already. The particular internal division of the new int object between short and long doesn't matter. Which is all I was saying. If combining all integers into a single type isn't actually desired, then neither my message nor Martin's is relevant.

As I said before, at the C level I expect the distinction between int and long to be alive and well. Changing it so that ints don't have the full range of the corresponding C type would be painful IMO.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
Martin v. Löwis wrote:
> We had this discussion before; if you use ob_size==0 to indicate that it's an int, this space isn't needed in a long int.

What about int subclasses?

--
Greg
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> I worry that dropping the special allocator will be too slow.

Surely there's some compromise that would allow recently-used ints to be kept around, but reclaimed if memory becomes low?

--
Greg
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> On 8/15/06, James Y Knight wrote:
> > There's no particular reason that a short int must be able to store the entire range of C long, so, as many bits can be stolen from it as desired.
>
> There isn't? Actually a lot of APIs currently assume that.

Also it means you'd pay a penalty every time you access it, whereas presumably short ints are the case we want to optimise for speed as well.

--
Greg
Re: [Python-Dev] Type of range object members
[Greg Ewing]
> Surely there's some compromise that would allow recently-used ints to be kept around, but reclaimed if memory becomes low?

Not without losing /something/ we currently enjoy. The current int scheme has theoretically optimal memory use too, consuming 12 bytes per int object on a 32-bit box. That's minimal, because the object's type pointer, refcount, and integer value each require 4 bytes on a 32-bit box. It does this by allocating big blocks and carving them into 12-byte slices itself.

But as with any big block scheme, a live integer anywhere in a block prevents the entire block from being freed, and objects can't /move/ in CPython either (so, e.g., if a single live integer is keeping a big block alive, we can't move it into some other block). That's the same kind of problem obmalloc has with its very big block arenas, and for which only heuristic help was added (with considerable increase in complexity) for 2.5 (before 2.5, obmalloc never freed an arena, same as ints and floats never free their versions of big blocks).
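
The pinning problem described above can be sketched as follows. This is an illustration under assumptions, not the intobject.c code: block_t, SLICES_PER_BLOCK and the helper functions are hypothetical, and the per-block live counter is added only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

#define SLICES_PER_BLOCK 8   /* real blocks would hold far more */

/* One big block carved into fixed-size slices. */
typedef struct {
    long slices[SLICES_PER_BLOCK];
    int used;   /* slices handed out so far (bump allocation) */
    int live;   /* slices handed out and not yet released */
} block_t;

static block_t *block_new(void)
{
    return calloc(1, sizeof(block_t));
}

/* Hands out the next slice, or NULL once the block is exhausted. */
static long *block_take(block_t *b)
{
    if (b->used == SLICES_PER_BLOCK)
        return NULL;
    b->live++;
    return &b->slices[b->used++];
}

/* Marks one slice as dead. The slice cannot be moved elsewhere,
   because other code may still hold pointers into the block. */
static void block_release(block_t *b, long *p)
{
    assert(p >= b->slices && p < b->slices + SLICES_PER_BLOCK);
    assert(b->live > 0);
    (void)p;
    b->live--;
}

/* Frees the block only if nothing in it is alive.
   Returns 1 if the block was freed, 0 if it is still pinned. */
static int block_try_free(block_t *b)
{
    if (b->live != 0)
        return 0;
    free(b);
    return 1;
}
```

A single surviving slice keeps all SLICES_PER_BLOCK slices' worth of memory allocated, which is the effect described for both the int blocks and obmalloc arenas.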
Re: [Python-Dev] Type of range object members
Guido van Rossum wrote:
> Methinks that as long as PyIntObject uses long (see intobject.h) there's no point in changing this to long.

Those fields are going to have to change to PyObject* eventually if xrange() is going to become the range() replacement in Py3k...

Cheers,
Nick.

--
Nick Coghlan | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] Type of range object members
On Aug 14, 2006, at 7:32 PM, Guido van Rossum wrote:
> Methinks that as long as PyIntObject uses long (see intobject.h) there's no point in changing this to long.

I guess you meant changing this to Py_ssize_t. I don't understand why the type used by PyIntObject is relevant here. Range object's start is logically an index, but int object's ob_ival is not. Since PyIntObject's definition is exposed by Python.h, changing the type of ob_ival will probably break a lot of code. This reasoning does not apply to the range object.

Since on most platforms ssize_t is the same as long, the choice between the two is just a matter of self-documenting code. Speaking of which, I find it unfortunate that the name Py_ssize_t was selected for the typedef. I would prefer Py_index_t. The first time I saw Py_ssize_t, I did not notice the double 's' and thought it was an unsigned type. On the second look, I realized that it is signed and started wondering why not ptrdiff_t.

I understand that ssize_t is defined by POSIX as the return type of functions such as read that can return either size or -1 for error. I don't think POSIX mandates sizeof(size_t) == sizeof(ssize_t), but I may be wrong. I would agree that ptrdiff_t, although standard C, is not a very intuitive name, but ssize_t is even less clear.
Re: [Python-Dev] Type of range object members
On 8/14/06, Alexander Belopolsky wrote:
> On Aug 14, 2006, at 7:32 PM, Guido van Rossum wrote:
> > Methinks that as long as PyIntObject uses long (see intobject.h) there's no point in changing this to long.
>
> I guess you meant changing this to Py_ssize_t.

Yup, sorry.

> I don't understand why the type used by PyIntObject is relevant here.

Because the only way to create one (in 2.5 or before) is by passing it a Python int.

> Range object's start is logically an index, but int object's ob_ival is not. Since PyIntObject's definition is exposed by Python.h, changing the type of ob_ival will probably break a lot of code. This reasoning does not apply to the range object.

But since the start and end come from a Python int, what advantage would it have to use Py_ssize_t instead? We know sizeof(long) = sizeof(Py_ssize_t).

> Since on most platforms ssize_t is the same as long, the choice between the two is just a matter of self-documenting code. Speaking of which, I find it unfortunate that the name Py_ssize_t was selected for the typedef. I would prefer Py_index_t. The first time I saw Py_ssize_t, I did not notice the double 's' and thought it was an unsigned type.

Blame the C standards committee -- they introduced ssize_t in C99.

> On the second look, I've realized that it is signed and started wondering why not ptrdiff_t.

Because it is not used for the difference of pointers?

Frankly, you're about 6 months too late with naming concerns. It's not going to change now. The PEP has been discussed and the code reviewed ad infinitum. We're only looking for bugs now, and so far the issue you've brought up decidedly falls in the category non-bug.

> I understand that ssize_t is defined by POSIX as the return type of functions such as read that can return either size or -1 for error. I don't think POSIX mandates sizeof(size_t) == sizeof(ssize_t), but I may be wrong. I would agree that ptrdiff_t, although standard C, is not a very intuitive name, but ssize_t is even less clear.

Water under the bridge.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
On Aug 14, 2006, at 10:56 PM, Guido van Rossum wrote:
> Because the only way to create one (in 2.5 or before) is by passing it a Python int.

Is that true?

    Python 2.4.2 (#2, Jan 13 2006, 12:00:38)
    >>> xrange(long(2))
    xrange(2)

> But since the start and end come from a Python int, what advantage would it have to use Py_ssize_t instead? We know sizeof(long) = sizeof(Py_ssize_t).

They don't have to come from a Python int, they can come from a long. I guess range_new would have to be changed to the 'n' conversion code instead of 'l' to do the proper conversion. Similarly, rangeiter_next will need to use PyInt_FromSsize_t.

> Blame the C standards committee -- they introduced ssize_t in C99.

I've checked in Wiley's 2003 edition of The C Standard and it does not have ssize_t. I could not find anything suggesting that it is in the C standard on Google either. Are you sure it is in C99?

> > On the second look, I've realized that it is signed and started wondering why not ptrdiff_t.
>
> Because it is not used for the difference of pointers?

After being exposed to C++, ptrdiff_t (or more generally difference_type) is what I expect as an argument of a [] operator that allows negative indices (such as operator[] of C++ iterators).

> Frankly, you're about 6 months too late with naming concerns.

I know :-( I'll shut up now.
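
To illustrate why a signed index type is convenient for the negative-index case mentioned above, here is a small sketch in standard C. normalize_index is a hypothetical helper, not a CPython function, and it uses ptrdiff_t rather than ssize_t because ptrdiff_t is the one defined by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maps a possibly negative index onto [0, len), Python style:
   -1 refers to the last element. Returns -1 when out of range. */
static ptrdiff_t normalize_index(ptrdiff_t i, size_t len)
{
    ptrdiff_t n;
    /* The length must be representable in the signed type. */
    assert(len <= (size_t)PTRDIFF_MAX);
    n = (ptrdiff_t)len;
    if (i < 0)
        i += n;          /* cannot overflow: i < 0 and n >= 0 */
    if (i < 0 || i >= n)
        return -1;
    return i;
}
```

With an unsigned index type the test i < 0 could never be true, so the negative case would have to be passed through a separate flag or detected as a very large value.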
Re: [Python-Dev] Type of range object members
On 8/14/06, Alexander Belopolsky wrote:
> On Aug 14, 2006, at 10:56 PM, Guido van Rossum wrote:
> > Because the only way to create one (in 2.5 or before) is by passing it a Python int.
>
> Is that true?
>
>     Python 2.4.2 (#2, Jan 13 2006, 12:00:38)
>     >>> xrange(long(2))
>     xrange(2)

But a long with a value larger than sys.maxint is never accepted, right?

> > But since the start and end come from a Python int, what advantage would it have to use Py_ssize_t instead? We know sizeof(long) = sizeof(Py_ssize_t).
>
> They don't have to come from a python int, they can come from a long. I guess range_new would have to be changed to the 'n' conversion code instead of 'l' to do the proper conversion. Similarly, rangeiter_next will need to use PyInt_FromSsize_t.

Feel free to submit a patch for Python 2.6. I expect the 2.5 release managers to reject this on the basis of being a new feature.

> > Blame the C standards committee -- they introduced ssize_t in C99.
>
> I've checked in the Wiley's 2003 edition of The C Standard and it does not have ssize_t. I could not find anything suggesting that it is in the C standard on google either. Are you sure it is in C99?

To be honest I have no idea how/why Martin or Tim picked this name. Perhaps it is in POSIX? I doubt they made it up; ssize_t is used all over the standard headers in the 4-year-old Linux distro (Red Hat 7.3) running on my ancient home desktop.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Type of range object members
On Tuesday 15 August 2006 14:14, Guido van Rossum wrote:
> > They don't have to come from a python int, they can come from a long. I guess range_new would have to be changed to the 'n' conversion code instead of 'l' to do the proper conversion. Similarly, rangeiter_next will need to use PyInt_FromSsize_t.
>
> Feel free to submit a patch for Python 2.6. I expect the 2.5 release managers to reject this on the basis of being a new feature.

Absolutely correct. At this point in the release process, there's going to be a massive resistance to changes.

Anthony

--
Anthony Baxter
It's never too late to have a happy childhood.