Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: Paul Moore wrote: Enough of the abstract. As a concrete example, suppose I have a (byte) string in my program containing some binary data - an ID3 header, or a TCP packet, or whatever. It doesn't really matter. Does your proposal offer anything to me in how I might manipulate that data (assuming I'm not using NumPy)? (I'm not insisting that it should, I'm just trying to understand the scope of the PEP). What do you mean by manipulate the data. The proposal for a data-format object would help you describe that data in a standard way and therefore share that data between several library that would be able to understand the data (because they all use and/or understand the default Python way to handle data-formats). Perhaps the most relevant thing to pull from this conversation is back to what Martin has asked about before: flexible array members. A TCP packet has no defined length (there isn't even a header field in the packet for this, so in fairness we can talk about IP packets which do). There is no way for me to describe this with the pre-PEP data-formats. I feel like it is misleading of you to say it's up to the package to do manipulations, because you glanced over the fact that you can't even describe this type of data. ISTM, that you're only interested in describing repetitious fixed-structure arrays. If we are going to have a default Python way to handle data-formats, then don't you feel like this falls short of the mark? I fear that you speak about this in too grandiose terms and are now trapped by people asking, well, can I do this? I think for a lot of folks the answer is: nope. With respect to the network packets, this PEP doesn't do anything to fix the communication barrier. Is this not in the scope of a consistent and standard way to discuss the format of binary data (which is what your PEP's abstract sets out as the task)? -- Scott Dial [EMAIL PROTECTED] [EMAIL PROTECTED] ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Perhaps the most relevant thing to pull from this conversation is back to what Martin has asked about before: flexible array members. A TCP packet has no defined length (there isn't even a header field in the packet for this, so in fairness we can talk about IP packets which do). There is no way for me to describe this with the pre-PEP data-formats. I feel like it is misleading of you to say it's up to the package to do manipulations, because you glanced over the fact that you can't even describe this type of data. ISTM, that you're only interested in describing repetitious fixed-structure arrays. Yes, that's right. I'm only interested in describing binary data with a fixed length. Others can help push it farther than that (if they even care). If we are going to have a default Python way to handle data-formats, then don't you feel like this falls short of the mark? Not for me. We can fix what needs fixing, but not if we can't get out of the gate. I fear that you speak about this in too grandiose terms and are now trapped by people asking, well, can I do this? I think for a lot of folks the answer is: nope. With respect to the network packets, this PEP doesn't do anything to fix the communication barrier. Yes it could if you were interested in pushing it there. No, I didn't solve that particular problem with the PEP (because I can only solve the problems I'm aware of), but I do think the problem could be solved. We have far too many nay-sayers on this list, I think. Right now, I don't have time to push this further. My real interest is the extended buffer protocol. I want something that works for that. When I do have time again to discuss it again, I might come back and push some more. But, not now. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On 11/2/06, Travis Oliphant [EMAIL PROTECTED] wrote: What do you mean by manipulate the data. The proposal for a data-format object would help you describe that data in a standard way and therefore share that data between several library that would be able to understand the data (because they all use and/or understand the default Python way to handle data-formats). It would be up to the other packages to manipulate the data. Yes, some other messages I read since I posted this clarified it for me. Essentially, as a Python programmer, there's nothing in the PEP for me - it's for extension writers (and maybe writers of some lower-level Python modules? I'm not sure about this). So as I'm not really the target audience, I won't comment further. So, what you would be able to do is take your byte-string and create a buffer object which you could then share with other packages: Example: b = buffer(bytestr, format=data_format_object) Now. a = numpy.frombuffer(b) a['field1'] # prints data stored in the field named field1 etc. Or. cobj = ctypes.frombuffer(b) # Now, cobj is a ctypes object that is basically a structure that can be passed # directly to your C-code. Does this help? Somewhat. My understanding is that the python-level buffer object is frowned upon as not good practice, and is scheduled for removal at some point (Py3K, quite possibly?) Hence, any code that uses buffer() feels like it needs to be replaced by something more acceptable. So although I understand the use you suggest, it's not compelling to me because I am left with the feeling that I wish I knew the way to do it that didn't need the buffer object (even though I realise intellectually that such a way may not exist). Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers 3. Should size be expressed in bits or bytes? - I prefer bits So, you want an integer enum for the kind and an integer for the bitsize? That's fine with me. One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? Notice that those type codes imply sizes, namely the platform sizes (where platform always means what the C compiler does). So if you want to have platform-independent codes as well, you shouldn't use the T_ codes. In NumPy we've found it convenient to use both. Basically, we've set up a header file that does the translation using #defines and typedefs to create things like (on a 32-bit platform) typedef npy_int32 int #define NPY_INT32 NPY_INT So, that either the T_code-like enum or the bit-width can be used interchangable. Typically people want to specify bit-widths (and see their data-types in bit-widths) but in C-code that implements something you need to use one of the platform integers. I don't know if we really need to bring all of that over. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Ronald Oussoren schrieb: On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: This mechanism is probably a hack because it'n not possible to add C accessible fields to type objects, on the other hand it is extensible (in principle, at least). I better start rewriting PyObjC then :-). PyObjC stores some addition information in the type objects that are used to describe Objective-C classes (such as a reference to the proxied class). IIRC This has been possible from Python 2.3. I assume you are referring to the code in pyobjc/Modules/objc/objc-class.h ? If this really is reliable I should better start rewriting ctypes then ;-). Hm, I always thought there was some additional magic going on with type objects, fields appended dynamically at the end or whatever. Thomas ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On Nov 2, 2006, at 9:35 PM, Thomas Heller wrote: Ronald Oussoren schrieb: On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: This mechanism is probably a hack because it'n not possible to add C accessible fields to type objects, on the other hand it is extensible (in principle, at least). I better start rewriting PyObjC then :-). PyObjC stores some addition information in the type objects that are used to describe Objective-C classes (such as a reference to the proxied class). IIRC This has been possible from Python 2.3. I assume you are referring to the code in pyobjc/Modules/objc/objc- class.h Yes. If this really is reliable I should better start rewriting ctypes then ;-). Hm, I always thought there was some additional magic going on with type objects, fields appended dynamically at the end or whatever. There is such magic, but that magic was updated in Python 2.3 to allow type-object extensions like this. Ronald smime.p7s Description: S/MIME cryptographic signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: or just numpy.array(array.array('d',[1,2,3])) and leave-out the buffer object all together. I think the buffer object in his example was just a placeholder for some arbitrary object that supports the buffer interface, not necessarily another NumPy array. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? They're mixed up with size information, though, which we don't want to do. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore p.f.moore at gmail.com writes: Somewhat. My understanding is that the python-level buffer object is frowned upon as not good practice, and is scheduled for removal at some point (Py3K, quite possibly?) Hence, any code that uses buffer() feels like it needs to be replaced by something more acceptable. Python 2.x buffer object serves two distinct purposes. First, it is a mutable string object and this is definitely not going away being replaced by the bytes object. (Interestingly, this functionality is not exposed to python, but C extension modules can call PyBuffer_New(size) to create a buffer.) Second, it is a view into any object supporting buffer protocol. For a while this usage was indeed frowned upon because buffer objects held the pointer obtained from bf_get*buffer for too long causing memory errors in situations like this: a = array('c', x*10) b = buffer(a, 5, 2) a.extend('x'*1000) str(b) 'xx' This problem was fixed more than two years ago. -- r35400 | nascheme | 2004-03-10 Make buffer objects based on mutable objects (like array) safe. -- Even though it was suggested in the past that buffer *object* should be deprecated as unsafe, I don't remember seeing a call to deprecate the buffer protocol. So although I understand the use you suggest, it's not compelling to me because I am left with the feeling that I wish I knew the way to do it that didn't need the buffer object (even though I realise intellectually that such a way may not exist). As I explained in another post, I used buffer object as an example of an object that supports buffer protocol, but does not export type information in the form usable by numpy. Here is another way to illustrate the problem: a = numpy.array(array.array('H', [1,2,3])) b = numpy.array([1,2,3],dtype='H') a.dtype == b.dtype False With the extended buffer protocol it will be possible for numpy.array(..) to realize that array.array('H', [1,2,3]) is a sequence of unsigned short integers and convert it accordingly. Currently numpy has to go through the sequence protocol to create a numpy.array from an array.array and loose the type information. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Bill Baxter schrieb: Basically in my code I want to be able to take the binary data descriptor and say give me the 'r' field of this pixel as an integer. Is either one (the PEP or c-types) clearly easier to use in this case? What would the code look like for handling both formats generically? The PEP, as specified, does not support accessing individual fields from Python. OTOH, ctypes, as implemented, does. This comparison is not fair, though: an *implementation* of the PEP (say, NumPy) might also give you Python-level access to the fields. With the PEP, you can get access to the 'r' field from C code. Performing this access is quite tedious; as I'm uncertain whether you actually wanted to see C code, I refrain from trying to formulate it. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: Nick Coghlan wrote: In fact, it may make sense to just use the lists/strings directly as the data exchange format definitions, and let the various libraries do their own translation into their private format descriptions instead of creating a new one-type-to-describe-them-all. Yes, I'm open to this possibility. I basically want two things in the object passed through the extended buffer protocol: 1) It's fast on the C-level 2) It covers all the use-cases. If just a particular string or list structure were passed, then I would drop the data-format PEP and just have the dataformat argument of the extended buffer protocol be that thing. Then, something that converts ctypes objects to that special format would be very nice indeed. It may make sense to have a couple distinct sections in the datatype PEP: a. describing data formats with basic Python types b. a lightweight class for parsing these data format descriptions It's most of the way there already - part A would just be the various styles of arguments accepted by the datatype constructor, and part B would be the datatype object itself. I personally think it makes the most sense to do both, but separating the two would make it clear that the descriptions can be standardised without *necessarily* defining a new class. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP: Adding data-type objects to Python
I'm still not sure exactly what is missing from ctypes. To make this concrete: You have an array of 500 elements meeting struct { int simple; struct nested { char name[30]; char addr[45]; int amount; } ctypes can describe this as class nested(Structure): _fields_ = [(name, c_char*30), (addr, c_char*45), (amount, c_long)] class struct(Structure): _fields_ = [(simple, c_int), (nested, nested)] desc = struct * 500 You have said that creating whole classes is too much overhead, and the description should only be an instance. To me, that particular class (arrays of 500 structs) still looks pretty lightweight. So please clarify when it starts to be a problem. (1) For simple types -- mapping char name[30]; == (name, c_char*30) Do you object to using the c_char type? Do you object to the array-of-length-30 class, instead of just having a repeat or shape attribute? Do you object to naming the field? (2) For the complex types, nested and struct Do you object to creating these two classes even once? For example, are you expecting to need different classes for each buffer, and to have many buffers created quickly? Is creating that new class a royal pain, but frequent (and slow) enough that you can't just make a call into python (or ctypes)? (3) Given that you will describe X, is X*500 (== a type describing an array of 500 Xs) a royal pain in C? If so, are you expecting to have to do it dynamically for many sizes, and quickly enough that you can't just let ctypes do it for you? -jJ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Jim Jewett wrote: I'm still not sure exactly what is missing from ctypes. To make this concrete: I think the only thing missing from ctypes expressiveness as far as I can tell in terms of what you can do is the byte-order representation. What is missing is ease-of use for producers and consumers in interpreting the data-type. When I speak of Producers and consumers, I'm largely talking about C-code (or Java or .NET) code writers. Producers must basically use Python code to create classes of various types. This is going to be slow in 'C'. Probably slower than the array interface (which is what we have no informally). Consumers are going to have a hard time interpreting the result. I'm not even sure how to do that, in fact. I'd like NumPy to be able to understand ctypes as a means to specify data. Would I have to check against all the sub-types of CDataType, pull out the fields, check the tp_name of the type object? I'm not sure. It seems like a string with the C-structure would be better as a data-representation, but then a third-party library would want to parse that so that Python might as well have it's own parser for data-types. So, Python might as well have it's own way to describe data. My claim is this default way should *not* be overloaded by using Python type-objects (the ctypes way). I'm making a claim that the NumPy way of using a different Python object to describe data-types. I'm not saying the NumPy object should be used. I'm saying we should come up with a singe DataFormatType whose instances express the data formats in ways that other packages can produce and consume (or even use internally). It would be easy for NumPy to use the default Python object in it's PyArray_Descr * structure. It would also be easy for ctypes to use the default Python object in its StgDict object that is the tp_dict of every ctypes type object. It would be easy for the struct module to allow for this data-format object (instead of just strings) in it's methods. It would be easy for the array module to accept this data-format object (instead of just typecodes) in it's constructor. Lot's of things would suddenly be more consistent throughout both the Python and C-Python user space. Perhaps after discussion, it becomes clear that the ctypes approach is sufficient to be that thing that all modules use to share data-format information. It's definitely expressive enough. But, my argument is that NumPy data-type objects are also pretty close. so why should they be rejected. We could also make a string-syntax do it. You have said that creating whole classes is too much overhead, and the description should only be an instance. To me, that particular class (arrays of 500 structs) still looks pretty lightweight. So please clarify when it starts to be a problem. (1) For simple types -- mapping char name[30]; == (name, c_char*30) Do you object to using the c_char type? Do you object to the array-of-length-30 class, instead of just having a repeat or shape attribute? Do you object to naming the field? (2) For the complex types, nested and struct Do you object to creating these two classes even once? For example, are you expecting to need different classes for each buffer, and to have many buffers created quickly? I object to the way I consume and produce the ctypes interface. It's much to slow to be used on the C-level for sharing many small buffers quickly. Is creating that new class a royal pain, but frequent (and slow) enough that you can't just make a call into python (or ctypes)? (3) Given that you will describe X, is X*500 (== a type describing an array of 500 Xs) a royal pain in C? If so, are you expecting to have to do it dynamically for many sizes, and quickly enough that you can't just let ctypes do it for you? That pretty much sums it up (plus the pain of having to basically write Python code from C). -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Jim Jewett wrote: I'm still not sure exactly what is missing from ctypes. To make this concrete: I was too hasty. There are some things actually missing from ctypes: 1) long double (this is not the same across platforms, but it is a data-type). 2) complex-valued types (you might argue that it's just a 2-array of floats, but you could say the same thing about int as an array of bytes). The point is how do people interpret the data. Complex-valued data-types are very common. It is one reason Fortran is still used by scientists. 3) Unicode characters (there is w_char support but I mean a way to describe what kind of unicode characters you have in a cross-platform way). I actually think we have a way to describe encodings in the data-format representation as well. 4) What about floating-point representations that are not IEEE 754 4-byte or 8-byte. There should be a way to at least express the data-format in these cases (this is actually how long double should be handled as well since it varies across platforms what is actually done with the extra bits). So, we can't just use ctypes as a complete data-format representation because it's also missing some things. What we need is a standard way for libraries that deal with data-formats to communicate with each other. I need help with a PEP like this and that's what I'm asking for. It's all I've really been after all along. A couple of points: * One reason to support the idea of the Python object approach (versus a string-syntax) is that it is already parsed. A list-syntax approach (perhaps built from strings for fundamental data-types) might also be considered already parsed as well. * One advantage of using kind versus a character for every type (like struct and array do) is that it helps consumers and producers speed up the parser (a fuller branching tree). -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: I was too hasty. There are some things actually missing from ctypes: I think Thomas can correct me if I'm wrong: I think endianness is supported (although this support seems undocumented). There seems to be code that checks for the presence of a _byteswapped_ attribute on fields of a struct; presence of this field is then interpreted as data having the other endianness. 1) long double (this is not the same across platforms, but it is a data-type). That's indeed missing. 2) complex-valued types (you might argue that it's just a 2-array of floats, but you could say the same thing about int as an array of bytes). The point is how do people interpret the data. Complex-valued data-types are very common. It is one reason Fortran is still used by scientists. Well, by the same reasoning, you could argue that pixel values (RGBA) are missing in the PEP. It's a convenience, sure, and it may also help interfacing with the platform's FORTRAN implementation - however, are you sure that NumPy's complex layout is consistent with the platform's C99 _Complex definition? 3) Unicode characters 4) What about floating-point representations that are not IEEE 754 4-byte or 8-byte. Both of these are available in a platform-dependent way: if the platform uses non-IEEE754 formats for C float and C double, ctypes will interface with that just fine. It is actually vice versa: IEEE-754 4-byte and 8-byte is not supported in ctypes. Same for Unicode: the platform's wchar_t is supported (as you said), but not a platform-independent (say) 4-byte little-endian. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: 2) complex-valued types (you might argue that it's just a 2-array of floats, but you could say the same thing about int as an array of bytes). The point is how do people interpret the data. Complex-valued data-types are very common. It is one reason Fortran is still used by scientists. Well, by the same reasoning, you could argue that pixel values (RGBA) are missing in the PEP. It's a convenience, sure, and it may also help interfacing with the platform's FORTRAN implementation - however, are you sure that NumPy's complex layout is consistent with the platform's C99 _Complex definition? I think so (it is on gcc). And yes, where you draw the line between fundamental and derived data-type is somewhat arbitrary. I'd rather include complex-numbers than not given their prevalence in the data-streams I'm trying to make compatible with each other. 3) Unicode characters 4) What about floating-point representations that are not IEEE 754 4-byte or 8-byte. Both of these are available in a platform-dependent way: if the platform uses non-IEEE754 formats for C float and C double, ctypes will interface with that just fine. It is actually vice versa: IEEE-754 4-byte and 8-byte is not supported in ctypes. That's what I meant. The 'f' kind in the data-type description is also intended to mean platform float whatever that is. But, a complete data-format representation would have a way to describe other bit-layouts for floating point representation. Even if you can't actually calculate directly with them without conversion. Same for Unicode: the platform's wchar_t is supported (as you said), but not a platform-independent (say) 4-byte little-endian. Right. It's a matter of scope. Frankly, I'd be happy enough to start with typecodes in the extended buffer protocol (that's where the array module is now) and then move up to something more complete later. But, since we already have an array interface for record-arrays to share information and data with each other, and ctypes showing all of it's power, then why not be more complete? -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant oliphant.travis at ieee.org writes: Frankly, I'd be happy enough to start with typecodes in the extended buffer protocol (that's where the array module is now) and then move up to something more complete later. Let's just start with that. The way I see the problem is that buffer protocol is fine as long as your data is an array of bytes, but if it is an array of doubles, you are out of luck. So, while I can do b = buffer(array('d', [1,2,3])) there is not much that I can do with b. For example, if I want to pass it to numpy, I will have to provide the type and shape information myself: numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) With the extended buffer protocol, I should be able to do numpy.array(b) So let's start by solving this problem and limit it to data that can be found in a standard library array. This way we can postpone the discussion of shapes, strides and nested structs. I propose a simple bf_gettypeinfo(PyObject *obj, int* type, int* bitsize) method that would return a type code and the size of the data item. I believe it is better to have type codes free from size information for several reasons: 1. Generic code can use size information directly without having to know that int is 32 and double is 64 bits. 2. Odd sizes can be easily described without having to add a new type code. 3. I assume that the existing bf_ functions would still return size in bytes, so having item size available as an int will help to get number of items. If we manage to agree on the standard way to pass primitive type information, it will be a big achievement and immediately useful because simple arrays are already in the standard library. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On 11/1/06, Alexander Belopolsky [EMAIL PROTECTED] wrote: Let's just start with that. The way I see the problem is that buffer protocol is fine as long as your data is an array of bytes, but if it is an array of doubles, you are out of luck. So, while I can do b = buffer(array('d', [1,2,3])) there is not much that I can do with b. For example, if I want to pass it to numpy, I will have to provide the type and shape information myself: numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) With the extended buffer protocol, I should be able to do numpy.array(b) As a data point, this is the first posting that has clearly explained to me what the two PEPs are attempting to achieve. That may be my blindness to what others find self-evident, but equally, I may not be the only one who needed this example... Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Alexander Belopolsky wrote: Travis Oliphant oliphant.travis at ieee.org writes: b = buffer(array('d', [1,2,3])) there is not much that I can do with b. For example, if I want to pass it to numpy, I will have to provide the type and shape information myself: numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) With the extended buffer protocol, I should be able to do numpy.array(b) or just numpy.array(array.array('d',[1,2,3])) and leave-out the buffer object all together. So let's start by solving this problem and limit it to data that can be found in a standard library array. This way we can postpone the discussion of shapes, strides and nested structs. Don't lump those ideas together. Shapes and strides are necessary for N-dimensional array's (it's essentially what *defines* the N-dimensional array). I really don't want to sacrifice those in the extended buffer protocol. If you want to separate them into different functions then that is a possibility. If we manage to agree on the standard way to pass primitive type information, it will be a big achievement and immediately useful because simple arrays are already in the standard library. We could start there, I suppose. Especially if it helps us all get on the same page. But, we already see the applications beyond this simple case so I would like to have at least an eye for the more difficult case which we already have a working solution for in the array interface -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore wrote: Enough of the abstract. As a concrete example, suppose I have a (byte) string in my program containing some binary data - an ID3 header, or a TCP packet, or whatever. It doesn't really matter. Does your proposal offer anything to me in how I might manipulate that data (assuming I'm not using NumPy)? (I'm not insisting that it should, I'm just trying to understand the scope of the PEP). What do you mean by manipulate the data. The proposal for a data-format object would help you describe that data in a standard way and therefore share that data between several library that would be able to understand the data (because they all use and/or understand the default Python way to handle data-formats). It would be up to the other packages to manipulate the data. So, what you would be able to do is take your byte-string and create a buffer object which you could then share with other packages: Example: b = buffer(bytestr, format=data_format_object) Now. a = numpy.frombuffer(b) a['field1'] # prints data stored in the field named field1 etc. Or. cobj = ctypes.frombuffer(b) # Now, cobj is a ctypes object that is basically a structure that can be passed # directly to your C-code. Does this help? -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant oliphant.travis at ieee.org writes: Don't lump those ideas together. Shapes and strides are necessary for N-dimensional array's (it's essentially what *defines* the N-dimensional array). I really don't want to sacrifice those in the extended buffer protocol. If you want to separate them into different functions then that is a possibility. I don't understand. Do you want to discuss shapes and strides separately from the datatype or not? Note that in ctypes shape is a property of datatype (as in c_int*2*3). In your proposal, shapes and strides are communicated separately. This presents a unique memory management challenge: if the object does not contain shape information in a ready to be pointed to form, who is responsible for deallocating the shape array? If we manage to agree on the standard way to pass primitive type information, it will be a big achievement and immediately useful because simple arrays are already in the standard library. We could start there, I suppose. Especially if it helps us all get on the same page. Let's start: 1. Should primitive types be associated with simple type codes (short, int, long, float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), (float, 64)]? - I prefer pairs 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers 3. Should size be expressed in bits or bytes? - I prefer bits ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant oliphant.travis at ieee.org writes: Alexander Belopolsky wrote: ... 1. Should primitive types be associated with simple type codes (short, int, long, float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), (float, 64)]? - I prefer pairs 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers Are these orthogonal? Do you mean are my quiestions 1 and 2 orthogonal? I guess they are. 3. Should size be expressed in bits or bytes? - I prefer bits So, you want an integer enum for the kind and an integer for the bitsize? That's fine with me. One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? I was thinking about using something like NPY_TYPES enum, but T_* codes would work as well. Let me just present both options for the record: --- numpy/ndarrayobject.h --- enum NPY_TYPES {NPY_BOOL=0, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT, NPY_UINT, NPY_LONG, NPY_ULONG, NPY_LONGLONG, NPY_ULONGLONG, NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE, NPY_OBJECT=17, NPY_STRING, NPY_UNICODE, NPY_VOID, NPY_NTYPES, NPY_NOTYPE, NPY_CHAR, /* special flag */ NPY_USERDEF=256 /* leave room for characters */ }; --- structmember.h --- /* Types */ #define T_SHORT 0 #define T_INT 1 #define T_LONG 2 #define T_FLOAT 3 #define T_DOUBLE4 #define T_STRING5 #define T_OBJECT6 /* XXX the ordering here is weird for binary compatibility */ #define T_CHAR 7 /* 1-character string */ #define T_BYTE 8 /* 8-bit signed int */ /* unsigned variants: */ #define T_UBYTE 9 #define T_USHORT10 #define T_UINT 11 #define T_ULONG 12 /* Added by Jack: strings contained in the structure */ #define T_STRING_INPLACE13 #define T_OBJECT_EX 16 /* Like T_OBJECT, but raises AttributeError when the value is NULL, instead of converting to None. */ #ifdef HAVE_LONG_LONG #define T_LONGLONG 17 #define T_ULONGLONG 18 #endif /* HAVE_LONG_LONG */ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers 3. Should size be expressed in bits or bytes? - I prefer bits So, you want an integer enum for the kind and an integer for the bitsize? That's fine with me. One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? Notice that those type codes imply sizes, namely the platform sizes (where platform always means what the C compiler does). So if you want to have platform-independent codes as well, you shouldn't use the T_ codes. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote: In order to make sense of the data-format object that I'm proposing you have to see the need to share information about data-format through an extended buffer protocol (which I will be proposing soon). I'm not going to try to argue that right now because there are a lot of people who can do that. So, I'm going to assume that you see the need for it. If you don't, then just suspend concern about that for the moment. There are a lot of us who really see the need for it. [...] Again, my real purpose is the extended buffer protocol. These data-format type is a means to that end. If the consensus is that nobody sees a greater use of the data-format type beyond the buffer protocol, then I will just write 1 PEP for the extended buffer protocol. While I don't personally use NumPy, I can see where an extended buffer protocol like you describe could be advantageous, and so I'm happy to concede that benefit. I can also vaguely see that a unified block of memory description would be useful. My interest would be in the area of the struct module (unpacking and packing data for dumping to byte streams - whether this happens in place or not is not too important to this use case). However, I cannot see how your proposal would help here in practice - does it include the functionality of the struct module (or should it?) If so, then I'd like to see examples of equivalent constructs. If not, then isn't it yet another variation on the theme, adding to the problem of multiple approaches rather than helping? I can also see the parallels with ctypes. Here I feel a little less sure that keeping the two approaches is wrong. I don't know why I feel like that - maybe nothing more than familiarity with ctypes - but I don't have the same reluctance to have both the ctypes data definition stuff and the new datatype proposal. Enough of the abstract. As a concrete example, suppose I have a (byte) string in my program containing some binary data - an ID3 header, or a TCP packet, or whatever. It doesn't really matter. Does your proposal offer anything to me in how I might manipulate that data (assuming I'm not using NumPy)? (I'm not insisting that it should, I'm just trying to understand the scope of the PEP). Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
It might be better not to consider bit to be a type at all, and come up with another way of indicating that the size is in bits. Perhaps 'i4' # 4-byte signed int 'i4b' # 4-bit signed int 'u4' # 4-byte unsigned int 'u4b' # 4-bit unsigned int I like this. Very nice. I think that's the right way to look at it. I remark that 'ib4' and 'ub4' make for marginally easier parsing and less danger of ambiguity. -- g ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
In this email I'm responding to a series of emails from Travis pretty much in the order I read them: Travis Oliphant writes: I'm saying we should introduce a single-object mechanism for describing binary data so that the many-object approach of c-types does not become some kind of de-facto standard. C-types can translate this object-instance to its internals if and when it needs to. In the mean-time, how are other packages supposed to communicate binary information about data with each other? Here we disagree. I haven't used C-types. I have no idea whether it is well-designed or horribly unusable. So if someone wanted to argue that C-types is a mistake and should be thrown out, I'd be willing to listen. Until someone tries to make that argument, I'm presuming it's good enough to be part of the standard library for Python. Given that, I think that it *SHOULD* become a de-facto standard. I think that the way different packages should communicate binary information about data with each other is using C-types. Not because it's wonderful (remember, I've never used it), but because it's STANDARD. There should be one obvious way to do things! When there is, it makes interoperability WAY easier, and interoperability is the main objective when dealing with things like binary data formats. Propose using C-types. Or propose *improving* C-types. But don't propose ignoring it. In a different message, he writes: It also bothers me that so many ways to describe binary data are being used out there. This is a problem that deserves being solved. And, no, ctypes hasn't solved it (we can't directly use the ctypes solution). Really? Why? Is this a failing in C-types? Can C-types be fixed? Later he explains: Remember the buffer protocol is in compiled code. So, as a result, 1) It's harder to construct a class to pass through the protocol using the multiple-types approach of ctypes. 2) It's harder to interpret the object recevied through the buffer protocol. Sure, it would be *possible* to use ctypes, but I think it would be very difficult. Think about how you would write the get_data_format C function in the extended buffer protocol for NumPy if you had to import ctypes and then build a class just to describe your data. How would you interpret what you get back? Aha! So what you REALLY ought to be asking for is a C interface to the ctypes module. That seems like a very sensible and reasonable request. I don't think we should just *use ctypes because it's there* when the way it describes binary data was not constructed with the extended buffer protocol in mind. I just disagree. (1) I *DO* think we should just use ctypes because it's there. After all, the problem we're trying to solve is one of COMPATIBILITY - you don't solve those by introducing competing standards. (2) From what I understand of it, I think ctypes is quite capable of describing data to be accessed via the buffer protocol. In another email: In order to make sense of the data-format object that I'm proposing you have to see the need to share information about data-format through an extended buffer protocol (which I will be proposing soon). I'm not going to try to argue that right now because there are a lot of people who can do that. Actually, no need to convince me... I am already convinced of the wisdom of this approach. My view is that it is un-necessary to use a different type object to describe each different data-type. [...] So, the big difference is that I think data-formats should be *instances* of a single type. Why? Who cares? Seriously, if we were proposing to describe the layouts with a collection of rubber bands and potato chips, I'd say it was a crazy idea. But we're proposing using data structures in a computer memory. Why does it matter whether those data structures are of the same python type or different python types? I care whether the structure can be created, passed around, and interrogated. I don't care what Python type they are. I'm saying that I don't like the idea of forcing this approach on everybody else who wants to describe arbitrary binary data just because ctypes is included. And I'm saying that I *do*. Hey, if someone proposed getting rid of the current syntax for the array module (for Py3K) and replacing it with use of ctypes, I'd give it serious consideration. There should be only one way to describe binary structures. It should be powerful enough to describe almost any structure, easy-to-use, and most of all it should be used consistently everywhere. I need some encouragement in order to continue to invest energy in pushing this forward. Please keep up the good work! Some day I'd like to see NumPy built in to the standard Python distribution. The incremental, PEP by PEP approach you are taking is the best route to getting there. But there may be some changes along the way -- convergence with ctypes may be one
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: However, the existence of an alternative strategy using a single Python type and multiple instances of that type to describe binary data (which is the NumPy approach and essentially the array module approach) means that we can't just a-priori assume that the way ctypes did it is the only or best way. As a hypothetical, what if there was a helper function that translated a description of a data structure using basic strings and sequences (along the lines of what you have in your PEP) into a ctypes data structure? The examples of missing features that Martin has exposed are not show-stoppers. They can all be easily handled within the context of what is being proposed. I can modify the PEP to show this. But, I don't have the time to spend if it's just all going to be rejected in the end. I need some encouragement in order to continue to invest energy in pushing this forward. I think the most important thing in your PEP is the formats for describing structures in a way that is easy to construct in both C and Python (specifically, by using strings and sequences), and it is worth pursuing for that aspect alone. Whether that datatype is then implemented as a class in its own right or as a factory function that returns a ctypes data type object is, to my mind, a relatively minor implementation issue (either way has questions to be addressed - I'm not sure how you tell ctypes that you have a 32-bit integer with a non-native endian format, for example). In fact, it may make sense to just use the lists/strings directly as the data exchange format definitions, and let the various libraries do their own translation into their private format descriptions instead of creating a new one-type-to-describe-them-all. Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis Oliphant schrieb: So, the big difference is that I think data-formats should be *instances* of a single type. This is nearly the case for ctypes as well. All layout descriptions are instances of the type type. Nearly, because they are instances of subtypes of the type type: py type(ctypes.c_long) type '_ctypes.SimpleType' py type(ctypes.c_double) type '_ctypes.SimpleType' py type(ctypes.c_double).__bases__ (type 'type',) py type(ctypes.Structure) type '_ctypes.StructType' py type(ctypes.Array) type '_ctypes.ArrayType' py type(ctypes.Structure).__bases__ (type 'type',) py type(ctypes.Array).__bases__ (type 'type',) So if your requirement is all layout descriptions ought to have the same type, then this is (nearly) the case: they are instances of type (rather then datatype, as in your PEP). The big difference, however, is that by going this route you are forced to use the type object as your data-format instance. This is fitting a square peg into a round hole in my opinion.To really be useful, you would need to add the attributes and (most importantly) C-function pointers and C-structure members to these type objects. I don't even think that is possible in Python (even if you do create a meta-type that all the c-type type objects can use that carries the same information). There are a few people claiming I should use the ctypes type-hierarchy but nobody has explained how that would be possible given the attributes, C-structure members and C-function pointers that I'm proposing. In NumPy we also have a Python type for each basic data-format (we call them array scalars). For a little while they carried the data-format information on the Python side. This turned out to be not flexible enough. So, we expanded the PyArray_Descr * structure which has always been a part of Numeric (and the array module array type) into an actual Python type and a lot of things became possible. It was clear to me that we were on to something. Now, the biggest claim against the gist of what I'm proposing (details we can argue about), seems from my perspective to be a desire to go backwards and carry data-type information around with a Python type. The data-type object did not just appear out of thin-air one day. It really can be seen as an evolution from the beginnings of Numeric (and the Python array module). So, this is what we came up with in the NumPy world. Ctypes came up with something a bit different. It is not trivial to just use ctypes. I could say the same thing and tell ctypes to just use NumPy's data-type object. It could be done that way, but of course it would take a bit of work on the part of ctypes to make that happen. Having ctypes in the standard library does not mean that any other discussion of how data-format should be represented has been decided on. If I had known that was what it meant to put ctypes in the standard library, I would have been more vocal several months ago. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis Oliphant schrieb: Function pointers are supported with the void data-type and could be more specifically supported if it were important. People typically don't use the buffer protocol to send function-pointers around in a way that the void description wouldn't be enough. As I said before, I can't tell whether it's important, as I still don't know what the purpose of this PEP is. If it is to support a unification of memory layout specifications, and if that unifications is also to include ctypes, then yes, it is important. If it is to describe array elements in NumArray arrays, then it might not be important. For the usage of ctypes, the PEP void type is insufficient to describe function pointers: you also need a specification of the signature of the function pointer (parameter types and return type), or else you can't use the function pointer (i.e. you can't call the function). The buffer protocol is primarily meant for describing the format of (large) contiguous pieces of binary data. In most cases that will be all kinds of numerical data for scientific applications, image and other media data, simple databases and similar kinds of data. There is currently no adequate data format type which sufficiently supports these applications, otherwise Travis wouldn't make this proposal. While Travis' proposal encompasses the data format functionality within the struct module and overlaps with what ctypes has to offer, it does not aim to replace ctypes. I don't think that a basic data format type necessarily should be able to encode all the information a foreign function interface needs to call a code library. From my point of view, that kind of information is one abstraction layer above a basic data format and should be implemented as an extension of or complementary to the basic data format. I also do not understand why the data format type should attempt to fully describe arbitrarily complex data formats, like fragmented (non-continuous) data structures in memory. You'd probably need a full programming language for that anyway. Regards, Stephan ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: But, there are distinct disadvantages to this approach compared to what I'm trying to allow. Martin claims that the ctypes approach is *basically* equivalent but this is just not true. I may claim that, but primarily, my goal was to demonstrate that the proposed PEP cannot be used to describe ctypes object layouts (without checking, I can readily believe that the PEP covers everything in the array and struct modules). It could be made more true if the ctypes objects inherited from a meta-type and if Python allowed meta-types to expand their C-structures. But, last I checked this is not possible. That I don't understand. a) what do you think is not possible? b) why is that an important difference between a datatype and a ctype? If you are suggesting that, given two Python types A and B, and B inheriting from A, that the memory layout of B cannot extend the memory layout of A, then: that is certainly possible in Python, and there are many examples for it. A Python type object is a very particular kind of Python-type. As far as I can tell, it's not as flexible in terms of the kinds of things you can do with the instances of a type object (i.e. what ctypes types are) on the C-level. Ah, you are worried that NumArray objects would have to be *instances* of ctypes types. That wouldn't be necessary at all. Instead, if each NumArray object had a method get_ctype(), which returned a ctypes type, then you would get the same desciptiveness that you get with the PEP's datatype. I'm happy to have the data-format object live separate from ctypes and leave it to the ctypes author(s) to support it if desired. But, the claim that the extended buffer protocol jump through all kinds of hoops to conform to the ctypes standard when that standard was designed with a different idea in mind is not acceptable. That, of course, is a reasoning I can understand. This is free software, contributors can chose to contribute whatever they want; you can't force anybody to do anything specific you want to get done. Acceptance of any PEP (not just this PEP) should always be contingent on available of a patch implementing it. Where is the discussion that crowned the ctypes way of doing things as the one true way It hasn't been crowned this way. Me, personally, I just said two things about this PEP and ctypes: a) the PEP does not support all concepts that ctypes needs b) ctypes can express all examples in the PEP in response to your proposal that ctypes should adopt the PEP, and that ctypes is not good enough to be the one true way. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Nick Coghlan wrote: Travis E. Oliphant wrote: However, the existence of an alternative strategy using a single Python type and multiple instances of that type to describe binary data (which is the NumPy approach and essentially the array module approach) means that we can't just a-priori assume that the way ctypes did it is the only or best way. As a hypothetical, what if there was a helper function that translated a description of a data structure using basic strings and sequences (along the lines of what you have in your PEP) into a ctypes data structure? That would be fine and useful in fact. I don't see how it helps the problem of what to pass through the buffer protocol I see passing c-types type objects around on the c-level as an un-necessary and burdensome approach unless the ctypes objects were significantly enhanced. In fact, it may make sense to just use the lists/strings directly as the data exchange format definitions, and let the various libraries do their own translation into their private format descriptions instead of creating a new one-type-to-describe-them-all. Yes, I'm open to this possibility. I basically want two things in the object passed through the extended buffer protocol: 1) It's fast on the C-level 2) It covers all the use-cases. If just a particular string or list structure were passed, then I would drop the data-format PEP and just have the dataformat argument of the extended buffer protocol be that thing. Then, something that converts ctypes objects to that special format would be very nice indeed. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: The big difference, however, is that by going this route you are forced to use the type object as your data-format instance. Since everything is an object (an instance) in Python, this is not such a big difference. This is fitting a square peg into a round hole in my opinion.To really be useful, you would need to add the attributes and (most importantly) C-function pointers and C-structure members to these type objects. Can you explain why that is? In the PEP, I see two C fucntions: setitem and getitem. I think they can be implemented readily with ctypes' GETFUNC and SETFUNC function pointers that it uses all over the place. I don't see a requirement to support C structure members or function pointers in the datatype object. There are a few people claiming I should use the ctypes type-hierarchy but nobody has explained how that would be possible given the attributes, C-structure members and C-function pointers that I'm proposing. Ok, here you go. Remember, I'm still not claiming that this should be done: I'm just explaining how it could be done. - byteorder/isnative: I think this could be derived from the presence of the _swappedbytes_ field - itemsize: can be done with ctypes.sizeof - kind: can be created through a mapping of the _type_ field (I think) - fields: can be derived from the _fields_ member - hasobject: compare, recursively, with py_object - name: use __name__ - base: again, created from _type_ (if _length_ is present) - shape: recursively look at _length_ - alignment: use ctypes.alignment It was clear to me that we were on to something. Now, the biggest claim against the gist of what I'm proposing (details we can argue about), seems from my perspective to be a desire to go backwards and carry data-type information around with a Python type. I, at least, have no such desire. I just explained that the ctypes model of memory layouts is just as expressive as the one in the PEP. Which of these is better for what the PEP wants to achieve, I can't say, because I still don't quite understand what the PEP wants to achieve. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Stephan Tolksdorf schrieb: While Travis' proposal encompasses the data format functionality within the struct module and overlaps with what ctypes has to offer, it does not aim to replace ctypes. This discussion could have been a lot shorter if he had said so. Unfortunately (?) he stated that it was *precisely* a motivation of the PEP to provide a standard data description machinery that can then be adopted by the struct, array, and ctypes modules. I also do not understand why the data format type should attempt to fully describe arbitrarily complex data formats, like fragmented (non-continuous) data structures in memory. You'd probably need a full programming language for that anyway. For an FFI application, you need to be able to describe arbitrary in-memory formats, since that's what the foreign function will expect. For type safety and reuse, you better separate the description of the layout from the creation of the actual values. Otherwise (i.e. if you have to define the layout on each invocation), creating the parameters for a foreign function becomes very tedious and error-prone, with errors often being catastrophic (i.e. interpreter crashes). Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: Greg Ewing wrote: Travis Oliphant wrote: Part of the problem is that ctypes uses a lot of different Python types (that's what I mean by multi-object to accomplish it's goal). What I'm looking for is a single Python type that can be passed around and explains binary data. It's not clear that multi-object is a bad thing in and of itself. It makes sense conceptually -- if you have a datatype object representing a struct, and you ask for a description of one of its fields, which could be another struct or array, you would expect to get another datatype object describing that. Can you elaborate on what would be wrong with this? Also, can you clarify whether your objection is to multi-object or multi-type. They're not the same thing -- you could have a data structure built out of multiple objects that are all of the same Python type, with attributes distinguishing between struct, array, etc. That would be single-type but multi-object. I've tried to clarify this in another post. Basically, what I don't like about the ctypes approach is that it is multi-type (every new data-format is a Python type). In order to talk about all these Python types together, then they must all share some attribute (or else be derived from a meta-type in C with a specific function-pointer entry). (I tried to read the whole thread again, but it is too large already.) There is a (badly named, probably) api to access information about ctypes types and instances of this type. The functions are PyObject_stgdict(obj) and PyType_stgdict(type). Both return a 'StgDictObject' instance or NULL if the funtion fails. This object is the ctypes' type object's __dict__. StgDictObject is a subclass of PyDictObject and has fields that carry information about the C type (alignment requirements, size in bytes, plus some other stuff). Also it contains several pointers to functions that implement (in C) struct-like functionality (packing/unpacking). Of course several of these fields can only be used for ctypes-specific purposes, for example a pointer to the ffi_type which is used when calling foreign functions, or the restype, argtypes, and errcheck fields which are only used when the type describes a function pointer. This mechanism is probably a hack because it'n not possible to add C accessible fields to type objects, on the other hand it is extensible (in principle, at least). Just to describe the implementation. Thomas ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: But, there are distinct disadvantages to this approach compared to what I'm trying to allow. Martin claims that the ctypes approach is *basically* equivalent but this is just not true. I may claim that, but primarily, my goal was to demonstrate that the proposed PEP cannot be used to describe ctypes object layouts (without checking, I can readily believe that the PEP covers everything in the array and struct modules). That's a fine argument. You are right in terms of the PEP as it stands. However, I want to make clear that a single Python type object *could* be used to describe data including all the cases you laid out. It would not be difficult to extend the PEP to cover all the cases you've described --- I'm not sure that's desireable. I'm not trying to replace what ctypes does. I'm just trying to get something that we can use to exchange data-format information through the extended buffer protocol. It really comes down to using Python type-objects as the instances describing data-formats (which ctypes does) or normal Python objects as the instances describing data-formats (what the PEP proposes). It could be made more true if the ctypes objects inherited from a meta-type and if Python allowed meta-types to expand their C-structures. But, last I checked this is not possible. That I don't understand. a) what do you think is not possible? Extending the C-structure of PyTypeObject and having Python types use that as their type-object. b) why is that an important difference between a datatype and a ctype? Because with instances of C-types you are stuck with the PyTypeObject structure. If you want to add anything you have to do it in the dictionary. Instances of a datatype allow adding anything after the PyObject_HEAD structure. If you are suggesting that, given two Python types A and B, and B inheriting from A, that the memory layout of B cannot extend the memory layout of A, then: that is certainly possible in Python, and there are many examples for it. I know this. I've done it for many different objects. I'm saying it's not quite the same when what you are extending is the PyTypeObject and trying to use it as the type object for some other object. A Python type object is a very particular kind of Python-type. As far as I can tell, it's not as flexible in terms of the kinds of things you can do with the instances of a type object (i.e. what ctypes types are) on the C-level. Ah, you are worried that NumArray objects would have to be *instances* of ctypes types. That wouldn't be necessary at all. Instead, if each NumArray object had a method get_ctype(), which returned a ctypes type, then you would get the same desciptiveness that you get with the PEP's datatype. No, I'm not worried about that (It's not NumArray by the way, it's NumPy. NumPy replaces both NumArray and Numeric). NumPy actually interfaces with ctypes quite well. This is how I learned anything I might know about ctypes. So, I'm well aware of this. What I am concerned about is using Python type objects (i.e. Python objects that can be cast in C to PyTypeObject *) outside of ctypes to describe data-formats when you don't need it and it just complicates dealing with the data-format description. Where is the discussion that crowned the ctypes way of doing things as the one true way It hasn't been crowned this way. Me, personally, I just said two things about this PEP and ctypes: Thanks for clarifying, but I know you didn't say this. Others, however, basically did. a) the PEP does not support all concepts that ctypes needs It could be extended, but I'm not sure it *needs* to be in it's real context. I'm very sorry for contributing to the distraction that ctypes should adopt the PEP. My words were unclear. But, I'm not pushing for that. I really have no opinion how ctypes describes data. b) ctypes can express all examples in the PEP in response to your proposal that ctypes should adopt the PEP, and that ctypes is not good enough to be the one true way. I think it is good enough in the semantic sense. But, I think using type objects in this fashion for general-purpose data-description is over-kill and will be much harder to extend and deal with. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Thomas Heller wrote: (I tried to read the whole thread again, but it is too large already.) There is a (badly named, probably) api to access information about ctypes types and instances of this type. The functions are PyObject_stgdict(obj) and PyType_stgdict(type). Both return a 'StgDictObject' instance or NULL if the funtion fails. This object is the ctypes' type object's __dict__. StgDictObject is a subclass of PyDictObject and has fields that carry information about the C type (alignment requirements, size in bytes, plus some other stuff). Also it contains several pointers to functions that implement (in C) struct-like functionality (packing/unpacking). Of course several of these fields can only be used for ctypes-specific purposes, for example a pointer to the ffi_type which is used when calling foreign functions, or the restype, argtypes, and errcheck fields which are only used when the type describes a function pointer. This mechanism is probably a hack because it'n not possible to add C accessible fields to type objects, on the other hand it is extensible (in principle, at least). Thank you for the description. While I've studied the ctypes code, I still don't understand the purposes beind all the data-structures. Also, I really don't have an opinion about ctypes' implementation. All my comparisons are simply being resistant to the unexplained idea that I'm supposed to use ctypes objects in a way they weren't really designed to be used. For example, I'm pretty sure you were the one who made me aware that you can't just extend the PyTypeObject. Instead you extended the tp_dict of the Python typeObject to store some of the extra information that is needed to describe a data-type like I'm proposing. So, if you I'm just describing data-format information, why do I need all this complexity (that makes ctypes implementation easier/more natural/etc)? What if the StgDictObject is the Python data-format object I'm talking about? It actually looks closer. But, if all I want is the StgDictObject (or something like it), then why should I pass around the whole type object? This is all I'm saying to those that want me to use ctypes to describe data-formats in the extended buffer protocol. I'm not trying to change anything in ctypes. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: For example, I'm pretty sure you were the one who made me aware that you can't just extend the PyTypeObject. Instead you extended the tp_dict of the Python typeObject to store some of the extra information that is needed to describe a data-type like I'm proposing. So, if you I'm just describing data-format information, why do I need all this complexity (that makes ctypes implementation easier/more natural/etc)? What if the StgDictObject is the Python data-format object I'm talking about? It actually looks closer. But, if all I want is the StgDictObject (or something like it), then why should I pass around the whole type object? Maybe you don't need it. ctypes certainly needs the type object because it is also used for constructing instances (while NumPy uses factory functions, IIUC), or for converting 'native' Python object into foreign function arguments. I know that this doesn't interest you from the NumPy perspective (and I don't want to offend you by saying this). This is all I'm saying to those that want me to use ctypes to describe data-formats in the extended buffer protocol. I'm not trying to change anything in ctypes. I don't want to change anything in NumPy, either, and was not the one who suggested to use ctypes objects, although I had thought about whether it would be possible or not. What I like about ctypes, and dislike about Numeric/Numarry/NumPy is the way C compatible types are defined in ctypes. I find the ctypes way more natural than the numxxx or array module way, but what else would anyone expect from me as the ctypes author... I hope that a useful interface is developed from your proposals, and will be happy to adapt ctypes to use it or interface ctypes with it if this makes sense. Thomas ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Stephan Tolksdorf schrieb: While Travis' proposal encompasses the data format functionality within the struct module and overlaps with what ctypes has to offer, it does not aim to replace ctypes. This discussion could have been a lot shorter if he had said so. Unfortunately (?) he stated that it was *precisely* a motivation of the PEP to provide a standard data description machinery that can then be adopted by the struct, array, and ctypes modules. Struct and array I was sure about. Ctypes less sure. I'm very sorry for the distraction I caused by mis-stating my objective. My objective is really the extended buffer protocol. The data-type object is a means to that end. I do think ctypes could make use of the data-type object and that there is a real difference between using Python type objects as data-format descriptions and using another Python type for those descriptions. I thought to go the ctypes route (before I even knew what ctypes did) but decided against it for a number of reasons. But, nonetheless those are side issues. The purpose of the PEP is to provide an object that the extended buffer protocol can use to share data-format information. It should be considered primarily in that context. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: I think it actually is. Perhaps I'm wrong, but a type-object is still a special kind of an instance of a meta-type. I once tried to add function pointers to a type object by inheriting from it. But, I was told that Python is not set up to handle that. Maybe I misunderstood. I'm not quite sure what the problems are: one obvious problem is that the next Python version may also extend the size of type objects. But, AFAICT, even that should work, in the sense that this new version should check for the presence of a flag to determine whether the additional fields are there. The only tricky question is how you can find out whether your own extension is there. If that is a common problem, I think a framework could be added to support extensible type objects (with some kind of registry for additional fields, and a per-type-object indicator whether a certain extension field is present). Let me be very clear. The whole reason I make any statements about ctypes is because somebody else brought it up. I'm not trying to replace ctypes and the way it uses type objects to represent data internally. Ok. I understood you differently earlier. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote: Martin v. Löwis wrote: [...] because I still don't quite understand what the PEP wants to achieve. Are you saying you still don't understand after having read the extended buffer protocol PEP, yet? I can't speak for Martin, but I don't understand how I, as a Python programmer, might use the data type objects specified in the PEP. I have skimmed the extended buffer protocol PEP, but I'm conscious that no objects I currently use support the extended buffer protocol (and the PEP doesn't mention adding support to existing objects), so I don't see that as too relevant to me. I have also installed numpy, and looked at the help for numpy.dtype, but that doesn't add much to the PEP. The freely available chapters of the numpy book explain how dtypes describe data structures, but not how to use them. The freely available Numeric documentation doesn't refer to dtypes, as far as I can tell. Is there any documentation on how to use dtypes, independently of other features of numpy? If not, can you clarify where the benefit lies for a Python user of this proposal? (I understand the benefits of a common language for extensions to communicate datatype information, but why expose it to Python? How do Python users use it?) This is probably all self-evident to the numpy community, but I think that as the PEP is aimed at a wider audience it needs a little more background. Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore [EMAIL PROTECTED] wrote: On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote: Martin v. Löwis wrote: [...] because I still don't quite understand what the PEP wants to achieve. Are you saying you still don't understand after having read the extended buffer protocol PEP, yet? I can't speak for Martin, but I don't understand how I, as a Python programmer, might use the data type objects specified in the PEP. I have skimmed the extended buffer protocol PEP, but I'm conscious that no objects I currently use support the extended buffer protocol (and the PEP doesn't mention adding support to existing objects), so I don't see that as too relevant to me. Presumably str in 2.x and bytes in 3.x could be extended to support the 'S' specifier, unicode in 2.x and text in 3.x could be extended to support the 'U' specifier. The various array.array variants could be extended to support all relevant specifiers, etc. This is probably all self-evident to the numpy community, but I think that as the PEP is aimed at a wider audience it needs a little more background. Someone correct me if I am wrong, but it allows things equivalent to the following that is available in C, available in Python... typedef struct { char R; char G; char B; char A; } pixel_RGBA; pixel_RGBA image[1024][768]; Or even... typedef struct { long long numerator; unsigned long long denominator; double approximation; } rational; rational ratios[1024]; The real use is that after you have your array of (packed) objects, be it one of the above samples, or otherwise, you don't need to explicitly pass around specifiers (like in struct, or ctypes), numpy and others can talk to each other, and pick up the specifier with the extended buffer protocol, and it just works. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore wrote: On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote: Martin v. Löwis wrote: [...] because I still don't quite understand what the PEP wants to achieve. Are you saying you still don't understand after having read the extended buffer protocol PEP, yet? I can't speak for Martin, but I don't understand how I, as a Python programmer, might use the data type objects specified in the PEP. I have skimmed the extended buffer protocol PEP, but I'm conscious that no objects I currently use support the extended buffer protocol (and the PEP doesn't mention adding support to existing objects), so I don't see that as too relevant to me. Do you use the PIL? The PIL supports the array interface. CVXOPT supports the array interface. Numarray Numeric NumPy all support the array interface. I have also installed numpy, and looked at the help for numpy.dtype, but that doesn't add much to the PEP. The source-code is available. The freely available chapters of the numpy book explain how dtypes describe data structures, but not how to use them. The freely available Numeric documentation doesn't refer to dtypes, as far as I can tell. It kind of does, they are PyArray_Descr * structures in Numeric. They just aren't Python objects. Is there any documentation on how to use dtypes, independently of other features of numpy? There are examples and other help pages at http://www.scipy.org If not, can you clarify where the benefit lies for a Python user of this proposal? (I understand the benefits of a common language for extensions to communicate datatype information, but why expose it to Python? How do Python users use it?) The only benefit I imagine would be for an extension module library writer and for users of the struct and array modules. But, other than that, I don't know. It actually doesn't have to be exposed to Python. I used Python notation in the PEP to explain what is basically a C-structure. I don't care if the object ever gets exposed to Python. Maybe that's part of the communication problem. This is probably all self-evident to the numpy community, but I think that as the PEP is aimed at a wider audience it needs a little more background. It's hard to write that background because most of what I understand is from the NumPy community. I can't give you all the examples but my concern is that you have all these third party libraries out there describing what is essentially binary data and using either string-copies or the buffer protocol + extra information obtained by some method or attribute that varies across the implementations. There should really be a standard for describing this data. There are attempts at it in the struct and array module. There is the approach of ctypes but I claim that using Python type objects is over-kill for the purposes of describing data-formats. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
The only benefit I imagine would be for an extension module library writer and for users of the struct and array modules. But, other than that, I don't know. It actually doesn't have to be exposed to Python. I used Python notation in the PEP to explain what is basically a C-structure. I don't care if the object ever gets exposed to Python. Maybe that's part of the communication problem. I get the impression where ctypes is good for accessing native C libraries from within python, the data-type object is meant to add a more direct way to share native python object's *data* with C (or other languages) in a more efficient way. For data that can be represented well in continuous memory address's, it lightens the load so instead of a list of python objects you get an array of data for n python_type objects without the duplications of the python type for every element. I think maybe some more complete examples demonstrating how it is to be used from both the Python and C would be good. Cheers, Ron ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
One thing I'm curious about in the ctypes vs this PEP debate is the following. How do the approaches differ in practice if I'm developing a library that wants to accept various image formats that all describe the same thing: rgb data. Let's say for now all I want to support is two different image formats whose pixels are described in C structs by: struct rbg565 { unsigned short r:5; unsigned short g:6; unsigned short b:5; }; struct rgb101210 { unsigned int r:10; unsigned int g:12; unsigned int b:10; }; Basically in my code I want to be able to take the binary data descriptor and say give me the 'r' field of this pixel as an integer. Is either one (the PEP or c-types) clearly easier to use in this case? What would the code look like for handling both formats generically? --bb ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: This mechanism is probably a hack because it'n not possible to add C accessible fields to type objects, on the other hand it is extensible (in principle, at least). I better start rewriting PyObjC then :-). PyObjC stores some addition information in the type objects that are used to describe Objective-C classes (such as a reference to the proxied class). IIRC This has been possible from Python 2.3. Ronald smime.p7s Description: S/MIME cryptographic signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Neal Becker wrote: I have watched numpy with interest for a long time. My own interest is to possibly use the c-api to wrap c++ algorithms to use from python. One thing that has concerned me, and continues to concern me with this proposal, is that it seems to suffer from a very fat interface. I certainly have not studied the options in any depth, but my gut feeling is that the interface is too fat and too complex. I wonder if it's possible to avoid this. I wonder if this is an example of all the methods sinking to the base class. You've just described my number #1 concern with incorporating NumPy wholesale, and the reason I believe it would be nice to cherry-pick a couple of key components for the standard library, rather than adopting the whole thing. Travis has done a lot of work towards that goal (the latest result of which is this pre-PEP for describing the individual array elements in a way that is more flexible than the single character codes of the current array module). Cheers, Nick. -- Nick Coghlan | [EMAIL PROTECTED] | Brisbane, Australia --- http://www.boredomandlaziness.org ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Would it be possible to make the data-type objects subclassable, with the subclasses being able to override the equality test?The range of data types that you've specified in the PEP are good enough for most general use, and probably for NumPy as well, but someone already came up with the example of image formats, which have their whole own range of data formats. I could throw in audio formats (bits per sample, excess-N or signed or ulaw samples, mono/stereo/5.1/etc, order of the channels), and there's probably a whole slew of other areas that have their own sets of formats.If the datatype objects are subclassable, modules could initially start by adding their own formats. So, the "jackaudio" and "jillaudio" modules would have distinct sets of formats. But then later on it should be fairly easy for them to recognize each others formats. So, jackaudio would recognize the jillaudio format "msdos linear pcm" as being identical to its own "16-bit excess-32768".Hopefully eventually all audio module writers would get together and define a set of standard audio formats. -- Jack Jansen, [EMAIL PROTECTED], http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman smime.p7s Description: S/MIME cryptographic signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
...in the cases I have seen, which includes BMP, TGA, uncompressed TIFF, a handful of platform-specific bitmap formats, etc., you _always_ get them in RGBA order. If the alpha channel is to be left out, then you get them as RGB. Mac OS X unfortunately uses ARGB. Writing some alti-vec code remedied that for passing it around to the OpenCV library. Just my $.02 Diez ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Josiah Carlson schrieb: One could also toss wxPython, VTK, or any one of the other GUI libraries into the mix for visualizing those images, of which wxPython just acquired no-copy display of PIL images, and being able to manipulate them with numpy (of which some wxPython built in classes use numpy to speed up manipulation) would be very useful. I'm doubtful that this PEP alone would allow zero-copy sharing of images for display. Often, the libraries need the data in a different format. So they need to copy, even if they could understand the other format. However, the PEP won't allow understanding the format. If I know I have an array of 4-byte values: which of them is R, G, B, and A? You give a name to the fields: 'R', 'G', 'B', and 'A'. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Jim Jewett wrote: Travis E. Oliphant wrote: Two packages need to share a chunk of memory (the package authors do not know each other and only have and Python as a common reference). They both want to describe that the memory they are sharing has some underlying binary structure. As a quick sanity check, please tell me where I went off track. it sounds to me like you are assuming that: (1) The memory chunk represents a single object (probably an array of some sort) (2) That subchunks can themselves be described by a (single?) repeating C struct. (3) You can't just use the C header, since you want this at run-time. (4) It would be enough if you could say This is an array of 500 elements that look like struct { int simple; struct nested { char name[30]; char addr[45]; int amount; } Sure. I think that's pretty much it. I assume you mean object in the general sense and not as in (Python object). (5) But is it not acceptable to use Martin's suggested ctypes equivalent of (building out from the inside): Part of the problem is that ctypes uses a lot of different Python types (that's what I mean by multi-object to accomplish it's goal). What I'm looking for is a single Python type that can be passed around and explains binary data. Remember the buffer protocol is in compiled code. So, as a result, 1) It's harder to construct a class to pass through the protocol using the multiple-types approach of ctypes. 2) It's harder to interpret the object recevied through the buffer protocol. Sure, it would be *possible* to use ctypes, but I think it would be very difficult. Think about how you would write the get_data_format C function in the extended buffer protocol for NumPy if you had to import ctypes and then build a class just to describe your data. How would you interpret what you get back? The ctypes format-description approach is not as unified as a single Python type object that I'm proposing. In NumPy, we have a very nice, compact description of complicated data already available. Why not use what we've learned? I don't think we should just *use ctypes because it's there* when the way it describes binary data was not constructed with the extended buffer protocol in mind. The other option, of course, which would not introduce a new Python type is to use the array interface specification and pass a list of tuples. But, I think this is also un-necessarily wasteful because the sending object has to construct it and the receiving object has to de-construct it. The whole point of the (extended) buffer protocol is to communicate this information more quickly. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis E. Oliphant wrote: Greg Ewing wrote: What exactly does bit mean in that context? Do you mean big ? No, you've got a data type there called bit, which seems to imply a size, in contradiction to the size-independent nature of the other types. I'm asking what size-independent information it's meant to convey. Ah. I see what you were saying now. I guess the 'bit' type is different (we actually don't have that type in NumPy so my understanding of it is limited). The 'bit' type re-intprets the size information to be in units of bits and so implies a bit-field instead of another data-format. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Robert Kern schrieb: As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. What do you think is missing that can't be added? I can factually only report what is missing. Whether it can be added, I don't know. As I just wrote in a few other messages: pointers, unions, functions pointers, packed structs, incomplete/recursive types. Also flexible array members (i.e. open-ended arrays). I understand function pointers, pointers, and unions. Function pointers are supported with the void data-type and could be more specifically supported if it were important. People typically don't use the buffer protocol to send function-pointers around in a way that the void description wouldn't be enough. Pointers are also supported with the void data-type. If pointers to other data-types were an important feature to support, then this could be added in many ways (a flag on the data-type object for example is how this is done is NumPy). Unions are actually supported (just define two fields with the same offset). I don't know what you mean by packed structs (unless you are talking about alignment issues in which case there is support for it). I'm not sure I understand what you mean by incomplete / recursive types unless you are referring to something like a node where an element of the structure is a pointer to another structure of the same kind (like used in linked-lists or trees). If that is the case, then it's easily supported once support for pointers is added. I also don't know what you mean by open-ended arrays. The data-format is meant to describe a fixed-size chunk of data. String syntax is not needed to support all of these things. What I'm asking for and proposing is a way to construct an instance of a single Python type that communicates this data-format information in a standardized way. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: Function pointers are supported with the void data-type and could be more specifically supported if it were important. People typically don't use the buffer protocol to send function-pointers around in a way that the void description wouldn't be enough. As I said before, I can't tell whether it's important, as I still don't know what the purpose of this PEP is. If it is to support a unification of memory layout specifications, and if that unifications is also to include ctypes, then yes, it is important. If it is to describe array elements in NumArray arrays, then it might not be important. For the usage of ctypes, the PEP void type is insufficient to describe function pointers: you also need a specification of the signature of the function pointer (parameter types and return type), or else you can't use the function pointer (i.e. you can't call the function). Pointers are also supported with the void data-type. If pointers to other data-types were an important feature to support, then this could be added in many ways (a flag on the data-type object for example is how this is done is NumPy). For ctypes, (I think) you need true pointers to other layouts, or else you couldn't set up the memory correctly. I don't understand how this could work with some extended buffer protocol, though: would a buffer still have to be a contiguous piece of memory? If you have structures with pointers in them, they rarely point to contiguous memory. Unions are actually supported (just define two fields with the same offset). Ah, ok. What's the string syntax for it? I don't know what you mean by packed structs (unless you are talking about alignment issues in which case there is support for it). Yes, this is indeed about alignment; I missed it. What's the string syntax for it? I'm not sure I understand what you mean by incomplete / recursive types unless you are referring to something like a node where an element of the structure is a pointer to another structure of the same kind (like used in linked-lists or trees). If that is the case, then it's easily supported once support for pointers is added. That's what I mean, yes. I'm not sure how it can easily be added, though. Suppose you want to describe struct item{ int key; char* value; struct item *next; }; How would you do that? Something like item = datatype([('key', 'i4'), ('value', 'S*'), ('next', 'what_to_put_here*')] can't work: item hasn't been assigned, yet, so you can't use it as the field type. I also don't know what you mean by open-ended arrays. The data-format is meant to describe a fixed-size chunk of data. I see. In C (and thus in ctypes), you sometimes have what C99 calls flexible array member: struct PyString{ Py_ssize_t ob_refcnt; PyObject *ob_type; Py_ssize_t ob_len; char ob_sval[]; }; where the ob_sval field can extend arbitrarily, as it is the last member of the struct. Of course, this will give you dynamically-sized objects (objects in C cannot really be variable-sized, since the size of a memory block has to be defined at allocation time, and can't really change afterwards). String syntax is not needed to support all of these things. Ok. That's confusing in the PEP: it's not clear whether all these forms are meant to be equivalent, and, if not, which one is the most generic one, and what aspects are missing in what forms. Also, if you have a datatype which cannot be expressed in the string syntax, what is its str attribute? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: Part of the problem is that ctypes uses a lot of different Python types (that's what I mean by multi-object to accomplish it's goal). What I'm looking for is a single Python type that can be passed around and explains binary data. It's not clear that multi-object is a bad thing in and of itself. It makes sense conceptually -- if you have a datatype object representing a struct, and you ask for a description of one of its fields, which could be another struct or array, you would expect to get another datatype object describing that. Can you elaborate on what would be wrong with this? Also, can you clarify whether your objection is to multi-object or multi-type. They're not the same thing -- you could have a data structure built out of multiple objects that are all of the same Python type, with attributes distinguishing between struct, array, etc. That would be single-type but multi-object. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: The 'bit' type re-intprets the size information to be in units of bits and so implies a bit-field instead of another data-format. Hmmm, okay, but now you've got another orthogonality problem, because you can't distinguish between e.g. a 5-bit signed int field and a 5-bit unsigned int field. It might be better not to consider bit to be a type at all, and come up with another way of indicating that the size is in bits. Perhaps 'i4' # 4-byte signed int 'i4b' # 4-bit signed int 'u4' # 4-byte unsigned int 'u4b' # 4-bit unsigned int (Next we can have an argument about whether bit fields should be packed MSB-to-LSB or vice versa...:-) -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: I'm not sure I understand what you mean by incomplete / recursive types unless you are referring to something like a node where an element of the structure is a pointer to another structure of the same kind (like used in linked-lists or trees). Yes, and more complex arrangements of types that reference each other. If that is the case, then it's easily supported once support for pointers is added. But it doesn't fit easily into the single-object model. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Armin Rigo wrote: Hi Travis, On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote: This PEP proposes adapting the data-type objects from NumPy for inclusion in standard Python, to provide a consistent and standard way to discuss the format of binary data. How does this compare with ctypes? Do we really need yet another, incompatible way to describe C-like data structures in the standarde library? There is a lot of subtlety in the details that IMHO clouds the central issue which I will try to clarify here the way I see it. First of all: In order to make sense of the data-format object that I'm proposing you have to see the need to share information about data-format through an extended buffer protocol (which I will be proposing soon). I'm not going to try to argue that right now because there are a lot of people who can do that. So, I'm going to assume that you see the need for it. If you don't, then just suspend concern about that for the moment. There are a lot of us who really see the need for it. Now: To describe data-formats ctypes uses a Python type-object defined for every data-format you might need. In my view this is an 'over-use' of the type-object and in fact, to be useful, requires the definition of a meta-type that carries the relevant additions to the type-object that are needed to describe data (like function pointers to get data in and out of Python objects). My view is that it is un-necessary to use a different type object to describe each different data-type. The route I'm proposing is to define (in C) a *single* new Python type (called a data-format type) that carries the information needed to describe a chunk of memory. In this way *instances* of this new type define data-formats. In ctypes *instances* of the meta-type (i.e. new types) define data-formats (actually I'm not sure if all the new c-types are derived from the same meta-type). So, the big difference is that I think data-formats should be *instances* of a single type. There is no need to define a Python type-object for every single data-type. In fact, not only is there no need, it makes the extended buffer protocol I'm proposing even more difficult to use and explain. Again, my real purpose is the extended buffer protocol. These data-format type is a means to that end. If the consensus is that nobody sees a greater use of the data-format type beyond the buffer protocol, then I will just write 1 PEP for the extended buffer protocol. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis Oliphant wrote: Part of the problem is that ctypes uses a lot of different Python types (that's what I mean by multi-object to accomplish it's goal). What I'm looking for is a single Python type that can be passed around and explains binary data. It's not clear that multi-object is a bad thing in and of itself. It makes sense conceptually -- if you have a datatype object representing a struct, and you ask for a description of one of its fields, which could be another struct or array, you would expect to get another datatype object describing that. Can you elaborate on what would be wrong with this? Also, can you clarify whether your objection is to multi-object or multi-type. They're not the same thing -- you could have a data structure built out of multiple objects that are all of the same Python type, with attributes distinguishing between struct, array, etc. That would be single-type but multi-object. I've tried to clarify this in another post. Basically, what I don't like about the ctypes approach is that it is multi-type (every new data-format is a Python type). In order to talk about all these Python types together, then they must all share some attribute (or else be derived from a meta-type in C with a specific function-pointer entry). I think it is simpler to think of a single Python type whose instances convey information about data-format. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant wrote: Greg Ewing wrote: Travis Oliphant wrote: Part of the problem is that ctypes uses a lot of different Python types (that's what I mean by multi-object to accomplish it's goal). What I'm looking for is a single Python type that can be passed around and explains binary data. It's not clear that multi-object is a bad thing in and of itself. It makes sense conceptually -- if you have a datatype object representing a struct, and you ask for a description of one of its fields, which could be another struct or array, you would expect to get another datatype object describing that. Yes, exactly. This is what the Python type I'm proposing does as well. So, perhaps we are misunderstanding each other. The difference is that data-types are instances of the data-type (data-format) object instead of new Python types (as they are in ctypes). I've tried to clarify this in another post. Basically, what I don't like about the ctypes approach is that it is multi-type (every new data-format is a Python type). I should clarify that I have no opinion about the ctypes approach for what ctypes does with it. I like ctypes and have adapted NumPy to make it easier to work with ctypes. I'm saying that I don't like the idea of forcing this approach on everybody else who wants to describe arbitrary binary data just because ctypes is included. Now, if it is shown that it is indeed better than a simpler instances-of-a-single-type approach that I'm basically proposing then I'll be persuaded. However, the existence of an alternative strategy using a single Python type and multiple instances of that type to describe binary data (which is the NumPy approach and essentially the array module approach) means that we can't just a-priori assume that the way ctypes did it is the only or best way. The examples of missing features that Martin has exposed are not show-stoppers. They can all be easily handled within the context of what is being proposed. I can modify the PEP to show this. But, I don't have the time to spend if it's just all going to be rejected in the end. I need some encouragement in order to continue to invest energy in pushing this forward. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis Oliphant wrote: The 'bit' type re-intprets the size information to be in units of bits and so implies a bit-field instead of another data-format. Hmmm, okay, but now you've got another orthogonality problem, because you can't distinguish between e.g. a 5-bit signed int field and a 5-bit unsigned int field. Good point. It might be better not to consider bit to be a type at all, and come up with another way of indicating that the size is in bits. Perhaps 'i4' # 4-byte signed int 'i4b' # 4-bit signed int 'u4' # 4-byte unsigned int 'u4b' # 4-bit unsigned int I like this. Very nice. I think that's the right way to look at it. (Next we can have an argument about whether bit fields should be packed MSB-to-LSB or vice versa...:-) I guess we need another flag / attribute to indicate that. The other thing that needs to be discussed at some point may be a way to indicate the floating-point format. I've basically punted on this and just meant 'f' to mean platform float Thus, you can't use the data-type object to pass information between two platforms that don't share a common floating point representation. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
M.-A. Lemburg wrote: Travis E. Oliphant wrote: I understand and that's why I'm asking why you made the range explicit in the definition. In the case of NumPy it was so that String and Unicode arrays would both look like multi-length string character arrays and not arrays of arrays of some character. But, this can change in the data-format object. I can see that the Unicode description needs to be improved. The definition should talk about Unicode code points. The number of bytes then determines whether you can only represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only) or UCS4 (4 bytes, all currently assigned code points). Yes, you are correct. A string of unicode characters should really be represented in the same way that an array of integers is represented for a data-format object. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis Oliphant schrieb: So, the big difference is that I think data-formats should be *instances* of a single type. This is nearly the case for ctypes as well. All layout descriptions are instances of the type type. Nearly, because they are instances of subtypes of the type type: py type(ctypes.c_long) type '_ctypes.SimpleType' py type(ctypes.c_double) type '_ctypes.SimpleType' py type(ctypes.c_double).__bases__ (type 'type',) py type(ctypes.Structure) type '_ctypes.StructType' py type(ctypes.Array) type '_ctypes.ArrayType' py type(ctypes.Structure).__bases__ (type 'type',) py type(ctypes.Array).__bases__ (type 'type',) So if your requirement is all layout descriptions ought to have the same type, then this is (nearly) the case: they are instances of type (rather then datatype, as in your PEP). Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Nick Coghlan wrote: I'd say the answer to where we put it will be dependent on what happens to the idea of adding a NumArray style fixed dimension array type to the standard library. If that gets exposed through the array module as array.dimarray, then it would make sense to expose the associated data layout descriptors as array.datatype. Seem to me that arrays are a sub-concept of binary data, not the other way around. So maybe both arrays and data types should be in a module called 'binary' or some such. Yes, very good point. That's probably one reason I'm proposing the data-type first before the array interface in the extended buffer protocol. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis E. Oliphant wrote: The 'kind' does not specify how big the data-type (data-format) is. What exactly does bit mean in that context? Do you mean big ? It's how many bytes the kind is using. So, 'u4' is a 4-byte unsigned integer and 'u2' is a 2-byte unsigned integer. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Nick Coghlan wrote: Greg Ewing wrote: Also, what if I want to refer to fields by name but don't want to have to work out all the offsets Use the list definition form. With the changes I've suggested above, you wouldn't even have to name the fields you don't care about - just describe them. That would be okay. I still don't see a strong justification for having a one-big-string form as well as a list/tuple/dict form, though. Compaction of representation is all. It's used quite a bit in numarray, which is where most of the 'kind' names came from as well. When you don't want to name fields it is a really nice feature (but it doesn't nest well). -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: What is needed is a definitive way to describe data and then have array struct ctypes all be compatible with that same method.That's why I'm proposing the PEP. It's a unification effort not yet-another-method. As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis E. Oliphant wrote: How to handle unicode data-formats could definitely be improved. Suggestions are welcome. 'U4*10' string of 10 4-byte Unicode chars I like that. Thanks. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: What is needed is a definitive way to describe data and then have array struct ctypes all be compatible with that same method.That's why I'm proposing the PEP. It's a unification effort not yet-another-method. As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. What do you think is missing that can't be added? -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: How to handle unicode data-formats could definitely be improved. As before, I'm doubtful what the actual needs are. For example, is it desired to support generation of ID3v2 tags with such a data format? The tag is specified here: http://www.id3.org/id3v2.4.0-structure.txt In ID3v1, text fields have a specified width, and are supposed to be encoded in Latin-1, and padded with zero bytes. In ID3v2, text fields start with an encoding declaration (say, \x03 for UTF-8), then followed with a null-terminated sequence of UTF-8 bytes. Is it the intent of this PEP to support such data structures, and allow the user to fill in a Unicode object, and then the processing is automatic? (i.e. in ID3v1, the string gets automatically Latin-1-encoded and zero-padded, in ID3v2, it gets automatically UTF-8 encoded, and null-terminated) If that is not to be supported, what are the use cases? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: What is needed is a definitive way to describe data and then have array struct ctypes all be compatible with that same method.That's why I'm proposing the PEP. It's a unification effort not yet-another-method. As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. Please clarify what you mean. Are you saying that a single object can't carry all the information about binary data that ctypes allows with it's multi-object approach? I don't agree with you, if that is the case. Sure, perhaps I've not included certain cases, so give an example. Besides, I don't think this is the right view of unification. I'm not saying that ctypes should get rid of it's many objects used for interfacing with C-functions. I'm saying we should introduce a single-object mechanism for describing binary data so that the many-object approach of c-types does not become some kind of de-facto standard. C-types can translate this object-instance to its internals if and when it needs to. In the mean-time, how are other packages supposed to communicate binary information about data with each other? Remember the context that the data-format object is presented in. Two packages need to share a chunk of memory (the package authors do not know each other and only have and Python as a common reference). They both want to describe that the memory they are sharing has some underlying binary structure. How do they do that? Please explain to me how the buffer protocol can be extended so that information about what is in the memory can be shared without a data-format object? -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: How to handle unicode data-formats could definitely be improved. As before, I'm doubtful what the actual needs are. For example, is it desired to support generation of ID3v2 tags with such a data format? The tag is specified here: Perhaps I was not clear enough about what I'm try to do. For a long time a lot of people have wanted something like Numeric in Python itself. There have been many hurdles to that goal. After discussions at SciPy 2006 with Guido, we decided that the best way to proceed at this point was to extend the buffer protocol to allow packages to share array-like information with each-other. There are several things missing from the buffer protocol that NumPy needs in order to be able to really understand the (fixed-size) memory another package has allocated and is sharing. The most important of these is 1) Shape information 2) Striding information 3) Data-format information (how is each element perceived). Shape and striding information can be shared with a C-array of integers. How is data-format information supposed to be shared? We've come up with a very flexible way to do this in NumPy using a single Python object. This Python object supports describing the layout of any fixed-size chunk of memory (right now in units of bytes --- bit fields could be added, though). I'm proposing to add this object to Python so that the buffer protcol has a fast and efficient way to share #3. That's really all I'm after. It also bothers me that so many ways to describe binary data are being used out there. This is a problem that deserves being solved. And, no, ctypes hasn't solved it (we can't directly use the ctypes solution). Perhaps this PEP doesn't hit all the corners, but a data-format object *is* a useful thing to consider. The array object in Python already has a PyArray_Descr * structure that is a watered-down version of what I'm talking about. In fact, this is what Numeric built from (or vice-versa actually). And NumPy has greatly enhanced this object for any conceivable structure. Guido seemed to think the data-type objects were nice when he saw them at SciPy 2006, and so I'm presenting a PEP. Without the data-format object, I'm don't know how to extend the buffer protocol to communicate data-format information. Do you have a better idea? I have no trouble limiting the data-type object to the buffer protocol extension PEP, but I do think it could gain wider use. Is it the intent of this PEP to support such data structures, and allow the user to fill in a Unicode object, and then the processing is automatic? (i.e. in ID3v1, the string gets automatically Latin-1-encoded and zero-padded, in ID3v2, it gets automatically UTF-8 encoded, and null-terminated) No, the point of the data-format object is to communicate information about data-formats not to encode or decode anything. Users of the data-format object could decide what they wanted to do with that information. We just need a standard way to communicate it through the buffer protocol. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: I'm proposing to add this object to Python so that the buffer protcol has a fast and efficient way to share #3. That's really all I'm after. I admit that I don't understand this objective. Why is it desirable to support such an extended buffer protocol? What specific application would be made possible if it was available and implemented in the relevant modules and data types? What are the relevant modules and data types that should implement it? It also bothers me that so many ways to describe binary data are being used out there. This is a problem that deserves being solved. And, no, ctypes hasn't solved it (we can't directly use the ctypes solution). Perhaps this PEP doesn't hit all the corners, but a data-format object *is* a useful thing to consider. IMO, it is only useful if it realistically can support all the use cases that it intends to support. If this PEP is about defining the elements of arrays, I doubt it can realistically support everything you can express in ctypes. There is no support for pointers (except for PyObject*), no support for incomplete (recursive) types, no support for function pointers, etc. Vice versa: why exactly can't you use the data type system of ctypes? If I want to say int[10], I do py ctypes.c_long * 10 class '__main__.c_long_Array_10' To rewrite the examples from the PEP: datatype(float) = ctypes.c_double datatype(int) = ctypes.c_long datatype((int, 5)) = ctypes.c_long * 5 datatype((float, (3,2)) = (ctypes.c_double * 3) * 2 struct { int simple; struct nested { char name[30]; char addr[45]; int amount; } = py from ctypes import * py class nested(Structure): ... _fields_ = [(name, c_char*30), (addr, c_char*45), (amount, c_long)] ... py class struct(Structure): ... _fields_ = [(simple, c_int), (nested, nested)] ... Guido seemed to think the data-type objects were nice when he saw them at SciPy 2006, and so I'm presenting a PEP. I have no objection to including NumArray as-is into Python. I just wonder were the rationale for this PEP comes from, i.e. why do you need to exchange this information across different modules? Without the data-format object, I'm don't know how to extend the buffer protocol to communicate data-format information. Do you have a better idea? See above: I can't understand where the need for an extended buffer protocol comes from. I can see why NumArray needs reflection, and needs to keep information to interpret the bytes in the array. But why is it important that the same information is exposed by other data types? Is it the intent of this PEP to support such data structures, and allow the user to fill in a Unicode object, and then the processing is automatic? (i.e. in ID3v1, the string gets automatically Latin-1-encoded and zero-padded, in ID3v2, it gets automatically UTF-8 encoded, and null-terminated) No, the point of the data-format object is to communicate information about data-formats not to encode or decode anything. Users of the data-format object could decide what they wanted to do with that information. We just need a standard way to communicate it through the buffer protocol. This was actually a different sub-thread: why do you need to support the 'U' code (or the 'S' code, for that matter)? In what application do you have fixed size Unicode arrays, as opposed to Unicode strings? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. Please clarify what you mean. Are you saying that a single object can't carry all the information about binary data that ctypes allows with it's multi-object approach? I'm not sure what you mean by single object. If I use the tuple syntax, e.g. datatype((float, (3,2)) There are also multiple objects (the float, the 3, and the 2). You get a single root object back, but so do you in ctypes. But this isn't really what I meant. Instead, I think the PEP lacks various concepts from C data types, such as pointers, unions, function pointers, alignment/packing. In the mean-time, how are other packages supposed to communicate binary information about data with each other? This is my other question. Why should they? Remember the context that the data-format object is presented in. Two packages need to share a chunk of memory (the package authors do not know each other and only have and Python as a common reference). They both want to describe that the memory they are sharing has some underlying binary structure. Can you please give an example of such two packages, and an application that needs them share data? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Robert Kern schrieb: As I unification mechanism, I think it is insufficient. I doubt it can express all the concepts that ctypes supports. What do you think is missing that can't be added? I can factually only report what is missing. Whether it can be added, I don't know. As I just wrote in a few other messages: pointers, unions, functions pointers, packed structs, incomplete/recursive types. Also flexible array members (i.e. open-ended arrays). While it may be possible to come up with a string syntax to describe all these things (*), I wonder whether it should be done, and whether NumArray can then support this extended data model. Regards, Martin (*) perhaps with the exception of incomplete types: C needs forward references in its own syntax. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: It also bothers me that so many ways to describe binary data are being used out there. This is a problem that deserves being solved. Is there a survey paper somewhere about binary formats? What formats are used in particle physics, bio-informatics, astronomy, etc? What software is used to read and write binary data? What descriptive languages are used for data (SQL, XML, etc)? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
I have watched numpy with interest for a long time. My own interest is to possibly use the c-api to wrap c++ algorithms to use from python. One thing that has concerned me, and continues to concern me with this proposal, is that it seems to suffer from a very fat interface. I certainly have not studied the options in any depth, but my gut feeling is that the interface is too fat and too complex. I wonder if it's possible to avoid this. I wonder if this is an example of all the methods sinking to the base class. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
On 10/29/06, Martin v. Löwis [EMAIL PROTECTED] wrote: Travis E. Oliphant schrieb: Remember the context that the data-format object is presented in. Two packages need to share a chunk of memory (the package authors do not know each other and only have and Python as a common reference). They both want to describe that the memory they are sharing has some underlying binary structure. Can you please give an example of such two packages, and an application that needs them share data? Here's an example. PIL handles images (in various formats) in memory, as blocks of binary image data. NumPy provides methods for manipulating in-memory blocks of data. Now, if I want to use NumPy to manipulate that data in place (for example, to cap the red component at 128, and equalise the range of the green component) my code needs to know the format of the memory block that PIL exposes. I am assuming that in-place manipulation is better, because there is no need for repeated copies of the data to be made (this would be true for large images). If PIL could expose a descriptor for its data structure, NumPy code could manipulate it in place without fear of corrupting it. Of course, this can be done by the end user reading the PIL documentation and transcribing the documented format into the NumPy code. But I would argue that it's better if the PIL block is self-describing in a way that avoids the need for a manual transcription of the format. To do this *without* needing the PIL and NumPy developers to co-operate needs an independent standard, which is what I assume this PEP is intended to provide. Paul. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore [EMAIL PROTECTED] wrote: On 10/29/06, Martin v. Löwis [EMAIL PROTECTED] wrote: Travis E. Oliphant schrieb: Remember the context that the data-format object is presented in. Two packages need to share a chunk of memory (the package authors do not know each other and only have and Python as a common reference). They both want to describe that the memory they are sharing has some underlying binary structure. Can you please give an example of such two packages, and an application that needs them share data? To do this *without* needing the PIL and NumPy developers to co-operate needs an independent standard, which is what I assume this PEP is intended to provide. One could also toss wxPython, VTK, or any one of the other GUI libraries into the mix for visualizing those images, of which wxPython just acquired no-copy display of PIL images, and being able to manipulate them with numpy (of which some wxPython built in classes use numpy to speed up manipulation) would be very useful. Of all of the intended uses, I'd say that zero-copy sharing of information on the graphics/visualization front is the most immediate 'people will be using it tomorrow' feature. I personally don't have my pulse on the Scientific Python community, so I don't know about other uses, but in regards to Martin's list of missing features: pointers, unions, function pointers, alignment/packing [, etc.] I'm going to go out on a limb and say for the majority of those YAGNI, or really, NOHAFIAFACT (no one has asked for it, as far as I can tell). Someone who knows the scipy community, feel free to correct me. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Paul Moore schrieb: Here's an example. PIL handles images (in various formats) in memory, as blocks of binary image data. NumPy provides methods for manipulating in-memory blocks of data. Now, if I want to use NumPy to manipulate that data in place (for example, to cap the red component at 128, and equalise the range of the green component) my code needs to know the format of the memory block that PIL exposes. I am assuming that in-place manipulation is better, because there is no need for repeated copies of the data to be made (this would be true for large images). Thanks, that looks like a good example. Is it possible to elaborate that? E.g. what specific image format would I use (could that work for jpeg, even though this format has compression in it), and what specific NumPy routines would I use to implement the capping and equalising? What would the datatype description look like that those tools need to exchange? Looking at this in more detail, PIL in-memory images (ImagingCore objects) either have the image8 UINT8**, or the image32 INT32**; they have separate fields for pixelsize and linesize. In the image8 case, there are three options: - each value is an 8-bit integer (IMAGING_TYPE_UINT8) (1) - each value is a 16-bit integer, either little (2) or big endian (3) (IMAGING_TYPE_SPECIAL, mode either I;16 or I;16B) In the image32 case, there are five options: - two 8-bit values per four bytes, namely byte 0 and byte 3 (4) - three 8-bit values (bytes 0, 1, 2) (5) - four 8-bit values (6) - a single 32-bit int (7) - a single 32-bit float (8) Now, what would be the algorithm in NumPy that I could use to implement capping and equalising? If PIL could expose a descriptor for its data structure, NumPy code could manipulate it in place without fear of corrupting it. Of course, this can be done by the end user reading the PIL documentation and transcribing the documented format into the NumPy code. But I would argue that it's better if the PIL block is self-describing in a way that avoids the need for a manual transcription of the format. Without digging further, I think some of the formats simply don't allow for the kind of manipulation you suggest, namely all palette formats (which are the single-valued ones, plus the two-band version with a palette number and an alpha value), and greyscale images. So in any case, the application has to look at the mode of the image to find out whether the operation is even meaningful. And then, the application has to tell NumPy somehow what fields to operate on. To do this *without* needing the PIL and NumPy developers to co-operate needs an independent standard, which is what I assume this PEP is intended to provide. Ok, I now understand the goal, although I still like to understand this usecase better. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Josiah Carlson schrieb: One could also toss wxPython, VTK, or any one of the other GUI libraries into the mix for visualizing those images, of which wxPython just acquired no-copy display of PIL images, and being able to manipulate them with numpy (of which some wxPython built in classes use numpy to speed up manipulation) would be very useful. I'm doubtful that this PEP alone would allow zero-copy sharing of images for display. Often, the libraries need the data in a different format. So they need to copy, even if they could understand the other format. However, the PEP won't allow understanding the format. If I know I have an array of 4-byte values: which of them is R, G, B, and A? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis [EMAIL PROTECTED] wrote: Josiah Carlson schrieb: One could also toss wxPython, VTK, or any one of the other GUI libraries into the mix for visualizing those images, of which wxPython just acquired no-copy display of PIL images, and being able to manipulate them with numpy (of which some wxPython built in classes use numpy to speed up manipulation) would be very useful. I'm doubtful that this PEP alone would allow zero-copy sharing of images for display. Often, the libraries need the data in a different format. So they need to copy, even if they could understand the other format. However, the PEP won't allow understanding the format. If I know I have an array of 4-byte values: which of them is R, G, B, and A? ...in the cases I have seen, which includes BMP, TGA, uncompressed TIFF, a handful of platform-specific bitmap formats, etc., you _always_ get them in RGBA order. If the alpha channel is to be left out, then you get them as RGB. The trick with allowing zero-copy sharing is 1) to understand the format, and 2) to manipulate/display in-place. The former is necessary for the latter, which is what Travis is shooting for. Also, because wxPython has figured out how PIL images are structured, they can do #2, and so far no one has mentioned any examples where the standard RGB/RGBA format hasn't worked for them. In the case of jpegs (as you mentioned in another message), PIL uncompresses all images it understands into some kind of 'natural' format (from what I understand). For 24/32 bit images, that is RGB or RGBA. For palletized images (gif, 8-bit png, 8-bit bmp, etc.) maybe it is a palletized format, or maybe it is RGB/RGBA? I don't know, all of my images are 24/32 bit, but I can just about guarantee it's not an issue for the case that Paul mentioned. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: Greg Ewing wrote: What exactly does bit mean in that context? Do you mean big ? No, you've got a data type there called bit, which seems to imply a size, in contradiction to the size-independent nature of the other types. I'm asking what size-independent information it's meant to convey. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: Martin v. Löwis wrote: Travis E. Oliphant schrieb: Is it the intent of this PEP to support such data structures, and allow the user to fill in a Unicode object, and then the processing is automatic? No, the point of the data-format object is to communicate information about data-formats not to encode or decode anything. Well, there's still the issue of how much detail you want to be able to convey, so I think the question is valid. Is the encoding of a Unicode string something we want to be able to communicate via this mechanism, or is that outside its scope? -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Josiah Carlson wrote: ...in the cases I have seen ... you _always_ get them in RGBA order. Except when you don't. I've had cases where I've had to convert between RGBA and BGRA (for stuffing directly into a frame buffer on Linux, as far as I remember). So it may be worth including some features in the standard for describing pixel formats. Pygame seems to have a very detailed and flexible system for doing this, so it might be a good idea to have a look at that. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: The datatype is an object that specifies how a certain block of memory should be interpreted as a basic data-type. datatype(float) datatype('float64') I can't speak on the specific merits of this proposal, or whether this kind of functionality is desirable. However, I'm -1 on the addition of a builtin for this functionality (the PEP doesn't actually say that there is another builtin, but the examples suggest so). I was intentionally vague. I don't see a need for it to be a built-in, but didn't know where exactly to put it, I should have made it a question for discussion. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Greg Ewing wrote: Travis E. Oliphant wrote: PEP: unassigned Title: Adding data-type objects to the standard library Not sure about having 3 different ways to specify the structure -- it smacks of Too Many Ways To Do It to me. You might be right, but they all have use-cases. I've actually removed most of the multiple ways that NumPy allows for creating data-types. Also, what if I want to refer to fields by name but don't want to have to work out all the offsets I don't know what you mean. You just use the list-style to define a data-format with fields. The offsets are worked out for you. The only use for offsets was the dictionary form. The dictionary form stems from a desire to use the fields dictionary of a data-type as a data-type specification (which it is essentially is). -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: PEP: unassigned Title: Adding data-type objects to the standard library Attributes kind -- returns the basic kind of the data-type. The basic kinds are: 't' - bit, 'b' - bool, 'i' - signed integer, 'u' - unsigned integer, 'f' - floating point, 'c' - complex floating point, 'S' - string (fixed-length sequence of char), 'U' - fixed length sequence of UCS4, Shouldn't this read fixed length sequence of Unicode ?! The underlying code unit format (UCS2 and UCS4) depends on the Python version. 'O' - pointer to PyObject, 'V' - Void (anything else). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 28 2006) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Hi Travis, On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote: This PEP proposes adapting the data-type objects from NumPy for inclusion in standard Python, to provide a consistent and standard way to discuss the format of binary data. How does this compare with ctypes? Do we really need yet another, incompatible way to describe C-like data structures in the standard library? A bientot, Armin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
M.-A. Lemburg [EMAIL PROTECTED] wrote: Travis E. Oliphant wrote: M.-A. Lemburg wrote: Travis E. Oliphant wrote: PEP: unassigned Title: Adding data-type objects to the standard library Attributes kind -- returns the basic kind of the data-type. The basic kinds are: 't' - bit, 'b' - bool, 'i' - signed integer, 'u' - unsigned integer, 'f' - floating point, 'c' - complex floating point, 'S' - string (fixed-length sequence of char), 'U' - fixed length sequence of UCS4, Shouldn't this read fixed length sequence of Unicode ?! The underlying code unit format (UCS2 and UCS4) depends on the Python version. Well, in NumPy 'U' always means UCS4. So, I just copied that over. See my questions at the bottom which talk about how to handle this. A data-format does not necessarily have to correspond to something Python represents with an Object. Ok, but why are you being specific about UCS4 (which is an internal storage format), while you are not specific about e.g. the internal bit size of the integers (which could be 32 or 64 bit) ? I think that even on 64 bit platforms, using 'int' or 'long' generally means 32 bit. In order to get 64 bit ints, one needs to use 'long long'. Sharing some of the codes with the struct module, though arbitrary, doesn't seem like a bad idea to me. Of course offering specifically 32 and 64 bit ints would make sense to me. - Josiah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
M.-A. Lemburg wrote: Travis E. Oliphant wrote: M.-A. Lemburg wrote: Travis E. Oliphant wrote: PEP: unassigned Title: Adding data-type objects to the standard library Attributes kind -- returns the basic kind of the data-type. The basic kinds are: 't' - bit, 'b' - bool, 'i' - signed integer, 'u' - unsigned integer, 'f' - floating point, 'c' - complex floating point, 'S' - string (fixed-length sequence of char), 'U' - fixed length sequence of UCS4, Shouldn't this read fixed length sequence of Unicode ?! The underlying code unit format (UCS2 and UCS4) depends on the Python version. Well, in NumPy 'U' always means UCS4. So, I just copied that over. See my questions at the bottom which talk about how to handle this. A data-format does not necessarily have to correspond to something Python represents with an Object. Ok, but why are you being specific about UCS4 (which is an internal storage format), while you are not specific about e.g. the internal bit size of the integers (which could be 32 or 64 bit) ? The 'kind' does not specify how big the data-type (data-format) is. A number is needed to represent the number of bytes. In this case, the 'kind' does not specify how large the data-type is. You can have 'u1', 'u2', 'u4', etc. The same is true with Unicode. You can have 10-character unicode elements, 20-character, etc. But, we have to be clear about what a character is in the data-format. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Armin Rigo wrote: Hi Travis, On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote: This PEP proposes adapting the data-type objects from NumPy for inclusion in standard Python, to provide a consistent and standard way to discuss the format of binary data. How does this compare with ctypes? Do we really need yet another, incompatible way to describe C-like data structures in the standard library? Part of what the data-type, data-format object is trying to do is bring together all the disparate ways to represent data that *already* exists in the standard library. What is needed is a definitive way to describe data and then have array struct ctypes all be compatible with that same method.That's why I'm proposing the PEP. It's a unification effort not yet-another-method. One of the big reasons for it is to move something like the array interface into Python. There are tens to hundreds of people mostly in the scientific computing community that want to see Python grow more support for NumPy-like things. I keep getting requests to do something to make Python more aware of arrays. This PEP is part of that effort. In particular, something like the array interface should be available in Python. The easiest way to do this is to extend the buffer protocol to allow objects to share information about shape, strides, and data-format of a block of memory. But, how do you represent data-format in Python? What will the objects pass back and forth to each other to do it? C-types has a solution which creates multiple objects to do it. This is an un-wieldy over-complicated solution for the array interface. The array objects have a solution using the a single object that carries the data-format information. The solution we have for arrays deserves consideration. It could be placed inside the array module if desired, but again, I'm really looking for something that would allow the extend buffer protocol (to be proposed soon) to share data-type information. That could be done with the array-interface objects (strings, lists, and tuples), but then every body who uses the interface will have to write their own decoders to process the data-format information. I actually think ctypes would benefit from this data-format specification too. Recognizing all these diverging ways to essentially talk about the same thing is part of what prompted this PEP. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: In this case, the 'kind' does not specify how large the data-type is. You can have 'u1', 'u2', 'u4', etc. The same is true with Unicode. You can have 10-character unicode elements, 20-character, etc. But, we have to be clear about what a character is in the data-format. That is certainly confusing. In u1, u2, u4, the digit seems to indicate the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet, in U20, it does *not* indicate the size of a single value but of an array? And then, it's not the size, but the number of elements? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: M.-A. Lemburg wrote: Travis E. Oliphant wrote: M.-A. Lemburg wrote: Travis E. Oliphant wrote: PEP: unassigned Title: Adding data-type objects to the standard library Attributes kind -- returns the basic kind of the data-type. The basic kinds are: 't' - bit, 'b' - bool, 'i' - signed integer, 'u' - unsigned integer, 'f' - floating point, 'c' - complex floating point, 'S' - string (fixed-length sequence of char), 'U' - fixed length sequence of UCS4, Shouldn't this read fixed length sequence of Unicode ?! The underlying code unit format (UCS2 and UCS4) depends on the Python version. Well, in NumPy 'U' always means UCS4. So, I just copied that over. See my questions at the bottom which talk about how to handle this. A data-format does not necessarily have to correspond to something Python represents with an Object. Ok, but why are you being specific about UCS4 (which is an internal storage format), while you are not specific about e.g. the internal bit size of the integers (which could be 32 or 64 bit) ? The 'kind' does not specify how big the data-type (data-format) is. A number is needed to represent the number of bytes. In this case, the 'kind' does not specify how large the data-type is. You can have 'u1', 'u2', 'u4', etc. The same is true with Unicode. You can have 10-character unicode elements, 20-character, etc. But, we have to be clear about what a character is in the data-format. I understand and that's why I'm asking why you made the range explicit in the definition. The definition should talk about Unicode code points. The number of bytes then determines whether you can only represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only) or UCS4 (4 bytes, all currently assigned code points). This is similar to the range for integers (ie. ZZ_0), where the number of bytes determines the range of numbers that can be represented. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 28 2006) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Martin v. Löwis wrote: Travis E. Oliphant schrieb: In this case, the 'kind' does not specify how large the data-type is. You can have 'u1', 'u2', 'u4', etc. The same is true with Unicode. You can have 10-character unicode elements, 20-character, etc. But, we have to be clear about what a character is in the data-format. That is certainly confusing. In u1, u2, u4, the digit seems to indicate the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet, in U20, it does *not* indicate the size of a single value but of an array? And then, it's not the size, but the number of elements? Good point. In NumPy, unicode support was added in parallel with string arrays where there is not the ambiguity. So, yes, it's true that the unicode case is a special-case. The other way to handle it would be to describe the 'code'-point size (i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then have the length be encoded as an array of those types. This was not the direction we took with NumPy (which is what I'm using as a reference) because I wanted Unicode and string arrays to look the same and thought of strings differently. How to handle unicode data-formats could definitely be improved. Suggestions are welcome. -Travis ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Nick Coghlan wrote: Greg Ewing wrote: Also, what if I want to refer to fields by name but don't want to have to work out all the offsets Use the list definition form. With the changes I've suggested above, you wouldn't even have to name the fields you don't care about - just describe them. That would be okay. I still don't see a strong justification for having a one-big-string form as well as a list/tuple/dict form, though. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Nick Coghlan wrote: I'd say the answer to where we put it will be dependent on what happens to the idea of adding a NumArray style fixed dimension array type to the standard library. If that gets exposed through the array module as array.dimarray, then it would make sense to expose the associated data layout descriptors as array.datatype. Seem to me that arrays are a sub-concept of binary data, not the other way around. So maybe both arrays and data types should be in a module called 'binary' or some such. -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: The 'kind' does not specify how big the data-type (data-format) is. What exactly does bit mean in that context? -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant wrote: How to handle unicode data-formats could definitely be improved. Suggestions are welcome. 'U4*10' string of 10 4-byte Unicode chars Then for consistency you'd want 'S*10' rather than just 'S10' (or at least allow it as an alternative). -- Greg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP: Adding data-type objects to Python
PEP: unassigned Title: Adding data-type objects to the standard library Version: $Revision: $ Last-Modified: $Date: $ Author: Travis Oliphant [EMAIL PROTECTED] Status: Draft Type: Standards Track Created: 05-Sep-2006 Python-Version: 2.6 Abstract This PEP proposes adapting the data-type objects from NumPy for inclusion in standard Python, to provide a consistent and standard way to discuss the format of binary data. Rationale There are many situations crossing multiple areas where an interpretation is needed of binary data in terms of fundamental data-types such as integers, floating-point, and complex floating-point values. Having a common object that carries information about binary data would be beneficial to many people. The creation of data-type objects in NumPy to carry the load of describing what each element of the array contains represents an evolution of a solution that began with the PyArray_Descr structure in Python's own array object. These data-type objects can represent arbitrary byte data. Currently such information is usually constructed using strings and character codes which is unwieldy when a data-type consists of nested structures. Proposal Add a PyDatatypeObject in Python (adapted from NumPy's dtype object which evolved from the PyArray_Descr structure in Python's array module) that holds information about a data-type. This object will allow packages to exchange information about binary data in a uniform way (see the extended buffer protocol PEP for an application to exchanging information about array data). Specification The datatype is an object that specifies how a certain block of memory should be interpreted as a basic data-type. In addition to being able to describe basic data-types, the data-type object can describe a data-type that is itself an array of other data-types as well as a data-type that contains arbitrary fields (structure members) which are located at specific offsets. In its most basic form, however, a data-type is of a particular kind (bit, bool, int, uint, float, complex, object, string, unicode, void) and size. Datatype objects can be created using either a type-object, a string, a tuple, a list, or a dictionary according to the following constructors: Type-object: For a select set of type-objects a data-type object describing that basic type can be described: Examples: datatype(float) datatype('float64') datatype(int) datatype('int32') # on 32-bit platform (64 if c-long is 64-bits) Tuple-object A tuple of length 2 can be used to specify a data-type that is an array of another kind of basic data-type (this array always describes a C-contiguous array). Examples: datatype((int, 5)) datatype(('int32', (5,))) # describes a 5*4=20-byte block of memory laid out as # a[0], a[1], a[2], a[3], a[4] datatype((float, (3,2)) datatype(('float64', (3,2)) # describes a 3*2*8=48 byte block of memory that should be # interpreted as 6 doubles laid out as arr[0,0], arr[0,1], # ... a[2,0], a[1,2] String-object: The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize) kind : one of the basic array kinds given below. itemsize : the nubmer of bytes (or bits for 't' kind) for this data-type. endian : either '', '=' (native), '|' (doesn't matter), '' (big-endian) or '' (little-endian). shape: either '', or a shape-tuple describing a data-type that is an array of the given shape. A string can also be a comma-separated sequence of basic formats. The result will be a data-type with default field names: 'f0', 'f1', ..., 'fn'. Examples: datatype('u4') datatype('uint32') datatype('f4') datatype('float32') datatype('(3,2)f4') datatype(('float32', (3,2)) datatype('(5,)i4, (3,2)f4, S5') datatype([('f0', 'i4', (5,)), ('f1', 'f4', (3, 2)), ('f2', '|S5')]) List-object: A list should be a list of tuples where each tuple describes a field. Each tuple should contain (name, datatype{, shape}) or ((meta-info, name), datatype{, shape}) in order to specify the data-type. This list must fully specify the data-type (no memory holes). If would would like to return a data-type with memory holes where the compiler would place them, then pass the keyword align=1 to this construction. This will result in un-named fields of Void kind of the correct size interspersed where needed. Examples: datatype([( ([1,2],'coords'), 'f4', (3,6)), ('address', 'S30')]) A data-type that could represent the
Re: [Python-Dev] PEP: Adding data-type objects to Python
Travis E. Oliphant schrieb: The datatype is an object that specifies how a certain block of memory should be interpreted as a basic data-type. datatype(float) datatype('float64') I can't speak on the specific merits of this proposal, or whether this kind of functionality is desirable. However, I'm -1 on the addition of a builtin for this functionality (the PEP doesn't actually say that there is another builtin, but the examples suggest so). Instead, putting it into the sys, array, struct, or ctypes modules might be more appropriate, as might be the introduction of another module. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com