Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-03 Thread Scott Dial
Travis Oliphant wrote:
 Paul Moore wrote:
 Enough of the abstract. As a concrete example, suppose I have a (byte)
 string in my program containing some binary data - an ID3 header, or a
 TCP packet, or whatever. It doesn't really matter. Does your proposal
 offer anything to me in how I might manipulate that data (assuming I'm
 not using NumPy)? (I'm not insisting that it should, I'm just trying
 to understand the scope of the PEP).

 
 What do you mean by "manipulate the data"?  The proposal for a 
 data-format object would help you describe that data in a standard way 
 and therefore share that data among several libraries that would be able 
 to understand the data (because they all use and/or understand the 
 default Python way to handle data-formats).
 

Perhaps the most relevant thing to pull from this conversation is back 
to what Martin has asked about before: flexible array members. A TCP 
packet has no defined length (there isn't even a header field in the 
packet for this, so in fairness we can talk about IP packets which do). 
There is no way for me to describe this with the pre-PEP data-formats.
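
To make the variable-length point concrete, here is a minimal sketch using the standard struct module. A fixed-layout description covers only the 20-byte base IPv4 header; the payload size has to be read from a field inside that header at run time. The helper name split_ipv4 and the sample packet are made up for illustration.

```python
import struct

# Fixed part of an IPv4 header (no options): 20 bytes, network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")

def split_ipv4(packet):
    """Return (header_fields, payload) for a raw IPv4 packet."""
    fields = IPV4_HEADER.unpack_from(packet, 0)
    version_ihl, total_length = fields[0], fields[2]
    header_len = (version_ihl & 0x0F) * 4   # IHL counts 32-bit words
    # The payload size is only known after reading total_length at run time.
    return fields, packet[header_len:total_length]

# Hypothetical packet: 20-byte header followed by a 5-byte payload.
header = IPV4_HEADER.pack(0x45, 0, 25, 0, 0, 64, 6, 0,
                          bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
fields, payload = split_ipv4(header + b"hello")
```

The format string alone describes the header; nothing in it can say "the rest is total_length minus header_len bytes", which is the gap being pointed at here.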

I feel like it is misleading of you to say it's up to the package to do 
manipulations, because you glossed over the fact that you can't even 
describe this type of data. ISTM, that you're only interested in 
describing repetitious fixed-structure arrays. If we are going to have a 
default Python way to handle data-formats, then don't you feel like 
this falls short of the mark?

I fear that you speak about this in too grandiose terms and are now 
trapped by people asking, well, can I do this? I think for a lot of 
folks the answer is: nope. With respect to the network packets, this 
PEP doesn't do anything to fix the communication barrier. Is this not in 
the scope of a consistent and standard way to discuss the format of 
binary data (which is what your PEP's abstract sets out as the task)?

-- 
Scott Dial
[EMAIL PROTECTED]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-03 Thread Travis Oliphant


 Perhaps the most relevant thing to pull from this conversation is back 
 to what Martin has asked about before: flexible array members. A TCP 
 packet has no defined length (there isn't even a header field in the 
 packet for this, so in fairness we can talk about IP packets which 
 do). There is no way for me to describe this with the pre-PEP 
 data-formats.

 I feel like it is misleading of you to say it's up to the package to 
 do manipulations, because you glossed over the fact that you can't 
 even describe this type of data. ISTM, that you're only interested in 
 describing repetitious fixed-structure arrays. 
Yes, that's right.  I'm only interested in describing binary data with a 
fixed length.  Others can help push it farther than that (if they even 
care).

 If we are going to have a default Python way to handle data-formats, 
 then don't you feel like this falls short of the mark?
Not for me.   We can fix what needs fixing, but not if we can't get out 
of the gate.

 I fear that you speak about this in too grandiose terms and are now 
 trapped by people asking, well, can I do this? I think for a lot of 
 folks the answer is: nope. With respect to the network packets, this 
 PEP doesn't do anything to fix the communication barrier.

Yes it could if you were interested in pushing it there.   No, I didn't 
solve that particular problem with the PEP (because I can only solve the 
problems I'm aware of), but I do think the problem could be solved.   We 
have far too many nay-sayers on this list, I think.

Right now, I don't have time to push this further.  My real interest is 
the extended buffer protocol.  I want something that works for that.  
When I have time to discuss it again, I might come back and 
push some more. 

But, not now.

-Travis





Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Paul Moore
On 11/2/06, Travis Oliphant [EMAIL PROTECTED] wrote:
 What do you mean by "manipulate the data"?  The proposal for a
 data-format object would help you describe that data in a standard way
 and therefore share that data among several libraries that would be able
 to understand the data (because they all use and/or understand the
 default Python way to handle data-formats).

 It would be up to the other packages to manipulate the data.

Yes, some other messages I read since I posted this clarified it for
me. Essentially, as a Python programmer, there's nothing in the PEP
for me - it's for extension writers (and maybe writers of some
lower-level Python modules? I'm not sure about this). So as I'm not
really the target audience, I won't comment further.

 So, what you would be able to do is take your byte-string and create a
 buffer object which you could then share with other packages:

 Example:

 b = buffer(bytestr, format=data_format_object)

 Now.

 a = numpy.frombuffer(b)
 a['field1']  # prints data stored in the field named field1

 etc.

 Or.

 cobj = ctypes.frombuffer(b)

 # Now, cobj is a ctypes object that is basically a structure that can
 # be passed directly to your C-code.

 Does this help?

Somewhat. My understanding is that the python-level buffer object is
frowned upon as not good practice, and is scheduled for removal at
some point (Py3K, quite possibly?) Hence, any code that uses buffer()
feels like it needs to be replaced by something more acceptable.
So although I understand the use you suggest, it's not compelling to
me because I am left with the feeling that I wish I knew the way to
do it that didn't need the buffer object (even though I realise
intellectually that such a way may not exist).
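
For readers coming to this thread later: the extended buffer protocol under discussion eventually led to the built-in memoryview type, which gives a typed view of a byte string without the old buffer object. A small sketch, assuming Python 3.3 or later (for memoryview.cast); the sample data is made up.

```python
import struct

# Four unsigned 16-bit values packed in native byte order.
bytestr = struct.pack("4H", 1, 2, 3, 500)

view = memoryview(bytestr)      # raw view: format 'B', one item per byte
typed = view.cast("H")          # reinterpret the same memory as unsigned shorts

values = typed.tolist()
```

The typed view carries its own format and itemsize, so a consumer does not have to be told separately what the bytes mean.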

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Travis Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant schrieb:
   2. Should primitive type codes be characters or integers (from an enum) at
   C level?
   - I prefer integers

   3. Should size be expressed in bits or bytes?
   - I prefer bits

  So, you want an integer enum for the kind and an integer for the
  bitsize?   That's fine with me.

  One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined
  in structmember.h already.  Should we just re-use those #defines while
  adding to them to make an easy to use interface for primitive types?

 Notice that those type codes imply sizes, namely the platform sizes
 (where platform always means what the C compiler does). So if
 you want to have platform-independent codes as well, you shouldn't
 use the T_ codes.
 

In NumPy we've found it convenient to use both.   Basically, we've set 
up a header file that does the translation using #defines and typedefs 
to create things like (on a 32-bit platform)

typedef int npy_int32;
#define NPY_INT32 NPY_INT

So that either the T_code-like enum or the bit-width can be used 
interchangeably.

Typically people want to specify bit-widths (and see their data-types in 
bit-widths) but in C-code that implements something you need to use one 
of the platform integers.

I don't know if we really need to bring all of that over.
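
The translation between platform types and bit-widths described above can be sketched from Python with ctypes, which reports the sizes the C compiler chose. This is only an illustration of the idea, not NumPy's header; the names PLATFORM_INTS and bit_width_aliases are hypothetical.

```python
import ctypes

# Platform C types whose widths depend on the compiler.
PLATFORM_INTS = {"short": ctypes.c_short, "int": ctypes.c_int,
                 "long": ctypes.c_long, "longlong": ctypes.c_longlong}

def bit_width_aliases():
    """Map fixed-width names such as 'int32' to the first platform type of that width."""
    aliases = {}
    for name, ctype in PLATFORM_INTS.items():
        bits = ctypes.sizeof(ctype) * 8
        aliases.setdefault("int%d" % bits, name)
    return aliases

aliases = bit_width_aliases()
```

Which platform type ends up behind a given width differs between platforms, which is the reason both spellings are kept.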

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Thomas Heller
Ronald Oussoren schrieb:
 On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote:
 

 This mechanism is probably a hack because it's not possible to add
 C accessible fields to type objects, on the other hand it is extensible
 (in principle, at least).
 
 I better start rewriting PyObjC then :-). PyObjC stores some additional
 information in the type objects that are used to describe Objective-C
 classes (such as a reference to the proxied class).
 
 IIRC This has been possible from Python 2.3.

I assume you are referring to the code in pyobjc/Modules/objc/objc-class.h ?

If this really is reliable I should better start rewriting ctypes then ;-).

Hm, I always thought there was some additional magic going on with type
objects, fields appended dynamically at the end or whatever.

Thomas


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Ronald Oussoren


On Nov 2, 2006, at 9:35 PM, Thomas Heller wrote:

 Ronald Oussoren schrieb:
  On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote:
   This mechanism is probably a hack because it's not possible to add
   C accessible fields to type objects, on the other hand it is
   extensible (in principle, at least).

  I better start rewriting PyObjC then :-). PyObjC stores some additional
  information in the type objects that are used to describe Objective-C
  classes (such as a reference to the proxied class).

  IIRC This has been possible from Python 2.3.

 I assume you are referring to the code in
 pyobjc/Modules/objc/objc-class.h

Yes.

 If this really is reliable I should better start rewriting ctypes
 then ;-).

 Hm, I always thought there was some additional magic going on with
 type objects, fields appended dynamically at the end or whatever.

There is such magic, but that magic was updated in Python 2.3 to
allow type-object extensions like this.


Ronald




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Greg Ewing
Travis Oliphant wrote:
 or just
 
 numpy.array(array.array('d',[1,2,3]))
 
 and leave out the buffer object altogether.

I think the buffer object in his example was just a
placeholder for some arbitrary object that supports
the buffer interface, not necessarily another NumPy
array.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Greg Ewing
Travis E. Oliphant wrote:
 We have T_UBYTE and T_BYTE, etc. defined 
 in structmember.h already.  Should we just re-use those #defines while 
 adding to them to make an easy to use interface for primitive types?

They're mixed up with size information, though,
which we don't want to do.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-02 Thread Alexander Belopolsky
Paul Moore p.f.moore at gmail.com writes:

 Somewhat. My understanding is that the python-level buffer object is
 frowned upon as not good practice, and is scheduled for removal at
 some point (Py3K, quite possibly?) Hence, any code that uses buffer()
 feels like it needs to be replaced by something more acceptable.

The Python 2.x buffer object serves two distinct purposes.  First, it is a
mutable string object; this functionality is definitely not going away,
since the bytes object will take it over. (Interestingly, it is not
exposed to Python, but C extension modules can call
PyBuffer_New(size) to create a buffer.)  Second, it is a view into any
object supporting the buffer protocol.  For a while this usage was indeed
frowned upon because buffer objects held the pointer obtained from
bf_get*buffer for too long, causing memory errors in situations like
this:

>>> from array import array
>>> a = array('c', 'x'*10)
>>> b = buffer(a, 5, 2)
>>> a.extend('x'*1000)   # may reallocate the memory b points into
>>> str(b)
'xx'

This problem was fixed more than two years ago. 

--
r35400 | nascheme | 2004-03-10 

Make buffer objects based on mutable objects (like array) safe.
--
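
For comparison, current CPython (Python 3) handles the same hazard in a different way: while a memoryview of an array is outstanding, the array refuses to resize. A minimal sketch of that behaviour, using memoryview in place of the Python 2 buffer object:

```python
from array import array

a = array("b", b"x" * 10)
view = memoryview(a)            # exports a's buffer

try:
    a.extend(b"x" * 1000)       # would reallocate the memory 'view' points at
    resized = True
except BufferError:
    resized = False             # CPython refuses while a view is outstanding

view.release()                  # drop the export
a.extend(b"x" * 1000)           # now the resize is allowed
```

This is a later design than the 2004 fix cited above; it is shown only to illustrate what "safe" can mean for a view onto a mutable object.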

Even though it was suggested in the past that buffer *object*
should be deprecated as unsafe, I don't remember seeing a call
to deprecate the buffer protocol.   


 So although I understand the use you suggest, it's not compelling to
 me because I am left with the feeling that I wish I knew the way to
 do it that didn't need the buffer object (even though I realise
 intellectually that such a way may not exist).
 

As I explained in another post,  I used buffer object as an example of
an object that supports buffer protocol, but does not export type
information in the form usable by numpy.

Here is another way to illustrate the problem:

>>> import array, numpy
>>> a = numpy.array(array.array('H', [1,2,3]))
>>> b = numpy.array([1,2,3],dtype='H')
>>> a.dtype == b.dtype
False

With the extended buffer protocol it will be possible for numpy.array(..)
to realize that array.array('H', [1,2,3]) is a sequence of unsigned short
integers and convert it accordingly.  Currently numpy has to go through
the sequence protocol to create a numpy.array from an array.array and
lose the type information.
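
As an illustration of what "exporting type information" looks like from the consumer side, here is a sketch using the memoryview type of later Python versions, standing in for numpy so that only the standard library is needed. The variable names are arbitrary.

```python
from array import array

src = array("H", [1, 2, 3])
view = memoryview(src)

# Type information travels with the buffer, no sequence protocol needed.
info = (view.format, view.itemsize, view.tolist())
```

A consumer that reads view.format can tell the data is a run of unsigned shorts rather than anonymous bytes.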









Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Bill Baxter schrieb:
 Basically in my code I want to be able to take the binary data descriptor and
 say give me the 'r' field of this pixel as an integer.
 
 Is either one (the PEP or c-types) clearly easier to use in this case?  What
 would the code look like for handling both formats generically?

The PEP, as specified, does not support accessing individual fields from
Python. OTOH, ctypes, as implemented, does. This comparison is not fair,
though: an *implementation* of the PEP (say, NumPy) might also give you
Python-level access to the fields.

With the PEP, you can get access to the 'r' field from C code.
Performing this access is quite tedious; as I'm uncertain whether you
actually wanted to see C code, I refrain from trying to formulate it.
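
For the ctypes side of the comparison, Python-level field access might look like the sketch below. The Pixel layout (four unsigned bytes, RGBA order) is an assumption made for the example, not something fixed by the PEP or by the question.

```python
import ctypes

class Pixel(ctypes.Structure):
    # Hypothetical RGBA layout: four unsigned bytes per pixel.
    _fields_ = [("r", ctypes.c_ubyte), ("g", ctypes.c_ubyte),
                ("b", ctypes.c_ubyte), ("a", ctypes.c_ubyte)]

raw = bytes([255, 128, 0, 255,   10, 20, 30, 40])   # two pixels
pixels = (Pixel * 2).from_buffer_copy(raw)

reds = [p.r for p in pixels]
```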

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Nick Coghlan
Travis Oliphant wrote:
 Nick Coghlan wrote:
 In fact, it may make sense to just use the lists/strings directly as the 
 data 
 exchange format definitions, and let the various libraries do their own 
 translation into their private format descriptions instead of creating a new 
 one-type-to-describe-them-all.
 
 Yes, I'm open to this possibility.   I basically want two things in the 
 object passed through the extended buffer protocol:
 
 1) It's fast on the C-level
 2) It covers all the use-cases.
 
 If just a particular string or list structure were passed, then I would 
 drop the data-format PEP and just have the dataformat argument of the 
 extended buffer protocol be that thing.
 
 Then, something that converts ctypes objects to that special format 
 would be very nice indeed.

It may make sense to have a couple distinct sections in the datatype PEP:
  a. describing data formats with basic Python types
  b. a lightweight class for parsing these data format descriptions

It's most of the way there already - part A would just be the various styles 
of arguments accepted by the datatype constructor, and part B would be the 
datatype object itself.

I personally think it makes the most sense to do both, but separating the two 
would make it clear that the descriptions can be standardised without 
*necessarily* defining a new class.
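
A rough sketch of that split, with part (a) as a plain list of tuples and part (b) as a small class that parses it. The class name DataFormat, its methods, and the use of struct-module type codes are all assumptions for illustration; the PEP's actual constructor arguments differ.

```python
import struct

# Part (a): a description built only from basic Python types.
# Each entry is (field name, struct-module type code); the names are made up.
PIXEL_DESC = [("r", "B"), ("g", "B"), ("b", "B"), ("a", "B")]

class DataFormat:
    """Part (b): a lightweight, hypothetical parser for such descriptions."""

    def __init__(self, desc, byteorder="<"):
        self.names = [name for name, _ in desc]
        self._struct = struct.Struct(byteorder + "".join(code for _, code in desc))
        self.itemsize = self._struct.size

    def unpack(self, data, offset=0):
        """Decode one item starting at `offset` into a name -> value dict."""
        return dict(zip(self.names, self._struct.unpack_from(data, offset)))

fmt = DataFormat(PIXEL_DESC)
pixel = fmt.unpack(bytes([255, 128, 0, 255]))
```

The list PIXEL_DESC can be exchanged between libraries on its own; the class is a convenience on top of it.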

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


[Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Jim Jewett
I'm still not sure exactly what is missing from ctypes.  To make this concrete:

You have an array of 500 elements meeting

struct {
  int  simple;
  struct nested {
    char name[30];
    char addr[45];
    int  amount;
  } nested;
};

ctypes can describe this as

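from ctypes import Structure, c_char, c_int

class nested(Structure):
    _fields_ = [("name", c_char*30),
                ("addr", c_char*45),
                ("amount", c_int)]

class struct(Structure):
    _fields_ = [("simple", c_int), ("nested", nested)]

# An array type describing 500 consecutive structs.
desc = struct * 500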

You have said that creating whole classes is too much overhead, and
the description should only be an instance.  To me, that particular
class (arrays of 500 structs) still looks pretty lightweight.  So
please clarify when it starts to be a problem.

(1)  For simple types -- mapping
   char name[30];  <==> ("name", c_char*30)

Do you object to using the c_char type?
Do you object to the array-of-length-30 class, instead of just having
a repeat or shape attribute?
Do you object to naming the field?

(2)  For the complex types, nested and struct

Do you object to creating these two classes even once?   For example,
are you expecting to need different classes for each buffer, and to
have many buffers created quickly?

Is creating that new class a royal pain, but frequent (and slow)
enough that you can't just make a call into python (or ctypes)?

(3)  Given that you will describe X, is X*500 (== a type describing
an array of 500 Xs) a royal pain in C?  If so, are you expecting to
have to do it dynamically for many sizes, and quickly enough that you
can't just let ctypes do it for you?

-jJ


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Jim Jewett wrote:
 I'm still not sure exactly what is missing from ctypes.  To make this 
 concrete:

I think the only thing missing from ctypes "expressiveness", as far as I 
can tell, is the byte-order representation.

What is missing is ease of use for producers and consumers in 
interpreting the data-type.   When I speak of producers and consumers, 
I'm largely talking about writers of C code (or Java or .NET code).

Producers must basically use Python code to create classes of various 
types.   This is going to be slow in 'C'.  Probably slower than the 
array interface (which is what we have now informally).

Consumers are going to have a hard time interpreting the result.  I'm 
not even sure how to do that, in fact.  I'd like NumPy to be able to 
understand ctypes as a means to specify data.  Would I have to check 
against all the sub-types of CDataType, pull out the fields, check the 
tp_name of the type object?  I'm not sure.
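
Seen from Python rather than C, the kind of walk a consumer would have to do might look like the sketch below. It relies on the _fields_, _type_ and _length_ attributes that ctypes types carry; the function name describe and the sample classes are hypothetical, and only structures, arrays and simple types are handled (no pointers, unions or bit fields).

```python
import ctypes

def describe(ctype):
    """Turn a ctypes type into nested tuples/lists of basic Python objects."""
    if issubclass(ctype, ctypes.Structure):
        return [(name, describe(ftype)) for name, ftype in ctype._fields_]
    if issubclass(ctype, ctypes.Array):
        return (describe(ctype._type_), ctype._length_)
    # Simple types carry a one-character code in _type_.
    return ctype._type_

class Inner(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30), ("amount", ctypes.c_int)]

class Outer(ctypes.Structure):
    _fields_ = [("simple", ctypes.c_int), ("inner", Inner)]

layout = describe(Outer * 500)
```

Doing the equivalent from C means a chain of subclass checks and attribute lookups through the Python API, which is the awkwardness being described.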

It seems like a string with the C-structure would be better as a 
data-representation, but then a third-party library would want to parse 
that, so Python might as well have its own parser for data-types. 

So, Python might as well have its own way to describe data.  My claim 
is that this default way should *not* be overloaded by using Python 
type-objects (the ctypes way).  I'm arguing for the NumPy way of 
using a different Python object to describe data-types.  I'm not saying 
the NumPy object should be used.  I'm saying we should come up with a 
single DataFormatType whose instances express the data formats in ways 
that other packages can produce and consume (or even use internally).  

It would be easy for NumPy to use the default Python object in its 
PyArray_Descr * structure.  It would also be easy for ctypes to use 
the default Python object in its StgDict object that is the tp_dict of 
every ctypes type object.

It would be easy for the struct module to allow for this data-format 
object (instead of just strings) in its methods. 

It would be easy for the array module to accept this data-format object 
(instead of just typecodes) in its constructor.

Lots of things would suddenly be more consistent throughout both the 
Python and C-Python user space.

Perhaps after discussion, it becomes clear that the ctypes approach is 
sufficient to be that thing that all modules use to share data-format 
information.  It's definitely expressive enough.   But, my argument is 
that NumPy data-type objects are also pretty close, so why should they 
be rejected?  We could also make a string-syntax do it.


 You have said that creating whole classes is too much overhead, and
 the description should only be an instance.  To me, that particular
 class (arrays of 500 structs) still looks pretty lightweight.  So
 please clarify when it starts to be a problem.


 (1)  For simple types -- mapping
   char name[30];  == (name, c_char*30)

 Do you object to using the c_char type?
 Do you object to the array-of-length-30 class, instead of just having
 a repeat or shape attribute?
 Do you object to naming the field?

 (2)  For the complex types, nested and struct

 Do you object to creating these two classes even once?   For example,
 are you expecting to need different classes for each buffer, and to
 have many buffers created quickly?
I object to the way I consume and produce the ctypes interface.  
It's much too slow to be used on the C-level for sharing many small 
buffers quickly.

 Is creating that new class a royal pain, but frequent (and slow)
 enough that you can't just make a call into python (or ctypes)?

 (3)  Given that you will describe X, is X*500 (== a type describing
 an array of 500 Xs) a royal pain in C?  If so, are you expecting to
 have to do it dynamically for many sizes, and quickly enough that you
 can't just let ctypes do it for you?

That pretty much sums it up (plus the pain of having to basically write 
Python code from C).

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis E. Oliphant
Jim Jewett wrote:
 I'm still not sure exactly what is missing from ctypes.  To make this 
 concrete:

I was too hasty.  There are some things actually missing from ctypes:

1) long double (this is not the same across platforms, but it is a 
data-type).
2) complex-valued types (you might argue that it's just a 2-array of 
floats, but you could say the same thing about int as an array of 
bytes).  The point is how do people interpret the data.  Complex-valued 
data-types are very common.  It is one reason Fortran is still used by 
scientists.
3) Unicode characters (there is wchar_t support, but I mean a way to 
describe what kind of unicode characters you have in a cross-platform 
way).  I actually think we have a way to describe encodings in the 
data-format representation as well.

4) What about floating-point representations that are not IEEE 754 
4-byte or 8-byte?   There should be a way to at least express the 
data-format in these cases (this is actually how long double should be 
handled as well since it varies across platforms what is actually done 
with the extra bits).

So, we can't just use ctypes as a complete data-format representation 
because it's also missing some things.
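
Point 2 can be illustrated with the struct module: a complex value can be stored as two doubles, but nothing in the bytes says so, which is why the description matters. A minimal sketch; the helper names and the real-then-imaginary ordering are assumptions for the example.

```python
import struct

def pack_complex(z, byteorder="<"):
    """Store a complex number as two IEEE 754 doubles: real part, then imaginary."""
    return struct.pack(byteorder + "2d", z.real, z.imag)

def unpack_complex(data, byteorder="<"):
    """Inverse of pack_complex; the caller must already know the layout."""
    real, imag = struct.unpack(byteorder + "2d", data)
    return complex(real, imag)

blob = pack_complex(1.5 - 2.0j)
roundtrip = unpack_complex(blob)
```

A consumer handed only blob sees 16 bytes; whether that is one complex number or a pair of floats is exactly the interpretation a shared data-format has to carry.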

What we need is a standard way for libraries that deal with data-formats 
to communicate with each other.  I need help with a PEP like this and 
that's what I'm asking for.  It's all I've really been after all along.

A couple of points:

* One reason to support the idea of the Python object approach (versus a 
string-syntax) is that it is already parsed.  A list-syntax approach 
(perhaps built from strings for fundamental data-types) might also be 
considered already parsed as well.

* One advantage of using kind versus a character for every type (like 
struct and array do) is that it helps consumers and producers speed up 
the parser (a fuller branching tree).


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 I was too hasty.  There are some things actually missing from ctypes:

I think Thomas can correct me if I'm wrong: I think endianness is
supported (although this support seems undocumented). There seems
to be code that checks for the presence of a _byteswapped_ attribute
on fields of a struct; presence of this field is then interpreted
as data having the other endianness.
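
In the ctypes that ships with Python today, this support is available through the BigEndianStructure and LittleEndianStructure base classes. A short sketch with a made-up two-field header shows the same four bytes read both ways:

```python
import ctypes

class NetHeader(ctypes.BigEndianStructure):
    # Hypothetical two-field header stored in network (big-endian) order.
    _fields_ = [("length", ctypes.c_uint16), ("flags", ctypes.c_uint16)]

class LeHeader(ctypes.LittleEndianStructure):
    # Same fields, interpreted as little-endian.
    _fields_ = [("length", ctypes.c_uint16), ("flags", ctypes.c_uint16)]

raw = b"\x01\x02\x00\x03"
be = NetHeader.from_buffer_copy(raw)
le = LeHeader.from_buffer_copy(raw)
```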

 1) long double (this is not the same across platforms, but it is a 
 data-type).

That's indeed missing.

 2) complex-valued types (you might argue that it's just a 2-array of 
 floats, but you could say the same thing about int as an array of 
 bytes).  The point is how do people interpret the data.  Complex-valued 
 data-types are very common.  It is one reason Fortran is still used by 
 scientists.

Well, by the same reasoning, you could argue that pixel values (RGBA)
are missing in the PEP. It's a convenience, sure, and it may also help
interfacing with the platform's FORTRAN implementation - however, are
you sure that NumPy's complex layout is consistent with the platform's
C99 _Complex definition?

 3) Unicode characters
 
 4) What about floating-point representations that are not IEEE 754 
 4-byte or 8-byte.

Both of these are available in a platform-dependent way: if the
platform uses non-IEEE754 formats for C float and C double, ctypes
will interface with that just fine. It is actually vice versa:
IEEE-754 4-byte and 8-byte is not supported in ctypes.
Same for Unicode: the platform's wchar_t is supported (as you said),
but not a platform-independent (say) 4-byte little-endian.
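
The distinction between platform-dependent and platform-independent descriptions already exists in the struct module and in the codec machinery, which makes for a compact illustration. This sketch says nothing about ctypes itself; it only shows the two flavours side by side.

```python
import struct

# Native codes follow the C compiler; "<" selects standard sizes and IEEE 754.
native_double = struct.calcsize("d")
portable_double = struct.calcsize("<d")       # always 8
portable_float = struct.calcsize("<f")        # always 4

# A platform-independent 4-byte little-endian code unit per character.
encoded = "A\u00e9".encode("utf-32-le")
```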

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant schrieb:
  2) complex-valued types (you might argue that it's just a 2-array of
  floats, but you could say the same thing about int as an array of
  bytes).  The point is how do people interpret the data.  Complex-valued
  data-types are very common.  It is one reason Fortran is still used by
  scientists.

 Well, by the same reasoning, you could argue that pixel values (RGBA)
 are missing in the PEP. It's a convenience, sure, and it may also help
 interfacing with the platform's FORTRAN implementation - however, are
 you sure that NumPy's complex layout is consistent with the platform's
 C99 _Complex definition?
 

I think so (it is on gcc).  And yes, where you draw the line between 
fundamental and derived data-type is somewhat arbitrary.  I'd rather 
include complex-numbers than not given their prevalence in the 
data-streams I'm trying to make compatible with each other.

 
  3) Unicode characters

  4) What about floating-point representations that are not IEEE 754
  4-byte or 8-byte.

 Both of these are available in a platform-dependent way: if the
 platform uses non-IEEE754 formats for C float and C double, ctypes
 will interface with that just fine. It is actually vice versa:
 IEEE-754 4-byte and 8-byte is not supported in ctypes.

That's what I meant.  The 'f' kind in the data-type description is also 
intended to mean "platform float", whatever that is.  But a complete 
data-format representation would have a way to describe other 
bit-layouts for floating point representation, even if you can't 
actually calculate directly with them without conversion.

 Same for Unicode: the platform's wchar_t is supported (as you said),
 but not a platform-independent (say) 4-byte little-endian.

Right.

It's a matter of scope.  Frankly, I'd be happy enough to start with 
typecodes in the extended buffer protocol (that's where the array 
module is now) and then move up to something more complete later.

But, since we already have an array interface for record-arrays to share 
information and data with each other, and ctypes is showing all of its 
power, then why not be more complete?



-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis Oliphant oliphant.travis at ieee.org writes:
 Frankly, I'd be happy enough to start with 
 typecodes in the extended buffer protocol (that's where the array 
 module is now) and then move up to something more complete later.
 

Let's just start with that.  The way I see the problem is that buffer protocol
is fine as long as your data is an array of bytes, but if it is an array of
doubles, you are out of luck. So, while I can do

>>> b = buffer(array('d', [1,2,3]))

there is not much that I can do with b.  For example, if I want to pass it to
numpy, I will have to provide the type and shape information myself:

>>> numpy.ndarray(shape=(3,), dtype=float, buffer=b)
array([ 1.,  2.,  3.])

With the extended buffer protocol, I should be able to do

>>> numpy.array(b)

So let's start by solving this problem and limit it to data that can be found
in a standard library array.  This way we can postpone the discussion of shapes,
strides and nested structs.

I propose a simple bf_gettypeinfo(PyObject *obj, int* type, int* bitsize) method
that would return a type code and the size of the data item.

I believe it is better to have type codes free from size information for
several reasons:

1. Generic code can use size information directly without having to know
that int is 32 and double is 64 bits.

2. Odd sizes can be easily described without having to add a new type code.

3. I assume that the existing bf_ functions would still return size in bytes,
so having item size available as an int will help to get number of items.
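
A Python-level sketch of what such a call could return, built on the memoryview type of later Python versions. The function name gettypeinfo mirrors the proposed bf_gettypeinfo only loosely; the KIND_* codes and the extra item count in the return value are assumptions for illustration.

```python
from array import array

# Hypothetical size-free type codes, in the spirit of the proposal.
KIND_INT, KIND_UINT, KIND_FLOAT = range(3)
_KINDS = {"b": KIND_INT, "h": KIND_INT, "i": KIND_INT, "l": KIND_INT, "q": KIND_INT,
          "B": KIND_UINT, "H": KIND_UINT, "I": KIND_UINT, "L": KIND_UINT, "Q": KIND_UINT,
          "f": KIND_FLOAT, "d": KIND_FLOAT}

def gettypeinfo(obj):
    """Return (kind, bitsize, item_count) for an object exporting a typed buffer."""
    with memoryview(obj) as view:
        kind = _KINDS[view.format]
        # Size in bytes divided by item size gives the number of items (reason 3).
        return kind, view.itemsize * 8, view.nbytes // view.itemsize

info = gettypeinfo(array("d", [1, 2, 3]))
```

Keeping the kind separate from the bit size means generic code can branch on "float" and then read the width, rather than knowing a table of per-platform codes.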

If we manage to agree on the standard way to pass primitive type information,
it will be a big achievement and immediately useful because simple arrays are
already in the standard library.

 







Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Paul Moore
On 11/1/06, Alexander Belopolsky [EMAIL PROTECTED] wrote:
 Let's just start with that.  The way I see the problem is that buffer protocol
 is fine as long as your data is an array of bytes, but if it is an array of
 doubles, you are out of luck. So, while I can do

  b = buffer(array('d', [1,2,3]))

 there is not much that I can do with b.  For example, if I want to pass it to
 numpy, I will have to provide the type and shape information myself:

  numpy.ndarray(shape=(3,), dtype=float, buffer=b)
 array([ 1.,  2.,  3.])

 With the extended buffer protocol, I should be able to do

  numpy.array(b)

As a data point, this is the first posting that has clearly explained
to me what the two PEPs are attempting to achieve. That may be my
blindness to what others find self-evident, but equally, I may not be
the only one who needed this example...

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Alexander Belopolsky wrote:
 Travis Oliphant oliphant.travis at ieee.org writes:
 
 
b = buffer(array('d', [1,2,3]))
 
 
 there is not much that I can do with b.  For example, if I want to pass it to
 numpy, I will have to provide the type and shape information myself:
 
 
numpy.ndarray(shape=(3,), dtype=float, buffer=b)
 
 array([ 1.,  2.,  3.])
 
 With the extended buffer protocol, I should be able to do
 
 
numpy.array(b)

or just

numpy.array(array.array('d',[1,2,3]))

and leave out the buffer object altogether.


 
 
 So let's start by solving this problem and limit it to data that can be found
 in a standard library array.  This way we can postpone the discussion of 
 shapes,
 strides and nested structs.

Don't lump those ideas together.  Shapes and strides are necessary for 
N-dimensional arrays (they are essentially what *defines* the N-dimensional 
array).  I really don't want to sacrifice those in the extended buffer 
protocol.  If you want to separate them into different functions then 
that is a possibility.
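
To make the role of shape and strides concrete, here is a minimal
sketch in plain Python. The helper names are illustrative only and it
assumes a C-contiguous (row-major) layout; it is not code from NumPy
or from the PEP.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides for a row-major array of the given shape."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def byte_offset(index, strides):
    """Byte offset of the element at index, relative to the buffer start."""
    return sum(i * s for i, s in zip(index, strides))

shape = (2, 3)                              # 2 rows, 3 columns of doubles
strides = c_contiguous_strides(shape, 8)    # 8 bytes per double
offset = byte_offset((1, 2), strides)       # last element of the array
```

Without the shape and strides, a consumer sees only a flat run of
bytes and cannot tell a 2x3 array from a 3x2 one.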

 
 If we manage to agree on the standard way to pass primitive type information,
 it will be a big achievement and immediately useful because simple arrays are
 already in the standard library.
 

We could start there, I suppose.  Especially if it helps us all get on 
the same page.  But, we already see the applications beyond this simple 
case, so I would like to have at least an eye for the more difficult 
case, for which we already have a working solution in the array interface.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Travis Oliphant
Paul Moore wrote:
 
 
 Enough of the abstract. As a concrete example, suppose I have a (byte)
 string in my program containing some binary data - an ID3 header, or a
 TCP packet, or whatever. It doesn't really matter. Does your proposal
 offer anything to me in how I might manipulate that data (assuming I'm
 not using NumPy)? (I'm not insisting that it should, I'm just trying
 to understand the scope of the PEP).
 

What do you mean by "manipulate the data"?  The proposal for a 
data-format object would help you describe that data in a standard way 
and therefore share that data between several libraries that would be able 
to understand the data (because they all use and/or understand the 
default Python way to handle data-formats).

It would be up to the other packages to manipulate the data.

So, what you would be able to do is take your byte-string and create a 
buffer object which you could then share with other packages:

Example:

b = buffer(bytestr, format=data_format_object)

Now.

a = numpy.frombuffer(b)
a['field1']  # prints data stored in the field named field1

etc.

Or.

cobj = ctypes.frombuffer(b)

# Now, cobj is a ctypes object that is basically a structure that can
# be passed directly to your C-code.
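
The calls above are part of the proposal, not existing APIs. As a
rough point of comparison, ctypes types do offer a from_buffer_copy
class method, but the layout has to be declared by hand on the ctypes
side. A minimal sketch, in which the Record class and its field names
are made up for illustration:

```python
import ctypes

class Record(ctypes.LittleEndianStructure):
    # Hypothetical layout: two 16-bit fields followed by one 32-bit field.
    _fields_ = [
        ("field1", ctypes.c_uint16),
        ("field2", ctypes.c_uint16),
        ("field3", ctypes.c_uint32),
    ]

bytestr = b"\x01\x00\x02\x00\x03\x00\x00\x00"

# Copy the bytes into a new structure instance and read named fields.
cobj = Record.from_buffer_copy(bytestr)
values = (cobj.field1, cobj.field2, cobj.field3)
```

The point of the proposal is that the description travels with the
buffer, so the receiving package would not need to repeat it.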

Does this help?

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis Oliphant oliphant.travis at ieee.org writes:

 Don't lump those ideas together.  Shapes and strides are necessary for 
 N-dimensional array's (it's essentially what *defines* the N-dimensional 
 array).   I really don't want to sacrifice those in the extended buffer 
 protocol.  If you want to separate them into different functions then 
 that is a possibility.


I don't understand.  Do you want to discuss shapes and strides separately
from the datatype or not? Note that in ctypes shape is a property of 
datatype (as in c_int*2*3).   In your proposal, shapes and strides are
communicated separately.  This presents a unique memory management
challenge: if the object does not contain shape information in a ready to
be pointed to form, who is responsible for deallocating the shape array?  
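
The two styles can be put side by side. In ctypes the lengths are
attributes of the array type itself. For the other style, the sketch
below uses the memoryview object of present-day CPython purely as an
illustration of "format and shape as separate attributes"; it is not
the interface being proposed in this thread.

```python
import ctypes

# ctypes: the shape is part of the type.
Grid = ctypes.c_int * 2 * 3           # 3 rows, each an array of 2 c_int
outer_len = Grid._length_             # length of the outer array
inner_len = Grid._type_._length_      # length of each row

# Separate description: one element format plus a shape tuple.
isz = ctypes.sizeof(ctypes.c_int)
raw = bytes(ctypes.sizeof(Grid))      # zero-filled block of the same size
view = memoryview(raw).cast('i', shape=[3, 2])
```

In the second style the object handing out the description must keep
the shape storage alive, which is the ownership question raised above.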
 
  
  If we manage to agree on the standard way to pass primitive type 
  information,
  it will be a big achievement and immediately useful because simple arrays 
  are
  already in the standard library.
  
 
 We could start there, I suppose.  Especially if it helps us all get on 
 the same page.

Let's start:

1. Should primitive types be associated with simple type codes (short, int,
long, float, double) or type/size pairs [(int, 16), (int, 32), (int, 64),
(float, 32), (float, 64)]?
 - I prefer pairs

2. Should primitive type codes be characters or integers (from an enum) at
C level?
- I prefer integers

3. Should size be expressed in bits or bytes?
- I prefer bits
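
A minimal sketch of the combination preferred above (integer kind
codes from an enum, paired with a size in bits). The enum values and
names are hypothetical; nothing in this thread fixes them.

```python
from enum import IntEnum

class Kind(IntEnum):
    # Hypothetical kind codes, free of any size information.
    INT = 0
    UINT = 1
    FLOAT = 2

# Type/size pairs: the size is explicit rather than implied by a C name.
INT16 = (Kind.INT, 16)
INT32 = (Kind.INT, 32)
FLOAT64 = (Kind.FLOAT, 64)
INT24 = (Kind.INT, 24)      # an odd size needs no new kind code

def itemsize_bytes(typeinfo):
    """Bytes needed to hold one item, rounding partial bytes up."""
    kind, bits = typeinfo
    return (bits + 7) // 8
```

Because Kind is an integer enum, the same values could be shared
with a C-level enum of the same layout.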




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Alexander Belopolsky
Travis E. Oliphant oliphant.travis at ieee.org writes:

 
 Alexander Belopolsky wrote:
  ...
  1. Should primitive types be associated with simple type codes
  (short, int, long, float, double) or type/size pairs [(int,16),
  (int, 32), (int, 64), (float, 32), (float, 64)]?
   - I prefer pairs
  
  2. Should primitive type codes be characters or integers (from an
  enum) at C level?
  - I prefer integers
 
 Are these orthogonal?
 

Do you mean, are my questions 1 and 2 orthogonal? I guess they are.

  
  3. Should size be expressed in bits or bytes?
  - I prefer bits
  
 
 So, you want an integer enum for the kind and an integer for the 
 bitsize?   That's fine with me.
 
 One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
 in structmember.h already.  Should we just re-use those #defines while 
 adding to them to make an easy to use interface for primitive types?
 

I was thinking about using something like NPY_TYPES enum, but T_* 
codes would work as well.  Let me just present both options for the
record:

 --- numpy/ndarrayobject.h ---

enum NPY_TYPES {
    NPY_BOOL=0,
    NPY_BYTE, NPY_UBYTE,
    NPY_SHORT, NPY_USHORT,
    NPY_INT, NPY_UINT,
    NPY_LONG, NPY_ULONG,
    NPY_LONGLONG, NPY_ULONGLONG,
    NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE,
    NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE,
    NPY_OBJECT=17,
    NPY_STRING, NPY_UNICODE,
    NPY_VOID,
    NPY_NTYPES,
    NPY_NOTYPE,
    NPY_CHAR,        /* special flag */
    NPY_USERDEF=256  /* leave room for characters */
};

--- structmember.h ---

/* Types */
#define T_SHORT          0
#define T_INT            1
#define T_LONG           2
#define T_FLOAT          3
#define T_DOUBLE         4
#define T_STRING         5
#define T_OBJECT         6
/* XXX the ordering here is weird for binary compatibility */
#define T_CHAR           7   /* 1-character string */
#define T_BYTE           8   /* 8-bit signed int */
/* unsigned variants: */
#define T_UBYTE          9
#define T_USHORT         10
#define T_UINT           11
#define T_ULONG          12

/* Added by Jack: strings contained in the structure */
#define T_STRING_INPLACE 13

#define T_OBJECT_EX      16  /* Like T_OBJECT, but raises AttributeError
                                when the value is NULL, instead of
                                converting to None. */
#ifdef HAVE_LONG_LONG
#define T_LONGLONG       17
#define T_ULONGLONG      18
#endif /* HAVE_LONG_LONG */






Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-11-01 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 2. Should primitive type codes be characters or integers (from an enum) at
 C level?
 - I prefer integers
 
 3. Should size be expressed in bits or bytes?
 - I prefer bits

 
 So, you want an integer enum for the kind and an integer for the 
 bitsize?   That's fine with me.
 
 One thing I just remembered.  We have T_UBYTE and T_BYTE, etc. defined 
 in structmember.h already.  Should we just re-use those #defines while 
 adding to them to make an easy to use interface for primitive types?

Notice that those type codes imply sizes, namely the platform sizes
(where platform always means what the C compiler does). So if
you want to have platform-independent codes as well, you shouldn't
use the T_ codes.
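
The struct module already draws the same distinction between platform
sizes and platform-independent sizes, which makes for a quick
illustration. This is only an analogy using struct format characters,
not the T_ codes themselves.

```python
import struct

# Native mode ('@'): sizes are whatever the C compiler uses.
native_long = struct.calcsize('@l')     # commonly 4 or 8

# Standard mode ('<', '>', '=' or '!'): sizes are fixed by the format.
standard_long = struct.calcsize('<l')
standard_longlong = struct.calcsize('<q')
```

A type code in the first style cannot be interpreted without knowing
the platform that produced the data.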

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Paul Moore
On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote:
 In order to make sense of the data-format object that I'm proposing you
 have to see the need to share information about data-format through an
 extended buffer protocol (which I will be proposing soon).  I'm not
 going to try to argue that right now because there are a lot of people
 who can do that.

 So, I'm going to assume that you see the need for it.  If you don't,
 then just suspend concern about that for the moment.  There are a lot of
 us who really see the need for it.

[...]

 Again, my real purpose is the extended buffer protocol.  This
 data-format type is a means to that end.  If the consensus is that
 nobody sees a greater use of the data-format type beyond the buffer
 protocol, then I will just write 1 PEP for the extended buffer protocol.

While I don't personally use NumPy, I can see where an extended buffer
protocol like you describe could be advantageous, and so I'm happy to
concede that benefit.

I can also vaguely see that a unified block of memory description
would be useful. My interest would be in the area of the struct module
(unpacking and packing data for dumping to byte streams - whether this
happens in place or not is not too important to this use case).
However, I cannot see how your proposal would help here in practice -
does it include the functionality of the struct module (or should it?)
If so, then I'd like to see examples of equivalent constructs. If not,
then isn't it yet another variation on the theme, adding to the
problem of multiple approaches rather than helping?

I can also see the parallels with ctypes. Here I feel a little less
sure that keeping the two approaches is wrong. I don't know why I feel
like that - maybe nothing more than familiarity with ctypes - but I
don't have the same reluctance to have both the ctypes data definition
stuff and the new datatype proposal.

Enough of the abstract. As a concrete example, suppose I have a (byte)
string in my program containing some binary data - an ID3 header, or a
TCP packet, or whatever. It doesn't really matter. Does your proposal
offer anything to me in how I might manipulate that data (assuming I'm
not using NumPy)? (I'm not insisting that it should, I'm just trying
to understand the scope of the PEP).

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Gareth McCaughan
  It might be better not to consider bit to be a
  type at all, and come up with another way of indicating
  that the size is in bits. Perhaps
  
  'i4'   # 4-byte signed int
  'i4b'  # 4-bit signed int
  'u4'   # 4-byte unsigned int
  'u4b'  # 4-bit unsigned int
  
 
 I like this.  Very nice.  I think that's the right way to look at it.

I remark that 'ib4' and 'ub4' make for marginally easier
parsing and less danger of ambiguity.
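
A minimal sketch of a parser for the prefix form, to show why it is
easy to handle: the kind and the unit marker are both known before the
digits start. The function name and the returned (kind, bits) pair are
assumptions made for this example, and only the 'i' and 'u' kinds from
the examples above are covered.

```python
import re

# kind letter, optional 'b' meaning "size is in bits", then the size
_SPEC = re.compile(r'^(?P<kind>[iu])(?P<unit>b?)(?P<size>[0-9]+)$')

def parse_spec(spec):
    """Return (kind, size in bits) for specs such as 'i4' or 'ub4'."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError("bad type spec: %r" % (spec,))
    size = int(m.group('size'))
    bits = size if m.group('unit') else size * 8
    return m.group('kind'), bits
```

With the suffix form ('i4b') the parser cannot tell bytes from bits
until it has consumed all of the digits.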

-- 
g



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Michael Chermside
In this email I'm responding to a series of emails from Travis
pretty much in the order I read them:

Travis Oliphant writes:
 I'm saying we should introduce a single-object mechanism for  
 describing binary data so that the many-object approach of c-types  
 does not become some kind of de-facto standard.  C-types can  
 translate this object-instance to its internals if and when it  
 needs to.

 In the mean-time, how are other packages supposed to communicate  
 binary information about data with each other?

Here we disagree.

I haven't used C-types. I have no idea whether it is well-designed or
horribly unusable. So if someone wanted to argue that C-types is a
mistake and should be thrown out, I'd be willing to listen. Until
someone tries to make that argument, I'm presuming it's good enough to
be part of the standard library for Python.

Given that, I think that it *SHOULD* become a de-facto standard. I
think that the way different packages should communicate binary information
about data with each other is using C-types. Not because it's wonderful
(remember, I've never used it), but because it's STANDARD. There should
be one obvious way to do things! When there is, it makes interoperability
WAY easier, and interoperability is the main objective when dealing with
things like binary data formats.

Propose using C-types. Or propose *improving* C-types. But don't propose
ignoring it.

In a different message, he writes:
 It also bothers me that so many ways to describe binary data are  
 being used out there.  This is a problem that deserves being solved.  
  And, no, ctypes hasn't solved it (we can't directly use the ctypes  
 solution).

Really? Why? Is this a failing in C-types? Can C-types be fixed?

Later he explains:
 Remember the buffer protocol is in compiled code.  So, as a result,

 1) It's harder to construct a class to pass through the protocol  
 using the multiple-types approach of ctypes.

 2) It's harder to interpret the object received through the buffer protocol.

 Sure, it would be *possible* to use ctypes, but I think it would be  
 very difficult.  Think about how you would write the get_data_format  
 C function in the extended buffer protocol for NumPy if you had to  
 import ctypes and then build a class just to describe your data.   
 How would you interpret what you get back?

Aha! So what you REALLY ought to be asking for is a C interface to the
ctypes module. That seems like a very sensible and reasonable request.

 I don't think we should just *use ctypes because it's there* when  
 the way it describes binary data was not constructed with the  
 extended buffer protocol in mind.

I just disagree. (1) I *DO* think we should just use ctypes because it's
there. After all, the problem we're trying to solve is one of
COMPATIBILITY - you don't solve those by introducing competing standards.
(2) From what I understand of it, I think ctypes is quite capable of
describing data to be accessed via the buffer protocol.

In another email:
 In order to make sense of the data-format object that I'm proposing  
 you have to see the need to share information about data-format  
 through an extended buffer protocol (which I will be proposing  
 soon).  I'm not going to try to argue that right now because there  
 are a lot of people who can do that.

Actually, no need to convince me... I am already convinced of the
wisdom of this approach.

 My view is that it is un-necessary to use a different type object to  
 describe each different data-type.
  [...]
 So, the big difference is that I think data-formats should be  
 *instances* of a single type.

Why? Who cares? Seriously, if we were proposing to describe the layouts
with a collection of rubber bands and potato chips, I'd say it was a
crazy idea. But we're proposing using data structures in a computer
memory. Why does it matter whether those data structures are of the same
python type or different python types? I care whether the structure
can be created, passed around, and interrogated. I don't care what
Python type they are.

 I'm saying that I don't like the idea of forcing this approach on  
 everybody else who wants to describe arbitrary binary data just  
 because ctypes is included.

And I'm saying that I *do*. Hey, if someone proposed getting rid of
the current syntax for the array module (for Py3K) and replacing it with
use of ctypes, I'd give it serious consideration. There should be only
one way to describe binary structures. It should be powerful enough to
describe almost any structure, easy-to-use, and most of all it should be
used consistently everywhere.

 I need some encouragement in order to continue to invest energy in  
 pushing this forward.

Please keep up the good work! Some day I'd like to see NumPy built in
to the standard Python distribution. The incremental, PEP by PEP approach
you are taking is the best route to getting there. But there may be
some changes along the way -- convergence with ctypes may be one 

Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Nick Coghlan
Travis E. Oliphant wrote:
 However, the existence of an alternative strategy using a single Python 
 type and multiple instances of that type to describe binary data (which 
 is the NumPy approach and essentially the array module approach) means 
 that we can't just a-priori assume that the way ctypes did it is the 
 only or best way.

As a hypothetical, what if there was a helper function that translated a 
description of a data structure using basic strings and sequences (along the 
lines of what you have in your PEP) into a ctypes data structure?

 The examples of missing features that Martin has exposed are not 
 show-stoppers.  They can all be easily handled within the context of 
 what is being proposed.   I can modify the PEP to show this.  But, I 
 don't have the time to spend if it's just all going to be rejected in 
 the end.  I need some encouragement in order to continue to invest 
 energy in pushing this forward.

I think the most important thing in your PEP is the formats for describing 
structures in a way that is easy to construct in both C and Python 
(specifically, by using strings and sequences), and it is worth pursuing for 
that aspect alone. Whether that datatype is then implemented as a class in its 
own right or as a factory function that returns a ctypes data type object is, 
to my mind, a relatively minor implementation issue (either way has questions 
to be addressed - I'm not sure how you tell ctypes that you have a 32-bit 
integer with a non-native endian format, for example).
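
A minimal sketch of such a factory function, assuming a description
made of (name, code) pairs. The codes, the make_ctype name and the
byteorder argument are invented for this example and are not the
PEP's syntax. On the endianness question, ctypes provides the
BigEndianStructure and LittleEndianStructure base classes, which the
sketch relies on.

```python
import ctypes

# Hypothetical string codes mapped to fixed-size ctypes types.
_CODES = {
    'i1': ctypes.c_int8,  'i2': ctypes.c_int16,
    'i4': ctypes.c_int32, 'i8': ctypes.c_int64,
    'u1': ctypes.c_uint8,  'u2': ctypes.c_uint16,
    'u4': ctypes.c_uint32, 'u8': ctypes.c_uint64,
    'f4': ctypes.c_float, 'f8': ctypes.c_double,
}

_BASES = {
    '=': ctypes.Structure,               # native byte order
    '<': ctypes.LittleEndianStructure,
    '>': ctypes.BigEndianStructure,
}

def make_ctype(name, fields, byteorder='='):
    """Build a ctypes structure type from a list of (name, code) pairs."""
    members = [(fname, _CODES[code]) for fname, code in fields]
    return type(name, (_BASES[byteorder],), {'_fields_': members})

Packet = make_ctype('Packet', [('length', 'u4'), ('flags', 'u4')], '>')
pkt = Packet.from_buffer_copy(b'\x00\x00\x01\x00\x00\x00\x00\x02')
```

The description stays a plain list of strings, so it is equally easy
to build from C and from Python.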

In fact, it may make sense to just use the lists/strings directly as the data 
exchange format definitions, and let the various libraries do their own 
translation into their private format descriptions instead of creating a new 
one-type-to-describe-them-all.

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
 Travis Oliphant schrieb:
 
So, the big difference is that I think data-formats should be 
*instances* of a single type.
 
 
 This is nearly the case for ctypes as well. All layout descriptions
 are instances of the type type. Nearly, because they are instances
 of subtypes of the type type:
 
 py> type(ctypes.c_long)
 <type '_ctypes.SimpleType'>
 py> type(ctypes.c_double)
 <type '_ctypes.SimpleType'>
 py> type(ctypes.c_double).__bases__
 (<type 'type'>,)
 py> type(ctypes.Structure)
 <type '_ctypes.StructType'>
 py> type(ctypes.Array)
 <type '_ctypes.ArrayType'>
 py> type(ctypes.Structure).__bases__
 (<type 'type'>,)
 py> type(ctypes.Array).__bases__
 (<type 'type'>,)
 
 So if your requirement is "all layout descriptions ought to have
 the same type", then this is (nearly) the case: they are instances
 of type (rather than datatype, as in your PEP).
 

The big difference, however, is that by going this route you are forced 
to use the type object as your data-format instance.  This is 
fitting a square peg into a round hole in my opinion.  To really be 
useful, you would need to add the attributes and (most importantly) 
C-function pointers and C-structure members to these type objects.  I 
don't even think that is possible in Python (even if you do create a 
meta-type that all the c-type type objects can use that carries the same 
information).

There are a few people claiming I should use the ctypes type-hierarchy 
but nobody has explained how that would be possible given the 
attributes, C-structure members and C-function pointers that I'm proposing.

In NumPy we also have a Python type for each basic data-format (we call 
them array scalars).  For a little while they carried the data-format 
information on the Python side.  This turned out to be not flexible 
enough.  So, we expanded the PyArray_Descr * structure which has always 
been a part of Numeric (and the array module array type) into an actual 
Python type and a lot of things became possible.

It was clear to me that we were on to something.  Now, the biggest 
claim against the gist of what I'm proposing (details we can argue 
about), seems from my perspective to be a desire to go backwards and 
carry data-type information around with a Python type.

The data-type object did not just appear out of thin-air one day.  It 
really can be seen as an evolution from the beginnings of Numeric (and 
the Python array module).

So, this is what we came up with in the NumPy world.  Ctypes came up 
with something a bit different.  It is not trivial to just use 
ctypes.  I could say the same thing and tell ctypes to just use NumPy's 
  data-type object.   It could be done that way, but of course it would 
take a bit of work on the part of ctypes to make that happen.

Having ctypes in the standard library does not mean that any other 
discussion of how data-format should be represented has been decided on. 
If I had known that was what it meant to put ctypes in the standard 
library, I would have been more vocal several months ago.


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Stephan Tolksdorf
Martin v. Löwis wrote:
 Travis Oliphant schrieb:
 Function pointers are supported with the void data-type and could be 
 more specifically supported if it were important.   People typically 
 don't use the buffer protocol to send function-pointers around in a way 
 that the void description wouldn't be enough.
 
 As I said before, I can't tell whether it's important, as I still don't
 know what the purpose of this PEP is. If it is to support a unification
 of memory layout specifications, and if that unification is also to
 include ctypes, then yes, it is important. If it is to describe array
 elements in NumArray arrays, then it might not be important.
 
  For the usage of ctypes, the PEP void type is insufficient to describe
  function pointers: you also need a specification of the signature of
  the function pointer (parameter types and return type), or else you
  can't use the function pointer (i.e. you can't call the function).

The buffer protocol is primarily meant for describing the format of 
(large) contiguous pieces of binary data. In most cases that will be all 
kinds of numerical data for scientific applications, image and other 
media data, simple databases and similar kinds of data.

There is currently no adequate data format type which sufficiently 
supports these applications, otherwise Travis wouldn't make this proposal.

While Travis' proposal encompasses the data format functionality within 
the struct module and overlaps with what ctypes has to offer, it does 
not aim to replace ctypes.

I don't think that a basic data format type necessarily should be able 
to encode all the information a foreign function interface needs to call 
a code library. From my point of view, that kind of information is one 
abstraction layer above a basic data format and should be implemented as 
an extension of or complementary to the basic data format.

I also do not understand why the data format type should attempt to 
fully describe arbitrarily complex data formats, like fragmented 
(non-continuous) data structures in memory. You'd probably need a full 
programming language for that anyway.

Regards,
   Stephan


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 But, there are distinct disadvantages to this approach compared to what 
 I'm trying to allow.   Martin claims that the ctypes approach is 
 *basically* equivalent but this is just not true.

I may claim that, but primarily, my goal was to demonstrate that the
proposed PEP cannot be used to describe ctypes object layouts (without
checking, I can readily believe that the PEP covers everything in
the array and struct modules).

 It could be made more 
 true if the ctypes objects inherited from a meta-type and if Python 
 allowed meta-types to expand their C-structures.  But, last I checked 
 this is not possible.

That I don't understand. a) what do you think is not possible? b)
why is that an important difference between a datatype and a ctype?

If you are suggesting that, given two Python types A and B, and
B inheriting from A, that the memory layout of B cannot extend
the memory layout of A, then: that is certainly possible in Python,
and there are many examples for it.

 A Python type object is a very particular kind of Python-type.  As far 
 as I can tell, it's not as flexible in terms of the kinds of things you 
 can do with the instances of a type object (i.e. what ctypes types 
 are) on the C-level.

Ah, you are worried that NumArray objects would have to be *instances*
of ctypes types. That wouldn't be necessary at all. Instead, if each
NumArray object had a method get_ctype(), which returned a ctypes type,
then you would get the same descriptiveness that you get with the
PEP's datatype.
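
A minimal sketch of that idea, using a made-up TypedArray wrapper
around the standard library array rather than a real NumArray object.
Only the get_ctype name comes from the text above; everything else is
an assumption for illustration.

```python
import ctypes
from array import array

# array typecodes mapped to the matching platform C types in ctypes
_TYPECODE_TO_CTYPE = {
    'b': ctypes.c_byte,  'B': ctypes.c_ubyte,
    'h': ctypes.c_short, 'H': ctypes.c_ushort,
    'i': ctypes.c_int,   'I': ctypes.c_uint,
    'l': ctypes.c_long,  'L': ctypes.c_ulong,
    'f': ctypes.c_float, 'd': ctypes.c_double,
}

class TypedArray:
    """Hypothetical array object; it is not an instance of any ctypes type."""

    def __init__(self, typecode, values):
        self._data = array(typecode, values)

    def get_ctype(self):
        """Return a ctypes array type describing the memory layout."""
        element = _TYPECODE_TO_CTYPE[self._data.typecode]
        return element * len(self._data)

layout = TypedArray('d', [1.0, 2.0, 3.0]).get_ctype()
```

The array object only hands out a description; no ctypes instance is
created and the array's own type is unchanged.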

 I'm happy to have the data-format object live separate from ctypes and 
 leave it to the ctypes author(s) to support it if desired.  But, the 
 claim that the extended buffer protocol jump through all kinds of hoops 
 to conform to the ctypes standard when that standard was designed 
 with a different idea in mind is not acceptable.

That, of course, is a reasoning I can understand. This is free software,
contributors can choose to contribute whatever they want; you can't force
anybody to do anything specific you want to get done. Acceptance of
any PEP (not just this PEP) should always be contingent on the availability
of a patch implementing it.

 Where is the discussion that crowned the ctypes way of doing things as 
 the one true way?

It hasn't been crowned this way. Me, personally, I just said two things
about this PEP and ctypes:
a) the PEP does not support all concepts that ctypes needs
b) ctypes can express all examples in the PEP
in response to your proposal that ctypes should adopt the PEP, and
that ctypes is not good enough to be the one true way.

Regards,
Martin



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Nick Coghlan wrote:
 Travis E. Oliphant wrote:
 
However, the existence of an alternative strategy using a single Python 
type and multiple instances of that type to describe binary data (which 
is the NumPy approach and essentially the array module approach) means 
that we can't just a-priori assume that the way ctypes did it is the 
only or best way.
 
 
 As a hypothetical, what if there was a helper function that translated a 
 description of a data structure using basic strings and sequences (along the 
 lines of what you have in your PEP) into a ctypes data structure?
 

That would be fine and useful in fact.  I don't see how it helps the 
problem of what to pass through the buffer protocol.  I see passing 
c-types type objects around on the c-level as an un-necessary and 
burdensome approach unless the ctypes objects were significantly enhanced.


 
 In fact, it may make sense to just use the lists/strings directly as the data 
 exchange format definitions, and let the various libraries do their own 
 translation into their private format descriptions instead of creating a new 
 one-type-to-describe-them-all.

Yes, I'm open to this possibility.   I basically want two things in the 
object passed through the extended buffer protocol:

1) It's fast on the C-level
2) It covers all the use-cases.

If just a particular string or list structure were passed, then I would 
drop the data-format PEP and just have the dataformat argument of the 
extended buffer protocol be that thing.

Then, something that converts ctypes objects to that special format 
would be very nice indeed.
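
A rough sketch of such a converter, assuming the "special format" is
a list of (name, code) pairs with codes like 'i4' and 'f8'. That
format and the to_format_list name are assumptions for this example.
It handles only flat structures of fixed-size scalars and ignores
padding, bit fields and nested types.

```python
import ctypes

# Kind letter for each supported fixed-size ctypes scalar type.
_KIND = {
    ctypes.c_int8: 'i',  ctypes.c_int16: 'i',
    ctypes.c_int32: 'i', ctypes.c_int64: 'i',
    ctypes.c_uint8: 'u',  ctypes.c_uint16: 'u',
    ctypes.c_uint32: 'u', ctypes.c_uint64: 'u',
    ctypes.c_float: 'f', ctypes.c_double: 'f',
}

def to_format_list(struct_type):
    """Translate a flat ctypes structure type into (name, code) pairs."""
    return [(name, _KIND[ctype] + str(ctypes.sizeof(ctype)))
            for name, ctype in struct_type._fields_]

class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_int32), ('y', ctypes.c_double)]

fmt = to_format_list(Point)
```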

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Martin v. Löwis
Travis Oliphant schrieb:
 The big difference, however, is that by going this route you are forced 
 to use the type object as your data-format instance.

Since everything is an object (an instance) in Python, this is not
such a big difference.

 This is 
 fitting a square peg into a round hole in my opinion.  To really be 
 useful, you would need to add the attributes and (most importantly) 
 C-function pointers and C-structure members to these type objects. 

Can you explain why that is? In the PEP, I see two C functions:
setitem and getitem. I think they can be implemented readily with
ctypes' GETFUNC and SETFUNC function pointers that it uses
all over the place.

I don't see a requirement to support C structure members or
function pointers in the datatype object.

 There are a few people claiming I should use the ctypes type-hierarchy 
 but nobody has explained how that would be possible given the 
 attributes, C-structure members and C-function pointers that I'm proposing.

Ok, here you go. Remember, I'm still not claiming that this should be
done: I'm just explaining how it could be done.

- byteorder/isnative: I think this could be derived from the
  presence of the _swappedbytes_ field
- itemsize: can be done with ctypes.sizeof
- kind: can be created through a mapping of the _type_ field
  (I think)
- fields: can be derived from the _fields_ member
- hasobject: compare, recursively, with py_object
- name: use __name__
- base: again, created from _type_ (if _length_ is present)
- shape: recursively look at _length_
- alignment: use ctypes.alignment
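
The mapping in the list above can be sketched in Python with the public
ctypes API alone. This is only a rough illustration, not code from the PEP
or from ctypes: Pixel, Image, shape_and_base and describe are hypothetical
names, and the kind and hasobject entries are left out.

```python
import ctypes

class Pixel(ctypes.Structure):
    # Hypothetical example layout: four unsigned bytes.
    _fields_ = [("r", ctypes.c_ubyte), ("g", ctypes.c_ubyte),
                ("b", ctypes.c_ubyte), ("a", ctypes.c_ubyte)]

# Same layout as the C declaration: Pixel image[3][4]
Image = (Pixel * 4) * 3

def shape_and_base(ctype):
    """shape/base: follow _length_ and _type_ through nested array types."""
    shape = []
    while hasattr(ctype, "_length_"):
        shape.append(ctype._length_)
        ctype = ctype._type_
    return tuple(shape), ctype

def describe(ctype):
    """Collect a few of the PEP-style attributes from a ctypes type."""
    shape, base = shape_and_base(ctype)
    return {
        "name": base.__name__,
        "itemsize": ctypes.sizeof(base),
        "alignment": ctypes.alignment(base),
        # _swappedbytes_ only exists on non-native-endian structures.
        "isnative": not hasattr(base, "_swappedbytes_"),
        "shape": shape,
        "fields": [(f[0], f[1].__name__)
                   for f in getattr(base, "_fields_", [])],
    }

print(describe(Image))
```

The sketch only reads attributes; it does not show how the two C function
pointers (getitem and setitem) would be reached.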

 It was clear to me that we were on to something.  Now, the biggest 
 claim against the gist of what I'm proposing (details we can argue 
 about), seems from my perspective to be a desire to go backwards and 
 carry data-type information around with a Python type.

I, at least, have no such desire. I just explained that the ctypes
model of memory layouts is just as expressive as the one in the
PEP. Which of these is better for what the PEP wants to achieve,
I can't say, because I still don't quite understand what the PEP
wants to achieve.

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Martin v. Löwis
Stephan Tolksdorf wrote:
 While Travis' proposal encompasses the data format functionality within 
 the struct module and overlaps with what ctypes has to offer, it does 
 not aim to replace ctypes.

This discussion could have been a lot shorter if he had said so.
Unfortunately (?) he stated that it was *precisely* a motivation
of the PEP to provide a standard data description machinery that
can then be adopted by the struct, array, and ctypes modules.

 I also do not understand why the data format type should attempt to 
 fully describe arbitrarily complex data formats, like fragmented 
 (non-continuous) data structures in memory. You'd probably need a full 
 programming language for that anyway.

For an FFI application, you need to be able to describe arbitrary
in-memory formats, since that's what the foreign function will
expect. For type safety and reuse, you better separate the
description of the layout from the creation of the actual values.
Otherwise (i.e. if you have to define the layout on each invocation),
creating the parameters for a foreign function becomes very tedious
and error-prone, with errors often being catastrophic (i.e. interpreter
crashes).
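
A minimal ctypes sketch of that separation, assuming a hypothetical Point
layout: the layout is declared once as a type, values are created from it
afterwards, and a value that does not fit is rejected when it is built
rather than during the foreign call.

```python
import ctypes

class Point(ctypes.Structure):
    # The layout is described once, independently of any value.
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# Values are created later from the shared description.
p = Point(3, 4)
q = Point(y=7)          # fields that are not given start out as zero

def try_bad_point():
    """Return the error raised for a value that does not fit the layout."""
    try:
        Point("three", 4)
    except TypeError as exc:
        return exc
    return None

rejected = try_bad_point()
print(p.x, p.y, q.x, q.y, type(rejected).__name__)
```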

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Thomas Heller
Travis Oliphant wrote:
 Greg Ewing wrote:
 Travis Oliphant wrote:
 
 
Part of the problem is that ctypes uses a lot of different Python types 
(that's what I mean by "multi-object") to accomplish its goal.  What 
I'm looking for is a single Python type that can be passed around and 
explains binary data.
 
 
 It's not clear that multi-object is a bad thing in and
 of itself. It makes sense conceptually -- if you have
 a datatype object representing a struct, and you ask
 for a description of one of its fields, which could
 be another struct or array, you would expect to get
 another datatype object describing that.
 
 Can you elaborate on what would be wrong with this?
 
 Also, can you clarify whether your objection is to
 multi-object or multi-type. They're not the same thing --
 you could have a data structure built out of multiple
 objects that are all of the same Python type, with
 attributes distinguishing between struct, array, etc.
 That would be single-type but multi-object.
 
 I've tried to clarify this in another post.  Basically, what I don't 
 like about the ctypes approach is that it is multi-type (every new 
 data-format is a Python type).
 
 In order to talk about all these Python types together, then they must 
 all share some attribute (or else be derived from a meta-type in C with 
 a specific function-pointer entry).

(I tried to read the whole thread again, but it is too large already.)

There is a (badly named, probably) api to access information
about ctypes types and instances of this type.  The functions are
PyObject_stgdict(obj) and PyType_stgdict(type).  Both return a
'StgDictObject' instance or NULL if the function fails.  This object
is the ctypes' type object's __dict__.

StgDictObject is a subclass of PyDictObject and has fields that
carry information about the C type (alignment requirements, size in bytes,
plus some other stuff).  Also it contains several pointers to functions
that implement (in C) struct-like functionality (packing/unpacking).

Of course several of these fields can only be used for ctypes-specific
purposes, for example a pointer to the ffi_type which is used when
calling foreign functions, or the restype, argtypes, and errcheck fields
which are only used when the type describes a function pointer.


This mechanism is probably a hack because it's not possible to add C accessible
fields to type objects, on the other hand it is extensible (in principle, at 
least).


Just to describe the implementation.

Thomas



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant wrote:
 
But, there are distinct disadvantages to this approach compared to what 
I'm trying to allow.   Martin claims that the ctypes approach is 
*basically* equivalent but this is just not true.
 
 
 I may claim that, but primarily, my goal was to demonstrate that the
 proposed PEP cannot be used to describe ctypes object layouts (without
 checking, I can readily believe that the PEP covers everything in
 the array and struct modules).
 

That's a fine argument.  You are right in terms of the PEP as it stands. 
  However, I want to make clear that a single Python type object *could* 
be used to describe data including all the cases you laid out.  It would 
not be difficult to extend the PEP to cover all the cases you've 
described --- I'm not sure that's desirable.  I'm not trying to replace 
what ctypes does.  I'm just trying to get something that we can use to 
exchange data-format information through the extended buffer protocol.

It really comes down to using Python type-objects as the instances 
describing data-formats (which ctypes does) or normal Python objects 
as the instances describing data-formats (what the PEP proposes).
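
The distinction can be seen with two standard-library pieces. In this small
sketch, struct.Struct merely stands in for "a normal Python object that
describes a format"; it is not the PEP's data-type object, and PixelType
and pixel_format are hypothetical names.

```python
import ctypes
import struct

class PixelType(ctypes.Structure):
    # ctypes route: the description is itself a Python type object.
    _fields_ = [("r", ctypes.c_ubyte), ("g", ctypes.c_ubyte),
                ("b", ctypes.c_ubyte), ("a", ctypes.c_ubyte)]

# Other route: the description is an ordinary instance of one fixed type.
pixel_format = struct.Struct("4B")

print(isinstance(PixelType, type))       # True
print(isinstance(pixel_format, type))    # False
print(ctypes.sizeof(PixelType), pixel_format.size)   # 4 4
```

Both describe the same four bytes; they differ only in what kind of Python
object carries the description.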

 
It could be made more 
true if the ctypes objects inherited from a meta-type and if Python 
allowed meta-types to expand their C-structures.  But, last I checked 
this is not possible.
 
 
 That I don't understand. a) what do you think is not possible?

Extending the C-structure of PyTypeObject and having Python types use 
that as their type-object.

  b)
 why is that an important difference between a datatype and a ctype?

Because with instances of C-types you are stuck with the PyTypeObject 
structure.  If you want to add anything you have to do it in the 
dictionary.

Instances of a datatype allow adding anything after the PyObject_HEAD 
structure.

 
 If you are suggesting that, given two Python types A and B, and
 B inheriting from A, that the memory layout of B cannot extend
 the memory layout of A, then: that is certainly possible in Python,
 and there are many examples for it.


I know this.  I've done it for many different objects.  I'm saying it's 
not quite the same when what you are extending is the PyTypeObject and 
trying to use it as the type object for some other object.


 
A Python type object is a very particular kind of Python-type.  As far 
as I can tell, it's not as flexible in terms of the kinds of things you 
can do with the instances of a type object (i.e. what ctypes types 
are) on the C-level.
 
 
 Ah, you are worried that NumArray objects would have to be *instances*
 of ctypes types. That wouldn't be necessary at all. Instead, if each
 NumArray object had a method get_ctype(), which returned a ctypes type,
 then you would get the same descriptiveness that you get with the
 PEP's datatype.
 

No, I'm not worried about that (It's not NumArray by the way, it's 
NumPy.  NumPy replaces both NumArray and Numeric).

NumPy actually interfaces with ctypes quite well.  This is how I learned 
anything I might know about ctypes.  So, I'm well aware of this.

What I am concerned about is using Python type objects (i.e. Python 
objects that can be cast in C to PyTypeObject *) outside of ctypes to 
describe data-formats when you don't need it and it just complicates 
dealing with the data-format description.

 
Where is the discussion that crowned the ctypes way of doing things as 
the one true way
 
 
 It hasn't been crowned this way. Me, personally, I just said two things
 about this PEP and ctypes:

Thanks for clarifying, but I know you didn't say this.  Others, however, 
basically did.

 a) the PEP does not support all concepts that ctypes needs

It could be extended, but I'm not sure it *needs* to be in its real 
context.  I'm very sorry for contributing to the distraction that ctypes 
should adopt the PEP.  My words were unclear.  But, I'm not pushing for 
that.  I really have no opinion how ctypes describes data.

 b) ctypes can express all examples in the PEP
 in response to your proposal that ctypes should adopt the PEP, and
 that ctypes is not good enough to be the one true way.
 

I think it is good enough in the semantic sense.  But, I think using 
type objects in this fashion for general-purpose data-description is 
over-kill and will be much harder to extend and deal with.

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Thomas Heller wrote:
 
 (I tried to read the whole thread again, but it is too large already.)
 
 There is a (badly named, probably) api to access information
 about ctypes types and instances of this type.  The functions are
 PyObject_stgdict(obj) and PyType_stgdict(type).  Both return a
 'StgDictObject' instance or NULL if the function fails.  This object
 is the ctypes' type object's __dict__.
 
 StgDictObject is a subclass of PyDictObject and has fields that
 carry information about the C type (alignment requirements, size in bytes,
 plus some other stuff).  Also it contains several pointers to functions
 that implement (in C) struct-like functionality (packing/unpacking).
 
 Of course several of these fields can only be used for ctypes-specific
 purposes, for example a pointer to the ffi_type which is used when
 calling foreign functions, or the restype, argtypes, and errcheck fields
 which are only used when the type describes a function pointer.
 
 
 This mechanism is probably a hack because it's not possible to add C
 accessible fields to type objects, on the other hand it is extensible
 (in principle, at least).
 

Thank you for the description.  While I've studied the ctypes code, I 
still don't understand the purposes behind all the data-structures.

Also, I really don't have an opinion about ctypes' implementation.   All 
my comparisons are simply being resistant to the unexplained idea that 
I'm supposed to use ctypes objects in a way they weren't really designed 
to be used.

For example, I'm pretty sure you were the one who made me aware that you 
can't just extend the PyTypeObject.  Instead you extended the tp_dict of 
the Python typeObject to store some of the extra information that is 
needed to describe a data-type like I'm proposing.

So, if I'm just describing data-format information, why do I need 
all this complexity (that makes ctypes implementation easier/more 
natural/etc)?  What if the StgDictObject is the Python data-format 
object I'm talking about?  It actually looks closer.

But, if all I want is the StgDictObject (or something like it), then why 
should I pass around the whole type object?

This is all I'm saying to those that want me to use ctypes to describe 
data-formats in the extended buffer protocol.  I'm not trying to change 
anything in ctypes.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Thomas Heller
Travis Oliphant wrote:
 For example, I'm pretty sure you were the one who made me aware that you 
 can't just extend the PyTypeObject.  Instead you extended the tp_dict of 
 the Python typeObject to store some of the extra information that is 
 needed to describe a data-type like I'm proposing.
 
 So, if I'm just describing data-format information, why do I need 
 all this complexity (that makes ctypes implementation easier/more 
 natural/etc)?  What if the StgDictObject is the Python data-format 
 object I'm talking about?  It actually looks closer.
 
 But, if all I want is the StgDictObject (or something like it), then why 
 should I pass around the whole type object?

Maybe you don't need it.  ctypes certainly needs the type object because
it is also used for constructing instances (while NumPy uses factory functions,
IIUC), or for converting 'native' Python object into foreign function arguments.

I know that this doesn't interest you from the NumPy perspective (and I
don't want to offend you by saying this).

 This is all I'm saying to those that want me to use ctypes to describe 
 data-formats in the extended buffer protocol.  I'm not trying to change 
 anything in ctypes.

I don't want to change anything in NumPy, either, and was not the one who
suggested to use ctypes objects, although I had thought about whether it
would be possible or not.

What I like about ctypes, and dislike about Numeric/Numarray/NumPy is
the way C compatible types are defined in ctypes.  I find the ctypes
way more natural than the numxxx or array module way, but what else would
anyone expect from me as the ctypes author...

I hope that a useful interface is developed from your proposals, and
will be happy to adapt ctypes to use it or interface ctypes with it
if this makes sense.

Thomas



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Martin v. Löwis wrote:
 Stephan Tolksdorf wrote:
 
While Travis' proposal encompasses the data format functionality within 
the struct module and overlaps with what ctypes has to offer, it does 
not aim to replace ctypes.
 
 
 This discussion could have been a lot shorter if he had said so.
 Unfortunately (?) he stated that it was *precisely* a motivation
 of the PEP to provide a standard data description machinery that
 can then be adopted by the struct, array, and ctypes modules.

Struct and array I was sure about.  Ctypes less sure.  I'm very sorry 
for the distraction I caused by mis-stating my objective.   My objective 
is really the extended buffer protocol.  The data-type object is a means 
to that end.

I do think ctypes could make use of the data-type object and that there 
is a real difference between using Python type objects as data-format 
descriptions and using another Python type for those descriptions.  I 
thought to go the ctypes route (before I even knew what ctypes did) but 
decided against it for a number of reasons.

But, nonetheless those are side issues.  The purpose of the PEP is to 
provide an object that the extended buffer protocol can use to share 
data-format information.  It should be considered primarily in that context.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Martin v. Löwis
Travis Oliphant wrote:
 I think it actually is.  Perhaps I'm wrong, but a type-object is still a 
 special kind of an instance of a meta-type.  I once tried to add 
 function pointers to a type object by inheriting from it.  But, I was 
 told that Python is not set up to handle that.  Maybe I misunderstood.

I'm not quite sure what the problems are: one obvious problem is
that the next Python version may also extend the size of type objects.
But, AFAICT, even that should work, in the sense that this new version
should check for the presence of a flag to determine whether the
additional fields are there. The only tricky question is how you can
find out whether your own extension is there.

If that is a common problem, I think a framework could be added to
support extensible type objects (with some kind of registry for
additional fields, and a per-type-object indicator whether a certain
extension field is present).

 Let me be very clear.  The whole reason I make any statements about 
 ctypes is because somebody else brought it up.  I'm not trying to 
 replace ctypes and the way it uses type objects to represent data 
 internally.

Ok. I understood you differently earlier.

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Paul Moore
On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote:
 Martin v. Löwis wrote:
  [...] because I still don't quite understand what the PEP
  wants to achieve.
 

 Are you saying you still don't understand after having read the extended
 buffer protocol PEP, yet?

I can't speak for Martin, but I don't understand how I, as a Python
programmer, might use the data type objects specified in the PEP. I
have skimmed the extended buffer protocol PEP, but I'm conscious that
no objects I currently use support the extended buffer protocol (and
the PEP doesn't mention adding support to existing objects), so I
don't see that as too relevant to me.

I have also installed numpy, and looked at the help for numpy.dtype,
but that doesn't add much to the PEP. The freely available chapters of
the numpy book explain how dtypes describe data structures, but not
how to use them. The freely available Numeric documentation doesn't
refer to dtypes, as far as I can tell. Is there any documentation on
how to use dtypes, independently of other features of numpy? If not,
can you clarify where the benefit lies for a Python user of this
proposal? (I understand the benefits of a common language for
extensions to communicate datatype information, but why expose it to
Python? How do Python users use it?)

This is probably all self-evident to the numpy community, but I think
that as the PEP is aimed at a wider audience it needs a little more
background.

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Josiah Carlson

Paul Moore [EMAIL PROTECTED] wrote:
 On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote:
  Martin v. Löwis wrote:
   [...] because I still don't quite understand what the PEP
   wants to achieve.
  
 
  Are you saying you still don't understand after having read the extended
  buffer protocol PEP, yet?
 
 I can't speak for Martin, but I don't understand how I, as a Python
 programmer, might use the data type objects specified in the PEP. I
 have skimmed the extended buffer protocol PEP, but I'm conscious that
 no objects I currently use support the extended buffer protocol (and
 the PEP doesn't mention adding support to existing objects), so I
 don't see that as too relevant to me.

Presumably str in 2.x and bytes in 3.x could be extended to support the
'S' specifier, unicode in 2.x and text in 3.x could be extended to
support the 'U' specifier.  The various array.array variants could be
extended to support all relevant specifiers, etc.


 This is probably all self-evident to the numpy community, but I think
 that as the PEP is aimed at a wider audience it needs a little more
 background.

Someone correct me if I am wrong, but it makes the equivalent of the
following C declarations available in Python...

typedef struct {
    char R;
    char G;
    char B;
    char A;
} pixel_RGBA;

pixel_RGBA image[1024][768];

Or even...

typedef struct {
    long long numerator;
    unsigned long long denominator;
    double approximation;
} rational;

rational ratios[1024];
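
For comparison, the two declarations can already be spelled with ctypes. The
sketch below is the ctypes spelling, not the PEP's notation; image_type and
ratios_type are hypothetical names, and only the types are built, so no
image-sized memory is allocated.

```python
import ctypes

class pixel_RGBA(ctypes.Structure):
    _fields_ = [("R", ctypes.c_char), ("G", ctypes.c_char),
                ("B", ctypes.c_char), ("A", ctypes.c_char)]

# C: pixel_RGBA image[1024][768];
# The innermost dimension is multiplied first, the outermost last.
image_type = (pixel_RGBA * 768) * 1024

class rational(ctypes.Structure):
    _fields_ = [("numerator", ctypes.c_longlong),
                ("denominator", ctypes.c_ulonglong),
                ("approximation", ctypes.c_double)]

# C: rational ratios[1024];
ratios_type = rational * 1024

print(ctypes.sizeof(image_type), ctypes.sizeof(ratios_type))
```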

The real use is that after you have your array of (packed) objects, be
it one of the above samples, or otherwise, you don't need to explicitly
pass around specifiers (like in struct, or ctypes), numpy and others can
talk to each other, and pick up the specifier with the extended buffer
protocol, and it just works.
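
In later Python 3 releases the same idea is visible through memoryview,
which reports a struct-style format string for any object that exports a
buffer. A small standard-library sketch of a consumer that picks up the
description from the buffer itself (describe_buffer is a hypothetical
name):

```python
import array

def describe_buffer(obj):
    """Read the element description from the exporter, not from the caller."""
    with memoryview(obj) as view:
        return view.format, view.itemsize, view.shape

print(describe_buffer(array.array("d", [1.0, 2.0, 3.0])))   # ('d', 8, (3,))
print(describe_buffer(b"abcd"))                             # ('B', 1, (4,))
```

The caller never passes a specifier; the consumer asks the exporting object
for it.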

 - Josiah



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Travis Oliphant
Paul Moore wrote:
 On 10/31/06, Travis Oliphant [EMAIL PROTECTED] wrote:
 
Martin v. Löwis wrote:

[...] because I still don't quite understand what the PEP
wants to achieve.


Are you saying you still don't understand after having read the extended
buffer protocol PEP, yet?
 
 
 I can't speak for Martin, but I don't understand how I, as a Python
 programmer, might use the data type objects specified in the PEP. I
 have skimmed the extended buffer protocol PEP, but I'm conscious that
 no objects I currently use support the extended buffer protocol (and
 the PEP doesn't mention adding support to existing objects), so I
 don't see that as too relevant to me.

Do you use the PIL?  The PIL supports the array interface.

CVXOPT supports the array interface.

Numarray
Numeric
NumPy

all support the array interface.

 
 I have also installed numpy, and looked at the help for numpy.dtype,
 but that doesn't add much to the PEP. 

The source-code is available.

 The freely available chapters of
 the numpy book explain how dtypes describe data structures, but not
 how to use them. 


The freely available Numeric documentation doesn't
 refer to dtypes, as far as I can tell. 

It kind of does, they are PyArray_Descr * structures in Numeric.  They 
just aren't Python objects.


Is there any documentation on
 how to use dtypes, independently of other features of numpy? 

There are examples and other help pages at http://www.scipy.org

If not,
 can you clarify where the benefit lies for a Python user of this
 proposal? (I understand the benefits of a common language for
 extensions to communicate datatype information, but why expose it to
 Python? How do Python users use it?)
 

The only benefit I imagine would be for an extension module library 
writer and for users of the struct and array modules.  But, other than 
that, I don't know.  It actually doesn't have to be exposed to Python. 
I used Python notation in the PEP to explain what is basically a 
C-structure.  I don't care if the object ever gets exposed to Python.

Maybe that's part of the communication problem.


 This is probably all self-evident to the numpy community, but I think
 that as the PEP is aimed at a wider audience it needs a little more
 background.

It's hard to write that background because most of what I understand is 
from the NumPy community.  I can't give you all the examples but my 
concern is that you have all these third party libraries out there 
describing what is essentially binary data and using either 
string-copies or the buffer protocol + extra information obtained by 
some method or attribute that varies across the implementations.  There 
should really be a standard for describing this data.

There are attempts at it in the struct and array module.  There is the 
approach of ctypes but I claim that using Python type objects is 
over-kill for the purposes of describing data-formats.

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Ron Adam
 The only benefit I imagine would be for an extension module library 
 writer and for users of the struct and array modules.  But, other than 
 that, I don't know.  It actually doesn't have to be exposed to Python. 
 I used Python notation in the PEP to explain what is basically a 
 C-structure.  I don't care if the object ever gets exposed to Python.
 
 Maybe that's part of the communication problem.


I get the impression where ctypes is good for accessing native C libraries from 
within python, the data-type object is meant to add a more direct way to share 
native python object's *data* with C (or other languages) in a more efficient 
way.  For data that can be represented well in contiguous memory addresses, it 
lightens the load so instead of a list of python objects you get an array of 
data for n python_type objects without the duplications of the python type for 
every element.
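
A rough standard-library illustration of that difference, assuming CPython
and using array.array as a stand-in for a packed block of data: the list
holds one Python float object per element, while the array stores only the
raw 8-byte values.

```python
import array
import sys

values = [float(i) for i in range(1000)]    # 1000 separate float objects
packed = array.array("d", values)           # 1000 raw C doubles

with memoryview(packed) as view:
    packed_bytes = view.nbytes              # size of the raw data only

# List storage plus one object per element (CPython-specific accounting).
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print(packed_bytes, boxed_bytes)
```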

I think maybe some more complete examples demonstrating how it is to be used 
from both the Python and C would be good.

Cheers,
Ron



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Bill Baxter
One thing I'm curious about in the ctypes vs this PEP debate is the following. 
How do the approaches differ in practice if I'm developing a library that wants
to accept various image formats that all describe the same thing: rgb data. 
Let's say for now all I want to support is two different image formats whose
pixels are described in C structs by:

struct rgb565
{
  unsigned short r:5;
  unsigned short g:6;
  unsigned short b:5; 
};

struct rgb101210
{
  unsigned int r:10;
  unsigned int g:12;
  unsigned int b:10; 
};


Basically in my code I want to be able to take the binary data descriptor and
say give me the 'r' field of this pixel as an integer.

Is either one (the PEP or c-types) clearly easier to use in this case?  What
would the code look like for handling both formats generically?
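
For the ctypes half of the question, one possible sketch uses bit fields in
_fields_. The class names and the helper red are hypothetical, and the
placement of bits inside the storage unit follows the platform's C
compiler conventions, so the example only round-trips values it packed
itself.

```python
import ctypes

class RGB565(ctypes.Structure):
    _fields_ = [("r", ctypes.c_ushort, 5),
                ("g", ctypes.c_ushort, 6),
                ("b", ctypes.c_ushort, 5)]

class RGB101210(ctypes.Structure):
    _fields_ = [("r", ctypes.c_uint, 10),
                ("g", ctypes.c_uint, 12),
                ("b", ctypes.c_uint, 10)]

def red(pixel_type, raw):
    """Return the 'r' field of one pixel stored in raw bytes."""
    return pixel_type.from_buffer_copy(raw).r

# Pack one pixel of each format, then read 'r' back generically.
raw565 = bytes(RGB565(r=17, g=33, b=9))
raw101210 = bytes(RGB101210(r=700, g=3000, b=5))

print(red(RGB565, raw565), red(RGB101210, raw101210))   # 17 700
```

The generic part is that red takes the description as an argument; the
caller decides which pixel format the bytes are in.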

--bb




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-31 Thread Ronald Oussoren


On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote:



This mechanism is probably a hack because it's not possible to add C
accessible fields to type objects, on the other hand it is extensible
(in principle, at least).


I better start rewriting PyObjC then :-). PyObjC stores some additional  
information in the type objects that are used to describe Objective-C  
classes (such as a reference to the proxied class).


IIRC This has been possible from Python 2.3.

Ronald






Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Nick Coghlan
Neal Becker wrote:
 I have watched numpy with interest for a long time.  My own interest is to
 possibly use the c-api to wrap c++ algorithms to use from python.
 
 One thing that has concerned me, and continues to concern me with this
 proposal, is that it seems to suffer from a very fat interface.  I
 certainly have not studied the options in any depth, but my gut feeling is
 that the interface is too fat and too complex.  I wonder if it's possible
 to avoid this.  I wonder if this is an example of all the methods sinking
 to the base class.

You've just described my #1 concern with incorporating NumPy wholesale, 
and the reason I believe it would be nice to cherry-pick a couple of key 
components for the standard library, rather than adopting the whole thing.

Travis has done a lot of work towards that goal (the latest result of which is 
this pre-PEP for describing the individual array elements in a way that is 
more flexible than the single character codes of the current array module).

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Jack Jansen
Would it be possible to make the data-type objects subclassable, with the
subclasses being able to override the equality test?

The range of data types that you've specified in the PEP is good enough
for most general use, and probably for NumPy as well, but someone already
came up with the example of image formats, which have their whole own
range of data formats. I could throw in audio formats (bits per sample,
excess-N or signed or ulaw samples, mono/stereo/5.1/etc, order of the
channels), and there's probably a whole slew of other areas that have
their own sets of formats.

If the datatype objects are subclassable, modules could initially start by
adding their own formats. So, the "jackaudio" and "jillaudio" modules
would have distinct sets of formats. But then later on it should be fairly
easy for them to recognize each other's formats. So, jackaudio would
recognize the jillaudio format "msdos linear pcm" as being identical to
its own "16-bit excess-32768".

Hopefully eventually all audio module writers would get together and
define a set of standard audio formats.

--
Jack Jansen, [EMAIL PROTECTED], http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman
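
A pure-Python sketch of what an overridable equality test could look like.
DataFormat and AudioFormat are hypothetical classes, not the PEP's API, and
the bits/channels values are made up for illustration; the sketch simply
takes the message's premise that the two names denote the same layout.

```python
class DataFormat:
    """Hypothetical stand-in for a data-type object."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties

    def canonical(self):
        # Subclasses override this to decide what "the same format" means.
        return (self.name,) + tuple(sorted(self.properties.items()))

    def __eq__(self, other):
        if not isinstance(other, DataFormat):
            return NotImplemented
        return self.canonical() == other.canonical()

    def __hash__(self):
        return hash(self.canonical())


class AudioFormat(DataFormat):
    def canonical(self):
        # Ignore the module-specific name; compare only the sample layout.
        return tuple(sorted(self.properties.items()))


jack = AudioFormat("16-bit excess-32768", bits=16, channels=2)
jill = AudioFormat("msdos linear pcm", bits=16, channels=2)

print(jack == jill)                                            # True
print(DataFormat("a", bits=16) == DataFormat("b", bits=16))    # False
```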



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Diez B. Roggisch

 ...in the cases I have seen, which includes BMP, TGA, uncompressed TIFF,
 a handful of platform-specific bitmap formats, etc.,  you _always_ get
 them in RGBA order.  If the alpha channel is to be left out, then you
 get them as RGB.

Mac OS X unfortunately uses ARGB. Writing some AltiVec code remedied that for 
passing it around to the OpenCV library.

Just my $.02 

Diez


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Martin v. Löwis wrote:
 Josiah Carlson wrote:
 
One could also toss wxPython, VTK, or any one of the other GUI libraries
into the mix for visualizing those images, of which wxPython just
acquired no-copy display of PIL images, and being able to manipulate
them with numpy (of which some wxPython built in classes use numpy to
speed up manipulation) would be very useful.
 
 
 I'm doubtful that this PEP alone would allow zero-copy sharing of images
 for display. Often, the libraries need the data in a different format.
 So they need to copy, even if they could understand the other format.
 However, the PEP won't allow understanding the format. If I know I
 have an array of 4-byte values: which of them is R, G, B, and A?
 

You give a name to the fields: 'R', 'G', 'B', and 'A'.
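
A minimal sketch of the named-field idea, using only the standard
library's struct module as a stand-in for the PEP's proposed data-type
object. The layout lists and the read_pixel helper are hypothetical
names chosen for illustration; they are not part of the PEP or of NumPy.

```python
import struct
from collections import namedtuple

# Hypothetical field descriptions: (name, struct format code) pairs.
# "B" is one unsigned byte per channel.
RGBA = [("R", "B"), ("G", "B"), ("B", "B"), ("A", "B")]
ARGB = [("A", "B"), ("R", "B"), ("G", "B"), ("B", "B")]


def read_pixel(layout, raw, offset=0):
    """Unpack one pixel from raw bytes according to a named-field layout."""
    names = [name for name, _ in layout]
    fmt = "<" + "".join(code for _, code in layout)
    Pixel = namedtuple("Pixel", names)
    return Pixel(*struct.unpack_from(fmt, raw, offset))


raw = bytes([255, 128, 0, 64])
as_rgba = read_pixel(RGBA, raw)  # first byte is read as R
as_argb = read_pixel(ARGB, raw)  # first byte is read as A
print(as_rgba)
print(as_argb)
```

The same four bytes yield different channel values depending on which
layout accompanies them, which is the information the field names carry.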


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Jim Jewett wrote:
 Travis E. Oliphant wrote:
 
 
Two packages need to share a chunk of memory (the package authors do not
know each other and only have Python as a common reference).  They
both want to describe that the memory they are sharing has some
underlying binary structure.
 
 
 As a quick sanity check, please tell me where I went off track.
 
 it sounds to me like you are assuming that:
 
 (1)  The memory chunk represents a single object (probably an array of
 some sort)
 (2)  That subchunks can themselves be described by a (single?)
 repeating C struct.
 (3)  You can't just use the C header, since you want this at run-time.
 (4)  It would be enough if you could say
 
 This is an array of 500 elements that look like
 
 struct {
   int  simple;
   struct nested {
     char name[30];
     char addr[45];
     int  amount;
   } nested;
 };
 

Sure.  I think that's pretty much it.  I assume you mean object in the 
general sense and not as in (Python object).


 (5)  But is it not acceptable to use Martin's suggested ctypes
 equivalent of (building out from the inside):


Part of the problem is that ctypes uses a lot of different Python types 
(that's what I mean by "multi-object") to accomplish its goal.  What 
I'm looking for is a single Python type that can be passed around and 
explains binary data.
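
For reference, a sketch of how the struct from point (4) might be
described with ctypes. The class names Nested, Record and Records are
made up for this illustration. It shows the "many Python types" aspect:
every description is itself a new Python type.

```python
import ctypes


class Nested(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_char * 30),
        ("addr", ctypes.c_char * 45),
        ("amount", ctypes.c_int),
    ]


class Record(ctypes.Structure):
    _fields_ = [
        ("simple", ctypes.c_int),
        ("nested", Nested),
    ]


Records = Record * 500  # the array description is yet another new type

# Each description is a distinct Python type, not an instance of one type.
distinct_types = {Nested, Record, Records, ctypes.c_int, ctypes.c_char * 30}
print(len(distinct_types))
print(all(isinstance(t, type) for t in distinct_types))
print(ctypes.sizeof(Records) == 500 * ctypes.sizeof(Record))
```

A consumer receiving one of these has to inspect a class (its _fields_,
_type_ and _length_ attributes) rather than read attributes off an
instance of one known type.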

Remember the buffer protocol is in compiled code.  So, as a result,

1) It's harder to construct a class to pass through the protocol using 
the multiple-types approach of ctypes.

2) It's harder to interpret the object received through the buffer 
protocol.

Sure, it would be *possible* to use ctypes, but I think it would be very 
difficult.  Think about how you would write the get_data_format C 
function in the extended buffer protocol for NumPy if you had to import 
ctypes and then build a class just to describe your data.  How would you 
interpret what you get back?

The ctypes format-description approach is not as unified as a single 
Python type object that I'm proposing.

In NumPy, we have a very nice, compact description of complicated data 
already available.  Why not use what we've learned?

I don't think we should just *use ctypes because it's there* when the 
way it describes binary data was not constructed with the extended 
buffer protocol in mind.

The other option, of course, which would not introduce a new Python type 
is to use the array interface specification and pass a list of tuples. 
But, I think this is also un-necessarily wasteful because the sending 
object has to construct it and the receiving object has to de-construct 
it.  The whole point of the (extended) buffer protocol is to communicate 
this information more quickly.



-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Greg Ewing wrote:
 Travis E. Oliphant wrote:
 
 
Greg Ewing wrote:
 
 
What exactly does bit mean in that context?   

Do you mean big ?
 
 
 No, you've got a data type there called bit,
 which seems to imply a size, in contradiction
 to the size-independent nature of the other
 types. I'm asking what size-independent
 information it's meant to convey.

Ah.  I see what you were saying now.   I guess the 'bit' type is 
different (we actually don't have that type in NumPy so my understanding 
of it is limited).

The 'bit' type reinterprets the size information to be in units of bits 
and so implies a bit-field instead of another data-format.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Martin v. Löwis wrote:
 Robert Kern schrieb:
 
As a unification mechanism, I think it is insufficient. I doubt it
can express all the concepts that ctypes supports.

What do you think is missing that can't be added?
 
 
 I can factually only report what is missing. Whether it can be added,
 I don't know. As I just wrote in a few other messages: pointers,
 unions, functions pointers, packed structs, incomplete/recursive
 types. Also flexible array members (i.e. open-ended arrays).
 

I understand function pointers, pointers, and unions.

Function pointers are supported with the void data-type and could be 
more specifically supported if it were important.   People typically 
don't use the buffer protocol to send function-pointers around in a way 
that the void description wouldn't be enough.


Pointers are also supported with the void data-type.  If pointers to 
other data-types were an important feature to support, then this could 
be added in many ways (a flag on the data-type object for example is how 
this is done in NumPy).

Unions are actually supported (just define two fields with the same 
offset).
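
A minimal sketch of the "two fields with the same offset" idea, using
the struct module rather than the PEP's data-type object. The
union_layout dictionary and read_field helper are hypothetical.

```python
import struct

# Hypothetical description: field name -> (struct format, byte offset).
# Both fields start at offset 0, so they overlay the same four bytes,
# which is what a C union of a float and an unsigned int would do.
union_layout = {
    "as_float": ("<f", 0),
    "as_uint": ("<I", 0),
}


def read_field(layout, raw, name):
    """Read one named field out of raw bytes using its format and offset."""
    fmt, offset = layout[name]
    return struct.unpack_from(fmt, raw, offset)[0]


raw = struct.pack("<f", 1.0)
print(read_field(union_layout, raw, "as_float"))
print(hex(read_field(union_layout, raw, "as_uint")))
```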

I don't know what you mean by packed structs (unless you are talking 
about alignment issues in which case there is support for it).

I'm not sure I understand what you mean by incomplete / recursive 
types unless you are referring to something like a node where an element 
of the structure is a pointer to another structure of the same kind 
(like used in linked-lists or trees).  If that is the case, then it's 
easily supported once support for pointers is added.

I also don't know what you mean by open-ended arrays.  The data-format 
is meant to describe a fixed-size chunk of data.

String syntax is not needed to support all of these things.  What I'm 
asking for and proposing is a way to construct an instance of a single 
Python type that communicates this data-format information in a 
standardized way.


-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Martin v. Löwis
Travis Oliphant schrieb:
 Function pointers are supported with the void data-type and could be 
 more specifically supported if it were important.   People typically 
 don't use the buffer protocol to send function-pointers around in a way 
 that the void description wouldn't be enough.

As I said before, I can't tell whether it's important, as I still don't
know what the purpose of this PEP is. If it is to support a unification
of memory layout specifications, and if that unification is also to
include ctypes, then yes, it is important. If it is to describe array
elements in NumArray arrays, then it might not be important.

For the usage of ctypes, the PEP void type is insufficient to describe
function pointers: you also need a specification of the signature of
the function pointer (parameter types and return type), or else you
can't use the function pointer (i.e. you can't call the function).

 Pointers are also supported with the void data-type.  If pointers to 
 other data-types were an important feature to support, then this could 
 be added in many ways (a flag on the data-type object for example is how 
 this is done in NumPy).

For ctypes, (I think) you need true pointers to other layouts, or
else you couldn't set up the memory correctly.

I don't understand how this could work with some extended buffer
protocol, though: would a buffer still have to be a contiguous piece
of memory? If you have structures with pointers in them, they
rarely point to contiguous memory.

 Unions are actually supported (just define two fields with the same 
 offset).

Ah, ok. What's the string syntax for it?

 I don't know what you mean by packed structs (unless you are talking 
 about alignment issues in which case there is support for it).

Yes, this is indeed about alignment; I missed it. What's the string
syntax for it?

 I'm not sure I understand what you mean by incomplete / recursive 
 types unless you are referring to something like a node where an element 
 of the structure is a pointer to another structure of the same kind 
 (like used in linked-lists or trees).  If that is the case, then it's 
 easily supported once support for pointers is added.

That's what I mean, yes. I'm not sure how it can easily be added,
though. Suppose you want to describe

struct item{
  int key;
  char* value;
  struct item *next;
};

How would you do that? Something like

item = datatype([('key', 'i4'), ('value', 'S*'), ('next',
'what_to_put_here*')])

can't work: item hasn't been assigned, yet, so you can't
use it as the field type.
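
For comparison, a sketch of how ctypes itself expresses this
self-referential struct: the class is created first and its fields are
assigned afterwards, once the name exists. The node values are
illustrative only.

```python
import ctypes


class Item(ctypes.Structure):
    """Declared first with no fields so it can refer to itself below."""


Item._fields_ = [
    ("key", ctypes.c_int),
    ("value", ctypes.c_char_p),
    ("next", ctypes.POINTER(Item)),
]

tail = Item(2, b"two")  # next is left as a NULL pointer
head = Item(1, b"one", ctypes.pointer(tail))

# Walk the list until a NULL next pointer is reached.
keys = []
node = head
while True:
    keys.append(node.key)
    if not node.next:  # a NULL pointer is falsy
        break
    node = node.next.contents
print(keys)
```

A single-expression constructor such as the datatype(...) call above
would need some comparable two-step or by-name mechanism.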

 I also don't know what you mean by open-ended arrays.  The data-format 
 is meant to describe a fixed-size chunk of data.

I see. In C (and thus in ctypes), you sometimes have what C99 calls
a "flexible array member":

struct PyString{
  Py_ssize_t ob_refcnt;
  PyObject *ob_type;
  Py_ssize_t ob_len;
  char ob_sval[];
};

where the ob_sval field can extend arbitrarily, as it is the last
member of the struct. Of course, this will give you dynamically-sized
objects (objects in C cannot really be variable-sized, since the
size of a memory block has to be defined at allocation time, and
can't really change afterwards).
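
A small sketch of what reading such a dynamically-sized object involves,
assuming a made-up layout of one 32-bit length field followed by that
many payload bytes. The HEADER format and both helpers are hypothetical;
the point is that the total size is only known after the fixed part has
been read.

```python
import struct

# Hypothetical fixed part: a single little-endian 32-bit length field.
HEADER = struct.Struct("<I")


def pack_blob(payload):
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


def unpack_blob(raw):
    """Read the fixed header, then slice out the variable-length tail."""
    (length,) = HEADER.unpack_from(raw, 0)
    start = HEADER.size
    return raw[start:start + length]


blob = pack_blob(b"hello")
print(len(blob), unpack_blob(blob))
```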

 String syntax is not needed to support all of these things.

Ok. That's confusing in the PEP: it's not clear whether all these
forms are meant to be equivalent, and, if not, which one is the most
generic one, and what aspects are missing in what forms. Also,
if you have a datatype which cannot be expressed in the string
syntax, what is its str attribute?

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Greg Ewing
Travis Oliphant wrote:

 Part of the problem is that ctypes uses a lot of different Python types 
 (that's what I mean by "multi-object") to accomplish its goal.  What 
 I'm looking for is a single Python type that can be passed around and 
 explains binary data.

It's not clear that multi-object is a bad thing in and
of itself. It makes sense conceptually -- if you have
a datatype object representing a struct, and you ask
for a description of one of its fields, which could
be another struct or array, you would expect to get
another datatype object describing that.

Can you elaborate on what would be wrong with this?

Also, can you clarify whether your objection is to
multi-object or multi-type. They're not the same thing --
you could have a data structure built out of multiple
objects that are all of the same Python type, with
attributes distinguishing between struct, array, etc.
That would be single-type but multi-object.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Greg Ewing
Travis Oliphant wrote:

 The 'bit' type reinterprets the size information to be in units of bits 
 and so implies a bit-field instead of another data-format.

Hmmm, okay, but now you've got another orthogonality
problem, because you can't distinguish between e.g.
a 5-bit signed int field and a 5-bit unsigned int
field.

It might be better not to consider bit to be a
type at all, and come up with another way of indicating
that the size is in bits. Perhaps

'i4'   # 4-byte signed int
'i4b'  # 4-bit signed int
'u4'   # 4-byte unsigned int
'u4b'  # 4-bit unsigned int
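
A sketch of how the four spellings above could be parsed. The regular
expression and the parse_spec function are hypothetical; they only cover
the 'i'/'u' kinds and the trailing 'b' suggested here, not the PEP's
full syntax.

```python
import re

# kind ('i' or 'u'), a decimal size, and an optional 'b' meaning "bits".
_SPEC = re.compile(r"^([iu])(\d+)(b?)$")


def parse_spec(spec):
    """Split a type string into signedness, size, and size unit."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unrecognised type string: %r" % (spec,))
    kind, size, unit = match.groups()
    return {
        "signed": kind == "i",
        "size": int(size),
        "unit": "bits" if unit else "bytes",
    }


for spec in ("i4", "i4b", "u4", "u4b"):
    print(spec, parse_spec(spec))
```

Signedness and the size unit come out as independent attributes, so a
5-bit signed field and a 5-bit unsigned field stay distinguishable.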

(Next we can have an argument about whether bit
fields should be packed MSB-to-LSB or vice versa...:-)

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Greg Ewing
Travis Oliphant wrote:

 I'm not sure I understand what you mean by incomplete / recursive 
 types unless you are referring to something like a node where an element 
 of the structure is a pointer to another structure of the same kind 
 (like used in linked-lists or trees).

Yes, and more complex arrangements of types that
reference each other.

  If that is the case, then it's 
 easily supported once support for pointers is added.

But it doesn't fit easily into the single-object
model.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Armin Rigo wrote:
 Hi Travis,
 
 On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote:
 
This PEP proposes adapting the data-type objects from NumPy for
inclusion in standard Python, to provide a consistent and standard
way to discuss the format of binary data. 
 
 
 How does this compare with ctypes?  Do we really need yet another,
 incompatible way to describe C-like data structures in the standard
 library?

There is a lot of subtlety in the details that IMHO clouds the central 
issue which I will try to clarify here the way I see it.


First of all:

In order to make sense of the data-format object that I'm proposing you 
have to see the need to share information about data-format through an 
extended buffer protocol (which I will be proposing soon).  I'm not 
going to try to argue that right now because there are a lot of people 
who can do that.

So, I'm going to assume that you see the need for it.  If you don't, 
then just suspend concern about that for the moment.  There are a lot of 
us who really see the need for it.

Now:

To describe data-formats ctypes uses a Python type-object defined for 
every data-format you might need.

In my view this is an 'over-use' of the type-object and in fact, to be 
useful, requires the definition of a meta-type that carries the relevant 
additions to the type-object that are needed to describe data (like 
function pointers to get data in and out of Python objects).

My view is that it is un-necessary to use a different type object to 
describe each different data-type.

The route I'm proposing is to define (in C) a *single* new Python type 
(called a data-format type) that carries the information needed to 
describe a chunk of memory.

In this way *instances* of this new type define data-formats.

In ctypes *instances* of the meta-type (i.e. new types) define 
data-formats (actually I'm not sure if all the new c-types are derived 
from the same meta-type).

So, the big difference is that I think data-formats should be 
*instances* of a single type.  There is no need to define a Python 
type-object for every single data-type.  In fact, not only is there no 
need, it makes the extended buffer protocol I'm proposing even more 
difficult to use and explain.
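
A toy sketch of the "instances of a single type" idea. The DataFormat
class, its attribute names, and the kind codes are invented for this
illustration and are much simpler than what the PEP proposes; sizes are
computed packed, with no alignment padding.

```python
class DataFormat:
    """Hypothetical single type; every format is an instance of it."""

    def __init__(self, kind, itemsize=0, fields=None):
        self.kind = kind            # e.g. 'i', 'S', or 'V' for a record
        self.fields = fields or []  # list of (name, DataFormat) pairs
        # Records derive their size from their fields (no padding here).
        self.itemsize = itemsize or sum(f.itemsize for _, f in self.fields)


int32 = DataFormat("i", 4)
nested = DataFormat("V", fields=[
    ("name", DataFormat("S", 30)),
    ("addr", DataFormat("S", 45)),
    ("amount", int32),
])
record = DataFormat("V", fields=[("simple", int32), ("nested", nested)])

# Scalars, strings, and nested records all share one Python type.
print(type(int32) is type(nested) is type(record))
print(nested.itemsize, record.itemsize)
```

A record still refers to other description objects for its fields, but
a consumer only ever has to understand one type.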

Again, my real purpose is the extended buffer protocol.  This 
data-format type is a means to that end.  If the consensus is that 
nobody sees a greater use of the data-format type beyond the buffer 
protocol, then I will just write 1 PEP for the extended buffer protocol.


-Travis





Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis Oliphant
Greg Ewing wrote:
 Travis Oliphant wrote:
 
 
Part of the problem is that ctypes uses a lot of different Python types 
(that's what I mean by "multi-object") to accomplish its goal.  What 
I'm looking for is a single Python type that can be passed around and 
explains binary data.
 
 
 It's not clear that multi-object is a bad thing in and
 of itself. It makes sense conceptually -- if you have
 a datatype object representing a struct, and you ask
 for a description of one of its fields, which could
 be another struct or array, you would expect to get
 another datatype object describing that.
 
 Can you elaborate on what would be wrong with this?
 
 Also, can you clarify whether your objection is to
 multi-object or multi-type. They're not the same thing --
 you could have a data structure built out of multiple
 objects that are all of the same Python type, with
 attributes distinguishing between struct, array, etc.
 That would be single-type but multi-object.

I've tried to clarify this in another post.  Basically, what I don't 
like about the ctypes approach is that it is multi-type (every new 
data-format is a Python type).

In order to talk about all these Python types together, then they must 
all share some attribute (or else be derived from a meta-type in C with 
a specific function-pointer entry).

I think it is simpler to think of a single Python type whose instances 
convey information about data-format.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis E. Oliphant
Travis Oliphant wrote:
 Greg Ewing wrote:
 Travis Oliphant wrote:


 Part of the problem is that ctypes uses a lot of different Python types 
 (that's what I mean by "multi-object") to accomplish its goal.  What 
 I'm looking for is a single Python type that can be passed around and 
 explains binary data.

 It's not clear that multi-object is a bad thing in and
 of itself. It makes sense conceptually -- if you have
 a datatype object representing a struct, and you ask
 for a description of one of its fields, which could
 be another struct or array, you would expect to get
 another datatype object describing that.

Yes, exactly.  This is what the Python type I'm proposing does as well. 
   So, perhaps we are misunderstanding each other.  The difference is 
that data-types are instances of the data-type (data-format) object 
instead of new Python types (as they are in ctypes).
 
 I've tried to clarify this in another post.  Basically, what I don't 
 like about the ctypes approach is that it is multi-type (every new 
 data-format is a Python type).
 

I should clarify that I have no opinion about the ctypes approach for 
what ctypes does with it.  I like ctypes and have adapted NumPy to make 
it easier to work with ctypes.

I'm saying that I don't like the idea of forcing this approach on 
everybody else who wants to describe arbitrary binary data just because 
ctypes is included.  Now, if it is shown that it is indeed better than a 
simpler instances-of-a-single-type approach that I'm basically proposing 
  then I'll be persuaded.

However, the existence of an alternative strategy using a single Python 
type and multiple instances of that type to describe binary data (which 
is the NumPy approach and essentially the array module approach) means 
that we can't just a-priori assume that the way ctypes did it is the 
only or best way.

The examples of missing features that Martin has exposed are not 
show-stoppers.  They can all be easily handled within the context of 
what is being proposed.   I can modify the PEP to show this.  But, I 
don't have the time to spend if it's just all going to be rejected in 
the end.  I need some encouragement in order to continue to invest 
energy in pushing this forward.

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis E. Oliphant
Greg Ewing wrote:
 Travis Oliphant wrote:
 
 The 'bit' type reinterprets the size information to be in units of bits 
 and so implies a bit-field instead of another data-format.
 
 Hmmm, okay, but now you've got another orthogonality
 problem, because you can't distinguish between e.g.
 a 5-bit signed int field and a 5-bit unsigned int
 field.

Good point.

 
 It might be better not to consider bit to be a
 type at all, and come up with another way of indicating
 that the size is in bits. Perhaps
 
 'i4'   # 4-byte signed int
 'i4b'  # 4-bit signed int
 'u4'   # 4-byte unsigned int
 'u4b'  # 4-bit unsigned int
 

I like this.  Very nice.  I think that's the right way to look at it.

 (Next we can have an argument about whether bit
 fields should be packed MSB-to-LSB or vice versa...:-)

I guess we need another flag / attribute to indicate that.

The other thing that needs to be discussed at some point may be a way to 
indicate the floating-point format.  I've basically punted on this and 
just taken 'f' to mean the platform float.

Thus, you can't use the data-type object to pass information between two 
platforms that don't share a common floating point representation.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Travis E. Oliphant
M.-A. Lemburg wrote:
 Travis E. Oliphant wrote:
 
 I understand and that's why I'm asking why you made the range
 explicit in the definition.
 

In the case of NumPy it was so that String and Unicode arrays would both 
look like multi-length string character arrays and not arrays of 
arrays of some character.

But, this can change in the data-format object.  I can see that the 
Unicode description needs to be improved.

 The definition should talk about Unicode code points.
 The number of bytes then determines whether you can only
 represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only)
 or UCS4 (4 bytes, all currently assigned code points).

Yes, you are correct.  A string of unicode characters should really be 
represented in the same way that an array of integers is represented for 
a data-format object.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-30 Thread Martin v. Löwis
Travis Oliphant schrieb:
 So, the big difference is that I think data-formats should be 
 *instances* of a single type.

This is nearly the case for ctypes as well. All layout descriptions
are instances of the type type. Nearly, because they are instances
of subtypes of the type type:

py> type(ctypes.c_long)
<type '_ctypes.SimpleType'>
py> type(ctypes.c_double)
<type '_ctypes.SimpleType'>
py> type(ctypes.c_double).__bases__
(<type 'type'>,)
py> type(ctypes.Structure)
<type '_ctypes.StructType'>
py> type(ctypes.Array)
<type '_ctypes.ArrayType'>
py> type(ctypes.Structure).__bases__
(<type 'type'>,)
py> type(ctypes.Array).__bases__
(<type 'type'>,)

So if your requirement is "all layout descriptions ought to have
the same type", then this is (nearly) the case: they are instances
of type (rather than datatype, as in your PEP).
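
The same relationships can be checked programmatically. This sketch
avoids printing the metatype names, since those have changed between
ctypes releases; only the subclass relationships are assumed to hold.

```python
import ctypes

simple_meta = type(ctypes.c_long)
struct_meta = type(ctypes.Structure)
array_meta = type(ctypes.Array)

# All simple types share one metatype ...
print(simple_meta is type(ctypes.c_double))
# ... and every metatype is a subtype of the built-in type,
print(all(issubclass(m, type) for m in (simple_meta, struct_meta, array_meta)))
# so each layout description is an instance of type,
print(isinstance(ctypes.c_long, type))
# but structures and simple types do not share the same metatype.
print(struct_meta is not simple_meta)
```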

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Greg Ewing wrote:
 Nick Coghlan wrote:
 I'd say the answer to where we put it will be dependent on what happens
 to the idea of adding a NumArray style fixed dimension array type to the
 standard library. If that gets exposed through the array module as
 array.dimarray, then it would make sense to expose the associated data
 layout descriptors as array.datatype.
 
 Seem to me that arrays are a sub-concept of binary data,
 not the other way around. So maybe both arrays and data
 types should be in a module called 'binary' or some such.

Yes, very good point.

That's probably one reason I'm proposing the data-type first before the 
array interface in the extended buffer protocol.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Greg Ewing wrote:
 Travis E. Oliphant wrote:
 
 The 'kind' does not specify how big the data-type (data-format) is.
 
 What exactly does bit mean in that context?   

Do you mean "big"?  It's how many bytes the kind is using.

So, 'u4' is a 4-byte unsigned integer and 'u2' is a 2-byte unsigned 
integer.


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Greg Ewing wrote:
 Nick Coghlan wrote:
 
 Greg Ewing wrote:
 
 Also, what if I want to refer to fields by name
 but don't want to have to work out all the offsets
 
 Use the list definition form. With the changes I've 
 suggested above, you wouldn't even have to name the fields you don't 
 care about - just describe them.
 
 That would be okay.
 
 I still don't see a strong justification for having a
 one-big-string form as well as a list/tuple/dict form,
 though.

Compaction of representation is all. It's used quite a bit in numarray, 
   which is where most of the 'kind' names came from as well.   When you 
don't want to name fields it is a really nice feature (but it doesn't 
nest well).

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 What is needed is a definitive way to describe data and then have
 
 array
 struct
 ctypes
 
 all be compatible with that same method.  That's why I'm proposing the 
 PEP.  It's a unification effort not yet-another-method.

As a unification mechanism, I think it is insufficient. I doubt it
can express all the concepts that ctypes supports.

Regards,
Martin



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Greg Ewing wrote:
 Travis E. Oliphant wrote:
 
 How to handle unicode data-formats could definitely be improved. 
 Suggestions are welcome.
 
 'U4*10'  string of 10 4-byte Unicode chars
 

I like that.  Thanks.

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Robert Kern
Martin v. Löwis wrote:
 Travis E. Oliphant schrieb:
 What is needed is a definitive way to describe data and then have

 array
 struct
 ctypes

 all be compatible with that same method.  That's why I'm proposing the 
 PEP.  It's a unification effort not yet-another-method.
 
 As a unification mechanism, I think it is insufficient. I doubt it
 can express all the concepts that ctypes supports.

What do you think is missing that can't be added?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 How to handle unicode data-formats could definitely be improved. 

As before, I'm doubtful what the actual needs are. For example, is
it desired to support generation of ID3v2 tags with such a data
format? The tag is specified here:

http://www.id3.org/id3v2.4.0-structure.txt

In ID3v1, text fields have a specified width, and are supposed
to be encoded in Latin-1, and padded with zero bytes.

In ID3v2, text fields start with an encoding declaration
(say, \x03 for UTF-8), then followed with a null-terminated
sequence of UTF-8 bytes.

Is it the intent of this PEP to support such data structures,
and allow the user to fill in a Unicode object, and then the
processing is automatic? (i.e. in ID3v1, the string gets
automatically Latin-1-encoded and zero-padded, in ID3v2, it
gets automatically UTF-8 encoded, and null-terminated)
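
The two encodings described above can be sketched as follows. The
function names are hypothetical, and the default width of 30 is only an
example of a fixed field width; this illustrates the question being
asked, not behaviour the PEP promises.

```python
def encode_id3v1_text(text, width=30):
    """Latin-1 encode and zero-pad to a fixed field width."""
    data = text.encode("latin-1")
    if len(data) > width:
        raise ValueError("text does not fit in the fixed-width field")
    return data.ljust(width, b"\x00")


def encode_id3v2_text(text):
    """Encoding declaration byte, UTF-8 bytes, then a null terminator."""
    return b"\x03" + text.encode("utf-8") + b"\x00"


print(encode_id3v1_text("abc", width=5))
print(encode_id3v2_text("\u00e9"))
```

The first form has a size known in advance; the second has a length
that depends on the text, so it is not a fixed-size chunk of data.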

If that is not to be supported, what are the use cases?

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant schrieb:
 What is needed is a definitive way to describe data and then have

 array
 struct
 ctypes

 all be compatible with that same method.  That's why I'm proposing the 
 PEP.  It's a unification effort not yet-another-method.
 
 As a unification mechanism, I think it is insufficient. I doubt it
 can express all the concepts that ctypes supports.
 

Please clarify what you mean.

Are you saying that a single object can't carry all the information 
about binary data that ctypes allows with its multi-object approach?

I don't agree with you, if that is the case.  Sure, perhaps I've not 
included certain cases, so give an example.

Besides, I don't think this is the right view of unification.  I'm not 
saying that ctypes should get rid of its many objects used for 
interfacing with C-functions.

I'm saying we should introduce a single-object mechanism for describing 
binary data so that the many-object approach of c-types does not become 
some kind of de-facto standard.  C-types can translate this 
object-instance to its internals if and when it needs to.

In the mean-time, how are other packages supposed to communicate binary 
information about data with each other?

Remember the context that the data-format object is presented in.  Two 
packages need to share a chunk of memory (the package authors do not 
know each other and only have Python as a common reference).  They 
both want to describe that the memory they are sharing has some 
underlying binary structure.

How do they do that? Please explain to me how the buffer protocol can be 
extended so that information about what is in the memory can be shared 
without a data-format object?

-Travis




Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Travis E. Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant wrote:
 How to handle unicode data-formats could definitely be improved. 
 
 As before, I'm doubtful what the actual needs are. For example, is
 it desired to support generation of ID3v2 tags with such a data
 format? The tag is specified here:
 

Perhaps I was not clear enough about what I'm trying to do.   For a long 
time a lot of people have wanted something like Numeric in Python 
itself.  There have been many hurdles to that goal.

After discussions at SciPy 2006 with Guido, we decided that the best way 
to proceed at this point was to extend the buffer protocol to allow 
packages to share array-like information with each other.

There are several things missing from the buffer protocol that NumPy 
needs in order to be able to really understand the (fixed-size) memory 
another package has allocated and is sharing.

The most important of these are

1) Shape information
2) Striding information
3) Data-format information  (how is each element perceived).

Shape and striding information can be shared with a C-array of integers.
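
As a rough illustration of why shape and strides fit in plain integer arrays, the sketch below derives C-contiguous (row-major) strides from a shape and an item size, then locates one element. The helper names `c_contiguous_strides` and `byte_offset` are made up for this example and are not part of NumPy or the PEP; the 8-byte item size is likewise an assumption.

```python
def c_contiguous_strides(shape, itemsize):
    """Return byte strides for a row-major block with the given shape."""
    strides = []
    step = itemsize
    # Walk the dimensions from last to first: the last axis is contiguous.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Return the byte offset of the element at a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


shape = (3, 2)                                      # three rows of two elements
strides = c_contiguous_strides(shape, itemsize=8)   # 8-byte floats assumed
print(strides)                         # (16, 8)
print(byte_offset((2, 1), strides))    # 40
```

Nothing here says what the 8 bytes at each offset mean, which is the gap the data-format question is about.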

How is data-format information supposed to be shared?

We've come up with a very flexible way to do this in NumPy using a 
single Python object.  This Python object supports describing the layout 
of any fixed-size chunk of memory (right now in units of bytes --- bit 
fields could be added, though).

I'm proposing to add this object to Python so that the buffer protocol 
has a fast and efficient way to share #3.   That's really all I'm after.

It also bothers me that so many ways to describe binary data are being 
used out there.  This is a problem that deserves being solved.  And, no, 
ctypes hasn't solved it (we can't directly use the ctypes solution). 
Perhaps this PEP doesn't hit all the corners, but a data-format object 
*is* a useful thing to consider.

The array object in Python already has a PyArray_Descr * structure that 
is a watered-down version of what I'm talking about.   In fact, this is 
what Numeric built from (or vice-versa actually).  And NumPy has greatly 
enhanced this object for any conceivable structure.

Guido seemed to think the data-type objects were nice when he saw them 
at SciPy 2006, and so I'm presenting a PEP.

Without the data-format object, I don't know how to extend the buffer 
protocol to communicate data-format information.  Do you have a better 
idea?

I have no trouble limiting the data-type object to the buffer protocol 
extension PEP, but I do think it could gain wider use.

 
 Is it the intent of this PEP to support such data structures,
 and allow the user to fill in a Unicode object, and then the
 processing is automatic? (i.e. in ID3v1, the string gets
 automatically Latin-1-encoded and zero-padded, in ID3v2, it
 gets automatically UTF-8 encoded, and null-terminated)


No, the point of the data-format object is to communicate information 
about data-formats not to encode or decode anything.   Users of the 
data-format object could decide what they wanted to do with that 
information.   We just need a standard way to communicate it through the 
buffer protocol.
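
To make that division of labour concrete: under this reading, a format description would only say something like "this field is 30 bytes of char", and the consuming code would still do the encoding itself. A minimal sketch, assuming a 30-byte Latin-1 field in the spirit of the ID3v1 case from the quoted question; `pack_fixed_latin1` is a hypothetical helper, not something the PEP defines.

```python
def pack_fixed_latin1(text, width):
    """Encode text as Latin-1 and zero-pad (or truncate) it to width bytes."""
    raw = text.encode("latin-1")[:width]
    return raw.ljust(width, b"\x00")


field = pack_fixed_latin1("Caf\u00e9", 30)
print(len(field))    # 30
print(field[:5])     # b'Caf\xe9\x00'
```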

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Travis E. Oliphant wrote:
 I'm proposing to add this object to Python so that the buffer protocol 
 has a fast and efficient way to share #3.   That's really all I'm after.

I admit that I don't understand this objective. Why is it desirable to
support such an extended buffer protocol? What specific application
would be made possible if it was available and implemented in the
relevant modules and data types? What are the relevant modules and data
types that should implement it?

 It also bothers me that so many ways to describe binary data are being 
 used out there.  This is a problem that deserves being solved.  And, no, 
 ctypes hasn't solved it (we can't directly use the ctypes solution). 
 Perhaps this PEP doesn't hit all the corners, but a data-format object 
 *is* a useful thing to consider.

IMO, it is only useful if it realistically can support all the use cases
that it intends to support. If this PEP is about defining the elements
of arrays, I doubt it can realistically support everything you can
express in ctypes. There is no support for pointers (except for
PyObject*), no support for incomplete (recursive) types, no support
for function pointers, etc.

Vice versa: why exactly can't you use the data type system of ctypes?
If I want to say int[10], I do

py> ctypes.c_long * 10
<class '__main__.c_long_Array_10'>

To rewrite the examples from the PEP:

datatype(float) => ctypes.c_double
datatype(int)   => ctypes.c_long
datatype((int, 5)) => ctypes.c_long * 5
datatype((float, (3,2))) => (ctypes.c_double * 3) * 2

struct {
  int  simple;
  struct nested {
   char name[30];
   char addr[45];
   int  amount;
  }
}
=>
py> from ctypes import *
py> class nested(Structure):
...     _fields_ = [("name", c_char*30), ("addr", c_char*45),
...                 ("amount", c_long)]
...
py> class struct(Structure):
...     _fields_ = [("simple", c_int), ("nested", nested)]
...
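
For readers who want to run the comparison, here is a self-contained version of the ctypes spelling above that also asks ctypes for the sizes and offsets it computed. The class names `Nested` and `Outer` are chosen for this sketch, and `c_int` is used for both integer fields to match the C declaration. The exact amount of padding is platform-dependent, so the comments avoid quoting it.

```python
import ctypes
from ctypes import Structure, c_char, c_int


class Nested(Structure):
    _fields_ = [("name", c_char * 30),
                ("addr", c_char * 45),
                ("amount", c_int)]


class Outer(Structure):
    _fields_ = [("simple", c_int),
                ("nested", Nested)]


# An array type is built by multiplying an element type by a length.
IntArray10 = c_int * 10
print(ctypes.sizeof(IntArray10) == 10 * ctypes.sizeof(c_int))   # True

# ctypes works out field offsets (including alignment padding) itself.
print(Nested.name.offset, Nested.addr.offset)   # 0 30
print(Nested.amount.offset >= 75)               # True
print(ctypes.sizeof(Outer) >= ctypes.sizeof(c_int) + ctypes.sizeof(Nested))  # True
```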

 Guido seemed to think the data-type objects were nice when he saw them 
 at SciPy 2006, and so I'm presenting a PEP.

I have no objection to including NumArray as-is into Python. I just
wonder where the rationale for this PEP comes from, i.e. why do you
need to exchange this information across different modules?

 Without the data-format object, I don't know how to extend the buffer 
 protocol to communicate data-format information.  Do you have a better 
 idea?

See above: I can't understand where the need for an extended buffer
protocol comes from. I can see why NumArray needs reflection, and
needs to keep information to interpret the bytes in the array.
But why is it important that the same information is exposed by
other data types?

 Is it the intent of this PEP to support such data structures,
 and allow the user to fill in a Unicode object, and then the
 processing is automatic? (i.e. in ID3v1, the string gets
 automatically Latin-1-encoded and zero-padded, in ID3v2, it
 gets automatically UTF-8 encoded, and null-terminated)

 
 No, the point of the data-format object is to communicate information 
 about data-formats not to encode or decode anything.   Users of the 
 data-format object could decide what they wanted to do with that 
 information.   We just need a standard way to communicate it through the 
 buffer protocol.

This was actually a different sub-thread: why do you need to support
the 'U' code (or the 'S' code, for that matter)? In what application
do you have fixed size Unicode arrays, as opposed to Unicode strings?

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Travis E. Oliphant wrote:
 As a unification mechanism, I think it is insufficient. I doubt it
 can express all the concepts that ctypes supports.

 
 Please clarify what you mean.
 
 Are you saying that a single object can't carry all the information 
 about binary data that ctypes allows with its multi-object approach?

I'm not sure what you mean by single object. If I use the tuple
syntax, e.g.

datatype((float, (3,2)))

There are also multiple objects (the float, the 3, and the 2). You
get a single root object back, but so do you in ctypes.

But this isn't really what I meant. Instead, I think the PEP lacks
various concepts from C data types, such as pointers, unions,
function pointers, alignment/packing.
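
Two of the concepts in that list, unions and packing, can be shown with a short ctypes sketch. The class names and field choices are invented for illustration; the point is only that the same field list can describe different memory layouts depending on information that a plain list of element types does not carry.

```python
import ctypes
from ctypes import Structure, Union, c_float, c_uint8, c_uint32


class Natural(Structure):
    # Default layout: C-style alignment rules insert padding after "tag".
    _fields_ = [("tag", c_uint8), ("value", c_uint32)]


class Packed(Structure):
    _pack_ = 1          # no padding between fields
    _fields_ = [("tag", c_uint8), ("value", c_uint32)]


class IntOrFloat(Union):
    # Both members share the same four bytes.
    _fields_ = [("as_int", c_uint32), ("as_float", c_float)]


print(ctypes.sizeof(Natural))      # larger than the packed variant
print(ctypes.sizeof(Packed))       # 5
print(ctypes.sizeof(IntOrFloat))   # 4

u = IntOrFloat()
u.as_float = 1.0
print(hex(u.as_int))               # 0x3f800000
```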

 In the meantime, how are other packages supposed to communicate binary 
 information about data with each other?

This is my other question. Why should they?

 Remember the context that the data-format object is presented in.  Two 
 packages need to share a chunk of memory (the package authors do not 
 know each other and only have Python as a common reference).  They 
 both want to describe that the memory they are sharing has some 
 underlying binary structure.

Can you please give an example of such two packages, and an application
that needs them to share data?

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Robert Kern wrote:
 As a unification mechanism, I think it is insufficient. I doubt it
 can express all the concepts that ctypes supports.
 
 What do you think is missing that can't be added?

I can factually only report what is missing. Whether it can be added,
I don't know. As I just wrote in a few other messages: pointers,
unions, function pointers, packed structs, incomplete/recursive
types. Also flexible array members (i.e. open-ended arrays).
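
The flexible-array case is the easiest one to illustrate: the size of the tail is only known after part of the data has been read, so no single fixed-size description covers the whole record. A sketch with the stdlib struct module, using an invented wire format (a 2-byte big-endian length followed by that many payload bytes); it is not a format from the PEP or from any real protocol.

```python
import struct

HEADER = struct.Struct("!H")   # hypothetical 2-byte big-endian length prefix


def parse_record(data):
    """Split a length-prefixed record into its payload and the remaining bytes."""
    (length,) = HEADER.unpack_from(data, 0)
    start = HEADER.size
    end = start + length
    if end > len(data):
        raise ValueError("record is truncated")
    return data[start:end], data[end:]


payload, rest = parse_record(b"\x00\x03abcXYZ")
print(payload, rest)    # b'abc' b'XYZ'
```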

While it may be possible to come up with a string syntax to describe
all these things (*), I wonder whether it should be done, and whether
NumArray can then support this extended data model.

Regards,
Martin

(*) perhaps with the exception of incomplete types: C needs forward
references in its own syntax.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Edward C. Jones
Travis E. Oliphant wrote:
  It also bothers me that so many ways to describe binary data are
  being used out there.  This is a problem that deserves being solved.

Is there a survey paper somewhere about binary formats? What formats are 
used in particle physics, bio-informatics, astronomy, etc? What software 
is used to read and write binary data? What descriptive languages are 
used for data (SQL, XML, etc)?


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Neal Becker
I have watched numpy with interest for a long time.  My own interest is to
possibly use the c-api to wrap c++ algorithms to use from python.

One thing that has concerned me, and continues to concern me with this
proposal, is that it seems to suffer from a very fat interface.  I
certainly have not studied the options in any depth, but my gut feeling is
that the interface is too fat and too complex.  I wonder if it's possible
to avoid this.  I wonder if this is an example of all the methods sinking
to the base class.



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Paul Moore
On 10/29/06, Martin v. Löwis wrote:
 Travis E. Oliphant wrote:
  Remember the context that the data-format object is presented in.  Two
  packages need to share a chunk of memory (the package authors do not
  know each other and only have Python as a common reference).  They
  both want to describe that the memory they are sharing has some
  underlying binary structure.

 Can you please give an example of such two packages, and an application
 that needs them to share data?

Here's an example. PIL handles images (in various formats) in memory,
as blocks of binary image data. NumPy provides methods for
manipulating in-memory blocks of data. Now, if I want to use NumPy to
manipulate that data in place (for example, to cap the red component
at 128, and equalise the range of the green component) my code needs
to know the format of the memory block that PIL exposes. I am assuming
that in-place manipulation is better, because there is no need for
repeated copies of the data to be made (this would be true for large
images).

If PIL could expose a descriptor for its data structure, NumPy code
could manipulate it in place without fear of corrupting it. Of course,
this can be done by the end user reading the PIL documentation and
transcribing the documented format into the NumPy code. But I would
argue that it's better if the PIL block is self-describing in a way
that avoids the need for a manual transcription of the format.
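
A pure-Python sketch of the in-place idea, with a bytearray standing in for memory owned by the imaging library. The layout facts passed to `cap_channel` (three interleaved 8-bit channels, red first) are assumptions written in by hand here; they are exactly the kind of information a self-describing block would supply. `cap_channel` is a hypothetical helper, not a PIL or NumPy function.

```python
def cap_channel(view, pixel_size, channel, limit):
    """Clamp one channel of an interleaved 8-bit pixel buffer, in place."""
    for i in range(channel, len(view), pixel_size):
        if view[i] > limit:
            view[i] = limit


# Stand-in for memory owned by an imaging library: two RGB pixels.
pixels = bytearray([200, 10, 20,   100, 30, 40])
view = memoryview(pixels)          # shares the memory, no copy

cap_channel(view, pixel_size=3, channel=0, limit=128)
print(list(pixels))     # [128, 10, 20, 100, 30, 40]
```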

To do this *without* needing the PIL and NumPy developers to
co-operate needs an independent standard, which is what I assume this
PEP is intended to provide.

Paul.


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Josiah Carlson

Paul Moore wrote:
 On 10/29/06, Martin v. Löwis wrote:
  Travis E. Oliphant wrote:
   Remember the context that the data-format object is presented in.  Two
   packages need to share a chunk of memory (the package authors do not
   know each other and only have Python as a common reference).  They
   both want to describe that the memory they are sharing has some
   underlying binary structure.
 
  Can you please give an example of such two packages, and an application
  that needs them to share data?
 
 To do this *without* needing the PIL and NumPy developers to
 co-operate needs an independent standard, which is what I assume this
 PEP is intended to provide.

One could also toss wxPython, VTK, or any one of the other GUI libraries
into the mix for visualizing those images.  wxPython just acquired
no-copy display of PIL images, and being able to manipulate them with
numpy (some wxPython built-in classes already use numpy to speed up
manipulation) would be very useful.

Of all of the intended uses, I'd say that zero-copy sharing of
information on the graphics/visualization front is the most immediate
'people will be using it tomorrow' feature.


I personally don't have my pulse on the Scientific Python community, so
I don't know about other uses, but in regards to Martin's list of
missing features: "pointers, unions, function pointers,
alignment/packing" [, etc.] I'm going to go out on a limb and say for
the majority of those YAGNI, or really, NOHAFIAFACT (no one has asked
for it, as far as I can tell).  Someone who knows the scipy community,
feel free to correct me.


 - Josiah



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Paul Moore wrote:
 Here's an example. PIL handles images (in various formats) in memory,
 as blocks of binary image data. NumPy provides methods for
 manipulating in-memory blocks of data. Now, if I want to use NumPy to
 manipulate that data in place (for example, to cap the red component
 at 128, and equalise the range of the green component) my code needs
 to know the format of the memory block that PIL exposes. I am assuming
 that in-place manipulation is better, because there is no need for
 repeated copies of the data to be made (this would be true for large
 images).

Thanks, that looks like a good example. Is it possible to elaborate
that? E.g. what specific image format would I use (could that work
for jpeg, even though this format has compression in it), and
what specific NumPy routines would I use to implement the capping
and equalising? What would the datatype description look like that
those tools need to exchange?

Looking at this in more detail, PIL in-memory images (ImagingCore
objects) either have the image8 UINT8**, or the image32 INT32**;
they have separate fields for pixelsize and linesize. In the image8
case, there are three options:
- each value is an 8-bit integer (IMAGING_TYPE_UINT8) (1)
- each value is a 16-bit integer, either little (2) or big endian (3)
  (IMAGING_TYPE_SPECIAL, mode either I;16 or I;16B)
In the image32 case, there are five options:
- two 8-bit values per four bytes, namely byte 0 and byte 3 (4)
- three 8-bit values (bytes 0, 1, 2) (5)
- four 8-bit values (6)
- a single 32-bit int (7)
- a single 32-bit float (8)

Now, what would be the algorithm in NumPy that I could use to
implement capping and equalising?

 If PIL could expose a descriptor for its data structure, NumPy code
 could manipulate it in place without fear of corrupting it. Of course,
 this can be done by the end user reading the PIL documentation and
 transcribing the documented format into the NumPy code. But I would
 argue that it's better if the PIL block is self-describing in a way
 that avoids the need for a manual transcription of the format.

Without digging further, I think some of the formats simply don't allow
for the kind of manipulation you suggest, namely all palette formats
(which are the single-valued ones, plus the two-band version with
 a palette number and an alpha value), and greyscale images. So
in any case, the application has to look at the mode of the image to
find out whether the operation is even meaningful. And then, the
application has to tell NumPy somehow what fields to operate on.

 To do this *without* needing the PIL and NumPy developers to
 co-operate needs an independent standard, which is what I assume this
 PEP is intended to provide.

Ok, I now understand the goal, although I would still like to
understand this use case better.

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Martin v. Löwis
Josiah Carlson wrote:
 One could also toss wxPython, VTK, or any one of the other GUI libraries
 into the mix for visualizing those images, of which wxPython just
 acquired no-copy display of PIL images, and being able to manipulate
 them with numpy (of which some wxPython built in classes use numpy to
 speed up manipulation) would be very useful.

I'm doubtful that this PEP alone would allow zero-copy sharing of images
for display. Often, the libraries need the data in a different format.
So they need to copy, even if they could understand the other format.
However, the PEP won't allow understanding the format. If I know I
have an array of 4-byte values: which of them is R, G, B, and A?

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Josiah Carlson

Martin v. Löwis wrote:
 Josiah Carlson wrote:
  One could also toss wxPython, VTK, or any one of the other GUI libraries
  into the mix for visualizing those images, of which wxPython just
  acquired no-copy display of PIL images, and being able to manipulate
  them with numpy (of which some wxPython built in classes use numpy to
  speed up manipulation) would be very useful.
 
 I'm doubtful that this PEP alone would allow zero-copy sharing of images
 for display. Often, the libraries need the data in a different format.
 So they need to copy, even if they could understand the other format.
 However, the PEP won't allow understanding the format. If I know I
 have an array of 4-byte values: which of them is R, G, B, and A?

...in the cases I have seen, which includes BMP, TGA, uncompressed TIFF,
a handful of platform-specific bitmap formats, etc.,  you _always_ get
them in RGBA order.  If the alpha channel is to be left out, then you
get them as RGB.

The trick with allowing zero-copy sharing is 1) to understand the format,
and 2) to manipulate/display in-place.  The former is necessary for the
latter, which is what Travis is shooting for.  Also, because wxPython
has figured out how PIL images are structured, they can do #2, and so
far no one has mentioned any examples where the standard RGB/RGBA format
hasn't worked for them.

In the case of jpegs (as you mentioned in another message), PIL
uncompresses all images it understands into some kind of 'natural'
format (from what I understand). For 24/32 bit images, that is RGB or
RGBA. For palettized images (gif, 8-bit png, 8-bit bmp, etc.) maybe it
is a palettized format, or maybe it is RGB/RGBA?  I don't know, all of
my images are 24/32 bit, but I can just about guarantee it's not an
issue for the case that Paul mentioned.


 - Josiah



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Greg Ewing
Travis E. Oliphant wrote:

 Greg Ewing wrote:

What exactly does "bit" mean in that context?   
 
 Do you mean "big" ?

No, you've got a data type there called "bit",
which seems to imply a size, in contradiction
to the size-independent nature of the other
types. I'm asking what size-independent
information it's meant to convey.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Greg Ewing
Travis E. Oliphant wrote:
 Martin v. Löwis wrote:
 
Travis E. Oliphant wrote:

Is it the intent of this PEP to support such data structures,
and allow the user to fill in a Unicode object, and then the
processing is automatic?

 No, the point of the data-format object is to communicate information 
 about data-formats not to encode or decode anything.

Well, there's still the issue of how much detail you
want to be able to convey, so I think the question
is valid. Is the encoding of a Unicode string something
we want to be able to communicate via this mechanism,
or is that outside its scope?

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-29 Thread Greg Ewing
Josiah Carlson wrote:

 ...in the cases I have seen ... you _always_ get
 them in RGBA order.

Except when you don't. I've had cases where I've had to
convert between RGBA and BGRA (for stuffing directly into
a frame buffer on Linux, as far as I remember).

So it may be worth including some features in the standard
for describing pixel formats.
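
The conversion mentioned above is small once the channel order is known, which is the argument for recording that order somewhere. A minimal sketch on a packed 4-bytes-per-pixel bytearray; `swap_red_blue` is a hypothetical helper, and the sample buffer is made up.

```python
def swap_red_blue(buf):
    """Swap channels 0 and 2 of a packed 4-bytes-per-pixel buffer, in place."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    # The right-hand side is copied out before either slice is written.
    buf[0::4], buf[2::4] = buf[2::4], buf[0::4]


frame = bytearray([1, 2, 3, 255,   10, 20, 30, 255])   # two RGBA pixels
swap_red_blue(frame)
print(list(frame))      # [3, 2, 1, 255, 30, 20, 10, 255]
```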

Pygame seems to have a very detailed and flexible system
for doing this, so it might be a good idea to have a
look at that.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Travis E. Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant wrote:
 The datatype is an object that specifies how a certain block of
 memory should be interpreted as a basic data-type. 

datatype(float)
   datatype('float64')
 
 I can't speak on the specific merits of this proposal, or whether this
 kind of functionality is desirable. However, I'm -1 on the addition of
 a builtin for this functionality (the PEP doesn't actually say that
 there is another builtin, but the examples suggest so).

I was intentionally vague.  I don't see a need for it to be a built-in, 
but didn't know where exactly to put it.  I should have made it a 
question for discussion.

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Travis E. Oliphant
Greg Ewing wrote:
 Travis E. Oliphant wrote:
 PEP: unassigned
 Title: Adding data-type objects to the standard library
 
 Not sure about having 3 different ways to specify
 the structure -- it smacks of Too Many Ways To Do
 It to me.

You might be right, but they all have use-cases.  I've actually removed 
most of the multiple ways that NumPy allows for creating data-types.

 
 Also, what if I want to refer to fields by name
 but don't want to have to work out all the offsets

I don't know what you mean.   You just use the list-style to define a 
data-format with fields.  The offsets are worked out for you.   The only 
use for offsets was the dictionary form.  The dictionary form stems from 
a desire to use the fields dictionary of a data-type as a data-type 
specification (which it essentially is).
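
The relationship between the two forms can be sketched as follows: a list of (name, format) pairs is enough to derive every offset, and the resulting name-to-(offset, format) mapping is roughly what a dictionary form would spell out by hand. The sketch borrows struct-module codes and assumes tightly packed fields; `layout` is a hypothetical helper, not the PEP's constructor.

```python
import struct


def layout(fields):
    """Return ({name: (offset, code)}, total_size) for packed fields in order."""
    offsets = {}
    position = 0
    for name, code in fields:
        offsets[name] = (position, code)
        # '=' selects standard sizes with no alignment padding.
        position += struct.calcsize("=" + code)
    return offsets, position


spec = [("simple", "i"), ("name", "30s"), ("addr", "45s"), ("amount", "i")]
offsets, size = layout(spec)
print(offsets["addr"])   # (34, '45s')
print(size)              # 83
```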

-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread M.-A. Lemburg
Travis E. Oliphant wrote:
 
 
 
 
 PEP: unassigned
 Title: Adding data-type objects to the standard library
   Attributes
 
  kind  --  returns the basic kind of the data-type. The basic kinds
  are:
't' - bit, 
'b' - bool, 
'i' - signed integer, 
'u' - unsigned integer,
'f' - floating point,  
'c' - complex floating point, 
'S' - string (fixed-length sequence of char),
'U' - fixed length sequence of UCS4,

Shouldn't this read "fixed length sequence of Unicode" ?!
The underlying code unit format (UCS2 and UCS4) depends on the
Python version.

'O' - pointer to PyObject,
'V' - Void (anything else).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Armin Rigo
Hi Travis,

On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote:
 This PEP proposes adapting the data-type objects from NumPy for
 inclusion in standard Python, to provide a consistent and standard
 way to discuss the format of binary data. 

How does this compare with ctypes?  Do we really need yet another,
incompatible way to describe C-like data structures in the standard
library?


A bientot,

Armin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Josiah Carlson

M.-A. Lemburg wrote:
 
 Travis E. Oliphant wrote:
  M.-A. Lemburg wrote:
  Travis E. Oliphant wrote:
  
 
  PEP: unassigned
  Title: Adding data-type objects to the standard library
Attributes
 
   kind  --  returns the basic kind of the data-type. The basic 
  kinds
   are:
 't' - bit, 
 'b' - bool, 
 'i' - signed integer, 
 'u' - unsigned integer,
 'f' - floating point,  
 'c' - complex floating point, 
 'S' - string (fixed-length sequence of char),
 'U' - fixed length sequence of UCS4,
  Shouldn't this read fixed length sequence of Unicode ?!
  The underlying code unit format (UCS2 and UCS4) depends on the
  Python version.
  
  Well, in NumPy 'U' always means UCS4.  So, I just copied that over.  See 
  my questions at the bottom which talk about how to handle this.  A 
  data-format does not necessarily have to correspond to something Python 
  represents with an Object.
 
 Ok, but why are you being specific about UCS4 (which is an internal
 storage format), while you are not specific about e.g. the
 internal bit size of the integers (which could be 32 or 64 bit) ?

I think that even on 64 bit platforms, using 'int' or 'long' generally
means 32 bit.  In order to get 64 bit ints, one needs to use 'long long'. 
Sharing some of the codes with the struct module, though arbitrary,
doesn't seem like a bad idea to me.  Of course offering specifically 32
and 64 bit ints would make sense to me.
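
The struct module already distinguishes the two situations discussed here, which makes it a convenient way to check sizes on a given machine. With a byte-order prefix the sizes are fixed by the module; without one they follow the C compiler, and the native size of 'l' is the one that varies (typically 8 bytes on 64-bit Linux and macOS, 4 on 64-bit Windows). The snippet only prints the native values rather than assuming them.

```python
import struct

# Standard sizes (selected by a byte-order prefix) are fixed by the struct module.
for code in ("<i", "<l", "<q"):
    print(code, struct.calcsize(code))   # 4, 4 and 8 bytes respectively

# Native sizes follow the C compiler; 'l' is the one that differs across platforms.
print("native int:", struct.calcsize("i"))
print("native long:", struct.calcsize("l"))
print("native long long:", struct.calcsize("q"))
```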

 - Josiah



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Travis E. Oliphant
M.-A. Lemburg wrote:
 Travis E. Oliphant wrote:
 M.-A. Lemburg wrote:
 Travis E. Oliphant wrote:
 

 PEP: unassigned
 Title: Adding data-type objects to the standard library
   Attributes

  kind  --  returns the basic kind of the data-type. The basic 
 kinds
  are:
't' - bit, 
'b' - bool, 
'i' - signed integer, 
'u' - unsigned integer,
'f' - floating point,  
'c' - complex floating point, 
'S' - string (fixed-length sequence of char),
'U' - fixed length sequence of UCS4,
 Shouldn't this read fixed length sequence of Unicode ?!
 The underlying code unit format (UCS2 and UCS4) depends on the
 Python version.
 Well, in NumPy 'U' always means UCS4.  So, I just copied that over.  See 
 my questions at the bottom which talk about how to handle this.  A 
 data-format does not necessarily have to correspond to something Python 
 represents with an Object.
 
 Ok, but why are you being specific about UCS4 (which is an internal
 storage format), while you are not specific about e.g. the
 internal bit size of the integers (which could be 32 or 64 bit) ?
 

The 'kind' does not specify how big the data-type (data-format) is.  A 
number is needed to represent the number of bytes, so you can have 
'u1', 'u2', 'u4', etc.

The same is true with Unicode.  You can have 10-character unicode 
elements, 20-character, etc.  But, we have to be clear about what a 
character is in the data-format.
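
A small sketch of the kind-plus-number convention, assuming (as stated earlier in the thread for NumPy) that a 'U' count is in UCS4 characters of 4 bytes each while other kinds count bytes directly. `parse_typestring` and the `UNIT_BYTES` table are hypothetical names for this example, not part of the PEP.

```python
import re

# Assumed bytes per counted unit for each kind; 'U' counts UCS4 characters.
UNIT_BYTES = {"U": 4}


def parse_typestring(code):
    """Split a code such as 'u4' or 'U10' into (kind, count, total_bytes)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)", code)
    if match is None:
        raise ValueError("expected a kind letter followed by a number")
    kind, count = match.group(1), int(match.group(2))
    return kind, count, count * UNIT_BYTES.get(kind, 1)


print(parse_typestring("u4"))    # ('u', 4, 4)
print(parse_typestring("U10"))   # ('U', 10, 40)
print(len("abc".encode("utf-32-le")))   # 12
```

If 'U' meant "whatever this Python build uses internally", the byte count for 'U10' could not be computed from the code alone, which is why the character width has to be pinned down.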

-Travis






Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Travis E. Oliphant
Armin Rigo wrote:
 Hi Travis,
 
 On Fri, Oct 27, 2006 at 02:05:31PM -0600, Travis E. Oliphant wrote:
 This PEP proposes adapting the data-type objects from NumPy for
 inclusion in standard Python, to provide a consistent and standard
 way to discuss the format of binary data. 
 
 How does this compare with ctypes?  Do we really need yet another,
 incompatible way to describe C-like data structures in the standard
 library?

Part of what the data-type, data-format object is trying to do is bring 
together all the disparate ways to represent data that *already* exist 
in the standard library.

What is needed is a definitive way to describe data and then have

array
struct
ctypes

all be compatible with that same method.  That's why I'm proposing the 
PEP.  It's a unification effort, not yet another method.  One of the big 
reasons for it is to move something like the array interface into 
Python.  There are tens to hundreds of people, mostly in the scientific 
computing community, who want to see Python grow more support for 
NumPy-like things.  I keep getting requests to do something to make 
Python more aware of arrays.   This PEP is part of that effort.
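
As a rough illustration of the overlap, the sketch below describes the 
same C double three times, once in the vocabulary of each of the three 
modules. It uses only standard-library calls; the variable names are 
chosen for this example.

```python
import array
import ctypes
import struct

# Three stdlib spellings of "one C double".
as_array = array.array('d', [1.5])      # typecode 'd'
as_struct = struct.pack('d', 1.5)       # format character 'd', native order
as_ctypes = ctypes.c_double(1.5)        # a dedicated type object

sizes = {
    'array': as_array.itemsize,
    'struct': struct.calcsize('d'),
    'ctypes': ctypes.sizeof(ctypes.c_double),
}
print(sizes)

# All three hold the same bytes in native byte order.
same_bytes = as_array.tobytes() == as_struct == bytes(as_ctypes)
print(same_bytes)
```

Each module agrees on the layout, but none of them can hand its 
description to the others without a translation step.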

In particular, something like the array interface should be available in 
Python.  The easiest way to do this is to extend the buffer protocol to 
allow objects to share information about shape, strides, and data-format 
of a block of memory.
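
For comparison, and as an observation from outside this thread: 
memoryview objects in current Python 3 expose these three pieces of 
information as the attributes format, shape and strides. The sketch 
below only shows what such sharing looks like in practice; it is not 
the interface being proposed here.

```python
import array

# Six doubles in one flat block of memory.
flat = array.array('d', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
view = memoryview(flat)
print(view.format, view.shape, view.strides)

# Reinterpret the same memory as a C-contiguous 3x2 array.
# cast() needs a byte view as an intermediate step.
grid = view.cast('B').cast('d', shape=[3, 2])
print(grid.format, grid.shape, grid.strides)
print(grid.tolist())
```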

But, how do you represent data-format in Python?  What will the objects 
pass back and forth to each other to do it?  ctypes has a solution 
which creates multiple objects to do it.  This is an unwieldy, 
over-complicated solution for the array interface.

The array objects have a solution using a single object that carries 
the data-format information. The solution we have for arrays deserves 
consideration.  It could be placed inside the array module if desired, 
but again, I'm really looking for something that would allow the extended 
buffer protocol (to be proposed soon) to share data-type information.

That could be done with the array-interface objects (strings, lists, and 
tuples), but then everybody who uses the interface will have to write 
their own decoders to process the data-format information.

I actually think ctypes would benefit from this data-format 
specification too.

Recognizing all these diverging ways to essentially talk about the same 
thing is part of what prompted this PEP.


-Travis



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 In this case, the 'kind' does not specify how large the data-type is. 
 You can have 'u1', 'u2', 'u4', etc.
 
 The same is true with Unicode.  You can have 10-character unicode 
 elements, 20-character, etc.  But, we have to be clear about what a 
 character is in the data-format.

That is certainly confusing. In u1, u2, u4, the digit seems to indicate
the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
in U20, it does *not* indicate the size of a single value but of an
array? And then, it's not the size, but the number of elements?
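
The inconsistency can be made concrete with a small sketch. The helper 
total_bytes below is hypothetical (it is neither a NumPy nor a stdlib 
function) and assumes, as stated earlier in the thread, that 'U' means 
UCS4, i.e. 4 bytes per character.

```python
# Hypothetical helper, not a NumPy or stdlib function.
# Bytes per element for the kinds whose trailing number counts elements.
ELEMENT_WIDTH = {'S': 1, 'U': 4}   # 'U' assumed to be UCS4, as in NumPy


def total_bytes(code):
    """Return the number of bytes described by a code such as 'u4' or 'U20'."""
    kind, number = code[0], int(code[1:])
    # For 'S' and 'U' the number is a count of elements;
    # for every other kind it is already a byte size.
    return number * ELEMENT_WIDTH.get(kind, 1)


for code in ('u1', 'u2', 'u4', 'S20', 'U20'):
    print(code, total_bytes(code))
```

Under these assumptions the 4 in 'u4' is the byte size itself, while 
the 20 in 'U20' has to be multiplied by a width that the code does not 
spell out.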

Regards,
Martin


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread M.-A. Lemburg
Travis E. Oliphant wrote:
 M.-A. Lemburg wrote:
 Travis E. Oliphant wrote:
 M.-A. Lemburg wrote:
 Travis E. Oliphant wrote:
 

 PEP: unassigned
 Title: Adding data-type objects to the standard library
   Attributes

  kind  --  returns the basic kind of the data-type. The basic 
 kinds
  are:
't' - bit, 
'b' - bool, 
'i' - signed integer, 
'u' - unsigned integer,
'f' - floating point,  
'c' - complex floating point, 
'S' - string (fixed-length sequence of char),
'U' - fixed length sequence of UCS4,
 Shouldn't this read fixed length sequence of Unicode ?!
 The underlying code unit format (UCS2 and UCS4) depends on the
 Python version.
 Well, in NumPy 'U' always means UCS4.  So, I just copied that over.  See 
 my questions at the bottom which talk about how to handle this.  A 
 data-format does not necessarily have to correspond to something Python 
 represents with an Object.
 Ok, but why are you being specific about UCS4 (which is an internal
 storage format), while you are not specific about e.g. the
 internal bit size of the integers (which could be 32 or 64 bit) ?

 
 The 'kind' does not specify how big the data-type (data-format) is.  A 
 number is needed to represent the number of bytes.
 
 In this case, the 'kind' does not specify how large the data-type is. 
 You can have 'u1', 'u2', 'u4', etc.
 
 The same is true with Unicode.  You can have 10-character unicode 
 elements, 20-character, etc.  But, we have to be clear about what a 
 character is in the data-format.

I understand and that's why I'm asking why you made the range
explicit in the definition.

The definition should talk about Unicode code points.
The number of bytes then determines whether you can only
represent the ASCII subset (1 byte), UCS2 (2 bytes, BMP only)
or UCS4 (4 bytes, all currently assigned code points).

This is similar to the range for integers (ie. ZZ_0), where
the number of bytes determines the range of numbers that can
be represented.
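
A minimal sketch of that idea, using a hypothetical helper name: pick 
the smallest of 1, 2 or 4 bytes that can hold every code point in a 
string. The 1-byte case is limited to ASCII to match the description 
above, although a single byte could technically hold values up to 0xFF.

```python
def bytes_per_code_point(text):
    """Smallest of 1, 2 or 4 bytes that can hold every code point in text."""
    highest = max(map(ord, text), default=0)
    if highest <= 0x7F:        # ASCII subset
        return 1
    if highest <= 0xFFFF:      # Basic Multilingual Plane (UCS2)
        return 2
    return 4                   # anything beyond the BMP (UCS4)


print(bytes_per_code_point('abc'))
print(bytes_per_code_point('caf\u00e9'))
print(bytes_per_code_point('\U0001F600'))
```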

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Travis E. Oliphant
Martin v. Löwis wrote:
 Travis E. Oliphant schrieb:
 In this case, the 'kind' does not specify how large the data-type is. 
 You can have 'u1', 'u2', 'u4', etc.

 The same is true with Unicode.  You can have 10-character unicode 
 elements, 20-character, etc.  But, we have to be clear about what a 
 character is in the data-format.
 
 That is certainly confusing. In u1, u2, u4, the digit seems to indicate
 the size of a single value (1 byte, 2 bytes, 4 bytes). Right? Yet,
 in U20, it does *not* indicate the size of a single value but of an
 array? And then, it's not the size, but the number of elements?
 

Good point.  In NumPy, unicode support was added in parallel with 
string arrays, where the ambiguity does not arise.   So, yes, it's true 
that the unicode case is a special case.

The other way to handle it would be to describe the code-point size 
(i.e. 'U1', 'U2', 'U4' for UCS-1, UCS-2, UCS-4) and then have the length 
be encoded as an array of those types.

This was not the direction we took with NumPy (which is what I'm using 
as a reference) because I wanted Unicode and string arrays to look the 
same and thought of strings differently.

How to handle unicode data-formats could definitely be improved. 
Suggestions are welcome.


-Travis





Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Greg Ewing
Nick Coghlan wrote:

 Greg Ewing wrote:

 Also, what if I want to refer to fields by name
 but don't want to have to work out all the offsets

 Use the list definition form. With the changes I've 
 suggested above, you wouldn't even have to name the fields you don't 
 care about - just describe them.

That would be okay.

I still don't see a strong justification for having a
one-big-string form as well as a list/tuple/dict form,
though.

--
Greg



Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Greg Ewing
Nick Coghlan wrote:
 I'd say the answer to where we put it will be dependent on what happens to 
 the 
 idea of adding a NumArray style fixed dimension array type to the standard 
 library. If that gets exposed through the array module as array.dimarray, 
 then 
 it would make sense to expose the associated data layout descriptors as 
 array.datatype.

Seems to me that arrays are a sub-concept of binary data,
not the other way around. So maybe both arrays and data
types should be in a module called 'binary' or some such.

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Greg Ewing
Travis E. Oliphant wrote:

 The 'kind' does not specify how big the data-type (data-format) is.

What exactly does "bit" mean in that context?

--
Greg


Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-28 Thread Greg Ewing
Travis E. Oliphant wrote:

 How to handle unicode data-formats could definitely be improved. 
 Suggestions are welcome.

'U4*10'  string of 10 4-byte Unicode chars

Then for consistency you'd want 'S*10' rather than
just 'S10' (or at least allow it as an alternative).
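
A sketch of how such codes might be read, assuming a grammar of 
"kind, optional element size, '*', element count". Both the grammar and 
the function name parse_repeated are hypothetical; nothing in NumPy or 
the PEP draft defines them.

```python
import re

# Hypothetical grammar: kind, optional element size, '*', element count.
PATTERN = re.compile(r'^([A-Za-z])(\d*)\*(\d+)$')


def parse_repeated(code):
    """Return (kind, itemsize, count, total_bytes) for a code like 'U4*10'."""
    match = PATTERN.match(code)
    if match is None:
        raise ValueError(f'not a kind*count code: {code!r}')
    kind, size, count = match.groups()
    itemsize = int(size) if size else 1   # assume 1 byte when omitted
    return kind, itemsize, int(count), itemsize * int(count)


print(parse_repeated('U4*10'))
print(parse_repeated('S*10'))
```

With this spelling the element size and the element count are always 
separate numbers, so neither has to be inferred from the kind.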

--
Greg


[Python-Dev] PEP: Adding data-type objects to Python

2006-10-27 Thread Travis E. Oliphant


PEP: unassigned
Title: Adding data-type objects to the standard library
Version: $Revision: $
Last-Modified: $Date:  $
Author: Travis Oliphant [EMAIL PROTECTED]
Status: Draft
Type: Standards Track
Created: 05-Sep-2006
Python-Version: 2.6

Abstract

This PEP proposes adapting the data-type objects from NumPy for
inclusion in standard Python, to provide a consistent and standard
way to discuss the format of binary data. 

Rationale

Many situations, across multiple application areas, require
interpreting binary data in terms of fundamental data-types such
as integers, floating-point, and complex floating-point values.
Having a common object that carries information about binary data
would be beneficial to many people. The data-type objects that
NumPy created to describe what each element of an array contains
evolved from the PyArray_Descr structure in Python's own array
object.  These data-type objects can represent arbitrary byte
data.  Currently such information is usually constructed using
strings and character codes, which is unwieldy when a data-type
consists of nested structures.

Proposal

Add a PyDatatypeObject in Python (adapted from NumPy's dtype
object which evolved from the PyArray_Descr structure in Python's
array module) that holds information about a data-type.  This object
will allow packages to exchange information about binary data in
a uniform way (see the extended buffer protocol PEP for an application
to exchanging information about array data). 

Specification

The datatype is an object that specifies how a certain block of
memory should be interpreted as a basic data-type. In addition to
being able to describe basic data-types, the data-type object can
describe a data-type that is itself an array of other data-types
as well as a data-type that contains arbitrary fields (structure
members) which are located at specific offsets. In its most basic
form, however, a data-type is of a particular kind (bit, bool,
int, uint, float, complex, object, string, unicode, void) and size.

Datatype objects can be created using either a type-object, a
string, a tuple, a list, or a dictionary according to the following
constructors:

Type-object: 

  For a select set of type-objects, a data-type object describing that
  basic type can be created:

  Examples: 

  >>> datatype(float)
  datatype('float64')
  
  >>> datatype(int)
  datatype('int32')  # on 32-bit platform (64 if c-long is 64-bits)

Tuple-object
   
  A tuple of length 2 can be used to specify a data-type that is
  an array of another kind of basic data-type (this array always
  describes a C-contiguous array).

  Examples: 

  >>> datatype((int, 5))
  datatype(('int32', (5,)))
  # describes a 5*4=20-byte block of memory laid out as 
  #  a[0], a[1], a[2], a[3], a[4]

  >>> datatype((float, (3,2)))
  datatype(('float64', (3,2)))
  # describes a 3*2*8=48 byte block of memory that should be
  # interpreted as 6 doubles laid out as arr[0,0], arr[0,1],
  # ... arr[2,0], arr[2,1]
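
  The C-contiguous layout of the (3,2) float64 example can be checked
  with a short sketch that uses only the struct module. The variable
  names are illustrative and not part of the proposed API.

```python
import itertools
import struct

shape = (3, 2)
itemsize = struct.calcsize('=d')          # 8 bytes for a float64

# C-contiguous: the last index varies fastest.
offsets = {
    (i, j): (i * shape[1] + j) * itemsize
    for i, j in itertools.product(range(shape[0]), range(shape[1]))
}

# Pack six doubles and read arr[2, 1] back from its computed offset.
block = struct.pack('=6d', 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
value, = struct.unpack_from('=d', block, offsets[(2, 1)])
print(len(block), offsets[(2, 1)], value)
```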


String-object:
 
  The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize) 

 kind : one of the basic array kinds given below. 
 
 itemsize : the number of bytes (or bits for 't' kind) for 
 this data-type.  

 endian   : either '', '=' (native), '|' (doesn't matter),
 '>' (big-endian) or '<' (little-endian).

 shape: either '', or a shape-tuple describing a data-type that
 is an array of the given shape.

  A string can also be a comma-separated sequence of basic
  formats. The result will be a data-type with default field
  names: 'f0', 'f1', ..., 'fn'.

  Examples: 

  >>> datatype('u4')
  datatype('uint32')

  >>> datatype('f4')
  datatype('float32')

  >>> datatype('(3,2)f4')
  datatype(('float32', (3,2)))

  >>> datatype('(5,)i4, (3,2)f4, S5')
  datatype([('f0', 'i4', (5,)), ('f1', 'f4', (3, 2)), ('f2', '|S5')])
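
  To show how the string form decomposes, here is a minimal parsing
  sketch. The function names parse_basic and parse_fields are
  hypothetical, and the endian characters are assumed to be '<', '>',
  '=' and '|' as in NumPy; this is not the PEP's implementation.

```python
import ast
import re

# endian (optional), shape tuple (optional), kind letter, itemsize digits.
BASIC = re.compile(r'^([<>=|]?)(\([\d,\s]*\))?([A-Za-z])(\d+)$')


def parse_basic(fmt):
    """Split one basic format into (endian, shape, kind, itemsize)."""
    match = BASIC.match(fmt.strip())
    if match is None:
        raise ValueError(f'unrecognised basic format: {fmt!r}')
    endian, shape, kind, itemsize = match.groups()
    dims = ast.literal_eval(shape) if shape else ()
    if isinstance(dims, int):             # '(5)' evaluates to a plain int
        dims = (dims,)
    return endian, dims, kind, int(itemsize)


def parse_fields(fmt):
    """Split a comma-separated format and assign default names f0, f1, ..."""
    # Split on commas that are not inside a parenthesised shape.
    parts = re.split(r',\s*(?![^()]*\))', fmt)
    return [('f%d' % i, parse_basic(part)) for i, part in enumerate(parts)]


print(parse_basic('(3,2)f4'))
print(parse_fields('(5,)i4, (3,2)f4, S5'))
```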


List-object:

  A list should be a list of tuples where each tuple describes a
  field. Each tuple should contain (name, datatype{, shape}) or
  ((meta-info, name), datatype{, shape}) in order to specify the
  data-type. 

  This list must fully specify the data-type (no memory holes). If
  you would like to return a data-type with memory holes where the
  compiler would place them, then pass the keyword align=1 to this
  construction.  This will result in un-named fields of Void kind of
  the correct size interspersed where needed.

  Examples: 

  datatype([( ([1,2],'coords'), 'f4', (3,6)), ('address', 'S30')])
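
  The "memory holes" that align=1 refers to can be observed with the
  struct module, which offers both a padded and an unpadded layout.
  This sketch only illustrates compiler-style padding; it does not use
  the proposed datatype object.

```python
import struct

# One char followed by one int.
packed_size = struct.calcsize('=ci')    # standard sizes, no padding
aligned_size = struct.calcsize('@ci')   # native sizes and alignment

# Bytes the native layout inserts between the char and the int.
padding = aligned_size - packed_size
print(packed_size, aligned_size, padding)
```

  On most platforms the aligned size is 8, leaving a 3-byte hole after
  the char, but the exact value depends on the platform's alignment
  rules.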

  A data-type that could represent the 

Re: [Python-Dev] PEP: Adding data-type objects to Python

2006-10-27 Thread Martin v. Löwis
Travis E. Oliphant schrieb:
 The datatype is an object that specifies how a certain block of
 memory should be interpreted as a basic data-type. 
 
datatype(float)
   datatype('float64')

I can't speak on the specific merits of this proposal, or whether this
kind of functionality is desirable. However, I'm -1 on the addition of
a builtin for this functionality (the PEP doesn't actually say that
there is another builtin, but the examples suggest so). Instead, putting
it into the sys, array, struct, or ctypes modules might be more
appropriate, as might be the introduction of another module.

Regards,
Martin

