Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Thomas Robitaille


Pierre GM-2 wrote:
 
 As a workaround, perhaps you could use np.object instead of np.str
 while defining your array. You can then get the maximum string length  
 by looping, as David suggested, and then use .astype to transform your  
 array...
 

I tried this:

np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),('b',np.object_)])

but I get a TypeError:

---
TypeError                                 Traceback (most recent call last)

/Users/tom/<ipython console> in <module>()

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in fromrecords(recList, dtype, shape, formats, names, titles, aligned, byteorder)
    625 res = retval.view(recarray)
    626
--> 627 res.dtype = sb.dtype((record, res.dtype))
    628 return res
    629

/Users/tom/Library/Python/2.6/site-packages/numpy/core/records.pyc in __setattr__(self, attr, val)
    432 if attr not in fielddict:
    433 exctype, value = sys.exc_info()[:2]
--> 434 raise exctype, value
    435 else:
    436 fielddict = ndarray.__getattribute__(self,'dtype').fields or {}

TypeError: Cannot change data-type for object array.

Is this a bug?

Thanks,

Thomas
-- 
View this message in context: 
http://old.nabble.com/Automatic-string-length-in-recarray-tp26174810p26199762.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Dan Yamins
On Tue, Nov 3, 2009 at 11:43 AM, David Warde-Farley d...@cs.toronto.edu wrote:

 On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:

  But if I want to specify the data types:
 
  np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
  ('b',np.str)])
 
  the string field is set to a length of zero:
 
  rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])
 
  I need to specify datatypes for all numerical types since I care about
  int8/16/32, etc, but I would like to benefit from the auto string
  length detection that works if I don't specify datatypes. I tried
  replacing np.str by None but no luck. I know I can specify '|S5' for
  example, but I don't know in advance what the string length should be
  set to.

 This is a limitation of the way the dtype code works, and AFAIK
 there's no easy fix. In some code I wrote recently I had to loop
 through the entire list of records i.e. max(len(foo[2]) for foo in
 records).


I don't mean to shamelessly plug my own project, but more robust string-type
detection is one of the features of Tabular
(http://bitbucket.org/elaine/tabular/), and it is one of the reasons we
wrote the package. Perhaps Tabular could be useful to you?

Dan


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Pierre GM

On Nov 4, 2009, at 11:35 AM, Thomas Robitaille wrote:



 Pierre GM-2 wrote:

 As a workaround, perhaps you could use np.object instead of np.str
 while defining your array. You can then get the maximum string length
 by looping, as David suggested, and then use .astype to transform your
 array...


 I tried this:

 np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8), 
 ('b',np.object_)])

 but I get a TypeError:

Confirmed, it's a bug all right. Would you mind opening a ticket?
I'll try to take care of that in the next few days.


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-04 Thread Thomas Robitaille


Pierre GM-2 wrote:
 
 Confirmed, it's a bug all right. Would you mind opening a ticket?
 I'll try to take care of that in the next few days.
 

Done - http://projects.scipy.org/numpy/ticket/1283

Thanks!

Thomas




Re: [Numpy-discussion] Automatic string length in recarray

2009-11-03 Thread David Warde-Farley
On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:

 But if I want to specify the data types:

 np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
 ('b',np.str)])

 the string field is set to a length of zero:

 rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])

 I need to specify datatypes for all numerical types since I care about
 int8/16/32, etc, but I would like to benefit from the auto string
 length detection that works if I don't specify datatypes. I tried
 replacing np.str by None but no luck. I know I can specify '|S5' for
 example, but I don't know in advance what the string length should be
 set to.

This is a limitation of the way the dtype code works, and AFAIK  
there's no easy fix. In some code I wrote recently I had to loop  
through the entire list of records i.e. max(len(foo[2]) for foo in  
records).
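A minimal sketch of the loop David describes, written for Python 3 and a current NumPy (an assumption; the thread used Python 2.6). It measures the longest string in one pass, then builds the dtype with an explicit width so the numeric field keeps the requested type. The string sits at index 1 of each record here (David's own data used index 2), and the width uses the 'U' (unicode) code because Python 3 strings are unicode; `np.str` itself is no longer available in recent NumPy releases. The names `records`, `width` and `arr` are illustrative only.

```python
import numpy as np

# Example records from the original question: (int, str) pairs.
records = [(1, 'hello'), (2, 'world')]

# One pass over the records to find the longest string (field index 1 here).
width = max(len(rec[1]) for rec in records)

# Build the dtype explicitly: keep the chosen integer type and give the
# string field the measured width instead of a zero-length placeholder.
dtype = [('a', np.int8), ('b', 'U%d' % width)]
arr = np.rec.fromrecords(records, dtype=dtype)
```

The cost is one extra pass over the data. Note that `max()` raises ValueError on an empty list, so a real version would need a fallback width.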

David


Re: [Numpy-discussion] Automatic string length in recarray

2009-11-03 Thread Pierre GM

On Nov 3, 2009, at 11:43 AM, David Warde-Farley wrote:

 On 2-Nov-09, at 11:35 PM, Thomas Robitaille wrote:

 But if I want to specify the data types:

 np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8),
 ('b',np.str)])

 the string field is set to a length of zero:

 rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])

 I need to specify datatypes for all numerical types since I care about
 int8/16/32, etc, but I would like to benefit from the auto string
 length detection that works if I don't specify datatypes. I tried
 replacing np.str by None but no luck. I know I can specify '|S5' for
 example, but I don't know in advance what the string length should be
 set to.

 This is a limitation of the way the dtype code works, and AFAIK
 there's no easy fix. In some code I wrote recently I had to loop
 through the entire list of records i.e. max(len(foo[2]) for foo in
 records).

As a workaround, perhaps you could use np.object instead of np.str  
while defining your array. You can then get the maximum string length  
by looping, as David suggested, and then use .astype to transform your  
array...
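A sketch of this object-then-astype workaround, assuming Python 3 and a current NumPy, where the `np.object` and `np.str` aliases have been removed (deprecated in 1.20, gone in 1.24) and the builtins `object` and `str` are used instead. Because `np.rec.fromrecords` with an object field raised the TypeError reported elsewhere in this thread on the NumPy of the time, this sketch builds the intermediate array with plain `np.array` and reassembles the columns with `np.rec.fromarrays`; that routing is a choice made for the sketch, not part of the suggestion above. Variable names are illustrative.

```python
import numpy as np

records = [(1, 'hello'), (2, 'world')]

# Step 1: build a structured array whose string field is a generic object,
# so no string length has to be chosen up front.
tmp = np.array(records, dtype=[('a', np.int8), ('b', object)])

# Step 2: loop over the object field to find the longest string.
width = max(len(s) for s in tmp['b'])

# Step 3: cast only the object column to a fixed-width string with .astype,
# then reassemble a recarray; the int8 column is passed through unchanged.
result = np.rec.fromarrays([tmp['a'], tmp['b'].astype('U%d' % width)],
                           names=['a', 'b'])
```

Casting column by column keeps the conversion explicit: only the object field changes type, and the numeric field's dtype is carried over as-is.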



[Numpy-discussion] Automatic string length in recarray

2009-11-02 Thread Thomas Robitaille
Hi,

I'm having trouble with creating np.string_ fields in recarrays. If I  
create a recarray using

np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

the result looks fine:

rec.array([(1, 'hello'), (2, 'world')], dtype=[('a', 'i8'), ('b', '|S5')])

But if I want to specify the data types:

np.rec.fromrecords([(1,'hello'),(2,'world')],dtype=[('a',np.int8), 
('b',np.str)])

the string field is set to a length of zero:

rec.array([(1, ''), (2, '')], dtype=[('a', '|i1'), ('b', '|S0')])

I need to specify datatypes for all numerical types since I care about  
int8/16/32, etc, but I would like to benefit from the auto string  
length detection that works if I don't specify datatypes. I tried  
replacing np.str by None but no luck. I know I can specify '|S5' for  
example, but I don't know in advance what the string length should be  
set to.

Is there a way to solve this problem without manually examining the  
data that is being passed to rec.fromrecords?
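One way to avoid examining the data by hand is to let NumPy infer every field first (which gets the string width right, as in the first example above) and then cast only the numeric fields to the desired types. This is a sketch under stated assumptions, not something proposed in this thread: it assumes Python 3 and a current NumPy (so the inferred string type is 'U', not 'S'), and the `wanted` mapping is a hypothetical helper for naming the fields to pin down.

```python
import numpy as np

records = [(1, 'hello'), (2, 'world')]

# Let NumPy infer every field: the string width comes out right, but the
# integer field gets NumPy's default integer type.
auto = np.rec.fromrecords(records, names=['a', 'b'])

# Hypothetical mapping of field name -> numeric type you care about.
wanted = {'a': np.int8}

# Rebuild the dtype field by field, keeping the inferred type for any field
# not listed in `wanted`, then cast the whole structured array at once.
new_dtype = [(name, wanted.get(name, auto.dtype[name]))
             for name in auto.dtype.names]
result = auto.astype(new_dtype)
```

Be aware that `astype` performs an unsafe cast, so integers that do not fit in the narrower type wrap around silently; checking the value range first would be prudent.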

Thanks for any help,

Thomas