Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-07-27 Thread Benjamin Root
I am reviving this dead thread to note that I have filed ticket #1562 on
the numpy Trac about this issue: http://projects.scipy.org/numpy/ticket/1562

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Benjamin Root
Hi,

I was having the hardest time trying to figure out an intermittent bug in
one of my programs.  Essentially, in some situations, it was throwing an
error saying that the array object was not an array.  It took me a while,
but then I figured out that my program was assuming that the object returned
from a loadtxt() call was always a structured array (I was using dtypes).
However, if the data file being loaded only had one data record, then all
you get back is a structured record.

import numpy as np
from StringIO import StringIO

# Two records: loadtxt() returns a 1-D structured array.
strData = StringIO("89.23 47.2\n13.2 42.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "Length Two"
print a
print a.shape
print len(a)

# One record: loadtxt() returns a 0-D structured record instead.
strData = StringIO("53.2 49.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "\n\nLength One"
print a
print a.shape
try:
    print len(a)
except TypeError as err:
    print "ERROR:", err

Which gets me this output:

Length Two
[(89.230000000000004, 47.200000000000003)
 (13.199999999999999, 42.200000000000003)]
(2,)
2


Length One
(53.200000000000003, 49.200000000000003)
()
ERROR: len() of unsized object


Note that this isn't restricted to structured arrays.  For regular ndarrays,
loadtxt() appears to mimic the behavior of np.squeeze():

>>> a = np.ones((1, 1, 1))
>>> np.squeeze(a)[0]
IndexError: 0-d arrays can't be indexed

>>> strData = StringIO("53.2")
>>> a = np.loadtxt(strData)
>>> a[0]
IndexError: 0-d arrays can't be indexed

So, if you have multiple lines with multiple columns, you get a 2-D array,
as expected.
If you have a single line of data with multiple columns, you get a 1-D
array.
If you have a single column with many lines, you also get a 1-D array (which
is probably expected, I guess).
If you have a single column with a single line, you get a scalar (actually,
a 0-D array).
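
The four cases above can be checked directly. What follows is a minimal sketch, assuming Python 3 (so io.StringIO stands in for the Python 2 StringIO module) and a NumPy whose default loadtxt() behavior still squeezes the result; the helper name loaded_shape is made up for illustration.

```python
import io

import numpy as np


def loaded_shape(text):
    """Return the shape loadtxt() produces for the given file contents."""
    return np.loadtxt(io.StringIO(text)).shape


# One entry per case described in the message above.
shapes = {
    "many rows, many columns": loaded_shape("1 2\n3 4"),
    "one row, many columns": loaded_shape("1 2"),
    "many rows, one column": loaded_shape("1\n2"),
    "one row, one column": loaded_shape("1"),
}

for case, shape in shapes.items():
    print(case, shape)
```

Only the first case keeps both dimensions; the last one collapses to a 0-D array, which is why indexing and len() fail on it.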

Is this a bug or a feature?  I can see the advantages of having loadtxt()
returning the lowest # of dimensions that can hold the given data, but it
leaves the code vulnerable to certain edge cases.  Maybe there is a
different way I should be doing this, but I feel that this behavior at the
very least should be included in the loadtxt documentation.

Ben Root


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Warren Weckesser
Benjamin Root wrote:
 Note that this isn't restricted to structured arrays.  For regular 
 ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

Exactly.  The last four lines of the function are:

    X = np.squeeze(X)
    if unpack:
        return X.T
    else:
        return X
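
To illustrate what that squeeze call does to the result, np.squeeze can be applied to arrays shaped like rows-by-columns data. This is only a sketch, assuming Python 3 and assuming the parsed data is held as a 2-D array before the squeeze; the names parsed_shapes and squeezed_shapes are illustrative.

```python
import numpy as np

# Shapes of hypothetical 2-D (rows, columns) data before squeezing.
parsed_shapes = [(2, 2), (1, 2), (2, 1), (1, 1)]

# np.squeeze drops every axis of length one.
squeezed_shapes = {s: np.squeeze(np.zeros(s)).shape for s in parsed_shapes}

for before, after in squeezed_shapes.items():
    print(before, "->", after)
```

A single row and a single column both end up 1-D, so the caller can no longer tell them apart from the shape alone.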


It would be useful to be able to tell loadtxt to not call squeeze, so a 
program that reads column-formatted data doesn't have to treat the case 
of a single line specially.

Warren




Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Christopher Barker
Warren Weckesser wrote:
 Benjamin Root wrote:
 Note that this isn't restricted to structured arrays.  For regular 
 ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
 
 Exactly.  The last four lines of the function are:
 
     X = np.squeeze(X)
     if unpack:
         return X.T
     else:
         return X

 It would be useful to be able to tell loadtxt to not call squeeze, so a 
 program that reads column-formatted data doesn't have to treat the case 
 of a single line specially.

I agree -- it seems to me that every time I load data, I know what shape 
I expect the result to be -- I'd never want it to squeeze. It might be 
nice if you could specify the dimensionality of the array you want.


But for now: can you just do a reshape?

In [42]: strData = StringIO("53.2 49.2")

In [43]: a = np.loadtxt(strData, dtype=[('x', float), ('y', float)]).reshape((-1,))

In [45]: a.shape
Out[45]: (1,)
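
The same workaround as a standalone script, as a sketch only: it assumes Python 3 (io.StringIO instead of the Python 2 StringIO module) and adds np.atleast_1d as an alternative to reshape, which is not something proposed in the thread. The variable names are illustrative.

```python
import io

import numpy as np

dtype = [("x", float), ("y", float)]

# A single record comes back as a 0-D structured array.
record = np.loadtxt(io.StringIO("53.2 49.2"), dtype=dtype)

# Either call restores the 1-D shape that code written for
# multi-record files expects.
as_1d_reshape = record.reshape(-1)
as_1d_atleast = np.atleast_1d(record)

print(record.shape, as_1d_reshape.shape, as_1d_atleast.shape)
print(len(as_1d_atleast), as_1d_atleast["x"][0])
```

Both calls are also harmless on a result that is already 1-D, so they can be applied unconditionally after loading structured data.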



-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] loadtxt() behavior on single-line files

2010-06-24 Thread Benjamin Root
On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser 
warren.weckes...@enthought.com wrote:

 It would be useful to be able to tell loadtxt to not call squeeze, so a
 program that reads column-formatted data doesn't have to treat the case
 of a single line specially.

 Warren


I don't know if that is the best way to solve the problem.  In that case,
you would always get a 2-D array, right?  Is that useful for those who have
text data as a single column?  Maybe a mindim keyword (with None as default)
and apply an appropriate atleast_Nd() call (or maybe have available an
.atleast_nd() function?).  But, then what would this mean for structured
arrays?  One might think that they want at least 2-D, but they really want
at least 1-D.
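
The mindim idea could be sketched roughly as below. This is a hypothetical wrapper, not an existing NumPy function: the name loadtxt_mindim and its mindim argument are assumptions made for illustration, and the example assumes Python 3.

```python
import io

import numpy as np


def loadtxt_mindim(fname, mindim=None, **kwargs):
    """Hypothetical wrapper: pad the loadtxt() result to at least `mindim` dimensions."""
    data = np.loadtxt(fname, **kwargs)
    if mindim is None:
        return data  # keep the default (squeezed) behavior
    if mindim == 1:
        return np.atleast_1d(data)
    if mindim == 2:
        return np.atleast_2d(data)
    raise ValueError("mindim must be None, 1, or 2")


dtype = [("x", float), ("y", float)]

# Structured data: at least 1-D is what the caller really wants.
one_record = loadtxt_mindim(io.StringIO("53.2 49.2"), mindim=1, dtype=dtype)

# Plain data: one row stays a row, but one column also becomes a row.
one_row = loadtxt_mindim(io.StringIO("53.2 49.2"), mindim=2)
one_column = loadtxt_mindim(io.StringIO("1\n2\n3"), mindim=2)

print(one_record.shape, one_row.shape, one_column.shape)
```

The single-column case shows the limit of padding after the fact: np.atleast_2d turns the column into a (1, 3) row, so a real solution probably has to act before the squeeze rather than after it. Later NumPy releases provide an ndmin keyword on loadtxt() for this purpose, which may be preferable where it is available.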

Ben Root

P.S. - Taking this a step further, the functions fail completely when dealing
with empty files...  In MATLAB, loading an empty file returns an empty array
(matrix?).