[Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM
(Sorry about that, I pressed Reply instead of Reply all. Not my  
day for emails...)

 On Dec 1, 2008, at 1:54 PM, John Hunter wrote:

 It looks like I am doing something wrong -- trying to parse a CSV file
 with dates formatted like '2008-10-14', with::

   import datetime, sys
   import dateutil.parser
   StringConverter.upgrade_mapper(dateutil.parser.parse,
                                  default=datetime.date(1900,1,1))
   r = loadtxt(sys.argv[1], delimiter=',', names=True)

 John,
 The problem you have is that the default dtype is 'float' (for  
 backwards compatibility w/ the original np.loadtxt). What you want  
 is to automatically change the dtype according to the content of  
 your file: you should use dtype=None

 r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)

 As you'll want a recarray, we could make a np.records.loadtxt  
 function where dtype=None would be the default...
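
 To illustrate the dtype=None behaviour described above, here is a minimal
 sketch using np.genfromtxt, which accepts the same delimiter, names and
 dtype keywords in current NumPy. The sample data and its column names are
 made up for the example, and encoding="utf-8" is an added assumption so
 that text columns come back as str.

```python
import io
import numpy as np

# Hypothetical sample data standing in for the CSV file in sys.argv[1].
csv_text = "date,price,volume\n2008-10-14,12.5,100\n2008-10-15,13.0,250\n"

# dtype=None asks genfromtxt to infer each column's type from its contents;
# names=True takes the field names from the first line.
r = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                  dtype=None, encoding="utf-8")

print(r.dtype.names)       # field names read from the header line
print(r["price"].mean())   # columns are addressable by name
```

 With the default dtype=float, the date column could not be converted to a
 number, which is the symptom John ran into.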

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...

2008-12-01 Thread John Hunter
On Mon, Dec 1, 2008 at 1:14 PM, Pierre GM wrote:

 The problem you have is that the default dtype is 'float' (for
 backwards compatibility w/ the original np.loadtxt). What you want
 is to automatically change the dtype according to the content of
 your file: you should use dtype=None

 r = loadtxt(sys.argv[1], delimiter=',', names=True, dtype=None)

 As you'll want a recarray, we could make a np.records.loadtxt
 function where dtype=None would be the default...

OK, that worked great.  I do think a default implementation in np.rec
that returns a recarray would be nice.  It might also be nice to have a
function like np.rec.fromcsv which defaults to delimiter=',',
names=True and dtype=None.  Since csv is one of the most common data
interchange formats in the world, it would be nice to have some
obvious function that works with it with little or no customization
required.
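
A rough sketch of what such a helper could look like, built on
np.genfromtxt. The name fromcsv and its defaults follow the proposal
above, but the function itself is hypothetical and not an existing NumPy
API; the encoding default is an extra assumption.

```python
import io
import numpy as np

def fromcsv(fname, **kwargs):
    """Hypothetical helper: load a CSV file as a recarray.

    Applies the CSV-friendly defaults proposed above; any keyword
    passed by the caller overrides them.
    """
    kwargs.setdefault("delimiter", ",")
    kwargs.setdefault("names", True)
    kwargs.setdefault("dtype", None)
    kwargs.setdefault("encoding", "utf-8")  # assumption: text as str
    # View the structured array as a recarray for attribute access.
    return np.genfromtxt(fname, **kwargs).view(np.recarray)

# Made-up data; a filename would work the same way.
rec = fromcsv(io.StringIO("x,y\n1,2.5\n3,4.5\n"))
print(rec.x.sum(), rec.y.mean())
```

Returning a recarray lets columns be read as attributes (rec.x) without
any mention of dtypes at the call site.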

Fernando and I have taught a scientific computing course on a number
of occasions, and on the last round we taught undergrads.  Most of
these students have little or no programming experience.  For many, the
concept of an array is something they struggle with, and dtypes are a
difficult concept.  But we found that they responded very well to our
csv2rec example, because with no syntactic cruft they were able to load
a file and do some stats on the columns.  I would like to see that ease
of use preserved.

JDH


Re: [Numpy-discussion] Fwd: np.loadtxt : yet a new implementation...

2008-12-01 Thread Pierre GM

On Dec 1, 2008, at 2:26 PM, John Hunter wrote:

 OK, that worked great.  I do think a default implementation in np.rec
 that returns a recarray would be nice.  It might also be nice to have a
 function like np.rec.fromcsv which defaults to delimiter=',',
 names=True and dtype=None.  Since csv is one of the most common data
 interchange formats in the world, it would be nice to have some
 obvious function that works with it with little or no customization
 required.


Quite agreed. Personally, I'd ditch the default dtype=float in favor
of dtype=None, but compatibility is an issue.
However, if we all agree on genloadtxt, we can use tailor-made
versions in different modules, like you suggest.

There's an extra issue for which we have a solution I'm not
completely satisfied with: names=True.
It might be simpler for basic users not to have to set names=True, and
instead have the first line recognized as names or not as needed (by
processing the first line after the others, and using it as a header if
it's found to be a list of names, or inserting it back at the beginning
otherwise)...
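
One way the header detection described above might be sketched, using
only the standard library. The helper name split_header and the rule it
applies (a header row contains no numeric fields while later rows do) are
assumptions made for illustration, not what genloadtxt does.

```python
import csv
import io

def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def split_header(lines, delimiter=","):
    """Return (names, data_rows); names is None if the first row looks like data."""
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return None, []
    first, rest = rows[0], rows[1:]
    # Look at the other rows first, then decide what the first row is.
    # Assumed heuristic: a header has no numeric fields, while at least
    # one field in the remaining rows is numeric.
    first_has_number = any(_is_number(f) for f in first)
    rest_has_number = any(_is_number(f) for row in rest for f in row)
    if not first_has_number and rest_has_number:
        return first, rest
    # Otherwise the first row is put back at the beginning as data.
    return None, rows

names, data = split_header(io.StringIO("a,b\n1,2\n"))
no_names, all_rows = split_header(io.StringIO("1,2\n3,4\n"))
```

A file made up entirely of text columns defeats this rule, which is one
reason an explicit names=True remains useful as an override.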