[Numpy-discussion] recfunctions.stack_arrays

2009-01-27 Thread Ryan May
Pierre (or anyone else who cares to chime in),

I'm using stack_arrays to combine data from two different files into a single
array.  In one of these files, the data from one entire record comes back
missing, which, thanks to your recent change, ends up having a boolean dtype.
There is actual data for this same field in the 2nd file, so it ends up having
the dtype of float64.  When I try to combine the two arrays, I end up with the
following traceback:

data = stack_arrays((old_data, data))
  File "/home/rmay/.local/lib64/python2.5/site-packages/metpy/cbook.py", line 260, in stack_arrays
    output = ma.masked_all((np.sum(nrecords),), newdescr)
  File "/home/rmay/.local/lib64/python2.5/site-packages/numpy/ma/extras.py", line 79, in masked_all
    a = masked_array(np.empty(shape, dtype),
ValueError: two fields with the same name

Which is unsurprising.  Do you think there is any reasonable way to get
stack_arrays() to find a common dtype for fields with the same name?  Or
another suggestion on how to approach this?  If you think coercing one/both
of the fields to a common dtype is the way to go, just point me to a function
that could figure out the dtype and I'll try to put together a patch.
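
For reference, the promotion I'm after looks like this (a sketch of the
desired result only, not a claim about what stack_arrays should call
internally):

import numpy as np

# bool field (from the all-missing file) vs. float64 field (from the other)
common = np.find_common_type([np.dtype(np.bool_), np.dtype(np.float64)], [])
print common   # float64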

Thanks,

Ryan

P.S.  Thanks so much for your work on putting those utility functions in
recfunctions.py.  It makes it so much easier to have these functions available
in the library itself rather than needing to reinvent the wheel over and over.

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Bug with mafromtxt

2009-01-26 Thread Ryan May
Pierre GM wrote:
 On Jan 24, 2009, at 6:23 PM, Ryan May wrote:
 Ok, thanks.  I've dug a little further, and it seems like the problem is
 that a column of all missing values ends up as a column of all None's.
 When you create a (masked) array from a list of None's, you end up with an
 object array.  On one hand I'd love for things to behave differently in
 this case, but on the other I understand why things work this way.
 
 Ryan,
 Mind giving r6434 a try? As usual, don't hesitate to report any problem.

Works great! Thanks for the quick fix.  I had racked my brain on how to go about
fixing this cleanly, but this is far simpler than what I would have done.  It
makes sense, since all I really needed for the masked column was something
*other* than object.

Thanks a lot,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] numpy ndarray questions

2009-01-26 Thread Ryan May
Jochen wrote:
 Hi all,
 
 I just wrote ctypes bindings to fftw3 (see
 http://projects.scipy.org/pipermail/scipy-user/2009-January/019557.html
 for the post to scipy). 
 Now I have a couple of numpy related questions:
 
 In order to be able to use simd instructions I create an ndarray
 subclass, which uses fftw_malloc to allocate the memory and fftw_free to
 free the memory when the array is deleted. This works fine for inplace
 operations, however if someone does something like this:
 
 a = fftw3.AlignedArray(1024, complex)
 
 a = a + 1
 
 a.ctypes.data points to a different memory location (this is actually an
 even bigger problem when executing fftw plans), however type(a) still
 gives me <class 'fftw3.planning.AlignedArray'>.

This might help some:

http://www.scipy.org/Subclasses
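
In particular, the subclassing docs explain why the *type* survives a = a + 1
while the fftw_malloc'd buffer does not: the ufunc allocates a fresh result
array, and only __array_finalize__ runs on it.  A minimal sketch (class name
borrowed from your post; plain numpy allocation stands in for fftw_malloc):

import numpy as np

class AlignedArray(np.ndarray):
    def __new__(cls, shape, dtype=float):
        # The real bindings would allocate with fftw_malloc here.
        obj = np.ndarray.__new__(cls, shape, dtype=dtype)
        obj.custom_buffer = True
        return obj

    def __array_finalize__(self, obj):
        # Called for views *and* for ufunc results like a + 1.  Note the
        # flag propagates even though the result's memory comes from
        # numpy's normal allocator -- exactly the trap you're seeing.
        self.custom_buffer = getattr(obj, 'custom_buffer', False)

a = AlignedArray(1024, complex)
b = a + 1
print type(b)                          # still AlignedArray
print b.ctypes.data == a.ctypes.data   # False: a brand-new buffer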

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Bug with mafromtxt

2009-01-24 Thread Ryan May
Pierre,

I've found what I consider to be a bug in the new mafromtxt (though apparently
it existed in earlier versions as well).  If you have an entire column of data
in a file that contains only masked data, and try to get mafromtxt to
automatically choose the dtype, the dtype gets selected to be object type.  In
this case, I'd think the better behavior would be float, but I'm not sure how
hard it would be to make this the case.  Here's a test case:

import numpy as np
from StringIO import StringIO
s = StringIO('1 2 3\n4 5 6\n')
a = np.mafromtxt(s, missing='2,5', dtype=None)
print a.dtype

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Bug with mafromtxt

2009-01-24 Thread Ryan May
Pierre GM wrote:
 Ryan,
 Thanks for reporting. An idea would be to force the dtype of the  
 masked column to the largest dtype of the other columns (in your  
 example, that would be int). I'll try to see how easily it can be done  
 early next week. Meanwhile, you can always give an explicit dtype at  
 creation.

Ok, thanks.  I've dug a little further, and it seems like the problem is that a
column of all missing values ends up as a column of all None's.  When you create
a (masked) array from a list of None's, you end up with an object array.  On one
hand I'd love for things to behave differently in this case, but on the other I
understand why things work this way.

Ryan

 
 On Jan 24, 2009, at 5:58 PM, Ryan May wrote:
 
 Pierre,

 I've found what I consider to be a bug in the new mafromtxt (though  
 apparently it
 existed in earlier versions as well).  If you have an entire column  
 of data in a
 file that contains only masked data, and try to get mafromtxt to  
 automatically
 choose the dtype, the dtype gets selected to be object type.  In  
 this case, I'd
 think the better behavior would be float, but I'm not sure how hard  
 it would be
 to make this the case.  Here's a test case:

 import numpy as np
 from StringIO import StringIO
 s = StringIO('1 2 3\n4 5 6\n')
 a = np.mafromtxt(s, missing='2,5', dtype=None)
 print a.dtype

 Ryan

 -- 
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma


-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Pattern for reading non-simple binary files

2009-01-23 Thread Ryan May
Hi,

I'm trying to read in data from a binary-formatted file.  I have the data
format (available at:
http://www1.ncdc.noaa.gov/pub/data/documentlibrary/tddoc/td7000.pdf if you're
really curious), but it's not what I would consider simple, with a lot of
different blocks and messages, some that are optional and some that have
different formats depending on the data type.  My question is: has anyone
dealt with data like this using numpy?  Have you found a good pattern for how
to construct a numpy dtype dynamically to decode the different parts of the
file appropriately as you go along?
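
To make the question concrete, here's the rough pattern I've been playing
with (block ids and field names are invented for illustration, not taken
from the TD-7000 spec):

import numpy as np

# One dtype per block layout, with endianness spelled out explicitly.
header_dt = np.dtype([('block_id', '>i2'), ('nbytes', '>i2')])

def read_block(f):
    hdr = np.fromstring(f.read(header_dt.itemsize), dtype=header_dt)[0]
    # Choose the payload dtype based on what the header says.
    if hdr['block_id'] == 1:
        payload_dt = np.dtype([('lat', '>i4'), ('lon', '>i4')])
    else:
        payload_dt = np.dtype('>f4')
    count = int(hdr['nbytes']) // payload_dt.itemsize
    payload = np.fromstring(f.read(count * payload_dt.itemsize),
                            dtype=payload_dt)
    return hdr, payload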

Any insight would be appreciated.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Numpy performance vs Matlab.

2009-01-07 Thread Ryan May
Nicolas ROUX wrote:
 Hi,
 
 I need help ;-)
 I have here a testcase which works much faster in Matlab than Numpy.
  
 The following code takes less than 0.9 sec in Matlab, but 21 sec in Python.
 Numpy is 24 times slower than Matlab!
 The big trouble I have is that a large team of people within my company is
 ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate
 that this kind of Python code executes with the same performance as Matlab,
 without writing a C extension. This is becoming a critical point for us.
 
 This is a testcase that people would like to see working without any code
 restructuring. The reasons are:
 - this way of writing is fairly natural.
 - the original code which showed me the Matlab/Numpy performance differences
   is much more complex, and can't benefit from broadcasting or other numpy
   tips (I can give this code later).
 
 ...So I really need to use the code below, without restructuring.
 
 Numpy/Python code:
 #
 import numpy
 import time
 
 print "Start test \n"
 
 dim = 3000
 
 a = numpy.zeros((dim,dim,3))
 
 start = time.clock()
 
 for i in range(dim):
     for j in range(dim):
         a[i,j,0] = a[i,j,1]
         a[i,j,2] = a[i,j,0]
         a[i,j,1] = a[i,j,2]
 
 end = time.clock() - start
 
 print "Test done, %f sec" % end
 #
SNIP
 Any idea on it?
 Did I miss something?

I think you may have reduced the complexity a bit too much.  The python code
above sets all of the elements equal to a[i,j,1].  Is there any reason you can't
use slicing to avoid the loops?
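
For this particular testcase, something like the following (a sketch of the
slicing I mean; equivalent to the loops, since every element ends up copying
channel 1) should run in a tiny fraction of the time:

import numpy

dim = 3000
a = numpy.zeros((dim, dim, 3))

# Three vectorized copies instead of dim*dim*3 Python-level assignments:
a[:, :, 0] = a[:, :, 1]
a[:, :, 2] = a[:, :, 0]
a[:, :, 1] = a[:, :, 2]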

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Alternative to record array

2008-12-29 Thread Ryan May
Jean-Baptiste Rudant wrote:
 Hello,
 
 I like to use record arrays to access fields by their name, and because
 they are easy to use with pytables. But I think it's not very efficient
 for what I have to do. Maybe I'm misunderstanding something.
 
 Example :
 
 import numpy as np
 age = np.random.randint(0, 99, 10e6)
 weight = np.random.randint(0, 200, 10e6)
 data = np.rec.fromarrays((age, weight), names='age, weight')
 # the kind of operations I do is :
 data.age += 1
 # but it's far less efficient than doing :
 age += 1
 # because I think the record array stores [(age_0, weight_0) ... (age_n,
 # weight_n)] and not [age_0 ... age_n] then [weight_0 ... weight_n].
 
 So I think I don't use record arrays for the right purpose. I only need
 something which would make it easy for me to manipulate data by accessing
 fields by their name.
 
 Am I wrong? Is there something in numpy for my purpose? Do I have to
 implement my own class, with something like:
 
 
 class FieldArray:
     def __init__(self, array_dict):
         self.array_list = array_dict
 
     def __getitem__(self, field):
         return self.array_list[field]
 
     def __setitem__(self, field, value):
         self.array_list[field] = value
 
 my_arrays = {'age': age, 'weight': weight}
 data = FieldArray(my_arrays)
 
 data['age'] += 1

You can accomplish what your FieldArray class does using numpy dtypes:

import numpy as np
dt = np.dtype([('age', np.int32), ('weight', np.int32)])
N = int(10e6)
data = np.empty(N, dtype=dt)
data['age'] = np.random.randint(0, 99, N)
data['weight'] = np.random.randint(0, 200, N)

data['age'] += 1

Timing for recarrays (your code):

In [10]: timeit data.age += 1
10 loops, best of 3: 221 ms per loop

Timing for my example:

In [2]: timeit data['age']+=1
10 loops, best of 3: 150 ms per loop

Hope this helps.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] genloadtxt : last call

2008-12-17 Thread Ryan May
Pierre GM wrote:
 Ryan,
 OK, I'll look into that. I won't have time to address it before this  
 next week, however. Option #2 looks like the best.

No hurries, I just want to make sure I raise any issues I see while the
design is still up for change.

 In other news, I was considering renaming genloadtxt to genfromtxt,  
 and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the  
 function names. That way, loadtxt is untouched.

+1
I know I've changed my tune here, but at this point it seems like there's so
much more functionality here that calling it loadtxt would be a disservice to
how much the new function can do (and how much work you've done).

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] Unexpected MaskedArray behavior

2008-12-17 Thread Ryan May
Pierre GM wrote:
 On Dec 16, 2008, at 1:57 PM, Ryan May wrote:
 I just noticed the following and I was kind of surprised:

 >>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
 >>> b = a*5
 >>> b
 masked_array(data = [5 -- -- 20 25],
              mask = [False  True  True False False],
              fill_value=99)
 >>> b.data
 array([ 5, 10, 15, 20, 25])

 I was expecting that the underlying data wouldn't get modified while
 masked.  Is this actual behavior expected?
 
 Meh. Masked data shouldn't be trusted anyway, so I guess it doesn't
 really matter one way or the other.
 But I tend to agree, it'd make more sense to leave masked data untouched
 (or at least, reset them to their original value after the operation),
 which would mimic the behavior of gimp/photoshop.
 Looks like there's a relatively easy fix. I need time to check whether
 it doesn't break anything elsewhere, nor that it slows things down too
 much. I won't have time to test all that before next week, though. In
 any case, that would be for 1.3.x, not for 1.2.x.
 In the meantime, if you need the functionality, use something like
 ma.where(a.mask, a, a*5)

I agree that masked values probably shouldn't be trusted, I was just
surprised to see the behavior.  I just assumed that no operations were taking
place on masked values.

Just to clarify what I was doing here: I had a masked array of data, where
the mask was set by a variety of different masked values.  Later on in the
code, after doing some unit conversions, I went back to look at the raw data
to find points that had one particular masked value set.  Instead, I was
surprised to see all of the masked values had changed and I could no longer
find any of the special values in the data.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Unexpected MaskedArray behavior

2008-12-16 Thread Ryan May
Hi,

I just noticed the following and I was kind of surprised:

>>> a = ma.MaskedArray([1,2,3,4,5], mask=[False,True,True,False,False])
>>> b = a*5
>>> b
masked_array(data = [5 -- -- 20 25],
             mask = [False  True  True False False],
             fill_value=99)
>>> b.data
array([ 5, 10, 15, 20, 25])

I was expecting that the underlying data wouldn't get modified while masked.
Is this actual behavior expected?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] genloadtxt : last call

2008-12-16 Thread Ryan May
Pierre GM wrote:
 All,
 Here's the latest version of genloadtxt, with some recent corrections.
 With just a couple of tweaks, we end up with some decent speed: it's
 still slower than np.loadtxt, but only 15% so according to the test at
 the end of the package.

I have one more use issue that you may or may not want to fix.  My problem is
that missing values are specified by their string representation, so that a
string representing a missing value, while having the same actual numeric
value, may not compare equal when represented as a string.  For instance, if
you specify that -999.0 represents a missing value, but the value written to
the file is -999.00, you won't end up masking the -999.00 data point.  I'm
sure a test case will help here:

def test_withmissing_float(self):
    data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
    test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                    names=True)
    control = ma.array([(0, 1.5), (2, -1.)],
                       mask=[(False, False), (False, True)],
                       dtype=[('A', np.int), ('B', np.float)])
    print control
    print test
    assert_equal(test, control)
    assert_equal(test.mask, control.mask)

Right now this fails with the latest version of genloadtxt.  I've worked
around this by specifying a whole bunch of string representations of the
values, but I wasn't sure if you knew of a better way that this could be
handled within genloadtxt.  I can only think of two ways, though I'm not
thrilled with either:

1) Call the converter on the string form of the missing value and compare
against the converted value from the file to determine if missing. (Probably
very slow)

2) Add a list of objects (ints, floats, etc.) to compare against after
conversion to determine if they're missing, roughly as sketched below.  This
might needlessly complicate the function, which I know you've already taken
pains to optimize.
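
For what it's worth, the core of option 2 might look something like this (all
names hypothetical, not actual genloadtxt API):

# Compare *after* conversion, so '-999.0' and '-999.00' both hit the
# same float.
missing_converted = set([-999.0])

def convert(field):
    value = float(field)
    return value, value in missing_converted   # (value, ismasked)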

If there's no good way to do it, I'm content to live with a workaround.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] genloadtxt : last call

2008-12-08 Thread Ryan May
Pierre GM wrote:
 All,
 Here's the latest version of genloadtxt, with some recent corrections.
 With just a couple of tweaks, we end up with some decent speed: it's
 still slower than np.loadtxt, but only 15% so according to the test at
 the end of the package.
 
 And so, now what ? Should I put the module in numpy.lib.io ? Elsewhere ?
 
 Thx for any comment and suggestions.

Current version works out of the box for me.

Thanks for running point on this.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] genloadtxt: second serving

2008-12-04 Thread Ryan May
Pierre GM wrote:
 All,
 Here's the second round of genloadtxt. That's a tad cleaner version than 
 the previous one, where I tried to take  into account the different 
 comments and suggestions that were posted. So, tabs should be supported 
 and explicit whitespaces are not collapsed.

Looks pretty good, but there's one breakage against what I had working with
my local copy (with mods).  When adding the filtering of names read from the
file using usecols, there's a reason I set a flag and fixed it later:
converters specified by name.  If we have usecols and converters specified by
name, and we read the names from a file, we have the following sequence:

1) Read names
2) Convert usecols names to column numbers.
3) Filter name list using usecols. Indices of names list no longer map to
column numbers.
4) Change converters from mapping names->funcs to mapping col#->func using
indices from names... OOPS.

It's an admittedly complex combination, but it allows flexibly reading text
files since you're only basing on field names, not column numbers.  Here's a
test case:

def test_autonames_usecols_and_converter(self):
    "Tests names and usecols"
    data = StringIO.StringIO('A B C D\n aaaa 121 45 9.1')
    test = loadtxt(data, usecols=('A', 'C', 'D'), names=True,
                   dtype=None, converters={'C': lambda s: 2 * int(s)})
    control = np.array(('aaaa', 90, 9.1),
                       dtype=[('A', '|S4'), ('C', int), ('D', float)])
    assert_equal(test, control)

This fails with your current implementation, but works for me when I:

1) Set a flag when reading names from the header line in the file
2) Filter names from file using usecols (if the flag is true) *after*
remapping the converters.

There may be a better approach, but this is the simplest I've come up with so
far.

 FYI, in the __main__ section, you'll find 2 hotshot tests and a timeit
 comparison: same input, no missing data, one with genloadtxt, one with
 np.loadtxt and a last one with matplotlib.mlab.csv2rec.
 
 As you'll see, genloadtxt is roughly twice slower than np.loadtxt, but
 twice faster than csv2rec. One of the explanations for the slowness is
 indeed the use of classes for splitting lines and converting values.
 Instead of a basic function, we use the __call__ method of the class,
 which itself calls another function depending on the attribute values.
 I'd like to reduce this overhead, any suggestion is more than welcome,
 as usual.
 
 Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in
 numpy.ma, with an alias recfromcsv for John, using his defaults. Unless
 somebody comes up with a brilliant optimization.

Why only in numpy.ma and not somewhere in core numpy itself (missing 
values aside)?  You have a pretty good masked array agnostic wrapper 
that IMO could go in numpy, though maybe not as loadtxt.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-03 Thread Ryan May

Pierre GM wrote:
 I think that treating an explicitly-passed-in ' ' delimiter as
 identical to 'no delimiter' is a bad idea. If I say that ' ' is the
 delimiter, or '\t' is the delimiter, this should be treated *just*
 like ',' being the delimiter, where the expected output is:
 ['1', '2', '3', '4', '', '5']
 
 Valid point.
 Well, all, stay tuned for yet another yet another implementation...



Found a problem.  If you read the names from the file and specify usecols,
you end up with the first N names read from the file as the fields in your
output (where N is the number of entries in usecols), instead of having the
names of the columns you asked for.

For instance:

from StringIO import StringIO
from genload_proposal import loadtxt
f = StringIO('stid stnm relh tair\nnrmn 121 45 9.1')
loadtxt(f, usecols=('stid', 'relh', 'tair'), names=True, dtype=None)
array(('nrmn', 45, 9.0996),
      dtype=[('stid', '|S4'), ('stnm', '<i8'), ('relh', '<f8')])

What I want to come out is:

array(('nrmn', 45, 9.0996),
      dtype=[('stid', '|S4'), ('relh', '<i8'), ('tair', '<f8')])

I've attached a version that fixes this by setting a flag internally if the
names are read from the file.  If this flag is true, at the end the names are
filtered down to only the ones that are given in usecols.

I also have one other thought.  Is there any way we can make this handle
object arrays, or rather, a field containing objects, specifically datetime
objects?  Right now, this does not work because calling view does not work
for object arrays.  I'm just looking for a simple way to store date/time in
my record array (currently a string field).


Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

Proposal : 
Here's an extension to np.loadtxt, designed to take missing values into account.





import itertools
import numpy as np
import numpy.ma as ma


def _is_string_like(obj):
    """
    Check whether obj behaves like a string.
    """
    try:
        obj + ''
    except (TypeError, ValueError):
        return False
    return True

def _to_filehandle(fname, flag='r', return_opened=False):
    """
    Returns the filehandle corresponding to a string or a file.
    If the string ends in '.gz', the file is automatically unzipped.

    Parameters
    ----------
    fname : string, filehandle
        Name of the file whose filehandle must be returned.
    flag : string, optional
        Flag indicating the status of the file ('r' for read, 'w' for write).
    return_opened : boolean, optional
        Whether to return the opening status of the file.
    """
    if _is_string_like(fname):
        if fname.endswith('.gz'):
            import gzip
            fhd = gzip.open(fname, flag)
        elif fname.endswith('.bz2'):
            import bz2
            fhd = bz2.BZ2File(fname)
        else:
            fhd = file(fname, flag)
        opened = True
    elif hasattr(fname, 'seek'):
        fhd = fname
        opened = False
    else:
        raise ValueError('fname must be a string or file handle')
    if return_opened:
        return fhd, opened
    return fhd


def flatten_dtype(ndtype):
    """
    Unpack a structured data-type.
    """
    names = ndtype.names
    if names is None:
        return [ndtype]
    else:
        types = []
        for field in names:
            (typ, _) = ndtype.fields[field]
            flat_dt = flatten_dtype(typ)
            types.extend(flat_dt)
        return types


def nested_masktype(datatype):
    """
    Construct the dtype of a mask for nested elements.
    """
    names = datatype.names
    if names:
        descr = []
        for name in names:
            (ndtype, _) = datatype.fields[name]
            descr.append((name, nested_masktype(ndtype)))
        return descr
    # Is this some kind of composite a la (np.float, 2)?
    elif datatype.subdtype:
        mdescr = list(datatype.subdtype)
        mdescr[0] = np.dtype(bool)
        return tuple(mdescr)
    else:
        return np.bool


class LineSplitter:
    """
    Defines a function to split a string at a given delimiter or at given
    places.

    Parameters
    ----------
    comments : {'#', string}
        Character used to mark the beginning of a comment.
    delimiter : var
    """

    def __init__(self, delimiter=None, comments='#'):
        self.comments = comments
        # Delimiter is a character
        if delimiter is None:
            self._isfixed = False
            self.delimiter = None
        elif _is_string_like(delimiter):
            self._isfixed = False
            self.delimiter = delimiter.strip() or None
        # Delimiter is a list of field widths
        elif hasattr(delimiter, '__iter__'):
            self._isfixed = True
            idx = np.cumsum([0] + list(delimiter))
            self.slices = [slice(i, j) for (i, j) in zip(idx[:-1], idx[1:])]
        # Delimiter is a single integer
        elif int(delimiter):
            self._isfixed = True

Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Ryan May
Zachary Pincus wrote:
 Specifically, on line 115 in LineSplitter, we have:
     self.delimiter = delimiter.strip() or None
 so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
 to None, which then causes the default behavior of
 any-whitespace-is-delimiter to be used. This makes lines like
 "Gene Name\tPubMed ID\tStarting Position" get split wrong, even when I
 explicitly pass in '\t' as the delimiter!
 
 Similarly, I believe that some of the tests are formulated wrong:
     def test_nodelimiter(self):
         "Test LineSplitter w/o delimiter"
         strg = " 1 2 3 4  5 # test"
         test = LineSplitter(' ')(strg)
         assert_equal(test, ['1', '2', '3', '4', '5'])
 
 I think that treating an explicitly-passed-in ' ' delimiter as
 identical to 'no delimiter' is a bad idea. If I say that ' ' is the
 delimiter, or '\t' is the delimiter, this should be treated *just*
 like ',' being the delimiter, where the expected output is:
 ['1', '2', '3', '4', '', '5']
 
 At least, that's what I would expect. Treating contiguous blocks of
 whitespace as single delimiters is perfectly reasonable when None is
 provided as the delimiter, but when an explicit delimiter has been
 provided, it strikes me that the code shouldn't try to
 further-interpret it...
 
 Does anyone else have any opinion here?

I agree.  If the user explicity passes something as a delimiter, we 
should use it and not try to be too smart.

+1

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-02 Thread Ryan May
Pierre GM wrote:
 Well, looks like the attachment is too big, so here's the 
 implementation. The tests will come in another message.

A couple of quick nitpicks:

1) On line 186 (in the NameValidator class), you use excludelist.append() to
append a list to the end of a list.  I think you meant to use
excludelist.extend().
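
That is:

lst = ['a']
lst.append(['b', 'c'])   # -> ['a', ['b', 'c']]: the list gets nested
lst = ['a']
lst.extend(['b', 'c'])   # -> ['a', 'b', 'c']: one flat list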

2) When validating a list of names, why do you insist on lower-casing them?
(I'm referring to the call to lower() on line 207.)  On one hand, this would
seem nicer than all upper case, but on the other hand this can cause
confusion for someone who sees certain casing of names in the file and
expects that data to be laid out the same.

Other than those, it's working fine for me here.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2008-12-01 Thread Ryan May
Stéfan van der Walt wrote:
 Hi Pierre
 
 2008/12/1 Pierre GM [EMAIL PROTECTED]:
 * `genloadtxt` is the base function that makes all the work. It
 outputs 2 arrays, one for the data (missing values being substituted
 by the appropriate default) and one for the mask. It would go in
 np.lib.io
 
 I see the code length increased from 200 lines to 800.  This made me
 wonder about the execution time: initial benchmarks suggest a 3x
 slow-down.  Could this be a problem for loading large text files?  If
 so, should we consider keeping both versions around, or by default
 bypassing all the extra hooks?

I've wondered about this being an issue.  On one hand, you hate to make 
existing code noticeably slower.  On the other hand, if speed is 
important to you, why are you using ascii I/O?

I personally am not entirely against having two versions of loadtxt-like 
functions.  However, the idea seems a little odd, seeing as how loadtxt 
was already supposed to be the swiss army knife of text reading.

I'm seeing a similar slowdown with Pierre's version of the code.  The 
version of loadtxt that I cobbled together with the StringConverter 
class (and no missing value support) shows about a 50% slowdown, so 
clearly there's a performance penalty for trying to make a generic 
function that can be all things to all people.  On the other hand, this 
approach reduces code duplication.

I'm not really opinionated on what the right approach is here.  My only 
opinion is that this functionality *really* needs to be in numpy in some 
fashion.  For my own use case, with the old version, I could read a text 
file and by hand separate out columns and mask values.  Now, I open a 
file and get a structured array with an automatically detected dtype 
(names and types!) plus masked values.

My $0.02.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Ryan May
John Hunter wrote:
 On Tue, Nov 25, 2008 at 11:23 PM, Ryan May [EMAIL PROTECTED] wrote:
 
 Updated patch attached.  This includes:
  * Updated docstring
  * New tests
  * Fixes for previous issues
  * Fixes to make new tests actually work

 I appreciate any and all feedback.
 
 I'm having trouble applying your patch, so I haven't tested yet, but
 do you (and do you want to) handle a case like this::
 
 from StringIO import StringIO
 import matplotlib.mlab as mlab
 f1 = StringIO("""\
 name   age  weight
 John   23   145.
 Harry  43   180.""")
 
 for line in f1:
     print line.split(' ')
 
 
 Ie, space delimited but using an irregular number of spaces?   One
 place this comes up a lot is when  the output files are actually
 fixed-width using spaces to line up the columns.  One could count the
 columns to figure out the fixed widths and work with that, but it is
 much easier to simply assume space delimiting and handle the irregular
 number of spaces assuming one or more spaces is the delimiter.  In
 csv2rec, we write a custom file object to handle this case.
 
 Apologies if you are already handling this and I missed it...

I think line.split(None) handles this case, so *in theory* passing 
delimiter=None would do it.  I *am* interested in this case, so I'll 
have to give it a try when I get a chance. (I sense this is the same 
case as Manuel just asked about.)
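
Quick check at the interpreter, for reference:

line = 'John   23   145.'
print line.split(' ')    # ['John', '', '', '23', '', '', '145.']
print line.split(None)   # ['John', '23', '145.']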

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-26 Thread Ryan May
Manuel Metz wrote:
 Ryan May wrote:
 3) Better support for missing values.  The docstring mentions a way of
 handling missing values by passing in a converter.  The problem with this is
 that you have to pass in a converter for *every column* that will contain
 missing values.  If you have a text file with 50 columns, writing this
 dictionary of converters seems like ugly and needless boilerplate.  I'm
 unsure of how best to pass in both what values indicate missing values and
 what values to fill in their place.  I'd love suggestions
 
 Hi Ryan,
this would be a great feature to have !!!

Thanks for the support!

 One question: I have a datafile in ASCII format that uses a fixed width
 for each column. If no data is present, the space is left empty (see
 second row). What is the default behavior of the StringConverter class
 in this case? Does it ignore the empty entry by default? If so, what is
 the value in the array in this case? Is it nan?
 
 Example file:
 
 1| 123.4| -123.4| 00.0
 2|      |  234.7| 12.2
 

I don't think this is so much anything to do with StringConverter, but more
to do with how to split lines.  Maybe we should add an option that, instead
of simply specifying characters that delimit the fields, allows one to pass a
custom function to split lines?  That could be done either by overriding
`delimiter` or by adding a new option like `splitter` (see the sketch below).

I'll have to give that some thought.
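
Something like this is what I have in mind for the pipe-delimited case above
(the keyword name `splitter` is hypothetical, just to sketch the idea):

def pipe_splitter(line):
    return [field.strip() for field in line.split('|')]

# loadtxt(f, splitter=pipe_splitter, ...)   # hypothetical usage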

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Hi,

I have a couple more changes to loadtxt() that I'd like to code up in time
for 1.3, but I thought I should run them by the list before doing too much
work.  These are already implemented in some fashion in
matplotlib.mlab.csv2rec(), but the code bases are different enough, that
pretty much only the idea can be lifted.  All of these changes would be done
in a manner that is backwards compatible with the current API.

1) Support for setting the names of fields in the returned structured array
without using dtype.  This can be a passed in list of names or reading the
names of fields from the first line of the file.  Many files have a header
line that gives a name for each column.  Adding this would obviously make
loadtxt much more general and allow for more generic code, IMO. My current
thinking is to add a *name* keyword parameter that defaults to None, for no
support for reading names.  Setting it to True would tell loadtxt() to read
the names from the first line (after skiprows).  The other option would be
to set names to a list of strings.

2) Support for automatic dtype inference.  Instead of assuming all values
are floats, this would try a list of options until one worked.  For strings,
this would keep track of the longest string within a given field before
setting the dtype.  This would allow reading of files containing a mixture
of types much more easily, without having to go to the trouble of
constructing a full dtype by hand.  This would work alongside any custom
converters one passes in.  My current thinking of API would just be to add
the option of passing the string 'auto' as the dtype parameter.

3) Better support for missing values.  The docstring mentions a way of
handling missing values by passing in a converter.  The problem with this is
that you have to pass in a converter for *every column* that will contain
missing values.  If you have a text file with 50 columns, writing this
dictionary of converters seems like ugly and needless boilerplate.  I'm
unsure of how best to pass in both what values indicate missing values and
what values to fill in their place.  I'd love suggestions.

Here's an example of my use case (without 50 columns):

ID,First Name,Last Name,Homework1,Homework2,Quiz1,Homework3,Final
1234,Joe,Smith,85,90,,76,
5678,Jane,Doe,65,99,,78,
9123,Joe,Plumber,45,90,,92,

Currently reading in this data requires a bit of boilerplate (declaring
dtypes, converters).  While it's nothing I can't write, it would still be
easier to write it once within loadtxt and have it for everyone.
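
To make the discussion concrete, this is the shape of the call I'd like to be
able to write for the file above (every keyword name here is hypothetical,
not an existing loadtxt signature):

import numpy as np

data = np.loadtxt('grades.csv', delimiter=',', names=True, dtype='auto',
                  missing='', filling_values=-1)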

Any support for *any* of these ideas?  Any suggestions on how the user
should pass in the information?

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Ryan,
 FYI,  I've been coding over the last couple of weeks an extension of 
 loadtxt for a better support of masked data, with the option to read 
 column names in a header. Please find an example below (I also have 
 unittest). Most of the work is actually inspired from matplotlib's 
 mlab.csv2rec. It might be worth not duplicating efforts.
 Cheers,
 P.

Absolutely!  Definitely don't want to duplicate effort here.  What I see 
here meets a lot of what I was looking for.  Here are some questions:

1) It looks like the function returns a structured array rather than a 
rec array, so that fields are obtained by doing a dictionary access. 
Since it's a dictionary access, is there any reason that the header 
needs to be munged to replace characters and reserved names?  IIUC, 
csv2rec changes names b/c it returns a rec array, which uses attribute 
lookup and hence all names need to be valid python identifiers.  This is 
not the case for a structured array.

2) Can we avoid the use of seek() in here?  I just posted a patch to 
change the check to readline, which was the only file function used 
previously.  This allowed the direct use of a file-like object returned 
by urllib2.urlopen().

3) In order to avoid breaking backwards compatibility, can we change the
default for dtype to be float32, and instead use some kind of special value
('auto'?) to use the automatic dtype determination?

I'm currently cooking up some of these changes myself, but thought I 
would see what you thought first.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 On Nov 25, 2008, at 2:06 PM, Ryan May wrote:
 1) It looks like the function returns a structured array rather than a
 rec array, so that fields are obtained by doing a dictionary access.
 Since it's a dictionary access, is there any reason that the header
 needs to be munged to replace characters and reserved names?  IIUC,
 csv2rec changes names b/c it returns a rec array, which uses attribute
 lookup and hence all names need to be valid python identifiers.  This is
 not the case for a structured array.
 
 Personally, I prefer flexible ndarrays to recarrays, hence the output.
 However, I still think that names should be as clean as possible to
 avoid bad surprises down the road.

Ok, I'm not really partial to this, I just thought it would simplify. 
Your point is valid.

 2) Can we avoid the use of seek() in here?  I just posted a patch to
 change the check to readline, which was the only file function used
 previously.  This allowed the direct use of a file-like object returned
 by urllib2.urlopen().
 
 I coded that a couple of weeks ago, before you posted your patch and I
 didn't have time to check it. Yes, we could try getting rid of seek.
 However, we need to find a way to rewind to the beginning of the file
 if the dtypes are not given in input (as we parse the whole file to
 find the best converter in that case).

What about doing the parsing and type inference in a loop and holding onto
the already-split lines?  Then loop through the lines with the converters
that were finally chosen?  In addition to making my use case work, this has
the benefit of not doing the I/O twice.
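
Roughly this (a sketch only, assuming your StringConverter class is in
scope):

from StringIO import StringIO

fh = StringIO('1 2.5 aaa\n4 5.0 bbb\n')
rows = [line.split() for line in fh]    # pass 1: split once, keep the rows
converters = [StringConverter() for _ in rows[0]]
for row in rows:
    for conv, field in zip(converters, row):
        conv.upgrade(field)             # settle on a per-column converter
# pass 2: reuse the cached rows -- no second read of the file
data = [[conv(field) for conv, field in zip(converters, row)]
        for row in rows]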

 3) In order to avoid breaking backwards compatibility, can we change the
 default for dtype to be float32, and instead use some kind of special
 value ('auto'?) to use the automatic dtype determination?
 
 I'm not especially concerned w/ backwards compatibility, because we're
 supporting masked values (something that np.loadtxt shouldn't have to
 worry about). Initially, I needed a replacement for the fromfile
 function in the scikits.timeseries.trecords package. I figured it'd be
 easier and more portable to get a function for generic masked arrays,
 that could be adapted afterwards to timeseries. In any case, I was
 more considering the functions I sent you to be part of some
 numpy.ma.io module than a replacement for np.loadtxt. I tried to get
 the syntax as close as possible to np.loadtxt and mlab.csv2rec, but
 there'll always be some differences.
 
 So, yes, we could try to use a default dtype=float and yes, we could
 have an extra parameter 'auto'. But is it really that useful? I'm not
 sure (well, no, I'm sure it's not...)

I understand you're not concerned with backwards compatibility, but with the
exception of missing-value handling, which is probably specific to masked
arrays, I was hoping to just add functionality to loadtxt().  Numpy doesn't
need a separate text reader for most of this, and breaking API for any of it
is likely a non-starter.  So while, yes, having float be the default dtype is
probably not the most useful, leaving it also doesn't break existing code.

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
 On Nov 25, 2008, at 2:37 PM, Ryan May wrote:
 What about doing the parsing and type inference in a loop and holding
 onto the already split lines?  Then loop through the lines with the
 converters that were finally chosen?  In addition to making my usecase
 work, this has the benefit of not doing the I/O twice.
 
 You mean, filling a list and relooping on it if we need to? Sounds
 like a plan, but doesn't it create some extra temporaries we may not
 want?

It shouldn't create any *extra* temporaries since we already make a list of
lists before creating the final array.  It just introduces an extra looping
step. (I'd reuse the existing list of lists.)

 Depends on how we do it. We could have a modified np.loadtxt that
 takes some of the ideas of the file I sent you (the StringConverter,
 for example), then I could have a numpy.ma.io that would take care of
 the missing data. And something in scikits.timeseries for the dates...
 
 The new np.loadtxt could use the defaults of the initial one, or we
 could create yet another function (np.loadfromtxt) that would match
 what I was suggesting, and np.loadtxt would be a special stripped-down
 case with dtype=float by default.
 
 Thoughts?

My personal opinion is that if it doesn't make loadtxt too unwieldy, to just
add a few of the options to loadtxt() itself.  I'm working on tweaking
loadtxt() to add the auto dtype and the names, relying heavily on your
StringConverter class (nice code btw.).  If my understanding of
StringConverter is correct, tweaking the new loadtxt for ma or timeseries
would only require passing in modified versions of StringConverter.

I'll post that when I'm done and we can see if it looks like too much
functionality stapled together or not.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Nope, we still need to double check whether there's any missing data
 in any field of the line we process, independently of the conversion.
 So there must be some extra loop involved, and I'd need a special
 function in numpy.ma to take care of that. So our options are:
 * create a new function in numpy.ma and leave np.loadtxt like that
 * write a new np.loadtxt incorporating most of the ideas of the code I
   sent, but I'd still need to adapt it to support masked values.

You couldn't run this loop on the array returned by np.loadtxt() (by 
masking on the appropriate fill value)?
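
That is, something along these lines (filename hypothetical):

import numpy as np
import numpy.ma as ma

raw = np.loadtxt('data.txt')           # plain array, fill values in place
data = ma.masked_values(raw, -999.0)   # mask them after the fact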

 I'll post that when I'm done and we can see if it looks like too much
 functionality stapled together or not.
 
 Sounds like a plan. Wouldn't mind getting more feedback from fellow  
 users before we get too deep, however...

Agreed.  Anyone?

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May

Pierre GM wrote:
 Sounds like a plan. Wouldn't mind getting more feedback from fellow
 users before we get too deep, however...

Ok, I've attached, as a first cut, a diff against SVN HEAD that does (I
think) what I'm looking for.  It passes all of the old tests and passes my
own quick test.  A more rigorous test suite will follow, but I want this out
the door before I need to leave for the day.


What this changeset essentially does is just add support for automatic 
dtypes along with supplying/reading names for flexible dtypes.  It 
leverages StringConverter heavily, using a few tweaks so that old 
behavior is kept.  This is by no means a final version.


Probably the biggest change from what I mentioned earlier is that 
instead of dtype='auto', I've used dtype=None to signal the detection 
code, since dtype=='auto' causes problems.


I welcome any and all suggestions here, both on the code and on the 
original idea of adding these capabilities to loadtxt().


Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: lib/io.py
===================================================================
--- lib/io.py   (revision 6099)
+++ lib/io.py   (working copy)
@@ -233,29 +233,138 @@
         for name in todel:
             os.remove(name)
 
-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True
 
-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        return int(bool(value))
 
+class StringConverter(object):
+    """
+    Factory class for function transforming a string into another object (int,
+    float).
 
-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1
+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a missing
+    value, a default value is returned.
 
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the :attr:`func`
+        attribute is set to ``float`` and the default value to `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion
+    default : var
+        Default value to return when the input corresponds to a missing value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan # To avoid circular import
+    mapper = [(str2bool, None),
+              (lambda x: int(float(x)), -1),
+              (float, nan),
+              (complex, nan+0j),
+              (str, '???')]
+
+    def __init__(self, dtype=None, missing_values=None):
+        if dtype is None:
+            self.func = str2bool
+            self.default = None
+            self._status = 0
+        else:
+            dtype = np.dtype(dtype).type
+            self.func, self.default, self._status = self._get_from_dtype(dtype)
+
+        # Store the list of strings corresponding to missing values.
+        if missing_values is None:
+            self.missing_values = []
+        else:
+            self.missing_values = set(list(missing_values) + [''])
+
+    def __call__(self, value):
+        if value in self.missing_values:
+            return self.default
+        return self.func(value)
+
+    def upgrade(self, value):
+        """
+        Tries to find the best converter for `value`, by testing different
+        converters in order.
+        The order in which the converters are tested is read from the
+        :attr:`_status` attribute of the instance.
+        """
+        try:
+            self.__call__(value)
+        except ValueError:
+            _statusmax = len(self.mapper)
+            if self._status == _statusmax:
+                raise ValueError("Could not find a valid conversion function")
+            elif self._status < _statusmax - 1:
+                self._status += 1
+                (self.func, self.default) = self.mapper[self._status]
+                self.upgrade(value)
+
+    def _get_from_dtype(self, dtype):
+        """
+        Sets the :attr:`func

Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 Ryan,
 Quick comments:
 
 * I already have some unittests for StringConverter, check the file I 
 attach.

Ok, great.

 * Your str2bool will probably mess things up in upgrade compared to the
 one JDH had written (the one I sent you): you don't wanna use
 int(bool(value)), as it'll always give you 0 or 1 when you might need a
 ValueError

Ok, I wasn't sure.  I was trying to merge what the old code used with 
the new str2bool you supplied.  That's probably not all that necessary.

 * Your locked version of update won't probably work either, as you force 
 the converter to output a string (you set the status to largest 
 possible, that's the one that outputs strings). Why don't you set the 
 status to the current one (make a tmp one if needed).

Looking at the code, it looks like mapper is only used in the upgrade()
method.  My goal in setting status to the largest possible is to lock the
converter to the supplied function.  That way, for user-supplied converters,
the StringConverter doesn't try to upgrade away from it.  My thinking was
that if the user-supplied converter function fails, the user should know.
(Though I got this wrong the first time.)

 * I'd probably get rid of StringConverter._get_from_dtype, as it is not 
 needed outside the __init__. You may wanna stick to the original __init__.

Done.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May
Pierre GM wrote:
 On Nov 25, 2008, at 10:02 PM, Ryan May wrote:
 Pierre GM wrote:
 * Your locked version of update won't probably work either, as you
 force the converter to output a string (you set the status to largest
 possible, that's the one that outputs strings). Why don't you set the
 status to the current one (make a tmp one if needed).
 Looking at the code, it looks like mapper is only used in the upgrade()
 method. My goal by setting status to the largest possible is to lock the
 converter to the supplied function.  That way for the user supplied
 converters, the StringConverter doesn't try to upgrade away from it.  My
 thinking was that if the user supplied converter function fails, the
 user should know. (Though I got this wrong the first time.)

 Then, define a _locked attribute in StringConverter, and prevent
 upgrade from running if self._locked is True.

Sure if you're into logic and sound design.  I was going more for 
hackish and obtuse.

(No seriously, I don't know why I didn't think of that.)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


[Numpy-discussion] Minimum dtype

2008-11-25 Thread Ryan May
Hi,

I'm running on a 64-bit machine, and see the following:

>>> numpy.array(64.6).dtype
dtype('float64')

>>> numpy.array(64).dtype
dtype('int64')

Is there any function/setting to make these default to 32-bit types 
except where necessary? I don't mean by specifying dtype=numpy.float32 
or dtype=numpy.int32.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


Re: [Numpy-discussion] More loadtxt() changes

2008-11-25 Thread Ryan May

Pierre GM wrote:
 On Nov 25, 2008, at 10:02 PM, Ryan May wrote:
 Pierre GM wrote:
 * Your locked version of update won't probably work either, as you
 force the converter to output a string (you set the status to largest
 possible, that's the one that outputs strings). Why don't you set the
 status to the current one (make a tmp one if needed).
 Looking at the code, it looks like mapper is only used in the upgrade()
 method. My goal by setting status to the largest possible is to lock the
 converter to the supplied function.  That way for the user supplied
 converters, the StringConverter doesn't try to upgrade away from it.  My
 thinking was that if the user supplied converter function fails, the
 user should know. (Though I got this wrong the first time.)

Updated patch attached.  This includes:
 * Updated docstring
 * New tests
 * Fixes for previous issues
 * Fixes to make new tests actually work

I appreciate any and all feedback.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py (revision 6107)
+++ numpy/lib/io.py (working copy)
@@ -233,29 +233,136 @@
         for name in todel:
             os.remove(name)
 
-# Adapted from matplotlib
+def _string_like(obj):
+    try: obj + ''
+    except (TypeError, ValueError): return False
+    return True
 
-def _getconv(dtype):
-    typ = dtype.type
-    if issubclass(typ, np.bool_):
-        return lambda x: bool(int(x))
-    if issubclass(typ, np.integer):
-        return lambda x: int(float(x))
-    elif issubclass(typ, np.floating):
-        return float
-    elif issubclass(typ, np.complex):
-        return complex
+def str2bool(value):
+    """
+    Tries to transform a string supposed to represent a boolean to a boolean.
+
+    Raises
+    ------
+    ValueError
+        If the string is not 'True' or 'False' (case independent)
+    """
+    value = value.upper()
+    if value == 'TRUE':
+        return True
+    elif value == 'FALSE':
+        return False
     else:
-        return str
+        raise ValueError("Invalid boolean")
 
+class StringConverter(object):
+    """
+    Factory class for function transforming a string into another object (int,
+    float).
 
-def _string_like(obj):
-    try: obj + ''
-    except (TypeError, ValueError): return 0
-    return 1
+    After initialization, an instance can be called to transform a string
+    into another object. If the string is recognized as representing a missing
+    value, a default value is returned.
 
+    Parameters
+    ----------
+    dtype : dtype, optional
+        Input data type, used to define a basic function and a default value
+        for missing data. For example, when `dtype` is float, the :attr:`func`
+        attribute is set to ``float`` and the default value to `np.nan`.
+    missing_values : sequence, optional
+        Sequence of strings indicating a missing value.
+
+    Attributes
+    ----------
+    func : function
+        Function used for the conversion
+    default : var
+        Default value to return when the input corresponds to a missing value.
+    mapper : sequence of tuples
+        Sequence of tuples (function, default value) to evaluate in order.
+    """
+
+    from numpy.core import nan # To avoid circular import
+    mapper = [(str2bool, None),
+              (int, -1), # Needs to be int so that it can fail and promote
+                         # to float
+              (float, nan),
+              (complex, nan+0j),
+              (str, '???')]
+
+def __init__(self, dtype=None, missing_values=None):
+self._locked = False
+if dtype is None:
+self.func = str2bool
+self.default = None
+self._status = 0
+else:
+dtype = np.dtype(dtype).type
+if issubclass(dtype, np.bool_):
+(self.func, self.default, self._status) = (str2bool, 0, 0)
+elif issubclass(dtype, np.integer):
+#Needs to be int(float(x)) so that floating point values will
+#be coerced to int when specifid by dtype
+(self.func, self.default, self._status) = (lambda x: 
int(float(x)), -1, 1)
+elif issubclass(dtype, np.floating):
+(self.func, self.default, self._status) = (float, np.nan, 2)
+elif issubclass(dtype, np.complex):
+(self.func, self.default, self._status) = (complex, np.nan + 
0j, 3)
+else:
+(self.func, self.default, self._status) = (str, '???', -1)
+
+# Store the list of strings corresponding to missing values.
+if missing_values is None:
+self.missing_values = []
+else:
+self.missing_values = set(list(missing_values) + [''])
+
+def __call__(self, value):
+if value in self.missing_values:
+return self.default
+return
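
To illustrate the upgrade-and-lock behavior under discussion, here is a
minimal, self-contained sketch (the MiniConverter name and the trimmed
mapper are made up for illustration; this is not the patched numpy code):

def str2bool(value):
    value = value.upper()
    if value == 'TRUE':
        return True
    elif value == 'FALSE':
        return False
    raise ValueError("Invalid boolean")

# Ladder of (converter, default) pairs, from most to least specific.
mapper = [(str2bool, None), (int, -1), (float, float('nan')), (str, '???')]

class MiniConverter(object):
    def __init__(self, locked=False):
        self.status = 0        # index into mapper
        self.locked = locked   # locked converters never upgrade

    def __call__(self, value):
        func = mapper[self.status][0]
        try:
            return func(value)
        except ValueError:
            if self.locked or self.status == len(mapper) - 1:
                raise          # surface failures of a locked converter
            self.status += 1   # upgrade to the next, more general type
            return self(value)

conv = MiniConverter()
print conv('1')      # str2bool fails, upgrades to int -> 1
print conv('1.5')    # int fails, upgrades to float -> 1.5
print conv('abc')    # float fails, upgrades to str -> 'abc'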

[Numpy-discussion] numpy.loadtxt requires seek()?

2008-11-20 Thread Ryan May
Hi,

Does anyone know why numpy.loadtxt(), in checking the validity of a
filehandle, checks for the seek() method, which appears to have no
bearing on whether an object will work?

I'm trying to use loadtxt() directly with the file-like object returned
by urllib2.urlopen().  If I change the check for 'seek' to one for
'readline', using the urlopen object works without a hitch.

As far as I can tell, all the filehandle object needs to meet is:

1) Have a readline() method so that loadtxt can skip the first N lines
and read the first line of data

2) Be compatible with itertools.chain() (should be any iterable)

At a minimum, I'd ask to change the check for 'seek' to one for 'readline'.

On a bit deeper thought, it would seem that loadtxt would work with any
iterable that returns individual lines.  I'd like then to change the
calls to readline() to just getting the next object from the iterable
(iter.next() ?) and change the check for a file-like object to just a
check for an iterable.  In fact, we could use the iter() builtin to
convert whatever got passed.  That would automatically give a next()
method and would raise a TypeError if it's incompatible.
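
A rough sketch of that idea (line_iterator is a hypothetical helper
name, not the eventual patch):

def line_iterator(fname):
    if isinstance(fname, str):
        fname = open(fname)
    return iter(fname)        # raises TypeError if fname isn't iterable

lines = line_iterator(['1 2 3\n', '4 5 6\n'])
first = lines.next()          # replaces the fh.readline() call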

Thoughts?  I'm willing to write up the patch for either approach.
Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.loadtxt requires seek()?

2008-11-20 Thread Ryan May
Stéfan van der Walt wrote:
 2008/11/20 Ryan May [EMAIL PROTECTED]:
 Does anyone know why numpy.loadtxt(), in checking the validity of a
 filehandle, checks for the seek() method, which appears to have no
 bearing on whether an object will work?
 
 I think this is simply a naive mistake on my part.  I was looking for
 a way to identify files; your patch would be welcome.

I've attached a simple patch that changes the check for seek() to a
check for readline().  I'll punt on my idea of just using iterators,
since that seems like slightly greater complexity for no gain. (I'm not
sure how many people end up with data in a list of strings and wish they
could pass that to loadtxt).

While you're at it, would you commit my patch to add support for bzipped
files as well (attached)?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py (revision 5953)
+++ numpy/lib/io.py (working copy)
@@ -253,8 +253,8 @@
     Parameters
     ----------
     fname : file or string
-        File or filename to read.  If the filename extension is ``.gz``,
-        the file is first decompressed.
+        File or filename to read.  If the filename extension is ``.gz`` or
+        ``.bz2``, the file is first decompressed.
     dtype : data-type
         Data type of the resulting array.  If this is a record data-type,
         the resulting array will be 1-dimensional, and each row will be
@@ -320,6 +320,9 @@
         if fname.endswith('.gz'):
             import gzip
             fh = gzip.open(fname)
+        elif fname.endswith('.bz2'):
+            import bz2
+            fh = bz2.BZ2File(fname)
         else:
             fh = file(fname)
     elif hasattr(fname, 'seek'):
Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py (revision 6085)
+++ numpy/lib/io.py (working copy)
@@ -333,7 +333,7 @@
             fh = gzip.open(fname)
         else:
             fh = file(fname)
-    elif hasattr(fname, 'seek'):
+    elif hasattr(fname, 'readline'):
         fh = fname
     else:
         raise ValueError('fname must be a string or file handle')
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Matlib docstring typos

2008-11-12 Thread Ryan May
Hi,

Here's a quick diff to fix some typos in the docstrings for matlib.zeros
and matlib.ones.  They're causing 2 (of many) failures in the doctests
for me on SVN HEAD.

Filed in trac as #953
(http://www.scipy.org/scipy/numpy/ticket/953)

(Unless someone wants to give me SVN rights for fixing/adding small
things like this.)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Matlib docstring typos

2008-11-12 Thread Ryan May
Pauli Virtanen wrote:
 Hi,
 
 Wed, 12 Nov 2008 10:16:35 -0600, Ryan May wrote:
 Here's a quick diff to fix some typos in the docstrings for matlib.zeros
 and matlib.ones.  They're causing 2 (of many) failures in the doctests
 for me on SVN HEAD.
 
 There are probably bound to be more of these. It's possible to fix them 
 using this:
 
   http://docs.scipy.org/numpy/
   http://docs.scipy.org/numpy/docs/numpy.matlib.zeros/
   http://docs.scipy.org/numpy/docs/numpy.matlib.ones/
 
 The changes will propagate from there eventually to SVN, alongside all 
 other documentation improvements.
 
Great, can someone get me edit access?

User: rmay

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] setting element

2008-11-12 Thread Ryan May
Charles سمير Doutriaux wrote:
 Hello,
 
 I'm wondering if there's a quick way to do the following:
 
 s[:,5]=value
 
 in a general function
 def setval(array,index,value,axis=0):
   ## code here

Assuming that axis specifies where the index goes, that would be:

def setval(array, index, value, axis=0):
    slices = [slice(None)] * len(array.shape)
    slices[axis] = index
    array[slices] = value

(Adapted from the code for numpy.diff)
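
For example (a quick sketch; with axis=1 this matches the s[:,5] case):

import numpy as np
a = np.zeros((4, 8))
setval(a, 5, 1, axis=1)    # same effect as a[:, 5] = 1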

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy array change notifier?

2008-10-27 Thread Ryan May
Brent Pedersen wrote:
 On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern [EMAIL PROTECTED] wrote:
 On Mon, Oct 27, 2008 at 15:54, Erik Tollerud [EMAIL PROTECTED] wrote:
 Is there any straightforward way of notifying on change of a numpy
 array that leaves the numpy arrays still efficient?
 Not currently, no.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

 
 out of curiosity,
 would something like this affect efficiency (and/or work):
 
 class Notify(numpy.ndarray):
     def __setitem__(self, *args):
         self.notify(*args)
         return super(Notify, self).__setitem__(*args)
 
     def notify(self, *args):
         print 'notify:', args
 
 
 with also overriding setslice?

I haven't given this much thought, but you'd also likely need to do this
for the infix operators (+=, etc.).
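
An untested sketch of what that might look like, extending the class
above with one of the in-place operators:

import numpy

class Notify(numpy.ndarray):
    def __setitem__(self, *args):
        self.notify('setitem', *args)
        return super(Notify, self).__setitem__(*args)

    def __iadd__(self, other):               # one override per infix op
        self.notify('iadd', other)
        return super(Notify, self).__iadd__(other)

    def notify(self, *args):
        print 'notify:', args

a = numpy.zeros(3).view(Notify)
a += 1       # triggers notify('iadd', 1)
a[0] = 5     # triggers notify('setitem', 0, 5)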

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Loadtxt .bz2 support

2008-10-22 Thread Ryan May
Charles R Harris wrote:
 On Tue, Oct 21, 2008 at 1:30 PM, Ryan May [EMAIL PROTECTED] wrote:
 
 Hi,

 I noticed numpy.loadtxt has support for gzipped text files, but not for
 bz2'd files.  Here's a 3 line patch to add bzip2 support to loadtxt.

 Ryan

 --
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma

 Index: numpy/lib/io.py
 ===================================================================
 --- numpy/lib/io.py (revision 5953)
 +++ numpy/lib/io.py (working copy)
 @@ -320,6 +320,9 @@
          if fname.endswith('.gz'):
              import gzip
              fh = gzip.open(fname)
 +        elif fname.endswith('.bz2'):
 +            import bz2
 +            fh = bz2.BZ2File(fname)
          else:
              fh = file(fname)
      elif hasattr(fname, 'seek'):

 
 Could you open a ticket for this? Mark it as an enhancement.
 

Done. #940

http://scipy.org/scipy/numpy/ticket/940

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Loadtxt .bz2 support

2008-10-21 Thread Ryan May
Hi,

I noticed numpy.loadtxt has support for gzipped text files, but not for
bz2'd files.  Here's a 3 line patch to add bzip2 support to loadtxt.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Index: numpy/lib/io.py
===================================================================
--- numpy/lib/io.py (revision 5953)
+++ numpy/lib/io.py (working copy)
@@ -320,6 +320,9 @@
         if fname.endswith('.gz'):
             import gzip
             fh = gzip.open(fname)
+        elif fname.endswith('.bz2'):
+            import bz2
+            fh = bz2.BZ2File(fname)
         else:
             fh = file(fname)
     elif hasattr(fname, 'seek'):
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Need **working** code example of 2-D arrays

2008-10-13 Thread Ryan May
Stéfan van der Walt wrote:
 Linda,
 
 2008/10/13 Linda Seltzer [EMAIL PROTECTED]:
  Those statements are not demeaning; lighten up.
 STOP IT.  JUST STOP IT.  STOP IT RIGHT NOW.
 Is there a moderator on the list to put a stop to these kinds of statements?
 I deserve to be treated with respect.
 I deserve to have my questions treated with respect.
 I deserve to receive technical information without personal attacks.
 
 I think you'll be hard pressed to find a more friendly, open and
 relaxed mailing list than this one.  We're like having piña coladas
 while we type.  That said, keep in mind that you are asking
 professionals to donate *their* valuable time to solve *your* problem.
  They gladly do so, but at the same time they try to be efficient; so
 if you sometimes receive a curt answer, it certainly wasn't meant to
 be rude.  Many of us also sprinkle our responses with a liberal dose
 of Tongue In Cheek :)
 
 It looks like you received some good answers to your question, but let
 us know if your problems persist and we'll help you sort it out.

Well said.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Array shape

2008-10-03 Thread Ryan May
Kelly Vincent wrote:
 I'm using Numpy to do some basic array manipulation, and I'm getting
 some unexpected behavior from shape. Specifically, I have some 3x3 and
 2x2 matrices, and shape gives me (5, 3) and (3, 2) for their respective
 sizes. I was expecting (3, 3) and (2, 2), for number of rows, number of
 columns. I'm assuming I must either be misunderstanding what shape gives
 you or doing something wrong. Can anybody give me any advice? I'm using
 Python 2.5 and Numpy 1.1.0.

Can you post a complete, minimal example that shows the problem you
have?  For an array object A, A.shape should give the shape you're
expecting.
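
For instance:

>>> import numpy as np
>>> A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> A.shape
(3, 3)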

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] profiling line by line

2008-09-18 Thread Ryan May
Ondrej Certik wrote:
 On Thu, Sep 18, 2008 at 1:01 PM, Robert Cimrman [EMAIL PROTECTED] wrote:

 It requires Cython and a C compiler to build. I'm still debating
 myself about the desired workflow for using it, but for now, it only
 profiles functions which you have registered with it. I have made the
 profiler work as a decorator to make this easy. E.g.,
 many thanks for this! I have wanted to try out the profiler but failed
 to build it (changeset 6 0de294aa75bf):

 $ python setup.py install --root=/home/share/software/
 running install
 running build
 running build_py
 creating build
 creating build/lib.linux-i686-2.4
 copying line_profiler.py -> build/lib.linux-i686-2.4
 running build_ext
 cythoning _line_profiler.pyx to _line_profiler.c
 building '_line_profiler' extension
 creating build/temp.linux-i686-2.4
 i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
 -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c _line_profiler.c
 -o build/temp.linux-i686-2.4/_line_profiler.o
 _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
 function)
 error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1

 I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
 
 I am telling you all the time Robert to use Debian that it just works
 and you say, no no, gentoo is the best. :)

And what's wrong with that? :)  Once you get over the learning curve, 
Gentoo works just fine.  Must be Robert K.'s fault. :)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A bug in loadtxt and how to convert a string array (hex data) to decimal?

2008-09-18 Thread Ryan May
frank wang wrote:
 Hi, All,
 
 I have found a bug in the loadtxt function. Here is the example. The
 file name is test.txt and contains:
 
 Thist is test
 3FF 3fE
 3Ef 3e8
 3Df 3d9
 3cF 3c7
 
 In the Python 2.5.2, I type:
 
 test=loadtxt('test.txt',comments='',dtype='string',converters={0:lambda
 s:int(s,16)})
 
 test will contain
 
 array([['102', '3fE'],
        ['100', '3e8'],
        ['991', '3d9'],
        ['975', '3c7']],
       dtype='|S3')

It's because of how numpy handles string arrays (which I admit I don't 
understand very well.)  Basically, it's converting the numbers properly, 
but truncating them to 3 characters.  Try this, which just forces it to 
expand to strings 4 characters wide:

test=loadtxt('test.txt',comments='',dtype='|S4',
             converters={0:lambda s:int(s,16)})
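
If you actually want the numeric values rather than fixed-width strings,
a sketch along these lines (assuming every column is hex, and skipping
the header line instead of using comments) may be closer to what you want:

conv = dict((col, lambda s: int(s, 16)) for col in (0, 1))
test = loadtxt('test.txt', skiprows=1, dtype=int, converters=conv)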

HTH,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] BUG in numpy.loadtxt?

2008-09-05 Thread Ryan May
David Huard wrote:
 Hi Ryan,
 
 I applied your patch in r5788 on the trunk.
 I noticed there was another bug occurring when both converters and 
 usecols are provided.
 I've added regression tests for both bugs. Could you confirm that 
 everything is fine on your side ?
 

I can confirm that it works fine for me.  Can you or someone else 
backport this to the 1.2 branch so that this bug is fixed in the next 
release?

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] BUG in numpy.loadtxt?

2008-09-05 Thread Ryan May
Thanks a bunch for getting these done.

David Huard wrote:
 Done in r5790.
 
 On Fri, Sep 5, 2008 at 12:36 PM, Ryan May [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:
 
 David Huard wrote:
   Hi Ryan,
  
   I applied your patch in r5788 on the trunk.
   I noticed there was another bug occurring when both converters and
   usecols are provided.
   I've added regression tests for both bugs. Could you confirm that
   everything is fine on your side ?
  
 
 I can confirm that it works fine for me.  Can you or someone else
 backport this to the 1.2 branch so that this bug is fixed in the next
 release?
 
 Thanks,
 
 Ryan
 
 --
 Ryan May
 Graduate Research Assistant
 School of Meteorology
 University of Oklahoma
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org mailto:Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 
 
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] BUG in numpy.loadtxt?

2008-09-04 Thread Ryan May
Stefan (or anyone else who can comment),

It appears that the usecols argument to loadtxt no longer accepts numpy 
arrays:

>>> from StringIO import StringIO
>>> text = StringIO('1 2 3\n4 5 6\n')
>>> data = np.loadtxt(text, usecols=np.arange(1,3))

ValueError                                Traceback (most recent call last)

/usr/lib64/python2.5/site-packages/numpy/lib/io.py in loadtxt(fname, 
dtype, comments, delimiter, converters, skiprows, usecols, unpack)
    323         first_line = fh.readline()
    324         first_vals = split_line(first_line)
--> 325     N = len(usecols or first_vals)
    326
    327     dtype_types = flatten_dtype(dtype)

ValueError: The truth value of an array with more than one element is 
ambiguous. Use a.any() or a.all()

>>> data = np.loadtxt(text, usecols=np.arange(1,3).tolist())
>>> data
array([[ 2.,  3.],
       [ 5.,  6.]])

Was it a conscious design decision that the usecols no longer accept 
arrays? The new behavior (in 1.1.1) breaks existing code that one of my 
colleagues has.  Can we get a patch in before 1.2 to get this working 
with arrays again?

Thanks,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] BUG in numpy.loadtxt?

2008-09-04 Thread Ryan May
Travis E. Oliphant wrote:
 Ryan May wrote:
 Stefan (or anyone else who can comment),

 It appears that the usecols argument to loadtxt no longer accepts numpy 
 arrays:
   
 
 Could you enter a ticket so we don't lose track of this.  I don't 
 remember anything being intentional.
 

Done: #905
http://scipy.org/scipy/numpy/ticket/905

I've attached a patch that does the obvious and coerces usecols to a 
list when it's not None, so it will work for any iterable.

I don't think it was a conscious decision, just a consequence of the 
rewrite using different methods.  There are two problems:

1) It's an API break, technically speaking
2) It currently doesn't even accept tuples, which are used in the docstring.

Can we hurry and get this into 1.2?

Thanks,
Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] min() of array containing NaN

2008-08-15 Thread Ryan May

 Availability of the NaN functionality in a method of ndarray

 The last point is key.  The NaN behavior is central to analyzing real
 data containing unavoidable bad values, which is the bread and butter
 of a substantial fraction of the user base.  In the languages they're
 switching from, handling NaNs is just part of doing business, and is
 an option of every relevant routine; there's no need for redundant
 sets of routines.  In contrast, numpy appears to consider data
 analysis to be secondary, somehow, to pure math, and takes the NaN
 functionality out of routines like min() and std().  This means it's
 not possible to use many ndarray methods.  If we're ready to handle a
 NaN by returning it, why not enable the more useful behavior of
 ignoring it, at user discretion?


Maybe I missed this somewhere, but this seems like a better use for masked
arrays, not NaN's.  Masked arrays were specifically designed to add
functions that work well with masked/invalid data points.  Why reinvent the
wheel here?
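
For instance (a small sketch, assuming ma.masked_invalid is available in
your numpy):

import numpy as np
import numpy.ma as ma

data = ma.masked_invalid([1.0, np.nan, 3.0])
print data.min(), data.std()   # the NaN is ignored: 1.0 1.0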

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 1.1.1rc1 to be tagged tonight

2008-07-20 Thread Ryan May

Jarrod Millman wrote:

Hello,

This is a reminder that 1.1.1rc1 will be tagged tonight.  Chuck is
planning to spend some time today fixing a few final bugs on the 1.1.x
branch.  If anyone else is planning to commit anything to the 1.1.x
branch today, please let me know immediately.  Obviously now is not
the time to commit anything to the branch that could break anything,
so please be extremely careful if you have to touch the branch.

Once the release is tagged, Chris and David will create binary
installers for both Windows and Mac.  Hopefully, this will give us an
opportunity to have much more widespread testing before releasing
1.1.1 final at the end of the month.


Can I get anyone to look at this patch for loadtxt()?

I was trying to use loadtxt() today to read in some text data, and I had
a problem when I specified a dtype that only contained as many elements
as in columns in usecols.  The example below shows the problem:

import numpy as np
import StringIO
data = '''STID RELH TAIR
JOE 70.1 25.3
BOB 60.5 27.9
'''
f = StringIO.StringIO(data)
names = ['stid', 'temp']
dtypes = ['S4', 'f8']
arr = np.loadtxt(f, usecols=(0,2),dtype=zip(names,dtypes), skiprows=1)

With current 1.1 (and SVN head), this yields:

IndexError                                Traceback (most recent call last)

/home/rmay/<ipython console> in <module>()

/usr/lib64/python2.5/site-packages/numpy/lib/io.pyc in loadtxt(fname,
dtype, comments, delimiter, converters, skiprows, usecols, unpack)
    309                         for j in xrange(len(vals))]
    310         if usecols is not None:
--> 311             row = [converterseq[j](vals[j]) for j in usecols]
    312         else:
    313             row = [converterseq[j](val) for j,val in
enumerate(vals)]

IndexError: list index out of range
-

I've added a patch that checks for usecols, and if present, correctly
creates the converters dictionary to map each specified column with
converter for the corresponding field in the dtype. With the attached
patch, this works fine:

>>> arr
array([('JOE', 25.301), ('BOB', 27.899)],
      dtype=[('stid', '|S4'), ('temp', '<f8')])


Thanks,
Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

--- io.py.bak	2008-07-18 18:12:17.000000000 -0400
+++ io.py	2008-07-16 22:49:13.000000000 -0400
@@ -292,8 +292,13 @@
     if converters is None:
         converters = {}
     if dtype.names is not None:
-        converterseq = [_getconv(dtype.fields[name][0]) \
-                        for name in dtype.names]
+        if usecols is None:
+            converterseq = [_getconv(dtype.fields[name][0]) \
+                            for name in dtype.names]
+        else:
+            converters.update([(col,_getconv(dtype.fields[name][0])) \
+                               for col,name in zip(usecols, dtype.names)])
+
 
     for i,line in enumerate(fh):
         if i<skiprows: continue
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Masked array fill_value

2008-07-19 Thread Ryan May
Hi,

I just noticed this and found it surprising:

In [8]: from numpy import ma

In [9]: a = ma.array([1,2,3,4],mask=[False,False,True,False],fill_value=0)

In [10]: a
Out[10]:
masked_array(data = [1 2 -- 4],
   mask = [False False  True False],
   fill_value=0)


In [11]: a[2]
Out[11]:
masked_array(data = --,
   mask = True,
   fill_value=1e+20)

In [12]: np.__version__
Out[12]: '1.1.0'

Is there a reason that the fill_value isn't inherited from the parent array?

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Masked array fill_value

2008-07-19 Thread Ryan May
Eric Firing wrote:
 Ryan May wrote:
 Hi,

 I just noticed this and found it surprising:

 In [8]: from numpy import ma

 In [9]: a = ma.array([1,2,3,4],mask=[False,False,True,False],fill_value=0)

 In [10]: a
 Out[10]:
 masked_array(data = [1 2 -- 4],
mask = [False False  True False],
fill_value=0)


 In [11]: a[2]
 Out[11]:
 masked_array(data = --,
mask = True,
fill_value=1e+20)

 In [12]: np.__version__
 Out[12]: '1.1.0'

 Is there a reason that the fill_value isn't inherited from the parent array?
 
 There was a thread about this a couple months ago, and Pierre GM 
 explained it.  I think the point was that indexing is giving you a new 
 masked scalar, which is therefore taking the default mask value of the 
 type.  I don't see it as a problem; you can always specify the fill 
 value explicitly when you need to.

I thought it sounded familiar.  You're right, it's not a big problem, it 
just seemed unintuitive.  Thanks for the explaination.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.loadtxt() fails with dtype + usecols

2008-07-18 Thread Ryan May

Hi,

I was trying to use loadtxt() today to read in some text data, and I had 
a problem when I specified a dtype that only contained as many elements 
as in columns in usecols.  The example below shows the problem:


import numpy as np
import StringIO
data = '''STID RELH TAIR
JOE 70.1 25.3
BOB 60.5 27.9
'''
f = StringIO.StringIO(data)
names = ['stid', 'temp']
dtypes = ['S4', 'f8']
arr = np.loadtxt(f, usecols=(0,2),dtype=zip(names,dtypes), skiprows=1)

With current 1.1 (and SVN head), this yields:

IndexError                                Traceback (most recent call last)

/home/rmay/<ipython console> in <module>()

/usr/lib64/python2.5/site-packages/numpy/lib/io.pyc in loadtxt(fname, 
dtype, comments, delimiter, converters, skiprows, usecols, unpack)

    309                         for j in xrange(len(vals))]
    310         if usecols is not None:
--> 311             row = [converterseq[j](vals[j]) for j in usecols]
    312         else:
    313             row = [converterseq[j](val) for j,val in 
enumerate(vals)]


IndexError: list index out of range
--

I've added a patch that checks for usecols, and if present, correctly 
creates the converters dictionary to map each specified column with 
converter for the corresponding field in the dtype. With the attached 
patch, this works fine:


>>> arr
array([('JOE', 25.301), ('BOB', 27.899)],
      dtype=[('stid', '|S4'), ('temp', '<f8')])

Comments?  Can I get this in for 1.1.1?

Thanks,

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
--- io.py.bak	2008-07-18 18:12:17.000000000 -0400
+++ io.py	2008-07-16 22:49:13.000000000 -0400
@@ -292,8 +292,13 @@
     if converters is None:
         converters = {}
     if dtype.names is not None:
-        converterseq = [_getconv(dtype.fields[name][0]) \
-                        for name in dtype.names]
+        if usecols is None:
+            converterseq = [_getconv(dtype.fields[name][0]) \
+                            for name in dtype.names]
+        else:
+            converters.update([(col,_getconv(dtype.fields[name][0])) \
+                               for col,name in zip(usecols, dtype.names)])
+
 
     for i,line in enumerate(fh):
         if i<skiprows: continue
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A correction to numpy trapz function

2008-07-12 Thread Ryan May
Nadav Horesh wrote:
 The function trapz accepts x axis vector only for axis=-1. Here is my 
 modification (correction?) to let it accept a vector x for integration along 
 any axis:
 
 def trapz(y, x=None, dx=1.0, axis=-1):
     """
     Integrate y(x) using samples along the given axis and the composite
     trapezoidal rule.  If x is None, spacing given by dx is assumed. If x
     is an array, it must have either the dimensions of y, or a vector of
     length matching the dimension of y along the integration axis.
     """
     y = asarray(y)
     nd = y.ndim
     slice1 = [slice(None)]*nd
     slice2 = [slice(None)]*nd
     slice1[axis] = slice(1,None)
     slice2[axis] = slice(None,-1)
     if x is None:
         d = dx
     else:
         x = asarray(x)
         if x.ndim == 1:
             if len(x) != y.shape[axis]:
                 raise ValueError('x length (%d) does not match y axis %d length (%d)' % (len(x), axis, y.shape[axis]))
             d = diff(x)
             return tensordot(d, (y[slice1]+y[slice2])/2.0, (0, axis))
         d = diff(x, axis=axis)
     return add.reduce(d * (y[slice1]+y[slice2])/2.0, axis)
 

What version were you working with originally? With 1.1, this is what I 
have:

def trapz(y, x=None, dx=1.0, axis=-1):
    """Integrate y(x) using samples along the given axis and the composite
    trapezoidal rule.  If x is None, spacing given by dx is assumed.
    """
    y = asarray(y)
    if x is None:
        d = dx
    else:
        d = diff(x,axis=axis)
    nd = len(y.shape)
    slice1 = [slice(None)]*nd
    slice2 = [slice(None)]*nd
    slice1[axis] = slice(1,None)
    slice2[axis] = slice(None,-1)
    return add.reduce(d * (y[slice1]+y[slice2])/2.0,axis)

For me, this works fine with supplying x for axis != -1.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A correction to numpy trapz function

2008-07-12 Thread Ryan May
Nadav Horesh wrote:
 Here is what I get with the orriginal trapz function:
 
 IDLE 1.2.2
 >>> import numpy as np
 >>> np.__version__
 '1.1.0'
 >>> y = np.arange(24).reshape(6,4)
 >>> x = np.arange(6)
 >>> np.trapz(y, x, axis=0)
 
 Traceback (most recent call last):
   File "<pyshell#4>", line 1, in <module>
     np.trapz(y, x, axis=0)
   File "C:\Python25\Lib\site-packages\numpy\lib\function_base.py", line 1536,
 in trapz
     return add.reduce(d * (y[slice1]+y[slice2])/2.0,axis)
 ValueError: shape mismatch: objects cannot be broadcast to a single shape
 
(Try not to top post on this list.)

I can get it to work like this:

import numpy as np
y = np.arange(24).reshape(6,4)
x = np.arange(6).reshape(-1,1)
np.trapz(y, x, axis=0)

From the text of the error message, you can see this is a problem with 
broadcasting.  Due to broadcasting rules (which will *prepend* 
dimensions with size 1), you need to manually add an extra dimension to 
the end.  Once I resize x, I can get this to work.  You might want to 
look at this: http://www.scipy.org/EricsBroadcastingDoc
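
Equivalently, with np.newaxis instead of reshape (same arrays as above):

x = np.arange(6)[:, np.newaxis]   # shape (6, 1) broadcasts against (6, 4)
np.trapz(y, x, axis=0)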

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Doctest items

2008-07-01 Thread Ryan May
Robert Kern wrote:
 On Tue, Jul 1, 2008 at 17:50, Fernando Perez [EMAIL PROTECTED] wrote:
 On Tue, Jul 1, 2008 at 1:41 PM, Pauli Virtanen [EMAIL PROTECTED] wrote:

 But it's a custom tweak to doctest, so it might break at some point in
 the future, and I don't love the monkeypatching here...
 Welcome to the joys of extending doctest/unittest.  They hardcoded so
 much stuff in there that the only way to reuse that code is by
 copy/paste/monkeypatch.  It's absolutely atrocious.

 We could always just make the plotting section one of those "it's just
 an example not a doctest" things and remove the >>> (since it doesn't
 appear to provide any useful test coverage or anything).
 If possible, I'd like other possibilities be considered first before
 jumping this route. I think it would be nice to retain the ability to run
 also the matplotlib examples as (optional) doctests, to make sure also
 they execute correctly. Also, using two different markups in the
 documentation to work around a shortcoming of doctest is IMHO not very
 elegant.
 How about a much simpler approach?  Just pre-populate the globals dict
 where doctest executes with an object called 'plt' that basically does

 def noop(*a,**k): pass
 
 class dummy():
     def __getattr__(self,k): return noop
 
 plt = dummy()

 This would ensure that all calls to plt.anything() silently succeed in
 the doctests.  Granted, we're not testing matplotlib, but it has the
 benefit of simplicity and of letting us keep consistent formatting,
 and examples that *users* can still paste into their sessions where
 plt refers to the real matplotlib.
 
 It's actually easier for users to paste the non-doctestable examples
 since they don't have the >>> markers and any stdout the examples
 produce as a byproduct.
 

I'm with Robert here.  It's definitely easier as an example without the
>>>.  I also don't see the utility of being able to have the
matplotlib code as tests of anything.  We're not testing matplotlib here 
and any behavior that matplotlib relies on (and hence tests) should be 
captured in a test for that behavior separate from matplotlib code.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Doctest items

2008-07-01 Thread Ryan May
Robert Kern wrote:
 On Tue, Jul 1, 2008 at 19:19, Ryan May [EMAIL PROTECTED] wrote:
 Robert Kern wrote:
 On Tue, Jul 1, 2008 at 17:50, Fernando Perez [EMAIL PROTECTED] wrote:
 On Tue, Jul 1, 2008 at 1:41 PM, Pauli Virtanen [EMAIL PROTECTED] wrote:

 But it's a custom tweak to doctest, so it might break at some point in
 the future, and I don't love the monkeypatching here...
 Welcome to the joys of extending doctest/unittest.  They hardcoded so
 much stuff in there that the only way to reuse that code is by
 copy/paste/monkeypatch.  It's absolutely atrocious.

  We could always just make the plotting section one of those "it's just
  an example not a doctest" things and remove the >>> (since it doesn't
 appear to provide any useful test coverage or anything).
 If possible, I'd like other possibilities be considered first before
 jumping this route. I think it would be nice to retain the ability to run
 also the matplotlib examples as (optional) doctests, to make sure also
 they execute correctly. Also, using two different markups in the
 documentation to work around a shortcoming of doctest is IMHO not very
 elegant.
 How about a much simpler approach?  Just pre-populate the globals dict
 where doctest executes with an object called 'plt' that basically does

 def noop(*a,**k): pass
 
 class dummy():
     def __getattr__(self,k): return noop
 
 plt = dummy()

 This would ensure that all calls to plt.anything() silently succeed in
 the doctests.  Granted, we're not testing matplotlib, but it has the
 benefit of simplicity and of letting us keep consistent formatting,
 and examples that *users* can still paste into their sessions where
 plt refers to the real matplotlib.
 It's actually easier for users to paste the non-doctestable examples
 since they don't have the >>> markers and any stdout the examples
 produce as a byproduct.

 I'm with Robert here.  It's definitely easier as an example without the
 >>>.  I also don't see the utility of being able to have the
 matplotlib code as tests of anything.  We're not testing matplotlib here
 and any behavior that matplotlib relies on (and hence tests) should be
 captured in a test for that behavior separate from matplotlib code.
 
 To be clear, these aren't tests of the numpy code. The tests would be
 to make sure the examples still run.
 
Right.  I just don't think effort should be put into making examples 
using matplotlib run as doctests.  If the behavior is important, numpy 
should have a standalone test for it.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray methods vs numpy module functions

2008-06-23 Thread Ryan May
Robert Kern wrote:
 On Mon, Jun 23, 2008 at 18:10, Sebastian Haase [EMAIL PROTECTED] wrote:
 On Mon, Jun 23, 2008 at 10:31 AM, Bob Dowling [EMAIL PROTECTED] wrote:
 [ I'm new here and this has the feel of an FAQ but I couldn't find
 anything at http://www.scipy.org/FAQ .  If I should have looked
 somewhere else a URL would be gratefully received. ]


 What's the reasoning behind functions like sum() and cumsum() being
 provided both as module functions (numpy.sum(data, axis=1)) and as
 object methods (data.sum(axis=1)) but other functions - and I stumbled
 over diff() - only being provided as module functions?


 Hi Bob,
 this is a very good question.
 I think the answers are
 a) historical reasons AND, more importantly, differing personal preferences
 b) I would file  the missing data.diff() as a bug.
 
 It's not.
 
Care to elaborate?

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Bug in numpy all() function

2008-02-06 Thread Ryan May
Dan Goodman wrote:
 Hi all,
 
 I think this is a bug (I'm running Numpy 1.0.3.1):
 
 from numpy import *
 def f(x): return False
 
 all(f(x) for x in range(10))
 True
 
 I guess the all function doesn't know about generators?
 

That's likely the problem.  As of Python 2.5, there's a built-in all()
that will do what you want, but the from numpy import * masks that
builtin with numpy's version.
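
For example (a quick sketch of the difference):

import numpy
print numpy.all(False for x in range(10))  # True: generator becomes a 0-d object array
print all(False for x in range(10))        # False: the builtin consumes the generator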

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] New to ctypes. Some problems with loading shared library.

2008-02-05 Thread Ryan May
Lou Pecora wrote:
 I got ctypes installed and passing its own tests.  But
 I cannot get the shared library to load.  I am using
 Mac OS X 10.4.11, Python 2.4 running through the
 Terminal.
 
 I am using Albert Strasheim's example on
 http://scipy.org/Cookbook/Ctypes2 except that I had to
 remove the defined 'extern' for FOO_API since the gcc
 compiler complained about two 'externs' (I don't
 really understand what the extern does here anyway). 
 
 My make file for generating the library is simple,
 
 #  Link ---
 test1ctypes.so:  test1ctypes.o  test1ctypes.mak
   gcc -bundle -flat_namespace -undefined suppress -o
 test1ctypes.so  test1ctypes.o
 
 #  gcc C compile --
 test1ctypes.o:  test1ctypes.c test1ctypes.h
 test1ctypes.mak
   gcc -c test1ctypes.c -o test1ctypes.o
 
 This generates the file test1ctypes.so.  But when I
 try to load it
 
 import numpy as N
 import ctypes as C
 
 _test1 = N.ctypeslib.load_library('test1ctypes', '.')
 
 I get the error message,
 
 OSError: dlopen(/Users/loupecora/test1ctypes.dylib,
 6): image not found
 
 I've been googling for two hours trying to find the
 problem or other examples that would give me a clue,
 but no luck.  
 
 Any ideas what I'm doing wrong?  Thanks for any clues.
 
Well, it's looking for test1ctypes.dylib, which I guess is a MacOSX
shared library?  Meanwhile, you made a test1ctypes.so, which is why it
can't find it.  You could try using this instead:

_test1 = N.ctypeslib.load_library('test1ctypes.so', '.')

or try to get gcc to make a test1ctypes.dylib.
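
For the latter, something like this in the link step should do it
(untested sketch, mirroring the makefile above):

#  Link ---
test1ctypes.dylib:  test1ctypes.o  test1ctypes.mak
	gcc -dynamiclib -o test1ctypes.dylib  test1ctypes.o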

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Nasty bug using pre-initialized arrays

2008-01-07 Thread Ryan May
Charles R Harris wrote:
 
 
 On Jan 7, 2008 8:47 AM, Ryan May [EMAIL PROTECTED] mailto:[EMAIL 
 PROTECTED] wrote:
 
 Stuart Brorson wrote:
  I realize NumPy != Matlab, but I'd wager that most users would think
  that this is the natural behavior..
  Well, that behavior won't happen. We won't mutate the dtype of
 the array because
  of assignment. Matlab has copy(-on-write) semantics for things
 like slices while
  we have view semantics. We can't safely do the reallocation of
 memory [1].
 
  That's fair enough.  But then I think NumPy should consistently
  typecheck all assignmetns and throw an exception if the user attempts
  an assignment which looses information.
 
 
 Yeah, there's no doubt in my mind that this is a bug, if for no other
 reason than this inconsistency:
 
 
 One place where Numpy differs from MatLab is the way memory is handled.
 MatLab is always generating new arrays, so for efficiency it is worth
 preallocating arrays and then filling in the parts. This is not the case
 in Numpy where lists can be used for things that grow and subarrays are
 views. Consequently, preallocating arrays in Numpy should be rare and
 used when either the values have to be generated explicitly, which is
 what you see when using the indexes in your first example. As to
 assignment between arrays, it is a mixed question. The problem again is
 memory usage. For large arrays, it makes sense to do automatic
 conversions, as is also the case in functions taking output arrays,
 because the typecast can be pushed down into C where it is time and
 space efficient, whereas explicitly converting the array uses up
 temporary space. However, I can imagine an explicit typecast function,
 something like
 
 a[...] = typecast(b)
 
 that would replace the current behavior. I think the typecast function
 could be implemented by returning a view of b with a castable flag set
 to true, that should supply enough information for the assignment
 operator to do its job. This might be a good addition for Numpy 1.1.

While that seems like an ok idea, I'm still not sure what's wrong with
raising an exception when there will be information loss.  The exception
is already raised with standard python complex objects.  I can think of
many times in my code where explicit looping is a necessity, so
pre-allocating the array is the only way to go.

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] weibull distribution has only one parameter?

2007-11-12 Thread Ryan May
D.Hendriks (Dennis) wrote:
 Alan G Isaac wrote:
 On Mon, 12 Nov 2007, D.Hendriks (Dennis) apparently wrote: 
   
 All of this makes me doubt the correctness of the formula 
 you proposed. 
 
 It is always a good idea to hesitate before doubting Robert.
 URL:http://en.wikipedia.org/wiki/Weibull_distribution#Generating_Weibull-distributed_random_variates

 hth,
 Alan Isaac
   
 So, you are saying that it was indeed correct? That still leaves the 
 question why I can't seem to confirm that in the figure I mentioned (red 
 and green lines). Also, if you refer to X = lambda*(-ln(U))^(1/k) as 
 'proof' for the validity of the formula, I have to ask if 
 Weibull(a,Size) does actually correspond to (-ln(U))^(1/a)?
 

Have you actually looked at a histogram of the random variates generated 
this way to see if they are wrong?

Multiplying the individual random values by a number changes the 
distribution differently than multiplying the distribution/density 
function by a number.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Convert array type

2007-10-08 Thread Ryan May
Gary Ruben wrote:
 Try using astype. This works:
 
 values = array(wavearray.split()).astype(float)
 

Why not use numpy.fromstring?

fromstring(string, dtype=float, count=-1, sep='')

Return a new 1d array initialized from the raw binary data in string.

If count is positive, the new array will have count elements, otherwise
its size is determined by the size of string.  If sep is not empty then
the string is interpreted in ASCII mode and converted to the desired
number type using sep as the separator between elements (extra
whitespace is ignored).
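
For example:

>>> import numpy as np
>>> np.fromstring('1.5 2.5 3.5', dtype=float, sep=' ')
array([ 1.5,  2.5,  3.5])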

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Accessing a numpy array in a mmap fashion

2007-08-30 Thread Ryan May
Brian Donovan wrote:
 Hello all,
 
   I'm wondering if there is a way to use a numpy array that uses disk as
 a memory store rather than ram. I'm looking for something like mmap but
 which can be used like a numpy array. The general idea is this. I'm
 simulating a system which produces a large dataset over a few hours of
 processing time. Rather than store the numpy array in memory during
 processing I'd like to write the data directly to disk but still be able
 to treat the array as a numpy array. Is this possible? Any ideas?

What you're looking for is numpy.memmap, though the documentation is
eluding me at the moment.
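
A short sketch of the usage (file name made up; check the docstring of
numpy.memmap for the full set of arguments):

import numpy as np

data = np.memmap('sim_output.dat', dtype=np.float64, mode='w+',
                 shape=(1000, 1000))
data[0, :] = 1.0   # assignments go to the file-backed buffer
data.flush()       # push pending writes out to disk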

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] 16bit Integer Array/Scalar Inconsistency

2007-08-02 Thread Ryan May
Hi,

I ran into this while debugging a script today:

In [1]: import numpy as N

In [2]: N.__version__
Out[2]: '1.0.3'

In [3]: d = N.array([32767], dtype=N.int16)

In [4]: d + 32767
Out[4]: array([-2], dtype=int16)

In [5]: d[0] + 32767
Out[5]: 65534

In [6]: type(d[0] + 32767)
Out[6]: type 'numpy.int64'

In [7]: type(d[0])
Out[7]: type 'numpy.int16'

It seems that numpy will automatically promote the scalar to avoid
overflow, but not in the array case.  Is this inconsistency a bug, or
just a (known) gotcha?
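
For what it's worth, an explicit upcast sidesteps the wraparound in the
array case (continuing the session above):

In [8]: d.astype(N.int32) + 32767
Out[8]: array([65534], dtype=int32)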

I myself don't have any problems with the array not being promoted
automatically, but the inconsistency with scalar operation made
debugging my problem more difficult.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] scipy.io.loadmat incompatible with Numpy 1.0.2

2007-04-09 Thread Ryan May
Hi,

As far as I can tell, the new Numpy 1.0.2 broke scipy.io.loadmat. 
Here's what I get when I try to open a file using loadmat with 
numpy 1.0.2 (on Gentoo AMD64):

In [2]: loadmat('tep_iqdata.mat')
---------------------------------------------------------------------------
exceptions.AttributeError                 Traceback (most recent call last)

/usr/lib64/python2.4/site-packages/scipy/io/mio.py in loadmat(file_name, 
mdict, appendmat, basename, **kwargs)
     94     '''
     95     MR = mat_reader_factory(file_name, appendmat, **kwargs)
---> 96     matfile_dict = MR.get_variables()
     97     if mdict is not None:
     98         mdict.update(matfile_dict)

/usr/lib64/python2.4/site-packages/scipy/io/miobase.py in 
get_variables(self, variable_names)
    267             variable_names = [variable_names]
    268         self.mat_stream.seek(0)
--> 269         mdict = self.file_header()
    270         mdict['__globals__'] = []
    271         while not self.end_of_stream():

/usr/lib64/python2.4/site-packages/scipy/io/mio5.py in file_header(self)
    508         hdict = {}
    509         hdr = self.read_dtype(self.dtypes['file_header'])
--> 510         hdict['__header__'] = hdr['description'].strip(' \t\n\000')
    511         v_major = hdr['version'] >> 8
    512         v_minor = hdr['version'] & 0xFF

AttributeError: 'numpy.ndarray' object has no attribute 'strip'


Reverting to numpy 1.0.1 works fine for the same code.  So the question 
is, does scipy need an update, or did something unintended creep into 
Numpy 1.0.2? (Hence the cross-post)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] scipy.io.loadmat incompatible with Numpy 1.0.2

2007-04-09 Thread Ryan May
Travis Oliphant wrote:
 Ryan May wrote:
 
 Hi,

 As far as I can tell, the new Numpy 1.0.2 broke scipy.io.loadmat. 
  

 Yes, it was the one place that scipy used the fact that field selection 
 of a 0-d array returned a scalar.  This has been changed in NumPy 1.0.2 
 to return a 0-d array.  
 
 The fix is in SciPy SVN.   Just get the mio.py file from SVN and drop it 
 in to your distribution and things should work fine.   Or wait until a 
 SciPy release is made.
 
 -Travis
 

It worked if I also got the new mio5.py (rev. 2893).

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

