Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin jlcon...@gmail.com wrote:
 On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen p...@iki.fi wrote:
 Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
 I would like to use numpy's memmap on some data files I have. The first
 12 or so lines of the files contain text (header information) and the
 remainder has the numerical data. Is there a way I can tell memmap to
 skip a specified number of lines instead of a number of bytes?

 First use standard Python I/O functions to determine the number of
 bytes to skip at the beginning and the number of data items. Then pass
 in `offset` and `shape` parameters to numpy.memmap.

 Thanks for that suggestion. However, I'm unfamiliar with the I/O
 functions you are referring to. Can you point me to the
 documentation?

 Thanks again,
 Jeremy
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


this might get you started:


import numpy as np

# make some fake data with 12 header lines.
with open('test.mm', 'w') as fhw:
    print >> fhw, "\n".join('header' for i in range(12))
    np.arange(100, dtype=np.uint).tofile(fhw)

# use normal python io to determine the offset after the 12 header lines.
with open('test.mm') as fhr:
    for i in range(12): fhr.readline()
    offset = fhr.tell()

# use the offset in your call to np.memmap.
a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

assert all(a == np.arange(100))
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to start at line # x when using numpy.memmap

2011-08-19 Thread Brent Pedersen
On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin jlcon...@gmail.com wrote:
 On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen bpede...@gmail.com wrote:
 On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin jlcon...@gmail.com wrote:
 On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen p...@iki.fi wrote:
 Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
 I would like to use numpy's memmap on some data files I have. The first
 12 or so lines of the files contain text (header information) and the
 remainder has the numerical data. Is there a way I can tell memmap to
 skip a specified number of lines instead of a number of bytes?

 First use standard Python I/O functions to determine the number of
 bytes to skip at the beginning and the number of data items. Then pass
 in `offset` and `shape` parameters to numpy.memmap.

 Thanks for that suggestion. However, I'm unfamiliar with the I/O
 functions you are referring to. Can you point me to the
 documentation?

 Thanks again,
 Jeremy
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 this might get you started:


 import numpy as np

 # make some fake data with 12 header lines.
 with open('test.mm', 'w') as fhw:
     print >> fhw, "\n".join('header' for i in range(12))
     np.arange(100, dtype=np.uint).tofile(fhw)

 # use normal python io to determine the offset after the 12 header lines.
 with open('test.mm') as fhr:
     for i in range(12): fhr.readline()
     offset = fhr.tell()

 # use the offset in your call to np.memmap.
 a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset)

 Thanks, that looks good. I tried it, but it doesn't get the correct
 data. I really don't understand what is going on. A simple script and
 sample data are attached if anyone has a chance to look at it.

 Thanks,
 Jeremy

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



in that case, i would use:

np.loadtxt('tmp.dat', skiprows=12)
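
(worth noting: np.memmap interprets raw bytes, so it only helps when
everything after the header is binary; a file that stores its numbers as
text has rows of varying byte width and must be parsed instead. a minimal
sketch of the loadtxt route, assuming whitespace-delimited columns in
'tmp.dat':)

import numpy as np

# skiprows jumps over the 12 text header lines; loadtxt parses the rest.
data = np.loadtxt('tmp.dat', skiprows=12)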
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memmap with multiprocessing

2011-04-27 Thread Brent Pedersen
On Wed, Apr 27, 2011 at 4:07 PM, Christoph Gohlke cgoh...@uci.edu wrote:
 I don't think this was working correctly in numpy 1.4 either. The
 underlying problem seems to be that instance attributes of ndarray
 subtypes get lost during pickling:

 import pickle
 import numpy as np
 class aarray(np.ndarray):
     def __new__(subtype):
         self = np.ndarray.__new__(subtype, (1,))
         self.attr = 'attr'
         return self
     def __array_finalize__(self, obj):
         self.attr = getattr(obj, 'attr', None)
 a = aarray()
 b = pickle.loads(a.dumps())
 assert a.attr == b.attr, (a.attr, b.attr)

 AssertionError: ('attr', None)

 Christoph


possibly related to this ticket:
http://projects.scipy.org/numpy/ticket/1452
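
(for reference, the usual recipe for keeping instance attributes of an
ndarray subclass across pickling is to extend __reduce__ / __setstate__,
as described in the numpy subclassing docs; a minimal sketch of that idea
applied to the aarray example above, not the memmap fix itself:)

import pickle
import numpy as np

class aarray(np.ndarray):
    def __new__(subtype):
        self = np.ndarray.__new__(subtype, (1,))
        self.attr = 'attr'
        return self

    def __array_finalize__(self, obj):
        self.attr = getattr(obj, 'attr', None)

    def __reduce__(self):
        # append our attribute to the state tuple ndarray builds
        constructor, args, state = np.ndarray.__reduce__(self)
        return constructor, args, state + (self.attr,)

    def __setstate__(self, state):
        # pop our attribute back off, then let ndarray restore the rest
        self.attr = state[-1]
        np.ndarray.__setstate__(self, state[:-1])

a = aarray()
b = pickle.loads(pickle.dumps(a))
assert a.attr == b.attr  # now passes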



 On 4/26/2011 2:21 PM, Ralf Gommers wrote:
 On Mon, Apr 25, 2011 at 1:16 PM, Thiago Franco Moraes
 totonixs...@gmail.com  wrote:
 Hi,

 Has anyone confirmed if this is a bug? Should I post this in the bug 
 tracker?

 I see the same thing with recent master. Something very strange is
 going on in the memmap.__array_finalize__ method under Windows. Can
 you file a bug?

 Ralf



 Thanks!

 On Tue, Apr 19, 2011 at 9:01 PM, Thiago Franco de Moraes
 totonixs...@gmail.com  wrote:
 Hi all,

 I'm having an error using memmap objects shared among processes created
 by the multiprocessing module. The error only happens on Windows with
 numpy 1.5 or above; with numpy 1.4.1 it doesn't happen, and on Linux and
 Mac OS X it doesn't happen either. It is demonstrated by this little
 example script here https://gist.github.com/929168 , and the traceback
 is below (between <traceback> tags):

 <traceback>
 Process Process-1:
 Traceback (most recent call last):
   File "C:\Python26\Lib\multiprocessing\process.py", line 232, in _bootstrap
     self.run()
   File "C:\Python26\Lib\multiprocessing\process.py", line 88, in run
     self._target(*self._args, **self._kwargs)
   File "C:\Documents and Settings\phamorim\Desktop\test.py", line 7, in print_matrix
     print matrix
   File "C:\Python26\Lib\site-packages\numpy\core\numeric.py", line 1379, in array_str
     return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 309, in array2string
     separator, prefix)
   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 189, in _array2string
     data = _leading_trailing(a)
   File "C:\Python26\Lib\site-packages\numpy\core\arrayprint.py", line 162, in _leading_trailing
     min(len(a), _summaryEdgeItems))]
   File "C:\Python26\Lib\site-packages\numpy\core\memmap.py", line 257, in __array_finalize__
     self.filename = obj.filename
 AttributeError: 'memmap' object has no attribute 'filename'
 Exception AttributeError: AttributeError("'NoneType' object has no attribute 'tell'",)
 in <bound method memmap.__del__ of memmap([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)> ignored
 </traceback>

 I don't know if it's a bug, but I thought it was important to report
 because version 1.4.1 was working and 1.5.0 and above were not.

 Thanks!


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] moving window product

2011-03-21 Thread Brent Pedersen
hi, is there a way to take the product along a 1-d array in a moving
window? -- similar to convolve, with product in place of sum?
currently, i'm column_stacking the array with offsets of itself into
window_size columns and then taking the product at axis 1.
like::

  w = np.column_stack(a[i:-window_size+i] for i in range(0, window_size))
  window_product = np.product(w, axis=1)

but then there are the edge effects/array size issues--like those
handled in np.convolve.
is there something in numpy/scipy that addresses this, or that does
the column_stacking with an offset?

thanks,
-brent
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] moving window product

2011-03-21 Thread Brent Pedersen
On Mon, Mar 21, 2011 at 11:19 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Mon, Mar 21, 2011 at 10:10 AM, Brent Pedersen bpede...@gmail.com wrote:
 hi, is there a way to take the product along a 1-d array in a moving
 window? -- similar to convolve, with product in place of sum?
 currently, i'm column_stacking the array with offsets of itself into
 window_size columns and then taking the product at axis 1.
 like::

  w = np.column_stack(a[i:-window_size+i] for i in range(0, window_size))
  window_product = np.product(w, axis=1)

 but then there are the edge effects/array size issues--like those
 handled in np.convolve.
 is there something in numpy/scipy that addresses this, or that does
 the column_stacking with an offset?

 The Bottleneck package has a fast moving window sum (bn.move_sum and
 bn.move_nansum). You could use that along with

 >>> a = np.random.rand(5)
 >>> a.prod()
 0.015877866878931741
 >>> np.exp(np.log(a).sum())
 0.015877866878931751

 Or you could use strides or scipy.ndimage as in
 https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/slow/move.py


ah yes, of course. thank you.

def moving_product(a, window_size, mode="same"):
    return np.exp(np.convolve(np.log(a), np.ones(window_size), mode))


i'll have a closer look at your strided version in bottleneck as well.
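
(one caveat with the log/exp trick: it assumes strictly positive values,
since np.log of zero or negatives produces -inf/nan. a stride-based
sketch that avoids that assumption, assuming a NumPy new enough to have
sliding_window_view, i.e. 1.20+:)

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def moving_product_strided(a, window_size):
    # product over each length-`window_size` window; output has
    # len(a) - window_size + 1 entries ('valid' mode, no edge padding)
    windows = sliding_window_view(a, window_size)
    return windows.prod(axis=-1)

print(moving_product_strided(np.array([1., 2., 3., 4.]), 2))  # [ 2.  6. 12.]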
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] moving window product

2011-03-21 Thread Brent Pedersen
On Mon, Mar 21, 2011 at 11:57 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Mon, Mar 21, 2011 at 10:34 AM, Brent Pedersen bpede...@gmail.com wrote:
 On Mon, Mar 21, 2011 at 11:19 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Mon, Mar 21, 2011 at 10:10 AM, Brent Pedersen bpede...@gmail.com wrote:
 hi, is there a way to take the product along a 1-d array in a moving
 window? -- similar to convolve, with product in place of sum?
 currently, i'm column_stacking the array with offsets of itself into
 window_size columns and then taking the product at axis 1.
 like::

  w = np.column_stack(a[i:-window_size+i] for i in range(0, window_size))
  window_product = np.product(w, axis=1)

 but then there are the edge effects/array size issues--like those
 handled in np.convolve.
 is there something in numpy/scipy that addresses this, or that does
 the column_stacking with an offset?

 The Bottleneck package has a fast moving window sum (bn.move_sum and
 bn.move_nansum). You could use that along with

 >>> a = np.random.rand(5)
 >>> a.prod()
 0.015877866878931741
 >>> np.exp(np.log(a).sum())
 0.015877866878931751

 Or you could use strides or scipy.ndimage as in
 https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/slow/move.py


 ah yes, of course. thank you.

 def moving_product(a, window_size, mode="same"):
     return np.exp(np.convolve(np.log(a), np.ones(window_size), mode))

 i'll have a closer look at your strided version in bottleneck as well.

 I don't know what size problem you are working on or if speed is an
 issue, but here are some timings:

 a = np.random.rand(1000000)
 window_size = 1000
 timeit np.exp(np.convolve(np.log(a), np.ones(window_size), 'same'))
 1 loops, best of 3: 889 ms per loop
 timeit np.exp(bn.move_sum(np.log(a), window_size))
 10 loops, best of 3: 82.5 ms per loop

 Most all that time is spent in np.exp(np.log(a)):

 timeit bn.move_sum(a, window_size)
 100 loops, best of 3: 3.72 ms per loop

 So I assume if I made a bn.move_prod, the speedup would be around 200x
 compared to convolve.

 BTW, you could do the exp inplace:

 timeit b = bn.move_sum(np.log(a), window_size); np.exp(b, b)
 10 loops, best of 3: 76.3 ms per loop
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


my current use-case is to do this 24 times on arrays of about 200K elements.
file IO is the major bottleneck.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 2D binning

2010-06-01 Thread Brent Pedersen
On Tue, Jun 1, 2010 at 1:51 PM, Wes McKinney wesmck...@gmail.com wrote:
 On Tue, Jun 1, 2010 at 4:49 PM, Zachary Pincus zachary.pin...@yale.edu 
 wrote:
 Hi,
 Can anyone think of a clever (non-looping) solution to the following?

 A have a list of latitudes, a list of longitudes, and list of data
 values. All lists are the same length.

 I want to compute an average of data values for each lat/lon pair.
 e.g. if (lat[1001], lon[1001]) == (lat[2001], lon[2001]) then
 data[1001] = (data[1001] + data[2001])/2

 Looping is going to take way too long.

 As a start, are the equal lat/lon pairs exactly equal (i.e. either
 not floating-point, or floats that will always compare equal, that is,
 the floating-point bit-patterns will be guaranteed to be identical) or
 approximately equal to float tolerance?

 If you're in the approx-equal case, then look at the KD-tree in scipy
 for doing near-neighbors queries.

 If you're in the exact-equal case, you could consider hashing the lat/
 lon pairs or something. At least then the looping is O(N) and not
 O(N^2):

 import collections
 grouped = collections.defaultdict(list)
 for lt, ln, da in zip(lat, lon, data):
   grouped[(lt, ln)].append(da)

 averaged = dict((ltln, numpy.mean(da)) for ltln, da in grouped.items())

 Is that fast enough?

 Zach
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 This is a pretty good example of the group-by problem that will
 hopefully work its way into a future edition of NumPy. Given that, a
 good approach would be to produce a unique key from the lat and lon
 vectors, and pass that off to the groupby routine (when it exists).
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


meanwhile groupby from itertools will work but might be a bit slower
since it'll have to convert every row to tuple and group in a list.


import numpy as np
import itertools

# fake data: N values spread over N/250 distinct lats and lons.
N = 10000  # assumed value (garbled in the archive); any multiple of 250 works
lats = np.repeat(180 * (np.random.ranf(N / 250) - 0.5), 250)
lons = np.repeat(360 * (np.random.ranf(N / 250) - 0.5), 250)

np.random.shuffle(lats)
np.random.shuffle(lons)

vals = np.arange(N)
# end of fake data

inds = np.lexsort((lons, lats))

sorted_lats = lats[inds]
sorted_lons = lons[inds]
sorted_vals = vals[inds]

llv = np.array((sorted_lats, sorted_lons, sorted_vals)).T

for (lat, lon), group in itertools.groupby(llv, lambda row: tuple(row[:2])):
    group_vals = [g[-1] for g in group]
    print lat, lon, np.mean(group_vals)

# make sure the mean for the last lat/lon from the loop matches the mean
# for that lat/lon from the original data.
tests_idx, = np.where((lats == lat) & (lons == lon))
assert np.mean(vals[tests_idx]) == np.mean(group_vals)
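
(a fully vectorized variant of the same group-mean, for comparison;
np.unique(..., axis=0) needs numpy 1.13+, so this is a latter-day sketch
rather than what was available at the time:)

import numpy as np

def groupby_mean(lats, lons, vals):
    # label each distinct (lat, lon) pair, then average per label
    pairs = np.column_stack((lats, lons))
    uniq, inv = np.unique(pairs, axis=0, return_inverse=True)
    sums = np.bincount(inv, weights=vals)
    counts = np.bincount(inv)
    return uniq, sums / counts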
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Patch] Fix memmap pickling

2010-05-26 Thread Brent Pedersen
On Mon, May 24, 2010 at 3:37 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 On Mon, May 24, 2010 at 03:33:09PM -0700, Brent Pedersen wrote:
 On Mon, May 24, 2010 at 3:25 PM, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
  Memmapped arrays don't pickle right. I know that to get them to
  really pickle and restore identically, we would need some effort.
  However, in the current status, pickling and restoring a memmapped array
  leads to tracebacks that seem like they could be avoided.

  I am attaching a patch with a test that shows the problem, and a fix.
  Should I create a ticket, or is this light-enough to be applied
  immediately?

 also check this:
 http://projects.scipy.org/numpy/ticket/1452

 still needs work.

 Does look good. Is there an ETA for your patch to be applied?

 Right now this bug is making code crash when memmapped arrays are used
 (eg multiprocessing), so a hot fix can be useful, without removing any
 merit to your work that addresses the underlying problem.

 Cheers,

 Gaël
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


gael, not sure about the ETA for applying it. i think the main remaining
problem (other than more tests) is py3 support--as charris points out
in the ticket. i have a start which shadows numpy's __getitem__, but
haven't fixed all the bugs--and i'm not sure that's a good idea.
my original patch was quite simple as well, but once it starts
supporting all versions and more edge cases ...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Patch] Fix memmap pickling

2010-05-24 Thread Brent Pedersen
On Mon, May 24, 2010 at 3:25 PM, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
 Memmapped arrays don't pickle right. I know that to get them to
 really pickle and restore identically, we would need some effort.
 However, in the current status, pickling and restoring a memmapped array
 leads to tracebacks that seem like they could be avoided.

 I am attaching a patch with a test that shows the problem, and a fix.
 Should I create a ticket, or is this light-enough to be applied
  immediately?

 Cheers,

 Gaël

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



also check this:
http://projects.scipy.org/numpy/ticket/1452

still needs work.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] faster code

2010-05-16 Thread Brent Pedersen
On Sun, May 16, 2010 at 12:14 PM, Davide Lasagna
lasagnadav...@gmail.com wrote:
 Hi all,
 What is the fastest and lowest memory consumption way to compute this?
 y = np.arange(2**24)
 bases = y[1:] + y[:-1]
 Actually it is already quite fast, but I'm not sure whether some
 temporary memory is being allocated in the summation. Any help is
 appreciated.
 Cheers
 Davide
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



how about something like this? may have off-by-1 somewhere.

 bases = np.arange(1, 2*2**24-1, 2)
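
(checking the endpoints confirms there is in fact no off-by-one:)

import numpy as np

y = np.arange(2**24)
# y[1:] + y[:-1] is 1, 3, 5, ..., 2*2**24 - 3: exactly the odd numbers
assert np.array_equal(y[1:] + y[:-1], np.arange(1, 2 * 2**24 - 1, 2))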
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] patch to pickle np.memmap

2010-04-20 Thread Brent Pedersen
On Tue, Apr 13, 2010 at 8:59 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Tue, Apr 13, 2010 at 8:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 hi, i posted a patch to allow pickling of np.memmap objects.
 http://projects.scipy.org/numpy/ticket/1452

 currently, it always returns 'r' for the mode.
 is that the best thing to do there?
 any other changes?
 -brent


 and i guess it should (but does not with that patch) correctly handle:

 a = np.memmap(...)
 b = a[2:]
 cPickle.dumps(b)


any thoughts? it still always loads in mode='r', but
i updated the patch to handle slicing so it works like this:

>>> import numpy as np
>>> a = np.memmap('t.bin', mode='w+', shape=(10,))
>>> a[:] = np.arange(10)
>>> a
memmap([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
>>> np.loads(a[1:4].dumps())
memmap([1, 2, 3], dtype=uint8)
>>> np.loads(a[-2:].dumps())
memmap([8, 9], dtype=uint8)
>>> np.loads(a[-4:-1].dumps())
memmap([6, 7, 8], dtype=uint8)

http://projects.scipy.org/numpy/ticket/1452

-brent
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] patch to pickle np.memmap

2010-04-13 Thread Brent Pedersen
hi, i posted a patch to allow pickling of np.memmap objects.
http://projects.scipy.org/numpy/ticket/1452

currently, it always returns 'r' for the mode.
is that the best thing to do there?
any other changes?
-brent
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] patch to pickle np.memmap

2010-04-13 Thread Brent Pedersen
On Tue, Apr 13, 2010 at 8:52 PM, Brent Pedersen bpede...@gmail.com wrote:
 hi, i posted a patch to allow pickling of np.memmap objects.
 http://projects.scipy.org/numpy/ticket/1452

 currently, it always returns 'r' for the mode.
 is that the best thing to do there?
 any other changes?
 -brent


and i guess it should (but does not with that patch) correctly handle:

 a = np.memmap(...)
 b = a[2:]
 cPickle.dumps(b)
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Name of the file associated with a memmap

2010-04-12 Thread Brent Pedersen
On Mon, Apr 12, 2010 at 1:59 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 15:52, Charles R Harris
 charlesr.har...@gmail.com wrote:

 On Mon, Apr 12, 2010 at 2:37 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:

 On Mon, Apr 12, 2010 at 1:55 PM, Brent Pedersen bpede...@gmail.com
 wrote:

 On Mon, Apr 12, 2010 at 9:49 AM, Robert Kern robert.k...@gmail.com
 wrote:
  On Mon, Apr 12, 2010 at 04:03, Nadav Horesh nad...@visionsense.com
  wrote:
 
  Is there a way to get the file-name given a memmap array object?
 
  Not at this time. This would be very useful, though, so patches are
  welcome.
 
  --
  Robert Kern
 
  I have come to believe that the whole world is an enigma, a harmless
  enigma that is made terrible by our own mad attempt to interpret it as
  though it had an underlying truth.
   -- Umberto Eco
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 

 this sounded easy, so i gave it a shot:
 http://projects.scipy.org/numpy/ticket/1451

 i think that works.

 Looks OK, I applied it. Could you add some documentation?


 And maybe the filename should be the whole path? Thoughts?

 Yes, that would help. While you are looking at it, you may want to
 consider recording some of the other information that is computed in
 or provided to __new__, like offset.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


copying what i asked in the ticket:
where should i write the docs? in the file itself or through the doc
editor? also re path, since it can be a file-like, that would have to
be something like:

if isinstance(filename, basestring):
    filename = os.path.abspath(filename)
self.filename = filename

ok with that?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Name of the file associated with a memmap

2010-04-12 Thread Brent Pedersen
On Mon, Apr 12, 2010 at 2:46 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 16:43, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Mon, Apr 12, 2010 at 04:39:23PM -0500, Robert Kern wrote:
  where should i write the docs? in the file itself or through the doc
  editor? also re path, since it can be a file-like, that would have to
  be something like:

  if isinstance(filename, basestring):
     filename = os.path.abspath(filename)
  self.filename = filename

  ok with that?

 In the case of file object, we should grab the filename from it.
 Whether the filename argument to the constructor was a file name or a
 file object, self.filename should always be the file name, IMO.

 +1.

 Once this is in, would it make it possible/desirable to have memmapped
 arrays pickle (I know that it would require work, I am just asking).

 You need some more information from the constructor arguments, but yes.


anything other than offset and mode?



 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Name of the file associated with a memmap

2010-04-12 Thread Brent Pedersen
On Mon, Apr 12, 2010 at 3:08 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 17:00, Brent Pedersen bpede...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 2:46 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 16:43, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Mon, Apr 12, 2010 at 04:39:23PM -0500, Robert Kern wrote:
  where should i write the docs? in the file itself or through the doc
  editor? also re path, since it can be a file-like, that would have to
  be something like:

  if isinstance(filename, basestring):
     filename = os.path.abspath(filename)
  self.filename = filename

  ok with that?

 In the case of file object, we should grab the filename from it.
 Whether the filename argument to the constructor was a file name or a
 file object, self.filename should always be the file name, IMO.

 +1.

 Once this is in, would it make it possible/desirable to have memmapped
 arrays pickle (I know that it would require work, I am just asking).

 You need some more information from the constructor arguments, but yes.


 anything other than offset and mode?

 I think that's about it.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


i added a new patch to the ticket and updated the docs here:
http://docs.scipy.org/numpy/docs/numpy.core.memmap.memmap/
the preview is somehow rendering the sections in a different order
than they appear in the RST, not sure what's going on there.
-b
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Name of the file associated with a memmap

2010-04-12 Thread Brent Pedersen
On Mon, Apr 12, 2010 at 3:31 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 3:08 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 17:00, Brent Pedersen bpede...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 2:46 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 12, 2010 at 16:43, Gael Varoquaux
 gael.varoqu...@normalesup.org wrote:
 On Mon, Apr 12, 2010 at 04:39:23PM -0500, Robert Kern wrote:
  where should i write the docs? in the file itself or through the doc
  editor? also re path, since it can be a file-like, that would have to
  be something like:

  if isinstance(filename, basestring):
     filename = os.path.abspath(filename)
  self.filename = filename

  ok with that?

 In the case of file object, we should grab the filename from it.
 Whether the filename argument to the constructor was a file name or a
 file object, self.filename should always be the file name, IMO.

 +1.

 Once this is in, would it make it possible/desirable to have memmapped
 arrays pickle (I know that it would require work, I am just asking).

 You need some more information from the constructor arguments, but yes.


 anything other than offset and mode?

 I think that's about it.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


 i added a new patch to the ticket and updated the docs here:
 http://docs.scipy.org/numpy/docs/numpy.core.memmap.memmap/
 the preview is somehow rendering the sections in a different order
 than they appear in the RST, not sure what's going on there.
 -b


Charles, thanks for committing,
i just added another patch for just the tests which i forgot to
include when i diffed last time.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Utility function to find array items are in ascending order

2010-02-09 Thread Brent Pedersen
On Tue, Feb 9, 2010 at 7:42 AM, Vishal Rana ranavis...@gmail.com wrote:
 Hi,
 Is there any utility function to find if values in the array are in
 ascending or descending order.
 Example:
 arr = [1, 2, 4, 6] should return true
 arr2 = [1, 0, 2, -2] should return false
 Thanks
 Vishal

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



i don't know if there's a utility function, but i'd use:

  np.all(a[1:] >= a[:-1])
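
(the same check via np.diff, as a small self-contained sketch; use > 0
for strictly increasing:)

import numpy as np

def is_ascending(arr):
    # non-decreasing test over the whole array
    return bool(np.all(np.diff(arr) >= 0))

print(is_ascending([1, 2, 4, 6]))   # True
print(is_ascending([1, 0, 2, -2]))  # False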
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] documenting optional out parameter

2009-10-25 Thread Brent Pedersen
hi, i've seen this section:
http://docs.scipy.org/numpy/Questions+Answers/#the-out-argument

should _all_ functions with an optional out parameter have exactly that text?
so if i find a docstring with reasonable, but different doc for out,
should it be changed
to that?

and if a docstring of a function with an optional out that needs
review does not have the out parameter documented, should it be
marked as 'Needs Work'?

thanks,
-brentp
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,  josef.p...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais bbl...@bryant.edu wrote:
 Hello,
 I have a vectorizing problem that I don't see an obvious way to solve.  
 What
 I have is a vector like:
 obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 and a matrix
 T=zeros((6,6))
 and what I want in T is a count of all of the transitions in obs, e.g.
 T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
 sequence 3-4 only happens once, etc...  I can do it unvectorized like:
 for o1,o2 in zip(obs[:-1],obs[1:]):
     T[o1,o2]+=1

 which gives the correct answer from above, which is:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])


 but I thought there would be a better way.  I tried:
 o1=obs[:-1]
 o2=obs[1:]
 T[o1,o2]+=1
 but this doesn't give a count, it just yields 1's at the transition points,
 like:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 Is there a clever way to do this?  I could write a quick Cython solution,
 but I wanted to keep this as an all-numpy implementation if I can.


 histogram2d or its imitation, there was a discussion on histogram2d a
 short time ago

 >>> obs = np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 >>> obs2 = obs - 1
 >>> trans = np.hstack((0, np.bincount(obs2[:-1]*6 + 6 + obs2[1:]), 0)).reshape(6, 6)
 >>> re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
 ...                [ 0.,  0.,  3.,  0.,  0.,  1.],
 ...                [ 0.,  3.,  0.,  1.,  0.,  0.],
 ...                [ 0.,  0.,  2.,  0.,  1.,  0.],
 ...                [ 0.,  0.,  0.,  2.,  0.,  0.],
 ...                [ 0.,  0.,  0.,  0.,  1.,  0.]])
 >>> np.all(re == trans)
 True

 >>> trans
 array([[0, 0, 0, 0, 0, 0],
        [0, 0, 3, 0, 0, 1],
        [0, 3, 0, 1, 0, 0],
        [0, 0, 2, 0, 1, 0],
        [0, 0, 0, 2, 0, 0],
        [0, 0, 0, 0, 1, 0]])


 or

 >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 >>> re
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])
 >>> h
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 >>> np.all(re == h)
 True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 10000.

 Actually, it is faster:

 histogram:

 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 100 loops, best of 3: 4.14 ms per loop

 lists:

 timeit test(obs3, T3)
 1000 loops, best of 3: 1.32 ms per loop
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



here's a go:

import numpy as np
import random
from itertools import groupby

def test1(obs, T):
    for o1, o2 in zip(obs[:-1], obs[1:]):
        T[o1][o2] += 1
    return T


def test2(obs, T):
    s = zip(obs[:-1], obs[1:])
    for idx, g in groupby(sorted(s)):
        T[idx] = len(list(g))
    return T

obs = [random.randint(0, 5) for z in range(10000)]

print test2(obs, np.zeros((6, 6)))
print test1(obs, np.zeros((6, 6)))


##

In [10]: timeit test1(obs, np.zeros((6, 6)))
100 loops, best of 3: 18.8 ms per loop

In [11]: timeit test2(obs, np.zeros((6, 6)))
100 loops, best of 3: 6.91 ms per loop
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 1:27 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodmankwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,  josef.p...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais bbl...@bryant.edu wrote:
 Hello,
 I have a vectorizing problem that I don't see an obvious way to solve.  
 What
 I have is a vector like:
 obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 and a matrix
 T=zeros((6,6))
 and what I want in T is a count of all of the transitions in obs, e.g.
 T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
 sequence 3-4 only happens once, etc...  I can do it unvectorized like:
 for o1,o2 in zip(obs[:-1],obs[1:]):
     T[o1,o2]+=1

 which gives the correct answer from above, which is:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])


 but I thought there would be a better way.  I tried:
 o1=obs[:-1]
 o2=obs[1:]
 T[o1,o2]+=1
 but this doesn't give a count, it just yields 1's at the transition 
 points,
 like:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 Is there a clever way to do this?  I could write a quick Cython solution,
 but I wanted to keep this as an all-numpy implementation if I can.


 histogram2d or its imitation, there was a discussion on histogram2d a
 short time ago

 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = 
 np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
 ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
 ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
 ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
 ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
 ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
 True

 trans
 array([[0, 0, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 1],
       [0, 3, 0, 1, 0, 0],
       [0, 0, 2, 0, 1, 0],
       [0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 1, 0]])


 or

 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 re
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])

 np.all(re == h)
 True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 10000.

 Actually, it is faster:

 histogram:

 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 100 loops, best of 3: 4.14 ms per loop

 lists:

 timeit test(obs3, T3)
 1000 loops, best of 3: 1.32 ms per loop
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 here's a go:

 import numpy as np
 import random
 from itertools import groupby

 def test1(obs, T):
   for o1,o2 in zip(obs[:-1],obs[1:]):
       T[o1][o2] += 1
   return T


 def test2(obs, T):
    s = zip(obs[:-1], obs[1:])
    for idx, g in groupby(sorted(s)):
        T[idx] = len(list(g))
    return T

 obs = [random.randint(0, 5) for z in range(10000)]

 print test2(obs, np.zeros((6, 6)))
 print test1(obs, np.zeros((6, 6)))


 ##

 In [10]: timeit test1(obs, np.zeros((6, 6)))
 100 loops, best of 3: 18.8 ms per loop

 In [11]: timeit test2(obs, np.zeros((6, 6)))
 100 loops, best of 3: 6.91 ms per loop

 Nice!

 Try adding

 import psyco
 psyco.full()

 to test1. Or is that cheating?

it is if you're running 64bit.
:-)


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] vectorizing

2009-06-05 Thread Brent Pedersen
On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen bpede...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodmankwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman kwgood...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 12:53 PM,  josef.p...@gmail.com wrote:
 On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais bbl...@bryant.edu wrote:
 Hello,
 I have a vectorizing problem that I don't see an obvious way to solve.  
 What
 I have is a vector like:
 obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 and a matrix
 T=zeros((6,6))
 and what I want in T is a count of all of the transitions in obs, e.g.
 T[1,2]=3 because the sequence 1-2 happens 3 times,  T[3,4]=1 because the
 sequence 3-4 only happens once, etc...  I can do it unvectorized like:
 for o1,o2 in zip(obs[:-1],obs[1:]):
     T[o1,o2]+=1

 which gives the correct answer from above, which is:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  3.,  0.,  0.,  1.],
        [ 0.,  3.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  2.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  2.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])


 but I thought there would be a better way.  I tried:
 o1=obs[:-1]
 o2=obs[1:]
 T[o1,o2]+=1
 but this doesn't give a count, it just yields 1's at the transition 
 points,
 like:
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.,  1.],
        [ 0.,  1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.,  0.],
        [ 0.,  0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.,  0.]])

 Is there a clever way to do this?  I could write a quick Cython solution,
 but I wanted to keep this as an all-numpy implementation if I can.


 histogram2d or its imitation, there was a discussion on histogram2d a
 short time ago

 obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
 obs2 = obs - 1
 trans = 
 np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
 re = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.],
 ...         [ 0.,  0.,  3.,  0.,  0.,  1.],
 ...         [ 0.,  3.,  0.,  1.,  0.,  0.],
 ...         [ 0.,  0.,  2.,  0.,  1.,  0.],
 ...         [ 0.,  0.,  0.,  2.,  0.,  0.],
 ...         [ 0.,  0.,  0.,  0.,  1.,  0.]])
 np.all(re == trans)
 True

 trans
 array([[0, 0, 0, 0, 0, 0],
       [0, 0, 3, 0, 0, 1],
       [0, 3, 0, 1, 0, 0],
       [0, 0, 2, 0, 1, 0],
       [0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 1, 0]])


 or

 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 re
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])
 h
 array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  3.,  0.,  0.,  1.],
       [ 0.,  3.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  2.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])

 np.all(re == h)
 True

 There's no way my list method can beat that. But by adding

 import psyco
 psyco.full()

 I get a total speed up of a factor of 15 when obs is length 10000.

 Actually, it is faster:

 histogram:

 h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
 timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, 
 range=[[0,5],[0,5]])
 100 loops, best of 3: 4.14 ms per loop

 lists:

 timeit test(obs3, T3)
 1000 loops, best of 3: 1.32 ms per loop
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



 here's a go:

 import numpy as np
 import random
 from itertools import groupby

 def test1(obs, T):
   for o1,o2 in zip(obs[:-1],obs[1:]):
       T[o1][o2] += 1
   return T


 def test2(obs, T):
    s = zip(obs[:-1], obs[1:])
    for idx, g in groupby(sorted(s)):
        T[idx] = len(list(g))
    return T

 obs = [random.randint(0, 5) for z in range(10000)]

 print test2(obs, np.zeros((6, 6)))
 print test1(obs, np.zeros((6, 6)))


 ##

 In [10]: timeit test1(obs, np.zeros((6, 6)))
 100 loops, best of 3: 18.8 ms per loop

 In [11]: timeit test2(obs, np.zeros((6, 6)))
 100 loops, best of 3: 6.91 ms per loop

 Wait, you tested the list method with an array. Try

 timeit test1(obs, np.zeros((6, 6)).tolist())

 Probably best to move the array/list creation out of the timeit loop.
 Then my method won't have to pay the cost of converting to a list :)


ah right, your test1 is faster than test2 that way.
i'm just stoked to find out about ndimage.histogram.

using this:

  ndimage.histogram(obs[:-1]*6 + obs[1:], 0, 36, 36).reshape(6, 6)

(with obs as an array) it gives the exact result as test1/test2.

-b
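
(plain numpy can do the same transition count without scipy: np.bincount
with minlength, available since numpy 1.6, makes it one line. a small
sketch on the thread's example data:)

import numpy as np

obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])

# pair (o1, o2) maps to the flat index o1*6 + o2; counting those
# indices over 36 bins and reshaping gives the 6x6 transition matrix.
T = np.bincount(obs[:-1] * 6 + obs[1:], minlength=36).reshape(6, 6)
print(T[1, 2], T[3, 4])  # 3 1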



 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] loadtxt slow

2009-03-01 Thread Brent Pedersen
On Sun, Mar 1, 2009 at 11:29 AM, Michael Gilbert
michael.s.gilb...@gmail.com wrote:
 On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:

 So I have some data sets of about 16 floating point numbers stored
 in text files.  I find that loadtxt is rather slow.  Is this to be
 expected?  Would it be faster if it were loading binary data?

 i have run into this as well.  loadtxt uses a python list to allocate
 memory for the data it reads in, so once you get to about 1/4th of your
 available memory, it will start allocating the updated list (every
 time it reads a new value from your data file) in swap instead of main
 memory, which is ridiculously slow (in fact it causes my system to be
 quite unresponsive and a jumpy cursor). i have rewritten loadtxt to be
 smarter about allocating memory, but it is slower overall and doesn't
 support all of the original arguments/options (yet).  i have some
 ideas to make it smarter/more efficient, but have not had the time
 to work on it recently.

 i will send the current version to the list tomorrow when i have access
 to the system that it is on.

 best wishes,
 mike
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


to address the slowness, i use wrappers around savetxt/loadtxt that
save/load a .npy file
along with/instead of the .txt file. -- and the loadtxt wrapper checks
if the .npy is up-to-date.
code here:

http://rafb.net/p/dGBJjg80.html

of course it's still slow the first time. i look forward to your speedups.
-brentp
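
(the paste link above is long dead; a minimal sketch of that kind of
caching wrapper, reconstructed from the description rather than the
original paste:)

import os
import numpy as np

def cached_loadtxt(txt_path, **kwargs):
    # reuse a .npy cache when it is newer than the text file;
    # otherwise parse the text once and refresh the cache.
    npy_path = txt_path + '.npy'
    if (os.path.exists(npy_path) and
            os.path.getmtime(npy_path) >= os.path.getmtime(txt_path)):
        return np.load(npy_path)
    a = np.loadtxt(txt_path, **kwargs)
    np.save(npy_path, a)
    return a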
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt issues

2009-02-10 Thread Brent Pedersen
On Tue, Feb 10, 2009 at 9:40 PM, A B python6...@gmail.com wrote:
 Hi,

 How do I write a loadtxt command to read in the following file and
 store each data point as the appropriate data type:

 12|h|34.5|44.5
 14552|bbb|34.5|42.5

 Do the strings have to be read in separately from the numbers?

 Why would anyone use 'S10' instead of 'string'?

 dt = {'names': ('gender','age','weight','bal'), 'formats': ('i4',
 'S4','f4', 'f4')}

 a = loadtxt("sample_data.txt", dtype=dt)

 gives

 ValueError: need more than 1 value to unpack

 I can do a = loadtxt("sample_data.txt", dtype=str) but can't use
 'string' instead of 'S4', and then all my data is read in as strings.

 Seems like all the examples on-line use either numeric or textual
 input, but not both.

 Thanks.
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


works for me but not sure i understand the problem, did you try
setting the delimiter?


import numpy as np
from cStringIO import StringIO

txt = StringIO("""\
12|h|34.5|44.5
14552|bbb|34.5|42.5""")

dt = {'names': ('gender', 'age', 'weight', 'bal'),
      'formats': ('i4', 'S4', 'f4', 'f4')}
a = np.loadtxt(txt, dtype=dt, delimiter="|")
print a.dtype
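
(in later numpy, genfromtxt with dtype=None will guess the per-column
types for you, so the formats don't have to be spelled out; a sketch
assuming numpy 1.14+ for the encoding argument:)

import numpy as np
from io import StringIO

txt = StringIO(u"12|h|34.5|44.5\n14552|bbb|34.5|42.5")
a = np.genfromtxt(txt, dtype=None, delimiter="|",
                  names=('gender', 'age', 'weight', 'bal'), encoding=None)
print(a.dtype)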
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comparison of arrays

2009-02-09 Thread Brent Pedersen
On Mon, Feb 9, 2009 at 6:02 AM, Neil neilcrigh...@gmail.com wrote:

  I have two integer arrays of different shape, e.g.
 
   a
 
  array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
 
   b
 
  array([ 3,  4,  5,  6,  7,  8,  9, 10])
 
  How can I extract the values that belong to the array a
  exclusively i.e. array([1,2]) ?


 You could also use numpy.setmember1d to get a boolean mask that selects
 the values:

 In [21]: a = np.array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
 In [22]: b = np.array([ 3,  4,  5,  6,  7,  8,  9, 10])
 In [23]: ismember = np.setmember1d(a,b)
 In [24]: a[~ismember]
 Out[24]: array([1, 2])


 Neil


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


there's also np.setdiff1d() which does the above in a single line.
-brent
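
(spelled out on the same data:)

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
b = np.array([3, 4, 5, 6, 7, 8, 9, 10])
print(np.setdiff1d(a, b))  # [1 2]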
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] genfromtxt view with object dtype

2009-02-04 Thread Brent Pedersen
hi, i am using genfromtxt, with a dtype like this:
[('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start',
'i4'), ('end', 'i4'), ('score', 'f8'), ('strand', '|S1'), ('phase',
'i4'), ('attrs', '|O4')]

where i'm having problems with the attrs column, which i'd like to be a
dict. i can specify a converter to parse the string into a dict, and it
is correctly converted,
but then in io.py it tries to take a view() of that dtype and it gives
the error:

A = np.genfromtxt(fname, **kwargs)
  File "/usr/lib/python2.5/site-packages/numpy/lib/io.py", line 922, in genfromtxt
    output = rows.view(dtype)
TypeError: Cannot change data-type for object array.


is there any way around this, or must that column be kept as a string?
it seems like genfromtxt expects you to specify either a dtype _or_ a
converter, not both.

thanks,
-brent
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Renaming a field of an object array

2009-02-04 Thread Brent Pedersen
On Wed, Feb 4, 2009 at 2:50 PM, Pierre GM pgmdevl...@gmail.com wrote:
 All,
 I'm a tad puzzled by the following behavior (I'm trying to correct a
 bug in genfromtxt):

 I'm creating an empty structured ndarray, using np.object as dtype.

   >>> a = np.empty(1, dtype=[('', np.object)])
   >>> a
   array([(None,)],
         dtype=[('f0', '|O4')])

 Now, I'd like to rename the field:
   >>> a.view([('NAME', np.object)])
 TypeError: Cannot change data-type for object array.

 I understand why I can't change the *type* of the field, but not why I
 can't change its name that way. What would be an option that wouldn't
 involve creating a new array?
 Thx in advance.

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


hi, i was looking at this as well. the code in arrayobject.c doesn't
match the error string. i changed the code to do what the error string
says, and things seem to work.
i think the if-block below it should also use xor (not changed in this
patch), but i'm not a c programmer so i may be missing something
obvious.

svn diff numpy/core/src/arrayobject.c
Index: numpy/core/src/arrayobject.c
===================================================================
--- numpy/core/src/arrayobject.c    (revision 6338)
+++ numpy/core/src/arrayobject.c    (working copy)
@@ -6506,9 +6506,16 @@
         PyErr_SetString(PyExc_TypeError, "invalid data-type for array");
         return -1;
     }
-    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_HASOBJECT) ||
-        PyDataType_FLAGCHK(newtype, NPY_ITEM_IS_POINTER) ||
-        PyDataType_FLAGCHK(self->descr, NPY_ITEM_HASOBJECT) ||
+    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_HASOBJECT) ^
+        PyDataType_FLAGCHK(self->descr, NPY_ITEM_HASOBJECT)) {
+        PyErr_SetString(PyExc_TypeError,                     \
+                        "Cannot change data-type for object " \
+                        "array.");
+        Py_DECREF(newtype);
+        return -1;
+    }
+
+    if (PyDataType_FLAGCHK(newtype, NPY_ITEM_IS_POINTER) ||
         PyDataType_FLAGCHK(self->descr, NPY_ITEM_IS_POINTER)) {
         PyErr_SetString(PyExc_TypeError,                     \
                         "Cannot change data-type for object " \
                         "array.");
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt view with object dtype

2009-02-04 Thread Brent Pedersen
On Wed, Feb 4, 2009 at 8:51 PM, Pierre GM pgmdevl...@gmail.com wrote:
 OK, Brent, try r6341.
 I fixed genfromtxt for cases like yours (explicit dtype involving a
 np.object).
 Note that the fix won't work if the dtype is nested and involves
 np.objects (as we would hit the problem of renaming fields we observed...).
 Let me know how it goes.
 P.


that fixes it. thanks again pierre!
-b




 On Feb 4, 2009, at 4:03 PM, Brent Pedersen wrote:

 On Wed, Feb 4, 2009 at 9:36 AM, Pierre GM pgmdevl...@gmail.com
 wrote:

 On Feb 4, 2009, at 12:09 PM, Brent Pedersen wrote:

 hi, i am using genfromtxt, with a dtype like this:
 [('seqid', '|S24'), ('source', '|S16'), ('type', '|S16'), ('start',
 'i4'), ('end', 'i4'), ('score', 'f8'), ('strand', '|S1'),
 ('phase',
 'i4'), ('attrs', '|O4')]

 Brent,
 Please post a simple, self-contained example with a few lines of the
 file you want to load.

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


 hi pierre, here is an example.
 thanks,
 -brent

 ##

import numpy as np
from cStringIO import StringIO

gffstr = """\
##gff-version 3
1\tucb\tgene\t2234602\t2234702\t.\t-\t.\tID=grape_1_2234602_2234702;match=EVM_prediction_supercontig_1.248,EVM_prediction_supercontig_1.248.mRNA
1\tucb\tgene\t2300292\t2302123\t.\t+\t.\tID=grape_1_2300292_2302123;match=EVM_prediction_supercontig_244.8
1\tucb\tgene\t2303615\t2303967\t.\t+\t.\tID=grape_1_2303615_2303967;match=EVM_prediction_supercontig_244.8
1\tucb\tgene\t2303616\t2303966\t.\t+\t.\tParent=grape_1_2303615_2303967
1\tucb\tgene\t3596400\t3596503\t.\t-\t.\tID=grape_1_3596400_3596503;match=evm.TU.supercontig_167.27
1\tucb\tgene\t3600651\t3600977\t.\t-\t.\tmatch=evm.model.supercontig_1217.1,evm.model.supercontig_1217.1.mRNA
"""

dtype = {'names':
         ('seqid', 'source', 'type', 'start', 'end',
          'score', 'strand', 'phase', 'attrs'),
         'formats':
         ['S24', 'S16', 'S16', 'i4', 'i4', 'f8',
          'S1', 'i4', 'S128']}

# OK with S128 for attrs
print np.genfromtxt(StringIO(gffstr), dtype=dtype)


def _attr(kvstr):
    pairs = [kv.split("=") for kv in kvstr.split(";")]
    return dict(pairs)

# change S128 to object to have the attrs column as a dictionary
dtype['formats'][-1] = 'O'
converters = {8: _attr}
# NOT OK
print np.genfromtxt(StringIO(gffstr), dtype=dtype, converters=converters)
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] genfromtxt

2009-01-21 Thread Brent Pedersen
hi, i'm using the new genfromtxt stuff in numpy svn. looks great;
thanks to pierre and everyone who contributed.
is there a way to have the header commented and still be able to have
it recognized as the header? e.g.

#gender age weight
M   21  72.10
F   35  58.33
M   33  21.99

if i use np.loadtxt or genfromtxt, it tries to use the 2nd row (the
first non-commented row) as the header,
and i get an array like:
array([('M', 21, 72.094), ('F', 35, 58.328),
  ('M', 33, 21.988)],
 dtype=[('f0', '|S1'), ('f1', 'i4'), ('f2', 'f8')])

when i want:

array([('M', 21, 72.094), ('F', 35, 58.328),
  ('M', 33, 21.988)],
 dtype=[('gender', '|S1'), ('age', 'i4'), ('weight', 'f8')])

i can get that by uncommenting the header, but it's useful to keep
the header commented for other code.
is there any way to do that?
thanks,
-brent
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] genfromtxt

2009-01-21 Thread Brent Pedersen
On Wed, Jan 21, 2009 at 9:39 PM, Pierre GM pgmdevl...@gmail.com wrote:
 Brent,
 Mind trying r6330 and let me know if it works for you ? Make sure that
 you use names=True to detect a header.
 P.


yes, works perfectly.
thanks!
-brent
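
(in current numpy the fix is long merged: with names=True, genfromtxt
will take a commented first line as the header. a sketch on the sample
above, assuming numpy 1.14+ for the encoding argument:)

import numpy as np
from io import StringIO

data = u"#gender age weight\nM 21 72.10\nF 35 58.33\nM 33 21.99"
a = np.genfromtxt(StringIO(data), dtype=None, names=True, encoding=None)
print(a.dtype.names)  # ('gender', 'age', 'weight')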
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy array change notifier?

2008-10-27 Thread Brent Pedersen
On Mon, Oct 27, 2008 at 1:56 PM, Robert Kern [EMAIL PROTECTED] wrote:
 On Mon, Oct 27, 2008 at 15:54, Erik Tollerud [EMAIL PROTECTED] wrote:
 Is there any straightforward way of notifying on change of a numpy
 array that leaves the numpy arrays still efficient?

 Not currently, no.

 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
  -- Umberto Eco
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


out of curiosity,
would something like this affect efficiency (and/or work):

import numpy

class Notify(numpy.ndarray):
    def __setitem__(self, *args):
        self.notify(*args)
        return super(Notify, self).__setitem__(*args)

    def notify(self, *args):
        print 'notify:', args


with also overriding __setslice__?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array gymnastics

2008-09-11 Thread Brent Pedersen
On Thu, Sep 11, 2008 at 6:03 PM, Alan Jackson [EMAIL PROTECTED] wrote:
 There has got to be a simple way to do this, but I'm just not seeing it.

 a = array([[1,2,3,4,5,6],
  [7,8,9,10,11,12]])
 b = array([21,22,23,24,25,26])

 What I want to end up with is :

 c = array([[1,7,21],
            [2,8,22],
            ...
            [6,12,26]])


 --
 ---
 | Alan K. Jackson| To see a World in a Grain of Sand  |
 | [EMAIL PROTECTED]  | And a Heaven in a Wild Flower, |
 | www.ajackson.org   | Hold Infinity in the palm of your hand |
 | Houston, Texas | And Eternity in an hour. - Blake   |
 ---
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion


hi, i think

 numpy.column_stack((a[0], a[1], b))

does what you want.
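
(the transpose form avoids naming each row, in case a grows more rows;
same output:)

import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6],
              [7, 8, 9, 10, 11, 12]])
b = np.array([21, 22, 23, 24, 25, 26])
c = np.column_stack((a.T, b))  # rows of c are (a[0][i], a[1][i], b[i])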
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] a good polygon class?

2008-08-19 Thread Brent Pedersen
hi,
http://pypi.python.org/pypi/Shapely
supports the array interface, and has all the geos geometry operations:
http://gispython.org/shapely/manual.html#contains
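
(a tiny point-in-polygon example with Shapely; the classes below are its
current API and may differ slightly in older releases:)

from shapely.geometry import Point, Polygon

poly = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
print(poly.contains(Point(1, 1)))  # True
print(poly.contains(Point(3, 3)))  # False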

On Tue, Aug 19, 2008 at 8:31 AM, mark [EMAIL PROTECTED] wrote:
 Hello List -

 I am looking for a good polygon class.

 My main interest is to be able to figure out if a point is inside or
 outside the polygon, which can have any shape (well, as long as it is
 a polygon).

 Any suggestions?

 Thanks, Mark
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion