Re: [Numpy-discussion] use index array of len n to select columns of n x m array

2010-08-06 Thread Martin Spacek
Keith Goodman wrote:
  Here's one way:
 
  a.flat[i + a.shape[1] * np.arange(a.shape[0])]
  array([0, 3, 5, 6, 9])


I'm afraid I made my example a little too simple. In retrospect, what I really 
want is to be able to use a 2D index array i, like this:

  a = np.array([[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 8,  9, 10, 11],
                [12, 13, 14, 15],
                [16, 17, 18, 19]])
  i = np.array([[2, 1],
                [3, 1],
                [1, 1],
                [0, 0],
                [3, 1]])
  foo(a, i)
array([[ 2,  1],
       [ 7,  5],
       [ 9,  9],
       [12, 12],
       [19, 17]])

I think the flat iterator indexing suggestion is about the only thing that'll 
work. Here's the function I've pretty much settled on:

def rowtake(a, i):
    """For each row in a, return values according to column indices in the
    corresponding row in i. Returned shape == i.shape"""
    assert a.ndim == 2
    assert i.ndim <= 2
    if i.ndim == 1:
        return a.flat[i + a.shape[1] * np.arange(a.shape[0])]
    else: # i.ndim == 2
        return a.flat[i + a.shape[1] * np.vstack(np.arange(a.shape[0]))]
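
A quick sanity check in plain NumPy (just re-running the example above, plus a 1D index case I made up):

import numpy as np
a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15],
              [16, 17, 18, 19]])
i = np.array([[2, 1], [3, 1], [1, 1], [0, 0], [3, 1]])
rowtake(a, i)
# array([[ 2,  1],
#        [ 7,  5],
#        [ 9,  9],
#        [12, 12],
#        [19, 17]])
rowtake(a, np.array([0, 3, 1, 2, 3]))   # 1D branch
# array([ 0,  7,  9, 14, 19])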

This is about half as fast as my Cython function, but the Cython function is 
limited to fixed dtypes and ndim:

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def rowtake_cy(np.ndarray[np.int32_t, ndim=2] a,
               np.ndarray[np.int32_t, ndim=2] i):
    """For each row in a, return values according to column indices in the
    corresponding row in i. Returned shape == i.shape"""

    cdef Py_ssize_t nrows, ncols, rowi, coli
    cdef np.ndarray[np.int32_t, ndim=2] out

    nrows = i.shape[0]
    ncols = i.shape[1] # num cols to take from a for each row
    assert a.shape[0] == nrows
    assert i.max() < a.shape[1]
    out = np.empty((nrows, ncols), dtype=np.int32)

    for rowi in range(nrows):
        for coli in range(ncols):
            out[rowi, coli] = a[rowi, i[rowi, coli]]

    return out

Cheers,

Martin
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] use index array of len n to select columns of n x m array

2010-08-06 Thread josef . pktd
On Fri, Aug 6, 2010 at 6:01 AM, Martin Spacek nu...@mspacek.mm.st wrote:
 Keith Goodman wrote:
   Here's one way:
  
   a.flat[i + a.shape[1] * np.arange(a.shape[0])]
       array([0, 3, 5, 6, 9])


 I'm afraid I made my example a little too simple. In retrospect, what I really
 want is to be able to use a 2D index array i, like this:

   a = np.array([[ 0,  1,  2,  3],
                   [ 4,  5,  6,  7],
                   [ 8,  9, 10, 11],
                   [12, 13, 14, 15],
                   [16, 17, 18, 19]])
   i = np.array([[2, 1],
                   [3, 1],
                   [1, 1],
                   [0, 0],
                   [3, 1]])
   foo(a, i)
 array([[ 2,  1],
        [ 7,  5],
        [ 9,  9],
        [12, 12],
        [19, 17]])

 I think the flat iterator indexing suggestion is about the only thing that'll
 work. Here's the function I've pretty much settled on:

 def rowtake(a, i):
     For each row in a, return values according to column indices in the
     corresponding row in i. Returned shape == i.shape
     assert a.ndim == 2
      assert i.ndim <= 2
     if i.ndim == 1:
         return a.flat[i + a.shape[1] * np.arange(a.shape[0])]
     else: # i.ndim == 2
         return a.flat[i + a.shape[1] * np.vstack(np.arange(a.shape[0]))]


I still find broadcasting easier to read, even if it might be a bit slower

 a[np.arange(5)[:,None], i]
array([[ 2,  1],
       [ 7,  5],
       [ 9,  9],
       [12, 12],
       [19, 17]])
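
Or, without hard-coding the number of rows, the same indexing reads:

 a[np.arange(a.shape[0])[:, None], i]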

Josef



 This is about half as fast as my Cython function, but the Cython function is
 limited to fixed dtypes and ndim:

 @cython.boundscheck(False)
 @cython.wraparound(False)
 @cython.cdivision(True)
 def rowtake_cy(np.ndarray[np.int32_t, ndim=2] a,
                np.ndarray[np.int32_t, ndim=2] i):
     For each row in a, return values according to column indices in the
     corresponding row in i. Returned shape == i.shape

     cdef Py_ssize_t nrows, ncols, rowi, coli
     cdef np.ndarray[np.int32_t, ndim=2] out

     nrows = i.shape[0]
     ncols = i.shape[1] # num cols to take from a for each row
     assert a.shape[0] == nrows
      assert i.max() < a.shape[1]
     out = np.empty((nrows, ncols), dtype=np.int32)

     for rowi in range(nrows):
         for coli in range(ncols):
             out[rowi, coli] = a[rowi, i[rowi, coli]]

     return out

 Cheers,

 Martin
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] use index array of len n to select columns of n x m array

2010-08-06 Thread Keith Goodman
On Fri, Aug 6, 2010 at 3:01 AM, Martin Spacek nu...@mspacek.mm.st wrote:
 Keith Goodman wrote:
   Here's one way:
  
   a.flat[i + a.shape[1] * np.arange(a.shape[0])]
       array([0, 3, 5, 6, 9])


 I'm afraid I made my example a little too simple. In retrospect, what I really
 want is to be able to use a 2D index array i, like this:

   a = np.array([[ 0,  1,  2,  3],
                   [ 4,  5,  6,  7],
                   [ 8,  9, 10, 11],
                   [12, 13, 14, 15],
                   [16, 17, 18, 19]])
   i = np.array([[2, 1],
                   [3, 1],
                   [1, 1],
                   [0, 0],
                   [3, 1]])
   foo(a, i)
 array([[ 2,  1],
        [ 7,  5],
        [ 9,  9],
        [12, 12],
        [19, 17]])

 I think the flat iterator indexing suggestion is about the only thing that'll
 work. Here's the function I've pretty much settled on:

 def rowtake(a, i):
     For each row in a, return values according to column indices in the
     corresponding row in i. Returned shape == i.shape
     assert a.ndim == 2
      assert i.ndim <= 2
     if i.ndim == 1:
         return a.flat[i + a.shape[1] * np.arange(a.shape[0])]
     else: # i.ndim == 2
         return a.flat[i + a.shape[1] * np.vstack(np.arange(a.shape[0]))]

 This is about half as fast as my Cython function, but the Cython function is
 limited to fixed dtypes and ndim:

You can speed it up by getting rid of two copies:

idx = np.arange(a.shape[0])
idx *= a.shape[1]
idx += i
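
For the 2D index array, one way to use the same idea might be (a sketch, not timed; fi and out are just placeholder names, and the broadcast against i still allocates one temporary):

idx = np.arange(a.shape[0])
idx *= a.shape[1]            # in-place multiply, no extra copy
fi = idx[:, None] + i        # row offsets broadcast against the column indices
out = a.flat[fi]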
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] {OT} Mailing trends

2010-08-06 Thread Vincent Davis
On Thu, Aug 5, 2010 at 2:55 PM, josef.p...@gmail.com wrote:

 On Thu, Aug 5, 2010 at 3:43 PM, Gökhan Sever gokhanse...@gmail.com
 wrote:
  Hello,
  There is a nice e-mailing trend tool for Gmail users
  at http://code.google.com/p/mail-trends/
  It is a command line tool producing an html output showing your e-mailing
  statistics. In my inbox, the following threads are highly ranked in the
 top
  threads section.
 
  [Numpy-discussion] Announcing toydist, improving distribution and
 packaging
  situation
  [SciPy-Dev] scipy.stats
  [Numpy-discussion] curious about how people would feel about moving to
  github
 
  Just out of curiosity, are there any mailing trends (top threads, top
  posters, etc...) provided for the Python related mailing archives?
  Share your comments please.

 I only know the top poster statistics for googlegroups

 http://groups.google.ca/group/scipy-user/about?hl=en

 but numpy-discussion and scipy-dev are not on google groups


Is scipy-user a google group or just mirrored?

As a side note or idea I was playing with trying to search for each
function/module in the archive as a way to rank/prioritize what
documentation may need improvement.
For example, searching http://groups.google.ca/group/scipy-user/about?hl=en :
38 results for randint*
6,950 results for optimize*
Of course there are many reasons; it just might be popular in example code
and not really the question in the post.

Vincent

Josef


  --
  Gökhan
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Vincent Davis
720-301-3003
vinc...@vincentdavis.net
my blog: http://vincentdavis.net | LinkedIn: http://www.linkedin.com/in/vincentdavis
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

2010-08-06 Thread Nils Becker
Hi,

I found what looks like a bug in histogram, when the option normed=True
is used together with non-uniform bins.

Consider this example:

import numpy as np
data = np.array([1, 2, 3, 4])
bins = np.array([.5, 1.5, 4.5])
bin_widths = np.diff(bins)
(counts, dummy) = np.histogram(data, bins)
(densities, dummy) = np.histogram(data, bins, normed=True)

What this gives is:

bin_widths
array([ 1.,  3.])

counts
array([1, 3])

densities
array([ 0.1,  0.3])

The documentation claims that histogram with normed=True gives a
density, which integrates to 1. In this example, it is true that
(densities * bin_widths).sum() is 1. However, clearly the data are
equally spaced, so their density should be uniform and equal to 0.25.
Note that (0.25 * bin_widths).sum() is also 1.

I believe np.histogram(data, bins, normed=True) effectively does :
np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).

However, it _should_ do
np.histogram(data, bins, normed=False) / bins_widths

to get a true density over the data coordinate as a result. It's easy to
fix by hand, but I think the documentation is at least misleading?!
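
For concreteness, the by-hand fix (a quick sketch using data, bins and bin_widths from above, dividing the counts by the bin widths and by the total number of counts):

counts, _ = np.histogram(data, bins)
counts / bin_widths / counts.sum()                        # -> array([ 0.25,  0.25])
(counts / bin_widths / counts.sum() * bin_widths).sum()   # -> 1.0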

Sorry if this has been discussed before; I did not find it anywhere (numpy
1.3).





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] {OT} Mailing trends

2010-08-06 Thread josef . pktd
On Fri, Aug 6, 2010 at 11:13 AM, Vincent Davis vinc...@vincentdavis.net wrote:

 On Thu, Aug 5, 2010 at 2:55 PM, josef.p...@gmail.com wrote:

 On Thu, Aug 5, 2010 at 3:43 PM, Gökhan Sever gokhanse...@gmail.com wrote:
  Hello,
  There is a nice e-mailing trend tool for Gmail users
  at http://code.google.com/p/mail-trends/
  It is a command line tool producing an html output showing your e-mailing
  statistics. In my inbox, the following threads are highly ranked in the top
  threads section.
 
  [Numpy-discussion] Announcing toydist, improving distribution and packaging
  situation
  [SciPy-Dev] scipy.stats
  [Numpy-discussion] curious about how people would feel about moving to
  github
 
  Just out of curiosity, are there any mailing trends (top threads, top
  posters, etc...) provided for the Python related mailing archives?
  Share your comments please.

 I only know the top poster statistics for googlegroups

 http://groups.google.ca/group/scipy-user/about?hl=en

 but numpy-discussion and scipy-dev are not on google groups


 Is scipy-user a google group or just mirrored?

just mirrored, original is on scipy.org


 As a side note or idea I was playing with trying to search for each 
 function/module in the archive as a way to rank/prioritize what documentation 
 may need improvement.
 For example searching http://groups.google.ca/group/scipy-user/about?hl=en
 38 results for randint*
 6,950 results for optimize*
 Of course there are many reasons an it just might be popular in example code 
 and not really the question in the post.

I didn't know adding stars works:
12,600 results for *stats*
120  results for *stats* bug

I would guess that the most popular functions are also the ones that
are the best documented (or they are popular because they have the
least obvious API).

(for example I didn't find much on ttest,  too obvious ? except for a
possible enhancement/addition
http://groups.google.ca/group/scipy-user/browse_thread/thread/bc3c36f8908a20af/a85a5d6b7d457436?hl=enlnk=gstq=t_test*#a85a5d6b7d457436
)

Josef


 Vincent

 Josef


  --
  Gökhan
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 Vincent Davis
 720-301-3003
 vinc...@vincentdavis.net

 my blog | LinkedIn
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

2010-08-06 Thread josef . pktd
On Fri, Aug 6, 2010 at 11:46 AM, Nils Becker n.bec...@amolf.nl wrote:
 Hi,

 I found what looks like a bug in histogram, when the option normed=True
 is used together with non-uniform bins.

 Consider this example:

 import numpy as np
 data = np.array([1, 2, 3, 4])
 bins = np.array([.5, 1.5, 4.5])
 bin_widths = np.diff(bins)
 (counts, dummy) = np.histogram(data, bins)
 (densities, dummy) = np.histogram(data, bins, normed=True)

 What this gives is:

 bin_widths
 array([ 1.,  3.])

 counts
 array([1, 3])

 densities
 array([ 0.1,  0.3])

 The documentation claims that histogram with normed=True gives a
 density, which integrates to 1. In this example, it is true that
 (densities * bin_widths).sum() is 1. However, clearly the data are
 equally spaced, so their density should be uniform and equal to 0.25.
 Note that (0.25 * bin_widths).sum() is also 1.

 I believe np.histogram(data, bins, normed=True) effectively does :
 np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).

 However, it _should_ do
 np.histogram(data, bins, normed=False) / bins_widths

 to get a true density over the data coordinate as a result. It's easy to
 fix by hand, but I think the documentation is at least misleading?!

 sorry if this has been discussed before; I did not find it anyway (numpy
 1.3)

Either I also don't understand histogram or this is a bug.

 data = np.arange(1,10)
 bins = np.array([.5, 1.5, 4.5, 7.5, 8.5, 9.5])
 np.histogram(data, bins, normed=True)
(array([ 0.04761905,  0.14285714,  0.14285714,  0.04761905,  0.04761905]),
 array([ 0.5,  1.5,  4.5,  7.5,  8.5,  9.5]))
 np.histogram(data, bins)
(array([1, 3, 3, 1, 1]), array([ 0.5,  1.5,  4.5,  7.5,  8.5,  9.5]))
 np.diff(bins)
array([ 1.,  3.,  3.,  1.,  1.])

I don't see where the normed=True numbers come from in this case.

 np.array([ 1.,  3.,  3.,  1.,  1.])/7
array([ 0.14285714,  0.42857143,  0.42857143,  0.14285714,  0.14285714])

Josef





 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Installing numpy with MKL

2010-08-06 Thread Francesc Alted
2010/8/5, David Warde-Farley d...@cs.toronto.edu:
 I've been having a similar problem compiling NumPy with MKL on a cluster
 with a site-wide license. Dag's site.cfg fails to config if I use 'iomp5' in
 it, since (at least with this version, 11.1) libiomp5 is located in

   /scinet/gpc/intel/Compiler/11.1/072/lib/intel64/

 whereas the actual MKL libraries are in

   /scinet/gpc/intel/Compiler/11.1/072/mkl/lib/em64t/

 I've tried putting both in my library_dirs separated by a colon as is
 suggested by the docs, but python setup.py config fails to find MKL in this
 case. Has anyone else run into this issue?

I've made a patch to solve this some time ago:

http://projects.scipy.org/numpy/ticket/993

but it did not make it into the repo yet.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Broken links on new.scipy

2010-08-06 Thread Gökhan Sever
Hi,

@ http://new.scipy.org/download.html the numpy and scipy links for Fedora are
broken.

Could you update the links with these?

https://admin.fedoraproject.org/pkgdb/acls/name/numpy
https://admin.fedoraproject.org/pkgdb/acls/name/scipy

Thanks.

-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] {OT} Mailing trends

2010-08-06 Thread Vincent Davis
On Fri, Aug 6, 2010 at 10:16 AM, josef.p...@gmail.com wrote:

 On Fri, Aug 6, 2010 at 11:13 AM, Vincent Davis vinc...@vincentdavis.net
 wrote:
 
  On Thu, Aug 5, 2010 at 2:55 PM, josef.p...@gmail.com wrote:
 
  On Thu, Aug 5, 2010 at 3:43 PM, Gökhan Sever gokhanse...@gmail.com
 wrote:
   Hello,
   There is a nice e-mailing trend tool for Gmail users
   at http://code.google.com/p/mail-trends/
   It is a command line tool producing an html output showing your
 e-mailing
   statistics. In my inbox, the following threads are highly ranked in
 the top
   threads section.
  
   [Numpy-discussion] Announcing toydist, improving distribution and
 packaging
   situation
   [SciPy-Dev] scipy.stats
   [Numpy-discussion] curious about how people would feel about moving to
   github
  
   Just out of curiosity, are there any mailing trends (top threads, top
   posters, etc...) provided for the Python related mailing archives?
   Share your comments please.
 
  I only know the top poster statistics for googlegroups
 
  http://groups.google.ca/group/scipy-user/about?hl=en
 
  but numpy-discussion and scipy-dev are not on google groups
 
 
  Is scipy-user a google group or just mirrored?

 just mirrored, original is on scipy.org

 
  As a side note or idea I was playing with trying to search for each
 function/module in the archive as a way to rank/prioritize what
 documentation may need improvement.
  For example searching
 http://groups.google.ca/group/scipy-user/about?hl=en
  38 results for randint*
  6,950 results for optimize*
  Of course there are many reasons an it just might be popular in example
 code and not really the question in the post.

 I didn't know adding stars works:
 12,600 results for *stats*
 120  results for *stats* bug





 I would guess that the most popular functions are also the ones that
 are the best documented (or they are popular because they have the
 least obvious API).

 This is cool.
http://groups.google.ca/advanced_search?q=

So the google group is just a mirror of the mailman list?

I was thinking this might be a valid way to prioritize the documentation (if
one wanted to do such a thing):
1) the documentation is poor, or there is no example (based on a quick look)
2) prioritize based on hits in the mailing list.

Vincent


(for example I didn't find much on ttest,  too obvious ? except for a
 possible enhancement/addition

 http://groups.google.ca/group/scipy-user/browse_thread/thread/bc3c36f8908a20af/a85a5d6b7d457436?hl=enlnk=gstq=t_test*#a85a5d6b7d457436
 )

 Josef

 
  Vincent
 
  Josef
 
 
   --
   Gökhan
  
   ___
   NumPy-Discussion mailing list
   NumPy-Discussion@scipy.org
   http://mail.scipy.org/mailman/listinfo/numpy-discussion
  
  
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
  Vincent Davis
  720-301-3003
  vinc...@vincentdavis.net
 
  my blog | LinkedIn
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Vincent Davis
720-301-3003
vinc...@vincentdavis.net
my blog: http://vincentdavis.net | LinkedIn: http://www.linkedin.com/in/vincentdavis
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

2010-08-06 Thread Nils Becker
Hi again,

first a correction: I posted

 I believe np.histogram(data, bins, normed=True) effectively does :
 np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).

 However, it _should_ do
 np.histogram(data, bins, normed=False) / bins_widths

but there is a normalization missing; it should read

I believe np.histogram(data, bins, normed=True) effectively does
np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / data.sum()

However, it _should_ do
np.histogram(data, bins, normed=False) / bins_widths / data.sum()

Bruce Southey replied:
 As I recall, there were issues with this aspect.
 Please search the discussion regarding histogram especially David
 Huard's reply in this thread:
 http://thread.gmane.org/gmane.comp.python.numeric.general/22445
I think this discussion pertains to a switch in calling conventions
which happened at the time. The last reply of D. Huard (to me) seems to
say that they did not fix anything in the _old_ semantics, but that the
new semantics is expected to work properly.

I tried with an infinite bin:
counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
counts
array([1,3])
ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
ncounts
array([0.,0.])

this also does not make a lot of sense to me. A better result would be
array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
fall in the second but are spread out over an infinite interval, giving
0. This is what my second proposal would give. I cannot find anything
wrong with it so far...
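
A quick check of that proposal on the same input (just a sketch):

counts, dmy = np.histogram([1, 2, 3, 4], [0.5, 1.5, np.inf])
widths = np.diff([0.5, 1.5, np.inf])     # array([  1.,  inf])
counts / widths / counts.sum()           # -> array([ 0.25,  0.  ])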

Cheers, Nils
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] use index array of len n to select columns of n x m array

2010-08-06 Thread Martin Spacek
On 2010-08-06 13:11, Martin Spacek wrote:
 Josef, I'd forgotten you could use None to increase the dimensionality of an
 array. Neat. And, somehow, it's almost twice as fast as the Cython version!:

 timeit a[np.arange(a.shape[0])[:, None], i]
 10 loops, best of 3: 5.76 us per loop

I just realized why the Cython version was slower - the two assertion lines.
Commenting those out:

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True) # might be necessary to release the GIL?
def rowtake_cy(np.ndarray[np.int32_t, ndim=2] a,
               np.ndarray[np.int32_t, ndim=2] i):
    """For each row in a, return values according to column indices in the
    corresponding row in i. Returned shape == i.shape"""

    cdef Py_ssize_t nrows, ncols, rowi, coli
    cdef np.ndarray[np.int32_t, ndim=2] out

    nrows = i.shape[0]
    ncols = i.shape[1] # num cols to take for each row
    #assert a.shape[0] == nrows
    #assert i.max() < a.shape[1]
    out = np.empty((nrows, ncols), dtype=np.int32)

    for rowi in range(nrows):
        for coli in range(ncols):
            out[rowi, coli] = a[rowi, i[rowi, coli]]

    return out


gives me:


  timeit rowtake_cy(a, i)
100 loops, best of 3: 1.44 us per loop

which is 4X faster than the a[np.arange(a.shape[0])[:, None], i] broadcasting 
method.

Martin



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy histogram normed=True (bug / confusing behavior)

2010-08-06 Thread josef . pktd
On Fri, Aug 6, 2010 at 4:53 PM, Nils Becker n.bec...@amolf.nl wrote:
 Hi again,

 first a correction: I posted

 I believe np.histogram(data, bins, normed=True) effectively does :
 np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]).

 However, it _should_ do
 np.histogram(data, bins, normed=False) / bins_widths

 but there is a normalization missing; it should read

 I believe np.histogram(data, bins, normed=True) effectively does
 np.histogram(data, bins, normed=False) / (bins[-1] - bins[0]) / data.sum()

 However, it _should_ do
 np.histogram(data, bins, normed=False) / bins_widths / data.sum()

 Bruce Southey replied:
 As I recall, there as issues with this aspect.
 Please search the discussion regarding histogram especially David
 Huard's reply in this thread:
 http://thread.gmane.org/gmane.comp.python.numeric.general/22445
 I think this discussion pertains to a switch in calling conventions
 which happened at the time. The last reply of D. Huard (to me) seems to
 say that they did not fix anything in the _old_ semantics, but that the
 new semantics is expected to work properly.

 I tried with an infinite bin:
 counts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf])
 counts
 array([1,3])
 ncounts, dmy = np.histogram([1,2,3,4], [0.5,1.5,np.inf], normed=1)
 ncounts
 array([0.,0.])

 this also does not make a lot of sense to me. A better result would be
 array([0.25, 0.]), since 25% of the points fall in the first bin; 75%
 fall in the second but are spread out over an infinite interval, giving
 0. This is what my second proposal would give. I cannot find anything
 wrong with it so far...

I didn't find any different information about the meaning of
normed=True on the mailing list or in the trac history.

169 
170 if normed:
171 db = array(np.diff(bins), float)
172 return n/(n*db).sum(), bins

this does not look like the correct piecewise density with unequal binsizes.
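
A quick numeric check with the bins from my earlier example (just a sketch; the second line is the piecewise density Nils is proposing):

n = np.array([1., 3., 3., 1., 1.])
db = np.array([1., 3., 3., 1., 1.])
n / (n * db).sum()    # current behaviour -> array([ 0.04761905,  0.14285714,  0.14285714,  0.04761905,  0.04761905])
n / db / n.sum()      # proposed density  -> array([ 0.11111111,  0.11111111,  0.11111111,  0.11111111,  0.11111111])
(n / db / n.sum() * db).sum()   # -> 1.0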

Thanks Nils for pointing this out, I tried only equal binsizes for a
histogram distribution.

Josef






 Cheers, Nils
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dtype.type for structured arrays

2010-08-06 Thread Travis Oliphant

On Jul 24, 2010, at 2:42 PM, Thomas Robitaille wrote:

 Hi,
 
 If I create a structured array with vector columns:
 
 array = np.array(zip([[1,2],[1,2],[1,3]]),dtype=[('a',float,2)])
 
 then examine the type of the column, I get:
 
 array.dtype[0]
 dtype(('float64',(2,)))
 
 Then, if I try and view the numerical type, I see:
 
 array.dtype[0].type
 <type 'numpy.void'>
 
 I have to basically do
 
 array.dtype[0].subdtype[0]
 dtype('float64')
 
 to get what I need. I seem to remember that this used not to be the case, and 
 that even for vector columns, one could access array.dtype[0].type to get the 
 numerical type. Is this a bug, or deliberate?
 


This looks the same as I remember it. The dtype is a structured dtype with the 
field name 'a' holding an element which is a vector of floats. As a result, 
indexing with dtype[0] extracts the vector-of-floats dtype. Its type must be 
void because it is a vector of floats. To get to the underlying type, you have 
to do what you did. I don't see how it would have worked another way in the 
past.
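
For example (a quick sketch; subdtype carries both the base dtype and the sub-array shape):

arr = np.array(zip([[1,2],[1,2],[1,3]]), dtype=[('a', float, 2)])
arr.dtype[0]              # dtype(('float64', (2,)))
arr.dtype[0].subdtype     # (dtype('float64'), (2,))
arr.dtype[0].subdtype[0]  # dtype('float64')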

-Travis




 Thanks,
 
 Thomas
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

---
Travis Oliphant
Enthought, Inc.
oliph...@enthought.com
1-512-536-1057
http://www.enthought.com



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion