Re: [Numpy-discussion] numpy.sum(..., keepdims=False)

2012-04-04 Thread Pierre Haessig
Hi,

On 03/04/2012 22:10, Frédéric Bastien wrote:
 I would like to add this parameter to Theano. So my question is, will
 the interface change or is it stable?
I don't know about the stability, but as for the existence of this new parameter:

https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py

Looking at def sum(...), it seems the keepdims=False parameter is there
and was introduced 7 months ago by Mark Wiebe and Charles Harris.

The docstring indeed says:

keepdims : bool, optional
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the original `arr`.
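
For illustration, here is a minimal sketch of the behaviour the docstring describes, assuming a NumPy version that already ships the keepdims argument. The array values are made up for the example.

```python
import numpy as np

# A small 2x3 array with made-up values: [[0, 1, 2], [3, 4, 5]].
arr = np.arange(6.0).reshape(2, 3)

# Default behaviour: the reduced axis disappears, giving shape (2,).
row_sums = arr.sum(axis=1)

# With keepdims=True the reduced axis stays as a size-one dimension,
# giving shape (2, 1), which broadcasts against the original array.
row_sums_kept = arr.sum(axis=1, keepdims=True)

# Each row divided by its own sum; no manual reshape is needed.
normalized = arr / row_sums_kept
```

Without keepdims, the same division would need an explicit reshape such as row_sums[:, np.newaxis].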

The commit message also mentions the skipna parameter, which is part of
the overall NA implementation and is indeed tagged as somewhat
experimental (if I'm correct!), but I would assume that the
keepdims=False parameter is an orthogonal issue. Hopefully somebody can
give you a more precise answer!

Best,
Pierre



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy doc for percentile function

2012-04-04 Thread Pierre Haessig
Hi,

I'm looking for the entry point in the NumPy docs for the percentile function.
I'm assuming it should sit in routines.statistics, but I do not see it there:
http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

Am I missing something? If the percentile entry should indeed be added,
do you agree it could go in the Histogram section? (Histogram would
then become "Histograms and percentiles".)

Also, as Frédéric Bastien pointed out, I feel that the current doc build
is broken (especially the links :-( ).

Best,
Pierre






Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread Dag Sverre Seljebotn
On 04/03/2012 04:45 PM, srean wrote:
 This makes me ask something that I always wanted to know: why is weave
 not the preferred or encouraged way ?

 Is it because no developer has interest in maintaining it or is it too
 onerous to maintain ? I do not know enough of its internals  to guess
 an answer. I think it would be fair to say that weave has languished a
 bit over the years.

I think the story is that Cython overlaps enough with Weave that Weave 
doesn't get any new users or developers.

Which isn't to say that Cython is always superior to the Weave approach 
(for one thing, embedding Cython code in Python source code files could 
have been a better experience), just that it overlaps enough, and since 
it has

I honestly don't believe Weave has a chance of getting resurrected from 
the dead -- my bets for the future are on Cython, Travis' numba, and 
perhaps some combination or amalgamation of the two (note that I'm a 
Cython dev and so rather biased).

 What I like about weave is that even when I drop into the C++ mode I
 can pretty much use the same numpy'ish syntax and with no overhead of
 calling back into the numpy c functions. From the sourceforge forum it
 seems the new Blitz++ is quite competitive with intel fortran in SIMD
 vectorization as well, which does sound attractive.

Cython seems likely to be pushed further in this area over the next half 
year so that it can grow up to become more of a Fortran competitor.

Dag


Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread Chris Barker
On Tue, Apr 3, 2012 at 4:45 PM, srean srean.l...@gmail.com wrote:
 From the sourceforge forum it
 seems the new Blitz++ is quite competitive with intel fortran in SIMD
 vectorization as well, which does sound attractive.

You could write Blitz++ code and call it from Cython. That may be a
bit clunky at this point, but I'm sure it could be streamlined (at
least for a subset of Blitz++ arrays).

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] YouTrack testbed

2012-04-04 Thread Bryan Van de Ven
On 4/3/12 4:18 PM, Ralf Gommers wrote:
 The bad:
 - Multiple projects are supported, but issues are then really mixed. 
 The way this works doesn't look very useful for combined admin of 
 numpy/scipy trackers.
 - I haven't found a way yet to make versions and subsystems appear in 
 the one-line issue overview.
 - Fixed issues are still shown by default. There are several open 
 issues filed against youtrack about this, with no reasonable answers.
 - Plain text attachments (.txt, .diff, .patch) can't be viewed, only 
 downloaded.
 - No direct VCS integration, only via Teamcity (not set up, so can't 
 evaluate).
 - No useful default views as in Trac 
 (http://projects.scipy.org/scipy/report).
Ralf, I don't know about most of these issues offhand, but it does seem
like YouTrack offers GitHub integration, in the form of being able to
issue commands to YouTrack through git commits. Is that the kind of
integration you are looking for?

http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
http://blogs.jetbrains.com/youtrack/tag/github-integration/

Bryan



Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread srean
 I think the story is that Cython overlaps enough with Weave that Weave
 doesn't get any new users or developers.

 One big issue that I had with weave is that it compiles on the fly. As a
 result, it makes for very non-distributable software (requires a compiler
 and the development headers installed), and leads to problems in the long
 run.

 Gael

I do not know much Cython, except for the fact that it is out there
and what it is supposed to do, but wouldn't Cython need a compiler too?
I imagine distributing Cython-based code would incur similar amounts
of schlep.

But yes, you raise a valid point. It does cause annoyances. One that
I have faced is with running the same code simultaneously over a mix
of 32-bit and 64-bit machines. But this is because the source code
hashing function does not take the architecture into account. It
shouldn't be hard to fix.
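
As a rough sketch of the kind of fix hinted at above, the cache key for a compiled snippet could mix the machine type and pointer width into the hash of the source. The function name cache_key and its parameters are hypothetical and are not weave's actual hashing code.

```python
import hashlib
import platform
import struct


def cache_key(source, machine=None, pointer_bits=None):
    """Hash C/C++ source together with the architecture it is built for.

    Hypothetical helper for illustration only, not weave's real API.
    """
    if machine is None:
        machine = platform.machine()  # e.g. "x86_64" or "i686"
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # 32 or 64
    # Join the architecture fields and the source with a separator that
    # cannot appear in ordinary source text, then hash the result.
    payload = "\0".join([machine, str(pointer_bits), source])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, 32-bit and 64-bit machines sharing one cache directory would look up different entries for the same source, so neither would load a binary built for the other.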


Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread Chris Barker
On Wed, Apr 4, 2012 at 12:55 PM, srean srean.l...@gmail.com wrote:

 One big issue that I had with weave is that it compiles on the fly. As a
 result, it makes for very non-distributable software (requires a compiler
 and the development headers installed), and leads to problems in the long


 I do not know much Cython, except for the fact that it is out there
 and what it is supposed to do., but wouldnt Cython need a compiler too
 ?

Yes, but at build-time, not run time.

 I imagine distributing Cython based code would incur similar amounts
 of schlep.

If you distribute source, yes, but you at least have the option of
distributing binaries (and distutils does make that fairly easy, for
some value of fairly).

And many folks distribute the Cython-generated C code with a source
distro, so the end user only needs a C compiler, not Cython -- same as
any other compiled Python extension.
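
A setup script following this pattern might choose its sources roughly as sketched below. The helper names and the "fastmath" module name are hypothetical; the sketch assumes the .c file was generated by Cython ahead of time and shipped next to the .pyx file.

```python
import importlib.util


def cython_available():
    """Return True if Cython can be imported on the build machine."""
    return importlib.util.find_spec("Cython") is not None


def extension_sources(stem, use_cython):
    """Pick the .pyx file when Cython is present, else the shipped .c file."""
    return [stem + (".pyx" if use_cython else ".c")]


# In a real setup script this list would be handed to an Extension object,
# e.g. Extension("fastmath", extension_sources("fastmath", cython_available())),
# and Cython's own build step would run only when use_cython is True.
sources = extension_sources("fastmath", cython_available())
```

Either way the end user still needs a C compiler at build time, but never at run time.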

-Chris





Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++

2012-04-04 Thread srean
 I do not know much Cython, except for the fact that it is out there
 and what it is supposed to do., but wouldnt Cython need a compiler too
 ?

 Yes, but at build-time, not run time.

Ah! I see what you mean, or so I think. So the first time weave-based
code runs, it builds, stores the code on disk and then executes.
Whereas in Cython there is a clear separation of build vs. execute. In
fairness, though, it shouldn't be difficult to pre-empt a build with
weave. But I imagine Cython has other advantages (and, in my mind, so
does weave in certain restricted areas).

Now I feel it would be great to marry the two, so that for the most
part Cython does not need to call into the numpy API for array-based
operations but can fall back on something weave-like. Maybe sometime in
the future.

 I imagine distributing Cython based code would incur similar amounts
 of schlep.

 if you distribute source, yes, but if you at least have the option of
 distributing binaries. (and distutils does make that fairly easy, for
 some value of fairly)

Indeed.


Re: [Numpy-discussion] numpy.sum(..., keepdims=False)

2012-04-04 Thread David Warde-Farley
On 2012-04-03, at 4:10 PM, Frédéric Bastien wrote:

 I would like to add this parameter to Theano. So my question is, will
 the interface change or is it stable?

To elaborate on what Fred said, in Theano we try to offer the same 
functions/methods as NumPy does with the same arguments and same behaviour, 
except operating on our symbolic proxies instead of actual NumPy arrays; we try 
to break compatibility only when absolutely necessary. 

It would be great if someone (probably Mark?) could chime in as to whether this 
is here to stay, regardless of the NA business. This also seems like a good 
candidate for a backport to subsequent NumPy 1.x releases rather than reserving 
it for 2.x.

David



[Numpy-discussion] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python

2012-04-04 Thread Warren Weckesser
SciPy 2012, the eleventh annual Conference on Scientific Computing with
Python, will be held July 16–21, 2012, in Austin, Texas.

At this conference, novel scientific applications and libraries related to
data acquisition, analysis, dissemination and visualization using Python
are presented. Attended by leading figures from both academia and industry,
it is an excellent opportunity to experience the cutting edge of scientific
software development.

The conference is preceded by two days of tutorials, during which community
experts provide training on several scientific Python packages.  Following
the main conference will be two days of coding sprints.

We invite you to give a talk or present a poster at SciPy 2012.

The list of topics that are appropriate for the conference includes (but is
not limited to):

   - new Python libraries for science and engineering;
   - applications of Python in solving scientific or computational problems;
   - high performance, parallel and GPU computing with Python;
   - use of Python in science education.



Specialized Tracks

Two specialized tracks run in parallel to the main conference:

   - High Performance Computing with Python
   Whether your algorithm is distributed, threaded, memory intensive or
   latency bound, Python is making headway into the problem.  We are looking
   for performance driven designs and applications in Python.  Candidates
   include the use of Python within a parallel application, new architectures,
   and ways of making traditional applications execute more efficiently.


   - Visualization
   They say a picture is worth a thousand words--we’re interested in both!
   Python provides numerous visualization tools that allow scientists to show
   off their work, and we want to know about any new tools and techniques out
   there.  Come show off your latest graphics, whether it’s an old library
   with a slick new feature, a new library out to challenge the status quo, or
   simply a beautiful result.



Domain-specific Mini-symposia

Mini-symposia on the following topics are also being organized:

   - Computational bioinformatics
   - Meteorology and climatology
   - Astronomy and astrophysics
   - Geophysics



Talks, papers and posters

We invite you to take part by submitting a talk or poster abstract.
Instructions are on the conference website:

   http://conference.scipy.org/scipy2012/papers.php

Selected talks are included as papers in the peer-reviewed conference
proceedings, to be published online.


Tutorials

Tutorials will be given July 16–17.  We invite instructors to submit
proposals for half-day tutorials on topics relevant to scientific computing
with Python.  See

   http://conference.scipy.org/scipy2012/tutorials.php

for information about submitting a tutorial proposal.  To encourage
tutorials of the highest quality, the instructor (or team of instructors)
is given a $1,000 stipend for each half day tutorial.


Student/Community Scholarships

We anticipate providing funding for students and for active members of the
SciPy community who otherwise might not be able to attend the conference.
See

   http://conference.scipy.org/scipy2012/student.php

for scholarship application guidelines.


Be a Sponsor

The SciPy conference could not run without the generous support of the
institutions and corporations who share our enthusiasm for Python as a tool
for science.  Please consider sponsoring SciPy 2012.  For more information,
see

  http://conference.scipy.org/scipy2012/sponsor/index.php


Important dates:

   Monday, April 30: Talk abstracts and tutorial proposals due.
   Monday, May 7: Accepted tutorials announced.
   Monday, May 13: Accepted talks announced.

   Monday, June 18: Early registration ends. (Price increases after this date.)
   Sunday, July 8: Online registration ends.

   Monday-Tuesday, July 16 - 17: Tutorials
   Wednesday-Thursday, July 18 - July 19: Conference
   Friday-Saturday, July 20 - July 21: Sprints

We look forward to seeing you all in Austin this year!

The SciPy 2012 Team
http://conference.scipy.org/scipy2012/organizers.php



[Numpy-discussion] MemoryError : with scipy.spatial.distance

2012-04-04 Thread Abhishek Pratap
Hey Guys

I am new to Python, and even more so to NumPy. I am trying to cluster
close to 900K points using the DBSCAN algorithm. My input is a list of
~900K tuples, each holding the (x, y) coordinates of a point. I am
converting them to a NumPy array and passing it to the pdist function of
scipy.spatial.distance to calculate the distance between each pair of points.

Here is some size info on my NumPy array:
shape of input array  : (828575, 2)
Size :  6872000 bytes

I think the error has something to do with the default double dtype
of the array that pdist allocates. I would appreciate it if you could help
me debug this. I am sure I am overlooking something naive here.

See the traceback below.


MemoryError                               Traceback (most recent call last)
/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in <module>()
     36
     37 print cleaned_senseBam
---> 38 cluster_pet_points_per_chromosome(sense_bamFile)

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in cluster_pet_points_per_chromosome(bamFile)
     30 print 'Size of list points is %d' % sys.getsizeof(points)
     31 print 'Size of numpy array is %d' % sys.getsizeof(points_array)
---> 32 cluster_points_DBSCAN(points_array)
     33 #print points_array
     34

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-72-77005d7cd900> in cluster_points_DBSCAN(data_numpy_array)
      9 def cluster_points_DBSCAN(data_numpy_array):
     10 #eucledian distance calculation
---> 11 D = distance.pdist(data_numpy_array)
     12 S = distance.squareform(D)
     13 H = 1 - S/np.max(S)

/house/homedirs/a/apratap/playground/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/scipy/spatial/distance.pyc in pdist(X, metric, p, w, V, VI)
   1155
   1156 m, n = s
-> 1157 dm = np.zeros((m * (m - 1) / 2,), dtype=np.double)
   1158
   1159 wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']


Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance

2012-04-04 Thread Chris Barker
On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
 close to a 900K points using DBSCAN algo. My input is a list of ~900k
 tuples each having two points (x,y) coordinates. I am converting them
 to numpy array and passing them to pdist method of
 scipy.spatial.distance for calculating distance between each point.

I think pdist creates an array that is:

sum(range(num_points)) in size.

That's going to be pretty darn big:

404999550000 elements

I think that's about 3 terabytes:

In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361

(for 64 bit floats)
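
The same estimate can be written out as a small helper. This is only a sketch of the arithmetic, based on the m * (m - 1) / 2 allocation visible in the traceback; the function name condensed_size is made up for the example.

```python
def condensed_size(n_points, itemsize=8):
    """Return (number of pairwise distances, bytes needed to store them).

    pdist stores one distance per unordered pair of points, that is
    n * (n - 1) / 2 values; itemsize=8 corresponds to 64-bit floats.
    """
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs, n_pairs * itemsize


# The round figure used above, then the actual array length from the report.
pairs_900k, bytes_900k = condensed_size(900000)
pairs_actual, bytes_actual = condensed_size(828575)

# Convert to units of 1024**4 bytes, as in the calculation above.
tib_900k = bytes_900k / 1024.0 ** 4
tib_actual = bytes_actual / 1024.0 ** 4

# Switching to 32-bit floats only halves the requirement.
_, bytes_float32 = condensed_size(828575, itemsize=4)
```

Even with the exact point count and float32 storage, the result stays far beyond the memory of an ordinary machine, so the full pairwise distance array is not a workable route here.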


 I think the error has something to do with the default double dtype
 of numpy array of pdist function.

you *may* be able to get it to use float32 -- but as you can see, that
probably won't help enough!

You'll need a different approach!

-Chris





Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance

2012-04-04 Thread Abhishek Pratap
Thanks Chris. So I guess the question becomes: how can I efficiently
cluster 1 million (x, y) coordinates?

-Abhi



Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance

2012-04-04 Thread Gael Varoquaux
On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
 Thanks Chris. So I guess the question becomes how can I efficiently
 cluster 1 million x,y coordinates.

Did you try scikit-learn's implementation of DBSCAN?
http://scikit-learn.org/stable/modules/clustering.html#dbscan
I am not sure that it scales, but it's worth trying.

Alternatively, the best way to cluster massive datasets is to use the
mini-batch implementation of KMeans:
http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means
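
As a rough sketch of both suggestions, assuming a reasonably recent scikit-learn is installed: the synthetic points and the values chosen for eps, min_samples, n_clusters and batch_size are made up for illustration and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans

# Synthetic stand-in for the real (x, y) data: three well-separated blobs.
rng = np.random.RandomState(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.randn(1000, 2) for c in blob_centers])

# DBSCAN takes the raw coordinates, not a precomputed distance array.
# Points it considers noise receive the label -1.
db_labels = DBSCAN(eps=1.0, min_samples=5).fit(points).labels_

# Mini-batch k-means processes the data in small chunks; unlike DBSCAN,
# the number of clusters has to be chosen in advance.
km = MiniBatchKMeans(n_clusters=3, batch_size=256, n_init=3, random_state=0)
km_labels = km.fit(points).labels_
```

Both estimators accept the (n_points, 2) coordinate array directly, so the n * (n - 1) / 2 array that pdist builds is not needed as an input.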

Hope this helps,

Gael