Re: [Numpy-discussion] numpy.sum(..., keepdims=False)
Hi,

On 03/04/2012 22:10, Frédéric Bastien wrote:
> I would like to add this parameter to Theano. So my question is, will
> the interface change or is it stable?

I don't know about its stability, but as for the existence of this new parameter: looking at def sum(...) in https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py, the keepdims=False parameter is there and was introduced 7 months ago by Mark Wiebe and Charles Harris. The docstring indeed says:

    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left in the
        result as dimensions with size one. With this option, the result
        will broadcast correctly against the original `arr`.

The commit message also mentions the skipna parameter, which is part of the overall NA implementation that is indeed tagged as somewhat experimental (if I'm correct!), but I would assume that the keepdims parameter is an orthogonal issue.

Hopefully somebody can give you a more precise answer!

Best,
Pierre

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
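The broadcasting behaviour the quoted docstring describes is easy to see in a short sketch (the array and the row-normalization step here are purely illustrative):

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)

# With keepdims=True the reduced axis stays in the result with size one...
s = x.sum(axis=1, keepdims=True)
print(s.shape)  # (3, 1)

# ...so the result broadcasts against the original array, e.g. to
# normalize each row so it sums to 1:
normalized = x / s
```

Without keepdims, `x.sum(axis=1)` has shape `(3,)`, and dividing `x` by it would not broadcast row-wise.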
[Numpy-discussion] numpy doc for percentile function
Hi,

I'm looking for the entry point in the NumPy docs for the percentile function. I'm assuming it should sit in routines.statistics, but I don't see it there: http://docs.scipy.org/doc/numpy/reference/routines.statistics.html Am I missing something?

If indeed a percentile entry should be added, do you agree it could go in the Histogram section? (Histogram would then become Histograms and percentiles.)

Also, as Frédéric Bastien pointed out, I feel that the current doc build is broken (especially the links :-( ).

Best,
Pierre
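For reference, the function itself is available even though the docs entry point seems to be missing; a quick sketch of its behaviour:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# The 50th percentile is the median
print(np.percentile(a, 50))  # 3.0

# Percentiles that fall between data points are linearly interpolated
# by default
print(np.percentile(a, 25))  # 2.0
```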
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
On 04/03/2012 04:45 PM, srean wrote:
> This makes me ask something that I always wanted to know: why is weave
> not the preferred or encouraged way? Is it because no developer has
> interest in maintaining it, or is it too onerous to maintain? I do not
> know enough of its internals to guess an answer.

I think it would be fair to say that weave has languished a bit over the years. I think the story is that Cython overlaps enough with Weave that Weave doesn't get any new users or developers. Which isn't to say that Cython is always superior to the Weave approach (for one thing, embedding Cython code in Python source files could have been a better experience), just that it overlaps enough.

I honestly don't believe Weave has a chance of getting resurrected from the dead -- my bets for the future are on Cython, Travis' numba, and perhaps some combination or amalgamation of the two (note that I'm a Cython dev and so rather biased).

> What I like about weave is that even when I drop into the C++ mode I
> can pretty much use the same numpy'ish syntax, with no overhead of
> calling back into the numpy C functions. From the sourceforge forum it
> seems the new Blitz++ is quite competitive with Intel Fortran in SIMD
> vectorization as well, which does sound attractive.

Cython seems likely to be pushed further in this area over the next half year, so that it can grow up to become more of a Fortran competitor.

Dag
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
On Tue, Apr 3, 2012 at 4:45 PM, srean <srean.l...@gmail.com> wrote:
> From the sourceforge forum it seems the new Blitz++ is quite
> competitive with Intel Fortran in SIMD vectorization as well, which
> does sound attractive.

You could write Blitz++ code and call it from Cython. That may be a bit klunky at this point, but I'm sure it could be streamlined (at least for a subset of Blitz++ arrays).

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
chris.bar...@noaa.gov
Re: [Numpy-discussion] YouTrack testbed
On 4/3/12 4:18 PM, Ralf Gommers wrote:
> The bad:
> - Multiple projects are supported, but issues are then really mixed.
>   The way this works doesn't look very useful for combined admin of
>   the numpy/scipy trackers.
> - I haven't found a way yet to make versions and subsystems appear in
>   the one-line issue overview.
> - Fixed issues are still shown by default. There are several open
>   issues filed against YouTrack about this, with no reasonable answers.
> - Plain text attachments (.txt, .diff, .patch) can't be viewed, only
>   downloaded.
> - No direct VCS integration, only via TeamCity (not set up, so can't
>   evaluate).
> - No useful default views as in Trac
>   (http://projects.scipy.org/scipy/report).

Ralf, I don't know about most of these issues offhand, but it does seem that YouTrack offers GitHub integration, in the form of being able to issue commands to YouTrack through git commit messages (is that the kind of integration you are looking for?):

http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
http://blogs.jetbrains.com/youtrack/tag/github-integration/

Bryan
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
> > I think the story is that Cython overlaps enough with Weave that
> > Weave doesn't get any new users or developers.
>
> One big issue that I had with weave is that it compiles on the fly. As
> a result, it makes for very non-distributable software (requires a
> compiler and the development headers installed), and leads to problems
> in the long run.
>
> Gael

I do not know much Cython, except for the fact that it is out there and what it is supposed to do, but wouldn't Cython need a compiler too? I imagine distributing Cython-based code would incur similar amounts of schlep.

But yes, you raise a valid point. It does cause annoyances. One that I have faced is with running the same code simultaneously over a mix of 32-bit and 64-bit machines. But this is because the source code hashing function does not take the architecture into account. Shouldn't be hard to fix.
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
On Wed, Apr 4, 2012 at 12:55 PM, srean <srean.l...@gmail.com> wrote:
> > One big issue that I had with weave is that it compiles on the fly.
> > As a result, it makes for very non-distributable software (requires
> > a compiler and the development headers installed), and leads to
> > problems in the long run.
>
> I do not know much Cython, except for the fact that it is out there
> and what it is supposed to do, but wouldn't Cython need a compiler too?

Yes, but at build time, not run time.

> I imagine distributing Cython-based code would incur similar amounts
> of schlep.

If you distribute source, yes -- but you at least have the option of distributing binaries. (And distutils does make that fairly easy, for some value of "fairly".) And many folks distribute the Cython-generated C code with a source distro, so the end user only needs to compile -- same as any other compiled Python extension.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
chris.bar...@noaa.gov
Re: [Numpy-discussion] creating/working NumPy-ndarrays in C++
> > I do not know much Cython, except for the fact that it is out there
> > and what it is supposed to do, but wouldn't Cython need a compiler
> > too?
>
> Yes, but at build time, not run time.

Ah! I see what you mean, or so I think. So the first time a weave-based piece of code runs, it builds, stores the compiled code on disk, and then executes, whereas in Cython there is a clear separation of build vs. execute. In fairness, though, it shouldn't be difficult to pre-empt a build with weave. But I imagine Cython has other advantages (and in my mind so does weave, in certain restricted areas).

Now I feel it would be great to marry the two, so that for the most part Cython does not need to call into the NumPy API for array-based operations but can fall back on something weave-like. Maybe sometime in the future.

> > I imagine distributing Cython-based code would incur similar amounts
> > of schlep.
>
> If you distribute source, yes, but you at least have the option of
> distributing binaries. (And distutils does make that fairly easy, for
> some value of "fairly".)

Indeed.
Re: [Numpy-discussion] numpy.sum(..., keepdims=False)
On 2012-04-03, at 4:10 PM, Frédéric Bastien wrote:
> I would like to add this parameter to Theano. So my question is, will
> the interface change or is it stable?

To elaborate on what Fred said: in Theano we try to offer the same functions/methods as NumPy does, with the same arguments and the same behaviour, except operating on our symbolic proxies instead of actual NumPy arrays; we try to break compatibility only when absolutely necessary.

It would be great if someone (probably Mark?) could chime in on whether this is here to stay, regardless of the NA business. This also seems like a good candidate for a backport to subsequent NumPy 1.x releases rather than reserving it for 2.x.

David
[Numpy-discussion] SciPy 2012 - The Eleventh Annual Conference on Scientific Computing with Python
SciPy 2012, the eleventh annual Conference on Scientific Computing with Python, will be held July 16-21, 2012, in Austin, Texas.

At this conference, novel scientific applications and libraries related to data acquisition, analysis, dissemination and visualization using Python are presented. Attended by leading figures from both academia and industry, it is an excellent opportunity to experience the cutting edge of scientific software development.

The conference is preceded by two days of tutorials, during which community experts provide training on several scientific Python packages. Following the main conference will be two days of coding sprints.

We invite you to give a talk or present a poster at SciPy 2012. The list of topics that are appropriate for the conference includes (but is not limited to):

- new Python libraries for science and engineering;
- applications of Python in solving scientific or computational problems;
- high performance, parallel and GPU computing with Python;
- use of Python in science education.

Specialized Tracks

Two specialized tracks run in parallel to the main conference:

- High Performance Computing with Python

  Whether your algorithm is distributed, threaded, memory intensive or latency bound, Python is making headway into the problem. We are looking for performance-driven designs and applications in Python. Candidates include the use of Python within a parallel application, new architectures, and ways of making traditional applications execute more efficiently.

- Visualization

  They say a picture is worth a thousand words -- we're interested in both! Python provides numerous visualization tools that allow scientists to show off their work, and we want to know about any new tools and techniques out there. Come show off your latest graphics, whether it's an old library with a slick new feature, a new library out to challenge the status quo, or simply a beautiful result.

Domain-specific Mini-symposia

Mini-symposia on the following topics are also being organized:

- Computational bioinformatics
- Meteorology and climatology
- Astronomy and astrophysics
- Geophysics

Talks, papers and posters

We invite you to take part by submitting a talk or poster abstract. Instructions are on the conference website:

http://conference.scipy.org/scipy2012/papers.php
http://conference.scipy.org/scipy2012/talks.php

Selected talks are included as papers in the peer-reviewed conference proceedings, to be published online.

Tutorials

Tutorials will be given July 16-17. We invite instructors to submit proposals for half-day tutorials on topics relevant to scientific computing with Python. See

http://conference.scipy.org/scipy2012/tutorials.php

for information about submitting a tutorial proposal. To encourage tutorials of the highest quality, the instructor (or team of instructors) is given a $1,000 stipend for each half-day tutorial.

Student/Community Scholarships

We anticipate providing funding for students and for active members of the SciPy community who otherwise might not be able to attend the conference. See

http://conference.scipy.org/scipy2012/student.php

for scholarship application guidelines.

Be a Sponsor

The SciPy conference could not run without the generous support of the institutions and corporations who share our enthusiasm for Python as a tool for science. Please consider sponsoring SciPy 2012. For more information, see http://conference.scipy.org/scipy2012/sponsor/index.php

Important dates:

Monday, April 30: Talk abstracts and tutorial proposals due.
Monday, May 7: Accepted tutorials announced.
Monday, May 13: Accepted talks announced.
Monday, June 18: Early registration ends. (Price increases after this date.)
Sunday, July 8: Online registration ends.

Monday-Tuesday, July 16-17: Tutorials
Wednesday-Thursday, July 18-19: Conference
Friday-Saturday, July 20-21: Sprints

We look forward to seeing you all in Austin this year!

The SciPy 2012 Team
http://conference.scipy.org/scipy2012/organizers.php
[Numpy-discussion] MemoryError : with scipy.spatial.distance
Hey guys,

I am new to both Python and, more so, to NumPy. I am trying to cluster close to 900K points using the DBSCAN algorithm. My input is a list of ~900k tuples, each holding a point's (x, y) coordinates. I am converting them to a NumPy array and passing them to the pdist method of scipy.spatial.distance to calculate the distance between each pair of points.

Here is some size info on my NumPy array:

shape of input array: (828575, 2)
size: 6872000 bytes

I think the error has something to do with the default double dtype of the array that pdist allocates. I would appreciate it if you could help me debug this. I am sure I am overlooking something naive here.

See the traceback below.

MemoryError                               Traceback (most recent call last)
/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in <module>()
     36
     37 print cleaned_senseBam
---> 38 cluster_pet_points_per_chromosome(sense_bamFile)

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in cluster_pet_points_per_chromosome(bamFile)
     30 print 'Size of list points is %d' % sys.getsizeof(points)
     31 print 'Size of numpy array is %d' % sys.getsizeof(points_array)
---> 32 cluster_points_DBSCAN(points_array)
     33 #print points_array
     34

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-72-77005d7cd900> in cluster_points_DBSCAN(data_numpy_array)
      9 def cluster_points_DBSCAN(data_numpy_array):
     10     #eucledian distance calculation
---> 11     D = distance.pdist(data_numpy_array)
     12     S = distance.squareform(D)
     13     H = 1 - S/np.max(S)

/house/homedirs/a/apratap/playground/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/scipy/spatial/distance.pyc in pdist(X, metric, p, w, V, VI)
   1155
   1156     m, n = s
-> 1157     dm = np.zeros((m * (m - 1) / 2,), dtype=np.double)
   1158
   1159     wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']
Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance
On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
> close to a 900K points using DBSCAN algo. My input is a list of ~900k
> tuples each having two points (x,y) coordinates. I am converting them
> to numpy array and passing them to pdist method of
> scipy.spatial.distance for calculating distance between each point.

I think pdist creates an array that is sum(range(num_points)) in size. That's going to be pretty darn big: about 4e11 elements. I think that's about 3 terabytes:

In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361

(for 64-bit floats)

> I think the error has something to do with the default double dtype of
> numpy array of pdist function.

You *may* be able to get it to use float32 -- but as you can see, that probably won't help enough! You'll need a different approach!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
chris.bar...@noaa.gov
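Chris's back-of-the-envelope estimate can be redone with the exact count from the original post (828575 points); pdist returns a condensed distance matrix of n*(n-1)/2 float64 entries:

```python
# Memory the condensed pdist result would need for n points:
# n*(n-1)/2 float64 entries, 8 bytes each.
n = 828575                     # number of points from the original post
n_pairs = n * (n - 1) // 2     # length of the condensed distance matrix
bytes_needed = n_pairs * 8     # float64 = 8 bytes
tib = bytes_needed / 1024.0 ** 4
print(n_pairs)                 # 343267851025
print(round(tib, 2))           # 2.5 (TiB) -- far beyond any workstation's RAM
```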
Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance
Thanks Chris. So I guess the question becomes: how can I efficiently cluster ~1 million (x, y) coordinates?

-Abhi

On Wed, Apr 4, 2012 at 4:35 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
> I think pdist creates an array that is sum(range(num_points)) in size.
> That's going to be pretty darn big [...] I think that's about 3
> terabytes [...] You *may* be able to get it to use float32 -- but as
> you can see, that probably won't help enough! You'll need a different
> approach!
Re: [Numpy-discussion] MemoryError : with scipy.spatial.distance
On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
> Thanks Chris. So I guess the question becomes how can I efficiently
> cluster 1 million x,y coordinates.

Did you try scikit-learn's implementation of DBSCAN?
http://scikit-learn.org/stable/modules/clustering.html#dbscan
I am not sure that it scales, but it's worth trying.

Alternatively, the best way to cluster massive datasets is to use the mini-batch implementation of KMeans:
http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means

Hope this helps,
Gael
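To make the mini-batch idea concrete, here is a self-contained NumPy sketch of the algorithm (this is not scikit-learn's implementation, just the core update rule, with per-center step sizes that shrink as each center accumulates points; the function name and defaults are illustrative):

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=100, n_iter=50, seed=0):
    """Minimal mini-batch k-means sketch (illustrative, not sklearn's code)."""
    rng = np.random.RandomState(seed)
    # Initialize centers from random data points
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        # Assign each batch point to its nearest center
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Running-mean update: step size shrinks as a center sees more points
        for j, x in zip(labels, batch):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return centers

# Two well-separated blobs; the centers typically converge near the blob means
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(500, 2), rng.randn(500, 2) + 10.0])
centers = mini_batch_kmeans(X, k=2)
```

Each update touches only `batch_size` points, so no pairwise distance matrix is ever built; memory stays proportional to the data itself, which is what makes the approach practical for a million points.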