Re: [Scikit-learn-general] [Matplotlib-users] Scipy2016: call for proposals

2016-03-07 Thread Jacob Vanderplas
I'm not going to be able to make it this year, unfortunately. Jake Jake VanderPlas Senior Data Science Fellow Director of Research in Physical Sciences University of Washington eScience Institute On Mon, Mar 7, 2016 at 9:31 AM, Andreas Mueller wrote: > Are any more core

Re: [Scikit-learn-general] k-NN user defined distance

2016-02-23 Thread Jacob Vanderplas
> I have been experimenting with the above code. I have noticed the following things: 1. If we set algorithm = 'brute' the algorithm does not enter the function tan, i.e., putting a breakpoint at the print statement does not stop execution on it during the fit method. It
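A minimal sketch of the behaviour described in the quoted message, assuming a recent scikit-learn release. With `algorithm='brute'`, fitting essentially just stores the training data, so a user-defined metric is expected to be evaluated when neighbours are queried (`predict`/`kneighbors`) rather than during `fit`. `CountingMetric` is a hypothetical helper (a Manhattan distance that counts its own calls), added only to make those calls visible.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class CountingMetric:
    """Hypothetical user-defined metric: Manhattan distance that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return np.abs(a - b).sum()


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_query = rng.normal(size=(5, 3))

metric = CountingMetric()
clf = KNeighborsClassifier(n_neighbors=3, algorithm="brute", metric=metric)

clf.fit(X_train, y_train)
calls_during_fit = metric.calls

pred = clf.predict(X_query)
calls_during_predict = metric.calls - calls_during_fit

print("metric calls during fit:    ", calls_during_fit)
print("metric calls during predict:", calls_during_predict)
```

The exact call counts may differ between scikit-learn versions; the point is where the metric gets exercised.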

Re: [Scikit-learn-general] Using Typed MemoryViews for Numpy Arrays

2016-02-11 Thread Jacob Vanderplas
> escribed by PEP 3118 of python). Thoughts on whether this is something useful for the scikit community? I am probably going to make this change in my local branch, anyway. I can push these changes back to scikit if there is interest. Thanks, Mahesh

Re: [Scikit-learn-general] Using Typed MemoryViews for Numpy Arrays

2016-02-10 Thread Jacob Vanderplas
Hi Mahesh, Regarding the raw data access, what specific parts of the code are you looking at? Thanks, Jake On Wed, Feb 10, 2016 at 6:09 PM, mahesh ravishankar <

Re: [Scikit-learn-general] Building sklearn for different python versions (development)

2016-01-26 Thread Jacob Vanderplas
> I don't see an easy way to maintain the changes in two different directories. If both directories are Git repositories linked to a common remote, you could commit the changes on a branch and then sync them that way. Jake

Re: [Scikit-learn-general] Building sklearn for different python versions (development)

2016-01-25 Thread Jacob Vanderplas
Hi Antoine, For this type of thing I use conda environments: http://conda.pydata.org/docs/using/envs.html The other thing to keep in mind is that if you're installing from the same source directory, you'll need to do a clean install each time; i.e. type ``python setup.py clean`` before typing

Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial

2015-09-30 Thread Jacob Vanderplas
> would make sense to work off our scipy ones and improve them further. I'd be happy to work on it. We have some more exercises in a branch, and I have also improved versions of some of the notebooks that I have been using for teaching. Andy >> On 09/29/2015 06:48 PM, J

Re: [Scikit-learn-general] PyCon 2016 scikit-learn tutorial

2015-09-30 Thread Jacob Vanderplas
> tually very important topics, and I noticed that they typically fall a little bit short in the general ML tutorials; typically, because these tutorials work with a single, specific dataset. Unfortunately, I have seen a couple of applications where nominal string v

[Scikit-learn-general] PyCon 2016 scikit-learn tutorial

2015-09-29 Thread Jacob Vanderplas
Hi All, PyCon 2016 call for proposals just opened. For the last several years Olivier and I have been teaching a two-part scikit-learn tutorial at each PyCon, and I think they have gone over well. As the conference is just a few-hour train ride away

Re: [Scikit-learn-general] KNeighboursClassifier has slow performance when using other than the default distance parameters

2015-06-11 Thread Jacob Vanderplas
Hi, As far as I understand, the reason that any metric other than Euclidean runs more slowly than expected is a lack of compile-time inlining. The function to compute a general metric is selected via a string argument which is not known until runtime. For this reason the Cython compiler can't

Re: [Scikit-learn-general] add a new point to the ball tree

2015-05-15 Thread Jacob Vanderplas
The tree code in scikit-learn is designed for static datasets. To add a new point, you have to reconstruct the tree. Jake Jake VanderPlas Director of Research – Physical Sciences eScience Institute, University of Washington http://www.vanderplas.com On Fri, May 15, 2015 at 4:42 AM, nafise
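Under the static design described above, "adding" a point amounts to rebuilding. A minimal sketch using `sklearn.neighbors.BallTree`; the helper `add_point_and_rebuild` is hypothetical:

```python
import numpy as np
from sklearn.neighbors import BallTree


def add_point_and_rebuild(X, new_point, leaf_size=40):
    """Hypothetical helper: append one point and build a fresh BallTree over all data."""
    X_new = np.vstack([X, np.atleast_2d(new_point)])
    return X_new, BallTree(X_new, leaf_size=leaf_size)


rng = np.random.default_rng(42)
X = rng.random((100, 2))
tree = BallTree(X)

X, tree = add_point_and_rebuild(X, [0.5, 0.5])

# The added point (index 100) is its own nearest neighbour, at distance 0.
dist, ind = tree.query([[0.5, 0.5]], k=1)
```

Since every rebuild costs a full tree construction, collecting several new points and rebuilding once is usually cheaper than rebuilding per point.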

Re: [Scikit-learn-general] Ball tree - different metrics

2015-05-14 Thread Jacob Vanderplas
Hi Nafise, In general, yes, the execution time will depend on the metric, and in a way that can be very difficult to predict. One reason is that, because the Euclidean metric is the most common, that case is slightly optimized. But that won't lead to a factor of ten. The Ball Tree works by

Re: [Scikit-learn-general] Ball tree - different metrics

2015-05-14 Thread Jacob Vanderplas
, best of 3: 403 ms per loop In [54]: d = sklearn.neighbors.dist_metrics.MinkowskiDistance(1000) In [55]: %timeit d.pairwise(x, y) 1 loops, best of 3: 472 ms per loop In [56]: On Thu, May 14, 2015 at 3:14 PM, Jacob Vanderplas jake...@cs.washington.edu wrote: Hi Nafise, In general, yes

Re: [Scikit-learn-general] Ball tree - different metrics

2015-05-14 Thread Jacob Vanderplas
-defined one. On Thursday, May 14, 2015 11:24 PM, nafise mehdipoor mehdipour...@yahoo.com wrote: Thank you so much. It was an example and the real metric will be defined within ball-tree-valid-metric conditions. The best. On Thursday, May 14, 2015 5:47 PM, Jacob Vanderplas jake

Re: [Scikit-learn-general] Ball tree - different metrics

2015-05-14 Thread Jacob Vanderplas
with a judiciously chosen kernel, Jake On Thu, May 14, 2015 at 4:54 PM, Jacob Vanderplas jake...@cs.washington.edu wrote: Sorry – I should have better specified what I

Re: [Scikit-learn-general] BallTree query

2015-05-05 Thread Jacob Vanderplas
Hi Nafieseh, The strength of the BallTree is querying neighbors without actually computing all distances. If you wish to simply compute distances between specified points, the better tools are in the pairwise submodule. For example: In [1]: import numpy as np In [2]: from sklearn.metrics
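The original example is cut off after its imports, so the following is a separate minimal sketch (not Jake's code) using `sklearn.metrics.pairwise_distances`, which returns the full distance matrix between two sets of points without building a tree:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[0.0, 0.0], [3.0, 4.0]])
Y = np.array([[0.0, 0.0], [6.0, 8.0], [1.0, 1.0]])

# Every distance between a row of X and a row of Y; no tree is involved.
D_euclid = pairwise_distances(X, Y, metric="euclidean")    # shape (2, 3)
D_manhattan = pairwise_distances(X, Y, metric="manhattan")
```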

Re: [Scikit-learn-general] Question about KernelDensity implementation

2014-11-05 Thread Jacob Vanderplas
Sorry about that oversight in the design! A common test to catch those sorts of inconsistencies would be useful. The biggest problem is that KernelDensity is not fundamentally a classifier, regressor, or transformer, but a density estimator. When I initially did the KDE pull request, I floated

Re: [Scikit-learn-general] Question about KernelDensity implementation

2014-10-21 Thread Jacob Vanderplas
Hi Jose, The KDE implementation does work on multivariate data, and will in general work for multimodal data as well. There are two caveats to that: 1. In the sklearn implementation, the bandwidth must be the same across all dimensions. If this poses a problem for your data, the data can be
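The message is cut off before its suggested fix, so the following is only an assumed workaround: standardize each dimension to unit variance so that one shared bandwidth is reasonable, then convert the density back to the original units. It is a brute-force NumPy sketch (`gaussian_kde_logpdf` is hypothetical), not the scikit-learn implementation:

```python
import numpy as np


def gaussian_kde_logpdf(X_train, X_eval, bandwidth):
    """Brute-force Gaussian KDE with ONE scalar bandwidth for all dimensions.

    Returns the log-density at each row of X_eval.
    """
    n, d = X_train.shape
    # Squared distances between every evaluation point and every training point.
    sq = ((X_eval[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq / bandwidth**2
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth**2) - np.log(n)
    # log-sum-exp over training points for numerical stability
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm


rng = np.random.default_rng(0)
# Two dimensions on very different scales: one bandwidth cannot suit both.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

# Assumed workaround: standardize each dimension, then use a single bandwidth.
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std

Z_eval = np.zeros((1, 2))                      # centre of the standardized data
log_p_z = gaussian_kde_logpdf(Z, Z_eval, bandwidth=0.3)

# Density in the original units includes the Jacobian of the rescaling.
log_p_x = log_p_z - np.log(std).sum()
```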

Re: [Scikit-learn-general] error in grid search for KNN

2014-08-25 Thread Jacob Vanderplas
Hi, This is a bug, and is related to https://github.com/scikit-learn/scikit-learn/issues/2609 Jake On Mon, Aug 25, 2014 at 8:12 AM, Sheila the angel

Re: [Scikit-learn-general] Double Gaussian fitting

2014-03-28 Thread Jacob Vanderplas
Hi, From what I understand, mlab.normpdf expects the standard deviation, while you're passing it the variance, which is the square of the standard deviation. If you want to see what the fit looks like, I think it's much better to just let sklearn do the work, e.g. plt.plot(x,
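A small NumPy sketch of the distinction, with a hypothetical `normpdf` helper standing in for `mlab.normpdf` (which has since been removed from Matplotlib). The mixture parameters are made-up values of the kind a fitted two-component model reports; the point is that each variance must be square-rooted before it is used as a standard deviation:

```python
import numpy as np


def normpdf(x, mu, sigma):
    """Normal density; `sigma` is the standard deviation, NOT the variance."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# Made-up parameters of a fitted two-component 1D mixture.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
variances = np.array([0.25, 4.0])

x = np.linspace(-10.0, 15.0, 5001)

# Correct: take the square root of each variance before calling normpdf.
mixture = sum(w * normpdf(x, m, np.sqrt(v)) for w, m, v in zip(weights, means, variances))

# The mistake described above: the variance passed where sigma is expected.
wrong = sum(w * normpdf(x, m, v) for w, m, v in zip(weights, means, variances))
```

The correct `mixture` has mean 1.5 and variance 8.125, which is what the weights, means, and variances above imply; `wrong` has the right centres but the wrong widths.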

Re: [Scikit-learn-general] 2D features

2014-01-21 Thread Jacob Vanderplas
On Mon, Jan 20, 2014 at 8:19 AM, Su, Jian, Ph.D. su.j...@mayo.edu wrote: Hi, I now know how to select features with numbers (0D), but some of the features are vectors (1D) or arrays (2D) or possibly higher-dimensional. For example, for object 1 we have two features: Feature #1: 1

Re: [Scikit-learn-general] Bumping the dependencies (numpy and scipy) to the versions from Ubuntu Precise 12.04 LTS

2014-01-06 Thread Jacob Vanderplas
On Mon, Jan 6, 2014 at 4:23 AM, Olivier Grisel olivier.gri...@ensta.org wrote: - numpy 1.6.1+ (see http://packages.ubuntu.com/precise/python-numpy) - scipy 0.9.0+ (see http://packages.ubuntu.com/precise/python-scipy) +1 as well: especially with build systems like conda and wheels becoming

Re: [Scikit-learn-general] Possible replacement for Gaussian Process module

2014-01-04 Thread Jacob Vanderplas
Hi, I would tentatively be in favor of this, though I haven't yet looked closely at the proposed code. I have found sklearn's gaussian process module to be very opaque, to the point of being unusable. I've ended up spinning up my own implementation for several applications, and I know several

Re: [Scikit-learn-general] Add polynomial and MARS regression methods

2013-12-08 Thread Jacob Vanderplas
Hi, For Polynomial regression, take a look at this PR, which was recently merged: https://github.com/scikit-learn/scikit-learn/pull/2585 An example of this new feature in use is here:
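Independently of the API that PR adds, the idea behind polynomial regression fits in a few lines of NumPy: expand x into its powers, then solve an ordinary linear least-squares problem on the expanded features. A minimal sketch with synthetic data (`polynomial_features` is a hypothetical helper):

```python
import numpy as np


def polynomial_features(x, degree):
    """Columns [1, x, x**2, ..., x**degree] for a 1D input array."""
    return np.vander(x, N=degree + 1, increasing=True)


rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 0.5 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.01, size=x.size)

A = polynomial_features(x, degree=2)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # close to [0.5, -2.0, 3.0]
```

The model stays linear in its coefficients; only the features are nonlinear in x, which is why a plain linear solver suffices.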

Re: [Scikit-learn-general] Contributing code

2013-10-17 Thread Jacob Vanderplas
Hi Carlos, Welcome! We'd love to have you contribute. You can start by reading through the developers guide on our website, and following the suggestions there: http://scikit-learn.org/stable/developers/ Feel free to ask here if any questions come up, Jake On Wed, Oct 16, 2013 at 2:09 PM,

Re: [Scikit-learn-general] Proposed feature for graph_shortest_path

2013-10-11 Thread Jacob Vanderplas
Hi Eric, I think this sort of enhancement might fit better in scipy: much of the sparse graph package is back-ported from there. Regarding your specific problem: you might try using the ``dijkstra()`` function directly and passing the list of nodes you're interested in to the ``indices``
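A minimal sketch of that suggestion with SciPy's `scipy.sparse.csgraph.dijkstra`, on a small made-up graph: passing `indices` restricts the computation to shortest paths from the listed source nodes instead of all of them.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Small weighted, undirected graph with 5 nodes (0 means "no edge").
W = np.array([
    [0, 1, 4, 0, 0],
    [1, 0, 2, 6, 0],
    [4, 2, 0, 3, 0],
    [0, 6, 3, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
G = csr_matrix(W)

# Shortest paths only from nodes 0 and 4, rather than the full n x n matrix.
D = dijkstra(G, directed=False, indices=[0, 4])
print(D.shape)  # (2, 5)
```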

Re: [Scikit-learn-general] fit GMM on 2D data

2013-09-26 Thread Jacob Vanderplas
eval() expects data with the same number of features (dimensions) as the data used in the fit. Your fit data has shape (48, 2), which is interpreted as 48 points in 2 dimensions. Your eval data has shape (48,), which scikit-learn cannot interpret as (n_samples, n_features). If you fit the model on two-dimensional data, you must call eval
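A common situation is evaluating a 2D model on a grid for plotting. In this minimal NumPy sketch, a single Gaussian fit by sample mean and covariance stands in (as an assumption) for the fitted mixture model, and `log_density` is a hypothetical function; the point is how to stack the grid coordinates into shape `(n_points, 2)`:

```python
import numpy as np

rng = np.random.default_rng(0)
X_fit = rng.normal(size=(48, 2))                  # 48 points in 2 dimensions

# Stand-in "model": one Gaussian with the sample mean and covariance.
mu = X_fit.mean(axis=0)
cov = np.cov(X_fit, rowvar=False)
cov_inv = np.linalg.inv(cov)
log_norm = -0.5 * (2 * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])


def log_density(X):
    """Log-density of the stand-in model; X must be (n_samples, 2)."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != X_fit.shape[1]:
        raise ValueError(f"expected shape (n_samples, {X_fit.shape[1]}), got {X.shape}")
    diff = X - mu
    return log_norm - 0.5 * np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


# A 1D array of 48 values cannot be read as (n_samples, n_features) ...
try:
    log_density(np.linspace(-3, 3, 48))
except ValueError as err:
    print(err)

# ... but a grid flattened into (n_points, 2) can.
xx, yy = np.meshgrid(np.linspace(-3, 3, 40), np.linspace(-3, 3, 30))
grid = np.column_stack([xx.ravel(), yy.ravel()])  # shape (1200, 2)
Z = log_density(grid).reshape(xx.shape)           # back to (30, 40) for plotting
```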

Re: [Scikit-learn-general] Which scikit-learn contributors share common interests?

2013-09-25 Thread Jacob Vanderplas
Very cool! One quick comment: I'd probably normalize the values in the sparse matrix to 1. As it's written, a user with, say, 1 commit on a file will be considered a closer neighbor to a user with 0 commits on that file than to a user with 3 commits on that file. Jake On Wed, Sep 25, 2013 at
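Jake's point can be checked with three hypothetical users and a single file: with raw counts, the user with 1 commit is nearer to the user with 0 commits than to the user with 3; after setting every stored non-zero entry to 1, the two users who touched the file coincide. A minimal SciPy/NumPy sketch (`euclid` is a hypothetical helper):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical commit counts on one file for users A, B, C.
counts = csr_matrix(np.array([[1.0], [0.0], [3.0]]))


def euclid(M, i, j):
    """Euclidean distance between rows i and j of a sparse matrix."""
    return float(np.linalg.norm(M[i].toarray() - M[j].toarray()))


# Raw counts: A (1 commit) is closer to B (0 commits) than to C (3 commits).
print(euclid(counts, 0, 1), euclid(counts, 0, 2))   # 1.0 2.0

# Normalize to 1: "touched the file" vs "did not".
binary = counts.copy()
binary.eliminate_zeros()        # make sure no explicit zeros get set to 1
binary.data[:] = 1.0
print(euclid(binary, 0, 1), euclid(binary, 0, 2))   # 1.0 0.0
```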

Re: [Scikit-learn-general] EM Algorithm Example

2013-09-07 Thread Jacob Vanderplas
David, Have you looked at the K Means algorithm? It uses a similar approach of a two-phase iteration to determine clustering. In K means you're looking for K cluster centers, such that when each point is assigned to the nearest cluster center, the total of the squared distances from points to their cluster centers is
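For comparison, a minimal NumPy sketch of K means (Lloyd's algorithm). Phase 1 assigns every point to its nearest center and phase 2 moves each center to the mean of its points, a hard-assignment analogue of EM's E and M steps; the quantity it decreases is the total squared distance (`inertia`). Function names and data are illustrative:

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center for every point (squared Euclidean distance)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)                       # phase 1
        new_centers = np.array([                          # phase 2
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = assign(X, centers)
    inertia = float(((X - centers[labels]) ** 2).sum())  # total squared distance
    return centers, labels, inertia


rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.3, size=(50, 2)),
               rng.normal([6, 6], 0.3, size=(50, 2))])
centers, labels, inertia = kmeans(X, k=2)
```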

[Scikit-learn-general] PyCon 2014 Tutorials

2013-08-29 Thread Jacob Vanderplas
Hi all, PyCon 2014 is next April 9-17 in Montreal, Quebec. Talk and tutorial proposals are due on September 15th, just a couple weeks away. I'll likely attend again this year, and based on past experience I think another scikit-learn tutorial would be very well-received. I have a lot of

Re: [Scikit-learn-general] NIPS's Machine Learning Open Source Software workshop

2013-08-28 Thread Jacob Vanderplas
On 22 August 2013 16:13, Jacob Vanderplas jake...@cs.washington.edu wrote: Nelle, That's a great idea! I hadn't been planning on attending, but because NIPS is so close to me this year I think I may be able to make it if there were a compelling reason to be there. Is anybody else from

Re: [Scikit-learn-general] KDTree/BallTree surprising benchmark results

2013-07-30 Thread Jacob Vanderplas
On Tue, Jul 30, 2013 at 10:07 AM, Olivier Grisel olivier.gri...@ensta.org wrote: According to your tests, sklearn KDtree seem often faster at test time which is the most important IMHO. Also BallTree is mostly interesting to be able to plug custom metrics (non axis aligned as KD-tree

Re: [Scikit-learn-general] Defining a Density Estimation Interface

2013-07-08 Thread Jacob Vanderplas
On Mon, Jul 8, 2013 at 10:20 AM, Bertrand Thirion bertrand.thir...@inria.fr wrote: [clip] Sounds good to me. As a matter of taste, I like `log_likelihood`, which would be a synonym of `eval` in that case (as a second choice, log_density rather than log_probability) ? I'm -1 on

[Scikit-learn-general] Defining a Density Estimation Interface

2013-07-07 Thread Jacob Vanderplas
Hi, I've been working on a big rewrite of the Ball Tree and KD Tree in sklearn.neighbors [0], and one of the enhancements is a fast Kernel Density estimation routine. As part of the PR, I've created a KernelDensity class to wrap this functionality. For the initial pass at the interface, I've
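For reference, the `KernelDensity` estimator in current scikit-learn releases lives in `sklearn.neighbors` and is used through `fit` and `score_samples`, where the latter returns the log of the density. A minimal usage sketch, assuming a recent release, cross-checked against a direct brute-force Gaussian sum:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
h = 0.4

kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X)

x_grid = np.linspace(-6.0, 6.0, 1201)[:, None]
log_dens = kde.score_samples(x_grid)          # log-density at each grid point

# Cross-check: the same estimate as an explicit average of Gaussian bumps.
sq = (x_grid - X.T) ** 2                      # shape (1201, 200)
direct = np.exp(-0.5 * sq / h**2).sum(axis=1) / (len(X) * h * np.sqrt(2 * np.pi))
```

Returning a log-density rather than a class label or transformed features is what sets a density estimator apart from the classifier/regressor/transformer interfaces.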

Re: [Scikit-learn-general] sklearn.utils.cs_graph_components in a broken state?

2013-06-05 Thread Jacob Vanderplas
Hi, The cs_graph_components routine in sklearn utils has a few problems: in particular, though the doc string says it only accesses the upper triangular part of the matrix, this is not true. Internally it assumes the matrix is symmetric, and will not return the correct results if the
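A defensive pattern, sketched here with SciPy's `scipy.sparse.csgraph.connected_components` rather than the sklearn utility under discussion: if edges may be stored in only one triangle, symmetrize the matrix explicitly before treating it as an undirected graph.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Edges stored only in the upper triangle: 0-1 and 2-3; node 4 is isolated.
upper = csr_matrix(np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]))

# Symmetrize explicitly so no routine has to guess which triangle is valid.
sym = upper.maximum(upper.T)

n_components, labels = connected_components(sym, directed=False)
print(n_components)  # 3
```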

Re: [Scikit-learn-general] How do we backport code effectively?

2013-05-02 Thread Jacob Vanderplas
Take a look at sklearn/utils/arpack.py. This is a backport of the scipy arpack module: the file is basically copied literally, with a hook at the bottom that replaces the main functions with the scipy versions, if they're available. Jake On Thu, May 2, 2013 at 6:03 AM, Lars Buitinck

Re: [Scikit-learn-general] How do we backport code effectively?

2013-05-02 Thread Jacob Vanderplas
not a problem with testing backports per se. Jake [1] https://github.com/scikit-learn/scikit-learn/pull/984 On Thu, May 2, 2013 at 7:17 AM, Andreas Mueller amuel...@ais.uni-bonn.de wrote: On 05/02/2013 03:14 PM, Lars Buitinck wrote: 2013/5/2 Jacob Vanderplas jake...@cs.washington.edu

Re: [Scikit-learn-general] Our own Olivier Grisel giving a scipy keynote

2013-04-18 Thread Jacob Vanderplas
Anne, I'm not aware of any bigger Python conferences in that part of the US any time soon. PyCon 2014 is going to be in Montreal, which is slightly closer to MI (but still a ten-hour drive, which probably doesn't help). If you need to stay local, you can often find Python meetups in your own

Re: [Scikit-learn-general] Our own Olivier Grisel giving a scipy keynote

2013-04-17 Thread Jacob Vanderplas
One spring break during college, I drove from Grand Rapids, Michigan to Austin, TX ;) On Wed, Apr 17, 2013 at 6:27 AM, Anne Dwyer anne.p.dw...@gmail.com wrote: Olivier, I live way over in Michigan in the middle of the US. Are there any SciPy events being planned within driving distance from

Re: [Scikit-learn-general] algorithm solve classical MDS with SVD

2013-03-28 Thread Jacob Vanderplas
On Thu, Mar 28, 2013 at 10:10 AM, Lars Buitinck l.j.buiti...@uva.nl wrote: 2013/3/28 Mathieu Blondel math...@mblondel.org: Encoding missing values with np.nan doesn't scale to very high-dimensional problems with mostly missing values. Personally, for encoding missing data, I just use

Re: [Scikit-learn-general] numba, cython and relation to sklearn future

2013-03-04 Thread Jacob Vanderplas
I've played with numba a bit. Right now, installation of numba can be quite a headache. It took me a couple hours to get it up and running, and that was on a linux machine. I'd bet it would be even more difficult on a mac or (heaven forbid) windows box. That being said, I know Anaconda

Re: [Scikit-learn-general] Speakers for SF Data Mining meetup

2013-02-28 Thread Jacob Vanderplas
Todd, I'm based out of Seattle, though I make it down to the Bay Area fairly frequently. As Olivier mentioned, I'll be in the area during PyCon and PyData in a few weeks. You should definitely try to take advantage of Olivier being in town: he's doing some really interesting work in extending

[Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Hi folks, The call for tutorial talk proposals for Scipy 2013 is open, and tutorial proposals are due by the end of March. The themes for Scipy 2013 include Machine Learning -- see the info here: http://conference.scipy.org/scipy2013/tutorial_overview.php I've talked to Francesc, who is the

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Gael, That would be great! Let's wait a bit to hear if others are interested, and then I'll start an off-list email chain to discuss ideas. Jake On Wed, Feb 27, 2013 at 10:17 AM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Wed, Feb 27, 2013 at 10:04:25AM -0800, Jacob Vanderplas

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
at 10:23 AM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Wed, Feb 27, 2013 at 10:22:02AM -0800, Jacob Vanderplas wrote: Let's wait a bit to hear if others are interested, and then I'll start an off-list email chain to discuss ideas. I'll be probably be gone on vacations by then: I

Re: [Scikit-learn-general] Scipy 2013 in Austin TX

2013-02-27 Thread Jacob Vanderplas
Will do, thanks Gael. Enjoy your vacation! Jake On Wed, Feb 27, 2013 at 12:12 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Wed, Feb 27, 2013 at 11:34:43AM -0800, Jacob Vanderplas wrote: Well, since communication time is limited, I'd be happy to work on a proposal on my

Re: [Scikit-learn-general] A sprint in Paris in April ?

2013-02-11 Thread Jacob Vanderplas
I'll be in northern Europe (Denmark and Germany) for vacation during the first part of April. It's unlikely I'll be able to change my itinerary to get down to Paris afterward, but if the sprint is the week of April 14th I might be able to work something out. Don't plan it around me, though - it's

[Scikit-learn-general] Ball Tree: speed vs. flexibility/maintainability

2013-02-04 Thread Jacob Vanderplas
Hi folks, On and off for the last several months, I've been looking at the Ball Tree code with a mind to revamping it for increased flexibility and maintainability. A few points I'm aiming for: 1) a more intuitive algorithm for walking the tree (the current code is very confusing to read). 2)

Re: [Scikit-learn-general] Ball Tree: speed vs. flexibility/maintainability

2013-02-04 Thread Jacob Vanderplas
On Sun, Feb 3, 2013 at 11:17 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: Goal #4 causes speed penalties because allowing compile-time switching of the distance function means it can no longer be inlined at compilation. I guess you mean 'run-time' and not 'compile-time'. How

[Scikit-learn-general] Ball Tree: speed vs. flexibility/maintainability

2013-02-01 Thread Jacob Vanderplas
Hi folks, On and off for the last several months, I've been looking at the Ball Tree code with a mind to revamping it for increased flexibility and maintainability. A few points I'm aiming for: 1) a more intuitive algorithm for walking the tree (the current code is very confusing to read). 2)

[Scikit-learn-general] Pycon 2013 tutorial

2012-09-17 Thread Jacob VanderPlas
Hello all, Olivier and I have been discussing submitting scikit-learn tutorials for PyCon 2013 next March. He would like to cover some more advanced topics such as categorical feature extraction for text, multicore parallelism, model selection via parallel grid search, etc. Based on feedback

Re: [Scikit-learn-general] Astronomy Tutorial

2012-07-05 Thread Jacob VanderPlas
Olivier, Gael, Thanks for the detailed suggestions. The tutorial I'm preparing for is on Monday, July 16, so I'll be putting in a lot of effort in the next couple weeks. I think for present purposes, I'll plan to keep the tutorial and examples in the old paradigm of rst + source code with

Re: [Scikit-learn-general] Scipy 2012 Austin Sprint?

2012-06-06 Thread Jacob VanderPlas
, Jacob VanderPlas vanderp...@astro.washington.edu wrote: Hi all, Is there any interest to do a scikit-learn sprint at Scipy in Austin next month? I will be there, and I have a few ideas brewing that I'd love to work on... I'd be happy to be the contact person for the conference

[Scikit-learn-general] Scipy 2012 Austin Sprint?

2012-06-05 Thread Jacob VanderPlas
Hi all, Is there any interest to do a scikit-learn sprint at Scipy in Austin next month? I will be there, and I have a few ideas brewing that I'd love to work on... I'd be happy to be the contact person for the conference organizers, if there's interest. Is anyone else from the team planning

[Scikit-learn-general] failure in SGD classifier for large range in x

2012-05-22 Thread Jacob VanderPlas
Hi, I've been playing with the SGD classifier (using the current master branch), and found that when there is a large range in x, the classifier fails. For example, when running the example script found here: http://scikit-learn.org/0.10/auto_examples/linear_model/plot_sgd_ols.html If I
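The message is cut off before any diagnosis, so the following is only an assumed illustration of how a large range in x can break SGD with a fixed step size, together with a common remedy (standardizing the feature) that the original message does not itself propose. It is plain NumPy SGD for a least-squares line fit; `sgd_least_squares` and `mse` are hypothetical helpers:

```python
import numpy as np


def sgd_least_squares(x, y, lr=0.01, n_epochs=300):
    """Plain per-sample SGD for y ~ w * x + b with a fixed learning rate."""
    w, b = 0.0, 0.0
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_epochs):
            for xi, yi in zip(x, y):
                err = w * xi + b - yi
                w -= lr * err * xi
                b -= lr * err
    return w, b


def mse(w, b, x, y):
    with np.errstate(over="ignore", invalid="ignore"):
        return float(np.mean((w * x + b - y) ** 2))


x = np.linspace(0.0, 1000.0, 50)    # large range in x
y = 2.0 * x + 1.0

# Unscaled input: each step scales the residual by roughly (1 - lr * x**2),
# far larger than 1 in magnitude here, so the iterates blow up.
w_raw, b_raw = sgd_least_squares(x, y)
mse_raw = mse(w_raw, b_raw, x, y)           # inf or nan

# Standardized input: the same step size is now stable.
mu, sigma = x.mean(), x.std()
xs = (x - mu) / sigma
w_s, b_s = sgd_least_squares(xs, y)
mse_scaled = mse(w_s, b_s, xs, y)           # essentially zero

# Map the fit back to the original scale of x.
slope = w_s / sigma
intercept = b_s - w_s * mu / sigma
```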

Re: [Scikit-learn-general] Online Ball-tree

2012-05-13 Thread Jacob VanderPlas
It can't be done with the current functionality. When I rewrote the code last year, we decided picklability of the object and avoiding dynamic memory management was more important than being able to use it online. Currently, all data is stored in pre-allocated arrays. To use the Ball Tree

Re: [Scikit-learn-general] Online Ball-tree

2012-05-13 Thread Jacob VanderPlas
, using c-structures for each node with pointers to children. Jake Andreas Mueller wrote: Am 13.05.2012 17:28, schrieb Jacob VanderPlas: It can't be done with the current functionality. When I rewrote the code last year, we decided picklability of the object and avoiding dynamic memory

[Scikit-learn-general] Astro tutorial

2012-05-06 Thread Jacob VanderPlas
Hi all, I would love to get the astronomy tutorial [1] merged in time for the next release. It's basically ready to go, aside from some typos (my favorite of which is the heading Setup and Inspiration in place of Setup and Installation). There are a couple larger tasks that remain: - make

Re: [Scikit-learn-general] Astro tutorial

2012-05-06 Thread Jacob VanderPlas
in the doc-tests. Gael On Sun, May 06, 2012 at 11:19:20AM -0700, Jacob VanderPlas wrote: Hi all, I would love to get the astronomy tutorial [1] merged in time for the next release. It's basically ready to go, aside from some typos (my favorite of which is the heading Setup and Inspiration

Re: [Scikit-learn-general] Astro tutorial

2012-05-06 Thread Jacob VanderPlas
Varoquaux wrote: On Sun, May 06, 2012 at 11:28:45AM -0700, Jacob VanderPlas wrote: If it can't happen before the release, I understand and that's fine. I think that it is going to be hard. I am really starting to feel tired and was thinking of finishing for today soon and watching

Re: [Scikit-learn-general] Merging pyCRFSuite into scikit-learn

2012-05-02 Thread Jacob VanderPlas
Hi Rob, The crfsuite wrapper is still far from complete (a couple test cases even seg-fault). I also echo Olivier's input on the problem that numpy arrays/scipy sparse matrices cannot be mapped to the crfsuite internal data structure. For that reason, it would be very hard to use crfsuite in

Re: [Scikit-learn-general] computing a graph laplacian

2012-05-02 Thread Jacob VanderPlas
Hi Satrajit, I believe the current implementation is equivalent to the first several equations in this document: http://www.math.ucsd.edu/~fan/research/cb/ch1.pdf It may be that there are several ways to define a laplacian; I'm not an expert on the subject. Gael might be able to weigh in - I
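Two common definitions, sketched in NumPy and checked against `scipy.sparse.csgraph.laplacian`: the combinatorial Laplacian `L = D - A` and the normalized Laplacian `I - D^(-1/2) A D^(-1/2)`. As far as we can tell, the normalized form is the one the linked chapter works with. The graph is a made-up example without isolated nodes:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian

# Adjacency matrix of a small undirected graph: path 0-1-2 plus edge 1-3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

deg = A.sum(axis=1)
L = np.diag(deg) - A                                   # combinatorial: D - A

d_inv_sqrt = 1.0 / np.sqrt(deg)                        # assumes no isolated nodes
L_norm = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Eigenvalues of the normalized Laplacian lie in [0, 2].
eigvals = np.linalg.eigvalsh(L_norm)
```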

Re: [Scikit-learn-general] computing a graph laplacian

2012-05-02 Thread Jacob VanderPlas
Satra, Yes, a PR on my scipy branch would be helpful! I'm hoping to have the sparse graph module ready well before the next release of scipy. It will still be a while after that before scikit-learn can depend on the future scipy 0.11, so we'll have to maintain a duplicate version in

[Scikit-learn-general] Errors in Variables

2012-04-30 Thread Jacob VanderPlas
Hi, I've been working on some modifications of methods in scikit-learn recently, and there's one deficiency of the interface that I'm having trouble with: errors in variables. I know that few (perhaps none?) of the scikit-learn routines take measurement error into account, but it's an

Re: [Scikit-learn-general] Isomap with more general inputs

2012-04-06 Thread Jacob VanderPlas
There was some discussion along these lines last year, but I don't think anyone has worked on it yet. Scikit-learn doesn't currently have the ability to do manifold learning from a precomputed distance matrix, but it could be extended to that pretty easily. What it would take would be to
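The message is cut off before listing what it would take, so as an assumed outline (standard Isomap, not necessarily the plan from the original thread) the three steps can be sketched with NumPy/SciPy, starting directly from a precomputed distance matrix. `isomap_from_distances` is a hypothetical function:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def isomap_from_distances(D, n_neighbors=5, n_components=2):
    """Isomap-style embedding that starts from a precomputed distance matrix D."""
    n = D.shape[0]
    # 1. k-nearest-neighbour graph (in a dense csgraph, 0 means "no edge").
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    G = np.zeros_like(D)
    G[rows, cols] = D[rows, cols]
    # 2. Geodesic distances: shortest paths through the graph, edges undirected.
    geo = shortest_path(G, method="D", directed=False)
    # 3. Classical MDS: double-center the squared distances, keep top eigenvectors.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (geo ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Usage: points on a straight line in 2D, so geodesic = Euclidean distance.
t = np.linspace(0.0, 1.0, 20)
pts = np.column_stack([t, t])
D = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
emb = isomap_from_distances(D, n_neighbors=3, n_components=1)
```

Only step 1 touches the raw input, which is why accepting a precomputed matrix mostly means skipping the internal distance computation.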

Re: [Scikit-learn-general] NMF implementation

2012-03-16 Thread Jacob VanderPlas
Hi Olivier, The code looks very well written. I think it would fit well in scikit-learn. The API would have to be modified to fit the scikit-learn format. You can read more about that at the developers' page: http://scikit-learn.org/stable/developers/index.html It will also require some

Re: [Scikit-learn-general] Le Bergstra Nouveau est arrivé

2012-03-08 Thread Jacob VanderPlas
Interesting! Has anyone ever seen Gaussian process learning used for this sort of hyperparameter estimation? I'm thinking of something similar to the Kriging approach to likelihood surfaces, where some random starting points are used to train a GPML solution, and this surface is minimized to

Re: [Scikit-learn-general] Not all plots generated on website

2012-03-05 Thread Jacob VanderPlas
I know that several of the plots fail with Matplotlib 1.0. It's some problem that comes up when certain lines have a dashed linestyle. Jake Andreas wrote: Hi everybody. I noticed that not all example plots are present on the website. Does anyone know why that is? I would guess that

Re: [Scikit-learn-general] Pydata Workshop Tutorial

2012-02-27 Thread Jacob VanderPlas
Thanks for all the feedback. I pushed an update this morning which addresses some of the easy fixes that were brought up, and adds the final two exercises. Thanks! http://jakevdp.github.com/tutorial/astronomy/exercises.html Jake Lars Buitinck wrote: 2012/2/27 Jacob VanderPlas

[Scikit-learn-general] Pydata Workshop Tutorial

2012-02-26 Thread Jacob VanderPlas
Hi folks, I'm scheduled to lead a scikit-learn tutorial at the pydata workshop at the Googleplex this Saturday. I'm planning to go through introductory machine learning concepts with scikit-learn, focusing on astronomical data for the examples and exercises. For this purpose, I've been

Re: [Scikit-learn-general] Docs build without warnings / errors

2012-02-20 Thread Jacob VanderPlas
Nice work Andy! I agree that any PR now should meet this standard before merge. And I'll certainly buy you a beer for this next time we meet! Jake Andreas wrote: Hey everybody. Finally, I got the docs in a state where they should build without any warnings or errors. (I distinctly

[Scikit-learn-general] Sphinx image duplication

2012-02-07 Thread Jacob VanderPlas
It looks like the newest version of sphinx has a fix to our image duplication issue when building the docs: http://sphinx.pocoo.org/changes.html#release-1-0-8-sep-23-2011 See issue #704 in that list. I haven't had a chance to try

[Scikit-learn-general] GaussianProcess 'nugget'

2012-01-31 Thread Jacob VanderPlas
Hello, I've been working on applying Gaussian Processes to noisy input data. The scikit-learn docs are not especially helpful on this topic, but after reading through some of the references and scanning the code, I found that the keyword 'nugget' in the initializer of GaussianProcess does
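The message is cut off before saying what `nugget` does, so this sketch assumes the usual Gaussian-process meaning: a value added to the diagonal of the training covariance, which lets the fit deviate from noisy observations instead of interpolating them. It is plain NumPy GP regression (`rbf` and `gp_posterior_mean` are hypothetical helpers), not the old `GaussianProcess` class:

```python
import numpy as np


def rbf(a, b, length=1.0):
    """Squared-exponential covariance between two 1D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)


def gp_posterior_mean(x_train, y_train, x_test, nugget=0.0, length=1.0):
    """GP regression mean with `nugget` added to the training covariance diagonal."""
    K = rbf(x_train, x_train, length) + nugget * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_test, x_train, length) @ alpha


x_train = np.arange(5.0)
noise = np.array([0.1, -0.1, 0.1, -0.1, 0.1])   # fixed, made-up measurement error
y_train = np.sin(x_train) + noise

# nugget = 0: the mean passes exactly through every noisy observation.
mean_interp = gp_posterior_mean(x_train, y_train, x_train, nugget=0.0)
# nugget > 0: the mean may miss the observations, i.e. it smooths.
mean_smooth = gp_posterior_mean(x_train, y_train, x_train, nugget=0.1)
```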

Re: [Scikit-learn-general] Nearest neighbor warning when running LocallyLinearEmbedding

2012-01-23 Thread Jacob VanderPlas
I played around with this a bit: it appears to be related to a memory error. https://gist.github.com/1666570 This fails after a few iterations. If the print statement is uncommented, then it no longer fails. The ball tree code uses a lot of raw memory views for speed... I'll have a look

Re: [Scikit-learn-general] RBF kernel with ball tree

2012-01-22 Thread Jacob VanderPlas
I don't think this would work out-of-the-box. The classic ball tree implementation depends on the metric satisfying the triangle inequality. You may be able to cleverly modify the algorithm to work in other cases, but I'm not aware of any examples of that. I think that approximate nearest
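To see why the triangle inequality matters, here is the pruning bound a ball tree relies on, sketched in NumPy for a single ball with known center and radius (`ball_lower_bound` is a hypothetical helper). With squared Euclidean "distance", which violates the triangle inequality, the bound overshoots the true minimum, so a search using it could wrongly discard the ball:

```python
import numpy as np


def ball_lower_bound(q, center, radius, dist):
    """Lower bound on dist(q, x) for all x inside the ball (center, radius).

    Relies on the triangle inequality:
    dist(q, x) >= dist(q, center) - dist(center, x) >= dist(q, center) - radius
    """
    return max(0.0, dist(q, center) - radius)


def euclidean(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))


def sq_euclidean(a, b):
    # Squared Euclidean distance does NOT satisfy the triangle inequality.
    return float(np.sum((a - b) ** 2))


points = np.array([[-1.0, 0.0], [1.0, 0.0]])
center = points.mean(axis=0)                      # (0, 0)
q = np.array([3.0, 0.0])

# euclidean: bound 2.0 <= true min 2.0, so pruning on it is safe.
# squared euclidean: bound 8.0 > true min 4.0, so the bound is invalid.
results = {}
for name, dist in [("euclidean", euclidean), ("squared euclidean", sq_euclidean)]:
    radius = max(dist(center, x) for x in points)
    bound = ball_lower_bound(q, center, radius, dist)
    true_min = min(dist(q, x) for x in points)
    results[name] = (bound, true_min)
    print(name, bound, true_min, bound <= true_min)
```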

Re: [Scikit-learn-general] Distance Metrics for BallTree

2012-01-07 Thread Jacob VanderPlas
how to quickly perform a neighbor search. Hope that helps Jake Emanuele Olivetti wrote: On 01/06/2012 10:59 PM, Jacob VanderPlas wrote: Hi all, Just a quick note: I opened a new repository https://github.com/jakevdp/pyDistances where I'm working on distance functions which can

Re: [Scikit-learn-general] Distance Metrics for BallTree

2012-01-07 Thread Jacob VanderPlas
that casts the two objects appropriately, i.e. numpy.arrays of doubles in my case, and returns a non-negative double. Why do you need templating? Best, Emanuele [0]: http://en.wikipedia.org/wiki/Metric_%28mathematics%29#Definition On 01/07/2012 06:58 PM, Jacob VanderPlas wrote: Emanuele

[Scikit-learn-general] Distance Metrics for BallTree

2012-01-06 Thread Jacob VanderPlas
Hi all, Just a quick note: I opened a new repository https://github.com/jakevdp/pyDistances where I'm working on distance functions which can be incorporated into BallTree (this is expanded from the gist I sent around earlier). Feel free to take a look. Some of it necessarily overlaps with

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-05 Thread Jacob VanderPlas
python overhead. I'd be curious to hear people's thoughts. Jake Gael Varoquaux wrote: On Wed, Jan 04, 2012 at 07:59:04AM -0800, Jacob VanderPlas wrote: If someone has a good idea about how one could specify these distance metrics from python code, with optional ancillary parameters

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-05 Thread Jacob VanderPlas
should need to be done/changed to add e.g. the keyword p as was mentioned in Jake's first reply? Cheers and thanks, Mathias On Thu, Jan 5, 2012 at 5:33 PM, Jacob VanderPlas vanderp...@astro.washington.edu wrote: Here's a small example I coded
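The keyword `p` in the quoted question is the Minkowski exponent. One way to specify distance metrics from Python with optional ancillary parameters is a small factory that maps a name plus keyword arguments to a distance function. `make_metric` below is a hypothetical pure-Python illustration, not the BallTree code under discussion; calling a Python function per distance pair adds exactly the Python overhead this thread is concerned about.

```python
import numpy as np


def make_metric(name, **params):
    """Hypothetical factory: map a metric name plus ancillary keyword
    parameters (such as the Minkowski exponent p) to a distance function."""
    if name == "euclidean":
        return lambda a, b: float(np.sqrt(np.sum((a - b) ** 2)))
    if name == "manhattan":
        return lambda a, b: float(np.sum(np.abs(a - b)))
    if name == "minkowski":
        p = params.get("p", 2)
        if p < 1:
            raise ValueError("Minkowski distance is only a metric for p >= 1")
        return lambda a, b: float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))
    raise ValueError(f"unknown metric: {name!r}")


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
d2 = make_metric("minkowski", p=2)(a, b)   # same as Euclidean
d1 = make_metric("minkowski", p=1)(a, b)   # same as Manhattan
d3 = make_metric("minkowski", p=3)(a, b)
```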

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-05 Thread Jacob VanderPlas
Gael Varoquaux wrote: snip You are cimporting malloc and free. I have a personnal difficult relationship with those two old friends. However, it seems not to be used in the code. I just wanted to check. I initially used malloc and free, but settled on the `tmp` pointer to avoid this (see

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-05 Thread Jacob VanderPlas
Emanuele, I should also note that a distinct advantage of cover trees is that, unlike ball tree, there is no need to compute the mean/median point of each node. This means that their storage can be much more compact, and they'd be very suitable to computing distances within sparse data. For

Re: [Scikit-learn-general] Building docs with math

2012-01-05 Thread Jacob VanderPlas
Definitely related. I guess the code should be modified to not use rmtree but to just remove the figure images alone. I'll take a look Jake Andreas wrote: On 01/05/2012 08:03 PM, Jacob VanderPlas wrote: I wonder if this is a problem with that doc/image fix I put up during the sprint
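A minimal sketch of that suggestion, assuming (hypothetically) that the generated figures are `.png` files directly inside an images directory and that math renderings live in a subdirectory such as `_images/math` (the path mentioned elsewhere in this thread). `remove_figure_images` is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def remove_figure_images(image_dir, pattern="*.png"):
    """Delete files matching `pattern` directly inside `image_dir`.

    Unlike shutil.rmtree, subdirectories (e.g. a math/ folder) are left alone.
    """
    removed = []
    for path in sorted(Path(image_dir).glob(pattern)):   # non-recursive
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "_images"
    (images / "math").mkdir(parents=True)
    for name in ["fig1.png", "fig2.png"]:
        (images / name).write_bytes(b"")
    (images / "math" / "eq1.png").write_bytes(b"")

    removed = remove_figure_images(images)
    math_kept = (images / "math" / "eq1.png").exists()
```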

Re: [Scikit-learn-general] Building docs with math

2012-01-05 Thread Jacob VanderPlas
I'm having trouble replicating the problem. When you ``make html`` twice in a row, do you see anything in the _build/html/_images/math directory? Jake Andreas wrote: On 01/05/2012 08:03 PM, Jacob VanderPlas wrote: I wonder if this is a problem with that doc/image fix I put up during

Re: [Scikit-learn-general] Building docs with math

2012-01-05 Thread Jacob VanderPlas
;) On 01/05/2012 08:47 PM, Jacob VanderPlas wrote: I'm having trouble replicating the problem. When you ``make html`` twice in a row, do you see anything in the _build/html/_images/math directory? Jake Andreas wrote: On 01/05/2012 08:03 PM, Jacob VanderPlas

Re: [Scikit-learn-general] Building docs with math

2012-01-05 Thread Jacob VanderPlas
;) On 01/05/2012 08:47 PM, Jacob VanderPlas wrote: I'm having trouble replicating the problem. When you ``make html`` twice in a row, do you see anything in the _build/html/_images/math directory? Jake Andreas wrote: On 01/05/2012 08:03 PM, Jacob VanderPlas wrote

[Scikit-learn-general] Removal of 'decode' from GMM

2012-01-04 Thread Jacob VanderPlas
Hi all, I just ran some old mixture model code using the latest master version of scikit-learn, and it broke because `GMM.decode` was removed during the recent refactoring of the mixture module. I think this should probably have a deprecation warning for the time being. Thoughts? Jake
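A common pattern for the deprecation warning proposed here is to keep the old method as a thin alias that warns and delegates. The sketch below uses a hypothetical `ToyGMM` class and assumes, for illustration only, that the old `decode` functionality now lives in a method named `predict`; it is not the actual scikit-learn `GMM` code.

```python
import warnings


class ToyGMM:
    """Hypothetical stand-in for a mixture model whose `decode` was replaced."""

    def predict(self, X):
        # Placeholder logic: assign every sample to component 0.
        return [0 for _ in X]

    def decode(self, X):
        """Deprecated alias kept so that old user code keeps running."""
        warnings.warn(
            "ToyGMM.decode is deprecated; use ToyGMM.predict instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this line
        )
        return self.predict(X)
```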

[Scikit-learn-general] GMM

2012-01-04 Thread Jacob VanderPlas
Sorry, my mistake: the removal of `decode` wasn't in master. It was in the GMM pull request. I'll make the comment there. Jake

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-04 Thread Jacob VanderPlas
Gael Varoquaux wrote: On Wed, Jan 04, 2012 at 07:59:04AM -0800, Jacob VanderPlas wrote: If someone has a good idea about how one could specify these distance metrics from python code, with optional ancillary parameters, and convert these specifications into code for fast distance

Re: [Scikit-learn-general] Other distance metrics for kNN

2012-01-04 Thread Jacob VanderPlas
Emanuele, This is exciting! I've played with Cover Trees in the past, and found, as you did, that the insertion time is very slow, and query time is comparable to that of Ball Tree. From my reading and brief experimentation on the subject, it seems that cover trees are nice in that

Re: [Scikit-learn-general] sklearn.test() weirdness

2012-01-04 Thread Jacob VanderPlas
I have noticed in the past that if you try to `import scipy` from within the scipy directory, it raises the following: ImportError: Error importing scipy: you cannot import scipy while being in scipy source directory; please exit the scipy source tree first, and relaunch your python

[Scikit-learn-general] Example failures with matplotlib 0.99.0

2011-12-21 Thread Jacob VanderPlas
Hi all, I'm running matplotlib 0.99.0 (this is the current version in the Ubuntu repositories) and I get a few errors when building the docs. All are related to a bug in matplotlib when creating legends for dotted/dashed lines in certain circumstances. It's been addressed on the matplotlib-dev

Re: [Scikit-learn-general] RuntimeError: Factor is exactly singular

2011-12-19 Thread Jacob VanderPlas
Timmy, I'm taking a closer look at your problem now. There's something very strange in your adjacency matrix: a small set of points are the nearest neighbors of virtually every other point. I think this is leading to the singular weight matrices which cause the error. I visualized it like
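One way to quantify "a small set of points are the nearest neighbors of virtually every other point" is the k-occurrence count: for each point, how many other points include it in their k nearest neighbors. Large counts flag such hub points. This is a dense O(n^2)-memory NumPy sketch with a hypothetical function name; it is not how the adjacency matrix in this thread was actually visualized.

```python
import numpy as np


def k_occurrence(X, k):
    """Count how often each point appears among the k nearest neighbors of the others."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (dense, so only for modest n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # A point is not its own neighbor.
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # counts[j] = number of points that list j among their k neighbors.
    return np.bincount(neighbors.ravel(), minlength=n)
```

For four points on the unit axes plus the origin, with k=1, every axis point's nearest neighbor is the origin, so the origin's count is 4.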

[Scikit-learn-general] Utils upgrade

2011-12-19 Thread Jacob VanderPlas
Hi all, I think we should re-think our utils model. Currently, it's a mish-mash of tools for development (e.g. array2d, logsum, etc.) and performance code that's better than many other available python implementations (e.g. fast_svd, graph_shortest_path, etc.). As far as I can tell, none of

Re: [Scikit-learn-general] December sprint planning (NIPS edition)

2011-12-12 Thread Jacob VanderPlas
I have made my own reservations at Casa Angela through HostelWorld.com. I'll be staying there the nights of Dec 15, and Dec 18-21. Question: is anybody planning to drive up to the workshops at Sierra Nevada on the 16th? I'm looking for the best way to get there. There's a bus that leaves at

Re: [Scikit-learn-general] RuntimeError: Factor is exactly singular

2011-12-08 Thread Jacob VanderPlas
Sometimes this sort of error can happen when not enough neighbors are used, so that your data is split into two unconnected regions. Increasing the number of neighbors could help Jake Timmy Wilson wrote: I get the following error when running Hessian-based LLE::
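To check whether too few neighbors split the data into unconnected regions, one can count the connected components of the symmetrized k-nearest-neighbor graph; more than one component suggests increasing the number of neighbors. The sketch below is self-contained NumPy plus a depth-first search, with a hypothetical function name; in practice `scipy.sparse.csgraph.connected_components` on a sparse neighbor graph would do the same job.

```python
import numpy as np


def knn_graph_components(X, n_neighbors):
    """Number of connected components in the symmetrized k-nearest-neighbor graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :n_neighbors]

    # Symmetric adjacency: i ~ j if either lists the other as a neighbor.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)

    # Iterative depth-first search to count components.
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
    return components
```

With two well-separated clusters of five points each, `n_neighbors=2` yields two components, while `n_neighbors=5` forces each point to reach into the other cluster and yields one.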

Re: [Scikit-learn-general] RuntimeError: Factor is exactly singular

2011-12-08 Thread Jacob VanderPlas
Timmy, Interesting... this means that none of the earlier ideas were on the right track, or else we'd see the same error for standard and modified. It turns out that Hessian and LTSA do very similar things. I still haven't wrapped my mind around it intuitively, but what you're seeing

Re: [Scikit-learn-general] NLP course at Stanford available for enrollment

2011-11-20 Thread Jacob VanderPlas
I would recommend these: I'm currently taking the Machine Learning course, taught by Andrew Ng, which will be offered again in January. It's been a great intro to things like logistic regression, neural networks, SVM, etc. for someone like me with no formal ML training. I've found 2-3

Re: [Scikit-learn-general] Question about applying dimension reduction on text

2011-10-26 Thread Jacob VanderPlas
Olivier Grisel wrote: A note for the scikit-learn developers: we should definitely improve the tooling for checking the input and emit informative ValueError messages that state explicitly that scipy.sparse matrices are not supported as input for the models mentioned by the poster.
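An input check of the kind described here could be as small as the following sketch. The helper name `check_dense_input` and its message are hypothetical, not scikit-learn's actual validation utilities; it only relies on `scipy.sparse.issparse`.

```python
import numpy as np
import scipy.sparse as sp


def check_dense_input(X, estimator_name="this estimator"):
    """Raise an informative ValueError if X is a scipy.sparse matrix (hypothetical helper)."""
    if sp.issparse(X):
        raise ValueError(
            "%s does not support scipy.sparse matrices as input; got %s. "
            "Convert with X.toarray() if the data fits in memory, "
            "or use a model that accepts sparse input."
            % (estimator_name, type(X).__name__)
        )
    # Dense input passes through as a NumPy array.
    return np.asarray(X)
```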

[Scikit-learn-general] Documentation linking

2011-10-19 Thread Jacob VanderPlas
Hi all, In the auto-docs found at http://scikit-learn.sourceforge.net/stable/modules/classes.html, anything using :template: function.rst for the auto-summary does not have a link to the documentation (e.g., almost everything under the Datasets heading on the page). Items using
