Re: [Scikit-learn-general] numpy.linalg vs scipy.linalg

2013-10-19 Thread Kenneth C. Arnold
Relatedly, I recently noticed http://docs.scipy.org/doc/numpy/reference/routines.dual.html On Saturday, October 19, 2013, Thomas Unterthiner wrote: > Hi there! > > I've noticed that sklearn uses numpy.linalg instead of scipy.linalg for > its linear algebra implementations (e.g. dot or svd). Ho

Re: [Scikit-learn-general] Testing small code peices

2013-09-02 Thread Kenneth C. Arnold
way to work around relative imports in notebook? They don't > seem to work as absolute imports, but absolute imports are taken from the > installed package. I want to import from the version I'm working on. > > -Maheshakya > > > On Mon, Sep 2, 2013 at 3:58 AM, Kenne

Re: [Scikit-learn-general] Testing small code peices

2013-09-01 Thread Kenneth C. Arnold
In addition to what everyone else has responded: you'd probably enjoy working in an ipython notebook. -Ken On Aug 29, 2013 6:22 AM, "Maheshakya Wijewardena" wrote: > Hi > I'm trying to implement a general Bagging module for Scikit-learn for a > university project. I want to test the methods an

Re: [Scikit-learn-general] scikit-learn for Android?

2013-07-17 Thread Kenneth C. Arnold
I wonder if at some point it gets easier to build an emscripten build of the numeric Python ecosystem, then just run it on nodejs. Anybody tried that? -Ken (on mobile) On Jul 17, 2013 8:02 AM, "Olivier Grisel" wrote: > NumPy is really easy to build compared to SciPy. AFAIK you need a > gfortran r

Re: [Scikit-learn-general] Generalised warm start / parameter search

2013-05-20 Thread Kenneth C. Arnold
I haven't been following the details of this thread, but I thought: why automate? GridSearch could, e.g., take an OrderedDict of parameters, and try combinations in C-array order. (For parallelism, maybe batches could be queued up in the opposite (i.e., Fortran) order, though I haven't thought that
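A minimal sketch of the ordering idea above, using only the standard library. The parameter names and values are made up for illustration, and `iter_c_order` is a hypothetical helper, not part of scikit-learn. `itertools.product` varies the last parameter fastest, which matches C-array (row-major) order.

```python
from collections import OrderedDict
from itertools import product

# Hypothetical grid; the insertion order of the OrderedDict fixes the axis order.
param_grid = OrderedDict([
    ("C", [0.1, 1.0]),
    ("gamma", [0.01, 0.1, 1.0]),
])

def iter_c_order(grid):
    """Yield parameter dicts with the last parameter varying fastest (C order)."""
    names = list(grid)
    for values in product(*grid.values()):
        yield dict(zip(names, values))

combos = list(iter_c_order(param_grid))
for combo in combos:
    print(combo)
```

With this ordering, consecutive fits differ only in the trailing parameters most of the time, which is the situation where reusing the previous fit as a warm start would plausibly help.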

Re: [Scikit-learn-general] How do I look at the code for a particular model in scikit learn?

2013-04-30 Thread Kenneth C. Arnold
I'll note also that the Python standard library docs have started to include some source links (e.g., http://docs.python.org/2/library/webbrowser.html or http://docs.python.org/3.4/library/html.entities.html) but it is not consistent between modules. -Ken On Tue, Apr 30, 2013 at 9:34 AM, Jaques

Re: [Scikit-learn-general] Metric Learning Algorithms

2013-04-22 Thread Kenneth C. Arnold
Some gists: https://gist.github.com/kcarnold/5439917 https://gist.github.com/kcarnold/5439945 They are rather terribly documented, sorry. Input to such algorithms is usually given as: - a set of similarity and dissimilarity links, - relative comparisons (x is closer to y than w is to z), or -

Re: [Scikit-learn-general] Metric Learning Algorithms

2013-04-21 Thread Kenneth C. Arnold
I have implemented a few metric learning algorithms myself. The quality of that code is nowhere near sklearn standards, but I may have some incentive to improve it soon. -Ken On Sun, Apr 21, 2013 at 3:42 PM, John Collins wrote: > Has anybody or does anybody have plans to implement metric learn

Re: [Scikit-learn-general] kmeans distance function not configurable

2013-04-02 Thread Kenneth C. Arnold
If you want a Mahalanobis distance, though, you can instead just transform your data using the Cholesky decomposition of the distance matrix. -Ken On Tue, Apr 2, 2013 at 3:09 PM, Andreas Mueller wrote: > Hi Francis. > No. It is highly non-trivial for most distance functions to do k-means as >
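A small sketch of the transformation trick, assuming the Mahalanobis distance is defined by a symmetric positive-definite matrix `M`, so that d(x, y)^2 = (x - y)^T M (x - y). The names `M`, `X`, and `mahalanobis_transform` are illustrative only. After the transform, ordinary Euclidean distance on the new data equals the Mahalanobis distance on the original data, so a Euclidean k-means can be run unchanged.

```python
import numpy as np

def mahalanobis_transform(X, M):
    """Map rows of X so Euclidean distance matches the Mahalanobis distance under M."""
    L = np.linalg.cholesky(M)  # lower-triangular factor, M = L @ L.T
    return X @ L               # each row x becomes x @ L, i.e. (L.T @ x).T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
A = rng.normal(size=(3, 3))
M = A @ A.T + 3 * np.eye(3)    # an arbitrary symmetric positive-definite matrix

Z = mahalanobis_transform(X, M)

# Compare the two distances for one pair of points.
diff = X[0] - X[1]
d_mahal = np.sqrt(diff @ M @ diff)
d_eucl = np.linalg.norm(Z[0] - Z[1])
print(d_mahal, d_eucl)
```

Cluster centers found in the transformed space live in that space too; mapping them back would require solving with `L`.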

Re: [Scikit-learn-general] algorithm solve classical MDS with SVD

2013-03-28 Thread Kenneth C. Arnold
On Thu, Mar 28, 2013 at 2:35 PM, Nelle Varoquaux wrote: > But in general, I don't think we can "force" the user to use sparse > matrices. They are an absolute pain to work with because of the > inconsistencies of interface with ndarray and conversion between sparse and > dense can be time consumin

Re: [Scikit-learn-general] numba, cython and relation to sklearn future

2013-03-05 Thread Kenneth C. Arnold
It was a pretty easy build on Mac -- I just used MacPorts to install and select an llvm. Of course Anaconda is even easier. I'd say Numba is a medium-term consideration. It's enough trouble getting everybody using C compilers, so adding LLVM to the mix is probably way too much of a change for the

Re: [Scikit-learn-general] Python 2.x & 3.x under one code base

2013-02-11 Thread Kenneth C. Arnold
tl;dr: Try Python 3.2 with MacPorts. Unfortunately, Scipy 0.11.0 is broken on Python 3.3. http://projects.scipy.org/scipy/ticket/1739 This is fixed in their master branch. I just made a successful build of that branch on OS X 10.8: https://trac.macports.org/ticket/37400#comment:15 There may be some

Re: [Scikit-learn-general] Ovr Classifier predict error

2013-01-14 Thread Kenneth C. Arnold
In your code, 'document' is just a string, not a feature vector. You should use the same Vectorizer that you used to train the classifier to begin with. Trained classifier objects are generally not compatible across versions. You should retrain the classifier using the new version (and who knows,

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread Kenneth C. Arnold
Why not use numpy arrays of strings all along? Their importance here is fancy indexing... Or use X=np.arange(N) and do the fancy indexing yourself on demand? -Ken On Jan 13, 2013 11:04 PM, "Robert Layton" wrote: > When using cross_validation.X, all arrays are checked in the normal way -- > using
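A sketch of the second suggestion, splitting an index array and indexing the original Python list on demand. It uses only NumPy and a hand-rolled 3-fold split, so no particular scikit-learn cross-validation API is assumed. The documents and fold count are made up.

```python
import numpy as np

# Hypothetical data kept as a plain Python list of strings.
docs = ["a b", "c d", "e f", "g h", "i j", "k l"]

# Split indices instead of the data itself.
indices = np.arange(len(docs))
folds = np.array_split(indices, 3)

splits = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    # Index the list on demand; elements stay ordinary Python strings.
    train_docs = [docs[i] for i in train_idx]
    test_docs = [docs[i] for i in test_idx]
    splits.append((train_docs, test_docs))

for train_docs, test_docs in splits:
    print(len(train_docs), test_docs)
```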

Re: [Scikit-learn-general] update to macport instructions

2012-12-13 Thread Kenneth C. Arnold
I use MacPorts for Python packages that are annoying to compile (like py27-numpy and py27-matplotlib) or that depend on external C libraries (py27-lxml), because in both cases the tooling and (often) pre-compiled packages are helpful. I install sklearn from source (or pip) because that's easy and

Re: [Scikit-learn-general] I messed up or master severely broken

2012-10-28 Thread Kenneth C. Arnold
Btw, if at some point you do need to look at diffs for the cython generated code, temporarily removing the gitattributes file should suffice. You probably don't even have to commit, though be careful not to do it by accident :) -Ken On Oct 28, 2012 4:32 PM, "Andreas Mueller" wrote: > Hey everybody. >

Re: [Scikit-learn-general] How to save an array of models

2012-10-17 Thread Kenneth C. Arnold
This is actually not related to sklearn at all, but I run into it often enough that I'm replying here anyway: Pickle dumps an object (first parameter) to a file (second parameter). I get those backwards all the time and used to have a utility function to swap args if I got it backwards. Also, it ex
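For reference, a minimal sketch of the argument order described above: `pickle.dump(obj, file)` takes the object first and an open binary file second. The list of dicts stands in for an array of fitted models, and the file name is a placeholder.

```python
import os
import pickle
import tempfile

# Stand-ins for fitted model objects.
models = [{"name": "a", "coef": [1.0, 2.0]}, {"name": "b", "coef": [0.5]}]

path = os.path.join(tempfile.mkdtemp(), "models.pkl")

# Object first, open binary file second.
with open(path, "wb") as f:
    pickle.dump(models, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == models)
```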

Re: [Scikit-learn-general] rebuilding cython extensions from .pyx file

2012-10-12 Thread Kenneth C. Arnold
setup.py could try to regenerate the file (and thus require cython) only if the source has been modified. Here's an example, though there are likely better ways to accomplish the same thing: https://github.com/commonsense/divisi2/blob/master/setup.py#L57 -Ken On Fri, Oct 12, 2012 at 2:59 PM, Jak
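One way such a check might look is a modification-time comparison between the `.pyx` source and the generated `.c` file. This is a sketch under that assumption, not the code at the linked URL. `needs_regeneration` and the file names are hypothetical.

```python
import os
import tempfile

def needs_regeneration(source_path, generated_path):
    """Return True if the generated file is missing or older than its source."""
    if not os.path.exists(generated_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(generated_path)

# Demo with throwaway files and explicit timestamps.
workdir = tempfile.mkdtemp()
pyx = os.path.join(workdir, "example.pyx")
c_file = os.path.join(workdir, "example.c")

with open(pyx, "w") as f:
    f.write("# cython source\n")
missing = needs_regeneration(pyx, c_file)      # generated file does not exist yet

with open(c_file, "w") as f:
    f.write("/* generated */\n")
os.utime(pyx, (1000, 1000))
os.utime(c_file, (2000, 2000))
up_to_date = needs_regeneration(pyx, c_file)   # generated file is newer

os.utime(pyx, (3000, 3000))
stale = needs_regeneration(pyx, c_file)        # source was touched afterwards

print(missing, up_to_date, stale)
```

A setup script could import Cython only when this returns True, so users building from an unmodified checkout would not need Cython installed. Note that modification times are not always reliable after a fresh clone.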

Re: [Scikit-learn-general] cython output in repo

2012-10-06 Thread Kenneth C. Arnold
t 7:21 AM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Fri, Oct 05, 2012 at 09:15:29PM -0400, Kenneth C. Arnold wrote: > > Another option: ignore them in development branches and add them back in > the > > release branches. > > The goal is that anybody clon

Re: [Scikit-learn-general] cython output in repo

2012-10-05 Thread Kenneth C. Arnold
If the generated files were marked as binary, they wouldn't show in diffs. (I thought they already were... ) Another option: ignore them in development branches and add them back in the release branches. This would also make cherry picking from the development branch less likely to have merge conf

Re: [Scikit-learn-general] [off-topic] scipy sparse library alternatives

2012-01-19 Thread Kenneth C. Arnold
On Thu, Jan 19, 2012 at 11:03 AM, Satrajit Ghosh wrote: > in one of my projects i use the scipy sparse library for turning a graph > into a sparse dependency matrix and then manipulating this matrix > (adding/subtracting columns/rows, setting elements to 0, ...). this is the > only reason i have s

Re: [Scikit-learn-general] Sparse Matrices and Classifiers

2012-01-19 Thread Kenneth C. Arnold
On Thu, Jan 19, 2012 at 3:05 AM, Olivier Grisel wrote: > Rather than improving the error message when passing sparse arrays to > the dense impl of SVC we should refactor SVC to accept both dense and > sparse representation and use the right wrapper as already done for > SGD, LinearSVC, LogisticReg

Re: [Scikit-learn-general] KMeans implementation in C with OpenMP

2011-12-23 Thread Kenneth C. Arnold
It may be relevant to note that Cython has recently gained some OpenMP support: http://docs.cython.org/src/userguide/parallelism.html -- I haven't tried it, but perhaps it could help improve the scikit-learn implementation. -Ken On Dec 23, 2011 7:31 AM, "Benjamin Hepp" wrote: > > Hi, > > I was

Re: [Scikit-learn-general] Issue with gaussian processes

2011-11-29 Thread Kenneth C. Arnold
On Tue, Nov 29, 2011 at 4:53 PM, Olivier Grisel wrote: > Now back to your problem: I think we should support fitting models with > just one sample just for the sake of consistency / continuity even if > there is no practical application of fitting models with a single > sample: fitting models with

Re: [Scikit-learn-general] Issue with gaussian processes

2011-11-29 Thread Kenneth C. Arnold
There is no maximum likelihood solution to a GP with a single training point, but you can certainly draw samples from the posterior; in fact, you can draw samples from the prior (without conditioning on data). That may help you determine if your covariance function is reasonable: samples from the p
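A rough sketch of drawing samples from a GP prior with plain NumPy, to eyeball whether a covariance function looks reasonable. The squared-exponential kernel, length scale, grid, and jitter value are all assumptions chosen for illustration. This does not use the scikit-learn Gaussian process API.

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

x = np.linspace(0.0, 5.0, 50)

# A small jitter on the diagonal keeps the Cholesky factorization numerically stable.
K = rbf_kernel(x, x) + 1e-6 * np.eye(len(x))
L = np.linalg.cholesky(K)

# Zero-mean prior: each column of `samples` is one function drawn from N(0, K).
rng = np.random.default_rng(0)
samples = L @ rng.standard_normal((len(x), 3))

print(samples.shape)
```

Plotting each column against `x` shows how smooth and how large the prior expects functions to be, before any data is conditioned on.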

Re: [Scikit-learn-general] scikit test failure on osx

2011-11-27 Thread Kenneth C. Arnold
a Mac side note: I have found that MacPorts solves most of my getting-things-running-on-Mac problems. Either you can just use their packages directly, often with precompiled binary downloads, or at least `port info py27-scipy` will show you the package names for the dependencies. (I'm currently ru

Re: [Scikit-learn-general] grid search, joblib

2011-11-10 Thread Kenneth C. Arnold
On Thu, Nov 10, 2011 at 9:40 AM, Gael Varoquaux wrote: > I think that it might be an interesting addition. I say 'might' because I > have given such ideas a try on general problems, and they actually often > do not work well: the score as a function of parameters is often a nasty > landscape. Firs

Re: [Scikit-learn-general] Multi Layer Perceptron / Neural Network in Sklearn

2011-11-04 Thread Kenneth C. Arnold
I had nothing to do with that page, but it's https://github.com/scikit-learn/scikit-learn/wiki/Related-Projects. -Ken 2011/11/4 Frédéric Bastien : > On Fri, Nov 4, 2011 at 4:24 PM, Kenneth C. Arnold > wrote: >> +1 for the sklearn review process AND for cooperating with othe

Re: [Scikit-learn-general] Multi Layer Perceptron / Neural Network in Sklearn

2011-11-04 Thread Kenneth C. Arnold
On Fri, Nov 4, 2011 at 12:25 PM, Mathieu Blondel wrote: > Another possibility is to host a Theano-based implementation as a > side project on github and make the API scikit-learn compatible. > > # In general, I don't really buy the "why implement X if it already > exists in Y" argument because it

Re: [Scikit-learn-general] Randomized PCA

2011-11-02 Thread Kenneth C. Arnold
On Wed, Nov 2, 2011 at 6:04 PM, Olivier Grisel wrote: > 2011/11/2 Radim Rehurek : >> If you decide to implement the randomized PCA, I can offer some observations: >> >> 1. oversampling does little, accuracy comes mostly from the extra power >> iteration steps >> 2. no power iterations result in m

Re: [Scikit-learn-general] Interest in more topic models?

2011-10-28 Thread Kenneth C. Arnold
:) -Ken > On Fri, Oct 28, 2011 at 11:00 PM, Conrad Lee wrote: >> >> Kenneth, >> >> >> On Fri, Oct 28, 2011 at 3:44 PM, Kenneth C. Arnold >> wrote: >>> >>> I just implemented Latent Dirichlet Allocation with collapsed Gibbs >>> samp

[Scikit-learn-general] Interest in more topic models?

2011-10-28 Thread Kenneth C. Arnold
I just implemented Latent Dirichlet Allocation with collapsed Gibbs sampling and made a demo on 20 Newsgroups. If there's interest in having this in sklearn, I could clean up the code for contribution. I noticed there was some discussion back in January about PyMC that didn't reach an actionable c