[Scikit-learn-general] contributing to scikit

2014-01-31 Thread Joseph Perla
I love SciKit and I'm going to contribute an SGD classifier for semi-supervised problems. I already read through all the contributor documentation and I've read many of the docs. I'm asking the list if I should model my code off of the style/quality of the SGDClassifier class or if there is a bet

Re: [Scikit-learn-general] PCA on 3D images

2014-01-31 Thread Kyle Kastner
I have never done exactly what you are suggesting, but there is an inverse_transform method for PCA objects, which may do what you are looking for.
On Fri, Jan 31, 2014 at 1:25 PM, Arman Eshaghi wrote:
> Dear all,
>
> I understand that scikit-learn is a general purpose tool, however here I
> app
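A minimal sketch of how inverse_transform could be used with 3D images, assuming each volume is flattened to one row before fitting. The array shapes, the random data, and the number of components are illustrative assumptions, not taken from the thread.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed toy data: 20 subjects, each a small 3D volume (random values).
rng = np.random.RandomState(0)
n_subjects, shape = 20, (4, 5, 6)
volumes = rng.rand(n_subjects, *shape)

# PCA expects a 2D array, so flatten each volume into one row of voxels.
X = volumes.reshape(n_subjects, -1)          # (20, 120)

pca = PCA(n_components=5)
scores = pca.fit_transform(X)                # (20, 5) component scores

# inverse_transform maps scores back to voxel space (a low-rank approximation).
X_approx = pca.inverse_transform(scores)     # (20, 120)

# Reshape back to 3D so the result can be viewed as images again.
volumes_approx = X_approx.reshape(n_subjects, *shape)

# Each principal component can also be reshaped into a 3D "component map".
component_maps = pca.components_.reshape(-1, *shape)
```

The reshape calls are the only image-specific part: PCA itself sees each volume as a flat vector of voxels, so spatial neighbourhood information is not used.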

[Scikit-learn-general] PCA on 3D images

2014-01-31 Thread Arman Eshaghi
Dear all, I understand that scikit-learn is a general purpose tool, however here I would appreciate it if you could point me to a webpage, tutorial, etc. that would help me understand the basis of my problem. I work on 3D (or 4D) MRI images. A typical problem, for example, is when I use PCA decompos

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-31 Thread Frédéric Bastien
Thanks.
Fred
On Thu, Jan 30, 2014 at 8:28 PM, Patrick Mineault wrote:
> Sure you can:
>
> http://www.cs.toronto.edu/~jasper/bayesopt.pdf
>
> And some python code:
>
> https://github.com/JasperSnoek/spearmint
>
>
> On Thu, Jan 30, 2014 at 7:53 PM, Frédéric Bastien wrote:
>>
>> I have a question

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Arnaud Joly
Here are some results on the 20 newsgroups dataset:
Classifier      train-time   test-time   error-rate
5-nn            0.0047s      13.6651s    0.5916
random forest   263.3146s    3.9985s     0.2459
sgd             0.2265s      0.0657s

Re: [Scikit-learn-general] Google Summer of Code - ideas

2014-01-31 Thread Gael Varoquaux
On Fri, Jan 31, 2014 at 12:16:53PM +0100, Arnaud Joly wrote:
> We should definitely remove the old list. Since it biases applicants
> toward subjects that could be without mentors.
I am in favor of editing it quite violently to remove anything that people cannot vouch for. Maybe not remove it comp

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Paolo Losi
On Wed, Jan 22, 2014 at 9:48 AM, Mathieu Blondel wrote:
>
> Something I was wondering is whether sparse support in decision trees
> would actually be useful. Do decision trees (or ensembles of them like
> random forests) work better than linear models for high-dimensional data?
>
I share your poin

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Olivier Grisel
2014/1/31 Felipe Eltermann:
> OK, I finished reading _tree.pyx and now I understand CSC dense matrix
> format.
> I have a general view of what is necessary to be implemented.
>
> I've never seriously used Cython. What are you guys using as development
> environment?
Just a good text editor and a

Re: [Scikit-learn-general] Google Summer of Code - ideas

2014-01-31 Thread Arnaud Joly
Hello, Your contributions to scikit-learn are highly appreciated. However, we use only the scikit-learn mailing list to discuss GSOC ideas. At the moment, I don’t want to give any, but might give some in the near future. We should definitely remove the old list. Since it biases applicants to

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

2014-01-31 Thread Felipe Eltermann
OK, I finished reading _tree.pyx and now I understand CSC dense matrix format. I have a general view of what is necessary to be implemented. I've never seriously used Cython. What are you guys using as development environment? How to easily code/compile/test?
On Thu, Jan 23, 2014 at 11:55 AM, Ol
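For readers unfamiliar with the CSC (compressed sparse column) layout mentioned in this message, here is a small sketch using scipy.sparse. The toy matrix and the helper column_nonzeros are made up for illustration; nothing here is taken from _tree.pyx.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy dense matrix (values invented for illustration).
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
    [3.0, 0.0, 4.0],
])

X_csc = csc_matrix(X)

# CSC stores three arrays:
#   data    - the nonzero values, column by column
#   indices - the row index of each stored value
#   indptr  - where each column starts and ends inside data/indices
data, indices, indptr = X_csc.data, X_csc.indices, X_csc.indptr


def column_nonzeros(j):
    """Hypothetical helper: return (row_indices, values) of column j."""
    start, end = indptr[j], indptr[j + 1]
    return indices[start:end], data[start:end]


# Nonzeros of the last column: rows 1 and 2, values 2.0 and 4.0.
rows, values = column_nonzeros(2)
```

Because all nonzeros of one feature are contiguous, scanning a single column is cheap, which is one reason a column-oriented layout suits code that evaluates one feature at a time.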

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-31 Thread Olivier Grisel
There are smarter ways to speed up SVM with parallel computation by changing the algorithm, e.g.: http://www.cs.utexas.edu/~cjhsieh/dcsvm/ But this is new and not implemented in scikit-learn and it's too recent to be implemented and maintained as part of scikit-learn. However it could be implemente

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-31 Thread Lars Buitinck
2014-01-31 Thomas Johnson:
> It's definitely the bottleneck for my particular use case. I spawn ~180
> processes for a grid search on my Google Compute Engine cluster, but still
> end up waiting >90 minutes just for a few individual long-running processes
> with high C values.
Well, have you trie

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-01-31 Thread Gael Varoquaux
> if not isinstance(score, numbers.Number):
>     raise ValueError("scoring must return a number, got %s (%s)"
>                             " instead." % (str(score), type(score)))
I am not opposed to making this check more relaxed: we could add an 'or (isinstance(score, np.ndarray) and score.dty
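One possible shape of such a relaxed check is sketched below. The proposed condition is cut off in this archive after `score.dty`, so the rest is an assumption: the function name check_score is hypothetical, and accepting only 0-d arrays with a numeric dtype is a guess at the intent, not the change that was actually proposed or merged.

```python
import numbers

import numpy as np


def check_score(score):
    """Hypothetical relaxed check: accept numbers and 0-d numeric arrays."""
    is_number = isinstance(score, numbers.Number)
    # Assumption: an ndarray is acceptable only if it holds a single
    # numeric value (0-d); arrays of several scores are still rejected.
    is_scalar_array = (
        isinstance(score, np.ndarray)
        and score.ndim == 0
        and np.issubdtype(score.dtype, np.number)
    )
    if not (is_number or is_scalar_array):
        raise ValueError("scoring must return a number, got %s (%s)"
                         " instead." % (str(score), type(score)))
    return score


check_score(0.5)              # plain Python float passes
check_score(np.float32(0.5))  # NumPy scalar types are registered as numbers
check_score(np.array(0.5))    # 0-d array passes only under the relaxed check
```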

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-31 Thread Gael Varoquaux
On Thu, Jan 30, 2014 at 07:53:16PM -0500, Frédéric Bastien wrote:
> I have a question on those type of algo for hyper parameter
> optimization. With a grid search, we can run all jobs in parallel. But
> I have the impression that those algo remove that possibility. Is
> there a way to sample ma