[Scikit-learn-general] fetch_mldata()

2012-06-13 Thread Immanuel B
Hello, I was going to upload some of the data sets listed here: https://github.com/scikit-learn/scikit-learn/wiki/Setting-up-tests-to-benchmark-current-and-future-code to mldata.org to make them easily available in scikit-learn. The problem is that I can't find much information on how mldata.org

[Scikit-learn-general] error in mldata.py docstring?

2012-06-07 Thread Immanuel B
Hey, the following is from a docstring in mldata: - Load the 'leukemia' dataset from mldata.org, which respects the sklearn axes convention: >>> leuk = fetch_mldata('leukemia', transpose_data=False) >>> print(leuk.data.shape[0]) 7129 - according to http://mldata

Re: [Scikit-learn-general] linear model benchmarking

2012-05-31 Thread Immanuel B
I would also like to have a high dim regression data set. 2012/5/31 Vlad Niculae : > > On May 31, 2012, at 12:42 , Immanuel B wrote: > >>> Does N mean n_samples and p n_features? >> yes >> >>> What about number of targets, is it 1 everywhere? >>

Re: [Scikit-learn-general] linear model benchmarking

2012-05-31 Thread Immanuel B
> Does N mean n_samples and p n_features? yes >What about number of targets, is it 1 everywhere? Not sure what you mean... The first table contains binary classification data; in the second table the number of classes is given by #class. For the regression problem, I believe, the lpsa variable ha

Re: [Scikit-learn-general] linear model benchmarking

2012-05-29 Thread Immanuel B
ated questions for each step. > > Alex > > On Mon, May 28, 2012 at 1:11 PM, Vlad Niculae wrote: > > On May 28, 2012, at 13:50 , Immanuel B wrote: > > Hello, > I could use some feedback on how to best set-up a benchmark for these > models: >    l2 loss* >    l

[Scikit-learn-general] linear model benchmarking

2012-05-28 Thread Immanuel B
Hello, I could use some feedback on how to best set up a benchmark for these models: l2 loss* log loss* multi-logit* with l1 and l1 & l2 penalty Please have a look at the following file: https://docs.google.com/document/d/1VjRCU9xAP0hdeMiEQJwIKumMQTQZXdV1oRL_gh38iE8/edit @Vlad I'm ver

Re: [Scikit-learn-general] GSOC 12' 3/3 !!!

2012-04-24 Thread Immanuel B
Hey all, it's really exciting to see so much positive feedback. Thank you all. @Vlad, David Nice job! :) Immanuel

Re: [Scikit-learn-general] ROC curve

2012-04-19 Thread Immanuel B
Hi, the ROC curve has indeed been extended to the multiclass case. For example: A simplified extension of the Area under the ROC to the multiclass domain http://homepage.tudelft.nl/a9p19/papers/prasa_06_vuc.pdf I have used the R pROC package for that; maybe that's an option.
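
To make the idea of a multiclass AUC concrete, here is a minimal sketch in plain Python. It is not the VUC measure from the linked paper and not what pROC computes; it is the simpler one-vs-rest macro average, where each class is scored against all others with the ordinary binary AUC. The function names `binary_auc` and `ovr_macro_auc` are made up for this illustration.

```python
def binary_auc(labels, scores):
    """Binary AUC as the fraction of (positive, negative) pairs that are
    ranked correctly, counting ties as half a correct pair."""
    pos = [s for is_pos, s in zip(labels, scores) if is_pos]
    neg = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ovr_macro_auc(y_true, score_rows, classes):
    """Unweighted mean of the one-vs-rest AUCs.

    score_rows[i][k] is the score of sample i for classes[k].
    """
    aucs = []
    for k, c in enumerate(classes):
        labels = [y == c for y in y_true]
        scores = [row[k] for row in score_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    y_true = [0, 1, 2, 0, 1, 2]
    score_rows = [
        [0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.2, 0.7],
        [0.6, 0.3, 0.1],
        [0.3, 0.5, 0.2],
        [0.2, 0.2, 0.6],
    ]
    print(ovr_macro_auc(y_true, score_rows, classes=[0, 1, 2]))
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch but would be replaced by a rank-based computation on larger data.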

Re: [Scikit-learn-general] GSOC proposal update: Optimizing sparse linear models using coordinate descent and strong rules

2012-04-19 Thread Immanuel B
Done. Thanks for pointing out the urgency; I wasn't aware of it. 2012/4/19 Olivier Grisel : > On 19 April 2012 at 03:41, Immanuel B wrote: >> Hello all, >> >> I rewrote the timeline part of my proposal in order to make it more >> readable and provide clearer de

[Scikit-learn-general] GSOC proposal update: Optimizing sparse linear models using coordinate descent and strong rules

2012-04-19 Thread Immanuel B
Hello all, I rewrote the timeline part of my proposal in order to make it more readable and provide clearer definitions for the steps I intend to follow. I would be grateful for any comments, be it on content, formulation or anything else, before I update my proposal on the GSOC site. https://d

Re: [Scikit-learn-general] Proposal: Optimizing sparse linear models using coordinate descent and strong rules.

2012-04-06 Thread Immanuel B
> No, LARS is another way to solve the LASSO regression problem that is > distinct from the Coordinate Descent method (and from the Stochastic > Gradient Descent method too). Thanks, I was trying to make the connection but only found a Cholesky solver. :)
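
For readers unfamiliar with the coordinate descent method mentioned in this message, the following is a minimal sketch in plain Python, not the scikit-learn or glmnet implementation. It assumes the objective (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, no intercept, and a fixed number of sweeps instead of a convergence check; `soft_threshold` and `lasso_cd` are names made up for this illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (proximal operator of the l1 norm)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (assumed objective, no intercept).

    X is a list of rows, y a list of targets.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    # Mean squared norm of each column, used to scale the coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never receives weight
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 0.1, 2.0, 0.1]
    print(lasso_cd(X, y, alpha=0.1))  # the weakly related feature is zeroed out
```

Each coordinate update has a closed form (a soft-thresholding step), which is what distinguishes this approach from LARS, which follows the regularization path, and from SGD, which takes noisy gradient steps.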

Re: [Scikit-learn-general] Proposal: Optimizing sparse linear models using coordinate descent and strong rules.

2012-04-06 Thread Immanuel B
icit is that a cython implementation would > avoid data copies (as our liblinear bindings make copies), avoid the > penalization of the intercept and facilitate warm restart which would > also lead to an efficient LogisticRegressionCV class. > > Alex > > On Thu, Apr 5,

[Scikit-learn-general] Proposal: Optimizing sparse linear models using coordinate descent and strong rules.

2012-04-04 Thread Immanuel B
Hello all, here finally is the draft for my proposal. https://docs.google.com/document/d/1BG7Qmf3yepwkSCngRtJHQjWg2-tX-ltWxbV-goxXudA/edit Any remarks are greatly appreciated. best, Immanuel

Re: [Scikit-learn-general] compiling cython files in scikit-learn

2012-03-31 Thread Immanuel B
Thanks, both of you. > Do `make inplace` for the incremental build only of the C files that > have changed since the last build and then use `nosetests > sklearn/mypackage/module` to launch the tests only on your module. This did the trick. @David I have dependencies; linking them manually is somewha

[Scikit-learn-general] compiling cython files in scikit-learn

2012-03-31 Thread Immanuel B
Hello, I'm just starting to work on some cython files in scikit. It would be great if someone could suggest an easy way to compile them. Currently I'm running `cython` on the file and then make on scikit-learn. This seems to work but the second step is quite slow. I also tried to write a short set

[Scikit-learn-general] Coordinated descent in linear models beyond squared loss GSOC

2012-03-27 Thread Immanuel B
Hello all, before attempting a detailed proposal I would like to discuss the big picture with you. I went through the two referenced papers and my feeling is that glmnet as a coordinate descent method could be a good choice, especially since the connection with the strong rule approach is already available

Re: [Scikit-learn-general] Online Non Negative Matrix Factorization GSoC

2012-03-23 Thread Immanuel B
>Hmm, it seems surprising that a coordinate descent procedure blows up the >memory, but I'll have to read the paper. When I find the time … > >I had more in mind the glmnet approach for multinomial logistic regression >which scales pretty well AFAIK These remarks were quite useful to me, thanks. I

Re: [Scikit-learn-general] tests in test_base(linear models) fail

2012-03-22 Thread Immanuel B
2012/3/22 Gael Varoquaux : > On Thu, Mar 22, 2012 at 10:52:32PM +0100, Immanuel B wrote: >> I just rebased my scikit-learn fork and ran the tests in >> https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/linear_model/tests >> . >> They all return with the s

[Scikit-learn-general] tests in test_base(linear models) fail

2012-03-22 Thread Immanuel B
Hello, I just rebased my scikit-learn fork and ran the tests in https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/linear_model/tests . They all return with the same error; the tests in the other packages run just fine. Can someone reproduce this? best, Immanuel Failure: ImportError

Re: [Scikit-learn-general] Online Non Negative Matrix Factorization GSoC

2012-03-21 Thread Immanuel B
2012/3/21 Gael Varoquaux : > On Wed, Mar 21, 2012 at 12:24:39PM +0900, Mathieu Blondel wrote: >> If the online NMF and SGD-based matrix factorization proposals are >> merged as I suggested before, I think it would make a decent GSOC >> project. Besides, if two different students were to work on the