Re: [Scikit-learn-general] Codebook generation in error-correcting output-code

2013-08-12 Thread Mathieu Blondel
Hi Karol, Those would indeed be nice additions. However, we should do benchmarks on real data and focus on the most effective methods. I found this paper / software which could serve as a reference: http://jmlr.org/papers/v11/escalera10a.html Mathieu

Re: [Scikit-learn-general] Univariate feature selection

2013-08-12 Thread Andreas Mueller
Hi Josh. You are right. See this issue: https://github.com/scikit-learn/scikit-learn/issues/1149?source=cc Andy

Re: [Scikit-learn-general] Codebook generation in error-correcting output-code

2013-08-12 Thread Karol Pysniak
Hi Mathieu, It looks interesting. Do you have in mind any specific real data we should use to benchmark the methods? Thanks, Karol

[Scikit-learn-general] canonical correlation using Scikit-learn's generalized linear methods

2013-08-12 Thread James Jensen
Hello! You may already be familiar with canonical correlation analysis (CCA). Given two sets of variables, CCA yields the linear combinations with maximum correlation between them. It is similar to PCA, which finds projections with maximum variance for a single set of variables; in fact, PCA can
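To make the definition above concrete, here is a minimal sketch that computes the largest canonical correlation between two variable sets. It uses a QR/SVD formulation in plain NumPy, which is an assumption chosen for illustration and not the generalized-linear-model approach this thread proposes. The function name and the synthetic data are hypothetical, and the sketch assumes both centered blocks have full column rank.

```python
import numpy as np


def first_canonical_correlation(X, Y):
    """Return the largest canonical correlation between the columns of X and Y."""
    # Center each block so the result does not depend on the column means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(s[0])


# Synthetic data (hypothetical): both sets contain a noisy copy of the
# same latent signal z, plus one unrelated noise column each.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([rng.normal(size=n), z + 0.1 * rng.normal(size=n)])

rho = first_canonical_correlation(X, Y)
print(rho)
```

Because the two sets share the latent signal, the first canonical correlation should come out close to 1 even though the shared signal sits in different columns of X and Y.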

[Scikit-learn-general] Selective multiclass

2013-08-12 Thread A
Hello, For my ML problem I am facing a bit of a dilemma with respect to my solution. Problem: Predict a category using a text classifier for a large number of categories. Depending on the category predicted, we need some post-processing [e.g. get document with URL] and try to predict again.

Re: [Scikit-learn-general] Selective multiclass

2013-08-12 Thread Lars Buitinck
2013/8/12 A <4rk@gmail.com>: > --> To summarize is there a way to get a list of predicted categories (by > likelihood) so as to get the next best category? Or the best category > within a subset of original categories without training a separate > classifier. I am currently using OneVsRes

Re: [Scikit-learn-general] Selective multiclass

2013-08-12 Thread A
> Use the predict_proba method, or decision_function, depending on the > model (for SGD, decision_function always works). Btw., if you're not > doing multilabel, then you don't need OneVsRestClassifier. > Thanks, will give it a shot. On another note, n_jobs > 1 for SGDClassifier seems to r
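The advice quoted above can be sketched as follows: once per-class scores are available, the next-best category and the best category within a subset both come from sorting those scores. This is a minimal sketch in plain Python. The helper name `rank_categories`, the labels, and the score values are hypothetical stand-ins for one row of `decision_function` or `predict_proba` output, whose columns follow the order of the fitted estimator's `classes_` attribute.

```python
def rank_categories(scores, classes, allowed=None):
    """Return class labels sorted from highest to lowest score.

    If `allowed` is given, only labels in that collection are ranked.
    """
    pairs = [
        (score, label)
        for score, label in zip(scores, classes)
        if allowed is None or label in allowed
    ]
    # Sort by score only, highest first; ties keep their original order.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in pairs]


# Hypothetical labels and scores for a single document.
classes = ["news", "sports", "tech", "travel"]
scores = [0.2, 1.5, 0.9, -0.3]

ranking = rank_categories(scores, classes)
best = ranking[0]        # top prediction
next_best = ranking[1]   # fallback if post-processing rejects the top one

# Best category restricted to a subset, with no separate classifier trained.
best_in_subset = rank_categories(scores, classes, allowed={"news", "travel"})[0]
print(ranking, next_best, best_in_subset)
```

Restricting the ranking to a subset only reorders existing scores, so it assumes the scores of the remaining classes are still comparable with each other.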

Re: [Scikit-learn-general] Codebook generation in error-correcting output-code

2013-08-12 Thread Mathieu Blondel
Hi Karol, I would do the benchmark on commonly-used datasets such as MNIST, USPS, News20, Covertype, Sector, etc. http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ Note that ECOC can potentially improve accuracy on binary classification too, so I would do benchmarks on binary classificatio

Re: [Scikit-learn-general] Codebook generation in error-correcting output-code

2013-08-12 Thread Karol Pysniak
Hi Mathieu, Thanks for the suggestions, I'll test the methods and get back with the results. Thanks, Karol