Hi Karol,
Those would indeed be nice additions. However, we should do benchmarks on
real data and focus on the most effective methods.
I found this paper / software which could serve as a reference:
http://jmlr.org/papers/v11/escalera10a.html
Mathieu
On Mon, Aug 12, 2013 at 1:27 PM, Karol Pysni
Hi Josh.
You are right.
See this issue:
https://github.com/scikit-learn/scikit-learn/issues/1149?source=cc
Andy
Hi Mathieu,
It looks interesting. Do you have in mind any specific real data we should
use to benchmark the methods?
Thanks,
Karol
2013/8/12 Mathieu Blondel
> Hi Karol,
>
> Those would indeed be nice additions. However, we should do benchmarks on
> real data and focus on the most effective me
Hello!
You may already be familiar with canonical correlation analysis (CCA).
Given two sets of variables, CCA yields the linear combinations with
maximum correlation between them. It is similar to PCA, which finds
projections with maximum variance for a single set of variables; in fact,
PCA can
Hello,
For my ML problem I am facing a bit of a dilemma with respect to my solution.
Problem: Predict a category using a text classifier for a large number of
categories. Depending on the category predicted, we need some post-processing
[e.g. get the document at a URL] and then try to predict again.
2013/8/12 A <4rk@gmail.com>:
> To summarize, is there a way to get a list of predicted categories (ranked
> by likelihood) so as to get the next-best category? Or the best category
> within a subset of the original categories, without training a separate
> classifier? I am currently using OneVsRes
> Use the predict_proba method, or decision_function, depending on the
> model (for SGD, decision_function always works). Btw., if you're not
> doing multilabel, then you don't need OneVsRestClassifier.
>
Thanks, will give it a shot.
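
For the archive, a minimal sketch of that suggestion, assuming a recent scikit-learn and NumPy. The toy corpus, the category names, and the helper `ranked_categories` are made up for illustration; only `decision_function` and `classes_` come from the library.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Toy corpus; documents and category names are invented for this sketch.
docs = [
    "cheap flights and hotel deals",
    "book a hotel room for your trip",
    "football match results and scores",
    "the team won the football league",
    "new smartphone with a fast processor",
    "laptop processor and graphics benchmarks",
]
labels = ["travel", "travel", "sports", "sports", "tech", "tech"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
clf = SGDClassifier(random_state=0).fit(X, labels)


def ranked_categories(clf, X_new, allowed=None):
    """Return, per sample, the categories sorted from best to worst score.

    If `allowed` is given, only those categories are ranked, so no
    separate classifier has to be trained for the subset.
    """
    # Shape (n_samples, n_classes) when there are more than two classes;
    # a binary problem returns a 1-D array and would need separate handling.
    scores = clf.decision_function(X_new)
    classes = clf.classes_
    if allowed is not None:
        keep = np.isin(classes, list(allowed))
        scores, classes = scores[:, keep], classes[keep]
    order = np.argsort(-scores, axis=1)
    return classes[order]


X_new = vectorizer.transform(["hotel near the football stadium"])
ranking = ranked_categories(clf, X_new)
restricted = ranked_categories(clf, X_new, allowed={"sports", "tech"})
print("all categories, best first:", ranking[0])
print("restricted to a subset:    ", restricted[0])
```

The second entry of each row is the next-best category. The raw scores are not probabilities, so they are only meaningful for ranking.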
On another note, n_jobs > 1 for SGDClassifier seems to r
Hi Karol,
I would do the benchmark on commonly-used datasets such as MNIST, USPS,
News20, Covertype, Sector, etc.
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Note that ECOC can potentially improve accuracy on binary classification
too, so I would do benchmarks on binary classificatio
Hi Mathieu,
Thanks for the suggestions, I'll test the methods and get back with the results.
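
A first pass at the benchmark harness could look roughly like the sketch below. It is only a starting point under stated assumptions: the bundled digits dataset is a stand-in for the larger datasets mentioned above, LinearSVC is an arbitrary choice of base estimator, and the random-code OutputCodeClassifier already in scikit-learn stands in for the ECOC variants still to be implemented.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier, OutputCodeClassifier
from sklearn.svm import LinearSVC

# Stand-in dataset (assumption): small digits set shipped with scikit-learn.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16

base = LinearSVC(random_state=0)
candidates = {
    "one-vs-rest": OneVsRestClassifier(base),
    # code_size=2.0 means 2 * n_classes binary problems with a random code book.
    "ecoc (random code)": OutputCodeClassifier(base, code_size=2.0, random_state=0),
}

# Mean 3-fold cross-validated accuracy for each multiclass strategy.
results = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=3)
    results[name] = scores.mean()
    print("%-20s accuracy: %.3f" % (name, results[name]))
```

Swapping in other datasets or strategies only requires changing the data loading and the `candidates` dictionary.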
Thanks,
Karol
On Aug 12, 2013, at 7:55 PM, Mathieu Blondel wrote:
> Hi Karol,
>
> I would do the benchmark on commonly-used datasets such as MNIST, USPS,
> News20, Covertype, Sector, etc.
>
> http:/