Hello,
I was going to upload some of the data sets listed here:
https://github.com/scikit-learn/scikit-learn/wiki/Setting-up-tests-to-benchmark-current-and-future-code
to mldata.org to make them easily available in scikit-learn.
The problem is that I can't find much information on how mldata.org
Hey,
the following is from a docstring in mldata:
-
Load the 'leukemia' dataset from mldata.org, which respects the
sklearn axes convention:
>>> leuk = fetch_mldata('leukemia', transpose_data=False)
>>> print(leuk.data.shape[0])
7129
-
according to http://mldata
I would also like to have a high dim regression data set.
2012/5/31 Vlad Niculae :
>
> On May 31, 2012, at 12:42 , Immanuel B wrote:
>
>>> Does N mean n_samples and p n_features?
>> yes
>>
>>> What about number of targets, is it 1 everywhere?
>>
> Does N mean n_samples and p n_features?
yes
>What about number of targets, is it 1 everywhere?
not sure what you mean...
The first table contains binary classification data, in the second table the
number of classes is given by #class.
For the regression problem, I believe, the lpsa variable ha
ated questions for each step.
>
> Alex
>
> On Mon, May 28, 2012 at 1:11 PM, Vlad Niculae wrote:
>
> On May 28, 2012, at 13:50 , Immanuel B wrote:
>
> Hello,
> I could use some feedback on how best to set up a benchmark for these
> models:
> l2 loss*
> l
Hello,
I could use some feedback on how best to set up a benchmark for these models:
l2 loss*
log loss*
multi-logit*
with l1 and l1 & l2 penalty
Please have a look at the following file:
https://docs.google.com/document/d/1VjRCU9xAP0hdeMiEQJwIKumMQTQZXdV1oRL_gh38iE8/edit
@Vlad
I'm ver
Hey all,
it's really exciting to see so much positive feedback.
Thank you all.
@Vlad, David
Nice job! :)
Immanuel
Hi,
the ROC curve has indeed been extended to the multiclass case.
For example:
A simplified extension of the Area under the ROC to the multiclass domain
http://homepage.tudelft.nl/a9p19/papers/prasa_06_vuc.pdf
I have used the R pROC package for that, maybe that’s an option.
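One simple way to get a single multiclass number is to average one-vs-rest AUCs. The sketch below is only an illustration of that idea in plain Python; it is not the VUC measure from the linked paper and not necessarily what pROC computes. The function names `binary_auc` and `macro_ovr_auc` are made up for this example.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(score_rows, y_true, classes):
    """Unweighted mean of one-vs-rest AUCs (hypothetical helper).

    score_rows[i][k] is the score of sample i for classes[k].
    """
    aucs = []
    for k, c in enumerate(classes):
        scores = [row[k] for row in score_rows]
        labels = [y == c for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small check but would need a rank-based version for larger data.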
Done, thanks for pointing out the urgency; I wasn't aware of it.
2012/4/19 Olivier Grisel :
> On 19 April 2012 03:41, Immanuel B wrote:
>> Hello all,
>>
>> I rewrote the timeline part of my proposal in order to make it more
>> readable and provide clearer de
Hello all,
I rewrote the timeline part of my proposal in order to make it more
readable and provide clearer definitions for the steps I intend to follow.
I would be grateful for any comments, be it on content, formulation or anything
else, before I update my proposal on the GSOC site.
https://d
> No, LARS is another way to solve the LASSO regression problem that is
> distinct from the Coordinate Descent method (and from the Stochastic
> Gradient Descent method too).
Thanks, I was trying to make the connection but only found a Cholesky solver. :)
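For reference, the coordinate descent route to the Lasso can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming the objective (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 with no intercept; it is not scikit-learn's implementation, and LARS (a path-following method) is not shown. The names `soft_threshold` and `lasso_cd` are made up for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the l1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for the Lasso (hypothetical sketch).

    X is a list of rows (n_samples x n_features), y a list of targets.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # all-zero feature, coefficient stays at zero
            # Correlation of feature j with the residual that ignores w[j].
            rho = sum(v * (r + v * w[j]) for v, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                w[j] = new_wj
    return w
```

Each update touches one coefficient and keeps the residual in sync, which is what makes warm restarts cheap: a previous solution could be passed in as the starting `w` instead of zeros.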
---
icit is that a cython implementation would
> avoid data copies (as our liblinear bindings make copies), avoid the
> penalization of the intercept and facilitate warm restart, which would
> also lead to an efficient LogisticRegressionCV class.
>
> Alex
>
> On Thu, Apr 5,
Hello all,
here finally is the draft for my proposal.
https://docs.google.com/document/d/1BG7Qmf3yepwkSCngRtJHQjWg2-tX-ltWxbV-goxXudA/edit
Any remarks are greatly appreciated.
best,
Immanuel
Thanks both of you,
> Do `make inplace` for the incremental build only of the C files that
> have changed since the last build and then use `nosetests
> sklearn/mypackage/module` to launch the tests only on your module.
this did the trick.
@David
I have dependencies linking them manually is somewha
Hello,
I'm just starting to work on some cython files in scikit. It would be
great if someone could suggest an easy way to compile them.
Currently I'm running `cython` on the file and then `make` on
scikit-learn. This seems to work, but the second step is quite slow.
I also tried to write a short set
Hello all,
before attempting a detailed proposal I would like to discuss the big
picture with you. I went through the two referenced papers and my
feeling is that glmnet, as a coordinate descent method, could be a good
choice, especially since the connection with the strong rule approach is
already available
>Hmm, it seems surprising that a coordinate descent procedure blows up the
>memory, but I'll have to read the paper. When I find the time …
>
>I had more in mind the glmnet approach for multinomial logistic regression,
>which scales pretty well AFAIK
These remarks were quite useful to me, thanks. I
2012/3/22 Gael Varoquaux :
> On Thu, Mar 22, 2012 at 10:52:32PM +0100, Immanuel B wrote:
>> I just rebased my scikit-learn fork and ran the tests in
>> https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/linear_model/tests
>> .
>> They all return with the s
Hello,
I just rebased my scikit-learn fork and ran the tests in
https://github.com/scikit-learn/scikit-learn/tree/master/sklearn/linear_model/tests
.
They all return with the same error, the tests in the other packages
run just fine.
Can someone reproduce this?
best,
Immanuel
Failure: ImportError
2012/3/21 Gael Varoquaux :
> On Wed, Mar 21, 2012 at 12:24:39PM +0900, Mathieu Blondel wrote:
>> If the online NMF and SGD-based matrix factorization proposals are
>> merged as I suggested before, I think it would make a decent GSOC
>> project. Besides, if two different students were to work on the