Hi there. Can you please post the code you are using?
Thanks, Robert
On Jun 16, 2012 10:35 AM, "Fahd S. Alotaibi"
wrote:
> Hi everybody,
>
> I'm using this brilliant framework for text classification. I spotted that
> when the number of classes is > 10, sklearn just works with the
> first
Hi everybody,
I'm using this brilliant framework for text classification. I spotted that when
the number of classes is > 10, sklearn just works with the first 10
classes only and ignores the remaining classes. This seems a bit strange. I went
quickly through the sklearn files to see if I cou
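In the meantime, a minimal sanity check (synthetic data, not the poster's actual pipeline) is to fit a classifier on more than 10 classes and inspect clf.classes_ and the predicted labels:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# 15 well-separated synthetic classes.
X, y = make_blobs(n_samples=1500, centers=15, random_state=0)

clf = LogisticRegression().fit(X, y)
print(clf.classes_)               # all 15 labels, not just the first 10
print(np.unique(clf.predict(X)))  # predictions cover every class as well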
On Fri, Jun 15, 2012 at 4:50 PM, Yaroslav Halchenko wrote:
>
> On Fri, 15 Jun 2012, [email protected] wrote:
>> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py#L160
>> looks like a double sum, but wikipedia only has one sum, elementwise product.
>
> sorry -- I might be slow -- w
2012/6/15 Peter Prettenhofer :
> Neither is a proper multinomial logistic regression model;
> LogisticRegression does not care and simply computes the probability
> estimates of each OVR classifier and normalizes them to make sure they sum
> to one. You could do the same for SGDClassifier(loss='log') b
Thanks for the prompt reply, Peter. I may be in a situation that will call
for SGDClassifier, so I have two follow-up questions:
1) I'd like to compute the class probs; are the probs for the individual
OvR classifiers (easily) accessible? My intuition is that I can compute
these from the returned
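A sketch of one way to get there, assuming each decision_function column corresponds to one OvR classifier (synthetic data only to keep it runnable; the loss='log' spelling follows this thread):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
clf = SGDClassifier(loss='log', random_state=0).fit(X, y)

# One decision_function column per OvR classifier; the logistic sigmoid
# turns each score into that classifier's probability estimate.
scores = clf.decision_function(X)
ovr_probs = 1.0 / (1.0 + np.exp(-scores))

# Renormalize across classes so each row sums to one, as Peter describes
# LogisticRegression doing for its predict_proba.
probs = ovr_probs / ovr_probs.sum(axis=1)[:, np.newaxis]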
I've added the sprint on pyconfr's website:
http://www.pycon.fr/2012/sprints/
and I've updated the upcoming event on the github's wiki:
https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events
I've transferred the information on the Granada sprint to the "previous
sprint" section.
Thanks,
Hi Fred,
the major difference is the optimization algorithm:
Liblinear/Coordinate Descent vs. Stochastic Gradient Descent.
If your problem is high-dimensional (10K features or more) and you have a large
number of examples (100K or more), you should choose the latter -
otherwise, LogisticRegression should b
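For what it's worth, the two estimators share the same fit/predict interface, so trying both on a subsample is cheap; a minimal sketch with synthetic data (loss='log' as spelled in this thread):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

X, y = make_classification(n_samples=5000, n_features=100, random_state=0)

for clf in (LogisticRegression(),        # batch solver (liblinear in this era)
            SGDClassifier(loss='log')):  # stochastic gradient descent
    clf.fit(X, y)
    print(clf.__class__.__name__, clf.score(X, y))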
On Fri, 15 Jun 2012, [email protected] wrote:
> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py#L160
> looks like a double sum, but wikipedia only has one sum, elementwise product.
sorry -- I might be slow -- what sum? there is only an outer product in
line 160: Axy = Ax[:, None
On Fri, Jun 15, 2012 at 4:20 PM, Yaroslav Halchenko wrote:
> Here is a comparison to output of my code (marked with >):
>
> 0.00458652660079 0.788017364828 0.00700027844478 0.00483928213727
>> 0.145564526722 0.480124905375 0.422482399359 0.217567496918
> 6.50616752373e-07 7.99461373461e-05 0.0070
Dear all,
What are the advantages of choosing one of the classifiers named in the
subject line over the other? At a quick glance, I see the following:
- LogisticRegression implements predict_proba for the multiclass case,
while SGDClassifier doesn't
- SGDClassifier(loss="log") lets you specify multiple CPUs f
Here is a comparison to output of my code (marked with >):
0.00458652660079 0.788017364828 0.00700027844478 0.00483928213727
> 0.145564526722 0.480124905375 0.422482399359 0.217567496918
6.50616752373e-07 7.99461373461e-05 0.00700027844478 0.0094610687282
> 0.120884106118 0.249205123601 0.4224823
On Fri, Jun 15, 2012 at 3:50 PM, wrote:
> On Fri, Jun 15, 2012 at 10:45 AM, Yaroslav Halchenko
> wrote:
>>
>> On Fri, 15 Jun 2012, Satrajit Ghosh wrote:
>>> hi yarik,
>>> here is my attempt:
>>>
>>> [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distan
On Fri, Jun 15, 2012 at 10:45 AM, Yaroslav Halchenko
wrote:
>
> On Fri, 15 Jun 2012, Satrajit Ghosh wrote:
>> hi yarik,
>> here is my attempt:
>>
>> [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py
>> i'll look at your code in det
2012/6/15 Dinesh B Vadhia :
> The class CharNGramAnalyzer is documented at
> http://scikit-learn.org/0.8/modules/generated/scikits.learn.feature_extraction.text.CharNGramAnalyzer.html#scikits.learn.feature_extraction.text.CharNGramAnalyzer.
That's the 0.8 documentation. The latest release is 0.1
Olivier
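For releases after 0.8 the standalone analyzer classes were folded into the vectorizers; something along these lines should give character n-grams (parameter names follow the current documentation, so double-check against the release you have installed):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["scikit-learn", "machine learning"]   # hypothetical documents
vectorizer = CountVectorizer(analyzer='char', ngram_range=(2, 3))
X = vectorizer.fit_transform(docs)
print(X.shape)
print(sorted(vectorizer.vocabulary_)[:10])    # character bi- and trigrams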
I tried to run
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/text.py
and got the error:
from sklearn.feature_extraction.text import CharNGramAnalyzer
ImportError: cannot import name CharNGramAnalyzer
The class CharNGramAnalyzer is documented at
ht
On Fri, 15 Jun 2012, Satrajit Ghosh wrote:
>hi yarik,
>here is my attempt:
>
> [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py
>i'll look at your code in detail later today to understand the uv=True
it is just to compute dCo[v
hi yarik,
here is my attempt:
https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py
i'll look at your code in detail later today to understand the uv=True case.
cheers,
satra
On Fri, Jun 15, 2012 at 10:19 AM, Yaroslav Halchenko wrote:
> I haven't
I haven't had a chance to play with it extensively but I have a basic
implementation:
https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py
which still lacks statistical assessment, but provides dCov, dCor values
and yes -- it is "inherently multivariate", but since it also could be
useful
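For anyone following along, here is a minimal NumPy/SciPy sketch of the sample statistics (following the Wikipedia definitions rather than the dcov.py linked above); it also shows how the double sum in the definition reduces to the mean of an elementwise product of the centered distance matrices:

import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_correlation(X, Y):
    """Sample distance correlation between the rows of X and Y."""
    a = squareform(pdist(np.asarray(X, dtype=float).reshape(len(X), -1)))
    b = squareform(pdist(np.asarray(Y, dtype=float).reshape(len(Y), -1)))
    # Double-center each pairwise distance matrix.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()       # the "double sum" divided by n**2
    dvar_x2 = (A * A).mean()
    dvar_y2 = (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x2 * dvar_y2))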
hi yarik,
> hm... interesting -- and there is no comparison against "minimizing
> independence"? e.g. dCov measure
> http://en.wikipedia.org/wiki/Distance_correlation which is really simple
> to estimate and as intuitive as a correlation coefficient
>
thanks for bringing up dCov. have you had a cha
Submitted 5/07; Revised 6/11; Published 5/12
It takes such a long time ...
On Fri, Jun 15, 2012 at 8:58 PM, Satrajit Ghosh wrote:
> fyi
>
> -- Forwarded message --
> From: joshua vogelstein
> Date: Fri, Jun 15, 2012 at 12:35 AM
>
> http://jmlr.csail.mit.edu/papers/volume13/song
hm... interesting -- and there is no comparison against "minimizing
independence"? e.g. dCov measure
http://en.wikipedia.org/wiki/Distance_correlation which is really simple
to estimate and as intuitive as a correlation coefficient
On Fri, 15 Jun 2012, Satrajit Ghosh wrote:
>fyi
>
fyi
-- Forwarded message --
From: joshua vogelstein
Date: Fri, Jun 15, 2012 at 12:35 AM
http://jmlr.csail.mit.edu/papers/volume13/song12a/song12a.pdf
these guys define a nice nonlinear/nonparametric measure of correlation
that might be of interest to you.
---
On 06/13/2012 10:52 AM, Olivier Grisel wrote:
> 2012/6/13 Emanuele Olivetti:
>> Hi,
>>
>> You can use gzip.open() instead of open() to add compression and to
>> (possibly)
>> decrease the file size a lot - at least it did for me in a similar example:
>>
>> import gzip
>> pickle.dump(clf, gzip.open(
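A complete version of what the quoted snippet sketches (the file name and estimator are arbitrary here), writing and reading the pickle through gzip:

import gzip
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# Write a compressed pickle instead of a plain one...
with gzip.open('clf.pkl.gz', 'wb') as f:
    pickle.dump(clf, f)

# ...and load it back the same way.
with gzip.open('clf.pkl.gz', 'rb') as f:
    clf_loaded = pickle.load(f)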
2012/6/15 xinfan meng :
> The docs tell you that you can customize and define a preprocessor to first
> segment the text if needed, e.g. in Chinese or Japanese. However, sklearn
> does not provide such a preprocessor. To see how you can implement one,
> the best way is to take a look at the code.
Am 15.06.2012 10:48, schrieb Olivier Grisel:
> 2012/6/15 iBayer:
>> Hey Andreas,
>>
>> I'm in contact with folks at mldata.org; apparently things aren't as
>> easy as I was hoping. The hdf5 format description is outdated...
>> I already uploaded a couple of files but they aren't of any use and
2012/6/15 iBayer :
> Hey Andreas,
>
> I'm in contact with folks at mldata.org; apparently things aren't as
> easy as I was hoping. The hdf5 format description is outdated...
> I already uploaded a couple of files but they aren't of any use and
> yes, the sparse format is especially problematic.
I'll create a wiki page on the scikit's github wiki, and indicate the
sprint on pyconfr's website.
Cheers,
N
On 14 June 2012 18:42, Alexandre Gramfort wrote:
> I should be there too
>
> Alex
>
> On Thu, Jun 14, 2012 at 6:50 PM, Olivier Grisel
> wrote:
> > 2012/6/14 Nelle Varoquaux :
> >> Hi eve
The docs tell you that you can customize and define a preprocessor to first
segment the text if needed, e.g. in Chinese or Japanese. However, sklearn
does not provide such a preprocessor. To see how you can implement one,
the best way is to take a look at the code. I think the text processing
pip
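A hedged sketch of wiring a segmenter into the vectorizer (the segment function is only a stand-in for whatever Chinese/Japanese segmentation library you use, since sklearn itself does not ship one):

from sklearn.feature_extraction.text import CountVectorizer

def segment(doc):
    # Placeholder: call your word segmentation library here and return a
    # list of tokens; whitespace splitting only keeps the sketch runnable.
    return doc.split()

vectorizer = CountVectorizer(tokenizer=segment)
X = vectorizer.fit_transform(["already segmented text", "another document"])
print(vectorizer.vocabulary_)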
Hey Andreas,
I'm in contact with folks at mldata.org; apparently things aren't as
easy as I was hoping. The hdf5 format description is outdated...
I already uploaded a couple of files but they aren't of any use and
yes, the sparse format is especially problematic.
I'll keep you posted
2012/6/