Olivier Grisel writes:
>
> 2012/9/6 Ark :
> >
> >> And how large in bytes? It seems that it should be small enough to be
> >> able to use sklearn.linear_model.LogisticRegression despite the data
> >> copy in memory.
> >>
> >
> > Right now it's not even 100M, but it will grow to at least 1G.
2012/9/6 Ark :
>
>> And how large in bytes? It seems that it should be small enough to be
>> able to use sklearn.linear_model.LogisticRegression despite the data
>> copy in memory.
>>
>
> Right now it's not even 100M, but it will grow to at least 1G.
Alright, have you tried sklearn.linear_model.L
> And how large in bytes? It seems that it should be small enough to be
> able to use sklearn.linear_model.LogisticRegression despite the data
> copy in memory.
>
Right now it's not even 100M, but it will grow to at least 1G.
2012/9/5 Ark :
>
>> How large (in bytes and in which format)? What are n_samples,
>> n_features and n_classes?
>>
>
> Input data is in the form of paragraphs from English literature
> n_samples=1, n_features=100,000, n_classes=max 100 [still collecting data]
And how large in bytes? It seems th
Ark writes:
>
>
> > How large (in bytes and in which format)? What are n_samples,
> > n_features and n_classes?
> >
>
> Input data is in the form of paragraphs from English literature
So the flow is:
raw data -> CountVectorizer -> train/test split -> sgd.fit -> predict
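
For readers following along, the flow described above could be sketched roughly as below. This is only an illustration under assumptions, not the poster's actual code: the toy paragraphs and labels are made up, and `train_test_split` is imported from `sklearn.model_selection`, its location in recent scikit-learn releases (at the time of this thread it lived in `sklearn.cross_validation`).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the paragraphs; the real corpus is not shown in the thread.
paragraphs = [
    "the whale surfaced beside the ship",
    "the captain chased the white whale",
    "the sailors hauled the ropes of the ship",
    "the whale dragged the harpoon boat",
    "she danced at the ball all evening",
    "the gentleman asked her to dance",
    "her sister admired the ball gowns",
    "they danced until the ball ended",
]
labels = ["sea"] * 4 + ["ball"] * 4

# raw data -> CountVectorizer: builds a sparse bag-of-words matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(paragraphs)

# train/test split, stratified so both classes appear on each side
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# sgd.fit -> predict; SGDClassifier accepts the sparse matrix directly
clf = SGDClassifier(random_state=0)
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
```

The matrix stays sparse from the vectorizer through to `fit`, which matters once the corpus grows toward the sizes mentioned in this thread.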
> n_samples=1,
> How large (in bytes and in which format)? What are n_samples,
> n_features and n_classes?
>
Input data is in the form of paragraphs from English literature
n_samples=1, n_features=100,000, n_classes=max 100 [still collecting data]
2012/9/5 Ark :
> What would be the best approach to classify a large dataset with sparse
> features into multiple categories?
How large (in bytes and in which format)? What are n_samples,
n_features and n_classes?
> I referred to the multiclass page in the
> sklearn documentation, but was no
What would be the best approach to classify a large dataset with sparse
features into multiple categories? I referred to the multiclass page in the
sklearn documentation, but was not sure which one to use for multiclass
probabilities [top n probabilities would be nice].
I tried usin
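
One possible way to get the "top n probabilities" asked about here is to sort the rows returned by `predict_proba`. The sketch below assumes `sklearn.linear_model.LogisticRegression`, the estimator suggested in the replies; the helper `top_n_classes`, the toy documents, and the labels are hypothetical and only there for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus with three classes.
docs = [
    "the whale surfaced beside the ship",
    "the captain chased the white whale",
    "she danced at the ball all evening",
    "the gentleman asked her to dance",
    "the detective examined the muddy footprint",
    "the detective questioned the nervous butler",
]
labels = ["sea", "sea", "ball", "ball", "crime", "crime"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, no dense copy made here
clf = LogisticRegression(max_iter=1000).fit(X, labels)


def top_n_classes(clf, X, n=2):
    """Return, per sample, the n most probable (class, probability) pairs."""
    proba = clf.predict_proba(X)  # shape (n_samples, n_classes)
    # Column indices sorted by descending probability, truncated to n.
    top = np.argsort(proba, axis=1)[:, ::-1][:, :n]
    return [
        [(clf.classes_[j], float(proba[i, j])) for j in row]
        for i, row in enumerate(top)
    ]


new_docs = vectorizer.transform(["the whale and the ship", "the butler danced"])
ranked = top_n_classes(clf, new_docs, n=2)
```

Each entry of `ranked` lists the candidate classes for one document, most probable first, so `ranked[0][0]` is the same class `clf.predict` would return for the first document.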