Re: [Scikit-learn-general] Multi-class sparse data

2012-09-10 Thread Ark
Olivier Grisel writes:
> 2012/9/6 Ark:
> >> And how large in bytes? It seems that it should be small enough to be
> >> able to use sklearn.linear_model.LogisticRegression despite the data
> >> copy in memory.
> >
> > Right now it's not even 100 MB, but it will grow to at least 1 GB.
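If the data does grow to 1 GB or beyond, one option not raised in the visible part of this thread is out-of-core learning: stream the paragraphs in chunks and update the model with `partial_fit`, so only one chunk is vectorized at a time. This is a minimal sketch assuming a recent scikit-learn (`HashingVectorizer` with the `alternate_sign` parameter); the `iter_batches` generator, the labels and `n_features=2**18` are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# HashingVectorizer is stateless (no fit step), so every chunk is mapped into
# the same fixed feature space. 2**18 features is an arbitrary choice here.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)

# partial_fit needs the full list of classes, since one chunk may not contain all of them.
all_classes = ["castle", "parlour", "sea"]
sgd = SGDClassifier(loss="modified_huber", random_state=0)


def iter_batches():
    """Hypothetical stand-in for reading labelled paragraphs from disk in chunks."""
    yield (["the ship sailed across the stormy sea",
            "the knight rode to the castle gate"], ["sea", "castle"])
    yield (["she poured tea in the quiet parlour",
            "waves crashed against the ship at sea"], ["parlour", "sea"])


for docs, labels in iter_batches():
    X_chunk = vectorizer.transform(docs)  # sparse matrix for this chunk only
    sgd.partial_fit(X_chunk, labels, classes=all_classes)

print(sgd.predict(vectorizer.transform(["a ship at sea"])))
```

`HashingVectorizer` is used here instead of `CountVectorizer` because it keeps no vocabulary, so chunks never need to agree on one; the trade-off is that hashed feature indices cannot be mapped back to words.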

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-06 Thread Olivier Grisel
2012/9/6 Ark:
>
>> And how large in bytes? It seems that it should be small enough to be
>> able to use sklearn.linear_model.LogisticRegression despite the data
>> copy in memory.
>
> Right now it's not even 100 MB, but it will grow to at least 1 GB.

Alright, have you tried sklearn.linear_model.L
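The LogisticRegression suggestion quoted above can be sketched roughly as follows: `LogisticRegression` accepts the scipy.sparse matrix that `CountVectorizer` produces without densifying it, and `predict_proba` returns one probability per class. The corpus and class labels are invented for illustration, and `max_iter=1000` is an arbitrary choice.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus standing in for the English-literature paragraphs.
docs = [
    "the ship sailed across the stormy sea",
    "waves crashed against the ship at sea",
    "the knight rode to the castle gate",
    "a castle stood on the hill with a knight",
    "she poured tea in the quiet parlour",
    "the parlour was warm and the tea was sweet",
]
labels = ["sea", "sea", "castle", "castle", "parlour", "parlour"]

# CountVectorizer returns a sparse CSR matrix; LogisticRegression fits on it directly.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

# One column per class, in the order given by clf.classes_.
proba = clf.predict_proba(vectorizer.transform(["a ship at sea"]))
for cls, p in zip(clf.classes_, proba[0]):
    print(f"{cls}: {p:.3f}")
```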

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-06 Thread Ark
> And how large in bytes? It seems that it should be small enough to be
> able to use sklearn.linear_model.LogisticRegression despite the data
> copy in memory.

Right now it's not even 100 MB, but it will grow to at least 1 GB.

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Olivier Grisel
2012/9/5 Ark:
>
>> How large (in bytes and in which format)? What are n_samples,
>> n_features and n_classes?
>
> Input data is in the form of paragraphs from English literature.
> n_samples=1, n_features=100,000, n_classes=max 100 [still collecting data]

And how large in bytes? It seems th
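For the "how large in bytes" question with sparse text features: a CSR matrix stores three arrays, `data` and `indices` with one entry per non-zero value, and `indptr` with n_samples + 1 row offsets. Its size therefore depends on the number of non-zeros rather than on n_features. A minimal sketch of that estimate, assuming 8-byte values and 4-byte (int32) indices; `estimate_csr_bytes` is a hypothetical helper, and the 100,000-document, 200-terms-per-document scale is made up for illustration.

```python
import numpy as np
import scipy.sparse as sp


def estimate_csr_bytes(n_samples, nnz, value_bytes=8, index_bytes=4):
    """Rough CSR footprint: data and indices hold nnz entries each,
    indptr holds n_samples + 1 row offsets."""
    return nnz * (value_bytes + index_bytes) + (n_samples + 1) * index_bytes


# Cross-check the formula on a tiny explicit CSR matrix (3 rows, 5 columns).
X = sp.csr_matrix(
    (np.array([1.0, 2.0, 1.0, 3.0]),           # data: stored values
     np.array([0, 3, 1, 4], dtype=np.int32),   # indices: column of each value
     np.array([0, 2, 3, 4], dtype=np.int32)),  # indptr: row boundaries
    shape=(3, 5),
)
actual = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
assert actual == estimate_csr_bytes(3, X.nnz)

# Hypothetical scale: 100,000 paragraphs with about 200 distinct terms each,
# over a 100,000-term vocabulary.
n_docs, n_terms = 100_000, 100_000
sparse_bytes = estimate_csr_bytes(n_docs, n_docs * 200)
dense_bytes = n_docs * n_terms * 8  # the same matrix stored densely as float64
print(f"sparse CSR: ~{sparse_bytes / 1e6:.0f} MB, dense float64: ~{dense_bytes / 1e9:.0f} GB")
```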

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
Ark writes:
>
> > How large (in bytes and in which format)? What are n_samples,
> > n_features and n_classes?
>
> Input data is in the form of paragraphs from English literature.

So the flow is: raw data -> CountVectorizer -> train/test split -> sgd.fit -> predict.

> n_samples=1,
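The flow described above can be sketched roughly as below. Assumptions: the toy corpus and labels are invented; `loss="modified_huber"` is swapped in for `SGDClassifier`'s default hinge loss, because hinge loss does not provide `predict_proba`; and the raw text is split before vectorizing (a small deviation from the flow as written) so the vocabulary is learned from the training documents only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Hypothetical labelled paragraphs; the real data set is not shown in the thread.
docs = [
    "the ship sailed across the stormy sea",
    "waves crashed against the ship at sea",
    "sailors watched the sea from the deck",
    "the knight rode to the castle gate",
    "a castle stood on the hill with a knight",
    "the king held court in the castle hall",
    "she poured tea in the quiet parlour",
    "the parlour was warm and the tea was sweet",
    "guests sipped tea beside the parlour fire",
]
labels = ["sea"] * 3 + ["castle"] * 3 + ["parlour"] * 3

# Split the raw text first (one test document per class via stratify).
docs_train, docs_test, y_train, y_test = train_test_split(
    docs, labels, test_size=3, stratify=labels, random_state=0)

# Fit the vocabulary on the training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(docs_train)  # sparse CSR matrix
X_test = vectorizer.transform(docs_test)

# modified_huber loss (an assumption here) makes predict_proba available.
sgd = SGDClassifier(loss="modified_huber", random_state=0)
sgd.fit(X_train, y_train)
predicted = sgd.predict(X_test)
print(list(zip(docs_test, predicted)))
```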

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
> How large (in bytes and in which format)? What are n_samples,
> n_features and n_classes?

Input data is in the form of paragraphs from English literature.
n_samples=1, n_features=100,000, n_classes=max 100 [still collecting data]

Re: [Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Olivier Grisel
2012/9/5 Ark:
> What would be the best approach to classify a large dataset with sparse
> features into multiple categories?

How large (in bytes and in which format)? What are n_samples, n_features and n_classes?

> I referred to the multiclass page in the
> sklearn documentation, but was no

[Scikit-learn-general] Multi-class sparse data

2012-09-05 Thread Ark
What would be the best approach to classify a large dataset with sparse features into multiple categories? I referred to the multiclass page in the sklearn documentation, but was not sure which one to use for multiclass probabilities (top-n probabilities would be nice). I tried usin
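For the top-n probabilities: any scikit-learn estimator exposing `predict_proba` (for example `LogisticRegression`) returns an array of shape (n_samples, n_classes) whose columns follow `clf.classes_`, so sorting each row gives the n most likely classes. A small sketch with a hypothetical `top_n` helper and made-up probabilities:

```python
import numpy as np


def top_n(proba, classes, n=3):
    """Return the n most probable (class, probability) pairs for each row.

    proba:   array of shape (n_samples, n_classes), e.g. clf.predict_proba(X)
    classes: labels in the same column order, e.g. clf.classes_
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    # argsort sorts ascending; reverse the columns and keep the first n.
    order = np.argsort(proba, axis=1)[:, ::-1][:, :n]
    # tolist() converts NumPy scalars to plain Python values.
    return [list(zip(classes[idx].tolist(), row[idx].tolist()))
            for idx, row in zip(order, proba)]


# Hypothetical probabilities for two documents over four classes.
proba = np.array([[0.10, 0.60, 0.05, 0.25],
                  [0.40, 0.10, 0.30, 0.20]])
classes = np.array(["castle", "parlour", "sea", "war"])
for row in top_n(proba, classes, n=2):
    print(row)
```

With a fitted classifier this would be called as `top_n(clf.predict_proba(X), clf.classes_, n=5)`.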