Hi Alex,
If I understand correctly, you are using two different kinds of features:
categorical + n-grams.
In a similar situation, but in a classification setting, a trick that worked
reasonably well was to train two different models, one feeding the other:
build a first model out of the n-gram/NLP features alone, then feed its
predictions into a second model trained on the categorical features.
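A minimal sketch of that two-stage idea, transposed to the regression
setting of this thread; X_ngrams (sparse n-gram matrix), X_cat (one-hot
categorical matrix) and y are assumed to already exist:

    import scipy.sparse as sp
    from sklearn.linear_model import Ridge, SGDRegressor
    from sklearn.model_selection import cross_val_predict

    # Stage 1: a linear model on the high-dimensional n-gram features.
    # Out-of-fold predictions keep stage 2 from seeing leaked targets.
    ngram_pred = cross_val_predict(SGDRegressor(), X_ngrams, y, cv=5)

    # Stage 2: categorical features plus the stage-1 prediction column.
    X_stage2 = sp.hstack(
        [X_cat, sp.csr_matrix(ngram_pred.reshape(-1, 1))]
    ).tocsr()
    stage2 = Ridge().fit(X_stage2, y)

The cross_val_predict step is one way to keep the stage-1 column honest:
the second model only ever sees predictions made on held-out folds.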
Have you tried tuning the hyper-parameters of the SGDRegressor? You really
need to tune the learning rate for SGDRegressor (SGDClassifier has a pretty
decent default). E.g. set up a grid search with a constant learning rate and
try different values of eta0 ([0.1, 0.01, 0.001, 0.0001]). You can also …
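For the record, a minimal sketch of that grid search against the current
scikit-learn API; the sparse matrix X and the target y are assumed to
already exist:

    from sklearn.linear_model import SGDRegressor
    from sklearn.model_selection import GridSearchCV

    # Grid-search eta0 with a constant learning rate, as suggested above.
    param_grid = {"eta0": [0.1, 0.01, 0.001, 0.0001]}
    search = GridSearchCV(
        SGDRegressor(learning_rate="constant"),
        param_grid,
        scoring="neg_mean_squared_error",
        cv=3,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)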
Thanks, guys.
Perhaps I should explain what I am trying to do and then open it up for
suggestions.
I have 203k training examples, each with 457k features. The features are
composed of one-hot encoded categorical values as well as stemmed, TF-IDF
weighted unigrams and bigrams (NLP). As you can probably …
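For concreteness, such a matrix could be assembled roughly as follows;
docs, df and the column names are hypothetical, and the stemming step
(which TfidfVectorizer does not do by itself) is left out:

    import scipy.sparse as sp
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.preprocessing import OneHotEncoder

    tfidf = TfidfVectorizer(ngram_range=(1, 2))           # unigrams + bigrams
    X_text = tfidf.fit_transform(docs)                    # sparse TF-IDF

    onehot = OneHotEncoder(handle_unknown="ignore")
    X_cat = onehot.fit_transform(df[["cat_a", "cat_b"]])  # sparse one-hot

    X = sp.hstack([X_cat, X_text]).tocsr()                # e.g. 203k x 457k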
I totally agree with Brian - although I'd suggest you drop option 3)
because it will be a lot of work.
I'd rather suggest you do either a) feature extraction or b) feature
selection.
Personally, I think decision trees in general and random forests in
particular are not a good fit for sparse datasets …
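Acting on a), one common route is to project the sparse matrix into a
dense, low-dimensional space that a forest can digest; a sketch, with the
number of components an arbitrary assumption:

    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import RandomForestRegressor

    # TruncatedSVD accepts sparse input and returns a dense array.
    svd = TruncatedSVD(n_components=100)
    X_dense = svd.fit_transform(X)    # shape (n_samples, 100), dense

    rf = RandomForestRegressor(n_estimators=100, n_jobs=-1)
    rf.fit(X_dense, y)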
At the moment your three options are:
1) get more memory
2) do feature selection - 400k features on 200k samples seems to me to
contain a lot of redundant information or irrelevant features (see the
sketch after this message)
3) submit a PR to support sparse matrices - this is going to be a lot of
work and I doubt it's worth it.
…
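One sparse-friendly way to carry out option 2 is to let an L1-penalised
linear model pick the informative columns; a sketch, assuming X (sparse)
and y exist:

    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import SGDRegressor

    # The L1 penalty drives most coefficients to zero; SelectFromModel
    # keeps only the columns whose weights clear its threshold.
    selector = SelectFromModel(SGDRegressor(penalty="l1", alpha=1e-4))
    X_reduced = selector.fit_transform(X, y)   # still sparse, fewer columns
    print(X_reduced.shape)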
@Alex: I don't have a workaround for you but this seems like a useful
addition. I don't know how hard it would be, but you should definitely
raise it as an issue on the github issues page for the project:
https://github.com/scikit-learn/scikit-learn/issues?sort=updated&state=open
get more memory?
On 23 April 2013 17:06, Alex Kopp wrote:
Hi,
I am looking to build a random forest regression model with a pretty large
amount of sparse data. I noticed that I cannot fit the random forest model
with a sparse matrix. Unfortunately, a dense matrix is too large to fit in
memory. What are my options?
For reference, I have just over 400k features …
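A quick back-of-the-envelope check shows why densifying is hopeless at
the sizes quoted earlier in the thread (203k x 457k, float64):

    n_samples, n_features = 203_000, 457_000
    dense_bytes = n_samples * n_features * 8   # 8 bytes per float64 entry
    print(dense_bytes / 1e9)                   # ~742 GB for the dense matrix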