Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Abijith Kp
On Tue, Jul 1, 2014 at 3:35 AM, Joel Nothman wrote:
> It may be beneficial to use some kind of query expansion or unsupervised
> dimensionality reduction, as the vectors from a bag of words encoding will
> probably be very sparse. Does that help?

How can query expansion help? I don't think I

Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Robert Layton
A bit more concretely, have a look at this class:
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html

It is a transformer, so you can apply it to any matrix (that doesn't mean it makes sense, just that you can):

# Create original matrix
X = creat
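
To illustrate the point about TfidfTransformer accepting any matrix, here is a minimal sketch. It is not the code from the message above, which is cut off in this archive. The count matrix is made up for illustration, and the default TfidfTransformer settings (L2 row normalisation, smoothed idf) are assumed.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Hypothetical count matrix: 3 "documents" x 4 "terms" (made-up numbers).
counts = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 3],
    [1, 1, 0, 0],
])

# TfidfTransformer reweights an existing count matrix; it does no tokenising.
transformer = TfidfTransformer()
tfidf = transformer.fit_transform(counts)  # scipy sparse matrix

# With the default norm="l2", each non-empty row has unit Euclidean length.
dense = tfidf.toarray()
row_norms = np.linalg.norm(dense, axis=1)
print(dense.shape, row_norms)
```

TfidfVectorizer is roughly CountVectorizer followed by this transformer, so the same reweighting applies whether the counts come from text or from somewhere else.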

Re: [Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Joel Nothman
It may be beneficial to use some kind of query expansion or unsupervised dimensionality reduction, as the vectors from a bag of words encoding will probably be very sparse. Does that help?

On 30 June 2014 03:03, Abijith Kp wrote:
> Hi,
>
> Is it possible to use TfidfVectorizer to cluster very s
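
One possible reading of the dimensionality-reduction suggestion, as a rough sketch only: project the sparse tf-idf vectors onto a few latent components with TruncatedSVD before clustering. The thread does not name TruncatedSVD or KMeans, so both are assumptions here, as are the example texts, the number of components and the number of clusters.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical short texts, each well under 20 words.
texts = [
    "cheap flights to paris",
    "book a flight to london",
    "paris hotel deals",
    "python pandas dataframe merge",
    "numpy array reshape error",
    "python list comprehension syntax",
]

# Sparse bag-of-words vectors with tf-idf weighting.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Reduce to a small dense space, then re-normalise the rows so that
# Euclidean k-means behaves more like cosine-based clustering.
svd = TruncatedSVD(n_components=2, random_state=0)
lsa = make_pipeline(svd, Normalizer(copy=False))
X_reduced = lsa.fit_transform(X)

# Cluster the reduced vectors; n_clusters=2 is a guess for this toy data.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X_reduced)
print(X.shape, X_reduced.shape, labels)
```

On real data the number of components and clusters would need tuning, and with texts this short the result can be unstable, so treat the labels as a starting point rather than a reliable grouping.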

[Scikit-learn-general] Clustering using TfidfVectorizer

2014-06-30 Thread Abijith Kp
Hi,

Is it possible to use TfidfVectorizer to cluster very short texts? By short I mean fewer than 20 words. Or is there a better way to do it?

Regards,
Abijith

--
Abijith KP
github.com/abijith-kp
kpabijith.wordpress.com