On Tue, Jul 1, 2014 at 3:35 AM, Joel Nothman <joel.noth...@gmail.com> wrote:
> It may be beneficial to use some kind of query expansion or unsupervised
> dimensionality reduction, as the vectors from a bag of words encoding will
> probably be very sparse. Does that help?
>
How would query expansion help? I don't think I can use it in my case.
Is there a method for computing a distance between strings that gives more
weight to similar words? With such a distance I could use affinity
propagation for clustering.

For the distance calculation I have been using
difflib.SequenceMatcher().ratio() between two strings. But even when two
strings belong in the same cluster, a difference in their lengths puts them
into different clusters. Is this a good approach, or is there a better
method?
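
To make the length problem concrete, here is a minimal sketch of one possible word-level alternative. It is an assumption for illustration, not something tested on the real data: the helper names (idf_weights, weighted_overlap, similarity_matrix) and the sample texts are hypothetical. It computes an IDF-weighted overlap coefficient, so words that occur in fewer texts count more, and a short string contained in a longer one still scores high, unlike the character-level difflib ratio.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def idf_weights(texts):
    """Smoothed inverse document frequency: words found in fewer texts weigh more."""
    n = len(texts)
    doc_freq = Counter(w for t in texts for w in set(t.lower().split()))
    return {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def total_weight(words, idf):
    """Sum of IDF weights for a set of words (unknown words default to 1.0)."""
    return sum(idf.get(w, 1.0) for w in words)


def weighted_overlap(a, b, idf):
    """IDF-weighted overlap coefficient in [0, 1].

    The shared word weight is divided by the weight of the lighter text,
    so a short text contained in a long one still scores close to 1.
    """
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    shared = total_weight(wa & wb, idf)
    smaller = min(total_weight(wa, idf), total_weight(wb, idf))
    return min(1.0, shared / smaller)  # clamp against float rounding


def similarity_matrix(texts):
    """Pairwise similarities as a list of lists (symmetric)."""
    idf = idf_weights(texts)
    return [[weighted_overlap(a, b, idf) for b in texts] for a in texts]


# Hypothetical sample data: two texts on the same topic with very
# different lengths, plus one unrelated text.
texts = [
    "disk full error",
    "disk full error on server node seven during nightly backup",
    "user login failed",
]
S = similarity_matrix(texts)
char_ratio = SequenceMatcher(None, texts[0], texts[1]).ratio()
print("difflib ratio:", char_ratio)
print("weighted overlap:", S[0][1])
```

A matrix like S holds similarities rather than distances, which is the form that sklearn.cluster.AffinityPropagation accepts when constructed with affinity="precomputed" (after converting S to a NumPy array). Whether the overlap coefficient suits the data is a judgment call: it deliberately ignores length, so a very short text can match many longer ones.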
Regards,
Abijith
> On 30 June 2014 03:03, Abijith Kp <abijith....@gmail.com> wrote:
>
>> Hi,
>>
>> Is it possible to use TfidfVectorizer to cluster very short texts?
>> By short I mean fewer than 20 words.
>>
>> Or is there a better way to do it?
>>
>> Regards,
>> Abijith
>>
>> --
>> Abijith KP
>> github.com/abijith-kp
>> kpabijith.wordpress.com
>>
>>
>> ------------------------------------------------------------------------------
>> Open source business process management suite built on Java and Eclipse
>> Turn processes into business applications with Bonita BPM Community
>> Edition
>> Quickly connect people, data, and systems into organized workflows
>> Winner of BOSSIE, CODIE, OW2 and Gartner awards
>> http://p.sf.net/sfu/Bonitasoft
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
--
Abijith KP
github.com/abijith-kp
kpabijith.wordpress.com