Hi Yujing, Thanks for your questions. See my responses inline.
On Thu, Nov 20, 2008 at 2:47 AM, 国玉晶 <[EMAIL PROTECTED]> wrote:
> Dear Mr. Pedersen,
>
> I'm a student majoring in speech recognition. In my recent research, I want
> to extract some semantic features based on LSA, especially word semantic
> similarity. And I'm very glad that I find SenseCluster is the very tool that
> I need :) But I'm a little confused by the output from the online interface.
> So can I ask two questions about the tool?
>
> Q1:
> Does LSA use only 1st order representation? And the 2ed order representation
> is the native methodology?

No. :) LSA uses a 2nd order representation, and so does the native
methodology. The difference is that for LSA the second order representation
is created using a "feature by context" representation, whereas for the
native methodology it's a "feature by feature" representation.

LSA does start with the first order representation, which is "context by
feature". Then it transposes that into "feature by context", from which the
second order vectors are constructed.

You may want to explore the online documentation a little bit, especially
for order1vec.pl and order2vec.pl, to get an idea of how that's all working.

You can also find a good overview of the first and second order
representation issues here:

Unsupervised Context Discrimination and Automatic Cluster Stopping
(Kulkarni) - Master of Science Thesis, Department of Computer Science,
University of Minnesota, Duluth, July, 2006.
http://www.d.umn.edu/~tpederse/Pubs/anagha-thesis.pdf

We also discuss the same in a number of our SenseClusters related papers,
so you might want to check those out too:

http://www.d.umn.edu/~tpederse/senseclusters-pubs.html

> Q2:
> I need only the word similarity, not the clustering result. So how can I get
> the result of word similarity between each word? Is it a intermediate file?

You should use the "Feature Clustering" option in the web interface or the
--wordclust option in discriminate.pl (from the command line).
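
As a rough illustration of the representations described under Q1, here is a
toy Python sketch. It is not SenseClusters code and does not reproduce its
output: the contexts are made up, co-occurrence is counted per context as a
simplification, the SVD reduction usually associated with LSA is left out,
and all names (contexts, cosine, word_similarity, second_order) are
hypothetical. It only shows how word vectors taken from "feature by context"
versus "feature by feature" matrices can give different word similarities.

```python
import math

# Made-up contexts (one token list per context); purely illustrative.
contexts = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["money", "loan", "interest"],
    ["river", "water", "fish"],
]
features = sorted({tok for ctx in contexts for tok in ctx})

# First order, "context by feature": one row per context, one column per feature.
context_by_feature = [[ctx.count(f) for f in features] for ctx in contexts]

# LSA-style word vectors: the transpose, "feature by context".
feature_by_context = [list(col) for col in zip(*context_by_feature)]

# Native-style word vectors, "feature by feature": how many contexts two
# distinct features share (a simplification of real co-occurrence counting).
feature_by_feature = [
    [0 if f == g else sum(1 for ctx in contexts if f in ctx and g in ctx)
     for g in features]
    for f in features
]

lsa_vectors = dict(zip(features, feature_by_context))
native_vectors = dict(zip(features, feature_by_feature))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(w1, w2, vectors):
    """Similarity of two words under the chosen word-vector representation."""
    return cosine(vectors[w1], vectors[w2])


def second_order(context, vectors):
    """Second order context vector: average of the word vectors of its tokens."""
    rows = [vectors[tok] for tok in context if tok in vectors]
    return [sum(col) / len(rows) for col in zip(*rows)]


for name, vectors in (("LSA-style", lsa_vectors), ("native-style", native_vectors)):
    print(name, "money/loan:", round(word_similarity("money", "loan", vectors), 3))
    print(name, "money/river:", round(word_similarity("money", "river", vectors), 3))
```

In this sketch the pairwise word similarities are computed directly from the
word vectors, with no clustering step involved; second_order() shows how the
same word vectors would be averaged to represent a whole context.
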
You can do word clustering with either the native methodology or LSA. We
call it feature clustering rather than word clustering since you can
actually determine the unit you wish to cluster by setting the --token file
appropriately. So features could be bigrams, character grams, etc. or words
(which is the default).

Good luck!
Ted

> Thank you for spending time reading this mail. Your help is very important
> for me and will be really appreciated :)
>
> Regards,
> Yujing Guo

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
