Re: [Senseclusters-users] about usage of SenseCluster

Ted Pedersen Thu, 20 Nov 2008 07:05:06 -0800

Hi Yujing,

Thanks for your questions. See my responses inline.

On Thu, Nov 20, 2008 at 2:47 AM, 国玉晶 <[EMAIL PROTECTED]> wrote:
> Dear Mr. Pedersen,
>
> I'm a student majoring in speech recognition. In my recent research, I want
> to extract some semantic features based on LSA, especially word semantic
> similarity. And I'm very glad to find that SenseClusters is exactly the tool
> I need :) But I'm a little confused by the output from the online interface.
> So can I ask two questions about the tool?
>
> Q1:
> Does LSA use only the 1st order representation? And is the 2nd order
> representation the native methodology?

No. :)

LSA uses 2nd order representation, and so does the native methodology.
The difference is that for LSA the second order representation is
created using a "feature by context" representation, whereas for the
native representation it's a "feature by feature" representation.

LSA does start with the first order representation, which is "context
by feature". Then it transposes that into "feature by context", from
which the second order vectors are constructed.
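
To make the distinction concrete, here is a minimal sketch on toy data. It
is not SenseClusters code: the contexts, the variable names, and the
second_order helper are all made up for illustration, and the dimensionality
reduction (SVD) that LSA normally applies is left out. It only shows which
matrix the second order context vectors are built from in each case.

```python
# Toy data (hypothetical): each context is the list of features seen in it.
contexts = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["money", "loan", "interest"],
]

features = sorted({w for ctx in contexts for w in ctx})
n = len(features)

# First order: "context by feature" (rows = contexts, columns = features).
context_by_feature = [[ctx.count(f) for f in features] for ctx in contexts]

# LSA style: transpose into "feature by context" (rows = features).
feature_by_context = [list(col) for col in zip(*context_by_feature)]

# Native style: "feature by feature" co-occurrence counts within a context.
# The diagonal is set to 0 here; that is a simplification for this sketch.
feature_by_feature = [
    [
        0 if i == j else sum(row[i] * row[j] for row in context_by_feature)
        for j in range(n)
    ]
    for i in range(n)
]


def second_order(ctx, feature_vectors):
    """Average the vectors of the features that occur in a context."""
    rows = [feature_vectors[features.index(w)] for w in ctx if w in features]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Second order vector of the first context under each representation.
lsa_vector = second_order(contexts[0], feature_by_context)     # one value per context
native_vector = second_order(contexts[0], feature_by_feature)  # one value per feature
```

In both cases a context is represented by averaging the vectors of the
features it contains. What differs is where those feature vectors come
from: rows of "feature by context" for LSA, rows of "feature by feature"
for the native methodology.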

You may want to explore the online documentation a little bit,
especially for order1vec.pl and order2vec.pl, to get an idea of how
that all works.

You can also find a good overview of the first and second order
representation issues here :

Unsupervised Context Discrimination and Automatic Cluster Stopping
(Kulkarni) - Master of Science Thesis, Department of Computer Science,
University of Minnesota, Duluth, July, 2006.
http://www.d.umn.edu/~tpederse/Pubs/anagha-thesis.pdf

We also discuss the same in a number of our SenseClusters related
papers, so you might want to check those out too:

http://www.d.umn.edu/~tpederse/senseclusters-pubs.html

>
> Q2:
> I need only the word similarity, not the clustering result. So how can I get
> the similarity between each pair of words? Is it an intermediate file?

You should use the "Feature Clustering" option in the web interface or
the --wordclust option in discriminate.pl (from the command line). You
can do word clustering with either the native methodology or LSA.

We call it feature clustering rather than word clustering since you
can actually determine the unit you wish to cluster by supplying an
appropriate file via the --token option. So features could be bigrams,
character n-grams, etc., or words (which is the default).
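
If it helps to see what "word similarity" means over such feature
vectors, here is a small sketch that computes pairwise cosine similarity.
It is an illustration only, not SenseClusters output or code: the vectors
are made-up toy rows of a "feature by context" matrix, and cosine and
similarity_table are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical word vectors: one row per word, one column per context.
word_vectors = {
    "bank": [1, 1, 0],
    "loan": [1, 0, 1],
    "money": [1, 0, 1],
    "river": [0, 1, 0],
}


def similarity_table(vectors):
    """Return {(word_a, word_b): cosine} for every unordered pair of words."""
    words = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(words)
        for b in words[i + 1:]
    }


table = similarity_table(word_vectors)
for pair, score in sorted(table.items()):
    print(pair, round(score, 3))
```

Words that occur in the same contexts ("loan" and "money" here) come out
as most similar, and words that never share a context score zero.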

Good luck!
Ted
>
> Thank you for spending time reading this mail. Your help is very important
> for me and will be really appreciated :)
>
> Regards,
> Yujing Guo

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
