Hi Sai,

I've copied this to the users list as it seems like a generally
important issue.  Yes, you could mix bigrams and co-occurrences, or
unigrams and bigrams, or really any combination of the different kinds
of features supported in SenseClusters.

The important point is that while discriminate.pl does not do that
sort of mixing, it is certainly possible to do it in SenseClusters.
You would simply need to build your different feature files, run each
of them through nsp2regex.pl separately, and then concatenate the
resulting regex files.

Building the different feature files would involve various calls to
count.pl and statistic.pl, and then after that various calls to
nsp2regex.pl to build your regex files (which are what SenseClusters
uses to find features).
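
The last step, concatenating the regex files, could be sketched roughly as
below. This is only an illustration, not part of SenseClusters: the function
name concatenate_regex_files and the file names in the usage comment are
hypothetical, and it assumes each input file was already produced by a
separate nsp2regex.pl run and holds one regex per line.

```python
from pathlib import Path


def concatenate_regex_files(regex_paths, combined_path):
    """Write the contents of each regex file to combined_path, in order.

    Hypothetical helper: assumes every input is a plain-text regex file
    that was already built by its own nsp2regex.pl run.
    """
    with open(combined_path, "w", encoding="utf-8") as out:
        for path in regex_paths:
            text = Path(path).read_text(encoding="utf-8")
            out.write(text)
            # Keep entries on separate lines even if a file has no final newline.
            if text and not text.endswith("\n"):
                out.write("\n")
    return combined_path


# Example call (file names are made up for illustration):
# concatenate_regex_files(["bigram.regex", "cooc.regex"], "mixed.regex")
```

The combined file would then be used wherever a single regex file is
expected. Note that this sketch does not remove duplicate entries, so a
feature that appears in more than one input would be listed twice.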

You may want to check out the flowchart available here to see more
clearly the sequence in which things could be done:

http://www.d.umn.edu/~tpederse/Docs/SenseClusters-ContextClustering.pdf

Hope this helps!
Good luck,
Ted

On Thu, Feb 14, 2008 at 5:56 AM, Sai Tang Huang
<[EMAIL PROTECTED]> wrote:
> Hi Ted,
>
>  I've been wondering. SenseClusters selects features in one of
>  three ways: unigram, bigram, or co-occurrence. Right? Can the 3
>  different ways be mixed? Wouldn't it be more efficient to, say, have all the
>  significant bigrams and all the significant co-occurrences represented as a
>  vector space? This vector space is going to be huge right? Is this good?
>
>  I hope you could tell me your thoughts on this,
>
>  Regards,
>
>  Sai
>
>
>  --------------------------------------------------
>  From: "Ted Pedersen" <[EMAIL PROTECTED]>
>  Sent: Monday, February 11, 2008 9:55 PM
>  To: "Sai Tang Huang" <[EMAIL PROTECTED]>
>  Subject: Re: floating point overflow error
>
>  > Yes, I think using precision of 6 with ll is very adequate. In general
>  > I don't think any of the measures require more than 6, except possibly
>  > tmi. So I would not worry too much about adjusting the precision -
>  > that's a fairly minor option in some respects and won't have as much
>  > impact as others (such as the features you are using, etc.).
>  >
>  > Good luck!
>  > Ted



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
