Hi Sai,

I've copied this to the users list as it seems like a generally important issue.

Yes, you could mix bigrams and co-occurrences, or unigrams and bigrams, or really any combination of the different kinds of features supported in SenseClusters.
The important point is that while discriminate.pl does not do that sort of mixing, it is certainly possible to do that in SenseClusters. You would simply need to build your different feature files, run each of them through nsp2regex.pl separately, and then concatenate the resulting regex files. Building the different feature files would involve various calls to count.pl and statistic.pl, followed by various calls to nsp2regex.pl to build your regex files (which are what SenseClusters uses to find features).

You may want to check out the flowchart available here to see more clearly the sequence in which things could be done:

http://www.d.umn.edu/~tpederse/Docs/SenseClusters-ContextClustering.pdf

Hope this helps!

Good luck,
Ted

On Thu, Feb 14, 2008 at 5:56 AM, Sai Tang Huang <[EMAIL PROTECTED]> wrote:
> Hi Ted,
>
> I've been wondering. The way SenseClusters selects features is done in
> either of the three ways: unigram, bigram, co-occurrence. Right? Can the 3
> different ways be mixed? Wouldn't it be more efficient to, say, have all the
> significant bigrams and all the significant co-occurrences represented as a
> vector space? This vector space is going to be huge right? Is this good?
>
> I hope you could tell me your thoughts on this,
>
> Regards,
>
> Sai
>
>
> --------------------------------------------------
> From: "Ted Pedersen" <[EMAIL PROTECTED]>
> Sent: Monday, February 11, 2008 9:55 PM
> To: "Sai Tang Huang" <[EMAIL PROTECTED]>
> Subject: Re: floating point overflow error
>
> > Yes, I think using precision of 6 with ll is very adequate. In general
> > I don't think any of the measures require more than 6, except possibly
> > tmi. So I would not worry too much about adjusting the precision -
> > that's a fairly minor option in some respects and won't have as much
> > impact as others (such as the features you are using, etc. etc)
> >
> > Good luck!
> > Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
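
A minimal sketch of the final concatenation step described in the reply above, written in Python. It assumes each feature type has already been turned into its own regex file by nsp2regex.pl. The function name and any file names passed to it are hypothetical illustrations, not part of SenseClusters, and the sketch does nothing beyond what joining the files with a tool such as cat would do.

```python
from pathlib import Path


def concatenate_regex_files(regex_paths, output_path):
    """Write the contents of each regex file, in order, into output_path.

    regex_paths: paths to regex files already produced by nsp2regex.pl
                 (hypothetical names, e.g. one for bigrams and one for
                 co-occurrences).
    output_path: path of the combined regex file to create.
    """
    parts = []
    for path in regex_paths:
        text = Path(path).read_text()
        # Guard against a file that lacks a trailing newline, so its last
        # line does not run into the first line of the next file.
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    Path(output_path).write_text("".join(parts))
    return output_path
```

For example, concatenate_regex_files(["bigrams.regex", "cooc.regex"], "mixed.regex") would produce a single file containing the bigram patterns followed by the co-occurrence patterns, which could then be used as the combined feature regex file.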
