> If you use the --stat option to select a test of association for
> identifying bigram or co-occurrence features, you really should use the
> --stat_rank or --stat_score to select a subset of the bigrams or
> co-occurrences that are identified. If you do not, the effect will be the
> same as if you didn't use the --stat option at all, since all features

This is true only for order1. For order2, --stat puts the statistic scores into the word vectors even if you don't remove the insignificant features. So using --context o2 with --stat but without --stat_score or --stat_rank will create word vectors for all bigrams/co-occurrences in your training data.

For order1, yes, we consider the frequency counts of features in the test contexts, so the scores don't matter. The word vectors, however, hold either scores or frequencies depending on whether or not you use --stat.

One more relevant point: --window and --stat do not apply to unigram features. Even if you specify these parameters for unigram features, they will have no effect.

Amruta
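
To make the order1/order2 difference concrete, here is a minimal Python sketch. It is not SenseClusters code: the toy bigram table, the function names order1_vector and word_vectors, and the use of adjacent word pairs only are all assumptions made for illustration. It shows that an order1 context vector holds feature counts (scores never enter it), while an order2 word vector holds either scores or frequencies depending on whether a statistic is used.

```python
from collections import Counter

# Hypothetical toy feature table: each bigram has a raw frequency and an
# association score (as a test of association would assign). Values are made up.
bigrams = {
    ("stock", "market"): {"freq": 5, "score": 12.7},
    ("stock", "price"): {"freq": 3, "score": 8.1},
    ("the", "market"): {"freq": 9, "score": 0.4},
}

def order1_vector(context_tokens, features):
    """First-order sketch: count how often each feature occurs in the context.

    Association scores play no role here; only occurrence counts are used.
    Only adjacent word pairs are considered (an assumption of this sketch).
    """
    pairs = Counter(zip(context_tokens, context_tokens[1:]))
    return [pairs[f] for f in features]

def word_vectors(bigram_table, use_stat):
    """Second-order sketch: one vector per first word, indexed by second word.

    Cells hold the association score when use_stat is True,
    and the raw frequency otherwise.
    """
    key = "score" if use_stat else "freq"
    vectors = {}
    for (w1, w2), values in bigram_table.items():
        vectors.setdefault(w1, {})[w2] = values[key]
    return vectors

features = sorted(bigrams)
context = "the stock market and the market".split()
o1 = order1_vector(context, features)
with_stat = word_vectors(bigrams, use_stat=True)
without_stat = word_vectors(bigrams, use_stat=False)
```

Note that with no score or rank cutoff every bigram still gets a cell in the word vectors, so the low-scoring ("the", "market") pair is kept; the only change from using a statistic is which value each cell holds.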
