Hello! Just a few more clarifications from my side.
> > I find this project very interesting. But I'd like to check how you define
> > contexts.

There are 2 ways to define context:

1. You can specify the --scope W option in the wrappers, or use windower to select only W words around the target word.
2. By not specifying --scope W, or by running windower with a very large window size such as 10,000, you essentially use the entire context data between the <context> and </context> tags as the context of the instance.

Case 1 selects and matches local features within some positions of the target word, while case 2 uses global features. In both cases, W words on both sides (left and right) of the target word are used.

> > Am I right in thinking that the context "features" which are used to construct
> > context vectors and then similarity matrices are bigram word pairs only,

In order1, the supported features are Ngrams such as unigrams and bigrams of words within some flexible window, as you point out below, as well as co-occurrences such as 1st order and 2nd order ...

In order2, we currently support 2 types of features:

1. bigrams
2. co-occurrence pairs

However, each has a different interpretation! With bigram features, the word matrix holds bigram scores, i.e. MAT[i][j] shows the frequency/statistic score of the bigram WORDi<>WORDj. With co-occurrence features in order2, word order is ignored, i.e. MAT[i][j] shows the frequency/statistic score of the pair WORDi<=>WORDj in either order, meaning WORDi can follow or precede WORDj.

> > I think I do something very similar, but I found for my purposes it has been
> > necessary to have both the preceding and following context in a single

Both bigram and co-occurrence features are computed by considering the context on both sides of the target word. The only difference is that with bigrams the order of the words matters, while co-occurrence pairs do not imply any particular order of the component words.

Please let us know if this doesn't answer your question or leaves you with any further doubts.
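To make the window and bigram vs. co-occurrence distinctions concrete, here is a minimal Python sketch. It is not SenseClusters code: the helper names (context_window, bigram_counts, cooccurrence_counts, word_matrix) are hypothetical, pairs are assumed to be adjacent words only, and raw frequencies stand in for the statistic scores SenseClusters can use.

```python
from collections import Counter

def context_window(tokens, target, w):
    """Hypothetical helper: keep up to w tokens on each side of tokens[target].

    A small w gives local features (like --scope W); a very large w
    (e.g. 10000) keeps the whole context, i.e. global features.
    """
    return tokens[max(0, target - w):target + w + 1]

def bigram_counts(words):
    """Ordered pairs: (a, b) is counted only when a immediately precedes b."""
    return Counter(zip(words, words[1:]))

def cooccurrence_counts(words):
    """Unordered pairs: (a, b) and (b, a) are merged under one sorted key."""
    return Counter(tuple(sorted(pair)) for pair in zip(words, words[1:]))

def word_matrix(counts, vocab, ordered):
    """Build MAT[i][j] from pair counts over a fixed vocabulary."""
    mat = [[0] * len(vocab) for _ in vocab]
    for i, wi in enumerate(vocab):
        for j, wj in enumerate(vocab):
            # Co-occurrence lookups ignore order, so the matrix is symmetric.
            key = (wi, wj) if ordered else tuple(sorted((wi, wj)))
            mat[i][j] = counts[key]  # Counter returns 0 for unseen pairs
    return mat

tokens = "the bank of the river bank".split()
words = context_window(tokens, 1, 2)  # ['the', 'bank', 'of', 'the']
vocab = ["bank", "of", "the"]
big = word_matrix(bigram_counts(words), vocab, ordered=True)
cooc = word_matrix(cooccurrence_counts(words), vocab, ordered=False)
print(big[2][0], big[0][2])    # prints: 1 0  ("the bank" seen, "bank the" not)
print(cooc[2][0], cooc[0][2])  # prints: 1 1  (order ignored)
```

With the bigram matrix, MAT["the"]["bank"] and MAT["bank"]["the"] differ; with co-occurrence pairs, they are the same count, which is the order distinction described above.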
Thanks for writing,
Amruta

_______________________________________________
senseclusters-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
