Hello!

Just a few more clarifications from my side.

> > I find this project very interesting. But I'd like to check how you define 
> > contexts.

There are 2 ways to define context:

1. You can specify the --scope W option in the wrappers, or use windower,
to select only W words around the target word.

2. By not specifying --scope W, or by running windower with a very large
window size like 10,000, you essentially use the entire context data
between the <context> and </context> tags as the context of the instance.

Case 1 will select and match local features within W positions of
the target word, while case 2 will use global features.

In both cases, W words on both sides (left and right) of the target
word are used.
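
To make the two cases concrete, here is a minimal Python sketch of the
idea. It is only an illustration and not the actual windower code: the
function name window, the parameter w and the sample sentence are made
up for this example, and it assumes the context is already tokenized
and that the target word itself is kept in the window.

```python
def window(tokens, target_index, w=None):
    """Return the tokens within w positions of the target, on both sides.

    w=None keeps the whole context (the "global" case 2).
    Hypothetical helper for illustration, not part of SenseClusters.
    """
    if w is None:
        return list(tokens)
    start = max(0, target_index - w)          # do not run past the left edge
    return tokens[start:target_index + w + 1]  # slicing clips the right edge


context = "the river bank was flooded after heavy rain fell overnight".split()
target = context.index("bank")

local_ctx = window(context, target, w=2)   # case 1: W = 2 words on each side
global_ctx = window(context, target)       # case 2: entire context
```

With w=2 only the two words to the left and right of "bank" survive,
so features are matched locally; with no w the full context is kept.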

> > Am I right in thinking that the context "features" which are used to construct 
> > context vectors and then similarity matrices are bigram word pairs only, 

In order1, the supported features are Ngrams, such as unigrams and
bigrams of words within some flexible window as you point out below,
and also co-occurrences like 1st order and 2nd order ...

In order2, we currently support 2 types of features:
1. bigrams
2. co-occurrence pairs

However, each has a different interpretation!

With bigram features, the word matrix will hold bigram scores,
i.e. MAT[i][j] shows the frequency/statistical score of the bigram
WORDi<>WORDj.

With co-occurrence features in order2, the word order is ignored,
i.e. MAT[i][j] shows the frequency/statistical score of the pair
WORDi<=>WORDj in either order, meaning WORDi can follow or precede WORDj.
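
The difference between the two interpretations can be sketched in a few
lines of Python. This is a rough illustration under simplifying
assumptions, not the SenseClusters implementation: it uses raw
frequencies rather than a statistical score, it only pairs immediately
adjacent words, and the function names and the toy sentence are
invented for the example.

```python
from collections import Counter


def bigram_counts(tokens):
    """Ordered counts: key (a, b) means a is immediately followed by b."""
    return Counter(zip(tokens, tokens[1:]))


def cooccurrence_counts(tokens):
    """Unordered counts: (a, b) and (b, a) are stored under one sorted key."""
    return Counter(tuple(sorted(pair)) for pair in zip(tokens, tokens[1:]))


tokens = "new york is not york new".split()

bi = bigram_counts(tokens)        # MAT[i][j] in the bigram sense
co = cooccurrence_counts(tokens)  # MAT[i][j] in the co-occurrence sense
```

Here bi keeps ("new", "york") and ("york", "new") as two separate
entries, while co merges them into a single entry, which is why the
co-occurrence matrix can be read as symmetric.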

> > I think I do something very similar, but I found for my purposes it has been 
> > necessary to have both the preceding and following context in a single 

Both bigram and co-occurrence features are computed by considering
context on both sides of the target word. The only difference is that
with bigrams the order of the words is important, while co-occurrence
pairs do not show any particular order of the component words.

Please let us know if this doesn't answer your question or leaves any 
further doubts. 

Thanks for writing, 
Amruta



_______________________________________________
senseclusters-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
