Hi Curtis,

> But I can easily adjust the algorithm so it doesn't output pairs that are
> further than some N distance, if you desire.
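For concreteness, here is a minimal sketch (in Python, with hypothetical names -- not the actual algorithm's code) of what counting word pairs under such an N-distance limitation might look like:

```python
from collections import Counter

def count_pairs(sentences, max_dist):
    """Count ordered word pairs (left_word, right_word) whose positional
    distance within a sentence is at most max_dist.
    All names here are hypothetical, for illustration only."""
    counts = Counter()
    for words in sentences:
        for i, left in enumerate(words):
            # only look ahead up to max_dist positions to the right
            for j in range(i + 1, min(i + 1 + max_dist, len(words))):
                counts[(left, words[j])] += 1
    return counts

# With max_dist=2, "the" pairs with "quick" and "brown" but not "fox"
pairs = count_pairs([["the", "quick", "brown", "fox"]], max_dist=2)
```

Experimenting with different values of N is then just a matter of re-running with a different max_dist.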
I am pretty sure we are going to want to experiment with that.... What will make sense is to do some experiments on a modest-sized corpus to try to find a methodology that "seems to work", and then repeat that methodology on a larger corpus. So, as part of the initial experimentation phase, we will probably want to try the "N distance limitation" with a few values of N.

Also to note: for later phases in the process, we will want to load "tagged sentences" rather than just sentences. In a tagged sentence, each word will be associated with one or more category labels. [But I don't envision more than, say, 10 labels associated with an individual word... often it will be 1 or 2.] We will then want to update counts based on category labels as well as words.

(The mechanism described in the previous paragraph is relevant for a workflow in which one is doing the statistical learning process for clustering/disambiguation partly outside the Atomspace, which is what Ruiting and I aim to try first.... It's not relevant if one is doing this statistical learning process within the Atomspace, which is what Linas is about to try...)

So a couple of next steps would be:

1) Make code to export sparse feature vectors for words, where the vector for word W has two entries for each other word V: one entry for V on the left of W, and one entry for V on the right of W. For instance, the entry in W's feature vector corresponding to "V on the left of W" is based on the total weight of links pointing from V to W (with V on the left) in the spanning-tree parses that Linas's code finds.

2) Try to run Shujing's pattern miner on the collection of MST parses in the Atomspace. This typically requires some fiddling with the templates and parameters for the pattern miner... Shujing is good at giving practical guidance on this, and Bitseat and Tensae are not bad at it either by this point.
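A rough sketch of what the sparse-vector export in step (1) could look like, assuming the MST parse links have been accumulated as (left_word, right_word, weight) triples (the function and variable names here are hypothetical):

```python
from collections import defaultdict

def build_feature_vectors(links):
    """Build sparse left/right feature vectors from weighted parse links.

    `links` is an iterable of (left_word, right_word, weight) triples,
    e.g. link weights totaled over a collection of spanning-tree parses.
    For word W, the sparse vector maps ("L", V) to the total weight of
    links with V on W's left, and ("R", V) to the total weight of links
    with V on W's right. (A sketch under assumed input format, not the
    actual export code.)"""
    vectors = defaultdict(lambda: defaultdict(float))
    for left, right, w in links:
        vectors[right][("L", left)] += w   # `left` sits to the left of `right`
        vectors[left][("R", right)] += w   # `right` sits to the right of `left`
    return vectors

links = [("the", "dog", 1.5), ("dog", "barks", 2.0), ("the", "dog", 0.5)]
vecs = build_feature_vectors(links)
```

Here vecs["dog"] ends up with weight 2.0 on ("L", "the") and 2.0 on ("R", "barks"), i.e. the two entries per co-occurring word described above.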
The patterns mined by the pattern miner can be used to make more sophisticated feature vectors to export, which can be tried as an alternative to the simpler ones described in step (1) above. For these more sophisticated feature vectors, a library of significant patterns in the collection of MST parses will be found, and W's feature vector will have an entry corresponding to "W occurs in position k of pattern i in the library" (for each i and k).

We can discuss all this on Monday when Ruiting and I are back in HK; I'm just putting it down in an email while it's fresh in my mind...

-- Ben

Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the boundary, I am the peak." -- Alexander Scriabin
