Hey Ben, you wrote:

> So as part of the initial experimentation phase, we will probably want to
> try the "N distance limitation" with a few values of N...
This now works, and I did some testing to check both performance and resource usage for various values of N (which I am calling the Pair Distance Limit below, per the notes in opencog/nlp/learn/observe_cc_notes.txt <https://github.com/opencog/opencog#diff-f43bf34f292bc591f2aefa1f9c12dda1>):

Pair Distance      Atoms               Total   Observe   Ops per
Limit              Added       RAM     Time    Time      Second
----------------   ---------   ------  ------  -------   -------
1                    164,574   0.167G   13s      4s       1,483
2                    334,716   0.297G   16s      7s         848
3                    493,364   0.410G   20s     11s         539
6                    896,482   0.715G   35s     26s         228
12                 1,473,987   1.084G   47s     38s         156
All pairs          2,690,934   1.949G   87s     78s          76
Noop (text only)           0   0.044G    9s      0s         N/A

NOTE: Observe Time = Total Time - Noop Time

I submitted pull request #2766 <https://github.com/opencog/opencog/pull/2766> for this, to make it easier for Ruiting to pull down my changes and try this tomorrow. Now that the plumbing is in place, it's pretty easy to add anything else you'd need for this.

On Fri, Jun 9, 2017 at 10:23 AM, Ben Goertzel <[email protected]> wrote:

> Hi Curtis,
>
> > But I can easily adjust the algorithm so it doesn't output pairs that are
> > further than some N distance, if you desire.
>
> I am pretty sure we are going to want to experiment with that....
> What will make sense is to do some experiments on a modest-sized
> corpus to try to find a methodology that "seems to work", and then
> repeat that methodology on a larger corpus... So as part of the
> initial experimentation phase, we will probably want to try the "N
> distance limitation" with a few values of N...
>
> Also to note: for later phases in the process, we will want to
> load "tagged sentences" rather than just sentences. In a tagged
> sentence, each word will be associated with one or more category
> labels. [But I don't envision more than, say, 10 labels associated
> with an individual word... often it will be 1 or 2.] We will then
> want to update counts based on category labels as well as words.
>
> (The mechanism described in the previous paragraph is relevant for a
> workflow in which one is doing the statistical learning process for
> clustering/disambiguation partly outside the Atomspace, which is what
> Ruiting and I aim to try first.... It's not relevant if one is doing
> this statistical learning process within the Atomspace, which is what
> Linas is about to try...)
>
> So a couple of next steps will be:
>
> 1) Make code to export sparse feature vectors for words, where the
> vector for word W has two entries for each other word V: one entry for V
> on the left of W, and one entry for V on the right of W. For
> instance: the entry in W's feature vector corresponding to "V on the
> left of W" is based on the total weight of links pointing from V to W
> (with V on the left) in the spanning-tree parses that Linas's code
> finds...
>
> 2) Try to run Shujing's pattern miner on the collection of MST parses
> in the Atomspace. This typically requires some fiddling with the
> templates and parameters for the pattern miner... Shujing is good at
> giving practical guidance on this, and Bitseat and Tensae are not bad at
> it either by this point...
>
> The patterns mined by the pattern miner can be used to make more
> sophisticated feature vectors to export, which can be tried as an
> alternative to the simpler ones described in step (1) above. In
> these more sophisticated feature vectors, a library of significant
> patterns in the collection of MST parses will be found, and W's
> feature vector will have an entry corresponding to "W occurs in
> position k of pattern i in the library" (for each i and k).
>
> We can discuss all this on Monday when Ruiting and I are back in HK;
> I'm just putting it down in an email while it's fresh in my mind...
>
> -- Ben
>
> thanks
> ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak."
> -- Alexander Scriabin
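For concreteness, here is a minimal Python sketch of what the Pair Distance Limit in the table above does. The actual observe code lives in Scheme under opencog/nlp/learn; the function name, Counter-based representation, and `None` convention for "all pairs" here are my own illustration, not the real API:

```python
from collections import Counter
from itertools import combinations

def observe_pairs(words, pair_distance_limit=None):
    """Count ordered word pairs (left, right) in a sentence, skipping any
    pair whose distance (difference in word positions) exceeds the limit.
    A limit of None counts all pairs, as in the "All pairs" row above."""
    counts = Counter()
    for i, j in combinations(range(len(words)), 2):
        if pair_distance_limit is None or (j - i) <= pair_distance_limit:
            counts[(words[i], words[j])] += 1
    return counts

sentence = "the quick brown fox jumps".split()
print(sum(observe_pairs(sentence, 1).values()))  # 4 (adjacent pairs only)
print(sum(observe_pairs(sentence).values()))     # 10 (all C(5,2) pairs)
```

This makes the table's scaling intuitive: the number of pairs per sentence grows roughly linearly in N for small N, but quadratically in sentence length once N exceeds typical sentence lengths, which matches the jump in atoms and RAM for the "All pairs" row.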
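Ben's step (1) — sparse feature vectors with separate left-of-W and right-of-W entries — could be sketched like this. The input format (a flat list of weighted left/right links, as might be accumulated from the MST parses) and all names are hypothetical illustrations, not the planned export format:

```python
from collections import defaultdict

def build_feature_vectors(weighted_links):
    """Build sparse feature vectors from (left_word, right_word, weight)
    links. The vector for word W gets two entries per co-occurring word V:
    ('L', V) for V on W's left and ('R', V) for V on W's right, each
    accumulating the total link weight, as described in step (1)."""
    vectors = defaultdict(lambda: defaultdict(float))
    for left, right, weight in weighted_links:
        vectors[right][('L', left)] += weight   # left sits on the left of right
        vectors[left][('R', right)] += weight   # right sits on the right of left
    return vectors

links = [("the", "cat", 2.0), ("cat", "sat", 1.5), ("the", "cat", 1.0)]
vecs = build_feature_vectors(links)
print(dict(vecs["cat"]))  # {('L', 'the'): 3.0, ('R', 'sat'): 1.5}
```

Keeping the vectors as nested dicts (rather than dense arrays) matters here, since the vocabulary is large and each word co-occurs with only a tiny fraction of it.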
