Hey Ben, you wrote:

> So as part of the initial experimentation phase, we will probably want to
> try the "N distance limitation" with a few values of N...
This now works, and I did some testing to check both performance and resource usage for various values of N (which I am calling the Pair Distance Limit below, per the notes in opencog/nlp/learn/observe_cc_notes.txt <https://github.com/opencog/opencog#diff-f43bf34f292bc591f2aefa1f9c12dda1>):

Pair Distance      Atoms               Total   Observe   Ops per
Limit              Added       RAM     Time    Time      Second
----------------   ---------   ------  ------  -------   -------
1                    164,574   0.167G   13s      4s       1,483
2                    334,716   0.297G   16s      7s         848
3                    493,364   0.410G   20s     11s         539
6                    896,482   0.715G   35s     26s         228
12                 1,473,987   1.084G   47s     38s         156
All pairs          2,690,934   1.949G   87s     78s          76
Noop (text only)           0   0.044G    9s      0s         N/A

NOTE: Observe Time = Total Time - Noop Time

I submitted pull request #2766 <https://github.com/opencog/opencog/pull/2766> for this, to make it easier for Ruiting to pull down my changes and try this tomorrow. Now that the plumbing is in place, it's pretty easy to add anything else you'd need for this.

On Fri, Jun 9, 2017 at 10:23 AM, Ben Goertzel <[email protected]> wrote:

> Hi Curtis,
>
> > But I can easily adjust the algorithm so it doesn't output pairs that are
> > further than some N distance, if you desire.
>
> I am pretty sure we are going to want to experiment with that....
> What will make sense is to do some experiments on a modest-sized
> corpus to try to find a methodology that "seems to work", and then
> repeat that methodology on a larger corpus... So as part of the
> initial experimentation phase, we will probably want to try the "N
> distance limitation" with a few values of N...
>
> Also to note: for later phases in the process, we will want to
> load "tagged sentences" rather than just sentences. In a tagged
> sentence, each word will be associated with one or more category
> labels. [But I don't envision more than, say, 10 labels associated
> with an individual word... often it will be 1 or 2.] We will then
> want to update counts based on category labels as well as words.
>
> (The mechanism described in the previous paragraph is relevant for a
> workflow in which one is doing the statistical learning process for
> clustering/disambiguation partly outside the Atomspace, which is what
> Ruiting and I aim to try first.... It's not relevant if one is doing
> this statistical learning process within the Atomspace, which is what
> Linas is about to try...)
>
> So a couple of next steps will be:
>
> 1) Make code to export sparse feature vectors for words, where the
> vector for word W has two entries for each other word V: one entry for V
> on the left of W, and one entry for V on the right of W. For
> instance: the entry in W's feature vector corresponding to "V on the
> left of W" is based on the total weight of links pointing from V to W
> (with V on the left) in the spanning-tree parses that Linas's code
> finds...
>
> 2) Try to run Shujing's pattern miner on the collection of MST parses
> in the Atomspace. This typically requires some fiddling with the
> templates and parameters for the pattern miner... Shujing is good at
> giving practical guidance on this, and Bitseat and Tensae are not bad at
> it either by this point...
>
> The patterns mined by the pattern miner can be used to make more
> sophisticated feature vectors to export, which can be tried as an
> alternative to the simpler ones described in step (1) above. In
> these more sophisticated feature vectors, a library of significant
> patterns in the collection of MST parses will be found, and W's
> feature vector will have an entry corresponding to "W occurs in
> position k of pattern i in the library" (for each i and k).
>
> We can discuss all this on Monday when Ruiting and I are back in HK;
> I'm just putting it down in an email while it's fresh in my mind...
>
> -- Ben
>
> thanks
> ben
>
> --
> Ben Goertzel, PhD
> http://goertzel.org
>
> "I am God! I am nothing, I'm play, I am freedom, I am life. I am the
> boundary, I am the peak."
> -- Alexander Scriabin
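For concreteness, here is a minimal Python sketch of what the Pair Distance Limit in the table above does. The actual observe code lives in Scheme under opencog/nlp/learn; the function name, Counter-based representation, and `None` convention for "all pairs" here are my own illustration, not the real API:

```python
from collections import Counter
from itertools import combinations

def observe_pairs(words, pair_distance_limit=None):
    """Count ordered word pairs (left, right) in a sentence, skipping any
    pair whose distance (difference in word positions) exceeds the limit.
    A limit of None counts all pairs, as in the "All pairs" row above."""
    counts = Counter()
    for i, j in combinations(range(len(words)), 2):
        if pair_distance_limit is None or (j - i) <= pair_distance_limit:
            counts[(words[i], words[j])] += 1
    return counts

sentence = "the quick brown fox jumps".split()
print(sum(observe_pairs(sentence, 1).values()))  # 4 (adjacent pairs only)
print(sum(observe_pairs(sentence).values()))     # 10 (all C(5,2) pairs)
```

This makes the table's scaling intuitive: the number of pairs per sentence grows roughly linearly in N for small N, but quadratically in sentence length once N exceeds typical sentence lengths, which matches the jump in atoms and RAM for the "All pairs" row.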
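Ben's step (1) — sparse feature vectors with separate left-of-W and right-of-W entries — could be sketched like this. The input format (a flat list of weighted left/right links, as might be accumulated from the MST parses) and all names are hypothetical illustrations, not the planned export format:

```python
from collections import defaultdict

def build_feature_vectors(weighted_links):
    """Build sparse feature vectors from (left_word, right_word, weight)
    links. The vector for word W gets two entries per co-occurring word V:
    ('L', V) for V on W's left and ('R', V) for V on W's right, each
    accumulating the total link weight, as described in step (1)."""
    vectors = defaultdict(lambda: defaultdict(float))
    for left, right, weight in weighted_links:
        vectors[right][('L', left)] += weight   # left sits on the left of right
        vectors[left][('R', right)] += weight   # right sits on the right of left
    return vectors

links = [("the", "cat", 2.0), ("cat", "sat", 1.5), ("the", "cat", 1.0)]
vecs = build_feature_vectors(links)
print(dict(vecs["cat"]))  # {('L', 'the'): 3.0, ('R', 'sat'): 1.5}
```

Keeping the vectors as nested dicts (rather than dense arrays) matters here, since the vocabulary is large and each word co-occurs with only a tiny fraction of it.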
