Hi Curtis,

> But I can easily adjust the algorithm so it doesn't output pairs that are
> further than some N distance, if you desire.
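For concreteness, here is a minimal sketch (in Python, with hypothetical names -- not the actual algorithm's code) of what counting word pairs under such an N-distance limitation might look like:

```python
from collections import Counter

def count_pairs(sentences, max_dist):
    """Count ordered word pairs (left_word, right_word) whose positional
    distance within a sentence is at most max_dist.
    All names here are hypothetical, for illustration only."""
    counts = Counter()
    for words in sentences:
        for i, left in enumerate(words):
            # only look ahead up to max_dist positions to the right
            for j in range(i + 1, min(i + 1 + max_dist, len(words))):
                counts[(left, words[j])] += 1
    return counts

# With max_dist=2, "the" pairs with "quick" and "brown" but not "fox"
pairs = count_pairs([["the", "quick", "brown", "fox"]], max_dist=2)
```

Experimenting with different values of N is then just a matter of re-running with a different max_dist.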
I am pretty sure we are going to want to experiment with that.... What will make sense is to do some experiments on a modest-sized corpus to try to find a methodology that "seems to work", and then repeat that methodology on a larger corpus. So, as part of the initial experimentation phase, we will probably want to try the "N distance limitation" with a few values of N.

Also to note: for later phases in the process, we will want to load "tagged sentences" rather than just sentences. In a tagged sentence, each word will be associated with one or more category labels. [But I don't envision more than, say, 10 labels associated with an individual word... often it will be 1 or 2.] We will then want to update counts based on category labels as well as words.

(The mechanism described in the previous paragraph is relevant for a workflow in which one is doing the statistical learning process for clustering/disambiguation partly outside the Atomspace, which is what Ruiting and I aim to try first.... It's not relevant if one is doing this statistical learning process within the Atomspace, which is what Linas is about to try...)

So a couple of next steps would be:

1) Make code to export sparse feature vectors for words, where the vector for word W has two entries for each other word V: one entry for V on the left of W, and one entry for V on the right of W. For instance, the entry in W's feature vector corresponding to "V on the left of W" is based on the total weight of links pointing from V to W (with V on the left) in the spanning-tree parses that Linas's code finds.

2) Try to run Shujing's pattern miner on the collection of MST parses in the Atomspace. This typically requires some fiddling with the templates and parameters for the pattern miner... Shujing is good at giving practical guidance on this, and Bitseat and Tensae are not bad at it either by this point.
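A rough sketch of what the sparse-vector export in step (1) could look like, assuming the MST parse links have been accumulated as (left_word, right_word, weight) triples (the function and variable names here are hypothetical):

```python
from collections import defaultdict

def build_feature_vectors(links):
    """Build sparse left/right feature vectors from weighted parse links.

    `links` is an iterable of (left_word, right_word, weight) triples,
    e.g. link weights totaled over a collection of spanning-tree parses.
    For word W, the sparse vector maps ("L", V) to the total weight of
    links with V on W's left, and ("R", V) to the total weight of links
    with V on W's right. (A sketch under assumed input format, not the
    actual export code.)"""
    vectors = defaultdict(lambda: defaultdict(float))
    for left, right, w in links:
        vectors[right][("L", left)] += w   # `left` sits to the left of `right`
        vectors[left][("R", right)] += w   # `right` sits to the right of `left`
    return vectors

links = [("the", "dog", 1.5), ("dog", "barks", 2.0), ("the", "dog", 0.5)]
vecs = build_feature_vectors(links)
```

Here vecs["dog"] ends up with weight 2.0 on ("L", "the") and 2.0 on ("R", "barks"), i.e. the two entries per co-occurring word described above.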
The patterns mined by the pattern miner can be used to make more sophisticated feature vectors to export, which can be tried as an alternative to the simpler ones described in step (1) above. For these more sophisticated feature vectors, a library of significant patterns in the collection of MST parses will be found, and W's feature vector will have an entry corresponding to "W occurs in position k of pattern i in the library" (for each i and k).

We can discuss all this on Monday when Ruiting and I are back in HK; I'm just putting it down in an email while it's fresh in my mind...

-- Ben

Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the boundary, I am the peak." -- Alexander Scriabin
