On Tue, Apr 16, 2019 at 12:02 AM Nil Geisweiller <[email protected]> wrote:
> On 4/16/19 6:52 AM, Linas Vepstas wrote:
> > What about PLN? Well, today's PLN, built on the pattern matcher, will
> > run thousands of CPU cycles and then do a small handful of floating-point
> > ops.

I'm making several "meta" claims, which perhaps I should be more specific about.

First claim:

* The reason that deep learning has been so effective and successful is that they found a way of avoiding useless calculations. This is by a combination of two tricks: backpropagation and dimensional reduction.

* Backpropagation means that an order of magnitude (factors of N or N log N or N^2) of useless floating-point computations are eliminated, and only the useful, non-redundant calculations are kept.

* Dimensional reduction is the weak spot, the Achilles heel of NN's. It reduces the problem space to a size where results can be obtained relatively quickly; however, the reduced problem space is too small for human-level reasoning and language.

* Using highly-sparse matrix math instead of dimensional reduction preserves everything important, while eliminating the weaknesses of dimensional reduction. The calculation space remains small, but the representation space remains large/huge. NN's have small computation spaces (that's good, it makes them fast) but also small representation spaces (really bad, it destroys structure).

* I'm concerned that the current PLN architecture, of performing complex graph searches (using integer pattern-matching code) followed by infrequent numeric (floating-point) work, is very inefficient. That is, when comparing to NN's, the search-and-traversal is a waste of time and effort, and only the floating-point computations matter (are meaningful). So I'm wondering if the PLN algo can be reformulated to be more backpropagation-like, which means the math becomes a kind of "inner loop", while the graph-traversal parts of it become the "outer loop". This last sentence is extremely imprecise: it is meant to be inspirational, not practical.
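To make the backpropagation point concrete, here's a toy sketch (all names mine, nothing to do with any real NN library): reverse-mode differentiation gets the gradient with respect to all N inputs from one backward pass that reuses the forward intermediates, whereas one-at-a-time differentiation repeats the whole forward pass N times -- that's the factor-of-N of redundant float work being eliminated.

```python
def forward(xs):
    """Toy network: y = (sum of x_i)^2.  Returns y and the
    intermediate s, so the backward pass can reuse it."""
    s = sum(xs)          # N additions
    return s * s, s

def grad_reverse(xs):
    """One backward pass, reusing the forward intermediate:
    dy/dx_i = 2s for every i, computed once."""
    _, s = forward(xs)
    return [2.0 * s] * len(xs)

def grad_naive(xs, eps=1e-6):
    """N separate forward passes (finite differences):
    O(N^2) additions instead of O(N)."""
    y0, _ = forward(xs)
    g = []
    for i in range(len(xs)):
        bumped = list(xs)
        bumped[i] += eps
        yi, _ = forward(bumped)
        g.append((yi - y0) / eps)
    return g

xs = [1.0, 2.0, 3.0]
print(grad_reverse(xs))   # [12.0, 12.0, 12.0]
print(grad_naive(xs))     # approximately the same, with N times the work
```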
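And a toy sketch of the sparse-matrix bullet (pure Python, my own made-up representation; a real implementation would use something like CSR storage): the representation space is N-by-N and can be made as huge as you like, but the float work is bounded by the number of non-zero entries, not by N^2.

```python
# Representation space: N*N possible entries -- huge.
N = 100_000

# Highly sparse "matrix": only the non-zero entries are stored.
sparse = {(0, 1): 0.5, (1, 2): 0.25, (2, 0): 0.125}

def spmv(matrix, vec):
    """Sparse matrix-vector multiply.  The multiply-add count
    scales with the number of stored entries, not with N*N."""
    out = {}
    ops = 0
    for (i, j), w in matrix.items():
        if j in vec:
            out[i] = out.get(i, 0.0) + w * vec[j]
            ops += 1   # one multiply-add per stored entry, at most
    return out, ops

vec = {1: 2.0, 2: 4.0}       # also sparse
result, ops = spmv(sparse, vec)
print(result, ops)           # float work bounded by nnz = 3, not N*N
```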
To rephrase: PLN (and symbolic-AI approaches in general) preserves the large/huge representation space. That's good. Algorithmically, my gut intuition is that traditional symbolic-AI algos (such as PLN) are extremely CPU-inefficient: they spend too much time graph-searching and graph-traversing, and not enough time actually computing things (i.e. multiplying and adding floats).

The attempt with the ultra-super-sparse matrix code is to retain the giant representational spaces of symbolic AI, while minimizing the graph-search effort. This is done by working with a single, small, simple, fixed graph; i.e. by making multiply-add the inner loop, where the graph is held fixed, and making the outer loop be the exploration of different graph shapes, of how bigger graphs are assembled from smaller parts.

I think I have a fairly clear conception of how this works for language learning; I do not yet have an equivalent conception for reasoning. However, it's important to obtain this, and for more reasons than one: I think it's a better theoretical foundation, but also, it's exactly what IPU-style machines are very efficient at computing.

The "sheaves" paper is trying to explain exactly how to exchange the inner and outer loops. Viz., graphs are the slowly-changing things, the combinations of which sit in the outer loop (classical symbolic AI), and the weights/probabilities are rapidly updated in the inner loop. This works because everything looks regular, uniform at the local level: a vertex and its nearest neighbors all look "the same", simply because they are small, simple, tiny. Work with these small, tiny components *before* they are all joined up to form some mega-graph.

This is the same trick that NN deep learning is using, except that existing NN deep-learning algos also collapse/blur/average away the large-scale structure (which is fundamentally wrong). NN's do this because they do not know how to identify and manage that large-scale structure. They're blind to it.
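Here's roughly what I mean by the inner/outer loop exchange, as a toy sketch (everything here is hypothetical, my own stand-in names; the real objective would be some likelihood or truth-value computation, not this placeholder): the outer loop does the slow symbolic work of proposing graph shapes assembled from small parts, and the inner loop does nothing but regular multiply-add weight updates on a graph that is held fixed.

```python
def inner_loop(edges, weights, steps=200, lr=0.1):
    """Fast numeric loop: the graph is held fixed, only the floats
    change.  Every vertex looks 'the same' locally: a node and its
    nearest neighbors.  Toy update: pull neighbors toward agreement."""
    for _ in range(steps):
        for (a, b) in edges:
            delta = lr * (weights[b] - weights[a])
            weights[a] += delta   # one multiply-add per edge
            weights[b] -= delta
    return weights

def score(weights):
    """Toy objective (stand-in for a real likelihood): prefer
    weights that have come into agreement."""
    mean = sum(weights.values()) / len(weights)
    return -sum((w - mean) ** 2 for w in weights.values())

# Outer loop: explore different graph shapes assembled from small parts.
candidate_graphs = [
    [("a", "b"), ("b", "c")],              # chain
    [("a", "b"), ("b", "c"), ("c", "a")],  # triangle
]
init = {"a": 1.0, "b": 0.0, "c": 0.5}
best = None
for edges in candidate_graphs:
    weights = inner_loop(edges, dict(init))
    s = score(weights)
    if best is None or s > best[0]:
        best = (s, edges)
print(best[1])
```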
The weight-vector projections in NN's mash together, average together any/all relationships that are more than two nearest-neighbor distances apart.

-- Linas

-- 
cassette tapes - analog TV - film cameras - you
