Hi Luke, (BTW, my middle name, my "Christian name", is Lukas -- Luke; Linas is my "pagan name" -- the linen plant: linseed oil, linoleum, linen cloth -- flax.) Anyway...
On Sun, Jan 17, 2021 at 9:43 PM Luke Peterson <[email protected]> wrote:
> Hi Linas,
>
> Please share those PDFs. お願い 🙏 [please!]

See below. More important, perhaps, is having a conversation that exposes the issues, so that we can focus on what's important.

> I've been searching for a unifying theory that can encompass both formal
> reasoning systems and neural nets for some time, and I suspect you might
> have it. Or at the very least you're much closer than I am.
>
> My project (Hippocampus) was/is a value-flow network that could represent
> programs. Not unlike Atomspace. I opted for connectivity rules that I
> called "flux attenuation". That is to say each linkage could express a
> value between 0 and 1, and collectively the "conductance" of any path
> through the graph could be evaluated using Ohm's law. For example, a
> CondLink (in Atomese parlance) could be thought of as a semiconductor with
> the conductance changing depending on the value passing through it.

A foundational concern is "what is a network?" and "how do I represent it?" I take a "network" to be a synonym for a "graph-theoretical graph". There are many ways of representing a graph -- a list of vertices and edges, an adjacency matrix... Questions include: "Is it RAM-efficient?" "Is it CPU-efficient?" and "Is it generic enough to represent generic networks, ranging from disease-spread networks to mathematical proofs, from abstract syntax trees to bio-molecular networks?"

I've settled on something called "germs" or, more generally, "sheaves" -- a single vertex, with attached half-edges. (Amir: I cc'ed you because this is more-or-less the same thing as a link-grammar "disjunct".) A nice property of these is that they are uniformly composable: several connected germs still look like a germ; they resemble elements of syntactic structures; and more... all of this is spelled out in a collection of PDFs that go into the details. The word "sheaf" comes from algebraic topology.

Hmm...
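To make the "germ" idea concrete, here is a minimal toy sketch: a vertex with a bag of typed, directed half-edges ("connectors"), joined by mating a connector against an oppositely-directed one of the same type, link-grammar-style. All names and the API here are illustrative assumptions, not the actual AtomSpace sheaf code.

```python
# Toy sketch of a "germ": a single vertex plus unconnected half-edges.
# Composing two germs consumes one mated connector pair; the leftover
# half-edges mean the result is again a germ -- uniform composability.
from dataclasses import dataclass

@dataclass(frozen=True)
class Connector:
    type: str        # connector type, e.g. a link-grammar link type like "S" or "O"
    direction: str   # "+" or "-": which way the half-edge points

@dataclass
class Germ:
    label: str
    connectors: list  # the still-unconnected half-edges

def can_mate(a: Connector, b: Connector) -> bool:
    """Two half-edges join iff their types match and directions oppose."""
    return a.type == b.type and a.direction != b.direction

def compose(g1: Germ, g2: Germ) -> Germ:
    """Join the first matching connector pair; return the combined germ."""
    for a in g1.connectors:
        for b in g2.connectors:
            if can_mate(a, b):
                rest = [c for c in g1.connectors if c is not a] + \
                       [c for c in g2.connectors if c is not b]
                return Germ(label=f"({g1.label} {g2.label})", connectors=rest)
    raise ValueError("no matching connectors")

# Example: "John" offers an S+ half-edge; "ran" wants an S- and offers O+.
john = Germ("John", [Connector("S", "+")])
ran  = Germ("ran",  [Connector("S", "-"), Connector("O", "+")])
joined = compose(john, ran)
print(joined.label, [c.type + c.direction for c in joined.connectors])
# prints: (John ran) ['O+']
```

The point of the sketch is the closure property: `compose` returns another `Germ`, so partially-assembled graphs look just like the pieces they are built from.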
What order should you read these in? Maybe start with the blog entry first (below).

This PDF talks about RAM and CPU usage:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/ram-cpu.lyx

This one talks about tensors:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/tensors.pdf
The vectors (of neural nets) are a special case of tensors. But tensors are more-or-less the same thing as the graphical germs. This is not "deep" in and of itself, but it is widely and wildly under-appreciated; it is the cornerstone, the bedrock on which the foundations of neural nets vs. symbolic representations are laid.

This one talks about how seeds/germs can be used to represent different kinds of data structures, from lambda expressions to many other kinds of things:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/connectors-and-variables.pdf

The above PDFs are short, 5-10 pages long. Now for the long ones, which attempt to unify symbolic logic and neural nets. First, a blog entry (it's fairly short):
https://blog.opencog.org/2018/10/21/symbolic-and-neural-nets-two-sides-of-the-same-coin/
Hmm. Actually, that blog entry links to the big PDFs, so just go there.

> I had two reasons for this design choice: 1.) I wanted to be able to mix
> and match subgraphs with predictable results. (first and foremost HC is
> aiming to be a programming language) and 2.) I wanted to be able to apply
> Quasi-Monte-Carlo methods (low-discrepancy sequence sampling, e.g. Halton,
> etc.) to creating a probability-distribution-function, solving an entire
> graph.
>
> But, without the ability to tweak biases, HC networks are pretty much
> untrainable, compared with neural networks. At least I haven't been able
> to train them to do much beyond some very simple toy problems. Maybe I'm a
> bad teacher.
>
> HC allows NNs to be embedded inside HC nodes, e.g.
> a classifier could use
> a SoftMax to normalize the outputs into attenuation values, but it feels
> like I'm missing something important.

OK, so here's a sequence of remarks...

There are a large number of "training" and "learning" algorithms. Superficially, some of these seem to be very, very different from others. However, if you try to compare them, and ask "what properties do they have in common?", you gain a lot of insight. You gain even more if you disentangle the data representation from the algorithm.

For example, "sparse matrix factorization" has been the primary problem that Amazon, Facebook and Google try to solve. The matrix is "consumers" (for rows) by "product preferences" (for columns), with tens or hundreds of millions of consumers, and millions of products. (In some of the competitions, "movie reviews" were a stand-in for "products".) This can be approached as an old-school linear algebra problem, where you try to apply some smoothing functions, compute some eigenvectors, interpolate across missing values, etc. I'm tempted to call this "boring old linear algebra"; it is interesting only because a good solution will allow GOOG/AMZN/FB to make billions in profits by targeting me with the right kind of advertisements.

You can also change perspective. Here's a super-quick sketch. A graph can be represented as an adjacency matrix: a matrix whose rows and columns are vertices, with entries 0 or 1, where 1 means "there is an edge connecting these two". If you replace the 0/1 with fractional values, you can interpret this as a weighted graph. If the weights sum to 1.0, you can interpret it as a Markov matrix. Back in the day when Google had only two employees, the grand discovery was that you didn't have to put the entire matrix into RAM in order to solve it -- the Page-Brin algorithm solves the Markov matrix problem by paging into RAM only 0.00001% of the graph at a time.

But the consumer/product-preference matrix is different.
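Before getting to the sparse case, the graph-to-Markov-matrix view above can be sketched in a few lines. This is only the in-RAM toy version (the Page-Brin insight was precisely that you don't need it all in RAM); the graph and numbers are made up for illustration.

```python
# A tiny directed graph as an adjacency matrix: A[i][j] = 1 means an
# edge j -> i. Column-normalizing turns it into a Markov (stochastic)
# matrix; power iteration then finds the stationary distribution --
# the toy, in-memory version of the PageRank computation.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

M = A / A.sum(axis=0)      # each column now sums to 1: a Markov matrix

v = np.full(3, 1.0 / 3.0)  # start from the uniform distribution
for _ in range(100):
    v = M @ v              # one step of the Markov chain
print(v, v.sum())          # v is close to [4/9, 2/9, 1/3]; sum stays 1
```

Note the conserved quantity: every step preserves the sum of the entries of `v`, which is the "probability space" flavor that comes up again further down.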
Almost all entries in the matrix are unknown. (It's impossible for 10 million consumers to express preferences for a million products.) The matrix factorization is M = L D R, where L and R are sparse matrices, of dimension 1M x 1K, and D is a dense matrix of 1K by 1K. You can solve for D with standard neural-net techniques.

Curiously, the natural-language dictionaries in Link Grammar have the same structure. Many words have very similar grammatical structure (e.g. "all nouns" vs. "all adjectives") -- put these into L. The product D R encodes the grammar. If you look at the dictionary, you will spot that D R is itself factorized: R is a collection of "disjuncts" that commonly occur together (for example <b-minus> or <mv-coord> or <verb-rq-aux> in link-grammar), and the "guts" of the English language lie in D -- a dense (not sparse) matrix that interconnects word classes (e.g. "words.n.2.x", which is one of the elements of L, to <b-minus> and <mv-coord>, which are elements of R).

The task of grammar learning is to perform this factorization: find L and D and R. You can find L and R with Bayesian methods, and D with neural-net methods. Or you can use other algorithms, e.g. the algos from the consumer-preference movie-ranking competitions.

BTW: Link Grammar uses "disjuncts" and "costs", but these are really just "seeds/germs" from a "sheaf"; the sampling is a very sparse matrix. This is not a "deep" statement; it's "obvious" and "shallow" once you see it. But, for whatever reason, almost no one ever "sees it", and even then, they almost never leverage the power behind it. (The power being that you can move algorithms from one kind of data representation to another.)

OK... here is another change in perspective. When you read the neural-net papers, the vast majority of them talk about "cosine distance" or, like you, blurt out "SoftMax" without ever thinking about it. I believe this is a serious error. It is exposed by another change of perspective. Soo..
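The M = L D R factorization on a mostly-unknown matrix can be sketched at toy scale. This is plain gradient descent over the observed entries only, with tiny dimensions standing in for the 1M x 1K shapes, and D held fixed for simplicity; it is illustrative, not the actual competition algorithms.

```python
# Toy M = L D R factorization: 6 "consumers" x 5 "products", rank k = 2,
# with roughly half the entries unobserved. Only observed entries enter
# the loss. Dimensions and the training loop are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, k = 6, 5, 2

# Ground truth, low-rank by construction, then mostly masked out.
truth = rng.normal(size=(n_rows, k)) @ rng.normal(size=(k, n_cols))
mask = rng.random((n_rows, n_cols)) < 0.5   # ~half the entries observed

L = rng.normal(scale=0.1, size=(n_rows, k))  # tall-and-thin: "row classes"
D = np.eye(k)                                # small dense core (fixed here)
R = rng.normal(scale=0.1, size=(k, n_cols))  # short-and-wide: "column classes"

lr = 0.02
for _ in range(5000):
    err = mask * (L @ D @ R - truth)         # error on observed entries only
    L -= lr * err @ (D @ R).T                # gradient step on L
    R -= lr * (L @ D).T @ err                # gradient step on R
    # D could be trained too -- e.g. with neural-net methods, as in the text

rmse = np.sqrt((err ** 2).sum() / mask.sum())
print("observed-entry RMSE:", rmse)
```

The shapes carry the whole story: L and R are the thin sparse factors, D is the small dense core, and the unobserved entries never touch the loss.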
When one says "linear algebra", or says "vector" or "matrix", this immediately implies "Euclidean space". That is because vectors and matrices have transformational properties (1-tensor, 2-tensor) that are "natural" in Euclidean space. The cosine product is preserved in Euclidean space under rotations. (It's a zero-tensor; it's kind of like a Casimir invariant.) But who ever said that consumer preferences live in Euclidean space? Where does it say that Euclidean space is the natural setting for neural nets? Nowhere. Absolutely nowhere.

If you shift focus, and look at the "matrix" (the 2-tensor) not as a linear-algebra thing, but instead as a weighted graph, then you can see other possibilities. My favorite is mutual information (yes, some people are sick of me saying this over and over...). If you take a neural-net algo which has a cos(theta) = (v dot w) / |v||w|, where v and w are vectors, and replace it by MI = log[(v dot w) / (v dot va)(w dot wa)] (see skippy.pdf for details), you get not a Euclidean space, but a probability space (a simplex). The conserved quantity is not the scalar dot product, but instead the sum of probabilities. (This is not my discovery; it is something noted a decade or two ago by a handful of authors, and it has been ignored by 99% of the neural-net literature. Ignored, not rejected: this does not appear to be a conscious decision, but rather just forgotten/unobserved. Would it improve NN learning? Who knows?)

There are a bunch of these kinds of "changes of perspective"; I try to sketch them in the "skippy.pdf" paper. It's all "green field" development: fairly obvious, direct and immediate possibilities, angles and approaches that have been overlooked and remain unexplored for whatever reasons. I suspect the mainstream researchers have FOMO about other discoveries, and are too busy to explore some really quite promising algos. Kind of a sociology-of-science question.
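The cosine-vs-MI contrast can be made concrete on a tiny co-occurrence matrix. One caveat: I am reading "va"/"wa" in the formula as the all-ones vector, so that (v dot va) is just the marginal probability mass of v; that interpretation, and all the numbers below, are my own assumptions, not something taken from skippy.pdf.

```python
# Cosine similarity (Euclidean reading) vs. the MI-style score
#   MI = log2[(v . w) / ((v . ones)(w . ones))]
# applied to rows of a normalized joint-count matrix. The matrix is a
# made-up toy; the "va = all-ones" reading is an assumption.
import numpy as np

counts = np.array([[4.0, 1.0, 0.0],    # rows = words, cols = contexts
                   [3.0, 2.0, 0.0],
                   [0.0, 1.0, 5.0]])
P = counts / counts.sum()              # joint probabilities; P.sum() == 1
ones = np.ones(P.shape[1])

def cosine(v, w):
    """Euclidean cosine: invariant under rotations of the space."""
    return (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))

def mi_score(v, w):
    """MI-style score: (v @ ones) is the marginal probability of v."""
    return np.log2((v @ w) / ((v @ ones) * (w @ ones)))

v, w = P[0], P[1]
print("cosine:", cosine(v, w))
print("MI-ish:", mi_score(v, w))
```

The conserved quantity differs exactly as the text says: cosine lives with the Euclidean norm, while the MI score lives on the simplex, where what is preserved is that the probabilities in P sum to 1.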
And, just to ding Ben a bit: he keeps telling me that this is all "obvious and trivial", but if it's so obvious and trivial, then why has no one investigated any of these avenues yet? Why aren't people publishing results? Harrumph.

Anyway, this is where I am currently stalled. One issue is that there are so many different avenues and possibilities that look promising, it's hard to pick just one. And each avenue requires a lot of work to explore. It's hard to do single-handedly; collaborations are more effective.

-- Linas

> Thank you.
>
> -Luke
>
> > On Jan 17, 2021, at 3:21 PM, Linas Vepstas <[email protected]> wrote:
> > ...
> > So I've attempted to build the AtomSpace as a place to store and
> > connect-up axioms/sequents/assertions/rules with connections that are
> > probabilities/weights/fuzzy-logic values/etc. -- that is, numbers, or
> > number-like things: qubits/homogeneous spaces/etc.
> >
> > If you study neural networks, you can see that they are densely
> > connected networks, with nodes, and almost all weights between almost
> > all nodes being non-zero. If you study formal mathematical proofs, you
> > can see that they are extremely sparse networks, where every node is
> > connected to only 1 to 3 or 4 others, and where the weights are exactly
> > true/false/0/1. If you study natural language, and biochemistry, and
> > many other natural phenomena, you find a scale-free network that is
> > neither dense nor sparse, but somewhere in the middle.
> >
> > I am deeply interested in converting time-ordered expressions of that
> > network into the underlying structure (and back). So, by analogy: a
> > seismologist, all they have are some time-series recordings of Earth's
> > vibrations; from those they try to reconstruct the structure of the
> > Earth. I have a time series of words, and I want to reconstruct the
> > structure of the brain that wrote those words. And, once reconstructed,
> > what else might that "brain" have said?
> > Just like the Earth model: what other kinds of earthquakes might it
> > produce?
> >
> > I've got half-a-dozen PDFs, all 20 to 100 pages long, that spin out
> > each of the above paragraphs in great detail. I think they're
> > important, but I can't get anyone to read them :-) So it goes...

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
