Hi Luke, (BTW, my middle name, my "Christian name", is Lukas -- Luke; Linas is my "pagan name" -- the linen plant: linseed oil, linoleum, linen cloth -- flax.) Anyway...
On Sun, Jan 17, 2021 at 9:43 PM Luke Peterson <[email protected]> wrote:
> Hi Linas,
>
> Please share those PDFs. お願い 🙏 [please!]

See below. More important, perhaps, is having a conversation that exposes the issues, so that we can focus on what's important.

> I've been searching for a unifying theory that can encompass both formal
> reasoning systems and neural nets for some time, and I suspect you might
> have it. Or at the very least you're much closer than I am.
>
> My project (Hippocampus) was/is a value-flow network that could represent
> programs. Not unlike Atomspace. I opted for connectivity rules that I
> called "flux attenuation". That is to say each linkage could express a
> value between 0 and 1, and collectively the "conductance" of any path
> through the graph could be evaluated using Ohm's law. For example, a
> CondLink (in Atomese parlance) could be thought of as a semiconductor with
> the conductance changing depending on the value passing through it.

A foundational concern is "what is a network?" and "how do I represent it?" I take a "network" to be a synonym for a "graph-theoretical graph". There are many ways of representing a graph -- a list of vertices and edges, an adjacency matrix... Questions include: "Is it RAM-efficient?" "Is it CPU-efficient?" and "Is it generic enough to represent generic networks, ranging from disease-spread networks to mathematical proofs, from abstract syntax trees to bio-molecular networks?"

I've settled on something called "germs" or, more generally, "sheaves" -- a single vertex, with attached half-edges. (Amir: I cc'ed you because this is more-or-less the same thing as a link-grammar "disjunct".) A nice property of these is that they are uniformly composable: several connected germs still look like a germ; they resemble elements of syntactic structures; and more... all of this is spelled out in a collection of PDFs that go into the details. The word "sheaf" comes from algebraic topology.

Hmm...
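To make the "germ" idea concrete, here is a minimal toy sketch: a vertex with a bag of typed, directed half-edges ("connectors"), joined by mating a connector against an oppositely-directed one of the same type, link-grammar-style. All names and the API here are illustrative assumptions, not the actual AtomSpace sheaf code.

```python
# Toy sketch of a "germ": a single vertex plus unconnected half-edges.
# Composing two germs consumes one mated connector pair; the leftover
# half-edges mean the result is again a germ -- uniform composability.
from dataclasses import dataclass

@dataclass(frozen=True)
class Connector:
    type: str        # connector type, e.g. a link-grammar link type like "S" or "O"
    direction: str   # "+" or "-": which way the half-edge points

@dataclass
class Germ:
    label: str
    connectors: list  # the still-unconnected half-edges

def can_mate(a: Connector, b: Connector) -> bool:
    """Two half-edges join iff their types match and directions oppose."""
    return a.type == b.type and a.direction != b.direction

def compose(g1: Germ, g2: Germ) -> Germ:
    """Join the first matching connector pair; return the combined germ."""
    for a in g1.connectors:
        for b in g2.connectors:
            if can_mate(a, b):
                rest = [c for c in g1.connectors if c is not a] + \
                       [c for c in g2.connectors if c is not b]
                return Germ(label=f"({g1.label} {g2.label})", connectors=rest)
    raise ValueError("no matching connectors")

# Example: "John" offers an S+ half-edge; "ran" wants an S- and offers O+.
john = Germ("John", [Connector("S", "+")])
ran  = Germ("ran",  [Connector("S", "-"), Connector("O", "+")])
joined = compose(john, ran)
print(joined.label, [c.type + c.direction for c in joined.connectors])
# prints: (John ran) ['O+']
```

The point of the sketch is the closure property: `compose` returns another `Germ`, so partially-assembled graphs look just like the pieces they are built from.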
What order should you read these in? Maybe start with the blog entry first (below).

This PDF talks about RAM and CPU usage:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/ram-cpu.lyx

This one talks about tensors:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/tensors.pdf
The vectors (of neural nets) are a special case of tensors. But tensors are more-or-less the same thing as the graphical germs. This is not "deep" in and of itself, but it is widely and wildly under-appreciated; it is the cornerstone, the bedrock on which the foundations of neural nets vs. symbolic representations are laid.

This one talks about how seeds/germs can be used to represent different kinds of data structures, from lambda expressions to many other kinds of things:
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/connectors-and-variables.pdf

The above PDFs are short, 5-10 pages long. Now for the long ones, which attempt to unify symbolic logic and neural nets. First, a blog entry (it's fairly short):
https://blog.opencog.org/2018/10/21/symbolic-and-neural-nets-two-sides-of-the-same-coin/
Hmm. Actually, that blog entry links to the big PDFs, so just go there.

> I had two reasons for this design choice: 1.) I wanted to be able to mix
> and match subgraphs with predictable results. (first and foremost HC is
> aiming to be a programming language) and 2.) I wanted to be able to apply
> Quasi-Monte-Carlo methods (low-discrepancy sequence sampling, e.g. Halton,
> etc.) to creating a probability-distribution-function, solving an entire
> graph.
>
> But, without the ability to tweak biases, HC networks are pretty much
> untrainable, compared with neural networks. At least I haven't been able
> to train them to do much beyond some very simple toy problems. Maybe I'm a
> bad teacher.
>
> HC allows NNs to be embedded inside HC nodes, e.g.
> a classifier could use
> a SoftMax to normalize the outputs into attenuation values, but it feels
> like I'm missing something important.

OK, so here's a sequence of remarks...

There are a large number of "training" and "learning" algorithms. Superficially, some of these seem to be very, very different from others. However, if you try to compare them, and ask "what properties do they have in common?", you gain a lot of insight. You gain even more if you disentangle the data representation from the algorithm.

For example, "sparse matrix factorization" has been the primary problem that Amazon, Facebook and Google try to solve. The matrix is "consumers" (for rows) by "product preferences" (for columns), with tens or hundreds of millions of consumers, and millions of products. (In some of the competitions, "movie reviews" were a stand-in for "products".) This can be approached as an old-school linear algebra problem, where you try to apply some smoothing functions, compute some eigenvectors, interpolate across missing values, etc. I'm tempted to call this "boring old linear algebra"; it is interesting only because a good solution will allow GOOG/AMZN/FB to make billions in profits by targeting me with the right kind of advertisements.

You can also change perspective. Here's a super-quick sketch. A graph can be represented as an adjacency matrix: a matrix whose rows and columns are vertices, with entries 0 or 1, where 1 means "there is an edge connecting these two". If you replace the 0/1 with fractional values, you can interpret this as a weighted graph. If the weights sum to 1.0, you can interpret it as a Markov matrix. Back in the day when Google had only two employees, the grand discovery was that you didn't have to put the entire matrix into RAM in order to solve it -- the Page-Brin algorithm solves the Markov matrix problem by paging into RAM only 0.00001% of the graph at a time.

But the consumer/product-preference matrix is different.
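Before getting to the sparse case, the graph-to-Markov-matrix view above can be sketched in a few lines. This is only the in-RAM toy version (the Page-Brin insight was precisely that you don't need it all in RAM); the graph and numbers are made up for illustration.

```python
# A tiny directed graph as an adjacency matrix: A[i][j] = 1 means an
# edge j -> i. Column-normalizing turns it into a Markov (stochastic)
# matrix; power iteration then finds the stationary distribution --
# the toy, in-memory version of the PageRank computation.
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)

M = A / A.sum(axis=0)      # each column now sums to 1: a Markov matrix

v = np.full(3, 1.0 / 3.0)  # start from the uniform distribution
for _ in range(100):
    v = M @ v              # one step of the Markov chain
print(v, v.sum())          # v is close to [4/9, 2/9, 1/3]; sum stays 1
```

Note the conserved quantity: every step preserves the sum of the entries of `v`, which is the "probability space" flavor that comes up again further down.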
Almost all entries in the matrix are unknown. (It's impossible for 10 million consumers to express preferences for a million products.) The matrix factorization is M = L D R, where L and R are sparse matrices, of dimension 1M x 1K, and D is a dense matrix of 1K by 1K. You can solve for D with standard neural-net techniques.

Curiously, the natural-language dictionaries in Link Grammar have the same structure. Many words have very similar grammatical structure (e.g. "all nouns" vs. "all adjectives") -- put these into L. The product D R encodes the grammar. If you look at the dictionary, you will spot that D R is itself factorized: R is a collection of "disjuncts" that commonly occur together (for example <b-minus> or <mv-coord> or <verb-rq-aux> in link-grammar), and the "guts" of the English language lie in D -- a dense (not sparse) matrix that interconnects word classes (e.g. "words.n.2.x", which is one of the elements of L, to <b-minus> and <mv-coord>, which are elements of R).

The task of grammar learning is to perform this factorization: find L and D and R. You can find L and R with Bayesian methods, and D with neural-net methods. Or you can use other algorithms, e.g. the algos from the consumer-preference movie-ranking competitions.

BTW: Link Grammar uses "disjuncts" and "costs", but these are really just "seeds/germs" from a "sheaf"; the sampling is a very sparse matrix. This is not a "deep" statement; it's "obvious" and "shallow" once you see it. But, for whatever reason, almost no one ever "sees it", and even then, they almost never leverage the power behind it. (The power being that you can move algorithms from one kind of data representation to another.)

OK... here is another change in perspective. When you read the neural-net papers, the vast majority of them talk about "cosine distance" or, like you, blurt out "SoftMax" without ever thinking about it. I believe this is a serious error. It is exposed by another change of perspective. Soo..
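The M = L D R factorization on a mostly-unknown matrix can be sketched at toy scale. This is plain gradient descent over the observed entries only, with tiny dimensions standing in for the 1M x 1K shapes, and D held fixed for simplicity; it is illustrative, not the actual competition algorithms.

```python
# Toy M = L D R factorization: 6 "consumers" x 5 "products", rank k = 2,
# with roughly half the entries unobserved. Only observed entries enter
# the loss. Dimensions and the training loop are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, k = 6, 5, 2

# Ground truth, low-rank by construction, then mostly masked out.
truth = rng.normal(size=(n_rows, k)) @ rng.normal(size=(k, n_cols))
mask = rng.random((n_rows, n_cols)) < 0.5   # ~half the entries observed

L = rng.normal(scale=0.1, size=(n_rows, k))  # tall-and-thin: "row classes"
D = np.eye(k)                                # small dense core (fixed here)
R = rng.normal(scale=0.1, size=(k, n_cols))  # short-and-wide: "column classes"

lr = 0.02
for _ in range(5000):
    err = mask * (L @ D @ R - truth)         # error on observed entries only
    L -= lr * err @ (D @ R).T                # gradient step on L
    R -= lr * (L @ D).T @ err                # gradient step on R
    # D could be trained too -- e.g. with neural-net methods, as in the text

rmse = np.sqrt((err ** 2).sum() / mask.sum())
print("observed-entry RMSE:", rmse)
```

The shapes carry the whole story: L and R are the thin sparse factors, D is the small dense core, and the unobserved entries never touch the loss.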
When one says "linear algebra", or says "vector" or "matrix", this immediately implies "Euclidean space". That is because vectors and matrices have transformational properties (1-tensor, 2-tensor) that are "natural" in Euclidean space. The cosine product is preserved in Euclidean space under rotations. (It's a zero-tensor; it's kind of like a Casimir invariant.) But who ever said that consumer preferences live in Euclidean space? Where does it say that Euclidean space is the natural setting for neural nets? Nowhere. Absolutely nowhere.

If you shift focus, and look at the "matrix" (the 2-tensor) not as a linear-algebra thing, but instead as a weighted graph, then you can see other possibilities. My favorite is mutual information (yes, some people are sick of me saying this over and over...). If you take a neural-net algo which has a cos(theta) = (v dot w) / |v||w|, where v and w are vectors, and replace it by MI = log[(v dot w) / (v dot va)(w dot wa)] (see skippy.pdf for details), you get not a Euclidean space, but a probability space (a simplex). The conserved quantity is not the scalar dot product, but instead the sum of probabilities. (This is not my discovery; it is something noted a decade or two ago by a handful of authors, and it has been ignored by 99% of the neural-net literature. Ignored, not rejected: this does not appear to be a conscious decision, but rather just forgotten/unobserved. Would it improve NN learning? Who knows?)

There are a bunch of these kinds of "changes of perspective"; I try to sketch them in the "skippy.pdf" paper. It's all "green field" development: fairly obvious, direct and immediate possibilities, angles and approaches that have been overlooked and remain unexplored for whatever reasons. I suspect the mainstream researchers have FOMO about other discoveries, and are too busy to explore some really quite promising algos. Kind of a sociology-of-science question.
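The cosine-vs-MI contrast can be made concrete on a tiny co-occurrence matrix. One caveat: I am reading "va"/"wa" in the formula as the all-ones vector, so that (v dot va) is just the marginal probability mass of v; that interpretation, and all the numbers below, are my own assumptions, not something taken from skippy.pdf.

```python
# Cosine similarity (Euclidean reading) vs. the MI-style score
#   MI = log2[(v . w) / ((v . ones)(w . ones))]
# applied to rows of a normalized joint-count matrix. The matrix is a
# made-up toy; the "va = all-ones" reading is an assumption.
import numpy as np

counts = np.array([[4.0, 1.0, 0.0],    # rows = words, cols = contexts
                   [3.0, 2.0, 0.0],
                   [0.0, 1.0, 5.0]])
P = counts / counts.sum()              # joint probabilities; P.sum() == 1
ones = np.ones(P.shape[1])

def cosine(v, w):
    """Euclidean cosine: invariant under rotations of the space."""
    return (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))

def mi_score(v, w):
    """MI-style score: (v @ ones) is the marginal probability of v."""
    return np.log2((v @ w) / ((v @ ones) * (w @ ones)))

v, w = P[0], P[1]
print("cosine:", cosine(v, w))
print("MI-ish:", mi_score(v, w))
```

The conserved quantity differs exactly as the text says: cosine lives with the Euclidean norm, while the MI score lives on the simplex, where what is preserved is that the probabilities in P sum to 1.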
And, just to ding Ben a bit: he keeps telling me that this is all "obvious and trivial", but if it's so obvious and trivial, then why has no one investigated any of these avenues yet? Why aren't people publishing results? Harrumph.

Anyway, this is where I am currently stalled. One issue is that there are so many different avenues and possibilities that look promising, it's hard to pick just one. And each avenue requires a lot of work to explore. It's hard to do single-handedly; collaborations are more effective.

-- Linas

> Thank you.
>
> -Luke
>
> > On Jan 17, 2021, at 3:21 PM, Linas Vepstas <[email protected]> wrote:
> > ...
> > So I've attempted to build the AtomSpace as a place to store and
> > connect-up axioms/sequents/assertions/rules with connections that are
> > probabilities/weights/fuzzy-logic values/etc. -- that is, numbers, or
> > number-like things: qubits/homogeneous spaces/etc.
> >
> > If you study neural networks, you can see that they are densely
> > connected networks, with nodes, and almost all weights between almost
> > all nodes being non-zero. If you study formal mathematical proofs, you
> > can see that they are extremely sparse networks, where every node is
> > connected to only 1 to 3 or 4 others, and where the weights are exactly
> > true/false/0/1. If you study natural language, and biochemistry, and
> > many other natural phenomena, you find a scale-free network that is
> > neither dense nor sparse, but somewhere in the middle.
> >
> > I am deeply interested in converting time-ordered expressions of that
> > network into the underlying structure (and back). So, by analogy: a
> > seismologist, all they have are some time-series recordings of Earth's
> > vibrations; from those they try to reconstruct the structure of the
> > Earth. I have a time series of words, and I want to reconstruct the
> > structure of the brain that wrote those words. And, once reconstructed,
> > what else might that "brain" have said?
> > Just like the Earth model: what other kinds of earthquakes might it
> > produce?
> >
> > I've got half-a-dozen PDFs, all 20 to 100 pages long, that spin out
> > each of the above paragraphs in great detail. I think they're
> > important, but I can't get anyone to read them :-) So it goes...

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
