Alexey, Nil, Zar, Linas, others...
And here we go...

GENERAL BLATHER

About the general issue of what role logic and theorem-proving have to play in AGI, obviously there is room for different opinions here. My own view, as most of you know, is:

1) There are going to be many viable routes to human-level and superhuman AGI, not just one.

2) A route with a large role for probabilistic-logic theorem-proving is one viable route.

3) The advantages of a route of type 2) would seem to be:

3a) A system that can make good use of current computing hardware, which is logic-based (whereas e.g. a neural-net-based architecture would make better use of analog computing hardware).

3b) A system that should be good at scientific and engineering thinking, and also good at meta-thinking about how to improve its own code ... i.e. a good candidate to figure out how to cure aging, create nanotech and femtotech, launch the Singularity, etc.

3c) A system that should be coherent in maintaining its goals over time, as compared to a system operating more similarly to mammals.

Anyway, that is my view, which is the result of a lot of thought from diverse directions; but of course none of us knows for sure how to create advanced AGI yet, so disagreement is understandable.

I note also that since, by the Curry-Howard correspondence, programs and proofs are equivalent, the statement that AGI can be founded centrally on theorem-proving is equivalent to the statement that AGI can be founded centrally on program execution. The statement that AGI can be founded centrally on probabilistic-logic theorem-proving is equivalent to the statement that AGI can be founded centrally on probabilistic programming. The difference between a theorem-proving-based AI and a program-learning-based AI is merely an "implementation detail" ;-) ... Now, more interestingly, let me explore some specifics I have been thinking about lately!
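(A tiny aside to make the Curry-Howard point above concrete. This is my own purely illustrative toy, nothing to do with any actual prover code: a proof of an implication is literally a program, and modus ponens is function application.)

```python
# A toy rendering (my own, purely illustrative) of the Curry-Howard
# correspondence: a proof of the implication A -> B is a program mapping
# evidence for A to evidence for B, and modus ponens is just function
# application.  "Propositions" are simulated here by plain Python classes.
class A:
    pass

class B:
    pass

def a_implies_b(a: A) -> B:      # a "proof" of A -> B is a function
    return B()

a_proof = A()                    # a "proof" of A
b_proof = a_implies_b(a_proof)   # modus ponens = applying the function
print(type(b_proof).__name__)    # B
```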
CONNECTOR THEOREM PROVING

First of all, Linas, if you haven't seen it before you will enjoy the diagrams in http://www.scholarpedia.org/article/Connection_method which are explained in more detail in "A Vision for Automated Deduction Rooted in the Connection Method" (W. Bibel) https://www.researchgate.net/publication/318226306_A_Vision_for_Automated_Deduction_Rooted_in_the_Connection_Method?enrichId=rgreq-e84163672cff86aca93bd93eac91d998-XXX&enrichSource=Y292ZXJQYWdlOzMxODIyNjMwNjtBUzo1NDg5MjU5MjY4OTU2MTZAMTUwNzg4NTU0NzA0NQ%3D%3D&el=1_x_3&_esc=publicationCoverPdf and in even more detail in the book

Wolfgang Bibel: Automated Theorem Proving. Vieweg Verlag, Wiesbaden, 293 pp. (1982); 2nd edition, 289 pp., 1987.

which I found online in pdf via sci-hub ...

This is connection-based theorem proving, in which a proof of a theorem is constructed by drawing connections (links) between terms in the theorem, in such a way that each link joins two instances that involve the same predicate, but one in negated form and one in non-negated form (where "negated vs. non-negated" is assessed based on whether an instance would be negated in a DNF-normalized version of the theorem). A proof is a bunch of links that form a set of complete paths through the theorem (so every instance is along some path leading to the final instance of some term in the theorem), and such that there is some unification of all the terms linked that plays nicely with equating each linked pair of instances.

A complication is that sometimes there need to be a couple of links emanating from a given instance, i.e. an instance in the theorem may get used twice, three times, etc. (This counting of usages would provide an obvious connection with linear logic, which may or may not be useful...)

This general concept has been around since the 1980s, but recently has been used within some highly effective theorem-provers, i.e. leanCoP http://www.leancop.de and its descendants.
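To make the link structure concrete, here is a minimal Python sketch (my own toy code, not leanCoP's actual Prolog, and with no occurs check in the unifier) that represents a DNF matrix and enumerates the candidate connections, i.e. complementary literal pairs whose arguments unify:

```python
# A minimal sketch of the core data structures of the connection method.
# A DNF formula is a "matrix": a list of clauses, each clause a list of
# literals.  A literal is a triple (sign, predicate, args); a connection
# links two literals that share a predicate but differ in sign, provided
# their argument tuples unify.

def unify(a, b, subst=None):
    """Syntactic unification of two argument tuples.  Strings starting
    with an uppercase letter are variables; everything else is a
    constant.  (Toy code: no occurs check.)"""
    if subst is None:
        subst = {}

    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def walk(t):
        while is_var(t) and t in subst:
            t = subst[t]
        return t

    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x), walk(y)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        elif isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            stack.extend(zip(x, y))
        else:
            return None  # clash of distinct constants or arities
    return subst

def connections(matrix):
    """Enumerate links between complementary, unifiable literals
    sitting in distinct clauses of the matrix."""
    links = []
    for i, ci in enumerate(matrix):
        for j in range(i + 1, len(matrix)):
            for (s1, p1, a1) in ci:
                for (s2, p2, a2) in matrix[j]:
                    if p1 == p2 and s1 != s2 and unify(a1, a2) is not None:
                        links.append((i, (s1, p1, a1), j, (s2, p2, a2)))
    return links

# Matrix for:  P(a) & (~P(X) | Q(X)) & ~Q(a)
matrix = [
    [(True, 'P', ('a',))],
    [(False, 'P', ('X',)), (True, 'Q', ('X',))],
    [(False, 'Q', ('a',))],
]
print(len(connections(matrix)))  # 2: P(a)~P(X) and Q(X)~Q(a)
```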
These provers are interesting, among other reasons, because they consist of extremely concise Prolog code, yet they work nearly as well as the state-of-the-art theorem-provers that are much more complex as code and have been pretty heavily optimized...

leanCoP operates on theorems in disjunctive normal form. However, a variant called nanoCoP http://www.leancop.de/nanocop/ operates on non-normalized theorems, using a straightforward and elegant variant of the connection-based method given here: http://www.jens-otten.de/papers/concalc_tab11.pdf Also, randoCoP http://ceur-ws.org/Vol-373/paper-08.pdf is like leanCoP but with a more efficient search strategy. leanCoP and nanoCoP basically use chaining-based search involving two operators, "extension" and "reduction." XXX ... What randoCoP does is consider many randomizations of the order of the clauses in the DNF version of the theorem, and do the chaining search on each of these. This ends up working faster. (This is not surprising; it is in line with many other results in the search literature showing that doing a shallow search from a lot of randomly selected starting points can often be better than doing a deep search from a single starting point...)

SAT SOLVERS FOR CONNECTOR THEOREM PROVING

As a semi-aside here, one purely algorithmic thought I had when reading all this is that the search+unification here can probably be done way faster using a SAT solver, at least for all but the shortest proofs. This is vaguely similar to how, with the link parser, it's way faster for long sentences to use SAT for parsing than a standard NL parsing algorithm... Use of SAT for theorem-proving has been tried already in a slightly different context; see https://www.lsi.upc.edu/~oliveras/espai/smtSlides/lynch.pdf There are also various other tricky ways to code e.g.
the unification part as a SAT problem, https://pdfs.semanticscholar.org/59ad/1b79d7a99c9330a494787deb8d6d3020d376.pdf (that paper is not about SAT, it's about unification using neural nets, but the part where they code unification as constraint satisfaction would apply to SAT as well as to NNs). So the bottom line is, there are lots of ways to play with the encoding, but one could use SAT or SMT here as a substitute for, or augmentation to, chaining.

To use SAT here you'd presumably have to set a bound on how many times each instance can be re-used. Then you could increase the bound incrementally, trying the SAT solver again on the constraints generated with each new bound. Though the SAT solver would want to work on the normalized version of the theorem, one could use information from the grouping of instances in the non-normalized version to guide the search inside the SAT solver.

CONNECTORS, PROOF SKETCHES AND AGI

Now why do I like this from an AGI perspective? Because it lets one do higher-level, abstract probabilistic reasoning just by reasoning about the links (and setting the unification aside). To see how this might work, consider that a "proof sketch" could be written in connector form as a set of connectors that is not complete and skips some steps. For instance, in a proof sketch one might have a link skipping from an instance of P in clause 1 to an instance of P in clause 10, bypassing instances of P in clauses 3 and 5. Or, in a proof sketch, one might have an instance of P used for the second time, but not for the first time (leaving it open when it will be used for the first time). Or one might have an instance of P used for the k'th time in one link, and used for the m'th time in another link, with the constraint that k < m but no commitment made about the values of k and m. In general, then, a proof sketch denotes a set of proofs sharing some common elements.
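One toy way to formalize part of the "sketch" notion is the following Python sketch. It is my own illustration, and the "skipping steps" semantics here (a sketch link is satisfied by any chain of same-predicate links between its endpoints) is just one possible reading; the usage-count constraints from the text are not modeled.

```python
# Toy formalization of proof sketches as partial link sets.  A concrete
# proof is a set of links (pred, i, j) between clause indices; a sketch
# link (pred, i, j) is satisfied if the proof contains a chain of
# pred-links leading from clause i to clause j ("skipping steps").
from collections import defaultdict

def refines(proof_links, sketch):
    """Does the concrete proof refine (fill in) the sketch?"""
    succ = defaultdict(set)
    for p, i, j in proof_links:
        succ[(p, i)].add(j)

    def reachable(p, i, j):
        seen, frontier = set(), {i}
        while frontier:
            k = frontier.pop()
            if k == j:
                return True
            if k not in seen:
                seen.add(k)
                frontier |= succ[(p, k)]
        return False

    return all(reachable(p, i, j) for p, i, j in sketch)

proof = {('P', 1, 3), ('P', 3, 5), ('P', 5, 10), ('Q', 2, 4)}
sketch = [('P', 1, 10)]        # one sketch link, skipping clauses 3 and 5
print(refines(proof, sketch))  # True
```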
If one is doing probabilistic induction or abduction across a bunch of connector proofs, one is going to learn a lot about which connections, and which combinations of connections, occur in which kinds of contexts within which kinds of proofs. This learning will naturally lend itself to generating conjectural proof sketches for newly presented theorems. The process of filling in a proof sketch to get a proof can be confronted just like the problem of proving a theorem de novo: by chaining, or by SAT/SMT, or some combination thereof.

SYNTACTIC LINKS AND LOGICAL CONNECTORS

As a minor aside, given the loose analogy between link parsing and connector proofs, it's also interesting to look at how the links in a sentence's link parse relate to the connectors in that same sentence's logical interpretation. The links in a sentence's link parse become atomic relationships Li(w_a, w_b), where w_a and w_b are word-instances and Li is a link of type i. The logical interpretation of a sentence then involves a bunch of implications such as

L1(w_1, w_3) & L5(w_3, w_7) & L9(w_1, w_7) ==> P4(w_1, w_7) & P8(w_7, w_3) <p>

where the Pi are logical relationships and <p> is a probability value. Sometimes there may be complex quantifier relations on the right-hand side of the implication.

The overall semantic interpretation of a sentence then looks like an implication of the form

(conjunction of all the syntactic links found in the sentence) ==> (logical formula involving conjunctions, disjunctions and negations of logical relationships P_i between logical terms corresponding to the word-instances in the sentence)

where, in the logical formula on the right-hand side, each term may be tagged with a probability value.
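A minimal sketch of this syntactic-to-logical mapping (my own toy; the link types L1/L5/L9 and relations P4/P8 are just the placeholder names from the formula above, and nothing here is actual link-grammar or OpenCog code):

```python
# A rule pairs a set of required syntactic links (the conjunction on the
# left of "==>") with probability-tagged logical conclusions P_i.

def apply_rule(links, rule):
    antecedent, conclusions, p = rule
    if antecedent <= links:  # all required syntactic links are present
        return [(rel, args, p) for rel, args in conclusions]
    return []

# Links found in a parse, as (link_type, word_a, word_b) triples
links = {('L1', 'w1', 'w3'), ('L5', 'w3', 'w7'), ('L9', 'w1', 'w7')}

# L1(w1,w3) & L5(w3,w7) & L9(w1,w7) ==> P4(w1,w7) & P8(w7,w3) <0.9>
rule = ({('L1', 'w1', 'w3'), ('L5', 'w3', 'w7'), ('L9', 'w1', 'w7')},
        [('P4', ('w1', 'w7')), ('P8', ('w7', 'w3'))],
        0.9)

print(apply_rule(links, rule))
# [('P4', ('w1', 'w7'), 0.9), ('P8', ('w7', 'w3'), 0.9)]
```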
The existence of a syntactic link between two word-instances in a sentence has the effect of causing the logical terms corresponding to the word-instances to be grouped into (one or more of) the same probability-tagged conjunctions of logical relationships in the logical interpretation of the sentence. If the sentence describes N different situations (each one with different observation-instances providing groundings for the word-instances) S1, S2, ..., S_N, then a proof of the validity of the logical interpretation of the sentence as a model of the situations is produced by drawing connectors from the observation-instances to the logical terms...

On Tue, May 15, 2018 at 6:43 AM, Ben Goertzel <[email protected]> wrote:
> Ah, I have some new thoughts on the "theorem proving + AGI + parsing"
> side, but will have to wait to type them in till I get a little time
> at the computer, I'm traveling between meetings and conferences in
> Europe just now...
>
> On Tue, May 15, 2018 at 12:44 AM, Linas Vepstas <[email protected]> wrote:
>>
>> On Mon, May 14, 2018 at 4:54 PM, Alexey Potapov <[email protected]> wrote:
>>>
>>> Hi Linas,
>>> this is quite an interesting discussion, and I believe we should involve Ben and others in it.
>>
>> Ben has been involved in this discussion for a decade; I think he knows the general outline, even as we argue violently about the details. Anyway, this is why I say "we should discuss on the mailing list", a dictum that I have been violating.
>>
>>> Actually, I have some concerns regarding too strong a focus on formal logic and automatic theorem proving in OpenCog, since I'm not completely convinced of its fundamental role in AGI... ;)
>>
>> Yes, I agree. I am very much trying to pursue a probabilistic approach. The question is then "probabilities on what?" and I have an answer that I like, but I have trouble sharing that answer in a way that is understandable.
Let
>> me try to sketch that, in this context.
>>
>> You know probabilistic programming very very much better than I, so please correct my mis-steps. In probabilistic programming, one essentially has lambda-expressions, and is assigning probabilities to them. For example: what is the probability of taking this branch (this if-statement) or not? The probability of running this loop N times? The probability of calling this routine? The probability of making this list/array longer or shorter? This might be done with either custom-built probabilistic programming languages, or with modified versions of C or Lisp. At any rate, these can be abstractly understood as being lambda-expressions, with probabilities assigned to them.
>>
>> However, as I exhort below, lambdas are closely tied to cartesian products, and to classical logic, and we have plenty of evidence that suggests that this is inappropriate for natural, human intelligence. So what is the right thing? To keep a long story short, for me, it is this concept of "disjoined connector sets" (aka "jigsaw-puzzle pieces"), and I attach probabilities to those. More generally, more vaguely, the structures are "patterns"; this is what the pattern miner looks for. There are various theoretical arguments that show that these are sufficient for general purposes. For example, theorem-proving is like assembling jigsaw-puzzle pieces, where the puzzle-pieces are rules of inference; this is why theorem-proving is like parsing. However, general knowledge representation is also like this: solving Sherlock Holmes mysteries has long been said to be like "solving a jigsaw puzzle": well, except now, we have the mathematical formalism to make this statement precise. I think much of what neural nets and deep learning do also fits into this general framework; I want to write a paper on this, but have not had the time yet.
>>
>> So, I am not really into formal logic, per se; rather, I am trying to amass evidence for why probabilistic patterns are the correct approach, while probabilistic lambdas are too stringent, too constrained, too tight to be useful. One has to assign probabilities to the patterns, the arrangements, and NOT to the individual components of the pattern. (Also, I keep saying "probability" when I actually mean "mutual information" or maybe "surprisingness", as the case may be. Just as lambda is not quite right, I get the feeling that probability is not quite right, and that talking about mutual information and surprisingness is more correct than talking about probability. But this intuition remains difficult to explain.)
>>
>> -- Linas
>>
>>> Best regards,
>>> Alexey.
>>>
>>> 2018-05-15 0:13 GMT+03:00 Linas Vepstas <[email protected]>:
>>>>
>>>> Alexey, Konstantin,
>>>>
>>>> On Mon, May 14, 2018 at 2:41 PM, Константин Тимофеев <[email protected]> wrote:
>>>>>
>>>>> Nil, Linas, thanks for the explanations. It seems like I have to learn some new things before I can understand clearly what the issue behind this discussion is. Hopefully it will not take too much time.
>>>>
>>>> If it doesn't take much time, then you are a super-human genius working in the wrong field. To understand the issue, you'd have to read a book or two on proof theory, along with a large rainbow of related topics. It's taken me ten years to understand what the problem is. I can't magically transfer this knowledge, but I can tell you what keywords to search for and read on. Besides proof theory and model theory, anything that describes linear logic should make clear what the issue with cartesian products is (in proof theory, it corresponds to the rules of weakening and contraction, which are assumed by the lambda abstraction but are forbidden by tensor products.)
>>>> To understand why products in linguistics are tensor products and not cartesian products, look at the wikipedia article on "pregroup grammar". The earliest reference that I know of that discusses this clearly is Solomon Marcus, "Algebraic Linguistics" (1967), which your compatriots in Novosibirsk sent me. The categorial grammars there are discussed on pages 90-120, which you can match up to what the wikipedia pregroup grammar article states.
>>>>
>>>> Last time I said that "theorem proving is like parsing", I said it to Ben's son Zar, who is studying theorem proving, and his response was "yeah, so what, everyone knows that". If you already know that, good. My point is that since we know that theorem proving is incompatible with lambda, we should stop using lambda so much. To me, this still remains a deep, important insight; I'm trying to use it to guide the design of atomese. I can't imagine that you will understand in only a few months. A few years, maybe. I'm just saying, now is the time to start reading about it.
>>>>
>>>> -- Linas
>>>>>
>>>>> On Mon, May 14, 2018 at 8:45 PM, Linas Vepstas <[email protected]> wrote:
>>>>>>
>>>>>> On Mon, May 14, 2018 at 12:54 AM, Nil Geisweiller <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 05/11/2018 11:36 PM, Linas Vepstas wrote:
>>>>>>>>
>>>>>>>> light sprinkling of lambda calculus on top. Nil keeps putting in too much lambda calculus, I can't seem to get him to stop. It needs more prolog, I keep telling him.
>>>>>>>
>>>>>>> But
>>>>>>
>>>>>> I meant that to be funny. Like saying "it needs more pepper, no, it needs more salt".
>>>>>>
>>>>>>> the left hand side term of a prolog fact shares the same goodies I'm after when I use a LambdaLink. It exposes a list of variables, in a determined order, abstracting the body away.
>>>>>>
>>>>>> But Alexey and Konstantin have not heard this argument before, so let me sketch it briefly.
>>>>>>
>>>>>> * Yes, lambdas provide abstraction. They're useful. But ...
>>>>>> * lambda calc is old; it was originally developed in the view of Hilbert's program, of providing a foundational basis for classical logic.
>>>>>> * prolog is new, developed for knowledge representation and theorem proving, that is, non-classical logic.
>>>>>> * lambda calc is the internal language of cartesian closed categories: viz. anything with lists, pairs, cartesian products. Lambda calc corresponds to classical logic.
>>>>>>
>>>>>> * by contrast, theorem proving is known to NOT use classical logic; the logic of theorem proving is intuitionist logic, and more generally, the logic of Kripke frames. Insofar as the rule engine and PLN are a kind-of-like theorem prover, they should be based on the principles given in books on proof theory, and NOT on classical logic! People who work on proof theory have made a lot of advancements since the 1930's. We should know of, understand, and employ those concepts. Proof theory is a big deal. We should not ignore it.
>>>>>>
>>>>>> * There are other things that classical logic fails to describe, besides proof theory: natural language and biochemistry. Both of these appear to be described by intuitionistic logic, or, more properly, by fragments of linear logic. The difference is that linear logic throws away the notion of pairs, lists, cartesian products and lambdas: lambdas cannot be used to describe language or biochemistry. (In physics, the failure/incorrectness of lambda is "well-known", and is called the "no-cloning theorem".
The fundamental reason for this is that tensor products violate the assumption of the existence of pairs, lists, cartesian products. Tensor products are incompatible with cartesian products. Insofar as the lambda abstracts the cartesian product, it too must be discarded from the theory, as being incompatible. Any theory that has tensor products cannot have lambdas in it. They can't co-exist without contradiction.)
>>>>>>
>>>>>> * Insofar as we work with language, learning and theorem proving, where there are "well-known" theorems that prove that lambda calc is incompatible, we should perhaps stop focusing on it so much. Lambda calc has its place -- lambda is great for abstraction, but not everything can be abstracted in that way. Most notably (for me), the link-types of link-grammar cannot be abstracted by lambdas, because the disjuncts of link-grammar (alternately, the VP, NP of phrase-structure grammars) look like tensor products, and not like cartesian products.
>>>>>>
>>>>>> For these reasons, I think we are over-using lambdas. They offer a deeply-incorrect, basically-broken view of the world we are interested in: proof theory, linguistics, knowledge, biochemistry.
>>>>>>
>>>>>>> I know that FreeLink automatically digs up the variables, but it has other problems
>>>>>>>
>>>>>>> 1. The variable order depends on their appearances in the body. Even worse, this may depend on the variable names (see https://github.com/opencog/atomspace/issues/1677).
>>>>>>
>>>>>> Yes, that is the definition of a "free variable". Do you want to propose some kind of semi-free variable?
>>>>>>>
>>>>>>> 2. There's no way to tell FreeLink to escape some variable (unless a QuoteLink is used).
>>>>>>
>>>>>> Yeah, quote links are awful.
I think a very very high priority is to figure out how to do the things we need to do, without having to use quotes. I'm not sure, but the need for quotes seems to be a side-effect of using lambdas or something like that. I don't understand. In all the work I do, with link-grammar, with language learning, quotes are never needed; the concept of lambda abstraction is replaced by connectors and disjuncts; and these never need to be quoted. Actually, I can't even imagine how one could "quote" a connector or disjunct; it doesn't make sense, conceptually. I would have to think hard about that.
>>>>>>
>>>>>> (Well, there is also no concept of "variables", either. The role of variables is taken up by connectors, which are kind-of-like typed free variables. Sort of. Again, this requires deep thought.)
>>>>>>
>>>>>> Upshot: maybe if we used lambdas less, then we would not need quotes. Maybe they are partners in crime.
>>>>>>
>>>>>>> 3. Variables cannot be typed.
>>>>>>
>>>>>> If they were typed, they would not be free, right? Maybe there is a way to have typed free variables; I'm not sure; it's certainly not obvious; but I can also see how it might be possible. You'd have to design it carefully, to avoid contradictions.
>>>>>>
>>>>>>> For these 3 reasons I generally stay away from FreeLink, even though I'm sure it has good uses, if one is careful.
>>>>>>
>>>>>> !? I don't recall ever suggesting that we should use FreeLink for anything other than what it is currently used for. It provides some basic utilities for locating variables in an expression, but that's it. I don't understand why you even brought it up... ??
>>>>>>
>>>>>> --linas
>>>>>>
>>>>>> --
>>>>>> cassette tapes - analog TV - film cameras - you

--
Ben Goertzel, PhD
http://goertzel.org

"Only those who will risk going too far can possibly find out how far they can go." - T.S. Eliot

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CACYTDBfwXFb0MFrtTVGsV%2BLLu%2BFUsBc1ugQ72wp422X%2BHf7-6g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
