Correction: swap co/contravariant.

On 4/22/17, Jesús López <jesus.lopez.salva...@gmail.com> wrote:
> Hi again, just wanted to drop a pair of thoughts.
> What I'm talking about is more of a conceptual exploration, categorically and linguistically motivated, while Ben's talk is more neural and hands-on. What would be nice is connecting the threads.
> Previously Ben said:
>> The semiring could also be a non-Boolean algebra of relations on graphs or hypergraphs
> That would demand substituting the numbers in the word2vec vectors (and Coecke tensors!) with whole relations (relations on hypergraphs are much fatter than just numbers), which I'm not sure you'd even want. I don't recall seeing this before. For good or bad, last week arXiv:1704.05725 appeared for the categorical quantum mechanics setting, where they seem to be doing just that sort of thing: substituting the field of complex numbers with an arbitrary C*-algebra. If you can think of your algebra of relations as a C*-algebra, that would push the idea somewhat further, though I don't really know how far it goes semantically, not to speak of learning parameters. One would also need the glue to apply the idea of that paper to the quantum flavor of Coecke semantics.
> Can't help on the GAN stuff for lack of homework on that. However, I would also look at what Socher did in 2013. Typical neural nets are flat sandwiches of rectangles of weights (linear maps), each with a vector of nonlinearities stacked on top, and so on. Socher introduced/used *tensor* neural nets, where he used a *cube* for a *bi*linear transformation followed by a nonlinearity. His units transform pairs of vectors into single vectors, and his NN topology is a binary tree (instead of the linear stacking of layers of a classical NN). If you have a fragment of English generated by a CFG, the parse tree (a true tree) can be binarized [1], and each node would be a Socher net unit, with the leaves being distributional (word2vec) vectors.
> The difference between this and Coecke is that in the latter there is no binarization (instead, multilinear, general tensors), and the net is not a tree but a DAG. And more importantly, of course, there are extra nonlinear toppings on the nodes in Socher, and an actual learning algorithm, things left more or less for the future in Coecke's view despite some efforts. So basically, if you put a nonlinear topping or hat on each of the nodes of what I was calling a tensor network, you should arrive at a neural tensor net. Just split the rank r of the tensor as r = u + v, with u the number of contravariant (input) indices and v the number of covariant (output) indices. Then each node tensor has u *vectors* as inputs (2 in Socher) and v output vectors. One needs an analogue of the element-wise nonlinearity in this context, but I don't know which. As the topology can include "diamond" paths, one needs a suitable learning method. I've read about what's called backpropagation through structure in tensor neural net papers.
> Another technical difference, just to be accurate: Socher added an extra additive contribution to the output of his bilinearly-flavored units, coming from an extra classical NN stage.
> All of the above applies if one has serious interest in the Coecke approach to semantics.
> Note that while Coecke's theory is very pleasant categorically, the nonlinear toppings have not received any attention from categorists that I know of.
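As a rough illustration of the unit just described (a bilinear "cube" combining two child vectors, the extra additive classical stage, and an element-wise nonlinearity), here is a minimal numpy sketch. The dimensions, the choice of tanh, and all names are illustrative assumptions, not anything taken from Socher's papers:

    # Illustrative sketch: one unit of a recursive neural tensor net.
    # Two child vectors a, b are combined into a parent vector via a
    # bilinear "cube" V plus a classical linear stage W, then a nonlinearity.
    import numpy as np

    d = 4                                   # dimension of word/phrase vectors
    rng = np.random.default_rng(0)

    V = rng.normal(size=(d, d, d))          # the "cube": one d x d slice per output index
    W = rng.normal(size=(d, 2 * d))         # the extra classical (linear) stage
    bias = np.zeros(d)

    def tensor_unit(a, b):
        """Map a pair of child vectors to one parent vector (a node of the binary tree)."""
        bilinear = np.einsum('i,kij,j->k', a, V, b)   # a^T V[k] b for each output index k
        linear = W @ np.concatenate([a, b]) + bias    # the additive classical NN contribution
        return np.tanh(bilinear + linear)             # element-wise nonlinearity

    # Leaves would be distributional (word2vec-style) vectors; a binarized
    # parse tree is evaluated bottom-up by applying the unit at each node.
    cat, sat = rng.normal(size=d), rng.normal(size=d)
    phrase = tensor_unit(cat, sat)
    print(phrase.shape)                     # (4,)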
> On the purely categorical side of understanding this same problem, and forgetting parameter learning for a moment, I had a little realization to share. I talked about categories resulting from several monads as *targets* of the Coecke semantic functor. Later I remembered that the source also has monad flavour. Sequences of things can be understood through the list monad, from the viewpoint of functional programming, or the free monoid monad of the purists. One can thus see sentences as sequences of words (lexical entities) given by a specific monad. Thus we have monad flavour in both the source and the target of the semantics functor. That prompts questions on the character of the functor itself.
> Those thoughts put me in the functional-programmer mindset, and I remembered an old reading by Wadler: he was talking about understanding (in functional programming, using Moggi's ideas on computing with monads) recursive descent parsers for domain-specific languages given by a context-free grammar, by monadic means. The topic is called monadic parsing. For developers. Interestingly, this viewpoint is permeating into Linguistics as well, as demonstrated by "Monads for natural language semantics" (Shan). He talks of semantics as a monad transformer. We are at a point where there is even a section called "The CCG monad" in the book with ISBN 9783110251708.
> I don't know of work reconciling the monadic viewpoint with the Coecke stuff, but it is intriguing.
> Regards, Jesús.
> [1] http://images.slideplayer.com/15/4559376/slides/slide_39.jpg
> On 4/13/17, Ben Goertzel <b...@goertzel.org> wrote:
>> OK, let me try to rephrase this more clearly...
>> What I am thinking is --
>> In the GAN, the generative network takes in some random noise variables, and outputs a distribution over (link type, word) pairs [or in the plain-vanilla version without dependency parses, it would merely be over words].
>> The GAN would then be generating "statistical contexts" (corresponding to words).
>> The adversarial (discriminator) network is trying to tell the real contexts from the randomly generated fake contexts...
>> The InfoGAN variation would mean the GAN has some latent noise variables that indicate key features of real word contexts..... Presumably these would give a multidimensional parametrization of the scope of word contexts, and hence the scope of words-in-context (i.e. word meanings).
>> So the architecture is nothing like word2vec, but the result is a vector for each word: the vector being the settings of the latent variables of the GAN network that generate the context for that word...
>> This may still be fuzzy, but hopefully it is more clearly pointed in a meaningful direction...
>> This is "just" to find a maximally nice way to fill in the clustering-ish step in our unsupervised grammar induction algorithm...
>> ben
>> On Wed, Apr 12, 2017 at 6:50 PM, Ben Goertzel <b...@goertzel.org> wrote:
>>> Having thought a little more... I'll need to think more about what's the right network architecture to handle the inputs for applying the InfoGAN methodology to this case...
>>> On Wed, Apr 12, 2017 at 4:46 PM, Ben Goertzel <b...@goertzel.org> wrote:
>>>> Speculating a little further on this...
>>>> In word2vec one trains a neural network to do the following. Given a specific word in the middle of a sentence (the input word), one looks at the words nearby and picks one at random. The network is going to tell us the probability -- for every word in our vocabulary -- of that word being the "nearby word" that we chose.
>>>> Suppose we try to use word2vec on a vocabulary of 10K words and try to project the words into vectors of 300 features.
>>>> Then the input layer has 10K neurons (one per word), only one of which is active at a time; the hidden layer has 300 neurons, and the output layer has 10K neurons... the vector for a word is then given by the weights to the hidden layer from that word...
>>>> (see http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/ for a simple overview...)
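A minimal sketch of the forward pass of that skip-gram architecture, with the 10K/300/10K shapes just described (no training loop or negative sampling; purely to fix the shapes, and all names here are illustrative):

    # Skip-gram shapes only: one-hot input over 10K words, 300 hidden units,
    # softmax over the 10K possible "nearby" words.
    import numpy as np

    V, H = 10_000, 300
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.01, size=(V, H))    # input-to-hidden weights
    W_out = rng.normal(scale=0.01, size=(H, V))   # hidden-to-output weights

    def nearby_word_probs(word_id):
        """P(nearby word = w) for every w in the vocabulary, given the input word."""
        h = W_in[word_id]                         # a one-hot input just selects a row
        logits = h @ W_out
        e = np.exp(logits - logits.max())
        return e / e.sum()

    # After training, the 300-dimensional vector for a word is simply its row
    # of W_in, i.e. the weights from that word's input neuron to the hidden layer.
    vector_for_word_42 = W_in[42]
    print(nearby_word_probs(42).shape, vector_for_word_42.shape)   # (10000,) (300,)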
>>>> This is cool, but not necessarily the best way to do this sort of thing, right?
>>>> An alternate approach in the spirit of InfoGAN would be to try to learn a "generative" network that, given an input word W, outputs the distribution of words surrounding W.... There would also be an "adversarial" network that would try to distinguish the distributions produced by the generative network from the distribution produced from the actual word.... The generative network could have some latent variables that are supposed to be informationally correlated with the distribution produced...
>>>> One would then expect/hope that the latent variables of the generative model would correspond to relevant linguistic features... so one would get shorter and more interesting vectors than word2vec gives...
>>>> Suppose that in such a network, for "words surrounding W", one used "words linked to W in a dependency parse".... Then the latent variables of the generative model mentioned above should be the relevant syntactico-semantic aspects of the syntactic relationships that W displays in the dependency parse....
>>>> Clustering on these vectors of latent variables should give very nice clusters, which can then be used to define new variables ("parts of speech") for the next round of dependency parsing in our language learning algorithm...
>>>> -- Ben
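To make that proposal slightly more concrete, here is a forward-pass-only sketch of how the pieces could fit together: a generator mapping (word, noise, latent code) to a fake context distribution, a discriminator scoring (word, context distribution) pairs, and an InfoGAN-style head that reads the latent code back off a generated distribution. Every name, dimension and architectural choice below is an assumption made for illustration, and no adversarial training is shown:

    # Forward-pass shapes only; not an implementation of the proposal above.
    import numpy as np

    V, Z, C, H = 10_000, 16, 8, 128   # vocab, noise dims, latent-code dims, hidden
    rng = np.random.default_rng(0)

    def mlp(sizes):
        return [rng.normal(scale=0.01, size=(m, n)) for m, n in zip(sizes, sizes[1:])]

    def forward(layers, x):
        for W in layers[:-1]:
            x = np.tanh(x @ W)
        return x @ layers[-1]

    gen = mlp([V + Z + C, H, V])      # (word, noise, latent code) -> context distribution
    disc = mlp([V + V, H, 1])         # (word, context distribution) -> real/fake score
    q_net = mlp([V, H, C])            # InfoGAN head: recover the latent code from the output

    def generate_context(word_onehot, z, c):
        logits = forward(gen, np.concatenate([word_onehot, z, c]))
        e = np.exp(logits - logits.max())
        return e / e.sum()            # fake distribution over (nearby / dependency-linked) words

    word = np.zeros(V); word[42] = 1.0
    fake = generate_context(word, rng.normal(size=Z), rng.normal(size=C))
    score = forward(disc, np.concatenate([word, fake]))   # discriminator's verdict
    code = forward(q_net, fake)                           # latent code: the word's "vector"
    print(fake.shape, score.shape, code.shape)            # (10000,) (1,) (8,)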
>>>> On Sat, Apr 8, 2017 at 2:24 AM, Jesús López <jesus.lopez.salva...@gmail.com> wrote:
>>>>> Hello Ben and Linas,
>>>>> Sorry for the delay, I was reading the papers. About additivity: in Coecke's et al. program you turn a sentence into a *multilinear* map that goes from the vectors of the words having elementary syntactic category to a semantic vector space, the sentence meaning space. So yes, there is additivity in each of these arguments (which, by the way, should have a consequence for those beautiful word2vec relations like France - Paris ~= Spain - Madrid, though I haven't seen a description).
>>>>> As I understand it, your goal is to go from plain text to logical forms in a probabilistic logic, and you have two stages: parsing from plain text to a pregroup grammar parse structure (I'm not sure that the parse trees I spoke of before are really trees, hence the change to 'parse structure'), and then you go from that parse structure (via RelEx and RelEx2Logic, if that's right) to a lambda calculus term bearing the meaning and having attached, extrinsically, a kind of probability and another number.
>>>>> How does Coecke's program (and from now on that unfairly includes all the et als.) fit into that picture? I think the key observation is when Coecke says that his framework can be interpreted, as a particular case, as Montague semantics. Though adorned with linguistic considerations, this semantics is well known to be amenable to computation, and a toy version is shown in chapter 10 of the NLTK book, where they show how lambda calculus represents a logic that has a model theory. That is important because all those lambda terms have to be actual functions with actual values.
>>>>> How exactly does Coecke's framework reduce to Montague semantics? That matters, because if we understand how Montague semantics is a particular case of Coecke's, we can think in the opposite direction and see Coecke's semantics as an extension.
>>>>> As a starting point we have the fact that Coecke semantics can be summarized as a monoidal functor that sends a morphism from a compact closed category in syntax-land (the pregroup grammar parse structure, resulting from parsing the plain text of a sentence) to a morphism in a compact closed category in semantics-land, the category of real vector spaces, that morphism being a (multi)linear map.
>>>>> The definition of the Coecke semantic functor, however, hardly needs any modification if we use as target the compact closed category of modules over a fixed semiring. If the semiring is that of booleans, we are talking about the category of relations between sets, with the Peirce relational product (uncle = brother * father) expressed with the same matrix product formula of linear algebra, and with cartesian product as the tensor product that makes it monoidal.
>>>>> The idea is that when the Coecke semantic functor has as codomain the category of relations, one obtains Montague semantics. More exactly, when one applies the semantic functor to a pregroup grammar parse structure of a sentence, one obtains the lambda term that Montague would have attached to it. Naturally the question is how exactly to unfold that abstract notion. The folk joke about 'abstract nonsense' forgets that there is a down button in the elevator.
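A tiny worked example of that point: relations represented as Boolean matrices, composed with the ordinary matrix-product formula but over the Boolean semiring (OR in place of addition, AND in place of multiplication). The little family universe is invented purely for illustration:

    # Relations as Boolean matrices; composition is the Peirce relational product.
    import numpy as np

    people = ["bob", "carl", "dora"]
    idx = {p: i for i, p in enumerate(people)}
    n = len(people)

    def relation(pairs):
        m = np.zeros((n, n), dtype=bool)
        for a, b in pairs:
            m[idx[a], idx[b]] = True
        return m

    brother = relation([("bob", "carl")])    # bob is a brother of carl
    father  = relation([("carl", "dora")])   # carl is the father of dora

    def compose(r, s):
        """Peirce relational product: (r;s)[x,z] = OR_y ( r[x,y] AND s[y,z] )."""
        return np.any(r[:, :, None] & s[None, :, :], axis=1)

    uncle = compose(brother, father)
    print(uncle[idx["bob"], idx["dora"]])    # True: uncle = brother * father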
>>>>> Well, this would be lengthy here, but the way I started to come to grips with it was by bringing the CCG linguistic formalism into the equation. A fast and good slide show of how one goes from plain text to CCG derivations, and from derivations to classic Montague-semantics lambda terms, can be found in [1].
>>>>> One important feature of CCG is that it is lexicalized, i.e., all the linguistic data necessary to do both syntactic and semantic parsing is attached to the words of the dictionary, in contrast with, say, NLTK book ch. 10, where the linguistic data is inside the production rules of an explicit grammar.
>>>>> Looking closer at the lexicon (dictionary), each word is supplemented with its syntactic category (N/N...) and also with a lambda term, compatible with the syntactic category, used in semantic parsing. Those lambda terms are not magical letters. For the lambda terms to have a true model-theoretic semantics they must correspond to specific functions.
>>>>> The good thing is that the work of porting Coecke semantics to CCG (instead of pregroup grammar) is already done: in [2]. The details are there, but the thing I want to highlight is that in this case, when one is doing Coecke semantics with CCG parsing, the structure of the lexicon is changed. One retains the words and their associated syntactic category. But now, instead of the lambda terms (with their corresponding interpretation as actual relations/functions), one has vectors and tensors for simple and compound syntactic categories (say N vs N/N) respectively. When those tensors/vectors are of booleans one recovers Montague semantics.
>>>>> In the Coecke general case, sentences mean vectors in a real vector space, and the benefits start with its inner product, and hence norm and metric, so you can measure sentence similarity quantitatively (or rather, with normalized vectors...).
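A toy numpy sketch of that tensor lexicon: a word of category N is a vector, a word of category N/N (an adjective, say) is a matrix, application is tensor contraction, and phrase similarity is the normalized inner product. The vectors below are random, so the resulting numbers mean nothing; the point is only the mechanics:

    # Toy tensor lexicon: N words are vectors, N/N words are matrices.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 5                                   # dimension of the noun space N

    cat, kitten, car = (rng.normal(size=d) for _ in range(3))
    black = rng.normal(size=(d, d))         # N/N word: a linear map N -> N

    def apply_adj(adj, noun):
        """Contract the adjective tensor with the noun vector: N/N applied to N."""
        return adj @ noun

    def similarity(u, v):
        """Cosine similarity: inner product of normalized meaning vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    black_cat, black_kitten, black_car = (apply_adj(black, n) for n in (cat, kitten, car))
    print(similarity(black_cat, black_kitten), similarity(black_cat, black_car))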
>>>>> CCG is very nice in practical terms. An open SOTA parser implementation is [3], described in [4], to be compared with [5] ("The parser finds the optimal parse for 99.9% of held-out sentences"). OpenCCG is older but does parsing and generation.
>>>>> One thing that I don't understand well about the above stuff is that the category of vector spaces over a fixed field (or even the finite-dimensional ones) is *not* cartesian closed. While in the presentation of Montague semantics in NLTK book ch. 10 the lambda calculus appears to be untyped, more faithful presentations seem to require a (simply) typed or even more complex calculus/logic. In that case the semantic category perhaps would have to be cartesian closed, supporting in particular higher-order maps.
>>>>> That's all on the expository front; now some speculation.
>>>>> Up to now the only tangible enhancement brought by Coecke semantics is the motivation of a metric among sentence meanings. What we really want is a mathematical motivation to probabilize the crisp, hard-facts character of the interpretation of sentences as Montague lambda terms. How to attack the problem?
>>>>> One idea is to experiment with other kinds of semantic category as the target of the Coecke semantic functor. To be terse, this can be explored by means of a monad on a vanilla unstructured base category such as finite sets. One has several choices of endofunctor to specify the corresponding monad. Then the semantic category proposed is its Kleisli category. These categories are monoidal and have a revealing diagrammatic notation.
>>>>> 1.- Powerset endofunctor. This gives rise to the category of sets and relations, with cartesian product as the monoidal operation. Coecke semantics results in Montagovian hard facts as described above. Coecke and Kissinger's new book [6] details the particulars of the diagrammatic language.
>>>>> 2.- Vector space monad (over the reals). Since the sets are finite, the Kleisli category is that of finite-dimensional real vector spaces. That is properly Coecke's framework for computing sentence similarity. Circuit diagrams are tensor networks where boxes are tensors and wires are contractions of specific indices.
>>>>> 3.- A monad in quantum computing is shown in [7], and quantumly motivated semantics is specifically addressed by Coecke. The whole book [8] discusses the connection, though I haven't read it. Circuit diagrams should be quantum circuits representing possibly unitary processes. Quantum amplitudes, through measurement, give rise to classical probabilities.
>>>>> 4.- The Giry monad here results from the functor that produces all formal convex linear combinations of the elements of a given set. The Kleisli category is very interesting, having as maps probabilistic mappings that under the hood are just conditional probabilities. These maps allow a more user-friendly understanding of Markov chains, Markov decision processes, HMMs, POMDPs, Naive Bayes classifiers and Kalman filters. Circuit diagrams have to correspond to the factor-diagram notation of Bayesian networks [9], and the law of total probability generalizes in Bayesian networks to the linear algebra tensor network calculations of the corresponding network (this can be shown in actual Bayesian network software).
>>>>> A quote from mathematician Gian-Carlo Rota [10]:
>>>>> "The first lecture by Jack [Schwartz] I listened to was given in the spring of 1954 in a seminar in functional analysis. A brilliant array of lecturers had been expounding throughout the spring term on their pet topics. Jack's lecture dealt with stochastic processes. Probability was still a mysterious subject cultivated by a few scattered mathematicians, and the expression "Markov chain" conveyed more than a hint of mystery. Jack started his lecture with the words, "A Markov chain is a generalization of a function." His perfect motivation of the Markov property put the audience at ease. Graduate students and instructors relaxed and followed his every word to the end."
>>>>> The thing I would research is to use as semantic category that of those generalized functions of the quote above and bullet 4. So basically you replace word2vec vectors by probability distributions of the words meaning something, connect a Bayesian network from the CCG parse, and apply generalized total probability to obtain probabilized booleans, i.e. a number 0 <= x <= 1 (instead of just a boolean as with Montague semantics). That is, the probability that a sentence holds depends on the distributions of its syntactically elementary constituents meaning something, and those distributions are combined by the factors of a Bayesian net whose conditional independence relations respect and reflect the sentence syntax and have the local Markov property. The factors are for words of complex syntactic category (such as N/N...) and their attached tensors are multivariate conditional probability distributions.
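A minimal sketch of bullet 4 and the proposal above, with invented numbers: a word's meaning is a probability distribution rather than a Boolean vector, a word of complex category carries a conditional probability table (a Markov kernel, Rota's "generalization of a function"), and Kleisli composition is just the law of total probability, yielding a number between 0 and 1 for the sentence:

    # Kleisli composition in the Giry-monad picture: distributions composed
    # with conditional probability tables by the law of total probability.
    import numpy as np

    # P(subject means sense s): a distribution over two candidate senses, not a Boolean.
    p_subject = np.array([0.7, 0.3])

    # Kernel for a word of complex category: P(predicate holds | subject sense).
    # Rows are conditioned on the subject sense; columns are (holds, does not hold).
    k_predicate = np.array([[0.9, 0.1],
                            [0.2, 0.8]])

    def push_forward(dist, kernel):
        """Law of total probability: P(y) = sum_x P(y|x) P(x)."""
        return dist @ kernel

    p_sentence = push_forward(p_subject, k_predicate)
    print(p_sentence[0])   # probability the sentence holds: 0.7*0.9 + 0.3*0.2 = 0.69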
>>>>> Hope this helps somehow. Kind regards,
>>>>> Jesús.
>>>>> [1] http://yoavartzi.com/pub/afz-tutorial.acl.2013.pdf
>>>>> [2] http://www.cl.cam.ac.uk/~sc609/pubs/eacl14types.pdf
>>>>> [3] http://homepages.inf.ed.ac.uk/s1049478/easyccg.html
>>>>> [4] http://www.aclweb.org/anthology/D14-1107
>>>>> [5] https://arxiv.org/abs/1607.01432
>>>>> [6] ISBN 1108107710
>>>>> [7] https://bram.westerbaan.name/kleisli.pdf
>>>>> [8] ISBN 9780199646296
>>>>> [9] http://helper.ipam.ucla.edu/publications/gss2012/gss2012_10799.pdf
>>>>> [10] Indiscrete Thoughts
>>>>> On 4/2/17, Linas Vepstas <linasveps...@gmail.com> wrote:
>>>>>> Hi Ben,
>>>>>> On Sun, Apr 2, 2017 at 3:16 PM, Ben Goertzel <b...@goertzel.org> wrote:
>>>>>>> So e.g. if we find X+Y is roughly equal to Z in the domain of semantic vectors,
>>>>>> But what Jesus is saying (and what we say in our paper, with all that fiddle-faddle about categories) is precisely that, while the concept of addition is kind-of-ish OK for meanings, we can do even better by replacing it with the correct categorial generalization.
>>>>>> That is, addition -- the plus sign -- is a certain specific morphism, and this morphism, the addition of vectors, has the unfortunate property of being commutative, whereas we know that language is non-commutative. The stuff about pregroup grammars is all about identifying exactly which morphism it is that correctly generalizes the addition morphism.
>>>>>> That addition is kind-of OK is why word2vec kind-of works. But I think we can do better.
>>>>>> Unfortunately, the pressing needs of having to crunch data, and to write the code to crunch that data, prevent me from devoting enough time to this issue for at least a few more weeks or a month. I would very much like to clarify the theoretical situation here, but need to find a chunk of time that isn't taken up by email and various mundane tasks.
>>>>>> --linas