Ben, Nil, Linas, Cassio, and whoever might be interested,

2018-05-20 12:54 GMT+03:00 Ben Goertzel <[email protected]>:
> > But how will you calculate P(image|crow,black)?
>
> Well as you know, if you really want to, something like "the RGB value
> of the pixel at coordinate (444,555) is within a distance .01 of
> (.3,.7,.8)" can be represented as a logical atom ... so there is no
> problem using logic to reason about perceptual data in a very raw way if
> you want to
>
> OTOH I don't really want to do it that way... instead, as you know, I
> want to model visual data using deep NNs of the right sort, and then
> feed info about the structured latent variables of these NNs and their
> interrelationships into the logical reasoning engine.... This is
> because it seems like NNs, rather than explicit logic or probabilistic
> programming, are more efficient at processing large-scale raw video
> data...

Yeah... and here is the dilemma. We are considering two different yet connected tasks:

– Connecting OpenCog with deep neural networks (more specifically, with the TensorFlow library);
– Implementing efficient probabilistic programming with the use of OpenCog.

Both tasks can be considered part of the Semantic Vision problem, but their solutions can be useful in a more general context.

*OpenCog + TensorFlow*

The depth of OpenCog+TensorFlow integration can vary considerably. Shallow integration implies that TensorFlow is used as an external module, and communication between TensorFlow and OpenCog is limited to passing neuron activities, which are represented both by TensorFlow and by Atomspace nodes. The most restricted approach is just to run (pre-trained) TF models on input data and to set the values of Atomspace nodes in correspondence with the activities of the output neurons. What will be missing in this case: feedback connections from the cognitive level to the perception system, and online (and joint) training of neural networks and OpenCog. Let us consider the Visual Question Answering (VQA) task as a motivating example.
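To make the "most restricted" shallow integration concrete, here is a minimal toy sketch of the one-way flow it implies: a pre-trained model runs on the input, and its output-neuron activities are written as values onto Atomspace-like nodes, with no feedback path. All names (`run_model`, `MiniAtomspace`) are hypothetical stand-ins, not actual OpenCog or TensorFlow API:

```python
def run_model(image):
    """Stand-in for a Session.run call on a pre-trained classifier:
    returns activities of the output neurons as a label->score dict."""
    # Fixed toy output instead of a real forward pass.
    return {"crow": 0.9, "raven": 0.08, "dove": 0.02}

class MiniAtomspace:
    """Toy stand-in for the Atomspace: nodes keyed by (type, name),
    each carrying a mutable value."""
    def __init__(self):
        self.nodes = {}

    def set_value(self, node_type, name, value):
        self.nodes[(node_type, name)] = value

    def get_value(self, node_type, name):
        return self.nodes.get((node_type, name))

def import_activities(atomspace, image):
    # One-way flow: TF output -> node values; no feedback to perception.
    for label, activity in run_model(image).items():
        atomspace.set_value("ConceptNode", label, activity)

aspace = MiniAtomspace()
import_activities(aspace, image=None)
print(aspace.get_value("ConceptNode", "crow"))  # 0.9
```

The point of the sketch is what it lacks: nothing downstream of `import_activities` can influence how `run_model` processes the image, which is exactly the missing feedback discussed above.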
How will OpenCog be able to answer such questions as "What is the color of the dress of the girl standing to the left of the man in a blue coat?" If our network were pre-trained to detect and recognize all objects in the image and supplement them with detailed descriptions of colors, shapes, poses, textures, etc., then the Pattern Matcher would be able to answer such questions (converted to corresponding queries). However, this approach is not computationally feasible: there are too many objects in images, and too many grounded predicates that can be applied to them. Thus, the question should influence the process of how the image is interpreted.

For example, even if we detect bounding boxes (BBs) for all objects and insert them into the Atomspace, the predicate "left of" is not immediately evaluated for all pairs of BBs. Instead, it will be evaluated during query execution by the Pattern Matcher (hopefully) only for the relevant BBs labeled as "girl" and "man". Similarly, the grounded predicate "is blue", implemented by a neural subnetwork, can be computed only in the course of query execution, meaning that the work of the Pattern Matcher should be extended down to the neural network level. Indeed, purely DNN solutions for VQA usually implement some top-down processes, at least in the form of attention mechanisms. Apparently, cognitive feedback to perception is necessary for AGI in general.

It is not a problem to feed TensorFlow models with data generated by OpenCog via placeholders, but OpenCog will also need some interface for executing computational graphs in TensorFlow. This can be done by binding the corresponding Session.run calls to Grounded Predicate/Schema nodes. The question is how to combine OpenCog and neural networks on the algorithmic level. Let us return to the VQA query considered above.
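The query-driven evaluation described above can be illustrated with a toy sketch (plain Python, not Atomese; `left_of` stands in for a grounded predicate): the predicate runs only on candidate pairs already restricted by the labels "girl" and "man", not on every pair of bounding boxes.

```python
from itertools import product

# (label, x_min) pairs standing in for detected bounding boxes.
boxes = [("girl", 10), ("man", 50), ("tree", 5), ("dog", 80), ("girl", 90)]

evaluations = 0  # how many times the grounded predicate actually runs

def left_of(a, b):
    """Grounded-predicate stand-in: is box a to the left of box b?"""
    global evaluations
    evaluations += 1
    return a[1] < b[1]

girls = [b for b in boxes if b[0] == "girl"]
men = [b for b in boxes if b[0] == "man"]

# Pattern-Matcher-style join: the predicate is evaluated only for
# relevant candidates, during query execution.
answers = [(g, m) for g, m in product(girls, men) if left_of(g, m)]

print(answers)      # [(('girl', 10), ('man', 50))]
print(evaluations)  # 2 calls, versus 20 ordered pairs over all 5 boxes
```

Eagerly materializing "left of" for all ordered pairs would cost O(n^2) predicate evaluations per predicate per image, which is what makes the exhaustive approach infeasible.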
We can imagine a grounded schema node which detects all bounding boxes with a given class label and inserts them into the Atomspace, so that the Pattern Matcher or Backward Chainer can further evaluate some grounded predicates over them, finally finding an answer to the question. However, the question might be "What is the rightmost object in the scene?" In this case, we don't expect our system to find all objects, but rather to examine the image starting from its right border. We can imagine queries presupposing yet other strategies of image processing/examination. In general, we would like not to hardcode all possible cases, but to have a general mechanism which can be trained to execute different queries.

To make neural networks transparent to the Pattern Matcher, we need to make the nodes of TensorFlow graphs inhabitants of the Atomspace as well. The same is needed for the general case of unsupervised learning. In particular, architecture search is needed in order to achieve better generalization with neural networks, or simply to choose an appropriate structure for the latent code. Thus, OpenCog should be able to add or delete nodes in TensorFlow graphs. These nodes correspond not just to neural layers, but also to operations over them. One can imagine TensorNode nodes connected by PlusLink, TimesLink, etc. There can be tricky technical issues with TensorFlow (e.g. compilation of dynamic graphs), but they should be solvable.

A conceptual problem lies in the fact that the Pattern Matcher works with Atoms, but not with Values. Apparently, the activities of neurons should be Values. However, evaluation of, e.g., GreaterThanLink requires NumberNode nodes. Operations over (truth) values are usually implemented in Scheme within rules fed to the URE.
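The TensorNode/PlusLink/TimesLink idea above can be sketched as a toy in plain Python (not actual Atomese; the class names merely mirror the hypothetical atom types): the computational graph is built from ordinary inspectable objects, so a matcher could in principle traverse or rewrite it, e.g. for architecture search.

```python
class TensorNode:
    """Leaf holding a neuron activity (a Value, in Atomese terms)."""
    def __init__(self, name, value):
        self.name, self.value = name, value
    def evaluate(self):
        return self.value

class PlusLink:
    """Sum of its argument sub-graphs."""
    def __init__(self, *args):
        self.args = args
    def evaluate(self):
        return sum(a.evaluate() for a in self.args)

class TimesLink:
    """Product of its argument sub-graphs."""
    def __init__(self, *args):
        self.args = args
    def evaluate(self):
        out = 1.0
        for a in self.args:
            out *= a.evaluate()
        return out

# (x + y) * w, with activities as leaf values
x, y, w = TensorNode("x", 2.0), TensorNode("y", 3.0), TensorNode("w", 0.5)
graph = TimesLink(PlusLink(x, y), w)
print(graph.evaluate())  # 2.5

# Because the graph is made of first-class objects, it can be rewritten,
# e.g. architecture search wrapping it with an extra node:
graph2 = PlusLink(graph, TensorNode("bias", 1.0))
print(graph2.evaluate())  # 3.5
```

The conceptual gap remains visible even in the toy: the leaf values here are plain floats (Values), whereas the existing links like GreaterThanLink expect NumberNode atoms.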
This might be enough for dealing with individual neuron activities as truth values and with neural networks as grounded predicates, but patterns in values cannot be matched or mined directly (while the idea of SynerGANs implied the necessity of mining patterns in the activities of the neurons of the latent code). I was going to illustrate with concrete examples the same kind of problems in implementing probabilistic programming with OpenCog, but I guess it's already TL;DR. So, briefly speaking, we need the Pattern Matcher and Pattern Miner to work over Values/Valuations, which is not the case now (OpenCog uses only truth and attention values, and Atomese/Pattern Matcher doesn't have built-in semantics even for them). I cite Linas here:

"Atoms are:
* slow to create, hard to destroy
* are indexed and globally unique
* are searchable
* are immutable

Values are:
* fast and easy to create, destroy, change
* values are highly mutable.
* values are not indexed, are not searchable, are not globally unique."

But we need "fast and easy to create, destroy, change, highly mutable, but searchable" entities. So, this is not only a technical, but also a conceptual problem... I would really like to hear your opinion on this. What should we do?

– Resort to the most shallow integration between OpenCog and DNNs? In this case, SynerGANs will not work, since we will not be able to mine patterns in values, and we will not be able to use the Pattern Matcher to solve VQA.
– Express the output of DNNs as Atoms? Linas objected even to the idea of expressing the coordinates and labels of bounding boxes as Atoms. To do this with the activities of neurons would be even worse.
– Put everything into the Space-Time server? But then the idea of using the power of the Pattern Matcher, URE, etc. will not be achievable.
– Extend the Pattern Matcher to work with Values? Maybe...

/*I like the idea of embedding the TF computational graph into the Atomspace, but tf.mul works over Values (tensors), not NumberNodes.
Thus, in this case, it would be required to make all links (like TimesLink) work not only with NumberNodes, but also with Values... but I foresee objections from Linas here... Also, I believe this would be useful in general, since Values are not first-class objects in Atomese - you have to use Scheme/Python/C to describe how to recalculate truth values; you cannot reason about them directly... Or should we try to use a sort of PPL as a bridge between Values and Atoms? Maybe... Or we should do something unifying all of these.*/

The question is not just about binding vision and PLN. It is more general. Say, if you are driving a car, you estimate the distances and velocities of other cars and take actions on this basis. These are also Values, and you 'reason' over them using both 'number crunching' and 'logic' simultaneously (I don't mean procedural knowledge here in the sense of GroundedSchemaNode). So, I don't think that we should limit ourselves to a shallow integration and use DNNs/PPL/etc. only peripherally...

Ben Goertzel <[email protected]>:
> if one stays in the world of finite discrete
> distributions, one can construct probabilistic logics with
> sampling-based semantics... https://arxiv.org/pdf/1602.06420.pdf

Sounds quite interesting. I'll study it in detail...

--
Alexey
