Ben, Nil, Linas, Cassio, and whoever might be interested,

2018-05-20 12:54 GMT+03:00 Ben Goertzel <[email protected]>:

> > But how will you calculate P(image|crow,black)?
>
> Well as you know, if you really want to, something like "the RGB value
> of the pixel at coordinate (444,555) is within a distance .01 of
> (.3,.7,.8)" can be represented as a logical atom  ... so there is no
> problem using logic to reason about perceptual data in a very raw way if
> you want to
>
> OTOH I don't really want to do it that way... instead, as you know, I
> want to model visual data using deep NNs of the right sort, and then
> feed info about the structured latent variables of these NNs and their
> interrelationships into the logical reasoning engine....   This is
> because it seems like NNs, rather than explicit logic or probabilistic
> programming, are more efficient at processing large-scale raw video
> data...
>

Yeah... and here is the dilemma.
We consider two different yet connected tasks:
– Connecting OpenCog with deep neural networks (more specifically, with
Tensorflow library);
– Implementing efficient probabilistic programming with the use of OpenCog.

Both tasks can be considered as parts of the Semantic Vision problem, but
their solutions can be useful in a more general context.

*OpenCog + Tensorflow*
The depth of OpenCog+Tensorflow integration can vary considerably. Shallow
integration implies that Tensorflow is used as an external module, and
communication between Tensorflow and OpenCog is limited to passing
activities of neurons, which are represented both by Tensorflow and by
Atomspace nodes.
The most restricted way is simply to run (pre-trained) TF models on input
data and to set the values of Atomspace nodes in correspondence with the
activities of the output neurons. What will be missing in this case: feedback
connections from the cognitive level to the perception system, and online
(and joint) training of neural networks together with OpenCog.
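To make this concrete, here is a toy sketch of that most restricted mode of
integration. All the names here (the model, the Atomspace class, `perceive`)
are hypothetical stand-ins, not real OpenCog or Tensorflow APIs; the point is
only the one-way data flow.

```python
# Toy sketch of "most restricted" shallow integration: a pre-trained model
# is treated as a black box, and its output activations are copied into
# values attached to Atomspace nodes.  All names are illustrative stubs.

def pretrained_model(image):
    """Stub for a pre-trained TF model: returns class activations.
    A real model would call session.run() on the image tensor."""
    return {"crow": 0.92, "cat": 0.03, "black": 0.88}

class Atomspace:
    """Minimal stand-in: nodes carry a mutable value each."""
    def __init__(self):
        self.values = {}
    def set_value(self, node, value):
        self.values[node] = value
    def get_value(self, node):
        return self.values[node]

def perceive(atomspace, image):
    # One-way data flow: no feedback from the cognitive level back to
    # the perception system, and no online or joint training.
    for label, activation in pretrained_model(image).items():
        atomspace.set_value(("ConceptNode", label), activation)

aspace = Atomspace()
perceive(aspace, image=None)
print(aspace.get_value(("ConceptNode", "crow")))  # → 0.92
```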
Let us consider the Visual Question Answering (VQA) task as a motivating
example. How will OpenCog be able to answer questions such as “What is the
color of the dress of the girl standing to the left of the man in a blue
coat?” If our network is pre-trained to detect and recognize all objects in
the image and to supplement them with detailed descriptions of colors, shapes,
poses, textures, etc., then the Pattern Matcher will be able to answer such
questions (converted to corresponding queries). However, this approach is
not computationally feasible: there are too many objects in images, and too
many grounded predicates that can be applied to them. Thus, the question
should influence the process of how the image is interpreted.
For example, even if we detected bounding boxes (BBs) for all objects and
inserted them into the Atomspace, the predicate “left of” should not be
immediately evaluated for all pairs of BBs. Instead, it will be evaluated
during query execution by the Pattern Matcher (hopefully) only for the
relevant BBs labeled as “girl” and “man”. Similarly, the grounded predicate
“is blue” implemented by a neural subnetwork can be computed only in the
course of query execution, meaning that the work of the Pattern Matcher
should be extended down to the neural network level. Indeed, purely DNN
solutions for VQA usually implement some top-down processes, at least in
the form of attention mechanisms. Apparently, cognitive feedback to
perception is necessary for AGI in general.
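The lazy-evaluation point can be sketched in a few lines. This is a toy model,
not the actual Pattern Matcher: the predicates and box representation are
hypothetical, and the counters are only there to show that predicates fire
solely for the bindings the query actually produces, never for all pairs.

```python
# Sketch of query-driven predicate evaluation: "left of" and "is blue"
# are computed lazily, only for the bounding boxes the query binds,
# instead of being precomputed for every pair.  All names illustrative.

calls = {"left_of": 0, "is_blue": 0}

boxes = [  # (label, x_min, colour) -- colour stands in for a NN predicate
    ("girl", 100, "red"), ("man", 300, "blue"),
    ("tree", 50, "green"), ("car", 400, "grey"),
]

def left_of(a, b):
    calls["left_of"] += 1
    return a[1] < b[1]

def is_blue(a):
    # In the full system this would invoke a neural subnetwork.
    calls["is_blue"] += 1
    return a[2] == "blue"

def query(boxes):
    """The girl standing to the left of the man in a blue coat."""
    for g in (b for b in boxes if b[0] == "girl"):      # bind ?girl
        for m in (b for b in boxes if b[0] == "man"):   # bind ?man
            if is_blue(m) and left_of(g, m):            # lazy predicates
                return g

print(query(boxes), calls)  # predicates fired once each, not 4*4 times
```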
It is not a problem to feed Tensorflow models with data generated by
OpenCog via placeholders, but OpenCog will also need some interface for
executing computational graphs in Tensorflow. This can be done by binding
the corresponding Session.run calls to Grounded Predicate/Schema nodes.
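A minimal sketch of such a binding, with a stubbed Session standing in for
tf.Session (the graph name "sum_op" and the wrapper function are invented
for illustration; a real binding would forward to an actual session with a
feed_dict of placeholders):

```python
# Sketch of wrapping a Tensorflow-style Session.run call as a callable
# suitable for a GroundedSchemaNode.  The Session is a stub.

class FakeSession:
    """Stand-in for tf.Session: 'runs' a named graph output on a feed."""
    def run(self, fetch, feed_dict):
        if fetch == "sum_op":   # pretend the graph sums its placeholders
            return sum(feed_dict.values())

session = FakeSession()

def make_grounded_schema(fetch):
    """Return a callable that a GroundedSchemaNode wrapper could invoke."""
    def schema(**placeholders):
        return session.run(fetch, feed_dict=placeholders)
    return schema

sum_schema = make_grounded_schema("sum_op")
print(sum_schema(x=2.0, y=3.0))  # → 5.0
```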
The question is how to combine OpenCog and neural networks on the
algorithmic level. Let us return to the VQA query considered above. We can
imagine a grounded schema node which detects all bounding boxes with a
given class label and inserts them into the Atomspace, so that the Pattern
Matcher or Backward Chainer can further evaluate some grounded predicates
over them, eventually finding an answer to the question. However, the
question can be “What is the rightmost object in the scene?” In this case,
we don’t expect our system to find all objects, but rather to examine the
image starting from its right border. We can imagine queries presupposing
other strategies of image processing/examination. In general, we would like
not to hardcode all possible cases, but to have a general mechanism which
can be trained to execute different queries.
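The hardcoding problem can be seen even in a toy dispatcher (all names
hypothetical). Two query types already require two different examination
strategies, and each new query type would need yet another handcrafted
branch, which is exactly what a trainable general mechanism should replace:

```python
# Toy dispatcher over query types: a label query filters detections by
# class, while "rightmost" examines boxes from the right border.  Each
# new query type needs another hardcoded branch -- illustrating why a
# general trainable mechanism would be preferable.

boxes = [("tree", 50), ("girl", 100), ("man", 300), ("car", 400)]

def answer(query, boxes):
    kind, *args = query
    if kind == "label":
        return [b for b in boxes if b[0] == args[0]]
    if kind == "rightmost":
        # Scan from the right border; a real detector could stop early.
        return max(boxes, key=lambda b: b[1])
    raise ValueError("no hardcoded strategy for query: %r" % (query,))

print(answer(("rightmost",), boxes))     # → ('car', 400)
print(answer(("label", "girl"), boxes))  # → [('girl', 100)]
```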
To make neural networks transparent to the Pattern Matcher, we need to make
the nodes of Tensorflow graphs also inhabitants of the Atomspace. The same
is needed for the general case of unsupervised learning. In particular,
architecture search is needed in order to achieve better generalization with
neural networks, or simply to choose an appropriate structure for the latent
code. Thus, OpenCog should be able to add or delete nodes in Tensorflow
graphs.
These nodes correspond not just to neural layers, but also to operations
over them. One can imagine TensorNode nodes connected by PlusLink,
TimesLink, etc. There can be tricky technical issues with Tensorflow (e.g.
compilation of dynamic graphs), but they should be solvable.
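A toy version of such a graph, with TensorNode, PlusLink and TimesLink
mimicked as plain tuples and a recursive evaluator standing in for graph
compilation/execution (TensorNode is not an existing OpenCog atom type;
tensors are plain Python lists here, and a real integration would hand the
same structure to Tensorflow instead):

```python
# Sketch of an Atomese-style tensor computation graph: hypothetical
# TensorNode leaves combined by PlusLink/TimesLink, evaluated recursively
# with elementwise semantics.

def TensorNode(values):
    return ("TensorNode", values)

def PlusLink(a, b):
    return ("PlusLink", a, b)

def TimesLink(a, b):
    return ("TimesLink", a, b)

def evaluate(atom):
    """Recursively evaluate the graph; plays the role of Session.run."""
    tag = atom[0]
    if tag == "TensorNode":
        return atom[1]
    x, y = evaluate(atom[1]), evaluate(atom[2])
    op = {"PlusLink": lambda a, b: a + b,
          "TimesLink": lambda a, b: a * b}[tag]
    return [op(a, b) for a, b in zip(x, y)]

graph = PlusLink(TimesLink(TensorNode([1.0, 2.0]), TensorNode([3.0, 4.0])),
                 TensorNode([0.5, 0.5]))
print(evaluate(graph))  # → [3.5, 8.5]
```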
A conceptual problem consists in the fact that the Pattern Matcher works
with Atoms, not with Values. Apparently, the activities of neurons should be
Values. However, the evaluation of, e.g., GreaterThanLink requires
NumberNode atoms. Operations over (truth) values are usually implemented in
Scheme within rules fed to the URE. This might be enough for dealing with
individual neuron activities as truth values and with neural networks as
grounded predicates, but patterns in values cannot be matched or mined
directly (while the idea of SynerGANs implies the need to mine patterns in
the activities of neurons of the latent code).
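The mismatch can be sketched as follows (all names hypothetical, modeled
loosely on the Atom/Value distinction, not on real OpenCog APIs): neuron
activities live as fast, mutable values attached to atoms, but a
GreaterThanLink-style check accepts only NumberNode atoms, so every
comparison needs an explicit lift from the value layer into the atom layer.

```python
# Sketch of the Atom/Value mismatch: mutable values must be reified as
# immutable NumberNode atoms before any atom-level comparison can run --
# which is exactly what prevents matching patterns in values directly.

atom_values = {}                 # atom -> mutable FloatValue-like slot

def set_value(atom, v):
    atom_values[atom] = v        # fast, cheap, in-place

def NumberNode(x):
    return ("NumberNode", x)     # immutable, indexable atom

def greater_than(a, b):
    # GreaterThanLink-style check: operates on NumberNode atoms only.
    assert a[0] == b[0] == "NumberNode"
    return a[1] > b[1]

set_value("neuron-17", 0.73)     # fast value update from the NN side

# To compare, the value must first be lifted into a NumberNode atom:
lifted = NumberNode(atom_values["neuron-17"])
print(greater_than(lifted, NumberNode(0.5)))  # → True
```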

I was going to illustrate with concrete examples the same kind of problems
with implementing probabilistic programming in OpenCog, but I guess this is
already TL;DR.

So, briefly speaking, we need the Pattern Matcher and Pattern Miner to work
over Values/Valuations, which is not the case now (OpenCog uses only truth
and attention values, and Atomese/Pattern Matcher doesn't have built-in
semantics even for them). I cite Linas here:
"Atoms are:

* slow to create, hard to destroy

* are indexed and globally unique

* are searchable

* are immutable


Values are:

* fast and easy to create, destroy, change

* values are highly mutable.

* values are not indexed, are not searchable, are not globally unique."

But we need "fast and easy to create, destroy, change, highly mutable, but
searchable" entities. So, this is not only a technical, but also a
conceptual problem...
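Just to make the tension concrete, here is a toy sketch of one direction
(entirely hypothetical, not a proposal for the actual Atomspace): a value
store that pays a small indexing cost on every write so that values become
at least range-searchable. Whether such an index stays cheap enough at
neuron-activity update rates is exactly the open question.

```python
# Toy "fast to change, yet searchable" value store: writes update a
# coarse bucket index (bucket width 0.1), so values can be range-queried.

from collections import defaultdict

class ValueStore:
    def __init__(self):
        self.values = {}                  # key -> current value
        self.buckets = defaultdict(set)   # coarse index for search

    def set(self, key, v):
        old = self.values.get(key)
        if old is not None:
            self.buckets[int(old * 10)].discard(key)
        self.values[key] = v
        self.buckets[int(v * 10)].add(key)

    def search(self, lo, hi):
        """Return keys whose current value lies in [lo, hi)."""
        hits = set()
        for b in range(int(lo * 10), int(hi * 10)):
            hits |= self.buckets[b]
        return {k for k in hits if lo <= self.values[k] < hi}

store = ValueStore()
store.set("neuron-1", 0.73)
store.set("neuron-2", 0.12)
store.set("neuron-1", 0.95)              # cheap in-place change
print(sorted(store.search(0.9, 1.0)))    # → ['neuron-1']
```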

I would really like to hear your opinion on this. What should we do? Resort
to the shallowest integration between OpenCog and DNNs? In this case,
SynerGANs will not work, since we will not be able to mine patterns in
values, and we will not be able to use the Pattern Matcher to solve VQA.
Express the output of DNNs as Atoms? Linas objected even to the idea of
expressing the coordinates and labels of bounding boxes as Atoms. Doing this
with the activities of neurons would be even worse. Put everything into the
Space-Time server? But then the idea of using the power of the Pattern
Matcher, URE, etc. will not be achievable. Extend the Pattern Matcher to
work with Values? Maybe... I like the idea of embedding the TF computational
graph into the Atomspace, but tf.mul works over Values (tensors), not
NumberNodes. Thus, in this case, it will be required to make all links (like
TimesLink) work not only with NumberNodes, but also with Values... though I
foresee objections from Linas here... Also, I believe this would be useful
in general, since Values are not first-class objects in Atomese: you have to
use Scheme/Python/C to describe how to recalculate truth values; you cannot
reason about them directly...

Or should we try to use a sort of PPL as a bridge between Values and Atoms?
Maybe... Or we should do something unifying all of these.


The question is not just about binding vision and PLN. It is more general.
Say, if you are driving a car, you estimate the distances and velocities of
other cars and take actions on this basis. These are also Values, and you
'reason' over them using both 'number crunching' and 'logic' simultaneously
(I don't mean procedural knowledge here in the sense of GroundedSchemaNode).
So, I don't think that we should limit ourselves to a shallow integration
and use DNNs/PPLs/etc. only peripherally...


Ben Goertzel <[email protected]>:

> if one stays in the world of finite discrete
> distributions, one can construct probabilistic logics with
> sampling-based semantics... https://arxiv.org/pdf/1602.06420.pdf
>

Sounds quite interesting. I'll study it in detail...

 -- Alexey

