On Sun, Sep 19, 2021 at 11:57 AM Adrian Borucki <[email protected]> wrote:
>
> Just to clarify: by “performance” I mean the rate of success on a given task,
> not necessarily speed.
Well, I think it's likely to be successful, but clearly I have not
convinced you of that.

> Anyway: I’m afraid I can’t help with the visual processing part then — I know
> nothing of using wavelets for image analysis

You don't have to use wavelets. You do have to have a basic understanding
of image processing, and of how one applies image-processing primitives to
extract information. There is an easy way to learn this, though: the
earliest programming task is simply to write Atomese wrappers for common
textbook image-processing primitives. In practice, this means downloading
a copy of OpenCV and reading through its documentation. Writing Atomese
wrappers for it would let you learn image processing "hands on" -- there
are a number of OpenCV demos; you can run them, convert them to Atomese,
run the Atomese versions, and verify that you get the same results. There
are textbooks on image processing, filled with examples; converting them
to Atomese and running them would be a good, practical way of learning the
core concepts.

An alternative would be to do this for audio; in some sense, this would be
simpler, but certainly a lot geekier: audio does not have the immediate
visual feedback of image processing. It's more abstract.

> so I can’t really say anything further until how this is supposed to work is
> fully sorted out.

I'm sorry to hear this. You seem to be politely backing away from the
project; I'm not sure what you expected it to be, but clearly what I
painted is not what you'd hoped for. The project is "sorted out", but I
guess I'm not communicating something important about it. Again: the
pipeline is already working in the language domain. I tried to provide
enough of an explanation, and enough pseudocode snippets, to explain how
to port it over for vision and audio. It's pretty concrete; there's no
airy-fairy hand-waving, just a pile of pseudocode that needs to be
converted to real code.
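To make "image-processing primitive" concrete, here is what one such textbook primitive looks like in plain Python. This is a minimal sketch, with no OpenCV and no Atomese -- just the kind of operation (a 3x3 box blur) that the proposed wrappers would expose; the function name is mine, not from any existing API:

```python
# Minimal sketch of an image-processing "primitive" of the kind one
# might wrap in Atomese. Plain Python; no OpenCV; names are illustrative.

def box_blur(image):
    """Apply a 3x3 box blur to a 2D grid of pixel intensities.
    Edge pixels are averaged over whichever neighbors exist."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [image[rr][cc]
                         for rr in range(max(0, r - 1), min(rows, r + 2))
                         for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(neighbors) / len(neighbors)
    return out

# A single bright pixel spreads into its neighborhood:
img = [[0, 0, 0],
       [0, 9, 0],
       [0, 0, 0]]
blurred = box_blur(img)
print(blurred[1][1])  # 1.0 -- the 9 averaged over all nine pixels
```

The real OpenCV equivalent (`cv2.blur`) does the same thing far faster; the point of a wrapper layer is only to make such primitives composable from Atomese.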
I'm guessing that, somehow, I still failed to explain what this is all
about. Perhaps I should bounce you to the abstract theory papers? There
are two: one that's hand-wavey with no math, another with lots of math.
These are
https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/sheaves.pdf
and
https://github.com/opencog/learn/blob/master/learn-lang-diary/skippy.pdf

--linas

> On Friday, 17 September 2021 at 22:19:47 UTC+2 linas wrote:
>>
>> Hi Adrian,
>>
>> On Thu, Sep 16, 2021 at 3:02 PM Adrian Borucki <[email protected]> wrote:
>> >
>> > Yeah, this is clear to me now — the grammar learning part is kind of a
>> > given, the real question is how well this “image predicate” learning can
>> > go…
>>
>> Yes, that is a question. Based on current experience, I'll say "very
>> far", or at least, "much farther than anyone else has gone". But that
>> is rather speculative: it's based on what I've been learning in a 1D
>> setting, and so any doubters or skeptics in the audience are
>> justified in doubting. Basically, I'm proposing this because it looks
>> promising.
>>
>> It does not help that I am just one person proposing a rather novel,
>> radical, counter-cultural idea that flies in the face of conventional
>> wisdom. I'm quite aware of this. My burden of proof is much higher,
>> and I am trying to supply it as best as I can. Keep asking doubtful
>> questions; this is maybe the most useful thing you can do right now.
>> So I like how this is going. I'm only irritated that you can't read my
>> mind :-)
>>
>> > This is a deep question, as no one is even sure why neural nets themselves
>> > work so well.
>>
>> Well, again, this goes in a very different direction. Here, the
>> reason that it would "work so well" is much more obvious: we ourselves
>> are very good at spotting part-whole structure.
>> Why, in just a few
>> minutes, I can write down the obvious grammar for stop lights: glowing
>> red above yellow above green, surrounded by a painted yellow or black
>> harness. This is "obvious", and detecting this in images seems like it
>> should be pretty easy.
>>
>> This is in very sharp contrast to what neural nets do. You are right:
>> when a neural net picks out a stoplight from an image, we have no idea
>> how it is doing that. Perhaps somewhere in there are some weight
>> vectors for red, yellow, green, but where are they? Where are they
>> hiding? How do neural nets handle part-whole relationships? There is
>> a paper (from Hinton?) stating that the part-whole relationship for
>> neural nets is the grand challenge of the upcoming decades. By
>> contrast, the part-whole relationship for grammars is "obvious".
>>
>> > What needs clarification is what the structure of this filter learning
>> > would be — what is the algorithm, and what direct learning objective is it
>> > given?
>>
>> The exact same algo as in the existing grammar-learning code, modulo
>> needed tweaks. That code is debugged and works well. Getting it going
>> on images does pose some serious challenges and open questions, but I
>> think the general ideas survive.
>>
>> To recap that algo: given a set of inputs, one explores the parameter
>> space, and looks for high mutual-information correlations between
>> pairs. Once high-MI pairs are discovered, the dataset is passed over a
>> second time, this time creating maximal spanning trees. The tree
>> edges are then cut to give the grammar components.
>>
>> The above yields extremely high-dimensional sparse vectors: a
>> dimension of a million. By comparison, the highest dimension that
>> neural nets go up to is about a thousand. So this is one of the big
>> differences between the two approaches.
>> The other, of course, is that the basis is
>> labelled symbolically: you can see exactly which basis element
>> attaches to what ("red above yellow", etc.)
>>
>> I'm currently working on the best ways to cluster these vectors into
>> groupings. Early results look pretty good, but also show that these
>> can be made much better. I can say much more on this.
>>
>> > Like in the above example, where are all these filters and numerical
>> > arguments even coming from?
>>
>> Randomly generated. With or without some sampling bias.
>>
>> > The numerical part is especially difficult, given that you seemingly want
>> > to get some symbolic structure out of it.
>>
>> I don't understand this statement.
>>
>> > Going back to neural nets, the obvious problem is that if we make one big
>> > neural “filter” then you don’t know what is going on inside —
>>
>> That's correct.
>>
>> > so the learning will be “shallower”. The question is how much of a problem
>> > this really is.
>>
>> Well, the leading lights of the neural-net world claim that this is one
>> of the grand challenges of the upcoming decades, and I won't argue with
>> them about that.
>>
>> > Is learning down to the low-level filtering operations a viable approach
>> > right now?
>>
>> Yes, absolutely, I think so. Obviously, I haven't convinced you yet.
>> That is in part because I have not fully (clearly?) communicated the
>> general idea, just yet.
>>
>> > An interesting research question is if you could train a neural net that
>> > can be “queried”, possibly in natural language or some simple formal one,
>> > so that the system on top of it can learn to “extract” various statements
>> > about an image out of it — so these predicates would be essentially hooked
>> > to some queries that get sent to the underlying model.
>>
>> Sure, there are hundreds of people working on this, and they are
>> making progress. You can go to seminars; new results are regularly
>> presented on this.
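The two-pass recap above -- count pairs, score them by mutual information, link each input into a maximal spanning tree, then cut the edges into connector vectors -- can be sketched end-to-end in a few dozen lines of plain Python. This is a toy under stated assumptions: the corpus, the use of pointwise MI, and all names are mine, not the actual learn-pipeline code:

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Toy corpus standing in for the real input stream; in the image domain,
# the "words" would be filter outputs. Data here is purely illustrative.
sentences = [["red", "above", "yellow"],
             ["yellow", "above", "green"],
             ["red", "above", "green"]]

# Pass 1: count words and unordered co-occurring pairs within a sentence.
word_counts, pair_counts = Counter(), Counter()
for s in sentences:
    word_counts.update(s)
    pair_counts.update(frozenset(p) for p in combinations(s, 2))
total_words = sum(word_counts.values())
total_pairs = sum(pair_counts.values())

def mi(a, b):
    """Pointwise mutual information of the unordered pair (a, b)."""
    p_ab = pair_counts[frozenset((a, b))] / total_pairs
    return math.log2(p_ab / ((word_counts[a] / total_words) *
                             (word_counts[b] / total_words)))

# Pass 2: link each sentence into a maximum spanning tree over MI scores
# (Prim's algorithm on a tiny graph); the tree edges are the proto-grammar.
def mst(sentence):
    in_tree, edges = {sentence[0]}, []
    while len(in_tree) < len(sentence):
        a, b = max(((a, b) for a in in_tree for b in sentence
                    if b not in in_tree), key=lambda e: mi(*e))
        edges.append((a, b))
        in_tree.add(b)
    return edges

# Cutting each MST edge at the word yields sparse context vectors: one
# symbolically labelled basis element per connector ever observed.
vectors = defaultdict(Counter)
for s in sentences:
    for a, b in mst(s):
        vectors[a][("-", b)] += 1   # connector pointing right
        vectors[b][(a, "-")] += 1   # connector pointing left

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))

print(mst(["red", "above", "yellow"]))
# "red" and "yellow" share connectors to "above", so they cluster together:
print(cosine(vectors["red"], vectors["yellow"]) > 0)
```

In the real pipeline the vectors have a million basis elements rather than a handful, and the clustering step groups them into grammatical classes; the point of the toy is only that every basis element carries a readable label.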
>> > Technically this probably falls somewhere in the Visual Question
>> > Answering field… the challenge is that these models are trained to
>> > answer questions about more abstract things like objects, not some
>> > low-level features of the image.
>>
>> Yes. The lack of a symbolic structure in neural nets impedes desirable
>> applications, such as symbolic reasoning.
>>
>> > The final big question is what can you really do after you get that
>> > grammar? What sort of inferences? How useful are they?
>>
>> Well, for starters, if the system recognizes a stop light, you can ask
>> it: "how do you know it's a stop light?" and get an answer: "because
>> red above yellow above green." You can ask "and what else?" and get
>> the answer "on a painted black or yellow background" -- "and what
>> else?" "the colors glow in the dark" -- "and what else?" "they are
>> round" -- "and what else?" "only one comes on at a time" -- "and what
>> else?" "the cycle time varies from 30 seconds to three minutes" --
>> "what is a cycle time?" "the parameter on the time filter by which
>> repetition repeats" -- "what do you mean by round?" "the image area of
>> the light is defined via a circular aperture filter".
>>
>> Good luck getting a neural net to answer even one of those questions,
>> never mind all of them.
>>
>> > The key thing here is that if you, say, have a system that classifies
>> > pictures, if it being built on top of this whole grammar and filter
>> > learning pipeline means it doesn’t achieve competitive performance with
>> > neural nets, then it’s difficult to see what the comparative advantage of
>> > it is — beyond the obvious advantage of interpretability, but that won’t
>> > save that solution if its performance is considerably lower.
>>
>> Really? The ability to do symbolic reasoning is valueless if it is
>> slow? If the filter that recognizes that lights are round also
>> appears in other grammatically meaningful situations, you can ask the
>> question "what else is round?"
>> "the sun, the moon, billiard balls,
>> bowling balls, baseballs, basketballs". I think we are very, very far
>> away from having a neural net do that kind of question answering. I
>> think this is well within reach of grammatical systems.
>>
>> The association between symbols and the things they represent is the
>> famous "symbol grounding problem", considered to be a very difficult,
>> unsolved problem in AI. I'm sketching a technique that solves this
>> problem. I think this is unique in the history of AI research. I don't
>> see that anyone else has ever proposed a plausible solution to the
>> symbol grounding problem.
>>
>> > Well, the problem is not really with grammars, which can definitely be
>> > useful, but if that “filter sequence” part works poorly, then it will
>> > bottleneck the performance of the entire system.
>>
>> Learning it, or running it, once learned? Clearly, running it can be
>> super-fast: even 1980's-era DSPs did image processing quite well.
>> Even single-threaded CPUs have no particular problem; these days we
>> have multi-core CPUs and oodles of GPUs.
>>
>> The learning algo is ... something else. There are two steps. Step one:
>> can we get it to work, at any speed? (I think we can.) Step two: can we
>> get it to work fast? (Who knows -- compare to deep learning, which
>> took decades of basic research, spanning hundreds of PhD theses, before
>> it started running fast. You and I and whatever fan-base might
>> materialize are not going to replicate a few thousand man-years of
>> basic research into performance.)
>>
>> > If that low-level layer outputs garbage, then all the upper layers get
>> > garbage, and we know what happens when you have garbage inputs in this
>> > field...
>>
>> Don't feed it garbage!
>>
>> --linas
>>
>> --
>> Patrick: Are they laughing at us?
>> Sponge Bob: No, Patrick, they are laughing next to us.
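For what it's worth, the stop-light question-answering sketched in this thread reduces, mechanically, to lookups against the learned structure. Here is a minimal sketch in plain Python, assuming a toy hand-written "grammar" -- the real one would be learned, and all the names below are illustrative, not any actual Atomese API:

```python
# Sketch of the kind of query a learned grammar supports: each object is
# described by named filters, so questions become simple index lookups.
# Toy, hand-written data; in the real system this structure is learned.
grammar = {
    "stop light": ["red-above-yellow", "glows-in-dark", "round"],
    "moon": ["round", "glows-in-dark"],
    "billiard ball": ["round"],
    "street sign": ["painted-black-harness"],
}

def what_else(feature, known):
    """Answer 'what else has <feature>?', excluding the object we started from."""
    return sorted(obj for obj, feats in grammar.items()
                  if feature in feats and obj != known)

def how_do_you_know(obj):
    """Answer 'how do you know it's a <obj>?' by listing its named filters."""
    return "because " + ", ".join(grammar[obj])

print(what_else("round", known="stop light"))  # ['billiard ball', 'moon']
print(how_do_you_know("stop light"))
```

The interpretability claim in the thread is exactly this: because every basis element carries a symbolic label, the answers are readable strings rather than opaque weight vectors.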
--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA35-SZQ9RUh7fgsK2C_yQhWESmJP5_jRizFocnkEUzZjJA%40mail.gmail.com.
