Just to clarify: by “performance” I mean the rate of success on a given task, not necessarily speed.
Anyway: I’m afraid I can’t help with the visual processing part, then — I know nothing about using wavelets for image analysis, so I can’t really say anything further until the question of how this is supposed to work is fully sorted out.

On Friday, 17 September 2021 at 22:19:47 UTC+2 linas wrote:
> Hi Adrian,
>
> On Thu, Sep 16, 2021 at 3:02 PM Adrian Borucki <[email protected]> wrote:
> >
> > Yeah, this is clear to me now — the grammar learning part is kind of
> > a given, the real question is how well this “image predicate” learning can
> > go…
>
> Yes, that is a question. Based on current experience, I'll say "very
> far", or at least, "much farther than anyone else has gone". But that
> is rather speculative: it's based on what I've been learning in a 1D
> setting, and so any doubters or skeptics in the audience are
> justified in doubting. Basically, I'm proposing this because it looks
> promising.
>
> It does not help that I am just one person proposing a rather novel,
> radical, counter-cultural idea that flies in the face of conventional
> wisdom. I'm quite aware of this. My burden of proof is much higher,
> and I am trying to supply it as best as I can. Keep asking doubtful
> questions; this is maybe the most useful thing you can do right now.
> So I like how this is going. I'm only irritated that you can't read my
> mind :-)
>
> > This is a deep question, as no one is even sure why neural nets
> > themselves work so well.
>
> Well, again, this goes in a very different direction. Here, the
> reason that it would "work so well" is much more obvious: we ourselves
> are very good at spotting part-whole structure. Why, in just a few
> minutes, I can write down the obvious grammar for stop lights: glowing
> red above yellow above green, surrounded by a painted yellow or black
> harness. This is "obvious", and detecting this in images seems like it
> should be pretty easy.
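[Editor's note: the stop-light grammar described above can be written down concretely. The sketch below is purely illustrative — the relation names, part names, and attributes are my own assumptions, not taken from any OpenCog or grammar-learning code.]

```python
# A minimal sketch of the "obvious" stop-light grammar described above,
# expressed as symbolic part-whole links. All names are illustrative.

# Each rule links one part to another via a spatial relation.
STOPLIGHT_GRAMMAR = [
    ("red-light",    "above",         "yellow-light"),
    ("yellow-light", "above",         "green-light"),
    ("light-stack",  "surrounded-by", "harness"),
]

# Attribute constraints on the parts themselves.
PART_ATTRIBUTES = {
    "red-light":    {"color": "red",    "glows": True, "shape": "round"},
    "yellow-light": {"color": "yellow", "glows": True, "shape": "round"},
    "green-light":  {"color": "green",  "glows": True, "shape": "round"},
    "harness":      {"color": ("yellow", "black"), "painted": True},
}

def parts_related_by(relation):
    """Return all (part, other) pairs connected by the given relation."""
    return [(a, b) for (a, rel, b) in STOPLIGHT_GRAMMAR if rel == relation]

print(parts_related_by("above"))
# [('red-light', 'yellow-light'), ('yellow-light', 'green-light')]
```

The point of writing it this way is that every link is a labelled, inspectable fact ("red above yellow"), in contrast to weight vectors hidden inside a neural net.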
>
> This is in very sharp contrast to what neural nets do: you are right:
> when a neural net picks out a stoplight from an image, we have no idea
> how it is doing that. Perhaps somewhere in there are some weight
> vectors for red, yellow, green, but where are they? Where are they
> hiding? How do neural nets handle part-whole relationships? There is
> a paper (from Hinton?) stating that the part-whole relationship for
> neural nets is the grand challenge of the upcoming decades. By
> contrast, the part-whole relationship for grammars is "obvious".
>
> > What needs clarification is what the structure of this filter learning
> > would be — what is the algorithm, and what direct learning objective is it
> > given?
>
> The exact same algo as in the existing grammar-learning code, modulo
> needed tweaks. That code is debugged and works well. Getting it going
> on images does pose some serious challenges and open questions, but I
> think the general ideas survive.
>
> To recap that algo: given a set of inputs, one explores the parameter
> space and looks for high mutual-information correlations between
> pairs. Once high-MI pairs are discovered, the dataset is passed over a
> second time, this time creating maximal spanning trees. The tree
> edges are then cut to give the grammar components.
>
> The above yields extremely high-dimensional sparse vectors: a
> dimension of a million. By comparison, the highest dimension that
> neural nets go up to is about a thousand. So this is one of the big
> differences between the two approaches. The other, of course, is that
> the basis is labelled symbolically: you can see exactly which basis
> element attaches to what ("red above yellow", etc.)
>
> I'm currently working on the best ways to cluster these vectors into
> groupings. Early results look pretty good, but also show that these
> can be made much better. I can say much more on this.
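[Editor's note: the two-pass algorithm recapped above — score pairs by mutual information, then build maximal spanning trees, then cut edges — can be sketched roughly as below. Everything here is an illustrative assumption: the toy data, the naive MI normalization, and all function names are mine, not the actual grammar-learning code.]

```python
import math
from collections import Counter

def mutual_information(pair_counts, item_counts, n_pairs):
    """Naive MI(a, b) = log2( p(a,b) / (p(a) * p(b)) ) per observed pair."""
    n_items = sum(item_counts.values())
    mi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_pairs
        p_a = item_counts[a] / n_items
        p_b = item_counts[b] / n_items
        mi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return mi

def max_spanning_tree(nodes, weights):
    """Greedy Prim-style maximum spanning tree over weighted pairs."""
    nodes = list(nodes)
    in_tree = {nodes[0]}
    edges = []
    while len(in_tree) < len(nodes):
        # Edges with exactly one endpoint inside the growing tree.
        crossing = [(w, a, b) for (a, b), w in weights.items()
                    if (a in in_tree) != (b in in_tree)]
        if not crossing:
            break  # disconnected graph
        w, a, b = max(crossing)
        edges.append((a, b, w))
        in_tree.update((a, b))
    return edges

# Pass 1: count co-occurring pairs and compute MI.
observations = [("red", "yellow"), ("yellow", "green"),
                ("red", "yellow"), ("red", "green")]
pair_counts = Counter(tuple(sorted(p)) for p in observations)
item_counts = Counter(x for p in observations for x in p)
mi = mutual_information(pair_counts, item_counts, len(observations))

# Pass 2: maximal spanning tree over the high-MI pairs. Cutting the
# lowest-weight tree edges would then yield the grammar components.
tree = max_spanning_tree(item_counts, mi)
```

In this toy run, the frequently co-occurring pair ("red", "yellow") gets the highest MI and is the first edge drawn into the tree.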
>
> > Like in the above example, where are all these filters and numerical
> > arguments even coming from?
>
> Randomly generated. With or without some sampling bias.
>
> > The numerical part is especially difficult, given that you seemingly
> > want to get some symbolic structure out of it.
>
> I don't understand this statement.
>
> > Going back to neural nets, the obvious problem is that if we make one
> > big neural “filter” then you don’t know what is going on inside —
>
> That's correct.
>
> > so the learning will be “shallower”. The question is how much of a
> > problem this really is.
>
> Well, the leading lights of the neural-net world claim that this is one
> of the grand challenges of the upcoming decades, and I won't argue with
> them about that.
>
> > Is learning down to the low-level filtering operations a viable approach
> > right now?
>
> Yes, absolutely, I think so. Obviously, I haven't convinced you yet.
> That is in part because I have not fully (clearly?) communicated the
> general idea just yet.
>
> > An interesting research question is if you could train a neural net that
> > can be “queried”, possibly in natural language or some simple formal one,
> > so that the system on top of it can learn to “extract” various statements
> > about an image out of it — so these predicates would be essentially hooked
> > to some queries that get sent to the underlying model.
>
> Sure, there are hundreds of people working on this, and they are
> making progress. You can go to seminars; new results are regularly
> presented on this.
>
> > Technically this probably falls somewhere in the Visual Question
> > Answering field… the challenge is that these models are trained to answer
> > questions about more abstract things like objects, not some low-level
> > features of the image.
>
> Yes. The lack of a symbolic structure in neural nets impedes desirable
> applications, such as symbolic reasoning.
>
> > The final big question is what can you really do after you get that
> > grammar?
> > What sort of inferences? How useful are they?
>
> Well, for starters, if the system recognizes a stop light, you can ask
> it: "How do you know it's a stop light?" and get an answer: "Because
> red above yellow above green." You can ask "And what else?" and get
> the answer "On a painted black or yellow background." -- "And what
> else?" "The colors glow in the dark." "And what else?" "They are
> round." "And what else?" "Only one comes on at a time." "And what
> else?" "The cycle time varies from 30 seconds to three minutes."
> "What is a cycle time?" "The parameter on the time filter by which
> repetition repeats." "What do you mean by round?" "The image area of
> the light is defined via a circular aperture filter."
>
> Good luck getting a neural net to answer even one of those questions,
> never mind all of them.
>
> > The key thing here is that if you, say, have a system that classifies
> > pictures, if it being built on top of this whole grammar and filter
> > learning pipeline means it doesn’t achieve competitive performance with
> > neural nets, then it’s difficult to see what the comparative advantage of
> > it is — beyond the obvious advantage of interpretability, but that won’t
> > save that solution if its performance is considerably lower.
>
> Really? The ability to do symbolic reasoning is valueless if it is
> slow? If the filter that recognizes that lights are round also
> appears in other grammatically meaningful situations, you can ask a
> question "What else is round?" "The sun, the moon, billiard balls,
> bowling balls, baseballs, basketballs." I think we are very, very far
> away from having a neural net do that kind of question answering. I
> think this is well within reach of grammatical systems.
>
> The association between symbols and the things they represent is the
> famous "symbol grounding problem", considered to be a very difficult,
> unsolved problem in AI. I'm sketching a technique that solves this
> problem.
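[Editor's note: the "what else is round?" query above is, in a grammatical system, just a lookup over shared symbolic filters. A toy sketch, with an illustrative object list that is my own, not learned output:]

```python
# If the same symbolic filter (e.g. "round", via a circular-aperture
# test) appears in the grammars of many objects, answering "what else
# is round?" is a simple shared-feature lookup.

OBJECT_FILTERS = {
    "stop-light lens": {"round", "glows"},
    "the sun":         {"round", "glows"},
    "the moon":        {"round"},
    "billiard ball":   {"round"},
    "bowling ball":    {"round"},
    "traffic harness": {"painted"},
}

def what_else_is(feature, known):
    """Objects sharing `feature`, excluding the one already known."""
    return sorted(obj for obj, feats in OBJECT_FILTERS.items()
                  if feature in feats and obj != known)

print(what_else_is("round", "stop-light lens"))
# ['billiard ball', 'bowling ball', 'the moon', 'the sun']
```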
> I think this is unique in the history of AI research. I don't
> see that anyone else has ever proposed a plausible solution to the
> symbol grounding problem.
>
> > Well, the problem is not really with grammars, which can definitely be
> > useful, but if that “filter sequence” part works poorly then it will
> > bottleneck the performance of the entire system.
>
> Learning it, or running it once learned? Clearly, running it can be
> super-fast .. even 1980s-era DSPs did image processing quite well.
> Even single-threaded CPUs have no particular problem; these days we
> have multi-core CPUs and oodles of GPUs.
>
> The learning algo is .. something else. There are two steps. Step one:
> can we get it to work, at any speed? (I think we can.) Step two: can we
> get it to work fast? (Who knows -- compare to deep learning, which
> took decades of basic research spanning hundreds of PhD theses before
> it started running fast. You and I and whatever fan-base might
> materialize are not going to replicate a few thousand man-years of
> basic research into performance.)
>
> > If that low-level layer outputs garbage, then all the upper layers get
> > garbage, and we know what happens when you have garbage inputs in this
> > field...
>
> Don't feed it garbage!
>
> --linas
>
> --
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.

--
You received this message because you are subscribed to the Google Groups "opencog" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/959e9352-cc5b-481d-9a85-a4fd0a587578n%40googlegroups.com.
