Hey!

On Tue, Sep 21, 2021 at 11:32 AM Adrian Borucki <[email protected]> wrote:
>
> Sure, I’ve already forked the repository below and started adding things,

Feel free to push to the main opencog repo. Either push directly, or
use pull requests. Probably easier to push directly. The only thing I
insist on is that the makefiles and directory structures follow those
of the other repos, and you've done that. (well, quibble:
`opencog/visops` should be `opencog/atoms/visops` but this probably
doesn't matter.)

> I don’t know if I’m going to have something working this week or if I get 
> stuck, we’ll see.

I don't think you'll get stuck.

>> The StreamValue was invented to hold things like audio, video, and I
>> guess its OK to use it for static images, too. See
>> https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm
>>
> It seems like streams correspond to a concept of the same name in some other 
> programming languages (or to the concept named “generators”).

You are right. The intent is generators, not streams, so these are
perhaps misnamed. The only defense I have is that "streams" is easier
to type than "generators", and that the atomspace does not have loop
constructs, nor does it have any "get-next" constructs, and so, at the
atomese level, both streams and generators are "the same thing". More
or less.
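
A Python analogy of the distinction (hypothetical, not Atomese): a
generator has a visible get-next step, while a stream-flavored value
hides it behind every read, so in a setting without loops or get-next
constructs the two collapse into the same thing:

```python
import itertools

# Pull-style "generator": the consumer explicitly asks for the next value.
def counter():
    for n in itertools.count():
        yield n

# Stream-style value: every read silently advances the state, so there
# is no visible "get-next" construct -- roughly the StreamValue flavor.
class StreamValue:
    def __init__(self, source):
        self._source = source

    @property
    def value(self):
        return next(self._source)

s = StreamValue(counter())
print(s.value, s.value, s.value)  # each read advances: 0 1 2
```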

There is very little experience in how these things should work, in
Atomese. The existing streams were created to be just enough to allow
the basic demos, and that's all. They do work "as intended", and
that's all. There may be better ways.

One interesting variant is the QueueValue, which allows multiple
threads to push stuff onto a queue for later pickup. This was created
to allow a parallelized pattern engine; a few years ago, Ben was
pushing hard to have it run in parallel to get faster results. Now it
does, although the interest has waned. This means that the QueueValue
is stream-like and not generator-like. Basically, the data-producer
(the pattern engine) is slower than the data-consumer, and so we want
to operate in a mode where it's creating data as fast as possible.
This is a weird mirror-symmetric variation to "lazy evaluation": now
that the consumer has asked the producer for some data, the consumer
expects the producer to work as fast as possible, and dribble in the
results as they become ready, rather than saving them up to be
delivered in one big batch at the end.
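
In Python terms, that producer/consumer pattern looks roughly like this
(a hypothetical sketch using a thread and a queue; the real QueueValue
is C++ in the atomspace, and the squaring is just a stand-in for
pattern-engine work):

```python
import queue
import threading

def producer(q, items):
    """Run flat-out on its own thread, pushing results as they become ready."""
    for item in items:
        q.put(item * item)   # stand-in for a pattern-engine result
    q.put(None)              # sentinel: production finished

q = queue.Queue()
threading.Thread(target=producer, args=(q, range(5))).start()

# The consumer picks results up as they dribble in, instead of
# waiting for one big batch at the end.
results = []
while (item := q.get()) is not None:
    results.append(item)
print(results)  # -> [0, 1, 4, 9, 16]
```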

What's the right way to deal with audio and video (or image) data?
Right now, I don't know, beyond some gut-feels. Something simple that
works is better than something complicated. Don't add complexity
unless you really really need it. So I'm quite happy to be ambiguous
as to whether these things are generators or streams or promises or
something else similar to all that. Something that works is better
than something fancy that doesn't work.

> That should mean that if we have a list of image files to process, then we 
> can iterate through that, getting the “next” image each time.

Ah! That's a trick question, with two answers. First knee-jerk answer
is "yes". Since atomese has no explicit iterators, or loops or "do the
next one" constructs, all of this iteration has to happen under the
covers.

For the learning pipeline, though, it's trickier. Let me sketch that
out. Currently, the learning pipeline is a large collection of mostly
scheme code, rather than Atomese, that processes data files in an ad
hoc fashion, feeding them into the pipeline, accumulating counts in
the atomspace. It's "ad hoc" because there hasn't been any reason to
do anything better/fancier. It's in scheme, not c++ or python or
atomese, because that was (for me) the easiest and fastest way to get
things working. Someday, it could be redesigned, but not today.

So, the learning pipeline for images, as I currently envision it,
would work like so:

Create N=50 to N=500 random filter sequences. Given a single image,
each filter sequence produces a single-bit t/f output. Given one image
and N filters, there are N(N-1)/2 result pairs. If both ends of the
pair are t, then the count is incremented for that pair; otherwise
not.
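
The counting step above can be sketched like so (a hypothetical Python
illustration; the pixel-threshold "filters" here are made up, just to
stand in for the real random filter sequences):

```python
import itertools
import random
from collections import Counter

random.seed(1)
N = 8   # number of random filters (50 to 500 in practice)

# Hypothetical stand-in filters: each picks one "pixel" and a threshold,
# producing a single t/f bit per image.
filters = [(random.randrange(4), random.random()) for _ in range(N)]

def filter_bit(filt, image):
    pixel, threshold = filt
    return image[pixel] > threshold

def count_pairs(image, pair_counts):
    """For one image: if both ends of a filter pair are t, bump its count."""
    bits = [filter_bit(f, image) for f in filters]
    for i, j in itertools.combinations(range(N), 2):   # N(N-1)/2 pairs
        if bits[i] and bits[j]:
            pair_counts[(i, j)] += 1

pair_counts = Counter()
for image in [(0.9, 0.1, 0.5, 0.7), (0.2, 0.8, 0.3, 0.6)]:   # M=2 toy images
    count_pairs(image, pair_counts)
print(len(pair_counts), "pairs with nonzero counts")
```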

Given M input images, apply the above to each of the images. The
result is a collection of pairs, with varying pair-counts (up to a
maximum of M; the bigger the M, the better, as a general rule).
Given this raw info on pairs, the generic learning pipeline kicks in,
and does the rest.  The generic pipeline computes the mutual
information of the pairs, it extracts disjuncts, it merges disjuncts
into classes, and ... whatever will come next.
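
For reference, a toy version of the pairwise-MI arithmetic (hypothetical
Python with made-up counts; the real pipeline computes this over counts
accumulated in the atomspace):

```python
from math import log2

# Toy pair counts (hypothetical numbers, just to show the arithmetic).
pair_counts = {("a", "b"): 6, ("a", "c"): 2, ("b", "c"): 2}
total = sum(pair_counts.values())            # 10

# Marginal counts: how often each filter appears in any pair.
marginal = {}
for (i, j), n in pair_counts.items():
    marginal[i] = marginal.get(i, 0) + n
    marginal[j] = marginal.get(j, 0) + n
mtotal = sum(marginal.values())              # 20 (each pair counted twice)

def mi(i, j):
    """Pointwise mutual information of a filter pair, in bits."""
    return log2((pair_counts[(i, j)] / total) /
                ((marginal[i] / mtotal) * (marginal[j] / mtotal)))

print(round(mi("a", "b"), 3))  # log2(0.6 / (0.4 * 0.4)) -> 1.907
```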

There are two aspects that are different with the image pipeline, as
compared to before. One is that some of these random filters may be
generating useless noise. These are presumably those with the lowest
marginal MI. They need to be discarded, and replaced, so that we build
up a good collection of "useful" or "meaningful" filters. The other is
that the filters with the highest MI with each other might in fact be
nearly identical, and so we only need one of these, not both. One of
the two needs to be discarded.  How exactly this gets handled is a big
TBD question.
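
One possible shape for that TBD pruning step (a purely hypothetical
sketch; the threshold and the keep-the-first-of-the-pair rule are
arbitrary choices, not a worked-out design):

```python
def prune(filters, marginal_mi, pairwise_mi, dup_threshold):
    """Drop the presumed-noise filter (lowest marginal MI), then drop one
    member of any near-duplicate pair (pairwise MI above a threshold)."""
    noisy = min(filters, key=lambda f: marginal_mi[f])
    keep = [f for f in filters if f != noisy]
    for (i, j), m in pairwise_mi.items():
        if m > dup_threshold and i in keep and j in keep:
            keep.remove(j)   # arbitrary: keep the first of the pair
    return keep

# Toy numbers: "a" looks like noise, "c" and "d" look nearly identical.
marginal_mi = {"a": 0.1, "b": 1.2, "c": 1.1, "d": 1.15}
pairwise_mi = {("c", "d"): 2.5, ("b", "c"): 0.3}
print(prune(["a", "b", "c", "d"], marginal_mi, pairwise_mi, 2.0))  # -> ['b', 'c']
```

A replacement filter would then be generated at random to take each discarded slot, so the collection of "useful" filters builds up over time.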

The point of my writing out the above is to show what the "stream"
looks like, today. All of the above (for sentences, not for images) is
implemented in the "ad hoc" processing pipeline. A sequence of bits
corresponding to a sequence of images might be useful, but not
necessary. A sequence of bit-pairs might be useful, but not necessary.
Could the pipeline be redesigned to work with such streams? Possibly.
Does it seem urgent, right now? No.

(Well, actually, now that I think about it: I am struggling with how
to implement incremental learning aka "lifetime learning", and moving
the code to a stream/generator infrastructure may be just the
thing...)

> The RandomStream should probably be renamed to something more descriptive, so 
> that it is clear it produces a specific data type (the lack of name spaces in 
> Atomese hurts here but that’s a side note).

Atomese has many issues. The ones that get fixed tend to be the ones
that people complain about the most (and that have a clear solution).

-- linas

-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
