Hey!

On Tue, Sep 21, 2021 at 11:32 AM Adrian Borucki <[email protected]> wrote:
>
> Sure, I’ve already forked the repository below and started adding things,
Feel free to push to the main opencog repo. Either push directly, or use
pull requests. Probably easier to push directly. The only thing I insist on
is that the makefiles and directory structure follow those of the other
repos, and you've done that. (Well, a quibble: `opencog/visops` should be
`opencog/atoms/visops`, but this probably doesn't matter.)

> I don’t know if I’m going to have something working this week or if I get
> stuck, we’ll see.

I don't think you'll get stuck.

>> The StreamValue was invented to hold things like audio, video, and I
>> guess its OK to use it for static images, too. See
>> https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm
>>
> It seems like streams correspond to a concept of the same name in some other
> programming languages (or to the concept named “generators”).

You are right. The intent is generators, not streams, so these are perhaps
misnamed. The only defense I have is that "streams" is easier to type than
"generators", and that the atomspace has neither loop constructs nor any
"get-next" constructs, so, at the Atomese level, streams and generators are
"the same thing". More or less.

There is very little experience with how these things should work in
Atomese. The existing streams were created to be just enough to allow the
basic demos, and that's all. They do work "as intended", and that's all.
There may be better ways.

One interesting variant is the QueueValue, which allows multiple threads to
push stuff onto a queue for later pickup. It was created to allow a
parallelized pattern engine; a few years ago, Ben was pushing hard to have
it run in parallel to get faster results. Now it does, although the
interest has waned. This means that the QueueValue is stream-like and not
generator-like. Basically, the data producer (the pattern engine) is slower
than the data consumer, and so we want to operate in a mode where it
creates data as fast as possible.
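In rough Python terms (hypothetical names; this is a sketch of the idea,
not the actual QueueValue code), the pattern is a slow producer pushing
each result onto a thread-safe queue the moment it is ready, while the
consumer picks results up one at a time:

```python
import queue
import threading

def producer(q, n):
    """Simulate a slow data producer (like the pattern engine):
    push each result as soon as it is ready."""
    for i in range(n):
        q.put(i * i)
    q.put(None)           # sentinel: no more data

def consume_all(n):
    """Consume results as they dribble in, rather than as one big batch."""
    q = queue.Queue()     # thread-safe; multiple producers could push here
    t = threading.Thread(target=producer, args=(q, n))
    t.start()
    results = []
    while True:
        item = q.get()    # blocks until the producer delivers something
        if item is None:
            break
        results.append(item)
    t.join()
    return results

print(consume_all(5))     # [0, 1, 4, 9, 16]
```

The sentinel-based shutdown is just one convention; the point is only that
the consumer never waits for the full batch.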
This is a weird mirror-symmetric variation on "lazy evaluation": now that
the consumer has asked the producer for some data, the consumer expects the
producer to work as fast as possible, and dribble in the results as they
become ready, rather than saving them up to be delivered in one big batch
at the end.

What's the right way to deal with audio and video (or image) data? Right
now, I don't know, beyond some gut feelings. Something simple that works is
better than something complicated. Don't add complexity unless you really,
really need it. So I'm quite happy to be ambiguous as to whether these
things are generators or streams or promises or something else similar to
all that. Something that works is better than something fancy that doesn't
work.

> That should mean that if we have a list of image files to process, then we
> can iterate through that, getting the “next” image each time.

Ah! That's a trick question, with two answers. The first, knee-jerk answer
is "yes". Since Atomese has no explicit iterators, loops, or "do the next
one" constructs, all of this iteration has to happen under the covers.

For the learning pipeline, though, it's trickier. Let me sketch that out.
Currently, the learning pipeline is a large collection of mostly Scheme
code, rather than Atomese, that processes data files in an ad hoc fashion,
feeding them into the pipeline and accumulating counts in the atomspace.
It's "ad hoc" because there hasn't been any reason to do anything
better/fancier. It's in Scheme, not C++ or Python or Atomese, because that
was (for me) the easiest and fastest way to get things working. Someday it
could be redesigned, but not today.

So, the learning pipeline for images, as I currently envision it, would
work like so: Create N=50 to N=500 random filter sequences. Given a single
image, each filter sequence produces a single-bit t/f output. Given one
image and N filters, there are N(N-1)/2 result pairs.
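In toy Python terms (all names here are made up for illustration, and
"filters" are just stand-in predicates), the counting step looks like
this: each filter maps an image to one bit, and a pair's count goes up
only when both bits are true:

```python
from itertools import combinations
from collections import Counter

def count_pairs(images, filters):
    """Accumulate counts for filter pairs that both fire on the same image."""
    counts = Counter()
    for image in images:
        bits = [f(image) for f in filters]   # N single-bit t/f outputs
        # all N(N-1)/2 pairs of filter indices
        for i, j in combinations(range(len(filters)), 2):
            if bits[i] and bits[j]:          # both ends of the pair are t
                counts[(i, j)] += 1          # count capped at M images
    return counts

# Toy run: "images" are integers 0..7, "filters" are simple predicates.
filters = [lambda x: x % 2 == 0, lambda x: x > 3, lambda x: x < 10]
counts = count_pairs(range(8), filters)
print(counts[(0, 1)])   # even AND > 3: images 4, 6 -> count 2
print(counts[(0, 2)])   # even AND < 10: images 0, 2, 4, 6 -> count 4
```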
If both ends of the pair are t, then the count for that pair is
incremented; otherwise not. Given M input images, apply the above to each
of the images. The result is a collection of pairs, with varying pair
counts (up to a maximum of M; the bigger the M, the better, is the general
rule).

Given this raw info on pairs, the generic learning pipeline kicks in and
does the rest. The generic pipeline computes the mutual information of the
pairs, extracts disjuncts, merges disjuncts into classes, and ... whatever
will come next.

There are two aspects that are different with the image pipeline, as
compared to before. One is that some of the random filters may be
generating useless noise. These are presumably those with the lowest
marginal MI. They need to be discarded and replaced, so that we build up a
good collection of "useful" or "meaningful" filters. The other is that the
filters with the highest MI with each other might in fact be nearly
identical, and so we only need one of them, not both; one of the two needs
to be discarded. How exactly this gets handled is a big TBD question.

The point of my writing out the above is to show what the "stream" looks
like, today. All of the above (for sentences, not for images) is
implemented in the "ad hoc" processing pipeline. A sequence of bits
corresponding to a sequence of images might be useful, but not necessary.
A sequence of bit-pairs might be useful, but not necessary. Could the
pipeline be redesigned to work with such streams? Possibly. Does it seem
urgent, right now? No. (Well, actually, now that I think about it: I am
struggling with how to implement incremental learning, aka "lifetime
learning", and moving the code to a stream/generator infrastructure may be
just the thing...)

> The RandomStream should probably be renamed to something more descriptive, so
> that it is clear it produces a specific data type (the lack of name spaces in
> Atomese hurts here but that’s a side note).

Atomese has many issues.
The ones that get fixed tend to be the ones that people complain about the
most (and that have a clear solution).

-- linas

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

--
You received this message because you are subscribed to the Google Groups
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/opencog/CAHrUA36%2BtYp9sO7baDnCzo9G1fR0VmOZnfwcFJfO%2BWc6-Lgn1w%40mail.gmail.com.
