Hi Dave, if you have the impulse to help out, I'm happy to work with you.

On the subject of this thread, I was actually hoping to hear more about how
low level "AGI" type of perception works. I have no idea. I would be happy
to work with someone who wants to try hooking up the visual input to
something that is or could become "intelligent". I see the issue of true
artificial vision as being one that tries to avoid clever but
non-intelligent algorithms in favour of something more organic. Surely a
vision system must be trained much as a baby is trained. Indeed, surely AGI
must start off like a baby? (Everything is hooked up, but there is no
control: movement is wild, emitted sounds are nonsensical, and experienced
input blends with the general awareness stemming from all the feedback
systems, but no "sense" is being made initially. Intelligence is present,
hopefully, but not manifesting rationally at this point.) Or am I being
completely impractical?

On Fri, Feb 25, 2022 at 5:45 PM Linas Vepstas <[email protected]>
wrote:

>
>
> On Fri, Feb 25, 2022 at 6:41 PM xanatos xanatos.com <[email protected]>
> wrote:
>
>>  But again, I haven't really done much in about a year, partially because
>> I am seeing tech SOAR past what I can do on my own.  It's a little
>> disheartening.
>>
>
> Search the term "soft programming". Basically, it's about how to harness
> some of that whiz-bang-ness into a framework that you can treat as
> tinker-toy building blocks.
>
> But yeah, individuals cannot compete against either sharply focused
> startups or against the giant corporations. It takes money, time,
> coordination, and a vision for how to do things. When something becomes
> economically important, the tinkerers are left behind.
>
> I'm tinkering with stuff that the start-ups and the big companies don't yet
> know how to do. I'm interested in common-sense reasoning. This leaves me in
> calm, placid backwaters where no one is paying attention, and the stress
> levels are low.
>
>
>>
>>
>> If you ever think I can help – let me know…
>>
>
> This is between you and Mark.  He's taken an interest in modernizing the
> old infrastructure, and that is definitely a worthwhile task.  If you think
> some parts can be swapped out for better parts, go for it.  I'm busy with
> my project(s) above, and so can't really do any coding.  I can act as a
> question-answering machine, though, and explain how it all used to work.
>
> For the future, I'd like to see something that is modular and documented
> and is a collection of "tinker-toy" parts that people can assemble and
> re-assemble for personal projects. Despite Ameca and GPT-3 and Boston
> Dynamics, I still think there's plenty of space for tinkerers. What's
> missing are the tinker-tools. For example, if Lego Mindstorms had been
> open-source, wow, things could have been different. Lego
> Mindstorms was one of the great missed opportunities.  Capitalism seems to
> fail whenever a broader common-good is needed. That's why I'm into open
> source.
>
> So, then, you understand the general architecture, the general
> requirements. How can all this be packaged as a kit, with a set of
> instructions, as a put-it-together-and-it-works type of system? I think
> that's the goal.
>
> I mean, it's easy enough to make things work in a narrow sense. Just
> hard-wire everything together. It's a lot harder to make it modular, so it
> can be adapted for different uses.
>
> --linas
>
>
>
> Dave
>>
>>
>>
>>
>>
>>
>>
>> *From:* [email protected] <[email protected]> *On Behalf
>> Of *Linas Vepstas
>> *Sent:* Friday, February 25, 2022 7:07 PM
>> *To:* opencog <[email protected]>
>> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace
>>
>>
>>
>> Hi Dave,
>>
>>
>>
>> Thank you for that nice note!  I want to splice in some comments with my
>> own real-world experience ....
>>
>>
>>
>> On Fri, Feb 25, 2022 at 3:42 PM xanatos xanatos.com <[email protected]>
>> wrote:
>>
>> Not sure if this is cogent since my application is autonomous robots in
>> actual hardware, but maybe useful…
>>
>>
>>
>> I used OpenCV with a carrier board ("StereoPi") for the Raspberry Pi
>> Compute Module that breaks out both camera ports on the Pi.  I automated
>> face recognition with code that leveraged OpenCV, which I came to find from
>> one Adrian Rosebrock (pyimagesearch.com), and that employed Haar cascades to
>> determine whether a face was present.
>>
>>
>>
>> The code we used also rested on a Haar cascade. It "worked great" if you
>> were in conventional office lighting, and faced the camera squarely.  It
>> failed if you turned quarter-face, or showed a profile. It failed if your
>> office had windows, and the shade wasn't drawn. It failed in direct
>> sunlight. Outdoors. In stage demo and trade-show lighting conditions.  We
>> considered a medical-training robot application, where the first responder
>> would be kneeling over the robot-dummy, and so their face would be at
>> right-angles to the camera.  The Haar cascade can't do that. (We never did
>> find a better solution, either, at least while I was there.)
>>
>>
>>
>> The Haar cascade was able to measure the distance between the eyes, and
>> thus able to estimate the distance to the face, and thus able to get the
>> parallax right when steering the robot eyes to focus on the right spot.
>> (The two eyes in blender move automatically, so in principle, you could
>> have a cross-eyed animation, or a roll-your-eyes animation, but we never
>> did that.) The depth was noisy. We used an alpha-beta filter to smooth out
>> the jitter.
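The two steps here, a pinhole-model depth estimate from the eye spacing plus alpha-beta smoothing of the jittery result, can be sketched as follows. The focal length and inter-pupil distance are illustrative constants, not values from the original robot:

```python
# Sketch of depth-from-eye-spacing plus alpha-beta smoothing.
# The focal length and inter-pupil distance are illustrative assumptions.
FOCAL_PX = 500.0        # camera focal length, in pixels (assumed)
EYE_SPACING_MM = 63.0   # typical adult inter-pupil distance (approx.)

def estimate_depth_mm(eye_distance_px):
    """Pinhole-model range estimate from the pixel gap between the eyes."""
    return FOCAL_PX * EYE_SPACING_MM / eye_distance_px

class AlphaBeta:
    """Alpha-beta filter: smooths a noisy scalar and tracks its velocity."""
    def __init__(self, alpha=0.5, beta=0.1, dt=1.0):
        self.alpha, self.beta, self.dt = alpha, beta, dt
        self.x, self.v = None, 0.0

    def update(self, measurement):
        if self.x is None:          # first sample initializes the state
            self.x = measurement
            return self.x
        pred = self.x + self.v * self.dt       # predict
        resid = measurement - pred             # innovation
        self.x = pred + self.alpha * resid     # correct position
        self.v += self.beta * resid / self.dt  # correct velocity
        return self.x
```

Feeding each frame's raw depth through update() yields a smoothed estimate suitable for steering the eye parallax.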
>>
>>
>>
>> I've heard vague intimations that neural nets can do better, but if so, I
>> suspect all available systems are proprietary and expensive (and wouldn't
>> run on a Pi, anyway).  I do have some general ideas on how to improve on
>> this situation, but it would blow up this email.
>>
>>
>>
>> Once a face is detected, it sends the center-of-face data to another Pi
>> (the robots have three Pis in them – "cores" – a vision acquisition "core",
>> a language "core" and vision processing core).  The vision processing core
>> (depending on the state the robot is in) takes this face positioning data,
>> chews on it and sends the corresponding servo signals to the motor core
>> that controls the head and eyes, and the robot follows you with its gaze
>> and head movements.  So in theory, face **detection** and tracking are
>> always functionally available, but may be overridden/ignored by other
>> behavioral commands/statuses.
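The face-position-to-servo hand-off Dave describes can be approximated with a proportional mapping from the face's pixel offset to pan/tilt corrections. The frame size, gains, and sign conventions below are assumptions for illustration, not his motor-core code:

```python
# Proportional pan/tilt sketch: map a detected face center to servo
# angle corrections. Frame size and gains are assumed values.
FRAME_W, FRAME_H = 640, 480
PAN_GAIN = 0.05    # degrees of pan per pixel of horizontal error (assumed)
TILT_GAIN = 0.05   # degrees of tilt per pixel of vertical error (assumed)

def servo_correction(face_cx, face_cy):
    """Return (pan_delta, tilt_delta) in degrees to center the face."""
    err_x = face_cx - FRAME_W / 2   # positive: face is to the right
    err_y = face_cy - FRAME_H / 2   # positive: face is low in the frame
    return (PAN_GAIN * err_x, -TILT_GAIN * err_y)
```

Higher layers can then gate or override these corrections, matching the "may be overridden by other behavioral commands" behaviour above.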
>>
>>
>>
>> The language processing side of things is always listening (I use python
>> speech recognition with PocketSphinx as the recognizer which works
>> surprisingly well)
>>
>>
>>
>> I never experimented directly with this, but everyone turned up their
>> noses at it, opting instead for a real-time internet connection to Google
>> speech.  In retrospect, I'm wondering if this is because all the developers
>> either had a heavy foreign accent, or had a habit of slurring their speech
>> and/or mumbling. At any rate, trade-show floors are problematic, what with
>> the sonic assault of neighboring booths. Questions from the audience via
>> microphones are also a problem, although there you could get a direct
>> audio cable from the mixing board that the stage techs were running.
>>
>>
>>
>> The point here is that in natural settings, audio quality is an issue.
>> I'm not aware of the current state-of-the-art with regards to neural nets.
>> I suspect that, again, the solutions are proprietary, expensive, and don't
>> run on a Pi. But I dunno, I'm pretty much 100% unplugged from that
>> world.
>>
>>
>>
>> and now has several hundred routines it can engage depending on what it
>> hears, and some conflict resolution and buffering code in case responses to
>> one phrase would interfere with ongoing responses playing out).
>>
>>
>>
>> The system is set up so that if I use a phrase like "my name is", or "I'd
>> like to introduce you to"
>>
>>
>>
>> We had three versions. One was to feed text into AIML. There's an
>> AIML-to-AtomSpace converter. It worked as well as "native" AIML chatbots,
>> except that it took several minutes on startup to load the database.  That
>> was almost fatal.
>>
>>
>>
>> It's easy, "trivial", to write custom response rules in AIML. If I recall
>> the syntax, it would be something like "PATTERN:my name is *"
>> "RESPONSE:pleased to meet you $star-1"
>>
>>
>>
>> The second was ChatScript. That bypassed the atomspace entirely.
>>
>>
>>
>> The third was a chat-script-inspired domain-specific language called
>> "ghost".  The intent was that authors would be able to write rules such as
>> "RESPONSE: please to meet you $star-1 BLINK GAZE-AT $star-1 BLINK SMILE"  I
>> guess it worked. I never saw a working demo.  The actual authors were drama
>> students, with no software experience: they felt it was "difficult
>> programming". They were used to typewritten scripts for TV shows, and if it
>> wasn't done on a word-processor, it was "programming".  This was tough.
>> Only one person was good at this, Audrey LeeAnn Brown, and she had a
>> background in C++.  And I don't think she liked ghost. I think there were
>> some PhD students who did manage to get something going for LovingAI. But I
>> think they too side-stepped the complexity.
>>
>>
>>
>> I later saw a demo from a game company. It was actually fairly
>> impressive: they had developed a GUI that allowed game designers to
>> drag-n-drop their way through directed NPC interactions.  Basically, the
>> NPC is trying to tell the player to go to this-n-such spaceport and meet
>> some sketchy space-pirate to get gold, weapons, etc.  The dialog tree
>> automated a lot of the low-level interaction, yet allowed fine-grained
>> control.  In this sense, the GUI's that have been developed for games are
>> light-years beyond what you can do with AIML or ChatScript; the main
>> problem is that they're expensive, proprietary, and have lots of core
>> issues that would need to be fixed to apply them to robots.
>>
>>
>>
>> Open source is great for operating systems, compilers and databases. Not
>> so much for everything else.
>>
>>
>>
>> (and several similar phrases that are recognized by a fuzzy-logic kind of
>> similarity finder I wrote), **AND** it can tell a face is present, it
>> can filter out the name given, if any.  Then a few things happen – first,
>> the language processor confirms the name by speaking "Hello <name> - did I
>> get that right?" and listens for a variety of words that are either
>> affirming or denying.
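Dave's fuzzy similarity finder could be approximated with the standard library's difflib; the trigger list and the 0.8 threshold below are guesses for illustration, not his actual values:

```python
# Stdlib approximation of a fuzzy phrase matcher: score the utterance
# against known trigger phrases. Triggers and threshold are assumed.
from difflib import SequenceMatcher

TRIGGERS = ["my name is", "i'd like to introduce you to"]

def best_trigger(utterance, threshold=0.8):
    """Return (phrase, score) for the best-matching trigger, or None."""
    utterance = utterance.lower()
    best, best_score = None, 0.0
    for phrase in TRIGGERS:
        # Compare the trigger against the start of the utterance.
        head = utterance[:len(phrase) + 5]
        score = SequenceMatcher(None, phrase, head).ratio()
        if score > best_score:
            best, best_score = phrase, score
    return (best, best_score) if best_score >= threshold else None
```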
>>
>>
>>
>> If you're walking that path, .. well, this is what AIML is really good
>> at.  Or, I guess, ghost?
>>
>>
>>
>> On affirmation, the system immediately begins taking snapshots every 10
>> frames and stores them in a folder (the new faces dataset) of the person's
>> name plus the date and time as a numeric string (Dave-202202251623 for
>> example).  Once either the person exits the view for more than 100 frames
>> (which would be 10 snapshots) or the system gains 100 actual face snapshots, it
>> hands off those images to another of the scripts from Adrian Rosebrock
>> (encode_faces.py) that encodes the faces and turns the whole bunch into a
>> pickle, which is then appended to the bigger pickle that all the other
>> known faces are in…  The name and data are also written to the database of
>> "people known", where additional data is written over time as interactions
>> with that person accrue.
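The encode-and-append step can be sketched with stdlib pickle. The encodings below are placeholder lists of floats standing in for whatever encode_faces.py actually produces, and the dict layout is an assumption:

```python
# Sketch of appending newly encoded faces to the master pickle.
# Real encodings would come from a face-embedding model (as in
# encode_faces.py); here they are placeholder lists of floats.
import os
import pickle

def append_encodings(pickle_path, name, new_encodings):
    """Merge new (name, encoding) pairs into the known-faces pickle."""
    if os.path.exists(pickle_path):
        with open(pickle_path, "rb") as f:
            data = pickle.load(f)
    else:
        data = {"encodings": [], "names": []}
    data["encodings"].extend(new_encodings)
    data["names"].extend([name] * len(new_encodings))
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Re-running this periodically with fresh snapshots is essentially the "silent background update" fix Dave mentions later for aging face data.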
>>
>>
>>
>> So I'm not sure if this answers your question about integrating it into
>> the speech subsystem – I basically have the audio input and processing,
>> audio output and visual input and processing all running in parallel on
>> separate physical SBCs, which all talk to each other via ZeroMQ (or PyZMQ
>> specifically).
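The inter-core plumbing can be sketched with PyZMQ PUB/SUB. The topic name and message shape are assumptions; on the real robots each core is a separate Pi, so the endpoint would be tcp://, not the in-process transport used here for a self-contained demo:

```python
# PUB/SUB sketch of one core publishing face positions to another.
# Topic and message shape are assumed; real robots would use tcp://
# endpoints between separate Pis rather than inproc://.
import json
import time
import zmq

ctx = zmq.Context.instance()

pub = ctx.socket(zmq.PUB)
pub.bind("inproc://vision")

sub = ctx.socket(zmq.SUB)
sub.connect("inproc://vision")
sub.setsockopt_string(zmq.SUBSCRIBE, "face")
sub.setsockopt(zmq.RCVTIMEO, 2000)  # fail fast instead of hanging

time.sleep(0.2)  # let the subscription propagate (slow-joiner guard)

msg = {"cx": 412, "cy": 230, "depth_mm": 800}
pub.send_string("face " + json.dumps(msg))

topic, payload = sub.recv_string().split(" ", 1)
received = json.loads(payload)
```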
>>
>>
>>
>> The point of using ROS was that it allowed everything to be "modular", at
>> least in theory. That you could replace one subsystem by another.  Much
>> easier said than done.
>>
>>
>>
>> ROS uses UDP to "talk". For ROS2, they thought about using ZMQ but
>> rejected it in favor of something else.  I forget what.
>>
>>
>>
>> It works very well, is reasonably fast (especially given it only runs on Pi
>> 4/8GB SBCs), and gives people interacting with it the unmistakable feeling
>> that the robot sees them, responds to their movements and speech, and
>> remembers them.
>>
>>
>>
>> Moore's law.
>>
>>
>>
>> So, ahh, one person who should have known better ordered the best,
>> highest-resolution webcams they could find. 1280x1024 or something. You
>> could only plug two of them into a USB hub before the USB hub was
>> overwhelmed. And the CPU attached to that could barely keep up with the
>> frame rate. Despite this obvious hardware-fail, there was tremendous
>> resistance to down-scaling to a far more practical 640x480.  Add to that a
>> power, heat and cooling budget. Ugh.
>>
>>
>>
>> Managing engineers is like herding cats.  Or pushing rope. Something like
>> that.
>>
>>
>>
>>
>>
>> The drawback – which I haven't done anything about in the past year or so,
>> but which has a relatively easy fix – is that the pickle data for a given person
>> ages (my grandkids are no longer reliably recognized since they were 3 and
>> 5 when I first implemented that build, and they are 6 and 8 now) – so I
>> need to add a routine that occasionally updates the images silently in the
>> background in the recognition pickle to keep up with changes…  but I've not
>> had the time I wanted to do these things…
>>
>>
>>
>> For more-or-less all of the performances, there was a robot operator who
>> sat in the audience, monitoring the system in case it went haywire,
>> over-riding any responses that were inappropriate.  Putting together a good
>> GUI that allowed the robot operator to do this, running on a tablet, is
>> non-trivial. (It was a website, with assorted javascript attached to
>> various bits and pieces of the processing pipeline.)
>>
>>
>>
>> For pretty much anything non-trivial running in the atomspace, one needs
>> some kind of visualization GUI to see what's going on.  We do not have
>> one.  I personally use printf for everything, because I can. But it's not,
>> umm, usable by anyone else.
>>
>>
>>
>> If any of this gives you anything useful to pick from, I can get you
>> code, original source and my custom stuff.  It's all Python, so I'm
>> guessing you should be good with that.
>>
>>
>>
>> Dave
>>
>>
>>
>>
>>
>> *From:* [email protected] <[email protected]> *On Behalf
>> Of *Linas Vepstas
>> *Sent:* Friday, February 25, 2022 3:33 PM
>> *To:* opencog <[email protected]>
>> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace
>>
>>
>>
>> Hi Mark,
>>
>>
>>
>> Preface for anyone else reading this: Mark is dusting off the old Hanson
>> Robotics code for Eva.  One of the subsystems was face-tracking. When your
>> webcam was calibrated correctly, then Eva had this uncanny ability to look
>> at you from out of the screen: her eyes would track your position. It was
>> really pretty cool, as you really got the sense she was looking at you.
>>
>>
>>
>> Anyway, it seems that Mark has this code working again, or almost
>> working? A related gotcha: some of the camera transforms in Blender
>> needed to be adjusted to accurately reflect that you sit about an
>> arm's length away from your computer screen, which is small on laptops but
>> big on desktops, etc., so eye tracking didn't work right if all these
>> dimensions weren't accounted for. It was kind of tricky to get it all
>> right.  But when it worked, it was really cool and even spine-tingling.
>>
>>
>>
>> What about face recognition? This too worked, in a limited setting: she
>> could recognize a handful of faces, and pull out the names of those people
>> from a database.  There are then three questions: how did it work back
>> then, how can it be made to work in the short term, and what is the correct
>> long-term architecture?
>>
>>
>>
>> First part: "how did it work back then"? See
>> https://github.com/opencog/ros-behavior-scripting The code might be
>> bit-rotted, but it worked. (There was some radical meatball surgery towards
>> the end; this might need to be revisited.)  The general philosophy, back
>> then, was that:
>>
>> * The 3D locations of objects (such as faces) would be stored in the
>> opencog "spacetime server".
>>
>> * The only reason to do this was so that there could be an API for verbal
>> propositions: near, far, next to, behind, in front of, to the left of, etc.
>> that the language subsystem could use. That API was never built.
>>
>> * The AtomSpace would hold all information about everything, e.g. face
>> #135 is actually Ben, who is NN years old, lives in YY, loves robots, and is
>> standing "next to" David (as reported by the space-server)
>>
>> * Why the AtomSpace? Because it's the obvious place where current sensory
>> info – sight & sound – can be integrated in with long-term knowledge and
>> memories, as well as the dialog/language subsystem, as well as controlling
>> movement and behaviour (turn left, right, blink and smile..)
>>
>> * Unfortunately, integrating the senses together with the background
>> knowledge is hard. It was done in an ad hoc manner, it was
>> under-documented, hard to use, hard to understand.  An adequate framework
>> was never developed. This is not something one college student can knock
>> out in a few weeks. The foundation for that framework is in the
>> ros-behavior-scripting git repo. Fragments are in other places, I'd have to
>> dig them up.
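The never-built verbal-proposition API from the bullets above could start as simply as distance and direction predicates over the stored 3D positions. The 1-meter "near" threshold and the camera-centric axes here are arbitrary assumptions:

```python
# Sketch of the spatial-proposition API described above: simple
# predicates over 3D (x, y, z) positions in meters. The "near"
# threshold and camera-centric axes are arbitrary assumptions.
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(a, b)

def near(a, b, threshold=1.0):
    """True if the two objects are within the (assumed) 1 m threshold."""
    return distance(a, b) <= threshold

def left_of(a, b):
    """True if a is to the camera's left of b (x axis points right)."""
    return a[0] < b[0]
```

A language subsystem could then ground phrases like "next to" or "to the left of" in calls like these against the spacetime server's coordinates.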
>>
>>
>>
>> So ... back to the question: face recognition:  Sure. Whatever. If you
>> have a module that can recognize faces, then sure, whatever, have it
>> forward that info to the AtomSpace.  That's the easy part.  The hard part
>> is to integrate it into the speech subsystem.  So, when a new person
>> appears in front of the camera, and says "Hi, my name is Mark", something
>> has to extract the word "Mark", realize that "Mark" is someone's name,
>> understand that there is probably a real-time correlation between that name
>> and what the camera is seeing, take a snapshot of what the camera is
>> seeing, and permanently tag that image with the name "Mark". To remember
>> it. So that, minutes later, when Mark leaves the room and comes back, or
>> months later, after a reboot, Eva still remembers what Mark looks like, as
>> well as his favorite color, sports-team, childhood hero, mother's maiden
>> name, last four digits of his soc sec and bank account #.
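That extract-the-name, tag-the-snapshot, remember-it chain can be sketched end to end. The phrase pattern and the JSON store below are illustrative stand-ins, not a proposed AtomSpace design:

```python
# End-to-end sketch of the chain described above: pull a name out of an
# utterance, tag the current camera snapshot with it, and persist the
# association so it survives a reboot. The phrase pattern and JSON store
# are illustrative stand-ins, not a proposed AtomSpace design.
import json
import re

NAME_PATTERN = re.compile(r"my name is\s+([A-Za-z]+)")

def extract_name(utterance):
    """Return the spoken name, or None if the phrase doesn't appear."""
    m = NAME_PATTERN.search(utterance)
    return m.group(1) if m else None

def remember(store_path, utterance, snapshot_file):
    """Tag the snapshot with the spoken name and persist the mapping."""
    name = extract_name(utterance)
    if name is None:
        return None
    try:
        with open(store_path) as f:
            people = json.load(f)
    except FileNotFoundError:
        people = {}
    people.setdefault(name, []).append(snapshot_file)
    with open(store_path, "w") as f:
        json.dump(people, f)
    return name
```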
>>
>>
>>
>> I think all that is doable, and there are many different ways of doing
>> the above, from quick short hacks to complicated theoretically-correct
>> approaches ... but .. this email is too long, so, let me leave it at that.
>>
>>
>>
>> -- Linas
>>
>>
>>
>> On Fri, Feb 25, 2022 at 8:16 AM Mark Wigzell <[email protected]>
>> wrote:
>>
>> Hi folks, my subject stems from having recently done a deep-dive into the
>> pi_vision implementation. The original face detection and tracking was
>> rusted, so I revamped it. In doing so I added in a hook for eventually
>> augmenting the "new_face" message with some face recognition. I was
>> informed that rather than splicing in some face detection algorithm at the
>> pi_vision level, the "vision" would be to have the image elements reach the
>> atomspace, and thus allow recognition to occur at a more basic level.
>>
>>
>>
>> Therefore, pursuant to the above, I'm asking for a high level description
>> of how AGI vision could be accomplished. Perhaps we can also address
>> the question of why face detection and tracking are "ok" but face
>> recognition is not? Maybe all processing should be done at a lower level?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "opencog" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com
>> <https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>>
>>
>> --
>>
>> Patrick: Are they laughing at us?
>>
>> Sponge Bob: No, Patrick, they are laughing next to us.
>>
>>
>>
>>
>>
>>
>
>
