On Fri, Feb 25, 2022 at 6:41 PM xanatos xanatos.com <[email protected]> wrote:
> But again, I haven't really done much in about a year, partially because I am seeing tech SOAR past what I can do on my own. It's a little disheartening.

Search the term "soft programming". Basically, it's about how to harness some of that whiz-bang-ness into a framework that you can treat as tinker-toy building blocks. But yeah, individuals cannot compete against either sharply focused startups, or against the giant corporations. It takes money, time, coordination, and a vision for how to do things. When something becomes economically important, the tinkerers are left behind.

I'm tinkering with stuff that the start-ups and the big companies don't yet know how to do. I'm interested in common-sense reasoning. This leaves me in calm, placid backwaters where no one is paying attention, and the stress levels are low.

> If you ever think I can help – let me know…

This is between you and Mark. He's taken an interest in modernizing the old infrastructure, and that is definitely a worthwhile task. If you think some parts can be swapped out for better parts, go for it. I'm busy with my project(s) above, and so can't really do any coding. I can act as a question-answering machine, though, and explain how it all used to work.

For the future, I'd like to see something that is modular and documented, a collection of "tinker-toy" parts that people can assemble and re-assemble for personal projects. Despite Ameca and GPT-3 and Boston Dynamics, I still think there's plenty of space for tinkerers. What's missing are the tinker-tools. For example, if Lego Mindstorms had been open-source, wow, things could have been different. Lego Mindstorms was one of the great missed opportunities. Capitalism seems to fail whenever a broader common good is needed. That's why I'm into open source.

So, then, you understand the general architecture, the general requirements.
How can all this be packaged as a kit, with a set of instructions, a put-it-together-and-it-works type system? I think that's the goal. I mean, it's easy enough to make things work in a narrow sense: just hard-wire everything together. It's a lot harder to make it modular, so it can be adapted for different uses.

--linas

> Dave

> *From:* [email protected] <[email protected]> *On Behalf Of *Linas Vepstas
> *Sent:* Friday, February 25, 2022 7:07 PM
> *To:* opencog <[email protected]>
> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

> Hi Dave,

> Thank you for that nice note! I want to splice in some comments with my own real-world experience ....

> On Fri, Feb 25, 2022 at 3:42 PM xanatos xanatos.com <[email protected]> wrote:

> Not sure if this is cogent since my application is autonomous robots in actual hardware, but maybe useful…

> I used OpenCV with a carrier board ("StereoPi") for the Raspberry Pi Compute Module that breaks out both camera ports on the Pi. I automated face recognition with code that leveraged OpenCV that I came to find from one Adrian Rosebrock (pyimagesearch.com) that employed Haar Cascades to determine there was a face present.

> The code we used also rested on a Haar cascade. It "worked great" if you were in conventional office lighting, and faced the camera squarely. It failed if you turned quarter-face, or showed a profile. It failed if your office had windows, and the shade wasn't drawn. It failed in direct sunlight. Outdoors. In stage demo and trade-show lighting conditions. We considered a medical-training robot application, where the first responder would be kneeling over the robot-dummy, and so their face would be at right-angles to the camera. The Haar cascade can't do that. (We never did find a better solution, either, at least while I was there.)
> The Haar cascade was able to measure the distance between the eyes, and thus able to estimate the distance to the face, and thus able to get the parallax right when steering the robot eyes to focus on the right spot. (The two eyes in Blender move automatically, so in principle, you could have a cross-eyed animation, or a roll-your-eyes animation, but we never did that.) The depth was noisy. We used an alpha-beta filter to smooth out the jitter.

> I've heard vague intimations that neural nets can do better, but if so, I suspect all available systems are proprietary and expensive (and wouldn't run on a Pi, anyway). I do have some general ideas on how to improve on this situation, but it would blow up this email.

> Once a face is detected, it sends the center-of-face data to another Pi (the robots have three Pis in them – "cores" – a vision acquisition "core", a language "core" and a vision processing "core"). The vision processing core (depending on the state the robot is in) takes this face positioning data, chews on it and sends the corresponding servo signals to the motor core that controls the head and eyes, and the robot follows you with its gaze and head movements. So in theory, face **detection** and tracking are always functionally available, but may be overridden/ignored by other behavioral commands/statuses.

> The language processing side of things is always listening (I use Python speech recognition with PocketSphinx as the recognizer, which works surprisingly well).

> I never experimented directly with this, but everyone turned up their noses at this, and opted for a real-time internet connection to Google speech. In retrospect, I'm wondering if this is because all the developers either had a heavy foreign accent, or had a habit of slurring their speech and/or mumbling. At any rate, trade-show floors are problematic, what with the sonic assault of neighboring booths.
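The alpha-beta filter mentioned above for smoothing the jittery depth estimate is simple enough to sketch in a few lines of Python. The gains here are illustrative, not the original tuning:

```python
# A minimal alpha-beta filter: predict position from the current velocity
# estimate, then correct both position and velocity by a fraction of the
# residual between the new measurement and the prediction.
def alpha_beta_filter(measurements, dt=1.0, alpha=0.3, beta=0.01):
    x, v = measurements[0], 0.0   # initial position and velocity estimates
    smoothed = []
    for z in measurements:
        x_pred = x + v * dt       # predict one step ahead
        r = z - x_pred            # residual: measurement minus prediction
        x = x_pred + alpha * r    # correct the position estimate
        v = v + (beta / dt) * r   # correct the velocity estimate
        smoothed.append(x)
    return smoothed

# Noisy distance readings around a true value of 100 cm stay near 100
# after smoothing, instead of jumping frame to frame.
readings = [100, 103, 97, 101, 99, 102, 98, 100, 101, 99]
print(alpha_beta_filter(readings)[-1])
```

Small `alpha` and `beta` favor smoothness over responsiveness; for a face that actually walks toward the camera, the gains would be tuned up.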
> Questions from the audience via microphones are also a problem, although there, you could get a direct audio cable from the mixing board that the stage techs were running.

> The point here is that in natural settings, audio quality is an issue. I'm not aware of the current state-of-the-art with regards to neural nets. I suspect that, again, the solutions are proprietary, expensive, and don't run on a Pi. But I dunno, I'm pretty much 100% totally unplugged from that world.

> and now has several hundred routines it can engage depending on what it hears, and some conflict resolution and buffering code in case responses to one phrase would interfere with ongoing responses playing out).

> The system is set up so that if I use a phrase like "my name is", or "I'd like to introduce you to"

> We had three versions. One was to feed text into AIML. There's an AIML-to-AtomSpace converter. It worked as well as "native" AIML chatbots, except that it took several minutes on startup to load the database. That was almost fatal.

> It's easy, "trivial", to write custom response rules in AIML. If I recall the syntax, it would be something like "PATTERN: my name is *" "RESPONSE: pleased to meet you $star-1".

> The second was ChatScript. That bypassed the atomspace entirely.

> The third was a ChatScript-inspired domain-specific language called "ghost". The intent was that authors would be able to write rules such as "RESPONSE: pleased to meet you $star-1 BLINK GAZE-AT $star-1 BLINK SMILE". I guess it worked. I never saw a working demo. The actual authors were drama students, with no software experience: they felt it was "difficult programming". They were used to type-written scripts for TV shows, and if it wasn't done on a word-processor, it was "programming". This was tough. Only one person was good at this, Audrey LeeAnn Brown, and she had a background in C++. And I don't think she liked ghost.
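The AIML-style wildcard rule quoted above ("PATTERN: my name is *" → "RESPONSE: pleased to meet you $star-1") can be mimicked with a plain regex table. This is a toy stdlib-only sketch, not AIML or ghost; the rules are invented for illustration:

```python
# A toy pattern -> response matcher in the spirit of an AIML category:
# each rule pairs a regex (with a capture group playing the role of the
# AIML "*" wildcard) with a response template.
import re

RULES = [
    (re.compile(r"my name is (\w+)", re.IGNORECASE),
     "Pleased to meet you, {star}."),
    (re.compile(r"i'd like to introduce you to (\w+)", re.IGNORECASE),
     "Hello, {star}!"),
]

def respond(utterance):
    """Return the first matching rule's response, or None."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(star=m.group(1))
    return None

print(respond("Hi, my name is Mark"))  # Pleased to meet you, Mark.
```

Real AIML adds normalization, topic stacks, and recursion (`<srai>`), but the pattern/template core is essentially this table lookup.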
> I think there were some PhD students who did manage to get something going for LovingAI. But I think they too side-stepped the complexity.

> I later saw a demo from a game company. It was actually fairly impressive: they had developed a GUI that allowed game designers to drag-n-drop their way through directed NPC interactions. Basically, the NPC is trying to tell the player to go to this-n-such spaceport and meet some sketchy space-pirate to get gold, weapons, etc. The dialog tree automated a lot of the low-level interaction, yet allowed fine-grained control. In this sense, the GUIs that have been developed for games are light-years beyond what you can do with AIML or ChatScript; the main problem is that they're expensive, proprietary, and have lots of core issues that would need to be fixed to apply them to robots.

> Open source is great for operating systems, compilers and databases. Not so much for everything else.

> (and several similar phrases that are recognized by a fuzzy-logic kind of similarity finder I wrote), **AND** it can tell a face is present, it can filter out the name given, if any. Then a few things happen – first, the language processor confirms the name by speaking "Hello <name> - did I get that right?" and listens for a variety of words that are either affirming or denying.

> If you're walking that path, .. well, this is what AIML is really good at. Or, I guess, ghost?

> On affirmation, the system immediately begins taking snapshots every 10 frames and stores them in a folder (the new faces dataset) of the person's name plus the date and time as a numeric string (Dave-202202251623, for example).
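One plausible way to build the "fuzzy-logic kind of similarity finder" mentioned above is with the standard library's difflib. The trigger phrases and threshold here are assumptions for illustration, not Dave's actual code:

```python
# Fuzzy trigger-phrase matching with difflib: score the utterance against
# each known trigger phrase and accept the best match above a threshold,
# so that slightly mis-recognized speech still fires the right routine.
from difflib import SequenceMatcher

TRIGGERS = ["my name is", "i'd like to introduce you to", "this is"]

def best_trigger(utterance, threshold=0.6):
    """Return the trigger phrase most similar to the utterance, if any."""
    scored = [(SequenceMatcher(None, utterance.lower(), t).ratio(), t)
              for t in TRIGGERS]
    score, phrase = max(scored)
    return phrase if score >= threshold else None

# A speech recognizer that hears "my name iz" still lands on the right rule.
print(best_trigger("my name iz"))  # my name is
```

`SequenceMatcher.ratio()` is cheap and dependency-free, which matters on a Pi; heavier options (phonetic matching, embeddings) trade CPU for robustness.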
> Once either the person exits the view for more than 100 frames (would-be 10 snapshots) or the system gains 100 actual face snapshots, it hands off those images to another of the scripts from Adrian Rosebrock (encode_faces.py) that encodes the faces and turns the whole bunch into a pickle, which is then appended to the bigger pickle that all the other known faces are in… The name and data are also written to the database of "people known", where additional data is written over time as interactions with that person accrue.

> So I'm not sure if this answers your question about integrating it into the speech subsystem – I basically have the audio input and processing, audio output, and visual input and processing all running in parallel on separate physical SBCs, which all talk to each other via ZeroMQ (or PyZMQ, specifically).

> The point of using ROS was that it allowed everything to be "modular", at least in theory. That you could replace one subsystem by another. Much easier said than done.

> ROS uses UDP to "talk". For ROS2, they thought about using ZMQ but rejected it in favor of something else. I forget what.

> It works very well, reasonably fast (especially given it only runs on Pi 4/8gig SBCs), and provides people interacting with it the unmistakable feeling that the robot sees them, responds to their movements and speech, etc., and remembers them.

> Moore's law.

> So, ahh, one person who should have known better ordered the best, highest-resolution webcams they could find. 1280x1024 or something. You could only plug two of them into a USB hub before the USB hub was overwhelmed. And the CPU attached to that could barely keep up with the frame rate. Despite this obvious hardware-fail, there was tremendous resistance to down-scaling to a far more practical 640x480. Add to that a power, heat and cooling budget. Ugh.

> Managing engineers is like herding cats. Or pushing rope.
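The "appended to the bigger pickle" step described above can be sketched with the standard library alone. The dict layout and file name here are hypothetical; the real pipeline stores the 128-d embeddings produced by encode_faces.py:

```python
# Merge freshly captured face encodings for one person into the on-disk
# pickle of all known faces, creating the file on first use.
import os
import pickle

def append_encodings(path, name, new_encodings):
    """Merge new face encodings for `name` into the pickle at `path`."""
    known = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            known = pickle.load(f)          # load the existing database
    known.setdefault(name, []).extend(new_encodings)
    with open(path, "wb") as f:
        pickle.dump(known, f)               # rewrite with the merged data
    return known

# Enroll two (dummy) encodings under the snapshot-folder naming scheme.
db = append_encodings("known_faces_demo.pickle", "Dave-202202251623",
                      [[0.1, 0.2], [0.3, 0.4]])
print(len(db["Dave-202202251623"]))
```

Rewriting the whole pickle on every enrollment is fine at a few hundred faces; it also makes the "silently refresh aging encodings" fix mentioned later a matter of replacing a person's list rather than appending to it.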
> Something like that.

> The drawback that I haven't done anything with in the past year or so, but has a relatively easy fix – is that the pickle data for a given person ages (my grandkids are no longer reliably recognized, since they were 3 and 5 when I first implemented that build, and they are 6 and 8 now) – so I need to add a routine that occasionally updates the images silently in the background in the recognition pickle, to keep up with changes… but I've not had the time I wanted to do these things…

> For more-or-less all of the performances, there was a robot operator who sat in the audience, monitoring the system in case it went haywire, over-riding any responses that were inappropriate. Putting together a good GUI that allowed the robot operator to do this, running on a tablet, is non-trivial. (It was a website, with assorted javascript attached to various bits and pieces of the processing pipeline.)

> For pretty much anything non-trivial running in the atomspace, one needs some kind of visualization GUI to see what's going on. We do not have one. I personally use printf for everything, because I can. But it's not, umm, usable by anyone else.

> If any of this gives you anything useful to pick from, I can get you code, original source and my custom stuff. It's all Python, so I'm guessing you should be good with that.

> Dave

> *From:* [email protected] <[email protected]> *On Behalf Of *Linas Vepstas
> *Sent:* Friday, February 25, 2022 3:33 PM
> *To:* opencog <[email protected]>
> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

> Hi Mark,

> Preface for anyone else reading this: Mark is dusting off the old Hanson Robotics code for Eva. One of the subsystems was face-tracking. When your webcam was calibrated correctly, then Eva had this uncanny ability to look at you from out of the screen: her eyes would track your position.
> It was really pretty cool, as you really got the sense she was looking at you.

> Anyway, it seems that Mark has this code working again, or almost working? A related gotcha: some of the camera-transforms in Blender needed to be adjusted, to accurately reflect that you sit about an arm's-length away from your computer screen, which is small on laptops but big on desktops, etc., so eye tracking didn't work right if all these dimensions weren't accounted for. It was kind of tricky to get it all right. But when it worked, it was really cool and even spine-tingling.

> What about face recognition? This too worked, in a limited setting: she could recognize a handful of faces, and pull out the names of those people from a database. There are then three questions: how did this work, back then; how can it be made to work in the short term; and what is the correct long-term architecture?

> First part: "how did it work back then"? See https://github.com/opencog/ros-behavior-scripting The code might be bit-rotted, but it worked. (There was some radical meatball surgery towards the end; this might need to be revisited.) The general philosophy, back then, was that:

> * The 3D locations of objects (such as faces) would be stored in the opencog "spacetime server".

> * The only reason to do this was so that there could be an API for verbal propositions: near, far, next to, behind, in front of, to the left of, etc. that the language subsystem could use. That API was never built.

> * The AtomSpace would hold all information about everything, e.g. face #135 is actually Ben who is NN years old, lives in YY, loves robots, and is standing "next to" David (as reported by the space-server).

> * Why the AtomSpace?
> Because it's the obvious place where current sensory info: sight & sound, can be integrated with long-term knowledge and memories, as well as the dialog/language subsystem, as well as controlling movement and behaviour (turn left, right, blink and smile..)

> * Unfortunately, integrating the senses together with the background knowledge is hard. It was done in an ad hoc manner; it was under-documented, hard to use, hard to understand. An adequate framework was never developed. This is not something one college student can knock out in a few weeks. The foundation for that framework is in the ros-behavior-scripting git repo. Fragments are in other places; I'd have to dig them up.

> So ... back to the question: face recognition: Sure. Whatever. If you have a module that can recognize faces, then sure, whatever, have it forward that info to the AtomSpace. That's the easy part. The hard part is to integrate it into the speech subsystem. So, when a new person appears in front of the camera, and says "Hi, my name is Mark", something has to extract the word "Mark", realize that "Mark" is someone's name, understand that there is probably a real-time correlation between that name and what the camera is seeing, take a snapshot of what the camera is seeing, and permanently tag that image with the name "Mark". To remember it. So that, minutes later, when Mark leaves the room and comes back, or months later, after a reboot, Eva still remembers what Mark looks like, as well as his favorite color, sports-team, childhood hero, mother's maiden name, last four digits of his soc sec and bank account #.

> I think all that is doable, and there are many different ways of doing the above, from quick short hacks to complicated theoretically-correct approaches ... but .. this email is too long, so, let me leave it at that.
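The name-grounding loop described above (extract "Mark", correlate it with the face currently in view, tag that face permanently) can be sketched as a tiny in-memory stand-in. Everything here is hypothetical; the real design would ground this knowledge in the AtomSpace rather than Python dicts:

```python
# A deliberately tiny stand-in for name grounding: when an introduction is
# heard while a face is in view, bind the face ID to the name and open a
# long-term record, so recognition survives the person leaving and returning.
import re

class PersonMemory:
    def __init__(self):
        self.by_face = {}   # face_id -> name
        self.profile = {}   # name -> facts accrued over time

    def hear(self, utterance, face_in_view):
        m = re.search(r"my name is (\w+)", utterance, re.IGNORECASE)
        if m and face_in_view is not None:
            name = m.group(1)
            self.by_face[face_in_view] = name   # tag the face with the name
            self.profile.setdefault(name, {})   # start the long-term record
            return f"Nice to meet you, {name}!"
        return None

    def recognize(self, face_id):
        return self.by_face.get(face_id)

mem = PersonMemory()
mem.hear("Hi, my name is Mark", face_in_view=135)
print(mem.recognize(135))  # Mark
```

The hard parts the email points at (persistence across reboots, fusing this with the dialog system, knowing that "Mark" is a name at all) are exactly what this sketch leaves out.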
> -- Linas

> On Fri, Feb 25, 2022 at 8:16 AM Mark Wigzell <[email protected]> wrote:

> Hi folks, my subject stems from having recently done a deep-dive into the pi_vision implementation. The original face detection and tracking was rusted, so I revamped it. In doing so I added in a hook for eventually augmenting the "new_face" message with some face recognition. I was informed that rather than splicing in some face detection algorithm at the pi_vision level, the "vision" would be to have the image elements reach the atomspace, and thus allow recognition to occur at a more basic level.

> Therefore, pursuant to the above, I'm asking for a high-level description of how AGI vision could be accomplished. Perhaps we can also address the question of why face detection and tracking are "ok" but face recognition is not? Maybe all processing should be done at a lower level?

> --
> You received this message because you are subscribed to the Google Groups "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com

> --
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.
