Hi Mark,

I am in full agreement with you on genuine AGI.  An architecture needs to be 
created that has no "if-thens," as I call them – it has to start from scratch 
and build its world model in a way that allows it to record "successful" 
interactions with the world (such as receiving food, a.k.a. power recharges), 
and it needs to start from the "servos just randomly twitching" phase.

That said, my expertise has always been in tying many different functions 
together in Python.  I've managed to tie together the visual, auditory and 
movement functions on my robots successfully, but definitely NOT in the 
"neonatal" AGI framework I would have liked.  I was after immediate results and 
short-term successes, which I achieved, but I regret not having put forth the 
effort to produce an "infant AGI".

Even so, I have created the frameworks for such an AGI to develop within.  
The visual acquisition system is robust, with a few hardwired functions 
pertaining to recognizing faces and objects; the auditory processing functions 
are robust and, despite my lagging in SO many other areas, still excel in some 
ways beyond much else out there (with the possible exception of GPT-3, though 
my model is about 1% of GPT-3's size and works well at very specific parts of 
identifying what is being presented to it).  The auditory output is quite 
robust, although devoid of nonverbal utterances; instead, I have created code 
to allow a sort of analog, movement-based expression of unverbalized 
"statements".  And I have a rudimentary facial/head/body movement function 
running, but nothing anywhere near close to the amazing stuff I'm seeing with 
Ameca 
(https://www.cnn.com/videos/business/2021/12/08/humanoid-robot-ameca-lon-orig-tp.cnn)

I consider having this framework in place a perfect "cradle" in which to embed 
the type of AGI we are both interested in, but I am still myself in "AGI 
Infancy".  My goal was to create robots that I could interact with in a useful 
and significant manner (for laughs, one of my goals is to have robots that can 
stack my annual firewood deliveries, lol).

I have other systems functioning in test-bed environments that are non-human in 
structure, but with very sophisticated visual systems that are – apologies – 
intended to drive intelligent wagons with grippers that will autonomously pick 
up pinecones on my property every spring.  We literally get hundreds of 
thousands from the pine groves that line our property, lol.

So no, Mark, I don't believe you're being impractical.  What I believe is 
impractical is the trend toward creating AGI as an "instant adult".  That is 
what I attempted to do, and as a result I have systems that are very impressive 
in a very narrow range of functions.  So long as they are here, on my property, 
in the environments I have trained them on, on the tasks I have trained them 
on, they're pretty cool.  But they'd fail horribly outside of this environment. 
Your vision is what will create robust AGI.

I just hope we survive as a species long enough to see these goals realized…

Let me know if any of this is useful to you,

Dave




From: [email protected] <[email protected]> On Behalf Of Mark 
Wigzell
Sent: Saturday, February 26, 2022 3:24 PM
To: opencog <[email protected]>
Subject: Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

Hi Dave, if you have the impulse to help out, I'm happy to work with you.

On the subject of this thread, I was actually hoping to hear more about how low 
level "AGI" type of perception works. I have no idea. I would be happy to work 
with someone who wants to try hooking up the visual input to something that is 
or could become "intelligent". I see the issue of true artificial vision as 
being one that tries to avoid clever but non-intelligent algorithms in favour 
of something more organic. Surely a vision system must be trained much as a 
baby is trained. Indeed, surely AGI must start off like a baby? (Everything is 
hooked up, but there is no control. Movement is wild, emitted sounds are 
nonsensical, experienced input blends with the general awareness stemming from 
all feedback systems, but no "sense" is being made initially. Intelligence is 
present, hopefully, but not manifesting rationally at this point.) Or am I 
being completely impractical?

On Fri, Feb 25, 2022 at 5:45 PM Linas Vepstas <[email protected]> wrote:


On Fri, Feb 25, 2022 at 6:41 PM xanatos <[email protected]> wrote:
 But again, I haven't really done much in about a year, partially because I am 
seeing tech SOAR past what I can do on my own.  It's a little disheartening.

Search for the term "soft programming". Basically, it's about how to harness 
some of that whiz-bang-ness into a framework that you can treat as tinker-toy 
building blocks.

But yeah, individuals cannot compete against either sharply focused startups, 
or against the giant corporations. It takes money, time, coordination, and a 
vision for how to do things. When something becomes economically important, the 
tinkerers are left behind.

I'm tinkering with stuff that the start-ups and the big companies don't yet know 
how to do. I'm interested in common-sense reasoning. This leaves me in calm, 
placid backwaters where no one is paying attention, and the stress levels are 
low.


If you ever think I can help – let me know…

This is between you and Mark.  He's taken an interest in modernizing the old 
infrastructure, and that is definitely a worthwhile task.  If you think some 
parts can be swapped out for better parts, go for it.  I'm busy with my 
project(s) above, and so can't really do any coding.  I can act as a 
question-answering machine, though, and explain how it all used to work.

For the future, I'd like to see something that is modular and documented and is 
a collection of "tinker-toy" parts that people can assemble and re-assemble for 
personal projects. Despite Ameca and GPT-3 and Boston Dynamics, I still think 
there's plenty of space for tinkerers. What's missing are the tinker-tools. For 
example, if Lego Mindstorms had been open-source, wow, things could have been 
different. Lego Mindstorms was one of the great missed opportunities. 
Capitalism seems to fail whenever a broader common good is needed. That's why 
I'm into open source.

So, then, you understand the general architecture, the general requirements. 
How can all this be packaged as a kit, with a set of instructions, a 
put-it-together-and-it-will-work type of system? I think that's the goal.

I mean, it's easy enough to make things work in a narrow sense. Just hard-wire 
everything together. It's a lot harder to make it modular, so it can be adapted 
for different uses.

--linas



Dave



From: [email protected] <[email protected]> On Behalf Of Linas 
Vepstas
Sent: Friday, February 25, 2022 7:07 PM
To: opencog <[email protected]>
Subject: Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

Hi Dave,

Thank you for that nice note!  I want to splice in some comments with my own 
real-world experience ....

On Fri, Feb 25, 2022 at 3:42 PM xanatos <[email protected]> wrote:
Not sure if this is cogent since my application is autonomous robots in actual 
hardware, but maybe useful…

I used OpenCV with a carrier board ("StereoPi") for the Raspberry Pi Compute 
Module that breaks out both camera ports on the Pi.  I automated face 
recognition with code leveraging OpenCV that I found from one Adrian 
Rosebrock (pyimagesearch.com), which employed Haar cascades to determine 
whether a face was present.

The code we used also rested on a Haar cascade. It "worked great" if you were 
in conventional office lighting, and faced the camera squarely.  It failed if 
you turned quarter-face, or showed a profile. It failed if your office had 
windows, and the shade wasn't drawn. It failed in direct sunlight. Outdoors. In 
stage demo and trade-show lighting conditions.  We considered a 
medical-training robot application, where the first responder would be kneeling 
over the robot-dummy, and so their face would be at right-angles to the camera. 
 The Haar cascade can't do that. (We never did find a better solution, either, 
at least while I was there.)

The Haar cascade was able to measure the distance between the eyes, and thus 
able to estimate the distance to the face, and thus able to get the parallax 
right when steering the robot eyes to focus on the right spot. (The two eyes in 
Blender move automatically, so in principle you could have a cross-eyed 
animation or a roll-your-eyes animation, but we never did that.) The depth was 
noisy. We used an alpha-beta filter to smooth out the jitter.
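The alpha-beta filter itself is only a few lines. Here is a generic sketch in 
Python (the gain values are illustrative, not the ones actually used on the 
robot):

```python
def alpha_beta_filter(measurements, alpha=0.5, beta=0.1, dt=1.0):
    """Smooth a noisy 1-D signal, e.g. a stream of face-depth estimates."""
    x = measurements[0]   # position estimate
    v = 0.0               # velocity estimate
    smoothed = []
    for z in measurements:
        x_pred = x + v * dt        # predict where the face should be now
        r = z - x_pred             # residual between measurement and prediction
        x = x_pred + alpha * r     # correct the position estimate
        v = v + (beta / dt) * r    # correct the velocity estimate
        smoothed.append(x)
    return smoothed
```

Larger alpha tracks the raw measurement more closely; smaller alpha smooths 
harder at the cost of lag, which is the trade-off you tune when steering eyes.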

I've heard vague intimations that neural nets can do better, but if so, I 
suspect all available systems are proprietary and expensive (and wouldn't run 
on a Pi, anyway).  I do have some general ideas on how to improve on this 
situation, but it would blow up this email.

Once a face is detected, it sends the center-of-face data to another Pi (the 
robots have three Pis in them – "cores" – a vision acquisition "core", a 
language "core", and a vision processing "core").  The vision processing core 
(depending on the state the robot is in) takes this face positioning data, 
chews on it and sends the corresponding servo signals to the motor core that 
controls the head and eyes, and the robot follows you with its gaze and head 
movements.  So in theory, face *detection* and tracking are always functionally 
available, but may be overridden/ignored by other behavioral commands/statuses.

The language processing side of things is always listening (I use the Python 
SpeechRecognition library with PocketSphinx as the recognizer, which works 
surprisingly well)

I never experimented directly with this, but everyone turned up their noses at 
this, and opted for a real-time internet connection to google speech.  In 
retrospect, I'm wondering if this is because all the developers either had a 
heavy foreign accent, or had a habit of slurring their speech and/or mumbling. 
At any rate, trade-show floors are problematic, what with the sonic assault of 
neighboring booths. Questions from the audience via microphones are also a 
problem, although there, you could get a direct audio cable from the mixing  
board that the stage techs were running.

The point here is that in natural settings, audio quality is an issue. I'm not 
aware of the current state-of-the-art with regards to neural nets. I suspect 
that, again, the solutions are proprietary, expensive, and don't run on a Pi. 
But I dunno, I'm pretty much 100% totally unplugged from that world.

and now has several hundred routines it can engage depending on what it hears, 
and some conflict resolution and buffering code in case responses to one phrase 
would interfere with ongoing responses playing out).

The system is set up so that if I use a phrase like "my name is", or "I'd like 
to introduce you to"

We had three versions. One was to feed text into AIML. There's an 
AIML-to-AtomSpace converter. It worked as well as "native" AIML chatbots, 
except that it took several minutes on startup, to load the database.  That was 
almost fatal.

It's easy, "trivial", to write custom response rules in AIML. If I recall the 
syntax, it would be something like "PATTERN:my name is *"  "RESPONSE:pleased to 
meet you $star-1"
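For reference, in actual AIML that rule is an XML `<category>` with a 
`<pattern>MY NAME IS *</pattern>` and a `<template>` that echoes `<star/>`. 
The same idea in plain Python is only a few lines (a toy sketch of the 
pattern-matching, not the AIML engine):

```python
import re

# Each rule: (pattern with '*' wildcard, response with '{0}' for the capture).
# The rule below mirrors the half-remembered example in the text.
RULES = [
    ("my name is *", "Pleased to meet you, {0}"),
]

def respond(text):
    """Return the first matching rule's response, or None."""
    for pattern, template in RULES:
        # Turn the AIML-style '*' into a regex capture group.
        rx = re.escape(pattern).replace(r"\*", "(.+)")
        m = re.fullmatch(rx, text.strip(), re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return None
```

The hard part, as described below, is not the matching but wiring the captured 
name into memory and the rest of the behavior pipeline.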

The second was ChatScript. That bypassed the atomspace entirely.

The third was a chat-script-inspired domain-specific language called "ghost".  
The intent was that authors would be able to write rules such as "RESPONSE: 
pleased to meet you $star-1 BLINK GAZE-AT $star-1 BLINK SMILE".  I guess it 
worked. I never saw a working demo.  The actual authors were drama students, 
with no software experience: they felt it was "difficult programming"; they 
were used to typewritten scripts for TV shows, and if it wasn't done on a word 
processor, it was "programming".  This was tough. Only one person was good 
at this, Audrey LeeAnn Brown, and she had a background in C++.  And I don't 
think she liked ghost. I think there were some PhD students who did manage to 
get something going for LovingAI. But I think they too side-stepped the 
complexity.

I later saw a demo from a game company. It was actually fairly impressive: they 
had developed a GUI that allowed game designers to drag-n-drop their way 
through directed NPC interactions.  Basically, the NPC is trying to tell the 
player to go to this-n-such spaceport and meet some sketchy space-pirate to get 
gold, weapons, etc.  The dialog tree automated a lot of the low-level 
interaction, yet allowed fine-grained control.  In this sense, the GUI's that 
have been developed for games are light-years beyond what you can do with AIML 
or ChatScript; the main problem is that they're expensive, proprietary, and 
have lots of core issues that would need to be fixed to apply them to robots.

Open source is great for operating systems, compilers and databases. Not so 
much for everything else.

(and several similar phrases that are recognized by a fuzzy-logic kind of 
similarity finder I wrote), *AND* it can tell a face is present, it can filter 
out the name given, if any.  Then a few things happen – first, the language 
processor confirms the name by speaking "Hello <name> - did I get that right?" 
and listens for a variety of words that are either affirming or denying.
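A fuzzy phrase matcher of that general flavor can be built on the standard 
library's difflib. This is a stand-in sketch, not Dave's actual code (the 
phrase list and cutoff are invented for illustration):

```python
import difflib

# Hypothetical introduction phrases; the real list would be larger.
INTRO_PHRASES = ["my name is", "i'd like to introduce you to",
                 "this is my friend", "meet my"]

def matches_intro(heard, cutoff=0.8):
    """True if the utterance starts with something close to a known
    introduction phrase, tolerating speech-recognition errors."""
    heard = heard.lower()
    for phrase in INTRO_PHRASES:
        prefix = heard[:len(phrase)]
        ratio = difflib.SequenceMatcher(None, prefix, phrase).ratio()
        if ratio >= cutoff:
            return True
    return False
```

The cutoff trades false accepts against missed recognitions; with noisy 
PocketSphinx transcripts you would tune it empirically.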

If you're walking that path, .. well, this is what AIML is really good at.  Or, 
I guess, ghost?

On affirmation, the system immediately begins taking snapshots every 10 frames 
and stores them in a folder (the new faces dataset) of the person's name plus 
the date and time as a numeric string (Dave-202202251623, for example).  Once 
either the person exits the view for more than 100 frames (which would be 10 
snapshots) or the system gains 100 actual face snapshots, it hands off those 
images to another of the scripts from Adrian Rosebrock (encode_faces.py) that 
encodes the faces and turns the whole bunch into a pickle, which is then 
appended to the bigger pickle that all the other known faces are in…  The name 
and data are also written to the database of "people known", where additional 
data is written over time as interactions with that person accrue.
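The append-to-the-big-pickle step is straightforward bookkeeping. Here is a 
sketch of just that part, assuming the {"names": [...], "encodings": [...]} 
layout that pyimagesearch-style scripts use (the function names are mine, not 
Adrian's):

```python
import os
import pickle
from datetime import datetime

def snapshot_label(name):
    # e.g. "Dave-202202251623", matching the naming scheme described above
    return f"{name}-{datetime.now().strftime('%Y%m%d%H%M')}"

def append_encodings(pickle_path, names, encodings):
    """Append newly encoded faces to the master pickle of known faces.

    `names` is one label per encoding; `encodings` would be the 128-d
    vectors produced by the face-encoding script."""
    if os.path.exists(pickle_path):
        with open(pickle_path, "rb") as f:
            data = pickle.load(f)
    else:
        data = {"names": [], "encodings": []}
    data["names"].extend(names)
    data["encodings"].extend(encodings)
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)
    return len(data["names"])
```

The aging problem mentioned later in the thread would be addressed here too: 
periodically re-running this append with fresh snapshots of known people.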

So I'm not sure if this answers your question about integrating it into the 
speech subsystem – I basically have the audio input and processing, audio 
output and visual input and processing all running in parallel on separate 
physical SBCs, which all talk to each other via ZeroMQ (or PyZMQ specifically).
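A minimal pyzmq round-trip between two such "cores" looks like this (a toy, 
single-process sketch using PUSH/PULL over inproc for determinism; the real 
robots would presumably use tcp:// between the SBCs, and the message fields 
here are invented):

```python
import zmq

ctx = zmq.Context.instance()

# Vision-processing "core": pushes face-position messages out.
sender = ctx.socket(zmq.PUSH)
sender.bind("inproc://face-position")   # e.g. tcp://0.0.0.0:5555 between SBCs

# Motor "core": pulls them in and drives the head/eye servos.
receiver = ctx.socket(zmq.PULL)
receiver.connect("inproc://face-position")

sender.send_json({"face_x": 0.31, "face_y": -0.12})
msg = receiver.recv_json()   # blocks until the message arrives
```

send_json/recv_json handle serialization, which keeps each core's Python 
process loosely coupled to the others.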

The point of using ROS was that it allowed everything to be "modular", at least 
in theory. That you could replace one subsystem by another.  Much easier said 
than done.

ROS uses UDP to "talk". For ROS2, they thought about using ZMQ but rejected it 
in favor of something else.  I forget what.

It works very well and reasonably fast (especially given it runs only on Pi 
4/8 GB SBCs), and gives people interacting with it the unmistakable feeling 
that the robot sees them, responds to their movements and speech, etc., and 
remembers them.

Moore's law.

So, ahh, one person who should have known better ordered the best, 
highest-resolution webcams they could find. 1280x1024 or something. You could 
only plug two of them into a USB hub before the USB hub was overwhelmed. And 
the CPU attached to that could barely keep up with the frame rate. Despite this 
obvious hardware-fail, there was tremendous resistance to down-scaling to a far 
more practical 640x480.  Add to that a power, heat and cooling budget. Ugh.

Managing engineers is like herding cats.  Or pushing rope. Something like that.


The drawback that I haven't done anything with in the past year or so, but has 
a relatively easy fix – is that the pickle data for a given person ages (my 
grandkids are no longer reliably recognized since they were 3 and 5 when I 
first implemented that build, and they are 6 and 8 now) – so I need to add a 
routine that occasionally updates the images silently in the background in the 
recognition pickle to keep up with changes… but I've not had the time I wanted 
to do these things…

For more-or-less all of the performances, there was a robot operator who sat in 
the audience, monitoring the system in case it went haywire, over-riding any 
responses that were inappropriate.  Putting together a good GUI that allowed 
the robot operator to do this, running on a tablet, is non-trivial. (It was a 
website, with assorted javascript attached to various bits and pieces of the 
processing pipeline.)

For pretty much anything non-trivial running in the atomspace, one needs some 
kind of visualization GUI to see what's going on.  We do not have one.  I 
personally use printf for everything, because I can. But it's not, umm, usable 
by anyone else.

If any of this gives you anything useful to pick from, I can get you code, 
original source and my custom stuff.  It's all Python, so I'm guessing you 
should be good with that.

Dave


From: [email protected] <[email protected]> On Behalf Of Linas 
Vepstas
Sent: Friday, February 25, 2022 3:33 PM
To: opencog <[email protected]>
Subject: Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

Hi Mark,

Preface for anyone else reading this: Mark is dusting off the old Hanson 
Robotics code for Eva.  One of the subsystems was face-tracking. When your 
webcam was calibrated correctly, then Eva had this uncanny ability to look at 
you from out of the screen: her eyes would track your position. It was really 
pretty cool, as you really got the sense she was looking at you.

Anyway, it seems that Mark has this code working again, or almost working? A 
related gotcha is that some of the camera transforms in Blender needed to be 
adjusted to accurately reflect that you sit about an arm's-length away from 
your computer screen, which is small on laptops but big on desktops, etc., so 
eye tracking didn't work right if all these dimensions weren't accounted for. 
It was kind of tricky to get it all right.  But when it worked, it was really 
cool and even spine-tingling.

What about face recognition? This too worked, in a limited setting: she could 
recognize a handful of faces, and pull out the names of those people from a 
database.  There are then three questions: how did this work back then, how 
can it be made to work in the short term, and what is the correct long-term 
architecture?

First part: "how did it work back then"? See  
https://github.com/opencog/ros-behavior-scripting The code might be bit-rotted, 
but it worked. (There was some radical meatball surgery towards the end; this 
might need to be revisited.)  The general philosophy, back then, was that:
* The 3D locations of objects (such as faces) would be stored in the opencog 
"spacetime server".
* The only reason to do this was so that there could be an API for verbal 
propositions: near, far, next to, behind, in front of, to the left of, etc. 
that the language subsystem could use. That API was never built.
* The AtomSpace would hold all information about everything, e.g. face #135 is 
actually Ben who is NN years old, lives in YY, loves robots, and is standing 
"next to" David (as reported by the space-server)
* Why the AtomSpace? Because it's the obvious place where current sensory info 
(sight & sound) can be integrated with long-term knowledge and memories, as 
well as the dialog/language subsystem, as well as controlling movement and 
behaviour (turn left, right, blink and smile..)
* Unfortunately, integrating the senses together with the background knowledge 
is hard. It was done in an ad hoc manner, it was under-documented, hard to use, 
hard to understand.  An adequate framework was never developed. This is not 
something one college student can knock out in a few weeks. The foundation for 
that framework is in the ros-behavior-scripting git repo. Fragments are in 
other places, I'd have to dig them up.

So ... back to the question: face recognition:  Sure. Whatever. If you have a 
module that can recognize faces, then sure, whatever, have it forward that info 
to the AtomSpace.  That's the easy part.  The hard part is to integrate it into 
the speech subsystem.  So, when a new person appears in front of the camera, 
and says "Hi, my name is Mark", something has to extract the word "Mark", 
realize that "Mark" is someone's name, understand that there is probably a 
real-time correlation between that name and what the camera is seeing, take a 
snapshot of what the camera is seeing, and permanently tag that image with the 
name "Mark". To remember it. So that, minutes later, when Mark leaves the room 
and comes back, or months later, after a reboot, Eva still remembers what Mark 
looks like, as well as his favorite color, sports-team, childhood hero, 
mother's maiden name, last four digits of his soc sec and bank account #.

I think all that is doable, and there are many different ways of doing the 
above, from quick short hacks to complicated theoretically-correct approaches 
... but .. this email is too long, so, let me leave it at that.

-- Linas

On Fri, Feb 25, 2022 at 8:16 AM Mark Wigzell <[email protected]> wrote:
Hi folks, my subject stems from having recently done a deep-dive into the 
pi_vision implementation. The original face detection and tracking was rusted, 
so I revamped it. In doing so I added in a hook for eventually augmenting the 
"new_face" message with some face recognition. I was informed that rather than 
splicing in some face detection algorithm at the pi_vision level, the "vision" 
would be to have the image elements reach the atomspace, and thus allow 
recognition to occur at a more basic level.

Therefore, pursuant to the above, I'm asking for a high level description of 
how AGI vision could be accomplished. Perhaps we can also address the question 
of why face detection and tracking are "ok" but face recognition is not? Maybe 
all processing should be done at a lower level?
--
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com.


--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

