Hi Mark,

Preface for anyone else reading this: Mark is dusting off the old Hanson
Robotics code for Eva. One of the subsystems was face tracking. When your
webcam was calibrated correctly, Eva had this uncanny ability to look at
you from out of the screen: her eyes would track your position. It was
pretty cool; you really got the sense that she was looking at you.

Anyway, it seems that Mark has this code working again, or almost working.
A related gotcha: some of the camera transforms in Blender needed to be
adjusted to account for the fact that you sit about an arm's length away
from your computer screen, which is small on laptops but big on desktops,
and so on. Eye tracking didn't work right unless all of these dimensions
were accounted for. It was tricky to get it all right, but when it worked,
it was really cool, even spine-tingling.
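The geometry involved can be sketched roughly. The function names and
numbers below are purely illustrative (not from the actual code), and the
pixel-to-angle mapping uses a small-angle approximation:

```python
import math

def face_angle_from_camera(x_px, frame_width_px, hfov_deg):
    """Horizontal angle of a face off the camera axis, estimated from
    its pixel position and the webcam's horizontal field of view.
    Linear pixel-to-angle mapping: a small-angle approximation."""
    frac = (x_px - frame_width_px / 2) / (frame_width_px / 2)
    return frac * (hfov_deg / 2)

def face_position_m(angle_deg, distance_m):
    """Lateral position of the face in metres, given the viewing
    distance. The same angle maps to very different physical offsets
    at laptop versus desktop distances."""
    return distance_m * math.tan(math.radians(angle_deg))

# A face 25% of the way toward the frame edge, 60-degree webcam:
angle = face_angle_from_camera(480, 640, 60)   # 15 degrees off-axis
offset = face_position_m(angle, 0.6)           # at arm's length, 0.6 m
```

The point is that the avatar's eye rotation depends not just on the pixel
position but on the physical viewing distance and screen size, which is
why those transforms had to be tuned per setup.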

What about face recognition? This too worked, in a limited setting: she
could recognize a handful of faces and pull the names of those people out
of a database. There are then three questions: how did this work back
then, how can it be made to work in the short term, and what is the
correct long-term architecture?

First part: how did it work back then? See
https://github.com/opencog/ros-behavior-scripting The code might be
bit-rotted, but it worked. (There was some radical meatball surgery towards
the end; this might need to be revisited.) The general philosophy, back
then, was that:
* The 3D locations of objects (such as faces) would be stored in the
OpenCog "spacetime server".
* The only reason to do this was so that there could be an API for verbal
propositions (near, far, next to, behind, in front of, to the left of,
etc.) that the language subsystem could use. That API was never built.
* The AtomSpace would hold all information about everything; e.g. face #135
is actually Ben, who is NN years old, lives in YY, loves robots, and is
standing "next to" David (as reported by the space-server).
* Why the AtomSpace? Because it's the obvious place where current sensory
info (sight and sound) can be integrated with long-term knowledge and
memories, as well as with the dialog/language subsystem, and with control
of movement and behaviour (turn left, turn right, blink, smile...).
* Unfortunately, integrating the senses with the background knowledge is
hard. It was done in an ad hoc manner; it was under-documented, hard to
use, and hard to understand. An adequate framework was never developed.
This is not something one college student can knock out in a few weeks.
The foundation for that framework is in the ros-behavior-scripting git
repo. Fragments are in other places; I'd have to dig them up.
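To make the idea above concrete, here is a toy sketch of a single store
where a perceived face ID links to long-term knowledge and to a spatial
relation. This is plain Python, not the actual AtomSpace API; every name
and predicate here is hypothetical:

```python
# A toy knowledge store: predicate name -> list of argument tuples.
atomspace = {}

def assert_fact(predicate, *args):
    """Record a fact, e.g. assert_fact("face-of", "face-135", "Ben")."""
    atomspace.setdefault(predicate, []).append(args)

def query(predicate, subject):
    """Return the tails of every fact whose first argument matches."""
    return [a[1:] for a in atomspace.get(predicate, []) if a[0] == subject]

# Sensory input: the face tracker reports seeing face #135.
assert_fact("sees-face", "camera", "face-135")
# Long-term knowledge: face #135 is Ben, and Ben loves robots.
assert_fact("face-of", "face-135", "Ben")
assert_fact("loves", "Ben", "robots")
# Spatial relation, as the space-server would report it.
assert_fact("next-to", "Ben", "David")
```

With everything in one store, a behavior script can chain from "the
camera sees face #135" to "that's Ben" to "Ben is next to David" without
crossing subsystem boundaries, which is the whole point of funneling it
all into the AtomSpace.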

So ... back to the question of face recognition: sure, if you have a
module that can recognize faces, have it forward that info to the
AtomSpace. That's the easy part. The hard part is to integrate it into
the speech subsystem. When a new person appears in front of the camera
and says "Hi, my name is Mark", something has to extract the word "Mark",
realize that "Mark" is someone's name, understand that there is probably
a real-time correlation between that name and what the camera is seeing,
take a snapshot of what the camera is seeing, and permanently tag that
image with the name "Mark". To remember it. So that, minutes later, when
Mark leaves the room and comes back, or months later, after a reboot, Eva
still remembers what Mark looks like, as well as his favorite color,
sports team, childhood hero, mother's maiden name, and the last four
digits of his Social Security and bank account numbers.
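A minimal sketch of that binding step, with a crude regex standing in for
the real language subsystem and an in-memory dict standing in for
persistent storage; all names here are hypothetical:

```python
import re

def extract_name(utterance):
    """Pull a name out of a self-introduction. A crude placeholder
    for a real language subsystem."""
    m = re.search(r"my name is (\w+)", utterance, re.IGNORECASE)
    return m.group(1) if m else None

# Name -> face snapshot. In a real system this would be persisted
# to disk so the association survives a reboot.
known_faces = {}

def on_new_face(snapshot, utterance):
    """When a new face appears and speaks, bind the spoken name to
    the camera snapshot and remember it; return the name, or None
    if the utterance wasn't a self-introduction."""
    name = extract_name(utterance)
    if name:
        known_faces[name] = snapshot
    return name
```

For example, `on_new_face(jpeg_bytes, "Hi, my name is Mark")` would store
the snapshot under "Mark". The real difficulty, of course, is everything
the regex papers over: deciding that the utterance is a self-introduction
at all, and that it correlates in time with the face the camera is seeing.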

I think all of that is doable, and there are many different ways of doing
it, from quick hacks to complicated, theoretically-correct approaches ...
but this email is too long already, so let me leave it at that.

-- Linas

On Fri, Feb 25, 2022 at 8:16 AM Mark Wigzell <[email protected]> wrote:

> Hi folks, my subject stems from having recently done a deep-dive into the
> pi_vision implementation. The original face detection and tracking was
> rusted, so I revamped it. In doing so I added in a hook for eventually
> augmenting the "new_face" message with some face recognition. I was
> informed that rather than splicing in some face detection algorithm at the
> pi_vision level, the "vision" would be to have the image elements reach the
> atomspace, and thus allow recognition to occur at a more basic level.
>
> Therefore, pursuant to the above, I'm asking for a high level description
> of how AGI vision could be accomplished. Perhaps we can also address
> the question of why face detection and tracking are "ok" but face
> recognition is not? Maybe all processing should be done at a lower level?
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com
> .
>


-- 
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
