On Fri, Feb 25, 2022 at 6:41 PM xanatos xanatos.com <[email protected]> wrote:
> But again, I haven't really done much in about a year, partially because I am seeing tech SOAR past what I can do on my own. It's a little disheartening.

Search the term "soft programming". Basically, it's about how to harness some of that whiz-bang-ness into a framework that you can treat as tinker-toy building blocks. But yeah, individuals cannot compete against either sharply focused startups, or against the giant corporations. It takes money, time, coordination, and a vision for how to do things. When something becomes economically important, the tinkerers are left behind.

I'm tinkering with stuff that the start-ups and the big companies don't yet know how to do. I'm interested in common-sense reasoning. This leaves me in calm, placid backwaters where no one is paying attention, and the stress levels are low.

> If you ever think I can help – let me know…

This is between you and Mark. He's taken an interest in modernizing the old infrastructure, and that is definitely a worthwhile task. If you think some parts can be swapped out for better parts, go for it. I'm busy with my project(s) above, and so can't really do any coding. I can act as a question-answering machine, though, and explain how it all used to work.

For the future, I'd like to see something that is modular and documented, a collection of "tinker-toy" parts that people can assemble and re-assemble for personal projects. Despite Ameca and GPT-3 and Boston Dynamics, I still think there's plenty of space for tinkerers. What's missing are the tinker-tools. For example, if Lego Mindstorms had been open-source, wow, things could have been different. Lego Mindstorms was one of the great missed opportunities. Capitalism seems to fail whenever a broader common good is needed. That's why I'm into open source.

So, then, you understand the general architecture, the general requirements.
How can all this be packaged as a kit, with a set of instructions, a put-it-together-and-it-works type system? I think that's the goal. I mean, it's easy enough to make things work in a narrow sense: just hard-wire everything together. It's a lot harder to make it modular, so it can be adapted for different uses.

--linas

> Dave

> *From:* [email protected] <[email protected]> *On Behalf Of *Linas Vepstas
> *Sent:* Friday, February 25, 2022 7:07 PM
> *To:* opencog <[email protected]>
> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

> Hi Dave,

> Thank you for that nice note! I want to splice in some comments with my own real-world experience ....

> On Fri, Feb 25, 2022 at 3:42 PM xanatos xanatos.com <[email protected]> wrote:

> Not sure if this is cogent since my application is autonomous robots in actual hardware, but maybe useful…

> I used OpenCV with a carrier board ("StereoPi") for the Raspberry Pi Compute Module that breaks out both camera ports on the Pi. I automated face recognition with code that leveraged OpenCV that I came to find from one Adrian Rosebrock (pyimagesearch.com) that employed Haar Cascades to determine there was a face present.

> The code we used also rested on a Haar cascade. It "worked great" if you were in conventional office lighting, and faced the camera squarely. It failed if you turned quarter-face, or showed a profile. It failed if your office had windows, and the shade wasn't drawn. It failed in direct sunlight. Outdoors. In stage demo and trade-show lighting conditions. We considered a medical-training robot application, where the first responder would be kneeling over the robot-dummy, and so their face would be at right-angles to the camera. The Haar cascade can't do that. (We never did find a better solution, either, at least while I was there.)
> The Haar cascade was able to measure the distance between the eyes, and thus able to estimate the distance to the face, and thus able to get the parallax right when steering the robot eyes to focus on the right spot. (The two eyes in Blender move automatically, so in principle, you could have a cross-eyed animation, or a roll-your-eyes animation, but we never did that.) The depth was noisy. We used an alpha-beta filter to smooth out the jitter.

> I've heard vague intimations that neural nets can do better, but if so, I suspect all available systems are proprietary and expensive (and wouldn't run on a Pi, anyway). I do have some general ideas on how to improve on this situation, but it would blow up this email.

> Once a face is detected, it sends the center-of-face data to another Pi (the robots have three Pis in them – "cores" – a vision acquisition "core", a language "core" and a vision processing "core"). The vision processing core (depending on the state the robot is in) takes this face positioning data, chews on it and sends the corresponding servo signals to the motor core that controls the head and eyes, and the robot follows you with its gaze and head movements. So in theory, face **detection** and tracking are always functionally available, but may be overridden/ignored by other behavioral commands/statuses.

> The language processing side of things is always listening (I use Python speech recognition with PocketSphinx as the recognizer, which works surprisingly well).

> I never experimented directly with this, but everyone turned up their noses at this, and opted for a real-time internet connection to Google speech. In retrospect, I'm wondering if this is because all the developers either had a heavy foreign accent, or had a habit of slurring their speech and/or mumbling. At any rate, trade-show floors are problematic, what with the sonic assault of neighboring booths.
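The alpha-beta filter mentioned above for smoothing the jittery depth estimate is simple enough to sketch in a few lines of Python. The gains here are illustrative, not the original tuning:

```python
# A minimal alpha-beta filter: predict position from the current velocity
# estimate, then correct both position and velocity by a fraction of the
# residual between the new measurement and the prediction.
def alpha_beta_filter(measurements, dt=1.0, alpha=0.3, beta=0.01):
    x, v = measurements[0], 0.0   # initial position and velocity estimates
    smoothed = []
    for z in measurements:
        x_pred = x + v * dt       # predict one step ahead
        r = z - x_pred            # residual: measurement minus prediction
        x = x_pred + alpha * r    # correct the position estimate
        v = v + (beta / dt) * r   # correct the velocity estimate
        smoothed.append(x)
    return smoothed

# Noisy distance readings around a true value of 100 cm stay near 100
# after smoothing, instead of jumping frame to frame.
readings = [100, 103, 97, 101, 99, 102, 98, 100, 101, 99]
print(alpha_beta_filter(readings)[-1])
```

Small `alpha` and `beta` favor smoothness over responsiveness; for a face that actually walks toward the camera, the gains would be tuned up.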
> Questions from the audience via microphones are also a problem, although there, you could get a direct audio cable from the mixing board that the stage techs were running.

> The point here is that in natural settings, audio quality is an issue. I'm not aware of the current state-of-the-art with regards to neural nets. I suspect that, again, the solutions are proprietary, expensive, and don't run on a Pi. But I dunno, I'm pretty much 100% totally unplugged from that world.

> and now has several hundred routines it can engage depending on what it hears, and some conflict resolution and buffering code in case responses to one phrase would interfere with ongoing responses playing out).

> The system is set up so that if I use a phrase like "my name is", or "I'd like to introduce you to"

> We had three versions. One was to feed text into AIML. There's an AIML-to-AtomSpace converter. It worked as well as "native" AIML chatbots, except that it took several minutes on startup to load the database. That was almost fatal.

> It's easy, "trivial", to write custom response rules in AIML. If I recall the syntax, it would be something like "PATTERN: my name is *" "RESPONSE: pleased to meet you $star-1".

> The second was ChatScript. That bypassed the atomspace entirely.

> The third was a ChatScript-inspired domain-specific language called "ghost". The intent was that authors would be able to write rules such as "RESPONSE: pleased to meet you $star-1 BLINK GAZE-AT $star-1 BLINK SMILE". I guess it worked. I never saw a working demo. The actual authors were drama students, with no software experience: they felt it was "difficult programming". They were used to type-written scripts for TV shows, and if it wasn't done on a word-processor, it was "programming". This was tough. Only one person was good at this, Audrey LeeAnn Brown, and she had a background in C++. And I don't think she liked ghost.
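The AIML-style wildcard rule quoted above ("PATTERN: my name is *" → "RESPONSE: pleased to meet you $star-1") can be mimicked with a plain regex table. This is a toy stdlib-only sketch, not AIML or ghost; the rules are invented for illustration:

```python
# A toy pattern -> response matcher in the spirit of an AIML category:
# each rule pairs a regex (with a capture group playing the role of the
# AIML "*" wildcard) with a response template.
import re

RULES = [
    (re.compile(r"my name is (\w+)", re.IGNORECASE),
     "Pleased to meet you, {star}."),
    (re.compile(r"i'd like to introduce you to (\w+)", re.IGNORECASE),
     "Hello, {star}!"),
]

def respond(utterance):
    """Return the first matching rule's response, or None."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(star=m.group(1))
    return None

print(respond("Hi, my name is Mark"))  # Pleased to meet you, Mark.
```

Real AIML adds normalization, topic stacks, and recursion (`<srai>`), but the pattern/template core is essentially this table lookup.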
> I think there were some PhD students who did manage to get something going for LovingAI. But I think they too side-stepped the complexity.

> I later saw a demo from a game company. It was actually fairly impressive: they had developed a GUI that allowed game designers to drag-n-drop their way through directed NPC interactions. Basically, the NPC is trying to tell the player to go to this-n-such spaceport and meet some sketchy space-pirate to get gold, weapons, etc. The dialog tree automated a lot of the low-level interaction, yet allowed fine-grained control. In this sense, the GUIs that have been developed for games are light-years beyond what you can do with AIML or ChatScript; the main problem is that they're expensive, proprietary, and have lots of core issues that would need to be fixed to apply them to robots.

> Open source is great for operating systems, compilers and databases. Not so much for everything else.

> (and several similar phrases that are recognized by a fuzzy-logic kind of similarity finder I wrote), **AND** it can tell a face is present, it can filter out the name given, if any. Then a few things happen – first, the language processor confirms the name by speaking "Hello <name> - did I get that right?" and listens for a variety of words that are either affirming or denying.

> If you're walking that path, .. well, this is what AIML is really good at. Or, I guess, ghost?

> On affirmation, the system immediately begins taking snapshots every 10 frames and stores them in a folder (the new faces dataset) of the person's name plus the date and time as a numeric string (Dave-202202251623, for example).
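One plausible way to build the "fuzzy-logic kind of similarity finder" mentioned above is with the standard library's difflib. The trigger phrases and threshold here are assumptions for illustration, not Dave's actual code:

```python
# Fuzzy trigger-phrase matching with difflib: score the utterance against
# each known trigger phrase and accept the best match above a threshold,
# so that slightly mis-recognized speech still fires the right routine.
from difflib import SequenceMatcher

TRIGGERS = ["my name is", "i'd like to introduce you to", "this is"]

def best_trigger(utterance, threshold=0.6):
    """Return the trigger phrase most similar to the utterance, if any."""
    scored = [(SequenceMatcher(None, utterance.lower(), t).ratio(), t)
              for t in TRIGGERS]
    score, phrase = max(scored)
    return phrase if score >= threshold else None

# A speech recognizer that hears "my name iz" still lands on the right rule.
print(best_trigger("my name iz"))  # my name is
```

`SequenceMatcher.ratio()` is cheap and dependency-free, which matters on a Pi; heavier options (phonetic matching, embeddings) trade CPU for robustness.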
> Once either the person exits the view for more than 100 frames (would-be 10 snapshots) or the system gains 100 actual face snapshots, it hands off those images to another of the scripts from Adrian Rosebrock (encode_faces.py) that encodes the faces and turns the whole bunch into a pickle, which is then appended to the bigger pickle that all the other known faces are in… The name and data are also written to the database of "people known", where additional data is written over time as interactions with that person accrue.

> So I'm not sure if this answers your question about integrating it into the speech subsystem – I basically have the audio input and processing, audio output, and visual input and processing all running in parallel on separate physical SBCs, which all talk to each other via ZeroMQ (or PyZMQ, specifically).

> The point of using ROS was that it allowed everything to be "modular", at least in theory. That you could replace one subsystem by another. Much easier said than done.

> ROS uses UDP to "talk". For ROS2, they thought about using ZMQ but rejected it in favor of something else. I forget what.

> It works very well, reasonably fast (especially given it only runs on Pi 4/8gig SBCs), and provides people interacting with it the unmistakable feeling that the robot sees them, responds to their movements and speech, etc., and remembers them.

> Moore's law.

> So, ahh, one person who should have known better ordered the best, highest-resolution webcams they could find. 1280x1024 or something. You could only plug two of them into a USB hub before the USB hub was overwhelmed. And the CPU attached to that could barely keep up with the frame rate. Despite this obvious hardware-fail, there was tremendous resistance to down-scaling to a far more practical 640x480. Add to that a power, heat and cooling budget. Ugh.

> Managing engineers is like herding cats. Or pushing rope.
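The "appended to the bigger pickle" step described above can be sketched with the standard library alone. The dict layout and file name here are hypothetical; the real pipeline stores the 128-d embeddings produced by encode_faces.py:

```python
# Merge freshly captured face encodings for one person into the on-disk
# pickle of all known faces, creating the file on first use.
import os
import pickle

def append_encodings(path, name, new_encodings):
    """Merge new face encodings for `name` into the pickle at `path`."""
    known = {}
    if os.path.exists(path):
        with open(path, "rb") as f:
            known = pickle.load(f)          # load the existing database
    known.setdefault(name, []).extend(new_encodings)
    with open(path, "wb") as f:
        pickle.dump(known, f)               # rewrite with the merged data
    return known

# Enroll two (dummy) encodings under the snapshot-folder naming scheme.
db = append_encodings("known_faces_demo.pickle", "Dave-202202251623",
                      [[0.1, 0.2], [0.3, 0.4]])
print(len(db["Dave-202202251623"]))
```

Rewriting the whole pickle on every enrollment is fine at a few hundred faces; it also makes the "silently refresh aging encodings" fix mentioned later a matter of replacing a person's list rather than appending to it.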
> Something like that.

> The drawback that I haven't done anything with in the past year or so, but has a relatively easy fix – is that the pickle data for a given person ages (my grandkids are no longer reliably recognized, since they were 3 and 5 when I first implemented that build, and they are 6 and 8 now) – so I need to add a routine that occasionally updates the images silently in the background in the recognition pickle, to keep up with changes… but I've not had the time I wanted to do these things…

> For more-or-less all of the performances, there was a robot operator who sat in the audience, monitoring the system in case it went haywire, over-riding any responses that were inappropriate. Putting together a good GUI that allowed the robot operator to do this, running on a tablet, is non-trivial. (It was a website, with assorted javascript attached to various bits and pieces of the processing pipeline.)

> For pretty much anything non-trivial running in the atomspace, one needs some kind of visualization GUI to see what's going on. We do not have one. I personally use printf for everything, because I can. But it's not, umm, usable by anyone else.

> If any of this gives you anything useful to pick from, I can get you code, original source and my custom stuff. It's all Python, so I'm guessing you should be good with that.

> Dave

> *From:* [email protected] <[email protected]> *On Behalf Of *Linas Vepstas
> *Sent:* Friday, February 25, 2022 3:33 PM
> *To:* opencog <[email protected]>
> *Subject:* Re: [opencog-dev] Vision for pi_vision and AGI/atomspace

> Hi Mark,

> Preface for anyone else reading this: Mark is dusting off the old Hanson Robotics code for Eva. One of the subsystems was face-tracking. When your webcam was calibrated correctly, then Eva had this uncanny ability to look at you from out of the screen: her eyes would track your position.
> It was really pretty cool, as you really got the sense she was looking at you.

> Anyway, it seems that Mark has this code working again, or almost working? A related gotcha: some of the camera-transforms in Blender needed to be adjusted, to accurately reflect that you sit about an arm's-length away from your computer screen, which is small on laptops but big on desktops, etc., so eye tracking didn't work right if all these dimensions weren't accounted for. It was kind of tricky to get it all right. But when it worked, it was really cool and even spine-tingling.

> What about face recognition? This too worked, in a limited setting: she could recognize a handful of faces, and pull out the names of those people from a database. There are then three questions: how did this work, back then; how can it be made to work in the short term; and what is the correct long-term architecture?

> First part: "how did it work back then"? See https://github.com/opencog/ros-behavior-scripting The code might be bit-rotted, but it worked. (There was some radical meatball surgery towards the end; this might need to be revisited.) The general philosophy, back then, was that:

> * The 3D locations of objects (such as faces) would be stored in the opencog "spacetime server".

> * The only reason to do this was so that there could be an API for verbal propositions: near, far, next to, behind, in front of, to the left of, etc. that the language subsystem could use. That API was never built.

> * The AtomSpace would hold all information about everything, e.g. face #135 is actually Ben who is NN years old, lives in YY, loves robots, and is standing "next to" David (as reported by the space-server).

> * Why the AtomSpace?
> Because it's the obvious place where current sensory info: sight & sound, can be integrated with long-term knowledge and memories, as well as the dialog/language subsystem, as well as controlling movement and behaviour (turn left, right, blink and smile..)

> * Unfortunately, integrating the senses together with the background knowledge is hard. It was done in an ad hoc manner; it was under-documented, hard to use, hard to understand. An adequate framework was never developed. This is not something one college student can knock out in a few weeks. The foundation for that framework is in the ros-behavior-scripting git repo. Fragments are in other places; I'd have to dig them up.

> So ... back to the question: face recognition: Sure. Whatever. If you have a module that can recognize faces, then sure, whatever, have it forward that info to the AtomSpace. That's the easy part. The hard part is to integrate it into the speech subsystem. So, when a new person appears in front of the camera, and says "Hi, my name is Mark", something has to extract the word "Mark", realize that "Mark" is someone's name, understand that there is probably a real-time correlation between that name and what the camera is seeing, take a snapshot of what the camera is seeing, and permanently tag that image with the name "Mark". To remember it. So that, minutes later, when Mark leaves the room and comes back, or months later, after a reboot, Eva still remembers what Mark looks like, as well as his favorite color, sports-team, childhood hero, mother's maiden name, last four digits of his soc sec and bank account #.

> I think all that is doable, and there are many different ways of doing the above, from quick short hacks to complicated theoretically-correct approaches ... but .. this email is too long, so, let me leave it at that.
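The name-grounding loop described above (extract "Mark", correlate it with the face currently in view, tag that face permanently) can be sketched as a tiny in-memory stand-in. Everything here is hypothetical; the real design would ground this knowledge in the AtomSpace rather than Python dicts:

```python
# A deliberately tiny stand-in for name grounding: when an introduction is
# heard while a face is in view, bind the face ID to the name and open a
# long-term record, so recognition survives the person leaving and returning.
import re

class PersonMemory:
    def __init__(self):
        self.by_face = {}   # face_id -> name
        self.profile = {}   # name -> facts accrued over time

    def hear(self, utterance, face_in_view):
        m = re.search(r"my name is (\w+)", utterance, re.IGNORECASE)
        if m and face_in_view is not None:
            name = m.group(1)
            self.by_face[face_in_view] = name   # tag the face with the name
            self.profile.setdefault(name, {})   # start the long-term record
            return f"Nice to meet you, {name}!"
        return None

    def recognize(self, face_id):
        return self.by_face.get(face_id)

mem = PersonMemory()
mem.hear("Hi, my name is Mark", face_in_view=135)
print(mem.recognize(135))  # Mark
```

The hard parts the email points at (persistence across reboots, fusing this with the dialog system, knowing that "Mark" is a name at all) are exactly what this sketch leaves out.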
> -- Linas

> On Fri, Feb 25, 2022 at 8:16 AM Mark Wigzell <[email protected]> wrote:

> Hi folks, my subject stems from having recently done a deep-dive into the pi_vision implementation. The original face detection and tracking was rusted, so I revamped it. In doing so I added in a hook for eventually augmenting the "new_face" message with some face recognition. I was informed that rather than splicing in some face detection algorithm at the pi_vision level, the "vision" would be to have the image elements reach the atomspace, and thus allow recognition to occur at a more basic level.

> Therefore, pursuant to the above, I'm asking for a high-level description of how AGI vision could be accomplished. Perhaps we can also address the question of why face detection and tracking are "ok" but face recognition is not? Maybe all processing should be done at a lower level?

> --
> You received this message because you are subscribed to the Google Groups "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CA%2Ba9A7AYNxawVTjbn5sQXp7AjToj1xteyCnCibrBO7TZwDDsSQ%40mail.gmail.com

> --
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.
