> Actions it can perform are to walk the directory tree; but why? Is it
> somehow "fun" for the agent to walk directories and look at files? What
> should it do next?
>
Very interesting questions.

> The twist is that these data descriptors are being written as Link Grammar
> (LG) connector types.

I believe there is a correspondence between types and grammars. I have worked
out a typed framework (the implementation is in progress) which uses a kind of
unrestricted grammar instead of traditional types.

IvanV

On Mon, 29 Apr 2024 at 04:28, Linas Vepstas <[email protected]> wrote:

> For your disbelief and general entertainment: a new project exploring what
> perception and action are, and how this could be integrated with Atomese
> agents:
>
> https://github.com/opencog/sensory
>
> The README has more:
>
> Sensory Atomese
> ===============
> This repo explores how perception and action within an external world
> might work with the [AtomSpace](https://github.com/opencog/atomspace).
>
> TL;DR: Explores philosophical approaches to perception & action via
> actual working code using low-level AtomSpace sensory I/O Atoms.
> The experimental lab for this is "perceiving" filesystem files,
> "moving" through directories, and likewise for IRC chat streams.
>
> Philosophical Overview
> ----------------------
> The issue for any agent is being able to perceive the environment that
> it is in, and then being able to interact with this environment.
>
> For OpenCog, and, specifically, for OpenCog Atomese, all interaction,
> knowledge, data and reasoning is represented with and performed by
> Atoms (stored in a hypergraph database) and Values (transient data
> flowing through a network defined by Atoms).
>
> It is not hard to generate Atoms, flow some Values around, and perform
> some action (control a robot, say some words). The hard part is to
> (conceptually, philosophically) understand how an agent can represent
> the external world (with Atoms and Values), and how it should go about
> doing things. The agent needs to perceive the environment in some way
> that results in an AtomSpace representation that is easy for agent
> subsystems to work with.
> This perception must include the agent's
> awareness of the different kinds of actions it might be able to perform
> upon interacting with the external world. That is, before one can
> attempt to model the external world, one must be able to model one's own
> ability to perform action. This is where the boundary between the agent
> and the external world lies: the boundary of the controllable.
>
> Traditional human conceptions of senses include sight and hearing;
> traditional ideas of action consist of moving (robot) limbs. See, for
> example, the git repo [opencog/vision](https://github.com/opencog/vision)
> for OpenCV-based vision Atomese. (Note: It is at version 0.0.2)
>
> The task being tackled here is at once much simpler and much harder:
> exploring the unix filesystem, and interacting via chat. This might
> sound easy, trivially easy, even, if you're a software developer.
> The hard part is this: how does an agent know what a "file" is?
> What a "directory" is? Actions it can perform are to walk the directory
> tree; but why? Is it somehow "fun" for the agent to walk directories
> and look at files? What should it do next? Read the same file again,
> or maybe try some other file? Will the agent notice that maybe some
> file has changed? If it notices, what should it do? What does it mean,
> personally, to the agent, that some file changed? Should it care? Should
> it be curious?
>
> The agent can create files. Does it notice that it has created them?
> Does it recognize those files as works of its own making? Should it
> read them, and admire the contents? Or perform some processing on them?
> Or is this like eating excrement? What part of the "external world"
> (the filesystem) is perceived to be truly external, and what part is
> "part of the agent itself"? What does it mean to exist and operate in
> a world like this? What's the fundamental nature of action and
> perception?
>
> When an agent "looks at" a file, or "looks at" the list of users on
> a chat channel, is this an action, or a perception? Both, of course:
> the agent must make a conscious decision to look (take an action) and
> then, upon taking that action, sense the results (get the text in the
> file or the chat text). After this, it must "perceive" the results:
> figure out what they "mean".
>
> These are the questions that seem to matter, for agent design. The code
> in this git repo is some extremely low-level, crude Atomese interfaces
> that try to expose these issues up into the AtomSpace.
>
> Currently, two interfaces are being explored: a unix filesystem
> interface, and an IRC chat interface. Hopefully, this is broad enough to
> expose some of the design issues. Basically, chat is not like a
> filesystem: there is a large variety of IRC commands, there are public
> channels, there are private conversations. They are bi-directional.
> The kind of sensory information coming from chat is just different from
> the sensory information coming from files (even though, as a clever
> software engineer, one could map chat I/O to a filesystem-style
> interface.) The point here is not to be "clever", but to design
> action-perception correctly. Trying to support very different kinds
> of sensorimotor systems keeps us honest.
>
> Typed Pipes and Data Processing Networks
> ----------------------------------------
> In unix, there is the concept of a "pipe", having two endpoints. A
> pair of unix processes can communicate "data" across a pipe, merely by
> opening each endpoint, and reading/writing data to it. Internet sockets
> are a special case of pipes, where the connected processes are running
> on different computers somewhere on the internet.
>
> Unix pipes are not typed: there is no a priori way of knowing what kind
> of data might come flowing down them. Could be anything.
> For a pair of
> processes to communicate, they must agree on the message set passing
> through the pipe. The current solution to this is the IETF RFCs, which
> are a rich collection of human-readable documents describing datastream
> formats at the meta level. In a few rare cases, one can get a
> machine-readable description of the data. An example of this is the DTD, the
> [Document Type Definition](https://en.wikipedia.org/wiki/Document_type_definition),
> which is used by web browsers to figure out what kind of HTML is being
> delivered (although the DTD is meant to be general enough for "any" use.)
> Other examples include the X.500 and LDAP schemas, as well as SNMP.
>
> However, there is no generic way of asking a pipe "hey mister pipe, what
> are you? What kind of data passes over you?" or "how do I communicate
> with whatever is at the other end of this pipe?" Usually, these
> questions are resolved by some sort of hand-shaking and negotiation
> when two parties connect.
>
> The experiment being done here, in this git repo, in this code-base, is
> to assign a type to a pipe. This replaces the earliest stages of
> protocol negotiation: if a system wishes only to connect to pipes of type
> `FOO`, then it can know what is available a priori, by examining the
> connection types attached to that pipe. If they are
> `BAR+ or FOO+ or BLITZ+`, then we're good: the `or` is a disjunctive-or,
> a menu choice of what is being served on that pipe. Upon opening that
> pipe, some additional data descriptors might be served up, again in the
> form of a menu choice. If the communicating processes wish to exchange
> text data, they will eventually find `TEXT-` and `TEXT+`, which are two
> connectors stating "I'll send you text data" and "That's great, because
> I can receive text data".
>
> So far, so good. This is just plain-old ordinary computer science, so
> far. The twist is that these data descriptors are being written as Link
> Grammar (LG) connector types.
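
A minimal sketch of the connector menus described above, written in Python
purely for illustration. It is not code from the opencog/sensory repo: the
names `Connector`, `parse_menu` and `offers` are hypothetical, and it assumes
only the simplest mating rule (same type name, opposite `+`/`-` sign), which is
far less than what Link Grammar connectors actually support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """A typed connector such as TEXT+ or TEXT- (hypothetical helper)."""
    kind: str       # type name, e.g. "TEXT"
    direction: str  # "+" or "-"

    def mates_with(self, other: "Connector") -> bool:
        # Assumed rule: two connectors can link only if the type names
        # agree and the signs are opposite.
        return self.kind == other.kind and self.direction != other.direction


def parse_menu(menu: str) -> list[Connector]:
    """Parse a disjunctive menu like 'BAR+ or FOO+ or BLITZ+'."""
    connectors = []
    for token in menu.split(" or "):
        token = token.strip()
        if len(token) < 2 or token[-1] not in "+-":
            raise ValueError(f"malformed connector: {token!r}")
        connectors.append(Connector(token[:-1], token[-1]))
    return connectors


def offers(menu: str, wanted: str) -> bool:
    """True if any choice on the pipe's menu can link to the wanted connector."""
    (want,) = parse_menu(wanted)  # exactly one connector is expected here
    return any(choice.mates_with(want) for choice in parse_menu(menu))
```

Under these assumptions, an agent holding a `FOO-` connector could check
`offers("BAR+ or FOO+ or BLITZ+", "FOO-")` before opening the pipe, and a
`TEXT-` endpoint would mate with a `TEXT+` endpoint but not with another
`TEXT-`.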
>
> Link Grammar is a language parser: given a
> collection of "words", to which a collection of connectors are attached,
> the parser can connect up the connectors to create "links". The linkages
> are such that the endpoints always agree as to the type of the
> connector.
>
> The twist of using Link Grammar to create linkages changes the focus
> from pair-wise, peer-to-peer connections, to a more global network
> connection perspective. A linkage is possible, only if all of the
> connectors are connected, only if they are connected in a way that
> preserves the connector types (so that the two endpoints can actually
> talk to one another.)
>
> This kind of capability is not needed for the Internet, or for
> peer-to-peer networks, which is why you don't see this "in real life".
> That's because humans and sysadmins and software developers are smart
> enough to figure out how to connect what to what, and corporate
> executives can say "make it so". However, machine agents and "bots" are
> not this smart.
>
> So the aim of this project is to create a sensory-motor system, which
> self-describes using Link Grammar-style disjuncts. Each "external world"
> (the unix filesystem, IRC chat, a webcam or microphone, etc.) exposes
> a collection of connectors that describe the data coming from that
> sensor (text, images ...) and a collection of connectors that describe
> the actions that can be taken (move, open, ...) These connector-sets
> are "affordances" to the external world: they describe how an agent can
> work with the sensori-motor interface to "do things" in the external
> world.
>
> Autonomous Agents
> -----------------
> The sensori-motor system is just an interface. In between must lie a
> bunch of data-processing nodes that take "inputs" and convert them to
> "outputs". There are several ways to do this. One is to hand-code,
> hard-code these connections, to create a "stimulus-response" (SRAI)
> type system.
> For each input (stimulus) some processing happens,
> and then some output is generated (response). A second way is to create
> a dictionary of processing elements, each of which can take assorted
> inputs or outputs, defined by connector types. Link Grammar can then be
> used to obtain valid linkages between them. This approach resembles
> electronics design automation (EDA): there is a dictionary of parts
> (resistors, capacitors, coils, transistors ... op amps, filters, ...)
> each able to take different kinds of connections. With guidance from the
> (human) user, the EDA tool selects parts from the dictionary, and hooks
> them up in valid circuits. Here, Link Grammar takes the role of the EDA
> tool, generating only valid linkages. The (human) user still has to
> select the "LG words" or "EDA parts", but LG/EDA does the rest,
> generating a "netlist" (in the case of EDA) or a "linkage" (in the case
> of LG).
>
> What if there is no human to guide parts selection and circuit design?
> You can't just give an EDA tool a BOM (Bill of Materials) and say
> "design some random circuit out of these parts". Well, actually, you
> can, if you use some sort of evolutionary programming system. Such
> systems (e.g. [as-moses](https://github.com/opencog/as-moses)) are able
> to generate random trees, and then select the best/fittest ones for some
> given purpose. A collection of such trees is called a "random forest" or
> "decision tree forest", and, until a few years ago, random forests were
> competitive in the machine-learning industry, equaling the performance
> seen in deep-learning neural nets (DLNN).
>
> Deep learning now outperforms random forests. Can we (how can we) attach
> a DLNN system to the sensori-motor system being described here? Should
> we, or is this a bad idea? Let's review the situation.
>
> * Yes, maybe hooking up DLNN to the sensory system here is a stupid
>   idea. Maybe it's just technically ill-founded, and there are easier
>   ways of doing this.
>   But I don't know; that's why I'm doing these
>   experiments.
>
> * Has anyone ever built a DLNN for electronic circuit design? That is,
>   taken a training corpus of a million different circuit designs
>   (netlists), and created a new system that will generate new
>   electronic circuits for you? I dunno. Maybe.
>
> * Has anyone done this for software? Yes, GPT-4 (and I guess Microsoft
>   Copilot) is capable of writing short sequences of valid software to
>   accomplish various tasks.
>
> * How should one think about "training"? I like to think of LLMs as
>   high-resolution photo-realistic snapshots of human language. What you
>   "see" when you interact with GPT-2 are very detailed models of things
>   that humans have written, things in the training set. What you see
>   in GPT-4 are not just the surface text-strings, but a layer or two
>   deeper into the structure, resembling human reasoning. That is, GPT-2
>   captures base natural language syntax (as a start), plus entities and
>   entity relationships and entity attributes (one layer down, past
>   surface syntax.) GPT-4 goes down one more layer, adequately capturing
>   some types of human reasoning (e.g. analogical reasoning about
>   entities). No doubt, GPT-5 will do an even better job of emulating
>   the kinds of human reasoning seen in the training corpus. Is it
>   "just emulating" or is it "actually doing"? This is where the
>   industry experts debate, and I will ignore this debate.
>
> * DLNN training is a force-feeding of the training corpus down the
>   gullet of the network. Given some wiring diagram for the DLNN,
>   carefully crafted by human beings to have some specific number of
>   attention heads of a certain width, located at some certain depth,
>   maybe in several places, the training corpus is forced through the
>   circuit, arriving at a weight matrix via gradient descent. Just like a
>   human engineer designs an electronic circuit, so a human engineer
>   designs the network to be trained (using TensorFlow, or whatever).
>
> The proposal here is to "learn by walking about". A decade ago, the MIT
> Robotics Lab (and others) demoed randomly-constructed virtual robots
> that, starting from nothing, learned how to walk, run, climb, jump,
> navigate obstacles. The training here is "learning by doing", rather
> than "here's a training corpus of digitized humans/animals walking,
> running, climbing, jumping". There's no corpus of moves to emulate;
> there's no single-shot learning of dance-steps from YouTube videos.
> The robots stumble around in an environment, until they figure out
> how things work, how to get stuff done.
>
> The proposal here is to do "the same thing", but instead of doing it
> in some 3D landscape (Gazebo, Stage/Player, Minecraft...) to instead
> do it in a generic sensori-motor landscape.
>
> Thus, the question becomes: "What is a generic sensori-motor landscape?"
> and "how does a learning system interface to such a thing?" This git
> repo is my best attempt to try to understand these two questions, and to
> find answers to them. Apologies if the current state is underwhelming.
>
> -- Linas
>
> --
> Patrick: Are they laughing at us?
> Sponge Bob: No, Patrick, they are laughing next to us.
>
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/CAHrUA36nfE29duzPNwx0kW4dzhsb44Q%3DSJ8ezEudLb0YLUpRLg%40mail.gmail.com
