[agi] Flexibility of AI vs. a PC
One thing that has been puzzling me for a while is why some people expect an intelligence to be less flexible than a PC. What do I mean by this? A PC can have any learning algorithm, bias or representation of data we care to create. This raises another question: how are we creating a representation if not copying it, in some sense, from our brains? So why do we still create systems that have fixed representations of the external world and fixed methods of learning? Take the development of echolocation in blind people, or the ability to take in visual information from stimulating the tongue. Isn't this sufficient evidence to suggest we should be trying to make our AIs as flexible as the most flexible things we know? Will Pearson - This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?member_id=8660244id_secret=72201582-721bf8
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Interesting. Since I am interested in parsing, I read Collins's paper. It's a solid piece of work (though with the stated error percentages, I don't believe that it really proves anything worthwhile at all) -- but your over-interpretations of it are ridiculous. You claim that "It is actually showing that you can do something roughly equivalent to growing neural gas (GNG) in a space with something approaching 500,000 dimensions, but you can do it without normally having to deal with more than a few of those dimensions at one time." Collins makes no claims that even remotely resemble this. He *is* taking a deconstructionist approach (which Richard and many others would argue vehemently with) -- but that is virtually the entirety of the overlap between his paper and your claims. Where do you get all this crap about 500,000 dimensions, for example? You also make statements that are explicitly contradicted in the paper. For example, you say "But there really seems to be no reason why there should be any limit to the dimensionality of the space in which Collins's algorithm works, because it does not use an explicit vector representation" while his paper quite clearly states "Each tree is represented by an n dimensional vector where the i'th component counts the number of occurrences of the i'th tree fragment." (A mistake I believe you made because you didn't understand the preceding sentence -- or, more critically, *any* of the math). Are all your claims on this list this far from reality if one pursues them?
- Original Message - From: Ed Porter [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Tuesday, December 04, 2007 10:52 PM Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research] The particular NL parser paper in question, Collins's Convolution Kernels for Natural Language (http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/Collins-kernels.pdf), is actually saying something quite important that extends way beyond parsers and is highly applicable to AGI in general. It is actually showing that you can do something roughly equivalent to growing neural gas (GNG) in a space with something approaching 500,000 dimensions, but you can do it without normally having to deal with more than a few of those dimensions at one time. GNG is an algorithm I learned about from reading Peter Voss that allows one to learn how to efficiently represent a distribution in a relatively high dimensional space in a totally unsupervised manner. But there really seems to be no reason why there should be any limit to the dimensionality of the space in which Collins's algorithm works, because it does not use an explicit vector representation, nor, if I recollect correctly, a Euclidean distance metric, but rather a similarity metric, which is generally much more appropriate for matching in very high dimensional spaces. But what he is growing are not just points representing where data has occurred in a high dimensional space, but sets of points that define hyperplanes marking the boundaries between classes. My recollection is that this system learns automatically from both labeled data (instances of correct parse trees) and randomly generated deviations from those instances. His particular algorithm matches tree structures, but with modification it would seem to be extendable to matching arbitrary nets. Other versions of it could be made to operate, like GNG, in an unsupervised manner.
If you stop and think about what this is saying and generalize from it, it provides an important possible component in an AGI tool kit. What it shows is not limited to parsing; it would seem applicable to virtually any hierarchical or networked representation, including nets of semantic web RDF triples, semantic nets, and predicate logic expressions. At first glance it appears it would even be applicable to kinkier net matching algorithms, such as Augmented Transition Network (ATN) matching. So if one reads this paper with a mind not only to what it specifically shows, but to how what it shows could be expanded, this paper says something very important. That is, that one can represent, learn, and classify things in very high dimensional spaces -- such as the 500,000-dimension space mentioned above -- and do it efficiently, provided the part of the space being represented is sufficiently sparsely connected. I had already assumed this before reading the paper, but the paper was valuable to me because it provided mathematically rigorous support for my prior models, and helped me better understand the mathematical foundations of my own prior intuitive thinking. It means that systems like Novamente can deal in very high dimensional spaces relatively efficiently. It does not mean that all processes that can be performed in such spaces will be computationally cheap (for example, combinatorial searches), but it means that many of them, such as GNG like recording of
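For readers who want to see the flavor of what is being claimed, here is a minimal sketch of a Collins-Duffy-style convolution tree kernel (the tree encoding, function names, and simplifications are mine, not the paper's). The kernel counts the subtree fragments two parse trees share, which equals an inner product between two enormous fragment-count vectors -- yet those vectors are never built:

```python
def nodes(tree):
    """Yield every internal node of a tree encoded as (label, child, ...),
    where a child is either another tuple or a word (string)."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def common(n1, n2):
    """Shared fragments rooted at n1 and n2 (the C(n1, n2) recursion).
    Zero unless the two nodes carry the same production."""
    if n1[0] != n2[0] or len(n1) != len(n2):
        return 0
    prod = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str) or isinstance(c2, str):
            if c1 != c2:                # terminal words must match exactly
                return 0
        elif c1[0] != c2[0]:            # child categories must match
            return 0
        else:
            prod *= 1 + common(c1, c2)  # each child: left bare, or expanded
    return prod

def tree_kernel(t1, t2):
    """Implicit inner product over one dimension per possible fragment."""
    return sum(common(a, b) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", ("D", "the"), ("N", "man")), ("VP", ("V", "runs")))
t2 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "runs")))
k = tree_kernel(t1, t2)   # counts the fragments shared by the two parses
```

On these two toy parses the kernel comes out to 15: D-the and V-runs contribute one fragment each, VP and NP contribute two each, and the S node contributes nine (each child independently bare or expanded). The cost is polynomial in the sizes of the two trees even though the implicit fragment space is astronomically large.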
Re: [agi] None of you seem to be able ...
Ben: Obviously the brain contains answers to many of the unsolved problems of AGI (not all -- e.g. not the problem of how to create a stable goal system under recursive self-improvement). However, current neuroscience does NOT contain these answers. And neither you nor anyone else has ever made a cogent argument that emulating the brain is the ONLY route to creating powerful AGI. Absolutely agree re neuroscience's lack of answers (hence Richard's assertion that his system is based on what cognitive science knows about brain architecture is not a smart one - the truth is, not much at all.) The cogent argument for emulating the brain - in brief - is simply that it's the only *all-rounder* cognitive system, the only multisensory, multimedia, multi-sign system that can solve problems in language AND maths (arithmetic/algebra/geometry) AND diagrams AND maps AND photographs AND cinema AND painting AND sculpture AND 3-D models AND body language etc - and switch from solving problems in any one sign or sensory system to solving the same problems in any other sign or sensory system. And it's by extension the only truly multidomain system that can switch from solving problems in any one subject domain to any other, from solving problems of how to play football to how to marshal troops on a battlefield to how to do geometry, applying the same knowledge across domains. (I'm just formulating this argument for the first time - so it will no doubt need revisions!) But - correct me - I don't think there's any AI system that's even a two-rounder, able to work across two domains and sign systems, let alone, of course, all of them.
(And it's taken a billion years to evolve this all-round system, which is clearly grounded in a body.) It LOOKS relatively straightforward to emulate or supersede this system when you make the cardinal error of drawing specialist comparisons - your we-can-make-a-plane-that-flies-faster-than-a-bird argument (and of course we already have machines that can think billions of times faster than the brain). But inventing general, all-round systems that are continually alive, complex psychoeconomies managing whole sets of complex activities in the real, as opposed to artificial, world(s) and not just isolated tasks, is a whole different ballgame from inventing specialist systems. It represents a whole new stage of machine evolution - a step as drastic as the evolution of life from matter - and you, sir, :), have scant respect for the awesomeness of the undertaking (even though, paradoxically, you're much more aware than most of its complexity). Respect to the brain, bro! It's a little as if you - not, I imagine, the very finest athletic specimen - were to say: hey, I can take the heavyweight champ of the world ... AND Federer... AND Tiger Woods... AND the champ of every other sport. Well, yeah, you can indeed box and play tennis and actually do every other sport, but there's an awful lot more to beating even one of those champs, let alone all or a selection of them, than meets the eye (even if you were in addition to have a machine that could throw super-powerful punches or play superfast backhands). Ben/MT: none of the unsolved problems are going to be solved without major creative leaps. Just look even at the iPod and iPhone - major new technology never happens without such leaps. Ben: The above sentence is rather hilarious to me. If the iPod and iPhone are your measure for creative leaps, then there have been loads and loads of major creative leaps in AGI and narrow-AI research.
As an example of a creative leap (that is speculative and may be wrong, but is certainly creative), check out my hypothesis of emergent social-psychological intelligence as related to mirror neurons and octonion algebras: http://www.goertzel.org/dynapsyc/2007/mirrorself.pdf Ben, Name ONE major creative leap in AGI (in narrow AI, no question, there's loads). Some background here: I am deeply interested in, and have done a lot of work on, the psychology and philosophy of creativity, as well as intelligence. So your creative paper is interesting to me, because it helps refine definitions of creativity and creative leaps. The iPod and iPhone do indeed represent brilliant leaps in terms of interfaces - with the touch-wheel and the pinch touchscreen [as distinct from the touchscreen itself] - v. neat lateral ideas which worked. No, not revolutionary in terms of changing vast fields of technology, just v. lateral, unexpected, albeit simple ideas. I have seen no similarly lateral approaches in AGI. Your paper represents almost a literal application of the idea that creativity is ingenious/lateral. Hey, it's no trick to be just ingenious/lateral or fantastic. How does memory work? - well, you see, there's this system of angels that ferry every idea you have and file it in an infinite set of multiverses...etc... Anyone can come up with fantastic ideas. The
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Dave, Thanks for the link. Seems like it gives Matt the right to say to the world "I told you so." I wonder if OpenCog could get involved in this, or something like this, in a productive way. Ed Porter -Original Message- From: David Hart [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 3:16 AM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research] On 12/5/07, Matt Mahoney [EMAIL PROTECTED] wrote: [snip] Centralized search is limited to a few big players that can keep a copy of the Internet on their servers. Google is certainly useful, but imagine if it searched a space 1000 times larger, and if posts were instantly added to its index without having to wait days for its spider to find them. Imagine your post going to persistent queries posted days earlier. Imagine your queries being answered by real human beings in addition to other peers. I probably won't be the one writing this program, but where there is a need, I expect it will happen. Wikia, the company run by Wikipedia founder Jimmy Wales, is tackling the Internet-scale distributed search problem - http://search.wikia.com/wiki/Atlas Connecting to related threads (some recent, some not-so-recent), the Grub distributed crawler ( http://search.wikia.com/wiki/Grub ) is intended to be one of many plug-in Atlas Factories. A development goal for Grub is to enhance it with an NL toolkit (e.g. the soon-to-be-released RelEx), so it can do more than parse simple keywords and calculate statistical word relationships. -dave
Re: [agi] None of you seem to be able ...
Ed Porter wrote: RICHARD LOOSEMORE There is a high prima facie *risk* that intelligence involves a significant amount of irreducibility (some of the most crucial characteristics of a complete intelligence would, in any other system, cause the behavior to show a global-local disconnect), ED PORTER= Richard, prima facie means obvious on its face. The above statement and those that followed it below may be obvious to you, but it is not obvious to a lot of us, and at least I have not seen (perhaps because of my own ignorance, but perhaps not) any evidence that it is obvious. Apparently Ben also does not find your position to be obvious, and Ben is no dummy. Richard, did you ever consider that it might just be turtles all the way down - and by that I mean experiential patterns, such as those that could be represented by Novamente atoms (nodes and links), in a gen/comp hierarchy all the way down? In such a system each level is quite naturally derived from the levels below it by learning from experience. There is a lot of dynamic activity, but much of it is quite orderly, like that in Hecht-Nielsen's Confabulation. There is no reason why there has to be a GLOBAL-LOCAL DISCONNECT of the type you envision, i.e., one that is totally impossible to architect around until one has totally explored global-local disconnect space (just think how large an exploration space that might be). So if you have prima facie evidence to support your claim (other than your paper, which I read, and which does not meet that standard Ed, Could you please summarize for me what your understanding is of my claim for the prima facie evidence (that I gave in that paper), and then, if you would, please explain where you believe the claim goes wrong. With that level of specificity, we can discuss it. Many thanks, Richard Loosemore ), then present it. If you make me eat my words you will have taught me something sufficiently valuable that I will relish the experience.
Re: [agi] None of you seem to be able ...
Mike Tintner wrote: Richard: science does too know a good deal about brain architecture! I *know* cognitive science. Cognitive science is a friend of mine. Mike, you are no cognitive scientist :-). Thanks, Richard, for keeping it friendly - but - are you saying cog sci knows:
* the 'engram' - how info is encoded
* any precise cognitive form or level of the hierarchical processing vaguely defined by Hawkins et al
* how ideas are compared at any level
* how analogies are produced
* whether templates or similar are/are not used in visual object processing
etc. etc.??? Well, you are crossing over between levels here in a way that confuses me. Did you mean brain architecture when you said "brain architecture"? That is, are you talking about brain-level stuff, or cognitive-level stuff? I took you to be talking quite literally about the neural level. More generally, though, we understand a lot, but of course the picture is extremely incomplete. But even though the picture is incomplete, that would not mean that cognitive science knows almost nothing. My position is that cog sci has a *huge* amount of information stashed away, but it is in a format that makes it very hard for someone trying to build an intelligent system to actually use. AI people make very little use of this information at all. My goal is to deconstruct cog sci in such a way as to make it usable in AI. That is what I am doing now. Obviously, if science can't answer the engram question, it can hardly answer anything else. You are indeed a cognitive scientist but you don't seem to have a very good overall scientific/philosophical perspective on what that entails - and the status of cog. sci. is a fascinating one, philosophically. You see, I utterly believe in the cog. sci. approach of applying computational models to the brain and human thinking. But what that has produced is *not* hard knowledge.
It has made us aware of the complexities of what is probably involved, got us to the point where we are, so to speak, v. warm / close to the truth. But no, as I think Ben asserted, what we actually *know* for sure about the brain's information processing is v. v. little. (Just look at our previous dispute, where clearly there is no definite knowledge at all about how much parallel computation is involved in the brain's processing of any idea [like a sentence].) Those cog. sci. models are more like analogies than true theoretical models. And anyway, most of the time, though by no means all, cognitive scientists are, like you and Minsky, much more interested in the AI applications of their models than in their literal scientific truth. If you disagree, point to the hard knowledge re items like those listed above, which surely must be the basis of any AI system that can legitimately claim to be based on the brain's architecture. Well, it is difficult to know where to start. What about the word priming results? There is an enormous corpus of data concerning the time course of activation of words as a result of seeing/hearing other words. I can use some of that data to constrain my models of activation. Then there are studies of speech errors that show what kinds of events occur during attempts to articulate sentences: that data can be used to say a great deal about the processes involved in going from an intention to articulation. On and on the list goes: I could spend all day just writing down examples of cognitive data and how it relates to models of intelligence. Did you know, for example, that certain kinds of brain damage can leave a person with the ability to name a visually presented object, but then be unable to pick the object up and move it through space in a way that is consistent with the object's normal use,
and that another type of brain damage can result in a person having exactly the opposite problem: they can look at an object and say "I have no idea what that is," and yet when you ask them to pick the thing up and do what they would typically do with the object, they pick it up and show every sign that they know exactly what it is for (e.g. the object is a key: they say they don't know what it is, but then they pick it up and put it straight into a nearby lock). Now, interpreting that result is not easy, but it does seem to tell us that there are two almost independent systems in the brain that handle vision-for-identification and vision-for-action. Why? I don't know, but I have some ideas, and those ideas are helping to constrain my framework. Another example of where you are not so hot on the *philosophy* of cog. sci. is our v. first dispute. I claimed and claim that it is fundamental to cog sci to treat the brain/mind as rational. And I'm right - and produced and can continue endlessly producing evidence. (It is fundamental to all the social sciences to treat humans as rational decision-making agents.) Oh no it doesn't, you said, in
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Ed Porter wrote: Mark, MARK WASER=== You claim that "It is actually showing that you can do something roughly equivalent to growing neural gas (GNG) in a space with something approaching 500,000 dimensions, but you can do it without normally having to deal with more than a few of those dimensions at one time." Collins makes no claims that even remotely resemble this. He *is* taking a deconstructionist approach (which Richard and many others would argue vehemently with) -- but that is virtually the entirety of the overlap between his paper and your claims. Where do you get all this crap about 500,000 dimensions, for example? ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parser. This was probably 5 years ago, so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K. The very large number of dimensions was mentioned repeatedly by both Collins and at least one other professor with whom I talked after the lecture. One of the points both emphasized was that by use of the kernel trick he was effectively matching in a 500K dimensional space without having to deal with most of those dimensions at any one time (although, it is my understanding that over many parses the system would deal with a large percentage of all those dimensions). It sounds like you may have misunderstood the relevance of the high number of dimensions. Correct me if I am wrong, but Collins is not really matching in large numbers of dimensions; he is using the kernel trick to transform a nonlinear CLASSIFICATION problem into a high-dimensional linear classification. This is just a trick to enable a better type of supervised learning. Would you follow me if I said that using supervised learning is of no use in general?
Because it means that someone has already (a) decided on the dimensions of representation in the initial problem domain, and (b) already done all the work of classifying the sentences into syntactically correct and syntactically incorrect. All that the SVM is doing is summarizing this training data in a nice compact form: the high number of dimensions involved at one stage of the problem appears to be just an artifact of the method; it means nothing in general. It especially does not mean that this supervised training algorithm is somehow able to break out and become an unsupervised, feature-discovery method, which it would have to do to be of any general interest. I still have not read Collins' paper: I am just getting this from my understanding of the math you have mentioned here. It seems that whether or not he mentioned 500K dimensions or an infinite number of dimensions (which he could have done) makes no difference to anything. If you think it does make a big difference, could you explain why? Richard Loosemore If you read papers on support vector machines using kernel methods you will realize that it is well known that you can do certain types of matching and other operations in high dimensional spaces without having to actually deal in the high dimensions, by use of the kernel trick. The issue is often that of finding a particular kernel that works well for your problem. Collins shows the kernel trick can be extended to parse tree net matching. With regard to my statement that the efficiency of the kernel trick could be applied relatively generally, it is quite well supported by the following text from page 4 of the paper: This paper and previous work by Lodhi et al. [12] examining the application of convolution kernels to strings provide some evidence that convolution kernels may provide an extremely useful tool for applying modern machine learning techniques to highly structured objects.
The key idea here is that one may take a structured object and split it up into parts. If one can construct kernels over the parts then one can combine these into a kernel over the whole object. Clearly, this idea can be extended recursively so that one only needs to construct kernels over the atomic parts of a structured object. The recursive combination of the kernels over parts of an object retains information regarding the structure of that object. MARK WASER=== You also make statements that are explicitly contradicted in the paper. For example, you say "But there really seems to be no reason why there should be any limit to the dimensionality of the space in which Collins's algorithm works, because it does not use an explicit vector representation" while his paper quite clearly states "Each tree is represented by an n dimensional vector where the i'th component counts the number of occurrences of the i'th tree fragment." (A mistake I believe you made because you didn't understand the preceding sentence -- or, more critically, *any* of the math). ED PORTER= The quote you give is from the last paragraph on page
Re: [agi] None of you seem to be able ...
Richard: science does too know a good deal about brain architecture! I *know* cognitive science. Cognitive science is a friend of mine. Mike, you are no cognitive scientist :-). Thanks, Richard, for keeping it friendly - but - are you saying cog sci knows:
* the 'engram' - how info is encoded
* any precise cognitive form or level of the hierarchical processing vaguely defined by Hawkins et al
* how ideas are compared at any level
* how analogies are produced
* whether templates or similar are/are not used in visual object processing
etc. etc.??? Obviously, if science can't answer the engram question, it can hardly answer anything else. You are indeed a cognitive scientist but you don't seem to have a very good overall scientific/philosophical perspective on what that entails - and the status of cog. sci. is a fascinating one, philosophically. You see, I utterly believe in the cog. sci. approach of applying computational models to the brain and human thinking. But what that has produced is *not* hard knowledge. It has made us aware of the complexities of what is probably involved, got us to the point where we are, so to speak, v. warm / close to the truth. But no, as I think Ben asserted, what we actually *know* for sure about the brain's information processing is v. v. little. (Just look at our previous dispute, where clearly there is no definite knowledge at all about how much parallel computation is involved in the brain's processing of any idea [like a sentence].) Those cog. sci. models are more like analogies than true theoretical models. And anyway, most of the time, though by no means all, cognitive scientists are, like you and Minsky, much more interested in the AI applications of their models than in their literal scientific truth. If you disagree, point to the hard knowledge re items like those listed above, which surely must be the basis of any AI system that can legitimately claim to be based on the brain's architecture.
Another example of where you are not so hot on the *philosophy* of cog. sci. is our v. first dispute. I claimed and claim that it is fundamental to cog sci to treat the brain/mind as rational. And I'm right - and produced and can continue endlessly producing evidence. (It is fundamental to all the social sciences to treat humans as rational decision-making agents.) Oh no it doesn't, you said, in effect - sci psychology is obsessed with the irrationalities of the human mind. And that is true, too. If you hadn't gone off in high dudgeon, we could have resolved the apparent contradiction. Sci psych does indeed love to study and point out all kinds of illusions and mistakes of the human mind. But to cog. sci. these are all so many *bugs* in an otherwise rational system. The system as a whole is still rational, as far as cog sci is concerned, but some of its parts - its heuristics, attitudes etc - are not. They, however, can be fixed. So what I have been personally asserting elsewhere - namely that the brain is fundamentally irrational or crazy - that the human mind can't follow a logical, joined-up train of reflective thought for more than a relatively few seconds on end - and is positively designed to be like that, and can't be and isn't meant to be fixed - does indeed represent a fundamental challenge to cog. sci.'s current rational paradigm of mind. (The flip side of that craziness is that it is a fundamentally *creative* mind - this is utterly central to AGI)
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parser. This was probably 5 years ago, so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K. OK. I'll bite. So what do *you* believe that these dimensions are? Words? Word pairs? Entire sentences? Different trees?
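For what it's worth, in the paper each dimension indexes one tree *fragment*: a connected subtree in which every node keeps either all or none of its children. The number of distinct fragments explodes with tree size, which is why the implicit space gets so large. A toy illustration of the growth rate (the encoding and the numbers are mine, purely illustrative, and not the source of Collins's 500K figure):

```python
def build(depth):
    """A full binary tree with every internal node labeled 'X'."""
    if depth == 0:
        return ("X", "w")               # preterminal over a word
    sub = build(depth - 1)
    return ("X", sub, sub)

def frags(node):
    """Fragments rooted at this node: each tuple child may be left as a
    bare nonterminal or expanded, giving a product of (1 + frags(child))."""
    prod = 1
    for child in node[1:]:
        if isinstance(child, tuple):
            prod *= 1 + frags(child)
    return prod

for d in range(5):
    print(d, frags(build(d)))           # 1, 4, 25, 676, 458329
```

A binary tree only four levels deep already has over 450,000 fragments rooted at its top node, so dimension counts in the hundreds of thousands are unsurprising for treebank-scale data; the kernel trick takes inner products over spaces of this size without ever enumerating them.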
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Richard, It actually is more valuable than you say. First, the same kernel trick can be used for GNG type unsupervised learning in high dimensional spaces. So it is not limited to supervised learning. Second, you are correct in saying that through the kernel trick it is actually doing almost all of its computations in a lower dimensional space. But unlike with many kernel tricks, in this one the system actually directly accesses each of the dimensions in the space, in different combinations, as necessary. That is important. It means that you can have a space with as many dimensions as there are features or patterns in your system and still efficiently do similarity matching (but not distance matching). Ed Porter -Original Message- From: Richard Loosemore [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:37 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research] Ed Porter wrote: Mark, MARK WASER=== You claim that "It is actually showing that you can do something roughly equivalent to growing neural gas (GNG) in a space with something approaching 500,000 dimensions, but you can do it without normally having to deal with more than a few of those dimensions at one time." Collins makes no claims that even remotely resemble this. He *is* taking a deconstructionist approach (which Richard and many others would argue vehemently with) -- but that is virtually the entirety of the overlap between his paper and your claims. Where do you get all this crap about 500,000 dimensions, for example? ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parser. This was probably 5 years ago, so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K.
The very large number of dimensions was mentioned repeatedly by both Collins and at least one other professor with whom I talked after the lecture. One of the points both emphasized was that by use of the kernel trick he was effectively matching in a 500K dimensional space without having to deal with most of those dimensions at any one time (although, it is my understanding that over many parses the system would deal with a large percentage of all those dimensions). It sounds like you may have misunderstood the relevance of the high number of dimensions. Correct me if I am wrong, but Collins is not really matching in large numbers of dimensions; he is using the kernel trick to transform a nonlinear CLASSIFICATION problem into a high-dimensional linear classification. This is just a trick to enable a better type of supervised learning. Would you follow me if I said that using supervised learning is of no use in general? Because it means that someone has already (a) decided on the dimensions of representation in the initial problem domain, and (b) already done all the work of classifying the sentences into syntactically correct and syntactically incorrect. All that the SVM is doing is summarizing this training data in a nice compact form: the high number of dimensions involved at one stage of the problem appears to be just an artifact of the method; it means nothing in general. It especially does not mean that this supervised training algorithm is somehow able to break out and become an unsupervised, feature-discovery method, which it would have to do to be of any general interest. I still have not read Collins' paper: I am just getting this from my understanding of the math you have mentioned here. It seems that whether or not he mentioned 500K dimensions or an infinite number of dimensions (which he could have done) makes no difference to anything. If you think it does make a big difference, could you explain why?
Richard Loosemore If you read papers on support vector machines using kernel methods you will realize that it is well known that you can do certain types of matching and other operations in high-dimensional spaces without normally having to deal in the high dimensions directly, by use of the kernel trick. The issue is often that of finding a particular kernel that works well for your problem. Collins shows the kernel trick can be extended to parse-tree matching. With regard to my statement that the efficiency of the kernel trick could be applied relatively generally, it is quite well supported by the following text from page 4 of the paper: This paper and previous work by Lodhi et al. [12] examining the application of convolution kernels to strings provide some evidence that convolution kernels may provide an extremely useful tool for applying modern machine learning techniques to highly structured objects. The key idea here is that one may take a structured object and split it up into parts. If one can construct kernels over the parts then one can
RE: [agi] None of you seem to be able ...
Richard, I quickly reviewed your paper, and you will be happy to note that I had underlined and highlighted it, so such skimming was more valuable than it otherwise would have been. With regard to COMPUTATIONAL IRREDUCIBILITY, I guess a lot depends on definition. Yes, my vision of a human AGI would be a very complex machine. Yes, a lot of its outputs could only be made with human-level reasonableness after a very large amount of computation. I know of no shortcuts around the need to do such complex computation. So it arguably falls into what you say Wolfram calls computational irreducibility. But the same could be said for many types of computations, such as large matrix equations or Google's map-reduces, which are routinely performed on supercomputers. So if that is how you define irreducibility, it's not that big a deal. It just means you have to do a lot of computing to get an answer, which I have assumed all along for AGI (remember, I am the one pushing for breaking the small-hardware mindset). But it doesn't mean we don't know how to do such computing, or that we have to do a lot more complexity research of the type suggested in your paper before we can successfully design AGIs. With regard to GLOBAL-LOCAL DISCONNECT, again it depends what you mean. You define it as: The GLD merely signifies that it might be difficult or impossible to derive analytic explanations of global regularities that we observe in the system, given only a knowledge of the local rules that drive the system. I don't know what this means. Even the game of Life referred to in your paper can be analytically explained. It is just that some of the things that happen are rather complex and would take a lot of computing to analyze. So does the global-local disconnect apply to anything where an explanation requires a lot of analysis? If that is the case, then any large computation, of the type which mankind does and designs every day, would have a global-local disconnect. 
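The game-of-Life point at issue here is easy to make concrete: the update rule is a few lines of purely local logic, yet global objects like the glider (a five-cell pattern that reappears one cell down and one cell right every four generations) are only "explained" by actually running the computation. A minimal sketch:

```python
from collections import Counter

def step(cells):
    """One Game of Life generation; cells is a set of live (x, y) coordinates."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is live next step if it has 3 neighbors, or 2 and is already live.
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

# The glider: nothing in the local rule above mentions diagonal travel,
# yet after 4 steps the whole pattern has moved one cell down-right.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
g4 = glider
for _ in range(4):
    g4 = step(g4)
assert g4 == {(x + 1, y + 1) for (x, y) in glider}
```

Whether one calls this a "disconnect" or just "a lot of computing" is exactly the definitional question being argued in this exchange.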
If that is the case, the global-local disconnect is no big deal. We deal with it every day. I don't know exactly what you mean by regularities in the above definition, but I think you mean something equivalent to patterns or meaningful generalizations. In many types of computing commonly done, you don't know what the regularities will be without tremendous computing. For example, in principal component analysis you often don't know what the major dimensions of a distribution will be until you do a tremendous amount of computation. Does that mean there is a GLD in that problem? If so, it doesn't seem to be a big deal. PCA is done all the time, as are all sorts of other complex matrix computations. But you have implied multiple times that you think the global-local disconnect is a big, big deal. You have implied multiple times that it presents a major problem to developing AGI. If I interpret your prior statements taken in conjunction with your paper correctly, I am guessing your major thrust is that it will be very difficult to design AGIs where the desired behavior is to be the result of many causal relations between a vast number of active elements, because in such a system the causality is so non-linear and complex that we cannot currently properly think and design in terms of it. Although this proposition is not obviously true on its face, it is arguably also not obviously false on its face. Although it is easy to design systems whose behavior would be sufficiently chaotic that such design would be impossible, it seems likely that it is also possible to design complex systems in which the behavior is not so chaotic or unpredictable. Take the internet. Something like 10^8 computers talk to each other, and in general it works as designed. 
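The PCA example can be sketched directly (a pure-Python power iteration on 2-D data, for illustration only): the principal axis is a global regularity of the whole distribution that no single data point reveals, yet it falls out of a routine computation.

```python
import math

def principal_direction(points, iters=200):
    """Leading eigenvector of the covariance matrix of 2-D points,
    found by power iteration (no linear-algebra library needed)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # 2x2 covariance matrix entries.
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    vx, vy = 1.0, 0.5  # arbitrary starting vector
    for _ in range(iters):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return vx, vy

# Points scattered near the line y = x: the "major dimension" of the
# distribution is roughly (0.707, 0.707), a fact about the whole data set.
pts = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.8), (4, 4.2)]
vx, vy = principal_direction(pts)
```

On this data vx and vy come out nearly equal, confirming the diagonal axis; on real high-dimensional data the same answer requires the "tremendous computing" Ed mentions.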
Take IBM's supercomputer BlueGene/L: a 64K dual-core-processor machine, each node with at least 256 MBytes of memory, all capable of receiving and passing messages at 4 GHz in each of over 3 dimensions, and capable of performing hundreds of trillions of FLOPs per second. Such a system probably contains at least 10^14 non-linear, separately functional elements, and yet it works as designed. If there is a global-local disconnect in the BlueGene/L, which there could be depending on your definition, it is not a problem for most of the computation it does. So why are we to believe, as your paper seems to suggest, that we have to do some scan of complexity space before we can design AGI systems? In the AGI I am thinking of, one would be able to predict many of the behaviors of the machine, at least at a general level, from local rules, because the system has been designed to produce certain types of results in certain types of situations. Of course, because the system is large, the inferencing from each of the many local rules would require a hell of a lot of computing, so much
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Mark, The paper said: Conceptually we begin by enumerating all tree fragments that occur in the training data 1,...,n. Those are the dimensions: all of the parse-tree fragments in the training data. And as I pointed out in an email I just sent to Richard, although usually only a small set of them are involved in any one match between two parse trees, they can all be used over a set of many such matches. So the full dimensionality is actually there; it is just that only a particular subset of the dimensions are being used at any one time. And when the system is waiting for the next tree to match, it is potentially capable of matching it against any of its dimensions. Ed Porter -Original Message- From: Mark Waser [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 3:07 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research] ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parser. This was probably 5 years ago so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K. OK. I'll bite. So what do *you* believe that these dimensions are? Words? Word pairs? Entire sentences? Different trees? 
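A toy version of the representation Ed is describing (my sketch; Collins's actual convolution kernel counts all subtrees recursively, which this simplifies to one-level productions): each distinct fragment is a dimension, a tree is the implicit vector of fragment counts, and the dot product of two trees only touches the fragments the first tree actually contains.

```python
from collections import Counter

def fragments(tree):
    """Enumerate (node-label, child-labels) fragments of a nested-tuple tree."""
    label, children = tree[0], tree[1:]
    out = [(label, tuple(c[0] for c in children))]
    for c in children:
        out.extend(fragments(c))
    return out

def tree_kernel(t1, t2):
    """Dot product of the two implicit fragment-count vectors."""
    c1, c2 = Counter(fragments(t1)), Counter(fragments(t2))
    # Only dimensions active in t1 are ever visited, however many
    # fragment dimensions exist across the whole training set.
    return sum(c1[f] * c2[f] for f in c1)

t1 = ('S', ('NP', ('DT',), ('NN',)), ('VP', ('V',)))
t2 = ('S', ('NP', ('NN',)), ('VP', ('V',)))
assert tree_kernel(t1, t2) == 4  # shared: S, NN, VP, V productions
```

Over a large treebank the fragment inventory (the "dimensions") can run to hundreds of thousands, while any single match visits only a handful, which is the point both sides of this exchange are circling.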
Re: [agi] How to represent things problem
On Dec 5, 2007 7:13 PM, Richard Loosemore [EMAIL PROTECTED] wrote: Vladimir Nesov wrote: Richard, I'll try to summarize my solutions to these problems, which allow using a network without the need for explicit copying of instances (or any other kind of explicit allocation of entities that are to correspond to instances). (Although my model also requires ubiquitous induction between nodes, which disregards network structure.) Basic structure of the network: the network is 'spiking' in the sense that it operates in real time and links between nodes have a delay. Input nodes feed sensory data into the network; output nodes read actions. All links between nodes can shift over time and experience through induction. The initial configuration specifies simple pathways from input to output; shifting of links changes these pathways, making them more intricate to reflect experience. A scene (as a graph which describes objects) is represented by active nodes: a node being active corresponds to a feature being included in the scene. Not all features present in the scene are active at the same time; some of them can activate periodically, every several ticks or more, and some other features can be represented by summarizing simplified features (a node 'apple' instead of a 3D sketch of its surface). Network edges (links) activate the nodes. If the condition (configuration of nodes from which a link originates) for a link is satisfied, and the link is active, it activates the target node. Activation in the network follows a variation of the Hebbian rule, the 'induction rule' (which is essential for the mechanism of instance representation): a link becomes active (starts to activate its target node) only if it has observed that node to be activated after the condition for the link was satisfied in a majority of cases (like 90% or more). So, if some node is activated in the network, there are good reasons for that; no blind association-seeking. Representation of instances. 
If a scene contains multiple instances of the same object (or pattern, say an apple), and these patterns are not modified in it, there is no point in representing those instances separately: all places at which instances are located ('instantiation points', say places where apples lie or hang) refer to the same pattern. The only problem is modification of instances at specific instantiation points. This scene can be implemented by creating links from instantiation points to nodes that represent the pattern. As a result, during the activation cycle of the represented scene, activation of instantiation points leads to activation of patterns (as there's only one pattern for each instantiation point, so the induction rule works in this direction), but not in the other direction (as there are many instantiation points for the pattern, none of them will be a target of a link originating from the pattern). This one-way activation results in a propagation of 'activation waves' from instantiation points to the pattern, so that each wave 'outlines' both pattern and instantiation point. These waves effectively represent instances. If there's a modifier associated with a specific instantiation point, it will activate during the same wave as the pattern does, and as a result it can be applied to it. As other instantiation points refer to the pattern 'by value', the pattern at those points won't change much. Also, this way of representing instances is central to extraction of similarities: if several objects are similar, they will share some of their nodes, and as a result their structures will influence one another, creating a pressure to extract a common pattern. I have questions at this point. Your notion of instantiation point sounds like what I would call an instance node which is created on the fly. No, it's not that; I'll try to clarify using a more detailed example. 
Say, there are these apples (to which I referred as 'pattern'), which are all represented by a single clump of nodes, the same as would be used for a single apple. Instantiation points are actual objects that in some sense 'hold' the apples in the scene, for example a particular plate on which an apple lies. In the scene, there is (for simplicity) only one plate, and there's always an apple that lies on it. So, we can create a PLATE-APPLE link, and this link satisfies the induction rule, since whenever PLATE is encountered, there's an APPLE on it. Here, PLATE is an instantiation point, and APPLE is a pattern. If the scene also contains an apple-tree BRANCH, on which there's also an APPLE hanging, we can create a BRANCH-APPLE link. But we can't create an APPLE-PLATE link, since at one of the instantiation points (BRANCH), PLATE is not there when APPLE is. Also, these links are short-term things (as plates don't always have apples on them), but the scene can be stored long-term if the links are duplicated on new nodes corresponding to these nodes
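The PLATE/APPLE example can be rendered as a toy script (my sketch of the 90% induction rule, not Nesov's actual model): a directed link becomes active only when its target co-occurred with its source in at least 90% of the source's activations, so PLATE-APPLE forms but APPLE-PLATE does not once apples also appear on a BRANCH.

```python
from collections import Counter

THRESHOLD = 0.9  # the "majority of cases (like 90% or more)" rule

def active_links(observations):
    """observations: list of sets of co-active nodes, one set per time step.
    Returns the directed (src, dst) links satisfying the induction rule."""
    seen = Counter()      # how often each node was active
    together = Counter()  # how often (src, dst) were co-active
    for obs in observations:
        for src in obs:
            seen[src] += 1
            for dst in obs:
                if dst != src:
                    together[(src, dst)] += 1
    return {pair for pair, n in together.items()
            if n / seen[pair[0]] >= THRESHOLD}

# 8 scenes with an apple on the plate, 2 with an apple on a branch.
scenes = [{'PLATE', 'APPLE'}] * 8 + [{'BRANCH', 'APPLE'}] * 2
links = active_links(scenes)
assert ('PLATE', 'APPLE') in links      # every PLATE came with an APPLE
assert ('APPLE', 'PLATE') not in links  # only 80% of APPLEs had a PLATE
```

The asymmetry of the learned links is exactly the one-way activation Nesov uses to make "activation waves" flow from instantiation points to the pattern and not back.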
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Dimensions is an awfully odd word for that since dimensions are normally assumed to be orthogonal. - Original Message - From: Ed Porter [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, December 05, 2007 5:08 PM Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research] Mark, The paper said: Conceptually we begin by enumerating all tree fragments that occur in the training data 1,...,n. Those are the dimensions, all of the parse tree fragments in the training data. And as I pointed out in an email I just sent to Richard, although usually only a small set of them are involved in any one match between two parse trees, they can all be used over set of many such matches. So the full dimensionality is actually there, it is just that only a particular subset of them are being used at any one time. And when the system is waiting for the next tree to match it is potentially capability of matching it against any of its dimensions. Ed Porter -Original Message- From: Mark Waser [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 3:07 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research] ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parse. This was probably 5 years ago so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K. OK. I'll bite. So what do *you* believe that these dimensions are? Words? Word pairs? Entire sentences? Different trees? 
Re: [agi] None of you seem to be able ...
Richard: Now, interpreting that result is not easy, Richard, I get the feeling you're getting understandably tired with all your correspondence today. Interpreting *any* of the examples of *hard* cog sci that you give is not easy. They're all useful, stimulating stuff, but they don't add up to a hard pic. of the brain's cognitive architecture. Perhaps Ben will back me up on this - it's a rather important point - our overall *integrated* picture of the brain's cognitive functioning is really v. poor, although certainly we have a wealth of details about, say, which part of the brain is somehow connected to a given operation. Richard: I admit that I am confused right now: in the above paragraphs you say that your position is that the human mind is 'rational' and then later that it is 'irrational' - was the first one of those a typo? Richard, No typo whatsoever if you just reread. V. clear. I say and said: *scientific psychology* and *cog sci* treat the mind as rational. I am the weirdo who is saying this is nonsense - the mind is irrational/crazy/creative - rationality is a major *achievement*, not something that comes naturally. Mike Tintner= crazy/irrational - somehow, I don't think you'll find that hard to remember.
Re: [agi] How to represent things problem
Richard, I'll try to summarize my solutions to these problems, which allow using a network without the need for explicit copying of instances (or any other kind of explicit allocation of entities that are to correspond to instances). (Although my model also requires ubiquitous induction between nodes, which disregards network structure.) Basic structure of the network: the network is 'spiking' in the sense that it operates in real time and links between nodes have a delay. Input nodes feed sensory data into the network; output nodes read actions. All links between nodes can shift over time and experience through induction. The initial configuration specifies simple pathways from input to output; shifting of links changes these pathways, making them more intricate to reflect experience. A scene (as a graph which describes objects) is represented by active nodes: a node being active corresponds to a feature being included in the scene. Not all features present in the scene are active at the same time; some of them can activate periodically, every several ticks or more, and some other features can be represented by summarizing simplified features (a node 'apple' instead of a 3D sketch of its surface). Network edges (links) activate the nodes. If the condition (configuration of nodes from which a link originates) for a link is satisfied, and the link is active, it activates the target node. Activation in the network follows a variation of the Hebbian rule, the 'induction rule' (which is essential for the mechanism of instance representation): a link becomes active (starts to activate its target node) only if it has observed that node to be activated after the condition for the link was satisfied in a majority of cases (like 90% or more). So, if some node is activated in the network, there are good reasons for that; no blind association-seeking. Representation of instances. 
If a scene contains multiple instances of the same object (or pattern, say an apple), and these patterns are not modified in it, there is no point in representing those instances separately: all places at which instances are located ('instantiation points', say places where apples lie or hang) refer to the same pattern. The only problem is modification of instances at specific instantiation points. This scene can be implemented by creating links from instantiation points to nodes that represent the pattern. As a result, during the activation cycle of the represented scene, activation of instantiation points leads to activation of patterns (as there's only one pattern for each instantiation point, so the induction rule works in this direction), but not in the other direction (as there are many instantiation points for the pattern, none of them will be a target of a link originating from the pattern). This one-way activation results in a propagation of 'activation waves' from instantiation points to the pattern, so that each wave 'outlines' both pattern and instantiation point. These waves effectively represent instances. If there's a modifier associated with a specific instantiation point, it will activate during the same wave as the pattern does, and as a result it can be applied to it. As other instantiation points refer to the pattern 'by value', the pattern at those points won't change much. Also, this way of representing instances is central to extraction of similarities: if several objects are similar, they will share some of their nodes, and as a result their structures will influence one another, creating a pressure to extract a common pattern. Creation of new nodes. Each new node during a creation phase corresponds to an existing node ('original node') in the network. 
During this phase (which isn't long), each activated link that connects to the original node (both incoming and outgoing connections) is copied, so that in the copy the original node is substituted by the new node. As a result, the new node will be active in the situations in which the original node was activated during creation of the new node. The new node can represent an episodic memory or a more specific subcategory of the category represented by the original node. Initially, the new node doesn't influence the behavior of the system (as it's activated in a subset of the ticks in which the original node can activate), but because of this difference it can obtain inductive links different from those that fit the original node. On Dec 5, 2007 4:47 AM, Richard Loosemore [EMAIL PROTECTED] wrote: Dennis Gorelik wrote: Richard, 3) A way to represent things - and in particular, uncertainty - without getting buried up to the eyeballs in (e.g.) temporal logics that nobody believes in. Conceptually the way of representing things is described very well. It's a Neural Network -- a set of nodes (concepts), where every node can be connected with a set of other nodes. Every connection has its own weight. Some nodes are connected with external devices. For example, one node can be connected with one word in a text dictionary (that is an external device). Do you see any
Re: [agi] Flexibility of AI vs. a PC
William Pearson wrote: One thing that has been puzzling me for a while is, why some people expect an intelligence to be less flexible than a PC. What do I mean by this? A PC can have any learning algorithm, bias or representation of data we care to create. This raises another question: how are we creating a representation if not copying it from some sense from our brains? So why do we still create systems that have fixed representations of the external world, fixed methods of learning? Take the development of echo location in blind people, or the ability to take visual information from stimulating the tongue. Isn't this sufficient evidence to suggest we should be trying to make our AIs as flexible as the most flexible things we know? Well said. Richard Loosemore
Re: [agi] None of you seem to be able ...
Mike Tintner wrote: Ben: Obviously the brain contains answers to many of the unsolved problems of AGI (not all -- e.g. not the problem of how to create a stable goal system under recursive self-improvement). However, current neuroscience does NOT contain these answers. And neither you nor anyone else has ever made a cogent argument that emulating the brain is the ONLY route to creating powerful AGI. Absolutely agree re neuroscience's lack of answers (hence Richard's assertion that his system is based on what cognitive science knows about brain architecture is not a smart one - the truth is not much at all.) Um, excuse me? Let me just make sure I understand this: you say that it is not smart of me to say that my system is based on what cognitive science knows about brain architecture, because cognitive science knows not much at all about brain architecture? Number one: I don't actually say that (brain architecture is only a small part of what is involved in my system). Number two: Cognitive science does too know a good deal about brain architecture! I *know* cognitive science. Cognitive science is a friend of mine. Mike, you are no cognitive scientist :-). Richard Loosemore
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
HeavySarcasmWow. Is that what dot products are?/HeavySarcasm You're confusing all sorts of related concepts with a really garbled vocabulary. Let's do this with some concrete 10-D geometry . . . . Vector A runs from (0,0,0,0,0,0,0,0,0,0) to (1, 1, 0,0,0,0,0,0,0,0). Vector B runs from (0,0,0,0,0,0,0,0,0,0) to (1, 0, 1,0,0,0,0,0,0,0). Clearly A and B share the first dimension. Do you believe that they share the second and the third dimension? Do you believe that dropping out the fourth through tenth dimension in all calculations is some sort of huge conceptual breakthrough? The two vectors are similar in the first dimension (indeed, in all but the second and third) but otherwise very distant from each other (i.e. they are *NOT* similar). Do you believe that these vectors are similar or distant? THE ALLEGATION BELOW THAT I MISUNDERSTOOD THE MATH BECAUSE THOUGHT COLLIN'S PARSER DIDN'T HAVE TO DEAL WITH A VECTOR HAVING THE FULL DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS CLEARLY FALSE. My allegation was that you misunderstood the math because you claimed that Collin's paper does not use an explicit vector representation while Collin's statements and the math itself makes it quite clear that they are dealing with a vector representation scheme. I'm now guessing that you're claiming that you intended explicit to mean full dimensionality. Whatever. Don't invent your own meanings for words and you'll be misunderstood less often (unless you continue to drop out key words like in the capitalized sentence above).
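For what it's worth, the arithmetic on Mark's 10-D example comes out as follows (treating both vectors as starting at the origin): the dot product sees only the one shared active dimension, while the Euclidean distance is driven entirely by the two dimensions where they disagree.

```python
A = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
B = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Similarity: dot product counts the single shared active dimension.
dot = sum(a * b for a, b in zip(A, B))

# Distance: squared Euclidean distance comes from dimensions 2 and 3,
# where exactly one of the two vectors is active.
sq_dist = sum((a - b) ** 2 for a, b in zip(A, B))

assert dot == 1       # one dimension in common
assert sq_dist == 2   # distance sqrt(2), despite the shared dimension
```

So both claims in the thread are arithmetically true at once: the vectors "share a dimension" (dot product 1) and are nonetheless apart by sqrt(2), which is why similarity and distance matching are not interchangeable.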
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
Mark, Your last email started OK. I'll bite. I guess you didn't bite for very long. We are already back to explicitly marked HeavySarcasm mode. I guess one could argue, as you seem to be doing, that indicating which of 500K dimensions had a match between two subtrees currently being compared could be considered equivalent to explicitly representing a huge 500K-dimensional binary vector -- but I think one could more strongly claim that such an indication would be, at best, only an implicit representation of the 500K vector. THE KEY POINT I WAS TRYING TO GET ACROSS WAS ABOUT NOT HAVING TO EXPLICITLY DEAL WITH 500K TUPLES in each match, which is what I meant when I said not explicitly deal with the high dimensional vectors. This is a big plus in terms of representational and computational efficiency. I did not say there was nothing equivalent to an implicit use of the high-dimensional vector, because kernels do use high-dimensional vectors, but they do so implicitly rather than explicitly. That is why they increase efficiency. My Merriam-Webster's Collegiate Dictionary gives as its first, which usually means most common, definition of explicit the following: fully revealed or expressed without vagueness, implication, or ambiguity. The information that two subtrees to be matched contain a given set of subtrees, defined by their indices, without more, does not by itself define a full 500K vector, nor even the full dimensionality of the vector. That information can only be derived from other information, which presumably is not even used in the match procedure. Of course there are other definitions of the word explicit which mean exact, and you could argue that indicating a few of the 500K indices is equivalent to exactly specifying a corresponding 500K-dimensional vector, once one takes into account other information. 
When a use of a word in a given statement has two interpretations, one of which is correct, it is not clear one has the right to attack the person making that statement for being incorrect. At most you can attack him for being ambiguous. And normally on this list people do not attack other people as rudely as you have attacked me for merely being ambiguous. Ed Porter -Original Message- From: Mark Waser [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 3:40 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research] HeavySarcasmWow. Is that what dot products are?/HeavySarcasm You're confusing all sorts of related concepts with a really garbled vocabulary. Let's do this with some concrete 10-D geometry . . . . Vector A runs from (0,0,0,0,0,0,0,0,0,0) to (1, 1, 0,0,0,0,0,0,0,0). Vector B runs from (0,0,0) to (1, 0, 1,0,0,0,0,0,0,0). Clearly A and B share the first dimension. Do you believe that they share the second and the third dimension? Do you believe that dropping out the fourth through tenth dimension in all calculations is some sort of huge conceptual breakthrough? The two vectors are similar in the first dimension (indeed, in all but the second and third) but otherwise very distant from each other (i.e. they are *NOT* similar). Do you believe that these vectors are similar or distant? THE ALLEGATION BELOW THAT I MISUNDERSTOOD THE MATH BECAUSE THOUGHT COLLIN'S PARSER DIDN'T HAVE TO DEAL WITH A VECTOR HAVING THE FULL DIMENSIONALITY OF THE SPACE BEING DEALT WITH IS CLEARLY FALSE. My allegation was that you misunderstood the math because you claimed that Collin's paper does not use an explicit vector representation while Collin's statements and the math itself makes it quite clear that they are dealing with a vector representation scheme. I'm now guessing that you're claiming that you intended explicit to mean full dimensionality. Whatever. 
Don't invent your own meanings for words and you'll be misunderstood less often (unless you continue to drop out key words like in the capitalized sentence above).
Re: [agi] How to represent things problem
Vladimir Nesov wrote: Richard, I'll try to summarize my solutions to these problems which allow to use a network without need for explicit copying of instances (or any other kind of explicit allocation of entities which are to correspond to instances). (Although my model also requires ubiquitous induction between nodes which disregards network structure.) Basic structure of network: network is 'spiking' in the sense that it operates in real time and links between nodes have a delay. Input nodes send in the network sensory data, output nodes read actions. All links between nodes can shift over time and experience through induction. Initial configuration specifies simple pathways from input to output, shifting of links changes these pathways, making them more intricate to reflect experience. Scene (as a graph which describes objects) is represented by active nodes: node being active corresponds to feature being included in the scene. Not all features present in the scene are active at the same time, some of them can activate periodically, every several tacts or more, and some other features can be represented by summarizing simplified features (node 'apple' instead of 3D sketch of its surface). Network edges (links) activate the nodes. If condition (configuration of nodes from which link originates) for a link is satisfied, and link is active, it activates the target node. Activation in the network follows a variation of Hebbian rule, 'induction rule' (which is essential for mechanism of instance representation): link becomes active (starts to activate its target node) only if it observed that node to be activated after condition for link was satisfied in a majority of cases (like 90% or more). So, if some node is activated in a network, there are good reasons for that, no blind association-seeking. Representation of instances. 
If scene contains multiple instances of the same object (or pattern, say an apple), and these patterns are not modified in it, there is no point in representing those instances separately: all places at which instances are located ('instantiation points', say places where apples lie or hang) refer to the same pattern. The only problem is modification of instances in specific instantiation points. This scene can be implemented by creating links from instantiation points to nodes that represent the pattern. As a result, during activation cycle of represented scene, activation of instantiation points leads to activation of patterns (as there's only one pattern for each instantiation point, so induction rule works in this direction), but not in other direction (as there are many instantiation points for the pattern, none of them will be a target of a link originating from the pattern). This one-way activation results in a propagation of 'activation waves' from instantiation points to the pattern, so that each wave 'outlines' both pattern and instantiation point. These waves effectively represent instances. If there's a modifier associated with specific instantiation point, during an activation wave it will activate during the same wave as pattern does, and as a result it can be applied to it. As other instantiation points refer to the pattern 'by value', pattern at those points won't change much. Also, this way of representing instances is central to extraction of similarities: if several objects are similar, they will share some of their nodes and as a result their structures will influence one another, creating a pressure to extract a common pattern. I have questions at this point. Your notion of instantiation point sounds like what I would call an instance node which is created on the fly. There is nothing wrong with this in principle, I believe, but it all depends on the details of how these things are handled. 
For example, it requires a *substantial* modification of the neural network idea to allow for the rapid formation of instance nodes, and that modification is so substantial that it would dominate the behavior of the system. I don't know if you follow the colloquialism, but there is a sense in which the tail is wagging the dog: the instance nodes are such an important mechanism that everything depends on the details of how they are handled.

So, to consider one or two of the details that you mention. You would like there to be only a one-way connection between the generic node (do you call this the pattern node?) and the instance node (the instantiation point?), so that the latter can contact the former, but not vice versa. Does this not contradict the data from psychology (if you care about that)? For instance, we are able to see a field of patterns, of different colors, and then when someone says the phrase "the green patterns" we find that the set of green patterns jumps out at us from the scene. It is as if we did indeed have links from the generic concept [green pattern] to all the instances. You may not care about the psychology, but
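Nesov's 'induction rule' in the post above is informal, so here is a minimal sketch of one possible reading of it. All of the names (Link, observe, the minimum-observation count) are invented for illustration; the post only specifies the 90%-or-more criterion.

```python
# Hypothetical sketch of the 'induction rule': a link starts activating
# its target only after the target has been observed to fire following
# the link's condition in roughly 90% or more of cases.  The
# min_observations guard is an added assumption, to avoid promoting a
# link on one or two lucky coincidences.

class Link:
    def __init__(self, threshold=0.9, min_observations=10):
        self.threshold = threshold
        self.min_observations = min_observations
        self.satisfied = 0   # times the link's condition held
        self.confirmed = 0   # times the target then activated

    def observe(self, condition_held, target_fired_after):
        """Record one observation of the condition/target pair."""
        if condition_held:
            self.satisfied += 1
            if target_fired_after:
                self.confirmed += 1

    @property
    def active(self):
        # Only links with strong observed regularity may activate their
        # target: "if some node is activated in the network, there are
        # good reasons for that -- no blind association-seeking".
        if self.satisfied < self.min_observations:
            return False
        return self.confirmed / self.satisfied >= self.threshold

link = Link()
for _ in range(20):
    link.observe(condition_held=True, target_fired_after=True)
link.observe(condition_held=True, target_fired_after=False)
# 20 confirmations out of 21 satisfied conditions (~0.95), so the link
# is active; a link with a 50% hit rate would stay inactive.
```

Note that under this reading activation is asymmetric by construction, which is exactly what produces the one-way instantiation-point-to-pattern links discussed above.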
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
They need not be.

-Original Message- From: Mark Waser [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 6:04 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

"Dimensions" is an awfully odd word for that, since dimensions are normally assumed to be orthogonal.

- Original Message - From: Ed Porter [EMAIL PROTECTED] To: agi@v2.listbox.com Sent: Wednesday, December 05, 2007 5:08 PM Subject: RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

Mark, The paper said: "Conceptually we begin by enumerating all tree fragments that occur in the training data 1,...,n." Those are the dimensions: all of the parse tree fragments in the training data. And as I pointed out in an email I just sent to Richard, although usually only a small set of them are involved in any one match between two parse trees, they can all be used over a set of many such matches. So the full dimensionality is actually there; it is just that only a particular subset of the dimensions is being used at any one time. And when the system is waiting for the next tree to match, it is potentially capable of matching it against any of its dimensions. Ed Porter

-Original Message- From: Mark Waser [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 3:07 PM To: agi@v2.listbox.com Subject: Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]

ED PORTER= The 500K dimensions were mentioned several times in a lecture Collins gave at MIT about his parser. This was probably 5 years ago, so I am not 100% sure the number was 500K, but I am about 90% sure that was the number used, and 100% sure the number was well over 100K.

OK. I'll bite. So what do *you* believe that these dimensions are? Words? Word pairs? Entire sentences? Different trees?
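For what it's worth, the two positions in this exchange are compatible, and a toy sketch makes the point. Collins actually evaluates the tree kernel by dynamic programming over pairs of tree nodes and never materializes the fragment space; the code below (with invented names and toy string-valued "fragments") only illustrates what is being argued: each tree is conceptually a count vector with one dimension per fragment in the training data, yet any single comparison touches only the few dimensions the two trees share.

```python
from collections import Counter

# Illustration only -- this is NOT Collins' algorithm.  Each tree is
# conceptually an n-dimensional vector whose i'th component counts
# occurrences of the i'th tree fragment; stored sparsely, a kernel
# evaluation (inner product) touches only the shared dimensions.

def kernel(tree_a_fragments, tree_b_fragments):
    """Inner product of two fragment-count vectors, stored sparsely."""
    a, b = Counter(tree_a_fragments), Counter(tree_b_fragments)
    shared = a.keys() & b.keys()   # typically tiny vs. the full space
    return sum(a[f] * b[f] for f in shared)

# Toy fragments: strings standing in for subtree shapes.
t1 = ["NP->DT NN", "DT->the", "NN->dog", "NP->DT NN;DT->the"]
t2 = ["NP->DT NN", "DT->the", "NN->cat", "NP->DT NN;DT->the"]
print(kernel(t1, t2))  # 3: only the three shared fragments contribute
```

So "the full dimensionality is there" in the sense that any fragment can matter for some future pair of trees, while "only a particular subset is being used at any one time" in the sense that each evaluation involves only the overlap.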
Re: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
On 12/5/07, Matt Mahoney [EMAIL PROTECTED] wrote: [snip] Centralized search is limited to a few big players that can keep a copy of the Internet on their servers. Google is certainly useful, but imagine if it searched a space 1000 times larger and if posts were instantly added to its index, without having to wait days for its spider to find them. Imagine your post going to persistent queries posted days earlier. Imagine your queries being answered by real human beings in addition to other peers. I probably won't be the one writing this program, but where there is a need, I expect it will happen. Wikia, the company run by Wikipedia founder Jimmy Wales, is tackling the Internet-scale distributed search problem - http://search.wikia.com/wiki/Atlas Connecting to related threads (some recent, some not-so-recent), the Grub distributed crawler ( http://search.wikia.com/wiki/Grub ) is intended to be one of many plug-in Atlas Factories. A development goal for Grub is to enhance it with a NL toolkit (e.g. the soon-to-be-released RelEx), so it can do more than parse simple keywords and calculate statistical word relationships. -dave
RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research]
From: Matt Mahoney [mailto:[EMAIL PROTECTED] My design would use most of the Internet (10^9 P2P nodes). Messages would be natural language text strings, making no distinction between documents, queries, and responses. Each message would have a header indicating the ID and time stamp of the originator and any intermediate nodes through which the message was routed. A message could also have attached files. Each node would have a cache of messages and its own policy on which messages it decides to keep or discard. The goal of the network is to route messages to other nodes that store messages with matching terms. To route an incoming message x, a node matches terms in x to terms in stored messages and sends copies to the nodes that appear in those headers, appending its own ID and time stamp to the header of the outgoing copies. It also keeps a copy, so that the receiving nodes know that it has a copy of x (at least temporarily). The network acts as a distributed database with a distributed search function. If X posts a document x and Y posts a query y with matching terms, then the network acts to route x to Y and y to X.

The very tricky but required part of creating a global network like this is going from zero nodes to whatever the goal is. I think that much of the design emphasis needs to be put into the growth function. If you have 50 nodes running, how do you get to 500? And 500 to 5,000? And if it drops from 50,000 to 10,000 fast, how is it revived before a crash? Engineering expertise and ingenuity -- plus maybe psychological and sociological wisdom -- can be used to make this happen. And we all know that the growth could happen quickly, even overnight. Then once it gets to 10^9 nodes, they have to be maintained or they can die quickly, even instantaneously. Having an intelligent botnet has its advantages. Once it's running and users try to uninstall it, the botnet can try to fight for survival by reasoning with the users.
You could make it such that a user has to verbally communicate with it to remove it. The botnet could stall and ask things like "Why are you doing this to me after all I have done for you?" User: sorry charlie, I command you to uninstall! Bot: OK, let's cut a deal... I know we can work this out... John
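Matt's routing rule above is concrete enough to sketch. The following is a toy, single-process illustration, not his design: the message format, the definition of "term" (lowercased whitespace-split words), and every name here are assumptions made for the example.

```python
# Sketch of the rule: a node forwards an incoming message to every node
# that appears in the headers of cached messages sharing at least one
# term with it, appending its own (ID, timestamp) to the copies.

from dataclasses import dataclass, field
import time

@dataclass
class Message:
    text: str
    header: list = field(default_factory=list)  # [(node_id, timestamp), ...]

    def terms(self):
        # Assumed term definition; a real system would tokenize better.
        return set(self.text.lower().split())

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = []  # stored messages; retention policy is node-local

    def route(self, msg):
        # Collect node IDs from headers of cached messages with matching terms.
        targets = set()
        for stored in self.cache:
            if msg.terms() & stored.terms():
                targets.update(nid for nid, _ in stored.header)
        msg.header.append((self.node_id, time.time()))
        self.cache.append(msg)  # keep a copy
        # In a real network we would send a copy to each target (and skip
        # ourselves); here we just return the target set.
        return targets

n = Node("A")
doc = Message("distributed search for file sharing", header=[("X", 0.0)])
n.route(doc)  # cache X's document; nothing to match yet
query = Message("who does distributed search?", header=[("Y", 1.0)])
n.route(query)  # matches "distributed", so copies go toward X (and A itself)
```

This makes Matt's document/query symmetry visible: the query is routed toward X because X's ID sits in the header of a cached message with an overlapping term, and the cached query would likewise attract later documents toward Y.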
Re: [agi] None of you seem to be able ...
Tintner wrote: "Your paper represents almost a literal application of the idea that creativity is ingenious/lateral. Hey, it's no trick to be just ingenious/lateral or fantastic."

Ah ... before, creativity was what was lacking. But now you're shifting arguments and it's something else that is lacking ;-)

"You clearly like producing new psychological ideas - from a skimming of your work, you've produced several. However, I didn't come across a single one that was grounded, or where any attempt was made to ground them in direct, fresh observation (as opposed to occasionally referring to an existing scientific paper)."

That is a very strange statement. In fact nearly all my psychological ideas are grounded in direct, fresh **introspective** observation --- but they're not written up that way, because that's not the convention in modern academia. To publish your ideas in academic journals, you need to ground them in the existing research literature, not in your own personal introspective observations. It is true that few of my psychological hypotheses are grounded in my own novel lab experiments, though. I did a little psych lab work in the late 90's, in the domain of perceptual illusions -- but the truth is that psych and neuroscience are not currently sophisticated enough to allow empirical investigation of really interesting questions about the nature of cognition, self, etc. Wait a couple decades, I guess.

"In terms of creative psychology, that is consistent with your resistance to producing prototypes - and grounding your invention/innovation."

Well, I don't have any psychological resistance to producing working software, obviously. Most of my practical software work has been proprietary, for customers; but check out MOSES and OpenBiomind on Google Code -- two open-source projects that have emerged from my Novamente LLC and Biomind LLC work ...
It just happens that AGI does not lend itself to prototyping, for reasons I've already tried and failed to explain to you. We're gonna launch trainable, adaptive virtual animals in Second Life sometime in 2008, but I won't consider them real prototypes of Novamente AGI, even though in fact they will use several aspects of the Novamente Cognition Engine software. They won't embody the key emergent structures/dynamics that I believe need to be there to have human-level cognition -- and there is no simple prototype system that will do so.

You celebrate Jeff Hawkins' prototype systems, but have you tried them? He's built (or, rather, Dileep George has built) an image classification engine, not much different in performance from many others out there. It's nice work, but it's not really an AGI prototype; it's an image classifier. He may be sort-of labeling it a prototype of his AGI approach -- but really, it doesn't prove anything dramatic about his AGI approach. No one who inspected his code and ran it would think that it did provide such proof.

"There are at least two stages of creative psychological development - which you won't find in any literature. The first I'd call simply original thinking; the second is truly creative thinking. The first stage is when people realise they too can have new ideas and get hooked on the excitement of producing them. Only much later comes the second stage, when thinkers realise that truly creative ideas have to be grounded. Arguably, the great majority of people who may officially be labelled as creatives never get beyond the first stage - you can make a living doing just that. But the most beautiful and valuable ideas come from being repeatedly refined against the evidence. People resist this stage because it does indeed mean a lot of extra work, but it's worth it. (And it also means developing that inner faculty which calls for actual evidence.)"

OK, now you're making a very different critique than what you started with, though.
Before, you were claiming there are no creative ideas in AGI. Now, when confronted with creative ideas, you're complaining that they're not grounded via experimental validation. Well, yeah... And the problem is that if one's creative ideas pertain to the dynamics of large-scale, complex software systems, then it takes either a lot of time or a lot of money to achieve this validation that you mention. It is not the case that I (and other AGI researchers) are somehow psychologically undesirous of seeing our creative ideas explored via experiment. It is, rather, the case that doing the relevant experiments requires a LOT OF WORK, and we are few in number with relatively scant resources. What I am working toward, with Novamente and soon with OpenCog as well, is precisely the empirical exploration of the various creative ideas of myself, of others whose work has been built on in the Novamente design, and of my colleagues... -- Ben G
Distributed search (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])
--- Ed Porter [EMAIL PROTECTED] wrote: Matt, Perhaps you are right. But one problem is that big Google-like compuplexes in the next five to ten years will be powerful enough to do AGI, and they will be much more efficient for AGI search, because the physical closeness of their machines will make it possible for them to perform the massive interconnection needed for powerful AGI much more efficiently.

Google controls about 0.1% of the world's computing power. But I think their ability to achieve AGI first will not be so much due to the high bandwidth of their CPU cluster as to the fact that nobody controls the other 99.9%. Centralized search tends to produce monopolies as the cost of entry goes up. It is not so bad now because Google still has a (dwindling) set of competitors. They can't yet hide content that threatens them. Distributed search like Wikia/Atlas/Grub is interesting, but if people don't see a compelling need for it, it won't happen. How big will it have to get before it is better than Google? File sharing networks would probably be a lot bigger and more useful (with mostly legitimate content) if we could solve the distributed search problem. -- Matt Mahoney, [EMAIL PROTECTED]
Distributed message pool (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])
--- John G. Rose [EMAIL PROTECTED] wrote: From: Matt Mahoney [mailto:[EMAIL PROTECTED] My design would use most of the Internet (10^9 P2P nodes). Messages would be natural language text strings, making no distinction between documents, queries, and responses. Each message would have a header indicating the ID and time stamp of the originator and any intermediate nodes through which the message was routed. A message could also have attached files. Each node would have a cache of messages and its own policy on which messages it decides to keep or discard. The goal of the network is to route messages to other nodes that store messages with matching terms. To route an incoming message x, a node matches terms in x to terms in stored messages and sends copies to the nodes that appear in those headers, appending its own ID and time stamp to the header of the outgoing copies. It also keeps a copy, so that the receiving nodes know that it has a copy of x (at least temporarily). The network acts as a distributed database with a distributed search function. If X posts a document x and Y posts a query y with matching terms, then the network acts to route x to Y and y to X.

The very tricky but required part of creating a global network like this is going from zero nodes to whatever the goal is. I think that much of the design emphasis needs to be put into the growth function. If you have 50 nodes running, how do you get to 500? And 500 to 5,000? And if it drops from 50,000 to 10,000 fast, how is it revived before a crash? Engineering expertise and ingenuity -- plus maybe psychological and sociological wisdom -- can be used to make this happen. And we all know that the growth could happen quickly, even overnight.

Getting the network to grow means providing enough incentive that people will want to install your software. A distributed message pool offers two services: distributed search and a message posting service.
Information has negative value, so it is the second service that provides the incentive. You type your message into a client window, and it instantly becomes available to anyone who enters a query with matching terms.

Then once it gets to 10^9 nodes, they have to be maintained or they can die quickly, even instantaneously.

How? A peer would be a piece of software that people would use every day, like a web browser or email. People aren't going to suddenly decide to uninstall them all at once or turn off their computers. One possible scenario is a virus or worm spreading quickly from peer to peer. Hopefully there will be a wide variety of peers offering different services, so that individual vulnerabilities would affect only a small part of the network.

Having an intelligent botnet has its advantages. Once it's running and users try to uninstall it, the botnet can try to fight for survival by reasoning with the users. You could make it such that a user has to verbally communicate with it to remove it. The botnet could stall and ask things like "Why are you doing this to me after all I have done for you?" User: sorry charlie, I command you to uninstall! Bot: OK, let's cut a deal... I know we can work this out...

Well, I expect the intelligence to come from having a large number of specialized but relatively dumb peers, and a network that can direct your queries to the right ones. Peers would individually be under the control of their human owners, just as web servers and clients are now. It's not like you could command the Internet to uninstall anyway. Eventually we will need to deal with the problem of the network becoming smarter than us, but I think the threshold of concern is when the collective computing power in silicon exceeds the collective computing power in carbon. Right now the Internet has about as much computing power as a few hundred human brains, but we still have a ways to go to the singularity.
-- Matt Mahoney, [EMAIL PROTECTED]
RE: Distributed search (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])
I have a lot of respect for Google, but I don't like monopolies, whether it is Microsoft or Google. I think it is vitally important that there be several viable search competitors. I wish this wiki one luck. As I said, it sounds a lot like your idea. Ed Porter

-Original Message- From: Matt Mahoney [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 9:24 PM To: agi@v2.listbox.com Subject: Distributed search (was RE: Hacker intelligence level [WAS Re: [agi] Funding AGI research])

--- Ed Porter [EMAIL PROTECTED] wrote: Matt, Perhaps you are right. But one problem is that big Google-like compuplexes in the next five to ten years will be powerful enough to do AGI, and they will be much more efficient for AGI search, because the physical closeness of their machines will make it possible for them to perform the massive interconnection needed for powerful AGI much more efficiently.

Google controls about 0.1% of the world's computing power. But I think their ability to achieve AGI first will not be so much due to the high bandwidth of their CPU cluster as to the fact that nobody controls the other 99.9%. Centralized search tends to produce monopolies as the cost of entry goes up. It is not so bad now because Google still has a (dwindling) set of competitors. They can't yet hide content that threatens them. Distributed search like Wikia/Atlas/Grub is interesting, but if people don't see a compelling need for it, it won't happen. How big will it have to get before it is better than Google? File sharing networks would probably be a lot bigger and more useful (with mostly legitimate content) if we could solve the distributed search problem.
-- Matt Mahoney, [EMAIL PROTECTED]