Re: [agi] Introducing Steve's Theory of Everything in cognition.
Loosemore, et al,

Just to get this discussion out of esoteric math, here is a REALLY SIMPLE way of doing unsupervised learning with dp/dt that looks like it ought to work. Suppose we record each occurrence of the inputs to a neuron, keeping counters to identify how many times each combination has happened. For this discussion, each input will be considered to have either a substantial positive, substantial negative, or nearly zero dp/dt. When we reach a threshold of, say, 20 identical occurrences of the same combination of dp/dt that is NOT accompanied by lateral inhibition, we will proclaim THAT to be our principal component function for that neuron to do for the rest of its life. Thereafter, the neuron will require the previously observed positive and negative inputs to be as programmed, but will ignore all inputs that were nearly zero.

Of course, many frames will be corrupted because of overlapping phenomena, sampling on dp/dt edges, noise, fast phenomena, etc., etc. However, there will be few if any precise repetitions of corrupted frames, whereas clean frames should be quite common. First the most common frame (all zeros - nothing there) will be recognized, followed by each of the most common simultaneously occurring temporal patterns, recognized by successive neurons, all identified in order of decreasing frequency exactly as needed for Huffman or PCA coding. This process won't start until all inputs are accompanied by an indication that they have already been programmed by this process, so that programming will proceed layer by layer without corruption from inputs being only partially developed (a common problem in multi-layer NNs).
While clever math might make this work a little faster, and certainly wet neurons can't store many previous patterns, this should be guaranteed to work and produce substantially perfect unsupervised learning: probably slower than better-math methods, but probably faster than wet neurons that can't save thousands of combinations during early programming. Of course, this would be completely unworkable outside of dp/dt space; in object space, this would probably exhaust a computer's memory before completing.

Does this get the Loosemore Certificate of No Objection as being an apparently workable method for substantially optimal unsupervised learning? Thanks for considering this.

Steve Richfield

--- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=123753653-47f84b Powered by Listbox: http://www.listbox.com
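Steve's counting scheme is concrete enough to sketch directly. Below is a minimal toy version, under the assumptions that dp/dt values arrive pre-computed per input, that the commit threshold is the 20 he suggests, and that the names (`CountingNeuron`, `ternarize`) are invented here purely for illustration:

```python
from collections import Counter

THRESHOLD = 20  # identical clean frames required before committing

def ternarize(dpdt, eps=0.1):
    """Map each input's dp/dt to +1 (substantial positive), -1 (substantial
    negative), or 0 (nearly zero)."""
    return tuple(1 if x > eps else -1 if x < -eps else 0 for x in dpdt)

class CountingNeuron:
    def __init__(self):
        self.counts = Counter()
        self.pattern = None  # once committed, fixed for the neuron's life

    def observe(self, dpdt, inhibited=False):
        """Count a frame; commit when one clean frame repeats THRESHOLD times.
        Frames seen under lateral inhibition are not counted."""
        if self.pattern is not None or inhibited:
            return
        frame = ternarize(dpdt)
        if any(frame):  # skip the all-zero "nothing there" frame
            self.counts[frame] += 1
            if self.counts[frame] >= THRESHOLD:
                self.pattern = frame  # proclaim THAT the neuron's function

    def respond(self, dpdt):
        """Fire iff every committed nonzero input matches; committed zeros
        are ignored, as in Steve's description."""
        if self.pattern is None:
            return False
        frame = ternarize(dpdt)
        return all(p == f for p, f in zip(self.pattern, frame) if p != 0)
```

Whether this scales is exactly Steve's own caveat: the counter table grows with the number of distinct frames seen before commitment, which is tolerable in sparse dp/dt space but would explode in raw object space.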
[agi] Alternative Circuitry
Reading this - http://www.nytimes.com/2008/12/23/health/23blin.html?ref=science makes me wonder what other circuitry we have that's discouraged from being accepted. John
RE: [agi] Universal intelligence test benchmark
From: Matt Mahoney [mailto:matmaho...@yahoo.com]

--- On Sat, 12/27/08, John G. Rose johnr...@polyplexic.com wrote: Well I think consciousness must be some sort of out-of-band intelligence that bolsters an entity in terms of survival. Intelligence probably stratifies or optimizes in zonal regions of similar environmental complexity, consciousness being one, or an overriding out-of-band one...

No, consciousness only seems mysterious because human brains are programmed that way. For example, I should logically be able to convince you that pain is just a signal that reduces the probability of you repeating whatever actions immediately preceded it. I can't do that because emotionally you are convinced that pain is real. Emotions can't be learned the way logical facts can, so emotions always win. If you could accept the logical consequences of your brain being just a computer, then you would not pass on your DNA. That's why you can't. BTW the best I can do is believe both that consciousness exists and that consciousness does not exist. I realize these positions are inconsistent, and I leave it at that.

Consciousness must be a component of intelligence. For example, to pass on DNA, humans need to be conscious, or have been up to this point. Humans only live approx. 80 years. Intelligence is really a multi-agent thing; IOW our individual intelligence has come about through the genetic algorithm of humanity. We are really a distributed intelligence, and theoretically AGI will be born out of that. So maybe for improved genetic algorithms used for obtaining max compression there needs to be a consciousness component in the agents?
Just an idea I think there is potential for distributed consciousness inside of command line compressors :) John
Re: [agi] Alternative Circuitry
John G. Rose wrote: Reading this - http://www.nytimes.com/2008/12/23/health/23blin.html?ref=science makes me wonder what other circuitry we have that's discouraged from being accepted.

This blindsight news is not really news. It has been known for decades that there are two separate visual pathways in the brain, which seem to process 'what' information and 'vision for action' information. So this recent hubbub is just a new, more dramatic demonstration of something that has been known about for a long time.

This is my take on what is going on here: the interesting fact is that the vision-for-action pathway can operate without conscious awareness. It is an autopilot. What this seems to imply is that at some early point in evolution there was only that pathway, and there was no general ability to think about higher level aspects of the world. Then the higher cognitive mechanisms developed, while the older system remained in place. The higher cognitive mechanisms grew their own system for analyzing visual input (the 'what' pathway), but it turned out that the brain could still use the older pathway in parallel with the new, so it was left in place.

I am going to add this as a prediction derived from the model of consciousness in my AGI-09 paper: the prediction is that when we uncover the exact implementation details of the analysis mechanism that I discussed in the paper, we will find that the AM is entirely within the higher cognitive system, and that the vision-for-action pathway just happens to be beyond the scope of what the AM can access. It is because it is outside that scope that no consciousness is associated with what that pathway does. (Unfortunately, of course, this prediction cannot be fully tested until we can pin down the exact details of how the analysis mechanism gets implemented in the brain. The same is true of the other predictions.)
Richard Loosemore
RE: [agi] Universal intelligence test benchmark
--- On Sun, 12/28/08, John G. Rose johnr...@polyplexic.com wrote: So maybe for improved genetic algorithms used for obtaining max compression there needs to be a consciousness component in the agents? Just an idea I think there is potential for distributed consciousness inside of command line compressors :)

No, consciousness (as the term is commonly used) is the large set of properties of human mental processes that distinguish life from death, such as the ability to think, learn, experience, make decisions, take actions, communicate, etc. It is only relevant as an independent concept to agents that have a concept of death and the goal of avoiding it. The only goal of a compressor is to predict the next input symbol.

-- Matt Mahoney, matmaho...@yahoo.com
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve,

This sort of simple solution is what makes me say that relational learning is where real progress is to be made. That's not to say that we shouldn't rely on past work in flat learning: a great deal of progress has been made in that area, boosting such methods far beyond what simplistic solutions can do. Anyway, some comments on your proposal...

The method sounds more like clustering than like principal components. I suppose it depends on exactly how the lateral inhibition behaves. If features are allowed to combine linearly, it is PCA, but if lateral inhibition forces only one neuron to respond to a given input, it is clustering.

It seems unlikely that an entire visual frame will ever be repeated, even in dp/dt space. So, I infer that when you say frame you are thinking only of the field of inputs of an individual neuron, which perhaps correspond to a small region on the retina. Taking the standard route, the neurons could then be arranged in a hierarchy, so that more abstract neurons take as input the output of less abstract ones. But I'm not sure this would go well the way you've described things. The top level could only recognize whole-scene classes that were defined by the intersection of the nonzero elements of all their members (because each individual neuron will have this property), which seems very limiting. This could be fixed easily enough, though, by standard methods.

Anyway, such a hierarchy will not learn any relational concepts :P. There are ways of getting it to learn *some* relational concepts (for example, simply the fact that our eyes are constantly moving will help tremendously, since moving our eyes to different parts of the picture is equivalent to one of the suggestions I make in the blog post I referred you to). It may be true that all standard PCA methods are batch mode only, but there are standard clustering methods that do what you want (one such method is called sparse distributed memory).
--Abram

On Sun, Dec 28, 2008 at 5:45 AM, Steve Richfield steve.richfi...@gmail.com wrote: [quoted message trimmed]

-- Abram Demski Public address: abram-dem...@googlegroups.com Public archive: http://groups.google.com/group/abram-demski Private address: abramdem...@gmail.com
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve,

There has been plenty of speculation regarding just WHAT is buried in those principal components. Do they generally comprise simple combinations of identifiable features, or some sort of smushing that virtually encrypts the features? I have heard arguments on both sides of this issue. Can anyone here shine some light on this?

It seems like this gets back to the ill-defined problem again. There is no way of answering without more information about what the output of PCA is to be used for! The only immediate criterion we have is how good a probabilistic model an algorithm finds.

There are, of course, many models to explain any finite set of data. Some like PCA may do so more concisely, while others may do so in ways that better lend themselves to subsequent computations, presumably to direct future actions.

Conciseness will tend to be better for computation, simply because it is computationally easier to manipulate less data... of course, this is no guarantee, since the data may need to be fully uncompressed to extract a different feature, and if the compression is lossy then the feature may no longer be available. But if we know what we want to compute, we should be using supervised learning methods. Good predictive models will be likely to help us regardless of our goal.

Unrestrained, PCA could change a neuron's functionality based on new data, and very likely wreck a functioning NN's future operation by doing so.

Learning can still eventually converge. Also, I want to note that this is in some respects a quirk of NN methods that goes away if you think of things symbolically.

--Abram

On Sun, Dec 28, 2008 at 2:36 AM, Steve Richfield steve.richfi...@gmail.com wrote: Abram, On 12/27/08, Abram Demski abramdem...@gmail.com wrote: Steve, My thinking in the significant figures issue is that the purpose of unsupervised learning is to find a probabilistic model of the data. There are, of course, many models to explain any finite set of data.
Some like PCA may do so more concisely, while others may do so in ways that better lend themselves to subsequent computations, presumably to direct future actions. (Whereas the purpose of supervised learning is to find a probabilistic model of *one* variable *conditioned on* all the others.) When you talk about the insufficiency of standard PCA, do you think the problems you refer to relate to (1) PCA finding a suboptimal model, or

There has been plenty of speculation regarding just WHAT is buried in those principal components. Do they generally comprise simple combinations of identifiable features, or some sort of smushing that virtually encrypts the features? I have heard arguments on both sides of this issue. Can anyone here shine some light on this? If the features can be extracted from combinations of components, then PCA is arguably optimal. If not, then PCA is probably not what is needed. Genuine PCA has some other unrelated problems, in that it is VERY computationally intensive, and there isn't (yet) any good incremental PCA algorithm that learns somewhat like you would expect a neuron to learn. I suspect that I may also have to crack this nut before dp/dt becomes truly useful.

(2) the optimal model being not quite what you are after?

I, like everyone else, want to use an optimal model. However, my idea of optimality may be different from other people's idea of optimality, as we seek to optimize different things. Unrestrained, PCA could change a neuron's functionality based on new data, and very likely wreck a functioning NN's future operation by doing so. I suspect that some additional cleverness is needed, e.g. neurons initially being in a discovery mode that produces no output until a principal component (or something like a principal component) is discovered.
Then, when downstream neurons use that principal component, subsequent alteration would be constrained to refining that component, with no possibility of completely abandoning it for a completely different component that might better represent the input. Any thoughts? Steve Richfield === On Sat, Dec 27, 2008 at 3:05 AM, Steve Richfield steve.richfi...@gmail.com wrote: Abram, On 12/26/08, Abram Demski abramdem...@gmail.com wrote: Steve, When I made the statement about Fourier I was thinking of JPEG encoding. A little digging found this book, which presents a unified approach to (low-level) computer vision based on the Fourier transform: http://books.google.com/books?id=1wJuTMbNT0MCdq=fourier+visionprintsec=frontcoversource=blots=3ogSJ2i5uWsig=ZdvvWvu82q8UX1c5Abq6hWvgZCYhl=ensa=Xoi=book_resultresnum=2ct=result#PPA4,M Interesting, but seems far removed from wet neuronal functionality, unsupervised learning, etc. But that is beside the present point. :) Probably so. I noticed that you recently graduated, so I thought that I would drop that
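On the incremental-PCA point above: one classic candidate worth mentioning is Oja's rule, a Hebbian update that estimates the first principal component one sample at a time, roughly the way one might expect a neuron to learn. A minimal sketch follows; the function name and the learning-rate/epoch settings are illustrative choices of mine, not a claim about what wet neurons actually do:

```python
import numpy as np

def oja_first_component(data, lr=0.01, epochs=50, seed=0):
    """Estimate the first principal component incrementally with Oja's rule:
    w += lr * y * (x - y * w), where y = w . x.  The second term is a decay
    that keeps ||w|| near 1, so no explicit normalization step is needed."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=data.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in data:  # one zero-mean sample at a time
            y = w @ x
            w += lr * y * (x - y * w)
    return w / np.linalg.norm(w)
```

On zero-mean data whose variance is concentrated along one axis, the returned unit vector aligns (up to sign) with that axis, which is the behavior batch PCA would give for the first component.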
Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
Robert, What kind of problems have you designed this to solve? Can you give some examples?

Robert: A brief paper on an AGI system for human-level ... had only 2 pages to fit in. If you are working on a system, you probably hope it will one day help design a better world, better tools, better inventions. The better is a subjective human value. A place for a human-like representation of at least rough, general human values (bias, likes) in the AGI is essential. The paper gives a quick view of the Human-centric representation and behavioral systems approach for problem-solving, reasoning as giving meaning (human values) to stories and games... Indexing relations via spatially related registers is its simulated substrate.

Happy Holidays, Robert

... all the human values were biased, unlike the very objective AGI systems designed on the Mudfish's home planet; AGI systems that objectively knew that sticky mud is beautiful, large oceans of gooey mud.. how enchanting! Pure clean water, now that's fishy!
Re: [agi] Universal intelligence test benchmark
2008/12/27 Matt Mahoney matmaho...@yahoo.com: --- On Fri, 12/26/08, Philip Hunt cabala...@googlemail.com wrote: Humans are very good at predicting sequences of symbols, e.g. the next word in a text stream. Why not have that as your problem domain, instead of text compression? That's the same thing, isn't it?

Yes and no. What I mean is they may be the same in principle, but I don't think they are in practice. I'll illustrate this by way of an analogy. The Turing Test is considered by many to be a reasonable definition of intelligence. And I'd agree with them -- if a computer can fool sophisticated, alert people into thinking it's a human, it's probably at least as clever as a human. Now consider the Loebner Prize. IMO this is a waste of time in terms of advancement of AI, because we're not anywhere near advanced enough to build a machine that can think as well as a human. So programs that do well at the Loebner Prize do so not because they have good AI architectures, but because they employ clever tricks to fool people. But that's all there is -- clever tricks with no real substance.

Consider compression programs. I have several on my computer: zip, compress, bzip2, gzip, etc. These are all quite good at compression (they all seem to work well on Python source code, for example), but there is no real intelligence or understanding behind them -- they are clever tricks with no substance (where by substance I mean intelligence).

Now, consider if I build a program that can predict how some sequences will continue. For example, given ABACADAEA it'll predict the next letter is F, or given 1 2 4 8 16 32 it'll predict the next number is 64. (Whether the program works on bits, bytes, or longer chunks is a detail, though it might be an important detail.) Even though the program is good at certain types of sequences, it doesn't do compression. For it to do so, I'd have to give it some notation to build a compressed file and then uncompress it again.
This is a lot of tedious detail work and doesn't add to its intelligence. IMO it would just get in the way.

-- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
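For what it's worth, Philip's two examples can be handled by an almost embarrassingly small predictor. The hypothesis list below (constant ratio, constant difference, and a fixed letter interleaved with an ascending run) is invented here just to cover his cases, not a serious proposal:

```python
def predict_next(seq):
    """Guess the next element of a sequence from a tiny hypothesis list."""
    if not isinstance(seq, str):
        # Geometric: constant ratio between successive terms (1 2 4 8 16 32).
        ratios = [b / a for a, b in zip(seq, seq[1:]) if a != 0]
        if len(seq) >= 2 and len(ratios) == len(seq) - 1 \
                and all(r == ratios[0] for r in ratios):
            return seq[-1] * ratios[0]
        # Arithmetic: constant difference between successive terms.
        diffs = [b - a for a, b in zip(seq, seq[1:])]
        if diffs and all(d == diffs[0] for d in diffs):
            return seq[-1] + diffs[0]
        return None
    # String like "ABACADAEA": a fixed letter interleaved with an ascending run.
    if seq and seq[::2] == seq[0] * len(seq[::2]):
        run = seq[1::2]  # "BCDE" for "ABACADAEA"
        if len(run) >= 2 and all(ord(b) - ord(a) == 1
                                 for a, b in zip(run, run[1:])):
            return chr(ord(run[-1]) + 1)
    return None
```

The point stands either way: wrapping such a predictor in a file format and a decompressor to make it a compressor adds machinery without adding any predictive power.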
Re: [agi] Universal intelligence test benchmark
2008/12/28 Philip Hunt cabala...@googlemail.com: Now, consider if I build a program that can predict how some sequences will continue. For example, given ABACADAEA it'll predict the next letter is F, or given 1 2 4 8 16 32 it'll predict the next number is 64. (Whether the program works on bits, bytes, or longer chunks is a detail, though it might be an important detail.) Even though the program is good at certain types of sequences, it doesn't do compression. For it to do so, I'd have to give it some notation to build a compressed file and then uncompress it again. This is a lot of tedious detail work and doesn't add to its intelligence. IMO it would just get in the way.

Furthermore, I don't see that a sequence-predictor should necessarily attempt to guess the next item in the sequence by attempting to generate the shortest possible Turing machine capable of producing the sequence (certainly humans don't work that way). If a sequence-predictor uses this method and is good at predicting sequences, good; but if it uses another method and is good at predicting sequences, it's just as good. What matters is a program's performance, not how it does it.

-- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
Mike,

Mike wrote: What kind of problems have you designed this to solve? Can you give some examples?

Natural language understanding, path finding, game playing. Any problems that can be represented as a situation in the four-component domain (value - role - relation - feature models) can be 3-C'd (compared, contrasted, combined) to give a resulting situation (frame pattern). What is combined, compared, or contrasted? Only the regions under attention, including their focus detail level, are examined. What is placed and represented in the regions determines what component can be 3-C analyzed... as a general computing paradigm using 3-C (AND - OR - NOT).

Example: Here's a pattern example you may not have seen before, but by 3-C you discover the pattern and how to make an example. As spoken aloud: five and nine [is] fine; two and six [is] twix; five and seven [is] fiven.

Take the five and seven = fiven. When the system compares the resultant fiven to five, the result is that five is at the start of the situation. When it compares fiven and seven, the result is that ven is at the end position. Resulting situation PATTERN = [situation 1][focus inward][start position] combined with [situation 2][focus inward][end position]. (Spatial and sequence positions are a key part of the representation system.)

How was the correct (reasoning) method chosen? This result was by comparison; it could have been by contrasting. All three (Compare, Contrast and Combine) happen simultaneously. The winner is whichever resulting situation makes sense to the system, i.e. has the most activation in the value area (some direct or indirect value from past experience, or value given by the authority system in the value region: e.g. the fearful or attractive spectrum). How was the correct region and focus detail level chosen?
The attention region in the example was on the sound region; the focus detail was on the phoneme level (syllable). It could have looked for patterns in the number values, or the emotions related to each word, or the letter patterns, or hand motions, eye position when spoken, etc. The regions are biased by the value system's current index (amygdala/septum analog): e.g. when you see five, the quantity region will be given a lower threshold, and the focus level associated will give the content on the 1 - 10 scale. The index region weights are re-organized only by stronger reward/failure (the authority system); 3-C results can act on the index, changing the content connection weights.

Now you compare apples to oranges for an encore; what do you get? A color, a taste, a mass, a new fruit... your attention determines the result. All regions are being matched for patterns in the 2 primary index modules (action selection, emotional value; others can be integrated seamlessly). Five and seven is not fiven, it is twelve, but in this situation it makes sense given the circumstances. Sense and meaning are contextual for the model, as for humans.

Hope this sheds light. A detailed paper has been in the works.

Robert
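Robert's spoken-blend pattern ([start of situation 1] combined with [end of situation 2]) can be caricatured in a few lines. This is a rough orthographic guess of mine at the mechanism: it reproduces fine and twix, though fiven evidently keeps a slightly larger piece of the first word, so treat the onset rule as a hypothetical stand-in for the real phoneme-level machinery:

```python
VOWELS = "aeiou"

def onset(word):
    """Initial consonant cluster, approximated on spelling."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i]

def blend(w1, w2):
    """[situation 1][start position] + [situation 2][end position]:
    keep w1's onset, then append w2 with its own onset stripped."""
    return onset(w1) + w2[len(onset(w2)):]
```

Under this rule, blend("five", "nine") gives fine and blend("two", "six") gives twix; the interesting part of Robert's model is not the string surgery but how the value system decides which regions and focus levels to blend in the first place.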
Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
Robert: Example: Here's a pattern example you may not have seen before, but by 3C you discover the pattern and how to make an example: As spoken aloud: five and nine [is] fine two and six [is] twix five and seven [is] fiven

Robert,

So, if I understand, you're designing a system to deal with problems concerning objects which have multiple domain associations. For example, words as above are associated with their sounds, letter patterns, and perhaps meanings. But the system always *knows* these domains beforehand - and that it must consider them in any problem? It couldn't, say, find the pattern to a problem like:

Six 2003 Seven 1996 Eight 2001 Eight and a half ?

where it wouldn't know any domain relevant to solving the problem, and would first have to *find* the appropriate domain? (In creative, human-level intelligence problems you often have to do this.)
Re: [agi] Universal intelligence test benchmark
2008/12/29 Matt Mahoney matmaho...@yahoo.com: Please remember that I am not proposing compression as a solution to the AGI problem. I am proposing it as a measure of progress in an important component (prediction).

Then why not cut out the middleman and measure prediction directly? I.e. put the prediction program in a test harness, feed it chunks one at a time, ask it what the next value in the sequence will be, tell it what the actual answer was, etc. The program's score is then simply the number it got right divided by the number of predictions it had to make.

Turning a prediction program into a compression program requires superfluous extra work: you have to invent an efficient file format to hold compressed data, and you have to write a decompression program as well as a compressor. Furthermore, there are bound to be programs that're good at prediction but not good at compression, whereas all programs that're good at compression are guaranteed to be good at prediction.

-- Philip Hunt, cabala...@googlemail.com Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
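Philip's harness is simple enough to pin down exactly. A sketch, where the name `score_predictor` and the predictor's call signature (a function from the history so far to a guess) are assumptions of mine:

```python
def score_predictor(predictor, stream):
    """Feed `stream` one symbol at a time: before each symbol, ask the
    predictor for its guess, then reveal the true symbol. The score is
    the fraction of correct guesses, exactly as Philip describes."""
    correct = 0
    history = []
    for actual in stream:
        guess = predictor(history)
        if guess == actual:
            correct += 1
        history.append(actual)  # tell it what the actual answer was
    return correct / len(stream)
```

A baseline predictor that just repeats the last symbol makes the scoring concrete: on the stream "aaabbb" it is right 4 times out of 6 (it misses the first symbol and the a-to-b transition).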
Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
Mike,

Very good choice. But the system always *knows* these domains beforehand - and that it must consider them in any problem?

YES. The domains' content structure (which is what you mean) is the human-centric one provided by living a child's life, loading the value system with biases such as humans are warm and candy is really sweet. By further being pushed through western culture's grade-level curriculum, we value the visual feature symbols 2003 and 1996 as numbers, then as dates. The content models (concept patterns) are built up from any basic feature to form instances from the basic content of the four domains, such as dates of leap years, century marks, millennium or anniversary.

Problems more like: -- ice cream favorite red happee -- What this group of words means has everything to do with what the reader knows and values beforehand. And what he values will determine what his attention is on (the food, the emotions, the color, the positions) or how deep the focus is: on the entire situation (sentence), a group of them, a single word or a letter. Humans value from the top, so we'll likely think of cherry ice cream before we see the occurrence pattern of the letter e in every word in that 'sentence' above.

Good choice for your problem: Six 2003 Seven 1996 Eight 2001 Eight and a half ? (I see a number of patterns, such as 00 99, multiply, add word to end - but haven't gotten the complete formula.) For the system, it is biased; it makes sense for itself, its internal value. The answer the system chooses is the one that makes sense to what it knows and values. Sure, it can and will be used for general pattern mining by comparing and contrasting within lines, line-to-line, number-to-text, text-to-number, date-to-word, month-to-number, middle-part to end, end-to-end, etc., until a resulting comparison yields a pattern that it values (from experience or being told).
However, the value system controlling attention prevents any combinatorial explosion - animals only search through the models that have value (directly or indirectly) to the problem situation, thus limiting the total guesses we could even make (it looks for patterns it already knows).

To solve problems it has not been taught or can't see a pattern for:

1) If self-motivated because a reward/avoidance is strong: it keeps looking for patterns via 3-C, persisting in its behavior (doing the same ol' thing) and failing. If a value happens to occur in one of the results as it keeps going, it will see that something was different. It has access to its own actions (role and relation domain), and this different action stands out (auto-contrast) and becomes of greater value due to the associated difference (non-failure). It keeps trying until the motivation runs out (the energy level decays) or other values or past experience exceed its model of how long the problem should take.

2) If instructed how to solve it by trying x, y or z: "Widen your attention, expand your focus" - then it has a larger set of regions in which to try to find a pattern it values. If set, it can examine regions of the instruction (x, y, and z) and see what was different from what it was trying (if the comparison yields a high enough value, it will try those as well). "Try going left and up." O.K., auto-contrast: I was trying only up; the difference is to add one more direction; I can try left and up and back, etc.

Creativity and reason come from the 3-C mechanism. Creativity in the model is to combine any sets of domain content and give the result a respective value from its experience and domain models. Example: combine the form of a computer mouse, the look of diamonds, the function of a steering wheel, with the feel of leather - what do you get? Focus on each region and combine, then e-valuate (compare it to objects, functions). What's your result?
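The value-gated search that "prevents any combinatorial explosion" could be sketched like this; the model names, situations, and threshold are invented purely for illustration:

```python
# Sketch: only models whose learned value for the current situation
# clears a threshold are tried, best first, bounding the guesses made.
# All names and numbers below are hypothetical.

def candidate_models(models, situation, threshold=0.5):
    """models: {name: {situation: value}}. Return names worth trying."""
    scored = [(values.get(situation, 0.0), name) for name, values in models.items()]
    return [name for value, name in sorted(scored, reverse=True) if value > threshold]

models = {
    "count-letters": {"word-puzzle": 0.2},
    "compare-dates": {"word-puzzle": 0.9},
    "taste-memory": {"word-puzzle": 0.0},
}
print(candidate_models(models, "word-puzzle"))
```

Only the high-value model survives the gate, so the search space stays small regardless of how many models exist.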
Models in my experience say that it's a luxury-car controller, while you might say it would be something in an art gallery, etc. (art: value without function/role).

Anyway, Ben's pre-school for AGI is one means to bias such a system with experience and human values; another way is to try to properly represent human experience (static and dynamic) and then essentially implant memories and experience instead of just declarative facts.

Robert

--- On Sun, 12/28/08, Mike Tintner tint...@blueyonder.co.uk wrote:

From: Mike Tintner tint...@blueyonder.co.uk
Subject: Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
To: agi@v2.listbox.com
Date: Sunday, December 28, 2008, 8:38 PM

Robert: Example: Here's a pattern example you may not have seen before, but by 3C you discover the pattern and how to make an example. As spoken aloud:

five and nine [is] fine
two and six [is] twix
five and seven [is] fiven

Robert, So, if
Re: Human-centric AGI approach-paper (was Re: Indexing and Re: [agi] AGI Preschool: sketch of an evaluation framework for early stage AGI systems aimed at human-level, roughly humanlike AGI
Robert,

Thanks for your detailed, helpful replies. I like your approach of operating in multiple domains for problem-solving. But if the domains are known beforehand, then it's not truly creative problem-solving - where you do have to be prepared to go in search of the appropriate domains - and thus truly cross domains rather than simply combining preselected ones.

I gave you a perhaps exaggerated example just to make the point. You had to realise that the correct domain to solve my problem was that of movies - the numbers were the titles of movies and the dates they came out. If you're dealing with real-world rather than just artificial creative problems like our two, you may definitely have to make that kind of domain switch - solving any scientific detective problem, say, like that of binding in the brain, may require you to think in a surprising new domain, for which you will have to search long and hard (and possibly without end).
Re: [agi] Introducing Steve's Theory of Everything in cognition.
Steve,

I should have specified further. When I say "good" I mean good at predicting. PCA attempts to isolate components that give maximum information... so my question to you becomes: do you think that the problem you're pointing towards is suboptimal models that don't predict the data well enough, or models that predict the data fine but aren't directly useful for what you expect them to be useful for?

To that end... you weren't talking about using the *predictions* of the PCA model, but rather the principal components themselves. The components are essentially hidden variables that make the model run. I'm thinking there are two reasons to examine them: either in hopes of a coincidental direct link between a hidden variable and the goal (an expedient that would make the calculation of predictions unnecessary), or in an attempt to complexify the model to make it more accurate in its predictions, by looking for links between the hidden variables, or for patterns over time, et cetera.

--Abram

On Sun, Dec 28, 2008 at 11:13 PM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram,

On 12/28/08, Abram Demski abramdem...@gmail.com wrote:

Steve,

There has been plenty of speculation regarding just WHAT is buried in those principal components. Do they generally comprise simple combinations of identifiable features, or some sort of smushing that virtually encrypts the features? I have heard arguments on both sides of this issue. Can anyone here shine some light on this?

It seems like this gets back to the ill-defined problem again. There is no way of answering without more information about what the output of PCA is to be used for!

Sure we know - it is to be combined with other outputs in a big Bayesian fuzzy-logic network to recognize things, devise plans, and execute them (but hopefully not us included).

The only immediate criterion we have is how good a probabilistic model an algorithm finds. Presumably, "good" means suitable for the above purpose.
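To make the "hidden variables" concrete, here is a small sketch (my own illustration, not Abram's or Steve's code) that extracts principal components via SVD on synthetic data; the data layout is an assumption chosen so one component dominates:

```python
import numpy as np

# Sketch: principal components as the hidden variables of the model.
# The third input is roughly the sum of the first two (an assumption),
# so the first component should capture most of the variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
data = np.column_stack([x, x.sum(axis=1) + 0.05 * rng.normal(size=200)])

centered = data - data.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / (s**2).sum()

print("variance explained per component:", np.round(explained, 3))
print("first component (a hidden variable):", np.round(vt[0], 3))
```

Here the top component is a simple combination of identifiable inputs rather than a "smushing" - though whether that holds for richer data is exactly the open question in the thread.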
There are, of course, many models to explain any finite set of data. Some, like PCA, may do so more concisely, while others may do so in ways that better lend themselves to subsequent computations, presumably to direct future actions.

Conciseness will tend to be better for computation, simply because it is computationally easier to manipulate less data... of course, this is no guarantee, since the data may need to be fully uncompressed to extract a different feature, and if the compression is lossy then the feature may no longer be available. But if we know what we want to compute, we should be using supervised learning methods.

At the risk of wasting a few neurons, unsupervised methods may work just as well - or even better, since a layer of neurons can be completely finished before there is enough of the subsequent layers put together for supervision to even work.

Good predictive models will be likely to help us regardless of our goal.

Of course, the entire discussion centers around "good" above. Unrestrained, PCA could change a neuron's functionality based on new data, and very likely wreck a functioning NN's future operation by doing so.

Learning can still eventually converge. Also, I want to note that this is in some respects a quirk of NN methods that goes away if you think of things symbolically.

No, I think this is a quirk of supervised learning, which people who work with symbols usually avoid. Note the proposal on this thread that I just directed toward Loosemore, which also avoids this problem in a NN structure. What I think most people are missing is that, done right, UNsupervised learning is WAY faster than supervised learning.
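The caveat about lossy compression can be made concrete: if only the top principal component is kept, a feature orthogonal to it is lost. A sketch under assumed synthetic data (my own illustration):

```python
import numpy as np

# Synthetic data (an assumption): two independent inputs and their sum,
# so the top principal component captures the sum direction.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
data = np.column_stack([x[:, 0], x[:, 1], x[:, 0] + x[:, 1]])
centered = data - data.mean(axis=0)

_, s, vt = np.linalg.svd(centered, full_matrices=False)

# Lossy compression: keep only the top component, then reconstruct.
recon1 = (centered @ vt[:1].T) @ vt[:1]

# The "difference" feature x0 - x1 is orthogonal to the top component,
# so it is essentially destroyed by the compression.
true_diff = centered[:, 0] - centered[:, 1]
recon_diff = recon1[:, 0] - recon1[:, 1]
print("std of difference feature before:", round(float(np.std(true_diff)), 2))
print("std of difference feature after: ", round(float(np.std(recon_diff)), 2))
```

The difference feature shrinks to nearly nothing after compression, which is exactly the risk if we don't know in advance what we will want to compute.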
Steve Richfield

===

On Sun, Dec 28, 2008 at 2:36 AM, Steve Richfield steve.richfi...@gmail.com wrote:

Abram,

On 12/27/08, Abram Demski abramdem...@gmail.com wrote:

Steve,

My thinking on the significant-figures issue is that the purpose of unsupervised learning is to find a probabilistic model of the data

There are, of course, many models to explain any finite set of data. Some, like PCA, may do so more concisely, while others may do so in ways that better lend themselves to subsequent computations, presumably to direct future actions.

(whereas the purpose of supervised learning is to find a probabilistic model of *one* variable *conditioned on* all the others). When you talk about the insufficiency of standard PCA, do you think the problems you refer to relate to (1) PCA finding a suboptimal model, or

There has been plenty of speculation regarding just WHAT is buried in those principal components. Do they generally comprise simple combinations of identifiable features, or some sort of smushing that virtually encrypts the features? I have heard arguments on both sides of this issue. Can anyone here shine some light on this? If the features can be extracted from combinations of components, then PCA is
Re: [agi] Universal intelligence test benchmark
2008/12/29 Philip Hunt cabala...@googlemail.com:
2008/12/29 Matt Mahoney matmaho...@yahoo.com:

Please remember that I am not proposing compression as a solution to the AGI problem. I am proposing it as a measure of progress in an important component (prediction). [...]

Turning a prediction program into a compression program requires superfluous extra work: you have to invent an efficient file format to hold the compressed data, and you have to write a decompression program as well as a compressor.

Incidentally, reading Matt's posts got me interested in writing a compression program using Markov-chain prediction. The prediction bit was a piece of piss to write; the compression code is proving considerably more difficult.

--
Philip Hunt, cabala...@googlemail.com
Please avoid sending me Word or PowerPoint attachments. See http://www.gnu.org/philosophy/no-word-attachments.html
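The prediction-compression link Matt draws can be sketched without the file format or the decoder that Philip found hard: an adaptive order-1 Markov model assigns each character a probability, and summing -log2(p) gives the size an ideal arithmetic coder would achieve. This is my own illustrative sketch, not Philip's or Matt's code:

```python
import math
from collections import defaultdict

def ideal_compressed_bits(text):
    """Sum of -log2 p(ch | previous ch) under an adaptive order-1 model."""
    counts = defaultdict(lambda: defaultdict(int))
    prev, bits = "", 0.0
    for ch in text:
        ctx = counts[prev]
        total = sum(ctx.values())
        p = (ctx[ch] + 1) / (total + 256)  # Laplace smoothing, 256-symbol alphabet
        bits += -math.log2(p)
        ctx[ch] += 1  # update the model only after coding the symbol
        prev = ch
    return bits

text = "the quick brown fox jumps over the lazy dog. " * 20
print(round(ideal_compressed_bits(text) / len(text), 2), "bits per character")
```

Better prediction directly means fewer bits, which is why compressed size works as a progress measure even before any actual coder is written.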