--- Richard Loosemore <[EMAIL PROTECTED]> wrote:
> Matt Mahoney wrote:
> > One problem with some connectionist models is trying to assign a 1-1
> > mapping between words and neurons. The brain might have 10^8 neurons
> > devoted to language, enough to represent many copies of the different
> > senses of a word and to learn new ones.
>
> But most of the nets I am talking about do not assign 1 neuron to one
> concept: they had three layers of roughly ten nodes each, and total
> connectivity between layers (so 100 plus 100 connection weights). It
> was the *weights* that stored the data, not the neurons. And the
> concepts were stored across *all* of the weights.
>
> Ditto for the brain. With a few thousand neurons, in three layers, we
> could store ALL of the grapheme-phoneme correspondences in one entire
> language.
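(For concreteness, here is a toy version of the kind of net you describe: three layers of about ten units, fully connected between layers, with the whole mapping spread across the ~200 weights rather than sitting in any single unit. The layer sizes, the one-hot coding, and the training loop are my own illustrative assumptions in Python, not your actual simulations.)

import numpy as np

rng = np.random.default_rng(0)
N = 10                              # units per layer (assumed size)
W1 = rng.normal(0, 0.5, (N, N))     # input -> hidden weights (100 of them)
W2 = rng.normal(0, 0.5, (N, N))     # hidden -> output weights (100 more)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical training pairs: one-hot "grapheme" in, one-hot "phoneme" out.
X = np.eye(N)
Y = np.eye(N)[rng.permutation(N)]   # an arbitrary grapheme -> phoneme mapping

lr = 0.5
for epoch in range(3000):           # plain batch gradient descent (backprop)
    H = sigmoid(X @ W1)             # hidden activations
    O = sigmoid(H @ W2)             # output activations
    d2 = (O - Y) * O * (1 - O)      # output-layer error signal
    d1 = (d2 @ W2.T) * H * (1 - H)  # hidden-layer error signal
    W2 -= lr * H.T @ d2
    W1 -= lr * X.T @ d1

print("mean |error| after training:",
      np.abs(sigmoid(sigmoid(X @ W1) @ W2) - Y).mean())

After training, no single weight or unit corresponds to any one correspondence; erase any one weight and every association degrades a little.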
That is true, but there are about 1000 times as many words as there are graphemes or phonemes, so you need 1000 times as many neurons, or 10^6 times as many connections. (There are 10^6 times as many possible relations between words as between graphemes and phonemes.) If it were as easy as you say, I think it would have been done by now.

> >> Then you will need to represent layered representations: concepts
> >> learned from conjunctions of other concepts rather than layer-1
> >> percepts. Then represent action, negation, operations, intentions,
> >> variables.......
> >
> > These are high level grammars, like learning how to convert word problems
> > into arithmetic or first order logic. I think anything learned at the
> > level of higher education is going to require a huge network (beyond
> > what is practical now), but I think the underlying learning principles
> > are the same.
>
> Oh, I disagree entirely: these are the basic things needed as the
> *underpinning* of the grammar. You need action for verbs, negation for
> everything, operations for abstraction, etc. etc.

How do humans learn these things using only neurons that follow simple rules? I think learning arithmetic or logic is similar to learning grammar. For example, you can learn to substitute "a + b" for "b + a" using the same type of representation you might use to replace "I gave Bob $10" with "Bob was given $10 by me".

Negation is hard to learn. For example, if you read "Nutra-Sweet does not cause stomach cancer", you might start to believe that it does. We learn negation more as an abstract symbol, e.g. "neither x nor y" means "not x and not y". When we build knowledge representation systems, we build logical operators into the system as primitives because we don't know any other way to do it. Logic is hard even for humans to learn; it is a high level language skill. I think this dooms the usual (but always unsuccessful) approach of building a structured knowledge base and trying to tack on a natural language interface later.

> But you cannot do any estimates like that until the algorithm itself is
> clear: there are no *algorithms* available for grammar learning,
> nothing that describes the class of all possible algorithms that do
> grammar learning. Complexity calculations mean nothing for handwaving
> suggestions about (eg) representing numbers of neurons: they strictly
> only apply to situations in which you can point to an algorithm and ask
> how it behaves.

My original dissertation topic (until I changed it to get funding) was to do exactly that. I looked at about 30 different language models, comparing compression ratio with model size, and projecting what size model would be needed to compress text to the entropy estimated by Shannon in 1950 using human text prediction (about 1 bit per character). The graph is here:
http://cs.fit.edu/~mmahoney/dissertation/

It suggests very roughly 10^8 to 10^10 bits, in agreement with three other estimates of 10^9 bits:

1. Turing's 1950 estimate, which he did not explain.
2. Landauer's estimate of human long term memory capacity based on memory tests.
3. The approximate information content of all the language you are exposed to through about age 20.

This estimate is independent of the algorithm, so it only predicts memory requirements, not speed. If you use a neural network, that is about 10^9 connections. To train on 1 GB of text, you need about 10^18 operations, about a year on a PC.
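If you want a feel for the metric on that graph, it is just compressed bits per character of text. A rough way to measure it with a stock compressor in Python is below; zlib is only a stand-in (the models I measured do much better), and the file name is a placeholder.

import zlib

path = "sample.txt"                 # placeholder: any large plain text file
data = open(path, "rb").read()
packed = zlib.compress(data, 9)     # best zlib compression level
bpc = 8.0 * len(packed) / len(data)
# Shannon's 1950 human-prediction experiments put English near 1 bit/char;
# good statistical models get much closer to that than zlib does.
print(f"{len(data)} bytes -> {len(packed)} bytes, {bpc:.2f} bits/char")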
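Written out, the arithmetic behind that last estimate looks like this (the operations-per-second figure for a PC is my own rough assumption; the other numbers are the ones above):

connections = 1e9          # ~10^9 bits of model, so roughly 10^9 connections
chars = 1e9                # ~1 GB of training text
ops = connections * chars  # touch every connection once per character: ~10^18
ops_per_second = 3e10      # assumed sustained multiply-add rate for a PC
years = ops / ops_per_second / 3.15e7
print(f"{ops:.0e} operations, about {years:.1f} years at {ops_per_second:.0e} ops/sec")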
I think there are ways to optimize this, such as activating only a small number of neurons at any one time, and other tricks, but of course I am breaking the rule of getting it to work first and optimizing later. Also, this estimate does not explain why the brain seems to use so much more memory and processing than these numbers suggest, higher by a factor of perhaps 10^4 to 10^6. But of course language evolved to fit our brains, not the other way around.

A lot of smart people are working on AGI, including many on this list. I don't believe the reason it hasn't been solved yet is that we are too dumb to figure it out.

-- Matt Mahoney, [EMAIL PROTECTED]
