--- Richard Loosemore <[EMAIL PROTECTED]> wrote:
> Matt Mahoney wrote:
> > One problem with some connectionist models is trying to assign a 1-1
> > mapping between words and neurons. The brain might have 10^8 neurons
> > devoted to language, enough to represent many copies of the different
> > senses of a word and to learn new ones.
>
> But most of the nets I am talking about do not assign 1 neuron to one
> concept: they had three layers of roughly ten nodes each, and total
> connectivity between layers (so 100 plus 100 connection weights). It
> was the *weights* that stored the data, not the neurons. And the
> concepts were stored across *all* of the weights.
>
> Ditto for the brain. With a few thousand neurons, in three layers, we
> could store ALL of the grapheme-phoneme correspondences in one entire
> language.
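(For concreteness, here is a toy version of the kind of net you describe: three layers of about ten units, fully connected between layers, with the whole mapping spread across the ~200 weights rather than sitting in any single unit. The layer sizes, the one-hot coding, and the training loop are my own illustrative assumptions in Python, not your actual simulations.)

import numpy as np

rng = np.random.default_rng(0)
N = 10                              # units per layer (assumed size)
W1 = rng.normal(0, 0.5, (N, N))     # input -> hidden weights (100 of them)
W2 = rng.normal(0, 0.5, (N, N))     # hidden -> output weights (100 more)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical training pairs: one-hot "grapheme" in, one-hot "phoneme" out.
X = np.eye(N)
Y = np.eye(N)[rng.permutation(N)]   # an arbitrary grapheme -> phoneme mapping

lr = 0.5
for epoch in range(3000):           # plain batch gradient descent (backprop)
    H = sigmoid(X @ W1)             # hidden activations
    O = sigmoid(H @ W2)             # output activations
    d2 = (O - Y) * O * (1 - O)      # output-layer error signal
    d1 = (d2 @ W2.T) * H * (1 - H)  # hidden-layer error signal
    W2 -= lr * H.T @ d2
    W1 -= lr * X.T @ d1

print("mean |error| after training:",
      np.abs(sigmoid(sigmoid(X @ W1) @ W2) - Y).mean())

After training, no single weight or unit corresponds to any one correspondence; erase any one weight and every association degrades a little.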
That is true, but there are about 1000 times as many words as there are graphemes or phonemes, so you need 1000 times as many neurons, or 10^6 times as many connections. (There are 10^6 times as many possible relations between words as between graphemes and phonemes.) If it were as easy as you say, I think it would have been done by now.

> >> Then you will need to represent layered representations: concepts
> >> learned from conjunctions of other concepts rather than layer-1
> >> percepts. Then represent action, negation, operations, intentions,
> >> variables.......
> >
> > These are high level grammars, like learning how to convert word problems
> > into arithmetic or first order logic. I think anything learned at the
> > level of higher education is going to require a huge network (beyond
> > what is practical now), but I think the underlying learning principles
> > are the same.
>
> Oh, I disagree entirely: these are the basic things needed as the
> *underpinning* of the grammar. You need action for verbs, negation for
> everything, operations for abstraction, etc. etc.

How do humans learn these things using only neurons that follow simple rules? I think learning arithmetic or logic is similar to learning grammar. For example, you can learn to substitute "a + b" for "b + a" using the same type of representation you might use to replace "I gave Bob $10" with "Bob was given $10 by me".

Negation is hard to learn. For example, if you read "Nutra-Sweet does not cause stomach cancer", you might start to believe that it does. We learn negation more as an abstract symbol, e.g. "neither x nor y" means "not x and not y". When we build knowledge representation systems, we build logical operators into the system as primitives because we don't know any other way to do it. Logic is hard even for humans to learn; it is a high level language skill. I think this dooms the usual (but always unsuccessful) approach of building a structured knowledge base and trying to tack on a natural language interface later.

> But you cannot do any estimates like that until the algorithm itself is
> clear: there are no *algorithms* available for grammar learning,
> nothing that describes the class of all possible algorithms that do
> grammar learning. Complexity calculations mean nothing for handwaving
> suggestions about (eg) representing numbers of neurons: they strictly
> only apply to situations in which you can point to an algorithm and ask
> how it behaves.

My original dissertation topic (until I changed it to get funding) was to do exactly that. I looked at about 30 different language models, comparing compression ratio with model size, and projecting what size model would be needed to compress text to the entropy estimated by Shannon in 1950 using human text prediction (about 1 bit per character). The graph is here:
http://cs.fit.edu/~mmahoney/dissertation/

It suggests very roughly 10^8 to 10^10 bits, in agreement with three other estimates of 10^9 bits:

1. Turing's 1950 estimate, which he did not explain.
2. Landauer's estimate of human long term memory capacity based on memory tests.
3. The approximate information content of all the language you are exposed to through about age 20.

This estimate is independent of the algorithm, so it only predicts memory requirements, not speed. If you use a neural network, that is about 10^9 connections. To train on 1 GB of text, you need about 10^18 operations, about a year on a PC.
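If you want a feel for the metric on that graph, it is just compressed bits per character of text. A rough way to measure it with a stock compressor in Python is below; zlib is only a stand-in (the models I measured do much better), and the file name is a placeholder.

import zlib

path = "sample.txt"                 # placeholder: any large plain text file
data = open(path, "rb").read()
packed = zlib.compress(data, 9)     # best zlib compression level
bpc = 8.0 * len(packed) / len(data)
# Shannon's 1950 human-prediction experiments put English near 1 bit/char;
# good statistical models get much closer to that than zlib does.
print(f"{len(data)} bytes -> {len(packed)} bytes, {bpc:.2f} bits/char")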
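Written out, the arithmetic behind that last estimate looks like this (the operations-per-second figure for a PC is my own rough assumption; the other numbers are the ones above):

connections = 1e9          # ~10^9 bits of model, so roughly 10^9 connections
chars = 1e9                # ~1 GB of training text
ops = connections * chars  # touch every connection once per character: ~10^18
ops_per_second = 3e10      # assumed sustained multiply-add rate for a PC
years = ops / ops_per_second / 3.15e7
print(f"{ops:.0e} operations, about {years:.1f} years at {ops_per_second:.0e} ops/sec")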
I think there are ways to optimize this, such as activating only a small number of neurons at any one time, and other tricks, but of course I am breaking the rule of getting it to work first and optimizing later. Also, this estimate does not explain why the brain seems to use so much more memory and processing than these numbers suggest, higher by a factor of perhaps 10^4 to 10^6. But of course language evolved to fit our brains, not the other way around.

A lot of smart people are working on AGI, including many on this list. I don't believe the reason it hasn't been solved yet is that we are too dumb to figure it out.

-- Matt Mahoney, [EMAIL PROTECTED]
